In 2018, Amazon Web Services released the first-generation Graviton chip based on the Arm architecture, setting a precedent for cloud vendors developing their own general-purpose processors.
In the years since, more and more cloud vendors have realized that enterprise customers' appetite for compute performance in the cloud is endless, and that the only way to meet it is to reshape computing from the underlying chip up.
The Graviton series has not slowed its pace, either. At the recent re:Invent 2023 conference, Amazon Web Services unveiled its latest self-developed processor, Graviton4, along with R8g instances built on it.
Compared with the previous generation, Graviton4 has more cores, higher memory bandwidth, and markedly better performance and energy efficiency.
At the same time, Graviton4 is optimized for real-world workloads rather than benchmark scores, which translates into genuinely better user experience.
By the end of 2023, Amazon Web Services had deployed more than 2 million Graviton processors, launched more than 150 Graviton-based instance types, and served more than 50,000 customers.
Among them, the top 100 EC2 customers are also using instances based on Graviton processors.
Overall, that is a fairly convincing track record of adoption.
Judging from the published data, Graviton4's compute performance is about 30% higher than Graviton3's overall; running MySQL databases it is 40% faster, and running large Java applications, 45% faster.
Graviton, generations one through four
Even to the naked eye it is apparent that Graviton4, like Graviton3, uses chiplet technology, a route that Arm itself strongly advocates.
Jeff Barr, Chief Evangelist at Amazon Web Services, noted in his blog that Graviton4 has 96 Neoverse V2 cores, 2MB of L2 cache per core, and twelve DDR5-5600 memory channels.
By contrast, the previous-generation Graviton3 had 64 cores, so the count has grown by 50%. Memory bandwidth has risen from 307 GB/s to 536.7 GB/s, an improvement of more than 75%, and the per-core L2 cache has doubled from 1MB to 2MB, all of which helps performance considerably.
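As a quick sanity check on those figures (a toy calculation added here, not from the original article), the stated percentages follow directly from the published specs:

```python
# Recompute the published Graviton3 -> Graviton4 spec improvements.
g3_cores, g4_cores = 64, 96
g3_bw_gbps, g4_bw_gbps = 307.0, 536.7  # memory bandwidth in GB/s

core_gain = (g4_cores - g3_cores) / g3_cores * 100
bw_gain = (g4_bw_gbps - g3_bw_gbps) / g3_bw_gbps * 100

print(f"cores: +{core_gain:.0f}%")           # +50%
print(f"memory bandwidth: +{bw_gain:.0f}%")  # +75%
```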
The Neoverse V2 core, codenamed "Demeter," is based on the Armv9 architecture and targets the HPC, cloud computing, and hyperscale data center markets.
According to Arm's official figures, Neoverse V2 delivers 40% higher IPC than the Armv8-based Neoverse V1 core. Graviton3 and Graviton3E both used Neoverse V1 cores on the Armv8 architecture.
As David Brown, vice president of Amazon EC2, explained in his introduction, Graviton's design is optimized for real-world workloads, not benchmark scores.
To illustrate what "optimizing for real workloads" means, he shared a radar chart that looks complicated but is actually quite simple.
The radar chart lists the main characteristics of the CPU microarchitecture, divided into two parts: front-end and back-end.
The front-end handles the instruction-related work of fetch and decode, while the back-end contains the execution units. Between the two sits a dispatcher, responsible for distributing the instructions decoded by the front-end to the execution units.
A CPU microarchitecture requires the front-end and back-end to work in close concert. If the front-end is inefficient, the back-end execution units sit idle waiting for new instructions, creating a performance bottleneck. If the back-end is inefficient and executes too slowly, new instructions cannot enter, which also creates a bottleneck.
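This front-end/back-end interplay can be sketched with a toy throughput model (an illustration added here, not from the talk): sustained instruction throughput is capped by the slower of the two stages, so speeding up a stage that is not the bottleneck gains nothing.

```python
# Toy model: a two-stage pipeline's sustained throughput is limited
# by its slowest stage (rates given in instructions per cycle).
def pipeline_throughput(frontend_ipc: float, backend_ipc: float) -> float:
    """Sustained IPC is the minimum of front-end supply and back-end drain."""
    return min(frontend_ipc, backend_ipc)

# A weak front-end starves a fast back-end...
print(pipeline_throughput(frontend_ipc=2.0, backend_ipc=6.0))  # 2.0
# ...and improving only the back-end does not help.
print(pipeline_throughput(frontend_ipc=2.0, backend_ipc=8.0))  # 2.0
```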
The radar chart also carries values: the lower a number, the less the workload depends on that characteristic, meaning that characteristic of the CPU has less impact on overall performance. Conversely, a higher number means the workload depends on it more.
With such a radar chart, CPU designers can optimize for real-world workloads rather than for benchmark test results.
The diagram above shows a benchmark scenario that exaggerates the impact of certain characteristics.
For example, the L3 cache value here is particularly high, which drives up the back-end stall value. At that point the back-end can no longer accept new instructions, creating a performance bottleneck.
The three radar charts on the right show how Cassandra, Groovy, and Nginx are each affected by different characteristics when handling real-world workloads.
As you can see, these applications are affected by several different characteristics at once. To optimize for real-world workloads, you have to find ways to bring those numbers down.
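In code, reading the optimization target off such a chart amounts to taking the maximum over its dimensions (a hypothetical illustration; the metric names and values below are made up, not taken from the slides):

```python
# Hypothetical radar-chart readings for one workload: a higher value means
# the workload depends more on that microarchitectural characteristic.
radar = {
    "frontend_stalls": 2.1,
    "branch_mispredicts": 1.4,
    "l2_misses": 3.0,
    "l3_misses": 6.5,
    "backend_stalls": 5.8,
}

# The dimension with the highest value is the first optimization target.
bottleneck = max(radar, key=radar.get)
print(bottleneck)  # l3_misses
```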
That is exactly what Graviton4 does relative to Graviton3.
As the figure above shows, Graviton4 lowers several of these dimensions when running MySQL, yielding the 40% performance improvement. Of course, spec improvements such as the higher memory bandwidth contribute a great deal as well.
In any case, the final radar chart shows the result of Graviton4's real-world workload optimization.
Beyond the spec bumps and the real-world workload optimization, Graviton4 also brings something new on the security front, in response to increasingly sophisticated threats.
Graviton4 not only inherits the security features of its predecessors but also encrypts its high-speed hardware interfaces to protect the security and integrity of data.
Graviton4 also adds Branch Target Identification (BTI), which ensures that branches land only on valid target addresses, preventing malware from abusing branch instructions to jump to unintended code and thereby improving system security.
As in previous years, the new Graviton4 processor arrives together with Amazon EC2 instances built on it.
The first instance type released in preview is R8g, a memory-optimized instance that offers up to three times the vCPUs and memory capacity of the previous-generation R7g.
That gives R8g an edge in scenarios with large data sets, such as high-performance databases and big data analytics, and brings better price-performance and energy efficiency to memory-sensitive workloads.
In 2018, Amazon Web Services released its first-generation self-developed Graviton processor, becoming the first cloud vendor to build its own general-purpose server processor. It was also a landmark event in the resurgence of the Arm server camp.
To build on those gains, the more powerful Graviton2 followed, and Graviton3 continued to improve performance and energy efficiency.
In fact, last year Amazon Web Services also released an upgraded version of Graviton3, Graviton3E, which mainly optimizes floating-point and vector arithmetic, capabilities that matter most in high-performance computing.
In hindsight, the Graviton processor Amazon Web Services released in 2018, and the instances built on it, have indeed had a major impact on the server market.
Today the range of Graviton applications has expanded greatly, as the fact that Amazon Web Services' top 100 customers are using Graviton instances attests. SAP, for example, reports 35% lower costs with faster analytics and 45% lower carbon emissions after adopting Graviton.
Having launched four generations of chips in five years, Amazon Web Services has used Graviton to chart a development path for cloud vendors' self-developed silicon: each generation brings higher performance, richer choices, and lower costs, meeting enterprises' endless demand for compute performance in the cloud.