Running AI on the CPU can be surprisingly "sweet"

Mondo Digital Updated on 2024-01-30

As the parameter counts of large AI models continue to grow, demand for computing power is rising dramatically. To meet this demand, industries of all kinds are actively building large-scale computing infrastructure, leaving specialized AI accelerator chips in short supply: hard to procure and expensive. As a result, some companies have turned their attention to the most widely deployed hardware available today, the CPU (central processing unit). The recent arrival of the 5th Gen Intel Xeon Scalable processors has shown the industry once again that CPUs can also run AI efficiently, and that running AI on the CPU can be surprisingly "sweet".

The new mission of the CPU in the field of AI

Compared with training, AI inference demands relatively modest computing resources, and for businesses or industries with light inference workloads, CPUs can be more cost-effective than dedicated AI accelerator chips. Moreover, because CPUs are already the most widely deployed hardware, most enterprises prefer to build on their existing, broad, CPU-based IT infrastructure and avoid the deployment challenges of heterogeneous platforms. Bringing AI acceleration into this traditional architecture is the CPU's new mission in this era.

The 5th Gen Intel Xeon Scalable processors arrived to fill this role. The processor raises the core count to 64 and carries 320MB of L3 cache and 128MB of L2 cache; both single-core performance and core count are significantly improved over the previous generation. At the same power consumption, average performance is up 21%, memory bandwidth is up to 16% higher, and cache capacity has nearly tripled.

At the same time, every core of the 5th Gen Xeon Scalable processors includes built-in AI acceleration, improving training performance by 29% and inference performance by 42% over the previous generation.

The 5th Gen Xeon Scalable processors also bring a major step up in AI workload handling. Starting with the 4th Gen Xeon Scalable processors, Intel introduced Intel Advanced Matrix Extensions (Intel AMX) as a built-in AI acceleration engine, an innovation that lets the CPU handle AI workloads far more efficiently. The 5th Gen processors also build in the Intel AVX-512 instruction set, which, together with faster cores and faster memory, further improves AI performance and allows generative AI to run more workloads without a separate dedicated AI accelerator. With a leap in natural language processing inference performance, the processors better support the responsiveness of workloads such as intelligent assistants, chatbots, text processing, language translation, and more. Developers can fine-tune and run inference on large language models of up to 20 billion parameters on these processors, with response latency below 100 milliseconds when running models under 20 billion parameters.
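At the heart of Intel AMX is a tile matrix-multiply (TMUL) unit that multiplies small int8 (or bf16) tiles while accumulating into int32 (or fp32) registers, which is what makes low-precision inference both fast and overflow-safe. As a rough illustration of that accumulate-into-int32 semantics (not Intel's implementation; function name and tile shapes here are purely illustrative, and real AMX tiles are limited to 16 rows of 64 bytes):

```python
def tile_matmul_int8(a, b, c):
    """Multiply an int8 tile a (M x K) by an int8 tile b (K x N),
    accumulating in place into the int32 tile c -- mirroring the
    accumulate semantics of an AMX tile dot-product instruction.
    Accumulating in 32 bits avoids the overflow that summing many
    int8 products in 8 bits would cause."""
    M, K, N = len(a), len(b), len(b[0])
    for m in range(M):
        for n in range(N):
            acc = 0
            for k in range(K):
                acc += a[m][k] * b[k][n]  # int8 * int8 products
            c[m][n] += acc  # 32-bit accumulation, added onto prior contents
    return c

# Example: accumulate a 2x2 product into a zeroed output tile.
out = tile_matmul_int8([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]])
```

In hardware, one such tile operation completes in a handful of cycles rather than the three nested loops above, which is where the per-core inference speedup comes from.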

Safeguarding cloud service providers

The explosion of generative AI has brought the cloud computing industry new opportunities, but also new challenges. Because large models require enormous computing power, cloud vendors must upgrade their data centers' compute as quickly as possible to meet AI demand, while continuing to reduce TCO (total cost of ownership) so they can offer users reasonably priced computing resources. AI application development also involves storing and using large amounts of privacy-sensitive data in the cloud, so cloud vendors need to upgrade their existing hardware infrastructure to ensure this data stays safe and reliable and to dispel users' concerns.

The 5th Gen Intel Xeon Scalable processors build a solid ecosystem for cloud service providers on both the hardware and software sides. On the hardware side, the Intel SGX and Intel TDX solutions provide end-to-end, hardware-level protection for data in the cloud. On the software side, Intel has contributed optimizations for the 5th Gen Xeon Scalable processors to industry-standard frameworks such as PyTorch and TensorFlow and to the OpenVINO toolkit, enabling cloud vendors and their users to quickly exploit processor features such as Intel AMX and break through the computing bottlenecks of AI applications with a low barrier to entry.
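Before relying on framework-level AMX acceleration, a deployment script would typically confirm that the host CPU actually advertises the feature. On Linux, the kernel exposes AMX support through `/proc/cpuinfo` flags (`amx_tile`, `amx_int8`, `amx_bf16`); the helper below is a small illustrative sketch of such a check, not part of any Intel tool:

```python
def amx_flags(cpuinfo_text):
    """Return the sorted AMX-related feature flags found in the given
    /proc/cpuinfo contents. The flag names (amx_tile, amx_int8, amx_bf16)
    are the ones the Linux kernel reports for AMX-capable Xeon CPUs."""
    wanted = {"amx_tile", "amx_int8", "amx_bf16"}
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # "flags\t\t: fpu vme ... amx_tile amx_int8 ..."
            present = set(line.split(":", 1)[1].split())
            return sorted(wanted & present)
    return []

# On a Linux host one would call:
#   amx_flags(open("/proc/cpuinfo").read())
# An empty result means frameworks will fall back to AVX-512 or plain paths.
```

Frameworks such as PyTorch select AMX-backed kernels automatically when these flags are present and low-precision (bf16/int8) execution is requested, so the check is purely a deployment sanity test.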

The fifth-generation Intel Xeon Scalable processors provide solid computing power support for cloud service providers. It not only reduces operational costs, but also provides a strong barrier for data security. More importantly, it optimizes AI application development, so that cloud service providers can also experience the "sweetness" of running AI on CPUs.

Enterprises enter "early adopter" mode

Intel CEO Pat Gelsinger said at the 2023 Intel ON Technology Innovation Conference: "In this era of rapid development of artificial intelligence technology and industrial digital transformation, Intel maintains a strong sense of responsibility to help developers make AI technology ubiquitous, making AI more accessible, visible, transparent, and trustworthy."

Reportedly, 70% of inference workloads running in data centers today use Intel Xeon Scalable processors. With the launch of the 5th Gen Xeon Scalable processors, some companies have entered "early adopter" mode, and their products have shown marked improvements in AI performance.

During the Double 11 shopping festival, JD Cloud successfully handled the surge in business volume with a new generation of servers based on the 5th Gen Intel Xeon Scalable processors. Compared with the previous generation of servers, whole-machine performance increased by 123%, AI computer-vision inference performance improved to 138%, and Llama 2 inference performance improved to 151%, comfortably absorbing a 170% year-on-year increase in peak user visits and more than 1.4 billion intelligent customer-service consultations.

Based on the 5th Gen Intel Xeon Scalable processors, the third-generation Flex Compute instances of Volcano Engine deliver 39% more computing power and up to 43% higher application performance. On top of these performance gains, Volcano Engine has built a million-core elastic resource pool through its distinctive tidal resource-pooling capability, offering a pay-as-you-go experience at a cost similar to monthly subscription and lowering the cost of cloud migration.

According to Intel, the built-in accelerators of the 5th Gen Xeon Scalable processors deliver an average of 10x higher performance per watt, and workload-optimized, energy-efficient SKUs can run at as little as 105W.

Equipped with the 5th Gen Intel Xeon Scalable processors and their built-in Intel AMX and Intel TDX acceleration engines, Alibaba Cloud has built an innovative "generative AI model and data protection" practice that significantly improves the security and AI performance of its 8th-generation ECS instances while keeping instance pricing unchanged for customers.

These gains include a 25% increase in inference performance, a 20% increase in QAT encryption/decryption performance, a 25% increase in database performance, and a 15% increase in audio performance.

If a data center is thought of as a supercomputer, the CPU is its "brain". As that "super brain", the 5th Gen Intel Xeon Scalable processors play a vital role in running data centers efficiently and bringing AI applications to life.

The era of AI deployment has begun, and the CPU's "spring" is coming.

Author丨Shen Cong  Editor丨Zhang Xinyimei  Copy Editor丨Maria  Producer丨Lian Xiaodong
