An in-depth look at the four mainstream computing chips: CPU, GPU, ASIC, and FPGA. Which will become the king of AI computing power?
The Evolution of Computing Tools and the Improvement of Social Productivity
From counting with knotted ropes to the advent of electronic computers, computing tools have developed over a long history, and their progress has directly driven the growth of social productivity: the more powerful the tools, the less time it takes to solve complex problems. The popularization of the abacus, for example, played a non-negligible role in the economic prosperity of ancient China, and the invention of the steam engine directly led the wave of the Industrial Revolution.
Computing chips: the key driving force for the development of science and technology.
Every industrial revolution has been accompanied by the quest for more powerful computing, from Charles Babbage's concept of the Analytical Engine to Herman Hollerith's tabulating machine. The rapid development of electric power and electrical technology then created the conditions for the birth of the world's first electronic computer, which has driven more than half a century of scientific and technological leaps.
Today, the convenient life we are accustomed to is inseparable from the contribution of computing chips. From mobile phones to computers, from local to cloud, ubiquitous computing chips provide us with a steady stream of power.
Mainstream AI Computing Chips: Features and Functions.
Nowadays, mainstream AI computing chips fall into four categories: CPU, GPU, ASIC, and FPGA, each with its own computing characteristics and functions.
1.CPU: The king of traditional general-purpose computing.
The CPU, or central processing unit, is the heart of a computer. It operates according to the von Neumann architecture, whose main components include the arithmetic logic unit (ALU), the controller, and memory. Data is stored in memory; the controller fetches data from memory and hands it to the ALU for calculation, and the result is written back to memory once the calculation is complete.
CPUs are characterized by being versatile and can handle various types of computing tasks, but their computing efficiency is not as high as that of chips designed specifically for specific tasks.
2.GPU: Graphics powerhouse.
GPUs, or graphics processing units, were originally used to speed up graphics rendering. In recent years, GPUs have excelled in areas such as deep learning and have been widely used in AI computing.
GPUs are characterized by having a large number of parallel computing units that can process a large amount of data at the same time, making them highly efficient in parallel computing tasks. However, GPUs are not as versatile as CPUs and are only suitable for specific types of computing tasks.
3.ASIC: The epitome of dedicated chips.
An ASIC, or application-specific integrated circuit, is a chip designed for a specific task. It implements algorithms in hardware to achieve extremely high computational efficiency and energy efficiency in specific tasks.
ASICs are highly targeted and only suitable for specific tasks, but they are far more computationally efficient and energy-efficient than CPUs and GPUs.
AI computing chips are a key driving force for scientific and technological development and the foundation of artificial intelligence. As AI technology continues to develop, the demand for computing chips will keep growing, and AI computing chips will be further optimized and applied in more fields.

Looking back at human industrial history, each industrial revolution drove the quest for more powerful computing tools. During the first industrial revolution, Charles Babbage proposed the concept of the Analytical Engine; although it was never fully built, it laid the foundation for mechanical computing devices. Later, Herman Hollerith developed a tabulating machine that could perform different calculations, injecting new vitality into the development of mechanical computing. The second industrial revolution and the great advances in electric power created the conditions for the emergence of the world's first electronic computer, which has brought more than half a century of scientific and technological progress.

To this day, we are still enjoying the benefits of ever-increasing computer performance. From mobile phones to computers, from local devices to the cloud, ubiquitous computing power is always serving us; the convenience of today's life is inseparable from these small computing chips. With that in mind, let's take a closer look at today's mainstream AI computing chips and how their computing characteristics and functions differ.
CPU: The king of traditional general-purpose computing. Today, everyone knows the CPU is the heart of the computer, but many people don't know exactly how it runs. CPU is short for Central Processing Unit, the central processor. The modern electronic computer is built on the von Neumann architecture, born in the 1940s, which consists of five main parts: the arithmetic unit (also called the arithmetic logic unit, ALU), the controller, memory, input devices, and output devices. Under the von Neumann architecture, incoming data is first placed in memory. The controller then takes the corresponding data from memory and hands it over to the ALU for calculation, and after the calculation is complete, the result is returned to memory. The general architecture is shown in Figure 1, where the ALU and the controller make up the core of the CPU.
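To make the fetch-and-execute flow concrete, here is a minimal Python sketch of a toy von Neumann machine: the controller fetches an instruction, routes operands from memory to the arithmetic unit, and writes the result back. The three-address instruction format and the opcodes are invented purely for illustration; they are not any real CPU's instruction set.

```python
# A toy von Neumann machine: the controller fetches instructions, the
# arithmetic unit (ALU) computes, and results go back to memory.
# Opcodes and the 3-address instruction format are illustrative only.

def alu(op, a, b):
    """Arithmetic unit: performs one calculation at a time."""
    return {"ADD": a + b, "SUB": a - b, "MUL": a * b}[op]

def run(program, data):
    memory = dict(data)                      # data memory: address -> value
    for op, src1, src2, dst in program:      # controller: serial fetch-execute
        result = alu(op, memory[src1], memory[src2])
        memory[dst] = result                 # write the result back to memory
    return memory

# Compute (3 + 5) * 2, strictly one instruction at a time.
program = [("ADD", "x", "y", "t"), ("MUL", "t", "z", "out")]
print(run(program, {"x": 3, "y": 5, "z": 2})["out"])   # -> 16
```

Note how the loop handles exactly one instruction per step; this is the serial mode of operation discussed next.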
The limitations of the von Neumann architecture and the demands they place on the CPU.
The von Neumann architecture uses serial operation, meaning only one computing task can be performed at a time. As a result, personal computer performance could not keep up with the needs of application software, especially after graphical window operating systems appeared and application development exploded, placing much higher demands on PC performance.
The birth of GPUs and their revolutionary impact on graphics processing.
In 1999, NVIDIA introduced the GeForce 256, the industry's first GPU, dedicated to graphics processing. The introduction of the GPU changed the way graphics are processed on personal computers, enabling high-quality 3D rendering for applications such as high-definition video and large-scale games.
Features of GPU architecture and how it differs from CPU.
The GPU architecture is designed entirely for 3D graphics processing, with a large number of compute units and few control units, whereas the CPU has fewer compute units and more control units. GPUs win on scale, while CPUs rely on powerful single cores and rich instruction sets to handle complex, general-purpose computing needs.
The importance of GPUs in the field of AI computing.
GPUs have unique advantages in the field of AI computing and are an important hardware foundation to support the training and inference of large AI models. The parallel computing power and high throughput of GPUs make them ideal for processing large-scale data and computation, which can significantly improve the training and inference speed of AI models.
Application scenarios of GPUs.
In addition to graphics processing for personal computers, GPUs are widely used for 3D design and engineering workloads on high-end workstations, as well as in AI computing.
Figure 1: The von Neumann architecture

Under the calculation method above, a complete processing pass runs from data input to output, and the von Neumann system operates serially: only one computing task can be performed at a time, and the next instruction can start only when the previous one has finished and its data has been stored. It is like queuing to enter a station with only one entrance and one checkpoint, where only a single queue is allowed; if many people are entering, everyone has to wait a long time to get through. This is the first-in, first-out mode of operation adopted by the CPU.

Since the birth of the personal computer, CPU hardware architectures and instruction sets have been designed around this serial computing mode. Its advantage is good logic control, that is, excellent computing versatility for handling all kinds of complex computing needs. In an era when software did not demand much computing performance, this design was an advantage. In the 1980s, personal computers were mainly used for simple electronic processing and document printing, but in the 1990s, with the rapid development of the Internet and the emergence of graphical window operating systems, application development exploded.

The author first used a personal computer in 1998, and the strongest impression at the time was "slow": whether opening a web page or playing online games, there were frequent lags. Perhaps the Internet cafe's hardware was low-end, but the mainstream configuration of the day was the original Pentium running at only 60 MHz, and the top configuration was the Pentium II at 450 MHz. Overall, CPU performance at the time simply could not keep up with application software. Intel was surely eager to change the status quo, but limited by manufacturing technology, greatly improving CPU performance was very difficult; it was not until 2000, when the Pentium 4 launched with a clock speed of 1.5 GHz, that the performance shortfall was eased. Until then, the only practical option was to hand off the computer's graphics computation so the CPU could focus on applications. NVIDIA seized that opportunity, took on the task of graphics processing, and in 1999 launched the GeForce 256, the industry's first graphics card of what we now call the GPU, specializing in graphics processing. Perhaps Intel never expected that, 20 years later, the little brother who had picked up its computing "leftovers" would have surpassed it and ridden off into the distance.
GPUs are the leaders in high-performance computing
GPU stands for Graphics Processing Unit, also known as the display core, visual processor, or display chip. Core GPU technologies include a dual-textured quad-pixel 256-bit rendering engine, cube environment mapping and vertex blending, hardware T&L (geometry transform and lighting), texture compression, and bump mapping. GPUs are processors created and tuned specifically for processing graphics data. Besides serving as the core of discrete graphics cards in personal computers, providing high-quality 3D rendering for high-definition video and large-scale games, professional GPU-based graphics cards are also configured in high-end workstations for complex 3D design and engineering work. However, the most important application scenario for GPUs today is AI computing, supporting the training and inference of large AI models.

So why can't the CPU do this work; why does it have to be the GPU? As mentioned earlier, the GPU was introduced to take over the graphics processing originally handled by the CPU. The GPU architecture therefore has its own innate computing characteristics: it is designed entirely for 3D graphics processing, providing computation for large amounts of real-time graphics and image display under the control of the CPU. Since the CPU does the orchestrating, the GPU has relatively few control units and a large number of computing units, as shown in Figure 2. If the CPU is a lone hero who can take charge on its own, the GPU is a horde of foot soldiers that wins by sheer numbers.
GPU vs. CPU architecture.
CPUs and GPUs differ in architecture. CPU instructions are complex and involve resource scheduling, interrupt handling, memory management, and so on; the many logical controls in the execution path require a large number of control units, which limits the number of computing units and constrains performance. GPUs do not need as many control units, so the chip area is given over to computing units, making them well suited to parallel computing and large-scale data access, with high bandwidth and low latency.
GPU performance advantages.
Imagine 1,000 addition and subtraction operations done by one person versus 1,000 people each doing one at the same time: the latter is obviously faster. GPUs have strong parallel computing capabilities and are suited to large-scale data processing; in graphics display in particular, GPU performance far exceeds that of the CPU.
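The "1,000 workers" analogy can be sketched in a few lines of Python: a serial loop produces one result per step (CPU style), while a vectorized NumPy operation hands the whole array to the runtime in bulk. NumPy here only stands in for the many execution units of a GPU; it is an analogy of the programming contrast, not a GPU benchmark.

```python
# Serial vs. data-parallel addition: a rough analogy for CPU vs. GPU style.
# NumPy's vectorized add stands in for "1,000 workers at once"; a real GPU
# would be driven through CUDA/OpenCL, but the contrast in approach is similar.
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

t0 = time.perf_counter()
serial = [x + y for x, y in zip(a, b)]      # one addition after another
t1 = time.perf_counter()
parallel = a + b                            # the whole array handled in bulk
t2 = time.perf_counter()

print(f"serial loop:    {t1 - t0:.3f} s")
print(f"vectorized add: {t2 - t1:.3f} s")
```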
The rise of AI computing.
The application of artificial intelligence continues to deepen, and AI large model training and inference computing have become the mainstream of high-performance computing. GPUs occupy an advantage in the field of artificial intelligence computing and have become the first choice for major AI applications.
ASIC Challenges.
ASICs are strong challengers to GPUs. ASICs are designed for specific tasks and have a high energy-efficiency ratio, outperforming GPUs in certain application areas.
Google TPU v5P
Google has released a multi-modal large model, Gemini, in which the Gemini Ultra version surpasses OpenAI's GPT-4 in some tests. At the same time, Google also released TPU V5P, which is claimed to be the strongest AI self-developed chip at present. The TPU is designed for tensor computing and is an "AI processing unit".
GPUs have had great success in AI computing, but the emergence of ASICs is challenging their position. As AI applications continue to develop, the competition between GPUs and ASICs will only grow fiercer.

Figure 2: CPU vs. GPU architecture comparison

Because CPU instructions are relatively complex, the CPU must handle resource scheduling and control, supporting the operating system's interrupt handling, memory management, I/O processing, and so on. Execution involves a great deal of logical control, so there are many internal control units, which squeezes out computing units and greatly limits computing performance; space must also be reserved for multi-level data caches. GPU operation does not need to account for any of this, nor does it require many control units, so most of the chip area is given to computing units, making it suited to parallel computing tasks and large-scale data access, usually with higher bandwidth and lower latency.

Just imagine a task requiring 1,000 addition and subtraction operations: is it faster for one person to do them all, or for 1,000 people to each do one? The answer is obvious. Another way to grasp the GPU's workload: monitor resolutions keep rising, and a 4K display has a resolution of 3840 x 2160, that is, 8,294,400 pixels. With RGB color at 24 bits per pixel, a single refresh of the display means processing roughly 199 million bits. Multiply by the refresh rate (high-end displays start at 120 Hz) and the GPU has to process about 24 billion bits every second for display alone. Graphics display by itself clearly places high demands on computing performance; relying entirely on the CPU would overwhelm even the most powerful processor and severely slow the normal software applications it is supposed to run. Of course, today's CPUs have gone multi-core and multi-threaded; Intel's latest Xeon processors offer 64 cores and 128 threads, but compared with the 18,432 CUDA cores of an NVIDIA H100, that is still a drop in the bucket.

As artificial intelligence applications deepen, providing training and inference computation for large AI models has become the mainstream of high-performance computing now and for the foreseeable future. Since NVIDIA began its AI computing layout more than a decade ago, GPUs have become the first choice for major AI applications, while the CPU, constrained by its architecture, has become a supporting player in this AI contest. Yet for all the GPU's glory, it still has a strong competitor: the ASIC.
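The display-bandwidth arithmetic above can be checked with a few lines of Python; the figures (3840 x 2160 pixels, 24 bits per pixel, 120 Hz) are the ones quoted in the text.

```python
# Reproduce the display-bandwidth arithmetic quoted above.
width, height = 3840, 2160          # 4K resolution
bits_per_pixel = 24                 # RGB, 8 bits per channel
refresh_hz = 120                    # refresh rate of a high-end display

bits_per_frame = width * height * bits_per_pixel
bits_per_second = bits_per_frame * refresh_hz

print(f"pixels per frame: {width * height:,}")       # 8,294,400
print(f"bits per frame:   {bits_per_frame:,}")       # ~199 million
print(f"bits per second:  {bits_per_second:,}")      # ~23.9 billion
```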
ASIC: the strongest competition GPUs face.

In December last year, Google officially announced the multimodal large model Gemini, which comes in three versions; the Gemini Ultra version even beat OpenAI's GPT-4 outright in most tests. At the same time, Google dropped another bombshell: TPU V5P, billed as the most powerful self-developed AI chip to date. TPU stands for Tensor Processing Unit. A tensor is a mathematical object that holds multiple numbers (a multidimensional array), and almost all machine learning systems today use tensors as their basic data structure, so a tensor processing unit can simply be understood as an "AI processing unit".
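Since the text describes a tensor as a multidimensional array, a short NumPy example makes the idea concrete; the shapes and values below are arbitrary and chosen only for illustration.

```python
# Tensors as multidimensional arrays: the basic data structure of ML systems.
import numpy as np

scalar = np.array(3.0)                    # rank-0 tensor
vector = np.array([1.0, 2.0, 3.0])        # rank-1 tensor, shape (3,)
matrix = np.ones((2, 3))                  # rank-2 tensor, shape (2, 3)
batch  = np.zeros((8, 224, 224, 3))       # rank-4 tensor: a batch of images

# The core workload of a TPU-style accelerator is dense tensor math,
# for example matrix multiplication:
weights = np.random.rand(3, 4)
out = matrix @ weights                    # result has shape (2, 4)
print(scalar.ndim, vector.shape, matrix.shape, batch.shape, out.shape)
```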
Today, domestic and foreign manufacturers such as ICG, Cambricon, Horizon Robotics, Alibaba, Intel, and NVIDIA are launching their own dedicated AI chips, including APUs, DPUs, and other ASIC chips. Thanks to their small size, low power consumption, and high computing performance, ASIC chips are regarded as the most powerful challengers to GPUs.
Google TPU: Strong performance, designed for the large model Gemini.
Google TPU V5P is a dedicated AI chip released by Google last year, with performance comparable to NVIDIA's top-of-the-line H100. It was developed around Google's own multimodal large model Gemini and is mainly used for Google's own products and services rather than being sold externally.
Features of TPU V5P include:
Each module has 8,960 chips, up from 4,096 in the previous generation of V4.
The throughput capacity is up to a staggering 4800Gbps.
In terms of memory and bandwidth, it has 95GB of high-bandwidth memory (HBM), which is far more than the 32GB of TPU V4.
In training large-scale language models, TPU V5P performs particularly well, with four times the performance of the A100 GPU.
FPGA: The best companion for CPU intelligent computing.
An FPGA, or field-programmable gate array, is a semi-custom circuit chip. Compared with fully custom ASIC chips, it avoids the inflexibility of fully custom circuits while overcoming the limited gate count of earlier programmable devices.
Features of FPGAs include:
The user's algorithm is implemented directly in transistor circuits, without translation through an instruction system.
Higher computational efficiency, lower power consumption, and closer proximity to I/O.
Power consumption is typically far lower than that of a GPU, which can draw several hundred watts.
These characteristics make the FPGA the best companion for CPU-based intelligent computing. Working alongside the CPU, it can offload work and raise computing speed, and an FPGA can be customized to the needs of the algorithm, giving higher computing efficiency.
ASIC vs. GPU: Each has its own advantages and disadvantages.
ASICs and GPUs each have their own advantages and disadvantages; neither simply replaces the other.
The computing power and efficiency of an ASIC chip can be customized to the needs of the algorithm, giving very high performance. However, the algorithm is fixed in hardware, and once the algorithm changes the chip may no longer be usable.
GPUs can run a wide variety of algorithms for greater flexibility. But its performance is not as good as that of ASIC chips.
Therefore, when choosing ASIC chips and GPUs, decisions need to be made based on specific needs. If high-performance dedicated AI chips are required, then ASIC chips are the first choice. If you need a more flexible chip, then a GPU is a better choice.
ASIC chips and FPGAs, as emerging chip approaches, are becoming the most powerful challengers to GPUs. They have broad applications in artificial intelligence, big data processing, and other fields, and their prospects are bright.

Figure 3: Google TPU

Google's TPU is built on the ASIC approach: a chip custom-designed for a specific need. Because the computing power and efficiency of an ASIC can be tailored to the algorithm, ASICs have several advantages over general-purpose chips: small size, low power consumption, high computing performance and efficiency, and lower unit cost as shipment volumes grow. However, an ASIC's algorithm is fixed, and once the algorithm changes the chip may become unusable. With new AI algorithms emerging constantly, adapting to them is the biggest problem for ASIC chips; if an ASIC tried to accommodate every algorithm through its architecture the way a GPU does, it would become just another general-purpose chip like the CPU and GPU, with no advantage in performance or power consumption. This means that playing the ASIC game requires real strength: deep pockets, strong technical capability, and rich application scenarios.

Note that when Google released TPU V5P last year, it also released the multimodal large model Gemini, which is powerful across images, audio, video, and text. Unlike NVIDIA's open strategy of selling GPUs to all comers, Google's high-end TPUs are mainly used for its own products and services, and that is the key. In other words, Google's high-end TPU is a dedicated AI chip developed around its own multimodal large model Gemini; it excels within Google's own products and services, and its performance is not inferior to a GPU. According to publicly available information, TPU V5P has 8,960 chips per module, up from 4,096 in the previous-generation V4; total floating-point operations (FLOPS) per module have increased fourfold, and throughput reaches an astonishing 4,800 Gbps. The new architecture is even better on memory and bandwidth, with 95 GB of high-bandwidth memory (HBM), far exceeding the 32 GB of TPU V4. According to official figures, Google's TPU V5P delivers four times the performance of the A100 GPU when training large-scale language models, and is not inferior to NVIDIA's top-of-the-line H100. Of course, these are Google's own tests on its own Gemini model, which was surely optimized for the hardware during development. Still, they demonstrate the strong performance of TPU V5P and show that ASIC chips are by no means at a disadvantage in large-model AI applications.

To date, Google, Intel, and NVIDIA have successively released APUs, DPUs, and other ASIC chips, and domestic players such as ICG, Cambricon, Bitmain, Horizon Robotics, and Alibaba have also launched ASIC chips accelerated for deep neural networks. GPUs currently enjoy the widest application and the most mature market, but that does not mean other chips have no opportunity; the momentum behind ASICs remains strong, and they are becoming the GPU's most powerful challenger.
FPGA: The best companion for CPU intelligent computing.
FPGA stands for Field Programmable Gate Array. Internally it consists of a large number of digital (and some analog) circuit blocks that can be configured to implement a wide range of functions. FPGAs are related to ASICs, but an ASIC is a fully custom circuit chip while an FPGA is a semi-custom one, which avoids the drawbacks of fully custom circuits and overcomes the limited gate count of earlier programmable devices. There are two ways to do data computation: one is to write software for an instruction-based architecture such as a CPU or GPU; the other is to design and build dedicated circuits, such as ASICs and FPGAs, for a specific computing need. The difference is that FPGAs are programmed in a hardware description language, and the logic it describes can be compiled directly into a configuration of transistor circuits. So the FPGA effectively implements the user's algorithm directly in transistor circuits, without translation through an instruction system. Unlike CPU and GPU data processing, which must first fetch and decode instructions, FPGAs use no instructions and no software; they are devices in which software and hardware are fused. The result is higher computational efficiency, lower power consumption, and closer proximity to I/O.
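The contrast between an instruction-driven processor and an FPGA's directly wired logic can be loosely mimicked in Python: one function interprets a list of "instructions" step by step (fetch, decode, execute), while the other is a single fixed function, analogous to an algorithm baked into circuits. This is only a conceptual analogy under that assumption; real FPGA designs are written in an HDL such as Verilog or VHDL, and the instruction list here is invented for illustration.

```python
# Conceptual analogy only: instruction interpretation (CPU/GPU style)
# versus a fixed, specialized dataflow (FPGA/ASIC style).

def interpreted_fir(samples, coeffs):
    """CPU-style: a generic interpreter decodes an 'instruction' each step."""
    program = [("MUL", i) for i in range(len(coeffs))] + [("SUM", None)]
    acc, products = 0.0, []
    for op, i in program:                 # per-step fetch/decode overhead
        if op == "MUL":
            products.append(samples[i] * coeffs[i])
        elif op == "SUM":
            acc = sum(products)
    return acc

def hardwired_fir(samples, coeffs):
    """FPGA-style analogy: the algorithm is 'wired' as one fixed dataflow."""
    return sum(s * c for s, c in zip(samples, coeffs))

x = [0.5, 1.0, 0.25, 0.75]
h = [0.2, 0.3, 0.3, 0.2]
assert abs(interpreted_fir(x, h) - hardwired_fir(x, h)) < 1e-12
print(hardwired_fir(x, h))
```

Both functions compute the same result; the point is that the second has no instruction-decoding step at all, which is the efficiency argument the paragraph above makes for FPGAs.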
FPGA: A semi-customized chip that breaks through the bottleneck of AI computing power.
FPGAs (Field Programmable Gate Arrays) are uniquely positioned for deep learning applications with bit-level custom structures, pipelined parallel computing, and efficient energy consumption. In particular, FPGAs can be reprogrammed or upgraded even after the chip is manufactured.
FPGA vs CPU: Who is Faster and More Energy Efficient?
For example, take a CPU running at 3 GHz and an FPGA running at 200 MHz. If a specific operation takes the CPU 30 clock cycles but the FPGA only one, the time comparison is: CPU, 30 / 3 GHz = 10 nanoseconds; FPGA, 1 / 200 MHz = 5 nanoseconds. The FPGA comes out ahead on speed.
The energy comparison also favors the FPGA. For one deep learning operation, the CPU consumes 36 joules of energy while the FPGA needs only 10 joules, an energy saving of roughly 3.5 times.
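The latency and energy comparison above reduces to simple arithmetic; the sketch below just reruns the figures quoted in the text (30 cycles at 3 GHz versus 1 cycle at 200 MHz, and 36 J versus 10 J, which works out to about 3.6x, rounded in the article to roughly 3.5x).

```python
# Rerun the latency and energy arithmetic quoted in the text.
cpu_clock_hz, cpu_cycles = 3e9, 30       # 3 GHz CPU, 30 cycles per operation
fpga_clock_hz, fpga_cycles = 200e6, 1    # 200 MHz FPGA, 1 cycle per operation

cpu_time = cpu_cycles / cpu_clock_hz     # 1e-8 s = 10 ns
fpga_time = fpga_cycles / fpga_clock_hz  # 5e-9 s =  5 ns

cpu_energy_j, fpga_energy_j = 36, 10     # joules per deep-learning operation
print(f"CPU latency:  {cpu_time * 1e9:.0f} ns")
print(f"FPGA latency: {fpga_time * 1e9:.0f} ns")
print(f"energy ratio: {cpu_energy_j / fpga_energy_j:.1f}x")   # ~3.6x
```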
AMD competes with Intel for the FPGA market.
In 2022, AMD acquired Xilinx, a veteran of the FPGA field, completing its "CPU + GPU + FPGA" intelligent computing layout. Intel, by contrast, having failed to compete with NVIDIA in the GPU field, could only acquire Altera, the number two player in FPGAs, to form a "CPU + FPGA" intelligent computing combination.
FPGAs are unique.
Bit-level fine-grained custom structure.
Pipeline parallel computing capabilities.
Efficient energy consumption.
A flexible architecture that lets researchers explore model optimizations.
Chips can still be reprogrammed or upgraded after they have been manufactured.
In the AI era, the demand for computing power is endless.
Mainstream AI chips include:
General-purpose chips (represented by GPUs).
Dedicated chips (represented by ASICs).
Semi-customized chips (represented by FPGAs).
At present, the GPU market is the most mature and widely used. However, AI development is still in its infancy and the demand for computing power is endless, so both ASICs and FPGAs have a real chance of breaking through in the future.

Figure 4: FPGA chip

For example, take a CPU with a clock frequency of 3 GHz and an FPGA with a clock frequency of 200 MHz. If a specific operation takes the CPU 30 clock cycles and the FPGA only one, the time consumption is: CPU, 30 / 3 GHz = 10 ns; FPGA, 1 / 200 MHz = 5 ns. In other words, the FPGA performs this particular computation faster than the CPU and can help accelerate it. Beyond higher computing efficiency, some organizations have compared the energy consumption of FPGAs and CPUs when executing deep learning algorithms: in one deep learning operation, the CPU consumed 36 joules while the FPGA consumed only 10 joules, an energy saving of about 3.5 times. With FPGA acceleration and power savings, it becomes much easier to run real-time deep learning computation on mobile devices. That is why AMD and Intel have gone to such lengths to acquire FPGA vendors.

In terms of intelligent computing strategy, AMD is actually more complete than Intel. In the traditional CPU era, AMD was licensed under Intel's x86 architecture and built a CPU business in parallel with Intel; it then entered the GPU track by acquiring graphics card maker ATI, becoming NVIDIA's biggest competitor; and in 2022, by acquiring Xilinx, the veteran of the FPGA field, AMD finally completed its "CPU + GPU + FPGA" intelligent computing layout. Intel, however, failed to compete with NVIDIA in GPUs for supercomputing products, so in 2015 it acquired Altera, the number two player in FPGAs, and formed a "CPU + FPGA" intelligent computing combination; it may not be the preferred route, but it does open a new path for intelligent computing.

Compared with CPUs and GPUs, FPGAs have unique advantages in deep learning applications thanks to their bit-level fine-grained custom architecture, pipelined parallel computing capability, and efficient energy consumption, with great potential for both large-scale server deployments and resource-constrained embedded applications. In addition, the flexibility of the FPGA architecture lets researchers explore model optimizations beyond what fixed architectures such as GPUs allow. Above all, an FPGA can still be reprogrammed or upgraded even after the chip has been manufactured.
A final word
From IBM's Deep Blue defeating world champion Garry Kasparov at chess in 1997, to Google's AlphaGo beating Lee Sedol at Go in 2016 and later defeating world champion Ke Jie, we have been struck again and again by the power of artificial intelligence. In the past two years especially, the rapid development of AI applications has made us truly feel that the era of artificial intelligence has arrived. At the same time, it is widely recognized that artificial intelligence is still in its infancy and that AI applications' appetite for computing power is endless. There are currently three types of mainstream AI chips: general-purpose chips represented by GPUs, special-purpose chips represented by custom ASICs, and semi-custom chips represented by FPGAs, with the GPU market the most mature and widely used. But besides powerful computing, the artificial intelligence industry also needs better algorithms and massive data, and it remains to be seen whether GPUs can keep their edge in the competition for AI computing power. In our view, ASICs backed by Google and Huawei, as well as FPGAs backed by Intel and AMD, have a real chance of breaking through in the future.
- What are your thoughts on this? -
- Welcome to leave a message and share your views in the comment area. -