Advanced packaging chiplet technology and AI chip development.
Zhang Zhiwei, Tian Guo, Wang Shiquan.
Abstract: AI chips are integrated circuits designed specifically to accelerate artificial intelligence computing tasks. Over the past few decades, AI chips have undergone continuous evolution and breakthroughs, driving the development of the field of artificial intelligence. This article describes the development history, mainstream technologies, and application scenarios of AI chips, as well as the challenges and problems they face. Furthermore, chiplet technology is proposed as a way to implement different functional modules as independent chiplets and combine them on one AI chip to achieve higher computing power. This design not only allows individual modules to be developed and upgraded independently, but also allows them to be flexibly combined during the packaging process, so that AI chips can continue to evolve as AI technology is optimized.
1 The history and current situation of AI chip development.
AI (Artificial Intelligence) chips are integrated circuits that are specifically designed to accelerate AI computing tasks. Over the past few decades, AI chips have undergone continuous evolution and breakthroughs, making great contributions to the development of the field of artificial intelligence.
1.1 AI chip evolution and major breakthroughs.
The history of AI chips can be traced back to the early 1980s. The earliest AI computing tasks ran on general-purpose microprocessors, but the mismatch between AI workloads and general-purpose processor architectures kept computing efficiency low. With the rapid development of artificial intelligence, the demand for efficient computing became increasingly urgent, and research on AI chips gradually attracted attention. In the 1990s, graphics processing units (GPUs) became the main accelerators for AI computing. GPUs excel at graphics rendering, and although their architecture is not efficient for every AI computing task, their parallel computing power laid the foundation for the development of AI chips. With the rise of artificial intelligence, specialized AI acceleration hardware appeared in the late 20th and early 21st centuries, such as FPGAs (Field-Programmable Gate Arrays) and ASICs (Application-Specific Integrated Circuits). These chips use customized architectures to better meet the needs of AI computing, but high design and production costs limited their wide application.
Around 2010, the rise of deep learning drove a major breakthrough in AI chip technology. GPUs were applied to deep learning with great success, but to better match the characteristics of deep learning models, researchers began to explore new AI chip architectures. The advent of AI-specific ASICs further improved the performance and energy efficiency of AI computing; examples include Google's TPU (Tensor Processing Unit) and NVIDIA's Tensor Cores.
1.2 Current mainstream AI chip technologies and their application scenarios.
At present, AI chip technology is showing a diversified development trend, mainly including the following types.
1) Graphics Processing Unit (GPU). GPUs became the mainstream accelerators of early AI computing thanks to their parallel computing capabilities. Modern GPUs excel at deep learning training and inference and are widely used in computer vision, natural language processing, and other fields.
2) Tensor Processing Unit (TPU). The TPU is a dedicated AI accelerator launched by Google and specially optimized for tensor computation. TPUs perform well in large-scale deep learning model training and are widely used in cloud AI services.
3) Neural Processing Unit (NPU). NPU is a class of AI chips dedicated to neural network computing, which is widely used in smartphones and mobile devices to accelerate tasks such as image recognition and speech recognition.
4) Quantum chips. A quantum chip is a revolutionary AI chip that uses qubits to perform calculations. Although it is currently in its early stages, quantum chips have shown great potential to solve specific problems, such as optimization problems and cryptography.
5) Neuromorphic chips. Inspired by the structure of neurons in the human brain, neuromorphic chip development attempts to simulate the connections and information transmission between neurons. Such chips have potential applications in brain-inspired computing and intelligent machines.
Broadly speaking, any chip that can run AI algorithms can be called an AI chip. CPUs, GPUs, FPGAs, NPUs, and ASICs can all execute AI algorithms, but their execution efficiency differs enormously. A CPU can quickly perform complex mathematical calculations, but its performance degrades when many parallel tasks must run at once, and the industry now broadly agrees that CPUs alone are not suited to large-scale AI computing. Heterogeneous CPU + XPU configurations have therefore become the standard in high-compute scenarios, with the GPU being the most widely used AI chip. The AI chip types widely recognized in the industry include GPUs, FPGAs, and NPUs. Mainstream AI chips are already used in many fields, including but not limited to autonomous driving, intelligent voice assistants, medical image recognition, and financial risk control; as the technology advances, their application scenarios will expand further.
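The efficiency gap between the CPU and a parallel accelerator is easy to demonstrate. The following is a minimal sketch, assuming PyTorch is available (the article names no particular framework), that times the same dense matrix multiplication on the CPU and, if one is present, on a CUDA GPU; on typical hardware the GPU finishes the workload far faster.

```python
# Minimal sketch: time one large matrix multiply on CPU and (if present) GPU.
# Assumes PyTorch is installed; the article itself names no framework.
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    _ = a @ b                      # warm-up run (kernel/library initialization)
    if device == "cuda":
        torch.cuda.synchronize()   # GPU kernels launch asynchronously
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for dev in devices:
    print(f"{dev}: {time_matmul(dev):.4f} s for a 4096x4096 matmul")
```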
1.3 ChatGPT ignites the enthusiasm of the AI and semiconductor industries and capital markets.
According to a research report released by UBS, ChatGPT's monthly active users reached 100 million in January 2023. Compared with the time other major platforms took to pass 100 million monthly active users, ChatGPT needed only 2 months (see Figure 1), making it the fastest-growing consumer application in history. In the capital markets, OpenAI, the research lab behind ChatGPT, has reportedly been in talks to sell existing shares in a tender offer valuing the company at around $29 billion, which would make it one of the most valuable U.S. startups despite generating little revenue, according to people familiar with the matter. Technology giants at home and abroad attach great importance to the technological wave triggered by ChatGPT and are actively deploying generative AI. At the same time, the global semiconductor capital market has rallied significantly, with the Philadelphia Semiconductor Index rising about 30% since January 2023 (see Figure 2).
1.4 GPU increment and market size in the short term.
Referring to OpenAI's algorithm and assuming 100 million users per day, 10 interactions per user, 50 words per question, and 30% computing power utilization, the daily demand of a single large language model (LLM) is expected to bring an increment of about 21,300 A100 chips, corresponding to a market size of about $213 million. If 5 large companies each launch such an LLM, the total increment is about 107,000 A100 chips, corresponding to a market size of about $1.07 billion. Short-term server increment and market size: a single server contains 8 GPUs, so a single LLM brings demand for about 2,669 servers, corresponding to a market size of about $339 million; 5 large enterprises need a total of 13,345 units, corresponding to a market size of about $2 billion. Long-term market space: referring to Google, serving 3 billion visits per day would take about 1,067,400 A100 chips, corresponding to about 133,000 DGX A100 servers and a market space of about $20 billion. According to Verified Market Research, the global GPU market size in 2020 was $25.41 billion (about RMB 171.72 billion). With growing demand, the global market is expected to reach $185.3 billion by 2027, a CAGR of 32.82%, as shown in Figure 3 (left). The discrete GPU market in the Chinese mainland was $4.739 billion in 2020, with the shares of market leaders NVIDIA, Intel, and AMD shown in Figure 3 (right); it is expected to exceed $34.56 billion by 2027.
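The arithmetic behind these figures is easy to reproduce. The sketch below recomputes the chip and server counts from the stated assumptions; the unit prices (roughly $10,000 per A100 and $127,000 per 8-GPU server) are not given in the text and are inferred here so that the dollar figures come out consistent.

```python
# Back-of-envelope reconstruction of the sizing estimate above. Unit
# prices are assumptions chosen to be consistent with the quoted figures.
A100_PRICE = 10_000      # USD per A100 (assumed)
SERVER_PRICE = 127_000   # USD per 8-GPU server (assumed)
GPUS_PER_SERVER = 8

chips_per_llm = 21_352   # ~21,300 in the text (2,669 servers x 8 GPUs)
llm_count = 5

servers_per_llm = chips_per_llm // GPUS_PER_SERVER
print(f"one LLM:  {chips_per_llm:,} A100s "
      f"(${chips_per_llm * A100_PRICE / 1e6:.0f}M), "
      f"{servers_per_llm:,} servers "
      f"(${servers_per_llm * SERVER_PRICE / 1e6:.0f}M)")
print(f"{llm_count} LLMs: {llm_count * chips_per_llm:,} A100s "
      f"(${llm_count * chips_per_llm * A100_PRICE / 1e9:.2f}B), "
      f"{llm_count * servers_per_llm:,} servers")

# Sanity check on the forecast: $25.41B growing at a 32.82% CAGR for
# 7 years (2020 -> 2027) lands near the quoted $185.3B.
print(f"2027 global GPU market: ${25.41 * 1.3282 ** 7:.1f}B")
```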
1.5 Challenges and problems in the development of AI chips.
Although AI chips have made significant progress in the past few decades, they still face some challenges and problems in the process of their development.
1) Complex algorithms and models. The emergence of complex algorithms such as deep learning places ever higher demands on the computing power and storage of AI chips. Some large-scale neural network models require massive computing resources to run efficiently, so achieving highly parallel and efficient computing in chip design is an urgent problem.
2) Energy consumption and heat dissipation issues. As the computing scale of AI chips grows, energy consumption and heat dissipation have become increasingly serious problems. High power consumption causes the chip to overheat, which in turn degrades computing performance and stability. Reducing energy consumption while maintaining performance, and solving the heat dissipation problem, are difficulties that AI chip development must overcome.
3) Programmability and customization. General-purpose processors such as GPUs are widely applied in AI computing, but their architectures cannot be tailored to every AI task. Customized AI chips can deliver more efficient computing performance, but their development and production costs are high. Finding a balance between programmability and customization is an important topic in AI chip development.
4) Security and privacy issues. AI chips are widely used in smart devices and cloud services, but they also bring security and privacy issues. Some AI algorithms may face adversarial attacks, resulting in incorrect model outputs. At the same time, personal privacy protection has also become a major challenge for the application of AI chips.
5) International competition and policy constraints. Competition in the field of AI chips is becoming increasingly fierce, and many countries are increasing investment in technology research and development. In international competition, how to maintain technological leadership and deal with the policy restrictions on AI chip technology in different countries are all problems that need to be faced.
2 Overview of chiplet technology for advanced packaging.
2.1 Definition and characteristics of chiplet technology.
Chiplet is an advanced packaging technology that splits the functionality of a chip into multiple independent modules called chiplets. Each chiplet has a specific function, such as a processor core, memory controller, or other peripherals. These individual chiplets can be designed, tested, and produced separately and combined together during the packaging process to form a complete chip. This modular design makes chip development more flexible and scalable, while also improving production efficiency.
2.2 The main applications and development trends of chiplets.
Chiplet technology has a wide range of applications and good development trends in the modern semiconductor industry. One of the main application areas is high-performance computing, such as data centers and supercomputers. By combining multiple chiplets with specific functions, higher computing power and performance can be achieved. In addition, splitting the chip into multiple modules can also improve the reliability and maintainability of the overall chip. Another important application is in Internet of Things (IoT) devices and mobile devices. These devices often require the integration of multiple functions such as wireless communications, sensors, processors, and memory. By using chiplet technology, modules with different functions can be developed and upgraded independently, providing greater flexibility and scalability.
2.3 Comparison with traditional chip packaging.
Chiplet technology offers some significant advantages over traditional single-chip packaging. First, higher overall chip integration can be achieved, because different modules can be combined within a smaller area. Second, the development cycle of the chip can be shorter (see Table 1), because the individual functional modules can be developed and tested in parallel rather than waiting for the entire chip to be completed. In addition, since different modules can be supplied by different manufacturers, a more diversified supply chain can be achieved (see Figure 4), which increases production efficiency and reduces costs. Chiplet technology makes it possible to integrate dies from different wafer processes and different design houses into one system or subsystem.
3 AI chips combined with chiplets.
3.1 Chiplet solutions to problems in AI chip development.
With the continuous development of AI applications, AI chips face challenges such as the need for greater computing power, better energy efficiency, and higher integration. Chiplet technology can provide a solution to these challenges. For example, Xilinx's next-generation Virtex family of FPGAs integrates multiple dies on a silicon substrate using a TSMC process (see Figure 5). Higher computing power can be achieved by integrating different functional modules as independent chiplets on an AI chip: the processor core, neural network accelerator, and memory controller are independent modules that can be developed and upgraded separately and then combined during packaging to form a high-performance AI chip, as the sketch below illustrates.
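The following is a minimal sketch of this modular idea in plain Python (not any vendor's design flow); the module names, process nodes, and version strings are illustrative assumptions.

```python
# Minimal sketch (illustrative names and nodes, not any vendor's design
# flow): an AI chip modeled as a package of independently versioned chiplets.
from dataclasses import dataclass

@dataclass(frozen=True)
class Chiplet:
    name: str        # functional module, e.g. "nn-accelerator"
    process_nm: int  # each chiplet can use the node that suits it best
    version: str     # chiplets can be upgraded independently

@dataclass
class Package:
    chiplets: list

    def upgrade(self, name: str, new: Chiplet) -> "Package":
        """Swap one module without redesigning the rest of the chip."""
        return Package([new if c.name == name else c for c in self.chiplets])

ai_chip = Package([
    Chiplet("cpu-core", 5, "v1"),
    Chiplet("nn-accelerator", 5, "v1"),
    Chiplet("memory-controller", 12, "v1"),  # I/O often stays on mature nodes
])

# As AI workloads evolve, only the accelerator chiplet is replaced:
ai_chip = ai_chip.upgrade("nn-accelerator", Chiplet("nn-accelerator", 3, "v2"))
print([f"{c.name}@{c.process_nm}nm {c.version}" for c in ai_chip.chiplets])
```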
3.2 Case study of the combination of AI chips and chiplets.
GPU performance improvements and an expanding feature set have gradually met the needs of AI computing. In 2010, NVIDIA introduced the Fermi architecture, the first complete GPU computing architecture, and many of the concepts it proposed are still in use today. The Kepler architecture added double-precision (FP64) units in hardware and introduced GPUDirect technology, which bypasses CPU system memory so that GPUs can exchange data directly with one another. The Pascal architecture adopted the first generation of NVLink. The Volta architecture began to deploy Tensor Cores, which are of great significance for accelerating AI computation. This brief review of NVIDIA's GPU hardware shows that upgrades to basics such as the process node and the number of computing cores continue to drive performance, while the features added in each generation of architecture increasingly serve the needs of AI computing.
There are already practical examples of AI chips combined with chiplet technology. AMD adopted a chiplet design in its Zen 2-based Ryzen 3000 series CPUs [6], allowing it to integrate more CPU cores into a single package. AMD has likewise planned to apply chiplet technology to GPU design to address challenges in GPU manufacturing, such as falling yields and rising costs as die sizes grow. In this GPU chiplet design, AMD uses a high-bandwidth cross-connect known as HBX to carry communication between chiplets, similar to the interconnect used in Zen 3 CPUs, addressing the difficulty of spreading parallel GPU workloads across multiple chiplets. The design makes it appear that the CPU is communicating with a single large GPU rather than with many small GPUs through a controller.
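The yield argument mentioned above can be made quantitative with the standard Poisson die-yield model, Y = exp(-A·D0). The sketch below uses an assumed defect density and illustrative die areas (not AMD's data) to show why splitting one large die into smaller, individually tested chiplets raises the usable fraction of each wafer.

```python
# Standard Poisson die-yield model, Y = exp(-A * D0). The defect density
# and die areas below are assumptions for illustration, not AMD data.
import math

D0 = 0.001  # defects per mm^2 (i.e. 0.1 per cm^2), assumed

def die_yield(area_mm2: float) -> float:
    """Fraction of dies that carry zero defects."""
    return math.exp(-area_mm2 * D0)

print(f"800 mm^2 monolithic die yield: {die_yield(800):.1%}")  # ~44.9%
print(f"200 mm^2 chiplet die yield:    {die_yield(200):.1%}")  # ~81.9%
# Because chiplets are tested before packaging (known-good die), each
# defect scraps only 200 mm^2 of silicon instead of 800 mm^2, so the
# usable fraction of a wafer rises from roughly 45% to roughly 82%.
```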
3.3 Prospects for the combination of AI chips and chiplets.
The combination of AI chips and chiplet technology will continue to develop and expand in the future. As AI applications continue to evolve, the need for higher computing power, lower power consumption, and higher levels of integration will continue to increase. Therefore, further improvement and development of chiplet technology, combined with AI chips, will be the direction of development in the future. In addition, as IoT devices become more widespread, the need for more flexible and scalable silicon solutions will increase. Therefore, combining AI chips with a variety of different chiplets to meet the needs of different IoT devices will become an important development direction in the future.
4 Conclusion.
Chiplet technology is a modular approach to packaging that offers greater flexibility, scalability, and production efficiency. AI chips face several challenges, such as the need for greater computing power, better energy efficiency, and higher integration.
In order to better develop the combination of AI chips and advanced packaging chiplet technology, the following suggestions are proposed.
1) Strengthen cooperation. Encourage cooperation between chip manufacturers, packaging technology vendors and research institutions to promote technology sharing and exchanges, so as to accelerate the development of AI chips and chiplet technology.
2) Technological innovation. Continue to invest in R&D and innovation in advanced packaging chiplet technology to meet the ever-increasing performance requirements of AI chips.
3) Standardization. Formulate relevant technical standards to ensure the interchangeability between chips and chiplets produced by different manufacturers, and promote the healthy development of the entire industry.
With the continuous expansion of artificial intelligence applications and technological advancement, the combination of AI chips and advanced packaging chiplet technology will be more widely used. This combination will not only be used in the field of high-performance computing, but will also be widely used in IoT devices, smartphones, and various other AI applications, bringing more convenience to people's lives and work.