Transforming AI Servers: Exploring Innovative Breakthroughs in Interface Interconnect Chip Technology

Mondo Technology Updated on 2024-02-05

According to TrendForce, AI server shipments reached about 130,000 units in 2022, accounting for about 1% of total global server shipments. As major manufacturers such as Microsoft, Meta, and ByteDance launched generative AI-based products and services, order volumes increased significantly. Driven by continued demand for applications such as ChatGPT, the AI server market is expected to grow at a 12.2% CAGR from 2023 to 2027. Against this backdrop, the development of AI servers is particularly compelling.

The DGX H100, released in 2022, is the latest version of the NVIDIA DGX system and the core of the NVIDIA DGX SuperPOD. With 8 H100 GPUs totaling 640 billion transistors, the system delivers 6x the AI performance of the previous generation, particularly at the new FP8 precision. The DGX H100 also provides 900 GB/s of bidirectional GPU-to-GPU bandwidth, demonstrating a significant increase in AI interconnect capability.

The DGX H100 server uses network cards that serve both as network interfaces and as PCIe expansion switches, conforming to the PCIe 5.0 standard. The server includes ConnectX-7 (CX7) connectivity in the form of 2 cards, each containing 4 CX7 chips and providing 2 800G OSFP ports. For GPU interconnect (H100), NVSwitch chips play a key role. Each GPU extends 18 NVLinks, each achieving a bidirectional bandwidth of 50 GB/s, for a total of 900 GB/s of bidirectional bandwidth. These links are distributed across four built-in NVSwitch chips, each corresponding to 4-5 OSFP optical modules. Each OSFP optical module uses 8 optical lanes at 100 Gbps per lane, giving a total rate of 800 Gbps for high-speed data transmission.
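As a quick sanity check on these figures, here is a minimal sketch (using only the link counts and lane rates quoted above) that reproduces the bandwidth arithmetic:

```python
# Sanity-check the DGX H100 bandwidth figures quoted above.
# Figures from the text: 18 NVLinks per GPU at 50 GB/s bidirectional each,
# and OSFP modules with 8 optical lanes at 100 Gbps per lane.

NVLINKS_PER_GPU = 18
GB_S_PER_NVLINK = 50          # GB/s, bidirectional, per link

gpu_bandwidth = NVLINKS_PER_GPU * GB_S_PER_NVLINK
print(f"Per-GPU NVLink bandwidth: {gpu_bandwidth} GB/s")   # 900 GB/s

LANES_PER_OSFP = 8
GBPS_PER_LANE = 100           # Gbit/s per optical lane

osfp_rate = LANES_PER_OSFP * GBPS_PER_LANE
print(f"Per-OSFP module rate: {osfp_rate} Gbps")           # 800 Gbps
```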

A PCIe switch (also known as a PCIe hub) is a key component used to connect PCIe devices via the PCIe communication protocol. Its expansion and aggregation capabilities allow multiple devices to connect through a single PCIe port, largely overcoming the limitation on the number of available PCIe lanes. Today, PCIe switches are widely used in traditional storage systems and are becoming increasingly popular across a variety of server platforms, providing significant improvements in data transfer rates within the system.
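To make the expansion-and-aggregation idea concrete, the following is a hypothetical sketch; the port widths (one x16 upstream port fanned out to six x4 downstream ports) are illustrative assumptions, not the configuration of any specific product:

```python
# Illustrative model of PCIe switch fan-out (hypothetical port widths).
# One x16 upstream port is expanded into six x4 downstream ports: each
# downstream device gets a full PCIe connection, but aggregate traffic
# toward the host is bounded by the upstream link.

UPSTREAM_LANES = 16
DOWNSTREAM_PORTS = [4, 4, 4, 4, 4, 4]       # lanes per downstream device

# ~3.94 GB/s effective per PCIe 5.0 lane: 32 GT/s with 128b/130b encoding
PCIE5_GB_S_PER_LANE = 32 * (128 / 130) / 8

upstream_bw = UPSTREAM_LANES * PCIE5_GB_S_PER_LANE
downstream_bw = sum(DOWNSTREAM_PORTS) * PCIE5_GB_S_PER_LANE

print(f"Upstream capacity:   {upstream_bw:.1f} GB/s")
print(f"Downstream capacity: {downstream_bw:.1f} GB/s (shared when all devices are active)")
```

The downstream side exposes more lanes than the upstream port provides; the switch resolves this by aggregating traffic, which is exactly how it stretches a limited lane budget across many devices.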

Over time, advances in PCIe bus technology have brought a gradual increase in PCIe switch rates. Originally developed by Intel in 2001 as a third-generation I/O technology called "3GIO", it was renamed "PCI Express" in 2002 after evaluation by the PCI-SIG. PCIe 1.0 was an important milestone, supporting a transfer rate of 250 MB/s per lane at a signaling rate of 2.5 GT/s. In 2022, PCI-SIG officially released the PCIe 6.0 specification, raising the signaling rate to 64 GT/s per lane.
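The per-lane figures follow directly from each generation's signaling rate and line encoding; a short sketch of that arithmetic (encodings as published by PCI-SIG; the PCIe 6.0 entry treats PAM4/FLIT overhead as negligible) is shown below:

```python
# Per-lane PCIe throughput derived from signaling rate and line encoding.

generations = {
    # generation: (GT/s per lane, payload bits per transferred bit)
    "PCIe 1.0": (2.5, 8 / 10),     # 8b/10b encoding
    "PCIe 2.0": (5.0, 8 / 10),
    "PCIe 3.0": (8.0, 128 / 130),  # 128b/130b encoding
    "PCIe 4.0": (16.0, 128 / 130),
    "PCIe 5.0": (32.0, 128 / 130),
    "PCIe 6.0": (64.0, 1.0),       # PAM4 + FLIT mode; overhead ignored here
}

for gen, (gts, efficiency) in generations.items():
    mb_per_s = gts * efficiency * 1000 / 8   # GT/s -> MB/s, bits -> bytes
    print(f"{gen}: {gts:>4} GT/s -> {mb_per_s:.0f} MB/s per lane per direction")
```

Running this reproduces the 250 MB/s per lane quoted for PCIe 1.0 and shows each later generation roughly doubling the rate.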

In an AI server, at least one retimer chip is required to maintain signal quality on the connection between the GPU and the CPU. Some AI servers use multiple retimer chips; Astera Labs, for example, integrates four retimers into its AI accelerator configuration.

Currently, the PCIe retimer market has huge potential, with three leading brands and many potential competitors. Parade Technologies, Astera Labs, and Montage Technology are the major players in this booming market. Notably, as an early adopter of PCIe deployments, Montage Technology is the only supplier in mainland China capable of mass-producing PCIe 4.0 retimers, and it has also made steady progress in developing PCIe 5.0 retimers.

In addition, chip manufacturers such as Renesas, TI, and Microchip are actively developing PCIe retimer products. According to official information, Renesas offers two PCIe 3.0 retimer products, the 89HT0816AP and the 89HT0832P. TI offers a 16 Gbps 8-lane PCIe 4.0 retimer, the DS160PT801. In November 2020, Microchip Technology launched the XpressConnect family of retimer chips, designed to support PCIe 5.0's 32 GT/s rate.

The world's major chip manufacturers place great importance on advancing high-speed connectivity. Among them, Nvidia's NVLink, AMD's Infinity Fabric, and Intel's CXL have all made important contributions.

NVLink is a high-speed interconnect technology developed by Nvidia. It is designed to accelerate data transfer between CPUs and GPUs and between GPUs, improving system performance. From 2016 to 2022, NVLink underwent multiple upgrades and has now reached its fourth generation. In 2016, NVIDIA launched the first-generation NVLink with the release of the Pascal GP100 GPU. NVLink uses NVIDIA's high-speed signaling (NVHS) technology, which is mainly used for signal transmission between GPUs and between GPUs and CPUs. Signals are transmitted as NRZ (non-return-to-zero) encoded differential electrical signals. A first-generation NVLink link achieves a bidirectional bandwidth of 40 GB/s, and a single chip supports 4 links, for a total bidirectional bandwidth of 160 GB/s.

NVLink technology has gone through multiple iterations to drive innovation in high-speed interconnects. In 2017, the second-generation NVLink was launched based on the Volta architecture. It achieves a bidirectional bandwidth of 50 GB/s per link, and each chip supports 6 links, for a total bidirectional bandwidth of 300 GB/s. In 2020, the third generation, based on the Ampere architecture, was released with a total bidirectional bandwidth of 600 GB/s. In 2022, the fourth generation, based on the Hopper architecture, was launched. This iteration shifted to PAM4-modulated electrical signals, maintaining 50 GB/s of bidirectional bandwidth per link and supporting 18 links per chip, for a total bidirectional bandwidth of 900 GB/s.
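A brief sketch ties these generations together. The per-link rates and link counts come from the text above, except the 12-link figure for the Ampere generation, which is an assumption inferred from the stated 600 GB/s total:

```python
# Total bidirectional NVLink bandwidth per chip, generation by generation,
# computed from links per chip and per-link bidirectional bandwidth.

nvlink_gens = {
    # generation: (links per chip, GB/s bidirectional per link)
    "NVLink 1 (Pascal, 2016)": (4, 40),
    "NVLink 2 (Volta, 2017)":  (6, 50),
    "NVLink 3 (Ampere, 2020)": (12, 50),  # 12 links assumed from the 600 GB/s total
    "NVLink 4 (Hopper, 2022)": (18, 50),
}

for gen, (links, per_link) in nvlink_gens.items():
    print(f"{gen}: {links} links x {per_link} GB/s = {links * per_link} GB/s")
```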

In 2018, Nvidia introduced the initial version of NVSwitch, providing a solution to enhance bandwidth, reduce latency, and facilitate communication between multiple GPUs within a server. The first-generation NVSwitch is manufactured using TSMC's 12nm FinFET process and has 18 NVLink 2.0 interfaces. By deploying 12 NVSwitches, a single server can accommodate 16 V100 GPUs and maximize the interconnect rate between them.

At present, NVSwitch has developed to its third generation, manufactured using TSMC's 4N process. Each NVSwitch chip is equipped with 64 NVLink 4.0 ports, allowing GPU-to-GPU communication at 900 GB/s. GPUs interconnected via NVLink Switch can collectively operate as a single high-performance accelerator with deep learning capabilities.
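A quick port-budget check connects this back to the DGX H100 topology described earlier (8 GPUs with 18 NVLinks each, spread across 4 NVSwitch chips); the even distribution of links across switches is an assumption:

```python
# Port-budget check for the DGX H100 NVSwitch topology described above.
# Assumption: each GPU's 18 NVLinks are spread evenly across the 4 switches.

GPUS = 8
LINKS_PER_GPU = 18
SWITCHES = 4
PORTS_PER_SWITCH = 64          # third-generation NVSwitch, NVLink 4.0 ports

gpu_facing_links = GPUS * LINKS_PER_GPU          # 144 links in total
links_per_switch = gpu_facing_links / SWITCHES   # 36 GPU-facing ports per switch

print(f"GPU-facing links per switch: {links_per_switch:.0f} / {PORTS_PER_SWITCH}")
print(f"Ports remaining per switch:  {PORTS_PER_SWITCH - links_per_switch:.0f}")
```

The 144 GPU-facing links fit comfortably within the 4 switches' combined 256 ports, leaving headroom on each chip.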

The development of interface interconnect chip technologies such as PCIe switches, retimer chips, and NVSwitch has greatly enhanced communication between CPUs and GPUs, as well as among GPUs. The interplay of these technologies highlights the dynamic landscape of AI servers, contributing to the advancement of high-performance computing.
