Key takeaways
In the 2010s, against the backdrop of the rise of the mobile Internet and enterprises' migration to the cloud, cloud vendors gradually overtook telecom and enterprise customers to become the largest source of demand in the global optical module market, bringing many far-reaching changes to the optical module industry. In this report, we review the changes that the rapid development of cloud computing has brought to the optical module industry over the past ten years, analyze the reasons behind them, and offer an outlook for the industry in the AI era (unless otherwise specified, "optical module" in this report refers to Ethernet optical modules, as distinct from the coherent optical modules used for long-distance transmission). Our views:
Cloud vendors' demand for optical modules follows from the traffic characteristics of their data centers. Because of technologies such as distributed computing and virtualization, cloud data centers carry more east-west (server-to-server) traffic than traditional data centers, and, driven by the mobile Internet and enterprises' migration to the cloud, east-west traffic in global data centers has grown rapidly. Cloud vendors have therefore raised internal data center bandwidth faster than traditional data centers, by: 1) adopting higher-rate server NICs, switches, optical modules, etc., a landmark being the start of 100G Ethernet deployment in cloud data centers in 2015 and 2016; 2) shifting the internal network from the traditional three-tier architecture to the leaf-spine architecture, which further increases demand for optical modules. In 2022, Google, Amazon, and Meta accounted for more than 85% of global demand for 200G-and-above optical modules. Meanwhile, Chinese optical module manufacturers, by virtue of their cost advantages, R&D capabilities, delivery capabilities, and rapid response to customer requirements, have gradually entered the first-tier supply chains of the world's leading cloud vendors and have greatly strengthened their position in the global optical module market.
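To make the leaf-spine point concrete, below is a minimal Python sketch that counts optical modules on the leaf-to-spine links of a two-tier leaf-spine fabric. It rests on simplifying assumptions of ours, not figures from the report: every leaf connects to every spine, each leaf-spine link is optical with one module at each end, and server-to-leaf cabling is ignored. The function name `leaf_spine_modules` and its parameters are hypothetical.

```python
def leaf_spine_modules(num_leaves: int, num_spines: int, links_per_pair: int = 1) -> int:
    """Count optical modules on leaf-to-spine links in a leaf-spine fabric.

    Hypothetical model: every leaf has `links_per_pair` uplinks to every spine,
    every such link is optical, and each link needs one module at each end.
    Server-to-leaf cabling is deliberately excluded.
    """
    if num_leaves < 0 or num_spines < 0 or links_per_pair < 0:
        raise ValueError("counts must be non-negative")
    links = num_leaves * num_spines * links_per_pair  # full mesh between the two tiers
    return 2 * links  # one module per link end


if __name__ == "__main__":
    for leaves, spines in [(16, 4), (32, 8), (64, 16)]:
        print(f"{leaves} leaves x {spines} spines -> {leaf_spine_modules(leaves, spines)} modules")
```

Because every leaf is meshed to every spine, the module count grows with the product of the two tier sizes, which is the intuition behind the claim that moving to leaf-spine raises optical module demand.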
Looking ahead to the optical module industry in the AI era, we expect the following similarities and differences relative to the cloud computing era. Similarities: larger east-west traffic will unlock demand for high-speed optical modules. Guided by scaling laws, the parameter counts of large AI models keep rising, significantly increasing inter-GPU communication traffic (e.g., All-Reduce) in AI training networks. At the same time, to maximize GPU utilization, latency requirements for GPU-to-GPU communication have become stricter. Cloud vendors raise the communication bandwidth between GPU cluster nodes in two ways: 1) increasing port throughput: for example, NVIDIA's DGX H100 servers use 400G ConnectX-7 (CX-7) NICs, a major upgrade from the 50G and 100G NICs that are mainstream in cloud vendors' general-purpose servers, and introduce 800G end** replacement, 800G optical modules, etc.; 2) optimizing the network architecture: for example, NVIDIA's AI training network adopts a fat-tree architecture, which has a lower blocking ratio than the leaf-spine architecture, and by our calculations in the report, demand for 800G optical modules under this architecture scales linearly with the number of H100 GPUs.
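The two mechanisms above can be illustrated with a small Python sketch. The first function uses the standard ring All-Reduce cost (reduce-scatter plus all-gather), in which each GPU sends 2(N-1)/N times the buffer size; ring is only one of several All-Reduce algorithms, and the report does not say which one it assumes. The second function is our own simplified non-blocking fat-tree model, not the report's calculation: each GPU injects 400G, every tier boundary carries the full injection bandwidth, every link is optical, and modules are counted in 800G-equivalents. All names and defaults are hypothetical, and the resulting ratio is illustrative only; what the sketch shows is the linear relationship.

```python
def ring_allreduce_bytes_per_gpu(num_gpus: int, message_bytes: float) -> float:
    """Bytes each GPU sends in a ring All-Reduce of a `message_bytes` buffer.

    Ring All-Reduce = reduce-scatter + all-gather; each phase takes (N-1)
    steps moving 1/N of the buffer, so each GPU sends 2*(N-1)/N * S bytes.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return 2 * (num_gpus - 1) * message_bytes / num_gpus


def fat_tree_800g_modules(num_gpus: int, gpu_port_gbps: int = 400,
                          module_gbps: int = 800, boundaries: int = 3) -> int:
    """800G-equivalent optical module count for a non-blocking fat tree.

    Hypothetical model: each GPU injects `gpu_port_gbps`; in a non-blocking
    fat tree every tier boundary (GPU-leaf, leaf-spine, spine-core when
    boundaries=3) carries the full injection bandwidth; every link is optical
    with a module at both ends, counted in units of `module_gbps`.
    """
    per_boundary_gbps = num_gpus * gpu_port_gbps
    total_end_gbps = boundaries * 2 * per_boundary_gbps  # two link ends per boundary
    return (total_end_gbps + module_gbps - 1) // module_gbps  # integer ceiling


if __name__ == "__main__":
    # 8 GPUs all-reducing a 1 GB gradient buffer under the ring algorithm.
    print(ring_allreduce_bytes_per_gpu(8, 1e9))
    for n in (1024, 2048, 4096):
        print(n, "GPUs ->", fat_tree_800g_modules(n), "800G-equivalent modules")
```

Under these assumptions, doubling the GPU count doubles the module count, which matches the linear relationship described above; real deployments differ in which links are optical and in module form factors.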
Differences: 1) The drivers of the iteration cycle and ramp-up pace of high-speed optical modules change. In the cloud computing era, optical modules were mainly deployed in the data center's front-end network, whose bandwidth is largely determined by user demand. In the AI era, a large number of high-speed optical modules are deployed in the training network; because this is a back-end network with no direct connection to users, its bandwidth is determined by device-side factors such as workload and latency requirements, so optical modules ramp up faster. As for iteration cycles, NVIDIA's faster GPU cadence (B100 in 2024; X100 in 2025) is expected to shorten the speed-upgrade cycle of optical modules; for example, the transition from 800G to 1.6T is expected to take only about two years, reversing the lengthening upgrade cycles of the cloud computing era.
2) Competitive landscape: we expect the market position of leading manufacturers to remain stable. In AI data centers, as reliability requirements rise and optical module iteration cycles shorten, the industry's technical barriers are expected to rise significantly, further highlighting the strengths of leading optical module manufacturers in reliability, R&D, and delivery.
Where we differ from the market
1) The market is concerned that demand for optical modules for AI networks may peak in 2024. We believe large AI models are still iterating rapidly (multimodal large models, for example, are at an early stage of development), and, guided by scaling laws, vendors' investment in large-model training is expected to keep increasing. In addition, as large AI models develop, areas such as "alignment" are expected to generate additional computing power requirements. Meanwhile, AI commercialization continues to advance, gradually closing the loop of the AI industry. We believe that the future release of AI inference demand, especially as multimodal large models advance and generative applications continue to develop, may further raise the throughput and communication bandwidth of AI inference networks and fully release demand for high-speed optical modules. As AI applications keep emerging, cloud computing infrastructure is expected to need continuous upgrades to support larger-scale, higher-performance computing scenarios. At present, 800G optical modules are used mainly in AI training networks, and we believe demand for 800G optical modules on the cloud computing side will also grow as AI applications develop.
2) The market is worried that the competitive landscape of the optical module market will deteriorate. We believe that in the AI era the barriers to entry in high-speed optical modules may rise further, and we expect leading manufacturers' market positions to remain solid. One of the biggest challenges in large AI model training is lengthening the mean time between failures (MTBF), since training runs are long and frequently interrupted. On the network hardware side, optical module reliability is especially critical: optical modules are the most failure-prone hardware in AI training networks and largely determine the training efficiency of large models. As reliability requirements increase and iteration cycles shorten, the strengths of leading manufacturers in reliability, R&D, and delivery are expected to stand out further, and we expect their market positions to remain stable.
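Why per-module reliability matters so much at scale can be shown with a standard series-system reliability calculation, sketched in Python below. It assumes independent, exponentially distributed failures and that any single module failure interrupts the training job; these assumptions, the function names, and the example inputs (10,000 modules, a 720-hour run, the module MTBF values) are hypothetical and not taken from the report.

```python
def cluster_mtbf_hours(num_modules: int, module_mtbf_hours: float) -> float:
    """MTBF of a training job that is interrupted if ANY optical module fails.

    Assumes independent, exponentially distributed failures (constant rate),
    so the modules form a series system and their failure rates add:
    lambda_job = num_modules / module_mtbf_hours.
    """
    if num_modules < 1 or module_mtbf_hours <= 0:
        raise ValueError("need >= 1 module and a positive MTBF")
    return module_mtbf_hours / num_modules


def expected_interruptions(training_hours: float, num_modules: int,
                           module_mtbf_hours: float) -> float:
    """Expected number of module-induced interruptions over a training run."""
    return training_hours / cluster_mtbf_hours(num_modules, module_mtbf_hours)


if __name__ == "__main__":
    # Hypothetical inputs: a 30-day (720 h) run over a fabric of 10,000 modules.
    for mtbf in (500_000, 1_000_000, 2_000_000):
        job_mtbf = cluster_mtbf_hours(10_000, mtbf)
        n_int = expected_interruptions(720, 10_000, mtbf)
        print(f"module MTBF {mtbf:,} h -> job MTBF {job_mtbf:.0f} h, ~{n_int:.1f} interruptions")
```

Because job MTBF falls in proportion to the module count, even very reliable individual modules yield frequent interruptions in a large cluster, and each improvement in module MTBF translates directly into fewer interruptions.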
Highly recommended companies
Over the past 10 years, the rise of the mobile Internet and the rapid development of cloud computing have given the optical module industry long-term growth momentum. Looking ahead, we believe AI will open a new growth cycle for the optical module industry and is expected to bring positive changes to the industry in the long run. We are optimistic about optical communication manufacturers that have entered the ** chain of leading overseas cloud vendors, and recommend paying attention to: [Optical Module] Zhongji Innolight, Huagong Technology; [Optical Engine & Optical Devices] Tianfu Communication; [MPO] Taichenguang.
This is an abridged excerpt from the report. Original report PDF: "Special Research on Information Technology-Science and Technology: Optical Transceivers: The Leap of the Times, From Cloud Computing to AI", Huatai** [Wang Xing, Gao Mingyao, Wang Ke, Chen Yuexi], 20240219 [Page 29].