OpenAI Sora: Technology Interpretation and Investment Opportunities

Mondo Finance Updated on 2024-02-19

1. Experts interpret the technical details and industry impact of Sora

1. Sora, released by OpenAI, is the culmination of existing techniques; it is a case of brute-force scale producing outsized results, and "emergence" has now appeared in the field of video generation;

2. Sora has reached a level where it can be put into commercial or industrial production, and it is a milestone product in the field of video creation;

3. Sora's results will rapidly increase demand for infrastructure and AI chips, and demand for high-end computing power will grow quickly;

4. Constrained by scarce computing resources, Sora cannot use a very large parameter count, yet it still performs very well; the bottleneck on future parameter growth lies in computing resources. Reaching AGI will require stronger AI chips and greater computing power (a rough back-of-envelope illustration follows below).
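As a rough illustration of the compute bottleneck described above, the sketch below applies the commonly cited approximation that dense-Transformer training cost is about 6 x parameters x training tokens. The parameter counts, token count, accelerator throughput, and utilization are all illustrative assumptions, not figures disclosed for Sora or any specific chip.

```python
# Back-of-envelope training-compute estimate (illustrative assumptions only).
# Rule of thumb for dense Transformers: training FLOPs ~= 6 * N_params * N_tokens.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense Transformer."""
    return 6.0 * n_params * n_tokens

def gpu_days(total_flops: float, gpu_flops_per_s: float = 1e15, utilization: float = 0.4) -> float:
    """Wall-clock GPU-days on a hypothetical 1 PFLOP/s accelerator at 40% utilization."""
    seconds = total_flops / (gpu_flops_per_s * utilization)
    return seconds / 86_400

for n_params in (1e9, 1e10, 1e11):                      # 1B, 10B, 100B parameters (assumed)
    flops = training_flops(n_params, n_tokens=1e12)     # 1T training tokens (assumed)
    print(f"{n_params:.0e} params -> {flops:.1e} FLOPs -> {gpu_days(flops):,.0f} GPU-days")
```

Under these assumptions, every tenfold increase in parameter count at fixed data multiplies training compute, and hence chip demand, by roughly ten, which is the bottleneck the expert describes.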

2. Review of the AI sector at home and abroad from October 2023 to the present: multimodality is driving the AI large-model sector into its second stage.

1. It is widely accepted that a generational gap exists between domestic and overseas AI large models, but the consensus that large models are the core of the next generation of technological innovation, and that domestic models must hold independent intellectual property rights, remains mainstream. Since the end of October, the sector has moved with industry developments at home and abroad and with fluctuations in the A-share market.

1) At the end of October, after a round of deep correction and with third-quarter reports out of the way, the sector was close to bottoming, and companies with both technology and earnings, such as Kunlun Wanwei and Zhongji InnoLight, led the rebound. At the same time, domestic large-model filings were being completed: on November 3, after nearly half a year of waiting, the second batch of large-model filings was approved, and the market structure gradually became clearer.

From November to December, overseas large models saw a series of updates, which also supported this stage of the rally.

2) At the end of December, a first round of pullback began. Factors including short-drama industry policy accelerated a panic correction in the media sector, including AI large-model and application companies, which continued into January.

At this time, domestic AI large models began exploring the leap from GPT-3 to GPT-4, and the market generally believes that domestic large language models sit somewhere between those two levels.


3) On January 26, according to Reuters, U.S. Commerce Secretary Gina Raimondo said the Biden administration would propose requiring U.S. cloud computing companies to determine whether foreign entities are accessing U.S. data centers to train AI models. On this basis, the U.S. Department of Commerce issued a draft for comment, "Taking Additional Steps to Address the National Emergency with Respect to Significant Malicious Cyber-Enabled Activities" (the system shows a publication date of January 29, 2024), with comments due by April 29, 2024. The news further hit sentiment in the domestic AI sector, which continued to decline as the market adjusted.

2. Overseas, AI still leads the growth of the entire technology sector, and multimodality has become the main direction of development. The path from large language models to multimodality, and then to general artificial intelligence, has gradually become clear; disagreement lies mainly in judgments of timing. If the debut of ChatGPT at the beginning of 2023 was the main catalyst for the AI sector's first breakthrough, then the new round of breakthroughs, the further step from large language models to multimodality, is driving the AI sector into its second stage.

1) In the large-model field, although OpenAI went through a brief period of management turmoil, it quickly got back on track, and GPT applications and multimodality have accelerated.

On November 6, OpenAI held its first developer day (DevDay), announcing a number of major updates, including letting users build their own custom GPTs and a forthcoming "GPT Store" that would share revenue with creators. A dispute subsequently broke out within OpenAI's management, but it was quickly resolved. OpenAI's ChatGPT remains the top-ranked standalone app among consumer AI applications.

At the beginning of December, multimodal technology began to deliver multiple breakthroughs, with Pika 1.0's test results drawing wide attention. On the evening of December 6, Google released its latest Gemini large model. Gemini was built from the outset as a multimodal model that can generalize across, and fluently understand, manipulate, and combine, different types of information, including text, audio, and images. According to Google, Gemini surpassed human experts on the MMLU (Massive Multitask Language Understanding) benchmark for the first time and achieved SOTA (current best results) on 30 of 32 multimodal benchmarks, surpassing GPT-4 in almost every respect. On January 12, OpenAI officially opened the GPT Store to ChatGPT Plus, Team, and Enterprise users, with a variety of GPTs built by partners and the community available; in Q1 it plans to launch a GPT Builder revenue program in which, as a first step, U.S. developers will be paid based on user engagement with their GPTs.

Then, this week, OpenAI released its first video generation model, Sora, which can generate up to one minute of HD video with complex scenes containing multiple characters and specific motions. It shows breakthrough semantic understanding, complex scene understanding, and consistency, and its output is far better than previously released video models, truly opening the era of video large models.

2) In the field of computing chips, strong demand for AI chips has triggered a sustained wave of AI investment. In December, AMD officially launched the Instinct MI300X AI GPU accelerator and the Instinct MI300A, the world's first data-center APU accelerator. From January to February, overseas technology giants successively disclosed earnings reports and new products, showing strong demand for AI large-model computing power, and NVIDIA's share price broke above the trading range it had formed in the second half of last year amid multimodal progress and rising expectations.

3) At the market level, expectations that the U.S. economy remains strong and headed for a soft landing have been building while the U.K., parts of Europe, and Japan are in recession, so U.S. stocks have continued to perform strongly.

3. From the above review we can summarize: 1) As we emphasized in our annual strategy, multimodality is the most important marginal change in AI large models in 2024, but it is not the end point of AI development. We still do not know exactly when AGI will arrive, but judging by recent multimodal progress exceeding expectations, it may come sooner than the market expects. The significance of multimodality is comparable to the release of ChatGPT at the end of 2022: it is leading the AI large-model sector into its second stage.

2) For China, the landscape of large models and applications is much clearer than half a year ago, but it is far too early to call the endgame; in investment terms, the approach is still to follow the industry trend and buy comparative advantage. We still believe China can achieve a breakthrough in large language models in 2024, and Sora shows that computing power remains a prerequisite: having more computing power is the hard threshold for large-model companies to break out.

3. How should we view the emergence of OpenAI's Sora?

On February 14, OpenAI announced it would add memory capabilities to ChatGPT, and on the 16th Google launched the Gemini 1.5 series. The Pro version supports a context window of 1 million tokens, an order of magnitude larger than previous large models in the industry; 1 million tokens is equivalent to roughly 700,000 words or about 1 hour of video, which greatly expands the application scenarios of large models. Gemini's multimodal capabilities have also been significantly upgraded, from the first generation's ability to read images to 1.5's ability to read video; for example, Gemini can find a particular moment in a movie and describe the relevant details. Less than three months have passed since the first generation was released, and the multimodal capabilities of large models may be developing faster than we think.
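For a sense of scale, the sketch below converts a 1-million-token window into text and video capacity using the common rule of thumb of roughly 0.75 English words per token; the per-second video token budget is an assumed figure chosen only to reproduce the "about one hour" claim, not Google's disclosed tokenization.

```python
# Rough capacity of a 1M-token context window (illustrative assumptions only).
CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75           # common rule of thumb for English text (assumed)
VIDEO_TOKENS_PER_SECOND = 280    # assumed budget; ~1M tokens / 3600 s matches "about 1 hour"

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
video_hours = CONTEXT_TOKENS / VIDEO_TOKENS_PER_SECOND / 3600

print(f"~{words:,.0f} words of text")         # ~750,000 words
print(f"~{video_hours:.1f} hours of video")   # ~1 hour under the assumed token rate
```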

The day after Google released Gemini 1.5, OpenAI's first video model, Sora, made a stunning debut. In our earlier in-depth multimodal report we pointed out that 2023 for video generation can be compared to 2021 for 2D image generation; considering how large language models are accelerating every area of AI, video generation could make much greater progress this year, and the emergence of Sora validates that view. As the industry's overall technical level rises, the field may come to resemble text-to-image generation, with some popular video generation applications breaking out and the industry accelerating. Below we sort out the changes and opportunities that Sora's technological breakthrough brings to different links in the industry chain.

The first level, we believe, is the opportunity created by the validation of new technical solutions:

1) Demand for computing power increases. Sora adopts a diffusion model + Transformer architecture. Diffusion models perform well on generation diversity and quality but struggle with semantic control and consistency, which is exactly what Transformers are good at. The technical path of fusing diffusion and Transformer was only proposed last year; in December, Fei-Fei Li and a Google team released the video generative model W.A.L.T using a similar approach, though it remained at the level of academic research. Sora now further validates the potential of combining the two, other players will increase their efforts in this direction, and some open-source models may appear, which should push up the overall technical level of the industry. Transformers have historically required more computing power than diffusion models, so computing power providers will see more opportunities here (a schematic sketch of this combination, together with the unified patch representation described in point 2, follows after this list).

2) Demand for data increases. Sora unifies the data representation of images and video and scales up model size and performance with large datasets; players holding large volumes of high-quality image or video assets stand to benefit.

3) Multimodal fields such as 3D may also accelerate. Compared with previous video generative models, Sora begins to show an ability to understand and interact with the physical world: characters and objects are not easily deformed in the generated video and maintain good consistency. OpenAI also says Sora should not be seen simply as a video model but as a "world simulator", and that scaling video generative models may be a promising path toward building general-purpose simulators of the physical world. AI 3D models have so far failed to find a good balance between generation efficiency and accuracy; this wave of breakthroughs in video may offer the 3D field some inspiration, and the technical inflection point for AI 3D engines may come sooner than the market originally expected.
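To make points 1) and 2) above concrete, here is a minimal PyTorch-style sketch of the "diffusion + Transformer" idea: a video clip is cut into spacetime patches that become a token sequence, and a small Transformer learns to predict the noise added by the diffusion process at a given step. This is a schematic reconstruction in the spirit of publicly described DiT/W.A.L.T-style models, not OpenAI's actual Sora implementation; all layer sizes and patch shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpacetimePatchify(nn.Module):
    """Cut a video tensor (B, C, T, H, W) into spacetime patches and embed each as a token."""
    def __init__(self, channels=3, patch_t=2, patch_hw=16, dim=512):
        super().__init__()
        # A 3D convolution with stride == kernel size is equivalent to non-overlapping patching.
        self.proj = nn.Conv3d(channels, dim,
                              kernel_size=(patch_t, patch_hw, patch_hw),
                              stride=(patch_t, patch_hw, patch_hw))

    def forward(self, video):
        x = self.proj(video)                   # (B, dim, T', H', W')
        return x.flatten(2).transpose(1, 2)    # (B, N_tokens, dim)

class TinyDiffusionTransformer(nn.Module):
    """Transformer that predicts the noise added to patch tokens at a given diffusion step."""
    def __init__(self, dim=512, depth=4, heads=8):
        super().__init__()
        self.patchify = SpacetimePatchify(dim=dim)
        self.time_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, dim)        # predicts noise in token space (simplified)

    def forward(self, noisy_video, t):
        tokens = self.patchify(noisy_video)                           # (B, N, dim)
        tokens = tokens + self.time_embed(t.view(-1, 1, 1).float())   # broadcast timestep embedding
        return self.head(self.blocks(tokens))                         # predicted noise per token

# Toy usage: an 8-frame 64x64 clip becomes 4 * 4 * 4 = 64 spacetime tokens.
video = torch.randn(1, 3, 8, 64, 64)
model = TinyDiffusionTransformer()
noise_pred = model(video, t=torch.tensor([10]))
print(noise_pred.shape)   # torch.Size([1, 64, 512])
```

The key point the passage makes is visible here: once video is flattened into tokens, sequence length, and hence the Transformer's compute, grows with clip length and resolution, which is where the extra demand for computing power comes from.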

The second level, we believe, comes from application opportunities unlocked by the breakthrough in video generation technology:

1) The commercial feasibility of video generation in vertical fields such as advertising and e-commerce has greatly improved. Compared with earlier models, Sora's semantic comprehension, consistency, and flexibility are significantly stronger, which makes the technology far more commercially viable: in marketing it can give small and medium-sized business customers more tools, and in e-commerce it can power services for sellers, leaving the relevant companies more room to raise paying rates and ARPU.

2) The cost of turning ideas into content has fallen sharply. Sora can generate up to one minute of high-definition video, which basically meets the length requirements of today's mainstream short-video creation. Combined with the earlier maturation of text creation, text-to-image, and other technologies, the threshold and cost of producing content in different forms have dropped greatly. On one hand, players with rich IP resources can reduce trial-and-error costs and expand the ways they monetize IP; on the other hand, new UGC platforms with larger commercialization potential may emerge, and anyone may be able to create IP. For UGC platforms such as Xiaohongshu, Zhihu, Douyin, and Kuaishou, each time the threshold for user creation was halved in the past, the volume of user-generated content rose roughly tenfold, and platform user scale rose significantly as well. Once video generation technology matures, a Douyin of the AI era may be born; notably, Douyin Group's CEO announced in early February a move to step down and focus on the editing tool Jianying (CapCut), showing that major players see the opportunity. Of course, this process also carries the risk that incumbents are disrupted; Adobe's share price, for example, has been volatile recently. We believe that in the domestic market, companies that keep investing in their own AI video models have more room to expand, while companies that access overseas models and fully exploit their value as tools can cultivate vertical scenarios in depth.

Deep cultivation of vertical scenarios: video requires combining many elements such as scripting, imagery, style, particle effects, and other special effects, and producing complete commercial content still requires people and tools. As the creation threshold falls, the number of video creators will expand greatly; tool products that offer more convenient, efficient, scenario-specific workflows, that understand and serve vertical users, and that prioritize user needs and experience on top of the technology will also benefit.

3) Professional creators in film and television, games, and MR can use advanced AI technology to cut costs and boost efficiency; for example, the production capacity of animated films may expand further, and top players are expected to benefit.

4) As content volume grows, demand for data transmission, encoding and decoding, content moderation, and related links will also increase, and we believe the relevant players stand to benefit.

Finally, we believe multimodality is the necessary path to AGI, that is, to general artificial intelligence, and it is also the real starting point for AI monetization. Whether it is Google's Gemini upgrade or OpenAI's Sora, both should push AI multimodal applications to accelerate further; industry-level changes in 2024 may be even larger than in 2023, and we remain optimistic about TMT investment opportunities throughout the year.

Computing power: Based on AI industry-chain news since the Spring Festival, we believe there is still room for upward revision in the computing power sector.

1) First, OpenAI keeps producing news and new moves: according to reports, the company plans to raise 5 to 7 trillion US dollars to build out the computing chip supply chain, an amount far exceeding the roughly 500 billion US dollar size of the global semiconductor market;

2) NVIDIA announced that it intends to develop customized computing chips (such as ASICs), and it also officially released Chat with RTX, which it had exhibited at CES in January; users can run Llama and Mistral models locally on PCs with NVIDIA RTX graphics cards, and its inference framework outperforms the commonly used PyTorch and llama.cpp setups.

3) Finally, from just before the Spring Festival to the present, many leading manufacturers in the overseas AI industry chain have released their latest financial reports and guidance, from which it is clear that computing power build-out remains the top priority for the industry's development. On one hand, regardless of how the traditional businesses of the leading cloud vendors performed, building AI computing facilities is the primary driver of their future capital expenditure growth; on the other hand, the AI revenue of industry-chain hardware companies such as AMD, Coherent (parent of Finisar), Lumentum (parent of Cloud Light), Arista, and Fabrinet has grown well. Meanwhile, AMD continues to emphasize its expectation of a 400 billion US dollar computing chip market in 2027 (similar to the 360 billion US dollars we calculated based on TSMC), and Coherent has raised its forecast CAGR for the 800G-and-above optical module market through 2028 to 65% (a worked example of how such a growth rate compounds follows below).
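For reference, the sketch below shows how a constant 65% compound annual growth rate scales a market through 2028; the 2023 base value is a placeholder used only for illustration, not Coherent's disclosed figure.

```python
# How a 65% CAGR compounds an (assumed) base market size through 2028.
def project(base: float, cagr: float, years: int) -> float:
    """Market size after `years` of growth at a constant compound annual rate."""
    return base * (1 + cagr) ** years

base_2023 = 4.0   # assumed base in US$ billions, illustration only
for year in range(2024, 2029):
    print(year, round(project(base_2023, 0.65, year - 2023), 1))
# 65% a year for five years multiplies the base by roughly 12x: 1.65**5 ~= 12.2
```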

Based on the above information, we believe that:

First, consider OpenAI, which is not only the company that took AI mainstream but also a promoter of the industry and a "catfish" stirring up competition (which may be related to the CEO's style of public promotion). The impact of its new models on the industry therefore goes beyond the models themselves. On one hand, Sora's own demand for training and inference computing power is obvious: for the same consumption time, video contains orders of magnitude more tokens than text, and the required computing power rises by a similar factor (a back-of-envelope comparison follows below). On the other hand, the catfish effect the company creates is worth watching: whether it is the sums OpenAI is reportedly seeking for computing chips or the release of new products timed against Google's to steal the limelight, the company's ambition is to become a new giant of the AI era, and it cannot be ruled out that OpenAI is holding back further undisclosed products.
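A back-of-envelope sketch of the "orders of magnitude more tokens" point: compare the tokens consumed per minute of reading text against the spacetime patch tokens in a minute of video. Every rate used (reading speed, tokens per word, frame rate, latent downsampling, patch size) is an illustrative assumption rather than a disclosed figure.

```python
# Tokens per minute: text vs. video (illustrative assumptions only).
READ_WPM = 250                    # assumed reading speed, words per minute
TOKENS_PER_WORD = 1.33            # ~0.75 words per token
text_tokens_per_min = READ_WPM * TOKENS_PER_WORD               # a few hundred tokens

# Video: 1080p at 30 fps, compressed 8x spatially into a latent space,
# then cut into 16x16 spatial patches grouped 2 frames at a time (all assumed).
frames = 30 * 60
lat_h, lat_w = 1080 // 8, 1920 // 8
patches_per_frame_pair = (lat_h // 16) * (lat_w // 16)
video_tokens_per_min = (frames // 2) * patches_per_frame_pair  # ~100k tokens

print(f"text : ~{text_tokens_per_min:,.0f} tokens/min")
print(f"video: ~{video_tokens_per_min:,.0f} tokens/min")
print(f"ratio: ~{video_tokens_per_min / text_tokens_per_min:,.0f}x")
```

Even under these modest assumptions the ratio is a few hundred to one, which is why video models push training and inference compute demand up so sharply.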

Second, look at the medium- and long-term guidance given by industry-chain manufacturers: although no one can estimate computing power demand precisely, not even to an order of magnitude, qualitatively the industry chain's guidance implies that the computing power market will keep rising and may well multiply.

Finally, look at NVIDIA's new moves. Whether it is making custom chips or launching PC software products, the company is acting on the idea of participating fully in AI and will further intensify competition in the industry. At the PC edge in particular, where the landscape is far less settled than in the cloud and where Intel, AMD, and Qualcomm have already entered the game, NVIDIA moving beyond the GPU will bring major marginal changes to the AI terminal market, benefiting downstream PC makers not only in product maturity but also in supply security and procurement.

Based on the above analysis, optical modules remain the link in the AI industry chain with the strongest and most certain resonance within the visible horizon. We recommend the sector leaders Zhongji InnoLight and Tianfu Communication, and also suggest watching Xinyisheng and Yuanjie Technology, which have potential marginal changes. At the same time, progress in models and edge architectures is accelerating volume and profit growth for PC OEMs, and we recommend Lenovo Group, the world's leading PC maker.

