Sora does not rely on brute force, and the big tech companies are busy opening the blind box

Updated on 2024-02-23

Text | Zhao Yanqiu, Digital Intelligence Frontline

Edit | Niu Hui

In the week after the Spring Festival holiday ended, the muted reaction of China's AI community and the major tech companies working on Sora-related technology to OpenAI's announcement stood in stark contrast to the enthusiasm on social media.

OpenAI is becoming more and more closed, releasing almost no specific information, so China is still at the stage of opening blind boxes. It must be admitted that Sora combines capabilities in algorithm composition, data selection, training strategy, and computing-power optimization. The individual techniques may not be original to OpenAI, but OpenAI's deep insight into them, together with its exquisite system conception and design, produced a "disruptive" breakthrough rather than simple brute force.

In the face of such a large-scale systems project, China's AI community still has gaps to fill in every respect.

01 The reaction of the major companies

This week, companies such as ByteDance, Alibaba, Tencent, Huawei, and Inspur have not spoken publicly. The R&D teams of some relevant companies are "unpacking the blind box," and the information is strictly confidential: "Sora will affect the company's product R&D plans this year."

It is worth noting that among the middle and senior management of the major companies, active attention to and insight into Sora is generally less urgent and less deep than it was after the launch of ChatGPT last year.

On the intranets of the major companies, onlookers outside the core R&D teams post and discuss only sporadically; the discussion can hardly be called heated, and the intranets of some major domestic AI companies show "zero posts." This is a sharp contrast to the trending topics on social media, and even to the wailing there about the widening gap between China and the United States.

However, a few quicker moves hint at a sense of urgency in the industry. On February 17, the day after Sora's release, Alibaba's ModelScope community published an analysis of Sora's technical path, and the article drew heavy traffic; on February 18, an online school launched a series of Sora interpretation courses; and right after the Spring Festival, Inspur's relevant business units produced an analysis report on Sora. Many large companies have assigned research and reporting tasks along their business lines, and some will complete their Sora analyses this week.

Since OpenAI has revealed very little, unlike the fairly specific technical analyses that followed the launch of ChatGPT, analyses of Sora contain more guesswork and less concrete basis.

Judging from internal discussions among employees at the major companies, attention has focused on a few directions: Sora's technical mechanism, including whether Sora can become a real-world simulator; computing power; and the direction and timing of commercialization. At present, the technical mechanism still holds many "mysteries"; speculation about computing-power consumption is equally muddled; and as for when Sora will be commercialized, predictions range from one month to half a year, with a general belief that it will happen fast.

Judging from OpenAI's actions, including the releases of Sora, ChatGPT, and DALL·E and its repeated emphasis on agents, OpenAI may release GPT-5 in the second half of this year, and it would be the first version of a real agent. With such an agent, if you wanted to build an app, GPT-5 could automatically generate the code, package and deploy it, register the application, configure domain names, and finally produce an accessible app. This speculation also suggests that the future of every employee's work is being reinvented.
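To make the speculation concrete, here is a minimal, purely hypothetical sketch of such an agent loop: plan, generate code, package, deploy. Every name and step below is illustrative; nothing here reflects an actual OpenAI product or API.

```python
# Hypothetical sketch of a plan-and-execute agent loop, as speculated above.
# No real model, tool, or deployment API is invoked.
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    action: str


def plan(goal: str) -> list[Step]:
    # In a real agent, a language model would produce this plan dynamically.
    return [
        Step("generate", f"write app code for: {goal}"),
        Step("package", "build a deployable artifact"),
        Step("deploy", "provision hosting and configure a domain"),
    ]


def run_agent(goal: str) -> None:
    for step in plan(goal):
        # A real agent would call tools (compiler, cloud APIs), check results,
        # and feed failures back to the model for another attempt.
        print(f"[{step.name}] {step.action}")


run_agent("a to-do list app")
```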

Although there is little wailing about the technology gap on the big companies' forums, employees do voice complaints and helplessness in private exchanges. Some, however, believe Sora is a major boon for domestic AI: ByteDance, Tencent, and Kuaishou hold the top three spots in the global short-video market, the principles behind Sora are broadly understood, and based on the GPU computing power already available in China, the speculation is that similar products will launch domestically "within a year at the fastest."

02 OpenAI does not rely on brute force

The industry attributes Sora's astonishing results to new combinations of algorithms and training strategies. However, as with ChatGPT, the specific algorithms, taken individually, are not OpenAI's inventions.

Sora puts great effort into algorithm organization and data-training strategy to fully exploit the potential of algorithms and data and to learn deeper knowledge. Liang Jiaen, chairman of Unisound (Yunzhisheng), said that OpenAI keeps refreshing the industry's understanding through architecture design and training strategy rather than simple algorithmic improvement. This reflects OpenAI's deep insight into the potential of algorithms and data, as well as its ingenious system conception and design capabilities; it did not achieve such a "disruptive" breakthrough by simple "brute force."

After Sora's official announcement, Saining Xie of New York University speculated about its technology. Because of Xie's close connections to the Sora team, his speculation has had wide influence, especially his guess that "Sora may have about 3 billion parameters."

Some believe the 3-billion figure has merit. By one senior figure's analysis, Sora's best generated videos are stunning but riddled with detail errors; OpenAI is likely flexing its muscles first and will scale the model up further. Another veteran reasoned intuitively from computing power: video is three-dimensional and the computation required per unit is enormous, so if Sora's parameter count were too large, the computing power would not suffice.

However, some in the industry believe the figure is "more than 3 billion."

"Three billion parameters, I think, is misleading," a short-video AI veteran told Digital Intelligence Frontline. "Sora relies on OpenAI's most powerful language model to generate captions. In Sora's technical report, they briefly describe how they designed automated techniques to generate text descriptions, and to turn short user prompts into longer, detailed descriptions to improve overall quality."
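As a rough illustration of the prompt re-captioning step the report describes, here is a minimal sketch assuming the OpenAI chat API; the model name and instruction text are illustrative assumptions, since the report does not disclose the actual setup.

```python
# Minimal sketch of prompt expansion: a strong language model rewrites a
# short user prompt into a long, detailed caption before video generation.
# Model name and system instruction are assumptions, not OpenAI's real config.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def expand_prompt(short_prompt: str) -> str:
    """Turn a terse user prompt into a detailed scene description."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed; the report only says a capable LLM is used
        messages=[
            {
                "role": "system",
                "content": "Rewrite the user's prompt as a long, highly "
                           "detailed video scene description: subjects, "
                           "motion, lighting, camera, and setting.",
            },
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content


print(expand_prompt("a corgi surfing at sunset"))
```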

And judging from OpenAI's style of probing the boundaries of artificial intelligence, some also believe 3 billion is too small. "It doesn't fit what they have always done; they go big and let scale work miracles," Song Jian, CTO of Zhongke Shenzhi, told Digital Intelligence Frontline. In fact, the path had already been pointed out in theory, and many companies had tried it.

A source at Inspur said that Sora's breakthrough once again proves that AI is systems engineering, and that statically speculating about parameter counts alone may not be meaningful.

In video generation, the industry's longstanding difficulty was maintaining the coherence, or consistency, of the video: generated clips contained many things contrary to common sense, such as wrong light and shadow or spatial distortion, and the industry could not figure out why.

"Whether OpenAI will eventually adopt a larger parameter scale cannot be known from public information, but I suspect that, in their style, they will certainly try," Liang Jiaen said. Previously, when OpenAI moved from GPT-2 to GPT-3, it firmly believed that as long as the algorithmic architecture was reasonable, ultra-large-scale unsupervised learning could beat supervised learning through few-shot or even zero-shot learning; that is OpenAI's firm belief in the scale effect. "This time, Sora has learned more 'knowledge' that conforms to the laws of physics through algorithm combination and data design, which is consistent with OpenAI's style over the years."
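For reference, the "scale effect" Liang describes is often summarized by the empirical scaling laws OpenAI researchers reported (Kaplan et al., 2020), in which test loss falls as a smooth power law in model size; a simplified form of that published result:

```latex
% Empirical power-law scaling of test loss L with non-embedding
% parameter count N (form follows Kaplan et al., 2020; N_c and \alpha_N
% are fitted constants, roughly N_c \approx 8.8 \times 10^{13} and
% \alpha_N \approx 0.076 for language models)
L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N}
```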

However, Sora cannot yet be called a proper simulator of the physical world: the videos it generates contain a large number of errors. OpenAI itself notes in its technical report that this is a promising direction.

People have different needs for Sora. "If you're building a digital twin now, you might as well build directly on a physics engine as the underlying layer, like NVIDIA's Omniverse, which isn't perfectly physical but is already very accurate," Song Jian said. "But visual art is about visual feeling; being anti-physical is fine, as long as the result delivers enough visual impact."

03 Computing power conjectures

"Everyone's speculation about computing power is very muddled right now," an NVIDIA source told Digital Intelligence Frontline. Because OpenAI has released so little information this time, the industry finds it hard to evaluate.

Visual models and multimodal models differ from large language models in their computing-power profile. A veteran of AI computing told Digital Intelligence Frontline that even though Sora may have only a few billion parameters, its computing-power needs are estimated to rival those of large language models with tens or hundreds of billions of parameters.

He further analyzed by reference to the text-to-image model Stable Diffusion: it has only about 1 billion parameters, yet training it took dozens of servers nearly a month. He estimates that Sora's training computing power is at least an order of magnitude larger, that is, hundreds of servers, and OpenAI will certainly push scaling further and make the Sora model bigger.
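As a rough illustration of these estimates, the following back-of-envelope arithmetic assumes 8-GPU servers and takes the interviewee's figures at face value; none of the numbers are reported facts.

```python
# Back-of-envelope GPU-hour arithmetic behind the quoted estimates.
# Server size, duration, and the 10x factor are assumptions drawn from the
# interviewee's rough figures, not disclosed numbers.
GPUS_PER_SERVER = 8  # assumed A100-class 8-GPU server

sd_servers, days = 40, 30       # "dozens of servers ... nearly a month"
sora_servers = sd_servers * 10  # "at least an order of magnitude larger"

sd_gpu_hours = sd_servers * GPUS_PER_SERVER * 24 * days
sora_gpu_hours = sora_servers * GPUS_PER_SERVER * 24 * days

print(f"Stable Diffusion training estimate: ~{sd_gpu_hours:,} GPU-hours")
print(f"Sora training estimate:             ~{sora_gpu_hours:,} GPU-hours")
```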

On the other hand, the inference computing power of such models is also much larger than that of large language models. Some data show that Stable Diffusion's inference computing consumption is similar to that of the LLaMA 70B (70-billion-parameter) model. In other words, in inference computing, a 1-billion-parameter text-to-image model is roughly comparable to a large language model with on the order of 100 billion parameters. The inference computing power of Sora, a text-to-video model, is certainly much larger still than that of a text-to-image model.

"Text is one-dimensional, video is three-dimensional, and the computation per unit is much larger," an artificial intelligence expert told Digital Intelligence Frontline. He thinks a cluster of several thousand GPU cards is needed to have a chance.
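A simple counting sketch shows why. Assuming illustrative latent-compression and patch sizes (not Sora's actual settings, which are undisclosed), the number of spacetime patches in a short clip dwarfs the token count of a text prompt, and transformer compute scales roughly with the number of units processed:

```python
# Back-of-envelope comparison of text tokens vs. video spacetime patches,
# illustrating why a "small" video model can need LLM-scale compute.
# All downsampling and patch-size numbers are illustrative assumptions.


def text_tokens(words: int, tokens_per_word: float = 1.3) -> int:
    """Rough token count for an English prompt."""
    return int(words * tokens_per_word)


def video_patches(frames: int, height: int, width: int,
                  spatial_down: int = 8,   # assumed VAE spatial downsampling
                  temporal_down: int = 4,  # assumed temporal downsampling
                  patch: int = 2) -> int:  # assumed DiT-style patch size
    """Rough spacetime-patch count for a compressed latent video."""
    latent_frames = frames // temporal_down
    latent_h = height // (spatial_down * patch)
    latent_w = width // (spatial_down * patch)
    return latent_frames * latent_h * latent_w


prompt = text_tokens(50)                                   # ~65 tokens
clip = video_patches(frames=240, height=720, width=1280)   # 10 s at 24 fps

# Per forward pass, transformer compute is roughly proportional to the
# number of units, so the token ratio is a proxy for the compute ratio
# at equal model size.
print(f"text tokens:   {prompt}")
print(f"video patches: {clip}")
print(f"ratio: ~{clip // max(prompt, 1)}x more units to process")
```

On top of this, diffusion-style generation runs many denoising steps per sample, multiplying inference cost further.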

With Sora spurring the development of text-to-video, domestic computing power overall will remain very tight this year. According to a person at a computing-infrastructure company, in AI computing power, several North American giants now command more than ten times China's total computing power, or even more.

In places, however, domestic computing power sits idle. Several situations contribute: some companies that began training large models in the first half of last year have since given up on developing them or switched to open-source models; and large-language-model applications ran into challenges last year, with a large number of inference applications yet to land, leaving some enterprises with dozens or even hundreds of idle machines.

Song Jian has also noticed the idle-computing problem. He observed that, especially from around November 2023, renting computing power became easier, with prices falling to two-thirds or even half of the original.
