OpenAI has dropped a bombshell on the industry with SORA, setting off a fierce debate in the tech community.
Yann LeCun, a Turing Award winner and Meta's chief AI scientist, has publicly stated that SORA merely generates pixels and cannot understand the physical world. Zhou Hongyi, chairman of 360, and Fu Sheng, CEO of Cheetah Mobile, have also entered the fray in recent days.
After SORA appeared, Zhou Hongyi put forward a widely circulated view: SORA means the arrival of AGI (artificial general intelligence) will be shortened from ten years to one or two. Fu Sheng said publicly that SORA is a product-level milestone but not a technological revolution in AI, and that AGI will not arrive within a year. The two have debated the point on social media.
Chen Ran, founder of OpenCSG, considers SORA a more important milestone than ChatGPT: "In my view, ChatGPT was a stepping stone that laid the groundwork for SORA's innovation, and SORA is a next-generation innovation."
OpenAI's official technical documentation on SORA discloses little, and the debate over it has no final answer yet. Still, the bombshell OpenAI dropped at the start of the year can be expected to dominate discussion throughout 2024, just as ChatGPT dominated 2023.
That is enough to keep humanity's brightest minds occupied for some time. Xiao Yanghua, a professor at Fudan University's School of Computer Science and Technology and director of the Shanghai Key Laboratory of Data Science, believes machines may overturn the way scientists and philosophers have interpreted the world for thousands of years.
An even more important milestone?
For Xiao Yanghua, SORA was both expected and unexpected.
"It was expected because GPT was bound to move toward multimodality; that was the consensus at the start of last year. What was unexpected is how much of what we already know the results, especially the simulation of the physical world, would upend."
Speaking rationally, Xiao Yanghua told Yicai, the pace of OpenAI's progress did not exceed his expectations: when ChatGPT appeared, many people judged it to be the singularity moment for human society, after which development would be exponential, and what we are seeing now is simply that exponential development. Emotionally, however, "our receptors have only ever been able to accept mild, linear change, so SORA still comes as a huge shock."
Reactions in the industry have split between a cautious, measured camp and an optimistic one. On the very day SORA was released, Zhou Hongyi posted a long article on Weibo expressing his optimism. He argued that SORA shows more than top-tier generative capability: once large models understand and can simulate the real world, new achievements and breakthroughs will follow. "AGI really is not far away. It is not a matter of 10 or 20 years; it may be achieved within one or two."
Fu Sheng poured cold water on the excitement. In his view, SORA is indeed a major product milestone, but technologically it is not a bigger breakthrough than ChatGPT, and it has little to do with AGI; it is an extension of what large models can already do.
Chen Ran does not think SORA can simulate the physical world, but he told Yicai that OpenAI's combination of the Transformer architecture with a diffusion model is an original new structure; taking this path is undoubtedly a major technological innovation and perhaps an even more important milestone. Chen Ran is one of the technical founders of the large-model startup wave, and OpenCSG, the company he founded, focuses on building an open-source large-model ecosystem, aiming to connect upstream and downstream players and make large models, datasets, and AI agents (** set) more democratic and fair.
"We now recognize ChatGPT's text-to-text generation as a milestone innovation, and text-to-video is a new, huge innovation. It will introduce many new variables on the application side, which is a huge change for future forms of entrepreneurship and investment, and it is more revolutionary than text-to-text." Chen Ran believes SORA is both a technological innovation and a good product, but that it has not yet truly shown its power, and its future applications may be broader than ChatGPT's.
Speaking as an investor, Luo Xu, managing director of Lenovo Venture Capital, believes SORA's sensory impact on the industry is similar to the shock of ChatGPT's launch last year, but that in terms of technical difficulty, SORA is harder than ChatGPT.
"The main reason is that text data can be structured, while video data is unstructured and voluminous, which makes it comparatively hard to train on." Luo Xu believes SORA has solved the problem of training on massive amounts of unstructured data and found an engineering method for it, outclassing all of the industry's previous attempts.
Investors are paying as much attention to SORA as entrepreneurs are; since its release, SORA has come up in every discussion at investment meetings.
Luo Xu told Yicai that Lenovo Venture Capital's internal meeting first discussed what stage the technology has reached, and second, what it will bring next.
"We think the technology released now is still at an early stage of generation, but some things have already been verified, for example that the training method can solve coherence and consistency along the timeline. The ceiling and capability boundary of multimodal models themselves are very high, leaving plenty of room for further development." Luo Xu said this was the judgment reached after internal discussion, and that there will be many development opportunities in this field this year.
The next question is: if text-to-video develops as well as language models have, what will it bring? Luo Xu believes language is a compression of world knowledge, and language models cannot capture much of the perceptual information and information about the physical world. That information is richer than language; if AI can be trained on it, the model's cognition of the physical world will rise to another level, which matters greatly for logical judgment and reasoning.
"I think this is the beginning of multimodality and a step forward toward cognition. But how much value it generates from here depends on how much multimodal models can contribute to understanding the world; for now, what we see is mostly a generation tool. Get this direction right, and your understanding of the world will be more profound," Luo Xu said.
Behind the polemics
After SORA's launch, one of the most contested questions in the tech community has been whether the model can understand the physical world, and, building on that, whether it can hasten the arrival of AGI.
In its technical documentation, OpenAI positions SORA's video generation models as "world simulators": "Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI."
Some argue that, judging by its physical interaction effects, SORA's output is grounded in an understanding of the physical world; many others contend that SORA does not understand the laws of physics and merely extrapolates images through training at scale.
Turing Award winner Yann LeCun has weighed in repeatedly. On February 17, he wrote on X that he wanted to clear up a "huge" misunderstanding: generating mostly realistic-looking videos from prompts does not indicate that a system understands the physical world, and generation is very different from causal prediction from a world model. He argues that modeling the world by generating pixels is doomed to failure.
On February 26, Zhou Hongyi posted a 20-minute video rebutting the "authority". "The person who looks down on SORA the most right now is Yann LeCun," he said; although LeCun is a veteran of the field, what an authority says is not necessarily right.
"SORA may not have distilled formula-like laws from studying phenomena, but it must have built up common-sense knowledge; only on that basis can it reconstruct a scene." Zhou Hongyi believes SORA's launch marks a milestone in artificial intelligence. Don't just look at the surface, he said; look at the development of AI behind it. If machines not only understand language but also learn human knowledge, including the large body of knowledge and physical laws implicit in how humans interact with the world, then real AGI is not far off.
Fu Sheng had earlier stated publicly that SORA would not bring AGI much sooner, contradicting Zhou Hongyi's earlier view. In the video, Zhou Hongyi also mentioned "classmate Xiao Fu" (Fu Sheng) and reiterated his position.
Fu Sheng then replied with a video of his own, imitating Zhou Hongyi in a red outfit and filming himself on his phone in front of a mirror. "Old Zhou is quietly swapping concepts," he said: the question he had raised was not whether SORA has an understanding of the world, but whether SORA shortens the timeline to AGI, or is of great help in bringing it about, and whether SORA improves AI's understanding of the world.
"As soon as Comrade Old Zhou came out, he said Sora understands the world very well and that AGI has gone from 10 years away to 1 year. I think that view is definitely wrong. Sora's understanding of continuous video is certainly stronger than before, but there is no revolutionary breakthrough in the underlying technology, and it does not understand the world better than large language models do." Fu Sheng said AI naturally has some degree of understanding, but as for whether it can reproduce the physical world, he believes deviations will still accumulate over time.
Many in academia and industry have also taken sides. Lin Dahua, a leading scientist at the Shanghai Artificial Intelligence Laboratory, recently posted on WeChat Moments: "This time, I clearly side with Yann LeCun. Admittedly, SORA is a milestone breakthrough in generation. But there is a huge gap between generating realistic video and mastering the laws of physics, let alone AGI; they are completely different things."
"The more we test GPT-4, the more we feel that humans are still far from AGI," Lin Dahua said.
Zhao Junbo, a doctoral supervisor at Zhejiang University, has also said publicly that SORA may not be a world model: "I also object to the many far-fetched comparisons between this technology and AGI; we are still far from AGI." A world model, he said, needs to be able to output actions, output predictions of the future, and output judgments of the current state. Sora has quite likely learned some patterns of how the world works, but we don't know whether it has those other abilities.
Chen Ran has studied OpenAI's technical documentation. Just as a large language model uses preceding tokens to predict the next token, he said, SORA uses pixels to predict and generate the next pixels; in a video model, however, the basic unit changes from the token to the patch, that is, a block of pixels.
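The token-to-patch idea can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not OpenAI's code: it cuts a raw pixel video array of shape (frames, height, width, channels) into fixed-size spacetime patches and flattens each one into a row, much as a text model sees a sequence of tokens. OpenAI's report says Sora first compresses video into a lower-dimensional latent space before patching; that compression step is omitted here, and the function name `video_to_patches` and the patch sizes are hypothetical.

```python
import numpy as np


def video_to_patches(video: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Cut a (T, H, W, C) video into flattened spacetime patches.

    Hypothetical helper for illustration. Returns shape
    (num_patches, pt * ph * pw * C): one row per patch, ordered by time block,
    then row block, then column block, like a token sequence.
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("T, H, W must be divisible by pt, ph, pw")
    # Split each axis into (number of blocks, size within block).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Put the block-grid axes first and the within-patch axes after them.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten each patch into a single vector.
    return x.reshape(-1, pt * ph * pw * C)


# Usage: 16 frames of 64x64 RGB video, patches spanning 2 frames x 16 x 16 pixels.
video = np.random.default_rng(0).random((16, 64, 64, 3))
tokens = video_to_patches(video, pt=2, ph=16, pw=16)
print(tokens.shape)  # (128, 1536)
```

Here 128 patch "tokens" each carry 1,536 pixel values (2 x 16 x 16 x 3); a Transformer would then treat these rows the way a language model treats word tokens.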
Companies that built visual models in the past usually did image and video generation with diffusion. OpenAI's contribution is to merge the Transformer architecture behind large language models with diffusion, moving from the next token to the next patch and opening up a new generation path.
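The diffusion half of that combination can also be sketched. The Python below follows the standard DDPM recipe (Ho et al., 2020) as an assumption, since OpenAI has not published SORA's noise schedule or training details: clean patches are mixed with Gaussian noise at step t, and a model is trained to predict that noise. The `denoiser` argument is a hypothetical placeholder for the Transformer that would read the noisy patch sequence; the linear beta schedule and its endpoints are likewise illustrative.

```python
import numpy as np


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0: np.ndarray, t: int, alpha_bars: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Sample x_t from q(x_t | x_0) = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise


def denoising_loss(denoiser, x0, t, alpha_bars, rng):
    """Mean squared error between the injected noise and the model's prediction.

    `denoiser(x_t, t)` is a hypothetical stand-in for a Transformer that reads
    the noisy patch sequence and predicts the noise added to each patch.
    """
    noise = rng.standard_normal(x0.shape)
    x_t = add_noise(x0, t, alpha_bars, noise)
    predicted = denoiser(x_t, t)
    return float(np.mean((predicted - noise) ** 2))


def zero_model(x_t, t):
    """Trivial baseline that predicts no noise anywhere."""
    return np.zeros_like(x_t)


# Usage: 128 patches of 1,536 values each, noised at step 500 of 1,000.
rng = np.random.default_rng(0)
patches = rng.standard_normal((128, 1536))
alpha_bars = linear_alpha_bars(1000)
loss = denoising_loss(zero_model, patches, 500, alpha_bars, rng)
```

The trivial baseline scores about the variance of the noise; training would push a real Transformer's loss below that by learning to separate signal from noise patch by patch.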
"For those of us who work in technology, the more shocking thing is not actually the video generation itself but that it connects pixels with characters, replacing the most critical anchor point with the patch. That is very innovative: language is made of characters, and the regularities of images can also be represented with characters. I think this is a very big revolution," Chen Ran said.
Chen Ran believes a greater future value is that the metaverse may be accelerated, because images also follow rules: "Collect every frame and every video to a certain extent, and the virtual world will be generated. In this sense, this is also where SORA is grander and more of a milestone than ChatGPT."
Face up to the gap
Domestic large-model companies have not yet caught up with GPT-4, and OpenAI has already moved ahead again.
The emergence of SORA may make many people aware of the gap. Xiao Yanghua said there has always been a gap, and that we must face it squarely and keep a sense of crisis. Acknowledging the gap does not mean giving up; we must catch up and narrow it, while remaining fully aware of how arduous catching up will be.
Looking at the domestic picture, Zhao Junbo believes the gap between China and North America has widened in this direction. "And this time differs from GPT in that, if you want to catch up, you basically don't even have an anchor. Meta is the most likely player to open-source something, but V-JEPA currently takes a very different technical route."
Chen Ran began building large models in April 2022. "I see the gap between China and the United States in large models growing ever wider. On the one hand, the investment environment is getting worse and the room for trial and error keeps shrinking; on the other, computing power is a chokepoint. Large models depend on datasets, computing power, algorithms, and ecosystem, and computing power is the key one, which cannot be solved in the short term. That means we will go slower and slower, as if the United States were driving on the highway while we take a country road."
Still, Chen Ran is not pessimistic. He believes China has advantages at the application layer and that large-model development will get a buffer period.
"Last year was year one for large models. For the next three years or so, China may be stumbling along and the gap with the United States will keep widening, but I don't think that will last more than 3 to 5 years. In the end, capital is profit-seeking; if this market can make money, capital will flow back." Chen Ran predicts that after 2027 or 2028, the gap with the United States will begin to narrow.
"I think this market will ultimately need companies like Alibaba to emerge in AI, and entrepreneurs who dare to act and have an international vision to build it," Chen Ran said.
On domestic catch-up, Xiao Yanghua believes we have generally followed blindly; we need to really understand why to follow, how to follow, and how to compete through differentiation. In the future, China can actively position itself on other tracks toward AGI to build advantages and counterbalance its rivals in overall strategy. "In the narrow view, AI competition bears on the fortunes of the nation; in the broad view, SORA opens up a wider space for imagining the future. It is a major opportunity for all of human development, accompanied by a major challenge."
From an investor's perspective, Luo Xu believes the capabilities of leading model companies such as OpenAI do not seem to have hit a ceiling; they are still iterating, and faster than startups. These companies not only have richer resources such as computing power, but also engineering training methods unknown to outsiders, which lower training costs and improve efficiency, while startups are still filling gaps and gathering experience in engineering methods. As a result, the distance will keep growing.
In Luo Xu's view, there is no conclusion yet on whether China can produce something like SORA, since SORA's technical methods have not been fully mastered. From an investment standpoint, its emergence mainly shows everyone the direction of future progress.
Since last year, Luo Xu has spoken with many startups focused on vision, but SORA's emergence has had a big impact on similar startups at home and abroad, because they adopted different technical routes. Will SORA change how investors choose AI projects, and make them more cautious about text-to-video? Luo Xu is more confident in this direction, because "it has shown us the possibility of video generation and a possible right way forward."
"Last year we were focusing on multimodality and generation, and this time SORA has raised the upper limit of the whole field's technical capability many times over, so we have more confidence that this technical direction can be put into practice," Luo Xu said. Going forward, he will keep looking for entrepreneurs working in this area, but catching up with SORA may still be difficult, so investors need to manage expectations and research the industry more deeply.