In the early morning of February 16, Beijing time, OpenAI released Sora, its first text-to-video model. Sora can generate a video up to 60 seconds long from a text prompt entered by the user.
Over the past three days, OpenAI and Sora have been at the center of discussion in the AI industry. Users of AI tools have weighed in, and so have technology figures such as Musk, Yann LeCun and Zhou Hongyi. Musk posted "gg humans", roughly "humanity concedes defeat". Zhou Hongyi predicted that "the arrival of AGI could be shortened from 10 years to 1 year".
Sam Altman, CEO of OpenAI, engaged actively with users on the social platform X and used the moment to recruit: "OpenAI is the most talented and kind group of people I have ever seen in one place. We are working on the hardest, most interesting and most important problems, all the key resources are in place, and we are very focused on building AGI (artificial general intelligence). You should probably consider joining us."
What do insiders think?
In 2023, text-to-image generation and AI chatbots were in full swing, and their progress was visible month by month. Text-to-video, by contrast, was a "gold mine" that AI was only slowly beginning to work, and startups such as Runway and Pika had emerged. Then, at the start of 2024, OpenAI released Sora along with a number of videos it had generated. These clips far exceeded the roughly 4-second length typical of the industry at the time, and they raised generation quality to a new level.
In the official demo videos, Sora directly outputs footage with multiple characters, multiple scenes and camera movement. For example, one prompt reads: "The camera moves through the bustling streets of Tokyo, following several people enjoying the snow and shopping." In the video Sora generated, the camera swoops down from the snowflakes in the sky and follows a couple walking hand in hand along a Japanese-style street.
Another prompt describes a stylish, elegant woman walking down a Tokyo street filled with warm glowing neon and animated city signage.
In the resulting video, the woman wears a black leather jacket and a red skirt as she walks along the neon-lit street. The subject stays coherent and stable, and the clip also contains multiple shots. These include a slow cut from the street scene to a close-up of her face, along with the glow of the neon reflected on the wet pavement.
After the release, users on social media around the world exclaimed in countless languages: "Reality no longer exists." Industry leaders assessed Sora from different angles.
Musk left comments on X such as "gg humans" and "humans will create excellent works with the help of AI". Cristobal Valenzuela, co-founder and CEO of Runway, one of the players in AI video, said that progress that used to take a year now takes months, and then days and hours.
According to media reports, the founder of Mobvoi remarked in a WeChat Moments post: "The LLM ChatGPT is a simulator of the virtual world of thought, and the LLM-based video generation model Sora is a simulator of the physical world. Now that both the physical and the virtual worlds have been modeled and simulated, what is reality?"
Zhou Weiwei, vice president of Hongbo Co., Ltd. and CEO of InBev Digital, also used a WeChat Moments post to analyze the video that moved her most. She praised it this way: "From an artistic perspective, Sora clearly knows how to distinguish between different montage techniques and combine them appropriately, and the stream of consciousness is....... From a technical perspective, completing such a stable and complex RTX so quickly is truly......"
She also said bluntly that in the AI era, being one step ahead means staying ahead at every step, and the barrier created by first-mover advantage far exceeds that of the Internet era. "Catch up, or overtake on the curve? Many once-proud all-round talents now look faded and powerless in front of strong AI. Rather than sighing on the sidelines, it is better to step into the game and at least become a qualified tool user who is proficient with the tools."
Zhou Hongyi, founder of 360 Group, quickly published a long Weibo post and a video after Sora's release. He predicted that Sora could bring huge disruption to advertising, movie trailers and the short-video industry. He added that it may not beat TikTok so quickly, and is more likely to become a creative tool for TikTok.
He believes the power of large language models lies in their ability to truly understand knowledge about the world. Previously, all text-to-image and text-to-video models manipulated graphical elements on a 2D plane, without applying the laws of physics.
In his view, many people who analyzed Sora this time from a technical or product-experience angle emphasized three things: it outputs 60-second videos, it keeps multiple shots consistent, and it simulates the natural world and physical laws. He considers these points rather superficial. What matters most, he argued, is that Sora's technical approach is completely different. In the videos Sora produces, it understands, as a human would, that a tank carries enormous impact force: a tank can crush a car, but a car cannot crush a tank.
Zhou Hongyi believes this also points to the direction of the future. Combining an understanding of human language, human knowledge and world models with many other technologies will make it possible to build super tools in every field. He added that the emergence of Sora may shorten the arrival of AGI from 10 years to 1 year.
Zak Kukoff, an early-stage investor based in San Francisco, predicted that within five years a team of fewer than five people will make a film that grosses more than $50 million at the box office. He expects such a team to use text-to-video models and non-union labor. Many AI creators in China likewise hope to see an AI-made film arrive as soon as possible.
OpenAI explains its technical breakthroughs
Sora turns words into a visual feast. Beyond the initial shock, the technology behind it has also drawn wide attention.
Jim Fan, a senior research scientist at Nvidia, said on X that he had seen some vehement objections: "Sora isn't learning physics, it's just manipulating pixels in 2D." He said he disagreed with this reductionist view.
He then wrote that Sora is a data-driven physics engine that simulates many worlds, real or fantastical. "The simulator learns intricate rendering, 'intuitive' physics, long-horizon reasoning and semantic grounding, all through some denoising and gradient maths."
Saining Xie, an assistant professor at New York University, posted a series of analyses of Sora. He speculated that Sora is built on a diffusion transformer and that the whole model may have about 3 billion parameters.
While everyone was analyzing Sora's technical achievements from the information available, OpenAI, uncharacteristically, published a description of the relevant technology:
"We explore large-scale training of generative models on video data. Specifically, we jointly train text-conditional diffusion models on videos and images of variable durations, resolutions and aspect ratios. We use a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, can generate up to one minute of high-fidelity video, a major breakthrough in video generation.
Our results suggest that scaling up video generation models is a promising path toward building general-purpose simulators of the physical world."
The technical report focuses on two areas. First, it describes a method for turning visual data of all kinds into a unified representation, which enables large-scale training of generative models. Second, it gives a qualitative evaluation of Sora's capabilities and limitations.
It is important to note that the specific technical details of the model are not covered in this report.
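The report names "spacetime patches" as its unified representation but gives no code. The Python sketch below is only a hypothetical illustration of the general idea and is not OpenAI's implementation. It cuts a raw grayscale video into non-overlapping time × height × width blocks and flattens each block into one token. According to the report, Sora patches a compressed latent rather than raw pixels. The function name `spacetime_patches`, the helper `make_video`, the single channel and the patch sizes are all assumptions made for illustration.

```python
from typing import List

Frame = List[List[float]]  # H x W grayscale values (assumption: one channel, raw pixels)
Video = List[Frame]        # T frames


def spacetime_patches(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Hypothetical sketch: split a T x H x W video into non-overlapping
    pt x ph x pw blocks and flatten each block into one token vector."""
    t_len, h_len, w_len = len(video), len(video[0]), len(video[0][0])
    if t_len % pt or h_len % ph or w_len % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, t_len, pt):          # step through time
        for y0 in range(0, h_len, ph):      # step through rows
            for x0 in range(0, w_len, pw):  # step through columns
                # Flatten the block in (time, row, column) order.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


def make_video(t_len: int, h_len: int, w_len: int) -> Video:
    """Hypothetical test clip whose pixel value encodes its (t, y, x) position."""
    return [[[float(t * 10000 + y * 100 + x) for x in range(w_len)]
             for y in range(h_len)] for t in range(t_len)]


# A short landscape clip and a longer portrait clip give different token counts.
landscape = make_video(4, 8, 16)
portrait = make_video(8, 16, 8)
print(len(spacetime_patches(landscape, 2, 4, 4)))  # 16 tokens (2 * 2 * 4)
print(len(spacetime_patches(portrait, 2, 4, 4)))   # 32 tokens (4 * 4 * 2)
```

With a fixed patch size, clips of different durations, resolutions or aspect ratios simply become token sequences of different lengths. A transformer can consume sequences like these, which is one plausible reading of how a single model could train jointly on such varied data.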
OpenAI has also publicly acknowledged some of Sora's flaws. It may struggle to render complex physical changes accurately, may fail to understand cause and effect, and may confuse spatial details.
For example, in the demo video of "five gray wolf pups frolicking and chasing each other around a remote gravel road", the number of wolves changes. Some of them appear or vanish out of nowhere.
Sora is currently positioned as an early-stage research project and is not available to the public, because the company is concerned about its misuse for deepfake videos. For now, only a select group of visual artists, designers and filmmakers have been given access for internal testing. Many in the industry, including some film and television professionals, have said they are looking forward to its full release.
Compiled from OpenAI, 21st Century Business Herald, the X platform and other sources.