Since OpenAI announced its new generative AI model Sora, a single stone has stirred up a thousand waves, and the discussion has not stopped.
Anticipation, anxiety, and fear about the new tool, which is said to turn a text prompt into a video up to 60 seconds long, have together formed a huge "Sora chaos map". On the one hand, videos labeled "generated by Sora" are everywhere, but many of them are in fact jokes made by netizens, and the label has become a traffic magnet for pranksters and a source of amusement for many internet jokers.
Netizens labeled their joke videos as "generated by Sora".
On the other hand, although Sora is not yet open to the public, a large number of "training institutions" have appeared online, exploiting industry anxiety and information asymmetry to "harvest leeks", that is, to fleece inexperienced buyers. A blogger who claims to hold a doctorate from Tsinghua University has sold more than 520 copies of an introductory AI course at 199 yuan each. Others worked out that he sold 250,000 sets of the course in a year, for sales of nearly 50 million yuan. Netizens have therefore dubbed him the only AI giant on an equal footing with "Ultraman", a Chinese internet nickname for OpenAI CEO Sam Altman.
Memes made by netizens.
What exactly is Sora?
The name Sora comes from the Japanese word "sora", which means sky and also symbolizes boundless space and endless possibility, and by extension freedom. On the homepage of OpenAI's introduction to Sora, countless paper airplanes fly freely through the air, symbolizing the autonomy and creativity of the model and echoing the idea of freedom carried by the word for "sky".
Sora intro page.
By now most readers have seen plenty of Sora-generated videos, and many will be left with one question after watching them: how does Sora generate video?
OpenAI's technical report describes Sora as a "diffusion transformer". It processes data in much the same way as a conventional transformer (including its encoder and decoder), but instead of text tokens it operates on units of visual data called "patches".
Patches are the small pieces into which a large model breaks visual data when it processes videos and images. The video is first compressed into a low-dimensional latent space, and that latent representation is then decomposed into multiple patches, so that the model can process and generate high-quality visual content more effectively. The advantage of this approach is that the model can handle visual data of different resolutions, durations, and aspect ratios, which gives it more flexibility in generation.
Visual coding process.
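The decomposition into patches can be illustrated with a small sketch. This is not OpenAI's implementation: the function name `patchify`, the `(T, H, W, C)` array layout, and the patch sizes are all assumptions chosen for illustration, and a zero-filled array stands in for the compressed latent video.

```python
import numpy as np

def patchify(latent, pt, ph, pw):
    """Split a latent video of shape (T, H, W, C) into flattened spacetime patches.

    Hypothetical helper: returns an array of shape
    (num_patches, pt * ph * pw * C), one row per patch.
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, extent of one patch).
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three patch-index axes to the front, patch contents to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)

# Two clips with different durations and aspect ratios.
short_wide = np.zeros((4, 8, 16, 4))
long_tall = np.zeros((8, 16, 8, 4))

tokens_a = patchify(short_wide, pt=2, ph=4, pw=4)
tokens_b = patchify(long_tall, pt=2, ph=4, pw=4)
print(tokens_a.shape)  # (16, 128)
print(tokens_b.shape)  # (32, 128)
```

Both clips become sequences of patches of the same width (128 values each) and differ only in how many patches they contain, which is one way to see why a patch-based model can accept inputs of varying resolution, duration, and aspect ratio.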
The "diffusion transformer" combines the diffusion model with the transformer architecture. It uses the transformer's ability to model complex relationships within the data, together with the diffusion model's strategy of refining data step by step, to predict the "clean" patches from noisy ones, gradually recovering clean data from noise to produce an image or video.
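To make "the transformer's ability to deal with relationships between the data" concrete, here is a minimal sketch of single-head self-attention over a handful of patch tokens. It is a generic textbook formulation, not Sora's architecture: the weights are random rather than trained, and the names `self_attention`, `num_patches`, and `dim` are assumptions made for this example.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of the values of all tokens.
    return weights @ v, weights

rng = np.random.default_rng(0)
num_patches, dim = 6, 8  # illustrative sizes only
noisy_tokens = rng.standard_normal((num_patches, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

mixed, weights = self_attention(noisy_tokens, w_q, w_k, w_v)
print(mixed.shape)           # (6, 8)
print(weights.sum(axis=-1))  # every row of attention weights sums to 1
```

Because every row of `weights` is strictly positive, each output token draws on all patches at once. That is the property a diffusion transformer relies on when it denoises one patch using context from the others.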
Take a simple example. If we have a picture of a dog, we can add noise to it step by step so that it becomes blurrier and blurrier until it ends up as pure noise. Reversing the process, we can start from disorganized noise and remove it step by step until the target picture is restored. The key to a diffusion model is learning to reverse the noising process.
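The dog example can be sketched in a few lines. The code below assumes a standard DDPM-style linear noise schedule, which the article does not attribute to Sora. The function names, the 8x8 random array standing in for the dog picture, and the schedule values are illustrative assumptions. In place of a trained network, the "denoiser" is handed the true noise, purely to show what the model is trained to estimate.

```python
import numpy as np

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar, where alpha_bar[t] is the share of signal variance left at step t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward process: mix the clean image with Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def recover_clean(xt, t, alpha_bar, predicted_eps):
    """Invert the mixing formula, given an estimate of the noise that was added."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * predicted_eps) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
alpha_bar = make_schedule(1000)
dog = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a tiny dog picture

# The further along the schedule, the less of the original picture survives.
for t in (0, 499, 999):
    xt, _ = add_noise(dog, t, alpha_bar, rng)
    corr = np.corrcoef(dog.ravel(), xt.ravel())[0, 1]
    print(f"t={t:4d}  signal share={alpha_bar[t]:.5f}  correlation={corr:.2f}")

# A real model must *predict* the noise; here the true noise is used as an oracle.
xt, eps = add_noise(dog, 499, alpha_bar, rng)
restored = recover_clean(xt, 499, alpha_bar, predicted_eps=eps)
print(np.allclose(restored, dog))  # True
```

With a perfect noise estimate the original picture is recovered exactly. Training a diffusion model amounts to making that estimate as accurate as possible at every noise level, and generation then removes the noise over many small steps rather than in one jump.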
In fact, the previously popular image and video generators such as Midjourney and Stable Diffusion are also based on diffusion models. The difference is that Sora lets the model look at many frames at once, which keeps a subject consistent even after it temporarily leaves the field of view. Sora also shows an emergent grasp of the grammar of film and television shooting: it not only moves the camera to follow the subject but also changes angle while the shot is moving, and the picture still stays coherent and complete.
Another strength of Sora is that it "inherits" OpenAI's ability to understand text. It can generate high-quality videos and images from prompts, and it can extend a video forward or backward in time. In an example on the official website, Sora extends the same opening toward different endings, or starts from different openings and arrives at the same ending.
All three beginnings will eventually lead to the same ending.
OpenAI's ambitions, however, go far beyond that. Sora is not only a creative tool but also a complex, data-driven simulation system capable of simulating real or imagined worlds. It creates photorealistic 3D scenes and animations by learning how to render a scene correctly, simulate physical behavior, reason over long time spans, and understand what a scene means.
This lets it create many videos of things that do not exist in reality. In the following video, for example, the prompt "Realistic close-up of two pirate ships fighting each other while sailing in a cup of coffee" requires Sora not only to produce realistic 3D models, but also to animate them according to the rules of physics, simulate the dynamics of the liquid, and render the result with a high level of realism. Even though the scene could not occur in the real world, the engine still follows the physical rules we expect.
Although Sora is still flawed, the direction is promising: by building such a complex simulation system, we can model the real world and build digital interactions with it. Whether it is Google, OpenAI, or Musk's xAI, the ultimate goal is a world model. MOSS in the movie "The Wandering Earth 2", for example, is an incarnation of strong artificial intelligence: with a model of the real world and enormous computing power, it can deduce the outcomes of different choices in order to achieve its final objective. For many people, this may be the ultimate form of AI.
In any case, all of that is still some way off.
Will Sora really take away the entire film and television industry's jobs?
Since the day it was born, artificial intelligence has often been an "imaginary enemy" in people's minds. With the arrival of new tools such as ChatGPT, a distant fear of AI has gradually turned into a pressing worry about one's own job, especially since the release of Sora.
Given what Sora can generate, the first to bear the brunt will be film and television practitioners. Making a one-minute video in the traditional way is expensive: beyond the set, lighting, and actors, the crew has to agree on the storyboard in advance, find good angles, and work out camera and actor positions. If the shot needs something special, such as fleeting light and shadow or ideal weather, it comes down to luck.
None of this is a problem for Sora, which can generate a video directly from a simple prompt. Compared with earlier AI tools, whether in video length, image quality, completeness of detail, or even multi-shot sequences, Sora can fairly be described as "crushing" them, and this will clearly have a bigger impact on practitioners.
In the meme picture made by netizens, the classic Hollywood logo "Hollywood" has become "Sorawood".
According to a recent survey of Hollywood industry leaders by industry research firm CVL Economics, anxiety is currently hanging over Hollywood, with 36% of respondents saying that generative AI has reduced the need for their company's daily job skills, and 72% of companies surveyed are early adopters of generative AI tools.
Of those, 75 percent said generative AI tools had prompted their business units to eliminate, reduce, or consolidate jobs. More than 200,000 jobs in Hollywood are expected to be affected by AI over the next three years, especially post-production roles such as visual effects artists, sound effects artists, and sketch artists.
Nor is it only the film and television industry that is affected. Facing this "dimensionality reduction attack" from Sora, entrepreneurs in AI video have reacted in different ways. Runway CEO Cristóbal Valenzuela declared "game on"; Pika founder Guo Wenjing began preparing a new product to benchmark against Sora; and Stability AI CEO Emad Mostaque could not help sighing that "Altman is really a magician", calling Sora the GPT-3 moment for AI video. This time, many people felt a real sense of crisis.
Take the long view.
While Sora is genuinely exciting, there is no need to be overly anxious about it. For one thing, the videos Sora generates still contain plenty of classic glitches. In many of them, people and animals disappear, deform, or spawn duplicates out of thin air. There are also "haunted" shots that defy physical common sense, such as a candle flame that does not react when someone blows on it, a basketball passing through the rim, or a chair floating and moving by itself.
In a video generated by Sora, the candle flame does not move at all before or after the old man blows on it, which looks slightly eerie.
For another, AI's creative logic is completely different from a human's, so it cannot really tell a good story from a bad one. Many people believe that the more machine-generated content there is, the more precious human creation becomes. Food with "wok hei" usually beats pre-made dishes, and handmade utensils, though less precise than machine-made ones, have more "warmth". Such examples are everywhere, and all the more so in film and television, which best reflects human emotion and draws on every category of art.
In many movie scenes, for example, behind a character's expression, tone, and manner lie not only all kinds of delicate human emotion but also the sum of half a lifetime of experience, feeling, and custom.
These details may seem inconspicuous, but they convey a great deal of information at every moment. It is these details that make each person unique, and through reactions and interactions they form the flow of emotion between characters. Their subtle changes quietly shape our feelings and move us. This is hard for generative AI to achieve, and it may be the fundamental reason why so many AI-generated works feel as if they have "no soul".
The classic film "Before Sunrise" consists almost entirely of dialogue.
In addition, the use of AI in film and television is not new. "Everything Everywhere All at Once", which swept seven Oscars including Best Picture and Best Director, used Runway's AI video tools; in 2016, 20th Century Fox worked with IBM Watson to produce a trailer for the AI-themed horror film "Morgan"; and Disney's Marvel used AI to produce the opening credits of "Secret Invasion".
Not long ago, NVIDIA founder Jensen Huang said in an interview: "Over the last 10 or 15 years, almost everyone would tell you that it is vital for children to learn computer science, and that everyone should learn how to program. In fact, it is almost exactly the opposite. Our job is to create computing technology so that no one needs to program, to make programming languages more flexible. Now everyone in the world is a programmer, and the technology gap has been completely bridged."
This seems to be a true portrait of the AI era. Whether with GPT-4 or Sora, ever-changing new technology lets people who do not understand programming languages build software, and lets people with no film or television background make their own videos with ease. This will surely go further: activating new productive capacity, driving the industry forward, and even creating new connections between people. That may be the greater significance of generative AI.
We have reason to expect that AI technology and film and television production will combine in more and more innovative ways, and that works we have never imagined may appear and bring us more surprises.