After the Sora cooling off period, we are more concerned about what the combination of AI XR can bri

Mondo Technology Updated on 2024-03-07

Text: VR Gyro

There is no denying that OpenAI keeps turning out blockbuster models.

Just before the end of the Chinese Lunar New Year holiday, OpenAI once again dropped a bombshell on social media. The arrival of its new artificial intelligence system, Sora, heralds a radical change in the way modern content is created.

Source: SORA

According to OpenAI, Sora can not only generate videos of up to a minute from text prompts, but also generate them from still images, extend existing videos, or fill in missing frames.

Although the model is currently only open to applications for internal testing, judging from the reactions on social media at home and abroad and the official examples, the videos generated by Sora surpass their "predecessors" in this field in both quality and reliability, and Sora shows strong momentum to become the next ChatGPT.

Of course, there is some hype in this, but it is undeniable that behind Sora's explosive popularity lies a public celebration of what many regard as the fourth major technological revolution, the one following the information revolution. At the start of 2024, another milestone in the evolution toward "truly multimodal artificial intelligence" has appeared.

After Sora and its technical report were released, OpenAI's showcase videos, 60 seconds long, high-definition, with controllable imagery and multi-angle shot switching, went viral on social networks at home and abroad.

Under Sora's spell, people learned for the first time that AI-generated video can be this realistic.

Even spliced into a documentary, nothing would look out of place (Source: Sora).

Bear in mind that AI video generation in 2023 still looked like this:

Realistic, but the subject is visibly stiff (Source: Pika).

In just a few months, text-to-video technology has leapt from 5 seconds to 60 seconds and from animation to documentary quality, which is enough to make anyone break a sweat.

Realistic visuals and "the future is here" viral marketing on social media have made Sora the biggest breakthrough in AI at the beginning of 2024, and its limelight even overshadowed Gemini 1.5, which was released at almost the same time. For a while, both the entertainment and technology worlds were full of Sora.

A year ago, an AI-generated video of Will Smith eating noodles went viral on the Internet, with views on Twitter alone exceeding 8 million.

A year later, after Sora flooded people's feeds, Will Smith uploaded a video to his Instagram with the caption "It's getting more and more out of control".

As you can see, the screen is divided into two sections: the upper half shows the AI video from a year ago, and the lower half shows the current AI video.

Source: X

Although everyone soon found out that this was just Will Smith playing along with the meme, and that the lower half was not generated by AI but filmed by himself, many netizens still cried that they had been fooled: "The creepiest thing is that you can't tell whether this is a performance or AI-generated."

This also shows indirectly that Sora's arrival has led people to believe that generative AI video can pass for the real thing, and that the moment for AI to replace video editing seems to have arrived. In another demo video released for Sora, the biting motion of the character while eating, the notch left in the burger, and the tooth marks all follow the laws of reality, far beyond the terrifying look of Smith eating noodles a year ago.

The burger has flaws, but the bite marks are faithfully rendered (Source: Sora).

However, no matter how good the output is, someone will always find flaws in it. After analyzing Sora's demos, many people in the film and television industry said that although Sora performs excellently in image quality, detail, light and shadow, and color, it cannot yet be used directly in film and television works, because it still falls short in camera movement, camera angles, and fine-grained content control.

An obvious case is the video in which a character runs backwards on a treadmill: Sora clearly does not yet understand the laws of motion.

Source: SORA

Another piece of evidence is the four-legged ant: Sora knows what kind of image the word "ant" stands for, but its understanding of a complete ant is still insufficient.

Source: SORA

However, even though Sora is not perfect, its output is striking enough, and the industry widely believes that Sora can be used for pre-production work in film and television, such as concept design.

Coupled with OpenAI's commitment to actively improving Sora's immature areas, and with AI voice cloning startup ElevenLabs stepping in to address Sora's "missing sound" problem, Sora, having blurred the boundary between the real and the virtual, is expected to bring more innovation and breakthroughs to the film and television industry.

Sora is not the first text-to-video AI model to come out, so why is it the only one to become a global phenomenon?

Fundamentally, there are two reasons: one is a leap of an entire tier in quality, and the other is its "unexpected core technology".

The quality of Sora's output is obvious to all; the 60 seconds of coherent generation alone is beyond anything Runway and Pika can match. Industry insiders attribute this generational lead in quality to the strength of the core technology.

Output of the three models compared by the blogger "Daily News" (Source: X).

Li Mu, a Chinese deep learning expert, believes that Sora is to video generation what the upgrade from GPT-2 to GPT-3 was: compared with the earlier DiT work, the model itself may not have changed much, but it uses hundreds of times the computing power, a case of brute force working miracles. The ViT, DALL·E, diffusion methods, and VAE that the model builds on are not new technologies, and he believes the academic and open-source communities will soon follow up with similar demo applications.

Unlike Runway and Pika, Sora brings the Transformer architecture, which has already proved so useful in GPT, into the field of diffusion models, and applies the strong contextual understanding of text models to the "frame generation" of diffusion-based video.

Source: SORA

To put it simply, Sora does not convert the text directly into each frame of the video; instead, it builds the whole video by processing individual spacetime patches.

This is similar to block-based generation in the field of 3D generation. In this account, Sora analyzes the text and cuts the key elements of the spacetime covered by the content into corresponding patches (objects, actions, backgrounds, and so on), then reassembles these patches, together with information about the physical world drawn from what is described as a built-in knowledge graph, into a noisy picture. Finally, the diffusion model refines that noisy picture into the frames of the generated video.
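To make the idea of "spacetime patches" more concrete, here is a minimal sketch in Python with NumPy. It is only an illustration, not OpenAI's implementation: the function names, the patch sizes, and the choice to cut raw pixels are all assumptions made for this example (the Sora technical report describes patching a compressed latent representation rather than raw frames). The sketch cuts a video array of shape (frames, height, width, channels) into small blocks that each span a few frames and a small image region, then shows that the blocks can be put back together.

```python
import numpy as np


def to_spacetime_patches(video, pt, ph, pw):
    """Cut a (T, H, W, C) video into flattened spacetime patches.

    pt, ph, pw are the (hypothetical) patch sizes along time, height, width.
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split every axis into (number of patches, patch size).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three patch-index axes to the front.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch.
    return x.reshape(-1, pt * ph * pw * C)


def from_spacetime_patches(patches, shape, pt, ph, pw):
    """Inverse of to_spacetime_patches: rebuild the (T, H, W, C) video."""
    T, H, W, C = shape
    x = patches.reshape(T // pt, H // ph, W // pw, pt, ph, pw, C)
    # Interleave patch indices and within-patch offsets again.
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)


# Toy "video": 8 frames of 16x16 RGB, filled with distinct values.
video = np.arange(8 * 16 * 16 * 3, dtype=np.float32).reshape(8, 16, 16, 3)
patches = to_spacetime_patches(video, pt=2, ph=4, pw=4)
restored = from_spacetime_patches(patches, video.shape, pt=2, ph=4, pw=4)
```

In a diffusion transformer, a sequence like `patches` would play the role that word tokens play in a text model: noise is added to the patches, and the network learns to predict the clean ones. That training step is not shown here.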

Source: SORA

Under the constraints of spatio-temporal information, the video content generated by Sora is clearly more faithful to the instructions. It is as if Sora drafts a script for the video in advance, and the generated content behaves like actors and sets that follow the script strictly, something Runway and Pika could not do before.

These achievements are inseparable from the core team behind Sora. After sharing a daily work schedule even more tightly packed than "996", OpenAI researcher Jason Wei remarked, "OpenAI is nothing without its people."

Source: X

According to earlier reports on social media, the Sora team is very young and even includes researchers born after 2000. Known core members include research leads Tim Brooks and William Peebles and systems lead Connor Holmes.

Together with computing power, talent is regarded as one of the cornerstones of AI development. Xie Saining, a leading computer vision researcher who was earlier wrongly reported to be one of Sora's authors, also believes that talent is one of the three core factors behind the birth of a complex system like Sora, the other two being data and computing power.

With sufficiently stunning demo videos, plus the young team behind the world's third-largest unicorn, Sora has earned plenty of attention. Before it has even been released, it has become the undisputed star of text-to-video generation, outclassing both Runway and Pika, and it has even spawned a new "AI monetization channel" in China.

While Sora has not yet entered public testing, "AI lecturers" represented by Li Yizhou have already been selling online courses with great fanfare, promising to let their "family" of followers catch the first wave of making big money with Sora.

Source: Internet.

However, Li Yizhou has long since been exposed as no AI expert, and his courses cover little more than the most basic common sense. They are mostly about "emphasizing the power and importance of AI" and the familiar monetization playbook of Internet lecturers: using Sora as a keyword to attract traffic, then selling accounts, generated videos, and tutorials. Like the earlier "teach you how to use ChatGPT" courses, this amounts to feeding on OpenAI's second-hand traffic to fleece buyers.

Rather than fixating on how to catch up with the latest AI technology, it is better to pay attention to how AI is changing production models. After all, AI is heading toward foolproof ease of use, and what future workers should care about more is in which fields AI can add the most value to content production.

AI "one-click generation of advertising images" tool Amazon AD (source: Amazon).

This is another reason why Sora is such a hot topic: with this text-to-video technology, people are seeing concrete examples of AGI changing the content creation process.

Before this, AIGC had already cracked text generation and image generation. Now the last hurdle among recognized creative media, one-click video generation, has also been cleared. With ChatGPT's earlier success as a precedent, the market generally believes that Sora can become the next AI model to change real workflows rather than remaining theoretical.

After the release of SORA, there was a lot of speculation online about OpenAI's next steps. AI content creator "kwebbelkop" said that OpenAI will collect data from users to fine-tune the model to make SORA more powerful.

In addition, OpenAI would also collect this data to strengthen Sora's RLHF (reinforcement learning from human feedback), which means that everyone could create hit social media videos with one click through Sora. On that basis, OpenAI might even launch a brand-new video platform composed entirely of AI-generated content and compete with YouTube, TikTok, and others.

Source: X

However, OpenAI's ambitions may not stop there. Transforming content production has always been the focus of public attention on generative AI, and OpenAI's AI lineup now includes text-to-text ChatGPT, text-to-image DALL·E 3, text-to-3D Shap·E, and text-to-video Sora.

On traditional smartphone and PC platforms, we have already seen ChatGPT's dominance in AI generation. However, traditional hardware with a single mode of interaction clearly cannot unlock the full potential of multimodal AI. Just as AI is overturning the past, electronic hardware also needs to be upgraded at an accelerated pace to meet the interaction needs of the future.

Perhaps it is precisely because it is exploring this AI interaction ecosystem that OpenAI rushed to put ChatGPT on the visionOS App Store after the launch of Apple's Vision Pro, the hottest consumer electronics device at the beginning of 2024.

The launch of ChatGPT on Vision Pro is an important milestone for OpenAI, directly showing the outside world how AI (especially multimodal AI) may interact in a more natural, intuitive, and immersive way in the future.

Vision Pro's eye movement and gesture tracking (Source: Apple).

It can be said that the pairing of Apple's Vision Pro and ChatGPT has once again made XR devices a promising candidate for the next generation of AI computing terminals. After all, the way Vision Pro upended the work experience within just one month has led many technology leaders to call it "amazing".

After the official release of Vision Pro, many social media bloggers began wearing it in all kinds of everyday and work scenarios. Many developers among them tried coding in Vision Pro and produced feedback on the XR work experience that is worth referencing.

Source: Apple.

IT entrepreneur Willem blogged about his first Vision Pro coding experience, saying, "Not only is it very portable, but it also provides a complete virtual world for your eyes! It's almost like I've got a huge multi-monitor setup with me."

Willem and others who are positive about Vision Pro focus on the word "immersion": a coding interface placed in the real world that shuts out outside distractions almost completely. "In Vision Pro you are almost at one with the environment. I like to walk around the window and look at some ** or server output and feel like it's a big work machine. In a way, I felt like I was standing in a big computer room, which was completely different from the traditional desktop experience."

Immersive coding experience (Source: willem.com)

And when Apple's AI era arrives, the immersive coding experience will be even more magical.

Mark Gurman, a well-known technology journalist, reported that Apple is preparing to add AI features to the next major update of Xcode, its programming software for the iOS platform, to rival Microsoft's GitHub Copilot.

While the report suggests that Apple's update is intended to create as many new AI features as possible for iOS 18, iPadOS 18, and macOS 15, it is only a matter of time before AI features also arrive on visionOS, an important part of Apple's future productivity loop.

AI's boost to programming efficiency is obvious. According to an official GitHub blog post, GitHub Copilot has helped more than 1 million developers improve their productivity since its release, increasing their programming speed by 55%.

Source: github

And this is happening not only among programmers: in almost every office scenario where AI can take part, work efficiency has improved greatly. A similar example is Substance 3D, a 3D modeling program developed by Adobe for Meta Quest Pro, where 3D modeling in the virtual world does away with the keyboard and mouse entirely, and a design model can be shaped by hand with simple gestures.

Following ChatGPT, a more mature Sora, or other AI tools for generating images and models, may join the visionOS ecosystem as applications. The combination of AI and Vision Pro, two of today's most popular technologies, has begun to reshape the office and creative experience.

Source: X

This road of technological change needs many talented people and companies to travel it. Fortunately, Apple is not the only one putting the AI+XR idea into practice: around the time of Sora's release, another event also stirred up the domestic AI market.

On February 18, 2024, Meizu announced that it would stop developing new traditional smartphone projects and instead go "All in AI", fully investing in "AI for new generations". Although the decision has been attributed to "phones not selling", its follow-up AI transformation plan suggests that Meizu may care more about the new market demand for AI + hardware.

Source: Xingji Meizu.

The details of the AI strategic plan announced by Meizu include building AI device products, reconstructing the FlyMe system, and building an AI ecosystem. Shen Ziyu, CEO of the company, emphasized in the press conference that Meizu will build a new AI device, reorganize the product form with AI native design, and support the global mobilization of AI with more powerful hardware computing power.

As for Shen Ziyu's "device of tomorrow", many people speculate that it will be an AI phone; after all, the Meizu 21 Pro, billed as an AI terminal, has already launched. However, some voices believe that Meizu has handed the task of replacing the traditional phone form factor to XR glasses.

Just last year, Meizu released the MyVU AR glasses, equipped with its own "Flyme AR" interaction system, and judging from its newly announced three-year AI vision, XR products will occupy a pivotal position in Meizu's product ecosystem in 2025.

Source: Xingji Meizu.

Starting with Apple's Vision Pro and its rumored AR glasses, traditional phone makers including Meizu, Samsung, Huawei, Xiaomi, OPPO, and Vivo have all entered the XR track. Now Meizu has gone All in AI, and OPPO has set up an AI center to concentrate its resources on AI. Just as the iPhone ushered in the smartphone era, the combination of AI + smart hardware currently looks like the best bet for opening the next era of intelligent computing.

This trend has also reached the leading AI technology companies. In addition to earlier reports that OpenAI was seeking to raise as much as $7 trillion to build a chip empire, Midjourney, one of the industry's top AI generation companies, has also been revealed to be developing hardware products.

Midjourney is said to have poached Ahmad Abbas, the hardware engineering manager of Apple's Vision Pro, to help develop a tool to collect 3D data, manage 3D models, and even launch its own VR headset in the future.

The LinkedIn interface shows that Ahmad has joined Midjourney (Source: LinkedIn).

In the eyes of these leading technology companies, AI is inseparable from the application carrier of hardware, and consumer hardware products also need the assistance of AI to reproduce the glory of the smartphone era.

Whether it is Apple's own Vision Pro, the new form factor of the AI Pin, or the AI phones envisioned by phone makers, all of them are exploring the best way to integrate with ChatGPT, Sora, and other cutting-edge models. In 2024, as generative AI models enter an explosive phase, hardware makers' race for the "naming rights" to the best carrier will continue.
