Major Chinese tech companies collectively tune out Sora

Mondo Technology Updated on 2024-03-07

Header image: Visual China.

Text | Insight by Tseri, Author | renee

In the long history of AI, there has been no shortage of technologies that became overnight sensations. The picture above is taken from a Sora demonstration video: from the slightly fluffy cat fur to the owner's furrowed brow on being woken up, it is indistinguishable from footage of the real world.

Half a month ago, OpenAI released Sora, its first text-to-video model, once again setting the AI world abuzz. It is the first model that can generate videos up to one minute long from human instructions, and the last product to attract this much discussion was ChatGPT.

From ChatGPT to Sora, over the past 16 months, the AI war has spread from the battlefield of the "Thousand Model War" to a range of niche applications.

For ordinary people, this means that where they once asked general-purpose large models such as Wenxin Yiyan and Xunfei Xinghuo all kinds of questions and had them generate text and answers, they can now generate highly realistic videos from text descriptions alone. Here is an OpenAI demonstration video: a drone circles a beautiful, historic church on a rocky outcrop on the Amalfi Coast...

What is even more surprising is that earlier video models essentially stitched together multiple real images and had no ability to understand, whereas Sora constructs its video after "understanding" what people say.

On OpenAI's official Sora introduction page, we can see a paper airplane that seems to have a mind of its own, flying freely. This may be OpenAI's ultimate goal, and it will also be the vision of countless AI companies.

Sora has now sounded the war drums. Whether major AI-branded companies such as Baidu and ByteDance will follow suit, and whether they can build similar models, is a question facing Robin Li and Zhang Yiming. For now, however, the big companies are silent, waiting for their own qualitative leap.

Of course, once built, a Sora-like model would have a much more "realistic" monetization path than ChatGPT.

It could serve as an efficient creative tool for platforms such as Douyin and Haokan Video, or produce simple special effects that open up more themes for the recently popular micro-short dramas. Going further, one might guess that most of the special-effects and labor costs in film and television production could be saved.

However, domestic companies building businesses on large models face another hurdle: they have not yet digested ChatGPT, and now there is a new challenge. If they follow, they may lack the energy; if they do not, their momentum may cool even faster.

A series of articles and video demonstrations suggest that Sora's arrival marks the "iPhone moment" for text-to-video applications.

In fact, text-to-video is nothing new. Late last year, AI companies around the world released their own text-to-video models one after another. In November, Meta released the video generation model Emu Video. As its official examples show, it is limited to fairly simple movements.

Shortly afterwards, Stability AI released an open-source video generation model, Stable Video Diffusion (SVD), and candidly acknowledged its shortcomings in the official video: the generated videos lack motion.

In other words, generating highly consistent, dynamic video in which things genuinely move was the biggest challenge in video generation at the end of last year.

Before Sora's release, the best performer was PixelDance. Judging from the published results, in basic mode (where the user only needs to provide a guide image plus a text description), it handles character movements, facial expressions, camera-angle control, and special-effect actions very well.

Major Chinese companies also began positioning themselves late last year.

On November 18 last year, ByteDance launched the text-to-video model PixelDance, which can generate highly consistent videos with rich motion. Alibaba also launched the Animate Anyone model: given a character image and guided by a skeletal animation, it can generate an animated video.

At the end of 2023, the text-to-video tool "Du Plus Editing" was released; it reportedly can fetch the latest trending topics, generate AI-written copy, and produce a video, all with one click. Meanwhile, late last year, a large number of AI startups "born in response to ChatGPT" also joined in, exploring applications of large AI models.

Yet by mid-February 2024, OpenAI's Sora was still far ahead, with more realistic and fluid results.

Why does Sora stand out from the crowd? Zhou Hongyi offered an answer. Roughly: until now, we have used diffusion to make videos and images, and those videos can be seen as combinations of multiple real images; they do not truly grasp knowledge of the world.

Sora, by contrast, can understand, as a human does, that a tank has enormous impact force: a tank can crush a car, but a car cannot crush a tank. OpenAI leveraged its strength in large language models to combine LLM and diffusion in training, allowing Sora to both understand and simulate the real world.

Just as the Transformer architecture led the wave of general-purpose large models, the emerging LLM + diffusion research framework may attract a large number of followers.

Whether to follow OpenAI, and whether to build a Sora of their own, has become a difficult question for every AI vendor.

In fact, whether to follow Sora may no longer depend on an individual company's willingness but on hard conditions plus soft power. The first question: does the company still have enough chips?

Last year, Cerry Insight estimated that ChatGPT consumed more than 30,000 A100s in the access stage alone, already a game only giants can play. According to minutes of an expert call circulated last year, the big companies have ample resources: an Alibaba Cloud AI expert said that Alibaba Cloud has tens of thousands of A100s in the cloud, with a total that could reach 100,000, and that the group as a whole has about five times as many as Alibaba Cloud. Tencent Cloud has built a large-scale computing cluster from H800 accelerator cards, spanning thousands of servers.

Although Sora is only a video model, its appetite for computing power is considerable. So far, neither its algorithmic architecture nor the details of its training data have been disclosed. Minsheng Securities estimates that training on all the new YouTube videos uploaded within a month would require approximately 231 A100s. Since a model is trained many times over, the real computing requirement is likely to be much higher. Inference demand is even more staggering: assuming Sora has 3 billion parameters, it would correspond to demand for 18.46 million A100s.

The second critical question follows: does the company have high-quality datasets? International companies such as Google and OpenAI are already competing for high-quality text datasets.

The experience of the first wave of large-model trendsetters shows that those without enough ammunition are likely to exit in a hurry.

According to incomplete statistics from Zhidong, four AI large-model startups announced their closure between November 2023 and January 2024. They included teams spun out of big companies, new companies founded by the creators of hit products, and "old" firms with more than ten years of history. Some lacked money; others lacked clear positioning. To play in AI, enthusiasm alone is far from enough.

An even more critical question: does the company have AI geniuses?

Zhou Hongyi has said that the ultimate competition in science and technology comes down to talent density and deep accumulation. Talent density here is by no means about headcount: after all, OpenAI's Sora team has only 13 people, led by freshly minted PhDs and including members born after 2000.

One has to admire OpenAI's boldness in picking talent. Besides untapped potential, young people have one great advantage: stamina. According to Xie Saining, Sora is the product of Bill Peebles' painstaking work at OpenAI: "Although I don't know the details, they basically went without sleep and worked intensively for a year."

The backgrounds of some of these team members show how high the bar is: most are PhD graduates of top universities with internship experience, and those with only undergraduate degrees have startup experience and several prior jobs. Both OpenAI and the individuals themselves accumulated countless quantitative changes in preparation for a qualitative leap.

Perhaps AI companies, OpenAI included, are all waiting for some "nobody" (whether a student from MIT, Tsinghua, or Peking University, someone with 3 to 5 years of startup or big-company experience, or a combination of the two) to grasp the essence of AI, make a name in one stroke, and become an AI god.

What ChatGPT and Sora tell us is that A100s, high-quality data, and AI geniuses are the scarce resources of the new era, and major Chinese companies still lack them.

With every kind of AI resource so limited, companies going all in on AI will inevitably weigh the return on investment before following suit: which industries could Sora transform, and which of them are closely tied to the big companies' existing businesses?

The answer is none other than the short-video industry.

For creators who focus on trending topics, Sora could greatly improve productivity. Trending content is a race against time and leaves little room for derivative creation, so whoever masters the tool first can quickly pull ahead. In China, some e-commerce service agencies already offer livestream sellers a service in which "AI automatically writes video scripts based on trending memes".

For more in-depth topics, creators can divide the labor with AI.

The creator produces the differentiated content about the featured product and settles the outline; the AI handles the repetitive work, such as showcasing product features and automatically adding backgrounds to the text, so creators no longer need to search the major video sites for material. In other words, relying on AI saves creators a great deal of effort and leaves more room for "flashes of inspiration" that attract and retain more users.

The biggest beneficiaries are the platforms. It is therefore easy to understand why, on February 7, Zhang Nan stepped down as CEO of Douyin Group to focus on developing Jianying. Reportedly, over the past year Zhang Nan devoted most of her energy to Jianying-related business, personally leading the team in pursuit of breakthroughs in AI-assisted creation, and will soon launch AI text-to-image and text-to-video products.

Short-video platform Kuaishou will inevitably push hard in this direction too: during last year's large-model wave, Kuaishou announced that it had set up a large-model R&D team to advance applications in search, AIGC-assisted video creation, and related areas.

Baidu is also keeping pace. Although Haokan Video keeps a low profile, Baidu has been steadily pushing the business every year: since the second half of 2020 it has made a series of big moves. It invested in the MCN Muyun Culture, brought in Song Jian (general manager of the content ecosystem platform, who left the company a year ago), and on November 17 announced the acquisition of YY China for $3.6 billion.

In 2024, micro-short dramas are set to explode, and investment in the market will keep growing. According to Photon Planet, infrastructure tailored to micro-short dramas, including the creator ecosystem, distribution logic, and user operations, will be completed in the first half of the year, with an attempt to establish a working monetization path.

Micro-short dramas may be where Sora-like video models find their use.

Compared with big-budget film and television productions, micro-short dramas have a lower threshold for special effects and content creation, and Sora-like models could open up more genres for them, such as science fiction. If generated characters are realistic enough, actor fees could drop to zero; with costs that low, the content industry would quickly be reshuffled.

The urgency of launching a Chinese Sora comes not only from new business needs but also from the question of standing. For the company that kicked off the last Thousand Model War, it is time to prove itself again.

Spring came late in 2024, both in the weather and in the AI industry.

Last winter, the research department could ride the large-model wave on the strength of its past AI accumulation, and the marketing department could seize the moment to hold a launch event and start a Thousand Model War with its peers.

This winter, while researchers were still working overtime on how to bring general-purpose models into thousands of industries, a fierce new rival appeared in the text-to-video field.

Among those moving forward in the dark, some will always walk faster; for domestic AI companies, the dawn is both near and far.

Still, Sora, though closer to the dawn, needs to brush up on common sense: after watching the full original video behind the header image, I noticed that the girl's arms and hands are not in the same dimension.
