Text | Liu Yiqin, Finance Eleven
The hottest topic in science and technology in 2023 was large AI models. The craze was set off by the American startup OpenAI. In the months after ChatGPT's release, Chinese companies released large models of their own in quick succession, and over the course of 2023 the number of large models released by Chinese companies exceeded 130.
OpenAI achieved its technological breakthrough for much the same reasons as many other companies in the field of technological innovation: enough excellent talent, massive financial support, years of continuous investment, and a firm goal. For a long time before ChatGPT's release, most of the industry and the investment community were not optimistic about OpenAI, but that did not shake the company's direction. By 2023, almost everyone agreed on the direction of large models. The common view was that OpenAI had already delivered the results, so what other companies had to do was follow up as quickly as possible and keep optimizing to secure a place in the future.
Some attribute the earlier lack of large-scale investment in large models to the uncertainty of the results. Now that the direction is settled, the reasoning goes, investment in computing power, data, and talent can be increased; Chinese companies are good at engineering optimization, so large model products that can be applied in practice are just around the corner.
But is that really the case? For OpenAI, large models were always a definite direction. Most of OpenAI's funds were spent on computing power, at a time when the price of NVIDIA's A100 (an AI-specific chip) was much lower than it is today. SemiAnalysis, a third-party data agency, estimates that OpenAI uses about 3,617 HGX A100 servers containing nearly 30,000 NVIDIA GPUs. GPUs alone are not enough: its investor Microsoft helped OpenAI build a large, customized computing cluster that further improves the efficiency of these GPUs. On the data side, OpenAI has invested continuously in every step, including data collection, labeling, cleaning, collation, and optimization. Most of the people on the OpenAI team come from top scientific research institutions or technology giants.
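As a quick sanity check on the SemiAnalysis figures quoted above, the sketch below multiplies the server count by an assumed eight GPUs per HGX A100 server. The eight-GPU configuration is an assumption for illustration (HGX A100 boards also come in a four-GPU variant) and is not stated in the article.

```python
# Figures from the text; GPUS_PER_SERVER is an assumption, not from the article.
HGX_A100_SERVERS = 3617   # server count quoted from SemiAnalysis
GPUS_PER_SERVER = 8       # assumed 8-GPU HGX A100 configuration

total_gpus = HGX_A100_SERVERS * GPUS_PER_SERVER
print(f"{total_gpus:,} GPUs")
```

Under that assumption the total comes to 28,936 GPUs, which is consistent with the "nearly 30,000" quoted in the text.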
In other words, even with this strength and investment, it still took OpenAI more than eight years to produce its breakthrough product, GPT-4, and the model still "hallucinates" (gives irrelevant answers, talks nonsense, and so on).
How, then, could Chinese companies produce, within a few months, large models that claim to rival GPT-4? Whose hallucination is this?
In the second half of 2023, some large models were called out as "shells": they directly apply foreign open-source large models, yet rank high on some leaderboards that test large model capabilities, with many metrics close to GPT-4. A number of industry insiders told the "Caijing" reporter that the better the leaderboard results, the higher the proportion of shells, and that performance deteriorates after even a slight adjustment.
"Shells" are only the tip of the iceberg of the current state of China's large model industry. They reflect five problems in the industry's development, which are cause and effect of one another, so none of them can be solved in isolation. Today, the popularity of large models has declined significantly, and in 2024 the problems of China's large model industry will be further exposed. Yet beneath the excitement and the problems, large models are already playing a role in industry.
In November 2023, Jia Yangqing, an AI scientist and former vice president of technology at Alibaba, posted that a large model made by a major domestic company uses Meta's open-source model Llama, with only the names of a few variables changed. Jia said that because of the renaming, his team had to do a lot of extra adaptation work.
Earlier, some foreign developers said that 01.AI, the company founded by Kai-Fu Lee, used Llama but renamed two tensors, which led the industry to question whether 01.AI's model was a "shell". Kai-Fu Lee and 01.AI then responded that an open-source architecture was used during training, with the aim of fully testing the model and running comparative experiments so as to get off to a fast start, but that the Yi-34B and Yi-6B models they released were trained from scratch and involved a great deal of original optimization and breakthrough work.
In December 2023, it was reported that a large model project secretly developed by ByteDance had called OpenAI's API (Application Programming Interface) and used data output by ChatGPT for model training. This is expressly prohibited by OpenAI's usage agreement.
OpenAI subsequently suspended ByteDance's account, saying it would investigate further and, if the report proved true, would require changes or terminate the account.
ByteDance responded that in early 2023, when its technical team was at an early stage of exploring large models, some engineers applied GPT's API service to experimental research on smaller models. Those models were for testing only, were never planned for launch, and were never used externally. The practice stopped after the company introduced checks on GPT API calls in April 2023. In addition, ByteDance's large model team has set a clear internal requirement that data generated by GPT models must not be added to the training dataset of ByteDance's large model, and engineers have been trained to comply with the terms of service when using GPT.
At present, domestic large models fall into three main categories. The first is original large models. The second is shells built on foreign open-source models. The third is assembled large models, that is, earlier small models put together into a "large model" with a large parameter count.
Original large models are the fewest. Building one requires deep technical accumulation and sustained heavy investment, and it is very risky, because if the model turns out not to be strong enough, that large investment is wasted. The value of a large model has to be proven through commercialization. When the market already has good enough foundation models, other companies should explore new sources of value, such as applications of large models in different fields, or the middle layer, such as services that support large model training, data processing, and computing.
The current situation, however, is that most participants are competing fiercely over so-called "original large models" while worrying that the risk is too high, which is why there are so many shelled and assembled large models. Whether a company uses an open-source model directly or assembles models, there is no problem as long as it complies with the relevant rules. At the commercialization stage, customers do not care much whether a model is original as long as it is useful, and many customers even prefer non-original technology because it costs less.
The problem is that even those who assemble or shell models keep insisting on their "originality". To prove that "originality" they have to adjust and modify the models, which hurts the models' ability to iterate and traps the companies in internal friction.
One of the foundations of large models is massive computing power, and advanced computing power at that, which is why large models are also described as "the aesthetics of violence". NVIDIA's A100 was previously considered the chip best suited to training large models. Recently NVIDIA launched a more advanced computing chip, the H100, but it has not yet gone on sale in the Chinese market.
A long-term NVIDIA partner told the "Caijing" reporter that the price of the A100 roughly doubled in 2023. As far as he knows, the Chinese companies that bought A100s in bulk in 2023 were mainly large companies with their own business needs, including Alibaba, Tencent, and ByteDance; few were startups. Some well-known large model startups took the initiative to ask him for a strategic cooperation relationship, to show the outside world that they were investing in computing power, "the kind where no money changes hands".
Photo caption: In 2023, the Chinese companies buying A100s in bulk were mainly large companies with their own business needs; few were startups. (Image: IC)
Despite the "export control rules" of the United States, it is not impossible for Chinese companies to obtain NVIDIA computing power, and there are many channels to choose from. Besides buying directly, they can also buy through NVIDIA's partners in China. The GPUs themselves are very expensive, and deployment, operation, debugging, and use after purchase all add costs. There used to be a saying in the industry that many scientific research institutions in China could not even afford the electricity bill for their A100s.
A DGX server made up of eight A100s has a maximum power of 6.5 kW, which means it consumes 6.5 kWh for each hour it runs, and its cooling equipment consumes roughly the same amount again. At an average industrial electricity price of 0.63 yuan per kWh, the electricity for one server costs about 200 yuan per day (24 hours). For 1,000 servers, the daily electricity bill is about 200,000 yuan.
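The electricity estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python using the figures quoted in the text (6.5 kW per server, cooling drawing about as much again, 0.63 yuan per kWh). The function and parameter names are illustrative only, and the servers are assumed to run at maximum power around the clock, which is a simplification.

```python
def daily_electricity_cost_yuan(
    servers: int = 1,
    server_power_kw: float = 6.5,     # max power of one 8x A100 DGX server (from the text)
    cooling_factor: float = 2.0,      # assumption: cooling draws about as much as the server
    price_yuan_per_kwh: float = 0.63, # average industrial price quoted in the text
    hours: float = 24.0,
) -> float:
    """Return the electricity cost in yuan, assuming constant maximum load."""
    energy_kwh = servers * server_power_kw * cooling_factor * hours
    return energy_kwh * price_yuan_per_kwh


one_server = daily_electricity_cost_yuan()
fleet = daily_electricity_cost_yuan(servers=1000)
print(f"1 server: {one_server:.0f} yuan/day")
print(f"1,000 servers: {fleet:,.0f} yuan/day")
```

With these inputs the result is roughly 197 yuan per day for one server and roughly 196,560 yuan per day for 1,000 servers, which matches the article's rounded figures of 200 yuan and 200,000 yuan.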
As a result, apart from the large companies, it is difficult for startups to purchase and deploy GPUs on a large scale.
GPU resources can also be rented, and A100 computing power services can be directly rented on cloud service platforms such as Alibaba Cloud, Tencent Cloud, or Amazon AWS. Rents have also risen quite a bit over the past year.
The reality, however, is that many large model companies do not want to invest heavily in computing power. A number of investors who follow AI told the "Caijing" reporter that once a startup begins to deploy computing power, two "problems" arise. First, this investment has no upper limit and no end point, and no one knows how much money it will burn. Even OpenAI still suffers outages today because its computing power cannot keep up. Second, the company becomes asset-heavy, which hurts its future valuation and directly affects investors' returns.
In 2023, many Chinese investors told large model entrepreneurs directly: first recruit some people with degrees from prestigious universities, hold a launch event quickly, release a large model product, then raise the next round of financing, and do not buy computing power. Startups raised a lot of money during the boom, recruited at high salaries, launched products with great fanfare, and pushed up their valuations. Once the boom passes, they will need to keep raising money or go public, and then use the money already raised to bid for projects at low prices or even at a loss, or make outside investments in order to consolidate revenue.
This may fall into a vicious circle: if you are unwilling to take the risk of high investment in computing power, it will be difficult to make breakthroughs in the field of large models, and it will be difficult to compete with those giants who really invest on a large scale in this direction.
Data and computing power are the foundation of large models, and in terms of data, China's large model industry faces the same problem as computing power: is it worth investing on a large scale?
In China, the threshold for acquiring general-purpose data is very low. In the past, data was mainly collected with crawler tools; now open-source datasets can be used directly. China's large models are trained mainly on Chinese-language data, and the industry generally believes that the quality of Chinese Internet data is low. One AI company founder said that when he needs to look up professional information online, he uses Google or YouTube. Domestic sites and apps do not lack professional information, but there is too much advertising content, so finding the professional content takes longer.
The Chinese-language data OpenAI uses to train its large models also comes from Chinese Internet platforms, but OpenAI has done a great deal of extra work to improve data quality. This goes beyond what ordinary data labeling can achieve and requires a professional team to clean and organize the data.
Some AI entrepreneurs have said that it is difficult to find reasonably standardized data service providers in China. Most providers offer customized services, and the services that matter most are very expensive.
The logic here is similar to the question of whether to invest heavily in computing power: for many companies, especially startups, it does not look like a good deal. If a company invests heavily and the final model performs poorly, the money is again "wasted", so it seems better to train on open-source data and launch the product directly.
In addition, the Chinese market lacks effective means of data protection. A person in charge of AI at a major company said, "In China, the data you can get, others can get too," and "If you spend a lot of money to produce high-quality data, others can obtain it at very low cost, and vice versa."
The middle layer of large models, including data processing, will be a relatively clear new direction of development in 2024. Whatever the type of model, when it is deployed in a specific application scenario it must be optimized and tuned with professional data. This raises the requirements for data processing and also calls for model tuning and engineering optimization.
But if one of these links becomes a new "hot spot" in the eyes of investors, that is another story.
The above three problems all point to a common direction: capital short-sightedness.
Although OpenAI has blazed a clear path, for the vast majority of companies the cost and time required to build a mature large model from scratch are not much lower.
For most investors, the purpose of every investment is clear: exit and make money. OpenAI has taken off, its valuation has kept climbing, and it is expected to keep growing. In April 2023, the company was valued at about $28 billion; by December 2023, according to a report from the United States, OpenAI's valuation in its latest round might exceed $100 billion. Investors read this as a very clear signal: if they pick the right Chinese large model startup, they too can see its valuation rise within a short period.
The patience of Chinese investors is only three to five years, which is determined by the mode of capital operation. Investors who raise funds from LPs need to exit within a certain period of time and get considerable returns. Investors can exit through mergers and acquisitions, listings, or selling their shares to new investors in follow-up financing.
Early-stage financing can rely on hype and storytelling, but in the middle and late stages, and certainly at listing, a company must show commercialization at a certain scale. Investors have found that the longer a project drags on, the harder it is to list or sell, because the main business model in AI is customized projects for B-end (business) customers, which makes it difficult for startups to achieve high revenue growth. Investors can only take advantage of the boom while it lasts: push the company quickly through several rounds of financing and raise its valuation, after which even selling their shares at a discount is still worthwhile.
This is also why 2023 saw an endless stream of large model press conferences and a proliferation of large model leaderboards with differing rankings: they are all "stories" that help with financing. A similar path appeared in the AI industry a few years ago, when the representative companies were the "four AI tigers". Large model startups in 2023 simply compressed into one year the road that previously took three years.
But short-sightedness is by no means a problem of investors alone. In today's business environment, most people are looking for short-term, definite results, and a future ten or even five years away seems too uncertain.
In 2023, China's large model industry moved quickly from competing on parameter counts to the commercialization stage. At CES (the Consumer Electronics Show) in January 2024, two well-known AI scientists, Fei-Fei Li and Andrew Ng, both said that the commercialization of AI will advance significantly and spread into more industries.
At present there appear to be two main application directions for large models. The first is to give C-end (consumer) users new tools built on large model technology, such as the paid version of GPT-4, the document library rebuilt on the Wenxin large model, new AI editing tools, and text-to-image tools. However, consumer payments are unlikely to grow on a large scale in the short term, and relatively few people have a pressing need for large model tools.
A more promising direction for commercialization is B-end services. In the Chinese market, B-end software services have always been a heavy, difficult business. A number of investors and industry insiders have pointed out that the largest B-end customers in the Chinese market are government agencies and state-owned enterprises, and that large models, as an advanced productivity tool, directly reduce the need for manpower. In government agencies and state-owned enterprises, reducing headcount often meets resistance.
Settling for the next best thing and targeting small and medium-sized business customers may also prove difficult in 2024. One AI large model entrepreneur said he recently asked many enterprise customers, and the response was: "What can a large model do? Can it help me lay off employees, or can it help me make money?"
To this day, even the most advanced large models still suffer from "hallucinations". This is tolerable in C-end applications, but in some professional B-end scenarios, hallucinations mean a model is hard to put into real use. With earlier comparison-based AI such as face recognition, a wrong result could be corrected with manual assistance at very low cost, but large models are good at "talking nonsense with a straight face", which is misleading.
But large models are already being used in practice. Many industry insiders have mentioned that large models have brought new ways to solve many problems that could not be solved before, with significantly higher efficiency. For example, the assembled large models mentioned above were rarely attempted in the past, but many AI companies have now begun to combine multiple small models from different scenarios, so that for most similar problems there is no need to train a separate model; the existing one can be called and used directly.
Large models have also been put to use in some companies with large-scale businesses. This resembles the previous wave, when AI vision technology drove the development of AI algorithms that quickly came to play an important role in content recommendation, e-commerce, ride-hailing, food delivery, and other fields. Today, Tencent's game business, Alibaba's e-commerce business, and ByteDance's content business all use large models.
In 2024, several trends in the development of large AI models look relatively certain. First, enthusiasm for financing will decline. Cases like those of 2023, in which companies completed multiple financing rounds of hundreds of millions of dollars, will be far fewer, and large model startups will need to find a new way forward. For now, large companies appear better placed to build large model infrastructure, and startups can consider shifting direction to fill the gap between foundation models and applications.
Second, the application of large models will continue to deepen, but mainly in fields that are highly digitalized and have very large business volumes. On the C-end, large models will also become more widespread, but Chinese companies cannot rely on consumer payments alone, so other monetization models, mainly advertising, will be added to consumer applications.
Third, domestic computing power will receive more attention. Attention does not mean significant progress in the short term; this will be a long process. While domestic computing capability improves, there will also be more speculation, hype, and cash grabs.
A boom stimulates the rapid expansion of an industry, and bubbles follow: the bigger the opportunity, the bigger the bubble. Only by setting the bubble aside can we see the new opportunities for industrial development.