Written by Qiuping
Edit: OK too
Proofread by Tina
Curated by Eason
Recently, First New Voice and Tianyancha officially released the "2023 China AIGC Innovative Enterprise Series List", showing the industrial chain layout of generative AI from three dimensions: basic layer, model layer, and application layer. Among them, the model layer mainly includes general large models and vertical large models (scene, domain, and industry large models).
Currently, only the leading deep-pocketed players are eligible to enter the game at the basic layer, so it is not where the fiercest competition lies. The application layer is the "flower on the high peak" that grows out of the large model. As the base of generative AI, the large model provides the application layer with powerful language-processing capabilities and wide applicability. According to public information, 238 large models had been released in China as of October this year. The "war of a hundred models" is in full swing.
First New Voice found during the selection and research for the list that the battle among domestic large models is gradually entering its second half. The focus of leading technology enterprises has begun to shift from general models to vertical models for specific industries and domains, and to put down roots there.
For example, on October 31, Alibaba Cloud not only released the latest version 2.0 of its flagship model but also launched eight major industry models. On September 21, HUAWEI CLOUD released the Pangu Medical Model; on September 19, the first "industrial-grade" medical AI model in China was officially released. It can be said that after "AI for Science", large models have begun to enter the stage of "AI for Industries".
To research in depth the development direction and application effect of general and vertical large models, First New Voice interviewed Zhao Chao, an AI algorithm expert at Wofone Technology; Ji Daqi, CTO of Daguan Data; and Zhang Li, vice president of Cloudwalk Technology.
Since the beginning of the year, ChatGPT has ignited enthusiasm for large models at home and abroad, and capital from all quarters has flocked in.
According to a related ** report, the number of pre-trained models on Hugging Face, the world's largest open-source community for large models, has grown from 100,000 to more than 300,000. One wonders whether OpenAI anticipated such a boom when ChatGPT was first released.
Returning to the domestic market: according to incomplete statistics from public information, as of the end of November 2023, more than 200 large models had been launched in China, and they are settling into place across all walks of life. Judging from the statistics, aside from general models, the financial industry is seeing the fastest adoption: nearly 15% of large models are financial vertical models.
In terms of vendor types, domestic Internet and technology companies have all entered the game: large players such as Alibaba, Tencent, and Huawei; vendors focused on the AI field such as iFLYTEK, SenseTime, and Megvii; large-model startups such as Zhipu Huazhang and Baichuan Intelligence; vertical-industry players such as Daguan Data; and enterprises in finance, automotive, education, smart home, and consumer electronics that draw on their own AI technology and data accumulated in vertical fields. (Click the "2023 China AIGC Innovative Enterprise Series List" to view the list of domestic general and vertical large-model enterprises.)
It is worth noting that in the first half of this year, attention was mainly focused on the parameter counts and effect optimization of large models. Starting in the second half, the focus shifted to how to put them into practice and how companies can use their capabilities to deliver real efficiency gains. After half a year of practice, the three companies interviewed by First New Voice have each gradually explored a development path with its own characteristics.
For example, Wofone Technology launched its "Yuanxin Large Model" in April this year. The solution absorbs the capabilities of general large models and conducts industry-knowledge training on top of eight years of experience in marketing and service, turning the general large model into an industry expert that can build an exclusive knowledge base from enterprise information. At present, Wofone Technology has successfully applied the large model across its four major product lines: UDESK, Gaussmind, ServiceGo, and Weifeng.
Zhao Chao, an AI algorithm expert at Wofone Technology, said: "Large models have a huge demand for computing power and data, and Wofone Technology has accumulated a large amount of online text and voice data since its establishment. Based on this data, the company plans to iterate models for specific industries and scenarios. To this end, the team adopts open-source industry models and uses data accumulated in the customer-service business to optimize them, in order to better meet industry needs and improve results in specific scenarios."
Full-parameter iteration of a large model runs into skill and language-ability problems, so Wofone Technology adopted two training strategies: one is to freeze part of the parameters and iterate only on the rest; the other is to iterate on top of the general large model.
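The first strategy, freezing part of the parameters, can be sketched without any deep-learning framework. The toy objective, parameter names, and learning rate below are all illustrative assumptions, not from Wofone's actual training code; real systems would freeze tensors in a framework such as PyTorch.

```python
# Minimal sketch of strategy 1: update only the unfrozen parameters.
# toy_loss, numeric_grad, and train are made-up names for illustration.

def toy_loss(params):
    """Stand-in objective: squared distance from some target values."""
    targets = [1.0, -2.0, 0.5, 3.0]
    return sum((p - t) ** 2 for p, t in zip(params, targets))

def numeric_grad(params, i, eps=1e-6):
    """Central-difference gradient of toy_loss w.r.t. parameter i."""
    up = params.copy(); up[i] += eps
    dn = params.copy(); dn[i] -= eps
    return (toy_loss(up) - toy_loss(dn)) / (2 * eps)

def train(params, frozen, steps=200, lr=0.1):
    """Gradient descent that skips any index listed in `frozen`."""
    params = params.copy()
    for _ in range(steps):
        for i in range(len(params)):
            if i in frozen:
                continue  # frozen weights keep their "general model" values
            params[i] -= lr * numeric_grad(params, i)
    return params

base = [0.0, 0.0, 0.0, 0.0]          # pretend these come from the general model
tuned = train(base, frozen={0, 1})   # only parameters 2 and 3 are updated
```

The frozen entries stay exactly at their pre-trained values while the rest converge toward the new targets, which is the essence of partial-parameter iteration.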
Cloudwalk Technology officially launched its "Calm" large model in May. Its biggest feature is a multi-modal family of large models plus the ability to tune industry-specific models, which helps customers deploy models according to the needs of their industry scenarios at the best cost-performance. In July, Cloudwalk and Huawei jointly released an integrated training-and-inference solution for large models, built on Cloudwalk's large-model algorithms and tools, which lets users easily train, build, and manage their own large models.
Regarding the booming domestic market and the company's plans for large models, Zhang Li, vice president of Cloudwalk Technology, told First New Voice: "In fact, the company built up technical reserves in large models two years ago. But because chips and computing power had not reached a sufficient level, large models could not deliver their full effect and efficiency. Last year, the performance of NVIDIA-led GPU chips improved significantly, especially in parallel computing, which made industrial-scale training of large models possible and drove this year's vigorous development of the large-model industry and market."
The "Cao Zhi" large model launched by Daguan Data is the first batch of domestic GPT large language models dedicated to vertical industries in China, independent and controllable, with long text, verticalization and multilingual characteristics, good at long document writing, review, translation, etc.
For a long time, Daguan Data has focused on the field of TOB and has accumulated deep professional experience in industries such as finance and manufacturing. The landing route we take is to introduce large models into the original products to provide customers with more valuable services. For example, in the past, Daguan's intelligent text processing platform IDPS was mainly biased towards text extraction, which required complex steps such as annotation, training, and tuning to achieve results. However, the large model can now be used to achieve automatic extraction without labels, which significantly reduces the delivery cost. Let the enterprise truly reduce costs and increase efficiency. Ji Daqi, CTO of Daguan Data, said.
First New Voice found, through exchanges with the three interviewed companies and previous research, that there are currently three common application scenarios for large models. First, if an enterprise wants to use a large model to directly generate articles, designs, and so on, it can use GPT or another open-source large model with a little fine-tuning; the follow-up work is mainly front-end page design, without much model iteration.
Second, enterprises want the large model to reflect enterprise-specific attributes when providing services, such as answering questions about the enterprise itself. In this case it is difficult to quickly iterate a unique model for each enterprise, and since an enterprise's situation changes constantly, the model would need continual adjustment. Combining the enterprise knowledge base with the large model is therefore the feasible route.
Of course, some companies have confidentiality requirements for their knowledge base and are reluctant to hand it to external models. In that case, deployment can be based on a self-trained model. There are usually two approaches: one is to iterate on the enterprise's own model using the enterprise knowledge base; the other is to strengthen the large model's understanding through RAG (retrieval-augmented generation) and then combine it with the knowledge base. The most direct advantage of RAG is that it lets the large model use its own reasoning ability to understand an enterprise's private data and extend its Q&A capability.
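The RAG pattern described above can be sketched in a few lines: retrieve relevant passages from a private knowledge base, then prepend them to the prompt the model receives. The word-overlap scoring, knowledge-base contents, and function names below are illustrative assumptions; a production system would use vector embeddings and an actual LLM call, both omitted here.

```python
# Minimal RAG sketch: keyword retrieval + prompt assembly (no real model API).

def tokenize(text):
    return set(text.lower().split())

def retrieve(query, knowledge_base, top_k=2):
    """Rank knowledge-base entries by word overlap with the query."""
    q = tokenize(query)
    scored = sorted(knowledge_base,
                    key=lambda doc: len(q & tokenize(doc)),
                    reverse=True)
    return scored[:top_k]

def build_prompt(query, knowledge_base):
    """Assemble the augmented prompt the large model would receive."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

kb = [
    "Refund requests are processed within 7 business days.",
    "The support hotline is open from 9am to 6pm.",
    "Enterprise plans include a dedicated account manager.",
]
prompt = build_prompt("How long do refund requests take?", kb)
```

Because the private data travels in the prompt rather than in the model's weights, the enterprise model itself never needs retraining when the knowledge base changes.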
Third, data analysis is also a common scenario for some enterprises. Traditional report configurations are complex, and when there are many reports, finding a specific report is time-consuming. Through the natural interaction mode of the large model, users can directly ask questions and realize intelligent data query. This interactive way of analyzing data is intuitive and efficient, and users can quickly get the information they need, which greatly improves the user experience.
Both the general model and the vertical model have their own unique capabilities, and they are complementary.
Because the general large model has strong language understanding capabilities, it can broaden the breadth of the application range, while the vertical large model is aimed at specific industries or needs, and can better meet the actual requirements in terms of accuracy and depth. These two are not opposites, but mutually supportive and synergistic development. In the future, the two types of large models will coexist and become the key to empowering thousands of industries.
Ji Daqi agrees with this view. He said the general model needs stronger generalization, while the vertical model must maintain high accuracy in vertical-industry applications.
Regarding the landing space of general and vertical models, he believes one core difference lies in customer needs: customers of different levels and sizes have different requirements for large models. For example, to-C customers and small and medium-sized B-end enterprises have lower requirements for model quality but pay more attention to cost control, so they may choose a general large model to solve part of the problem and achieve above-average results at a lower cost.
For some large B-end customers, however, improved model performance can bring significant impact and value to the business, so they are willing to invest more. These customers may choose to train a vertical large model themselves or use a professional vertical large-model service like Daguan Data's to get better results. Here the customer's focus is not just cost but how to achieve the best business outcome.
Therefore, in the application of large models, it is very important to flexibly choose the model strategy suitable for specific business scenarios.
Zhao Chao also noted that general large models are expensive to iterate and require substantial computing power, whereas vertical large models cost less and need less compute. However, the root of a vertical large model always lies in a general large model: it is usually trained from a general model via supervised fine-tuning (SFT) and similar methods. Moreover, the stronger the general model's base capabilities, the lower the tuning cost of the vertical model.
When verifying algorithms and strategies, a vertical large model can be iterated in a relatively short time, so its effect can be verified quickly. Enterprises therefore usually verify and tune on the vertical model first, then apply the experience to the general model to improve its capabilities; once the general model improves, the industry model is iterated again. It is a spiral process in which vertical and general models learn from and complement each other, rather than excluding one another.
Zhang Li said that from the perspective of industry application, the general model is not a product but a capability. An enterprise that wants to buy this capability usually needs to meet three conditions. "First, sufficient financial reserves. Second, the industry data and know-how needed for its own model. Third, the corresponding technical capability: understanding the underlying principles of large-model technology and how to train a model that meets its needs. This flexibility lets customers better leverage large-model technology for their domain-specific needs."
In addition, Zhang Li emphasized that landing large models cannot be a one-sided affair; both ends must be engaged. On the one hand, the supply side must have the accumulation and capability to land models in vertical industries; on the other hand, the demand side needs to figure out what problems it wants the large model to solve and what goals to achieve.
In Zhao Chao's view, custom models may have higher value in vertical industries, mainly in two respects. First, a vertical industry model can better meet an enterprise's specific needs and create more business opportunities. Second, different large models bring significantly different costs, so enterprises can choose to optimize training, compressing large models with billions of parameters into vertical models with hundreds of millions of parameters.
"One feasible approach is to annotate data with a large model and then train a smaller model on it. This provides enterprises with the strong results of a vertical model while lowering the hardware-resource threshold, reducing the cost burden to a certain extent. By adjusting the size of model parameters, specific industry needs can be met with higher economic efficiency in resource use. This strategy gives enterprises a more flexible and sustainable way to apply models," Zhao Chao said.
In the future, giant companies such as Unilever, McDonald's, and Coca-Cola are likely to train their own large models. Zhao Chao believes that although such a model looks like a private large model from the outside, in practice one training method is to train a complete model on the enterprise's own large volume of data. Another is a vector-database strategy: convert internal data into vectors, then process those vectors to obtain a smaller model used in conjunction with the large model. This allows the model to be trained separately and at lower cost. "From the customer's side, the resulting model carries the enterprise's own characteristics, but technically it is essentially a large model plus a small model."
He also believes that in actual application, this "large model + small model" approach may well become the mainstream way to land large models, because frequent iteration of the underlying model is difficult and demands heavy computing power. Unless the goal is technical research, buying large amounts of computing power is likely to waste resources for little benefit.
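The vector-database strategy mentioned above reduces, at its core, to embedding internal documents and answering queries by nearest-neighbour lookup. The bag-of-words "embedding", vocabulary, and document set below are illustrative assumptions standing in for learned embeddings and a real vector store.

```python
# Minimal vector-lookup sketch: embed documents, find the nearest to a query.
import math
from collections import Counter

VOCAB = sorted({"refund", "policy", "shipping", "time", "account",
                "delete", "support", "contact"})

def embed(text):
    """Map text to a fixed-length count vector over a small vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "refund policy": "our refund policy and refund time",
    "shipping": "shipping time and shipping support",
    "account deletion": "how to delete your account contact support",
}
index = {name: embed(text) for name, text in docs.items()}

def nearest(query):
    """Return the name of the document whose vector is closest to the query."""
    q = embed(query)
    return max(index, key=lambda name: cosine(q, index[name]))
```

The large model would then consume the retrieved document as context, which is why this combination reads to the customer like one private model while technically remaining a large model stacked on a small retrieval component.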
The application of large models is inseparable from the support of computing power, data and algorithms. This means that small and medium-sized enterprises or enterprises with insufficient computing power will have a high threshold for applying large models.
First, in terms of computing power, enterprises can try to increase the number of iterations and speed up model convergence without raising hardware costs. Computational complexity can also be reduced by converting floating-point numbers to fixed-point numbers and by preprocessing large-scale matrix operations. These methods save computing resources and improve the model's training efficiency and overall performance. In fact, breakthroughs have been made in matrix computation; for example, academia has proposed fast methods for very large matrices that are dozens of times faster than traditional row-by-column computation.
On computing power, Zhao Chao's view is that, on the one hand, enterprises with limited compute can run small-scale experiments to verify the application effect of large models; this is also an optimization direction being considered in both industry and academia. On the other hand, few-shot learning and zero-shot learning are currently popular large-model techniques: they demonstrate strong learning and reasoning ability when data is scarce, letting enterprises with insufficient data apply large models effectively. With these two methods, continuous optimization and innovation can promote the wide application of large-model technology.
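The float-to-fixed-point idea can be shown in miniature: store weights as scaled integers so the multiply-accumulate loop runs in integer arithmetic, rescaling only once at the end. The Q8 format (8 fractional bits) and all names below are illustrative choices; real quantization schemes use per-channel scales and int8 kernels.

```python
# Sketch of fixed-point conversion: Q8 integers instead of floats.

FRAC_BITS = 8
SCALE = 1 << FRAC_BITS  # 256

def to_fixed(x):
    """Convert a float to a Q8 fixed-point integer."""
    return round(x * SCALE)

def from_fixed(q):
    return q / SCALE

def fixed_dot(qa, qb):
    """Integer dot product; rescale once at the end, not per element."""
    acc = sum(a * b for a, b in zip(qa, qb))  # pure integer math
    return acc / (SCALE * SCALE)

weights = [0.5, -1.25, 0.75]
inputs = [1.0, 2.0, -0.5]

q_w = [to_fixed(w) for w in weights]
q_x = [to_fixed(x) for x in inputs]
approx = fixed_dot(q_w, q_x)
exact = sum(w * x for w, x in zip(weights, inputs))
```

The integer path approximates the floating-point result to within the quantization error, which is the trade the article alludes to: a little precision for much cheaper arithmetic.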
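Few-shot prompting, as referenced above, sidesteps fine-tuning by placing a handful of labelled examples directly in the prompt. The classification task, example messages, and function name below are made up for illustration; no real model API is called.

```python
# Sketch of few-shot prompt construction for a text-classification task.

def few_shot_prompt(examples, query, instruction):
    """Format labelled examples plus a new query into a single prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("The agent solved my issue quickly", "positive"),
    ("I waited an hour and got no reply", "negative"),
]
prompt = few_shot_prompt(
    examples,
    query="The new dashboard is easy to use",
    instruction="Classify the sentiment of each customer message.",
)
```

The model completes the final "Sentiment:" line, so an enterprise with only a few labelled examples gets a working classifier without touching the model's weights, which is exactly why the technique suits data-poor settings.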
Second, in terms of algorithms, structures and methods better suited to large models still need to be explored. Most large models today are built on the Transformer architecture proposed by Google, but Transformer is not necessarily the best choice. For example, some researchers have combined other structures such as ResNet (deep residual networks) with Transformer and achieved good results in the image domain. Algorithmic innovation and optimization therefore remain a promising direction.
Third, in terms of data, quality and applicability must be improved. As Internet data grows, its types and forms have become more diverse and complex. Unstructured data needs to be structured in advance so that the model can learn from and understand it, and data must be cleaned and filtered to remove noise and useless information.
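The cleaning-and-structuring step just described can be sketched as a small pipeline: strip markup noise from raw web text, then pull "key: value" fragments into a structured record. The regex patterns, field names, and sample page below are illustrative assumptions, not a production data pipeline.

```python
# Sketch of cleaning raw web text and extracting a structured record.
import re

def clean(raw):
    """Remove HTML tags and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()

def structure(raw):
    """Turn 'key: value' fragments in cleaned text into a dict."""
    text = clean(raw)
    record = {}
    for m in re.finditer(r"(\w+)\s*:\s*([^;]+)", text):
        record[m.group(1).lower()] = m.group(2).strip()
    return record

raw_page = "<div> name: Alice ; <b>city</b>: Beijing ; role: analyst </div>"
record = structure(raw_page)
```

Only after noise like tags and stray whitespace is removed does the extraction regex fire reliably, which is the point of doing cleaning before the model ever sees the data.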
All of the above paths can effectively improve the validity and reliability of the data, thereby improving the generalization and adaptability of the model.
Regarding the future development of large models, Zhang Li's view is that the development of large model technology will shift from R&D-driven to eco-driven, which is an inevitable trend. Customers' needs for large models will become more and more complex, and large model manufacturers cannot directly solve all customer problems, nor can they have a comprehensive and profound grasp of the know-how of all industries. Therefore, the landing application of large models needs to be supported by professional information service companies in various industries.
"This cooperation model can respond more effectively to the professional needs of different fields, letting large-model applications penetrate industrial chains more quickly and deeply. Moreover, through close cooperation with information-technology companies, large-model vendors can build an ecosystem that makes development more comprehensive and sustainable," Zhang Li said.
Although the development of large models is currently very active and lively, there are still two major difficulties in the actual implementation.
Difficulty 1: How to find the right application scenario?
Ji Daqi said that to truly land large-model technology, one cannot rely on the model alone; the intermediate implementation process and the "last mile" must also be considered: designing a suitable product form, choosing the best cost-performance, controlling machine-resource costs, and ultimately finding the best landing result. This requires professionals who understand both large models and the industry.
One of the main problems on the to-B side is that supervision is becoming harder; the to-C side likewise faces regulatory requirements such as filing. In the traditional Internet era it was relatively easy to review text content, and problematic material involving ideology could be found and handled promptly, but large models make regulation significantly more difficult. How to supervise effectively during implementation has therefore become an urgent problem; failure to solve it may lead to abuse, misuse, or other legal issues. While solving regulatory problems, the industry also needs to think about how to let more people benefit from large-model applications. In short, balancing reasonable regulation with social benefit is a key issue the whole industry must seriously consider and resolve.
After the customer provides the data, the engineering team of Daguan Data will process it according to the specific situation, and this step is actually quite smooth. But the more difficult problem is how to combine large models to give full play to the value of data and empower enterprises to achieve clearer business goals. This requires a clear business strategy, defining the functionality and features of the product, and ensuring that the entire process effectively meets the needs of the customer. Ji Daqi emphasized.
Therefore, the challenge for all companies today is to think strategically about the application of large models and to translate these thoughts into concrete product design and implementation steps. Solving this challenge requires a combination of data science, business insights, and technical expertise to form a comprehensive and actionable solution. Ultimately, through deep strategic planning and clear product design, the potential of data and big models can be better leveraged to achieve more targeted and effective business outcomes.
Nowadays, the focus is not only on how to develop great large models, but also on how to apply those models better. This needs to consider the level of the solution, especially the user experience, and not just be limited to applications like OpenAI's chat capabilities, or just solve problems like search engines.
Current and future trends also indicate that people want to apply AI in more scenarios and use it as an underlying platform. This requires enterprises to innovate from zero to one, constantly finding scenarios that are suitable for landing and promotable at scale, so as to gain more inspiration and methods for implementation and strengthen confidence in the field. I believe there will be many large-model implementations next year.
Difficulty 2: It is difficult for strategic planning and software and hardware facilities to be perfectly compatible.
Zhang Li explained that there are five factors that cause this difficulty: First, the customer's goal is not clear, which leads to the inability to achieve the expected effect.
The second is that many customers do not have enough understanding of the large model, and mistakenly think that this is a mature product that can be used out of the box after buying.
Third, even if the first two problems are solved, a detailed implementation plan has been formulated, and the application will be promoted in stages inside the customer's enterprise, over such a long period no one can guarantee that the customer's strategic goals will not change. This concerns the stability and sustainability of the customer's strategic commitment to large models.
Fourth, landing a large model must be a two-way process: the customer is the protagonist, and the technology company acts as the "coach" who accompanies and guides. However, because using large models places high demands on technical capability, and many customers' technology departments have only traditional IT skills, customers often end up relying entirely on the technology company, turning the "coach" into the protagonist and misplacing the relationship. This is a serious problem, because a technology company's goal is to empower many industries; it cannot focus on a single customer.
Fifth, applying large models in vertical markets requires considering not only model capability but also hardware configuration. Customers cannot completely replace their existing hardware or overturn their existing systems; integration with those systems matters more. This calls for engineering and integration capability to help customers reasonably combine large-model technology with existing resources, which involves compatibility with the original systems, software, databases, and hardware.
Facing the above problems, Ji Daqi's view is that people should reach consensus on two points. First, in the future only a few vendors may be able to provide high-quality underlying general large models, while vertical large models and their industrial applications will see many opportunities and much competition; multiple large models may be combined to solve various problems within an enterprise. Second, the goal of enterprises is to use AI to solve problems, not simply to attach themselves to AI. Companies should start from how humans and machines can cooperate better to solve problems, not chase large models for the sake of using them.
Zhang Li also holds the same position, she believes that when using large models to solve fundamental problems, it is necessary to focus on the effective combination of technology and industrialization. The focus of large model vendors should also be to build model-based applications or products to meet the actual needs of customers, rather than using large models for the sake of promoting large models. If you find that a large model is not up to the task, Cloudwalk can switch to other large models, even open source models. The goal is always to work together to solve the real problems faced by our customers.
"Many past applications may not have served users as well as they could, but introducing large models can make them better, understand user needs better, and achieve a higher degree of automation. Rather than disrupting today's applications, the company is adding the power of large models to them, reducing cost and improving training efficiency through cloudification and quickly industrializing the technology so that more customers can enjoy the advantages of large models at a more reasonable cost." Zhang Li added that in the process of AI implementation, large models should be humans' partners, not their replacements.