99% of industry models are likely to be replaced: a dialogue with Wang Xiaochuan of Baichuan Intelligence

Mondo Culture Updated on 2024-01-30

By Hao Xin

Edited by Liu Yuqi

"99% of the industry's large models may be replaced," said Hong Tao, co-founder and co-president of Baichuan Intelligence, stunning the room.

In the "war of a hundred models," industry models have always been a focus, for two reasons: first, they combine quickly with the technology and business of the developers building them; second, demand for them is clear, so their actual landing speed and commercialization far outpace general-purpose large models.

However, such industry models usually have to be fine-tuned, and the disadvantages are obvious: long training times, high deployment costs, and enterprise data-privacy concerns.

Against this backdrop, companies at home and abroad are searching for the optimal solution, and two exploration paths have formed:

One is the vector database route, popularized by database companies such as Pinecone and Zilliz; the other is the RAG (Retrieval-Augmented Generation) route led by OpenAI.

A figurative metaphor captures the difference between fine-tuning, vector databases, and RAG: fine-tuning a large model is like a child progressing from elementary school to college or even graduate school, while vector databases and RAG are more like an open-book exam, where you can give answers without having internalized the material.

In short, without changing the model itself, vector databases and RAG use "plug-in" means to improve the accuracy of large model applications, compensating for the model's inherent shortcomings: hallucination, poor timeliness, and lack of professional domain knowledge.
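The "plug-in" idea can be sketched in a few lines. This is a toy illustration, not Baichuan's implementation: the knowledge base, the keyword-overlap retriever, and the prompt template are all stand-ins for a real vector store and model call.

```python
# Minimal sketch of the "plug-in" idea behind RAG: retrieve relevant
# knowledge first, then hand it to the (unchanged) model as context.
# The knowledge base and scoring here are toy stand-ins.

KNOWLEDGE_BASE = [
    "Baichuan2-Turbo exposes a search-augmented API.",
    "A vector database stores embeddings for similarity search.",
    "RAG stands for Retrieval-Augmented Generation.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str) -> str:
    """Assemble retrieved passages and the question into one prompt."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("What does RAG stand for?")
```

The model itself is never retrained; only the prompt changes, which is why updating the knowledge base requires no new training run.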

Although framed as a choice between two paths, vector databases and RAG are not entirely opposed: a vector database still requires retrieval, and the RAG process includes a vectorization stage; the two simply differ in emphasis.

In China, Tencent has focused more on the vector database direction and has elevated it to a strategic position: "The large model is the computing engine; what changes is the calculation method, and storage needs a vector database."

In December, Baichuan Intelligence released the Baichuan2-Turbo series of APIs based on search augmentation, combining the RAG and vector database routes into a stack of "large model + ultra-long context window + search-enhanced knowledge base."

Wang Xiaochuan, founder and CEO of Baichuan Intelligence, offered his own conclusion: "Large model + search augmentation is the new computer of the large model era. The large model is analogous to the computer's CPU, while real-time Internet information and the enterprise's complete knowledge base together constitute the hard disk of the large model era."

Figure: Experiments show that RAG + a large model outperforms a fine-tuned large model (source: Microsoft).

"In all respects, search augmentation is more cost-effective than fine-tuning industry models," says Wang.

Light Cone Intelligence spoke with Wang Xiaochuan at the briefing to understand in depth how a company that chose the RAG and vector database routes early thinks about and breaks through on the technology, and how it lands in industry applications.

The core points are as follows:

1. Search augmentation is the first, and even the most critical, step in making large models practical.

2. Large model + search constitutes a complete technology stack, realizing a new link between the large model, domain knowledge, and whole-network knowledge.

3. Large model + search augmentation is the new computer of the large model era: the large model is analogous to the CPU, while real-time Internet information and the enterprise's complete knowledge base are the hard disk.

4. Avoid projectization; replace it with productization, using built-in customization capabilities to deliver low-cost customization to enterprises.

5. China's large model technology is evolving much faster than imagined, and the catch-up effort is mainly concentrated in the field of text.

Q: Before RAG was proposed, what means did the industry have to address the shortcomings of large models?

Wang Xiaochuan: The industry has explored a variety of solutions, including scaling up parameters, extending context window length, connecting large models to external databases, and training or fine-tuning vertical industry models with domain-specific data. Each route has its advantages, but also its limitations.

For example, although continuously expanding model parameters can keep improving a model's intelligence, it requires massive data and computing power, and the enormous cost is unfriendly to small and medium-sized enterprises; moreover, relying entirely on pre-training cannot solve the model's hallucination and timeliness problems. The industry therefore urgently needs a path that integrates these advantages to effectively convert the intelligence of large models into industrial value.

Q: The "search augmentation" concept proposed by Baichuan Intelligence closely matches the now-popular RAG approach. How should we understand "large model + search"?

Wang Xiaochuan: Large model + search augmentation is the new computer of the large model era. The large model is analogous to the computer's CPU: it internalizes knowledge through pre-training and generates results based on the user's prompt. The context window can be thought of as the computer's memory, holding the text being processed at the moment. Real-time Internet information and the enterprise's complete knowledge base together constitute the hard disk of the large model era.

Based on this technical concept, Baichuan Intelligence takes its two flagship Baichuan models as the core, deeply integrates search augmentation with the large model, and, combined with the previously launched ultra-long context window, constructs a complete large model + search augmentation technology stack, realizing a new link between the large model, domain knowledge, and whole-network knowledge.

Q: How does search augmentation solve the existing problems of large models?

Wang Xiaochuan: Search augmentation can effectively solve the core problems that hinder the application of large models, such as hallucination, poor timeliness, and lack of professional domain knowledge. On the one hand, it improves model performance and lets the large model "mount the hard disk," achieving "omniscience" over real-time Internet information plus the enterprise's complete knowledge base.

On the other hand, search augmentation also lets the large model accurately understand user intent, find the knowledge most relevant to that intent among the massive documents of the Internet and professional enterprise knowledge bases, and load enough of it into the context window. The long-window model then summarizes and refines the search results, bringing the context window's capability into full play and helping the model generate optimal results. The various technical modules thus link up into a closed-loop, powerful capability network.

Q: On the technical path, how is "large model + search" realized?

Wang Xiaochuan: On the basis of the long context window and the vector database, we upgraded the vector database into a search-enhanced knowledge base, greatly improving the large model's ability to obtain external knowledge. Combining the search-enhanced knowledge base with the ultra-long context window lets the model connect to the entire enterprise knowledge base and whole-network information, which can replace personalized fine-tuning for most enterprises and thereby meet the customization needs of 99% of enterprise knowledge bases.

However, there are many technical difficulties in implementation. For search augmentation, user needs are expressed colloquially and in diverse ways, and are context-dependent, so aligning user needs (the prompt) with search has become the core problem in obtaining external knowledge for large models. To understand user intent more accurately, Baichuan Intelligence fine-tuned a self-developed large language model for intent understanding, which converts the user's multi-turn, colloquial prompts into keywords or semantic structures better suited to traditional search engines.
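The prompt-to-search alignment step might be sketched as follows. Baichuan uses a fine-tuned large language model for this; here a hypothetical rule-based rewriter (the stopword list is invented) stands in to show the interface: multi-turn colloquial input goes in, search-engine-style keywords come out.

```python
# Sketch of prompt-to-search alignment: turn a colloquial, multi-turn
# conversation into a keyword-style query a search engine understands.
# A rule-based rewriter stands in for the fine-tuned model.

STOPWORDS = {"please", "can", "you", "tell", "me", "the", "a", "an",
             "what", "is", "about", "i", "want", "to", "know"}

def rewrite_query(turns: list[str]) -> str:
    """Collapse conversation turns into deduplicated search keywords."""
    keywords: list[str] = []
    for turn in turns:
        for word in turn.lower().replace("?", "").split():
            if word not in STOPWORDS and word not in keywords:
                keywords.append(word)
    return " ".join(keywords)

query = rewrite_query([
    "Can you tell me about vector databases?",
    "I want to know the deployment cost",
])
```

In a real system the rewriter is itself a model, so it can also resolve cross-turn references ("its cost") that rules cannot.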

Baichuan Intelligence also drew on Meta's CoVe ("Chain-of-Verification Reduces Hallucination in Large Language Models") technique to split complex real-world user questions into multiple independent sub-questions that can be retrieved in parallel, so that the large model can run a targeted knowledge-base search for each sub-question and provide more accurate, detailed answers. Meanwhile, through its self-developed TSF (Think Step-Further) technique, Baichuan's knowledge base can infer the deeper question behind a user's input, understand intent more precisely, and guide the model toward more valuable answers, providing comprehensive, satisfying output.
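The CoVe-style decomposition described above can be illustrated roughly like this. The naive "and"-splitter and the stubbed knowledge-base lookup are assumptions for illustration; a real system would use a model for both steps.

```python
# Sketch of CoVe-style decomposition: split a compound question into
# independent sub-questions, retrieve for each in parallel, then merge.
# The splitter and the knowledge-base lookup are hypothetical stand-ins.

from concurrent.futures import ThreadPoolExecutor

def split_question(question: str) -> list[str]:
    """Naively split a compound question on ' and ' into sub-questions."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]

def search_knowledge_base(sub_question: str) -> str:
    """Stand-in for a per-sub-question knowledge base lookup."""
    return f"evidence for: {sub_question}"

def gather_evidence(question: str) -> list[str]:
    """Retrieve evidence for every sub-question in parallel."""
    subs = split_question(question)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(search_knowledge_base, subs))

evidence = gather_evidence("What is RAG and how does fine-tuning differ?")
```

Because the sub-questions are independent, the lookups can run concurrently, which is what makes parallel retrieval worthwhile for complex queries.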

Q: How does large model + search perform in testing and in practice?

Wang Xiaochuan: Through long window + search augmentation, on the basis of a 192k-token context window, Baichuan Intelligence has increased the amount of original text the large model can access by two orders of magnitude, to 50 million tokens. It has also passed the "needle in a haystack" test, widely recognized as the industry's most authoritative long-text accuracy test for large models, achieving 100% answer accuracy for requests within 192k tokens.
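A "needle in a haystack" test case can be constructed roughly as follows. This sketch only shows how the needle is planted at a controlled depth in filler text; the substring check at the end stands in for actually querying a model and grading its answer.

```python
# Sketch of a "needle in a haystack" test case: bury one fact (the
# needle) at a chosen relative depth in long filler text, then check
# that the system can still recover it. A substring search stands in
# for asking the model.

def build_haystack(needle: str, filler: str, n_chunks: int,
                   depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    chunks = [filler] * n_chunks
    chunks.insert(int(depth * n_chunks), needle)
    return " ".join(chunks)

def model_can_find(haystack: str, needle: str) -> bool:
    """Stand-in for querying a model; here just a substring check."""
    return needle in haystack

needle = "The secret code is 42."
haystack = build_haystack(needle, "Lorem ipsum dolor sit amet.", 1000, 0.5)
found = model_can_find(haystack, needle)
```

Real evaluations sweep both the haystack length and the depth, producing the familiar grid of accuracy scores per (length, depth) cell.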

For documents beyond 192k tokens, Baichuan Intelligence combined the search system to extend the context length of the test set to 50 million tokens. The results show that sparse retrieval + vector retrieval achieves 95% answer accuracy, staying close to a full score even on the 50-million-token dataset, whereas vector retrieval alone reaches only 80%.
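The sparse + vector combination can be sketched as a simple per-document score fusion. The toy embeddings and the equal 0.5/0.5 fusion weights are illustrative assumptions, not Baichuan's actual configuration.

```python
# Sketch of hybrid retrieval: fuse a sparse (keyword-overlap) score
# with a dense (cosine-similarity) score for each document, then rank.
# Toy vectors and 0.5/0.5 weights are illustrative assumptions.

import math

def sparse_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query, q_vec, docs):
    """docs: list of (text, embedding). Returns texts, best first."""
    scored = [
        (0.5 * sparse_score(query, text) + 0.5 * cosine(q_vec, vec), text)
        for text, vec in docs
    ]
    return [text for _, text in sorted(scored, reverse=True)]

docs = [
    ("vector retrieval uses embeddings", [0.9, 0.1]),
    ("sparse retrieval matches keywords", [0.2, 0.8]),
]
ranked = hybrid_rank("sparse retrieval keywords", [0.3, 0.7], docs)
```

Sparse scores catch exact terms (names, codes, rare words) that dense embeddings can blur, which is one plausible reason the combined method outperforms vectors alone in the reported test.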

Q: In promoting to-B adoption, what problems did Baichuan Intelligence find with industry models? Why haven't industry models taken off?

Wang Xiaochuan: Although industry models were born of industry demand, the current situation is that the concept is very hot but there is no good practice, and many difficulties remain.

The industry has proposed the concepts of L0 and L1: L0 is the standard model, and L1 refers to adapting it with vertical-domain data. There are two common adaptation methods: SFT (note: supervised fine-tuning, usually applied to a pre-trained large language model) and Post-Train (note: the tuning, compression, and deployment stages after model training). Although the difficulty of SFT has dropped by 1-2 orders of magnitude, the technique is still hard to implement and still requires talent from model companies; for enterprises this is a huge challenge and resource drain. Once started, it needs GPU computing power, and the cost of training, as opposed to inference, is very high. Despite the large investment, training a model is like "alchemy": the effect cannot be guaranteed and may even decline. Moreover, once the data or algorithm is updated, the enterprise has to retrain. Whenever data changes, real-time data must be introduced, or the base model must be upgraded, the previous training is completely zeroed out and has to restart.

We don't entirely reject the idea of building industry models, but we still believe search augmentation can replace industry models in most scenarios.

Q: Why can search augmentation replace industry models, and why is it the key to adoption?

Wang Xiaochuan: Everyone is calling for large models to become practical and land, but today, especially in China, search augmentation is the first and most critical step toward practicality; without search augmentation, large models cannot be deployed in enterprises.

With the knowledge base and search augmentation, you can hang the system on directly, plug and play: the "hard disk" is attached, and retrieval stability improves greatly, avoiding the reliability and stability problems of the original post-train or SFT approach. Whether through vector retrieval or sparse retrieval, results improve substantially. As mentioned earlier, the old approach dragged the knowledge base into training, and once training was done, any data update forced retraining. The plug-and-play "hard disk" approach avoids that, and avoids the old model-upgrade problem too: the model is decoupled from your system, the model upgrades the model, and the hard disk upgrades the hard disk. Compared with training industry models, search augmentation + large model brings a great advantage.

Q: What industries can search augmentation serve? What changes will it bring?

Wang Xiaochuan: Having solved the hallucination and timeliness problems, the large model + search augmentation solution effectively improves the usability of large models and expands the fields they can cover, such as intelligent customer service, knowledge Q&A, compliance and risk control, and marketing consulting scenarios in finance, government affairs, justice, education, and other industries.

Two kinds of scenarios are most concentrated: one involves large amounts of text data, where the know-how embedded in the text needs processing; the other involves dealing with customers, communicating with them, as in customer service or answering customer questions. These play to the two strengths of large models and offer effectively unlimited supply capacity.

Q: What stage has Baichuan Intelligence's commercialization reached? How do you think about the relationship between customization and productization?

Wang Xiaochuan: In early commercial conversations, many customers wanted to understand large models; many came asking what a large model is and what it can do. In the past two months, customers' questions have become increasingly specific, and some already see scenarios where large models can be used. But solving them is painful: fine-tuning is the default, and SFT and post-training get brought up at every turn, yet these approaches are very heavy. What we are doing now is telling customers we can quickly land their practical applications. So whether in private-deployment scenarios or API scenarios, many customers are in discussions, and the product we released this time is designed to solve their problem.

So-called customization is, more precisely, personalization: customers are born with personalized needs. What Baichuan hopes to avoid is projectization, replacing it with productization, which means the product itself has customization capabilities and can deliver low-cost customization to enterprises.

The core issue is still cost: customer costs are high and project margins are low. The relatively profitable to-B companies mostly sell products, while most customization work is project-based. The complete search-augmentation technology stack, which delivers customization through an API plus a pluggable enterprise knowledge base, is a product: configurable and adjustable. Even when we customize for private-deployment customers, we want to work by composition rather than building everything from scratch.

Q: As a witness of the wave of large models, Baichuan Intelligent looks back on this year and what stages have it experienced?

Wang Xiaochuan:China is now divided into three phases.

The first stage was the panic phase: after OpenAI released ChatGPT, Chinese companies had not yet built anything, the United States' data flywheel was already spinning, and everyone was debating whether AGI had arrived;

The second stage was the investment period: for example, when I founded Baichuan Intelligence, everyone started to move, people kept joining in, and all the focus was on large models;

The third stage is the high-speed iteration period: whether in capital, academia, or industry, new progress appears every day. Our technical staff keep up with the latest developments daily and keep iterating and improving. The industry's pace of development actually exceeds the perception of outsiders and the capital circle, and it is still iterating rapidly.

Q: From a technical perspective, what characterizes the update and iteration of China's large models?

Wang Xiaochuan: First, China's large model technology is evolving much faster than expected. At the beginning, everyone felt the United States' advantage was particularly obvious and could not be caught up with. But after various large models, including Baichuan's, came out, they turned out in some scenarios to be slightly better than GPT-3.5 or even GPT-4; that is a fact that has happened. Baichuan Intelligence, for example, released its first model in June, its second in July, and a 50-billion-parameter model in August, moving forward continuously, and in open source it has become an alternative to US offerings.

The second characteristic is that domestic catch-up is still concentrated in the field of text. Text represents the level of intelligence, and we believe companies that put text first in the pursuit of intelligence are moving in a long-term direction. Only from GPT-4 onward, with GPT-4V, did multimodality begin to appear, so companies betting on audio, image, and ** are not competing in the same direction at this point.

I estimate Chinese companies will have opportunities to overtake on the curve in the future, because catching up on text and improving the intelligence of large models are what the industry should focus on most; long windows and multimodality with large parameters (today's so-called swarm intelligence) are all working in this direction. Although there is more than one path to multimodality, multimodality is closest to application, and once China plunges into applications, it can be driven by smaller multimodal models.
