In 2023, the biggest story in global technology was a new round of competition over artificial intelligence models. With OpenAI's release of ChatGPT, domestic large models sprang up like mushrooms after rain, the spectacle of the "war of a hundred models" drew worldwide attention, and large models advanced at a breathtaking pace. But large model development has long since grown from a contest of single technologies into a contest between entire system ecologies, and victory will go to whoever lays the most solid foundation and the most systematic layout to meet the AI-driven "era of computing power."
Recently, at the AICC 2023 Artificial Intelligence Computing Conference, Yang Jing, founder and CEO of New Zhiyuan, hosted a roundtable forum on large model innovation with Chen Yujun, head of AI at Loop Intelligence; Zhang Jiajun, researcher at the Institute of Automation, Chinese Academy of Sciences; and Wu Shaohua, director of AI software R&D at Inspur Information. The discussion centered on the opportunities and pain points of the large model era, and on how to break through them.
The guests pointed out that the development and application of China's large models are still in an exploratory period. Innovation is needed in algorithms, data, and computing power, and user feedback and hands-on practice must feed back into the models, so as to further consolidate foundational large model technology and drive its deployment in real scenarios.
The following is a transcript of the roundtable Q&A:
Yang Jing: In the "war of a hundred models," each contender shows its own strengths, and the leading players are building out large models around their own advantages. Could you each share your large model business layout?
Chen Yujun: "Improve large models' long-text capability and reduce hallucinations".
Based on our ToB service experience, we hope to make the long-text capability of large models as valuable as possible for enterprise applications, while reducing the hallucination problem. Our business is built around improving these two capabilities, and we hope to use them to produce better large model applications for enterprises.
Zhang Jiajun: "Build a multi-modal large model to solve practical problems".
We have not yet established an engineering entity, and our business layout is more TOB and TOG. In addition, we are working on multi-modal large models, which can be more easily implemented in industry scenarios. We do not emphasize that it is a large language model or multi-mode general, but to solve practical problems in actual scenarios, through our exploration, it is indeed better than the previous cost reduction and efficiency increase, and the problems that could not be solved before can be solved.
Wu Shaohua: "Build a basic model to help the industry land".
Inspur Information has always invested its energy in the innovation of basic models, and comprehensively empowered developers at the application layer and Metabrain ecological partners to reach the final users. We believe that in the current field of large models, only by truly improving the capabilities of basic models can large models truly solve the problem of fragmentation at the industry application level and better support the implementation of industry scenarios.
Yang Jing: Where are the technological breakthroughs in foundation model development? How should we break through to build high-performance large models and catch up with GPT-4 as soon as possible?
Wu Shaohua: "Innovate on both algorithms and data to create an internal flywheel".
To approach or even surpass GPT-4, you must consider both algorithms and data. First, algorithms: you cannot blindly adopt the LLaMA or Transformer architecture without any innovation of your own. Second, data: OpenAI's data flywheel effect is very significant, because they can collect a great deal of real feedback from actual users through various channels. Given this, the core of approaching or surpassing GPT-4 is innovation, especially in algorithms and data.
Take Yuan 2.0 as an example. To obtain high-quality Chinese math data, we cleaned roughly 12 PB of data from 2018 to the present, and in the end obtained only about 10 GB of Chinese math data. Even within that sub-10 GB set, there is still considerable room to improve quality. So we chose to synthesize data with large models, built internal data-cleaning tools and platforms, and used those tools to quickly obtain very scarce, high-quality data in-house.
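The "petabytes in, gigabytes out" funnel Wu describes can be pictured with a minimal sketch. Everything below (the math heuristic, the thresholds, the function names) is illustrative, not Inspur's actual pipeline:

```python
# Illustrative sketch of a coarse cleaning pass: filter for math-like
# documents, then drop exact duplicates. Heuristics are hypothetical.
import hashlib
import re

MATH_PATTERN = re.compile(r"[=+\-*/^<>]|\d+\s*[+\-*/]\s*\d+|\\frac|\\sum|\\int")

def looks_like_math(text: str) -> bool:
    """Cheap heuristic: keep documents with a minimum density of math symbols."""
    hits = len(MATH_PATTERN.findall(text))
    return hits / max(len(text.split()), 1) > 0.05

def content_hash(text: str) -> str:
    """Normalize whitespace, then hash for exact-duplicate removal."""
    normalized = " ".join(text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def clean_corpus(docs):
    """Yield deduplicated documents that pass the math filter."""
    seen = set()
    for doc in docs:
        if not looks_like_math(doc):
            continue
        h = content_hash(doc)
        if h in seen:
            continue
        seen.add(h)
        yield doc

if __name__ == "__main__":
    raw = [
        "Solve for x: 3x + 5 = 20, so x = 5.",
        "Solve for x: 3x + 5 = 20, so x = 5.",      # exact duplicate, dropped
        "Today the weather is pleasant and mild.",  # no math density, dropped
    ]
    print(list(clean_corpus(raw)))  # -> one surviving math document
```

At petabyte scale the same two stages (content filtering, deduplication) would run distributed, but the shape of the funnel is the same.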
Zhang Jiajun: "Follow can not be surpassed, bold innovation, bold attempts, and use expertise to solve practical problems".
In the process of catching up and surpassing GPT4, we are faced with the problem of not knowing its algorithm and not knowing what data is used, which makes it impossible for us to follow and fully validate GPT4, and can only surpass GPT4 on some datasets and in some capabilities, without a comprehensive and recognized metric. So following it can never be surpassed, I think it should be innovation. On the one hand, it is necessary to innovate from the level of data proportioning, and on the other hand, from the innovation of model algorithms, bold attempts and bold changes in the model structure, and following will not solve the fundamental problem.
It is worth noting that we do not necessarily need to achieve the ability of GPT4 to apply the technology to practical scenarios, for example, we solve the problem of modal understanding, controllability, security, and many fields can be used, but there is no ability to achieve GPT4.
Chen Yujun: "Start with the end in mind, strengthen user co-creation, find model limitations, and achieve innovative breakthroughs".
Large models have no single breakthrough point. For example, even before large model technology, Google's machine translation already worked very well. Different problems have different critical points and need to be analyzed case by case: some may be solved directly by large models, while others may require long iteration. Second, we should not only chase catching up with or surpassing GPT-4; we should also think about how to make large models understand human intent so they can genuinely help us complete tasks.
Looking at the problem with the end in mind, we find that current models have many limitations, such as lack of long-text support, hallucinations, and unstable semantic understanding and output; GPT-4 included. On one hand, we have incubated a ToC company and co-create with every user of the model. On the other hand, we also co-create extensively with our B-end partners, so that the model generates value for them. We believe that only by using the model as much as possible can we discover its limitations and make innovative breakthroughs.
Yang Jing: Heaping on computing power is currently considered an effective way to drive the evolution of large models, but the computing power shortage has become a common problem across the industry. How do you deal with it?
Chen Yujun: "Achieve the best possible training results with as little real data as possible".
The computing power shortage is now a universal problem; even OpenAI faces it. What we can do is address it with as few resources as possible through innovation in algorithms and data. With as little real data as possible we can achieve the best possible results, reaching similar or better outcomes while saving a great deal of computing power. On the algorithm side, we use strong training methods to get the best possible results with the least computing power the model requires.
Zhang Jiajun: "Computing power is an important factor, but the shortage of computing power will not hinder innovation".
It is a recognized fact that the models OpenAI trains with large computing power do perform better than small models trained with small computing power. But that does not mean we must have computing power comparable to OpenAI's in order to innovate; computing power will not block our innovation. We may need to train longer, for example half a year where OpenAI takes two months, but as long as we have the right algorithms and data, we can still innovate.
Wu Shaohua: ".Large model structure, distributed training algorithms,Data collaboration and optimization reduce computing power requirements
The essence of this question is that in training large models, people generally believe that more computing power means higher model performance. That notion mainly comes from early research: as model parameters and data grew, model accuracy improved, and that improvement translated into computing power, forming the belief that bigger computing power yields better models. But the training paradigm for large models has changed. With the introduction of instruction fine-tuning, a few thousand high-quality examples can improve model capability, so whether massive computing power must be invested in the pre-training stage has become an open question.
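The instruction fine-tuning Wu mentions, a few thousand high-quality examples shifting model behavior, corresponds to the now-standard supervised fine-tuning recipe. A minimal sketch with Hugging Face transformers follows; the base checkpoint, prompt format, and hyperparameters are placeholders, not the settings of any model discussed here:

```python
# Minimal supervised instruction fine-tuning sketch (Hugging Face stack).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_NAME = "gpt2"  # stand-in for an open base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# A "few thousand high-quality examples" would go here; two shown for shape.
examples = [
    {"instruction": "Summarize: The meeting moved to Friday.",
     "output": "Meeting moved to Friday."},
    {"instruction": "Translate to French: good morning",
     "output": "bonjour"},
]

def to_features(row):
    # Fold instruction and response into one causal-LM training string.
    text = f"### Instruction:\n{row['instruction']}\n### Response:\n{row['output']}"
    return tokenizer(text, truncation=True, max_length=512)

dataset = Dataset.from_list(examples).map(to_features)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point of the recipe, matching the argument above, is that the expensive pre-training compute is amortized once, while capability-shaping happens in this comparatively cheap stage.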
In our experience developing Yuan 2.0, we cleaned about 12 PB of data from the internet and obtained only about 10 GB of Chinese math data. In that situation there is no point in chasing data volume: if effective means can be found to reduce the amount of data, the demand for computing power drops. In Yuan 2.0's development, the overall data volume was not large, but our internal evaluations rated its quality very high; that is a very effective lever for raising computing power efficiency. Likewise, when designing the model structure, the parameter count should be cut as much as possible under the same architecture. That improves parameter efficiency and saves computing power, which amounts to accounting for computing power cost at the algorithm level. In addition, today's distributed training algorithms presuppose that P2P bandwidth between chips is high enough to meet tensor parallelism's huge demand for communication. Given that, we did extra work on large-scale distributed training that reduces the communication bandwidth required during large model training, so that large models can be trained on a more diverse range of devices.
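The bandwidth point can be made concrete with a back-of-envelope comparison: tensor parallelism all-reduces activation-sized tensors inside every layer, while pipeline parallelism only passes activations across stage boundaries. All dimensions below are assumed for illustration; they are not Yuan 2.0's configuration:

```python
# Back-of-envelope comparison of per-microbatch communication volume for
# tensor parallelism vs. pipeline parallelism. All sizes are illustrative.
hidden = 8192          # model hidden dimension (assumed)
seq_len = 4096         # sequence length (assumed)
layers = 80            # transformer layers (assumed)
batch = 1              # microbatch size
bytes_per_elem = 2     # fp16/bf16

activation = batch * seq_len * hidden * bytes_per_elem  # one activation tensor

# Tensor parallelism: roughly 4 all-reduces of activation size per layer
# (2 in forward, 2 in backward, covering the attention and MLP blocks).
tp_volume = 4 * layers * activation

# Pipeline parallelism: one activation (forward) and one gradient (backward)
# cross each stage boundary; assume 8 stages -> 7 boundaries.
stages = 8
pp_volume = 2 * (stages - 1) * activation

gib = 1024 ** 3
print(f"tensor-parallel traffic : {tp_volume / gib:6.1f} GiB per microbatch")
print(f"pipeline traffic        : {pp_volume / gib:6.1f} GiB per microbatch")
# ~20 GiB vs. ~0.9 GiB under these assumptions: tensor parallelism moves
# orders of magnitude more data, which is why it demands high P2P bandwidth.
```

Reducing reliance on tensor parallelism, or restructuring it, is one plausible reading of how communication demand can be cut so that looser-coupled, more diverse hardware becomes usable for training.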
Yang Jing: Large pre-trained models have shown strong performance, but they still face challenges in industry, such as deployment, customization, data privacy, and security. How do you think large pre-trained models should enter industries and realize their potential?
Chen Yujun: "Work with partners and industry experts to teach large-scale model industry knowledge".
Since 2019, Circular Intelligence has been thinking about how to realize the implementation of AI models in the industry, so when launching related products, it has also paid attention to the needs of about ten industries, including banking, insurance, automobiles, real estate, etc. This year, when using large models to solve problems in various industries, we found that one of the more challenging points of the implementation of large models is that each industry has different professional knowledge and knowhow, for example, law companies have very high requirements for the accuracy of the output of the model, and the model needs to read the entire laws and regulations, and must output the content of the regulations word for word, and at the same time, the model needs to remember the corresponding chapter numbers of laws and regulationsIn the scenario of real estate marketing, through the extraction of sales and customer communication scenes, we found that the industry "black words" similar to "200 of 500" represent that the house area is 200 square meters, corresponding to 5 million **;For the financial industry, we need to understand the financial report information, which are all problems we encounter in the process of landing large models in the industry. Therefore, one of the most important steps to realize the implementation of large models in the industry is to teach large model industry knowledge with partners and industry experts. At the same time, we are also building a form of cooperation, which allows as many partners as possible to join together to build a large model, and through a large number of customer feedback, we can find the current problems of the model and find the next stage of evolution.
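Chen's legal example, quoting regulation text word for word with the right article number, is the kind of requirement usually met by retrieving the exact source passage rather than trusting the model's memory. A toy retrieval sketch under that assumption; the corpus, scoring, and prompt format are illustrative stand-ins, not Loop Intelligence's system:

```python
# Toy retrieval step for verbatim regulation quoting.
REGULATIONS = {
    "Article 12": "A contract is established when the offer is accepted.",
    "Article 47": "A contract concluded by a minor requires ratification.",
}

def retrieve(query: str):
    """Rank articles by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    def score(item):
        return len(q_words & set(item[1].lower().split()))
    return max(REGULATIONS.items(), key=score)

def build_prompt(query: str) -> str:
    article, text = retrieve(query)
    # The model is asked to quote the retrieved text verbatim, with its
    # article number, instead of reproducing it from memory.
    return (f"Quote the following regulation word for word, citing it as "
            f"{article}:\n\"{text}\"\n\nQuestion: {query}")

print(build_prompt("When is a contract established?"))
```

A production system would use embedding search over the full statute corpus, but the design choice is the same: accuracy-critical text enters the context verbatim, so the model's job shifts from recall to faithful citation.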
Zhang Jiajun: "Lower your profile, manage user expectations, have more contact, and be more patient".
First, lower your profile. Our large model is for everyone to use directly, and in many scenarios we have to lower our posture. Second, user expectation management. We need to give an expectation of how long it will take to solve the problem, because different industries will have a variety of different problems to solve, and we need to avoid overly high expectations for customers and solve the problem realistically. Third, be more engaged. Let everyone use more, find problems, solve problems, and get better and better from the perspective of user feedback. Fourth, be patient. Whether you are making a big model or as a user, cultivate everyone's patience, and the future will definitely get better and better.
Wu Shaohua: "Large-scale model training empowers developers to reach application scenarios".
For source 2In terms of model 0, we have launched a large-scale model co-training program, and the core starting point of this program hopes to allow our R&D team to reach all developers. Developers put forward the requirements of their own applications or scenarios, provide 1 2 examples, and we will prepare the training data and enhance the training of the source large model, and the trained model is still open source in the community. At the same time, we also have another form, Inspur Information will empower partners, provide them with our experience in model capabilities, and help partners apply these to the industry.
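The co-training workflow, where a developer supplies one or two seed examples and the team expands them into training data, resembles self-instruct-style data synthesis. A toy sketch under that assumption; the `generate` function is a hypothetical placeholder for whatever model performs the synthesis, not part of the program described above:

```python
# Toy sketch: expand developer-supplied seed examples into training data.
def generate(prompt: str) -> str:
    """Placeholder for an LLM call that writes one new example."""
    raise NotImplementedError("plug in a real synthesis model here")

SEEDS = [  # what a developer might submit: one or two examples of their task
    {"input": "Error: disk full on /dev/sda1", "output": "storage alert"},
    {"input": "User login from new device", "output": "security notice"},
]

def expand(seeds, n_new=100):
    """Ask the synthesis model for n_new variations on the seed task."""
    shots = "\n".join(f"IN: {s['input']}\nOUT: {s['output']}" for s in seeds)
    prompt = (f"Here are examples of a classification task:\n{shots}\n"
              f"Write one more example in the same IN/OUT format.")
    return [generate(prompt) for _ in range(n_new)]
```

The synthesized set would then feed the same supervised fine-tuning stage sketched earlier, with the resulting checkpoint released back to the community as described.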
Yang Jing: Everyone now faces a shortage of computing power, and large model training cannot do without sufficient computing power. What work have you seen, or done yourselves, to adapt to the trend toward diversified computing power?
Chen Yujun: "Avoid duplication and waste of computing power; pool industry knowledge and train cooperatively".
Training should follow a logic of cooperation. Different industries hold different knowledge; we should pool that knowledge as much as possible and use limited computing resources to train together. That way we save computing power and avoid a great deal of duplication and waste.
Zhang Jiajun: "I won't put my eggs in one basket and take the road of localized large models".
Our approach is not to put all our eggs in one basket, and we will also use all kinds of computing power at home and abroad. Since 2020, we have been taking the road of localized large models, and we have always adhered to this road, and almost all the chip computing power in China has been adapted, which can ensure that we have the ability to retain the solution of training large models.
Wu Shaohua: "Responding to the Trend of Computing Power Diversification from the System Level".
Inspur Information has developed a set of framework, using one layer of this framework, it can specifically manage all kinds of computing power, and we provide a solution for the industry to face multiple computing power from a system perspective.
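In PyTorch terms, the simplest version of such a management layer is backend probing plus a uniform device handle, so code above that layer never branches on hardware. A minimal sketch; the backend priority order is illustrative, and a production framework would also manage scheduling, memory, and collective communication:

```python
# Minimal sketch of a device-abstraction layer: probe available backends
# and hand the rest of the stack one uniform device handle.
import torch

def pick_device() -> torch.device:
    """Return the best available compute device in priority order."""
    if torch.cuda.is_available():          # NVIDIA (or ROCm-mapped) GPUs
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple silicon (PyTorch >= 1.12)
        return torch.device("mps")
    return torch.device("cpu")             # universal fallback

device = pick_device()
x = torch.randn(2, 3, device=device)  # downstream code is hardware-agnostic
print(device, x.shape)
```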