Author | He Sisi
Editor | Chen Caixian
In August this year, at the main forum of the GAIR conference held by Leifeng.com in Singapore, Huang Xuedong, a former Microsoft Technical Fellow and a member of the U.S. National Academy of Engineering, summed up his view of large-model development with an old Chinese saying: "Three cobblers with their wits combined equal Zhuge Liang the mastermind."
At the time, the mainstream approach among domestic large-model developers was to build their own foundation models, and the "war of a hundred models" was in full swing. Huang Xuedong took the opposite view: putting all the eggs in one basket was too dangerous; instead, the capabilities of four or five large models should be combined, with each model serving the application scenarios it suits best.
To put it in more technical terms: after leaving Microsoft to become CTO of Zoom, the large-model route Huang Xuedong has championed inside Zoom is the federated large model - bringing together the large language models of OpenAI, Anthropic, Google, Meta, and other technology giants to form Zoom's AI foundation, so as to achieve better results at lower cost.
Recently, after a series of studies and experiments, Huang Xuedong's team validated the federated-model plan it laid out in August and made a major breakthrough: Zoom's AI team integrated multiple well-known large models at less than 6% of the cost of GPT-4, and the resulting federated model matched the performance of GPT-4-32K in meeting scenarios.
In terms of compute, the federated model reaches 99% of GPT-4's performance in Zoom's application scenarios with less than 10% of the computing resources, while far exceeding GPT-4's response speed.
By contrast, vendors at home and abroad that pursue a single best foundation model have made respectable technical progress and can optimize for a single modality or certain tasks, but their overall capabilities remain weak, with a large gap to GPT-4.
The reason is that most vendors cannot balance quality and cost: they either lack the funds or lack the capability. And because self-developed technology is so revered, even players whose advantages lie in application scenarios tend to make models bigger and stronger through their own efforts, lacking the awareness to learn from outside and complement their weaknesses.
At a time when reinventing the wheel is rampant, the federated model Zoom proposes is instructive.
What Is a Federated Large Model?
The power structure of the large-model era has three layers: computing power at the bottom, algorithmic innovation in the middle, and model applications on top. Although Zoom has built its own large-model team, it is not a vendor that sells algorithms. Compared with algorithm R&D, Zoom, which has clear deployment scenarios (such as video conferencing) and a large base of vertical-industry users, leans toward applications.
Like most application-focused vendors, Zoom's main demand of large models is cost-effectiveness: the strongest model capability at the lowest cost, so as to give users the best service and raise satisfaction. Examples include improving meeting communication efficiency, strengthening automatic meeting summaries, and auto-generating meeting minutes and meeting Q&A. This gives Zoom an advantage in choosing the federated-model route.
According to AI Technology Review's exclusive conversation with the Zoom team, over the past six months they have made rapid progress in deployment based on the federated model, mainly in three respects:
First, improvements in how AI is deployed.
Unlike other AI application transformations, Zoom takes a federated AI approach, which is the cornerstone of its innovation. Zoom has reportedly connected multiple models, including its self-developed LLM, the third-party models GPT-3.5 and GPT-4, and large models such as Anthropic's Claude 2.
The models it integrates are not limited to the above: Zoom is open to all kinds of LLMs, not only the newest ones such as OpenAI's GPT-4 and, in the future, GPT-5, but also open-source and closed-source LLMs alike, so as to jointly improve the end-to-end customer experience.
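Zoom has not published its integration layer, but connecting several vendors' LLMs behind one interface typically relies on an adapter pattern. The sketch below is a minimal, hypothetical illustration of that idea; the class names are assumptions, and the vendor calls are stubbed rather than real API requests.

```python
from abc import ABC, abstractmethod


class LLMBackend(ABC):
    """Common interface hiding each vendor's specific API (hypothetical)."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class OpenAIBackend(LLMBackend):
    def __init__(self, model: str = "gpt-4"):
        self.model = model

    def complete(self, prompt: str) -> str:
        # A real adapter would call the OpenAI API here; stubbed for illustration.
        return f"[{self.model}] response to: {prompt}"


class AnthropicBackend(LLMBackend):
    def __init__(self, model: str = "claude-2"):
        self.model = model

    def complete(self, prompt: str) -> str:
        # A real adapter would call the Anthropic API here; stubbed for illustration.
        return f"[{self.model}] response to: {prompt}"


class FederatedLLM:
    """Registry of interchangeable backends keyed by name."""

    def __init__(self):
        self.backends: dict[str, LLMBackend] = {}

    def register(self, name: str, backend: LLMBackend) -> None:
        self.backends[name] = backend

    def complete(self, backend_name: str, prompt: str) -> str:
        return self.backends[backend_name].complete(prompt)
```

With such a registry, swapping in a new open- or closed-source model is a matter of adding one adapter class, without touching the application code that calls `complete`.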
To verify the effectiveness of the federated large model, Zoom ran several rounds of internal tests. The results show that Zoom's federated model, trained via model integration, matches many well-known single foundation models, including OpenAI's GPT-3.5 Turbo (99% vs. 93%) as well as several other state-of-the-art LLMs.
Second, a commitment to low-cost deployment.
The most suitable, lowest-cost LLM can be chosen for each specific scenario. A z-scorer evaluates how well the initial model completed the task, and, building on the initial LLM's result, a higher-tier LLM is called as needed to improve the outcome.
In practical application scenarios, Zoom uses small and medium-sized models to handle simple problems and calls GPT-4 for the hard ones. Compared with relying on a single model, this approach substantially lowers cost.
It is as if GPT-4 were a teacher leading a group of students: like a team whose members contribute different skills, together they form a more effective collective.
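The escalation logic described above is a cascade: try the cheap model first, score its answer, and only call the expensive model when the score falls short. The article mentions a "z-scorer" but does not publish its details, so `score_fn` below is a hypothetical stand-in, and the stub models are placeholders for real API calls.

```python
def cascade(prompt, models, score_fn, threshold=0.8):
    """Route a request through models ordered from cheapest to most
    capable, escalating only while the answer's quality score (as judged
    by score_fn, standing in for the article's z-scorer) stays below
    the threshold."""
    answer = None
    for name, model in models:            # cheapest first
        answer = model(prompt)
        if score_fn(prompt, answer) >= threshold:
            return name, answer           # good enough: stop escalating
    return name, answer                   # best effort from the top model


# Stub models for illustration (real code would call vendor APIs).
small_model = lambda p: "brief: " + p
gpt4_model = lambda p: "detailed: " + p

# A toy scorer that only trusts detailed answers (purely illustrative).
toy_scorer = lambda prompt, answer: 0.95 if answer.startswith("detailed") else 0.5
```

Lowering the threshold keeps more traffic on the small model; raising it escalates more requests to the expensive one, which is exactly the cost/quality dial the article describes.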
In specific tests against OpenAI's GPT-4-32K, the model behind Microsoft's Copilot, the results show that Zoom AI Companion's meeting features improve quality while keeping costs lower and responses faster. Zoom achieved GPT-4-32K-level performance at less than 6% of the cost, which is impressive.
Third, steadily improving performance.
Powered by the federated AI approach, Zoom can leverage the progress of many leading large-model partners, delivering high performance at low cost.
AI Technology Review has learned that Zoom reaches 99% of the performance of GPT-4, the most advanced large model, in Zoom's application scenarios with less than 10% of the computing resources, while far exceeding GPT-4's response speed.
On language support: early AI models, including most current ones, are pre-trained mainly on English data. Zoom has added translation models and expanded its multilingual capability, and now supports 32 languages in addition to English.
These tests highlight the effectiveness of Zoom's federated AI approach and the benefits of integrating different machine learning systems.
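One common way to bolt multilingual support onto an English-centric model, as the translation-model addition above suggests, is to translate the request into English, run the model, and translate the answer back. Zoom has not published its actual pipeline, so the function below is a minimal sketch under that assumption, with stub implementations standing in for the real model and translator.

```python
def multilingual_complete(prompt, lang, model, translate):
    """Serve a non-English request with an English-centric model by
    translating the prompt in and the answer back out. A generic
    pattern, not Zoom's published pipeline."""
    if lang != "en":
        prompt = translate(prompt, src=lang, dst="en")
    answer = model(prompt)
    if lang != "en":
        answer = translate(answer, src="en", dst=lang)
    return answer


# Stubs for illustration: a "model" that echoes in upper case, and a
# "translator" that just tags text with the target language code.
echo_model = lambda p: p.upper()
tag_translate = lambda text, src, dst: f"{dst}:{text}"
```

English requests bypass both translation hops, so the extra latency and cost are only paid when a non-English language is actually in use.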
Where Does the Federated Large Model Go Next?
The successful landing at Zoom of the "cobblers combining into a Zhuge Liang" idea has fired the first shot for the industry, and shows that the federated model points out one direction for putting large models into production.
When large models are deployed in industry, the toughest challenges center on performance, response speed, and cost, and the federated-model method proposed by the Zoom team addresses them well. According to AI Technology Review, no domestic company has yet managed to federate four or more large models.
Behind this lies a technical test: which models to select for a given application scenario. On top of that, how to integrate them also involves strong technical barriers.
In addition, on performance, response speed, and cost, Zoom's current results show performance comparable to GPT-4 at a cost below GPT-4's, which is top-tier in the industry today. In practice, however, the road to federated large models is not smooth.
Huang Xuedong has said that a technology trend of multimodal joint development centered on large models is bound to become reality within the next two years. For now, though, the federated model remains a relatively new concept; applying it successfully cannot happen overnight, and requires, at minimum, a strong grasp and full understanding of the technology.
Second, consider the federated model itself. Zoom emphasizes integrating multiple models. With a single model, one only needs to consider fit with that model: how to feed data in for training, how to fine-tune, how to enhance its capabilities. With multiple different models, things get far more complicated: one must consider not only the coordination among models (should a given problem go to model A or model B?), but also which model can be deployed at lower cost, and which delivers higher performance and a better experience...
This is the core challenge of the federated model, and the one Zoom must focus on overcoming. The Zoom team told AI Technology Review that their biggest challenge is how to fuse the many "cobblers" into a "Zhuge Liang": deciding dynamically which large language model to use in which scenario to achieve the lowest cost, fastest response, and best quality. Balancing the three is an art, and technical understanding, data acquisition, and engineering practice are all indispensable.
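One simple way to frame the cost/speed/quality balancing act described above is as a weighted score over each candidate model. The weights, the normalization, and the candidate figures below are illustrative assumptions, not Zoom's actual selection policy.

```python
def pick_model(candidates, w_quality=0.6, w_cost=0.25, w_latency=0.15):
    """Rank candidate models by a weighted trade-off of estimated
    quality, cost, and latency. Weights and normalization are
    illustrative, not Zoom's published policy.

    candidates: {name: (quality in [0, 1], cost_per_call, latency_s)}
    """
    max_cost = max(c for _, c, _ in candidates.values())
    max_lat = max(l for _, _, l in candidates.values())

    def score(stats):
        quality, cost, latency = stats
        # Reward quality; penalize cost and latency normalized to [0, 1].
        return (w_quality * quality
                - w_cost * cost / max_cost
                - w_latency * latency / max_lat)

    return max(candidates, key=lambda name: score(candidates[name]))
```

Shifting the weights changes the decision: a quality-only weighting always picks the strongest model, while the default weighting can prefer a cheaper, faster model whose quality is close enough, which mirrors the trade-off the Zoom team describes.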
Judging from Zoom's results so far, the federated model matches GPT-4 only in individual scenarios, such as meeting Q&A. There is still work to do on quality, and the gap from 99% to 100% cannot be closed overnight. The federated model still has a long way to go before it can catch up with and surpass GPT-4 in all scenarios.
Leifeng.com