Not to be outdone, Google launched its next-generation artificial intelligence model - Gemini 15。
The news was jointly announced online by Google CEO Sundar Pichai and Chief Scientist Jeff Dean and other senior executives, and instantly attracted wide attention from the industry.
Although OpenAI's new work has attracted most of the attention, Gemini 15With its amazing breakthrough in cross-modal long text understanding, it has successfully broken through from the flank and has become a major focus in the AI circle recently.
The new model demonstrates unprecedented information processing capabilities, capable of stably handling up to 1 million tokens, which equates to about 1 hour of audio, 30,000 lines**, or 700,000 words.
This feat not only makes the Gemini 15 easily surpassed its own gemini 10 pro(3.20,000 tokens) as well as GPT-4 (1280,000 tokens) and Claude 21 (200,000 tokens), which broke the current record for the length of the context window of the current public LLM.
What's even more amazing is that Google revealed the Gemini 15 The data of 10 million tokens has been successfully tested and processed in an experimental environment, which is equivalent to the analysis of the content of the entire Lord of the Rings trilogy in one go. According to Pichai, the larger query window opens up endless possibilities for enterprise-level applications, such as filmmakers who can upload entire productions to get Gemini's expert opinion on plot trends, and auditors who can use it to efficiently review vast amounts of financial records.
This upgrade, Gemini 15. The most advanced MOE architecture design is adopted to improve the efficiency and response quality of the model. Compared with the unified network structure of the traditional Transformer model, the MOE model innovatively divides itself into multiple professional modules, and activates the most appropriate sub-modules when processing tasks, so as to achieve effective resource allocation and accurate calculation.
This architecture is not only suitable for complex tasks with large-scale datasets, but also gives the model greater scalability and flexibility, and many advanced models, including GPT-4, are said to use this technology to varying degrees. According to preliminary data provided by Google, Gemini 1The 5 Pro has made significant progress in math, science, reasoning, multilingualism, and comprehension, and its performance is close to that of the previous flagship Gemini 1., even with fewer computing resources0 ultra。
In the official demo and technology**, Google showcased Gemini 1 through a series of examples5 Pro's Powerful Features:Complex Reasoning and Multimodal Analysis:The model seamlessly parses and summarizes complex documents, such as a 402-page PDF file for the Apollo 11 mission, and Gemini is able to list three key moments in a short period of time and provide original dialogue references.
In the case of Victor Hugo's magnum opus, Les Misérables, it not only provides an overview of the scene, but also pinpoints the page number of a particular picture and its associated plot.
Extra-long** comprehension:The face duration is 44 minutes, which is equivalent to 6840,000 tokens of the silent film "Sherlock Jr."》,gemini 1.The 5 Pro is capable of delivering a concise synopsis in just 57 seconds, and responds quickly to specific times and key information about the "paper out of pocket" in the film.
In-depth analysis:When faced with a total of 81When a large project of 60,000 tokens, the model can quickly locate the ** segment of the specified demo, and even put forward useful modification suggestions and explain the reasons in detail.
In addition, the Gemini 1The 5 Pro's performance in the "contextual learning" area is impressive. In a particular test, the researchers fed the model a grammar book about Kalamang (an endangered language with fewer than 200 speakers and extremely scarce online materials), a bilingual vocabulary and a large number of parallel sentences, totaling about 250,000 tokens, as contextual information. In the absence of any prior training, Gemini 1The 5 Pro has successfully mastered the translation skills from English to Kalamang and vice versa, and its translation quality is close to the human level, and the translation effect of the half-book content is significantly better than that of GPT-4 Turbo and Claude 21。
Google Gemini 1The launch of 5 undoubtedly sets a new milestone in the field of AI, especially a major breakthrough in contextual understanding and processing capabilities, giving enterprises more powerful tools when dealing with complex information and knowledge-intensive tasks. With Gemini 15 Emerging to the top, OpenAI's short-term monopoly on the headline position seems a bit wronged, and this competition around the efficiency of AI models will undoubtedly continue to drive innovation and development in the entire industry. February** Dynamic Incentive Program