The top student returns: can Google's most powerful model, Gemini, beat GPT-4?

Mondo Education Updated on 2024-01-28

Google leads in technological foresight, while OpenAI is more focused on product polish.

Author: Sukhoi

Editor: Bo Wang

Google, silent for so long, has finally made its big move.

On December 6, local time in the United States, Google released the multimodal large model Gemini. Google DeepMind declared outright that Gemini is "Google's largest and most capable AI model".

Screenshot of Google DeepMind's official account. Source: Google DeepMind

Gemini is a multimodal model that Google built from the ground up, an approach that brings it closer to how humans perceive the world.

As Google's "killer card" against GPT-4, Gemini set new SOTA (state-of-the-art) records on 30 of 32 multimodal benchmarks. It is also the first model to surpass human experts on the MMLU (Massive Multitask Language Understanding) benchmark: Gemini scored 90.0%, compared with 89.8% for human experts and 86.4% for GPT-4.

There are three versions of Gemini:

Gemini Ultra: for highly complex tasks;

Gemini Pro: for a wide range of tasks;

Gemini Nano: smaller, suited to specific tasks and mobile devices.

The three versions of Gemini. Source: Google

"Gemini's versatility allows it to run on everything from mobile devices to large data centers," said Eli Collins, VP of Product at Google DeepMind. "We are getting closer to the vision of a new generation of AI models."

"With AI, we have the opportunity to do important things on a larger scale," Google CEO Sundar Pichai wrote, singling out artificial intelligence in his open letter for Google's 25th anniversary. Facing a strong OpenAI, the top student Google needs a phenomenal product to prove its strength in artificial intelligence.

Gemini is Google's answer.

The core strength of Google's Gemini model lies in its natively multimodal character.

In the past, multimodal large models were often built by training separate components for different modalities and then combining them to simulate multimodal capabilities. While such models perform well on certain tasks, such as image description, they often fall short on tasks that demand deeper conceptual understanding and complex reasoning.

Google's Gemini model is pre-trained on different modalities from the beginning, and then fine-tuned by using additional multimodal data to further improve the effectiveness of the model. This native multimodal training method makes Gemini more efficient and accurate when processing multiple types of data and complex tasks, thus setting a new standard in the field of multimodal AI.

Moreover, the launch of Gemini is aimed squarely at OpenAI's GPT-4. In the words of Li Yunlong in "Bright Sword": "it's the elite we're going after."

TV series "Bright Sword".

In terms of performance, Gemini "beats" GPT-4 almost across the board. Gemini Ultra exceeded previous SOTA results on 30 of the 32 academic benchmark sets widely used in large-model development. On multiple-choice questions, math problems, Python coding tasks, reading comprehension, and more, Gemini's performance surpasses the best levels ever achieved.

Google says it adopted a new benchmarking approach for MMLU that allows Gemini to use its reasoning capabilities to think more carefully before answering difficult questions; this yields significantly better performance than answering on first impressions alone.
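Google has not published the exact procedure, but the idea resembles self-consistency sampling: draw several reasoned answers and keep the consensus rather than the first guess. A minimal, generic sketch of that voting step (the `sample_fn` stub stands in for a real model call; this is an illustration, not Google's actual code):

```python
from collections import Counter

def majority_vote_answer(sample_fn, question, n_samples=8):
    """Sample several candidate answers and return the most common one.

    sample_fn(question) -> answer string; a stand-in for one stochastic
    model generation with chain-of-thought prompting.
    """
    answers = [sample_fn(question) for _ in range(n_samples)]
    best, _count = Counter(answers).most_common(1)[0]
    return best

# Toy stand-in "model": yields a fixed sequence of sampled answers.
_samples = iter(["B", "A", "B", "B", "C", "B", "A", "B"])
result = majority_vote_answer(lambda q: next(_samples), "toy question")
# The consensus answer "B" wins over the one-shot first sample.
```

The point of such schemes is that individual samples are noisy, but the mode of many samples is far more reliable on hard questions.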

Gemini Ultra also performs well on several coding benchmarks, including HumanEval and Natural2Code. Of all the tests, Gemini trails GPT-4 only on the HellaSwag dataset.

The HellaSwag dataset is primarily used to evaluate grounded commonsense reasoning. However, an NLP researcher told Jiazi Lightyear: "This does not mean GPT-4's commonsense inference is better, because we cannot rule out that ChatGPT's model was trained on the HellaSwag dataset."

Comparison of selected test scores for Gemini and GPT-4. Source: Google

In addition, on multimodality, Gemini Ultra achieved a state-of-the-art score of 59.4% on the new MMMU benchmark, highlighting its multimodal and complex-reasoning capabilities.

On image benchmarks, Gemini Ultra outperformed previous state-of-the-art models without using OCR to extract text from the images.

Multimodal test comparison of Gemini and GPT-4V. Source: Google

Gemini 1.0 is trained to recognize and understand text, images, audio, and more simultaneously, so it can better grasp nuanced information, answer questions on complex topics, and is especially good at explaining its reasoning in difficult subjects such as mathematics and physics.

"Reasoning defects" are also a problem for the GPT series. Dr. Gary Marcus, a well-known critic of large language models, has commented: "Large language models can't do strictly defined things: follow the rules of chess, multiply five-digit numbers, make reliable inferences in a family tree, compare the weights of different objects, and so on."

Demonstration of Gemini solving a physics problem. Source: Google

Despite significant technological advances, the problem of AI-generated false or fabricated information persists. Eli Collins points out that this is still an unsolved research puzzle.

But he also stressed that Gemini has undergone Google's most comprehensive security assessment to date to ensure its reliability and safety. Google ran a series of adversarial tests on Gemini, simulating malicious users typing in a variety of prompts to detect whether the model produced hate speech or showed political bias. These tests included "real toxicity prompts", a set of more than 100,000 prompts collected from the web, to thoroughly probe the model's responses.

Google data center TPU v5p AI accelerator supercomputer. Source: Google

It is worth noting that Gemini was trained on Google's self-developed Tensor Processing Units (TPUs). The TPU v5p version in particular delivers a significant performance boost, speeding up model training by 2.8 times over the previous generation. The TPU v5p chip is reportedly designed for data-center training and running large-scale models.

Starting December 13, developers and enterprise customers can access Gemini Pro through Google AI Studio or Google Cloud Vertex AI. Google AI Studio is a free, web-based development tool that gives developers API keys to prototype and launch applications quickly. Vertex AI offers Gemini within a more comprehensive, managed AI platform with customization and full data control, plus Google Cloud's additional capabilities, including enterprise-grade security, privacy protection, data governance, and compliance.

In addition, starting with the Pixel 8 Pro, Android developers can use Gemini Nano with AICore, a new system capability in Android 14. By signing up for the AICore early-access preview, developers can explore Gemini Nano as an efficient model built for on-device tasks, tap into Gemini's advanced capabilities more easily, and open up new possibilities for app development across the Android ecosystem.

In 2024, Google plans to launch Bard Advanced, which closely resembles an early form of AI agent. Powered by Gemini Ultra, Bard Advanced can quickly understand and respond to multimodal inputs, including text, images, audio, and more.

While OpenAI's GPTs were making a splash, Google seemed too quiet.

In February, Google lost $100 billion in market value after its chatbot Bard made a factual error at an event in Paris, raising concerns about Bard's accuracy.

After OpenAI launched ChatGPT, and especially after Microsoft integrated GPT technology into Bing search and Bing's app downloads surpassed Google's for the first time, people began to wonder whether Google was falling behind its competitors in artificial intelligence.

In fact, Google was the first to propose the Transformer model, back in 2017, setting the rules of today's game.

Google's sense of competition for the large-model "high ground" came no later than OpenAI's. In 2021, Google launched the 1.6-trillion-parameter Switch Transformer, emphasizing the potential of sparse multimodal structures. At the same time, Google also proposed the FLAN-T5 model, which uses more supervised data to shrink model size, achieving better performance than GPT-3 with fewer parameters.

As for technical evaluation, The Economist ran a comparative test in January, asking ChatGPT and Bard, Google's LaMDA-based bot, questions about math, reading, and dating advice.

The test results showed that Google's AI performed better on math problems, while ChatGPT was more accurate on common-sense questions. A few days later, OpenAI upgraded ChatGPT, and in a retest it drew level with Google's AI on math. Although ChatGPT, as a large language model, is expensive to train and hard to iterate, this showed its great potential for continuous evolution. Notably, Google's language model is on par with ChatGPT in overall performance.

In this showdown, both Google and Microsoft need more cost-effective solutions. Google has made plenty of research progress in AI but has yet to deploy and monetize those results, much as Microsoft once failed to do. This may be because Google underestimated the competitive strength of Microsoft and OpenAI, or was overconfident in its dominance of search.

Synthesizing multiple viewpoints, Jiazi Lightyear concludes that Google is ahead in technological foresight, while OpenAI is more focused on product polish.

Led by Sam Altman, OpenAI focuses on product-oriented work: scaling and optimizing models, with particular attention to the details of fine-tuning methods.

Google, on the other hand, has always maintained a positive and forward-looking attitude in the direction of technology development, but it has repeatedly adjusted its overall strategic planning.

Google took a deep dive into sparse model architectures. Yet two years on, the trillion-parameter Switch Transformer has barely made a splash, while the hundred-billion-parameter GPT series has flourished. Similarly, although the repeatedly improved FLAN-T5 surpasses GPT-3 in performance, its optimization has progressed relatively slowly.

While Google agonized over its choices, OpenAI completed the training of ChatGPT.

In September 2022, Google's DeepMind launched the Sparrow model, which, like ChatGPT, adopts a reinforcement learning (RL) framework based on human feedback. The model uses a small parameter count, a marked departure from the thinking behind LaMDA and PaLM, the models Google prized. But Google did not quickly decide whether Sparrow was the best choice, which stalled its productization; in the end, Sparrow never "flew onto the branch and became a phoenix".

Hesitation seems to have been Google's chronic ailment. "Better late than never! Finally there is a strong contender for OpenAI's throne," Nvidia AI scientist Jim Fan commented after Google's announcement.

In April of this year, Google merged the Google Brain and DeepMind teams to form Google Deepmind. Some have dubbed the team the "AI Avengers". Eli Collins, the former head of AI product at Google, has been entrusted with the role of VP of Product for the new team.

Currently, Gemini Pro and Gemini Nano have been integrated into the chatbot Bard and the Pixel 8 Pro smartphone, enabling more advanced reasoning, planning, and understanding. The more powerful Gemini Ultra will be released next year.

How OpenAI will respond remains to be seen; perhaps we will have to wait for the imminent release of GPT-5 to find out.

However, the top student Google is not wholly preoccupied with this one battle; it is looking to the future.

"Our quest for answers will drive extraordinary technological advances over the next 25 years. By 2048, if a teenager somewhere in the world shrugs at everything we've built with AI, we'll know we've succeeded. Then, we'll get back to work."

Google CEO Sundar Pichai said in an open letter to Google on its 25th anniversary.

References: "Introducing Gemini: Our Largest and Most Capable AI Model", Google.

"ChatGPT is wildly popular, so why did Google lose to OpenAI on its own turf?", Interface News.

Cover image source: the movie "Superman Returns".
