Google makes its big move, launching the AI model Gemini to challenge GPT-4

Mondo Technology Updated on 2024-01-28

(This article was first published in Zijin Business Review and is reproduced with authorization from Zijin Finance; please credit the source when reposting.)

Google's new "big weapon," Gemini, is finally here!

On December 6, local time, Google announced the launch of the artificial intelligence model Gemini, which Google said was its most powerful and versatile large language model to date.

According to Google, Gemini can understand the world around us much as humans do, processing text, audio, images, and other inputs without difficulty. It can also complete complex tasks in mathematics, physics, and other scientific fields, and can understand and generate high-quality code in a variety of programming languages.

According to Google's benchmark results, Gemini showed "state-of-the-art performance" in many tests, outperforming OpenAI's GPT-4 on most benchmarks.

As soon as the news came out, social media lit up. Nvidia AI scientist Jim Fan commented that Gemini is a strong contender for the OpenAI throne.

The challenger to ChatGPT is here

In recent years, Google has made "AI first" its corporate strategy; AlphaGo, which defeated a human Go champion in 2016, came out of Google.

Since OpenAI launched ChatGPT a year ago, Google has been working hard to prove its strength in artificial intelligence by developing AI software that can compete with OpenAI's.

At its Google I/O developer conference in May this year, Google revealed for the first time that it was developing the Gemini AI model. Seven months later, Gemini has finally arrived.

According to Google's announcement, Gemini is a new large model developed by Google DeepMind, the unit that absorbed the Google Brain team. Google says it has stronger generation capabilities and higher reliability, and is the most powerful large AI model the company has built so far.

Of the versions Google released, Gemini Ultra is described as the largest and most capable model, intended for highly complex tasks; Gemini Pro is described as the best model for a wide range of tasks; and Gemini Nano is the most efficient model, designed specifically for on-device use, such as on mobile phones.

During the demonstration, the presenter showed Gemini an omelet cooking in a pan and asked whether it was done. Gemini replied, "It's not ready yet, because the egg is still runny."

After Gemini's release, attention focused on its challenge to OpenAI's GPT-4. In an interview, a reporter asked Eli Collins, vice president of product at Google DeepMind, "Can Gemini beat all the big models on the market, including GPT-4?"

In his response, Eli Collins said the team has been rigorously testing the Gemini models and evaluating their performance on a variety of tasks. From natural image and audio understanding to mathematical reasoning, Gemini Ultra exceeds the current state-of-the-art results on 30 of the 32 academic benchmarks widely used in large language model (LLM) research and development.

To show that its product is better than OpenAI's ChatGPT, Google has published several sets of benchmark scores.

According to data from Google, Gemini Ultra scored 90% on the MMLU (Massive Multitask Language Understanding) test, making it the first model to surpass human experts on MMLU. By comparison, human experts score 89.8% and GPT-4 scores 86.4%.

AI has entered the multimodal era

Today, most multimodal large models are multimodal applications built on top of a large language model (LLM), rather than models trained as multimodal from scratch. Although chat-window dialogue remains the main focus of general-purpose large models, the industry consensus is that multimodal large models are the future.

In contrast, Gemini is a truly native multimodal large model.

Multimodality was part of Gemini's design from the start. The model was pre-trained on data from different modalities from the beginning, and Google says its capabilities reach the state of the art (SOTA) in every major domain.

On this basis, Google describes Gemini as natively multimodal: it can "seamlessly" understand, operate across, and combine different types of information, and it has powerful interactive capabilities.

In terms of reasoning, Gemini 1.0 has sophisticated multimodal reasoning abilities that help it make sense of complex written and visual information. This gives it a distinctive ability to uncover knowledge that is hard to discern in massive amounts of data. Google says its ability to read, filter, and understand information, extracting insights from hundreds of thousands of documents, will help deliver new breakthroughs at digital speed in fields ranging from science to finance.

In terms of coding, Gemini 1.0 can understand, explain, and generate high-quality code in programming languages such as Python, Java, C++, and Go.

In addition, Gemini 1.0 is trained to recognize and understand text, images, audio, and more simultaneously, so it can better grasp nuanced information and answer questions on complex topics. This makes it particularly good at explaining reasoning in complex subjects such as mathematics and physics.

Gemini 1.0 reportedly comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano.

Of these, the most powerful, full-strength version, Gemini Ultra, will not reach the public for a few more months. According to Google, the Ultra version is currently available only to select customers, developers, partners, and safety and responsibility experts.
