A few months after OpenAI's high-profile release of GPT-4, rumors began circulating in the industry that Google's AI lab DeepMind could no longer sit still and was finally ready to unveil Gemini, a model it had long kept under wraps.
On December 6, local time, Google held its Gemini launch event, the company's highest-profile AI announcement to date, formally declaring war on rivals OpenAI and Microsoft in the race for AI supremacy.
Google pitches Gemini as best-in-class at everything; as one observer put it, it is an "everything machine."
Google also presents the model as fundamentally more capable. Sundar Pichai, CEO of Google and its parent company Alphabet, told MIT Technology Review: "It's a platform. AI is a profound platform shift, bigger than web or mobile. So it represents a big step forward for us."
Pichai, who previously ran Chrome and Android, is known for his intense focus on products. In his first founders' letter as CEO, in 2016, he wrote: "We will move from mobile first to an AI first world." In the years since, Pichai has woven AI deeply into all of Google's products, from Android devices all the way to the cloud.
This is a big step for Google, but it doesn't seem to be a huge leap for the field as a whole.
According to Google DeepMind, Gemini outperforms GPT-4 on 30 of 32 standard benchmarks, though the margins are small. Google DeepMind has packed today's best capabilities into one powerful package, and judging by the demos, it does a lot of things well, but few of them are things we haven't seen before.
What we haven't seen in this space before, though, is hype on this scale. After months of hearing everywhere that "something big is about to happen in AI," Gemini may be a sign that we have reached peak AI hype. At least for now.
Chirag Shah, a professor at the University of Washington who specializes in search, likened the launch to Apple's annual iPhone event. "It didn't impress us much, because we've seen so much lately," he said.
Like GPT-4, Gemini is multimodal, meaning it is trained to handle several kinds of input: text, images, and audio. It can combine these formats to answer questions on everything from household chores to college math to economics.
In a demonstration at yesterday's event, Google showed Gemini an existing chart; after analyzing hundreds of pages of research containing new data, the model updated the chart with the new information. In another example, Gemini was shown images of an omelette in a pan and asked whether it was cooked. "It's not ready yet because the eggs are still runny," it replied.
For now, however, most people cannot try Gemini in full. Today's release puts it behind Bard, Google's text-based chatbot, giving Bard more advanced reasoning, planning, and comprehension capabilities.
Over the coming months, the Gemini-powered Bard will initially be available in English in more than 170 countries, excluding the European Union and the United Kingdom. Sissie Hsiao, Google's vice president for Bard, said this was because the company needed to coordinate with local regulators.
Gemini comes in three sizes: Ultra, Pro, and Nano. Nano runs directly on devices such as Google's Pixel phones. Starting December 13, developers and enterprise customers will have access to Gemini Pro, which can run on limited computing resources. The most powerful model, Gemini Ultra, is the full-power version; Google executives told reporters at the event that it will be available "early next year," after "extensive trust and safety checks."
Pichai said Gemini reflects the progress Google DeepMind has made in AI: "I think this is the Gemini era of AI models, and it represents the frontier of our progress in AI."
Competing with GPT
OpenAI's most powerful model, GPT-4, is regarded as the industry benchmark. Google boasts that Gemini outperforms OpenAI's previous model, GPT-3.5, but company executives dodged questions about how far the model exceeds GPT-4.
But the company specifically highlighted a benchmark called MMLU (Massive Multitask Language Understanding), a set of multiple-choice tests covering dozens of subjects, including reading comprehension, college math, physics, economics, and the social sciences.
On MMLU's text-only questions, Pichai said, Gemini scored 90%, while human experts score about 89% and GPT-4 scored 86%. On multimodal questions (measured by a separate benchmark, MMMU), Gemini scored 59%, versus 57% for GPT-4. Pichai implied that Gemini had already pulled ahead of GPT-4.
Gemini is clearly a highly sophisticated AI system. Melanie Mitchell, an AI researcher at the Santa Fe Institute in New Mexico, said Google's benchmark results were very impressive, "but in my opinion, Gemini is not significantly more capable than GPT-4."
Percy Liang, director of Stanford University's Center for Research on Foundation Models, said that although the model posts good benchmark scores, it is hard to know how to interpret the numbers because we don't know what is in the training data.
Using feedback from human testers, Google DeepMind trained Gemini to answer factual questions more accurately, give attributions when asked, and hedge rather than produce nonsense when faced with questions it cannot answer. Emily Bender, a professor of computational linguistics at the University of Washington, questioned Google's pitch of an "everything machine," saying the company was using narrow benchmarks to evaluate models expected to serve a wide range of purposes, "which means they can't be evaluated effectively and thoroughly."
Where does AI stand?
In March this year, OpenAI released GPT-4 and made it available through ChatGPT Plus, the paid tier of its chatbot, piling pressure on Google.
Google had invested heavily in AI research for years and, with its unique data resources, was far ahead of the industry. It did not expect to be overtaken by OpenAI.
In April, under intense pressure from investors to prove it was no worse than OpenAI, Google announced that it would merge Google Brain, its AI research team, with DeepMind, parent company Alphabet's London-based AI research lab, and that it would release the Gemini model later this year to answer the challenge of GPT-4.
Google executives have said the company hesitated to release a tool the public could use because it did not want to mislead people (in fact, the technology was not yet smart enough for general use) and because of safety concerns (the same concerns that sparked the recent boardroom power struggle at OpenAI).
"Google has been very cautious about releasing these things to the public," Geoffrey Hinton told MIT Technology Review when he left the company in April. "There are so many bad things that could happen, and Google didn't want to ruin its reputation." Faced with technology that seemed untrustworthy or unmarketable, Google played it safe, until the bigger risk became missing out.
Google has been burned before, and launching flawed products can backfire on the company's growth. When it rushed out Bard in February to counter Microsoft's ChatGPT-powered rival Bing, a single small error by Bard wiped $100 billion off Alphabet's market value.
In May, Google announced that it would embed generative AI in most of its products, such as Gmail and other software, but the results failed to impress critics; the chatbot, for example, would confidently report emails that did not actually exist.
Large language models share this problem. Although generative AI systems seem able to converse like humans and write text that reads very much like a human's, what they are best at is making things up. And that is not their only problem today: they are also easily a**, riddled with bias, and when the public uses them, it is polluted by them as well.
Google has solved neither these problems nor the hallucination problem.
AI hallucination is another obstacle to the development of AI. The term refers to the common tendency of large language models (LLMs) around the world, ChatGPT included, to confidently fabricate facts and weave them together with real information into coherent, consistent multi-paragraph text.
Gemini's answer to the latter problem is a tool that lets people double-check the chatbot's answers using Google Search, but that depends on the accuracy of the search results themselves.
Gemini could be the pinnacle of this wave of generative AI. But it's unclear where AI based on large language models will go next. Some researchers believe that the next peak may not necessarily start with these models, but may appear elsewhere.
Pichai, naturally, disagrees. "As we teach these models to reason more, there will be more and more breakthroughs," he said. On this point, he and OpenAI CEO Sam Altman are clearly on the same page.