Rumors have circulated for a long time that Google was working on a new large language model. Those rumors were finally confirmed when Google DeepMind officially announced its own large model, Gemini, positioned as a direct rival to OpenAI's GPT-4.
Gemini is one of Google's biggest AI advances to date, aimed at competing with rivals OpenAI and Microsoft for leadership in the AI space. The model is being advertised as best in class across a range of capabilities, and some have even called it a jack-of-all-trades machine.
(Image: Google)
Sundar Pichai, CEO of Google and its parent company Alphabet, told MIT Technology Review: "This model is inherently more capable. It's a platform. AI is a profound platform shift, bigger than anything the web or mobile has brought about. So it represents a big step forward for us."
Judging from the demos, it does many things well, but few of them are things we haven't seen before. Gemini is multimodal, meaning it was trained to handle multiple kinds of input: text, images, and audio. It can combine these formats to answer questions about everything from household chores to college math to economics.
In a presentation to reporters at the launch, Google demonstrated Gemini's ability to take a screenshot of an existing chart, analyze hundreds of pages of new research, and then update the chart with the new information.
In another example, Gemini was shown images of an omelet cooking in a pan and asked by voice, "Is the omelet cooked?" Gemini replied, "It's not done yet, because the yolk hasn't set."
However, most people will have to wait a while for the full experience. For now, the launch makes Gemini the backend of Google's chatbot Bard, which the company says will gain more advanced reasoning, planning, and comprehension capabilities as a result.
Several versions of Gemini will become available in the coming months. The new Gemini-enhanced Bard will initially be available in English in more than 170 countries and territories, excluding the European Union and the United Kingdom; Sissie Hsiao, Google's vice president in charge of Bard, said this was so the company could "engage" with local regulators.
Gemini comes in three sizes: Ultra, Pro, and Nano. Ultra is the highest-performance version, while Pro and Nano are tailored for applications that run with limited computing resources. Nano is designed to run on mobile devices, such as Google's Pixel phones.
Starting December 13, 2023, developers and businesses will have access to Gemini Pro. Google executives told reporters at a press conference that the most powerful model, Gemini Ultra, will go live "early next year" after "extensive trust and safety checks."
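For developers who get that access, Google exposes Gemini Pro through its generative AI tooling. As a rough illustration (not drawn from this article), here is a minimal sketch of what calling the model from Python could look like with Google's google-generativeai SDK; the placeholder API key and the exact model name are assumptions and may differ from your setup.

```python
# Minimal sketch: calling Gemini Pro via the google-generativeai SDK.
# Assumptions: the "gemini-pro" model name and a placeholder API key;
# consult Google's documentation for the exact, current interface.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Explain, in two sentences, what a multimodal model is."
)
print(response.text)
```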
Pichai told us: "I think of this as the Gemini era of models. It's how Google DeepMind is going to build and make progress in artificial intelligence. So it will always represent the frontier of where we are advancing AI technology."
OpenAI's most powerful model, GPT-4, is regarded as the industry's gold standard. While Google says Gemini outperforms OpenAI's previous model, GPT-3.5, company executives dodged questions about how far it exceeds GPT-4.
But the company specifically emphasized a benchmark called MMLU (Massive Multitask Language Understanding). These tests are designed to measure a model's performance on tasks involving text and images, including reading comprehension, college math, and multiple-choice tests in physics, economics, and the social sciences.
On the text-only questions, Pichai said, Gemini scored 90 percent, while human experts score roughly 89 percent and GPT-4 scores 86 percent. On the multimodal questions, Gemini scored 59 percent, versus 57 percent for GPT-4. "It's the first model to cross that threshold," Pichai said.
Melanie Mitchell, an AI researcher at the Santa Fe Institute in New Mexico, said Gemini's performance on the benchmark dataset was very impressive.
"It's clear that Gemini is a very sophisticated AI system," Mitchell said. But for me, Gemini, while stronger than GPT-4, is not obvious. ”
Percy Liang, director of the Center for Research on Foundation Models at Stanford University, said that while the model performs well on benchmark datasets, it is hard to know how to interpret the numbers because we don't know what is in the training data.
Mitchell also noted that Gemini performs much better on language benchmarks than on those involving images and video. "Multimodal foundation models still have a long way to go before they can be broadly useful across many tasks," she said.
Using feedback from human testers, Google DeepMind trained Gemini to answer more factually, to give attribution when asked, and to hedge when faced with questions it cannot answer, rather than produce gibberish.
The company claims this alleviates the problem of hallucination. But without a radical overhaul of the underlying technology, large language models will continue to make things up.
"Google is promoting Gemini as a universal machine, a general-purpose model that can be used in many different ways," said Emily M. Bender, a professor of computational linguistics at the University of Washington.
But the company is using narrow benchmarks to evaluate a model meant for all those different purposes. "That means it can't be evaluated effectively and thoroughly," she said.
Gemini has been a long time in the making. In April 2023, Google announced it was merging its AI research arm, Google Brain, with DeepMind, its AI research lab.

The combined team then spent nearly a year developing Gemini as Google's answer to OpenAI's state-of-the-art large language model, GPT-4, which debuted in March 2023 and underpins the paid Plus version of ChatGPT.
Google has been under tremendous pressure to prove to investors that it can match and surpass its competitors in the field of artificial intelligence.
Although the company has been developing and using powerful AI models for years, it had hesitated to release similar tools to the public out of concern for its reputation and for safety.
Google has been very cautious about releasing these things to the public. "There were so many bad things that could happen that Google didn't want to ruin its reputation," Turing Award winner Geoffrey Hinton told MIT Technology Review when he left Google in April 2023. Google has always trodden carefully around technology that seemed untrustworthy, until the greater risk became missing out.
Google has already learned the hard way that launching a flawed product can backfire. When the company unveiled its ChatGPT competitor, Bard, in February 2023, scientists quickly spotted a factual error in the chatbot's own marketing material; the incident subsequently wiped about $100 billion off the company's market value.
In May 2023, Google announced it would roll out generative AI across most of its products, from email to productivity software. But the criticism didn't stop there: the chatbot would, for example, reference emails that didn't exist.
This problem is prevalent in large language models. Although generative AI systems are very good at producing text that looks as if a human wrote it, they often make things up.
That's not their only problem. They are also easy to "jailbreak" and riddled with bias, and the content they generate can pollute the web with machine-written text.
Gemini could be the pinnacle of this wave of generative AI. But it's unclear where AI based on large language models will go next. Some researchers believe progress may be about to plateau.
Pichai disagreed. "Looking to the future, we see a lot of room ahead," he said. "I think multimodality will be significant. As we teach these models to reason more, there will be bigger and bigger breakthroughs. Deeper breakthroughs are yet to come. When I look at the big picture, I really feel like we're just getting started."
About the author: Will Douglas Heaven is a senior editor for AI at MIT Technology Review, where he covers new research, emerging trends, and the people behind them. Previously, he was founding editor of the BBC's technology-and-geopolitics site Future Now and chief technology editor at New Scientist magazine. He holds a PhD in computer science from Imperial College London and knows what it's like to work with robots.
About the author: Melissa Heikkilä is a senior reporter at MIT Technology Review, where she covers artificial intelligence and how it is changing our society. Previously, she wrote about AI policy and politics at Politico. She has also worked at The Economist and began her career as a news anchor.
Matt Honan contributed reporting to this article.