Google DeepMind has released a ChatGPT competitor called Gemini. (Bloomberg/Contributor via Getty Images)
Google DeepMind has released ChatGPT competitor Gemini, which can understand and generate many types of content, including images, audio, and text.
Most artificial intelligence (AI) tools can understand and generate only one type of content. For example, OpenAI's ChatGPT "reads" and creates only text. But Gemini can generate many types of output based on any form of input, Google said in a blog post.
Gemini 1.0 comes in three versions: Gemini Ultra, the largest; Gemini Pro, which is being rolled out across Google's digital services; and Gemini Nano, which is intended for devices such as smartphones.
According to DeepMind's technical report on the chatbot, Gemini Ultra beat GPT-4 and other leading AI models on 30 of the 32 key academic benchmarks used in AI R&D. These include high school exams as well as ethics and law exams.
Specifically, Gemini won in 9 image comprehension benchmarks, 6 video comprehension tests, 5 speech recognition and translation tests, and 10 out of 12 text and reasoning benchmarks. According to the report, the two benchmarks in which Gemini Ultra failed to beat GPT-4 involved common-sense reasoning.
Building models that handle multiple forms of media is difficult because bias in the training data can be amplified, performance tends to degrade significantly, and models tend to overfit – meaning they perform well when tested against the training data, but fail when exposed to new inputs.
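The overfitting failure mode described above can be illustrated with a deliberately extreme toy example (not from the report): a "model" that simply memorizes its training examples scores perfectly on the data it has seen but has nothing to say about a new input.

```python
# Toy illustration of overfitting: a "model" that memorizes its training
# set (labeling small numbers as even or odd) instead of learning a rule.
train = {2: "even", 3: "odd", 4: "even", 7: "odd"}

def memorizing_model(x):
    # Perfect recall on training inputs, zero generalization to new ones.
    return train.get(x, "unknown")

# Accuracy on the training data is perfect...
train_accuracy = sum(memorizing_model(x) == y for x, y in train.items()) / len(train)
print(train_accuracy)        # 1.0

# ...but the model cannot handle an input it has never seen.
print(memorizing_model(10))  # "unknown"
```

A real overfitted network fails more subtly, by returning confident wrong answers rather than "unknown", but the underlying problem is the same: performance measured on the training data overstates performance on new data.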
Multimodal training also typically involves training different components of the model individually, each on one type of medium, and then stitching those components together. Gemini, by contrast, was trained on text, images, audio, and video simultaneously. The scientists drew this data from web documents, books, and code.
The scientists trained Gemini by curating the training data and incorporating human supervision into the feedback process.
The team deployed servers across multiple data centers, a much larger deployment than in previous AI training efforts, and relied on thousands of Google's AI accelerator chips, called tensor processing units (TPUs).
Google built these chips specifically to speed up model training, and before training its system, DeepMind packaged them into clusters of 4,096 chips called "superpods." According to the technical report, the reconfigured infrastructure and approach paid off: goodput – the amount of genuinely useful data moving through the system, as opposed to throughput, which counts all data – rose from 85% in previous training efforts to 97%.
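Those goodput figures translate into a concrete efficiency gain. A minimal sketch of the arithmetic (the function name and the 1,000-unit workload are illustrative, not from the report):

```python
# Goodput vs. throughput: of everything moved through the system, only
# the goodput fraction is genuinely useful work.
def useful_units(total_units, goodput):
    """Useful units of work out of total_units moved, at a given goodput."""
    return total_units * goodput

before = useful_units(1000, 0.85)  # previous efforts: 850 of 1000 units useful
after = useful_units(1000, 0.97)   # Gemini's training: 970 of 1000 units useful

# Relative gain in useful work for the same amount of data moved.
improvement = after / before
print(round(improvement, 2))  # 1.14
```

In other words, at the same raw throughput, the reported jump from 85% to 97% goodput yields roughly 14% more useful training work.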
DeepMind's scientists envision the technology being used in scenarios such as a person filming a meal being prepared in real time, with Gemini responding with instructions for the next step.
That said, the scientists admit that hallucinations – a phenomenon in which AI models return false information with the utmost confidence – remain a problem for Gemini. Hallucinations are often caused by limitations or biases in the training data and are difficult to eliminate.