What exactly is Google's Gemini? Has it surpassed ChatGPT?

Mondo Entertainment Updated on 2024-01-31

In a previous article, we covered this month's release of Google Gemini: a significant upgrade in capabilities, in hardware and software applications, and in its potential impact on the future of AI. However, there are also shadows hidden under Google's aura. This second article delves into the wave of negative press since Gemini's launch. We will examine the controversy over the allegedly faked promotional video, employees' claims that its performance falls short of GPT-3.5, suspicions that it was trained on output from Wenxin Yiyan (Baidu's ERNIE Bot), and CMU's evaluation finding its performance inferior to GPT-3.5. This negative news has had a significant impact on both Google and its DeepMind unit.

On December 6, Google ushered generative AI into a new era of native multimodality with the release of the highly anticipated Gemini. However, it wasn't all smooth sailing for the tech event. In particular, Google's demo video "Hands-on with Gemini: Interacting with Multimodal AI" sparked widespread controversy, with the video, viewed 2.6 million times, accused of containing misleading content.

In this video, Google shows how Gemini can flexibly respond to a variety of inputs, including language and visual comprehension. But disturbingly, the disclaimer in the video description reveals a key fact: for demonstration purposes, latency was reduced and Gemini's outputs were shortened. While this editing makes the presentation more concise, it also misleads viewers into believing that Gemini can understand and respond instantly and accurately.

For example, in one segment, a paper ball is shuffled between cups. In the video, Gemini appears to detect and track the ball directly in real time.

But in fact, each movement of the cups required a separate interaction with the model, with prompts guiding Gemini step by step.

In another segment, a demo of a hand-gesture game is particularly eye-catching. In the video, Gemini seems to recognize rock, paper, and scissors gestures directly.

However, Google's blog revealed the actual process: Gemini had to be given images of all three gestures together with a corresponding prompt before it could make a correct judgment.

In addition, there is a demonstration of ordering the planets of the solar system. In the video, it seems that Gemini answers correctly when asked only "Is this the right order?"

In practice, however, the user had to provide a detailed prompt, including telling Gemini to consider each planet's distance from the sun and asking it to explain its reasoning.

These examples show that the "intuitive" interactions shown in the video did not actually happen as presented, but were assembled in post-production. The discovery raises concerns that Google may be exaggerating Gemini's capabilities, with critics pointing out that Google was too eager to show it outperforming its competitor, GPT-4. It also made Gemini an object of ridicule among netizens during competitors' subsequent launch events.

Despite high expectations, Google's Gemini has sparked quite a bit of controversy even among its own employees. Bloomberg reports that although Google demonstrated Gemini's multimodal capabilities, especially the real-time analysis of and response to a drawing of a duck, internal employees have questioned it.

Some Google employees say that Gemini's actual responsiveness and intelligence fall short of what was advertised in the video. They argue that the demo glorifies Gemini's capabilities too much, making it look effortless for Gemini to produce high-quality results.

Eli Collins, VP of Product at Google DeepMind, later responded that the duck-drawing demonstration is still in the research phase and is not an actual product. This statement shows that despite Gemini's technological breakthroughs, its actual productization will take time.

In addition, comparisons between Gemini and other leading models, especially OpenAI's GPT-3.5, are a concern. The report notes that although Gemini Ultra scored a whopping 90% under Google's self-designed test methodology, it scored 83.7% on the industry-standard 5-shot MMLU test, lower than GPT-4's 86.4%.

All of this adds up to a bit of uncertainty about Gemini's future.

On December 18, claims that Google's Gemini had used Wenxin Yiyan when training on its Chinese corpus quickly sparked heated discussion online. Weibo influencer "Yan Xi" ran a hands-on test and found that in conversation, Gemini would directly admit that it used Wenxin Yiyan, and would falsely claim that its founder was Robin Li.

The issue appeared when using gemini-pro for Chinese conversations on Google's Vertex AI platform, but could not be reproduced on the Bard platform. Notably, Gemini behaves normally when conversing in English.

Google has since fixed these bugs in the API and does not expect similar issues to recur. However, the incident highlights the importance of data provenance in AI training. Previously, in March this year, it was revealed that part of the training data for Google's Bard came from ChatGPT, which led to BERT lead author Jacob Devlin leaving Google to join OpenAI.

Researchers at Carnegie Mellon University and BerriAI conducted a comprehensive evaluation of Google's Gemini Pro and found its performance inferior to GPT-3.5 Turbo in several areas. The research team used ten datasets, including MMLU, BIG-Bench Hard, GSM8K, FLORES, HumanEval, and WebArena, to compare the text comprehension and generation capabilities of Gemini Pro, GPT-3.5 Turbo, GPT-4 Turbo, and Mixtral. The tests covered knowledge-based question answering, reasoning, math problem solving, translation, code generation, and the ability to follow instructions as an agent.

In knowledge-based Q&A, Gemini Pro's overall accuracy on the MMLU test slightly lagged GPT-3.5 Turbo under both 5-shot and chain-of-thought prompting. The study also noted little difference in performance from chain-of-thought prompts, possibly because MMLU tasks are primarily knowledge-based question answering and do not benefit much from stronger reasoning-oriented prompts. Of the 57 MMLU subtasks, Gemini Pro outperformed GPT-3.5 Turbo on only two.
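The 5-shot protocol mentioned above simply prepends five solved exemplars to each multiple-choice test question before asking the model to answer. A minimal illustrative sketch of how such a prompt is assembled (the exemplar and question below are made up for illustration, not drawn from the benchmark):

```python
# Sketch of 5-shot prompt construction in the MMLU style: k solved
# exemplars are placed before the test question, and the model is
# expected to continue with a single answer letter.
EXEMPLARS = [
    ("What is 2 + 2?", ["3", "4", "5", "6"], "B"),
] * 5  # placeholder: the real protocol uses five distinct dev-set items

def format_item(question, choices, answer=None):
    # Render one question with lettered options; append the answer
    # letter for exemplars, leave it blank for the test question.
    opts = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", choices))
    return f"{question}\n{opts}\nAnswer:" + (f" {answer}" if answer else "")

def build_5shot_prompt(question, choices):
    shots = [format_item(q, c, a) for q, c, a in EXEMPLARS]
    return "\n\n".join(shots + [format_item(question, choices)])

prompt = build_5shot_prompt("Which planet is closest to the sun?",
                            ["Venus", "Mercury", "Earth", "Mars"])
```

Accuracy is then simply the fraction of questions where the model's continuation matches the gold letter, which is why a knowledge-heavy benchmark like MMLU gains little from reasoning-oriented prompt variants.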

In general reasoning, Gemini Pro also performs slightly worse than GPT-3.5 Turbo on the BIG-Bench Hard dataset, and far worse than GPT-4 Turbo. The study found that Gemini Pro struggles with longer, more complex questions, while the GPT models show greater robustness in this regard. GPT-4 Turbo in particular shows very limited performance degradation even on very long questions.

On the math problem tests, across four mathematical reasoning datasets (GSM8K, SVAMP, ASDIV, and MAWPS), Gemini Pro's overall performance is slightly below GPT-3.5 Turbo and far below GPT-4 Turbo. Although Gemini Pro achieves over 90% accuracy on MAWPS, it still trails the GPT models. The analysis shows that Gemini Pro performs poorly on tasks involving complex language patterns.
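Benchmarks like GSM8K are typically scored by pulling the final number out of the model's free-form answer and comparing it to the gold answer. A simplified sketch of that scoring heuristic (the regex and example answers are illustrative assumptions, not the paper's exact harness):

```python
import re

def extract_final_number(answer: str):
    """Return the last number in a model's free-form answer, a common
    (simplified) scoring heuristic for GSM8K-style math benchmarks."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", answer.replace(",", ""))
    return nums[-1] if nums else None

def accuracy(predictions, golds):
    # Exact match on the extracted final number.
    hits = sum(extract_final_number(p) == g for p, g in zip(predictions, golds))
    return hits / len(golds)

preds = ["So the answer is 42.", "She has 7 apples left"]
print(accuracy(preds, ["42", "8"]))  # → 0.5
```

Because scoring keys on the final number, long or convoluted wording in the model's reasoning chain can derail the answer itself, which is consistent with the finding that Gemini Pro degrades on complex language patterns.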

On code generation, Gemini Pro was tested on the HumanEval and ODEX datasets. On both datasets, Gemini Pro's pass@1 scores are lower than those of GPT-3.5 Turbo and GPT-4 Turbo; on ODEX, GPT-3.5's score is even higher than GPT-4's. The study also found that Gemini Pro does better on shorter tasks, but falls behind when solving longer and more complex problems.
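The pass@1 metric is the probability that a single generated program passes all of a problem's unit tests. More generally, the HumanEval paper estimates pass@k with an unbiased combinatorial formula over n samples per problem, of which c pass. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper: given n
    generated samples per problem of which c pass the unit tests,
    estimate P(at least one of k drawn samples passes)."""
    if n - c < k:
        return 1.0  # too few failing samples to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single sample per problem, pass@1 is just the pass rate.
print(pass_at_k(n=1, c=1, k=1))           # → 1.0
print(round(pass_at_k(n=10, c=3, k=1), 6))  # → 0.3
```

The benchmark score is this quantity averaged over all problems, so a lower pass@1 means fewer problems are solved correctly on the first attempt.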

For machine translation, the FLORES-200 dataset was used to evaluate Gemini Pro. The results show that Gemini Pro generally outperforms the other models on translation tasks across multiple language pairs, but it has a tendency to block responses for some language pairs.

Finally, the researchers evaluated Gemini Pro's ability to navigate the web as an agent in the WebArena environment. The results show that Gemini Pro's overall success rate is comparable to GPT-3.5 Turbo's, but slightly lower.

This study is the first comprehensive, objective evaluation of Google's Gemini Pro, comparing it against OpenAI's GPT-3.5 and GPT-4 models and the open-source Mixtral model. The results show that while Gemini Pro approaches GPT-3.5 Turbo in accuracy, it still falls slightly short on most tasks, let alone against GPT-4. By comparison, however, Gemini Pro outperforms the open-source Mixtral model.

Since its release, Google Gemini has faced a wave of negative feedback despite showing a series of innovations and technological breakthroughs. From the alleged promotional fraud and internal staff questioning its capabilities, to the controversy over apparent training on Wenxin Yiyan, to Carnegie Mellon University's comprehensive evaluation showing it inferior to GPT-3.5, Gemini's journey has been full of challenges. Now we sit back and wait for third-party evaluations of Gemini Ultra.

Today is the last day of 2023. I wish all friends who follow the development of artificial intelligence a happy new year, and I look forward to witnessing more technological innovation and breakthroughs in the year ahead.

If you found this article helpful, please like, bookmark, and share it. And do follow me for more of the latest AI news and insights!
