Last night, Google released the open-source model Gemma without warning: lightweight, high-performance, and aimed squarely at Meta's Llama. The release of Gemma marks a shift in Google's large-model strategy: from betting everything on closed source to catch up with OpenAI, to returning to the open-source arena and pursuing open source and closed source in parallel.
Obviously, Google's shift was forced. Since making up its mind to go closed source last year, Google has clearly underestimated the technical difficulty of catching up with OpenAI and has been continually suppressed, with little power to fight back. Even Gemini 1.5, the recently released blockbuster, had its thunder stolen by Sora.
However, compared with the closed-source battlefield, Google faces plenty of challenges in open source as well. Google has a clear technical advantage and ample experience building open source communities, but with Meta, Mistral, and other players gradually dominating the open-source market, Google, which has lost the first-mover advantage, must invest far more resources if it wants to catch up.
Looking back at the history of technological competition, the advent of every new era has meant the decline of the previous era's hegemon. Will Google be exempt from this fate? From this point of view, open source is Google's "Battle of Stalingrad" on the artificial intelligence battlefield.
01 The most powerful open source model is here!
Gemma, which means "gem" in Latin, was developed by Google DeepMind in collaboration with other Google teams, using the same research and technology as Gemini.
Gemma comes in two sizes, 2 billion and 7 billion parameters, and each size has two versions: pre-trained and instruction fine-tuned. Built on Gemini's technology, Gemma utterly outclasses existing open-source large models, beating the mainstream open-source models Llama 2 and Mistral on average scores across 18 benchmarks, standing out especially in mathematics and coding ability.
Among them, the Gemma-7B model outperformed Llama 2 7B and 13B in 8 benchmarks covering general language understanding, reasoning, math, and coding. In terms of safety, both the instruction fine-tuned Gemma-2B IT and Gemma-7B IT models surpassed the Mistral-7B v0.2 model in human preference evaluations.
However, unlike the multimodal Gemini, Gemma is not multimodal, nor is it trained for multilingual tasks. But according to Google's technical report, Gemma's tokenizer has a vocabulary of 256k tokens.
How should we understand this? The larger the vocabulary, the better the model understands complex sentences and unfamiliar words, and the faster it can pick up other languages. A 256k-entry vocabulary means Gemma can learn to use other languages very quickly.
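To make that number concrete, the vocabulary can be inspected directly. Below is a minimal sketch using the Hugging Face transformers library; it assumes you have accepted Gemma's license and authenticated against the gated google/gemma-7b repository:

```python
# Minimal sketch: inspecting Gemma's tokenizer vocabulary with the
# Hugging Face transformers library. Assumes the gated google/gemma-7b
# repo is accessible (accept the license, then `huggingface-cli login`).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")

# The reported vocabulary size: roughly 256k entries.
print(f"vocab size: {tokenizer.vocab_size}")

# A larger vocabulary splits rare or non-English words into fewer pieces,
# which is one reason a model can pick up other languages more quickly.
for text in ["internationalization", "Straßenbahnhaltestelle"]:
    print(text, "->", tokenizer.tokenize(text))
```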
In addition to the model itself, another noteworthy point is that Gemma was designed and trained from the ground up with a strong focus on safety, which makes it well suited to on-premise deployment. For example, Google used its Cloud Data Loss Prevention (DLP) tool to automatically filter private information and sensitive data out of the training set. The tool assigns one of three severity levels based on the category of private data, such as names and email addresses. According to Google's technical report, the most sensitive information is almost never memorized by the model, while potentially private data is partially retained.
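Google has not published its exact filtering pipeline, so the following is only an illustrative sketch of what a DLP-based scrub might look like, built on the public google-cloud-dlp Python client; the project ID, infoTypes, and likelihood threshold here are assumptions, not Google's actual configuration:

```python
# Illustrative sketch only: Google's real training-set filtering is not
# public. This shows the general shape of a DLP-based scrub using the
# public google-cloud-dlp client; infoTypes and thresholds are assumed.
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/YOUR_PROJECT/locations/global"  # hypothetical project ID

inspect_config = {
    # Categories resembling the "names and email addresses" in the report.
    "info_types": [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}],
    "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
}

def contains_sensitive_data(text: str) -> bool:
    """Return True if DLP flags any private data in `text`."""
    response = client.inspect_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "item": {"value": text},
        }
    )
    return len(response.result.findings) > 0

# A training document would be dropped (or redacted) when flagged:
sample = "Contact Jane Doe at jane.doe@example.com"
print(contains_sensitive_data(sample))
```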
Upon release, Gemma also went live on Hugging Face and HuggingChat, where users can try it out directly. Within hours of the release, many users had already shared their experiences, some quite enthusiastic: user @indigo11 on the social platform X called it "fast" with "stable output".
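For those who would rather run the model locally than through the hosted demos, here is a minimal generation sketch with transformers, assuming access to the gated instruction-tuned google/gemma-2b-it checkpoint and a CUDA GPU:

```python
# Minimal local trial of instruction-tuned Gemma, as an alternative to the
# hosted HuggingChat demo. Assumes the gated checkpoint is accessible and
# the accelerate package is installed for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Gemma's chat format is applied via the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain tokenizers in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```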
02 A cornered Google is under enormous pressure
Counting this Gemma release, Google has made three big moves in just one month.
On February 9, Google announced that its most powerful model, Gemini Ultra, was free to use. When Gemini Ultra was released in December 2023, it surpassed human experts on the MMLU (Massive Multitask Language Understanding) test and achieved SOTA (state-of-the-art) results on 30 of 32 multimodal benchmarks, surpassing GPT-4 in almost every respect.
On February 16, the seventh day of the Lunar New Year, Google released its blockbuster Gemini 1.5, expanding the context window to 1 million tokens. Gemini 1.5 Pro can process 1 hour of video, 11 hours of audio, a codebase of more than 30,000 lines of code, or more than 700,000 words in one go, throwing down the gauntlet to the yet-to-be-released GPT-5.
Despite Google's flurry of moves, the limelight was stolen by OpenAI's text-to-video model Sora. And the reason Google suddenly released an open-source model without warning this time is precisely that it does not want to repeat that mistake; after all, Meta is reportedly set to release an upgraded version of Llama within the week.
On the surface, Google's preemptive release of an open-source model is meant to salvage its recent "decline". The deeper reason is that Google wants to break out of its long suppression by OpenAI and explore more possibilities for overtaking on the curve.
An extremely cruel fact: since the release of ChatGPT at the end of 2022, Google, once the leader in the AI field, has been suppressed by OpenAI to the point of having no way to fight back.
In February last year, as OpenAI's ChatGPT swept the world, Google hastily launched its chatbot Bard, but the product fell short of expectations. First, a factual error in the launch demo wiped $100 billion off the market value of Google's parent company overnight; then Bard's performance failed to attract enough users. According to SimilarWeb, Bard drew only 220 million visits, just 1/8 of ChatGPT's.
On December 7 last year, Google released its most powerful model, Gemini, which, despite its impressive results, failed to truly surprise the market. On January 31, 2024, Google's latest earnings report showed outstanding revenue, yet its market value still evaporated by more than $100 billion overnight because its AI progress fell short of expectations.
With the release of Sora, more and more people have become aware of a problem: driven by the Scaling Law, the advantage of OpenAI's closed-source models keeps growing. In other words, this announced foray into open source looks more like a forced move by Google in the AI race.
On the one hand, Google's entry into open-source models comes half a year later than Meta's mid-2023 entry, which means it must spend several times the effort on differentiating its models and promoting them before it can stand out among the many open-source models. On the other hand, judging from what has been disclosed so far, the open-source model Google has launched does not exceed expectations by much compared with its open-source rivals.
But even so, the foray into open-source models still makes a great deal of sense for Google. Suppressed by OpenAI, Google desperately needs a win to turn things around, and the open-source model could become Google's Battle of Stalingrad on the AI battlefield.
03 Google returns to open source
Historically, open source has been no stranger to Google; for quite some time, Google was one of the strongest supporters of open-source technology. Transformers, TensorFlow, BERT, T5, JAX, AlphaFold, and AlphaCode are all innovations Google has contributed to the open source community.
In November 2015, Google unveiled TensorFlow, which became one of the most popular open-source deep learning frameworks. Anyone with a computer, an internet connection, and a little knowledge of deep learning algorithms could use one of the most powerful machine learning platforms ever created. Since 2015, thousands of open-source contributors, developers, community organizers, and researchers have invested in this library.
In 2018, Google announced it would open source BERT, a neural network-based pre-training technique for natural language processing whose use is not limited to search algorithms; anyone can apply BERT in other kinds of question-answering systems. Not to mention that in the mobile Internet era, Google built an open Android ecosystem to rival Apple's.
Unlike OpenAI, Google is not betting on the brute-force Scaling Law as its only path. Open-sourcing Gemma not only signals Google's intent to reshape its influence in the AI community, but also represents a shift in its large-model strategy: open source and closed source in parallel.
In the current AI industry, developing an open-source model is indeed a sound choice for Google.
On the one hand, competition among open-source models is relatively light compared with closed source; the main rival is Meta's Llama. With the technical strength honed on its closed-source models and its experience building open source communities, Google has a real chance of dominating the field.
On the other hand, in the story of AI deployment, open-source models still have plenty of potential, because their cost advantage makes real-world adoption much easier. In many scenarios, using GPT is like delivering takeout in a Lamborghini: the cost is simply too high. One AI researcher has calculated that GPT-3.5's API costs almost 3-4 times as much as the inference cost of the open-source Llama 2, to say nothing of GPT-4.
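The arithmetic behind such comparisons is simple enough to sketch. The prices below are illustrative early-2024 ballpark assumptions, not authoritative quotes:

```python
# Back-of-envelope cost comparison; all prices are illustrative assumptions
# (early-2024 ballpark figures), not authoritative quotes.
GPT35_INPUT_PER_M = 0.50    # USD per 1M input tokens (assumed list price)
GPT35_OUTPUT_PER_M = 1.50   # USD per 1M output tokens (assumed list price)
LLAMA2_PER_M = 0.25         # USD per 1M tokens, self-hosted estimate (assumed)

# A hypothetical workload: 10M input and 10M output tokens per day.
input_tokens_m, output_tokens_m = 10, 10

gpt35_cost = input_tokens_m * GPT35_INPUT_PER_M + output_tokens_m * GPT35_OUTPUT_PER_M
llama2_cost = (input_tokens_m + output_tokens_m) * LLAMA2_PER_M

print(f"GPT-3.5 API:         ${gpt35_cost:.2f}/day")
print(f"Llama 2 self-hosted: ${llama2_cost:.2f}/day")
print(f"ratio: {gpt35_cost / llama2_cost:.1f}x")  # lands near the cited 3-4x range
```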
In the mobile Internet era, a near-declining Microsoft achieved its final turnaround through cloud computing. Whether Google, now lagging in the AI story, can replicate that path remains to be seen.