Editor: Aeneas
The case is closed! Miqu, the new "open-source" model that has kept netizens guessing for days, is indeed a quantized version of an older model that Mistral trained on top of Llama 2. Today, Mistral's CEO personally confirmed it.
According to the CEO, the model was leaked by an overly enthusiastic employee of an early-access customer. After several days of heated discussion in the AI community, this unsolved case finally has an answer.
A mysterious model leaks
Here's what happened. On January 28, a user named Miqudev posted a set of files on HuggingFace that together make up a seemingly new open-source LLM called Miqu-1-70B.
Curiously, netizens noticed in the HuggingFace entry that the prompt format of this new model is exactly the same as Mistral's. Soon afterward, a link to Miqu-1-70B was also posted on 4chan.
The 4chan link came from an anonymous user who is widely speculated to be Miqudev himself. Word then spread rapidly on X, because netizens found the performance of Miqu-1-70B to be remarkably strong: on the EQ-Bench benchmark it even comes close to the former king of models, GPT-4.
Netizens were puzzled as to how this mysterious new model could beat Mistral Medium and approach GPT-4. Someone suggested first checking whether Miqu's training data had been contaminated with EQ-Bench test questions.
Miqu's true identity: Mistral or Llama?
To pin down Miqu's real identity, one netizen sent the same Russian question to both Mistral-Medium and Miqu. The answers of the two models turned out to be exactly the same, in Russian.
In the end he concluded: "I now 100% believe that Miqu is Mistral-Medium."
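For readers who want to reproduce this kind of side-by-side check, here is a minimal sketch. It assumes the early-2024 0.x `mistralai` Python client (the newer 1.x client has a different interface) and `llama-cpp-python` pointed at a local copy of the leaked GGUF file; the file name and the Russian question are placeholders, not the netizen's actual test.

```python
import difflib

from llama_cpp import Llama                      # pip install llama-cpp-python
from mistralai.client import MistralClient       # pip install mistralai (0.x client)
from mistralai.models.chat_completion import ChatMessage

QUESTION_RU = "Кто написал роман «Мастер и Маргарита»?"  # placeholder Russian question

# 1) Ask the official Mistral-Medium API endpoint.
api = MistralClient(api_key="YOUR_MISTRAL_API_KEY")
medium_answer = api.chat(
    model="mistral-medium",
    messages=[ChatMessage(role="user", content=QUESTION_RU)],
).choices[0].message.content

# 2) Ask the leaked Miqu GGUF locally, using the Mistral "[INST]" prompt format
#    that netizens spotted in the HuggingFace entry (file name is a placeholder).
miqu = Llama(model_path="miqu-1-70b.q5_K_M.gguf", n_ctx=4096, n_gpu_layers=-1)
miqu_answer = miqu(
    f"[INST] {QUESTION_RU} [/INST]", max_tokens=512, temperature=0.0
)["choices"][0]["text"].strip()

# 3) Compare the two answers: identical or near-identical wording would point
#    to the two models sharing an origin.
similarity = difflib.SequenceMatcher(None, medium_answer, miqu_answer).ratio()
print(f"Mistral-Medium: {medium_answer}\nMiqu:           {miqu_answer}")
print(f"Character-level similarity: {similarity:.2%}")
```

A single matching answer at temperature 0 is only circumstantial evidence, but repeated across many prompts and languages it strongly suggests a shared lineage.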
Some netizens stayed up late testing and comparing the capabilities of Miqu and the Mistral models.
It turned out that Miqu and Mixtral are indeed very similar, in German spelling and grammar as well as in some of the linguistic habits of their responses. Overall, Miqu outperforms Mistral Small and Medium but lags behind Mixtral 8x7B Instruct. So Miqu may be a leaked Mistral model, an older proof-of-concept version. Of course, some developers believe Miqu looks more like Llama 70B than like a mixture-of-experts model.
Based on the speculation at the time, Miqu might be an early version of Mistral Medium, or a fine-tune of Llama 70B on the Mistral Medium dataset.
A quantized version of Mistral?
As the noise grew louder, Maxime Labonne, a machine learning researcher at JPMorgan Chase, took notice.
He posted that it is not yet certain whether Miqu is a quantized version of a Mistral model, but it is certain that it will soon become one of the best open-source LLMs. And thanks to @152334h, a dequantized version of Miqu is now available.
Labonne added that as the investigation continues, we will soon see fine-tuned versions of Miqu outperform GPT-4! In machine learning, quantization is a technique that lets large AI models run on less powerful computers and chips by replacing the long, high-precision numbers that make up a model's weights with shorter, lower-precision ones.
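To make the idea concrete, here is a minimal sketch of symmetric 8-bit quantization in NumPy. It illustrates only the general principle, not the actual scheme used for the Miqu GGUF files (those use llama.cpp's k-quant formats); the function names are ours.

```python
import numpy as np

# Toy float32 "weight matrix" standing in for one layer of a model.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8)).astype(np.float32)

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: store one scale plus int8 values."""
    scale = np.abs(w).max() / 127.0                      # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)

print("storage: float32 =", weights.nbytes, "bytes, int8 =", q.nbytes, "bytes")
print("max reconstruction error:", np.abs(weights - approx).max())
```

Storing one byte per weight instead of four (plus a single scale factor) is what shrinks a 70B-parameter model enough to run on consumer hardware, at the cost of a small reconstruction error in every weight.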
Many people guessed that Miqu was most likely a new Mistral model that the company had deliberately let leak. After all, the earlier magnet-link incident showed that Mistral has a tradition of launching new models, with great fanfare, through unconventional technical channels. Or it could simply have been leaked by an employee or a customer.
CEO confirms: yes, it's a quantized version of a Mistral model
Today, things finally came to light. Arthur Mensch, co-founder and CEO of Mistral, clarified on X:
An over-enthusiastic employee of one of our early-access customers leaked a quantized (and watermarked) version of an old model we had trained and distributed quite openly. To quickly start working with a few selected customers, we retrained this model from Llama 2 the minute we got access to our entire cluster; the pretraining finished on the day of Mistral 7B's release. We've made good progress since then, so stay tuned!
Interestingly, the CEO did not ask for the HuggingFace post to be taken down; instead, he suggested that the poster might consider attribution.
In short, the words "stay tuned" suggest that Mistral is training more than just this Miqu model that already approaches GPT-4.
Open-source AI at a critical moment?
The leak of the Miqu model caused such an uproar because it could mark a watershed moment for open-source generative AI, and for the AI and computer-science field as a whole. Released in March 2023, GPT-4 remains the world's most powerful LLM on most benchmarks; not even Google's long-rumored Gemini can beat it. (According to some tests, the current Gemini models are actually worse than OpenAI's older GPT-3.5.)
If a model with GPT-4-level performance becomes freely available for commercial use, it will inevitably have a huge impact on OpenAI and its subscription business, especially now that more and more enterprises are looking to open-source models, or a mix of open and closed source, to power their applications. With GPT-4 Turbo and GPT-4V, OpenAI has done its best to maintain its lead, but the open-source community's rapid catch-up cannot be ignored. Is OpenAI's lead big enough, and do the GPT Store and other features form a moat deep enough to keep ChatGPT at the top of the LLM rankings?