Mistral AI, a large-model startup, has now officially announced the MoE model it quietly "open-sourced" two days ago: Mixtral 8x7B.
Officially, Mixtral 8x7B is a high-quality sparse mixture-of-experts model (SMoE) with open weights, released under the Apache 2.0 license. Mixtral outperforms Llama 2 70B on most benchmarks while offering 6x faster inference, and it matches or exceeds GPT-3.5 on most standard benchmarks.
As a result, Mistral AI calls Mixtral the most powerful open-weight model available and the best model in terms of its cost/performance trade-off.
Key features of Mixtral:
Handles a context of 32k tokens.
Supports English, French, Italian, German, and Spanish.
Outperforms the Llama 2 series and GPT-3.5 on most benchmarks.
Shows strong performance in code generation.
Achieves a score of 8.3 on MT-Bench.
Mixtral is a sparse mixture-of-experts network: a decoder-only model in which the feedforward block picks from 8 distinct groups of parameters. At every layer, for every token, a router network selects two of these groups (the "experts") to process the token and combines their outputs additively.
Mixtral has 45B total parameters but uses only 12B parameters per token. It therefore processes input and generates output at the same speed and cost as a 12B model.
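To make the routing idea concrete, here is a minimal PyTorch-style sketch of a top-2 sparse MoE feed-forward layer. This is an illustration under stated assumptions, not Mixtral's actual implementation: the class name MoEFeedForward, the expert structure, and the default dimensions (d_model=4096, d_ff=14336, 8 experts, 2 selected per token) are assumed for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Sketch of a sparse MoE feed-forward block with top-2 routing (illustrative, not Mixtral's code)."""

    def __init__(self, d_model=4096, d_ff=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # 8 independent feed-forward "experts" (assumed simple MLPs for this sketch)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # Router that scores every expert for each token
        self.gate = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x):                            # x: (n_tokens, d_model)
        scores = self.gate(x)                        # (n_tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)             # normalize weights of the 2 chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # for each selected slot...
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e            # tokens routed to expert e in this slot
                if mask.any():
                    # combine expert outputs additively, weighted by the router
                    out[mask] += top_w[mask, k, None] * expert(x[mask])
        return out


if __name__ == "__main__":
    # Small dimensions for a quick demo run
    layer = MoEFeedForward(d_model=64, d_ff=256)
    print(layer(torch.randn(4, 64)).shape)           # torch.Size([4, 64])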
For more details, check out: