Mistral AI, a large-model startup, has now officially announced the MoE model it quietly "open-sourced" two days ago: Mixtral 8x7B.
Officially, Mixtral 8x7B is a high-quality sparse mixture-of-experts model (SMoE) with open weights, released under the Apache 2.0 license. Mixtral outperforms Llama 2 70B on most benchmarks while offering 6x faster inference, and it matches or exceeds GPT-3.5 on most standard benchmarks.
As a result, Mistral AI calls Mixtral the most powerful open-weight model available and the best model in terms of its cost/performance trade-off.
Key features of Mixtral:
Handles a context of 32k tokens.
Supports English, French, Italian, German, and Spanish.
Outperforms the Llama 2 series and GPT-3.5 on most benchmarks.
Shows strong performance in code generation.
Achieves a score of 8.3 on MT-Bench.
Mixtral is a sparse mixture-of-experts network: a decoder-only model in which the feedforward block picks from 8 distinct groups of parameters. At every layer, for every token, a router network selects two of these groups (the "experts") to process the token and combines their outputs additively.
Mixtral has 45B total parameters but uses only 12B parameters per token. It therefore processes input and generates output at the same speed and cost as a 12B model.
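To make the routing idea concrete, here is a minimal PyTorch-style sketch of a top-2 sparse MoE feed-forward layer. This is an illustration under stated assumptions, not Mixtral's actual implementation: the class name MoEFeedForward, the expert structure, and the default dimensions (d_model=4096, d_ff=14336, 8 experts, 2 selected per token) are assumed for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Sketch of a sparse MoE feed-forward block with top-2 routing (illustrative, not Mixtral's code)."""

    def __init__(self, d_model=4096, d_ff=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # 8 independent feed-forward "experts" (assumed simple MLPs for this sketch)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # Router that scores every expert for each token
        self.gate = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x):                            # x: (n_tokens, d_model)
        scores = self.gate(x)                        # (n_tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)             # normalize weights of the 2 chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                  # for each selected slot...
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e            # tokens routed to expert e in this slot
                if mask.any():
                    # combine expert outputs additively, weighted by the router
                    out[mask] += top_w[mask, k, None] * expert(x[mask])
        return out


if __name__ == "__main__":
    # Small dimensions for a quick demo run
    layer = MoEFeedForward(d_model=64, d_ff=256)
    print(layer(torch.randn(4, 64)).shape)           # torch.Size([4, 64])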
For more details, check out: