Microsoft recently announced Phi-2, a small language model (SLM), marking a notable step forward in AI and large language model (LLM) research. Phi-2 is positioned as the successor to Phi-1.5, and Microsoft claims it matches or outperforms larger models such as Llama-2, Mistral, and Gemini Nano 2 on a range of generative AI benchmarks.
What is a large language model:
Large language models and small language models are AI models of different scales, both used primarily to process and understand natural language. A couple of analogies help make the distinction clearer.
A large language model is like a vast library holding countless books, each filled with a wide variety of knowledge and information. Thanks to its breadth of knowledge and depth of understanding, this library can answer almost any question. In artificial intelligence, a model like GPT-3 (a predecessor of the models behind ChatGPT) is a typical large language model: it has 175 billion parameters, the equivalent of a library with hundreds of millions of books.
What is a small language model:
A small language model, by contrast, is like a small study holding only a modest shelf of books. This study may not answer every question, but it can be a highly specialized and efficient resource on a particular topic or field. Where large language models have hundreds of billions of parameters, small language models may have only millions to a few billion. A small model is not as powerful in breadth and depth, but in some contexts it can be the better fit, like a small but finely curated encyclopedia.
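To make the scale gap concrete, here is a back-of-envelope sketch. The parameter counts are the publicly reported figures for GPT-3 (about 175 billion) and Phi-2 (about 2.7 billion); the two-bytes-per-parameter assumption corresponds to fp16 weights and ignores activations and optimizer state:

```python
# Rough memory needed just to store model weights at fp16 (2 bytes per parameter).
# Parameter counts: GPT-3 ~175 billion, Phi-2 ~2.7 billion (publicly reported).
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate weight storage in gigabytes at the given precision."""
    return num_params * bytes_per_param / 1e9

for name, params in [("GPT-3", 175e9), ("Phi-2", 2.7e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights in fp16")

# GPT-3: ~350 GB of weights (needs a multi-GPU cluster just to load)
# Phi-2: ~5 GB of weights (fits on a single consumer GPU)
```

The two orders of magnitude between those numbers are exactly what the library-versus-study analogy is describing.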
The choice between a large and a small model depends on the specific needs of the task. Large models are suited to handling a wide variety of problems, while small models are better suited to handling domain-specific tasks efficiently under limited resources. These analogies help clarify their respective roles and uses in artificial intelligence.
The launch of Phi-2 is the result of sustained work by Microsoft's research team and is part of a series of new initiatives Satya Nadella announced at Ignite 2023. The transformer-based model was trained on "textbook quality" data, including synthetic datasets covering general knowledge, theory of mind, and daily activities, and it demonstrates strong performance.
Compared with large models, Phi-2 is simpler and far less expensive to train. Microsoft reports that it was trained on 96 A100 Tensor Core GPUs in just 14 days, rather than the months on massive GPU clusters that frontier-scale models require. Phi-2 is not limited to language processing: it can also solve math and physics problems and even identify errors in students' calculations.
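As an illustration of that last point, here is a minimal sketch of asking Phi-2 to check a student's arithmetic via the Hugging Face transformers library. It assumes the checkpoint published on the Hub as microsoft/phi-2 and a recent transformers version that supports the Phi architecture natively (earlier versions needed trust_remote_code=True); the prompt and generation settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,  # ~2.7B params fit comfortably in fp16 on one GPU
    device_map="auto",          # requires the accelerate package
)

# A deliberately wrong calculation (17 * 24 is 408, not 398) to test error spotting.
prompt = ("Question: A student claims that 17 * 24 = 398. "
          "Is the calculation correct? If not, give the correct result.\nAnswer:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) keeps the arithmetic deterministic.
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights fit in a few gigabytes, this kind of experiment runs on a single workstation GPU, which is part of the appeal of small language models.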
Small language models (SLMs) have made their mark in natural language processing, and the introduction of Phi-2 makes the category a formidable contender. Compared with traditional large language models, SLMs offer clear advantages in computational efficiency, inference speed, resource friendliness, energy use, training time, interpretability, and cost-effectiveness. These benefits are reshaping the landscape of language processing technology, offering more flexible and efficient solutions for specific use cases and contexts.
It is worth emphasizing that the choice between a large and a small model should depend on the specific requirements of the task. While large models excel at capturing complex patterns, the efficiency, speed, and modest resource requirements of small models make them invaluable in particular scenarios. The launch of Phi-2 marks the beginning of a new era in which small language models challenge the giant LLMs, a shift that is not only a technological advance but also a disruptive challenge to the field of AI.