On January 29, Baichuan Intelligence, a Chinese artificial intelligence start-up, released Baichuan 3, a large language model with more than 100 billion parameters. Baichuan 3 performed strongly on several authoritative general-capability evaluations, including CMMLU, Gaokao, and AGI-Eval. On a number of Chinese evaluation lists such as CMMLU, Gaokao, HumanEval, and MBPP, it surpassed GPT-4, showing its advantage on Chinese tasks.
Unlike training models with tens of billions of parameters, training a model with more than 100 billion parameters places demands on data quality, training stability, and training efficiency that are several orders of magnitude higher. To address these problems, Baichuan Intelligence introduced several techniques during training, including "dynamic data selection", "importance maintenance", and "asynchronous checkpoint storage", which the company says effectively improved the capabilities of Baichuan 3.
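Baichuan Intelligence has not published how its asynchronous checkpoint storage works. As a rough illustration of the general idea only, the sketch below copies the training state in memory and hands the slow disk write to a background thread, so the training loop is blocked only for the copy. The `AsyncCheckpointer` class and its method names are hypothetical, and a real system would handle sharded GPU state rather than a small Python dictionary.

```python
import copy
import pickle
import threading
from pathlib import Path


class AsyncCheckpointer:
    """Hypothetical helper: snapshot state in memory, write it in the background."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self._pending = None  # background writer thread, if one is running

    def save(self, step, state):
        # Allow at most one write in flight so snapshots do not pile up in memory.
        self.wait()
        # The only blocking work: an in-memory copy, so later updates to
        # `state` cannot leak into the checkpoint being written.
        snapshot = copy.deepcopy(state)
        path = self.directory / f"step_{step}.pkl"
        self._pending = threading.Thread(target=self._write, args=(path, snapshot))
        self._pending.start()
        return path

    @staticmethod
    def _write(path, snapshot):
        # Write to a temporary file first, then rename, so a crash mid-write
        # never leaves a truncated checkpoint under the final name.
        tmp = path.with_name(path.name + ".tmp")
        with open(tmp, "wb") as f:
            pickle.dump(snapshot, f)
        tmp.replace(path)

    def wait(self):
        # Block until the in-flight write (if any) has finished.
        if self._pending is not None:
            self._pending.join()
            self._pending = None
```

In this sketch a training loop would call `save(step, state)` at each checkpoint interval and `wait()` once before exiting, so that the last checkpoint is fully written.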
On data quality, traditional data screening relies on manually defined rules and selects data through methods such as filtering, quality scoring, and textbook-style filtering. Baichuan Intelligence argues that data optimization and sampling is a dynamic process that should evolve along with the training of the model itself, rather than relying solely on manually defined, up-front sampling and screening. To improve data quality across the board, Baichuan Intelligence designed a dynamic training data selection scheme based on causal sampling, which selects training data while the model is being trained and, according to the company, greatly improves data quality.
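The details of the causal-sampling scheme have not been disclosed. To make the contrast between static and dynamic selection concrete, the sketch below uses a different, much simpler technique: it reweights data sources during training according to a running average of their training loss. This is an assumption-laden illustration, not Baichuan's method; the `DynamicSourceSampler` class, its parameters, and the source names are all hypothetical.

```python
import math
import random


class DynamicSourceSampler:
    """Hypothetical loss-based sampler (NOT the causal sampling in the article)."""

    def __init__(self, sources, temperature=1.0, momentum=0.9, seed=0):
        self.sources = list(sources)
        self.temperature = temperature
        self.momentum = momentum
        # Running average of the training loss observed on each source.
        self.avg_loss = {s: 0.0 for s in self.sources}
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over running losses: sources the model still fits poorly
        # are sampled more often. Subtracting the max keeps exp() stable.
        top = max(self.avg_loss.values())
        exps = {
            s: math.exp((loss - top) / self.temperature)
            for s, loss in self.avg_loss.items()
        }
        total = sum(exps.values())
        return {s: e / total for s, e in exps.items()}

    def sample(self):
        # Draw the source for the next batch from the current distribution.
        probs = self.probabilities()
        weights = [probs[s] for s in self.sources]
        return self.rng.choices(self.sources, weights=weights)[0]

    def update(self, source, loss):
        # Exponential moving average of the loss reported by the training step.
        old = self.avg_loss[source]
        self.avg_loss[source] = self.momentum * old + (1 - self.momentum) * loss
```

A training loop would call `sample()` to choose where the next batch comes from and `update(source, loss)` after each step, so the sampling distribution shifts as the model learns. Favoring high-loss sources is only one possible design choice; it can also over-sample noisy data, which is the kind of problem a more careful selection scheme would need to address.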
Baichuan 3 has also made a breakthrough in medical knowledge, with medical capabilities that the company says are close to those of GPT-4. To inject rich medical knowledge into Baichuan 3, Baichuan Intelligence built a medical dataset of more than 100 billion tokens for the pre-training stage, including medical research literature, real electronic medical record data, professional books and knowledge base resources in the medical field, and question-and-answer material on medical problems. The dataset covers medical knowledge from basic theory to clinical application and practical operation.
Baichuan Intelligence has not yet disclosed the exact number of model parameters, announcing only that Baichuan 3 is a large model with more than 100 billion parameters. By comparison, GPT-3.5 is commonly cited as having 175 billion parameters.
Baichuan Intelligence was co-founded by Wang Xiaochuan and Ru Liyun and was established in April 2023, built around a team from Sogou. According to reports, the company received $50 million in start-up capital at its inception.
Baichuan Intelligence has moved very quickly. Less than 100 days after its establishment, it released two Chinese large models, Baichuan-7B and Baichuan-13B, that are open source and free for commercial use. Going from Baichuan 1.0 to the current 3.0 took only nine months.
Just over a month ago, on December 19, 2023, Baichuan Intelligence announced the opening of its search-enhanced Baichuan2-Turbo series of APIs, including Baichuan2-Turbo-192K and Baichuan2-Turbo. In addition to supporting a 192K context window, the series added search-enhanced knowledge base capabilities.
Compared with Baichuan2-192K, Baichuan 3 accepts a shorter input. When Baichuan2-192K was launched, it allowed users to enter up to 350,000 Chinese characters of text, and the company claimed it could read a copy of "Three-Body Problem 2" in one pass, making it the large model with the longest context window in the world at the time. Baichuan 3 currently accepts input of up to 4,096 tokens, which is roughly equivalent to 2,000 Chinese characters or 3,000 English words.
In the past year of large-model start-up activity, training industry-specific vertical models on industry data has been seen as the main path for bringing large models to enterprise (B-side) customers. According to Jiazi Lightyear, Baichuan Intelligence upgraded the vector database to a search-enhanced knowledge base, improving the ability of large models to obtain external knowledge. Combining a search-enhanced knowledge base with an extra-long context window allows the model to connect to an enterprise's entire knowledge base as well as information from across the web.
At a communication meeting at the end of last year, Wang Xiaochuan revealed that the first focus of Baichuan Intelligence's consumer-facing (C-end) products is the medical field, and that the products are expected to launch in 2024.