Build a "bridge" between people and large models.Author Sukhoi.
Edited by Chestnut.
Before GPT-3.5 emerged, You Yang, Presidential Young Professor at the National University of Singapore and founder and chairman of Luchen Technology, had already realized that large models would become an important direction of future development.
As early as 2018, he participated in the training of Google's BERT model and successfully shortened its pre-training time from three days to 76 minutes; this optimization method is still used by many companies today.
In 2020, OpenAI launched GPT-3, then the world's largest pre-trained language model, which piqued You Yang's interest in large-model development. In 2023, artificial intelligence entered the year of the large-model explosion: the craze quickly swept the globe, and AI became a battleground for every industry.
According to IDC (International Data Corporation), the AI software market will reach roughly US$76.9 billion by 2026. Each of us can clearly feel that artificial intelligence is moving from perceiving and understanding the world to generating and creating it, accelerating industrial intelligence toward an inflection point.
As a researcher of high-performance computing, You Yang is also paying close attention to the latest developments in the large model industry.
He is very optimistic about China's AI landscape: "Thanks to national policy support and the convergence of capital and talent, the domestic AI industry is in a period of rapid growth. Our universities and research institutions play a central role in basic AI research, technology development, and talent training, and these efforts are continuously strengthening the global competitiveness of China's AI industry."
However, at the same time, You Yang also realized that both AI beginners and industry insiders are facing the "threshold" of large models.
Beginners entering the field must overcome a high technical "threshold": the complexity of large models and the constant churn of techniques make these technologies hard to understand and master.
The "threshold" in front of practitioners lies in how to skillfully "harness" this cutting-edge technology. In order to stand out in the fierce market competition, practitioners need to find strategies to maximize the potential of large models to reduce costs and increase efficiency.
In order to help people cross this "hurdle", You Yang came up with the idea of writing a "practical guide to large models". He hopes to build a "bridge" between people and large models.
He told Jiazi Lightyear: "I want to share my knowledge and experience in high-performance computing and AI large models with more people. Through this book, "Practical AI Large Models", I hope to offer readers my personal insights and suggestions, and to discuss related topics with a wider audience."
Within a week of its launch, "Practical AI Large Models" ranked first among artificial intelligence books on the JD.com book chart. Image courtesy of the interviewee.
Before diving into "Practical AI Large Models", it is worth getting to know the book's author, Professor You Yang.
You Yang graduated from the University of California, Berkeley. During his graduate studies, he was the first author of the Best Paper at the 2015 International Parallel and Distributed Processing Symposium (IPDPS). During his time at Berkeley, You Yang earned the Lotfi A. Zadeh Prize, which is awarded to Berkeley Ph.D. graduates who have made outstanding contributions to soft computing and its applications. In 2017, his team broke the world record for ImageNet training speed, a result widely reported by NSF, ScienceDaily, Science Newsline, and i-Programmer.
You Yang delivering a keynote speech at the 2023 Jiazi Gravity year-end ceremony. Photo by Jiazi Lightyear.
What really makes You Yang famous in the AI industry is the series of AI training methods he proposed.
In 2018, during his Ph.D., You Yang published "ImageNet Training in Minutes" as first author, which won the Best Paper Award at the International Conference on Parallel Processing (ICPP), ranking first among 313 submissions; the LARS optimizer it proposed set a new world record for ImageNet training speed, cutting the training time of the AlexNet model to just 24 minutes.
In 2019, You Yang, again as first author, proposed the LAMB optimizer and successfully shortened the pre-training time of BERT from three days and three nights to 76 minutes, 72 times faster than the Adam optimizer; LAMB has since become a mainstream optimizer in machine learning.
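The core idea behind LAMB can be illustrated in a few lines. The sketch below is a minimal, single-tensor rendering of the layer-wise "trust ratio" described in the published LAMB paper; it is an illustration of the idea, not production code from Google or from the book:

```python
import torch

def lamb_step(param, grad, m, v, step, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-6, weight_decay=0.01):
    """One illustrative LAMB update for a single parameter tensor."""
    # Adam-style first and second moment estimates, with bias correction
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)

    # Raw update direction, including decoupled weight decay
    update = m_hat / (v_hat.sqrt() + eps) + weight_decay * param

    # Layer-wise trust ratio: scale the step so it is proportional to
    # the layer's own weight norm, the key to large-batch stability
    w_norm = param.norm().item()
    u_norm = update.norm().item()
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    param.add_(update, alpha=-lr * trust_ratio)

# Toy usage on a single random weight matrix
w = torch.randn(512, 512)
g = torch.randn_like(w)
m, v = torch.zeros_like(w), torch.zeros_like(w)
lamb_step(w, g, m, v, step=1)
```

Because the trust ratio normalizes each layer's step size, the learning rate can be raised aggressively for very large batches, which is what compresses pre-training time.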
In addition, You Yang's team developed the CowClip algorithm, which significantly improved the training speed of click-through-rate (CTR) models. In 2021, he was named to the Forbes 30 Under 30 list (Asia) and won the IEEE-CS TCHPC Early Career Researchers Award for Excellence in High Performance Computing.
It is precisely his years of deep work and achievement in AI large models that let You Yang see the huge gap between large-model theory and practice.
For most people today, after a full year of media coverage, the term "AI large model" is no longer unfamiliar, and practitioners in some fields have already begun using large models for business optimization.
For example, AI image-generation products such as Midjourney, Stable Diffusion, and DALL-E let users generate images from text descriptions; in the audio space, Microsoft's Speech Studio service enables users to create virtual avatars that sound like their own voices.
However, these products only let users enjoy the convenience AI brings to their own work. For more professional technicians or more demanding enterprise-level users, familiarity at the application level alone is not enough.
For example, what are the Transformer, BERT, and GPT models? What are their characteristics, what advantages does each offer, and how difficult is each to train?
You Yang believes that only by mastering the basic concepts, classical algorithms, and network architectures of deep learning can one properly understand and apply large AI models.
This was You Yang's original intention and goal in writing "Practical AI Large Models". He hopes the book provides readers with a detailed guide and reference, a comprehensive perspective combining theory and practice, so that readers can truly understand and apply AI large models.
In You Yang's view, each model, whether BERT, GPT, or PaLM, is a crystallization of the evolution of artificial intelligence technology, with a deep theoretical foundation and practical experience behind it. That is why he chose to discuss each model separately, ensuring that each is covered in adequate depth and breadth.
The techniques required to train these models are comprehensively covered in the book. From high-performance computing (HPC) to parallel processing, from large-scale optimization methods to memory optimization, each technology has been carefully selected and deeply researched, and they are the cornerstone of AI large model training and the key to building high-performance AI systems.
For example, the Transformer model has become central to natural language processing (NLP) through its distinctive "attention mechanism", which significantly improved the accuracy of machine understanding and text generation (a minimal sketch of this mechanism follows this list);
The BERT model enhances the accuracy and flexibility of text processing through a bidirectional training mechanism and is widely used in language-understanding tasks;
The ALBERT model, an optimized version of BERT, tackles NLP challenges with higher efficiency and a smaller model size;
The T5 model demonstrates how a unified framework can handle many different text tasks, which matters greatly for the versatility of AI systems;
The GPT series has made significant progress on NLP tasks with its powerful text-generation capabilities;
Google's PaLM model is a milestone in the field of large models, showcasing the latest advances in AI's ability to understand and generate language.
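To make the "attention mechanism" mentioned above concrete, here is a minimal sketch of scaled dot-product attention, the operation at the heart of the Transformer (a textbook illustration, not code from the book):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # similarity of each query to every key
    weights = F.softmax(scores, dim=-1)            # normalize into attention weights
    return weights @ v                             # weighted sum of the values

# Toy usage: a batch of 2 sequences, 8 tokens each, 64-dimensional heads
q = k = v = torch.randn(2, 8, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 64])
```

Every token attends to every other token in one matrix multiplication, which is what lets the model weigh context globally rather than sequentially.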
Of course, there's much more to this book. In addition to the detailed introduction of the principles, training methods, and application scenarios of each model, this book also covers key technologies such as distributed systems, parallel strategies, and memory optimization.
Kai-Fu Lee, founder and CEO of Sinovation Ventures and founder of 01.AI, spoke highly of the book: "This book not only explains the core concepts of AI large models in accessible terms, but also closely tracks AI 2.0, the most important technological revolution in history."
Mastering theoretical knowledge is only the starting point of practice.
In the application of AI, we need to solve a series of challenges in large model training, such as the management of computing resources and the optimization of training efficiency.
In order to achieve the perfect combination of theory and practice, You Yang specially introduced the Colossal-AI system in the book.
It is an integrated large-scale deep learning system that spreads the computing and storage burden across devices through strategies such as data parallelism, model parallelism, and pipeline parallelism, making it possible to train large models with limited resources.
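The following schematic sketch, written in plain PyTorch rather than the Colossal-AI API, gives a flavor of what two of these strategies mean in practice; the toy model and shapes are invented for illustration:

```python
import torch
import torch.nn as nn

# A toy 4-layer model whose training we want to parallelize.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(4)])
batch = torch.randn(32, 1024)

# Data parallelism: every worker holds a full copy of the model but
# only a slice of the batch; gradients are averaged across workers.
num_workers = 4
micro_batches = batch.chunk(num_workers)  # four shards of 8 samples each

# Pipeline parallelism: each worker holds a contiguous group of layers
# and passes activations on to the next stage.
stage0 = nn.Sequential(model[0], model[1])  # would live on device 0
stage1 = nn.Sequential(model[2], model[3])  # would live on device 1
activations = stage0(micro_batches[0])
output = stage1(activations)
print(output.shape)  # torch.Size([8, 1024])
```

Real systems run these shards and stages on separate GPUs with collective communication; the point of the sketch is only how the work gets divided.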
Merely holding GPT-3 in memory, before any computation is done, consumes some 3,200 GB. You Yang pointed out that the scale of AI models has grown exponentially since 2016: from Microsoft's 20-million-parameter model to GPT-4's roughly 1 trillion to 100 trillion parameters, model size has grown at least 40-fold every 18 months; since 2019, the growth rate has reached roughly 340-fold.
GPU memory, however, has grown only 17-fold over the same period, which makes it difficult for existing hardware to supply the enormous computing resources and storage space that training large models requires.
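A rough back-of-the-envelope calculation, using the standard accounting for Adam-style training rather than figures from the book, shows where memory requirements of this magnitude come from:

```python
# Common accounting for training a GPT-3-scale model with an Adam-style
# optimizer: fp16 weights + fp16 gradients + fp32 master weights
# + two fp32 optimizer moments comes to roughly 16 bytes per parameter.
params = 175e9          # GPT-3 has about 175 billion parameters
bytes_per_param = 16
total_gb = params * bytes_per_param / 1024**3
print(f"{total_gb:,.0f} GB")  # about 2,600 GB, before counting activations

# A single 80 GB GPU holds only a small fraction of this, which is why
# the weights and optimizer state must be sharded across many devices.
```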
In other words, hardware currently cannot keep up with the pace of model growth; this is the most pressing problem for large models to overcome.
To address this challenge, distributed training technology may be the best solution. By splitting the training task of a large model across multiple compute nodes and executing the parts simultaneously, computing resources can be used more efficiently and the training process accelerated. Even ordinary engineers can train large models to good effect by drawing on publicly available free datasets such as C4, GitHub, and Books. In addition, selecting an appropriate base model, for example one following GPT-3's design ideas, is a key step in the training process.
Training large models demands vast GPU and memory resources; even a back-of-the-envelope estimate using nothing beyond high-school math shows that training a fairly small model already requires an enormous number of computational operations and a great deal of memory. Technologies such as distributed optimization, efficient communication mechanisms, data parallelism, and distributed storage are therefore essential for training and deploying enterprise-level large models. At the same time, choosing an appropriate base model and combining data parallelism with tensor parallelism have a decisive impact on achieving efficient training; a minimal sketch of tensor parallelism follows.
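As a concrete illustration of tensor parallelism, here is a minimal single-process sketch that splits one weight matrix column-wise across two hypothetical devices; real systems place the halves on physical GPUs and reassemble the result with collective communication:

```python
import torch

# Full linear layer: y = x @ W, with W of shape (1024, 4096).
x = torch.randn(8, 1024)
W = torch.randn(1024, 4096)

# Tensor parallelism: split W column-wise across two devices so that
# neither device ever stores the full weight matrix.
W0, W1 = W.chunk(2, dim=1)  # each half holds 2048 output columns

y0 = x @ W0                 # computed on "device 0"
y1 = x @ W1                 # computed on "device 1"

# An all-gather along the feature dimension reassembles the output.
y = torch.cat([y0, y1], dim=1)

assert torch.allclose(y, x @ W, atol=1e-5)
print(y.shape)  # torch.Size([8, 4096])
```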
The Colossal-AI system, an advanced large-model training tool created by Professor You Yang, solves the memory-limitation problem encountered when training large models on a single GPU, and it receives special emphasis in "Practical AI Large Models".
A demonstration of ColossalChat. Image courtesy of the interviewee.
For example, Colossal-AI offers the world's first open-source solution closest to the original ChatGPT technical route: ColossalChat, a reproduction scheme for ChatGPT-like models built on the LLaMA model that includes the complete RLHF process. Fine-tuning a model of fewer than 10 billion parameters is enough to approach the effect of GPT-3.5/ChatGPT.
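The RLHF process mentioned here follows the three-stage recipe popularized by InstructGPT; the outline below is that generic recipe with stand-in function bodies, not ColossalChat's actual implementation:

```python
# Three-stage RLHF outline (the generic InstructGPT-style recipe);
# every function body here is a placeholder, not ColossalChat internals.

def supervised_finetune(model, demonstrations):
    # Stage 1: fit the base model on (prompt, human demonstration) pairs.
    return model  # stand-in

def train_reward_model(model, preference_rankings):
    # Stage 2: learn a scalar reward from human preference comparisons.
    return lambda response: float(len(response))  # stand-in scorer

def ppo_optimize(policy, reward_fn, prompts):
    # Stage 3: sample responses, score them with the reward model, and
    # update the policy with PPO while a KL penalty keeps it close to
    # the supervised model.
    for prompt in prompts:
        response = policy(prompt)
        _score = reward_fn(response)  # would drive the PPO update
    return policy

base_model = lambda prompt: prompt + " ... (model output)"
sft = supervised_finetune(base_model, demonstrations=[])
rm = train_reward_model(sft, preference_rankings=[])
chat_policy = ppo_optimize(sft, rm, prompts=["Hello"])
```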
In addition, building on its technical expertise in democratizing large models, Colossal-AI has open-sourced a complete Stable Diffusion pre-training and personalized fine-tuning scheme, accelerating pre-training and cutting its cost by about 6.5 times while lowering the hardware cost of personalized fine-tuning by 7 times. What's more, the fine-tuning workflow can be completed quickly on a PC with an RTX 2070/3050, putting AIGC models such as Stable Diffusion within everyone's reach.
Through Colossal-AI, the book offers detailed hands-on tutorials, covering the steps for training models such as BERT, GPT-3, PaLM, and ViT as well as conversational systems, and it explains the system's key technologies and advantages in depth, helping users improve their research and work efficiency. In this way the practical tutorials turn theoretical knowledge into practice. "After all, hands-on practice is the key to understanding and mastering complex AI models," You Yang told Jiazi Lightyear.
You Yang's original intention to develop Colossal-AI stems from his area of expertise - high-performance computing.
His main goal is to raise the efficiency and lower the cost of large-model training. Colossal-AI provides a variety of training methods, such as mixed-precision training and gradient accumulation, along with techniques such as data parallelism, tensor parallelism, and pipeline parallelism. These methods optimize the training process and let models scale effectively across nodes, which is precisely what traditional training methods cannot achieve.
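Two of the techniques just named, mixed-precision training and gradient accumulation, can be sketched in plain PyTorch as follows; Colossal-AI wraps equivalents behind its own API, so this is an independent illustration:

```python
import torch
import torch.nn as nn

# Minimal sketch of mixed-precision training plus gradient accumulation.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = nn.Linear(1024, 1024).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # loss scaling for fp16
accum_steps = 4  # one optimizer step per 4 micro-batches

for step in range(8):
    x = torch.randn(16, 1024, device=device)
    with torch.cuda.amp.autocast(enabled=use_cuda):   # fp16 forward pass
        loss = model(x).pow(2).mean() / accum_steps   # scale for accumulation
    scaler.scale(loss).backward()  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(opt)   # unscale gradients and apply the update
        scaler.update()    # adjust the loss scale
        opt.zero_grad()
```

Accumulation simulates a batch four times larger than what fits in memory, while mixed precision roughly halves activation and gradient memory.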
In addition, its API is designed to be simple and easy to use, so developers can get started quickly and spend their time and energy on model design and optimization instead of on low-level technical problems.
Colossal-AI's roadmap is divided into three main parts:
First, develop the Colossal-AI system itself, which supports models such as GPT and LLaMA and saves users time and cost;
Second, train industry-specific large models with parameter counts between 10 billion and 20 billion;
Finally, build a PaaS platform that brings customers who need to train large models onto the platform, forming a positive cycle.
At present, You Yang's focus is to continue to develop and optimize Colossal-AI, while assisting enterprises in the privatization deployment of large models, and plans to further develop in terms of commercialization in the future.
He has always believed that openness in the AI industry is essential to the development of the technology. AI technology carries no absolute intellectual property; through open source, the technology can go further.
"This openness and ecosystem building, that is, attracting large numbers of users and gathering their feedback, is the key to future competition in AI technology. Only by constantly iterating and optimizing can we attract more users, which is essential for building a strong AI ecosystem," You Yang explained.
You Yang's path began in academic research on high-performance computing and ultimately led to the commercial application of AI technology. That experience made him deeply aware of how heavily AI relies on high-performance computing when processing large-scale data.
It also inspired his idea of creating the Colossal-AI platform: he hopes to use Colossal-AI to improve the efficiency of AI processing and computing, helping AI companies speed up product development and save costs.
This idea eventually prompted You Yang to take the entrepreneurial path. After becoming a Presidential Young Professor in the Department of Computer Science at the National University of Singapore, You Yang returned to China in July 2021 and founded Luchen Technology.
Thanks to its accumulation in technological innovation, Luchen Technology has attracted the support of a number of investment institutions.
In August 2021, Luchen Technology received a seed round of more than 10 million yuan co-invested by Sinovation Ventures and ZhenFund; in September 2022, it received a further US$6 million angel round led by BlueRun Ventures.
Not long ago, You Yang led his team to win an AAAI 2023 Distinguished Paper Award, drawing wide attention across the AI industry. Luchen Technology then announced in May this year the completion of a Series A financing worth hundreds of millions of yuan. According to the company, this was its third round of financing in the 18 months since its founding, and the funds will mainly be used for team expansion and business development.
In November of the same year, Luchen Technology announced the completion of a nearly 100-million-yuan Series A+ round, led by a Fortune 500 technology giant, with a Greater Bay Area fund and Singtel Innov8, Singapore Telecom's investment arm, also participating.
At Luchen Technology, You Yang and his team are committed to solving the problems of training and applying large models. The company has launched a series of fully open-source offerings, including Colossal-AI, covering heterogeneous management systems, parallelization technologies, and system deployment, all designed to help users deploy AI models efficiently. "I want to reduce the cost of fine-tuning to a few hundred yuan, so that everyone can train GPT models at the lowest cost," You Yang said.
"Our vision is to provide businesses with a seamless deployment and training experience." In the future, he hopes users will be able to define models on servers, terminals, and even mobile phones and deploy them to the cloud through Colossal-AI, with support for hardware platforms such as CPU, GPU, TPU, and FPGA and for programming frameworks such as TensorFlow, PyTorch, Keras, and Theano. By maximizing efficiency and minimizing costs, this will help startups deploy their own models and systems more effectively.
According to the company, Colossal-AI's user growth has outpaced that of traditional software, attracting users from China, the United States, Europe, India, Southeast Asia, and beyond; the project has earned more than 35,000 GitHub stars, ranking first worldwide in its niche. Downloads of the open-source Colossal-LLaMA model on Hugging Face exceeded 180,000 within three weeks.
In the era of large models, opportunities and challenges coexist.
By making effective use of distributed training technology and enterprise-grade large-model solutions, the training process can be accelerated, training efficiency improved, and the application of large models pushed to a new level. You Yang called for joint efforts to advance large-model technology and bring broader benefits to science, business, and society.