Author: Wang Congbin.
Since attending this year's New York Summit, I have felt that generative AI has become a top development priority for Amazon Web Services, and that the company is rethinking its IT infrastructure through the lens of generative AI.
In fact, Amazon Web Services had already laid out full-stack generative AI capabilities across three layers: the bottom layer is the infrastructure for training and inference, the middle layer provides tools and services for fine-tuning and customizing models, and the top layer is where generative AI applications are built.
Chen Xiaojian, general manager of the product department of Amazon Web Services Greater China, said that Amazon Web Services continues to invest in all three layers of this end-to-end generative AI stack and keeps working on customization, because no off-the-shelf foundation model can be dropped directly into the production environments of every industry; large models only produce results when combined with a business's own data.
Full-stack generative AI capabilities evolve
For generative AI, Amazon Web Services provides the underlying infrastructure for training foundation models and running them in production. The middle layer offers the most convenient way to access foundation models, giving builders without AI experience the tools they need to apply generative AI directly and build their own applications. At the top layer sit out-of-the-box applications built on foundation models, so that business users without a technical background can use generative AI directly in specific scenarios.
At the bottom layer, Amazon Web Services recognized the value of GPUs as accelerated computing chips 13 years ago, became the first cloud vendor to bring GPUs to the cloud, and took the lead in offering NVIDIA V100 GPUs in Amazon EC2 P3 instances.
Amazon Web Services was also the world's first major cloud provider to bring NVIDIA's latest chip, the H100 GPU, to market with Amazon EC2 P5 instances, which train models up to 4x faster than Amazon EC2 P4 instances at roughly 60% of the cost.
This year, Amazon Web Services CEO Adam Selipsky and NVIDIA CEO Jensen Huang expanded the two companies' strategic collaboration: Amazon Web Services became the first cloud vendor to offer NVIDIA GH200 NVL32 instances, and the two announced the "Project Ceiba" collaboration to build the world's fastest GPU-powered AI supercomputer, hosted alongside NVIDIA DGX Cloud, for NVIDIA's own AI training, research, and custom model development. It will contain 16,384 of the latest GH200 Superchips and deliver an astonishing 65 exaflops of compute.
Amazon Web Services also launched Amazon Trainium2, its second-generation chip purpose-built for training AI systems. It delivers up to 4x the performance of the first-generation Amazon Trainium and is tuned specifically for training large models with hundreds of billions or even trillions of parameters.
The middle layer has been in the spotlight since Amazon Bedrock was released in April this year, and it too has been upgraded: at re:Invent it added support for Anthropic Claude 2.1 and Meta Llama 2 70B, along with model fine-tuning, retrieval-augmented generation (RAG), and continued pre-training capabilities based on the Amazon Titan family of large models.
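As a rough illustration of how this middle layer is consumed, here is a minimal sketch that calls Claude 2.1 through the Bedrock runtime API with boto3. The Region, prompt, and token limit are assumptions; model availability varies by account and Region.

```python
import json
import boto3

# Bedrock runtime client; the Region is an assumption for this sketch.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude models on Bedrock use the Human/Assistant prompt format.
body = json.dumps({
    "prompt": "\n\nHuman: In one sentence, what is Amazon Bedrock?\n\nAssistant:",
    "max_tokens_to_sample": 200,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2:1",  # Claude 2.1 on Bedrock
    contentType="application/json",
    accept="application/json",
    body=body,
)

# The response body is a stream of JSON; Claude returns its text in "completion".
print(json.loads(response["body"].read())["completion"])
```

Swapping the modelId (for example to a Llama 2 70B identifier) changes the request body format but not the overall call pattern, which is the point of Bedrock's single API surface.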
Agents for Amazon Bedrock is also now generally available; agents can understand user requests, break complex tasks into multiple steps, and automate and manage their execution.
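For a sense of how such an agent is driven programmatically, the sketch below invokes an already-configured agent through the bedrock-agent-runtime InvokeAgent API; the agent ID, alias ID, and request text are hypothetical placeholders.

```python
import uuid
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.invoke_agent(
    agentId="AGENT_ID",             # placeholder for an agent built in Bedrock
    agentAliasId="AGENT_ALIAS_ID",  # placeholder alias
    sessionId=str(uuid.uuid4()),    # one session groups a multi-turn conversation
    inputText="Check the status of order 12345 and draft a reply to the customer.",
)

# After orchestrating its steps, the agent streams back chunks of the final answer.
answer = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        answer += chunk["bytes"].decode("utf-8")
print(answer)
```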
At the application layer, Amazon Q was a release that was both logical and unexpected. There are plenty of intelligent assistants on the market, but Amazon Q is the industry's first generative AI assistant that can be tailored to an enterprise's business scenarios and that helps enterprises solve problems based on their own internal data.
Amazon Q is an assistant that lets users build, deploy, and operate applications and workloads on Amazon Web Services, helping developers quickly find the services they need through natural-language Q&A. It can act as a business expert, connecting to enterprise data to answer business questions quickly, accurately, and relevantly while keeping them secure and private. It can act as a business intelligence expert, bringing generative AI assistance into a variety of services and applications. It can act as a contact center expert when brought into the cloud contact center application Amazon Connect. And Amazon Q will also provide a generative AI assistant for the upcoming Amazon Supply Chain.
Amazon Q is professional first of all, but it also offers customization so that users can tailor it to their own needs, and it guarantees security throughout. Chen Xiaojian believes this is just the beginning: more and more products will integrate Amazon Q's capabilities, bringing a new chatbot-style user experience that complements other products. Developer scenarios may be the first place Amazon Q is applied; for example, it was able to upgrade more than 1,000 Java applications from Java 8 to Java 17 in a matter of days, and knowledge base and business intelligence scenarios can also be implemented quickly.
Technology evolution from cloud to edge
In addition to its many AI service innovations, Amazon Web Services continues to invest in infrastructure, which now spans 32 geographic Regions around the world, with plans announced for roughly 5 more. Today Amazon Web Services has three times as many data centers as the second-largest cloud provider, with 60 percent more services and 40 percent more features.
As is well known, the world's first cloud service was Amazon S3, which today offers a wide range of storage classes, from S3 Standard to S3 Glacier Deep Archive. Amazon Web Services also sees highly demanding scenarios that require high-performance analytics, such as financial transaction analysis, fraud detection, and machine learning-based quantitative trading, where millions of storage requests must be served every minute with latency held to the millisecond level.
At re:Invent, Amazon Web Services released Amazon S3 Express One Zone, the fastest object storage in the cloud. It is a new Amazon S3 storage class with a new architecture that uses purpose-built hardware and software to accelerate data processing, and it lets customers place storage right next to their high-performance computing applications. It delivers up to 10x the performance of S3 Standard storage, can handle millions of requests per minute with consistent millisecond-level latency, and cuts request costs by 50%.
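To make the usage model concrete, the sketch below writes and reads an object in an S3 Express One Zone directory bucket with boto3. The bucket name (including its Availability Zone ID suffix) and object keys are assumptions, and the directory bucket is assumed to have been created beforehand; reads and writes then use the familiar S3 API.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Placeholder directory-bucket name; Express One Zone buckets embed the
# Availability Zone ID in the name and must be created as directory buckets.
bucket = "trading-ticks--use1-az4--x-s3"

# The Express One Zone storage class is a property of the directory bucket
# itself, so put/get calls look the same as against S3 Standard.
s3.put_object(Bucket=bucket, Key="ticks/2023-12-01.csv", Body=b"ts,price\n...")

obj = s3.get_object(Bucket=bucket, Key="ticks/2023-12-01.csv")
print(obj["Body"].read()[:20])
```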
On the general-purpose chip side, Amazon Web Services launched the industry's first cloud-vendor self-developed general-purpose chip, Amazon Graviton, in 2018, and the family is now in its fourth generation. Amazon Graviton processors are based on the Arm architecture, and at re:Invent Amazon Web Services launched the latest generation, Graviton4, which improves compute performance by up to 30%, adds more than 50% more cores, and offers more than 75% more memory bandwidth than Graviton3. Amazon EC2 R8g instances based on Graviton4 are currently available in preview.
In addition, Amazon Elastic Compute Cloud (Amazon EC2) M7g general-purpose, C7g compute-optimized, and R7g memory-optimized instances, all based on Amazon Graviton3 processors, were launched in China on the 15th.
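As a minimal sketch of how these Graviton-based instances are provisioned, the example below launches an M7g instance with boto3. The Region, AMI ID, and instance size are placeholders, and Graviton instances require an arm64 AMI.

```python
import boto3

# Assumed China (Beijing) Region for this sketch; any Region offering M7g works.
ec2 = boto3.client("ec2", region_name="cn-north-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder arm64 AMI ID
    InstanceType="m7g.large",         # Graviton3-based general-purpose instance
    MinCount=1,
    MaxCount=1,
)

print(response["Instances"][0]["InstanceId"])
```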
Amazon Web Services was the first cloud provider in the industry to launch serverless services, and last year it also became the first to make its data analytics services serverless. This year, Peter DeSantis, Senior Vice President of Amazon Web Services, announced three serverless innovations spanning databases, caching, and data warehousing.
Amazon Aurora Limitless Database is the only sharded database that is truly serverless: users no longer need to manage the underlying physical server resources manually, because the service automatically manages and scales resources and charges on demand, helping users cut costs as much as possible. Amazon ElastiCache Serverless lets customers create a highly available cache in under a minute and scale it vertically and horizontally in real time to support complex applications, without managing any infrastructure. And a preview of new AI-driven scaling and optimization capabilities for Amazon Redshift Serverless automatically adjusts resources and applies optimizations across multiple workload dimensions to meet preset price/performance targets.
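To show how little setup the serverless model requires, here is a sketch that creates an ElastiCache Serverless cache with boto3. The cache name and Region are placeholders, and no node type or cluster size needs to be specified because capacity scales automatically.

```python
import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")

# Only a name and engine are required; sizing is handled by the service.
response = elasticache.create_serverless_cache(
    ServerlessCacheName="demo-serverless-cache",  # placeholder name
    Engine="redis",
)

# The response describes the new cache, which starts out in a creating state.
print(response["ServerlessCache"]["Status"])
```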
Amazon Web Services also extends the cloud from the center to the edge through services such as Amazon Local Zones, Amazon Outposts, Amazon Snowball, and Amazon Private 5G, and it accelerates moving satellite data to the cloud through Amazon Ground Station, taking cloud computing into space.
At the same time, Project Kuiper, still under construction at Amazon, will use a network of thousands of low-Earth-orbit satellites to bring faster, more reliable broadband to the hundreds of millions of people on Earth who lack a dependable internet connection, helping to narrow the digital divide.