The three layers of the generative AI technology stack: how Amazon Web Services re:Invent aims to help users innovate

Mondo Technology · Updated 2024-01-28

A year ago, at Amazon Web Services re:Invent 2022, generative AI was barely mentioned. Just days later, OpenAI's ChatGPT chatbot launched, instantly setting off a frenzy of change and ushering the whole world into a new era of generative AI.

In just one year, generative AI has become the center of gravity of the technology sector. At this year's re:Invent 2023 conference, Amazon Web Services made clear how squarely the technology sits at the top of the cloud giant's agenda.

In this year's keynote, Amazon Web Services CEO Adam Selipsky said that innovation around generative AI models is the thing most worth doing right now, adding: "It will reinvent every app we interact with at work and at home. We're approaching the whole concept of generative AI in a completely different way than we ever have before."

He also introduced Amazon Web Services' "Generative AI Technology Stack", which aims to provide customers with generative AI applications, new tools for building large language models, and infrastructure to accelerate model training and inference.

A new generative AI technology stack.

Building and deploying generative AI models and applications in a rapidly evolving AI landscape presents a unique set of challenges. Amazon Web Services' response is a new generative AI offering organized as a three-layer technology stack: the infrastructure layer, the foundation model service layer, and the AI application layer, with the goal of helping customers innovate easily on top of all three.

In this year's nearly two-and-a-half-hour re:Invent keynote, Selipsky laid out the generative AI strategy in detail. He believes the new AI technology stack offers advantages in model choice, chip cost, and performance, giving AI developers a head start when building, training, and running generative AI applications on foundation models.

Layer 1 of the stack: A major revolution in storage and compute.

As demand for generative AI continues to grow, GPUs are in short supply. According to reports, NVIDIA's best-performing chips may be sold out well into 2024, and TSMC's CEO recently offered a cautious outlook, suggesting that the GPU shortage facing NVIDIA and its competitors could last until 2025. To reduce reliance on GPUs, Amazon Web Services is one of several tech giants with the capability to develop custom chips for building, iterating on, and productizing AI models.

With the Nitro hypervisor and chip families such as Graviton, Trainium, and Inferentia, Amazon Web Services has accumulated deep chip-development experience, which gives it a significant advantage in the cloud and generative AI space. In an earlier interview with foreign media, Selipsky explained the practical benefits of these innovations and emphasized the importance of striking a balance between computing power and cost: "Generative AI workloads have extremely high compute density, so price/performance is absolutely critical."

At this conference, Amazon Web Services launched Amazon Trainium2, a cloud AI chip designed for generative AI and machine learning training, and Amazon Graviton4, a self-developed server CPU.

Amazon Trainium2 is optimized for training foundation models with hundreds of billions or even trillions of parameters, delivering up to 4x the training performance and 2x the energy efficiency of the first-generation Trainium, announced in December 2020. Trainium2 will power Amazon EC2 Trn2 instances, which cluster 16 chips each and scale up to 100,000 chips in the Amazon EC2 UltraCluster product. Amazon Web Services says that a cluster of 100,000 Trainium2 chips can train a 300-billion-parameter AI model in weeks rather than months.
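For developers, reserving Trainium capacity looks like any other EC2 request. Below is a minimal boto3 sketch; the "trn2.48xlarge" instance type name and the AMI ID are illustrative assumptions, since Trn2 instances were newly announced at the time, not values from the article.

```python
import boto3

# Minimal sketch: requesting a Trainium-backed EC2 instance with boto3.
# The instance type "trn2.48xlarge" and the AMI ID are illustrative
# assumptions; check the EC2 console for what is available in your region.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: a Deep Learning AMI
    InstanceType="trn2.48xlarge",     # assumed Trn2 instance type name
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```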

The other chip announced is the ARM-based Graviton4, a general-purpose server processor. According to Selipsky, Graviton4 offers 30 percent faster processing, 50 percent more cores, and 75 percent more memory bandwidth than the previous-generation Graviton processor running on Amazon EC2, Graviton3 (though not the newer Graviton3E).

In addition, Amazon Web Services announced a major update to its S3 object storage service: Amazon S3 Express One Zone, a new high-performance, low-latency S3 storage class designed to deliver consistent single-digit-millisecond latency at hundreds of thousands of requests per second for latency-sensitive applications. Compared with S3 Standard, Amazon S3 Express One Zone offers up to 10x faster data access, 50% lower request costs, and up to 60% lower compute costs.
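In practice, S3 Express One Zone is reached through the same S3 APIs as other storage classes, via "directory buckets" pinned to a single Availability Zone. A hedged sketch follows; the bucket name, including its zonal "--use1-az4--x-s3" suffix, is an assumption based on the directory-bucket naming convention.

```python
import boto3

# Sketch: reading and writing a latency-sensitive object in an
# S3 Express One Zone directory bucket. The bucket name (including the
# zonal "--use1-az4--x-s3" suffix) is an illustrative assumption.
s3 = boto3.client("s3", region_name="us-east-1")
bucket = "my-fast-cache--use1-az4--x-s3"

s3.put_object(Bucket=bucket, Key="session/state.json", Body=b'{"hot": true}')
obj = s3.get_object(Bucket=bucket, Key="session/state.json")
print(obj["Body"].read())
```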

Layer 2 of the stack: Join forces with OpenAI's strongest competitor to fight back against Microsoft.

In an earlier interview with foreign media, Selipsky shared his view of Sam Altman's sudden departure from OpenAI and eventual return: enterprises must strive to broaden their access to technology, and no single model or vendor should dominate. Recent events, he argued, have once again vindicated the route Amazon Web Services has chosen. According to Selipsky, "Reliable models and reliable vendors are critical, as are cloud providers that offer options and are committed to supporting the technology."

Selipsky focused on the Amazon Bedrock platform, saying it is already used by tens of thousands of customers. Amazon Bedrock, the large-model development platform Amazon Web Services launched in April and made generally available in September, lets users call and customize a diverse set of models, from Amazon's own Titan family to third-party models from AI21 Labs, Anthropic, and Stability AI.
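To give a sense of what "calling a model on Bedrock" means in code, here is a minimal boto3 sketch using the bedrock-runtime client. The model ID and request-body fields follow the Claude-on-Bedrock conventions of the time and should be read as assumptions, not current API guidance.

```python
import json
import boto3

# Sketch: invoking a third-party model hosted on Amazon Bedrock.
# The model ID and body format follow 2023-era Claude conventions (assumed).
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "prompt": "\n\nHuman: In one sentence, what is Amazon Bedrock?\n\nAssistant:",
    "max_tokens_to_sample": 200,
})
response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
print(json.loads(response["body"].read())["completion"])
```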

Notably, Amazon Web Services invited Anthropic CEO Dario Amodei on stage. In their conversation, the two noted that Anthropic has built exclusive customization features on Amazon Web Services that users can access only through Amazon Bedrock or Anthropic's own first-party products: "These services will provide significant fine-tuning and customization capabilities and, apart from Anthropic's first-party products, will for a period of time be available only on Amazon Bedrock, and nowhere else."

Anthropic was founded in 2021 by former OpenAI researchers who, in the founders' words, "had a different vision from the beginning when it came to model safety." On September 25 this year, Amazon Web Services and Anthropic announced a strategic partnership under which Amazon Web Services would invest up to $4 billion in Anthropic, a deal almost comparable in scale to the OpenAI and Microsoft arrangement. In the race for advanced AI foundation models, the partnership with Anthropic has become an important part of Amazon Web Services' foundation model service layer.

Custom AI.

Specifically, Amazon Bedrock is a platform that provides access to hosted foundation models. These include Amazon Web Services' in-house Amazon Titan family of large language models (LLMs), as well as options from other vendors and the open-source ecosystem. Amazon Web Services also announced two new customization features, fine-tuning and continued pre-training, which let customers tailor the large models in Bedrock to specific tasks.

Customization is about training a model on new data that is not part of its existing knowledge. For example, an e-commerce business can train a model on its product documentation so that the model can answer customers' product-related questions. This customization process can significantly improve a large model's accuracy.

The first customization feature, fine-tuning, lets developers train supported Bedrock models on labeled datasets. These datasets contain sample inputs, typical prompts, and pre-written AI answers to those prompts. Organized as question-and-answer pairs, the records let AI models learn quickly by example.
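As a rough illustration of the workflow, the sketch below submits a Bedrock fine-tuning job over a labeled JSONL dataset. The bucket paths, role ARN, base model ID, and hyperparameter values are placeholders and assumptions, not values from the article.

```python
import boto3

# Sketch: a Bedrock fine-tuning job over labeled prompt/completion data.
# Every identifier below (bucket, role ARN, model IDs, hyperparameters)
# is an illustrative assumption.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_model_customization_job(
    jobName="product-qa-finetune",
    customModelName="titan-product-qa",
    customizationType="FINE_TUNING",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    trainingDataConfig={"s3Uri": "s3://my-bucket/data/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "8"},
)
```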

The other customization feature, continued pre-training, targets a different set of use cases. It lets enterprises customize large Bedrock models on very large datasets, such as corpora spanning billions of tokens. (A token is a unit of data corresponding to a few characters or digits.) The feature also allows the training dataset to be refreshed periodically with new information.

Continued pre-training also works on unlabeled datasets, which contain sample inputs but not the output examples that fine-tuning requires. Because users no longer need to author output examples, the effort of building a training dataset, and with it the cost of AI customization, drops substantially.
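The practical difference between the two features shows up in the training files themselves. A short sketch, assuming Bedrock's documented JSONL conventions: fine-tuning records carry a prompt and a completion, while continued pre-training records carry only raw input text.

```python
import json

# Sketch: the two Bedrock customization dataset shapes (field names
# assumed from Bedrock's JSONL conventions).
labeled = {  # fine-tuning: input plus the expected output
    "prompt": "What sizes does the X100 come in?",
    "completion": "The X100 is available in 40mm and 44mm.",
}
unlabeled = {  # continued pre-training: raw text only, no answers needed
    "input": "X100 user manual, chapter 1: unboxing and setup...",
}

with open("finetune.jsonl", "w") as f:
    f.write(json.dumps(labeled) + "\n")
with open("pretrain.jsonl", "w") as f:
    f.write(json.dumps(unlabeled) + "\n")
```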

Antje Barth, Principal Developer Advocate for generative AI at Amazon Web Services, wrote in a blog post: "Users can specify up to 100,000 training data records, and generally see significant customization results after submitting at least 1 billion tokens."

AI security.

Earlier this month, it was reported that Microsoft had temporarily banned employees from using ChatGPT, the product of OpenAI, the very company in which Microsoft has invested heavily. An internal Microsoft message at the time reportedly read: "Due to security and data concerns, a number of AI tools are no longer available to employees." Microsoft explained: "While Microsoft has indeed invested in OpenAI, and ChatGPT has built-in safeguards against inappropriate use, the service is still a third-party external service."

One of the more pointed moments of the keynote came when Selipsky, discussing Bedrock's focus on security and privacy, put that very news report about ChatGPT on the big screen.

Selipsky did not name Microsoft, but he expressed surprise that a "friend" would release early versions of AI products without comprehensive security guarantees: "I can't believe a certain friendly company actually shipped an early version of an AI product without comprehensive security guarantees. They are not confident in their model, or in the security of their data."

Layer 3 of the stack: the Amazon Q AI assistant, now available in preview.

In the keynote, Amazon Web Services also announced the preview of Amazon Q, the application at the top of the technology stack. Some analysts called Amazon Q the most important announcement of this year's re:Invent: "It arms developers with AI to help them succeed."

Amazon Q can answer questions such as "How do I build a web application on Amazon Web Services?". Trained on the knowledge Amazon Web Services has accumulated over the past 17 years, it can answer a wide variety of questions and explain the reasoning behind its answers.

Selipsky said in his keynote: "You can use Amazon Q to easily have conversations, generate content, and take action. Amazon Q fully understands your systems, data repositories, and operational needs."

Users can connect Amazon Q to their organization's designated applications and software (such as Salesforce, Jira, Zendesk, Gmail, and Amazon S3 storage instances) and customize its configuration. Amazon Q indexes all the associated data and content to "learn" every aspect of the business, including organizational structure, core concepts, and product names.

For example, a company can ask Amazon Q through a web application to analyze which features customers are having trouble with and how to improve them. Users can also upload files directly, as with ChatGPT (Word documents, PDFs, spreadsheets, and so on), and ask questions about their contents. Amazon Q answers with references, drawing on its connections, integrations, and data, including business-specific data.
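Amazon Q was preview-only at the time of the keynote, but for a flavor of what querying it over indexed business data might look like programmatically, here is a hedged sketch assuming the Amazon Q Business chat API that AWS later exposed in boto3; the application ID and user ID are placeholders.

```python
import boto3

# Hedged sketch: asking an Amazon Q Business application a question over
# its indexed business data. Assumes the later boto3 "qbusiness" client;
# the application ID and user ID are placeholders.
q = boto3.client("qbusiness", region_name="us-east-1")

response = q.chat_sync(
    applicationId="a1b2c3d4-example",
    userId="analyst@example.com",
    userMessage="Which product features are customers reporting problems with?",
)
print(response["systemMessage"])
```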

Amazon Q doesn't just answer questions; it also acts as an assistant that generates or summarizes content for blog posts, newsletters, and emails. It offers a set of configurable plugins for common workplace actions as well, such as automatically creating service tickets, notifying specific teams in Slack, and updating dashboards in ServiceNow. To prevent errors, Amazon Q asks users to review its proposed actions before it takes them and presents the results for verification.

As you might imagine, Amazon Q can be reached through the Amazon Web Services Management Console, various web applications, and chat applications such as Slack, and it has a thorough understanding of the Amazon Web Services product and service family. According to Amazon Web Services, Amazon Q understands the nuances of the various application workloads running on Amazon Web Services and can offer guidance even for applications that run for only a few seconds or rarely touch stored content.

On stage, Selipsky demonstrated the example of a high-performance encoding and transcoding application: asked which EC2 instances were best suited to the use case, Amazon Q returned a list of candidates weighing performance against cost.

Selipsky added: "I firmly believe this will be a step change in productivity, and I hope people across different industries and roles will benefit from Amazon Q."

Amazon Q works with the Amazon CodeWhisperer service to build and explain application code. In supported IDEs, such as Amazon Web Services' CodeCatalyst, Amazon Q can generate tests for customers' code to measure its quality. Amazon Q can also implement new software features, perform code transformations, and draft updates to packages, repositories, frameworks, and documentation, refining and executing its plans through natural language.

According to Selipsky, a small team inside Amazon used Amazon Q to upgrade 1,000 applications from Java 8 to Java 17 in just two days, completing the corresponding tests along the way.

For now, Amazon Q's code transformation feature only supports upgrades from Java 8 and Java 11 to Java 17 (conversion from .NET Framework to cross-platform .NET is to come), and all related features, including transformation, require a CodeWhisperer Professional subscription. It is unclear whether these requirements will be relaxed in the future.

Selipsky also reaffirmed Amazon Web Services' commitment to security, reassuring potential generative AI customers: "If your users don't have access to something today, they still won't have access to it through Amazon Q. Amazon Q understands and respects a user's existing identities, roles, and permissions... We will also never use your business content to train the underlying models."

Final thoughts.

It is clear that the core of Amazon Web Services' strategy for maintaining dominance in the AI cloud space is to keep strengthening its cloud infrastructure and to bring a distinctive generative AI technology stack to market.

Selipsky believes Amazon Web Services' generative AI stack is uniquely advantaged: "Our unique generative AI stack gives customers a comparative advantage over other cloud vendors. Not every competitor is willing to innovate at every layer, and customers don't know how long it will take them to close the gap."

The rise of generative AI has opened a huge new market for the major cloud providers, and the industry has come to appreciate how much adaptability and innovation matter in such a fast-moving field. As Selipsky put it, "Adaptability is the most valuable ability you can have." Amazon Web Services demonstrated that adaptability through cutting-edge Graviton chips, dedicated chips such as Trainium, its model platform, and the Amazon Q application.

Clearly, Amazon Web Services has invested heavily in its distinctive three-layer generative AI technology stack, aiming to offer diverse AI models and platforms, strategic partnerships, highly cost-effective services, and a richer set of technology options.
