Amazon Web Services announced last week that it will allow NVIDIA to operate cloud services out of AWS data centers, making AWS the last major cloud provider to offer this service. According to The Information, through the new service, called DGX Cloud, NVIDIA will lease servers containing its graphics processing units from AWS and then give its own customers access to those servers.
As part of the deal, AWS says it will be the first cloud provider to bring NVIDIA's latest graphics processing unit, known as the Grace Hopper superchip, or GH200, to the cloud, and to use NVIDIA's networking equipment to connect the chips together. The GH200 combines a GPU with NVIDIA's general-purpose Arm-based processor, the Grace CPU, to increase the memory available to the GPU. Google, Microsoft, and Oracle already offer NVIDIA's H100 chips through DGX Cloud, and they are expected to buy GH200 chips as well.
Summary
Amazon Web Services will offer NVIDIA's DGX Cloud service.
AWS is the last major cloud vendor to offer this service.
The relationship between the companies is complicated: Amazon is building its own chips, while NVIDIA is moving into cloud services.
Cloud service providers such as AWS are among the biggest buyers of NVIDIA GPUs, but the relationship between the two companies is complicated. AWS is developing its own AI chips, Trainium and Inferentia, to compete with NVIDIA's. At the same time, NVIDIA's DGX Cloud is an effort by the chipmaker to get closer to end users and generate additional revenue, which poses a potential threat to cloud service providers such as AWS, according to The Information.
At AWS's re:Invent conference, The Information spoke with Dave Brown, VP of Compute and Networking at AWS, to find out why the cloud provider agreed to the previously unreported DGX Cloud deal. In the interview, Brown also discussed how AWS can alleviate chip and power shortages.
The following interview has been edited for length and clarity.
The Information: Why have AWS and NVIDIA partnered to launch DGX Cloud with the new GH200 chip?
This is a very, very difficult engineering problem, and we believe these two companies are the best positioned to solve it.
GPU deployments have outgrown individual servers. Even today, running GPUs is very complicated, which is why most companies don't do GPU computing inside their own data centers; it simply isn't feasible. So they turn to cloud providers, but I think we're now in a world that needs the best cloud providers to do this in a highly available way.
AWS is the last major cloud service provider to sign a DGX Cloud agreement with NVIDIA. Why is that?
We weren't involved at first; we just didn't think the time was right. We want to be able to really differentiate what we offer on AWS from what's already available in the market.
Couldn't you differentiate with the highly sought-after H100 chip?
It takes time, and it takes a real understanding of "How do we create differentiation together?" Differentiation with partners takes time. You need to know exactly what they have to offer, and they need to know what you have to offer. You need a deep understanding of each other's technologies, and you need to see how combining each other's strengths can create a better product for the end customer.
We chose not to take part in the first round of the collaboration, but that doesn't mean we said we would never do it. It was really just a matter of time until we found a differentiated product.
How is AWS's DGX Cloud different from what other cloud service providers offer?
[Other cloud service providers] may also have GH200 chips, but they don't have multi-node [NVLink, the technology AWS will use to connect 32 GH200 chips within a single server rack]. This is also the first time anyone has cooled NVIDIA GPUs in the cloud with liquid rather than air.
Today we have eight GPUs per server, and with eight GPUs you can cool with air. But when you move to 32 GPUs, the density becomes too high to cool [the GPU servers] with air anymore. That introduces a lot of engineering complexity, and AWS is in the best position to handle it.
We have the Nitro System [dedicated hardware that offloads some of the computing work from the server], and we know Nitro provides not only better security but also better performance. Then we have the Elastic Fabric Adapter [EFA, AWS's networking system]. EFA is very similar to InfiniBand, which other providers use, but it's based on Ethernet. [EFA] is a protocol we developed ourselves, and it's the protocol we use in all of our large GPU clusters.
When you look at the whole solution, it's completely different.
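As a side note for readers: whether a given EC2 instance type exposes EFA is visible through the public EC2 API. Below is a minimal sketch using boto3; the region and the `p5.48xlarge` instance type are illustrative assumptions, and configured AWS credentials are assumed.

```python
import boto3

# EC2 client; the region is an illustrative assumption.
ec2 = boto3.client("ec2", region_name="us-east-1")

# DescribeInstanceTypes returns a NetworkInfo section that
# includes whether the instance type supports EFA.
resp = ec2.describe_instance_types(InstanceTypes=["p5.48xlarge"])

for itype in resp["InstanceTypes"]:
    net = itype["NetworkInfo"]
    print(
        itype["InstanceType"],
        "EFA supported:", net.get("EfaSupported"),
        "max network cards:", net.get("MaximumNetworkCards"),
    )
```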
Who will be able to access these GPU clusters with DGX Cloud?
NVIDIA will use that cluster for [its] internal workloads, and they also want to help customers with model training. That is DGX Cloud. We're also going to bring these GPUs to end customers, just as we do with P5 instances [powered by NVIDIA H100s] today, so you'll be able to get access to this hardware outside of DGX Cloud. DGX Cloud itself has 16,384 GPUs, and on top of that we will be offering more GPUs to our customers on AWS.
Are you interested in the latest GPUs from Advanced Micro Devices, the MI300 series, which are seen as competitors to NVIDIA's GPUs and may be better at some tasks?
There are a number of factors we consider before bringing a chip to AWS. Part of it is making sure the chip runs flawlessly on AWS, and part is making sure we have a whole ecosystem around the GPU space that can support the chips we offer.
For now, we've chosen to focus on NVIDIA and Trainium [AWS's in-house AI training chip], but that certainly doesn't mean we won't look at other accelerators, whether from Intel, startups, or anyone else. If there's something we think our customers really want, we'll definitely bring it to AWS.
One of the biggest bottlenecks in deploying GPUs is power in the data center, because GPUs are very power-hungry. How does AWS solve this problem?
Within any region, the power available in a given geographic area is limited. A few years ago we launched Local Zones [data centers placed close to end users]. We started in Los Angeles and now have about 40 Local Zones around the world. A Local Zone is an AWS data center that sits apart from [the regional hubs that contain multiple data centers].
That's usually done for latency reasons, to be close to users. But in the case of GPUs, we can use Local Zones in places where power is plentiful. Arizona is a great example: we established a Local Zone in Arizona, and now there are a lot of [GPUs] there.
We don't have to turn to other providers the way other cloud service providers do. [Microsoft recently struck deals with CoreWeave and Oracle.] We can find a data center, find the power, and quickly bring it into service as a Local Zone. That takes only a few months.
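For readers unfamiliar with Local Zones: they show up in the EC2 API as opt-in zone groups that an account must enable before launching instances there. A minimal sketch using boto3; the `us-west-2-lax-1` Los Angeles group name is illustrative, and credentials are assumed.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# List every Local Zone the account can see, including ones
# it has not opted in to yet.
zones = ec2.describe_availability_zones(
    AllAvailabilityZones=True,
    Filters=[{"Name": "zone-type", "Values": ["local-zone"]}],
)
for z in zones["AvailabilityZones"]:
    print(z["ZoneName"], z["GroupName"], z["OptInStatus"])

# Opting in to a zone group is required before instances can
# be launched into it. The group name is an assumption.
ec2.modify_availability_zone_group(
    GroupName="us-west-2-lax-1",
    OptInStatus="opted-in",
)
```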
A few weeks ago, you announced a service called Capacity Blocks that aims to make it easier for customers to rent servers with GPUs. What inspired you to create this service?
This is a fast-moving area for us; we built Capacity Blocks [the new service] within a few months. We realized that the usual way of selling cloud servers wasn't working well for GPUs in a supply-constrained environment.
The situation is that as soon as any GPU capacity becomes available, it gets snapped up right away, and it's actually hard for startups to get these GPUs. Often it's the larger, better-funded organizations that are constantly hunting for GPUs. So the on-demand model doesn't work here.
Even without supply constraints, are you willing to spend money on GPUs you don't need? I think organizations are grappling with questions like, "How do I get GPUs when I need them?"
The other challenge is that you need to deploy them on a network with all the GPUs in the same cluster. The spot market doesn't really work for training, because you might get a GPU here and a GPU there, and that isn't a clustered solution.
Capacity Blocks guarantees access to these GPUs, and pricing varies with demand: if you run on a weekend, it's cheaper.
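As an illustration of how this looks from the customer side, here is a minimal sketch using boto3 and the EC2 Capacity Blocks API; the instance type, instance count, dates, and duration are illustrative assumptions.

```python
from datetime import datetime, timedelta

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Search for an available block: e.g. four P5 instances for
# 48 hours, starting sometime in the next two weeks.
now = datetime.utcnow()
offers = ec2.describe_capacity_block_offerings(
    InstanceType="p5.48xlarge",
    InstanceCount=4,
    CapacityDurationHours=48,
    StartDateRange=now,
    EndDateRange=now + timedelta(days=14),
)

# Each offering carries an upfront fee that varies with demand
# for that time slot (weekend slots can come in cheaper).
offering = offers["CapacityBlockOfferings"][0]  # assumes one was found
print(offering["StartDate"], offering["EndDate"], offering["UpfrontFee"])

# Purchasing creates a capacity reservation that instances can
# later be launched into for the reserved window.
ec2.purchase_capacity_block(
    CapacityBlockOfferingId=offering["CapacityBlockOfferingId"],
    InstancePlatform="Linux/UNIX",
)
```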
Should we expect to see more in this regard?
You'll see how quickly we iterate: different regions, different instance types, different ways to buy. You'll see us working with startups, and there will be a lot of opportunity in this space. So keep an eye on it.