Stephen Kalangarn, Senior Principal Engineer at Amazon Web Services (AWS), spoke at the recent re:Invent conference about how AWS builds its global network infrastructure. The in-depth presentation covered the latest advances in physical networking and how an intent-driven network model improves availability, reliability, and performance.
In his talk, Kalangarn highlighted what AWS has achieved in its physical network and outlined its vision for future global expansion. The global network comprises millions of servers spread across hundreds of nodes in 190 countries and handles trillions of customer requests every day. He also addressed the pressing questions customers raise about AWS's current network infrastructure and long-term planning, since these underpin the cloud services they rely on, such as EC2, S3, and Lambda.
Kalangarn divided the network into several domains, including devices, the control plane, the management plane, and planning, and stressed that reliability and high availability are intrinsic characteristics of AWS network infrastructure. He then examined the underlying physical infrastructure, including components such as physical routers, switches, and servers on the backplane, and showed why these elements are critical to maintaining reliability.
Although most people think of AWS infrastructure primarily as something that runs in the cloud, Kalangarn emphasized that he and his team work on the physical infrastructure underneath it. For him, it is important to understand the components inside these machines, the software that runs on them, and the overall systems set up to maintain reliability. He gave examples of hardware customized by AWS, such as a parallel update board for updating switch optics that shortened the update cycle from 132 to 1.
Turning to the expansion of AWS infrastructure, Kalangarn outlined an approach to planning for future growth based on factors such as the number of Regions, capabilities, and scale. He discussed at length how technical promises such as Availability Zone isolation translate into real-world network requirements. AWS has built these commitments directly into its network intent, a declaration of expected behavior across devices, zones, Regions, and planning.
Kalangarn then went through the categories of intent that AWS defines, including operational intent, routing intent, prefix intent, and recovery target intent. These intents are standardized so that the expected behavior can be propagated across the network. For example, an intent can require that communication between two EC2 instances in the same Availability Zone stay within that zone with a "latency of less than 2 ms".
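To make the idea of a declarative intent more concrete, the Availability Zone example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Hop` and `ZoneLocalIntent`, the device and zone names, and the per-hop latency figures are all hypothetical and are not taken from the talk or from any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One device on a network path (all fields are illustrative)."""
    device: str        # hypothetical device name
    zone: str          # Availability Zone the device sits in
    latency_ms: float  # latency contributed by this hop


@dataclass(frozen=True)
class ZoneLocalIntent:
    """Hypothetical intent: traffic stays in one zone, under a latency bound."""
    zone: str
    max_latency_ms: float = 2.0

    def is_satisfied_by(self, path):
        # Every hop must sit inside the required zone.
        stays_in_zone = all(hop.zone == self.zone for hop in path)
        # End-to-end latency must be strictly below the bound.
        total_latency = sum(hop.latency_ms for hop in path)
        return stays_in_zone and total_latency < self.max_latency_ms


intent = ZoneLocalIntent(zone="az-1")

good_path = [Hop("tor-a", "az-1", 0.25),
             Hop("spine-1", "az-1", 0.5),
             Hop("tor-b", "az-1", 0.25)]

# Same shape, but the middle hop leaves the zone.
leaky_path = [Hop("tor-a", "az-1", 0.25),
              Hop("spine-9", "az-2", 0.5),
              Hop("tor-b", "az-1", 0.25)]

# Stays in the zone, but is too slow.
slow_path = [Hop("tor-a", "az-1", 1.5),
             Hop("tor-b", "az-1", 1.5)]

print(intent.is_satisfied_by(good_path))   # within zone and under 2 ms
print(intent.is_satisfied_by(leaky_path))  # crosses into another zone
print(intent.is_satisfied_by(slow_path))   # exceeds the latency bound
```

The point of the sketch is that the intent states what must hold, not how devices are configured; any candidate path or configuration can then be checked against the same declaration.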
Kalangarn then used a customer case study on machine learning workloads to show how AWS uses intent to develop new network topologies. When the UltraCluster service was upgraded from "P4 instances with 400Gbps bandwidth" to "P5 instances, 32Tbps", the team needed to reduce latency and hop count to improve the performance of ML training jobs. This prompted them to rethink the two-tier network structure and introduce a new routing protocol called CIDER.
In the concluding part, Kalangarn described how AWS applies formal methods and automated reasoning to validate network configurations and prevent failures. By testing as early as possible and relying on mathematical proof, AWS can reason about problems at large scale and build a more robust system to support a network infrastructure that launches more than "1 million compute instances" per day worldwide.
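As a rough illustration of checking a configuration against an intent before deployment, the sketch below exhaustively enumerates every loop-free path between two hosts in a toy topology and reports the first one that leaves the source's zone. This brute-force search is a stand-in chosen for readability; it is not the formal-methods or automated-reasoning tooling described in the talk, and the function names, device names, and topology are all hypothetical.

```python
def simple_paths(links, src, dst, seen=()):
    """Yield every loop-free path from src to dst as a tuple of devices."""
    seen = seen + (src,)
    if src == dst:
        yield seen
        return
    for nxt in links.get(src, ()):
        if nxt not in seen:  # never revisit a device on the same path
            yield from simple_paths(links, nxt, dst, seen)


def find_violation(links, zone_of, src, dst):
    """Return a path that leaves src's zone, or None if no such path exists."""
    required = zone_of[src]
    for path in simple_paths(links, src, dst):
        if any(zone_of[device] != required for device in path):
            return path  # counterexample to the zone-isolation intent
    return None


# Hypothetical toy topology: device -> list of next hops.
zone_of = {"host-a": "az-1", "tor-1": "az-1", "spine-1": "az-1",
           "tor-2": "az-1", "host-b": "az-1", "spine-x": "az-2"}

links = {"host-a": ["tor-1"],
         "tor-1": ["spine-1"],
         "spine-1": ["tor-2"],
         "tor-2": ["host-b"]}

# A proposed change that accidentally adds a route through another zone.
leaky_links = {**links,
               "tor-1": ["spine-1", "spine-x"],
               "spine-x": ["tor-2"]}

print(find_violation(links, zone_of, "host-a", "host-b"))
print(find_violation(leaky_links, zone_of, "host-a", "host-b"))
```

Enumeration like this only works on very small graphs; the value of formal methods, as the talk frames it, is being able to reason about the same kind of property at a scale where listing every path is not feasible.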
Throughout the presentation, Kalangarn emphasized how AWS uses intent to improve network availability and enable innovation while reducing complexity. Intent supports consistency, visibility, and automated reasoning across devices and systems, and provides the foundation for services that run mission-critical applications for millions of customers in 190 countries. By detailing its journey toward an intent-driven networking model, AWS showed how next-generation infrastructure can meet the needs of customers running cutting-edge workloads on a global scale. The presentation gave the industry an in-depth look at the progress AWS is making in building the network infrastructure of the future.