The best of both worlds: a new form of service mesh that combines sidecarless and sidecar patterns

Mondo Social Updated on 2024-01-31

Authors: Shi Zehuan, Yin Hang

In September 2022, the Istio community released a new architecture called ambient mode, also known as the sidecarless pattern because it moves the functions of the sidecar into separate Layer 4 and Layer 7 components. Alibaba Cloud Service Mesh (ASM) is the industry's first managed service mesh that supports ambient mode.

This article is based on a talk on the latest progress of Alibaba Cloud Service Mesh (ASM) product technology given at the 2023 Apsara Conference. In four parts, Shi Zehuan and Yin Hang of the Alibaba Cloud cloud-native service mesh team introduce how ASM implements this new form of service mesh that integrates the sidecarless and sidecar patterns, as well as the serverless service mesh.

As the service mesh becomes more and more popular in the cloud-native technology stack, more and more enterprise technical teams are using it in production. Many readers will already be familiar with the service mesh, but to help those who are new to it follow the rest of this article, let's first introduce the classic sidecar mode.

The classic service mesh architecture is divided into a control plane and a data plane according to the responsibilities of the components. The control plane acts as the steward of the service mesh, providing each data plane component with the configuration it needs. The data plane acts as the executor: it manipulates traffic according to the configuration issued by the control plane, and it is where the service mesh capabilities are actually carried out.

To provide service mesh capabilities such as traffic routing, load balancing, fault injection, request and response manipulation, authentication, and zero-trust networking, the service mesh injector adds a container dedicated to these capabilities to the application pod. This container runs in the same pod as the application and shares its network namespace. It is the sidecar of the service mesh.

Because the sidecar shares the network namespace with the application container, iptables rules can easily redirect application traffic to the mesh proxy process inside the sidecar container. This is the classic sidecar architecture.

ASM decouples the control plane from the data plane and deploys the control plane as a managed service, which offers some significant advantages over Istio. First, ASM provides complete lifecycle management. With it, users can create, delete, and upgrade service mesh instances with one click, without worrying about configuration or compatibility issues.

Second, ASM can block misconfigurations and diagnose problems. ASM converts these problems into inspection items and diagnostic items, intercepts wrong configurations and raises alerts as early as possible, and helps service mesh O&M personnel find and solve problems in a timely manner. As the network infrastructure of cloud-native applications, ASM also integrates with multiple cloud services with one click, helping users connect quickly to the cloud-native ecosystem.

Finally, ASM provides enterprise-grade multi-cluster support. ASM allows users to quickly add multiple clusters to a mesh.

ASM has currently made some control plane components serverless. These components scale out automatically and are used on demand. Building on the serverless foundation, they achieve higher scheduling efficiency and faster startup, and image caching significantly reduces readiness time and startup time.

The sidecar mode is very intuitive and effective, but it still has some drawbacks:

1) Sidecar injection is intrusive to the workload. Injecting or removing a sidecar requires restarting the workload, and adjusting the sidecar's configuration (such as its resources) may also require a restart. The sidecar and the workload are tightly coupled.

2) Resource utilization is not ideal. To cope with the worst case, each sidecar must reserve a share of resources, so the larger the cluster, the more resources sit idle.

3) Traffic capture, protocol identification, and other Layer 7 processing are computationally expensive, yet not all requests are HTTP or need to be processed by a sidecar.
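To make the second drawback concrete, here is a rough back-of-the-envelope sketch. It compares the CPU reserved when every pod carries a sidecar with the CPU reserved when proxies are deployed per node and per service instead. All figures (pod count, node count, millicores per proxy) are hypothetical values chosen for illustration, not measurements from ASM or Istio.

```python
def sidecar_reserved_millicores(pods: int, per_sidecar: int) -> int:
    """CPU reserved when each pod carries its own sidecar."""
    return pods * per_sidecar


def shared_reserved_millicores(
    nodes: int, per_node_proxy: int, l7_proxies: int, per_l7_proxy: int
) -> int:
    """CPU reserved with one L4 proxy per node plus on-demand L7 proxies."""
    return nodes * per_node_proxy + l7_proxies * per_l7_proxy


if __name__ == "__main__":
    # Hypothetical cluster: 1000 pods on 20 nodes, 10 services needing Layer 7.
    sidecar = sidecar_reserved_millicores(pods=1000, per_sidecar=100)
    shared = shared_reserved_millicores(
        nodes=20, per_node_proxy=200, l7_proxies=10, per_l7_proxy=500
    )
    print(f"sidecar mode reserves {sidecar}m CPU")
    print(f"per-node and per-service proxies reserve {shared}m CPU")
```

The point of the sketch is only the scaling behavior: the sidecar reservation grows with the number of pods, while the shared reservation grows with the number of nodes and of services that need Layer 7.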

For these reasons, we need a less invasive and less expensive approach that makes the service mesh applicable to more scenarios. So, in September 2022, the Istio community introduced the sidecarless ambient mode. This mode splits the functions of the sidecar into Layer 4 and Layer 7, and both layers are deployed separately from the workload, which makes up for the shortcomings of the sidecar mode in some scenarios. ASM is the industry's first managed service mesh to support ambient mode.

In ambient mode, Layer 4 focuses on observability, routing, and communication encryption at the transport layer, while Layer 7 handles more complex traffic management, security, and observability behavior based on Layer 7 protocols. Users can progressively choose whether to enable the mesh for an application, and which layer to enable, according to actual business needs.

We call the Layer 4 component of the data plane ztunnel. It is the L4 processing layer and is responsible for all Layer 4 communication of the application. ztunnel is deployed as a DaemonSet, and the CNI plugin configures traffic interception rules on each node so that traffic from pods in the mesh is redirected to the ztunnel instance. ztunnel encrypts the traffic with mTLS before sending it, and the peer ztunnel decrypts it and delivers it to the application. Because it sits on this path, ztunnel can also collect TCP metrics, access logs, and more.

We refer to the Layer 7 component as the waypoint. Like the sidecar in the classic architecture, it is based on Envoy, and it provides the more advanced capabilities based on Layer 7 protocols in ambient mode. For example, it can apply advanced service mesh policies such as circuit breaking, traffic shaping, traffic splitting, retries, fault injection, and role-based access control authorization policies, based on request headers and credentials. Whereas ztunnel is deployed per node, the waypoint is deployed per service: users can enable or disable Layer 7 for a service, scale it freely, and deploy it on demand, which improves resource utilization in the cluster.

Now that we understand the specific capabilities of L4 and L7, let's look at the network topology in ambient mode, where L4 and L7 are decoupled.

Let's take a look at what the traffic path looks like in ambient mode, starting with L4:

1. When an application pod in ambient mode starts, the CNI plugin writes its IP address to the ipset in the node network namespace.

2. When a request is initiated, the packet arrives in the node network namespace through the pod's veth pair. Because its source address is in the ipset, it is captured and processed by the iptables rules on the node.

3. The iptables rule marks the packet with 0x100.

4. The policy-based routing rule on the node directs any packet marked 0x100 to destination 192.168.127.2 through the istioout network interface.

5. The transparent proxy iptables rule in the ztunnel pod sends packets arriving from pistioout to ztunnel outbound port 15001.

6. ztunnel processes the packet and sends it to the IP address of the destination service (httpbin). This address belongs to a veth device on Node B, so the packet is routed to Node B.

7. Once the packet arrives at Node B, the rules for inbound traffic ensure that the packet is routed to the istioin interface.

8. The packet passes through the tunnel formed by istioin and pistioin, which brings it into the ztunnel pod.

9. The iptables rule in the ztunnel pod captures packets from pistioin and directs them to port 15008 based on their mark.

10. The ztunnel pod processes the packet and sends it to the destination pod.
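The ten steps above can be followed more easily as a small simulation. The sketch below is not how iptables or ztunnel are implemented. It is a toy model that records each hop in a trace, using the mark, address, interface names, and ports named in the steps. The `Packet` class, the function names, and the pod IP addresses are hypothetical.

```python
from dataclasses import dataclass, field

AMBIENT_MARK = 0x100                 # mark set by the node iptables rule (step 3)
ZTUNNEL_ROUTE_DEST = "192.168.127.2" # policy-routing destination (step 4)
ZTUNNEL_OUTBOUND_PORT = 15001        # ztunnel outbound port (step 5)
ZTUNNEL_INBOUND_PORT = 15008         # ztunnel inbound port (step 9)


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    mark: int = 0
    trace: list = field(default_factory=list)


def node_outbound(packet: Packet, ipset: set) -> bool:
    """Steps 2-4: capture packets whose source is in the ipset, mark, route."""
    if packet.src_ip not in ipset:
        packet.trace.append("source not in ipset: not captured")
        return False
    packet.mark = AMBIENT_MARK
    packet.trace.append(f"node iptables: mark {packet.mark:#x}")
    packet.trace.append(f"policy route: istioout -> {ZTUNNEL_ROUTE_DEST}")
    return True


def ztunnel_outbound(packet: Packet) -> None:
    """Steps 5-6: redirect to the outbound port, then forward to the peer."""
    packet.trace.append(f"pistioout -> ztunnel port {ZTUNNEL_OUTBOUND_PORT}")
    packet.trace.append(f"ztunnel: forward to {packet.dst_ip}")


def node_inbound(packet: Packet) -> None:
    """Steps 7-10: route into the peer ztunnel and deliver to the pod."""
    packet.trace.append("inbound rules: route to istioin")
    packet.trace.append("tunnel: istioin -> pistioin")
    packet.trace.append(f"pistioin -> ztunnel port {ZTUNNEL_INBOUND_PORT}")
    packet.trace.append(f"ztunnel: deliver to pod {packet.dst_ip}")


def send(src_ip: str, dst_ip: str, ipset: set) -> Packet:
    """Walk one packet from a pod on Node A to a pod on Node B."""
    packet = Packet(src_ip, dst_ip)
    if node_outbound(packet, ipset):
        ztunnel_outbound(packet)
        node_inbound(packet)
    return packet


if __name__ == "__main__":
    # Step 1: the CNI plugin has registered the pod IP (hypothetical address).
    for hop in send("10.0.1.5", "10.0.2.7", ipset={"10.0.1.5"}).trace:
        print(hop)
```

A pod whose address is not in the ipset produces a single trace entry, which mirrors the fact that only pods enrolled in the mesh have their traffic captured.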

On the outbound Layer 4 path, if Layer 7 is enabled for the target application, Layer 4 sends the traffic through an HBONE tunnel to the target application's Layer 7. The traffic enters Layer 7 through the Connect Terminate listener on port 15008, where specific filters unwrap the HBONE traffic and complete authentication, and is then handed to the main internal listener for further processing. On the main listener, the traffic is matched by service IP and port, and the Layer 7 traffic policy is executed to determine the target cluster. Finally, the traffic enters an internal listener called Connect Originate, which sends it on to the upstream destination through an HBONE tunnel.
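The three-stage listener chain can be sketched as three functions called in sequence. This is a simplified model for illustration only, not Envoy configuration or code. The `HboneRequest` type, the trusted peer identity, the service address, and the cluster name are all hypothetical.

```python
from dataclasses import dataclass

HBONE_PORT = 15008  # port the Connect Terminate listener listens on


@dataclass(frozen=True)
class HboneRequest:
    peer_identity: str  # identity presented by the sending Layer 4 proxy
    dst_ip: str
    dst_port: int
    payload: bytes


# Hypothetical data that a real waypoint would receive from the control plane.
TRUSTED_PEERS = {"spiffe://cluster.local/ns/default/sa/sleep"}
CLUSTER_BY_SERVICE = {("10.96.10.1", 8000): "httpbin-v1"}


def connect_terminate(request: HboneRequest):
    """Unwrap the tunnel and authenticate the peer."""
    if request.peer_identity not in TRUSTED_PEERS:
        raise PermissionError(f"untrusted peer: {request.peer_identity}")
    return request.dst_ip, request.dst_port, request.payload


def main_internal(dst_ip: str, dst_port: int) -> str:
    """Match on service IP and port and pick the target cluster."""
    try:
        return CLUSTER_BY_SERVICE[(dst_ip, dst_port)]
    except KeyError:
        raise LookupError(f"no route for {dst_ip}:{dst_port}") from None


def connect_originate(cluster: str, payload: bytes) -> dict:
    """Re-wrap the payload in a new tunnel toward the upstream."""
    return {"tunnel": "HBONE", "cluster": cluster, "payload": payload}


def waypoint_handle(request: HboneRequest) -> dict:
    dst_ip, dst_port, payload = connect_terminate(request)
    cluster = main_internal(dst_ip, dst_port)
    return connect_originate(cluster, payload)


if __name__ == "__main__":
    req = HboneRequest(
        "spiffe://cluster.local/ns/default/sa/sleep", "10.96.10.1", 8000, b"GET /"
    )
    print(waypoint_handle(req))
```

Keeping authentication in the first stage means the routing stage only ever sees traffic from authenticated peers, which is the ordering the paragraph describes.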

Readers who are interested in ambient mesh traffic paths can also refer to another article by the author, in which the traffic path in ambient mode is analyzed in more depth.

In the new architecture, the sidecar mode does not conflict with the sidecarless mode. Users can mix the two modes in one deployment, which makes it possible to switch gradually as needs dictate. ASM's managed ambient mode can reduce resource overhead by up to 60%, reduce operational effort by 50%, and reduce request latency by up to 40% in certain scenarios.

ASM provides a configuration management platform for the data plane through a unified control plane API. It delivers different configurations on demand to sidecars in sidecar mode, to Layer 4 and Layer 7 in sidecarless mode, and even to the managed Layer 7 in the converged form.
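As a minimal sketch of delivering different configurations on demand, the snippet below filters one shared list of policies by the kind of data plane component asking for it. It assumes a simple model in which a sidecar handles both layers, ztunnel handles only Layer 4, and a waypoint handles only Layer 7. The `Policy` type, the policy names, and `config_for` are hypothetical and do not reflect the ASM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    layer: int  # 4 or 7


# Assumed model: which layers each kind of data plane component handles.
LAYERS_BY_PROXY = {
    "sidecar": {4, 7},
    "ztunnel": {4},
    "waypoint": {7},
}


def config_for(proxy_kind: str, policies: list) -> list:
    """Return the names of the policies relevant to one proxy kind."""
    layers = LAYERS_BY_PROXY[proxy_kind]
    return [p.name for p in policies if p.layer in layers]


if __name__ == "__main__":
    policies = [
        Policy("mtls-strict", 4),
        Policy("retry-on-5xx", 7),
        Policy("header-based-split", 7),
    ]
    for kind in ("sidecar", "ztunnel", "waypoint"):
        print(kind, config_for(kind, policies))
```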

Because Layer 7 carries the richer service mesh capabilities, in production it is more likely to need to scale together with the business application. Building on the deployment flexibility of Layer 7 in the new architecture, ASM provides a managed Layer 7 that is deployed in serverless form to hide its O&M complexity. Users do not need to plan Layer 7 capacity in advance or take on any of its O&M work, and they can deploy Layer 7 at any time according to business needs and scale it with one click.

Let's take a look at the technical architecture of the managed Layer 7. Users declare Layer 7 through the standard Kubernetes Gateway API. The Waypoint Proxy Controller on the ASM hosting side watches the Gateway API. When a Gateway CR is created or changed, the controller manages the lifecycle of the waypoint proxy workloads based on the spec of the Gateway. Through the Gateway API, users can also specify whether Layer 7 is deployed in an ASM-hosted waypoint proxy pool or on an ECI serverless node in the user's own cluster.
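The declare-and-reconcile flow can be sketched as follows. The Gateway manifest uses the `istio-waypoint` gateway class and an HBONE listener on port 15008, which is the upstream Istio convention and is assumed here; ASM may expect different values. The article does not say how the placement (hosted pool or ECI node) is expressed, so the annotation key below is purely hypothetical, as is the toy `reconcile` function, which stands in for a real controller watching the Kubernetes API.

```python
import json

# Hypothetical annotation key: the article does not name the real ASM setting.
PLACEMENT_ANNOTATION = "example.com/waypoint-placement"


def waypoint_gateway(name: str, namespace: str, placement: str) -> dict:
    """Build a Gateway manifest that declares a Layer 7 waypoint."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1beta1",
        "kind": "Gateway",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {PLACEMENT_ANNOTATION: placement},
        },
        "spec": {
            # Upstream Istio convention, assumed here.
            "gatewayClassName": "istio-waypoint",
            "listeners": [{"name": "mesh", "port": 15008, "protocol": "HBONE"}],
        },
    }


def reconcile(event_type: str, gateway: dict, workloads: dict) -> dict:
    """Toy controller step: keep `workloads` in sync with Gateway events."""
    meta = gateway["metadata"]
    key = (meta["namespace"], meta["name"])
    if event_type == "DELETED":
        workloads.pop(key, None)
    else:  # ADDED or MODIFIED: create or update the waypoint workload
        workloads[key] = {
            "placement": meta["annotations"][PLACEMENT_ANNOTATION],
            "port": gateway["spec"]["listeners"][0]["port"],
        }
    return workloads


if __name__ == "__main__":
    gw = waypoint_gateway("httpbin-waypoint", "default", "hosted-pool")
    print(json.dumps(gw, indent=2))
    print(reconcile("ADDED", gw, {}))
```

Treating ADDED and MODIFIED the same way keeps the step idempotent: replaying an event leaves the workload table unchanged.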

Lixun Logistics is a service provider that focuses on the fashion industry and provides professional logistics and first-chain solutions for enterprises. Lixun Logistics has 70+ omni-channel physical cloud warehouses and 6 central e-commerce warehouses across the country, with a total area of 1 million+ square meters, covering 300+ cities and 3000+ business districts, providing omni-channel distribution services for many well-known fashion brands and their brand stores.

At present, Lixun Logistics has switched 100% of its core production systems to ASM. ASM helped Lixun Logistics shorten the implementation cycle by at least 50% during mesh adoption. The managed, O&M-free architecture and rich product capabilities also helped the O&M team of Lixun Logistics increase its O&M efficiency in network traffic management and security configuration management by at least 40%. After switching to ASM, O&M personnel complete all configuration related to traffic rules, security, and observability through the APIs provided by ASM. The ASM gateway is connected to a custom authentication service to strengthen ingress security. ASM's external service registration capability opens up communication between services inside and outside the mesh. The product capabilities provided by ASM also connect efficiently to the unified observability platform, yielding a complete observability solution that covers the generation, collection, query, and search of observability data, dashboard visualization, real-time insight into the mesh topology, and mesh service health assessment.

Finally, let's take a look at the enterprise-level service mesh capabilities provided by Alibaba Cloud Service Mesh. In terms of service mesh capabilities, these include unified governance of heterogeneous services, network management of multi-cluster and cross-cluster applications, grayscale release and deployment of applications integrated with CI/CD tools, smooth evolution of application architecture to the cloud, and elastic management of AI services based on KServe. In terms of integration and compatibility, ASM offers a web user interface and a complete OpenAPI, so it can be integrated flexibly. ASM is also fully compatible with the Istio usage specification and supports changing the configuration of service mesh instances through the standard Kubernetes API.

The core components of the ASM control plane are fully managed. The Standard and Enterprise Editions share a unified, flexible architecture, provide complete multi-version support, and offer a number of powerful customization capabilities, including traffic management enhancements, protocol enhancements, adaptive xDS optimization, software and hardware integration optimization, mesh diagnostics and analysis, an extension center, and heterogeneous service registration integration.

ASM is a network platform for cloud-native applications that provides unified mesh governance for application services running on heterogeneous computing infrastructure. With its strong support for heterogeneous computing infrastructure, ASM helps users connect workloads running on Kubernetes nodes, serverless workloads running on ECI nodes, hosted serverless components, and other workloads and heterogeneous facilities in public clouds or IDCs for unified management and O&M. For more information on ASM capabilities, please visit the ASM homepage.

ASM Home:

Click on the link below to view the ASM Ambient Mesh Mode Hands-on Tutorial.
