Seata creates the industry's first distributed transaction product

Author: Ji Min, Head of Distributed Transaction Products at Alibaba Cloud and founder of the Seata open source project.

Pain points in microservice development

In 2019, at the Dubbo Ecosystem Meetup, we collected more than 2,000 survey responses to the question "What are the core issues in a microservice architecture that developers are most concerned about?" Distributed transaction issues took the largest share, at about 54%.

Before Seata, the usual attitude toward distributed transactions was to avoid them, and most teams relied on complex business logic and message-based eventual consistency to solve data consistency problems. Since Seata was open-sourced, these problems have become much easier to handle. For example, the lossless online and offline of services mentioned here is mainly concerned with service availability and similar issues. From my perspective, it is ultimately about data. No matter how the front-end business interacts, it eventually settles into changes in data. If business data is inconsistent, the architecture becomes meaningless, whatever it is, because data is ultimately the core asset of the enterprise.

So in which scenarios do distributed transaction problems arise?

In the first scenario, after a monolithic architecture is split into microservices, different services may be owned and maintained by different teams, so the release and deployment of upstream and downstream services must be coordinated. For example, when service C is released, services A and B are not notified, and data consistency problems arise because a service at the end of the chain goes offline and comes back online.

In the second scenario, unreliable infrastructure causes system stability problems that turn into data consistency problems, for example business failures caused by unstable storage, computing, or networking at the infrastructure layer.

In the third scenario, timeout is a difficult state for service calls in a distributed architecture. Once a service call times out, we cannot determine whether the business logic was eventually executed or not, and the uncertainty turns into a data consistency problem. As architectures evolve from monolithic to distributed, in-process calls become cross-network calls, and the network becomes part of the call link, which increases the probability of data consistency problems.
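The uncertainty after a timeout can be illustrated with a small sketch. It is not taken from Seata; the names PaymentService, FlakyNetwork and deduct_with_retry are hypothetical. The call is applied on the remote side, the response is lost, and the caller can only retry safely because the service deduplicates by request ID.

```python
class PaymentService:
    """Toy downstream service that remembers which request IDs it has applied."""

    def __init__(self, balance):
        self.balance = balance
        self.applied = {}  # request_id -> balance after that request

    def deduct(self, request_id, amount):
        # A repeated request_id returns the stored result instead of deducting again.
        if request_id in self.applied:
            return self.applied[request_id]
        self.balance -= amount
        self.applied[request_id] = self.balance
        return self.balance


class FlakyNetwork:
    """Delivers the call, then loses the first response to simulate a timeout."""

    def __init__(self, service):
        self.service = service
        self.drop_next_response = True

    def call_deduct(self, request_id, amount):
        result = self.service.deduct(request_id, amount)  # the work IS done
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("response lost")  # but the caller cannot know
        return result


def deduct_with_retry(network, request_id, amount, attempts=3):
    """Retry with the same request ID so a repeated call cannot deduct twice."""
    for _ in range(attempts):
        try:
            return network.call_deduct(request_id, amount)
        except TimeoutError:
            continue
    raise TimeoutError("gave up after retries")


service = PaymentService(balance=100)
network = FlakyNetwork(service)
remaining = deduct_with_retry(network, "req-1", 30)
print(remaining, service.balance)
```

Without the request ID check, the retry after the lost response would deduct the amount a second time.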

In the fourth scenario, in addition to databases, other third-party data components will be involved in the service link. For example, inventory services generally involve Redis components, and how to achieve data consistency between databases and caches is also a business scenario that needs to be considered in the transaction chain.

In the fifth scenario, when upstream and downstream services are invoked, some business parameters need to be passed along the call link, and the upstream service needs to perform logical verification on these parameters. If the verification fails, how do we roll back the data of the upstream service to its state before execution, so that the data of the whole service call chain stays consistent?

In summary, the business scenarios of distributed transactions mainly cover data consistency across databases, across services, and across diverse resources. Distributed transactions pay special attention to rollback scenarios caused by exceptions, which mainly include business exceptions and system exceptions.

So are distributed transactions a problem unique to microservice architectures? Actually, no. Monolithic applications have the same problem, but it is more prominent in microservice architectures.

What are the distributed transaction scenarios in a monolithic architecture? For example, a monolithic multi-module application may need to operate on multiple databases. A local transaction is carried out through a SqlSession on a single database connection, so as soon as an operation goes beyond one local transaction, it becomes a distributed transaction problem. Even different microservices modifying the same database run into it, because a SqlSession is bound to a socket connection and cannot be passed to another service through serialization and deserialization.

On the whole, distributed transactions are a common problem of various application architectures, and the application scenarios are very wide.

Solutions for distributed transactions

There are several types of distributed transaction solutions on the market:

XA mode. It performs somewhat worse than the other transaction modes, but guarantees the strictest data consistency. XA mode needs to run at the serializable isolation level, which is equivalent to adding a read/write lock to the data. In addition, connection resources must be held for the whole transaction, which causes resource locking issues that reduce the throughput of concurrent transactions.

TCC mode and SAGA mode. Both can be summed up as distributed transaction solutions at the business level, because they do not intercept data access. For example, TCC mode has try, confirm and cancel interfaces. The commit and rollback logic in these interfaces must be implemented by the business, and the framework is only responsible for distributed coordination. This requires business developers to fully understand the business logic and be able to design it well. Otherwise commit or rollback may fail, commit and rollback logic may be asymmetric, and interface idempotency and timing problems may appear.

Message eventual consistency. Its biggest advantage is asynchronous decoupling, combined with the peak-shaving and valley-filling ability of messaging. But it has problems of its own, because messages fit one-way notification scenarios. In some business scenarios message consumption can fail, and the data of the message sender cannot be rolled back. For example, in the cash red envelope business, the first step deducts the red envelope amount from my account, and the second step withdraws the money to my bank card through message consumption. If my account has already been cancelled when the message is consumed, consumption will keep failing no matter how many times it is retried, and the message can no longer roll back the data of the message sender. Therefore, messages are suitable for asynchronous one-way notification scenarios where consistency requirements are not strong.

Compensation by scheduled tasks. It has a low learning cost but a relatively high development cost. In particular, when a microservice link has multiple hops and any intermediate node fails, you have to work out how to compensate and correct all the data that has already been changed, and the compensation logic has to be written in a very detailed and rigorous way. It is more suitable for simple business scenarios with low requirements for data consistency and timeliness.

AT mode. It balances consistency and performance. Its main features are simplicity, non-intrusiveness, strong consistency, and a low learning cost. The disadvantage is that certain development conventions must be followed and not all SQL types are supported, so there are some restrictions on its use. It is suitable for general business scenarios, but not for high-concurrency scenarios on hot data, such as deducting SKU inventory. Because AT mode uses a global lock at the application layer, concurrent transactions that modify the same data are ordered through the global lock, and the implementation includes a lock queue waiting mechanism.

The origin of Seata

Seata's product code name within Alibaba is TXC, which originated from Alibaba Group's Colorful Stone project. At that time, Alibaba Group was evolving its architecture away from IOE, from a monolithic architecture to a distributed architecture. That evolution inevitably required a lot of middleware to solve distribution problems, and the main role of TXC was to ensure the data consistency of services.

TXC is deeply integrated with three major components within the group: HSF, a service invocation framework similar to the open source Apache Dubbo; TDDL, the database-layer component for database and table sharding; and MetaQ, the asynchronous messaging component. These three basically cover the general needs of business development. Through deep integration, TXC removes the need for developers to design and implement consistency logic themselves, and the framework ensures consistency in a way that is transparent to developers. TXC is also widely used in the group, with an average of 10 billion transaction calls per day, and the throughput of a standard 3-node cluster can reach nearly 100,000 TPS.

The SLAs for TXC include an availability SLA and a performance SLA. For distributed transactions, in addition to ensuring basic data consistency, it is also necessary to ensure the performance and throughput of the system, so the SLO defines that the additional RT overhead for each distributed transaction call cannot exceed xx ms. At present, the invocation and processing of a distributed transaction takes milliseconds, which is close to the theoretical limit we derived, and in terms of stability TXC can run without faults throughout the year.

Distributed transaction model definition

Let's take a look at the definition of the distributed transaction model.

When we first defined the distributed transaction model, we had to consider carefully at which layer distributed transactions should be implemented.

From the developer's point of view, the application architecture consists of the following layers. The top layer is the application development framework; each company has its own, which may be self-developed or based on Spring Cloud. The next layer is the service invocation framework, typically something like Apache Dubbo, which is widely used in China. Below that is data middleware, which mainly includes ORM frameworks and middleware for transactions, synchronization, reconciliation and so on. The lowest layer is the connection layer to the database, such as mysql-connector-java, the Java JDBC driver for MySQL.

Let's do a simple comparison of which layer is most appropriate for implementing distributed transactions: the database layer, the data middleware layer, or the application development framework layer.

In the application framework layer, consistency is relatively weak, because this layer is exposed to many complicating factors such as service call exceptions, timeouts and retries. This is why TCC mode has to deal with idempotency, anti-suspension (hanging) and empty rollback problems: it incorporates the RPC factor, which brings uncertainty. In fact, Seata already solves this kind of problem at the framework level.
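The three TCC issues named above (idempotency, anti-suspension or hanging, and no-rollback, often called empty rollback) can be made concrete with a minimal in-memory sketch. This is an illustration under assumptions, not Seata's implementation: the class InventoryTcc, its method names, and the per-transaction status table are all hypothetical.

```python
class InventoryTcc:
    """Toy TCC participant; all state is in memory for illustration only."""

    def __init__(self, stock):
        self.available = stock
        self.frozen = {}  # xid -> quantity reserved by try
        self.status = {}  # xid -> "tried" | "confirmed" | "cancelled"

    def try_reserve(self, xid, qty):
        # Anti-hanging: a try that arrives after cancel must not reserve anything.
        if self.status.get(xid) == "cancelled":
            return False
        # Idempotency: a retried try for a known xid does no new work.
        if xid in self.status:
            return True
        if self.available < qty:
            return False
        self.available -= qty
        self.frozen[xid] = qty
        self.status[xid] = "tried"
        return True

    def confirm(self, xid):
        state = self.status.get(xid)
        if state == "confirmed":
            return True  # idempotent confirm
        if state != "tried":
            return False
        self.frozen.pop(xid)
        self.status[xid] = "confirmed"
        return True

    def cancel(self, xid):
        state = self.status.get(xid)
        if state == "cancelled":
            return True  # idempotent cancel
        if state == "confirmed":
            return False
        if state is None:
            # Empty rollback: cancel arrived before try (for example, the try
            # request timed out in transit). Record it so a late try is rejected.
            self.status[xid] = "cancelled"
            return True
        self.available += self.frozen.pop(xid)
        self.status[xid] = "cancelled"
        return True


tcc = InventoryTcc(stock=10)
tcc.try_reserve("tx-1", 3)
tcc.try_reserve("tx-1", 3)         # retried try, reserves only once
tcc.cancel("tx-1")                 # releases the 3 reserved units
tcc.cancel("tx-2")                 # empty rollback, nothing to release
late = tcc.try_reserve("tx-2", 5)  # late try is rejected
print(tcc.available, late)
```

The key design choice in this sketch is that every call first consults a per-transaction status record, so that retries and out-of-order delivery caused by the RPC layer cannot change the outcome.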

In the data middleware layer, consistency is better than in the application framework layer. Its main problem is that it is not implemented inside the database, so it is possible to bypass the middleware and modify the database directly, which leads to concurrency and dirty-write problems. The root cause is that the data middleware is not the only entry point for modifying data, but the problem can be avoided by following certain development conventions.

The best data consistency is achieved at the database layer, but the main problems with implementing at the database layer are:

1. It mainly depends on the implementation of database vendors, whose capabilities are uneven. For example, MySQL only made XA truly usable in version 5.7.7; earlier versions had problems persisting XA prepare.

2. Distributed transactions across services cannot be constrained. Distributed transactions implemented at the database layer take the database as the core resource, so the scope of the transaction is limited to the database itself. If the application needs a distributed transaction with a larger scope, for example across services, the database cannot constrain it, so a third-party coordinator is needed to coordinate cross-service data consistency from a global view.

In the end, we placed the differentiated AT transaction mode at the data middleware layer, and the TCC and SAGA transaction modes at the application development framework layer.

Defining a distributed transaction model means not only defining a development component, but also implementing a complete development process system, including the programming model, performance, operation and maintenance, security, data audit, observability and high availability. In terms of theoretical models, our theory was relatively thin at first, and we gradually improved it by defining the role model and extending the Spring transaction model. We extended an existing model rather than creating a completely new theoretical system because doing so greatly reduces the learning cost for developers. In addition, we made some basic theoretical definitions, including the definition of consistency (whether the goal is consistency across multiple nodes or consistency of data in a business application architecture), the definition of transaction modes, the transaction interaction model, and transaction isolation.

What is a distributed transaction and what problems does it solve on a day-to-day basis?

For example, suppose I make a bank transfer of 100 yuan to you, and a network timeout happens at that moment. Has the 100 yuan been deducted or not? Without certainty, there may be a loss of assets, and the goodwill of the enterprise may even be affected.

We all talk about distributed architectures, but from the macro perspective of the application as a whole, not all components are actually distributed. For example, a database, including today's popular distributed databases, is still centralized data storage when viewed from the level of the whole application architecture. In a distributed application architecture, each node holds only part of the information, so to troubleshoot problems you need a centralized component that holds the information of the whole link. For distributed transactions, the core work is distributed coordination, so it needs to hold the global transaction and data information.

That's why Seata has the Transaction Coordinator role: for distributed transactions, something must have a God's-eye view and act as a third-party coordinator. The Resource Manager is what really performs database operations; you can think of it as acting as a proxy for the database. The Transaction Manager is what really controls transactions together with the business application; it controls the boundaries and resolutions of transactions along the execution chain of the business. Together, the Transaction Coordinator, Resource Manager, and Transaction Manager form the role model for distributed transactions.
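The division of labor between the three roles can be sketched in a few lines. This is a deliberately simplified in-memory model under assumptions, not Seata's protocol or API: the method names (begin, register_branch, branch_commit, branch_rollback, write, run) and the before-value undo list are hypothetical choices made for the illustration.

```python
_MISSING = object()  # marks "key did not exist before the write"


class TransactionCoordinator:
    """Holds the global view: which branches belong to which global transaction."""

    def __init__(self):
        self.branches = {}  # xid -> list of resource managers
        self.next_id = 0

    def begin(self):
        self.next_id += 1
        xid = f"xid-{self.next_id}"
        self.branches[xid] = []
        return xid

    def register_branch(self, xid, rm):
        self.branches[xid].append(rm)

    def commit(self, xid):
        for rm in self.branches.pop(xid):
            rm.branch_commit(xid)

    def rollback(self, xid):
        for rm in reversed(self.branches.pop(xid)):
            rm.branch_rollback(xid)


class ResourceManager:
    """Stands in front of one data store and records how to undo each write."""

    def __init__(self, tc):
        self.tc = tc
        self.data = {}
        self.undo = {}  # xid -> list of (key, previous value)

    def write(self, xid, key, value):
        if xid not in self.undo:
            self.tc.register_branch(xid, self)
            self.undo[xid] = []
        self.undo[xid].append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def branch_commit(self, xid):
        self.undo.pop(xid, None)  # undo records are no longer needed

    def branch_rollback(self, xid):
        for key, old in reversed(self.undo.pop(xid, [])):
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old


class TransactionManager:
    """Defines the transaction boundary around the business logic."""

    def __init__(self, tc):
        self.tc = tc

    def run(self, business):
        xid = self.tc.begin()
        try:
            result = business(xid)
        except Exception:
            self.tc.rollback(xid)  # resolution: global rollback
            raise
        self.tc.commit(xid)        # resolution: global commit
        return result


tc = TransactionCoordinator()
tm = TransactionManager(tc)
accounts, orders = ResourceManager(tc), ResourceManager(tc)
accounts.data["alice"] = 100


def place_order(xid):
    accounts.write(xid, "alice", 70)
    orders.write(xid, "order-1", "created")
    raise ValueError("inventory check failed")


try:
    tm.run(place_order)
except ValueError:
    pass
print(accounts.data, orders.data)  # both writes have been undone
```

In this sketch only the coordinator knows every branch of a global transaction, which is why it is the one component that can drive all of them to the same outcome.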

Architectural evolution of distributed transactions

In January 2019, Seata was officially open-sourced. Starting with version 0.1, the focus was on open-sourcing the AT transaction mode, and version 0.4 officially incorporated the TCC transaction mode. AT mode has to be adapted to each database, and at this stage it is difficult for it to support all relational databases and the cache resources on the business link, so the TCC transaction mode is needed to supplement the AT transaction mode.

As Seata continued to iterate and release, demand for long-running transaction solutions grew. Version 0.9 officially incorporated the SAGA transaction mode, which mainly solves the data consistency problem of microservices with long links and also supports visual orchestration of microservices. In version 1.1, Seata incorporated the XA transaction mode. Why include XA? After Seata supported the AT, TCC, and SAGA transaction modes, the other solutions on the market were mainly message eventual consistency and XA. The community also received feedback from users who had adopted the AT transaction mode in newer businesses and expected it to interoperate with their existing XA mode, forming a unified choice for distributed transactions.

Seata drives the evolution of its technology through a community-driven model, creating a one-stop solution for distributed transactions. For different business scenarios, Seata can use different transaction modes to meet business needs. As shown in the diagram above, Seata has four transaction modes. Why? Because at present no single distributed transaction mode on the market can solve the data consistency problem in every business scenario.

Distributed transactions are deeply coupled with the business. From the perspective of data interaction links, the trade-offs include synchronous versus asynchronous calls, long versus short transactions, strong versus weak consistency, and performance. So the community has incorporated the current four transaction modes, each with its own advantages in terms of adoption cost, performance, and isolation, which will not be covered in detail here.

The current state of the Seata open source community

Let's start by looking at the current state of Seata's open source community. Seata has incorporated four transaction modes (AT, TCC, SAGA, and XA), has extensive support for the mainstream relational databases and RPC frameworks on the market, and has been integrated, both actively and passively, by many third-party communities. At present, it has open source ecosystem integrations with more than 30 communities, and these integrations rely on Seata's pluggable extension mechanism.

Seata has a rich multi-language ecosystem. In addition to the initial Java version, support for Go has become more and more mature, and everyone is welcome to try the Go version and give the community comments and suggestions. The community is also building versions in other languages, including PHP and Python.

At present, Seata's open source products have been applied in the business systems of thousands of enterprises, and financial enterprises have also piloted them. Financial services have strong requirements for distributed transactions and very strict requirements for product capabilities. For example, China CITIC Bank, China Everbright Bank, and Agricultural Bank of China have established cooperation with the community and are using Seata to gradually transform some of their core accounting systems to ensure data consistency. This partly reflects the maturity of Seata's open source products.

At present, the number of stars in the Seata community has reached 24K, and there are more than 300 contributors. Seata is a very open community, and everyone is welcome to participate in the building of the Seata community.

Seata's Enterprise Practice Case

Let's take a look at some of Seata's more typical enterprise cases.

The first case: TravelSky Travel Project. TravelSky is the earliest angel user of Seata, having adopted Seata version 0.2. If you travel a lot, you have probably used their Voyage app. TravelSky used Seata to solve the data consistency problem of its ticket and coupon business, and with the earliest versions these users worked through many pitfalls together with the community.

The second case: Didi Chuxing's Two-Wheeler Division. Seata version 0.6.1 was introduced into various businesses of the Two-Wheeler Division, including the Qingju bicycles that you can see in the market and the management of internal assets.

The third case: Meituan infrastructure. Based on open source Seata, the Meituan infrastructure team built the internal distributed transaction project SWAN as a basic component for Meituan's internal businesses to solve distributed transaction problems.

The fourth case: Hema Town. In the game interaction of Hema Town, the process of picking and stealing flowers is controlled through Seata, and the development cycle is reduced from the original 20 days to 5 days, which greatly reduces the development cost.

In summary, we can find two major values of distributed transactions.

The correctness of business data, because it only makes sense to talk about architecture if the data is consistent.

Architects and developers can focus on business design and development without worrying about data consistency.

Seata Ecosystem Expansion

The diagram above shows the hierarchical structure of Seata's open ecosystem extension points. The top layer defines the API layer. The middle layer includes the registry and configuration center, AT database resources, distributed locks, load balancing policies and so on. The lower layer includes the cluster storage mode, the SQL parser, the protocol layer and transport control. In cluster mode, stateless deployments with DB-based or Redis-based storage are supported, and the community is working on a Raft-based cluster mode in which storage and compute are not separated.

Seata pluggable extension points

Seata's pluggable extension mechanism is modeled on the design of the Dubbo SPI. Seata's extension points are divided into server extension points and client extension points. There are more than 30 client extension points, mainly covering configuration, service registration and discovery, authentication, SQL parsing, and executors. The extension points on the server side include locks, storage, and transaction mode processing.
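The idea behind a name-based, pluggable extension point can be sketched as follows. This is a generic illustration in Python under assumptions, not the Seata or Dubbo SPI itself: ExtensionLoader, its register and load methods, the order parameter, and the two example parser classes are all hypothetical.

```python
class ExtensionLoader:
    """Registry for the implementations of one extension point."""

    def __init__(self, extension_point):
        self.extension_point = extension_point
        self.providers = {}  # name -> (order, implementation class)

    def register(self, name, order=0):
        """Class decorator that registers an implementation under a name."""
        def decorator(cls):
            self.providers[name] = (order, cls)
            return cls
        return decorator

    def load(self, name=None):
        """Instantiate the named provider, or the highest-order one if no name."""
        if not self.providers:
            raise LookupError(f"no provider for {self.extension_point}")
        if name is None:
            _, cls = max(self.providers.values(), key=lambda p: p[0])
        elif name in self.providers:
            _, cls = self.providers[name]
        else:
            raise LookupError(f"unknown {self.extension_point} provider: {name}")
        return cls()


# One extension point, with implementations contributed independently.
sql_parsers = ExtensionLoader("SqlParser")


@sql_parsers.register("mysql", order=10)
class MySqlParser:
    def quote(self, identifier):
        return f"`{identifier}`"


@sql_parsers.register("dameng")
class DamengParser:  # hypothetical adapter added without touching core code
    def quote(self, identifier):
        return f'"{identifier}"'


print(sql_parsers.load("dameng").quote("account"))
print(sql_parsers.load().quote("account"))  # default: highest order
```

The core code only depends on the extension point name, so supporting a new database in this sketch means adding one registered class rather than changing the loader.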

With Seata's current extension mechanism, if you want to support a domestic Xinchuang (information technology innovation) database such as Dameng or Renmin Jincang, the cost of secondary development is relatively low. You only need to follow the documentation provided by the community to implement and configure the existing SQL parsing and executor SPIs for that database, and the whole process will run. Seata already provides extension implementations for the common RPC frameworks and relational databases on the market.

Seata & RPC Integration Extensions

At present, Seata supports 11 common RPC frameworks. For Dubbo, the community supports both the early Alibaba Dubbo versions and the later Apache Dubbo versions. The integration of Seata with an RPC framework is relatively lightweight: the core is to pass Seata's transaction context along the service call link, and to bind and clear that context on each side. Seata implements this logic at the request and response extension points provided by the RPC framework, and common RPC frameworks basically all support loading custom filters or interceptors.
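The pass, bind and clear steps described above can be sketched with a pair of filters. This is a minimal illustration under assumptions, not the actual Seata or Dubbo filter code: the attachment key name, consumer_filter, provider_filter and the in-process "remote call" are hypothetical, and an empty contextvars.Context stands in for the separate process of the called service.

```python
import contextvars

XID_KEY = "TX_XID"  # assumed attachment key, chosen for this illustration
_current_xid = contextvars.ContextVar("current_xid", default=None)


def current_xid():
    return _current_xid.get()


def consumer_filter(invoke, attachments):
    """Caller side: copy the bound transaction ID into the request attachments."""
    xid = current_xid()
    if xid is not None:
        attachments[XID_KEY] = xid
    return invoke(attachments)


def provider_filter(handler, attachments):
    """Callee side: bind the received ID for the call, then always clear it."""
    xid = attachments.get(XID_KEY)
    if xid is None:
        return handler()
    token = _current_xid.set(xid)      # bind
    try:
        return handler()
    finally:
        _current_xid.reset(token)      # clear, even if the handler raised


def order_handler():
    # Business code on the callee side sees the caller's transaction ID.
    return current_xid()


def remote_call(attachments):
    # A fresh, empty context simulates the callee running in another process.
    callee = contextvars.Context()
    return callee.run(provider_filter, order_handler, attachments)


token = _current_xid.set("xid-42")     # caller has begun a global transaction
seen_by_callee = consumer_filter(remote_call, {})
_current_xid.reset(token)
print(seen_by_callee, current_xid())
```

Clearing the context in a finally block matters because RPC worker threads are typically reused, and a leftover transaction ID would leak into an unrelated request.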

On the right is an example of the current implementation of the RPC interface extensions. You are welcome to use Seata in the Dubbo ecosystem to solve cross-service data consistency problems.

Seata & Database Integration Extension

Currently, Seata's AT mode supports MySQL, Oracle, PostgreSQL, TiDB, OceanBase, SQL Server, and other databases. Some database adaptation PRs in the community are still under review and have not been merged into the trunk, such as those for Dameng and IBM DB2. As mentioned above, by implementing the current database extension points, it is possible to adapt relational databases that the community does not yet support. In the recent Summer of Programming activity, the community proposed a topic on PolarDB support, which has now entered testing. Going forward, the community will fully support the top 20 relational databases and will also increase support for Xinchuang databases.

Seata originated in Alibaba's internal e-commerce business systems, where it solved service consistency problems during the move to services. After years of standardization and the test of large-scale promotion traffic, Seata has become a standard component in transaction, payment and logistics scenarios. Since being open-sourced, Seata has accelerated the exploration and evolution of its technology in a community-driven way, and is committed to creating a one-stop distributed transaction solution that includes the AT, TCC, SAGA, and XA transaction modes to meet the needs of users in different scenarios. Seata has done a lot of integration work across the community ecosystem, and its plug-in design reserves rich extension points to meet users' needs for extending service invocation frameworks, databases, and registry and configuration centers. Going forward, the Seata community will continue to refine its ecosystem around distributed transaction solutions, while exploring and expanding into the broader DataOps ecosystem.
