Although OceanBase, TiDB, and CockroachDB are all natively distributed databases, their architectures differ markedly from one another: a TiDB cluster is composed of at least three different node roles, which carries a significant single-point risk; a CRDB cluster, by contrast, has only one node role, a fully peer-to-peer deployment architecture and the one most faithful to the ideal of full distribution; OB sits somewhere in between, achieving peer-to-peer deployment while key functional modules in the system still rely on centralized services.
The main reason the architectures differ lies in their transaction processing mechanisms. A database must guarantee the ACID properties while operating on data, and virtually every database relies on the same basic technique: assign each transaction an increasing identifier, so that comparing the identifiers of two transactions reveals their order and provides a factual basis for resolving conflicts. In addition, before the database can operate on data it has to know which node, disk, directory, and file the data lives on, and this routing information is maintained as metadata. In a native distributed database, every compute node can accept SQL connections and process transaction requests. How, then, can transactions on different nodes obtain globally increasing identifiers? And how can every node get access to valid metadata? The simplest solution is to deploy a separate management node: whenever a transaction occurs, the compute nodes ask the management node for a transaction identifier and query it for metadata, which avoids data inconsistencies. (It is like a department appointing a leader: whether something gets done, and how, is decided by the leader, so that people do not each go their own way and end up with work that cannot be merged.) TiDB uses exactly this architecture. The cluster is composed of nodes with several different roles: TiDB nodes, the compute nodes compatible with the MySQL protocol; TiKV nodes, the storage nodes built on the Raft protocol and RocksDB; and PD nodes, the separately deployed management nodes, which handle transaction identifiers and metadata and also manage cluster state, such as scaling, fault detection, and response.
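To make the "single management node hands out identifiers" idea concrete, here is a minimal Go sketch of a centralized transaction-ID service. It is only an in-process illustration of the principle (one protected counter, so every ID is globally ordered); it is not PD's actual timestamp oracle implementation, and the names here are invented for the example.

```go
package main

// Minimal sketch of a centralized transaction-identifier ("timestamp oracle") service.
// The counter lives in exactly one place, which is why IDs come out globally monotonic,
// and also why every compute node must round-trip to this one service per transaction.

import (
	"fmt"
	"sync"
)

// TimestampOracle hands out strictly increasing transaction identifiers.
type TimestampOracle struct {
	mu   sync.Mutex
	next uint64
}

// Next returns a globally unique, monotonically increasing ID.
func (t *TimestampOracle) Next() uint64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.next++
	return t.next
}

func main() {
	tso := &TimestampOracle{}

	var wg sync.WaitGroup
	// Simulate several compute nodes starting transactions concurrently.
	// Any two transactions, on any two nodes, can later be ordered simply
	// by comparing the IDs they received from the single oracle.
	for node := 0; node < 3; node++ {
		wg.Add(1)
		go func(node int) {
			defer wg.Done()
			for i := 0; i < 3; i++ {
				ts := tso.Next()
				fmt.Printf("compute node %d started txn with id %d\n", node, ts)
			}
		}(node)
	}
	wg.Wait()
}
```

Because the counter sits behind a single lock on a single node, conflicts anywhere in the cluster can be resolved by comparing IDs, which is exactly the property the management node exists to provide.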
Although multiple PD nodes are deployed, only one of them is responsible for the core task (transaction management); the others are standby nodes kept for high availability. This architecture is simple and easy to implement, but it has several flaws. 1) What happens if the management-node leader fails? If the PD leader fails, the followers can elect a new leader. But as anyone who has operated databases knows, this kind of failover has to go through a series of steps: failure detection, reconnection, fault confirmation, and switchover. If the retry count and response-timeout parameters are set small, the system will switch leaders too often and stability suffers; if they are set large, each failover takes a long time. During the switchover, every compute node has to wait for the new leader to be elected before it can be assigned a transaction ID, so the whole cluster stalls and service is suspended (a rough sketch of this timing trade-off follows below).
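Here is a back-of-the-envelope sketch of that timing trade-off in Go. The parameter names and numbers are illustrative assumptions, not TiDB/PD defaults: the stall is roughly the time to notice the leader is gone, plus the time to elect a new one, plus the time for compute nodes to reconnect.

```go
package main

// A rough model of the failover window described above.
// All names and numbers are illustrative assumptions, not TiDB/PD defaults.
// stall ~= (heartbeat interval x missed-beat threshold)  // time to decide the leader is dead
//        + leader election                               // time to pick a new leader
//        + client reconnect                              // time for compute nodes to find it

import (
	"fmt"
	"time"
)

func estimatedStall(heartbeat time.Duration, missedBeats int, election, reconnect time.Duration) time.Duration {
	return heartbeat*time.Duration(missedBeats) + election + reconnect
}

func main() {
	// Aggressive settings: the stall is short, but a 1-2 s network blip already
	// looks like a dead leader and triggers an unnecessary switchover.
	fmt.Println("aggressive settings:  ", estimatedStall(500*time.Millisecond, 3, 2*time.Second, time.Second))

	// Conservative settings: blips are tolerated, but on a real failure every
	// compute node waits on the order of half a minute for transaction IDs.
	fmt.Println("conservative settings:", estimatedStall(5*time.Second, 5, 3*time.Second, 2*time.Second))
}
```

Whichever way the knobs are turned, something is sacrificed: either stability under network jitter or the length of the cluster-wide pause.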
2) What happens if all the management nodes fail? Three to five management nodes are, against a large cluster of hundreds of nodes, still only a handful of physical machines, and the metadata can only be stored on those few fixed nodes. The life and death of the whole system therefore depends entirely on the health of a few PD nodes. 3) Cross-datacenter multi-active deployment is even further out of reach. The cluster can be deployed across regions, but the remote nodes can only serve for disaster recovery and cannot take transactional traffic, because remote transactions cannot absorb the performance cost of reaching across hundreds or thousands of kilometers to the PD leader for transaction identifiers and metadata (the rough estimate below illustrates why). Despite these flaws, TiDB is undeniably already a very good distributed database, and it is more than adequate for typical business systems. But we should ask whether the distributed architecture admits a more complete solution, one that frees the distributed database from the constraint of a small, fixed set of management nodes, giving the compute nodes greater freedom, the system greater reliability, and the data greater safety.
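To put flaw 3 into numbers, here is some illustrative arithmetic. The distances and the fiber propagation speed are assumptions made for the example, not measurements of any particular deployment.

```go
package main

// Illustrative arithmetic only: light in optical fiber travels at roughly
// 200,000 km/s, so every 200 km of one-way distance adds about 1 ms of
// propagation delay, before any routing, queuing, or retransmission.
// If every transaction must fetch its identifier from a remote PD leader,
// that round trip becomes a hard floor on the remote site's transaction latency.

import "fmt"

func main() {
	const fiberKmPerMs = 200.0 // assumed propagation speed in fiber, km per millisecond

	// Assumed one-way distances to the PD leader: same city, same region, cross-country.
	distances := []float64{30, 300, 1500}

	for _, km := range distances {
		rttMs := 2 * km / fiberKmPerMs
		fmt.Printf("%5.0f km to the PD leader: at least %.1f ms added to every transaction\n", km, rttMs)
	}
}
```

Fifteen or more milliseconds per transaction just to obtain an identifier is why the remote nodes end up limited to disaster recovery rather than serving transactions.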
- You can go and study the architecture of CRDB yourself, or follow (** World Observation) and wait for tomorrow, when I will share how CRDB and OB have solved this.