GBASE NTU General Technology Sharing:
Some time ago, while browsing online, I came across an article about database disaster recovery solutions. Its title prominently read "second-level RTO disaster recovery solution", and the whole article discussed disaster recovery around that single idea of recovering within seconds. This surprised me. Disaster recovery means giving up a small amount of production efficiency in exchange for redundancy, so that the information system's data stays safe as a whole and the system keeps running stably. RTO (recovery time objective) is certainly an important metric for evaluating disaster recovery. But a disaster recovery plan should describe how an end-to-end disaster recovery system protects the system's data and keeps every link of the system available, so that the system can run 24/7 without interruption; it should not emphasize the speed of this one metric alone. In my opinion, a disaster recovery solution should be presented in terms of an effective disaster recovery system.
The "Popular Science China" entry compilation and application project describes a disaster recovery system as follows:
A disaster recovery system, in IT terms, provides computer information systems with an environment that can cope with various disasters. When a computer system suffers force majeure events such as fires, floods, and wars, or man-made disasters such as computer crime, computer viruses, power failures, network communication failures, hardware and software errors, and operator mistakes, the disaster recovery system ensures the security of user data (data disaster recovery). A more complete disaster recovery system can also provide uninterrupted application services (application disaster recovery).
In other words, protecting data is the purpose of disaster recovery, and providing uninterrupted service is the visible sign that disaster recovery has succeeded.
So how do we protect system data at every level?
In general, an information system has three main components: storage, processing services, and transmission. For a disaster recovery solution, securing each link really means building redundancy into all three, so that the system as a whole runs without interruption and meets the "five nines" (99.999% availability) requirement of the financial industry.
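To make the "five nines" target concrete, the short calculation below converts an availability level into a yearly downtime budget. It is only an arithmetic sketch: the function name is made up for illustration, and a 365-day year is assumed.

```python
# Yearly downtime budget for an availability of N "nines".
# Assumption: a 365-day year (leap years ignored for simplicity).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(nines: int) -> float:
    """Return the downtime allowed per year at N nines of availability."""
    unavailability = 10 ** (-nines)  # five nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5):
    minutes = allowed_downtime_seconds(n) / 60
    print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
```

At five nines the budget is roughly 5.3 minutes per year, which is why every one of the three links needs its own redundancy.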
Redundancy in these three areas takes the following forms:
1. Storage security - no interruption: achieved through data redundancy;
2. Service security - no interruption: achieved through database server redundancy;
3. Transmission security - no interruption: achieved through system redundancy.
Taking GBASE as an example, let's look at how a comprehensive disaster recovery system is built. Database information systems generally fall into two categories: transaction processing systems and analytical processing systems. GBASE has a database engine for each: GBase 8s for transaction processing and GBase 8a for analytical processing. Below we look at how the disaster recovery solutions of these two engines are put together for transactional and analytical data, to explain how they keep our data safe, protect it in every detail, and fulfil the mission of disaster recovery.
Storage security - no interruption.
In most cases, RAID 5 is the choice for storage-level protection. RAID 5 does help protect data on disk, especially when recovering from a failed drive. Where no disk array is available, however, protection at the database level is more effective, because it can also verify important data logically.
For the transaction processing engine, GBase 8s provides disk mirroring, which keeps a redundant disk-level copy of important data and ensures that the data is correct not only in its stored values but also logically.
For the analytical processing engine, GBase 8a provides multi-replica technology, which keeps redundant copies of data at the storage level so that data remains available without interruption.
Note that the disk mirroring and node data redundancy described here operate at the database level, which is different from redundancy at the disk level: disk-level redundancy performs no logical verification.
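The difference between a blind disk-level copy and a copy that the database can verify can be illustrated with a toy page store. This is a generic sketch, not the GBase 8s mirroring implementation: the class name is hypothetical, and a CRC32 checksum stands in for whatever logical checks the database actually performs.

```python
import zlib


class MirroredStore:
    """Toy page store that keeps every page on a primary and a mirror copy."""

    def __init__(self):
        self.primary = {}  # page_id -> (payload, checksum)
        self.mirror = {}

    def write(self, page_id: int, payload: bytes) -> None:
        # Store the payload together with its checksum on both copies.
        record = (payload, zlib.crc32(payload))
        self.primary[page_id] = record
        self.mirror[page_id] = record

    def read(self, page_id: int) -> bytes:
        # Verify the primary copy first; fall back to the mirror if it fails.
        for copy in (self.primary, self.mirror):
            payload, checksum = copy[page_id]
            if zlib.crc32(payload) == checksum:
                return payload
        raise IOError(f"page {page_id} is corrupt on both copies")


store = MirroredStore()
store.write(1, b"balance=100")
# Simulate silent corruption of the primary copy only.
store.primary[1] = (b"balance=999", store.primary[1][1])
recovered = store.read(1)
print(recovered)
```

A mirror that only duplicates blocks would return the corrupted primary copy unchanged; the verification step is what lets the read fall back to the good copy.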
Service security - no interruption.
Transaction Processing Engine:
In the transaction engine, uninterrupted service is delivered by a shared-storage cluster, which we call an SSC cluster. When the primary server fails, an SSC secondary server can take over immediately. The number of secondary servers is configurable; typically 2-3 secondaries are deployed as redundancy for the primary, and the entire cluster shares a single copy of the data. The architecture is as follows:
In SSC, the secondary nodes share disks with the primary, which avoids storing the data more than once, saves space, and makes installation and configuration easier. A secondary can also take over quickly when the primary fails, and multiple SSC secondary nodes can easily be configured to achieve load balancing.
Because an SSC secondary node uses the primary server's disks and can be started easily and quickly, it is well suited to scale-out scenarios. And because an SSC secondary server is so close to the primary (they share the same disks), it is the best choice as a failover server if the primary runs into problems.
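The takeover and load-balancing behaviour described above can be pictured with a small routing model. This is a sketch under stated assumptions, not GBase 8s code: the class and node names are hypothetical, and sending read-only work to the secondaries in round-robin order is just one way load balancing could be arranged.

```python
class SscCluster:
    """Toy model: one primary plus secondaries that share the same disks."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self._next = 0  # round-robin cursor

    def route_read(self):
        # Spread read-only work across the secondaries (assumed policy).
        if not self.secondaries:
            return self.primary
        node = self.secondaries[self._next % len(self.secondaries)]
        self._next += 1
        return node

    def route_write(self):
        return self.primary

    def fail_primary(self):
        # Promote the first secondary. No data copy is needed because
        # every node already sees the same shared disks.
        if not self.secondaries:
            raise RuntimeError("no secondary left to take over")
        self.primary = self.secondaries.pop(0)
        return self.primary


cluster = SscCluster("node-a", ["node-b", "node-c", "node-d"])
reads = [cluster.route_read() for _ in range(4)]
new_primary = cluster.fail_primary()
print(reads, new_primary)
```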
How an SSC cluster works:
For an SSC secondary, the primary server only needs to send the position of the logical log page. Using the log position received from the primary, the SSC secondary server reads the logical log pages from the shared disk and applies them to its in-memory data buffer.
An SSC secondary server never writes to the shared disk blocks and never flushes data from shared memory to the shared disk, even when a checkpoint occurs. If an SSC secondary server needs to flush data out of shared memory, it writes that data to a temporary "paging file", where it stays until the next checkpoint. Meanwhile, the primary server does not flush its data pages from shared memory to disk until it has confirmed that the SSC secondary servers no longer need the versions of those pages that are on disk.
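The log-position handshake can be sketched as a small in-memory model. This is an illustration only: all class and method names are hypothetical, the shared disk is a Python object, and checkpoints, paging files, and flushing by the primary are left out.

```python
class SharedDisk:
    """Storage visible to every node: the logical log and the data pages."""

    def __init__(self):
        self.log = []    # list of (key, value) log records
        self.pages = {}  # on-disk data pages (left untouched in this sketch)


class Primary:
    def __init__(self, disk):
        self.disk = disk

    def commit(self, key, value) -> int:
        # Append the log record to the shared disk and return the log
        # position; only this position is sent to the secondaries.
        self.disk.log.append((key, value))
        return len(self.disk.log)


class Secondary:
    def __init__(self, disk):
        self.disk = disk
        self.buffer = {}  # in-memory data buffer
        self.applied = 0  # log position already applied

    def on_log_position(self, position: int) -> None:
        # Read the new log records from the shared disk and replay them
        # into memory; nothing is written back to the shared disk.
        for key, value in self.disk.log[self.applied:position]:
            self.buffer[key] = value
        self.applied = position


disk = SharedDisk()
primary, secondary = Primary(disk), Secondary(disk)
secondary.on_log_position(primary.commit("acct-1", 100))
secondary.on_log_position(primary.commit("acct-1", 80))
print(secondary.buffer, disk.pages)
```

The point of the design is that only a small log position travels over the network; the log content itself is read from the disks that both servers already share.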
The following figure shows the deployment plan for moving the intermediary business of a rural commercial bank onto a domestically produced platform.
The solution uses a domestically produced configuration to carry intermediary services such as ETC top-up, channel service integration, and business flow control. The SSC configuration not only meets the requirement for 24/7 business continuity, but also switches over automatically within seconds when a fault occurs and provides load balancing. It achieved:
High performance: millisecond response times on tables at the 100-million-row scale, which meets the platform's peak processing demand.
High availability: automatic, fast failover that is transparent to applications, with a switchover time of less than 30 seconds, ensuring the continuity and security of the business system.
High stability: supports the 24/7 operation that the bank's money-related transactions require.
Localization: an integrated solution for a domestically produced platform.
Analytical Processing Engine:
In the analytical processing engine, uninterrupted service is delivered by a federated architecture. The GBase 8a MPP Cluster product consists of three core components: gcware, gcluster, and gnode. Their functions are:
gcluster: responsible for SQL parsing, SQL optimization, distributed execution plan generation, and execution scheduling.
gcware: shares information (cluster structure, node status, node resource status, and so on) among the gcluster instances and, when data operations are performed on multiple replicas, determines which nodes can be operated on and controls the data consistency state of each node.
gnode: the data node, the basic unit that stores data and executes the tasks dispatched by gcluster.
Generally, the gcluster and gcware components are deployed on the same physical nodes, which are collectively referred to as coordinators.
Coordinators are managed as a pool: multiple coordinator servers are placed in the management pool for shared use, so a service problem on any one node does not affect the normal operation of the system, and no switchover is needed. The federated architecture is built as follows:
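Why a pooled design needs no switchover can be seen in a minimal model of the coordinator pool. This is a sketch with hypothetical names, not the GBase 8a client logic: it assumes a client may connect to any coordinator that is currently healthy.

```python
import random


class CoordinatorPool:
    """Toy pool: any healthy coordinator can serve any request."""

    def __init__(self, coordinators):
        self.healthy = {name: True for name in coordinators}

    def mark_down(self, name: str) -> None:
        self.healthy[name] = False

    def pick(self) -> str:
        # Choose among the coordinators that are still up. There is no
        # designated master, so there is nothing to fail over.
        candidates = [n for n, ok in self.healthy.items() if ok]
        if not candidates:
            raise RuntimeError("no coordinator available")
        return random.choice(candidates)


pool = CoordinatorPool(["coord-1", "coord-2", "coord-3"])
pool.mark_down("coord-2")
picks = {pool.pick() for _ in range(50)}
print(sorted(picks))
```

Losing one coordinator only shrinks the set of candidates; the remaining coordinators keep serving requests without any role change.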
The advantages of the federated architecture, namely non-stop operation and extremely high availability, have helped hundreds of financial institutions run their business safely and smoothly, and have made GBase 8a a first choice when the financial industry procures an analytical database.