In large enterprise applications such as Sina Weibo and JD.com, the scale of Redis clusters can reach thousands of nodes.
One Redis instance is used as the master and the rest as slaves. The data on the master and the slaves is exactly the same: the master supports both writing and reading data, while a slave only synchronizes data from the master and serves reads.
Because the slaves hold the same data as the master, commands that write data can be sent to the master for execution, and commands that read data can be sent to different slaves for execution, achieving read/write splitting (as shown in Figure 7-45).
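A minimal client-side sketch of this read/write split, assuming the redis-py library and hypothetical host names for the master and one slave:

```python
import redis

# Hypothetical addresses; in practice these come from your deployment.
master = redis.Redis(host="redis-master.example.internal", port=6379)
replica = redis.Redis(host="redis-slave-1.example.internal", port=6379)

# Writes go to the master...
master.set("user:1001:name", "alice")

# ...while reads can be served by a slave (replication is asynchronous,
# so a just-written value may lag briefly on the replica).
print(replica.get("user:1001:name"))
```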
Redis master-slave replication works as follows: after the slave service starts and connects to the master, it actively sends a SYNC command. On receiving the synchronization command, the master starts a background save process and buffers all write commands it receives while the save is running; when the background save finishes, the master transfers the entire database file to the slave to complete one full synchronization.
After receiving the database file, the slave saves it and loads the data into memory. The master then sends the buffered write commands, and every new write command it receives, to the slave in order, and the slave replays these commands to bring its data fully in sync.
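As a hedged illustration, the command below (SLAVEOF in older releases, REPLICAOF since Redis 5) is what turns a running instance into a slave of a given master and triggers the full synchronization described above; the host names are placeholders:

```python
import redis

# Connect to the instance that should become a slave (placeholder address).
node = redis.Redis(host="redis-slave-1.example.internal", port=6379)

# REPLICAOF <master-host> <master-port> starts replication; the slave then
# performs the synchronization handshake and receives the master's database
# snapshot plus the buffered write commands.
node.execute_command("REPLICAOF", "redis-master.example.internal", "6379")

# REPLICAOF NO ONE would detach it and make it a standalone master again.
```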
Redis 2.8 introduced Sentinel mode, which builds on master-slave replication and uses sentinels to achieve automatic failure recovery. Broadly speaking, Sentinel mode exists to automate the operations that must be performed manually under plain master-slave replication (as shown in Figure 7-46). A sentinel provides the following capabilities.
Monitoring: the sentinel continuously checks whether the master and slave nodes are working properly.
Automatic failover: when the master is not working properly, the sentinel performs an automatic failover, promoting one slave to be the new master and reconfiguring the other slaves to replicate from the new master.
Configuration provider: during initialization, the client obtains the address of the current master by connecting to a sentinel (see the sketch after this list).
Notification: The sentinel can send the result of the failover to the client.
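A short sketch of the configuration-provider role using redis-py's Sentinel support; the sentinel addresses and the master name mymaster are assumptions for illustration:

```python
from redis.sentinel import Sentinel

# Hypothetical sentinel addresses (the default sentinel port is 26379).
sentinel = Sentinel(
    [("sentinel-1.example.internal", 26379),
     ("sentinel-2.example.internal", 26379)],
    socket_timeout=0.5,
)

# Ask the sentinels where the current master of "mymaster" is.
print(sentinel.discover_master("mymaster"))

# Or obtain connections that follow a failover automatically.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)
master.set("counter", 1)
print(replica.get("counter"))
```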
In earlier versions of Redis, the master-slave mode was used for clustering, but after the master went down you had to manually reconfigure a slave to become the master. Redis 2.8 introduced Sentinel mode, in which sentinels monitor the master and the slaves and automatically promote a slave to master if the master goes down. However, Sentinel mode still has a problem: Redis nodes cannot be scaled out dynamically, so Redis 3.x introduced the cluster mode.
Redis Cluster defines a total of 16384 hash slots for the entire cluster. It hashes each key with the CRC16 function, takes the result modulo 16384, and routes the key to the node that has been pre-assigned the corresponding hash slot (as shown in Figure 7-47).
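The slot calculation itself is simple. Below is a rough sketch (ignoring Redis's hash-tag rule, under which only the part of the key between { and } is hashed) using the CRC16-XMODEM variant that Redis Cluster specifies:

```python
def crc16_xmodem(data: bytes) -> int:
    """Bit-by-bit CRC16 (polynomial 0x1021, initial value 0), the variant
    Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Slot = CRC16(key) mod 16384; hash tags ({...}) are ignored in this sketch.
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1001"))  # a number in the range 0..16383
```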
Redis Cluster automatically shards the data, with each master holding a portion of it, and provides built-in high availability so that the remaining masters can continue to work when some masters become unavailable. Each server in a Redis Cluster must open two ports, for example 6379 and 16379 (the client port plus 10000). Port 6379 is the entry point to the Redis server (the client access port), while port 16379 is used for inter-node communication.
This cluster bus is used to exchange instance IP addresses and shard slot information, add new nodes, detect failures, update configuration, and authorize failovers. The cluster bus uses a binary protocol, the gossip protocol, for efficient data exchange between nodes with low network bandwidth and processing overhead.
As the name suggests, the gossip algorithm is inspired by office gossip: as long as one person starts the gossip, everyone learns the information within a bounded time. The spreading pattern is also similar to how a virus propagates, so gossip goes by several other names, such as the "rumor algorithm" and the "epidemic (virus infection) algorithm".
The gossip process is initiated by a seed node: when a seed node has state that needs to be propagated to other nodes in the network, it randomly selects a few surrounding nodes and spreads the message to them, and every node that receives the message repeats the process until eventually all nodes in the network have received it. This can take some time, so gossip is an eventually consistent protocol: there is no guarantee that every node holds the message at a given moment, but in theory every node will receive it eventually (a toy simulation follows the message-type list below).
The biggest benefit of the gossip protocol is that even if the number of nodes in the cluster skyrockets, the load on each node does not change much. This allows Redis Cluster or Consul clusters to scale out to thousands of nodes. Each node in a Redis cluster maintains a copy of the current state of the entire cluster from its own perspective, including the following statuses:
The hash slot information that each node in the cluster is responsible for and its migration status.
The master-slave status of each node in the cluster.
The liveness status and suspected-failure status of each node in the cluster. The main gossip message types are as follows.
meet: a node in an existing cluster sends an invitation to a new node, via the cluster meet ip port command, to join the cluster; the new node then begins communicating with the other nodes.
ping: a node sends ping messages to the other nodes in the cluster at the configured interval, carrying its own status, the cluster metadata it maintains, and metadata about some other nodes.
pong: used to respond to ping and meet messages; similar in structure to a ping message, it also carries the node's own status and other information, and can be used to broadcast and update information.
fail: when a node cannot reach another node via ping, it broadcasts a message to every node in the cluster announcing that the unreachable node is down; the other nodes mark that node as offline after receiving the message.
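The toy simulation below (plain Python, no Redis involved) illustrates why gossip converges quickly: in each round, every informed node forwards the message to a few random peers, so coverage grows roughly exponentially.

```python
import random

def gossip_rounds(num_nodes: int, fanout: int = 3, seed_node: int = 0) -> int:
    """Simulate rumor spreading: count rounds until every node is informed."""
    nodes = list(range(num_nodes))
    informed = {seed_node}
    rounds = 0
    while len(informed) < num_nodes:
        newly_informed = set()
        for node in informed:
            # Each informed node picks a few random peers to gossip with.
            peers = random.sample([n for n in nodes if n != node],
                                  min(fanout, num_nodes - 1))
            newly_informed.update(peers)
        informed |= newly_informed
        rounds += 1
    return rounds

# Even with 1,000 nodes, the message typically reaches everyone in a handful of rounds.
print(gossip_rounds(1000))
```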
MySQL typically handles data at the scale of tens of millions of rows, whereas MongoDB has mature experience handling tens of billions of records on platforms such as Alibaba Cloud, 58.com, and Didi Chuxing.
Typical MongoDB cluster deployments include master-slave cluster mode, replica set mode, and sharded cluster mode.
In the master-slave cluster mode, the master receives write operations and the slaves serve read queries (as shown in Figure 7-48).
In a master-slave replicated cluster, when the master fails you can only intervene manually to designate a new master; a slave is never automatically promoted. During that time the cluster can only serve reads.
The master-slave mode of MongoDB is rarely used and is generally not recommended; MongoDB officially recommends using the replica set mode instead of master-slave replication.
A replica set has a primary node and multiple secondary nodes, similar to the master-slave replication model, and the nodes take on similar responsibilities. The main difference between a replica set and master-slave replication is that when the primary node fails, the replica set automatically elects a new primary and directs the remaining secondaries to connect to it, and this process is completely transparent to the application (as shown in Figure 7-49).
Replica sets provide redundant storage for data while improving cluster reliability: keeping copies on different machines guarantees that data is not lost to a single point of failure, and a replica set can cope with risks such as data loss, machine damage, and network anomalies at any time, greatly improving the reliability of the cluster.
The primary server is critical: it holds the log of all data changes. The replica servers hold a full copy of the primary's data, so when the primary goes down, one of the replicas is promoted to become the new primary.
1 Replica set node roles
MongoDB replication requires at least two nodes. One of them is the primary node, which handles client requests, and the rest are secondary nodes, which replicate the data on the primary. Common arrangements are one primary with one secondary, or one primary with multiple secondaries. The primary records all operations performed on it in its oplog; the secondaries periodically poll the primary for these operations and then apply them to their own copies of the data, ensuring that the secondaries' data stays consistent with the primary's.
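As a hedged illustration of the oplog mechanism, the snippet below peeks at the most recent entry in the local.oplog.rs collection of a replica set member using pymongo; the connection string is a placeholder:

```python
from pymongo import MongoClient

# Placeholder address of a replica set member; the oplog only exists in replica set mode.
client = MongoClient("mongodb://mongo-1.example.internal:27017/")

# The oplog is a capped collection in the "local" database.
oplog = client.local["oplog.rs"]

# Fetch the newest operation in insertion order ($natural).
for entry in oplog.find().sort("$natural", -1).limit(1):
    # Typical fields: "ts" (timestamp), "op" (operation type), "ns" (namespace).
    print(entry.get("ts"), entry.get("op"), entry.get("ns"))
```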
Nodes in a MongoDB replica set are divided into the following roles:
Primary: receives all write requests and synchronizes the changes to all secondaries. A replica set can have only one primary; when the primary goes down, the remaining secondaries (together with any arbiter) elect a new primary.
By default, read requests are sent to the primary; the client connection configuration can be modified (a read preference) to allow reads from secondaries (see the sketch after this list).
Secondary: maintains the same data set as the primary and participates in electing a new primary when the primary goes down.
Arbiter: stores no data and can never be elected primary; it only casts votes in elections.
Using an arbiter reduces the hardware required for data storage, since an arbiter needs almost no hardware resources; note, however, that in a production environment it should not be deployed on the same machine as a data-bearing node.
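A minimal sketch, assuming pymongo and hypothetical host names, of connecting to a replica set and directing reads to secondaries via a read preference:

```python
from pymongo import MongoClient, ReadPreference

# Placeholder members of a replica set named "rs0".
client = MongoClient(
    "mongodb://mongo-1.example.internal:27017,"
    "mongo-2.example.internal:27017,"
    "mongo-3.example.internal:27017/?replicaSet=rs0"
)

# Reads from this database handle may be served by a secondary.
db = client.get_database("shop", read_preference=ReadPreference.SECONDARY_PREFERRED)

# Writes always go to the primary; the read below may briefly lag behind the write.
db.orders.insert_one({"order_id": 1, "status": "new"})
print(db.orders.find_one({"order_id": 1}))
```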
2 Replica set architecture modes
A replica set can be built on one of two architecture modes: PSS (as shown in Figure 7-50) and PSA (as shown in Figure 7-51). In PSS (primary + secondary + secondary) mode, the total number of nodes in the replica set should be odd so that a majority can be reached when electing a primary. PSA (primary + secondary + arbiter) mode adds an arbiter node that stores no data and does not synchronize data from the primary; it is used when the number of data-bearing nodes is even.
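For reference, a PSS replica set like the one in Figure 7-50 is typically initiated with a configuration such as the following. The sketch uses pymongo's admin command interface and placeholder host names; in practice this is often done from the mongo shell with rs.initiate() instead:

```python
from pymongo import MongoClient

# Connect directly to the member that should run the initiation (placeholder address).
client = MongoClient("mongodb://mongo-1.example.internal:27017/", directConnection=True)

# One primary and two secondaries (PSS); for PSA the third member would instead
# be declared as {"_id": 2, "host": "...", "arbiterOnly": True}.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "mongo-1.example.internal:27017"},
        {"_id": 1, "host": "mongo-2.example.internal:27017"},
        {"_id": 2, "host": "mongo-3.example.internal:27017"},
    ],
}

client.admin.command("replSetInitiate", config)
```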
In MongoDB replica set mode, a secondary node is automatically promoted to primary through an automatic election mechanism after the primary node fails, thereby improving the availability of the cluster.
However, if the business system needs to process massive amounts of data, hundreds or even thousands of data nodes may be required. A single primary in replica set mode cannot cope with a large volume of highly concurrent writes (its data throughput is insufficient), so the primary becomes a performance bottleneck and a point of risk for the system.
In this case, MongoDB sharding technology is required, which is also the large cluster deployment mode of MongoDB (as shown in Figure 7-52).
1 Roles in a sharded cluster
As shown in Figure 7-52, three important components are required to build a MongoDB sharded cluster: shard server, config server, and route server.
Route server: an independent mongos process. mongos is the entry point for requests to the database cluster; all requests are coordinated through mongos, so no additional router needs to be added. mongos itself is a request dispatch center responsible for forwarding each data request to the corresponding shard server.
In a production environment there are usually multiple mongos instances serving as request entry points, so that requests do not all fail when one of them goes down. The mongos server does not store data itself; at startup it loads the cluster metadata from the config server into its cache, routes client requests to the appropriate shard servers, aggregates the results returned by each shard server, and returns them to the client.
Shard server: sharding is the process of splitting a database and spreading its data across different machines. Spreading data across machines removes the need for a single powerful server to store all the data and handle the entire load. The basic idea is to divide the collection into small pieces that are spread across the shards, with each shard responsible for only a portion of the total data, and a balancer keeps the shards in balance.
Each shard server is a MongoDB instance that stores the actual data chunks. The database's collections are divided into multiple chunks and stored on different shard servers.
In actual production, a shard server can be composed of several machines to form a replica set to prevent the entire system from crashing due to a single point of failure of the primary node (as shown in Figure 7-53).
Config server: the configuration server stores all of the database metadata (routing and shard information). mongos does not physically store the shard and data-routing information itself; it only caches it in memory, while the config server actually persists the data.
When mongos starts for the first time, or is restarted, it loads the configuration information from the config server, and if the configuration server's information changes, all mongos instances are notified to update their state so that they can keep routing accurately. In production there are usually multiple config servers, because they store the shard-routing metadata and must be protected against loss.
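Putting the three components together, here is a hedged sketch of connecting to the mongos routers and sharding a collection; the addresses, database name, and shard key are all illustrative:

```python
from pymongo import MongoClient

# Connect to the mongos routers (placeholder addresses), not to the shards directly.
client = MongoClient(
    "mongodb://mongos-1.example.internal:27017,mongos-2.example.internal:27017/"
)

# Enable sharding for a database, then shard one of its collections
# on a hashed key so chunks spread evenly across the shard servers.
client.admin.command("enableSharding", "shop")
client.admin.command("shardCollection", "shop.orders", key={"order_id": "hashed"})

# From the application's point of view nothing changes: reads and writes
# still go through mongos, which routes them to the right shard.
client.shop.orders.insert_one({"order_id": 42, "status": "new"})
```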
2 How to shard
Sharding a collection is the solution for coping with high throughput and large data volumes. Sharding reduces the number of requests each shard has to handle (lower concurrency pressure), and by scaling horizontally the cluster can also increase its storage capacity and data throughput.
For example, if a database has a 1 TB dataset and 4 shards, each shard may hold only 256 GB of data (as shown in Figure 7-54). If there are 40 shards, then each shard may only have about 25 GB of data.
Combining MongoDB's sharding and replication capabilities ensures that data is both spread across multiple servers and backed up on each shard, so that if one server goes down, the other replicas can immediately take over the affected portion and keep working.
In a shard server, MongoDB still divides the data into chunks, each chunk representing a portion of the data inside that shard server. Chunks exist for the following two purposes.
Splitting: when the size of a chunk exceeds the configured chunk size (64 MB by default), the MongoDB daemon splits it into smaller chunks to avoid oversized chunks (as shown in Figure 7-55).
Balancing: in MongoDB, the balancer is a background process responsible for migrating chunks to balance the load across shard servers. The default chunk size is 64 MB; on a production database it is best to choose an appropriate chunk size (for example, 128 MB or 256 MB). MongoDB splits and migrates chunks automatically (as shown in Figure 7-56).
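As a hedged example of tuning the chunk size described above, the default can be overridden by writing to the settings collection in the config database (the value is in MB); this sketch assumes a connection through mongos and a 128 MB target:

```python
from pymongo import MongoClient

# Placeholder mongos address; the "config" database is backed by the config servers.
client = MongoClient("mongodb://mongos-1.example.internal:27017/")

# Set the global chunk size to 128 MB (the value is interpreted in megabytes).
client.config.settings.update_one(
    {"_id": "chunksize"},
    {"$set": {"value": 128}},
    upsert=True,
)

# The balancer uses the new size when deciding whether to split or migrate chunks.
```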