
Why does Kafka, which is also a distributed architecture, need a leader while Redis doesn't?

The view that Redis does not need a leader is actually ambiguous and inaccurate. The essence of the question is really about data sharding and data replication consistency.

1. Redis Cluster Architecture
Beginning with Redis 3.0, Redis introduced a decentralized cluster architecture using a pre-sharding model: all the nodes in a cluster collectively own 16384 slots. When a key is written, the key is first hashed (Redis uses CRC16), and the result is taken modulo 16384 to map the key to a specific slot, and hence to a specific node. The deployment architecture is shown in the following figure:
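The key-to-slot mapping described above can be sketched as follows. This is a minimal illustration, not Redis's actual C source: it implements the CRC16 variant (CRC-16/XMODEM) that the Redis Cluster specification names, and omits details such as hash tags (`{...}`).

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM: init 0x0000, polynomial 0x1021, MSB-first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots."""
    return crc16(key.encode()) % 16384
```

The Redis Cluster specification gives `0x31C3` as the CRC16 of the string `"123456789"`, which this sketch reproduces.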


Each of the above nodes stores a different subset of the overall data. To remain decentralized, every node also stores the routing information for the whole cluster, i.e. which node owns which slots (and therefore which keys). So when a client sends a query for key1 to node redisA, but key1 is actually stored on node redisB, node A can redirect the request to the node that really stores the key. This internal redirection means any node can serve as an entry point for querying any stored value.
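The redirection behavior can be sketched with a toy model (hypothetical classes, loosely modeled on Redis Cluster's `MOVED` response): every node holds the full slot-to-node routing table, so a node that does not own a key tells the client where to retry.

```python
class Node:
    """Toy cluster node owning an inclusive range of slots."""
    def __init__(self, name, slot_range, routing):
        self.name = name
        self.slot_range = slot_range   # (start, end), inclusive
        self.routing = routing         # shared list of all nodes
        self.store = {}

    def owns(self, slot):
        lo, hi = self.slot_range
        return lo <= slot <= hi

    def get(self, key, slot):
        if self.owns(slot):
            return ("OK", self.store.get(key))
        # Not ours: look up the real owner and redirect the client,
        # analogous to Redis Cluster's MOVED reply.
        owner = next(n for n in self.routing if n.owns(slot))
        return ("MOVED", owner)
```

A client that receives `("MOVED", owner)` simply re-issues the same request against `owner`.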

The above architecture needs no leader, which is the key to the so-called decentralized cluster design. The problem is that if any node in the cluster goes down, the data stored on that node becomes unavailable or is lost. To solve this, a master-slave architecture is usually introduced, as shown in the following diagram:


The specific method is to introduce one or more slave nodes for each master node to replicate the master's data. Each dotted box in the figure above represents a replication group, also called a replica set; the data between replicas is expected to be identical.

How is data consistency ensured in a master-slave architecture? The interaction between a master-slave cluster and its clients usually takes one of two forms:

Asynchronous replication: the client sends a write request to the master, and as soon as the master writes successfully it acknowledges the client, while the slave replicates the data asynchronously. There is replication lag between master and slave, so there is a risk of losing data if the master goes down.

Synchronous replication: the client sends a write request to the master; after the master writes successfully, it waits for the slave to write successfully before returning success to the client. This increases latency, and master-slave inconsistency is still unavoidable.
In both cases, consistency of the data on the master and slave nodes cannot be guaranteed.
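The trade-off between the two modes can be made concrete with a toy simulation (hypothetical `Master`/`Slave` classes, not a real Redis client): an asynchronously acknowledged write can vanish if the master dies before the slave copies it, while a synchronous write survives at the cost of latency.

```python
class Slave:
    def __init__(self):
        self.log = []

class Master:
    def __init__(self, slave):
        self.log = []
        self.slave = slave

    def write_async(self, value):
        self.log.append(value)
        return "OK"    # ack before replicating: data-loss window

    def write_sync(self, value):
        self.log.append(value)
        self.slave.log.append(value)   # replicate before acking
        return "OK"
```

If the master crashes right after `write_async` returns, the slave has no copy of the acknowledged value.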

Why can't synchronous double-write ensure data consistency?


The client receives a response only after both master and slave have been written synchronously. At first glance this seems to provide consistency, but it does not. Imagine that the data for key1 is first written to the master node and then to the slave node, and an error occurs between the two steps. The client receives a write failure, but if it then queries the master for key1, it can read the data of that "failed" request. That is, although the client was told the write failed, the master was actually written successfully, resulting in inconsistent data semantics.
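The failure case just described can be reproduced with a small sketch (hypothetical `Replica` class and `double_write` helper): the master write succeeds, the slave write raises, the client observes a failure, yet the "failed" value is readable on the master.

```python
class ReplicationError(Exception):
    pass

class Replica:
    def __init__(self, fail=False):
        self.store = {}
        self.fail = fail       # simulate an unreachable node

    def put(self, k, v):
        if self.fail:
            raise ReplicationError("replica unreachable")
        self.store[k] = v

def double_write(master, slave, k, v):
    master.put(k, v)   # step 1 succeeds
    slave.put(k, v)    # step 2 fails -> client is told the write failed
    return "OK"
```

Running it with a failing slave shows the semantic gap: the client sees `FAILED`, yet the master already stores the value.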

That is to say, the master-slave synchronous architecture has an inherent shortcoming in the acknowledgment mechanism between master, slave, and client. To solve this problem, strongly consistent replication protocols such as Raft appeared.

2. Strong consistency protocol between replicas
To achieve high availability of data and avoid a single point of failure, data is usually replicated into multiple copies. This solves availability but introduces another problem: how to keep the copies consistent. Consistency protocols such as Raft and Paxos were created to solve it.

The data replication diagram of the Raft protocol is as follows:


In the figure, a client sends a write request to the Raft cluster, and the leader node processes it. The data is first stored on the leader, then broadcast to all of its followers. Each follower stores the data it receives from the leader and reports the result back. The leader then arbitrates on the results: if more than half of the nodes in the cluster have successfully stored the entry, the leader returns success to the client; otherwise it returns failure.
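The arbitration step can be sketched as follows (hypothetical `Follower` class and `replicate` function, terms and retries omitted): the leader counts its own append plus follower acknowledgments, and succeeds only on a strict majority of the cluster.

```python
class Follower:
    def __init__(self, reachable=True):
        self.log = []
        self.reachable = reachable

    def append(self, entry):
        if self.reachable:
            self.log.append(entry)
        return self.reachable   # ack only if the entry was stored

def replicate(entry, followers, cluster_size):
    stored = 1                  # the leader's own append counts
    for f in followers:
        if f.append(entry):
            stored += 1
    # Majority arbitration: strictly more than half must hold the entry.
    return "OK" if stored * 2 > cluster_size else "FAIL"
```

In a 3-node cluster, one follower ack plus the leader's own copy is already a majority, so one unreachable follower does not block writes.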

Moreover, if only the leader writes successfully and the followers do not, then even though the data is stored on the leader, it is not visible to the client.

The Raft protocol is mainly divided into two parts: leader node election and log replication.

Leader election: a leader is elected from the cluster to handle data reads and writes; followers are only responsible for synchronizing data from the leader. If the leader goes down, an election is automatically triggered and a new leader is elected.

Log replication: after data is written to the leader, the leader forwards it to the followers; only when more than half of the nodes in the cluster have successfully written the entry is success returned to the client.

The implementation details of the Raft protocol are beyond the scope of this article; interested readers can check the author's column on the Raft protocol at the end of the article. Here we only analyze, at the design level, why Raft can achieve data consistency.

The author believes that Raft ensures data consistency mainly by introducing a global log sequence number and a committed pointer.

2.1 The global log sequence number
To make logs easy to manage and identify, the Raft protocol numbers messages one by one: when a message reaches the leader, it is assigned a globally unique, monotonically increasing sequence number. With this sequence number, it is easy to judge whether the data is consistent during leader-to-follower replication.
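A minimal sketch of this numbering (hypothetical `LeaderLog` class and `follower_accepts` helper; real Raft also compares terms, which are omitted here): the leader stamps each entry with the next index, and a follower can detect a gap by comparing an incoming index with the last one it stored.

```python
class LeaderLog:
    def __init__(self):
        self.next_index = 1   # globally unique, monotonically increasing

    def stamp(self, payload):
        entry = (self.next_index, payload)
        self.next_index += 1
        return entry

def follower_accepts(last_index, entry):
    index, _ = entry
    # Any gap between the follower's log and the incoming entry
    # means an earlier entry was missed, so replication is inconsistent.
    return index == last_index + 1
```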

2.2 Committed pointer
We know that a log entry is first written to the leader and then propagated. The entry cannot be considered successfully written until more than half of the nodes in the cluster have stored it, even though it already exists on the leader. To keep such an entry invisible to clients, Raft introduces a committed pointer: only data at or below the committed pointer can be perceived by clients.

A necessary and sufficient condition for an entry to be committed is that more than half of the nodes in the cluster have successfully appended it; only then is it considered committed and can success be returned to the client. This is what guarantees consistent data semantics between the replicas and the client.
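The committed pointer can be sketched as follows (hypothetical `RaftLog` class; `match_indexes` stands for the highest index stored on each node, leader included): entries past the pointer exist in the log but are invisible to readers until a majority has stored them.

```python
class RaftLog:
    def __init__(self):
        self.entries = []
        self.committed = 0      # index of the last client-visible entry

    def append(self, value):
        self.entries.append(value)
        return len(self.entries)    # 1-based index of the new entry

    def advance_commit(self, match_indexes, cluster_size):
        # Commit the highest index that a strict majority has stored.
        for idx in sorted(match_indexes, reverse=True):
            if sum(m >= idx for m in match_indexes) * 2 > cluster_size:
                self.committed = max(self.committed, idx)
                break

    def read(self):
        # Clients only ever see entries up to the committed pointer.
        return self.entries[:self.committed]
```

With logs `[2, 1, 1]` on a 3-node cluster, only index 1 is on a majority, so readers see one entry; once a second node stores index 2, both become visible.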

To deepen understanding of the data consistency of the Raft protocol, consider the following question: will a master-slave switch cause Raft to lose data?

For example, a Raft cluster has 3 nodes, and the highest log sequence number written on each node is as follows:

Node1: 100

Node2: 89

Node3: 88

Node1 is the leader. If Node1 goes down, the whole cluster triggers a re-election. Will data be lost?

The answer is definitely not.

First, we must understand that in the above state the committed pointer is 89, because two of the three nodes have successfully written entry 89; that is, the last write acknowledged to the client is entry 89. During the election, Node3 cannot be elected leader, because its log is behind Node2's. Once Node2 is elected as the new leader, Node3 synchronizes data from Node2, so nothing acknowledged to the client is lost.
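The election outcome in this example follows from Raft's "up-to-date log" voting rule, sketched below (hypothetical `grants_vote` function; real Raft compares the last log term first and the index only as a tie-breaker, but all entries here share one term):

```python
def grants_vote(voter_last_index, candidate_last_index):
    # A voter refuses any candidate whose log is behind its own,
    # so no committed entry can be lost in an election.
    return candidate_last_index >= voter_last_index

# Node1 (index 100) is down; Node2 (89) and Node3 (88) vote.
assert grants_vote(88, 89)       # Node3 votes for Node2
assert not grants_vote(89, 88)   # Node2 rejects Node3
```

Node2 therefore wins the election, and the committed entry 89 survives the leader switch.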

3. Summary
This article started from a seemingly trivial question on Zhihu and explored its essence: data sharding and high availability (avoiding a single point of failure) in distributed data storage, which in turn raises a new problem, consistency between data replicas.

