• notice
  • Congratulations on the launch of the Sought Tech site

Understand the principle of Redis data storage, don't just set and get

My core module is shown in Figure 1-10.

Figure 1-10

  • Client client, officially provides a client developed in C language, which can send commands, performance analysis and testing, etc.

  • The network layer event-driven model, based on I/O multiplexing, encapsulates a short, high-performance ae library, the full name is a simple event-driven programming library.

  • In the ae library, I adapt the four I/O multiplexing implementations of epoll , select, kqueue, and evport through the aeApiState structure , so that the upper layer caller cannot perceive the I/O multiplexing in different operating systems. Differences in multiplexing.

  • Events in Redis can be divided into two categories: one is network connection, read, and write events; the other is time events, which are events triggered at a specific time, such as performing rehash operations regularly.

  • The command parsing and execution layer is responsible for executing various commands of the client, such as SET, DEL, GET, etc.

  • Memory allocation and recovery, allocate memory for data, and provide different data structures to store data.

  • The persistence layer provides two persistence strategies, RDB memory snapshot file and AOF, to achieve data reliability.

  • The high-availability module provides replicas, sentinels, and clusters to achieve high availability.

  • Monitoring and statistics, providing some monitoring tools and performance analysis tools, such as monitoring memory usage, benchmark testing, memory fragmentation, bigkey statistics, slow instruction query, etc.

After mastering the overall architecture and modules, enter the src source code directory and use the following command to execute the redis-server executable program to start Redis.

./redis-server ../redis.conf1

I will abstract each started service into a redisServer, and the source code is set in the redisServer structure of server.h.

This structure contains the database instance storing key-value pairs, redis.conf file path, command list, loaded Modules, network monitoring, client list, RDB AOF loading information, configuration information, RDB persistence, master-slave replication, client Side cache, data structure compression, pub/sub, Cluster, Sentinel and other necessary information for the operation of a series of Redis instances.

There are many structure fields, so I won’t list them one by one. Some core fields are as follows.

truct redisServer {
    pid_t pid; /* Main process pid. */
    pthread_t main_thread_id; /* main thread id */
    char *configfile; /*redis.conf file absolute path*/
    redisDb *db; /* RedisDb instance for storing key-value pair data */
   int dbnum; /* DB number */
    dict *commands; /* The list of commands that the current instance can handle, the key is the command name, and the value is the entry to execute the command */
    aeEventLoop *el;/* event loop processing */
    int sentinel_mode; /* true means start as a sentinel instance */

   /* network related */
    int port;/* TCP listening port */
    list *clients; /* list of clients connected to the current instance */
    list *clients_to_close; /* list of clients to be closed */

    client *current_client; /* The client currently executing the command */

1.2.1 Data storage principle

Among them, the redisDb *db pointer is very important. It points to a redisDb array with a length of dbnum (default 16), which is the core of the entire storage. I use this thing to store key-value pairs.


The redisDb structure is defined as follows.

typedef struct redisDb {
   dict *dict;
   dict *expires;
   dict *blocking_keys;
   dict *ready_keys;
   dict *watched_keys;
   int id;          
   long long avg_ttl;
   unsigned long expires_cursor;
   list *defrag_later;
   clusterSlotToKeyMapping *slots_to_keys;
} redisDb;

dict and expires

  • dict and expires are the two most important attributes. The underlying data structure is a dictionary, which is used to store key-value pair data and key expiration time respectively.

  • expires, the underlying data structure is a dict dictionary, which stores the expiration time of each key.

MySQL: "Why separate storage? "

Good question, the reason why they are stored separately is that the expiration time is not set for each key, it is not an inherent property of the key-value pair, although it requires two lookups after separation, it can save memory overhead.

blocking_keys and ready_keys

The underlying data structure is a dict dictionary, which is mainly used to implement blocking commands such as BLPOP.

When the client uses the BLPOP command to block and wait for list elements to be taken out, I will write the key into blocking_keys, and the value is the blocked client.

When the PUSH command is received next time, I will first check whether there is a key waiting for blocking in blocking_keys, and if it exists, put the key in ready_keys. During the next Redis event processing, it will traverse the ready_keys data and take it out from blocking_keys Blocked client response.


It is used to realize the watch command and store the key of the watch command.


The unique ID of the Redis database, a Redis service supports multiple databases, 16 by default.


Used to count the average expiration time.


Counts the number of times the expired event loop executes.


Holds a list of keys that are defragmented one by one.


It is only used in Cluster mode. When using Cluster mode, there can only be one database db 0. slots_to_keys is an array used to record the mapping relationship between keys and hash slots in cluster mode.


Redis uses a dict structure to store all key-value pairs (key-value) data, which is a hash table, so the time complexity of key query is O(1).

The so-called hash table, we can compare it to HashMap in Java, which is actually an array, and each element of the array is called a hash bucket.

The source code of dict structure is defined in dict.h.

struct dict {
   dictType *type;

   dictEntry **ht_table[2];
   unsigned long ht_used[2];

   long rehashidx;

   int16_t pauserehash;
   signed char ht_size_exp[2];

In the structure of dict, there are three very important structures: dictType *type, **ht_table[2], and long rehashidx.

  • type stores functions such as hash function, copy of key and value;

  • ht_table[2], an array with a length of 2, uses ht_table[0] to store key-value pair data by default. I would use ht_table[1] for progressive reahsh operation.

  • rehashidx is an integer value, used to mark whether the rehash operation is being performed, and -1 means no rehash is being performed. If rehash is being performed, its value indicates the index of the dictEntry array in ht_table[1] where the current rehash operation is performed.

  • pauserehash indicates the state of rehash, when it is greater than 0, it means that rehash is paused, and if it is less than 0, it means that there is an error.

  • ht_used[2], an array with a length of 2, indicates how many key-value pair entities (dictEntry) are stored in each hash bucket. The larger the value, the higher the probability of hash collision.

  • ht_size_exp[2], the size of each hash table, that is, the number of hash buckets.

Focus on the ht_table array. Each position of the array is called a hash bucket, which stores all key-value pairs. The type of each hash bucket is dictEntry.

MySQL: "Redis supports so many data types, how to save the hash bucket? "

His secret lies in dictEntry, each dict has two ht_tables, which are used to store key-value pair data and realize progressive rehash.

The dictEntry structure is as follows.

typedef struct dictEntry {
    void *key;
    union {
       // pointer to actual value
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    // Linked list generated by hash table conflict
    struct dictEntry *next;
    void *metadata[];
} dictEntry;

  • *key is a pointer to the key of the key-value pair, pointing to an sds object, and the keys are all string types.

  • v is the value of the key-value pair, which is a union (union). When its value is uint64_t, int64_t or double number type, no additional storage is needed, which is beneficial to reduce memory fragmentation. (In order to save memory, I broke my heart) When the value is a non-numeric type, it is stored with a val pointer.

  • *next points to another dictEntry structure, and multiple dictEntry can be connected into a linked list through the next pointer. It can be seen from here that ht_table uses the chain address method to handle key collisions: when multiple different keys have the same hash value, A hash table connects these keys with a linked list.

The hash bucket does not save the value itself, but a pointer to a specific value, thus realizing the requirement that the hash bucket can store different data types .


The value pointed to by the *val pointer of dictEntry is actually a redisObject structure, which is a very important structure.

My key can only be a string type, and the value can be a data type such as String, Lists, Set, Sorted Set, and hash table.

The values of key-value pairs are all packaged into redisObject objects, and redisObject is defined in server.h.

typedef struct redisObject {
   unsigned type:4;
   unsigned encoding:4;
   unsigned lru:LRU_BITS;
   int refcount;
   void *ptr;
} robj;

  • type : Records the type of the object, such as string, set, hash, Lis, Sorted Set, etc., according to the type to determine which data type it is and what API operation to use.

  • encoding : Encoding method, indicating the specific data structure of the data type pointed to by ptr, that is, what data structure is used by this object as the underlying implementation to save data. There are obvious differences in the memory usage of the same object using different encodings, and internal encoding is very important for memory optimization.

  • lru:LRU_BITS : The time when the object was last accessed under the LRU policy. If it is an LFU policy, the lower 8 bits indicate the access frequency, and the upper 16 bits indicate the access time.

  • refcount : Indicates the reference count. Since the C language does not have the memory recovery function, Redis added this attribute in its own object system. When the reference count of an object is 0, it means that the object has not been referenced by any object. Then it can be garbage collected.

  • ptr pointer : Points to the underlying implementation data structure of the object, a pointer to the value .

As shown in Figure 1-11, it is a relationship diagram of redisDb, dict, dictEntry, and redisObejct:

Figure 1-11

Note that at the beginning, I only used the ht_table[0] hash table to read and write data, and ht_table[1] pointed to NULL. When the capacity of this hash table is insufficient, the expansion operation is triggered, and a larger hash table will be created at this time ht_table[1].

Then I will migrate the data of ht_table[0] to ht_table[1] in a progressive rehash method. After all the migration is completed, I will modify the pointer so that ht_table[0] points to the expanded hash table and recycle the original The hash table, ht_table[1] again points to NULL.


Technical otaku

Sought technology together

Related Topic


Leave a Reply