• notice
  • Congratulations on the launch of the Sought Tech site

What is MySQL semi-synchronization?


Years later, when I was working on the two sides of Tencent, the first question I asked after writing the algorithm was, what is the semi-synchronization of MySQL? I was stunned.I thought it was a two-stage submission of MySQL? After I confirmed it, it was not a two-stage submission.Then the interviewer saw that I didn't even know what I was asking, so I skipped this question and talked about the next one. So this time to summarize the knowledge content of this part, the text content is more, it may be a little boring, but it is still more interesting for those who are interested in this area.

MySQL master-slave replication

In our large-scale projects, we generally use MySQL's replication function to create MySQL master-slave clusters. Mainly, data can be synchronized by configuring one or more standby databases for the server. The function of replication is not only conducive to the construction of high-performance applications, but also the basis for high availability, scalability, disaster recovery, backup, and data warehouse work .

To put it simply, the separation of reads and writes through MySQL's master-slave replication improves the performance of the business system and optimizes the user experience compared to the read and write of a single-point database. In addition, the high availability of the database is realized through master-slave replication.When the master node MySQL is down, the slave database can be used to top it.

Replication methods supported by MySQL

MySQL supports three replication methods:

  • Statement-based replication (also known as logical replication) mainly refers to that the SQL statement executed on the primary database will be executed repeatedly on the secondary database . MySQL uses this kind of replication by default, which is more efficient. But there are also certain problems.If functions such as uuid() and rand() are used in SQL, the data copied to the slave library will be biased.

  • Row-based replication refers to copying the updated data to the slave database instead of executing a side statement . Only supported from MySQL 5.1.

  • Hybrid replication uses statement replication by default.When it is found that the statement cannot accurately replicate data (for example, the statement contains functions such as uuid(), rand()), line-based replication is used .

Principle of master-slave replication

An overview of MySQL replication principles can be roughly divided into these three steps

  1. Change the data on the main library and record it in the Binary Log.

  2. The slave library copies the log on the master library to its own relay log (Relay Log).

  3. The standby database reads the events in the relay log and replays them on the standby database data.

The main process is as follows:
MySQL replication process
Let's talk about the three steps of copying in detail:

The first step is to record the binary log on the main library.First, the main library must turn on the binlog logging function and authorize the slave to have access to it . One thing to note here is that the order in the binlog log is recorded according to the order in which the transaction is submitted instead of the execution order of each statement.

Step 2: Copy binLog from the library to its local RelayLog. First, the slave library will start a worker thread, called the I/O thread.The I/O thread establishes an ordinary client connection with the main library, and then a special binary dump (binlog dump) thread is started on the main library.The storage thread will read the events in the binlog . After catching up with the main library, it will go to sleep and will continue to be awakened until the main library informs that there is a new update statement.
In this way, through the I/O thread on the slave library and the binlog dump thread on the main library, the binlog data is transferred to the relaylog on the slave library.

Step 3: Start a SQL thread from the database, read events from the relaylog and execute them in the standby database, so as to update the standby database data.

This replication architecture realizes the decoupling of acquisition events and replay events, and the running I/O thread can work independently of the SQL thread. However, this architecture also limits the replication process.The most important point is that queries running concurrently on the main database can only be executed serially in the standby database, because there is only one SQL thread to replay events in the relay log.

Speaking of the serialized execution of master-slave replication, I thought of a problem I encountered in my work before.There is such a business scenario.One of our operations is to initialize a batch of data, and the data is from one The data is obtained from the interface of the external system, and then I obtain data from the interface of the external system in parallel through multiple threads in the thread pool.After each thread obtains the data, it is directly inserted into the database. Then after all the data is stored in the database, the batch query is executed, and the data that has just been inserted into the database is queried and placed in ElasticSearch. As a result, the data put into the ES is always incomplete every time.Later, after studying for a long time, it will not work.In the end, the problem is solved by the main database that allows the query to go. At the time, I didn't know that this problem was caused by the serialization of MySQL master-slave replication.

MySQL master-slave replication mode

MySQL's master-slave replication actually supports multiple replication modes such as asynchronous replication , semi-synchronous replication , and GTID replication .

Asynchronous mode

The default replication mode of MySQL is the asynchronous mode, which mainly refers to the I/O thread on the MySQL master server.When data is written to the binlong, it is directly returned to the client.The data update is successful, regardless of whether the data is transmitted to the slave server or not.Write to the relaylog . In this mode, copying data is actually risky.Once the data is only written to the binlog of the master library and it is not too urgent to synchronize to the slave library, it will cause data loss.

But this mode is indeed the most efficient, because the function of changing data is only done in the main library, and copying data from the library will not affect the data writing operation of the main library.
Asynchronous replication
I also said above that although this asynchronous replication mode is highly efficient, the risk of data loss is high, so there is a semi-synchronous replication mode that will be introduced later.

Semi-synchronous mode

Starting from version 5.5, MySQL supports a semi-synchronous master-slave replication mode in the form of a plug-in. What is the semi-synchronous master-slave replication model?
Here is a way of comparison to illustrate:

  • Asynchronous replication mode : We have already introduced the asynchronous replication mode.After the main library executes the transaction submitted by the client, as long as the execution logic is written to the binlog, it will immediately return to the client, regardless of whether the slave library is executed.If successful, there will be a hidden danger, that is, when the binlog executed by the main library has not been synchronized to the slave library, the main library hangs, and the slave library will be forcibly promoted to the main library at this time, which may cause data Lost.

  • Synchronous replication mode : After the master library has executed the transaction submitted by the client, it needs to wait until all the slave libraries have also executed the transaction before returning to the client for successful execution. Because you have to wait until all the slave libraries are executed, the execution process will be blocked, waiting for the return result, so there will be a serious impact on performance.

  • Semi-synchronous replication mode : Semi-synchronous replication mode can be said to be a replication mode between asynchronous and synchronous.After executing the transaction submitted by the client, the main library must wait for at least one slave library to receive binlog and data Write to the relay log before returning to the client a successful result. The semi-synchronous replication mode improves data availability than the asynchronous mode, but also produces a certain performance delay, requiring at least a round-trip time for a TCP/IP connection.

In the semi-synchronous replication mode, you can clearly know that after a transaction is successfully committed, the transaction will exist in at least two places, one is the main library and the other is one of the slave libraries. The main principle is that when the master's dump thread informs the slave library, an ACK mechanism is added, that is, it will confirm whether the slave library has received the transaction identifier.The master's dump thread not only sends the binlog to the slave library, but also is responsible for it.Receive slave's ACK. When an exception occurs, Slave does not have an ACK transaction, then it will automatically be downgraded to asynchronous replication, and then automatically become semi-synchronous replication after the exception is repaired

The process of MySQL semi-synchronous replication is as follows:

MySQL semi-synchronous replication mode

The hidden dangers of semi-synchronous replication

The semi-synchronous replication mode also has certain data risks.When the transaction is waiting for the ACK from the slave library after the main library is submitted, if the Master is down, there will be two problems at this time.

  • The transaction has not been sent to the Slave : If the transaction has not been sent to the Slave, the client will resubmit the transaction after receiving the failure result.Because the resubmitted transaction is executed on the new Master, it will be executed successfully.If the previous Master recovers, it will join the cluster as a Slave.At this time, the previous transaction will be executed twice.The first time was executed when this machine was used as the Master, and the second time it was executed.It is synchronized from the main library after Slave.

  • The transaction has been synchronized to the Slave : Because the transaction has been synchronized to the Slave, when the client receives the failure result and submits the transaction again, then the transaction will be executed twice on the current Slave machine.

In order to solve the above hidden dangers, MySQL has added a new semi-synchronous method starting from version 5.7. The execution process of the new semi-synchronous mode is to move the step of Storage Commit " to the back of " Write Slave dump ". This ensures that only after the Slave transaction ACK, the main library transaction is submitted . MySQL 5.7.2 version adds a new parameter for configuration:, rpl_semi_sync_master_wait_pointthis parameter has two values configurable:

  • AFTER_SYNC : When the parameter value is AFTER_SYNC, it means that the new semi-synchronous replication method is adopted.

  • AFTER_COMMIT : It represents the semi-synchronous replication mode of the previous old method.

New semi-synchronous replication
MySQL started from version 5.7.2, the default semi-synchronous replication method is the AFTER_SYNCmethod, but the solution is not a panacea, because the AFTER_SYNCmethod is to submit the transaction of the main database after the transaction is synchronized to the Slave, if the main database waits for the successful synchronization of the Slave process When the middle master hangs, the master transaction commit fails, and the client also receives the result of the transaction execution failure, but the content of binLog has been written to the Relay Log on the Slave.At this time, the Slave data will be more.But generally the problem is not serious if there is more data, more is better than less. MySQL, when there is no way to solve the problem of distributed data consistency, what it can guarantee is not to lose data.It is better to have more data than to lose data.

Here are a few parameters of the semi-synchronous replication mode:

mysql> show variables like'%Rpl%';
+-------------------------------------------+----- -------+
| Variable_name | Value |
+-------------------------------------------+----- -------+
| rpl_semi_sync_master_enabled | ON |
| rpl_semi_sync_master_timeout | 10000 |
| rpl_semi_sync_master_trace_level | 32 |
| rpl_semi_sync_master_wait_for_slave_count | 1 |
| rpl_semi_sync_master_wait_no_slave | ON |
| rpl_semi_sync_master_wait_point | AFTER_SYNC |
| rpl_stop_slave_timeout | 31536000 |
+-------------------------------------------+----- -------+
- Semi-synchronous replication mode switch
- Semi-synchronous replication, timeout period, in milliseconds, when this time is exceeded, it will automatically switch to asynchronous replication mode
- Introduced in MySQL 5.7.3, this variable sets how many slave responses the master needs to wait before returning to the client.The default is 1.
- This value represents whether the number of slaves in the current cluster can still meet the currently configured semi-synchronous replication mode.The default is ON.When the semi-synchronous replication mode is not satisfied, all slaves switch to asynchronous replication, and this value will also become OFF
- Represents the way that semi-synchronous replication commits the transaction, after 5.7.2, the default is AFTER_SYNC
GTID mode

MySQL has introduced the GTID replication mode since version 5.6.GTID is the abbreviation of global transaction identifier (global transaction identifier).GTID is composed of UUID+TransactionId.UUID is the unique identifier of a single MySQL instance.When the MySQL instance is started for the first time A server_uuid will be automatically generated and written into the auto.cnf (mysql/data/auto.cnf) file in the data directory by default. TransactionId is the number of transactions executed on the MySQL, which increases as the number of transactions increases. This ensures that the GTID is globally unique in a set of replications .

In this way, the GTID can clearly see from which instance the current transaction was submitted and how many transactions were submitted.

Look at the specific form of a GTID:

mysql> show master status;
+-----------+----------+--------------+----------- -------+------------------------------------------ -+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-----------+----------+--------------+----------- -------+------------------------------------------ -+
| on.000003 | 187 | | | 76147e28-8086-4f8c-9f98-1cf33d92978d:1-322|
+-----------+----------+--------------+----------- -------+------------------------------------------ -+
1 row in set (0.00 sec)

GTID: 76147e28-8086-4f8c-9f98-1cf33d92978d:1-322
UUID: 76147e28-8086-4f8c-9f98-1cf33d92978d
TransactionId: 1-322

How GTID works

Due to the uniqueness of GTID in a set of master-slave replication clusters, it is ensured that each GTID transaction is only executed once on one MySQL.
So how is this mechanism implemented? What is the principle of GTID?

When the slave server connects to the master server, the GTID it has executed ( Executed_Gtid_Set: the transaction code that has been executed ) and the GTID ( Retrieved_Gtid_Set: the transaction number that the slave has received from the master database ) are passed to the master server. The master server will send the missing GTID and the corresponding transactionID from the server to the slave server, allowing the slave server to complete the data. When the main server goes down, the conf server with the most successful data synchronization will be found, and it will be directly promoted to the main server. If it is forcibly required that a slave server that is not the most successful in synchronization is required to be the master, the change command will be used to complete the GTID, and then the compulsory machine will be promoted as the master.

The main data synchronization mechanism can be divided into these steps:

  • When the master updates the data, it produces GTID before the transaction and records it in the binlog together.
  • The i/o thread on the slave side writes the changed binlog to the relay log.
  • The sql thread obtains the GTID from the relay log, and then compares whether there is a record in the binlog on the slave side.
  • If there is a record, it means that the GTID transaction has been executed, and the slave will ignore the GTID.
  • If there is no record, Slave will execute the GTID transaction from the relay log and record it in the binlog.
  • In the parsing process, it is judged whether there is a primary key, if there is no primary key, the secondary index is used, and if there is no secondary index, the entire table is scanned.

The initial structure is as shown in the figure below.
When the Master goes down, it will evolve into the figure below.
GTID copy
From the above figure, we can see that when the Master hangs up, Slave-1 finishes the Master’s transaction, and Slave-2 delays a bit, so the master’s transaction is not completed.At this time, upgrade Slave-1 as the main and Slave-2.After the new master (Slave-1) is connected, the latest GTID is passed to the new master, and then Slave-1 starts to send transactions to Slave-2 from the next GTID of this GTID. This self-seeking mode of replication location reduces the possibility of transaction loss and the time of failure recovery.

The advantages and disadvantages of GTID

Through the above analysis, we can conclude that the advantages of GTID are:

  • Each transaction corresponds to an execution ID, and a GTID will only be executed once on a server;
  • GTID is used to replace the traditional copy method.The biggest difference between GTID copy and normal copy mode is that there is no need to specify the binary file name and location;
  • Reduce manual intervention and reduce service failure time.When the host hangs up, use software to upgrade one of the many standby computers as the host;

The disadvantages of GTID are also obvious:

  • First of all, it does not support non-transactional storage engines;
  • Does not support create table...select statement replication (the main library directly reports an error); (principle: two SQLs will be generated, one is DDL to create table SQL, the other is insert into SQL to insert data; DDL will cause automatic submission, so This SQL requires at least two GTIDs, but in GTID mode, only one GTID can be generated for this SQL)
  • One SQL is not allowed to update a transaction engine table and a non-transaction engine table at the same time;
  • In a MySQL replication group, all GTIDs are required to be turned on or off.
  • Enabling GTID requires restarting (except mysql5.7);
  • After turning on GTID, the original traditional replication method is no longer used (unlike semi-synchronous replication, after semi-synchronous replication fails, it can be downgraded to asynchronous replication);
  • The create temporary table and drop temporary table statements are not supported;
  • Does not support sql_slave_skip_counter;

In fact, this part of GTID has a lot of content.If you want to study in depth, you can read this article .
Finally, a few prerequisites for opening GTID:

  • For MySQL 5.6 version, add in the my.cnf file:

gtid_mode=on (Required) #Enable gtid function
log_bin=log-bin=mysql-bin (required) #Open binlog binary log function
log-slave-updates=1 (required) #You can also write 1 as on
enforce-gtid-consistency=1 (required) #You can also write 1 as on
  • For MySQL 5.7 or higher, add in the my.cnf file:

gtid_mode=on (required)
enforce-gtid-consistency=1 (required)
log_bin=mysql-bin (optional) #High availability switch, it is best to enable this function
log-slave-updates=1 (optional) #High availability switch, it is best to turn on this function


Technical otaku

Sought technology together

Related Topic


Leave a Reply