What are the data synchronization requirements and solutions in the context of the Internet?

Preface

In today's Internet industry, especially in distributed and microservice architectures, NoSQL stores such as Redis and Memcached and full-text search services such as Solr and Elasticsearch are used extensively to improve query performance and search accuracy. This raises a problem we need to think about and solve: data synchronization. How do we synchronize data that is constantly changing in the database to Redis/Memcached or Solr/Elasticsearch?

Data synchronization requirements in the context of the Internet

In a distributed environment, data is constantly being written to the database, while applications may need to read that data from services such as Redis, Memcached, Elasticsearch, or Solr. Keeping the data in the database and the data in these services synchronized in real time therefore becomes an urgent problem.

Just imagine that, due to business needs, we have introduced services such as Redis, Memcached, Elasticsearch, or Solr. Our application may then read data from any of these different services.

Essentially, no matter which service or middleware we introduce, the data ultimately comes from our MySQL database. So the question becomes: how do we synchronize the data in MySQL to these other services or middleware in real time?

Note: To better illustrate the problem, the rest of this article uses synchronizing data from a MySQL database into a Solr index as the example.

Data Synchronization Solutions

1. Synchronize in business code

After a record is added, modified, or deleted, the code that updates the Solr index is executed directly in the business logic, as in the code snippet below.

public ResponseResult updateStatus(Long[] ids, String status){
    try{
        // Update the status in the database first
        goodsService.updateStatus(ids, status);
        if("status_success".equals(status)){
            // Then synchronize the affected records to the Solr index
            List<TbItem> itemList = goodsService.getItemList(ids, status);
            itemSearchService.importList(itemList);
        }
        return new ResponseResult(true, "Modify the status successfully");
    }catch(Exception e){
        return new ResponseResult(false, "Failed to modify status");
    }
}

Advantages:

Simple to implement.

Disadvantages:

The business code is tightly coupled to the synchronization logic.

Execution efficiency is reduced, because the index update runs inside the business request.

2. Scheduled task synchronization

After add, modify, and delete operations are performed in the database, a scheduled task periodically synchronizes the database data to the Solr index.

Scheduled task technologies include Spring Task and Quartz.

Haha, there is also my open-source mykit-delay framework: https://github.com/sunshinelyz/mykit-delay .

One technique to pay attention to when implementing the scheduled task: on the first run, query the data from MySQL ordered by the time field in descending order and record the maximum value of that time field. On each subsequent run, only query the rows whose time field is greater than the value recorded by the previous run (again in descending order of the time field), and record the new maximum value. This way, each run avoids querying the whole table again.

Note: The time field mentioned here is a field that identifies when a record was last updated. In other words, when using a scheduled task to synchronize data, to avoid a full table scan on every run, it is best to add a field to the table that records the update time of each row. A minimal sketch of this incremental strategy is shown below.
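
Here is a minimal sketch using Spring Task, assuming @EnableScheduling is turned on and the tb_item table has an update_time column; TbItemMapper.selectUpdatedSince() is a hypothetical query method, while TbItem and ItemSearchService reuse the names from the code above.

import java.util.Date;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class SolrSyncTask {

    @Autowired
    private TbItemMapper itemMapper;           // hypothetical data-access component
    @Autowired
    private ItemSearchService itemSearchService;

    // Largest update_time seen by the previous run; starting from the epoch
    // makes the first execution a full synchronization.
    private Date lastSyncTime = new Date(0L);

    @Scheduled(fixedDelay = 60_000) // run one minute after the previous run finishes
    public void syncToSolr() {
        // Only fetch rows updated since the last run, newest first, e.g.
        // SELECT * FROM tb_item WHERE update_time > ? ORDER BY update_time DESC
        List<TbItem> changedItems = itemMapper.selectUpdatedSince(lastSyncTime);
        if (changedItems.isEmpty()) {
            return;
        }
        // Push the changed rows into the Solr index.
        itemSearchService.importList(changedItems);
        // Remember the largest update_time of this batch for the next run.
        lastSyncTime = changedItems.get(0).getUpdateTime();
    }
}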

Advantages:

Synchronizing the Solr index is completely decoupled from the business code.

Disadvantages:

Data is not synchronized in real time; it is only as fresh as the task interval.

3. Synchronization through MQ

After add, modify, or delete operations are performed in the database, a message is sent to MQ. The synchronization program, acting as an MQ consumer, takes the message from the queue and then executes the logic that updates the Solr index.

The flow is straightforward: the business service writes to the database and sends a message to MQ, and the synchronization program consumes the message and updates the Solr index. The producer side can be implemented with code like the following.

public ResponseResult updateStatus(Long[] ids, String status){
    try{
        // Update the status in the database first
        goodsService.updateStatus(ids, status);
        if("status_success".equals(status)){
            List<TbItem> itemList = goodsService.getItemList(ids, status);
            // Serialize the affected records and publish them to MQ;
            // the synchronization program consumes them and updates Solr
            final String jsonString = JSON.toJSONString(itemList);
            jmsTemplate.send(queueSolr, new MessageCreator(){
                @Override
                public Message createMessage(Session session) throws JMSException{
                    return session.createTextMessage(jsonString);
                }
            });
        }
        return new ResponseResult(true, "Modify the status successfully");
    }catch(Exception e){
        return new ResponseResult(false, "Failed to modify status");
    }
}
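
On the consumer side, a minimal sketch might look like the following, assuming Spring JMS with an ActiveMQ broker and Fastjson for deserialization; the listener class and queue name are hypothetical, and TbItem / ItemSearchService reuse the names from the producer code above.

import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

import com.alibaba.fastjson.JSON;

@Component
public class SolrSyncListener {

    @Autowired
    private ItemSearchService itemSearchService;

    // Consume the JSON message sent by updateStatus() and update the Solr index
    @JmsListener(destination = "queue.solr.item") // hypothetical queue name
    public void onMessage(String jsonString) {
        List<TbItem> itemList = JSON.parseArray(jsonString, TbItem.class);
        itemSearchService.importList(itemList);
    }
}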

Advantages:

The synchronization logic is largely decoupled from the business code, and synchronization can be near real-time.

Disadvantages:

Code for sending messages to MQ must still be added to the business code, so the calling interface remains coupled to the synchronization concern.

4. Real-time synchronization through Canal

Canal is a component open-sourced by Alibaba for incremental parsing of database logs. Canal parses the database's binlog to detect changes to table structures and data, and those changes can then be applied to the Solr index.

With Canal, the synchronization is completely decoupled from the business code and from the API, and it is near real-time.

Canal open source address: https://github.com/alibaba/canal .
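
As an illustration, here is a minimal sketch of consuming binlog changes with the Canal Java client, assuming a Canal server is running locally on port 11111 with the default "example" destination; mapping the row data to TbItem objects and calling itemSearchService (as in the earlier examples) is only hinted at in the comments.

import java.net.InetSocketAddress;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;

public class CanalSolrSync {

    public static void main(String[] args) throws Exception {
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        connector.connect();
        // Subscribe to all tables; narrow this filter to the tables you care about.
        connector.subscribe(".*\\..*");

        while (true) {
            // Fetch a batch of binlog entries without acknowledging them yet.
            Message message = connector.getWithoutAck(100);
            long batchId = message.getId();
            if (batchId == -1 || message.getEntries().isEmpty()) {
                Thread.sleep(1000);
                continue;
            }
            for (CanalEntry.Entry entry : message.getEntries()) {
                if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
                    continue;
                }
                CanalEntry.RowChange rowChange =
                        CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                    // INSERT/UPDATE: push the after-image to Solr;
                    // DELETE: remove the document from the index,
                    // e.g. via itemSearchService as in the earlier examples.
                    System.out.println(rowChange.getEventType() + " on "
                            + entry.getHeader().getTableName() + ": "
                            + rowData.getAfterColumnsList());
                }
            }
            // Acknowledge the batch so Canal can advance its position.
            connector.ack(batchId);
        }
    }
}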

