
RocketMQ message loss scenarios: analysis and solutions

Once a project uses a message queue, message loss becomes an unavoidable concern. In scenarios involving monetary transactions, losing a message can be fatal. So in which scenarios can RocketMQ lose messages?

Let's start with the simplest consumption flow chart:


The flow chart above roughly covers the following scenarios:

  • The producer generates a message and sends it to RocketMQ
  • After RocketMQ receives the message, it must persist it to disk; otherwise the data is lost after a power failure or crash
  • The consumer fetches the message from RocketMQ and consumes it. Once consumption succeeds, the whole process ends.

All three scenarios may result in message loss, as shown in the following figure:


1. In scenario 1, when the producer sends a message to RocketMQ, problems such as network jitter or communication failures can cause the message to be lost.
2. In scenario 2, the message must be persisted to disk, and there are two situations in which it can be lost:

  • To reduce disk I/O, RocketMQ first writes the message to the OS page cache instead of writing it directly to disk. Consumers read the message from the page cache, which is almost as fast as reading it from memory. After a while, the message is flushed to disk asynchronously in the background, and only then is it truly persisted. If the Broker's machine goes down (for example, on a power failure) before the asynchronous flush has happened, the message is lost.
  • If the message has been flushed to disk but has not been backed up, it is also lost once the disk is damaged.

3. In scenario 3, the consumer successfully fetches the message from RocketMQ but tells RocketMQ the message has been consumed before processing has finished. If the consumer then goes down, RocketMQ still believes the message was consumed successfully and will not deliver it again, so the message is lost.
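The first case of scenario 2 can be made concrete with a small Java sketch. It is a toy model, not RocketMQ code: the class AsyncFlushBroker and its methods are hypothetical, the "disk" is just a list, and the explicit flush() call stands in for the background flush. It shows that a message the broker has already acknowledged is lost if the machine goes down before the next flush.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not RocketMQ code) of ASYNC_FLUSH: put() succeeds as soon as
// the message is in memory, and only flush() copies it to durable storage.
class AsyncFlushBroker {
    private final List<String> unflushed = new ArrayList<>(); // dirty pages in the OS cache
    private final List<String> disk = new ArrayList<>();      // data that survives a crash

    // The producer is told "success" once the message is in the cache.
    boolean put(String msg) {
        unflushed.add(msg);
        return true;
    }

    // Background flush: persist everything written since the last flush.
    void flush() {
        disk.addAll(unflushed);
        unflushed.clear();
    }

    // Power failure / machine crash: memory is wiped, only the disk contents remain.
    List<String> crashAndRecover() {
        unflushed.clear();
        return new ArrayList<>(disk);
    }
}
```

For example, calling put("m1"), flush(), put("m2") and then crashAndRecover() leaves only m1: m2 was acknowledged to the producer but never reached the disk.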
So how can we guarantee zero message loss?


1. To ensure that messages are not lost in scenario 1, send them with RocketMQ's built-in transaction mechanism. The general process is as follows:

  • First, the producer sends a half message to RocketMQ; consumers cannot consume a half message. If sending the half message fails, the producer executes its rollback logic.
  • Once the half message has been sent and RocketMQ has returned a success response, the producer executes its core business logic.
  • If the core business logic fails, the producer rolls back and tells RocketMQ to delete the half message.
  • If the core business logic succeeds, the producer tells RocketMQ to commit the half message, which makes it available to consumers.

If RocketMQ receives no commit or rollback from the producer for a long time, it calls back the producer to check the status of the transaction. For details of this callback interface, you can refer to:


Once the producer's message has been sent to RocketMQ successfully through the transaction mechanism, it is guaranteed not to be lost at this stage.
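The transaction flow above can be sketched as a small state machine. This is a toy model under stated assumptions, not the RocketMQ client: HalfMessageBroker, TxProducer and TxState are hypothetical names. In the real Java client the same roles are played by TransactionMQProducer and a TransactionListener whose executeLocalTransaction and checkLocalTransaction methods return a LocalTransactionState (COMMIT_MESSAGE, ROLLBACK_MESSAGE or UNKNOW).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical outcome of the producer's local transaction.
enum TxState { COMMIT, ROLLBACK, UNKNOWN }

// Toy broker (not RocketMQ's API): half messages stay invisible to consumers until committed.
class HalfMessageBroker {
    private final Map<String, String> half = new LinkedHashMap<>();
    private final Map<String, String> visible = new LinkedHashMap<>();

    void sendHalf(String id, String body) { half.put(id, body); }

    // COMMIT makes the message consumable, ROLLBACK deletes it, UNKNOWN leaves it pending.
    void end(String id, TxState state) {
        if (state == TxState.COMMIT) {
            String body = half.remove(id);
            if (body != null) visible.put(id, body);
        } else if (state == TxState.ROLLBACK) {
            half.remove(id);
        }
    }

    // Check-back: ask the producer about every half message still waiting for commit/rollback.
    void checkBack(Function<String, TxState> producerCheck) {
        for (String id : new ArrayList<>(half.keySet())) {
            end(id, producerCheck.apply(id));
        }
    }

    boolean isVisible(String id) { return visible.containsKey(id); }
    boolean isPending(String id) { return half.containsKey(id); }
}

// Producer side: half message first, then the core business logic, then commit or roll back.
class TxProducer {
    static TxState sendInTransaction(HalfMessageBroker broker, String id, String body,
                                     Supplier<Boolean> coreLogic) {
        broker.sendHalf(id, body);   // half message: stored, but not yet consumable
        TxState state;
        try {
            state = coreLogic.get() ? TxState.COMMIT : TxState.ROLLBACK;
        } catch (RuntimeException e) {
            state = TxState.ROLLBACK; // core logic failed: the half message must be deleted
        }
        broker.end(id, state);       // tell the broker to commit or roll back
        return state;
    }
}
```

A committed message becomes visible, a failed core step leaves nothing behind, and a half message whose producer never answered stays pending until checkBack() resolves it.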
2. To ensure that messages are not lost in scenario 2, first switch from asynchronous to synchronous flushing: in the Broker configuration file, change flushDiskType from its default, ASYNC_FLUSH (asynchronous flushing), to SYNC_FLUSH (synchronous flushing).
Once a synchronous flush returns successfully, the message is guaranteed to be persisted on disk. To avoid losing data when a disk is damaged, deploy RocketMQ as a master-slave cluster, so that the data on the leader is backed up on multiple followers and there is no single point of failure.
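The effect of combining synchronous flushing with synchronous replication can be sketched as follows. In the Broker configuration this usually means flushDiskType=SYNC_FLUSH and, in the classic master-slave mode, brokerRole=SYNC_MASTER on the master (the default is ASYNC_MASTER). The Java code is a hypothetical toy model, not RocketMQ code: the producer only gets a success response once every copy exists, so losing the leader's disk does not lose the message, while a single node with no followers still does.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not RocketMQ code) of SYNC_FLUSH plus synchronous replication:
// put() acknowledges only after the message is on the leader's disk and on every follower's disk.
class SyncReplicatedBroker {
    private final List<String> leaderDisk = new ArrayList<>();
    private final List<List<String>> followerDisks = new ArrayList<>();

    SyncReplicatedBroker(int followers) {
        for (int i = 0; i < followers; i++) {
            followerDisks.add(new ArrayList<>());
        }
    }

    boolean put(String msg) {
        leaderDisk.add(msg);              // synchronous flush on the leader
        for (List<String> disk : followerDisks) {
            disk.add(msg);                // synchronous copy to each follower
        }
        return true;                      // success only after all copies exist
    }

    // The leader's disk is damaged: recover the data from a follower, if there is one.
    List<String> loseLeaderDiskAndRecover() {
        leaderDisk.clear();
        if (followerDisks.isEmpty()) {
            return new ArrayList<>();     // single node: the data is gone
        }
        return new ArrayList<>(followerDisks.get(0));
    }
}
```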
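3. In scenario 3, once the message has reached the consumer, RocketMQ's consumption model already guarantees that it will not be lost, as long as the listener reports success only after the message has actually been processed. The following code breaks that guarantee: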

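```java
import java.util.List;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.common.message.MessageExt;
// Register a message listener to process messages (assumes consumer is a DefaultMQPushConsumer)
consumer.registerMessageListener(new MessageListenerConcurrently() {
    @Override
    public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs, ConsumeConcurrentlyContext context) {
        // Open a child thread to process the messages asynchronously
        new Thread(() -> {
            // process the messages
        }).start();
        // Success is reported before the child thread has finished processing
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }
});
```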

Because the child thread processes the messages asynchronously, the consumer may tell RocketMQ that a message has been consumed before processing has actually finished; if the consumer then goes down, the message is lost.
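The safe alternative is to process the messages directly inside consumeMessage and return ConsumeConcurrentlyStatus.CONSUME_SUCCESS only as the last step, returning RECONSUME_LATER if processing fails so that RocketMQ delivers the message again. The sketch below models that contract without the RocketMQ client; AckQueue and ConsumeStatus are hypothetical stand-ins, and a thrown exception plays the role of a consumer that fails before reporting success.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Hypothetical stand-in mirroring the two statuses a RocketMQ concurrent listener returns.
enum ConsumeStatus { CONSUME_SUCCESS, RECONSUME_LATER }

// Toy queue (hypothetical, not RocketMQ code): a message leaves the queue only when the
// listener returns CONSUME_SUCCESS; otherwise it stays and is delivered again later.
class AckQueue {
    private final Deque<String> pending = new ArrayDeque<>();

    void send(String msg) { pending.addLast(msg); }

    // Deliver the head message and wait synchronously for the listener's verdict.
    void deliverOnce(Function<String, ConsumeStatus> listener) {
        String msg = pending.pollFirst();
        if (msg == null) return;
        ConsumeStatus status;
        try {
            status = listener.apply(msg);
        } catch (RuntimeException crash) {
            status = ConsumeStatus.RECONSUME_LATER; // no success was reported
        }
        if (status != ConsumeStatus.CONSUME_SUCCESS) {
            pending.addLast(msg);                   // keep it for redelivery
        }
    }

    int size() { return pending.size(); }
}
```

Because an unacknowledged message is delivered again, the same message may be processed more than once, so the processing logic should be idempotent.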
Together, these measures can guarantee zero message loss with RocketMQ, but performance and throughput drop significantly:

  • Sending messages through the transaction mechanism takes several more steps than an ordinary send, which costs performance
  • Synchronous flushing waits for the disk, while asynchronous flushing returns once the message is in memory; their speeds are not even in the same order of magnitude
  • In a master-slave deployment, the leader must also replicate every message to its followers
  • Consumers cannot process messages asynchronously; they must wait until processing is complete before telling RocketMQ that the message has been consumed

Zero message loss is a double-edged sword. Whether it is worth the cost depends on the specific business scenario, so choose the level of protection that scenario actually needs.


Technical otaku
