Foreword
Many readers who interview at large tech companies run into open-ended questions almost every time. These questions have no fixed answer, but they give a truer picture of the candidate's system design ability and technical depth. If you answer one well, it can set you apart from the other candidates. Today, let's work through a fairly common open-ended question from big-company interviews: if you were asked to design a high-concurrency message middleware, what would you do?
Knowledge points involved in message middleware
To design a high-concurrency message middleware, you first need to understand the knowledge points that message middleware involves. In general, a well-designed message middleware must address at least the following:
Producer and consumer model.
Distributed architecture is supported.
High availability of data.
Message data is not lost.
Next, we will look at each of these points in turn.
Producer Consumer Model
Many readers will already be familiar with the producer and consumer model. Simply put, the message middleware lets some applications produce messages and lets other applications consume them.
The producer and consumer model raises more questions than it first appears to. Let's think through them step by step.
First question: when a producer produces a message, how should the message middleware store it? In memory? On disk? Or both in memory and on disk?
If the message data is stored both in memory and on disk, how do we handle it? After the producer delivers a message to the middleware, do we write it to disk immediately? Or does the data stay in memory first and get flushed to disk at intervals? If it is flushed at intervals, we also have to think about how the disk files are partitioned, that is, how many disk files the message data should be split across (we cannot put all the data in a single file). And if the data is split across multiple disk files, what are the rules for splitting it?
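One possible answer to these storage questions can be sketched as an append-only log that buffers messages in memory, flushes them to disk in batches, and rolls over to a new file once the current one grows past a size limit. This is only an illustration: the class name `SegmentedLog`, the parameters `max_segment_bytes` and `flush_every`, and the length-prefixed record format are all assumptions made for the example, not something prescribed above.

```python
import os
import tempfile


class SegmentedLog:
    """Append-only log: buffer in memory, flush in batches, roll files by size."""

    def __init__(self, directory, max_segment_bytes=1024, flush_every=3):
        self.directory = directory
        self.max_segment_bytes = max_segment_bytes  # hypothetical roll threshold
        self.flush_every = flush_every              # hypothetical batch size
        self.buffer = []       # messages accepted but not yet on disk
        self.next_offset = 0   # offset assigned to the next message
        self.segment_base = 0  # offset of the first message in the active segment
        self.segment_size = 0  # bytes written to the active segment so far

    def _segment_path(self, base_offset):
        # Each file is named after the first offset it contains.
        return os.path.join(self.directory, f"{base_offset:020d}.log")

    def append(self, payload: bytes) -> int:
        offset = self.next_offset
        self.next_offset += 1
        self.buffer.append((offset, payload))
        if len(self.buffer) >= self.flush_every:
            self.flush()
        return offset

    def flush(self):
        for offset, payload in self.buffer:
            # Record format: 4-byte big-endian length, then the payload.
            record = len(payload).to_bytes(4, "big") + payload
            # Roll to a new segment when the active one would exceed the limit.
            if self.segment_size > 0 and self.segment_size + len(record) > self.max_segment_bytes:
                self.segment_base = offset
                self.segment_size = 0
            with open(self._segment_path(self.segment_base), "ab") as f:
                f.write(record)
                f.flush()
                os.fsync(f.fileno())  # force the bytes to disk
            self.segment_size += len(record)
        self.buffer.clear()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = SegmentedLog(d, max_segment_bytes=20, flush_every=2)
        for _ in range(5):
            log.append(b"aaaaaa")
        print(sorted(os.listdir(d)), "unflushed:", len(log.buffer))
```

The trade-off is visible in the sketch: anything still in `buffer` is lost if the process crashes before the next flush, which is the price paid for batching disk writes.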
All of the above are questions we need to consider when designing a message middleware, but they are only a small part of the picture. If you want to stand out in an interview, read on: there are several more important points to cover.
If the data is split into multiple disk files according to certain rules, do we need to maintain metadata that describes where each piece of data lives? (This is similar to Hadoop, where the NameNode stores metadata about the data held on the DataNodes and uses that metadata to manage them.) This metadata can include the offset of each message or the unique ID of each message.
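As a minimal sketch of offset-based metadata, the index below records the first offset stored in each disk file and uses binary search to find which file holds a given message. The class `SegmentIndex` and the file names are hypothetical, and the sketch assumes each file holds a contiguous run of offsets.

```python
import bisect


class SegmentIndex:
    """Maps a message offset to the segment file that contains it."""

    def __init__(self):
        self.base_offsets = []  # sorted first offset of each segment
        self.files = []         # file name of each segment, in the same order

    def add_segment(self, base_offset, file_name):
        pos = bisect.bisect_left(self.base_offsets, base_offset)
        self.base_offsets.insert(pos, base_offset)
        self.files.insert(pos, file_name)

    def locate(self, offset):
        """Return (file_name, position_within_segment) for an offset."""
        # Find the last segment whose base offset is <= the requested offset.
        pos = bisect.bisect_right(self.base_offsets, offset) - 1
        if pos < 0:
            raise KeyError(f"offset {offset} precedes the first segment")
        # Note: this sketch does not track where the last segment ends.
        return self.files[pos], offset - self.base_offsets[pos]
```

For example, with segments starting at offsets 0, 100, and 250, a lookup for offset 120 lands in the second file at relative position 20.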
After considering the storage of data, we also need to consider: how does the message middleware deliver the data to the corresponding consumers?
When designing producers and consumers, there is another very important question that we need to consider: what consumption mode do we use when designing message middleware? Will the data be evenly distributed to consumers? Or will the data be delivered to the consumer through some other rules?
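Two common answers to the consumption-mode question can be sketched side by side: round-robin, which spreads messages evenly, and key hashing, which always routes messages with the same key to the same consumer. The class names and consumer labels below are made up for illustration, and a real middleware may well use a different rule.

```python
import itertools
import zlib


class RoundRobinDispatcher:
    """Spreads messages evenly: each consumer takes the next message in turn."""

    def __init__(self, consumers):
        self._cycle = itertools.cycle(list(consumers))

    def pick(self, message_key=None):
        return next(self._cycle)


class KeyHashDispatcher:
    """Routes by key: messages with the same key always reach the same consumer."""

    def __init__(self, consumers):
        self.consumers = list(consumers)

    def pick(self, message_key: str):
        # crc32 is stable across processes, unlike the built-in hash() for str.
        index = zlib.crc32(message_key.encode("utf-8")) % len(self.consumers)
        return self.consumers[index]
```

Round-robin gives the best balance when messages are independent; key hashing is the usual choice when all messages for one entity (say, one order) must be processed in sequence by a single consumer.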
Support distributed architecture
If the message middleware we design has to carry terabytes of data every day under high-concurrency, high-throughput writes, a single machine will not be enough, so we need to design it as a distributed system.
In a distributed design, we also need to split large volumes of data into shards and store the shards on different nodes.
Beyond that, there is another core issue: the message middleware needs to support automatic scale-out.
That raises further questions: how are shards expanded when nodes are added, and how is data migrated automatically so that load stays balanced?
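One technique often used for shard placement and scale-out is consistent hashing, sketched below. It is offered as one possible approach, not as the approach this article requires. The names `HashRing`, `virtual_nodes`, and the broker labels are assumptions for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring that assigns shard keys to broker nodes."""

    def __init__(self, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes  # points per node, smooths the balance
        self._points = []                   # sorted hash positions on the ring
        self._owner = {}                    # hash position -> node name

    def add_node(self, node):
        for i in range(self.virtual_nodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, shard_key):
        if not self._points:
            raise LookupError("ring is empty")
        # Walk clockwise to the first point after the key, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(shard_key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that matters for expansion is that adding a node only moves the shards that now belong to the new node; every other shard stays where it was, which keeps migration traffic small.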
High availability of data
High availability in typical Internet applications is achieved through local heap memory, distributed caches, and keeping copies of the same data on different servers. With replicas in place, the failure of any single storage node does not affect overall availability. We can borrow the same idea when designing message middleware.
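The replica idea can be sketched as a write that goes to several storage nodes and counts as successful once enough of them have accepted it. The `Replica` class, the `replicated_write` function, and the `min_acks` threshold are hypothetical names for this illustration, and the sketch deliberately leaves out rollback and catch-up for replicas that missed a write.

```python
class Replica:
    """A storage node holding one copy of the message log."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = []

    def write(self, message):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.log.append(message)


def replicated_write(replicas, message, min_acks):
    """Write to every replica; succeed only if at least min_acks accept it."""
    acks = 0
    for replica in replicas:
        try:
            replica.write(message)
            acks += 1
        except ConnectionError:
            continue  # a dead replica does not block the others
    return acks >= min_acks
```

With three replicas and `min_acks=2`, the write still succeeds when one node is down, and the message survives on the two remaining copies.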
Message data is not lost
To avoid losing messages, we need to provide a manual ACK mechanism: when the consumer has actually finished processing a message, it returns a "processing completed" acknowledgment to the message middleware, and only then does the middleware delete the processed message.
Looking more closely, we actually need two ACK mechanisms:
One ACK is for the producer side. If the producer does not receive an ACK from the middleware, it resends the message to make sure the message is produced successfully.
The other ACK is for the consumer side. Once a message has been consumed and processed successfully, the consumer must return an ACK to the message middleware, and only then can the middleware delete the message. Otherwise, if the consumer goes down, the message must be redelivered to another consumer instance so that it is still processed successfully.
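Both ACK paths can be sketched with a small in-memory broker. Everything here is illustrative: the `Broker` class, its method names, the message-ID de-duplication, and the `send_with_retry` helper are assumptions for the example, and a real middleware would also need timeouts and persistence.

```python
import collections


class Broker:
    """In-memory broker with producer-side and consumer-side ACKs."""

    def __init__(self):
        self.pending = collections.deque()  # stored, not yet delivered
        self.in_flight = {}                 # msg_id -> payload, delivered but unacked
        self.seen_ids = set()               # de-duplicates producer retries

    def publish(self, msg_id, payload):
        """Store the message; the True return value is the producer-side ACK."""
        if msg_id not in self.seen_ids:
            self.seen_ids.add(msg_id)
            self.pending.append((msg_id, payload))
        return True

    def deliver(self):
        """Hand the next message to a consumer, keeping it until it is acked."""
        if not self.pending:
            return None
        msg_id, payload = self.pending.popleft()
        self.in_flight[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Consumer-side ACK: only now is the message deleted.
        self.in_flight.pop(msg_id, None)

    def requeue_unacked(self):
        # Called when a consumer is considered dead: redeliver its messages.
        for msg_id, payload in self.in_flight.items():
            self.pending.append((msg_id, payload))
        self.in_flight.clear()


def send_with_retry(publish, msg_id, payload, max_attempts=3):
    """Producer side: resend until the broker acknowledges or attempts run out."""
    for _ in range(max_attempts):
        try:
            if publish(msg_id, payload):
                return True
        except ConnectionError:
            pass  # no ACK received, try again
    return False
```

Because the producer may resend a message the broker already stored, the sketch de-duplicates by message ID. Redelivery after a consumer failure has the same effect on the consuming side, so consumers should be prepared to see a message more than once.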