Technology selection and architecture design of a distributed IM instant messaging system

Mondo Technology Updated on 2024-01-30

This article was shared by Binghe and comes from the blog binghegitcode.host. The original title was "How to write this distributed IM instant messaging system on a resume? I've got it sorted out for you!". It has been revised and edited here.

A distributed IM (instant messaging) system is essentially about managing online chats and users.

For the chat itself, the core requirements are: sending text, files, and voice; message caching and storage; unread and read status; message recall; offline messages; historical messages; one-to-one chats; group chats; multi-terminal synchronization; and so on.

For user management, the requirements include: adding friends, viewing the friend list, deleting friends, viewing friend information, creating group chats, joining group chats, viewing group member information, leaving group chats, modifying group nicknames, pulling people into groups, kicking people out of groups, dissolving group chats, writing group announcements, modifying group notes, and other user-related needs.

To make the design of the distributed IM system easier to understand, I take the architect's perspective: after fully understanding the system requirements, business processes, and technical processes, I set the design goals for the system from a global view, select the technical solutions, design the overall and layered architecture, and sort out the interaction flows for sending messages, one-to-one chats, and group chats. I hope it helps.


(This article was also published at.)

Before selecting technologies and designing the overall architecture, one thing must be clear: whichever solution and architecture the system adopts, its business, technical, and architecture goals need to be defined up front. During development, the overall performance of the system should be evaluated continuously so that bottlenecks are found and optimized.

In general, the distributed IM system we build needs to meet the following design goals:

Business goal: meet the requirement scenarios from the requirements design;

Technical goal: support unlimited expansion and millions of users online at the same time;

Architecture goals: high concurrency, high performance, high availability, monitoring, early warning, scalability, and unlimited expansion.

For technology selection, in addition to basic frameworks such as Spring Boot, containerization solutions are also used.

At the same time, to keep the technical threshold as low as possible, the distributed IM system mainly adopts popular, widely used frameworks and solutions.

The specific selection is as follows:

Development frameworks: Spring Boot, Spring Cloud, Spring Cloud Alibaba, and Dubbo;

Cache: Redis distributed cache + Guava local cache;

Databases: MySQL, TiDB, and HBase;

Traffic gateway: OpenResty + Lua;

Service gateway: Spring Cloud Gateway + Sentinel;

Persistence layer frameworks: MyBatis, MyBatis-Plus;

Service configuration, registration, and discovery: Nacos;

Message middleware: RocketMQ;

Network communication: Netty;

File storage: MinIO;

Log visualization and governance: ELK;

Container management: Swarm, Portainer;

Monitoring: Prometheus, Grafana;

Front-end: Vue;

Unit testing: JUnit;

Benchmarking: JMH;

Stress testing: JMeter.

The IM system covers the instant messaging back-end service, the large back-end platform, the SDK access service, the OpenAI access service, and the large front-end UI. I believe many readers could more or less draw its architecture diagram themselves, roughly as shown in the following figure.

This kind of architecture is quite common. In it, Kong/OpenResty/NGINX only does load balancing and reverse proxying, and developers focus on the business layer and the basic layer. When traffic is relatively small, this design generally causes no problems. Once traffic grows, however, a performance problem appears: when users call the back-end platform's interface to send messages, the IM SDK synchronously calls the interface of the IM service.

Because each terminal can be connected to only one IM service instance at a time, if a large number of user terminals happen to be connected to the same instance, the IM SDK will call that instance's interface very frequently and a performance bottleneck will occur. The bottleneck then affects not only the IM service but also the back-end platform's ability to receive requests.

Since the architecture shown in the figure above has a performance bottleneck, how do we optimize it?

To this end, we optimized the architecture in the previous figure; the result is shown in the following figure.

Comparing the two figures, and leaving aside the technical implementation details, we move business verification and traffic control forward and expand the responsibilities of Kong/OpenResty/NGINX, so that in addition to reverse proxying and load balancing it also handles rate limiting, blacklists and whitelists, traffic control, and business verification.

In other words, in this architecture the entry point of the distributed IM system does real work: we make full use of the high concurrency and high throughput of Kong/OpenResty/NGINX to block most invalid requests before they reach the rest of the system. An example is a user who has not logged in but tries to call interfaces such as sending messages, adding friends, or joining groups. This greatly reduces the business pressure on the back-end platform.

In addition to rate limiting, blacklists and whitelists, traffic control, and business verification in Kong/OpenResty/NGINX, we also introduce a service gateway cluster that implements rate limiting, degradation, circuit breaking, flow control, verification, and authentication, to further protect the stability and security of downstream systems.
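The entry-point checks described above can be illustrated with a small sketch. It is written in Python only for brevity (the article's stack would do this in OpenResty + Lua), and the names `TokenBucket`, `EdgeFilter`, and `check`, the result strings, and the choice of a token bucket as the rate-limiting algorithm are all assumptions made for illustration, not the author's implementation.

```python
from typing import Dict, Optional, Set


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float, now: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start with a full bucket
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class EdgeFilter:
    """Hypothetical entry-point checks: blacklist, login check, then per-IP rate limit."""

    def __init__(self, blacklist: Set[str], valid_tokens: Set[str],
                 capacity: float, refill_rate: float) -> None:
        self.blacklist = blacklist
        self.valid_tokens = valid_tokens  # stand-in for a real session lookup
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets: Dict[str, TokenBucket] = {}

    def check(self, client_ip: str, login_token: Optional[str], now: float) -> str:
        if client_ip in self.blacklist:
            return "blocked"
        if login_token not in self.valid_tokens:
            return "unauthenticated"  # e.g. sending a message without logging in
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_rate, now)
            self.buckets[client_ip] = bucket
        return "ok" if bucket.allow(now) else "rate_limited"
```

The cheap checks run first, so a blacklisted or unauthenticated request is rejected before any rate-limiting state is created for it. The clock is passed in as `now` to keep the sketch deterministic.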

To solve the performance problem that arises when many user terminals happen to be connected to the same IM service instance and the IM SDK frequently calls that instance's interfaces, we introduce a RocketMQ cluster between the IM SDK and the IM service.

Each IM instance in the cluster has a unique ID, and after startup each instance listens only to the RocketMQ topics related to its own ID. In this way, each IM service instance receives only the messages in its own topics rather than all messages.

When a user logs in, a persistent connection is established with an IM service instance. The user ID and terminal are used as the key and the IM service ID as the value, and this pair is stored in the distributed cache. At the same time, the user ID and terminal are used as the key and the persistent connection between the user terminal and the IM service as the value, and this pair is stored in the local memory of the IM service instance.
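A minimal sketch of the two mappings just described, in Python for brevity. A plain dict stands in for the distributed cache (Redis in the article), and `FakeConnection` stands in for a real persistent connection; the class names, method names, and the cleanup in `on_logout` are assumptions for illustration and are not taken from the article.

```python
from typing import Dict, List, Tuple

SessionKey = Tuple[int, str]  # (user ID, terminal), the key used by both mappings


class FakeConnection:
    """Stand-in for a persistent connection; records what would be written to it."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, payload: str) -> None:
        self.sent.append(payload)


class ImServerInstance:
    def __init__(self, server_id: int, distributed_cache: Dict[SessionKey, int]) -> None:
        self.server_id = server_id
        # Shared by all instances: (user ID, terminal) -> IM service ID.
        self.distributed_cache = distributed_cache
        # Private to this instance: (user ID, terminal) -> connection object.
        self.local_connections: Dict[SessionKey, FakeConnection] = {}

    def on_login(self, user_id: int, terminal: str, connection: FakeConnection) -> None:
        key = (user_id, terminal)
        self.distributed_cache[key] = self.server_id
        self.local_connections[key] = connection

    def on_logout(self, user_id: int, terminal: str) -> None:
        # Assumption: cleanup on disconnect is not described in the article.
        key = (user_id, terminal)
        self.local_connections.pop(key, None)
        # Remove the route only if it still points at this instance.
        if self.distributed_cache.get(key) == self.server_id:
            del self.distributed_cache[key]
```

The connection object itself stays in local memory because it only has meaning inside the process that owns it; only the small, shareable fact "which instance holds this user's connection" goes to the distributed cache.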

When a user calls the back-end platform's interface to send a message, the request carries the target user's ID, the target user's logged-in terminal is specified in the IM SDK, and the message is finally sent to RocketMQ through the IM SDK.

Here the IM SDK looks up, in the distributed cache, the ID of the IM service instance that the target user is connected to, based on the target user ID and terminal, and sends the message to the topic related to that ID. The IM service instance holding the persistent connection with the target user then receives the message from RocketMQ, finds the connection in its local cache by user ID and terminal, and pushes the message to the user over that connection.
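The routing described above can be sketched as follows. This is an illustration in Python, not the author's code: `InMemoryBroker` is a toy stand-in for RocketMQ, the topic name format `im_message_<id>` is hypothetical, and a list is used as an "outbox" in place of a real connection.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

SessionKey = Tuple[int, str]  # (user ID, terminal)


class InMemoryBroker:
    """Toy stand-in for RocketMQ: delivers each message to the subscribers of its topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers.get(topic, []):
            handler(message)


def topic_for(server_id: int) -> str:
    return f"im_message_{server_id}"  # hypothetical naming scheme


class ImServer:
    def __init__(self, server_id: int, broker: InMemoryBroker) -> None:
        self.server_id = server_id
        # Local cache: (user ID, terminal) -> outbox list standing in for a connection.
        self.connections: Dict[SessionKey, List[str]] = {}
        # Each instance listens only to the topic derived from its own ID.
        broker.subscribe(topic_for(server_id), self.on_message)

    def on_message(self, message: dict) -> None:
        outbox = self.connections.get((message["to_user"], message["terminal"]))
        if outbox is not None:
            outbox.append(message["body"])  # push over the persistent connection


def sdk_send(broker: InMemoryBroker, route_cache: Dict[SessionKey, int],
             to_user: int, terminal: str, body: str) -> bool:
    """Look up the target's IM service ID, then publish to that instance's topic."""
    server_id = route_cache.get((to_user, terminal))
    if server_id is None:
        return False  # the target is not connected to any instance
    broker.publish(topic_for(server_id),
                   {"to_user": to_user, "terminal": terminal, "body": body})
    return True
```

Because the sender only publishes to a topic, it never calls an IM service instance directly, which is what removes the synchronous hot spot discussed earlier.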

In addition, in the actual implementation, to prevent a large number of users from connecting to the same service instance in the IM cluster, a hash-and-modulo operation is applied to attributes such as the user's IP, browser fingerprint, or mobile device, so that users are distributed as evenly as possible across the instances in the cluster.
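A minimal sketch of the hash-and-modulo step, in Python for brevity. The function name `pick_instance` and the use of SHA-256 are assumptions; the article does not say which hash function is used. A stable hash is chosen here because Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib


def pick_instance(client_key: str, instance_count: int) -> int:
    """Map a client attribute (IP, browser fingerprint, device ID) to an instance index."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the modulo.
    return int.from_bytes(digest[:8], "big") % instance_count
```

The same key always maps to the same index for a given `instance_count`, and different keys spread roughly evenly. One known trade-off of plain modulo is that changing `instance_count` remaps most keys, which matters when instances are added or removed.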

So the question is, is there room for further optimization of this architectural design?

To further improve the performance, availability, and elastic scalability of the distributed IM system, we can design a containerized architecture for it, as shown in the following figure.

As the figure shows, the architecture is further optimized by adopting a containerized design. On top of the original architecture, we made the following improvements.

1) Basic support services: these are implemented by basic middleware, data storage services, and monitoring services, including the MySQL database, the TiDB database, HBase, the Redis cache, the RocketMQ message queue, Prometheus monitoring, and Portainer container management. They provide the most basic data, transport, monitoring, and container management services for the whole distributed IM system.

2) Containerization: containers are implemented with Docker and managed with Swarm and Portainer.

3) Other basic functions: beyond the layers above, building a distributed IM system also requires considering exception monitoring, service registration and discovery, visualization, service degradation and data recovery, service rate limiting, disaster recovery, capacity planning and scaling, and full-link stress testing.

In the distributed IM system, both the large back-end platform and the IM service adopt a layered architecture for the business layer.

Here we can borrow the layered architecture of DDD and divide it into four layers: presentation layer, application layer, domain layer, and infrastructure layer.

However, given the particular nature of the distributed IM system, it is not designed in strict accordance with DDD principles, as shown in the figure below.

As the figure shows, the system draws on the design ideas of DDD without following DDD completely.

1) Presentation layer: also known as the user UI layer, this is the top layer of the DDD design. It exposes the API, receives client requests, parses parameters, returns result data, and handles exceptions.

2) Application layer: mainly handles the business scenarios that change easily, and can handle related events, scheduling, and other aggregation operations.

3) Domain layer: arguably the essence of DDD, it abstracts the relatively stable parts of the business system and encapsulates them in a domain model. In the design of the distributed IM system, the domain layer basically depends on no other layer, including the infrastructure layer, which differs from the DDD design.

4) Infrastructure layer: provides general basic capabilities to the other layers; in the distributed IM system these include caches, general utility classes, messaging, and persistence mechanisms.

Here we set other details aside and focus on the interaction flow for sending messages. Whether in a one-to-one chat or a group chat, a message has to be pushed to the user's terminal through the IM service. The process of sending a message is shown in the following figure.

As the figure shows, when a user sends a message in the distributed IM system, whether in a one-to-one chat or a group chat, the message is ultimately pushed to the terminal device on which the recipient is logged in. Suppose user A sends a message to user B, or user A and user B are in the same group and user A sends a message to the group. The main process by which user B receives the message is as follows:

1) User A calls the back-end platform's API to send a message to user B; the message contains user B's ID and terminal information.

2) The back-end platform caches the message and writes it asynchronously to the message store.

3) The back-end platform obtains from Redis the ID of the IM service instance that user B is connected to.

4) The back-end platform sends the message to the RocketMQ topic corresponding to that IM service ID.

5) Each IM service instance listens to the RocketMQ topic corresponding to its own service ID, so the instance connected to user B receives the message.

6) After receiving the message, the IM service instance looks up user B's connection in its cache by user B's ID and terminal information, and pushes the message to user B over that connection.
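The "cache first, write asynchronously" step of this flow can be sketched as below. This is only an illustration in Python: the article does not say how the asynchronous write is implemented, so the in-process queue and worker thread, the class name `AsyncMessageWriter`, and the `persist` callback are all assumptions. Error handling and retries are left out.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Tuple


class AsyncMessageWriter:
    """Caches a message immediately and persists it later on a background thread."""

    def __init__(self, persist: Callable[[int, dict], None]) -> None:
        self.cache: Dict[int, dict] = {}  # stand-in for the message cache
        self._persist = persist           # stand-in for the write to the message store
        self._queue: "queue.Queue[Optional[Tuple[int, dict]]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, message_id: int, message: dict) -> None:
        self.cache[message_id] = message         # synchronous and fast
        self._queue.put((message_id, message))   # persisted later by the worker

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            try:
                if item is None:  # sentinel: stop the worker
                    return
                self._persist(*item)
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until everything submitted so far has been persisted."""
        self._queue.join()

    def close(self) -> None:
        self._queue.put(None)
        self._worker.join()
```

The caller of `submit` returns as soon as the message is cached and queued, so the slower database write does not sit on the request path of sending a message.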

For the process above to work, the following conditions must be met:

1) The back-end platform is distributed and can be scaled horizontally at any time.

2) The IM service is distributed and can be scaled horizontally at any time.

3) Each started IM service instance has a unique ID in the cluster.

4) Each IM service instance listens only to the RocketMQ topic corresponding to its own ID.

5) After a user logs in to the distributed IM system, a persistent connection is established with an IM service instance; the connection is cached by user ID and terminal, and the ID of the connected IM service instance is cached in Redis by user ID and terminal.

6) When a user sends a message, the IM service ID is obtained from Redis based on the target user's ID and terminal, and the message is sent to the RocketMQ topic corresponding to that IM service ID.

7) After the corresponding IM service instance receives the RocketMQ message, it obtains the user's connection from its cache by user ID and terminal and pushes the message to the user.

A one-to-one chat is a private chat between two users in the distributed IM system. In this scenario, it is quite possible that one of the two users is not online.

For example: when user A sends a message to user B, user B may not be online.

At this point, we need to store the messages sent by user A to user B.

In fact, in the distributed IM system we implemented, message records are stored whether or not user B is online. When user B logs in to the system, the messages are synchronized to user B, as shown in the following figure.

As you can see, when user A sends a message to user B:

If user B is online, the message is sent to user B through the message-sending flow described above.

If user B is not online, the message cannot be pushed to user B normally. When user B later logs in to the distributed IM system, the client calls the back-end platform's interface to pull all unread messages, and they are pushed to user B through the flow for online users.
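The two cases above can be sketched as follows, in Python for brevity. Everything here is illustrative: `MessageStore`, `send_private`, and `on_login` are hypothetical names, dicts and lists stand in for the message store and for connections, and as a simplification a message is treated as no longer unread once it has been delivered.

```python
from collections import defaultdict
from typing import Dict, List


class MessageStore:
    """Stores every message and tracks unread message IDs per recipient."""

    def __init__(self) -> None:
        self.messages: Dict[int, dict] = {}
        self.unread: Dict[int, List[int]] = defaultdict(list)
        self._next_id = 1

    def save(self, sender: int, recipient: int, body: str) -> int:
        message_id = self._next_id
        self._next_id += 1
        self.messages[message_id] = {"id": message_id, "from": sender,
                                     "to": recipient, "body": body}
        self.unread[recipient].append(message_id)
        return message_id

    def mark_delivered(self, recipient: int, message_id: int) -> None:
        if message_id in self.unread.get(recipient, []):
            self.unread[recipient].remove(message_id)

    def pull_unread(self, recipient: int) -> List[dict]:
        # Return and clear the recipient's unread messages, oldest first.
        return [self.messages[i] for i in self.unread.pop(recipient, [])]


def send_private(store: MessageStore, online_inboxes: Dict[int, List[dict]],
                 sender: int, recipient: int, body: str) -> bool:
    """Always store the message; push it only if the recipient is online."""
    message_id = store.save(sender, recipient, body)
    inbox = online_inboxes.get(recipient)
    if inbox is None:
        return False  # recipient offline: the message stays in the unread list
    inbox.append(store.messages[message_id])
    store.mark_delivered(recipient, message_id)
    return True


def on_login(store: MessageStore, online_inboxes: Dict[int, List[dict]],
             user_id: int) -> List[dict]:
    """Mark the user online, then deliver everything that arrived while offline."""
    inbox = online_inboxes.setdefault(user_id, [])
    inbox.extend(store.pull_unread(user_id))
    return inbox
```

Storing before pushing means a failed push never loses a message; the worst case is that the message is picked up at the next login.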

A group chat is a chat in which multiple users of the distributed IM system talk in the same group.

When sending a message, we can find all online users in the group by the group ID and send the message to them immediately.

Users who are not online are handled in the same way as offline users in a one-to-one chat, as shown in the following figure.

As you can see, the interaction flow of a group chat is as follows:

1) The user calls the back-end platform's interface to send a message to the group.

2) The back-end platform caches the message and writes it asynchronously to the message store.

3) Because the message goes to a group with multiple users, the back-end platform gets from Redis the list of IM service IDs that the group's users are connected to.

4) The users are grouped by service ID, so that users under the same service ID fall into the same logical group, which makes the subsequent push easier; the list of users who are not online is also recorded.

5) The message is sent, in a loop, to the RocketMQ topic corresponding to each service ID.

6) A broadcast is used to handle the unread message IDs of the users who are not online.

7) Each IM service instance listens to the topic corresponding to its own service ID and receives the messages pushed to it at any time.

8) If the user has disconnected or is not online when the IM service instance receives the message, the push fails, or no connection to the user is found, and the message is not pushed to the user.

9) When the user logs in to the distributed IM system, the client pulls historical (offline) messages from the back-end platform, and they are pushed to the user through the flow for online users.
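The grouping and fan-out steps of this flow can be sketched as below, in Python for brevity. The function names, the topic name format `im_message_<id>`, and the message shape are hypothetical. For simplicity the route cache is keyed by user ID only, whereas the article keys it by user ID and terminal.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def group_by_server(member_ids: Iterable[int],
                    route_cache: Dict[int, int]) -> Tuple[Dict[int, List[int]], List[int]]:
    """Split group members into per-instance lists plus a list of offline users."""
    by_server: Dict[int, List[int]] = defaultdict(list)
    offline: List[int] = []
    for user_id in member_ids:
        server_id = route_cache.get(user_id)
        if server_id is None:
            offline.append(user_id)  # no route in the cache: the user is not online
        else:
            by_server[server_id].append(user_id)
    return dict(by_server), offline


def fan_out(member_ids: Iterable[int], route_cache: Dict[int, int], body: str,
            publish: Callable[[str, dict], None]) -> List[int]:
    """Publish one message per IM service instance and return the offline users."""
    by_server, offline = group_by_server(member_ids, route_cache)
    for server_id, user_ids in by_server.items():
        # One publish per instance, carrying all recipients on that instance.
        publish(f"im_message_{server_id}", {"to_users": user_ids, "body": body})
    return offline
```

Grouping first means the number of messages sent to the queue grows with the number of IM service instances involved, not with the number of group members. The returned offline list is what the unread-message handling would then work from.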

Having read this far, do you now understand how to design a highly scalable distributed IM system?

