Apache Kafka and RabbitMQ are two widely used messaging brokers that allow the decoupling of the exchange of messages between applications. What are their most important characteristics, and what makes them different from each other? Let’s get to the concepts.
RabbitMQ is an open-source message-broker application for communication and message exchange between parties. Because it was developed in Erlang, it is very light and efficient. The Erlang language was developed by Ericsson with a focus on distributed systems.
It is considered a more traditional messaging broker. It is based on the publisher-subscriber pattern, and it can handle communication synchronously or asynchronously, depending on the configuration. It also supports delivery guarantees through acknowledgments and preserves the order of messages within a queue.
It supports the AMQP, STOMP, MQTT, HTTP, and WebSocket protocols. The three most commonly used models for the exchange of messages are topic, fanout, and direct:
Messages are routed to the queues whose binding pattern matches the routing key, that is, by topic or theme [topic]
All queues bound to the exchange receive the message [fanout]
Messages are routed to the queues whose binding key exactly matches the routing key [direct]
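The routing behaviour of these exchange types can be sketched in a few lines of Python. This is only an in-memory illustration, not RabbitMQ client code: the function names `topic_matches` and `route`, and the queue and key names in the usage lines, are made up for this example. It assumes the AMQP convention that topic patterns are dot-separated words, where `*` matches exactly one word and `#` matches zero or more words.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style topic pattern matches a routing key."""

    def match(p: list, k: list) -> bool:
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' either absorbs nothing, or absorbs one word and stays active.
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type: str, bindings: dict, routing_key: str) -> list:
    """Return the sorted names of the queues that would receive a message.

    `bindings` maps a queue name to its binding key (hypothetical names).
    """
    if exchange_type == "fanout":
        # Fanout ignores the routing key: every bound queue gets a copy.
        return sorted(bindings)
    if exchange_type == "direct":
        # Direct requires an exact match between binding key and routing key.
        return sorted(q for q, key in bindings.items() if key == routing_key)
    if exchange_type == "topic":
        # Topic matches the routing key against each binding pattern.
        return sorted(q for q, key in bindings.items()
                      if topic_matches(key, routing_key))
    raise ValueError(f"unsupported exchange type: {exchange_type}")


bindings = {"audit": "#", "eu_orders": "orders.eu.*", "payments": "payments.created"}
print(route("topic", bindings, "orders.eu.created"))
print(route("direct", bindings, "payments.created"))
print(route("fanout", bindings, "ignored"))
```

With the same bindings, the three exchange types select different sets of queues, which is the whole point of choosing one type over another.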
The following are the components of RabbitMQ:
Producers are applications that create and send messages to RabbitMQ. They can be any application that can connect to RabbitMQ and publish messages.
Consumers are applications that receive and process messages from RabbitMQ. They can be any application that can connect to RabbitMQ and subscribe to messages.
Exchanges are responsible for receiving messages from producers and routing them to the appropriate queues. There are several types of exchanges, including direct, fanout, topic, and headers exchanges, each with its own routing rules.
Queues are where messages are stored until they are consumed. They are declared by applications; RabbitMQ can generate a name for a queue when the application does not supply one.
Bindings define the relationship between exchanges and queues. They specify the routing rules for messages, which are used by exchanges to route messages to the appropriate queues.
Architecture of RabbitMQ
RabbitMQ primarily uses a push model for message delivery. In this model, the broker actively delivers messages to the consumers subscribed to a queue. Messages are published to exchanges, which are responsible for routing them to the appropriate queues based on routing keys.
The architecture of RabbitMQ is based on a client-server architecture and consists of several components that work together to provide a reliable and scalable messaging platform. The AMQP concept provides for the components Exchanges, Queues, Bindings, as well as Publishers and Subscribers. Publishers publish messages to exchanges.
Exchanges take these messages and distribute them to 0 to n queues based on certain rules (bindings). The messages stored in the queues can then be retrieved by consumers. In a simplified form, message management is done in RabbitMQ as follows:
Publishers send messages to an exchange;
The exchange routes messages to queues and to other exchanges;
When a message is received, RabbitMQ can send an acknowledgment (a publisher confirm) to the sender;
Consumers maintain persistent TCP connections to RabbitMQ and declare which queues they consume from;
RabbitMQ delivers messages to consumers;
Consumers send success or error acknowledgments for the messages they receive;
Upon successful acknowledgment, the message is removed from the queue.
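The delivery and acknowledgment steps above can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not the RabbitMQ API: the class `AckQueue` and its method names are hypothetical, there is a single queue and no network, and a rejected message is simply put back at the head of the queue.

```python
from collections import deque


class AckQueue:
    """Toy queue that keeps a message until the consumer acknowledges it."""

    def __init__(self):
        self._ready = deque()   # messages waiting to be delivered
        self._unacked = {}      # delivery tag -> message awaiting an ack
        self._next_tag = 1

    def publish(self, body):
        self._ready.append(body)

    def deliver(self):
        """Hand the next message to a consumer as (tag, body), or None if empty."""
        if not self._ready:
            return None
        body = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._unacked[tag] = body
        return tag, body

    def ack(self, tag):
        """Success: the message is removed for good."""
        self._unacked.pop(tag)

    def nack(self, tag, requeue=True):
        """Failure: optionally return the message to the head of the queue."""
        body = self._unacked.pop(tag)
        if requeue:
            self._ready.appendleft(body)

    def depth(self):
        """Messages still owned by the queue (ready plus unacknowledged)."""
        return len(self._ready) + len(self._unacked)


queue = AckQueue()
queue.publish("order-1")
queue.publish("order-2")
tag, body = queue.deliver()
queue.nack(tag)              # processing failed, message goes back
tag, body = queue.deliver()  # the same message is delivered again
queue.ack(tag)               # processing succeeded, message is removed
```

The design point is that delivery alone does not remove anything: only the consumer's positive acknowledgment does.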
Apache Kafka is a distributed open-source messaging solution originally developed at LinkedIn and written in Scala and Java. It is capable of processing messages and storing them with a publisher-subscriber model with high scalability and performance.
To store the events or messages it receives, Kafka distributes topics among the nodes using partitions. It combines both publisher-subscriber and message queue patterns, and it guarantees the order of messages within each partition.
Kafka specializes in high data throughput and low latency to handle real-time data streams. This is achieved by avoiding too much logic on the server (broker) side, as well as some special implementation details.
For example, Kafka does not keep messages in an in-memory store of its own; it appends data immediately to a log on the server’s file system and relies on the operating system’s page cache. Since all data is written sequentially, the read-write performance it achieves is comparable to that of RAM.
These are the main concepts of Kafka that make it scalable, performant, and fault-tolerant:
A topic is a way of labeling or categorizing messages. Imagine a closet with 10 drawers: each drawer can be a topic, and the closet is the Apache Kafka platform, so a topic both categorizes and groups messages. A closer analogy is a table in a relational database.
The producer is the application that connects to the messaging platform and sends one or more messages to a specific topic.
The consumer is the application that connects to the messaging platform and consumes one or more messages from a specific topic.
A broker is a Kafka server. It manages the topics and defines how messages, logs, and so on are stored.
The cluster is a set of brokers that work together to provide better scalability and fault tolerance.
Each topic stores its records in a log format, that is, in a structured and sequential way; the log file, therefore, is the file that contains the records of a topic.
Partitions divide the messages within a topic. This partitioning gives Apache Kafka its elasticity, fault tolerance, and scalability, because each topic can have multiple partitions in different locations.
Architecture of Apache Kafka
Kafka is based on a pull model for message delivery. Using this model, consumers actively request (poll) messages from the brokers instead of having messages pushed to them. Messages are published to topics, which are partitioned and distributed across different brokers in the cluster.
Consumers can then subscribe to one or more topics and receive messages as they are produced on those topics.
In Kafka, each topic is divided into one or more partitions. It is in the partition that the events end up.
If there is more than one broker in the cluster, then the partitions will be distributed evenly across all brokers (as far as possible), which allows the write and read load on one topic to be spread across several brokers at once. Because it runs as a cluster, Kafka has traditionally relied on ZooKeeper for coordination; newer versions can run without it in KRaft mode.
Kafka receives, stores, and distributes records. A record is data generated by some system node, which can be an event or information. It is sent to the cluster, and the cluster stores it in a topic partition.
Each record has a sequence offset, and the consumer can control the offset it is consuming. Thus, if there is a need to reprocess the topic, it can be done based on the offset.
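A minimal sketch of this offset mechanism, assuming a single partition held in memory: the names `PartitionLog` and `Consumer` are invented for the illustration and are not Kafka client classes. The log is append-only, reading never removes anything, and the consumer owns its position.

```python
class PartitionLog:
    """Append-only list of records; the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        """Store a record and return the offset it was written at."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records starting at offset, without deleting them."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Pulls records from a log and tracks its own offset."""

    def __init__(self, log: PartitionLog):
        self.log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only past what was actually read
        return batch

    def seek(self, offset: int) -> None:
        """Move the position, for example back to an earlier offset to reprocess."""
        self.offset = offset


log = PartitionLog()
for event in ("created", "paid", "shipped"):
    log.append(event)

consumer = Consumer(log)
first_pass = consumer.poll()
consumer.seek(1)             # rewind to reprocess from offset 1
second_pass = consumer.poll()
```

Because the position lives with the consumer, several consumers can read the same log at different offsets without interfering with each other.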
Logic, such as the management of the last read message ID of a consumer or the decision as to which partition newly arriving data is written to, is completely shifted to the client (producer or consumer).
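As an illustration of the client-side partition decision, the sketch below hashes a record key and takes the result modulo the number of partitions. The functions `choose_partition` and `partition_events` are hypothetical, and CRC32 is used only as a convenient stand-in from the standard library; Kafka's own clients use a different hash function.

```python
import zlib
from collections import defaultdict


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition; the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_events(events, num_partitions: int) -> dict:
    """Group (key, value) events by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


events = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]
for partition, records in sorted(partition_events(events, 4).items()):
    print(partition, records)
```

Since all events with the same key go to one partition and each partition is written sequentially, events for a given key keep their relative order.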
In addition to the concepts of producer and consumer, there are also the concepts of topic, partition, and replication.
A topic describes a category of messages. Kafka achieves fault tolerance by replicating the data in a topic and scaling by partitioning the topic across multiple servers.
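How partitioning and replication combine can be sketched as a simple placement function. This is a deliberately simplified round-robin scheme with made-up broker names, and `place_replicas` is a hypothetical helper; Kafka's real assignment logic takes more factors into account.

```python
def place_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """Return {partition: [brokers]}, where the first broker acts as the leader.

    Partitions are spread round-robin over the brokers, and each extra
    replica goes to the next broker in the list.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        placement[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return placement


layout = place_replicas(3, ["broker-1", "broker-2", "broker-3"], 2)
for partition, replicas in layout.items():
    print(partition, replicas)
```

Spreading the leaders over the brokers is what scales reads and writes, while the additional copies on other brokers are what allow the topic to survive the loss of a broker.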
RabbitMQ vs. Kafka
The main differences between Apache Kafka and RabbitMQ are due to fundamentally different message delivery models implemented in these systems.
In particular, Apache Kafka operates on the pull principle: consumers themselves fetch the messages they need from the topic.
RabbitMQ, on the other hand, implements the push model by sending the necessary messages to the recipients. As such, Kafka differs from RabbitMQ in the following ways:
#1. Architecture
One of the biggest differences between RabbitMQ and Kafka is the architecture. RabbitMQ uses a traditional broker-based message queue architecture, while Kafka uses a distributed streaming platform architecture.
Also, RabbitMQ uses a push-based message delivery model, while Kafka uses a pull-based model.
#2. Saving Messages
RabbitMQ puts the message in a FIFO (first in, first out) queue and tracks the status of this message in the queue, while Kafka appends the message to the log (writes it to disk), leaving the receiver to take care of obtaining the necessary information from the topic.
RabbitMQ deletes the message after it has been delivered to the recipient and acknowledged, while Kafka keeps the message until the log is cleaned up according to its retention policy.
Thus, Kafka saves the current and all previous system states and can be used as a reliable source of historical data, unlike RabbitMQ.
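The difference in retention can be made concrete with a small sketch. It assumes a purely time-based policy applied record by record, which is a simplification of how Kafka actually cleans up its log; `Record` and `apply_retention` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: float  # seconds, on any monotonic scale
    value: str


def apply_retention(log: list, now: float, retention_seconds: float) -> list:
    """Keep only records inside the retention window.

    Whether a record has already been consumed plays no role here:
    age is the only criterion.
    """
    return [r for r in log if now - r.timestamp <= retention_seconds]


log = [Record(0, "old"), Record(90, "recent"), Record(100, "new")]
kept = apply_retention(log, now=100, retention_seconds=60)
print([r.value for r in kept])
```

In a queue-style broker, by contrast, a message disappears once it has been acknowledged, regardless of its age.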
#3. Load Balancing
Thanks to the push model of message delivery, RabbitMQ reduces latency. However, recipients can be overwhelmed if messages arrive at the queue faster than they can be processed.
Since each RabbitMQ receiver processes messages at a different rate, the distribution of work can become uneven, which can cause delays and loss of message order during processing.
To prevent this, each RabbitMQ receiver configures a prefetch limit, a cap on the number of unacknowledged messages it may hold at once. In Kafka, load balancing is performed automatically by redistributing consumers across the partitions of the topic.
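The effect of a prefetch limit can be simulated without a broker. In this sketch the function `push_round` and the consumer names are hypothetical: the broker keeps pushing messages round-robin, but skips any consumer that already holds `prefetch` unacknowledged messages.

```python
from collections import deque


def push_round(queue: deque, in_flight: dict, prefetch: int) -> None:
    """Push queued messages to consumers until each holds `prefetch` unacked ones.

    `in_flight` maps a consumer name to the list of messages it has
    received but not yet acknowledged.
    """
    progress = True
    while queue and progress:
        progress = False
        for held in in_flight.values():
            if queue and len(held) < prefetch:
                held.append(queue.popleft())
                progress = True


queue = deque(range(10))
in_flight = {"fast": [], "slow": []}

push_round(queue, in_flight, prefetch=2)   # both consumers fill up to the limit
in_flight["fast"].clear()                  # the fast consumer acks its messages
push_round(queue, in_flight, prefetch=2)   # only the fast consumer gets more
print(in_flight, len(queue))
```

The slow consumer never accumulates more than the limit, so the remaining messages stay in the queue and flow to whichever consumer frees capacity first.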
#4. Routing
RabbitMQ includes four exchange types (direct, fanout, topic, and headers) for routing messages to queues, allowing for a powerful and flexible set of messaging patterns. Kafka implements only one path: producers write messages to a topic partition on disk, with no broker-side routing.
#5. Message ordering
RabbitMQ allows you to maintain relative order in arbitrary sets (groups) of events, and Apache Kafka provides an easy way to maintain ordering with scalability by writing messages sequentially to a replicated log (topic).
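Feature | RabbitMQ | Kafka
Architecture | Traditional broker-based message queue architecture | Distributed streaming platform architecture
Saving messages | Does not retain messages after they are consumed | Saves messages on a disk attached to the broker
Load balancing | Configures a prefetch limit | Redistributes consumers across topic partitions automatically
Routing | Includes 4 ways to route | Has only 1 way to route messages
Message ordering | Allows maintaining order in groups | Maintains order by writing to a topic
External processes | Does not require any | Requires a running ZooKeeper instance (unless running in KRaft mode)
Plugins | - | Has limited plugin support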
RabbitMQ and Kafka are both widely used messaging systems, each with its own strengths and use cases. RabbitMQ is a flexible, reliable, and scalable messaging system that excels at message queuing, making it an ideal choice for applications that require reliable and flexible message delivery.
On the other hand, Kafka is a distributed streaming platform that is designed for high-throughput, real-time processing of large volumes of data, making it a great choice for applications that require real-time processing and analysis of data.
Main Use Cases for RabbitMQ:
RabbitMQ is used in e-commerce applications to manage the data flow between different systems, such as inventory management, order processing, and payment processing. It can handle high volumes of messages and ensure that they are delivered reliably and in the correct order.
In the healthcare industry, RabbitMQ is used to exchange data between different systems, such as electronic health records (EHRs), medical devices, and clinical decision support systems. It can help improve patient care and reduce errors by ensuring the right information is available at the right time.
In financial services, RabbitMQ enables real-time messaging between systems, such as trading platforms, risk management systems, and payment gateways. It can help ensure that transactions are processed quickly and securely.
RabbitMQ is used in IoT systems to manage data flow between different devices and sensors. It can help ensure data is delivered securely and efficiently, even in environments with limited bandwidth and intermittent connectivity.
Kafka is a distributed streaming platform designed to handle large volumes of data in real-time.
Main Use Cases for Kafka
Kafka is used in real-time analytics applications to process and analyze data as it is generated, enabling businesses to make decisions based on up-to-date information. It can handle large volumes of data and scale to meet the needs of even the most demanding applications.
Kafka can aggregate logs from different systems and applications, enabling businesses to monitor and troubleshoot real-time issues. It can also be used to store logs for long-term analysis and reporting.
Kafka is used in machine learning applications to stream data to models in real-time, enabling businesses to make predictions and take action based on up-to-date information. It can help improve the accuracy and effectiveness of machine learning models.
My opinion on both RabbitMQ and Kafka
The downside of RabbitMQ’s wide and varied capabilities for flexible management of message queues is increased resource consumption and, accordingly, performance degradation under increased loads. Since heavy load is the normal mode of operation for complex systems, in most such cases Apache Kafka is the better tool for managing messages.
For example, when collecting and aggregating many events from dozens of systems and services, taking into account their geographic redundancy, client metrics, log files, and analytics, with the prospect of adding more information sources, I would prefer Kafka. However, if you just need fast messaging, RabbitMQ will do the job just fine!