Introducing Apache Kafka

What is Apache Kafka?

The Apache Kafka project page defines Apache Kafka as a publish-subscribe messaging rethought as a distributed commit log. Kafka is a high-throughput distributed messaging system.

The Kafka project page is probably the best source of information for anyone interested in Kafka and its inner workings. However, the above definitions would be more helpful to readers who already understand what a commit log is, how messaging systems work and has a basic idea of distributed system.

In this post, my attempt would be to explain these concepts in a simple way and make it understandable to anyone interested the basic architecture and workings of Apache Kafka. So, lets get started.

What is a message?

A message in its literal sense would be some information that needs to go from a source to a destination. When this information is sent from the source and is received by the destination we state that the message has been delivered. For e.g. let us consider you and your friend use a long hollow pipe to talk to each other. When your friend speaks into it from one end, you would keep your ears to the other end to hear what he just said. In this way you will receive the messages he would have to convey.

Image may be NSFW.
Clik here to view.

However, one thing to note here is that you and your friend should synchronize the activity of speaking and hearing, i.e. if your friend is speaking into the pipe, you at the other end should be ready to listen. If you are not ready, the message is lost.

What is a message in the context of a messaging system?

In its simplest form a message is nothing but data. A system that can handle transmission of data from a source to a destination is called a messaging system. In the computing world, the source could be machine that is generating data (for e.g. a Web Server generates large number of logs) or a human user. In most cases you will see the volumes generated by systems and machines to be way larger than the ones generated by human beings.

Image may be NSFW.
Clik here to view.

In a messaging system you would want to ensure that the message is delivered to the target. Even if the target is not ready to receive it, there has to be a provision to hold it. To achieve this, the messaging system should provide a way of retaining the message in the medium of communication until the target is ready to receive it.

The messaging system, with the ability to send, hold and deliver messages seems to be a good system. However, in reality, the scenarios could be such that the messages may have to be delivered to more than one destination. This introduces the need to hold the messages until it is consumed by all the destinations. This brings forward one more characteristic of a messaging system, i.e. the source is only going to deliver the messages to the communication channel/medium (think of the pipe) and the destination will be responsible for reading the message from the medium. So now, the messages can wait on the channel and the different destinations will read the messages at their own pace.

As there could be many destinations, there could also be several sources that write to the medium.

Image may be NSFW.
Clik here to view.

With this understanding, lets give some technical labels to the different components in a messaging system.

Publisher

The source or the different sources create messages. Lets call them publishers.

Subscriber

The destinations/targets are the one who read the message, in other words, they subscribe to the messages generated by the publishers. Lets call them subscribers.

There is one more component that we still need to mention. So far, we have been referring to it as the channel/medium to which publishers write data and from where subscribers read data. That is not an entirely correct way of referring to it. A more appropriate way to call it would be to refer to it as – the log. Why log?

Logs, in a general sense stand for a record of “what happened”. In our context the message can be classified as an event and the log is what stores that event. Publishers publish events to the log and the subscribers read these events from the log to know “what happened”. Following is a simple illustration of how a log appears.

Image may be NSFW.
Clik here to view.

As you can see above, the log is a sequence of numbered messages that are append only and ordered by time. The log, in the database world, is often referred to as a commit log. To understand how logs work and understand it in-depth is not the intention of this post. However, if you are interested in knowing more about logs, you should read the following blog post by Jay Kreps:

http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

This is one of the best technical posts I have read. It is like a mini-course on distributed systems and is extremely well written. You should also check out his video here – http://youtu.be/aJuo_bLSW6s

In this post we have discussed the following:

Messaging system
Publishers
Subscribers
Commit Log

In the next post, we will get a bit deeper into understanding the above terms with respect to Apache Kafka and how they all work together. We will also see how Apache Kafka maintains a distributed commit log across a cluster making it a reliable distributed messaging system.

Introducing Apache Kafka – Part One

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112