I hacked infinite retention into my open source Kafka

Well, sort of. But bear with me.

Background

A couple of days ago, Confluent announced that a ZooKeeper-free Kafka 2.8 RC0 was available for testing. It is a fantastic effort and a great achievement by all the contributors who made it happen.

In typical Hacker News fashion, a post about Kafka always triggers the inevitable “Pulsar vs Kafka” discussion. These always remind me of one of my main gripes with Kafka: no infinite retention. I wrote about it close to five years ago.

So, apparently there is a KIP for tiered storage in open source Kafka, which would enable this kind of behavior and take it even further. Tiered storage already exists as a paid feature on the Confluent platform.

However, as a user of open source Kafka, I can’t use it.

After 5 years of waiting

I kinda hacked it in myself. Here’s how I did it.

I forked Kafka from https://github.com/apache/kafka and checked out the 2.8 branch, the one with no ZooKeeper. To be exact, commit 08849bc3909d4fabda965c8ca7f78b0feb5473d2.

I then applied this diff:

Then I built Kafka from sources:

./gradlewAll releaseTarGz

Now, I can start my new Kafka like this:

Now, every time Kafka is about to delete a log segment, it uploads it to S3 first. Only the log files are stored; the index can be rebuilt from the log, so there is no need to keep it. Whenever I need data from an older segment, I can download the segment from S3, rebuild the index and read out all the data from the segment.
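To make the upload-before-delete idea concrete, here is a minimal Python sketch of the ordering. It is not the diff applied to the broker, which lives inside Kafka's own code. The names `archive_then_delete`, `object_key` and the `upload` callback are hypothetical, and the key layout (partition directory name plus segment file name) is an assumption. The sketch also assumes the segment may already carry the `.deleted` suffix that Kafka gives to files scheduled for removal.

```python
from pathlib import Path
from typing import Callable

# Hypothetical uploader signature: (local file, object key) -> None.
Uploader = Callable[[Path, str], None]

DELETED_SUFFIX = ".deleted"  # suffix Kafka adds to files scheduled for deletion


def object_key(segment: Path) -> str:
    """Build a key such as 'events-0/00000000000000000000.log'.

    Assumes the parent directory is the usual '<topic>-<partition>' folder;
    directories renamed for topic deletion are not normalised here.
    """
    name = segment.name
    if name.endswith(DELETED_SUFFIX):
        name = name[: -len(DELETED_SUFFIX)]
    return f"{segment.parent.name}/{name}"


def archive_then_delete(segment: Path, upload: Uploader) -> None:
    """Upload first; remove the local file only if the upload did not raise."""
    upload(segment, object_key(segment))
    segment.unlink()
```

With boto3, the `upload` callback could be a thin wrapper around the S3 client's `upload_file(filename, bucket, key)`. The point of the ordering is that a failed upload raises before `unlink()` runs, so the segment stays on disk.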

I can process dozens of segments in parallel, regardless of the total partition size.

Quick and dirty test method

Create a topic with a rather small segment size:
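The exact command used here is not shown. As an illustration only, the helper below assembles a `kafka-topics.sh` invocation that sets the topic-level `segment.bytes` config, which controls when a segment rolls. The topic name, the 1 MiB size, the broker address and the script path are all assumptions, and `create_topic_command` is a hypothetical helper.

```python
from typing import List


def create_topic_command(
    topic: str,
    segment_bytes: int,
    bootstrap: str = "localhost:9092",     # assumed broker address
    script: str = "bin/kafka-topics.sh",   # assumed path inside the Kafka distribution
) -> List[str]:
    """Return the argv for creating a single-partition topic with small segments."""
    return [
        script,
        "--bootstrap-server", bootstrap,
        "--create",
        "--topic", topic,
        "--partitions", "1",
        "--replication-factor", "1",
        "--config", f"segment.bytes={segment_bytes}",
    ]
```

Passing the result of `create_topic_command("retention-test", 1024 * 1024)` to `subprocess.run(..., check=True)` would create the topic against a running broker. A small `segment.bytes` makes segments roll after roughly that many bytes, so the upload path is exercised quickly.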

Write some data to the topic with a tool of choice. I can see new segments rolling in:

Finally, delete the topic from Kafka:

After about 60 seconds, the segments are uploaded to S3 and deleted from disk:

The upload would take less time if I were running Kafka on EC2. This method works for deleted topics, regular segment rolls and compacted topics: basically, every time a segment is about to be removed from disk.

Managing director / writing software for 20+ years
