Creating a scalable kafka cluster with docker

Posted by Jose Estudillo on January 15, 2018

The automatic broker ID feature of Kafka, combined with Docker Compose's scaling capabilities, makes it possible to create a cluster of N nodes with very little effort.

The Kafka docker image

We will need an image that can start a broker and use ZooKeeper to form the cluster.

  • Dockerfile for the kafka brokers:
FROM centos:7
MAINTAINER jose@estudillo.me

ENV K_SRC=http://apache.mirror.anlx.net/kafka/1.0.1/kafka_2.11-1.0.1.tgz

RUN yum install -y wget java-1.8.0-openjdk \
    && cd /tmp && wget -q $K_SRC \
    && export K_TAR=/tmp/$(ls kafka* | head -1) \
    && mkdir -p /opt/apache/kafka/ && tar -zxf $K_TAR -C /opt/apache/kafka/ \
    && cd /opt/apache/kafka && ln -s $(ls) current \
    && rm -rf $K_TAR

ENV KAFKA_HOME /opt/apache/kafka/current
ENV PATH $PATH:$KAFKA_HOME/bin

ADD resources /home/kafka

RUN groupadd -r kafka \
    && useradd -r -g kafka kafka \
    && mkdir -p /home/kafka \
    && chown -R kafka:kafka /home/kafka \
    && chmod -R +x /home/kafka/bin \
    && mkdir -p /var/log/kafka \
    && chown -R kafka:kafka /var/log/kafka

USER kafka

CMD /home/kafka/bin/run.sh

Where /home/kafka/bin/run.sh generates the broker configuration dynamically and then starts the broker:

  • resources/bin/run.sh:
#!/bin/bash
cat <<EOF > /home/kafka/broker.properties
broker.id.generation.enable = true
port=9092
log.dir=/var/log/kafka
zookeeper.connect=${KAFKA_ZOOKEEPER_HOST:=zookeeper:2181}
host.name=$(hostname)
advertised.host.name=$(hostname)
advertised.port=9092
EOF

kafka-server-start.sh /home/kafka/broker.properties
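The `${KAFKA_ZOOKEEPER_HOST:=zookeeper:2181}` expansion in the template above uses the variable's value when it is set and falls back to (and assigns) the default otherwise. A quick sketch of the behavior:

```shell
# ":=" assigns the default when the variable is unset or empty
unset KAFKA_ZOOKEEPER_HOST
echo "zookeeper.connect=${KAFKA_ZOOKEEPER_HOST:=zookeeper:2181}"
# -> zookeeper.connect=zookeeper:2181

# When the variable is already set, its value wins
KAFKA_ZOOKEEPER_HOST=myzk:2181
echo "zookeeper.connect=${KAFKA_ZOOKEEPER_HOST:=zookeeper:2181}"
# -> zookeeper.connect=myzk:2181
```

This is what lets docker-compose inject the ZooKeeper address through the environment while keeping a sensible default for standalone runs.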

Kafka's automatic broker ID generation (broker.id.generation.enable = true) makes scaling the cluster easier: in older versions the ID had to be set manually for each broker, whereas with this option ZooKeeper assigns a unique ID to every broker that joins the cluster.
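Once the cluster is running, the generated IDs can be inspected directly in ZooKeeper with the zookeeper-shell.sh script that ships with Kafka (this sketch assumes a reachable ZooKeeper and brokers registered under the default chroot):

```shell
# List the ephemeral znodes holding the auto-generated broker IDs
zookeeper-shell.sh zookeeper:2181 ls /brokers/ids
# e.g. [1001, 1002, 1003] -- auto-generated IDs start above reserved.broker.max.id (default 1000)
```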

Notice that this file (and any other resource) must be located in the resources directory, at the same level as the Dockerfile, since the image definition maps that directory to /home/kafka (ADD resources /home/kafka).

Before running docker-compose we will need the image in the local Docker repository; it can be generated using:

docker build --tag="joseestudillo/kafka:0.0.1" .
docker tag joseestudillo/kafka:0.0.1 joseestudillo/kafka:latest
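Before wiring everything together with compose, the image can be smoke-tested by overriding its default CMD, for instance to confirm that the Kafka install landed where expected:

```shell
# Override the default CMD (run.sh) to list the Kafka scripts inside the image
docker run --rm joseestudillo/kafka:latest ls /opt/apache/kafka/current/bin
```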

Scaling with docker-compose

In the Docker Compose definition we only need two entries: one for the ZooKeeper server, which is kept as a single node for simplicity, and one for the Kafka brokers, which use KAFKA_ZOOKEEPER_HOST to join the cluster.

version: '2.1'
services:
  zookeeper:
    hostname: zookeeper
    image: zookeeper:latest
    environment:
      ZOO_MY_ID: 1
    restart: always
    healthcheck:
        test: ["CMD", "zkServer.sh", "status"]

  kafka_broker:
    image: joseestudillo/kafka:latest
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181/kafka-docker-compose
      KAFKA_LOG_DIR: /var/log/kafka
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      zookeeper:
        condition: service_healthy

With this file defined, we just need to run docker-compose up --scale kafka_broker=5 and it will start a Kafka cluster with 5 brokers.
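To verify that the scaled brokers actually formed one cluster, a replicated topic can be created from inside any broker container and then described; the ZooKeeper chroot must match the KAFKA_ZOOKEEPER_HOST value from the compose file. The container name below is an assumption; check docker ps for the real one, as it depends on the compose project name:

```shell
# Open a shell in one of the scaled broker containers (name may differ)
docker exec -it <project>_kafka_broker_1 bash

# Create a replicated topic; the replication factor exercises several brokers
kafka-topics.sh --create --zookeeper zookeeper:2181/kafka-docker-compose \
    --replication-factor 3 --partitions 6 --topic test

# Describe it: Leader and Replicas should list several distinct broker IDs
kafka-topics.sh --describe --zookeeper zookeeper:2181/kafka-docker-compose --topic test
```

If the Replicas column shows IDs spread across different brokers, the auto-generated IDs registered correctly and the cluster is working as one unit.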