Creating a Kafka cluster with Docker

Posted by Jose Estudillo on August 27, 2017

A local Kafka cluster helps when developing producers and consumers, since it lets you test different scenarios, especially when it comes to high availability (HA). In this article I’ll show how to create a Kafka cluster using Docker and docker-compose.

Defining the Kafka image: Dockerfile

For the Docker image we will use Kafka version 0.11.0.0, as specified in the environment variable KAFKA_BIN in the Dockerfile below.

  • Dockerfile:
FROM centos:7
MAINTAINER jose@estudillo.me

# Kafka 0.11.0.0 binary distribution (Scala 2.11 build)
ENV KAFKA_BIN=http://www-eu.apache.org/dist/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz

# Install Java, download Kafka, unpack it under /opt/apache/kafka
# and link the unpacked version directory to 'current'
RUN yum install -y wget java-1.8.0-openjdk \
    && cd /tmp && wget -q $KAFKA_BIN \
    && export K_TAR=/tmp/$(ls kafka* | head -1) \
    && mkdir -p /opt/apache/kafka/ && tar -zxf $K_TAR -C /opt/apache/kafka/ \
    && cd /opt/apache/kafka && ln -s $(ls) current \
    && rm -rf $K_TAR

ENV KAFKA_HOME /opt/apache/kafka/current
ENV PATH $PATH:$KAFKA_HOME/bin

# Copy the local 'resources' directory (startup scripts) into the image
ADD resources /home/kafka

# Create the kafka user and the directories it needs to own
RUN groupadd -r kafka \
    && useradd -r -g kafka kafka \
    && mkdir -p /home/kafka \
    && chown -R kafka:kafka /home/kafka \
    && chmod -R +x /home/kafka/scripts \
    && mkdir -p /var/log/kafka \
    && chown -R kafka:kafka /var/log/kafka \
    && mkdir -p /etc/kafka \
    && chown -R kafka:kafka /etc/kafka

USER kafka

CMD /home/kafka/scripts/run.sh

The image downloads the Kafka binaries, places them into /opt/apache/kafka, and links the current version to /opt/apache/kafka/current. It then adds the Kafka bin directory to the PATH and creates the required directories, which must be owned by the kafka user.

In order to configure and start the broker properly, CMD calls a bash script that is added to the image with ADD resources /home/kafka, which copies all the content of the local resources directory into /home/kafka inside the image.
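
For reference, this is the build context layout implied by the Dockerfile and the script below:

.
├── Dockerfile
└── resources
    └── scripts
        └── run.sh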

  • resources/scripts/run.sh
#!/bin/bash

# Generate the broker configuration: broker.id is derived from the
# digits in the container's hostname (e.g. kafka1 -> 1), and the
# Zookeeper connection string defaults to zookeeper:2181 unless
# KAFKA_ZOOKEEPER_HOST is set.
cat <<EOF > /etc/kafka/broker.properties
broker.id=$(hostname | sed "s/[^0-9]*//g")
port=9092
log.dir=/var/log/kafka
zookeeper.connect=${KAFKA_ZOOKEEPER_HOST:=zookeeper:2181}
host.name=$(hostname)
advertised.host.name=$(hostname)
advertised.port=9092
EOF

# Start the broker in the foreground so the container stays alive
kafka-server-start.sh /etc/kafka/broker.properties
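
For example, on a container whose hostname is kafka1 and with KAFKA_ZOOKEEPER_HOST unset, the script generates the following /etc/kafka/broker.properties:

broker.id=1
port=9092
log.dir=/var/log/kafka
zookeeper.connect=zookeeper:2181
host.name=kafka1
advertised.host.name=kafka1
advertised.port=9092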

For this to work we require:

  • Every container created from this image must include a number in its hostname, so that it can be used as the broker.id.
  • Kafka requires Zookeeper to work. The script assumes that the Zookeeper hostname is zookeeper, but this value can also be specified in the env var KAFKA_ZOOKEEPER_HOST (e.g. zoo1:2181,zoo2:2181,zoo3:2181), as sketched below.
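
For instance, assuming the image tag built in the next section and a hypothetical external three-node Zookeeper ensemble, a standalone broker could be started like this:

# hostname kafka1 yields broker.id=1; the ensemble hostnames are hypothetical
docker run -d --hostname kafka1 \
    -e KAFKA_ZOOKEEPER_HOST="zoo1:2181,zoo2:2181,zoo3:2181" \
    joseestudillo/kafka:latest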

Creating the Kafka image from the Dockerfile

In this example I use my own namespace (joseestudillo), but this is not required and can be omitted. I also add the tag latest to follow Docker image naming conventions.

docker build --tag="joseestudillo/kafka:0.0.1" .
docker tag joseestudillo/kafka:0.0.1 joseestudillo/kafka:latest
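
Optionally, the resulting image can be sanity-checked with a couple of throwaway containers (a quick sketch, relying only on what the Dockerfile sets up):

# the 'current' symlink should point at the unpacked Kafka version
docker run --rm joseestudillo/kafka:latest ls -l /opt/apache/kafka

# the Kafka scripts should resolve via the PATH set in the image
docker run --rm joseestudillo/kafka:latest bash -c 'command -v kafka-server-start.sh'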

Creating the cluster: docker-compose

With the Kafka image on our system, we are ready to create a cluster. For Zookeeper we will use the official image from Docker Hub. In the docker-compose example I’ll create a 3-node Kafka cluster, but any number of nodes can be added with a new entry and a different name/hostname (see the kafka4 sketch after the compose file).

  • docker-compose.yml
version: '2.1'
services:
  zookeeper:
    hostname: zookeeper
    image: zookeeper:latest
    environment:
      ZOO_MY_ID: 1
    restart: always
    # the healthcheck lets the brokers wait until Zookeeper is ready
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]

  kafka1:
    image: joseestudillo/kafka:latest
    hostname: kafka1
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    # start only once Zookeeper reports healthy
    depends_on:
      zookeeper:
        condition: service_healthy

  kafka2:
    image: joseestudillo/kafka:latest
    hostname: kafka2
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - kafka1

  kafka3:
    image: joseestudillo/kafka:latest
    hostname: kafka3
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - kafka1

In the YAML I’ve defined a single-node Zookeeper (examples with a higher number of nodes can be found on Docker Hub). Notice that Zookeeper must be running before the Kafka cluster can start; to guarantee this, the first Kafka node (kafka1) depends on the Zookeeper container being healthy, and the rest of the Kafka nodes depend on the first one, which we will also use as the entry point to the cluster.
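
For example, a fourth broker would just be one more entry with the name and hostname changed:

  kafka4:
    image: joseestudillo/kafka:latest
    hostname: kafka4
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - kafka1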

Running the cluster

Once the Kafka image is on the system (check with docker images), we can launch the cluster using docker-compose up, which will show the logs from all the running containers. The status of the containers can be checked with docker-compose ps, which, like the rest of the docker-compose commands, must be run from the directory where docker-compose.yml is placed, or be pointed explicitly at the file with -f.
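
For example, a typical session from the directory containing docker-compose.yml might look like this:

# start Zookeeper and the three brokers in the background
docker-compose up -d

# list the containers and their state
docker-compose ps

# follow the logs of a single broker
docker-compose logs -f kafka1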

The cluster can then be operated from any of its nodes: docker-compose exec <service name in the docker-compose file> bash
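
As a quick smoke test (a sketch using the standard Kafka CLI tools that the image puts on the PATH; the topic name test is arbitrary), you can create a topic replicated across the three brokers and inspect it:

# open a shell on the first broker
docker-compose exec kafka1 bash

# inside the container: create a topic replicated across all 3 brokers
kafka-topics.sh --create --zookeeper zookeeper:2181 \
    --replication-factor 3 --partitions 3 --topic test

# check which broker leads each partition of the topic
kafka-topics.sh --describe --zookeeper zookeeper:2181 --topic test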