Creating a Kafka cluster with Docker
A local Kafka cluster helps when developing producers and consumers, allowing you to test different scenarios, especially when it comes to high availability (HA). In this article I’ll show how to create a Kafka cluster using Docker and docker-compose.
Defining the Kafka image: Dockerfile
For the Docker image we will use Kafka version 0.11.0.0, as specified in the environment variable KAFKA_BIN in the code below.
Dockerfile:
FROM centos:7
MAINTAINER jose@estudillo.me

# Kafka release to install
ENV KAFKA_BIN=http://www-eu.apache.org/dist/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz

# Download and unpack the Kafka binaries, linking the versioned
# directory to /opt/apache/kafka/current
RUN yum install -y wget java-1.8.0-openjdk \
    && cd /tmp && wget -q $KAFKA_BIN \
    && export K_TAR=/tmp/$(ls kafka* | head -1) \
    && mkdir -p /opt/apache/kafka/ && tar -zxf $K_TAR -C /opt/apache/kafka/ \
    && cd /opt/apache/kafka && ln -s $(ls) current \
    && rm -rf $K_TAR

ENV KAFKA_HOME /opt/apache/kafka/current
ENV PATH $PATH:$KAFKA_HOME/bin

# Copy the startup scripts into the image
ADD resources /home/kafka

# Create the kafka user and the directories it needs to own
RUN groupadd -r kafka \
    && useradd -r -g kafka kafka \
    && mkdir -p /home/kafka \
    && chown -R kafka:kafka /home/kafka \
    && chmod -R +x /home/kafka/scripts \
    && mkdir -p /var/log/kafka \
    && chown -R kafka:kafka /var/log/kafka \
    && mkdir -p /etc/kafka \
    && chown -R kafka:kafka /etc/kafka

USER kafka
CMD /home/kafka/scripts/run.sh
The Dockerfile downloads the Kafka binaries, places them into /opt/apache/kafka, and links the current version to /opt/apache/kafka/current. After that, it adds the Kafka binaries to the PATH and creates the required directories, which need to be owned by the kafka user.
In order to configure and start the broker properly, CMD calls a bash script that is added to the image with ADD resources /home/kafka, which puts all the content from the local directory resources into the image directory /home/kafka.
resources/scripts/run.sh
#!/bin/bash

# Generate the broker configuration. The broker id is derived from the
# digits in the container's hostname (e.g. kafka1 -> broker.id=1), and
# the Zookeeper connection string defaults to zookeeper:2181 unless
# KAFKA_ZOOKEEPER_HOST is set.
cat <<EOF > /etc/kafka/broker.properties
broker.id=$(hostname | sed "s/[^0-9]*//g")
port=9092
log.dir=/var/log/kafka
zookeeper.connect=${KAFKA_ZOOKEEPER_HOST:=zookeeper:2181}
host.name=$(hostname)
advertised.host.name=$(hostname)
advertised.port=9092
EOF

# Start the broker in the foreground with the generated configuration
kafka-server-start.sh /etc/kafka/broker.properties
For this to work we require:
- Every container created from this image must have a number in its hostname, since the script uses it as the broker.id.
- Kafka requires Zookeeper to work. The script assumes its hostname is zookeeper, but this value can also be specified in the env var KAFKA_ZOOKEEPER_HOST (e.g. zoo1:2181,zoo2:2181,zoo3:2181).
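For illustration, on a container whose hostname is kafka1 and with KAFKA_ZOOKEEPER_HOST unset, the run.sh script above would generate the following /etc/kafka/broker.properties:
broker.id=1
port=9092
log.dir=/var/log/kafka
zookeeper.connect=zookeeper:2181
host.name=kafka1
advertised.host.name=kafka1
advertised.port=9092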
Creating the Kafka image from the Dockerfile
In this example I use my own namespace (joseestudillo), but this is not required and can be omitted. I also tag the image as latest to follow Docker image naming conventions.
docker build --tag="joseestudillo/kafka:0.0.1" .
docker tag joseestudillo/kafka:0.0.1 joseestudillo/kafka:latest
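To confirm that the image and both of its tags are available locally, you can list them with the standard Docker CLI:
docker images joseestudillo/kafka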
Creating the cluster: docker compose
With the Kafka image on our system, we are ready to create a cluster. For Zookeeper we will use the official image from Docker Hub. In the docker-compose example below I’ll create a 3-node Kafka cluster, but any number of nodes can be added by adding a new entry and changing the name/hostname.
docker-compose.yml
version: '2.1'
services:
  zookeeper:
    hostname: zookeeper
    image: zookeeper:latest
    environment:
      ZOO_MY_ID: 1
    restart: always
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]
  kafka1:
    image: joseestudillo/kafka:latest
    hostname: kafka1
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      zookeeper:
        condition: service_healthy
  kafka2:
    image: joseestudillo/kafka:latest
    hostname: kafka2
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - kafka1
  kafka3:
    image: joseestudillo/kafka:latest
    hostname: kafka3
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - kafka1
In the YAML I’ve defined a single-node Zookeeper (examples with a higher number of nodes can be found on Docker Hub). Notice that Zookeeper must be running before a Kafka cluster can be created; to guarantee this, we make the first Kafka node (kafka1) depend on the Zookeeper container being healthy, and the rest of the Kafka nodes depend on the first one.
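To add a fourth broker, for example, you would only need another entry under services: following the same pattern (a sketch; kafka4 is a hypothetical name, and any hostname containing a unique number would work):
  kafka4:
    image: joseestudillo/kafka:latest
    hostname: kafka4
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - kafka1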
Running the cluster
Once the Kafka image is on the system (check with docker images), we can launch the cluster using docker-compose up, which will show the logs from all the running containers. The status of the containers can be checked with docker-compose ps, which, like the rest of the docker-compose commands, must be run from the directory where docker-compose.yml is placed, or be pointed at it explicitly with the -f option.
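For reference, a typical session run from the directory containing docker-compose.yml might look like this:
docker-compose up -d          # start Zookeeper and the three brokers in the background
docker-compose ps             # check the status of all containers
docker-compose logs kafka1    # inspect the logs of a single broker
docker-compose down           # stop and remove the cluster when finished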
The cluster can then be operated from any of its nodes: docker-compose exec <service name in the docker-compose file> bash
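As a quick smoke test (a sketch: the topic name test and the choice of nodes are arbitrary), you can create a replicated topic and produce and consume a few messages using the Kafka command line tools shipped in the image:
docker-compose exec kafka1 bash
# inside the container: create a topic replicated across the 3 brokers
kafka-topics.sh --zookeeper zookeeper:2181 --create --topic test --partitions 3 --replication-factor 3
# produce a few messages (type them, then Ctrl-C to exit)
kafka-console-producer.sh --broker-list kafka1:9092 --topic test
# read them back from another broker
kafka-console-consumer.sh --bootstrap-server kafka2:9092 --topic test --from-beginning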