Creating a scalable Kafka cluster with Docker
Kafka's automatic broker ID generation, combined with Docker Compose's scaling capabilities, makes it possible to create a cluster with N nodes with very little effort.
The Kafka Docker image
We will need an image that can start a broker and use ZooKeeper to form the cluster.
- Dockerfile for the Kafka brokers:
FROM centos:7
MAINTAINER jose@estudillo.me
ENV K_SRC=http://apache.mirror.anlx.net/kafka/1.0.1/kafka_2.11-1.0.1.tgz
RUN yum install -y wget java-1.8.0-openjdk \
&& cd /tmp && wget -q $K_SRC \
&& export K_TAR=/tmp/$(ls kafka* | head -1) \
&& mkdir -p /opt/apache/kafka/ && tar -zxf $K_TAR -C /opt/apache/kafka/ \
&& cd /opt/apache/kafka && ln -s $(ls) current \
&& rm -rf $K_TAR
ENV KAFKA_HOME /opt/apache/kafka/current
ENV PATH $PATH:$KAFKA_HOME/bin
ADD resources /home/kafka
RUN groupadd -r kafka \
&& useradd -r -g kafka kafka \
&& mkdir -p /home/kafka \
&& chown -R kafka:kafka /home/kafka \
&& chmod -R +x /home/kafka/bin \
&& mkdir -p /var/log/kafka \
&& chown -R kafka:kafka /var/log/kafka
USER kafka
CMD /home/kafka/bin/run.sh
Here /home/kafka/bin/run.sh allows the broker configuration to be defined dynamically before starting the broker. resources/bin/run.sh:
#!/bin/bash
# Generate the broker configuration; environment variables override the
# defaults (e.g. KAFKA_ZOOKEEPER_HOST and KAFKA_LOG_DIR are set by docker-compose).
cat <<EOF > /home/kafka/broker.properties
broker.id.generation.enable=true
port=9092
log.dir=${KAFKA_LOG_DIR:=/var/log/kafka}
zookeeper.connect=${KAFKA_ZOOKEEPER_HOST:=zookeeper:2181}
host.name=$(hostname)
advertised.host.name=$(hostname)
advertised.port=9092
EOF
# Start the broker in the foreground so the container keeps running
kafka-server-start.sh /home/kafka/broker.properties
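For illustration, on a container where none of these environment variables are set, the generated /home/kafka/broker.properties ends up looking like this (where <container-hostname> stands for whatever hostname Docker assigned to the container):

broker.id.generation.enable=true
port=9092
log.dir=/var/log/kafka
zookeeper.connect=zookeeper:2181
host.name=<container-hostname>
advertised.host.name=<container-hostname>
advertised.port=9092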
Kafka's automatic broker ID generation (broker.id.generation.enable=true) makes scaling the cluster easier (in older versions an ID had to be set manually for each broker): ZooKeeper is used to ensure that every broker joining the cluster gets a unique ID.
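Since the brokers register themselves under /brokers/ids in ZooKeeper, a quick way to see the assigned IDs is the zookeeper-shell.sh tool that ships with Kafka. A minimal check, assuming the default zookeeper:2181 connection string from run.sh (auto-generated IDs normally start at 1001):

zookeeper-shell.sh zookeeper:2181 ls /brokers/ids
# e.g. [1001, 1002, 1003]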
Notice that this and any other required file must be placed in the resources directory (at the same level as the Dockerfile), since the image definition maps this directory to /home/kafka (ADD resources /home/kafka).
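The build context therefore looks like this:

.
├── Dockerfile
└── resources
    └── bin
        └── run.sh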
Before running docker-compose we need the image in the local Docker repository; it can be built and tagged using:
docker build --tag="joseestudillo/kafka:0.0.1" .
docker tag joseestudillo/kafka:0.0.1 joseestudillo/kafka:latest
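To sanity-check the image before moving to docker-compose, it can be run by hand against a single ZooKeeper container. A rough sketch (the network and container names below are arbitrary):

docker network create kafka-net
docker run -d --name zookeeper --network kafka-net zookeeper:latest
docker run -d --name kafka-test --network kafka-net \
    -e KAFKA_ZOOKEEPER_HOST=zookeeper:2181 joseestudillo/kafka:latest
docker logs -f kafka-test   # the broker should report that it has started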
Scaling with docker-compose
In the docker-compose definition we only need two entries: one for the ZooKeeper server (a single instance, so not highly available, which keeps things simple) and one for the Kafka brokers, which use KAFKA_ZOOKEEPER_HOST to join the cluster.
version: '2.1'
services:
  zookeeper:
    hostname: zookeeper
    image: zookeeper:latest
    environment:
      ZOO_MY_ID: 1
    restart: always
    healthcheck:
      test: ["CMD", "zkServer.sh", "status"]
  kafka_broker:
    image: joseestudillo/kafka:latest
    environment:
      KAFKA_ZOOKEEPER_HOST: zookeeper:2181/kafka-docker-compose
      KAFKA_LOG_DIR: /var/log/kafka
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      zookeeper:
        condition: service_healthy
With this file defined, we just need to run docker-compose up --scale kafka_broker=5 and it will start a Kafka cluster with 5 brokers.
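To verify that all the brokers joined the cluster, one way is to create a topic that requires every broker and then describe it, for instance from the first broker container (the topic name is arbitrary, and --index simply picks one of the scaled containers):

docker-compose exec --index=1 kafka_broker \
    kafka-topics.sh --zookeeper zookeeper:2181/kafka-docker-compose \
    --create --topic test --partitions 5 --replication-factor 5
docker-compose exec --index=1 kafka_broker \
    kafka-topics.sh --zookeeper zookeeper:2181/kafka-docker-compose \
    --describe --topic test
# the describe output should list 5 distinct broker IDs across the replicas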