Deploying a Kafka Cluster using Containers

Many different Kafka container images are available, for example from Strimzi or Apache.

The recommended default is to run three or more instances of Kafka as a separate cluster from the LogScale cluster. The instructions below use the container images from Apache Kafka.

Important

Kafka and LogScale should be run on different instances and hardware. This will help ensure performance for both systems when ingesting data.

When configuring Kafka, each Kafka host must have its own unique node number. For example, in a three-node cluster:

Host      Kafka ID
kafka1    1
kafka2    2
kafka3    3

How the node number is configured depends on your chosen Kafka image. Kafka can run in one of two modes:

  • In KRaft mode, Kafka does not use ZooKeeper. The node ID is set with the node.id parameter, or the KAFKA_NODE_ID variable in Docker.

    In addition, in KRaft mode you must configure a unique cluster ID, which must be the same on every node in the cluster.

  • In ZooKeeper mode, Kafka uses ZooKeeper. The unique broker ID is set with the broker.id parameter, or the KAFKA_BROKER_ID variable in Docker.
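Kafka distributions include the kafka-storage.sh tool, whose random-uuid subcommand prints a new cluster ID. As a minimal sketch of a sanity check before starting the nodes, the following verifies that a value has the expected shape: 22 characters from the URL-safe base64 alphabet. The is_cluster_id helper is hypothetical and not part of Kafka; it checks the format only, not whether the ID matches the other nodes.

```shell
# Hypothetical helper: succeed only if $1 looks like a Kafka cluster ID,
# that is, 22 characters drawn from the URL-safe base64 alphabet.
is_cluster_id() {
  case "$1" in
    *[!A-Za-z0-9_-]*) return 1 ;;   # contains a character outside the alphabet
  esac
  [ "${#1}" -eq 22 ]                # a 16-byte UUID encodes to 22 characters
}

# The same value must be passed to every node in the cluster.
KAFKA_CLUSTER_ID='SNO4Bhs6QYuk4lQUougG6w'

if is_cluster_id "$KAFKA_CLUSTER_ID"; then
  echo "cluster ID format looks valid"
else
  echo "unexpected cluster ID format" >&2
fi
```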

When configuring each node, ensure that the listener host and port number are accessible to the other Kafka hosts and that the LogScale instances can reach them over the network. The default Kafka ports should be open and accessible between the Docker containers. If in doubt, refer to the Kafka documentation. In the configuration below, ports 29092, 29093 and 29094 are used.
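In KRaft mode, the node IDs and the listener host and port of each node come together in the controller quorum voters setting, which lists every node as id@host:port. As an illustration only, the sketch below builds that string from the hosts and ports used in this section. The quorum_voters helper is hypothetical and assumes node IDs are assigned 1, 2, 3 in argument order.

```shell
# Hypothetical helper: build the controller quorum voters string from
# "host:port" arguments, assigning node IDs 1, 2, 3, ... in argument order.
quorum_voters() {
  voters=""
  id=1
  for endpoint in "$@"; do
    # Prefix a comma only when the string already has an entry.
    voters="${voters:+${voters},}${id}@${endpoint}"
    id=$((id + 1))
  done
  printf '%s\n' "$voters"
}

# The value must be identical on every node.
KAFKA_CONTROLLER_QUORUM_VOTERS="$(quorum_voters kafka1:29092 kafka2:29093 kafka3:29094)"
echo "$KAFKA_CONTROLLER_QUORUM_VOTERS"
```

Generating the value once and passing the same variable to each node avoids a mismatch between hosts, because every node must see the same list of voters.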

For example, to run the Docker image in KRaft mode across three hosts:

Host 1 (kafka1)
shell
docker run -d \
--name=kafka1 \
-h kafka1 \
-p 9092:9092 \
-p 29092:29092 \
-e KAFKA_NODE_ID=1 \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP='CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT' \
-e KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka1:29092,PLAINTEXT_HOST://localhost:9092' \
-e KAFKA_PROCESS_ROLES='broker,controller' \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_CONTROLLER_QUORUM_VOTERS='1@kafka1:29092,2@kafka2:29093,3@kafka3:29094' \
-e KAFKA_LISTENERS='PLAINTEXT://kafka1:29092,CONTROLLER://kafka1:29092,PLAINTEXT_HOST://0.0.0.0:9092' \
-e KAFKA_INTER_BROKER_LISTENER_NAME='PLAINTEXT' \
-e KAFKA_CONTROLLER_LISTENER_NAMES='CONTROLLER' \
-e KAFKA_CLUSTER_ID='SNO4Bhs6QYuk4lQUougG6w' \
apache/kafka:3.7.0
Host 2 (kafka2)
shell
docker run -d \
--name=kafka2 \
-h kafka2 \
-p 9093:9093 \
-p 29093:29093 \
-e KAFKA_NODE_ID=2 \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP='CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT' \
-e KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka2:29093,PLAINTEXT_HOST://localhost:9093' \
-e KAFKA_PROCESS_ROLES='broker,controller' \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_CONTROLLER_QUORUM_VOTERS='1@kafka1:29092,2@kafka2:29093,3@kafka3:29094' \
-e KAFKA_LISTENERS='PLAINTEXT://kafka2:29093,CONTROLLER://kafka2:29093,PLAINTEXT_HOST://0.0.0.0:9093' \
-e KAFKA_INTER_BROKER_LISTENER_NAME='PLAINTEXT' \
-e KAFKA_CONTROLLER_LISTENER_NAMES='CONTROLLER' \
-e KAFKA_CLUSTER_ID='SNO4Bhs6QYuk4lQUougG6w' \
apache/kafka:3.7.0
Host 3 (kafka3)
shell
docker run -d \
--name=kafka3 \
-h kafka3 \
-p 9094:9094 \
-p 29094:29094 \
-e KAFKA_NODE_ID=3 \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP='CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT' \
-e KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka3:29094,PLAINTEXT_HOST://localhost:9094' \
-e KAFKA_PROCESS_ROLES='broker,controller' \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_CONTROLLER_QUORUM_VOTERS='1@kafka1:29092,2@kafka2:29093,3@kafka3:29094' \
-e KAFKA_LISTENERS='PLAINTEXT://kafka3:29094,CONTROLLER://kafka3:29094,PLAINTEXT_HOST://0.0.0.0:9094' \
-e KAFKA_INTER_BROKER_LISTENER_NAME='PLAINTEXT' \
-e KAFKA_CONTROLLER_LISTENER_NAMES='CONTROLLER' \
-e KAFKA_CLUSTER_ID='SNO4Bhs6QYuk4lQUougG6w' \
apache/kafka:3.7.0

Start the Docker containers, mounting the configuration files and data locations created in previous steps.

To verify that Kafka is running:

  • If you are running in ZooKeeper mode, use nc to get the status of each ZooKeeper instance. Every instance must report a mode of either leader or follower:

    shell
    $ echo stat | nc 192.168.1.1 2181 | grep '^Mode: '
  • Optionally, use your favorite Kafka tools to validate the state of your Kafka cluster. For example, list the topics using the following command; on a fresh install of Kafka, expect an empty list:

    shell
    $ kafka-topics.sh --bootstrap-server localhost:9092 --list
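In KRaft mode there is no ZooKeeper instance to query, so the quorum state can be read from Kafka itself with the kafka-metadata-quorum.sh tool included in recent Kafka releases. The following is a sketch rather than a definitive procedure: the kraft_quorum_status helper is hypothetical, and it assumes the Kafka scripts are located in /opt/kafka/bin inside the container and that the node listens on the given port on localhost.

```shell
# Hypothetical helper: print the KRaft quorum status from inside a container.
#   $1 = container name (for example kafka1)
#   $2 = port of a broker listener reachable on localhost inside the container
# Assumption: the Kafka scripts are in /opt/kafka/bin inside the image.
kraft_quorum_status() {
  docker exec "$1" /opt/kafka/bin/kafka-metadata-quorum.sh \
    --bootstrap-server "localhost:$2" describe --status
}
```

For example, run kraft_quorum_status kafka1 9092 on the first host. The report should show the current leader and list all three node IDs as voters.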