Deploying a Kafka Cluster using Containers
There are many different Kafka containers available, for example Strimzi or Apache.
The recommended default is to run three or more instances of Kafka as a cluster separate from the LogScale cluster. The instructions below use containers from Apache Kafka.
Important
Kafka and LogScale should be run on different instances and hardware. This helps ensure performance for both systems when ingesting data.
When configuring Kafka, each Kafka host must have its own unique node number. For example, in a three-node cluster:
Host | Kafka ID |
---|---|
kafka1 | 1 |
kafka2 | 2 |
kafka3 | 3 |
How this is configured depends on your chosen Kafka image. When using Kafka you can run in two modes:
In KRaft mode, Kafka does not use ZooKeeper. The node ID is set with the node.id parameter, or with the KAFKA_NODE_ID variable in Docker. In addition, in KRaft mode you must configure a cluster ID that is unique to the cluster and identical on every node in it.
In ZooKeeper mode, Kafka uses ZooKeeper, and the unique broker ID is set with the broker.id parameter, or with the KAFKA_BROKER_ID variable in Docker.
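For KRaft mode, the cluster ID is generated once and then reused on every node. The following is a minimal sketch, not a definitive procedure. It assumes the apache/kafka image keeps the Kafka scripts under /opt/kafka/bin (an assumption about the image layout; adjust the path for other images). The function names generate_cluster_id and is_valid_cluster_id are hypothetical. The second one only checks the shape of the ID (22 URL-safe base64 characters, which is how Kafka encodes a 16-byte UUID) and does not ask Kafka to validate it.

```shell
# Generate a cluster ID once, then pass the same value to every node
# through KAFKA_CLUSTER_ID. The script path inside the image is an assumption.
generate_cluster_id() {
  docker run --rm apache/kafka:3.7.0 /opt/kafka/bin/kafka-storage.sh random-uuid
}

# Hypothetical sanity check: exactly 22 characters, all from the
# URL-safe base64 alphabet (letters, digits, underscore, hyphen).
is_valid_cluster_id() {
  [ "${#1}" -eq 22 ] || return 1
  case "$1" in
    *[!A-Za-z0-9_-]*) return 1 ;;
  esac
  return 0
}

# Offline check against the cluster ID used in this guide.
if is_valid_cluster_id 'SNO4Bhs6QYuk4lQUougG6w'; then
  echo "cluster ID format looks valid"
fi
```

Calling generate_cluster_id requires Docker and pulls the image if it is not already present. The format check runs without either.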
When configuring each node, ensure that the listener host and port number are accessible to the other Kafka hosts and that the LogScale instances can reach them over the network. The default Kafka ports should be open and accessible between the Docker containers. If in doubt, refer to the Kafka documentation. The configuration below uses ports 29092, 29093 and 29094.
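Reachability can be checked from a LogScale host, or from another Kafka host, before and after starting the containers. The sketch below assumes that nc is installed and supports the -z and -w options (most netcat variants do, but this is an assumption about your environment). The function name check_endpoints and the PROBE variable are hypothetical. The test only confirms that a TCP connection can be opened, not that Kafka is healthy.

```shell
# Probe command: nc -z only tests that a TCP connection can be opened.
# Override PROBE to use a different tool.
PROBE="${PROBE:-nc -z -w 3}"

# check_endpoints HOST:PORT... prints one line per endpoint and
# returns non-zero if any endpoint could not be reached.
check_endpoints() {
  failed=0
  for endpoint in "$@"; do
    host="${endpoint%:*}"     # text before the last colon
    port="${endpoint##*:}"    # text after the last colon
    if $PROBE "$host" "$port" >/dev/null 2>&1; then
      echo "ok   $endpoint"
    else
      echo "FAIL $endpoint"
      failed=1
    fi
  done
  return "$failed"
}

# Example, using the host names and ports from this guide:
# check_endpoints kafka1:29092 kafka2:29093 kafka3:29094
```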
For example, to run the Docker image in KRaft mode across three hosts:
docker run -d \
--name=kafka1 \
-h kafka1 \
-p 9092:9092 \
-p 29092:29092 \
-e KAFKA_NODE_ID=1 \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP='CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT' \
-e KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka1:29092,PLAINTEXT_HOST://localhost:9092' \
-e KAFKA_PROCESS_ROLES='broker,controller' \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_CONTROLLER_QUORUM_VOTERS='1@kafka1:29092,2@kafka2:29093,3@kafka3:29094' \
-e KAFKA_LISTENERS='PLAINTEXT://kafka1:29092,CONTROLLER://kafka1:29092,PLAINTEXT_HOST://0.0.0.0:9092' \
-e KAFKA_INTER_BROKER_LISTENER_NAME='PLAINTEXT' \
-e KAFKA_CONTROLLER_LISTENER_NAMES='CONTROLLER' \
-e KAFKA_CLUSTER_ID='SNO4Bhs6QYuk4lQUougG6w' \
apache/kafka:3.7.0
docker run -d \
--name=kafka2 \
-h kafka2 \
-p 9093:9093 \
-p 29093:29093 \
-e KAFKA_NODE_ID=2 \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP='CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT' \
-e KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka2:29093,PLAINTEXT_HOST://localhost:9093' \
-e KAFKA_PROCESS_ROLES='broker,controller' \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_CONTROLLER_QUORUM_VOTERS='1@kafka1:29092,2@kafka2:29093,3@kafka3:29094' \
-e KAFKA_LISTENERS='PLAINTEXT://kafka2:29093,CONTROLLER://kafka2:29093,PLAINTEXT_HOST://0.0.0.0:9093' \
-e KAFKA_INTER_BROKER_LISTENER_NAME='PLAINTEXT' \
-e KAFKA_CONTROLLER_LISTENER_NAMES='CONTROLLER' \
-e KAFKA_CLUSTER_ID='SNO4Bhs6QYuk4lQUougG6w' \
apache/kafka:3.7.0
docker run -d \
--name=kafka3 \
-h kafka3 \
-p 9094:9094 \
-p 29094:29094 \
-e KAFKA_NODE_ID=3 \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP='CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT' \
-e KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka3:29094,PLAINTEXT_HOST://localhost:9094' \
-e KAFKA_PROCESS_ROLES='broker,controller' \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_CONTROLLER_QUORUM_VOTERS='1@kafka1:29092,2@kafka2:29093,3@kafka3:29094' \
-e KAFKA_LISTENERS='PLAINTEXT://kafka3:29094,CONTROLLER://kafka3:29094,PLAINTEXT_HOST://0.0.0.0:9094' \
-e KAFKA_INTER_BROKER_LISTENER_NAME='PLAINTEXT' \
-e KAFKA_CONTROLLER_LISTENER_NAMES='CONTROLLER' \
-e KAFKA_CLUSTER_ID='SNO4Bhs6QYuk4lQUougG6w' \
apache/kafka:3.7.0
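The value of KAFKA_CONTROLLER_QUORUM_VOTERS has to be identical on every node, so it can help to build it once from a host list rather than typing it three times. A minimal sketch, assuming the host names and ports used above and node IDs numbered from 1 in the order given. The function name quorum_voters is hypothetical.

```shell
# quorum_voters HOST:PORT... prints ID@HOST:PORT entries joined by commas,
# numbering the hosts from 1 in the order they are given.
quorum_voters() {
  id=1
  voters=""
  for endpoint in "$@"; do
    # Prepend a comma only when the list already has an entry.
    voters="${voters:+$voters,}${id}@${endpoint}"
    id=$((id + 1))
  done
  echo "$voters"
}

quorum_voters kafka1:29092 kafka2:29093 kafka3:29094
# prints: 1@kafka1:29092,2@kafka2:29093,3@kafka3:29094
```

The result can then be passed to each container, for example with -e KAFKA_CONTROLLER_QUORUM_VOTERS="$(quorum_voters kafka1:29092 kafka2:29093 kafka3:29094)".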
Start the Docker images, mounting the configuration files and data locations created in previous steps.
To verify that Kafka is running:
In ZooKeeper mode, use nc to get the status of each ZooKeeper instance. Every instance must respond with either Mode: leader or Mode: follower:
echo stat | nc 192.168.1.1 2181 | grep '^Mode: '
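A KRaft cluster has no ZooKeeper instances to query, so the check above does not apply to it. Kafka 3.3 and later include kafka-metadata-quorum.sh, whose describe --status output contains a LeaderId line. The sketch below assumes that the script is on your PATH and that a broker is reachable on localhost:9092 (both are assumptions about your setup). The function name quorum_leader_id is hypothetical and does nothing more than extract that line.

```shell
# Reads `kafka-metadata-quorum.sh ... describe --status` output on stdin
# and prints the value of the LeaderId line. Exits non-zero if the
# input contains no LeaderId line.
quorum_leader_id() {
  awk '$1 == "LeaderId:" { print $2; found = 1 } END { exit !found }'
}

# Usage against a running cluster:
# kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status | quorum_leader_id
```

With the apache/kafka image the script is not on the host, so the command would typically be run inside a container with docker exec.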
Optionally, use your favorite Kafka tools to validate the state of your Kafka cluster. For example, list the topics with the following command; on a fresh install of Kafka the list is expected to be empty. The command connects to a broker with --bootstrap-server, because kafka-topics.sh no longer accepts --zookeeper as of Kafka 3.0:
kafka-topics.sh --bootstrap-server localhost:9092 --list