LogScale on Bare Metal - Installing Apache Kafka Cluster
LogScale uses Apache Kafka to manage ingest processing and inter-node communication within the LogScale cluster. LogScale recommends a minimum 3-node installation.
When deploying Kafka, each node must have a unique ID, which is set with the node.id option described later. When running Kafka in KRaft mode, the cluster itself is also identified by a UUID, which can be created as follows:
$ bin/kafka-storage.sh random-uuid
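In KRaft mode, the generated UUID is typically passed to the format subcommand of kafka-storage.sh on every node before Kafka is started for the first time. The following is a minimal sketch rather than part of the documented procedure: the function name format_storage and the KAFKA_STORAGE variable are hypothetical, and the default path assumes the /opt/kafka link created later in this guide.

```shell
# Path to the storage tool (assumption: Kafka is reachable through /opt/kafka).
KAFKA_STORAGE="${KAFKA_STORAGE:-/opt/kafka/bin/kafka-storage.sh}"

# Format a node's storage directories with the shared cluster ID.
# $1 = cluster ID (the same value on every node)
# $2 = path to that node's properties file
format_storage() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: format_storage CLUSTER_ID PROPERTIES_FILE" >&2
        return 1
    fi
    "$KAFKA_STORAGE" format -t "$1" -c "$2"
}

# Example, run once per node after the configuration file is in place:
# format_storage "<uuid from random-uuid>" /opt/kafka/config/server.properties
```

Generate the UUID once and reuse it on every node; running random-uuid separately on each node would produce three different cluster IDs.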
LogScale on Bare Metal - Apache Kafka Server Preparation
We recommend installing on Ubuntu, at least version 18.04. Before installing Kafka, make sure the server is up-to-date:
$ apt-get update
$ apt-get upgrade
Create a non-administrative user named kafka to run Kafka:
$ adduser kafka --shell=/bin/false --no-create-home --system --group
Add this user to the DenyUsers directive in each node's sshd configuration file to prevent it from being able to ssh or sftp into the node.
Restart the sshd daemon after making the change. Once the system has finished updating and the user has been created, Kafka can be installed.
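The DenyUsers change can be scripted so that it is safe to repeat on every node. The following is a minimal sketch: the function name deny_ssh_user is hypothetical, and /etc/ssh/sshd_config in the usage comment is assumed to be the sshd configuration file (the usual OpenSSH location).

```shell
# Add a user to DenyUsers in an sshd configuration file, without duplicating it.
# $1 = user name, $2 = path to the sshd configuration file
deny_ssh_user() {
    user="$1"
    conf="$2"
    # Nothing to do if the user is already listed on a DenyUsers line.
    if grep -Eq "^[[:space:]]*DenyUsers[[:space:]](.*[[:space:]])?${user}([[:space:]]|\$)" "$conf"; then
        return 0
    fi
    if grep -Eq '^[[:space:]]*DenyUsers[[:space:]]' "$conf"; then
        # Append to the existing DenyUsers line(s); a .bak copy of the file is kept.
        sed -i.bak -E "s/^([[:space:]]*DenyUsers[[:space:]].*)\$/\1 ${user}/" "$conf"
    else
        # No DenyUsers line yet, so add one.
        printf 'DenyUsers %s\n' "$user" >> "$conf"
    fi
}

# Example (as root); sshd -t checks the file for errors before the restart:
# deny_ssh_user kafka /etc/ssh/sshd_config && sshd -t
```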
LogScale on Bare Metal - Apache Kafka Installation using KRaft
To install Kafka using KRaft:
Go to the /opt directory and download the latest release. The package can be downloaded using wget:
$ cd /opt
$ wget https://downloads.apache.org/kafka/3.9.0/kafka_2.13-3.9.0.tgz
Extract the archive:
$ tar zxf kafka_2.13-3.9.0.tgz
Now create the directories where the information will be stored. We will use the top-level directory /kafka, since that could be a mount point for a separate filesystem. We will also create a directory for application log files in /var/log/kafka. Both directories must be writable by the kafka user:
$ mkdir /var/log/kafka
$ chown kafka:kafka /var/log/kafka
$ mkdir -p /kafka/kafka
$ chown kafka:kafka /kafka/kafka
Now link the application directory to /opt/kafka. This allows us to use /opt/kafka for the application and scripts, and to update the version later by downloading the new release and relinking to the updated application directory:
$ ln -s /opt/kafka_2.13-3.9.0 /opt/kafka
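Relinking after an upgrade can be sketched as follows. This is an illustration under stated assumptions, not part of the documented procedure: the function name relink_kafka is hypothetical, and the version directory in the usage comment is a placeholder.

```shell
# Point a stable symlink at a versioned installation directory.
# $1 = versioned directory, $2 = symlink path
relink_kafka() {
    if [ ! -d "$1" ]; then
        echo "no such directory: $1" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link as a file rather than following it
    ln -sfn "$1" "$2"
}

# Example with a placeholder version directory:
# relink_kafka /opt/kafka_2.13-X.Y.Z /opt/kafka
```

Without -n, running ln against an existing /opt/kafka link would create the new link inside the old version's directory instead of replacing the link.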
Using a text editor, open the Kafka properties file, server.properties, located in the kafka/config sub-directory. The following options are configured for using Kafka in KRaft mode. The hostnames and port numbers will be shared across each node. The configuration for the first node is shown below; on the other nodes, set node.id and the hostname in the listener settings to that node's own values:
node.id=1
controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
listeners=PLAINTEXT://kafka1:9092,CONTROLLER://kafka1:9093
advertised.listeners=PLAINTEXT://kafka1:9092
num.partitions=6
log.dirs=/opt/kafka/kraft-combined-logs
The first line sets the node.id value; a unique ID must be set for each node. The controller.quorum.voters option lists the nodes that will choose how work is distributed. The next two lines set the ports and protocol types. The log.dirs option sets the location of the data directories.
Update the ownership of the directory where the logs are stored:
$ chown -R kafka:kafka /opt/kafka/logs
Modify the directory to match the data directory set in the properties file and the version of Kafka that has been installed.
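Because only the node ID and hostname differ between nodes, the per-node properties can be generated from one template. The sketch below is a hedged illustration: the function name render_kraft_config is hypothetical, and the process.roles and controller.listener.names lines are not part of the listing above. They are included on the assumption that a combined broker and controller node in KRaft mode needs them, so check them against the Kafka documentation for the installed version.

```shell
# Print KRaft properties for node 1, 2, or 3 of the kafka1..kafka3 cluster.
# $1 = node ID
render_kraft_config() {
    case "$1" in
        1|2|3) ;;
        *) echo "node ID must be 1, 2, or 3" >&2; return 1 ;;
    esac
    id="$1"
    host="kafka${id}"
    cat <<EOF
# Assumption: every node acts as both broker and controller.
process.roles=broker,controller
node.id=${id}
controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
# Assumption: names the listener that carries controller traffic.
controller.listener.names=CONTROLLER
listeners=PLAINTEXT://${host}:9092,CONTROLLER://${host}:9093
advertised.listeners=PLAINTEXT://${host}:9092
num.partitions=6
log.dirs=/opt/kafka/kraft-combined-logs
EOF
}

# Example: review the output for node 2 before copying it into place.
# render_kraft_config 2
```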
If deploying a multi-node Kafka cluster, make sure that each node can resolve the hostname of each other node in the cluster. One way to achieve this is to edit the hosts file on each node with the host information:
192.168.1.15 kafka1 kafka2 kafka3
Be aware that in some Linux distributions, the hosts file may contain a line that by default resolves the hostname to the localhost address. This causes servers to listen only on the localhost address, which makes them inaccessible to other hosts on the network. In this case, change the line to:
127.0.1.1 kafka1 localhost
updating the IP address to the public address of the host.
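Whether a node's hostname is mapped to a localhost address can be checked before starting Kafka. The following is a minimal sketch: the function name maps_to_loopback is hypothetical, and /etc/hosts in the usage comment is assumed to be the hosts file.

```shell
# Succeed if the hostname is mapped to a 127.x.x.x address in a hosts file.
# $1 = hostname, $2 = path to the hosts file
maps_to_loopback() {
    awk -v h="$1" '
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break      # ignore a trailing comment
                if ($i == h) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# Example: warn when this node's own hostname resolves to a localhost address.
# maps_to_loopback "$(hostname)" /etc/hosts && echo "hostname resolves to localhost"
```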
Now create a service file for Kafka so that it can be managed by systemd as the kafka service. Edit the file and add the following lines:
[Unit]
[Service]
Type=simple
User=kafka
LimitNOFILE=800000
Environment="LOG_DIR=/var/log/kafka"
Environment="GC_LOG_ENABLED=true"
Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
Restart=on-failure
TimeoutSec=900
[Install]
WantedBy=multi-user.target
Now start the Kafka service:
$ systemctl start kafka
$ systemctl status kafka
$ systemctl enable kafka
These steps must be repeated on each host in a multi-node deployment.
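Once all nodes are running, the state of the controller quorum can be inspected with the kafka-metadata-quorum.sh tool shipped with Kafka. The following is a minimal sketch: the function name quorum_status and the KAFKA_QUORUM_TOOL variable are hypothetical, and the default path and bootstrap address assume the /opt/kafka link and the kafka1 hostname used in this guide.

```shell
# Path to the quorum tool (assumption: Kafka is reachable through /opt/kafka).
KAFKA_QUORUM_TOOL="${KAFKA_QUORUM_TOOL:-/opt/kafka/bin/kafka-metadata-quorum.sh}"

# Show the status of the KRaft controller quorum.
# $1 = bootstrap server as host:port (defaults to kafka1:9092)
quorum_status() {
    "$KAFKA_QUORUM_TOOL" --bootstrap-server "${1:-kafka1:9092}" describe --status
}

# Example:
# quorum_status kafka1:9092
```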