LogScale on Bare Metal - Apache Kafka Installation with ZooKeeper | Falcon LogScale Self-Hosted 1.143.0-1.153.3

When creating a multi-node Kafka cluster, the Kafka broker.id and the ZooKeeper myid must be unique for each host. This guide uses the following values:

| Host | Kafka `broker.id` | ZooKeeper `myid` |
| --- | --- | --- |
| kafka1 | 1 | 1 |
| kafka2 | 2 | 2 |
| kafka3 | 3 | 3 |

To install Kafka and ZooKeeper:

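Go to the /opt directory and download a Kafka release with wget. This example uses Kafka 3.7.0 built for Scala 2.13; adjust the version in the URL and in the commands that follow for the release you install. Older releases are removed from downloads.apache.org and remain available from https://archive.apache.org/dist/kafka/: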
shell
```
$ cd /opt
$ wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
```
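Apache publishes a .sha512 checksum file next to each release archive (the same URL with .sha512 appended), and it is worth checking the download against it before extracting. The following is a minimal sketch, assuming GNU coreutils sha512sum; verify_sha512 is a hypothetical helper, not part of Kafka. Because the exact layout of the checksum file is an assumption here, the sketch accepts both the gpg --print-md style (file name, a colon, then grouped hex digits, possibly over several lines) and the plain sha512sum style:
shell
```
# Hypothetical helper: compare an archive with its published SHA-512 file.
# Assumes GNU coreutils sha512sum is installed.
verify_sha512() {
    archive=$1
    sumfile=$2
    # Digest of the downloaded archive, in lowercase hex.
    actual=$(sha512sum "$archive" | awk '{print $1}')
    if grep -q ':' "$sumfile"; then
        # gpg --print-md layout: "name: HEX HEX ..."; drop whitespace and the name.
        expected=$(tr -d ' \t\r\n' < "$sumfile" | sed 's/^[^:]*://')
    else
        # sha512sum layout: "hex  name"; the digest is the first field.
        expected=$(awk 'NR == 1 {print $1}' "$sumfile")
    fi
    # Normalise to lowercase before comparing.
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $archive"
    else
        echo "MISMATCH: $archive" >&2
        return 1
    fi
}
```

For example, download the checksum with `wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz.sha512` and run `verify_sha512 kafka_2.13-3.7.0.tgz kafka_2.13-3.7.0.tgz.sha512`. A mismatch means the archive should be downloaded again.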
Extract the archive:
shell
```
$ tar zxf kafka_2.13-3.7.0.tgz
```
Now create the directories where data and logs will be stored. The top-level directory /kafka is used because it could be a mount point for a separate filesystem. Application log files go in /var/log/kafka and /var/log/zookeeper. The ZooKeeper data directory, /kafka/zookeeper, is created later, when the ZooKeeper node ID is set. These commands assume that a kafka user and group already exist:
shell
```
$ mkdir -p /var/log/kafka /var/log/zookeeper
$ mkdir -p /kafka/kafka
$ chown kafka:kafka /var/log/kafka /var/log/zookeeper
$ chown kafka:kafka /kafka/kafka
```
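Before continuing, it can help to confirm that each directory exists and belongs to the kafka user. This is a minimal sketch, assuming the GNU coreutils form of stat (`stat -c %U`); check_owner is a hypothetical helper name:
shell
```
# Hypothetical helper: check that each directory exists and has the given owner.
# Usage: check_owner USER DIR...
check_owner() {
    owner=$1
    shift
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            status=1
        elif [ "$(stat -c %U "$dir")" != "$owner" ]; then
            echo "wrong owner: $dir is owned by $(stat -c %U "$dir")" >&2
            status=1
        fi
    done
    return $status
}
```

Running `check_owner kafka /var/log/kafka /var/log/zookeeper /kafka/kafka` prints nothing and returns success when all three directories are in place.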
Now create a symbolic link from /opt/kafka to the versioned application directory. Scripts and service files can then always use /opt/kafka, and upgrading only requires downloading the new release and pointing the link at the new directory:
shell
```
$ ln -s /opt/kafka_2.13-3.7.0 /opt/kafka
```
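When upgrading later, the link has to be replaced rather than recreated inside the directory it currently points to. A minimal sketch, assuming GNU coreutils ln, whose -n option treats an existing link to a directory as a plain file so that -f can replace it; relink_kafka is a hypothetical helper:
shell
```
# Hypothetical helper: point the stable link (default /opt/kafka) at a release dir.
# Usage: relink_kafka NEW_DIR [LINK]
relink_kafka() {
    new_dir=$1
    link=${2:-/opt/kafka}
    if [ ! -d "$new_dir" ]; then
        echo "not a directory: $new_dir" >&2
        return 1
    fi
    # -s symbolic, -f replace existing, -n do not follow an existing link to a dir.
    ln -sfn "$new_dir" "$link"
    echo "$link -> $(readlink "$link")"
}
```

For example, `relink_kafka /opt/kafka_2.13-3.7.0` recreates the link used above. After relinking to a newer release, restart the Kafka and ZooKeeper services so they run the new version.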
Using a text editor, open the Kafka properties file, server.properties, located in the /opt/kafka/config directory, and set the following options:
ini
```
broker.id=1
log.dirs=/kafka/kafka
delete.topic.enable = true
```
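The first line sets broker.id to the host's number from the table above; this guide uses the same number in the ZooKeeper myid file, which is created later. The second line sets the directory where Kafka stores its data. The third line, which is not in the default file, should be added at the end of the file; it allows topics to be deleted. Save the file and change the owner of the installation directory to the kafka user: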
shell
```
$ chown -R kafka:kafka /opt/kafka_2.13-3.7.0
```
Adjust the directory name to match the version of Kafka that has been installed. Changing the ownership of the symbolic link /opt/kafka does not change the ownership of the files in the directory it points to, so run chown against the versioned directory.
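The three server.properties settings can also be applied from a script, which is convenient when preparing several nodes. This is a minimal sketch; set_property is a hypothetical helper that removes any existing uncommented line for the key (with or without spaces around =) and appends key=value at the end of the file. The order of entries does not matter in a properties file:
shell
```
# Hypothetical helper: set key=value in a Java-style .properties file.
# Usage: set_property FILE KEY VALUE
set_property() {
    file=$1
    key=$2
    value=$3
    tmpfile=$(mktemp) || return 1
    # Copy every line except an existing "key=" / "key = " entry; comments are kept.
    if awk -v k="$key" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            if (index(line, k) == 1) {
                rest = substr(line, length(k) + 1)
                if (rest ~ /^[ \t]*=/) next
            }
            print
        }
    ' "$file" > "$tmpfile"; then
        printf '%s=%s\n' "$key" "$value" >> "$tmpfile"
        # Write back with cat so the file keeps its owner and permissions.
        cat "$tmpfile" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmpfile"
    return $status
}
```

For example, on kafka1: `set_property /opt/kafka/config/server.properties broker.id 1`, followed by the same command for log.dirs and delete.topic.enable.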
If deploying a multi-node Kafka cluster, make sure that each node can resolve the hostnames of all the other nodes in the cluster. One way to achieve this is to add the host information to the /etc/hosts file on each node:
text
```
192.168.1.15 kafka1
192.168.1.16 kafka2
192.168.1.17 kafka3
```
Important
Be aware that in some Linux distributions the hosts file contains a default line that maps the host's own name to a loopback address, such as 127.0.0.1 or 127.0.1.1. This causes the services to listen only on the loopback address, so they are not accessible to other hosts on the network. In this case, find the line:
text
```
127.0.1.1 kafka1
```
and change the IP address to the host's network address, for example 192.168.1.15 for kafka1.
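To catch this on each node, the hosts file can be checked for a loopback mapping of the node's own name. A minimal sketch; check_loopback is a hypothetical helper, and it only inspects the hosts file, not DNS or other name sources:
shell
```
# Hypothetical helper: fail if NAME is mapped to a 127.x.x.x address in a hosts file.
# Usage: check_loopback NAME [HOSTS_FILE]
check_loopback() {
    name=$1
    hosts=${2:-/etc/hosts}
    # Skip comment lines; fields 2..NF are the names for the address in field 1.
    bad=$(awk -v n="$name" '
        /^[ \t]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i == n && $1 ~ /^127\./) { print $1; exit }
            }
        }
    ' "$hosts")
    if [ -n "$bad" ]; then
        echo "$name resolves to loopback address $bad in $hosts" >&2
        return 1
    fi
    echo "$name is not mapped to a loopback address in $hosts"
}
```

Run `check_loopback "$(hostname)"` on each node after editing /etc/hosts.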
To configure ZooKeeper, edit /opt/kafka/config/zookeeper.properties and set the following options:
ini
```
dataDir=/kafka/zookeeper
clientPort=2181
maxClientCnxns=0
admin.enableServer=false
server.1=kafka1:2888:3888
server.2=kafka2:2888:3888
server.3=kafka3:2888:3888
4lw.commands.whitelist=*
tickTime=2000
initLimit=5
syncLimit=2
```
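The server.1, server.2 and server.3 entries set the hostname and the two host-to-host ports for each member of the ZooKeeper ensemble: port 2888 is used by followers to connect to the leader, and port 3888 is used for leader election. The number after "server." must match the value in that host's myid file; in this guide it also matches the host's Kafka broker.id.
The last three lines set the timing used between ZooKeeper hosts, and initLimit and syncLimit are required in multi-node configurations. tickTime is the basic time unit in milliseconds. initLimit is the number of ticks (here 5 × 2000 ms = 10 seconds) that followers may take to connect to and sync with the leader, and syncLimit is the number of ticks (here 2 × 2000 ms = 4 seconds) that a follower may fall behind the leader before it is dropped.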
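The tick arithmetic can be checked directly against the file. A minimal sketch; zk_timeouts is a hypothetical helper that reads tickTime, initLimit and syncLimit and prints the resulting limits in milliseconds:
shell
```
# Hypothetical helper: print ZooKeeper's effective init/sync limits in ms.
# Usage: zk_timeouts [PROPERTIES_FILE]
zk_timeouts() {
    props=${1:-/opt/kafka/config/zookeeper.properties}
    awk -F= '
        # Strip spaces so that "tickTime = 2000" is recognised too.
        { key = $1; val = $2; gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val) }
        key == "tickTime"  { tick = val }
        key == "initLimit" { initl = val }
        key == "syncLimit" { syncl = val }
        END {
            if (tick == "" || initl == "" || syncl == "") {
                print "tickTime, initLimit or syncLimit missing" > "/dev/stderr"
                exit 1
            }
            # Both limits are counted in ticks of tickTime milliseconds.
            printf "initLimit: %d ms\n", initl * tick
            printf "syncLimit: %d ms\n", syncl * tick
        }
    ' "$props"
}
```

With tickTime=2000, initLimit=5 and syncLimit=2 as configured above, it prints `initLimit: 10000 ms` and `syncLimit: 4000 ms`.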
Tip
This file can be copied to each node running ZooKeeper, as there are no node-specific configuration settings.
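Because the server.N entries follow a fixed pattern, they can be generated from the ordered list of hosts when building the file. A minimal sketch; zk_server_lines is a hypothetical helper that numbers hosts from 1 in the order given (so the order must match each host's myid) and uses the ports 2888 and 3888 from the configuration above:
shell
```
# Hypothetical helper: print server.N lines for zookeeper.properties.
# Usage: zk_server_lines HOST1 HOST2 ...   (N is the 1-based position)
zk_server_lines() {
    n=1
    for host in "$@"; do
        echo "server.$n=$host:2888:3888"
        n=$((n + 1))
    done
}
```

Running `zk_server_lines kafka1 kafka2 kafka3` prints the same three server.N lines as in the configuration above.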

Set the ZooKeeper node ID on each node by writing it to the myid file in the ZooKeeper data directory:

Node kafka1:
shell
```
$ mkdir -p /kafka/zookeeper
$ echo 1 > /kafka/zookeeper/myid
$ chown -R kafka:kafka /kafka/zookeeper
```

Node kafka2:
shell
```
$ mkdir -p /kafka/zookeeper
$ echo 2 > /kafka/zookeeper/myid
$ chown -R kafka:kafka /kafka/zookeeper
```

Node kafka3:
shell
```
$ mkdir -p /kafka/zookeeper
$ echo 3 > /kafka/zookeeper/myid
$ chown -R kafka:kafka /kafka/zookeeper
```

Important

The number in myid must be unique on each host. In this guide it also matches the broker.id configured for Kafka on the same host.
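Since this guide keeps the myid value and broker.id equal, both can be written in one step so they cannot drift apart. A minimal sketch, assuming GNU sed (for `sed -i`); set_node_id is a hypothetical helper, and the 1 to 255 range check follows the ZooKeeper Administrator's Guide recommendation for myid values:
shell
```
# Hypothetical helper: write ZooKeeper myid and set the same Kafka broker.id.
# Usage: set_node_id ID [ZK_DATA_DIR] [SERVER_PROPERTIES]
set_node_id() {
    node_id=$1
    zk_dir=${2:-/kafka/zookeeper}
    props=${3:-/opt/kafka/config/server.properties}
    case $node_id in
        ''|*[!0-9]*) echo "node id must be a positive integer: $node_id" >&2; return 1 ;;
    esac
    if [ "$node_id" -lt 1 ] || [ "$node_id" -gt 255 ]; then
        echo "node id must be between 1 and 255: $node_id" >&2
        return 1
    fi
    mkdir -p "$zk_dir" || return 1
    echo "$node_id" > "$zk_dir/myid" || return 1
    # Replace an existing broker.id entry, or append one if there is none.
    if grep -q '^[[:space:]]*broker\.id[[:space:]]*=' "$props"; then
        sed -i "s/^[[:space:]]*broker\.id[[:space:]]*=.*/broker.id=$node_id/" "$props"
    else
        echo "broker.id=$node_id" >> "$props"
    fi
}
```

For example, on kafka2 run `set_node_id 2` as root, then `chown -R kafka:kafka /kafka/zookeeper` as in the steps above.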

Create a service file for ZooKeeper so that it runs as a system service and systemd restarts it automatically if it fails.

Create the file /etc/systemd/system/zookeeper.service with the following contents:

ini
```
[Unit]
Description=Apache ZooKeeper

[Service]
Type=simple
User=kafka
LimitNOFILE=800000
Environment="LOG_DIR=/var/log/zookeeper"
Environment="GC_LOG_ENABLED=true"
Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
Restart=on-failure
TimeoutSec=900

[Install]
WantedBy=multi-user.target
```

Now reload systemd so that it reads the new unit file, then start the service:
shell
```
$ systemctl daemon-reload
$ systemctl start zookeeper
```

Check whether the service is running with the status command:
shell
```
$ systemctl status zookeeper
```

If the service is OK, the output is similar to the following and shows active (running):
```
zookeeper.service
     Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-03-07 05:31:36 GMT; 1s ago
   Main PID: 4968 (java)
      Tasks: 16 (limit: 1083)
     Memory: 24.6M
        CPU: 1.756s
     CGroup: /system.slice/zookeeper.service
             └─4968 java -Xms512M -Xmx4G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true "-Xlog:gc*:file=/var/log/zookeeper/zookeeper-gc.log:time,tags:filecount=10,filesize=100M" -Dcom.sun.management.>

Mar 07 05:31:36 kafka1 systemd[1]: Started zookeeper.service.
```

If the service is not running, the status output usually shows why; fix any reported issues before starting the service again. If everything is OK, enable the service so that it starts automatically at boot:
shell
```
$ systemctl enable zookeeper
```

Important

When running a multi-node cluster, repeat this process on each node, making sure that each node has a different number in its myid file.
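Once ZooKeeper is running on every node, the ensemble can be checked with the srvr four-letter-word command, which the 4lw.commands.whitelist=* setting above permits. A minimal sketch, assuming a netcat (nc) binary that supports the -w timeout option; zk_mode and zk_ensemble_status are hypothetical helpers:
shell
```
# Hypothetical helper: read "srvr" output on stdin and print its "Mode:" value
# (for example leader or follower). Fails if no Mode line is present.
zk_mode() {
    awk -F': *' '$1 == "Mode" { print $2; found = 1 } END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: query each host on the client port (2181) and print its role.
# Usage: zk_ensemble_status HOST...
zk_ensemble_status() {
    for host in "$@"; do
        mode=$(echo srvr | nc -w 2 "$host" 2181 | zk_mode) || mode="unreachable"
        echo "$host: $mode"
    done
}
```

In a healthy three-node ensemble, `zk_ensemble_status kafka1 kafka2 kafka3` should report one leader and two followers.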

Now create a service for Kafka. The unit file is slightly different: it adds a dependency on ZooKeeper, so that systemd starts ZooKeeper first, if it is not already running, before starting Kafka.

Create the file /etc/systemd/system/kafka.service with the following contents:

ini
```
[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
LimitNOFILE=800000
Environment="LOG_DIR=/var/log/kafka"
Environment="GC_LOG_ENABLED=true"
Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
Restart=on-failure
TimeoutSec=900

[Install]
WantedBy=multi-user.target
```

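Now reload systemd, then start the Kafka service, check its status, and enable it so that it starts at boot:
shell
```
$ systemctl daemon-reload
$ systemctl start kafka
$ systemctl status kafka
$ systemctl enable kafka
```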
These steps must be repeated on each host in a multi-node deployment.
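After Kafka is running on all hosts, each broker should have registered itself under the /brokers/ids znode in ZooKeeper. A minimal sketch; check_broker_ids is a hypothetical helper that reads the output of Kafka's zookeeper-shell.sh on stdin, assuming the registered IDs appear on a line of the form [1, 2, 3]:
shell
```
# Hypothetical helper: verify that every expected broker id is registered.
# Reads "zookeeper-shell.sh HOST:2181 ls /brokers/ids" output on stdin.
# Usage: ... | check_broker_ids ID...
check_broker_ids() {
    # Take the last line that looks like "[1, 2, 3]" and turn it into "1 2 3".
    registered=$(grep -E '^\[[0-9, ]*]$' | tail -n 1 | sed 's/[][ ]//g' | tr ',' ' ')
    missing=""
    for want in "$@"; do
        case " $registered " in
            *" $want "*) ;;
            *) missing="$missing $want" ;;
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing broker ids:$missing" >&2
        return 1
    fi
    echo "registered broker ids: $registered"
}
```

For example: `/opt/kafka/bin/zookeeper-shell.sh kafka1:2181 ls /brokers/ids | check_broker_ids 1 2 3`.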
