# Install Kafka
We recommend a minimum three-node installation for Kafka. For a single-node test deployment only one node is required, but there will be no high-availability support.
When deploying Kafka with ZooKeeper from the tarball installation, you need to configure the two services with a unique host ID number:
For Kafka this is the `broker.id` configuration value in the `server.properties` file:

```ini
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
```
For ZooKeeper this is a file called `myid` in the ZooKeeper data directory that contains the node ID number. You can create it using:

```shell
$ echo 1 > /kafka/zookeeper/myid
```
When creating a multi-node Kafka cluster these numbers must be unique for each host:
| Host | Kafka `broker.id` | ZooKeeper `myid` |
|---|---|---|
| kafka1 | 1 | 1 |
| kafka2 | 2 | 2 |
| kafka3 | 3 | 3 |
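A mismatched pair of ID files is an easy mistake to make on a multi-node cluster. As a quick sanity check, a small shell function like the following can compare the two values on a host. This is a sketch: the default paths assume the `/opt/kafka` and `/kafka/zookeeper` layout used later in this guide.

```shell
# check_ids: compare the Kafka broker.id with the ZooKeeper myid on this host.
# Both paths can be overridden; the defaults match the layout in this guide.
check_ids() {
    props="${1:-/opt/kafka/config/server.properties}"
    myid="${2:-/kafka/zookeeper/myid}"
    broker_id=$(grep -E '^broker\.id=' "$props" | cut -d= -f2)
    zk_id=$(cat "$myid")
    if [ "$broker_id" = "$zk_id" ]; then
        echo "IDs match: $broker_id"
        return 0
    else
        echo "Mismatch: broker.id=$broker_id myid=$zk_id" >&2
        return 1
    fi
}
```

Run `check_ids` on each node after configuration; a non-zero exit status indicates the two IDs disagree.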
## Server Preparation
We recommend installing on Ubuntu, at least version 18.04. Before installing Kafka, make sure the server is up to date. If you haven't already done this, you can upgrade the system with apt-get like so:
```shell
$ apt-get update
$ apt-get upgrade
```
Next, create a non-administrative user named `kafka` to run Kafka. You can do this by executing the following from the command line:

```shell
$ adduser kafka --shell=/bin/false --no-create-home --system --group
```
You should add this user to the `DenyUsers` section of your node's `/etc/ssh/sshd_config` file to prevent it from being able to ssh or sftp into the node. Remember to restart the sshd daemon after making the change. Once the system has finished updating and the user has been created, you can install Kafka.
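For example, assuming the `kafka` user created above, the relevant entry in `sshd_config` would look like this:

```ini
# /etc/ssh/sshd_config
# Prevent the kafka service account from logging in via ssh or sftp
DenyUsers kafka
```

After saving, you can validate the configuration and restart the daemon with `sshd -t && systemctl restart sshd`.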
## Installing Kafka
To install Kafka and ZooKeeper:
Go to the `/opt` directory and download the latest release. You can do that using wget:

```shell
$ cd /opt
$ wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
```
Extract the archive and create the directories it needs like this:

```shell
$ tar zxf kafka_2.13-3.7.0.tgz
```
Now create the directories where the information will be stored. We will use the top-level directory `/kafka` since that could be a mount point for a separate filesystem. We will also create a directory for application log files in `/var/log/kafka`:

```shell
$ mkdir /var/log/kafka
$ mkdir /var/log/zookeeper
$ mkdir -p /kafka/kafka /kafka/zookeeper
$ chown kafka:kafka /var/log/kafka /var/log/zookeeper
$ chown kafka:kafka /kafka/kafka /kafka/zookeeper
```
Now link the application directory to `/opt/kafka`, which will allow us to use `/opt/kafka` for the application and scripts, but update the version by downloading and relinking to the updated application directory:

```shell
$ ln -s /opt/kafka_2.13-3.7.0 /opt/kafka
```

Using a text editor, open the Kafka properties file, `server.properties`, located in the `kafka/config` sub-directory. You'll need to set a few options; the lines below are not necessarily in the order in which they'll be found in the configuration file:

```ini
broker.id=1
log.dirs=/kafka/kafka
delete.topic.enable=true
```
The first line sets the `broker.id` value to match the server number (in the `myid` file) you set when configuring ZooKeeper. The second sets the data directory. The third line should be added to the end of the configuration file. When you're finished, save the file and change the owner to the `kafka` user:

```shell
$ chown -R kafka:kafka /opt/kafka_2.13-3.7.0
```
You'll have to adjust this to the version you installed. Note that changing the ownership of the link `/opt/kafka` doesn't change the ownership of the files in the directory.

If you are deploying a multi-node Kafka cluster, make sure that each node can resolve the hostname of every other node in the cluster. One way to achieve this is to edit your `/etc/hosts` file:

```ini
192.168.1.15 kafka1
192.168.1.16 kafka2
192.168.1.17 kafka3
```
Be aware that in some Linux distributions, the hosts file may contain a line that by default resolves the hostname to the localhost address, `127.0.0.1`. This will cause servers to listen only on the localhost address, and therefore not be accessible to other hosts on the network. In this case, change the line:

```ini
127.0.1.1 kafka1 kafka1
```

updating the IP address to the public address of the host.
To configure the properties for ZooKeeper, edit the `config/zookeeper.properties` file with the following options:

```ini
dataDir=/kafka/zookeeper
clientPort=2181
maxClientCnxns=0
admin.enableServer=false
server.1=kafka1:2888:3888
server.2=kafka2:2888:3888
server.3=kafka3:2888:3888
4lw.commands.whitelist=*
tickTime=2000
initLimit=5
syncLimit=2
```
The `server.1`, `server.2` and `server.3` lines configure the hostname and host-to-host ports used to communicate. These numbers must match the `broker.id` Kafka configuration and the `myid` file value on the corresponding host. The last three lines are required by ZooKeeper in multi-node configurations to set the timing interval for communicating with the other hosts and the time limit before reporting an error.
Tip
This file can be copied to each node running ZooKeeper, as there are no node-specific configuration settings.
Set the node ID for ZooKeeper:

```shell
$ mkdir -p /kafka/zookeeper
$ echo 1 > /kafka/zookeeper/myid
$ chown -R kafka:kafka /kafka/zookeeper
```
Important

The number in `myid` must be unique on each host, and must match the `broker.id` configured for Kafka.

Create a service file for ZooKeeper so that it will run as a system service and be automatically managed to keep running.
Create the file `/etc/systemd/system/zookeeper.service` and add the following lines:

```ini
[Unit]
Description=Apache ZooKeeper

[Service]
Type=simple
User=kafka
LimitNOFILE=800000
Environment="LOG_DIR=/var/log/zookeeper"
Environment="GC_LOG_ENABLED=true"
Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
Restart=on-failure
TimeoutSec=900

[Install]
WantedBy=multi-user.target
```
Now start the service:

```shell
$ systemctl start zookeeper
```
You can check if the service is running by using the `status` option:

```shell
$ systemctl status zookeeper
```

You should get output similar to the following, showing `active (running)` if the service is OK:

```
zookeeper.service
     Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-03-07 05:31:36 GMT; 1s ago
   Main PID: 4968 (java)
      Tasks: 16 (limit: 1083)
     Memory: 24.6M
        CPU: 1.756s
     CGroup: /system.slice/zookeeper.service
             └─4968 java -Xms512M -Xmx4G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true "-Xlog:gc*:file=/var/log/zookeeper/zookeeper-gc.log:time,tags:filecount=10,filesize=100M" -Dcom.sun.management.>

Mar 07 05:31:36 kafka1 systemd[1]: Started zookeeper.service.
```
The status output should report any issues, which you should address before starting the service again. If everything is OK, enable the service so that it will always start on boot:

```shell
$ systemctl enable zookeeper
```
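Because the `zookeeper.properties` shown earlier whitelists the four-letter-word admin commands (`4lw.commands.whitelist=*`), you can also run a quick functional check against the client port. This assumes the `nc` (netcat) utility is installed:

```shell
$ echo srvr | nc localhost 2181
```

The `srvr` command returns the ZooKeeper version, latency statistics, and the node's mode (standalone, leader, or follower).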
Important

If you are running a multi-node service, you must repeat this process on each node, remembering to ensure that you have a different node ID number in each `myid` file.

Now create a service for Kafka. This file is slightly different because a dependency is added so that the system will start ZooKeeper first, if it is not already running, before trying to start Kafka.
Create the file `/etc/systemd/system/kafka.service` and add the following lines:

```ini
[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
LimitNOFILE=800000
Environment="LOG_DIR=/var/log/kafka"
Environment="GC_LOG_ENABLED=true"
Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
Restart=on-failure
TimeoutSec=900

[Install]
WantedBy=multi-user.target
```
Now start, check, and enable the Kafka service:

```shell
$ systemctl start kafka
$ systemctl status kafka
$ systemctl enable kafka
```
You will need to repeat this on each host in a multi-node deployment.
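Once all nodes are up, you can run a smoke test by creating a replicated test topic and listing it. This is a sketch: the topic name `smoke-test` is arbitrary, and `kafka1:9092` assumes the hostnames from the `/etc/hosts` example above and Kafka's default listener port:

```shell
$ /opt/kafka/bin/kafka-topics.sh --create --topic smoke-test \
    --bootstrap-server kafka1:9092 --replication-factor 3 --partitions 3
$ /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server kafka1:9092
```

If the topic can be created with a replication factor of 3 and appears in the listing, all three brokers are registered and communicating.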