Install Kafka

We recommend a minimum 3-node installation for Kafka. A single-node test deployment requires only one node, but provides no high availability.

When deploying Kafka with ZooKeeper from the tarball installation, you need to configure the two services with a unique host ID number:

  • For Kafka this is the broker.id configuration value in the server.properties file:

    ini
    # The id of the broker. This must be set to a unique integer for each broker.
    broker.id=1
  • For ZooKeeper this is a file called myid in the ZooKeeper data directory that contains the node ID number. You can create it using:

    shell
    $ echo 1 > /kafka/zookeeper/myid

When creating a multi-node Kafka cluster these numbers must be unique for each host:

Host     Kafka broker.id   ZooKeeper myid
kafka1   1                 1
kafka2   2                 2
kafka3   3                 3
Server Preparation

We recommend installing on Ubuntu, at least version 18.04. Before installing Kafka, make sure the server is up-to-date. If you haven't already done this, you can upgrade the system with apt-get like so:

shell
$ apt-get update
$ apt-get upgrade

Next, create a non-administrative user named kafka to run Kafka. You can do this by executing the following from the command line:

shell
$ adduser kafka --shell=/bin/false --no-create-home --system --group

You should add this user to the DenyUsers section of your node's /etc/ssh/sshd_config file to prevent it from logging in over ssh or sftp. Remember to restart the sshd daemon after making the change. Once the system has finished updating and the user has been created, you can install Kafka.
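For example, assuming sshd_config does not already contain a DenyUsers line, you could run:

```shell
$ echo 'DenyUsers kafka' >> /etc/ssh/sshd_config
$ systemctl restart sshd
```

If a DenyUsers line already exists, append kafka to that line instead of adding a second one.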

Installing Kafka

To install Kafka and ZooKeeper:

  1. Go to the /opt directory and download the latest release. You can do that using wget:

    shell
    $ cd /opt
    $ wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
  2. Extract the archive:

    shell
    $ tar zxf kafka_2.13-3.7.0.tgz
  3. Now create the directories where the data will be stored. We will use the top-level directory /kafka, since that could be a mount point for a separate filesystem. We will also create directories for application log files in /var/log:

    shell
    $ mkdir /var/log/kafka
    $ mkdir /var/log/zookeeper
    $ mkdir -p /kafka/kafka
    $ mkdir -p /kafka/zookeeper
    $ chown kafka:kafka /var/log/kafka /var/log/zookeeper
    $ chown kafka:kafka /kafka/kafka /kafka/zookeeper

    Now link the application directory to /opt/kafka. This allows us to use /opt/kafka for the application and scripts, and to upgrade later by downloading the new version and relinking /opt/kafka to the updated application directory:

    shell
    $ ln -s /opt/kafka_2.13-3.7.0 /opt/kafka
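    When a new release comes out, the upgrade is just a download, extract, and relink. For example (the 3.8.0 version number shown here is hypothetical):

```shell
$ cd /opt
$ wget https://downloads.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz  # hypothetical future release
$ tar zxf kafka_2.13-3.8.0.tgz
# -f replaces the existing link; -n treats the existing link as a file rather than following it
$ ln -sfn /opt/kafka_2.13-3.8.0 /opt/kafka
```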
  4. Using a text editor, open the Kafka properties file, server.properties, located in the kafka/config sub-directory. You'll need to set a few options; note that the lines below are not necessarily in the order in which they appear in the configuration file:

    ini
    broker.id=1
    log.dirs=/kafka/kafka
    delete.topic.enable=true

    The first line sets the broker.id value to match the server number (in the myid file) you set when configuring ZooKeeper. The second sets the data directory. The third line should be added to the end of the configuration file. When you're finished, save the file and change the owner to the kafka user:

    shell
    $ chown -R kafka:kafka /opt/kafka_2.13-3.7.0

    You'll have to adjust this to the version you installed. Note that changing the ownership of the link /opt/kafka doesn't change the ownership of the files in the directory it points to.

  5. If you are deploying a multi-node Kafka cluster, make sure that each node can resolve the hostname of every other node in the cluster. One way to achieve this is to edit your /etc/hosts file:

    ini
    192.168.1.15 kafka1
    192.168.1.16 kafka2
    192.168.1.17 kafka3

    Be aware that in some Linux distributions, the hosts file may contain a line that by default resolves the hostname to a loopback address such as 127.0.1.1. This causes the server to listen only on the loopback address, making it inaccessible to other hosts on the network. In this case, change the line:

    ini
    127.0.1.1 kafka1 kafka1

    updating the IP address to the public address of the host.
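    You can confirm that each hostname resolves to a network address rather than a loopback address with getent, which consults /etc/hosts the same way the services do:

```shell
$ getent hosts kafka1 kafka2 kafka3
# Each hostname should resolve to its 192.168.x.x address, not 127.x.x.x
```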

  6. To configure the properties for ZooKeeper, edit the config/zookeeper.properties file with the following options:

    ini
    dataDir=/kafka/zookeeper
    clientPort=2181
    maxClientCnxns=0
    admin.enableServer=false
    server.1=kafka1:2888:3888
    server.2=kafka2:2888:3888
    server.3=kafka3:2888:3888
    4lw.commands.whitelist=*
    tickTime=2000
    initLimit=5
    syncLimit=2

    The server.1, server.2, and server.3 entries configure each host's name and the host-to-host ports used for quorum communication. The number in each entry must match that host's myid file value and its Kafka broker.id setting.

    The last three lines (tickTime, initLimit, and syncLimit) are required by ZooKeeper in multi-node configurations; they set the basic timing interval for communicating with the other hosts and the time limits allowed before an error is reported.

    Tip

    This file can be copied to each node running ZooKeeper, as there are no node-specific configuration settings.

  7. Set the node id for ZooKeeper:

    shell
    $ mkdir -p /kafka/zookeeper
    $ echo 1 > /kafka/zookeeper/myid
    $ chown -R kafka:kafka /kafka/zookeeper

    Important

    The number in myid must be unique on each host, and match the broker.id configured for Kafka.

  8. Create a service file for ZooKeeper so that it will run as a system service and be automatically restarted to keep it running.

    Create the file /etc/systemd/system/zookeeper.service and add the following lines:

    ini
    [Unit]
    
    [Service]
    Type=simple
    User=kafka
    LimitNOFILE=800000
    Environment="LOG_DIR=/var/log/zookeeper"
    Environment="GC_LOG_ENABLED=true"
    Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
    ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
    Restart=on-failure
    TimeoutSec=900
    
    [Install]
    WantedBy=multi-user.target
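    Because this is a newly created unit file, tell systemd to re-read its configuration before using it:

```shell
$ systemctl daemon-reload
```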

    Now start the service:

    shell
    $ systemctl start zookeeper

    You can check if the service is running by using the status option:

    shell
    $ systemctl status zookeeper

    You should get output similar to the following showing active (running) if the service is OK:

    zookeeper.service
         Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: enabled)
         Active: active (running) since Thu 2024-03-07 05:31:36 GMT; 1s ago
       Main PID: 4968 (java)
          Tasks: 16 (limit: 1083)
         Memory: 24.6M
            CPU: 1.756s
         CGroup: /system.slice/zookeeper.service
                 └─4968 java -Xms512M -Xmx4G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true "-Xlog:gc*:file=/var/log/zookeeper/zookeeper-gc.log:time,tags:filecount=10,filesize=100M" -Dcom.sun.management.>
    
    Mar 07 05:31:36 kafka1 systemd[1]: Started zookeeper.service.

    The status output should report any issues, which you should address before starting the service again. If everything is OK, enable the service so that it will always start on boot:

    shell
    $ systemctl enable zookeeper
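    Since the configuration above whitelists the four-letter admin commands (4lw.commands.whitelist=*), you can also query the running server directly with the srvr command; this assumes the nc (netcat) utility is installed:

```shell
$ echo srvr | nc localhost 2181
# The output includes a Mode: line, which reads "standalone" on a single node,
# or "leader"/"follower" once a multi-node quorum has formed.
```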

    Important

    If you are running a multi-node service, you must repeat this process on each node, remembering to ensure that you have a different node ID number in each myid file.

  9. Now create a service for Kafka. This file is slightly different because it adds a dependency: the system will start ZooKeeper first, if it is not already running, before trying to start Kafka.

    Create the file /etc/systemd/system/kafka.service and add the following lines:

    ini
    [Unit]
    Requires=zookeeper.service
    After=zookeeper.service
    
    [Service]
    Type=simple
    User=kafka
    LimitNOFILE=800000
    Environment="LOG_DIR=/var/log/kafka"
    Environment="GC_LOG_ENABLED=true"
    Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
    ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
    Restart=on-failure
    TimeoutSec=900
    
    [Install]
    WantedBy=multi-user.target
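    As with the ZooKeeper unit, have systemd re-read its configuration so that it picks up the new file:

```shell
$ systemctl daemon-reload
```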
  10. Now start the Kafka service:

    shell
    $ systemctl start kafka
    $ systemctl status kafka
    $ systemctl enable kafka

    You will need to repeat this on each host in a multi-node deployment.
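    Once every node is running, you can smoke-test the cluster end to end by creating, describing, and deleting a topic. The topic name smoke-test is arbitrary; on a single-node test deployment, use --partitions 1 --replication-factor 1 instead:

```shell
$ /opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka1:9092 \
      --create --topic smoke-test --partitions 3 --replication-factor 3
$ /opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka1:9092 --describe --topic smoke-test
$ /opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka1:9092 --delete --topic smoke-test
```

    The delete step works because delete.topic.enable=true was set in server.properties.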