Kafka Usage & Installation
Humio uses Apache Kafka internally for queuing incoming messages and for storing shared state when running Humio in a cluster setup. This page describes how Humio uses Kafka. If you already understand Kafka concepts, you can skip this and go to the instructions on how to Install Kafka further down this page.
For more information on Kafka configuration and settings, see Kafka Configuration.
How Humio Uses Kafka
Humio creates the following queues in Kafka:
You can set the environment variable HUMIO_KAFKA_TOPIC_PREFIX to add a prefix to the topic names in Kafka. Adding a prefix is recommended if you share the Kafka installation with applications other than Humio, or with another Humio instance. The default is not to add a prefix.
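For example, a prefix could be set in the environment Humio is started with. This is only a sketch: the prefix value humio-a- is an illustration, and you should set the variable wherever you define Humio's environment variables:
# Prefix all Kafka topic names created by this Humio instance:
HUMIO_KAFKA_TOPIC_PREFIX=humio-a-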
Humio configures default retention settings on the topics when it creates them. If they exist already, Humio does not alter retention settings on the topics.
If you wish to inspect and change the topic configurations, such as the retention settings, to match the disk space available for Kafka, please use the kafka-configs command. See below for an example that modifies the retention on the ingest queue to keep bursts of data for up to one hour only.
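A minimal sketch of such a change, in the same kafka-configs style as the examples further down this page (the topic name follows those examples; adjust it if you have configured a topic prefix):
# Limit the ingest queue to one hour of retention:
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name 'humio-ingest' --add-config 'retention.ms=3600000'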
global-events
This is Humio's event-sourced database queue.
This queue has a relatively low throughput.
Allow messages of at least 2 MB: max.message.bytes=2097152 (or more) to allow large events. No log data is saved to this queue.
There should be a high number of replicas for this queue. Humio will raise the number of replicas on this queue to three if there are at least three brokers in the Kafka cluster and Humio is allowed to manage the topic.
Default required replicas: min.insync.replicas = 2 (provided there are three brokers when Humio creates the topic).
Default retention configuration: retention.bytes = 1073741824 (1 GB) and retention.ms = -1 (to disable time-based retention).
Compression should be set to: compression.type=producer
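If you manage the topic configuration yourself, the settings above could also be applied with kafka-configs. A sketch, assuming no topic prefix is configured:
# Apply message size and retention settings to the global-events topic:
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name 'global-events' --add-config 'max.message.bytes=2097152,retention.bytes=1073741824,retention.ms=-1'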
kafka-humio-ingest
Ingested events are sent to this queue before they are stored in Humio. Humio's front end accepts ingest requests, parses them, and puts them on the queue. Humio's back end processes events from the queue and stores them in the datastore. This queue will have high throughput corresponding to the ingest load. The number of replicas can be configured in accordance with data size, latency and throughput requirements, and how important it is not to lose in-flight data. Humio defaults to two replicas on this queue if at least two brokers exist in the Kafka cluster and Humio has not been told otherwise through the configuration parameter INGEST_QUEUE_REPLICATION_FACTOR, which defaults to 2 (a configuration sketch follows this list). Once data is stored in Humio's own datastore, it is no longer needed on the queue.
Default required replicas: min.insync.replicas = $INGEST_QUEUE_REPLICATION_FACTOR - 1 (provided there are enough brokers when Humio creates the topic).
Default retention configuration: retention.ms = 604800000 (7 days in milliseconds).
Compression should be set to: compression.type=producer
Allow messages of at least 10 MB: max.message.bytes=10485760 to allow large events.
Compaction is not allowed.
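As referenced above, INGEST_QUEUE_REPLICATION_FACTOR is an ordinary Humio environment variable. A minimal sketch, set wherever you define Humio's environment variables:
# Request three replicas for the ingest queue when Humio creates the topic:
INGEST_QUEUE_REPLICATION_FACTOR=3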
transientChatter-events
This queue is used for chatter between Humio nodes. It is only used for transient data. Humio will raise the number of replicas on this queue to three if there are at least three brokers in the Kafka cluster. The queue can have a short retention, and it is not important to keep the data, as it gets stale very fast.
Default required replicas: min.insync.replicas = 2 (provided there are three brokers when Humio creates the topic).
Default retention configuration: retention.ms = 3600000 (one hour in milliseconds).
Compression should be set to: compression.type=producer
Supports compaction settings allowing Kafka to retain only the latest copy: cleanup.policy=compact
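Once Humio has created its topics, you can verify the replica counts and configuration overrides directly in Kafka. A sketch, assuming Zookeeper runs on localhost as in the examples below:
# List all topics with their partition, replica and config details:
kafka-topics.sh --zookeeper localhost:2181 --describe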
Kafka Version
Humio is capable of running on Kafka version 2.4.1 and greater and is usually tested against the latest Kafka version.
Note
Although any version down to 2.4.1 is supported, it is strongly recommended to install the latest Kafka version possible in your environment. You can find the currently available Kafka versions on the Apache Kafka downloads page.
## Example commands for setting protocol version on topic...
# See current config for topic, if any:
kafka-configs.sh --zookeeper localhost:2181 --describe --entity-type topics --entity-name 'humio-ingest'
# Set protocol version for topic:
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name 'humio-ingest' --add-config 'message.format.version=0.11.0'
# Remove setting, allowing to use the default of the broker:
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name 'humio-ingest' --delete-config 'message.format.version'
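Depending on your Kafka version you may prefer to talk to the brokers directly rather than to Zookeeper; kafka-configs.sh also accepts a --bootstrap-server option. A sketch, assuming a broker listening on localhost:9092:
# Same describe command, addressing a broker instead of Zookeeper:
kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics --entity-name 'humio-ingest'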
Server Preparation
We recommend installing on Ubuntu, at least version 18.04. Before installing Kafka, make sure the server is up-to-date. If you haven't already done this, you can upgrade the system with apt-get like so:
apt-get update
apt-get upgrade
Next, create a non-administrative user named kafka to run Kafka. You can do this by executing the following from the command line:
adduser kafka --shell=/bin/false --no-create-home --system --group
You should add this user to the DenyUsers directive in your node's /etc/ssh/sshd_config file to prevent it from being able to ssh or sftp into the node. Remember to restart the sshd daemon after making the change, as shown in the sketch below.
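A minimal sketch of that change, assuming the default sshd_config location (on Ubuntu the OpenSSH service unit is named ssh):
# Add to /etc/ssh/sshd_config:
DenyUsers kafka
# Then restart the daemon to apply the change:
systemctl restart ssh
Once the system has finished updating and the user has been created, you can install Kafka.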
Installation
To install Kafka, you'll need to go to the /opt directory and download the latest release. You can do that with wget like so:
cd /opt
wget https://www-us.apache.org/dist/kafka/x.x.x/kafka_x.x.x.x.tgz
You would adjust this last line, changing the Xs to the latest version number. Once it downloads, untar the file and then create the directories it needs like this:
tar zxf kafka_x.x.x.x.tgz
mkdir /var/log/kafka
mkdir /var/kafka-data
chown kafka:kafka /var/log/kafka
chown kafka:kafka /var/kafka-data
ln -s /opt/kafka_x.x.x.x /opt/kafka
The four lines in the middle here create the directories for Kafka's logs and data, and change the ownership of those directories to the kafka user. The last line creates a symbolic link, /opt/kafka, pointing to the versioned directory. You would adjust that, though, replacing the Xs with the version number.
Using a simple text editor, open the Kafka properties file, server.properties, located in the kafka/config sub-directory. You'll need to set a few options; note that the lines below are not necessarily in the order in which they'll be found in the configuration file:
broker.id=1
log.dirs=/var/kafka-data
delete.topic.enable = true
The first line sets the broker.id value to match the server number (myid) you set when configuring Zookeeper. The second sets the data directory. The third line should be added to the end of the configuration file. When you're finished, save the file and change the owner to the kafka user:
chown -R kafka:kafka /opt/kafka_x.x.x.x
You'll have to adjust this to the version you installed. Note that changing the ownership of the link /opt/kafka doesn't change the ownership of the files in the directory.
Now you'll need to create a service file for starting Kafka. Use a simple text editor to create a file named kafka.service in the /etc/systemd/system/ directory. Then add the following lines to the service file:
[Unit]
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
LimitNOFILE=800000
Environment="LOG_DIR=/var/log/kafka"
Environment="GC_LOG_ENABLED=true"
Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
Restart=on-failure
[Install]
WantedBy=multi-user.target
For more information on Kafka configuration and settings, see Kafka Configuration.
Now you're ready to start the Kafka service. Enter the first line below to start it. When it finishes, enter the second line to check that it's running and there are no errors reported:
systemctl start kafka
systemctl status kafka
systemctl enable kafka
After breaking out of the status by pressing q, enter the last line above to set the Kafka service to start when the server boots up.