Kafka Configuration

Humio uses Apache Kafka internally for queuing incoming messages and for storing shared state when running Humio in a cluster setup. This page describes how to configure Kafka. For information on how Humio uses Kafka and how to install it, see the Kafka Usage & Installation page in the documentation.

Humio has built-in API endpoints for controlling Kafka. Using the API, it is possible to set the number of partitions and the replication factor on the ingest queue. It is also possible to use standard Kafka tools, such as the command-line tools included in the Kafka distribution.
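
For example, the current number of partitions and the replication factor of the ingest queue can be inspected with the standard Kafka tools (the topic name humio-ingest matches the examples below):

shell
<kafka_dir>/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --describe --topic humio-ingest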

Note

Make sure not to apply compression inside Kafka to the queues below. Humio compresses messages itself when relevant; letting Kafka compress them again slows down the system, and can also cause GC problems because Kafka uses JNI when LZ4 is applied. Setting compression.type to producer on these queues is recommended.
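
The recommended setting can be applied with the same kafka-configs.sh tool used in the retention examples below, here sketched for the ingest queue:

shell
<kafka_dir>/bin/kafka-configs.sh --zookeeper $ZOOKEEPER_HOST:2181 --entity-name humio-ingest --entity-type topics --alter --add-config compression.type=producer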

Topic Management

Kafka can be used in two modes: either Humio manages its Kafka topics (the default), or it does not. When managing, Humio creates the topics if they do not exist, and also inspects and manages their configurations. When not managing, Humio neither creates topics nor changes their configurations; you must then create and properly configure the topics listed in the Topics section in Kafka yourself.

To disable Humio's automatic management and manage the topics manually, set the configuration flag KAFKA_MANAGED_BY_HUMIO to false.
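
As a minimal sketch (where exactly the environment variable is set depends on your installation, for example an env file passed to Docker or a systemd unit):

shell
KAFKA_MANAGED_BY_HUMIO=false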

Other Properties

It is possible to add extra Kafka configuration properties to Humio's Kafka consumers and producers by pointing to a properties file using EXTRA_KAFKA_CONFIGS_FILE. This enables Humio, for example, to connect to a Kafka cluster using SSL and SASL. Remember to map the configuration file into the container if you run Humio in Docker.
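
A minimal sketch of such a properties file, assuming SASL over SSL with the PLAIN mechanism (the mechanism, paths, and credentials below are placeholders; use the values your Kafka cluster requires):

ini
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="humio" password="changeme";
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=changeme

Then point EXTRA_KAFKA_CONFIGS_FILE at the file, for example EXTRA_KAFKA_CONFIGS_FILE=/path/to/kafka.properties.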

Retention Settings

Show the ingest queue configuration. (This only shows properties set specifically for the topic, not the defaults specified in kafka.properties.)

shell
<kafka_dir>/bin/kafka-configs.sh --zookeeper $ZOOKEEPER_HOST:2181 --entity-name humio-ingest --entity-type topics --describe

Set retention on the ingest queue to 7 days (7 × 24 × 60 × 60 × 1000 ms = 604800000 ms).

shell
<kafka_dir>/bin/kafka-configs.sh --zookeeper $ZOOKEEPER_HOST:2181 --entity-name humio-ingest --entity-type topics --alter --add-config retention.ms=604800000

Set retention on the ingest queue to 1 GB (per partition).

shell
<kafka_dir>/bin/kafka-configs.sh --zookeeper $ZOOKEEPER_HOST:2181 --entity-name humio-ingest --entity-type topics --alter --add-config retention.bytes=1073741824

Note

The setting retention.bytes is per partition. By default, Humio has 24 partitions for ingest, so the example above allows roughly 24 GB of ingest data in total.

Broker Settings

If you use the Kafka brokers only for Humio, you can configure the brokers to allow large messages on all topics. This example allows up to 100 MB in each message. Note that larger message sizes require the brokers to use more memory for replication.

ini
# max message size for all topics by default:
message.max.bytes=104857600
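
If the brokers are shared with other applications, the higher limit can instead be applied per topic via the topic-level max.message.bytes setting, here sketched for the ingest queue:

shell
<kafka_dir>/bin/kafka-configs.sh --zookeeper $ZOOKEEPER_HOST:2181 --entity-name humio-ingest --entity-type topics --alter --add-config max.message.bytes=104857600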

Sample Kafka Configuration

It is important to set log.dirs to the location where Kafka should store its data. Without this setting, Kafka defaults to /tmp/kafka-logs, which is very likely NOT where you want it. Note that this directory holds the actual Kafka data, not the debug log.

ini
############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

############################# Socket Server Settings #############################

listeners=PLAINTEXT://localhost:9092
# keep the compression codec set by the producer (Humio compresses messages itself; see the note above)
compression.type=producer

############################# Log Basics #############################

# A comma-separated list of directories under which to store log files
log.dirs=/data/kafka-data

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion
log.retention.hours=48

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1000073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

# Do not create topics automatically; Humio creates the topics it needs
auto.create.topics.enable=false
# Never elect an out-of-sync replica as leader, to avoid losing acknowledged data
unclean.leader.election.enable=false

############################# Zookeeper #############################
zookeeper.connect=localhost:2181
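
With a file like the above in place, the broker is started by pointing the standard start script at it (paths depend on your installation):

shell
<kafka_dir>/bin/kafka-server-start.sh <kafka_dir>/config/server.properties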

Sample Zookeeper Configuration

ini
# the directory where the snapshot is stored.
dataDir=/data/zookeeper-data
# the port at which the clients will connect
clientPort=2181
# the address the client port binds to
clientPortAddress=localhost
# the basic time unit in milliseconds
tickTime=2000
# time (in ticks) allowed for followers to connect and sync to the leader
initLimit=5
# time (in ticks) a follower may lag behind the leader
syncLimit=2
# purge old snapshots and transaction logs every hour
autopurge.purgeInterval=1
# disable the ZooKeeper AdminServer (default port 8080)
admin.enableServer=false
# allow all four-letter-word commands (e.g. ruok) for monitoring
4lw.commands.whitelist=*
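
As with the broker, ZooKeeper can be started with the script bundled in the Kafka distribution (paths depend on your installation):

shell
<kafka_dir>/bin/zookeeper-server-start.sh <kafka_dir>/config/zookeeper.properties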