Kafka Usage
LogScale uses Apache Kafka internally for queuing incoming messages and for storing shared state when running LogScale in a cluster setup. This page describes how LogScale uses Kafka. If you already understand Kafka concepts, you can skip this and go to the instructions on how to install Kafka, further down this page.
For more information on Kafka configuration and settings, see Kafka Configuration.
How LogScale Uses Kafka
LogScale creates the following queues in Kafka: global-events, humio-ingest, and transientChatter-events.
You can set the environment variable HUMIO_KAFKA_TOPIC_PREFIX to add that prefix to the topic names in Kafka. Adding a prefix is recommended if you share the Kafka installation with applications other than LogScale, or with another LogScale instance. The default is not to add a prefix.
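As a rough illustration of the effect, the sketch below prints the topic names that would result for a given prefix. It assumes the prefix is prepended verbatim to each name, with no separator added. The `prefixed_topic` helper is hypothetical, and the topic names are the ones used on this page.

```shell
#!/bin/sh
# Hypothetical helper: prepend an optional prefix to a topic name.
# Assumption: the prefix is joined to the name verbatim, with no separator.
prefixed_topic() {
    prefix="$1"
    topic="$2"
    printf '%s%s\n' "$prefix" "$topic"
}

# Use the prefix from the environment, or no prefix if it is unset.
HUMIO_KAFKA_TOPIC_PREFIX="${HUMIO_KAFKA_TOPIC_PREFIX:-}"

for topic in global-events humio-ingest transientChatter-events; do
    prefixed_topic "$HUMIO_KAFKA_TOPIC_PREFIX" "$topic"
done
```

With the variable unset, the loop prints the three unprefixed names.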
LogScale configures default retention settings on the topics when it creates them. If they exist already, LogScale does not alter retention settings on the topics.
If you wish to inspect or change the topic configurations, such as the retention settings, to match the disk space available for Kafka, use the kafka-configs command. See below for an example, modifying the retention on the ingest queue to keep bursts of data for up to one hour only.
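A minimal sketch of such a change, assuming the same kafka-configs.sh invocation style and ZooKeeper address (localhost:2181) as the commands at the end of this page; adjust both to your installation. The `retention_cmd` helper is hypothetical and only prints the command, so it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a kafka-configs.sh command that sets
# time-based retention on a topic. Arguments: topic name, retention in hours.
retention_cmd() {
    topic="$1"
    hours="$2"
    retention_ms=$((hours * 60 * 60 * 1000))   # hours -> milliseconds
    printf "kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name '%s' --add-config 'retention.ms=%s'\n" "$topic" "$retention_ms"
}

# Keep bursts of data on the ingest queue for up to one hour.
retention_cmd humio-ingest 1
```

Printing instead of executing is a deliberate choice here: retention changes can cause Kafka to delete data, so it is worth checking the topic name and value first.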
global-events
This is LogScale's event-sourced database queue.
This queue has a relatively low throughput.
Allow messages of at least 2 MB to accommodate large events:
max.message.bytes=2097152
No log data is saved to this queue.
There should be a high number of replicas for this queue.
LogScale will raise the number of replicas on this queue to three if there are at least three brokers in the Kafka cluster and LogScale is allowed to manage the topic.
Default required replicas (provided there are three brokers when LogScale creates the topic):
min.insync.replicas = 2
Default retention configuration (1 GB):
retention.bytes = 1073741824
Time-based retention is disabled:
retention.ms = -1
Compression should be set to:
compression.type=producer
humio-ingest
Ingested events are sent to this queue, before they are stored in LogScale. LogScale's front end will accept ingest requests, parse them, and put them on the queue. LogScale's back end processes events from the queue and stores them into the datastore. This queue will have high throughput corresponding to the ingest load. The number of replicas can be configured in accordance with data size, latency and throughput requirements, and how important it is not to lose in-flight data.
LogScale defaults to two replicas on this queue if at least two brokers exist in the Kafka cluster and LogScale has not been told otherwise through the configuration parameter INGEST_QUEUE_REPLICATION_FACTOR, which defaults to 2. Once data is stored in LogScale's own datastore, it is no longer needed on the queue.
Default required replicas (provided there are enough brokers when LogScale creates the topic):
min.insync.replicas = $INGEST_QUEUE_REPLICATION_FACTOR - 1
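To make the arithmetic concrete, the sketch below derives the required in-sync replica count from the replication factor. The `min_insync_replicas` helper is hypothetical. It illustrates only the formula and does not model the broker-count condition.

```shell
#!/bin/sh
# Hypothetical helper: required in-sync replicas = replication factor - 1.
min_insync_replicas() {
    replication_factor="$1"
    echo $((replication_factor - 1))
}

# Fall back to the documented default of 2 when the variable is unset.
INGEST_QUEUE_REPLICATION_FACTOR="${INGEST_QUEUE_REPLICATION_FACTOR:-2}"
echo "min.insync.replicas = $(min_insync_replicas "$INGEST_QUEUE_REPLICATION_FACTOR")"
```

With the default replication factor of 2, this yields 1, meaning the queue tolerates the loss of one replica while still accepting writes.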
Default retention configuration (7 days as milliseconds):
retention.ms = 604800000
Set the retention configuration on the humio-ingest topic to:
retention.bytes = disk_space_in_bytes_on_one_host / partitionCount
with the actual setting based on the disk space available.
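As a worked example of this formula, the sketch below uses made-up numbers: 500 GiB of disk per Kafka host and 24 partitions. Both values and the `retention_bytes` helper are assumptions for illustration; substitute your own figures.

```shell
#!/bin/sh
# Hypothetical helper: per-partition retention in bytes, using integer division.
# Arguments: disk space in bytes on one host, partition count.
retention_bytes() {
    disk_space_in_bytes_on_one_host="$1"
    partition_count="$2"
    echo $((disk_space_in_bytes_on_one_host / partition_count))
}

disk_bytes=$((500 * 1024 * 1024 * 1024))   # 500 GiB (assumed disk size)
echo "retention.bytes = $(retention_bytes "$disk_bytes" 24)"
```

Integer division rounds down, which errs on the side of using slightly less disk than is available.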
Compression should be set to:
compression.type=producer
Allow messages of at least 8 MB to accommodate large events:
max.message.bytes=8388608
Compaction is not allowed.
transientChatter-events
This queue is used for chatter between LogScale nodes. It is only used for transient data. LogScale will raise the number of replicas on this queue to 3 if there are at least three brokers in the Kafka cluster. The queue can have a short retention and it is not important to keep the data, as it gets stale very fast.
Default required replicas (provided there are three brokers when LogScale creates the topic):
min.insync.replicas = 2
Default retention configuration (one hour as milliseconds):
retention.ms = 3600000
Compression should be set to:
compression.type=producer
Support compaction settings allowing Kafka to retain only the latest copy:
cleanup.policy=delete,compact
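When several of these settings are applied in one kafka-configs.sh call, a value that itself contains a comma has to be wrapped in square brackets inside --add-config, because commas otherwise separate the key=value pairs. The sketch below only prints such a command. The `chatter_config_cmd` helper is hypothetical, and the ZooKeeper address is taken from the commands at the end of this page.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a kafka-configs.sh command that applies
# the retention, compression, and cleanup settings listed above to a topic.
chatter_config_cmd() {
    topic="$1"
    # Square brackets group the comma-separated cleanup.policy value.
    configs='retention.ms=3600000,compression.type=producer,cleanup.policy=[delete,compact]'
    printf "kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name '%s' --add-config '%s'\n" "$topic" "$configs"
}

chatter_config_cmd transientChatter-events
```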
Kafka Version
LogScale recommends that the latest Kafka version is used with your LogScale deployment. The latest version of Kafka is available at Kafka Downloads.
Note
The minimum supported Kafka version is 2.4.1. LogScale is usually tested against the latest Kafka version.
You can set the configuration for individual topics using the following commands:
## Example commands for setting protocol version on topic...
# See current config for topic, if any:
kafka-configs.sh --zookeeper localhost:2181 --describe --entity-type topics --entity-name 'humio-ingest'
# Set protocol version for topic:
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name 'humio-ingest' --add-config 'message.format.version=0.11.0'
# Remove the setting so that the topic uses the broker default:
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name 'humio-ingest' --delete-config 'message.format.version'