Prerequisites — Kafka and Zookeeper
LogScale requires low-latency access to a Kafka cluster to operate
optimally, so you must have a Kafka cluster available before deploying
LogScale. Similar non-Kafka systems, such as Google Pub/Sub or Azure
Event Hubs, are not supported. Ping times below 50 milliseconds from the
LogScale pods to the Kafka cluster help ensure that data is ingested
quickly and becomes available for search in less than a second.
In its default configuration LogScale automatically creates the Kafka
topics and partitions required to operate. This functionality can be
disabled by setting the KAFKA_MANAGED_BY_HUMIO value to false.
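For example, when the cluster is managed by the Humio Operator, this setting can be passed as an environment variable on the HumioCluster resource. The sketch below is illustrative only; the cluster name is hypothetical and the rest of the spec is omitted.

```yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: logscale            # hypothetical cluster name
spec:
  environmentVariables:
    # Disable automatic creation of Kafka topics and partitions by LogScale
    - name: KAFKA_MANAGED_BY_HUMIO
      value: "false"
```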
Running Kafka on Kubernetes can be accomplished in a variety of ways. The Strimzi Kafka Operator is one such way; it uses the same operator pattern that the Humio Operator uses to manage the life cycles of both Kafka and Zookeeper nodes. In production setups, LogScale, Kafka, and Zookeeper should run on separate worker nodes in the cluster. Both Kafka and Zookeeper must use persistent volumes provided by the Kubernetes environment; they should not use ephemeral disks in a production deployment. Kafka brokers and Zookeeper instances should be deployed on worker nodes in different locations, such as racks, data centers, or availability zones, to ensure reliability. Kubernetes operators such as the Strimzi operator do not create or label worker nodes; that task is left to the administrators.
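As a sketch of how this separation can be expressed with Strimzi, the fragment below pins Kafka brokers to dedicated worker nodes and spreads them across availability zones. It assumes the administrator has already labeled the Kafka worker nodes; the `dedicated: kafka` label is hypothetical, and only the relevant part of `spec.kafka` is shown.

```yaml
# Fragment of a Strimzi Kafka resource (spec.kafka only).
spec:
  kafka:
    rack:
      # Spread brokers across availability zones using the standard topology label
      topologyKey: topology.kubernetes.io/zone
    template:
      pod:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    # Hypothetical label applied by the administrator to Kafka worker nodes
                    - key: dedicated
                      operator: In
                      values:
                        - kafka
```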
This YAML represents a basic Kafka cluster with three brokers, each running on worker nodes with 16 cores and 64 GB of memory. The cluster is configured with a default replication factor of three and a minimum of two in-sync replicas. This allows brokers to be upgraded with no downtime, as there will always be two brokers holding the data. For more information on configuring Strimzi, see the Strimzi documentation.
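A minimal sketch of such a Kafka resource is shown below. It assumes the Strimzi kafka.strimzi.io/v1beta2 API; the cluster name, listener configuration, and storage sizes are illustrative and should be adjusted to your environment.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: logscale-kafka          # hypothetical cluster name
spec:
  kafka:
    replicas: 3                 # three brokers
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      # Replicate topics three times and require two in-sync replicas,
      # so one broker can be restarted or upgraded without downtime.
      default.replication.factor: 3
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
    resources:
      requests:
        cpu: "16"               # 16 cores per broker
        memory: 64Gi            # 64 GB of memory per broker
      limits:
        cpu: "16"
        memory: 64Gi
    storage:
      type: persistent-claim    # persistent volumes, not ephemeral disks
      size: 1Ti                 # illustrative size
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi               # illustrative size
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```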