Kafka Prerequisites

LogScale requires low-latency access to a Kafka cluster to operate optimally; you must have a Kafka cluster available before deploying LogScale.

Similar non-Kafka systems, such as Google Pub/Sub or Azure Event Hubs, are not supported.

Ping times of under 50 milliseconds from the LogScale pods to the Kafka cluster help ensure that data is ingested quickly and is available for search in less than a second.

In its default configuration, LogScale automatically creates the Kafka topics and partitions required to operate. This behavior can be disabled by setting KAFKA_MANAGED_BY_HUMIO to false.
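For example, when LogScale is deployed with the Humio Operator, the setting can be passed through the HumioCluster resource's environment variables. The following is a minimal sketch; the cluster name is hypothetical, and with this setting the required topics must be created in Kafka manually.

```yaml
# Sketch: disabling LogScale's automatic topic management via a HumioCluster
# resource managed by the Humio Operator. The cluster name is hypothetical;
# with KAFKA_MANAGED_BY_HUMIO=false the required topics must be created manually.
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-logscale
spec:
  environmentVariables:
    - name: KAFKA_MANAGED_BY_HUMIO
      value: "false"
```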

Running Kafka on Kubernetes can be accomplished in a variety of ways. The Strimzi Kafka Operator is one such way and uses the same operator pattern the Humio Operator uses to manage the life cycles of both Kafka and ZooKeeper nodes. In production setups, LogScale, Kafka, and ZooKeeper should run on separate worker nodes in the cluster. Both Kafka and ZooKeeper must use persistent volumes provided by the Kubernetes environment; they should not use ephemeral disks in a production deployment. Kafka brokers and ZooKeeper instances should be deployed on worker nodes in different locations, such as racks, data centers, or availability zones, to ensure reliability. Kubernetes operators such as the Strimzi operator do not create worker nodes or label them; that task is left to the administrators.
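As an illustration, the affinity and rack settings in the manifest below assume that the dedicated worker nodes carry a kubernetes_worker_node_group label and the standard topology.kubernetes.io/zone label. The following sketch shows the expected labels; the node names and zone value are hypothetical, and administrators typically apply such labels with `kubectl label node`.

```yaml
# Sketch of the node labels assumed by the affinity and rack settings in the
# manifest below. Node names and the zone value are hypothetical; administrators
# usually apply these labels with "kubectl label node <name> <key>=<value>".
apiVersion: v1
kind: Node
metadata:
  name: kafka-worker-a
  labels:
    kubernetes_worker_node_group: kafka-workers
    topology.kubernetes.io/zone: us-east-1a
---
apiVersion: v1
kind: Node
metadata:
  name: zookeeper-worker-a
  labels:
    kubernetes_worker_node_group: zookeeper-workers
    topology.kubernetes.io/zone: us-east-1a
```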

The following YAML represents a basic Kafka cluster with three brokers, each running on a worker node with 16 cores and 64 GB of memory. Topics are created with three replicas by default and the minimum in-sync replicas set to two. This allows brokers to be upgraded with no downtime, since at least two brokers always hold each partition's data.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-basic-cluster-1
spec:
  kafka:
    version: 3.4.0
    replicas: 3
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.4"
    rack:
      topologyKey: topology.kubernetes.io/zone
    resources:
      requests:
        memory: 58Gi
        cpu: "8"
      limits:
        memory: 58Gi
        cpu: "14"   
    template:
      pod:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes_worker_node_group
                  operator: In
                  values:
                  - kafka-workers
                - key: kubernetes.io/arch
                  operator: In
                  values:
                  - amd64
                - key: kubernetes.io/os
                  operator: In
                  values:
                  - linux
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - kafka
              topologyKey: kubernetes.io/hostname
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 500Gi
        deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
    resources:
      requests:
        memory: 8Gi
        cpu: "2"
      limits:
        memory: 8Gi
        cpu: "2"  
    template:
      pod:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes_worker_node_group
                  operator: In
                  values:
                  - zookeeper-workers
                - key: kubernetes.io/arch
                  operator: In
                  values:
                  - amd64
                - key: kubernetes.io/os
                  operator: In
                  values:
                  - linux
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - zookeeper
              topologyKey: kubernetes.io/hostname
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

For more general information on configuring Strimzi, please see the Strimzi documentation.
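As a final illustration, LogScale can then be pointed at this Kafka cluster through its KAFKA_SERVERS setting; Strimzi exposes a bootstrap service named `<cluster-name>-kafka-bootstrap`. The namespace below is an assumption, and because the example listener uses TLS, LogScale must additionally be configured to trust the Strimzi cluster CA.

```yaml
# Sketch: pointing a HumioCluster at the Strimzi cluster defined above.
# The "kafka" namespace is an assumption; Strimzi creates the
# kafka-basic-cluster-1-kafka-bootstrap service automatically. Because the
# listener on port 9093 uses TLS, LogScale also needs Kafka client properties
# that trust the Strimzi cluster CA (not shown here).
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-logscale
spec:
  environmentVariables:
    - name: KAFKA_SERVERS
      value: "kafka-basic-cluster-1-kafka-bootstrap.kafka.svc:9093"
```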