Deploying a Kafka Cluster using Containers

There are many different Kafka distributions available, for example Strimzi or Apache Kafka.

The instructions below use Strimzi and are provided as a simplified guide, not as configuration for a production system.

Important

Kafka and LogScale should be run on different instances and hardware. This will help ensure performance for both systems when ingesting data.

For production, a Kafka cluster should have at least six instances (pods/nodes), consisting of three brokers and three controllers. An HA cluster also requires a minimum of three availability zones, and latency between pods must be low (<300ms).
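
With Strimzi, one way to help satisfy the availability-zone requirement is Kafka rack awareness, which Strimzi maps to a Kubernetes topology label so broker pods are spread across zones. Below is a minimal sketch of the relevant fragment, assuming the standard zone label is present on your worker nodes:

yaml
spec:
  kafka:
    # Spread broker pods across zones using each node's zone label
    rack:
      topologyKey: topology.kubernetes.io/zone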

When configuring Kafka, each Kafka host must have its own unique node ID. For example, in a three-node cluster:

Host     Kafka node ID
kafka1   1
kafka2   2
kafka3   3

When using Kafka in KRaft mode, you need to configure:

  • A node.id parameter, or the KAFKA_NODE_ID environment variable in Docker, to set the node ID.
  • A unique cluster ID, shared by every node (see the sketch below).

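As a sketch of both settings outside Kubernetes (when using Strimzi, the operator handles them for you), using the kafka-storage.sh tool shipped with Apache Kafka and the apache/kafka Docker image; the host and container names are placeholders:

shell
# Generate one cluster ID; every node in the cluster must use the same value
CLUSTER_ID=$(bin/kafka-storage.sh random-uuid)

# On each node, format its storage directory with that shared cluster ID
bin/kafka-storage.sh format -t "$CLUSTER_ID" -c config/kraft/server.properties

# Or, with the apache/kafka Docker image, pass both IDs as environment variables
# (listener and quorum settings omitted for brevity)
docker run -d --name kafka1 \
  -e KAFKA_NODE_ID=1 \
  -e CLUSTER_ID="$CLUSTER_ID" \
  apache/kafka:latest
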
When configuring each node, ensure that the listener host and port are accessible to other hosts and that the LogScale instances can reach them over the network. The default Kafka ports should be open and accessible between Docker containers. If in doubt, refer to the Kafka documentation.
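
To confirm that a broker is reachable over the network, you can probe its listener port directly; a sketch using the default broker port 9092 and the placeholder host kafka1 from the table above:

shell
# Verify that the broker's listener port is reachable from this host
nc -vz kafka1 9092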

Installation of Kafka/KRaft using Strimzi

This section shows you how to install Kafka in KRaft mode on Kubernetes using Strimzi.

  1. Ensure the prerequisites are in place (the commands after this list verify each one):

    • Functioning Kubernetes cluster

    • Helm installed

    • kubectl configured

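    Each prerequisite can be checked from the command line; a quick sketch:

    shell
    # Confirm cluster access, Helm, and kubectl are all working
    kubectl cluster-info
    helm version
    kubectl version --client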
  2. Install the Strimzi operator:

    shell
    # Add Strimzi Helm repository
    helm repo add strimzi https://strimzi.io/charts/
    helm repo update
    
    # Install Strimzi operator
    helm install strimzi strimzi/strimzi-kafka-operator -n <namespace>
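
    Before continuing, check that the operator is running; a sketch assuming the Helm chart's default deployment name:

    shell
    # The operator deployment should show ready replicas
    kubectl get deployment strimzi-cluster-operator -n <namespace>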
  3. Create the Kafka cluster:

    yaml
    # Node pool for the three dedicated controller nodes
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    metadata:
      name: controller
      labels:
        strimzi.io/cluster: logscale
    spec:
      replicas: 3  # These are the controller nodes
      roles:
        - controller
      storage:
        type: persistent-claim
        size: 20Gi
    ---
    # Node pool for the three dedicated broker nodes
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    metadata:
      name: broker
      labels:
        strimzi.io/cluster: logscale
    spec:
      replicas: 3  # These are the broker nodes
      roles:
        - broker
      storage:
        type: persistent-claim
        size: 100Gi
    ---
    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: logscale
      annotations:
        strimzi.io/node-pools: enabled
        strimzi.io/kraft: enabled
    spec:
      kafka:
        version: 3.5.0  # Use a Kafka version supported by your Strimzi operator
        listeners:
          - name: plain
            port: 9092
            type: internal
            tls: false
        config:
          offsets.topic.replication.factor: 3
          transaction.state.log.replication.factor: 3
          transaction.state.log.min.isr: 2
        # Strimzi manages the KRaft settings (process.roles, node.id,
        # controller.quorum.voters, and the controller listener) automatically.

    Save this configuration as kafka-kraft-cluster.yaml for the next step.

  4. Apply the configuration:

    shell
    kubectl apply -f kafka-kraft-cluster.yaml -n <namespace>
  5. Verify the installation:

    shell
    kubectl get kafka -n <namespace>
    kubectl get pods -n <namespace>
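
    To block until the cluster is fully up, kubectl wait can watch the Kafka resource's Ready condition; a sketch assuming the cluster name logscale used above:

    shell
    # Wait for Strimzi to report the Kafka cluster as Ready
    kubectl wait kafka/logscale --for=condition=Ready --timeout=300s -n <namespace>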