LogScale Kubernetes Reference Architecture

Deploying LogScale on Kubernetes with the Humio Operator (see Install LogScale Operator on Kubernetes) is the recommended deployment model. The reference architecture for this type of installation is outlined below.


This guide assumes advanced Kubernetes skills. It touches on many topics that anyone running stateful software within a Kubernetes cluster is expected to know already.

To use this guide when deploying LogScale, follow these sections:

  1. Cluster topology

    Understand the basic topology and components of the LogScale cluster.

  2. Kubernetes Deployment Requirements

    Check the basic requirements needed for deployment.

  3. Deploying Prerequisites

    Configure the prerequisite elements of the deployment, such as Kafka, the Humio Operator, and block storage.

  4. Basic Architecture Configuration

    Deploy a basic architecture, where each node within the cluster has an equal role.

  5. Advanced Architecture Configuration

    Alternatively, deploy a more advanced architecture, where there are different node quantities for the different aspects of the cluster, such as ingestion and HTTP.

  6. Deployment for High Availability

    Advice on deploying your cluster for High Availability.

  7. Additional Considerations

    Additional considerations and options for deployment.

Humio Operator Overview

The Humio Operator is a Kubernetes operator that automates the provisioning, management, and operation of a LogScale cluster deployed to a Kubernetes environment. The Humio Operator provides Custom Resource Definitions (CRDs) for the LogScale cluster. The operator uses the resource definition to create pods and configure the LogScale cluster; it does not create StatefulSet or DaemonSet resources.

The Humio Operator does not manage the infrastructure in the deployed environment. The operator manages the lifecycle of the LogScale cluster and relies on the built-in primitives of the Kubernetes environment to provision resources such as load balancers.
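The way an operator turns a declared resource into pods can be pictured as a reconcile step: compare the desired state with what is running, then create or delete pods directly. The following is a minimal sketch of that general pattern, not the Humio Operator's code. `ClusterSpec`, `reconcile`, and the pod naming scheme are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical stand-in for the desired state declared in a custom resource."""
    name: str
    node_count: int


def reconcile(spec: ClusterSpec, running_pods: set) -> tuple:
    """Return (pods_to_create, pods_to_delete) so running pods match the spec.

    Pod names are an assumption made for this sketch; a real operator
    chooses its own naming.
    """
    desired = {f"{spec.name}-core-{i}" for i in range(spec.node_count)}
    to_create = sorted(desired - running_pods)
    to_delete = sorted(running_pods - desired)
    return to_create, to_delete


# Usage: three nodes are desired, one expected pod exists, one stray pod exists.
create, delete = reconcile(
    ClusterSpec(name="example", node_count=3),
    {"example-core-0", "example-core-7"},
)
print(create)  # ['example-core-1', 'example-core-2']
print(delete)  # ['example-core-7']
```

Because pods are managed individually here rather than through a StatefulSet or DaemonSet, the controller itself decides which pod to add or remove on each pass.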

Cluster topology

LogScale clusters are made up of multiple logical components. A node can be dedicated to a single purpose, or it can take on several responsibilities. The overall logical cluster topology looks like this:

Figure 8. Kubernetes Deployment Cluster Topology

The individual components are:

  • Ingest

    Ingest processes receive requests from Log Shippers that contain events through several supported ingest APIs. Events are parsed using system or user-defined Parsers and are subsequently placed in a Kafka queue for further processing by the digest processes.

  • Digest

    Digest processes read events from the Kafka ingest queue and build data files called segments. Queries for recent data in segment files are handled by the digest nodes. This includes data pushed to LogScale's live queries, which are queries that run continuously and aggregate data. Once segment files are completed, they are placed in bucket storage, if configured, and future queries are serviced by the storage processes.

  • Storage

    Storage processes store segment files and process queries for the segment files to which they are assigned. If an older segment no longer resides on a storage instance, the storage process downloads it from bucket storage, if configured. In most cases, it is recommended that digest nodes are also configured as storage nodes.

  • Query Coordination

    Query coordination processes receive queries from users, dashboards, alerts, and scheduled searches and create a query plan that sends internal queries to the digest and storage processes that own the segment files required for the query. These do not need to be reachable via the load balancer, but can be reached via the UI/API nodes.

  • UI and API

    UI and API processes handle requests from clients using a browser or clients making API requests against the LogScale cluster.

  • Kafka and Zookeeper

    Kafka is used by LogScale as an internal cluster communication mechanism and as a queue for ingested events. Current and older versions of Kafka require Zookeeper. Future versions of Kafka will no longer require Zookeeper.

  • Bucket Storage

    Bucket storage relies on a compatible object storage system such as Google Cloud Storage, MinIO, S3, or an S3-compatible API (support may be limited). When using bucket storage, segment files completed by the digest processes are uploaded to object storage. This provides redundancy for the segment files in the event of server failure, and allows querying of segment files that are no longer stored on storage process nodes due to age or storage capacity.

    The bucket storage functionality assumes it can read objects, so it is not compatible with write-only object storage systems. Any segment file uploaded to bucket storage is encrypted and decrypted on the LogScale nodes, even if the bucket storage system has a built-in encryption feature. For more details, see Bucket Storage.
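To make the division of roles above more concrete, the data flow can be modeled in a few lines: ingest parses and queues events, digest builds segments and uploads completed ones, storage serves segments and downloads missing ones from the bucket, and query coordination fans out over the segments. This is a toy, single-process sketch under simplifying assumptions (a fixed segment size, no encryption, no real Kafka or object store). `ToyCluster`, `Segment`, and every method name are hypothetical and do not correspond to LogScale APIs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Segment:
    segment_id: int
    events: list


@dataclass
class ToyCluster:
    """Toy model of the component roles; all names are hypothetical."""
    segment_size: int = 3
    ingest_queue: deque = field(default_factory=deque)  # stands in for the Kafka ingest queue
    open_events: list = field(default_factory=list)     # segment currently being built by digest
    local_segments: dict = field(default_factory=dict)  # segments held by storage processes
    bucket: dict = field(default_factory=dict)          # stands in for bucket storage
    next_id: int = 0

    def ingest(self, raw_line: str) -> None:
        # Ingest: parse the event, then place it on the queue for digest.
        self.ingest_queue.append({"message": raw_line.strip()})

    def digest(self) -> None:
        # Digest: drain the queue and build fixed-size segments.
        while self.ingest_queue:
            self.open_events.append(self.ingest_queue.popleft())
            if len(self.open_events) == self.segment_size:
                seg = Segment(self.next_id, self.open_events)
                self.next_id += 1
                self.open_events = []
                self.local_segments[seg.segment_id] = seg
                self.bucket[seg.segment_id] = seg  # completed segments are uploaded

    def evict(self, segment_id: int) -> None:
        # Storage: drop the local copy, for example due to age or capacity.
        self.local_segments.pop(segment_id, None)

    def fetch(self, segment_id: int) -> Segment:
        # Storage: serve the local copy, else download it from the bucket.
        if segment_id not in self.local_segments:
            self.local_segments[segment_id] = self.bucket[segment_id]
        return self.local_segments[segment_id]

    def query(self, needle: str) -> list:
        # Query coordination: search completed segments, then the open one.
        hits = []
        for segment_id in sorted(self.bucket):
            events = self.fetch(segment_id).events
            hits += [e["message"] for e in events if needle in e["message"]]
        hits += [e["message"] for e in self.open_events if needle in e["message"]]
        return hits


cluster = ToyCluster()
for line in ["error a", "info b", "error c", "error d"]:
    cluster.ingest(line)
cluster.digest()
cluster.evict(0)               # segment 0 now exists only in the bucket
print(cluster.query("error"))  # prints ['error a', 'error c', 'error d']
```

The evicted segment is still searchable because `fetch` restores it from the bucket, which mirrors why bucket storage must be readable and not write-only.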