How-To: Install Kubernetes Reference Architecture

Important

This content is deprecated as of LogScale version 1.222.0. For the latest material, refer to the Kubernetes Reference Architecture docs.

Deploying LogScale using Kubernetes and Humio Operator is the recommended deployment model. The reference architecture and deployment model for this type of installation is outlined below.

Note

Advanced Kubernetes skills are a prerequisite for following this guide. This document touches on many different aspects that a person running stateful software within a Kubernetes cluster is assumed to know.

To use this guide when deploying LogScale, follow these sections:

  1. Cluster topology

    Understand the basic topology and components of the LogScale cluster

  2. Kubernetes Deployment Requirements

    Check the basic requirements needed for deployment

  3. Deploying Prerequisites

    Configure the prerequisite elements of the deployment, such as Kafka, the Humio Operator, and block storage.

  4. Basic Architecture Configuration

    Deploy a basic architecture, where each node within the cluster has an equal role.

  5. Advanced Architecture Configuration

    Alternatively, deploy a more advanced architecture, where there are different node quantities for the different aspects of the cluster, such as ingestion and HTTP.

  6. Deployment for High Availability

    Advice on deploying your cluster for high availability.

  7. Additional Considerations

    Additional considerations and options for deployment.

Humio Operator Overview

The Humio Operator serves as a Kubernetes automation tool designed to handle the provisioning, management, and operations of LogScale clusters within Kubernetes environments through Custom Resource Definitions (CRDs). While managing the LogScale cluster lifecycle, the operator leverages native Kubernetes primitives for resource provisioning but does not directly manage infrastructure or create StatefulSet and DaemonSet resources.

The Humio Operator is a Kubernetes operator that automates the provisioning, management, and operations of a LogScale cluster deployed to a Kubernetes environment. The Humio Operator provides Custom Resource Definitions (CRDs) for the LogScale cluster and uses the resource definitions to create pods and configure the LogScale cluster; it does not create StatefulSet or DaemonSet resources.

The Humio Operator does not manage the infrastructure in the deployed environment. The operator manages the lifecycle of the LogScale cluster and relies on the built-in primitives of the Kubernetes environment to facilitate the provisioning of resources such as load balancers.

Cluster topology

The LogScale cluster topology consists of multiple logical components including Ingest for receiving data, Digest for processing events, Storage for managing segment files, Query Coordination for handling search requests, UI/API nodes for user interaction, Kafka for internal communication, and Bucket Storage for object storage capabilities. Each component plays a specific role in the cluster architecture, with nodes capable of handling single or multiple responsibilities to create a robust and scalable log management system.

LogScale clusters are made up of multiple logical components. It is possible to run nodes for a specific purpose, but it is also possible to run nodes with several responsibilities. The overall logical cluster topology looks like this:

Kubernetes Deployment Cluster Topology

Figure 12. Kubernetes Deployment Cluster Topology


The individual components are:

  • Ingest

    Ingest processes receive requests containing events from Third-Party Log Shippers through several supported ingest APIs. Events are parsed using system or user-defined parsers (see Parse Data) and are subsequently placed on a Kafka queue for further processing by the digest processes.

  • Digest

    Digest processes read events from the Kafka ingest queue and build data files called segments. Queries for recent data in segment files are handled by the digest nodes. This includes data pushed to LogScale's live queries, which are continuously running queries that aggregate data. Once segment files are completed, they are placed in bucket storage, if configured, and future queries are serviced by the storage processes.

  • Storage

    Storage processes store segment files and process queries for the segment files to which they are assigned. Older segments that no longer reside on a storage instance are downloaded from bucket storage, if configured. For most cases, it is recommended that digest nodes also be configured as storage nodes.

  • Query Coordination

    Query coordination processes receive queries from users, dashboards, alerts, and scheduled searches and create a query plan that sends internal queries to the digest and storage processes that own the segment files required for the query. These do not need to be reachable via the load balancer, but can be reached via the UI/API nodes.

  • UI and API

    UI and API processes handle requests from clients using a browser and from clients making API requests against the LogScale cluster.

  • Kafka

    Kafka is used by LogScale as an internal cluster communication mechanism and as a queue for ingested events.

  • Bucket Storage

    Bucket storage relies on a compatible object storage system such as Google Cloud Storage, MinIO, S3, or an S3-compatible API (support may be limited). When using bucket storage, segment files completed by the digest processes are uploaded to object storage. This feature provides redundancy for the segment files in the face of server failure, in addition to allowing the querying of segment files that are no longer stored on storage process nodes due to age or storage capacity.

    The bucket storage functionality assumes it can read objects, so it is not compatible with write-only object storage systems. Any segment file uploaded to bucket storage is encrypted and decrypted on the LogScale nodes, even if the bucket storage system has a built-in encryption feature. More details on bucket storage can be found here: Bucket Storage.

Instance Sizing

Assumptions:

  • 30 Day Retention on NVME (gen4)

  • 20% Overhead left on NVME (gen4)

  • 10x Compression

  • Supported Object Storage provider

  • Kafka 5x Compression - 24 Hour Storage

  • LogScale requires a Kafka cluster, most commonly provided by the Strimzi Kubernetes operator or Amazon MSK (ZooKeeper is required up to LogScale 1.107.0).

Kafka clusters are separate from LogScale clusters to avoid resource contention and allow independent management.

Important

Any deployed LogScale instance should not share resources with any other services; it should run on a dedicated node to avoid resource contention. LogScale assumes that all spare memory is available for its use.

X-Small - 1 TB/Day Ingestion

Software | Instances | vCPU | Memory | Storage | Total Storage
LogScale | 3 | 16 | 64 GB | NVMe 2 TB | 6 TB
Kafka | 3 | 4 | 8 GB | SSD 500 GB | 1.5 TB
ZooKeeper (up to LogScale 1.107) | 3 | 4 | 8 GB | SSD 50 GB | 150 GB
KRaft controller (after LogScale 1.107) | 3 | 4 | 16 GB | SSD 64 GB | 150 GB

Small - 3 TB/Day Ingestion

Software | Instances | vCPU | Memory | Storage | Total Storage
LogScale | 3 | 32 | 128 GB | NVMe 6 TB | 18 TB
Kafka | 3 | 4 | 8 GB | SSD 1 TB | 3 TB
ZooKeeper (up to LogScale 1.107) | 3 | 4 | 8 GB | SSD 50 GB | 150 GB
KRaft controller (after LogScale 1.107) | 3 | 4 | 16 GB | SSD 64 GB | 150 GB

Medium - 5 TB/Day Ingestion

Software | Instances | vCPU | Memory | Storage | Total Storage
LogScale | 6 | 32 | 128 GB | NVMe 6 TB | 36 TB
Kafka | 3 | 8 | 16 GB | SSD 1 TB | 3 TB
ZooKeeper (up to LogScale 1.107) | 3 | 4 | 8 GB | SSD 50 GB | 150 GB
KRaft controller (after LogScale 1.107) | 3 | 4 | 16 GB | SSD 64 GB | 150 GB

Large - 10 TB/Day Ingestion

Software | Instances | vCPU | Memory | Storage | Total Storage
LogScale | 12 | 32 | 128 GB | NVMe 6 TB | 72 TB
Kafka | 6 | 8 | 16 GB | SSD 1 TB | 6 TB
ZooKeeper (up to LogScale 1.107) | 3 | 4 | 8 GB | SSD 50 GB | 150 GB
KRaft controller (after LogScale 1.107) | 3 | 4 | 16 GB | SSD 64 GB | 150 GB

X-Large - 30 TB/Day Ingestion

Software | Instances | vCPU | Memory | Storage | Total Storage
LogScale | 30 | 32 | 128 GB | NVMe 7 TB | 210 TB
Kafka | 6 | 8 | 16 GB | SSD 1.5 TB | 13.5 TB
ZooKeeper (up to LogScale 1.107) | 3 | 4 | 8 GB | SSD 50 GB | 150 GB
KRaft controller (after LogScale 1.107) | 3 | 4 | 16 GB | SSD 64 GB | 150 GB

Kubernetes Deployment Requirements

The documentation outlines essential requirements for deploying LogScale in a Kubernetes environment, covering key components like bucket storage, Kafka integration, TLS configuration, and specific worker node configurations. The requirements are organized into three main areas: LogScale-specific requirements including storage and dependencies, Kubernetes environment needs such as DNS and ingress controllers, and worker node specifications focusing on storage configurations and zone awareness.

There are certain requirements that must be met before deploying LogScale:

LogScale Requirements

Bucket Storage

When using LogScale in production, it is recommended that the instances running LogScale have local NVMe storage. Depending on the environment in which LogScale is being deployed, these disks may be ephemeral, as is the case with AWS instance-store instances or Google local SSDs. When utilizing ephemeral instances, bucket storage is required for a production environment, as it acts as the persistent storage for the cluster.

Kubernetes

The minimum Kubernetes version supported by the Humio Operator can be found in the Humio Operator Version Matrix.

Any Kubernetes platform is supported, but the implementation and usage of some features may differ.

Kafka

LogScale relies on Kafka as a fault-tolerant event bus and internal cluster communication system. The minimum supported version of Kafka can be found here: Kafka Version.

In general, we recommend using the latest Kafka version possible on the given environment.

TLS

By default the Humio Operator utilizes cert-manager to create an internal certificate authority for use by the LogScale cluster. In addition, support for provisioning certificates for external connectivity can be used in conjunction with cert-manager's external issuer support. If LogScale is configured to expose its APIs using HTTPS, which is the default, LogScale assumes Kafka connectivity will also use TLS; this is configurable. In some environments that employ service meshes that implement TLS or mTLS, TLS support can be disabled completely.
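If the environment's service mesh already provides TLS or mTLS, a minimal sketch of turning off the operator-managed TLS might look like the following. The tls.enabled field follows the humio-operator HumioCluster CRD, but verify it against the operator version in use.

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: basic-cluster-1
  namespace: example-clusters
spec:
  # Disable operator-managed certificates when the service mesh terminates TLS/mTLS.
  tls:
    enabled: false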

Kubernetes Environment Requirements

The components of a Kubernetes cluster will vary depending on the environment, but generally include building blocks such as DNS, ingress, ingress controllers, networking, and storage classes. Depending on the components of the Kubernetes cluster, some implementation details may vary and care must be taken during implementation. For example, if the environment is using a service mesh that provides TLS connectivity for pods and services, the TLS provisioning feature built into the operator should be disabled.

For a production cluster, an ingress controller that has a routing rule to the Humio service is required. The TLS configuration for this controller and for the Humio service will differ depending on whether the environment uses an internal PKI implementation or the built-in automation the operator provides with cert-manager.

LogScale is a highly multi-threaded piece of software, and we recommend running a single cluster pod per underlying Kubernetes worker node. This can be achieved using Kubernetes pod anti-affinity. Running one cluster pod per Kubernetes worker node lets LogScale prioritize which threads get priority, and limits disk access so that only one cluster pod accesses a given data directory. Multiple cluster pods must not use the same data directory, and ensuring they run on separate machines helps achieve that.

Kubernetes Worker Requirements

All Kubernetes worker nodes should have labels set for the zones they are located in, using the worker node label topology.kubernetes.io/zone. With this label available on the underlying worker nodes, the LogScale cluster pods become zone aware and use the zone information, for example to distribute and replicate data. It is important to consider the best strategy for placing worker nodes across zones so the LogScale cluster pods get scheduled to worker nodes uniformly across multiple availability zones, data centers, or racks.
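A quick way to confirm the zone labels are present on the worker nodes, assuming kubectl access to the cluster, is:

shell
# Show each worker node together with its zone label.
kubectl get nodes --label-columns topology.kubernetes.io/zone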

There are two overall paths for configuring disks for LogScale. One is a combination of ephemeral storage and bucket storage, and the other is to use network-attached block storage. We recommend using ephemeral disks and bucket storage for production clusters if bucket storage is available. The operator does not partition, format, or mount the local storage on the worker instance; an AWS user-data script can be used to accomplish this, as sketched below. The recommendation is to not mix instance types or sizes within the same pool of LogScale nodes sharing the same configuration. It is not recommended to use spot instances, as offered by some cloud providers.
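As an illustration only, a user-data sketch that prepares two local NVMe devices as a single RAID-0 volume might look like the following; the device names (/dev/nvme1n1, /dev/nvme2n1) and the mount point are assumptions that depend on the instance type and on where the cluster pods are configured to find their data directory.

shell
#!/bin/bash
# Combine the local NVMe devices into one RAID-0 array (device names vary per instance type).
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
# Format the array and mount it where the LogScale cluster pods expect their data directory.
mkfs.ext4 /dev/md0
mkdir -p /mnt/disks/vol1
mount /dev/md0 /mnt/disks/vol1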

Ephemeral disk & bucket storage

Worker nodes running LogScale cluster pods must be configured with fast local storage, such as NVMe drives. If there are multiple NVMe drives attached to the machine, they can be combined using RAID-0. Preparing disks for use can be done using features like user data. It is important to configure LogScale to know that the disks are ephemeral by ensuring the environment variables include USING_EPHEMERAL_DISKS=true. This configuration allows LogScale to make better and safer decisions when managing data in the cluster. hostPath mounts can be used to access the locally attached fast storage from the cluster pods, or potentially any other Kubernetes storage provider that grants direct, unrestricted access to the underlying fast storage, as sketched below.
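A minimal sketch of wiring this together in the HumioCluster resource follows; the dataVolumeSource and environmentVariables fields follow the humio-operator CRD, while the hostPath path is an assumption matching the user-data sketch above.

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: basic-cluster-1
  namespace: example-clusters
spec:
  environmentVariables:
    # Tell LogScale the local disks are ephemeral so it manages data accordingly.
    - name: USING_EPHEMERAL_DISKS
      value: "true"
  # Mount the locally attached NVMe storage prepared on the worker node.
  dataVolumeSource:
    hostPath:
      path: /mnt/disks/vol1
      type: Directory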

Local PVCs are supported by the Humio Operator. This alternative to hostPath does not require formatting and RAID configuration of multiple disks, but does require initial setup that depends on the Kubernetes environment.

It is technically possible to run with emptyDir volume types on the cluster pods, but that is not recommended, since the lifecycle of emptyDir volumes follows the lifecycle of the pods. Upgrading LogScale clusters or changing configurations replaces cluster pods, which would wipe the data directories of the LogScale cluster pods every time, which is not desired. The cluster should reuse ephemeral disks as much as possible, even if data is technically safe elsewhere, since performance takes a huge hit on every upgrade or restart if data directories are not reused, because LogScale would need to fetch everything from bucket storage again.

Network block storage

Using this type of underlying storage for LogScale clusters is generally much slower and more expensive compared to the alternative above, i.e. using the combination of ephemeral disks and bucket storage. The typical use of network block storage for Kubernetes pods will dynamically create disks and attach them to the Kubernetes worker nodes as needed. Network block storage should not be used for I/O heavy LogScale nodes, like storage and digest. However, it can be used for ingest, query coordinators and UI/API nodes.

Deploying Prerequisites

The Humio Operator and Kubernetes deployment process requires several key prerequisites that must be installed and configured before beginning. These essential components include Kafka prerequisites, operator custom resources, basic security and resource configuration, ingress configuration, and bucket storage setup.

Before starting a deployment using the Humio Operator and Kubernetes, the following prerequisite components should be installed and configured:

Kafka Prerequisites

LogScale requires low latency access to a Kafka cluster to operate optimally; you must have an available Kafka cluster before deploying LogScale.

Similar non-Kafka systems, such as Google Pub/Sub or Azure Event Hubs, are not supported.

Sub-50-millisecond ping times from the LogScale pods to the Kafka cluster ensure data is ingested quickly and is available for search in less than a second.

In its default configuration LogScale will automatically create the Kafka topics and partitions required to operate. This functionality can be disabled by setting the KAFKA_MANAGED_BY_HUMIO value to false. Note that when enabled, LogScale will set up the ingest queue, global, and chatter topics in Kafka using reasonable default settings, but does not edit existing topics to conform to these defaults.

Important

Admins setting up new clusters should ensure the target Kafka cluster is completely booted before booting LogScale the first time, as LogScale will refuse to boot if it needs to create a topic, and there are too few brokers in the Kafka cluster to hit the configured replication factor.

Running Kafka on Kubernetes can be accomplished in a variety of ways. The Strimzi Kafka Operator is one such way; it uses the same operator pattern as the Humio Operator and manages the life-cycles of both Kafka and KRaft nodes. In production setups, LogScale, Kafka, and KRaft should run on separate worker nodes in the cluster. Both Kafka and KRaft must use persistent volumes provided by the Kubernetes environment, and they should not use ephemeral disks in a production deployment. Kafka brokers and KRaft instances should be deployed on worker nodes in different locations, such as racks, data centers, or availability zones, to ensure reliability. Kubernetes operators, such as the Strimzi operator, do not create worker nodes and label them; that task is left to the administrators.

For the latest configurations see the LogScale GitHub repository.

For more general information on configuring Strimzi please see their documentation.
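As a rough sketch only, a ZooKeeper-based Strimzi cluster suitable for experimentation might look like the following; the namespace, sizes, and listener layout are assumptions and should be adjusted to the sizing tables above, and newer Strimzi versions deploy KRaft-based clusters using KafkaNodePool resources instead of the zookeeper block.

yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: logscale-kafka
  namespace: kafka
spec:
  kafka:
    replicas: 3
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      # Keep replication in line with a 3-broker cluster.
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: persistent-claim
      size: 1000Gi
      deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 50Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}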

Operator Custom Resources

The operator overall supports two types of custom resources:

  • The first type makes it possible to manage a LogScale cluster by leveraging the HumioCluster CRD. This will spin up and manage the cluster pods that make up a LogScale cluster, along with the related Kubernetes resources needed to run the LogScale cluster.

  • The second group of custom resources is focused on creating LogScale-specific resources within LogScale clusters, using the LogScale API to manage them. At the time of writing, the humio-operator project supports the following CRDs for managing LogScale functionality:

    • HumioAction

    • HumioAlert

    • HumioIngestToken

    • HumioParser

    • HumioRepository

    • HumioView

Within the Kubernetes ecosystem it is very common to follow GitOps-style workflows. To help bridge the gap, even for clusters that aren't managed by the humio-operator, there is a HumioExternalCluster CRD. This CRD can be configured with the URL and token of any LogScale cluster, so that resources such as alerts can be managed with HumioAlert resources even when the cluster pods themselves are not managed by the humio-operator. Since this uses the regular LogScale APIs, it can also be used by customers to manage resources on LogScale cloud.
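A minimal sketch of this pattern is shown below; the URL, namespace, and secret name are placeholders, and the field names follow the humio-operator examples but should be verified against the CRD versions in use.

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioExternalCluster
metadata:
  name: existing-cluster
  namespace: example-clusters
spec:
  url: https://logscale.example.com
  # Secret containing a LogScale API token with permission to manage resources.
  apiTokenSecretName: existing-cluster-api-token
---
apiVersion: core.humio.com/v1alpha1
kind: HumioRepository
metadata:
  name: example-repository
  namespace: example-clusters
spec:
  # Manage this repository on the external cluster defined above.
  externalClusterName: existing-cluster
  name: example-repository
  description: Repository managed through GitOps
  retention:
    timeInDays: 30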

Basic Security and Resource Configuration

The following example provides the configuration for a basic cluster on AWS using ephemeral disks and bucket storage. Access to S3 is handled using IRSA (IAM roles for service accounts) and SSO is handled through a Google Workspace SAML integration. This example assumes IRSA and the Google Workspace integration are configured and can be provided in the configuration below.

Before the cluster is created, the operator must be deployed to the Kubernetes cluster and three secrets created. For information regarding the installation of the operator, refer to Install Humio Operator on Kubernetes.

Prerequisite secrets:

  • Bucket storage encryption key — Used for encrypting and decrypting all files stored using bucket storage.

  • SAML IDP certificate — Used to verify integrity during SAML SSO logins.

  • LogScale license key — Installed by the humio-operator during cluster creation. To update the license key for a LogScale cluster managed by the humio-operator, this Kubernetes secret must be updated with the new license key. If updates to the license key are performed within the LogScale UI, the license will be reverted to the one stored in this Kubernetes secret.

In practice it looks like this:

shell
kubectl create secret --namespace example-clusters generic \
  basic-cluster-1-bucket-storage --from-literal=encryption-key=$(openssl rand -base64 64)
kubectl create secret --namespace example-clusters generic \
  basic-cluster-1-idp-certificate --from-file=idp-certificate.pem=./my-idp-certificate.pem
kubectl create secret --namespace example-clusters generic \
  basic-cluster-1-license --from-literal=data=licenseString

Once the secrets are created, the following cluster specification can be applied to the cluster; for details on applying the specification, see the operator resources documentation under Creating the Resource.
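The complete specification is not reproduced here; as a minimal sketch only, a basic cluster might look like the following. The field names follow the humio-operator HumioCluster CRD, while the IAM role, bucket name, region, and hostPath are placeholders for the IRSA and storage setup assumed above, and should be verified against the operator and LogScale versions in use.

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: basic-cluster-1
  namespace: example-clusters
spec:
  nodeCount: 3
  targetReplicationFactor: 2
  license:
    secretKeyRef:
      name: basic-cluster-1-license
      key: data
  # SAML IdP certificate created as a prerequisite secret (field name per humio-operator examples).
  idpCertificateSecretName: basic-cluster-1-idp-certificate
  # IRSA: annotate the LogScale service account with the IAM role that grants S3 access.
  humioServiceAccountAnnotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::999999999999:role/basic-cluster-1-bucket-storage
  dataVolumeSource:
    hostPath:
      path: /mnt/disks/vol1
      type: Directory
  environmentVariables:
    - name: USING_EPHEMERAL_DISKS
      value: "true"
    - name: S3_STORAGE_BUCKET
      value: basic-cluster-1-storage
    - name: S3_STORAGE_REGION
      value: us-east-1
    - name: S3_STORAGE_ENCRYPTION_KEY
      valueFrom:
        secretKeyRef:
          name: basic-cluster-1-bucket-storage
          key: encryption-key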

Once applied, the HumioCluster resource is created along with many other resources, some of which depend on cert-manager. In the basic cluster example, a single node pool is created with three pods performing all tasks.

The overall structure of the Kubernetes resources within a LogScale deployment looks like this:

Kubernetes Installation Cluster Definition

Figure 13. Kubernetes Installation Cluster Definition


Any configuration setting for LogScale can be used in the cluster specification. For additional configuration options please see the Configuration Variables.

Ingress Configuration

The humio-operator contains one built-in ingress implementation, which relies on ingress-nginx to expose the cluster outside of the Kubernetes cluster. The built-in support for ingress-nginx should be seen mostly as a good starting point and source of inspiration; if it does not match certain requirements, it is possible to point alternative ingress controllers at the Service resource(s) pointing to the cluster pods. The built-in support for ingress-nginx only works if there is a single node pool with all nodes performing all tasks.
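For reference, a minimal sketch of enabling the built-in ingress-nginx support might look like the following; the hostname values are placeholders and the ingress fields follow the humio-operator CRD.

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: basic-cluster-1
  namespace: example-clusters
spec:
  hostname: basic-cluster-1.logscale.local
  esHostname: basic-cluster-1-es.logscale.local
  ingress:
    enabled: true
    # ingress-nginx is the only built-in controller integration at the time of writing.
    controller: nginx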

In most managed Kubernetes environments, ingress has been integrated with the provider's environment to orchestrate the creation of load balancers that distribute traffic to the LogScale service. In AWS, when using the AWS Load Balancer Controller add-on (installation documentation), the following ingress object can be created to load balance external traffic to the basic-cluster service shown previously.

Basic AWS Load Balancer Controller Ingress example

yaml
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: logging
  name: basic-cluster-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/healthcheck-path: /api/v1/status
    alb.ingress.kubernetes.io/backend-protocol: HTTPS
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:xxxxxx:999999999999:certificate/xxxxxxxxx
 
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: basic-cluster-1
              port:
                number: 8080

Pods running ingress can be scaled dynamically using the Kubernetes Horizontal Pod Autoscaler.

Configuring Bucket Storage

Utilizing bucket storage enables LogScale to operate on ephemeral disks. When an instance is replaced due to maintenance or hardware failure, data is pulled from object storage once a new instance replaces the old one. LogScale natively supports Amazon bucket storage and Google Cloud bucket storage. S3-compatible systems such as MinIO are also supported.

When LogScale determines segment files should be deleted, either by hitting retention settings or by someone deleting a repository, the segments are marked for deletion, turning them into "tombstones". While they are marked for deletion, any local copies stored on the LogScale nodes are deleted, but the copy stored in bucket storage is kept around for a while before it is deleted. By default, segments are kept for 7 days in bucket storage before the segment files are deleted from bucket storage. This means it is possible to undo the deletion of segments as long as it is within 7 days of the segments being marked for deletion.

Configuring any type of quota on the bucket storage system is not recommended. LogScale assumes bucket storage will always be able to keep all the files it is trying to store in it.

Basic Architecture Configuration

The basic architecture configuration for Kubernetes deployment offers a straightforward approach where all tasks run on all pods, providing cost efficiency and simplicity but limiting individual component scalability. The documentation outlines the advantages and disadvantages of this setup, accompanied by a visual representation of the basic architecture and important notes about network communication across zones for LogScale and Kafka pods.

The basic cluster specification above, when deployed, runs all tasks on all pods.

Pros:

  • Easy to get started, since we only need to specify configurations for one set of cluster pods (aka node pool)

  • Cheap, since it may be more efficient if we don't need to run many separate pods to achieve the same amount of work

Cons:

  • Can't scale individual logical LogScale components independently

Kubernetes Installation Basic Architecture

Figure 14. Kubernetes Installation Basic Architecture


Note

Network communication isn't fully shown. There is communication across zones for LogScale and Kafka pods.

Advanced Architecture Configuration

The Advanced Architecture Configuration section explains how to configure cluster nodes with dedicated responsibilities, enabling independent scaling and targeted update strategies for specific pod sets in Kubernetes worker nodes. This approach offers cost benefits through selective resource allocation, though it requires additional configuration steps, as demonstrated through a detailed YAML configuration example for setting up an advanced Humio cluster deployment.

We can split out responsibilities for cluster nodes, such that each responsibility has its own dedicated set of cluster pods. This makes it possible to define update strategies for sets of cluster pods that serve a single purpose, or affinity rules that schedule the pods on a specific set of Kubernetes worker nodes. A sketch of such a configuration is shown after the figure below.

Pros:

  • Easy to scale individual logical components independently

  • Cost benefits of scaling individual logical components independently, e.g. scaling nodes handling only ingest on cheap Kubernetes workers.

Cons:

  • Requires additional configuration

Kubernetes Deployment Advanced Cluster Definition

Figure 15. Kubernetes Deployment Advanced Cluster Definition
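As a minimal sketch of such a split, assuming the nodePools field of the humio-operator HumioCluster CRD and the LogScale NODE_ROLES configuration variable (verify both against the versions in use), a cluster with dedicated HTTP/API and ingest pods might look like this:

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: advanced-cluster-1
  namespace: example-clusters
spec:
  # Pods defined at the top level handle digest and storage.
  nodeCount: 6
  nodePools:
    - name: http-api
      spec:
        nodeCount: 3
        environmentVariables:
          # Restrict these pods to UI/API and query coordination work.
          - name: NODE_ROLES
            value: httponly
    - name: ingest
      spec:
        nodeCount: 3
        environmentVariables:
          # Restrict these pods to ingest work.
          - name: NODE_ROLES
            value: ingestonly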


Kubernetes Deployment Limits

The documentation explains the importance of setting appropriate resource limits for LogScale pods in Kubernetes deployments, emphasizing the need to reserve CPU and memory capacity for operating system functions. Proper limit configuration in production environments ensures LogScale pods are allocated to worker nodes with sufficient resources, preventing competition with other applications and maintaining operational efficiency.

The limits and resources set on the LogScale pods should be appropriate for the size of the instance, leaving 2-4 vCPUs and 4 GB of memory for the operating system on both LogScale and Kafka instances.

When running in a production environment setting limits is important to ensure the LogScale pods run on instances where they will have enough resources to operate efficiently and not compete with other applications on the worker node.
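As an illustration, on a 16 vCPU / 64 GB worker node (the X-Small sizing above), a resources block leaving headroom for the operating system might look like the following; the resources field follows the humio-operator CRD, and the exact values are assumptions to adjust per instance size.

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: basic-cluster-1
  namespace: example-clusters
spec:
  resources:
    requests:
      cpu: 12
      memory: 48Gi
    limits:
      # Leave roughly 2-4 vCPUs and 4 GB of memory for the operating system.
      cpu: 14
      memory: 60Gi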

Deployment for High Availability

The documentation explains how to achieve high availability in Kubernetes deployments by utilizing features like pod topology spread constraints, pod affinity/antiAffinity, and taints/tolerations to distribute LogScale pods uniformly across availability zones. The setup leverages init containers to determine zone information, enabling LogScale to make informed decisions about digest partitions and segment placement.

When constructing the HumioCluster resource, it is possible to leverage Kubernetes features like pod topology spread constraints, pod affinity and anti-affinity, and taints and tolerations. These can be configured per node pool. They should be configured so that pods are placed on worker nodes in a way that spreads them out uniformly across availability zones. So, if we have a node pool with nodeCount 9 and worker nodes in 3 availability zones, the goal should be to place 3 pods in each availability zone.

mermaid
%%{init: {"flowchart": {"defaultRenderer": "elk"}} }%%
graph TB;
  subgraph zone1
    node1
    node2
    node3
  end
  subgraph zone2
    node4
    node5
    node6
  end
  subgraph zone3
    node7
    node8
    node9
  end
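A sketch of the scheduling primitives this refers to is shown below; the affinity field is exposed by the HumioCluster CRD, while the topology spread constraints field and the pod label selector are assumptions to verify against the operator version in use and the labels it actually applies to cluster pods.

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: basic-cluster-1
  namespace: example-clusters
spec:
  nodeCount: 9
  affinity:
    # Run at most one LogScale pod per worker node.
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - humio
          topologyKey: kubernetes.io/hostname
  # Spread pods evenly across availability zones.
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: humio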

It is possible to configure a node pool per availability zone, and use affinity and/or tolerations so pods for a specific node pool are placed on worker nodes in a specific availability zone. However, in most cases it is easier to have one node pool scheduled across multiple availability zones. To make sure LogScale pods know the correct availability zone, an init container in the cluster pods is leveraged. This init container gets the Kubernetes worker node name and looks up the availability zone using well-known labels on the worker node resource in the Kubernetes cluster. The zone information collected by the init container is automatically passed to the LogScale pod through the ZONE configuration option.

With zone information available to LogScale, it means LogScale will use this zone when making decisions, e.g. how to configure digest partitions and where to place segments.

Disaster Recovery

LogScale's disaster recovery capabilities enable operators to create new clusters using replicated bucket storage data from another location, where internal metadata and segment files are continuously written. The recovery process involves starting a single instance with specific configuration settings to enable recovery and rewrite internal metadata from the replicated storage.

LogScale's disaster recovery abilities rely on the persisted data in bucket storage being replicated to another location. Internal metadata (the global database) and segment files are continuously written to bucket storage and replicating the object storage to the other location allows operators to create a new cluster using the replicated bucket as the source of data for the cluster. This process requires starting one instance with several configuration options set to enable the recovery and rewriting of internal metadata. See Start a new LogScale cluster based on another with buckets for more information.

Additional Considerations

The documentation covers additional considerations for LogScale deployment in Kubernetes environments, including how to configure Prometheus metrics endpoints, service mesh integration, and logging capabilities through Helm charts. Key topics address pod annotations, TLS connectivity, container log shipping, horizontal pod autoscaling limitations, and provides a detailed example of Traefik ingress controller configuration.

Prometheus Endpoint

It is possible to configure LogScale to expose an endpoint that provides LogScale metrics in a format supported by Prometheus, a popular metrics solution in the Kubernetes ecosystem. The most common method is to make sure pods expose the metrics on a defined port, and then configure Prometheus to automatically discover pods that have explicitly marked a certain port to be scraped. If we use the basic cluster example, this is how we would achieve that:

  • Add PROMETHEUS_METRICS_PORT to the "environmentVariables" list in the HumioCluster resources:

    yaml
    apiVersion: core.humio.com/v1alpha1
    kind: HumioCluster
    ...
    spec:
    ...
      environmentVariables:
        - name: PROMETHEUS_METRICS_PORT
          value: "8401"
    ...
  • Add pod annotations to HumioCluster pods, which Prometheus can discover and automatically start scraping the metrics endpoints on each pod:

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
...
spec:
...
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8401"
...
Service mesh

If LogScale cluster pods are added to a service mesh, there are a couple of items worth highlighting.

  • If the service mesh already provides mutual TLS, it is recommended to disable the TLS connectivity on the LogScale side and rely on the service mesh to handle it. When the service mesh handles TLS, it means any built-in observability features of the service mesh works.

  • If the service mesh relies on injecting a proxy as an additional container/sidecar to the pods, it is important to ensure network connectivity works during the entire shutdown sequence for LogScale cluster pods. If the service mesh proxy starts shutting down before LogScale is done shutting down, it may impact LogScale's ability to handle data in a safe manner. This includes connectivity to Kafka, bucket storage and any other important components.

Logging Humio Helm Chart

As shown above, the humio-operator project takes care of LogScale cluster management, but does not solve the task of shipping logs from Kubernetes containers to a LogScale cluster. To solve that, there is a separate Helm chart located at https://github.com/humio/humio-helm-charts/ which installs a log shipper and sends container logs to the specified LogScale cluster.

Horizontal Pod Autoscaler

The built-in Kubernetes resource type HorizontalPodAutoscaler is not supported.

Traefik Ingress Controller Example

An example of a more complex ingress controller configuration, using Traefik.

yaml
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: basic-cluster-1-externally-trusted-certificate
  namespace: example-clusters
spec:
  commonName: basic-cluster-1.logscale.local
  secretName: basic-cluster-1-externally-trusted-certificate
  dnsNames:
    - basic-cluster-1.logscale.local
  issuerRef:
    name: letsencrypt-traefik-prod
    kind: ClusterIssuer
---
apiVersion: traefik.containo.us/v1alpha1
kind: ServersTransport
metadata:
  name: basic-cluster-1-transportconfig
  namespace: example-clusters
spec:
  disableHTTP2: true
  insecureSkipVerify: false
  rootCAsSecrets:
  - basic-cluster-1
  serverName: basic-cluster-1.example-clusters
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: basic-cluster-1
  namespace: example-clusters
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`basic-cluster-1.logscale.local`)
      kind: Rule
      services:
        - kind: Service
          name: basic-cluster-1
          port: 8080
          scheme: https
          serversTransport: basic-cluster-1-transportconfig
  tls:
    secretName: basic-cluster-1-externally-trusted-certificate

Scaling a HumioCluster Up or Down

The documentation explains how to scale a HumioCluster up or down in a Kubernetes environment, with scaling up achieved by simply increasing the nodeCount value while scaling down requires a more involved process. The scale-down procedure includes specific steps for handling different node types, marking nodes for eviction, and removing them from the cluster, with slightly different approaches required depending on whether the nodes store segments and the LogScale version being used.

Scaling up a HumioCluster

To scale up a HumioCluster, increase the nodeCount value of the HumioCluster node pool, and the humio-operator will create the additional pods.
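For example, assuming the HumioCluster resource from the basic example, the node count could be raised with a merge patch (or by editing the resource in a GitOps repository):

shell
# Raise nodeCount from 3 to 6; the operator reconciles the additional pods.
kubectl patch humiocluster basic-cluster-1 --namespace example-clusters \
  --type merge --patch '{"spec":{"nodeCount":6}}'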

Scaling down a HumioCluster

At the time of writing, scaling down a HumioCluster is more involved, as there is no built-in support for carrying this out. Lowering the nodeCount value by itself will not immediately start evicting or removing pods.

Overall, there are a few different strategies depending on the type/role of the node/pod we want to remove.

  1. Lower nodeCount. For simplicity's sake, keep all other configs/versions unchanged during scale down. We assume only one node pool is being scaled down at a time, so repeat this entire process for each node pool that needs to be scaled down.

  2. Go to the LogScale cluster administration UI, under Cluster nodes, and click the button for the Mark for eviction action on the nodes we want to remove.

    1. Remember to include zone information in the considerations around picking which nodes to mark for eviction.

  3. Wait until the node is done with eviction.

  4. If we want to remove a pod that does not store segments: these are nodes that are neither storage nodes nor digest nodes, typically nodes that primarily serve API calls, UI components, query coordination, ingest, and so on.

    1. Manually use kubectl delete pod to delete the pods for the nodes we marked for eviction.

      1. After the pods are gone, we have a couple of choices depending on the LogScale version:

        1. LogScale 1.82+:

          1. Either: Wait a couple of hours and LogScale will automatically remove dead nodes from the cluster.

          2. OR: Go to the LogScale cluster administration UI, under Cluster nodes and click the button for the Remove node action.

        2. LogScale <1.82:

          1. Go to the LogScale cluster administration UI, under Cluster nodes and click the button for the Remove node action.

  5. If we want to remove a pod that stores segments: these are nodes doing storage or digest.

    1. Manually use kubectl delete pod to delete the pods for the nodes we marked for eviction.

    2. When the pod is gone, and cluster sees the node as down: Go to the LogScale cluster administration UI, under Cluster nodes and click the button for the Remove node action.