Operations Guide

This guide covers setting up, verifying, and operating disaster recovery (DR) for LogScale on Google Kubernetes Engine (GKE). It assumes familiarity with Terraform, GKE, and the LogScale platform.

The DR architecture uses an active-standby model across two GCP regions:

  • The primary cluster handles all production traffic (ingest, search, UI).

  • The secondary cluster runs a minimal-footprint standby: Kafka brokers, cert-manager, and the humio-operator deployment (scaled to 0 replicas). No LogScale pods run on standby; with the operator at zero replicas, no pods are created regardless of the HumioCluster CR spec.

  • Failover can be triggered in three ways:

    1. Automated (GLB + Cloud Function), Beta feature: an uptime check detects primary failure, then a Cloud Function scales the operator and flips GLB capacity.

    2. Manual (GLB), Beta feature: the operator is scaled to 0 on the primary, and the GLB capacity_scaler is flipped manually via gcloud or Terraform.

    3. DNS WRR routing, Beta feature: Cloud DNS weighted round-robin with a manual weight change.

  • Encryption keys are synchronized to the secondary cluster via Terraform remote state references.

  • The secondary cluster has read-only cross-region access to the primary's GCS bucket for snapshot recovery. This applies when primary bucket data is not replicated to the secondary bucket (e.g., via GCS Transfer Service or dual-region storage). If replication is configured, set GCP_RECOVER_FROM_REPLACE_BUCKET to rewrite segment paths to the local copy instead. Both the cross-region read path and the replicated bucket path are Beta features.

  • Promotion from standby to active uses a two-phase pool routing switch to avoid a traffic blackhole during service selector changes. This does NOT guarantee zero data loss: events in flight during the failover window (30-60s GLB detection + Cloud Function trigger delay) may be lost. RPO depends on client-side retry and buffering capabilities.
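
As an illustration of the DNS WRR failover option listed above, the following is a minimal sketch of a Cloud DNS weighted round-robin record in Terraform. It is not taken from the LogScale module: the variable names (primary_ip, secondary_ip, failover_active), the zone name example-zone, and the record name are all hypothetical placeholders, and the module's actual DNS resources may be structured differently.

```hcl
# Hypothetical inputs; none of these names come from the LogScale module.
variable "primary_ip" {
  type = string
}

variable "secondary_ip" {
  type = string
}

variable "failover_active" {
  type    = bool
  default = false
}

# Weighted round-robin A record: all weight sits on the primary until
# failover_active is set to true, at which point the weights swap.
resource "google_dns_record_set" "logscale" {
  name         = "logscale.example.com."
  managed_zone = "example-zone" # assumed: an existing Cloud DNS managed zone
  type         = "A"
  ttl          = 60

  routing_policy {
    wrr {
      weight  = var.failover_active ? 0 : 1
      rrdatas = [var.primary_ip]
    }
    wrr {
      weight  = var.failover_active ? 1 : 0
      rrdatas = [var.secondary_ip]
    }
  }
}
```

Under these assumptions, the manual weight change amounts to running terraform apply with -var 'failover_active=true'. Clients that have cached the record keep resolving to the primary until the TTL expires, so the TTL adds to the failover window.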

Cluster Types
Type          Node Pools                    Use Case
basic         Digest + Kafka                Dev/test, small workloads
dedicated-ui  Digest + UI + Kafka           Separate UI serving from query processing
advanced      Digest + UI + Ingest + Kafka  Full production topology with dedicated ingest

Cluster Sizes
Size    Digest Nodes  Machine Type   Estimated Daily Ingest
xsmall  3             n2-highmem-16  Up to 1 TB/day
small   9             n2-highmem-16  1-5 TB/day
medium  21            n2-highmem-32  5-20 TB/day
large   42            n2-highmem-32  20-50 TB/day
xlarge  78            n2-highmem-64  50+ TB/day

Network Access
Setting                                                                                    API Access
kubernetes_private_cluster_enabled = true                                                  Internal networks only
kubernetes_private_cluster_enabled = false + ip_ranges_allowed_to_kubeapi = []             GCP public CIDRs only
kubernetes_private_cluster_enabled = false + ip_ranges_allowed_to_kubeapi = ["0.0.0.0/0"]  Unrestricted

See Google's network isolation guide for recommended configurations.
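
The settings in the table above can be expressed as Terraform variable assignments. The fragment below is a sketch of a middle-ground configuration: a public API endpoint limited to a single address range. It assumes that ip_ranges_allowed_to_kubeapi accepts arbitrary CIDR strings in addition to the two values shown in the table. The range 203.0.113.0/24 is a documentation placeholder, not a recommended value.

```hcl
# Public Kubernetes API endpoint, restricted to one known range.
# Assumption: the variable accepts any list of CIDR strings.
kubernetes_private_cluster_enabled = false
ip_ranges_allowed_to_kubeapi       = ["203.0.113.0/24"] # placeholder CIDR
```

These lines could go in a terraform.tfvars file or be set as arguments on the module block.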

Quick Start
terraform
module "logscale" {
  source = "."

  project_id            = "my-project"
  region                = "us-central1"
  zone                  = "us-central1-a"
  logscale_cluster_type = "basic"
  logscale_cluster_size = "xsmall"
  public_url            = "logscale.example.com"

  # Versions (minimum supported)
  humio_operator_chart_version = "0.29.2"
  humio_operator_version       = "0.29.2"
  logscale_image_version       = "1.207.0"
  strimzi_operator_version     = "0.45.0"
}