Operations Guide
This guide covers setting up, verifying, and operating disaster recovery (DR) for LogScale on Google Kubernetes Engine (GKE). It assumes familiarity with Terraform, GKE, and the LogScale platform.
The DR architecture uses an active-standby model across two GCP regions:
The primary cluster handles all production traffic (ingest, search, UI).
The secondary cluster runs a minimal-footprint standby: Kafka brokers, cert-manager, and the humio-operator deployment (scaled to 0 replicas). No LogScale pods run on standby; the operator being at zero replicas prevents any pod creation regardless of the HumioCluster CR spec.
Failover can be triggered three ways:
Automated (GLB + Cloud Function) (Beta): An uptime check detects primary failure, and a Cloud Function scales the operator and flips GLB capacity.
Manual (GLB) (Beta): The operator is scaled to 0 on the primary, and the GLB capacity_scaler is flipped manually via gcloud or Terraform.
DNS WRR routing (Beta): Cloud DNS weighted round-robin with a manual weight change.
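As an illustration of the manual GLB path, the following is a minimal Terraform sketch of flipping capacity_scaler between two backends. It is not the module's actual implementation: the resource name, the variables (active_site, primary_neg_id, secondary_neg_id, health_check_id), and the rate limit are hypothetical, and a real GKE deployment typically has one NEG per zone rather than one per site.

```hcl
# Sketch only: names, variables, and rate limits below are hypothetical
# and are not taken from the LogScale module.
variable "active_site" {
  description = "Site that receives traffic: \"primary\" or \"secondary\"."
  type        = string
  default     = "primary"

  validation {
    condition     = contains(["primary", "secondary"], var.active_site)
    error_message = "active_site must be \"primary\" or \"secondary\"."
  }
}

variable "primary_neg_id" { type = string }
variable "secondary_neg_id" { type = string }
variable "health_check_id" { type = string }

resource "google_compute_backend_service" "logscale" {
  name                  = "logscale-dr-backend"
  protocol              = "HTTPS"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  health_checks         = [var.health_check_id]

  # Primary site: full capacity while active, drained otherwise.
  backend {
    group                 = var.primary_neg_id
    balancing_mode        = "RATE"
    max_rate_per_endpoint = 1000
    capacity_scaler       = var.active_site == "primary" ? 1.0 : 0.0
  }

  # Secondary site: the mirror image of the primary setting.
  backend {
    group                 = var.secondary_neg_id
    balancing_mode        = "RATE"
    max_rate_per_endpoint = 1000
    capacity_scaler       = var.active_site == "secondary" ? 1.0 : 0.0
  }
}
```

Under these assumptions, a manual failover would amount to running terraform apply -var active_site=secondary after scaling the operator. Driving both scalers from one variable keeps the two backends from ever being active at the same time.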
Encryption keys are synchronized to the secondary cluster via Terraform remote state references.
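A remote state reference of this kind might look like the sketch below. The state bucket, prefix, and the output name encryption_key are assumptions for illustration; the actual names depend on how the primary cluster's state and outputs are defined.

```hcl
# Sketch only: the state location and output name are hypothetical.
data "terraform_remote_state" "primary" {
  backend = "gcs"

  config = {
    bucket = "my-terraform-state" # bucket holding the primary cluster's state
    prefix = "logscale/primary"   # state path within that bucket
  }
}

locals {
  # Assumes the primary configuration exports an output named "encryption_key".
  # sensitive() keeps the value out of plan and apply output.
  primary_encryption_key = sensitive(data.terraform_remote_state.primary.outputs.encryption_key)
}
```

With this pattern the secondary reads the key from the primary's state at plan time, so the primary must be applied first and the secondary needs read access to the state bucket.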
The secondary cluster has read-only cross-region access to the primary's GCS bucket for snapshot recovery. This applies when primary bucket data is not replicated to the secondary bucket. If replication is configured (e.g., via GCS Transfer Service or dual-region storage), set GCP_RECOVER_FROM_REPLACE_BUCKET to rewrite segment paths to the local copy instead. The cross-region read path and the replicated bucket path are both Beta features.
Promotion from standby to active uses a two-phase pool routing switch to avoid a traffic blackhole during service selector changes. This does NOT guarantee zero data loss: events in flight during the failover window (30-60s GLB detection + CF trigger delay) may be lost. RPO depends on client-side retry and buffering capabilities.
Cluster Types
| Type | Node Pools | Use Case |
|---|---|---|
| basic | Digest + Kafka | Dev/test, small workloads |
| dedicated-ui | Digest + UI + Kafka | Separate UI serving from query processing |
| advanced | Digest + UI + Ingest + Kafka | Full production topology with dedicated ingest |
Cluster Sizes
| Size | Digest Nodes | Machine Type | Estimated Daily Ingest |
|---|---|---|---|
| xsmall | 3 | n2-highmem-16 | Up to 1 TB/day |
| small | 9 | n2-highmem-16 | 1-5 TB/day |
| medium | 21 | n2-highmem-32 | 5-20 TB/day |
| large | 42 | n2-highmem-32 | 20-50 TB/day |
| xlarge | 78 | n2-highmem-64 | 50+ TB/day |
Network Access
| Setting | API Access |
|---|---|
| kubernetes_private_cluster_enabled = true | Internal networks only |
| kubernetes_private_cluster_enabled = false + ip_ranges_allowed_to_kubeapi = [] | GCP public CIDRs only |
| kubernetes_private_cluster_enabled = false + ip_ranges_allowed_to_kubeapi = ["0.0.0.0/0"] | Unrestricted |
See Google's network isolation guide for recommended configurations.
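For example, the most restrictive row of the table could be expressed as the variable fragment below. This is a sketch that assumes both settings are exposed as Terraform input variables (for instance in a terraform.tfvars file); the commented alternative uses a documentation-only CIDR and assumes that a specific range limits API access to that range, which the table above does not state explicitly.

```hcl
# Private control plane: Kubernetes API reachable from internal networks only.
kubernetes_private_cluster_enabled = true

# Alternative (assumption): public endpoint limited to a specific range.
# 203.0.113.0/24 is a placeholder documentation range, not a real office CIDR.
# kubernetes_private_cluster_enabled = false
# ip_ranges_allowed_to_kubeapi       = ["203.0.113.0/24"]
```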
Quick Start
```hcl
module "logscale" {
  source = "."

  project_id = "my-project"
  region     = "us-central1"
  zone       = "us-central1-a"

  logscale_cluster_type = "basic"
  logscale_cluster_size = "xsmall"
  public_url            = "logscale.example.com"

  # Versions (minimum supported)
  humio_operator_chart_version = "0.29.2"
  humio_operator_version       = "0.29.2"
  logscale_image_version       = "1.207.0"
  strimzi_operator_version     = "0.45.0"
}
```