Traffic Routing During Failover

Traffic routing combines GLB capacity_scaler-based routing with DR-aware label selectors to manage traffic during both normal operations and failover.

Failover Traffic Flow:

[Figure: GCP DR - Traffic Failover]

Normal Traffic Flow:

[Figure: GCP DR - Traffic Normal]

DNS Resolution Chain

  1. Client requests: <global-hostname>.<zone-name>

  2. Cloud DNS: Resolves to Global Load Balancer IP (global anycast IP managed by Terraform)

  3. GLB performs health checks on both primary and secondary backends every 10 seconds (configurable via health_check_interval_sec)

  4. Routing logic: GLB routes based on capacity_scaler values: primary at 1.0, secondary at 0.0. During failover, the Cloud Function flips these values via the GLB backend service.
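
The capacity_scaler flip in step 4 can be sketched as follows. This is a minimal illustration, not the Cloud Function's actual code: the Backend type, the flip_capacity helper, and the backend names "primary" and "secondary" are hypothetical, and a real implementation would apply the new values through the GLB backend service rather than to an in-memory list.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Backend:
    """One GLB backend entry; `name` is a hypothetical identifier."""
    name: str
    capacity_scaler: float


def flip_capacity(backends: list[Backend], promote: str) -> list[Backend]:
    """Return new entries with `promote` at 1.0 and every other backend at 0.0."""
    if promote not in {b.name for b in backends}:
        raise ValueError(f"unknown backend: {promote}")
    return [
        replace(b, capacity_scaler=1.0 if b.name == promote else 0.0)
        for b in backends
    ]


# Normal operation: primary takes all traffic, secondary takes none.
normal = [Backend("primary", 1.0), Backend("secondary", 0.0)]

# Failover: the values are swapped so the secondary takes all traffic.
failed_over = flip_capacity(normal, promote="secondary")
```

Returning a new list instead of mutating the input keeps the pre-failover state available, which is convenient if the flip later has to be reverted.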

Label Selectors Used

The traffic routing uses DR-aware label selectors on the NodePort service:

[Figure: GCP DR - DNS Global]

| Label | Values | Purpose |
| --- | --- | --- |
| app.kubernetes.io/name | humio | Selects all LogScale pods |
| humio.com/feature | OperatorInternal | Selects query-capable pods (excludes ingest-only) |

Label selector behavior:

| Scenario | dr mode | dr_use_dedicated_routing | Selector | Pods Selected |
| --- | --- | --- | --- | --- |
| Normal primary operation | active | false | app.kubernetes.io/name=humio, humio.com/feature=OperatorInternal | All query-capable pods (digest, UI) |
| Standby (cold) | standby | false | app.kubernetes.io/name=humio, humio.com/feature=OperatorInternal | No pods (operator at 0 replicas) |
| Phase 1 promotion | active | false | app.kubernetes.io/name=humio, humio.com/feature=OperatorInternal | Query-capable pods as they come online |
| Phase 2 promotion | active | true | Pool-specific node selectors (e.g., humio.com/node-pool=<prefix>) | Pods on dedicated node pools |

Why Two-Label Selector Excludes Ingest-Only Pods

When dr_use_dedicated_routing=false, the selector combines app.kubernetes.io/name=humio with humio.com/feature=OperatorInternal. Ingest-only pods (those with NODE_ROLES=ingestonly) do not receive the humio.com/feature=OperatorInternal label from the humio-operator, so the selector never matches them. This keeps query/UI traffic away from ingest-only pods, which reject such requests with the error "Unsupported operation. Ingest-Only node does not support the attempted operation."
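
The exclusion can be illustrated with a small sketch of equality-based selector matching. The matches helper, the pod names, and the pool prefix "example" are hypothetical; the label keys and values are the ones described in this section.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based match: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


# Two-label selector used when dr_use_dedicated_routing=false.
QUERY_SELECTOR = {
    "app.kubernetes.io/name": "humio",
    "humio.com/feature": "OperatorInternal",
}

# Pod label sets; "example" stands in for the real <prefix>.
pods = {
    "digest": {
        "app.kubernetes.io/name": "humio",
        "humio.com/feature": "OperatorInternal",
        "humio.com/node-pool": "example",
    },
    "ingest": {
        "app.kubernetes.io/name": "humio",
        "humio.com/node-pool": "example-ingest-only",
    },
    "ui": {
        "app.kubernetes.io/name": "humio",
        "humio.com/feature": "OperatorInternal",
        "humio.com/node-pool": "example",
    },
}

# The ingest pod lacks humio.com/feature, so it is never selected.
selected = sorted(name for name, labels in pods.items() if matches(QUERY_SELECTOR, labels))
```

Selecting on app.kubernetes.io/name=humio alone would match all three pods, which is why the second label is needed.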

humio-operator Label Definitions

The humio-operator adds labels to pods during deployment:

| Pod Type | Labels Added |
| --- | --- |
| Digest Pod | app.kubernetes.io/name=humio, humio.com/feature=OperatorInternal, humio.com/node-pool=<prefix> |
| Ingest Pod | app.kubernetes.io/name=humio, humio.com/node-pool=<prefix>-ingest-only (no OperatorInternal label) |
| UI Pod | app.kubernetes.io/name=humio, humio.com/feature=OperatorInternal, humio.com/node-pool=<prefix> |

Phase 2 of promotion switches the NodePort service selector from the generic OperatorInternal selector to pool-specific humio.com/node-pool selectors, so that traffic can be targeted at specific node pools.
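
The selector switch can be sketched as a function of dr_use_dedicated_routing. The nodeport_selector helper and the "example" prefix are hypothetical, and the Phase 2 selector shown here assumes a single humio.com/node-pool=<prefix> label, following the example given in this section; the real deployment may use a different set of pool-specific labels.

```python
def nodeport_selector(dr_use_dedicated_routing: bool, pool_prefix: str) -> dict[str, str]:
    """Return the NodePort service selector for the given routing mode."""
    if dr_use_dedicated_routing:
        # Phase 2: target pods on a dedicated node pool (assumed single label).
        return {"humio.com/node-pool": pool_prefix}
    # Normal operation and Phase 1: generic two-label selector.
    return {
        "app.kubernetes.io/name": "humio",
        "humio.com/feature": "OperatorInternal",
    }


phase1 = nodeport_selector(False, "example")
phase2 = nodeport_selector(True, "example")
```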

GLB Health Check Configuration

The Global Load Balancer monitors both primary and secondary backends:

| Parameter | Value | Source |
| --- | --- | --- |
| Protocol | HTTPS | var.health_check_type (default) |
| Port | 443 | var.health_check_port (default) |
| Path | /api/v1/status | var.health_check_path (default) |
| Interval | 10s | var.health_check_interval_sec (default) |
| Timeout | 5s | var.health_check_timeout_sec (default) |
| Healthy Threshold | 2 consecutive successes | var.healthy_threshold (default) |
| Unhealthy Threshold | 3 consecutive failures | var.unhealthy_threshold (default) |

Health Status Behavior:

  • Healthy (200 OK): GLB backend marked healthy, receives traffic based on capacity_scaler

  • Unhealthy: GLB backend marked unhealthy after 3 consecutive failures (~30s with 10s interval)

  • Capacity Scaler: Primary at 1.0 (full traffic), secondary at 0.0 (no traffic). Cloud Function flips these during failover.
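
The ~30s detection time follows from multiplying the unhealthy threshold by the probe interval. The sketch below works through that arithmetic with the default values from the configuration table. The HealthCheck class and its method names are hypothetical, and the results are approximations that ignore the probe timeout and the exact timing of individual probes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheck:
    """Defaults taken from the health check configuration table."""
    interval_sec: int = 10
    timeout_sec: int = 5
    healthy_threshold: int = 2
    unhealthy_threshold: int = 3

    def seconds_to_unhealthy(self) -> int:
        """Approximate time for a failing backend to be marked unhealthy."""
        return self.unhealthy_threshold * self.interval_sec

    def seconds_to_healthy(self) -> int:
        """Approximate time for a recovered backend to be marked healthy."""
        return self.healthy_threshold * self.interval_sec


hc = HealthCheck()
detect = hc.seconds_to_unhealthy()   # 3 failures x 10 s interval
recover = hc.seconds_to_healthy()    # 2 successes x 10 s interval
```

By the same arithmetic, lowering health_check_interval_sec shortens detection time at the cost of more frequent probes against both backends.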

Expected GLB Status During Normal Operations

| Status | Primary | Secondary | Meaning |
| --- | --- | --- | --- |
| HEALTHY | Yes | capacity_scaler=0.0 | Normal: primary active, secondary standby |
| HEALTHY | Yes | Yes | Phase 1 promotion: both backends have capacity > 0 |
| UNHEALTHY | No | Yes | Failover active: Cloud Function flipped capacity_scaler, secondary handling all traffic |