Disaster Recovery Operations Guide
This is the operations guide.
See also Disaster Recovery Technical Reference.
Overview
Two clusters are managed using separate Terraform state files:
- Primary (`eastus`): production, `dr="active"`.
- Secondary (`westus`): standby, `dr="standby"`, minimal capacity. It reads the primary's Azure Blob Storage container using the same encryption key, pulled from remote state, and keeps all LogScale pods scaled to zero until a failover/promotion is initiated.
Region flexibility:
The regions shown (eastus and westus) are examples only. You can deploy in any Azure regions supported by your organization. Update `azure_resource_group_region` in your tfvars, the remote state configuration, and any region-specific references (for example, Traffic Manager/DNS) to match your chosen regions.
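
After changing regions, it can help to read the values back before running Terraform. The following is a minimal sketch, not part of the deployment tooling: the helper names `tfvars_value` and `show_regions` are hypothetical, and it assumes each tfvars file sets `azure_resource_group_region` as a single-line `key = "value"` assignment.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the DR tooling.

# Print the quoted value assigned to a key in a tfvars file.
# Assumes a single-line `key = "value"` assignment.
tfvars_value() {
  local key="$1" file="$2"
  sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"([^\"]*)\".*/\1/p" "$file" | head -n 1
}

# Print the region configured in each tfvars file for side-by-side review.
show_regions() {
  local primary_file="$1" secondary_file="$2"
  echo "primary=$(tfvars_value azure_resource_group_region "$primary_file") secondary=$(tfvars_value azure_resource_group_region "$secondary_file")"
}
```

For example, `show_regions primary.tfvars secondary.tfvars` (file names are placeholders) prints both regions on one line. The remote state configuration and Traffic Manager/DNS references still need to be checked separately.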
Key features:
- Automated encryption key synchronization (no hardcoding). Standby apply requires the primary key (remote state or explicit value).
- Cross-region storage access via a primary storage firewall update (secondary NAT GW IP) plus the primary storage account key from remote state (`AZURE_RECOVER_FROM_ACCOUNTKEY`). Terraform also assigns Storage Blob Data Reader to the standby AKS managed identity.
- Alerts toggle automatically via `ENABLE_ALERTS` based on `dr` (true for active, false for standby).
- Standby keeps the Humio Operator scaled to 0; an Azure Function (or a manual step) scales the operator to 1 on failover. `nodeCount` is already set to 1 on the HumioCluster manifest; no automatic scale-down exists.
- Manual, controlled promotion by changing `dr` and applying Terraform.
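
The manual promotion and the alert toggle described above can be sketched as a few shell helpers. This is an illustration under stated assumptions rather than the project's actual procedure: the function names are hypothetical, `dr` is assumed to be a single-line `dr = "..."` assignment in the secondary tfvars file, and the secondary state is assumed to live in a Terraform workspace named `secondary`.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the tfvars layout are assumptions.

# Rewrite the dr assignment in a tfvars file (a .bak copy is kept).
set_dr_mode() {
  local mode="$1" file="$2"
  case "$mode" in
    active|standby) ;;
    *) echo "dr must be active or standby, got: $mode" >&2; return 1 ;;
  esac
  sed -i.bak -E "s/^([[:space:]]*dr[[:space:]]*=[[:space:]]*)\"[^\"]*\"/\1\"${mode}\"/" "$file"
}

# Documented rule: ENABLE_ALERTS is true for active and false for standby.
expected_enable_alerts() {
  case "$1" in
    active)  echo "true" ;;
    standby) echo "false" ;;
    *) echo "unknown dr mode: $1" >&2; return 1 ;;
  esac
}

# Promotion: flip dr to active, then produce a plan for review.
promote_secondary() {
  local tfvars="$1"
  set_dr_mode active "$tfvars" || return 1
  terraform workspace select secondary || return 1
  terraform plan -var-file="$tfvars"
  # After reviewing the plan, run: terraform apply -var-file="$tfvars"
}
```

Stopping at `terraform plan` keeps the promotion a deliberate, reviewed step, which matches the manual, controlled intent of the process; the apply is left to the operator.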
Key capabilities:
| Feature | Primary (Active) | Secondary (Standby) |
|---|---|---|
| Region | `eastus` | `westus` |
| Cluster Type | Advanced (full production) | Standby (Humio operator off) |
| Node Pools | All pools per cluster_size (system/digest/ingress/ingest/ui/kafka) | System/digest/ingress/kafka only; UI and Ingest node pools not created |
| Humio nodeCount | cluster_size digest count | nodeCount=1 declared, but no pods run until the operator is scaled up |
| Humio operator | 1 replica | 0 replicas until failover |
| Replication Factor | Production value | 1 (overridden) |
| Auto Rebalance | Enabled | Disabled |
| Storage Container | `terraform output -raw storage_container_name` | `terraform output -raw storage_container_name` |
| Encryption Key | Generated on first deploy | Pulled from primary state (required for standby apply) |
| Terraform Workspace | primary | secondary |
| DR Mode | `dr = "active"` | `dr = "standby"` |
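
The Humio operator row of the table lends itself to a quick check against a live cluster. The sketch below is illustrative only: the function names are hypothetical, and the operator's Deployment name and namespace are passed in as arguments because this guide does not state them.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Expected Humio operator replica count for a DR mode, per the table.
expected_operator_replicas() {
  case "$1" in
    active)  echo 1 ;;
    standby) echo 0 ;;
    *) echo "unknown dr mode: $1" >&2; return 1 ;;
  esac
}

# Compare the operator Deployment's configured replicas with the expected count.
# Arguments: dr mode, Deployment name, namespace (the last two are site-specific).
check_operator_replicas() {
  local mode="$1" deployment="$2" namespace="$3" expected actual
  expected="$(expected_operator_replicas "$mode")" || return 1
  actual="$(kubectl get deployment "$deployment" -n "$namespace" \
    -o jsonpath='{.spec.replicas}')" || return 1
  if [ "$actual" = "$expected" ]; then
    echo "ok: operator replicas=$actual (dr=$mode)"
  else
    echo "mismatch: operator replicas=$actual, expected $expected (dr=$mode)" >&2
    return 1
  fi
}
```

For example, `check_operator_replicas standby humio-operator logging` would be run against the standby cluster's kubeconfig context; both `humio-operator` and `logging` are placeholder names to replace with your own.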