Stage 1: DR Configuration Setup
Plan Your Naming
Choose deterministic, region-scoped names for all resources. This avoids collisions and makes cross-region references unambiguous.
Primary:
Region: us-central1
Infrastructure prefix: logscale-primary
GCS bucket: logscale-primary-us-central1-<project-id>
Secondary:
Region: us-west1
Infrastructure prefix: logscale-secondary
GCS bucket: logscale-secondary-us-west1-<project-id>
Note
Replace <project-id> with your actual GCP project ID. Bucket names must be globally unique.
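The naming scheme is simple enough to compute instead of typing by hand. Below is a minimal sketch, assuming bash and a hypothetical helper named `dr_bucket_name` (not part of the LogScale modules). It joins prefix, region, and project ID, then checks the result against the basic GCS bucket naming rules: 3-63 characters for names without dots, only lowercase letters, digits, hyphens, and underscores, and a letter or digit at both ends. Dotted names are rejected on purpose, because GCS requires domain verification for them. The length check matters in practice: a long region such as northamerica-northeast1 combined with a long project ID can exceed 63 characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the LogScale Terraform modules): build
# <prefix>-<region>-<project-id> and check it against basic GCS bucket rules.
dr_bucket_name() {
  local prefix="$1" region="$2" project_id="$3"
  local name="${prefix}-${region}-${project_id}"
  # Names without dots must be 3-63 characters long
  if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 63 ]; then
    echo "bucket name '$name' is ${#name} characters; GCS allows 3-63" >&2
    return 1
  fi
  # Lowercase letters, digits, '-' and '_', starting and ending with a letter or digit.
  # Dots are rejected on purpose (dotted names need domain verification).
  if ! printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9_-]*[a-z0-9]$'; then
    echo "bucket name '$name' contains characters GCS does not allow" >&2
    return 1
  fi
  printf '%s\n' "$name"
}

# Example:
#   dr_bucket_name logscale-primary   us-central1 my-project  # logscale-primary-us-central1-my-project
#   dr_bucket_name logscale-secondary us-west1    my-project  # logscale-secondary-us-west1-my-project
```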
Deploy Primary Cluster
Create primary.tfvars:
project_id = "your-project"
region = "us-central1"
infrastructure_prefix = "logscale-primary"
dr = "active"
logscale_cluster_type = "advanced"
logscale_cluster_size = "small"
# Deterministic bucket naming
gcs_bucket_name = "logscale-primary-us-central1-your-project"
# Global Load Balancer (optional, for health-based failover)
enable_global_lb = true
enable_glb_named_port = true
# DNS
manage_global_dns = true
global_dns_zone_name = "your-dns-zone"
global_logscale_hostname = "logscale"
primary_logscale_hostname = "dr-primary"
secondary_logscale_hostname = "dr-secondary"
public_dns_zone_name = "your-public-dns-zone"
public_url = "logscale.yourdomain.com"
# Cross-region: primary needs to know secondary's bucket name
dr_primary_gcs_bucket = "logscale-secondary-us-west1-your-project"
# Versions (same on both clusters)
humio_operator_chart_version = "0.29.2"
humio_operator_version = "0.29.2"
logscale_image_version = "1.228.1"
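# ... (all other version variables)
Deploy in targeted order. Pass primary.tfvars explicitly with -var-file, because Terraform auto-loads only terraform.tfvars and *.auto.tfvars files (plus their .json variants):
terraform init
terraform apply -var-file=primary.tfvars -target=module.vpc
terraform apply -var-file=primary.tfvars -target=module.gke
# ... (continue with remaining modules per the setup guide)
terraform apply -var-file=primary.tfvars -target=module.global_lb      # GLB for DR
terraform apply -var-file=primary.tfvars -target=module.dns_failover   # DNS records
Important
Deploy modules in dependency order: the GLB and DNS modules depend on the GKE cluster and its services being ready.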
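Running the targeted applies by hand makes it easy to skip a module or carry on after a failure. Below is a minimal sketch of a wrapper, assuming bash and a hypothetical function named `deploy_in_order` (not part of the setup guide). It runs `terraform init` once, then one targeted apply per module in the order you pass them, and stops at the first failure so later modules never run against a half-built dependency. You supply the module list: only the modules named in this guide appear in the example, and the remaining ones must be inserted in the order the setup guide gives. No `-auto-approve` is passed, so each apply still asks for confirmation.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the setup guide): terraform init, then one
# targeted apply per module in the order given, stopping at the first failure.
# Set TF="echo terraform" for a dry run that only prints the commands.
TF="${TF:-terraform}"

deploy_in_order() {
  local var_file="$1" module
  shift
  # $TF is unquoted on purpose so a value like "echo terraform" splits into words
  $TF init || return 1
  for module in "$@"; do
    echo "==> module.${module}" >&2
    $TF apply -var-file="$var_file" -target="module.${module}" || return 1
  done
}

# Example for the primary; insert the remaining modules from the setup guide,
# in its documented order, where the "..." is:
#   deploy_in_order primary.tfvars vpc gke ... global_lb dns_failover
```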
Deploy Secondary (Standby) Cluster
Create secondary.tfvars:
project_id = "your-project"
region = "us-west1"
infrastructure_prefix = "logscale-secondary"
dr = "standby"
logscale_cluster_type = "advanced" # MUST match primary; standby cost savings come from the operator at 0 replicas, not from the cluster type
logscale_cluster_size = "xsmall"
# Deterministic bucket naming
gcs_bucket_name = "logscale-secondary-us-west1-your-project"
# GLB named port (required so primary GLB can route to secondary)
enable_glb_named_port = true
# Remote state: read primary's outputs for encryption key sync
primary_remote_state_config = {
  backend   = "gcs"
  workspace = "default"
  config = {
    bucket = "your-tf-state-bucket"
    prefix = "logscale/gcp/primary/terraform/tf.state"
  }
}
# Recovery configuration
dr_primary_gcs_bucket = "logscale-primary-us-central1-your-project"
gcp_recover_from_bucket = "logscale-primary-us-central1-your-project"
gcp_recover_from_replace_region = "us-central1/us-west1"
# Cloud Function for automated failover (optional)
dr_cloud_function_enabled = true
dr_cloud_function_target_node_count = 2
dr_cloud_function_pre_failover_failure_seconds = 180
# Versions (MUST match primary)
humio_operator_chart_version = "0.29.2"
humio_operator_version = "0.29.2"
logscale_image_version = "1.228.1"
public_url = "logscale.yourdomain.com"
Deploy in targeted order, passing secondary.tfvars with -var-file on each apply:
terraform init
terraform apply -var-file=secondary.tfvars -target=module.vpc
terraform apply -var-file=secondary.tfvars -target=module.gke
# ... (targeted deployment per the setup guide)
terraform apply -var-file=secondary.tfvars -target=module.dr_failover_function   # Cloud Function
GLB Backend Registration
The standby cluster self-registers its instance groups into the primary's GLB backend service on first deploy (via terraform_data.glb_self_register). No primary redeploy is required.
Verify that both backends are registered after deploying the standby:
gcloud compute backend-services get-health <BACKEND_SERVICE_NAME> \
  --global --format='table(status.healthStatus[].ipAddress,status.healthStatus[].healthState)'
# Expected: 2+ IPs (primary HEALTHY; standby UNHEALTHY because it runs no LogScale pods)
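The expectation in that comment can be scripted so it yields an exit status. Below is a minimal sketch, assuming bash and a hypothetical function named `check_dr_backends` (not part of the setup guide). It reads the `get-health` output on stdin, counts distinct IPv4 addresses and whole-word `HEALTHY` states (so `UNHEALTHY` is not counted), and fails when there are fewer than two backend IPs or no healthy one. It pattern-matches text rather than parsing gcloud's output structure, so treat it as a quick check, not a monitor.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of the setup guide): read get-health output on stdin.
# Counts distinct IPv4 addresses and whole-word HEALTHY states; UNHEALTHY is not
# counted because -w requires HEALTHY to stand alone as a word.
check_dr_backends() {
  local input ips healthy
  input="$(cat)"
  ips=$(( $(printf '%s\n' "$input" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u | wc -l) ))
  healthy=$(( $(printf '%s\n' "$input" | grep -ow 'HEALTHY' | wc -l) ))
  echo "backend IPs: $ips, HEALTHY: $healthy"
  if [ "$ips" -lt 2 ]; then
    echo "fewer than 2 backend IPs: is the standby registered?" >&2
    return 1
  fi
  if [ "$healthy" -lt 1 ]; then
    echo "no HEALTHY backend: is the primary serving?" >&2
    return 1
  fi
}

# Usage:
#   gcloud compute backend-services get-health <BACKEND_SERVICE_NAME> --global \
#     --format='table(status.healthStatus[].ipAddress,status.healthStatus[].healthState)' \
#     | check_dr_backends
```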
If only one backend appears, check that enable_glb_named_port = true and primary_remote_state_config are both set in the standby's Terraform configuration (secondary.tfvars).
A related failure can follow a standby-to-active-to-standby round trip (e.g., a DR test followed by failback): the encryption key recovery secret on the standby may be empty. Verify:
# Run on the STANDBY: the key must NOT be empty (SHA-256 of empty input = e3b0c44298fc...)
kubectl get secret <RECOVERY_SECRET> -n log \
  -o jsonpath='{.data.gcp-storage-encryption-key}' | base64 -d | shasum -a 256
If the hash is e3b0c44298fc1c149afbf4c8996fb924..., the key is empty and DR recovery will fail. Redeploy the standby to re-read the primary's encryption key from remote state.
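The hash comparison can be wrapped so it returns a usable exit status, for example in a post-failback checklist. Below is a minimal sketch, assuming bash and hypothetical names `sha256_hex` and `check_recovery_key`. It hashes the decoded key read from stdin and compares the result with the SHA-256 of empty input, so the key itself is never printed. If the field is missing or kubectl fails, the hash stage sees empty input and reports an empty key.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: detect an empty recovery key without printing it.
EMPTY_SHA256="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"  # SHA-256 of zero bytes

# Hex SHA-256 of stdin, using sha256sum (coreutils) or shasum (macOS).
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum
  else
    shasum -a 256
  fi | awk '{print $1}'
}

# Reads the base64-DECODED key on stdin; returns 1 if it is empty.
check_recovery_key() {
  local digest
  digest="$(sha256_hex)"
  if [ "$digest" = "$EMPTY_SHA256" ]; then
    echo "recovery key is EMPTY: redeploy the standby to re-read it from remote state" >&2
    return 1
  fi
  echo "recovery key present (sha256 $(printf '%.12s' "$digest")...)"
}

# Usage (on the standby):
#   kubectl get secret <RECOVERY_SECRET> -n log \
#     -o jsonpath='{.data.gcp-storage-encryption-key}' | base64 -d | check_recovery_key
```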