GCP DR Data Flow - Encryption Keys, GCS Buckets, Remote State
This section traces the data flow between the primary and secondary GCP LogScale clusters for disaster recovery (DR). It covers encryption key synchronization, GCS bucket naming and IAM, remote state wiring, and the environment variables that drive LogScale's recovery process.
Encryption Key Synchronization
LogScale encrypts data at rest in GCS using a symmetric key stored in a Kubernetes secret. Both the primary and secondary clusters must use the same key so the secondary can decrypt the primary's snapshots during recovery.
Flow
PRIMARY CLUSTER                        SECONDARY CLUSTER
+---------------------------+          +---------------------------+
| post-install module:      |          | post-install module:      |
|                           |          |                           |
| random_password           |          | Read key from one of:     |
|   .gcp_storage_           |          |  1. existing_gcs_         |
|   encryption_password     |          |     encryption_key var    |
|        |                  |          |  2. primary remote state  |
|        v                  |          |     .gcs_storage_         |
| kubernetes_secret         |          |     encryption_key        |
|   "{name}-gcp-storage-    |          |        |                  |
|    encryption-key"        |  remote  |        v                  |
|        |                  |  state   | kubernetes_secret (x2):   |
| Exported as TF output:    |--------->|  a) "{name}-gcp-storage-  |
|   gcs_storage_            |          |      encryption-key"      |
|   encryption_key          |          |      (own bucket encrypt) |
|   (sensitive)             |          |  b) "dr-secondary-gcs-    |
+---------------------------+          |      storage-encryption"  |
                                       |      (recovery decrypt)   |
                                       +---------------------------+
Terraform Code Path
Primary (generates the key):
modules/kubernetes/post-install/main.tf creates random_password.gcp_storage_encryption_password (count = 1 when dr != "standby", length = 64, no special characters)
Stores value in kubernetes_secret.gcp_storage_encryption_key (secret name: {logscale_cluster_name}-gcp-storage-encryption-key, key: gcp-storage-encryption-key)
Exports via output gcp_storage_encryption_key_value (sensitive)
Root outputs.tf re-exports as gcs_storage_encryption_key for remote state access
Secondary (imports the key):
data.terraform_remote_state.primary[0].outputs.gcs_storage_encryption_key fetches the value
modules/kubernetes/post-install/main.tf creates two secrets:
- kubernetes_secret.gcp_storage_encryption_key -- uses the imported key for the secondary's own bucket encryption (same key as primary ensures data portability)
- kubernetes_secret.gcp_dr_storage_encryption_key -- stores the key under the DR recovery secret name for LogScale's GCP_RECOVER_FROM_ENCRYPTION_KEY env var.
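The primary and secondary code paths above can be sketched in one place. This is a minimal illustration, not the module's actual source: `var.imported_encryption_key` and `var.k8s_namespace` are hypothetical inputs, and only the resource names, secret names, and key length come from the text.

```hcl
# Sketch only. var.imported_encryption_key and var.k8s_namespace are
# hypothetical inputs; resource and secret names follow the text above.
variable "dr" { type = string } # "active" or "standby"
variable "logscale_cluster_name" { type = string }
variable "k8s_namespace" { type = string }

variable "imported_encryption_key" {
  type      = string
  default   = null
  sensitive = true
}

# Primary only: generate a fresh 64-character key without special characters.
resource "random_password" "gcp_storage_encryption_password" {
  count   = var.dr != "standby" ? 1 : 0
  length  = 64
  special = false
}

locals {
  # Standby uses the imported key; the primary uses the generated one.
  # one() returns null instead of failing when the resource has count = 0.
  storage_encryption_key = (
    var.dr == "standby"
    ? var.imported_encryption_key
    : one(random_password.gcp_storage_encryption_password[*].result)
  )
}

# Secret used for the cluster's own bucket encryption (both roles).
resource "kubernetes_secret" "gcp_storage_encryption_key" {
  metadata {
    name      = "${var.logscale_cluster_name}-gcp-storage-encryption-key"
    namespace = var.k8s_namespace
  }

  data = {
    "gcp-storage-encryption-key" = local.storage_encryption_key
  }
}

# Secret used for recovery decryption (standby only).
resource "kubernetes_secret" "gcp_dr_storage_encryption_key" {
  count = var.dr == "standby" ? 1 : 0

  metadata {
    name      = "dr-secondary-gcs-storage-encryption"
    namespace = var.k8s_namespace
  }

  data = {
    "gcp-storage-encryption-key" = local.storage_encryption_key
  }
}

output "gcp_storage_encryption_key_value" {
  value     = local.storage_encryption_key
  sensitive = true
}
```

Both secrets hold the same value on standby; keeping them as separate resources lets the recovery secret exist only where it is needed.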
Resolution Logic (locals.tf)
The key resolution follows a priority chain:
locals {
  # Step 1: Try remote state
  remote_gcs_encryption_key = (var.primary_remote_state_config == null ? null :
    try(data.terraform_remote_state.primary[0].outputs.gcs_storage_encryption_key, null))
  # Step 2: Prefer explicit variable, fall back to remote state
  effective_gcs_encryption_key = (var.existing_gcs_encryption_key != null ?
    var.existing_gcs_encryption_key :
    local.remote_gcs_encryption_key)
}
Priority:
existing_gcs_encryption_key variable (set directly in tfvars) -- highest priority
Remote state output from primary -- automatic discovery
null -- post-install module generates a new key (primary behavior)
GCS Bucket Naming Strategy
Deterministic bucket naming is critical for DR because:
The primary must know the secondary's bucket name at deploy time (for cross-region IAM)
The secondary must know the primary's bucket name to set GCP_RECOVER_FROM_BUCKET
Both must be knowable without requiring the other cluster to exist first.
Naming Patterns (from locals.tf)
DR deployments (is_dr_deployment = true, i.e., any remote state config is set):
| Cluster Role | Data Bucket | Access Logs Bucket |
|---|---|---|
| Primary (dr = "active") | dr-primary-{region}-{project_id} | logs-pri-{region}-{project_id} |
| Secondary (dr = "standby") | dr-secondary-{region}-{project_id} | logs-sec-{region}-{project_id} |
Non-DR deployments (is_dr_deployment = false):
| Cluster Role | Data Bucket | Access Logs Bucket |
|---|---|---|
| Primary | {infrastructure_prefix}-{region}-{project_id} | logs-{infrastructure_prefix}-{region}-{project_id} |
| Secondary | {infrastructure_prefix}-secondary-{region}-{project_id} | logs-{infrastructure_prefix}-sec-{region}-{project_id} |
Override: Set gcs_bucket_name (or dr_primary_gcs_bucket) explicitly to use an exact name instead of the generated pattern.
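The naming tables and the override rule above can be expressed as a handful of locals. This is a sketch under assumptions: `region`, `project_id`, and `infrastructure_prefix` are assumed variable names, `is_dr_deployment` is modeled as a variable here although the text describes it as a local, and treating `dr = "standby"` as the secondary role in non-DR deployments is a guess.

```hcl
# Sketch only: variable names other than dr and gcs_bucket_name are assumptions.
variable "dr" { type = string } # "active" or "standby"
variable "region" { type = string }
variable "project_id" { type = string }
variable "infrastructure_prefix" { type = string }
variable "is_dr_deployment" { type = bool } # a local in the text

variable "gcs_bucket_name" {
  type    = string
  default = ""
}

locals {
  suffix = "${var.region}-${var.project_id}"

  # DR pattern: dr-primary-... or dr-secondary-...
  dr_bucket = var.dr == "standby" ? "dr-secondary-${local.suffix}" : "dr-primary-${local.suffix}"

  # Non-DR pattern: {prefix}-... or {prefix}-secondary-...
  non_dr_bucket = (
    var.dr == "standby"
    ? "${var.infrastructure_prefix}-secondary-${local.suffix}"
    : "${var.infrastructure_prefix}-${local.suffix}"
  )

  generated_bucket = var.is_dr_deployment ? local.dr_bucket : local.non_dr_bucket

  # coalesce() skips null and empty strings, so an explicit name wins when set.
  data_bucket = coalesce(var.gcs_bucket_name, local.generated_bucket)
}
```

With `dr = "standby"`, `region = "us-east1"`, `project_id = "proj123"`, and `is_dr_deployment = true`, `local.data_bucket` evaluates to `dr-secondary-us-east1-proj123`. Because every input is known before either cluster exists, each side can compute the other's bucket name without a lookup.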
Cross-Region IAM Flow
PRIMARY                              SECONDARY
+------------------------+           +------------------------+
| GCS bucket:            |           | GCS bucket:            |
|   dr-primary-          |           |   dr-secondary-        |
|   us-west1-proj123     |           |   us-east1-proj123     |
|                        |           |                        |
| SA: wl-identity@...    |           | SA: wl-identity@...    |
|   roles/storage.admin  |           |   roles/storage.admin  |
|   roles/storage.       |           |   roles/storage.       |
|     objectUser         |           |     objectUser         |
+------------------------+           +------------------------+
            |
            | IAM bindings on PRIMARY bucket:
            |   storage.legacyBucketReader
            |   storage.objectViewer
            |
   (grants secondary SA read
    access to primary bucket)
These IAM bindings are created by modules/gcp/gke/storage.tf:
resource "google_storage_bucket_iam_member" "dr_cross_region_access" {
  count  = var.dr == "standby" && var.dr_primary_gcs_bucket != "" ? 1 : 0
  bucket = var.dr_primary_gcs_bucket # primary's bucket name
  role   = "roles/storage.legacyBucketReader"
  member = module.gcs_workload_identity.gcp_service_account_fqn
}

resource "google_storage_bucket_iam_member" "dr_cross_region_object_access" {
  count  = var.dr == "standby" && var.dr_primary_gcs_bucket != "" ? 1 : 0
  bucket = var.dr_primary_gcs_bucket
  role   = "roles/storage.objectViewer"
  member = module.gcs_workload_identity.gcp_service_account_fqn
}
The dr_primary_gcs_bucket value can come from:
Explicit tfvars setting
Remote state lookup (via effective_dr_peer_gcs_bucket local in root locals.tf)
gcp_recover_from_bucket fallback
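The three sources listed above form a fallback chain. A minimal sketch of that chain follows; `var.remote_state_bucket` is a hypothetical stand-in for the remote state lookup, and the order simply mirrors the list, which may not match the real locals exactly.

```hcl
# Sketch only: var.remote_state_bucket stands in for the value that the real
# configuration reads from the primary's remote state.
variable "dr_primary_gcs_bucket" {
  type    = string
  default = ""
}

variable "remote_state_bucket" {
  type    = string
  default = ""
}

variable "gcp_recover_from_bucket" {
  type    = string
  default = ""
}

locals {
  # coalesce() returns the first argument that is neither null nor "".
  # It raises an error when every candidate is empty, so try() turns that
  # case into "", which keeps the IAM bindings' count expressions at 0.
  resolved_dr_primary_gcs_bucket = try(
    coalesce(
      var.dr_primary_gcs_bucket,
      var.remote_state_bucket,
      var.gcp_recover_from_bucket,
    ),
    ""
  )
}
```

Resolving to an empty string rather than failing matters on the primary, where none of the three sources is expected to be set.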
Remote State Configuration
Secondary reads primary's outputs
# In secondary's tfvars:
primary_remote_state_config = {
  backend   = "gcs"
  workspace = "default"
  config = {
    bucket = "primary-tf-state-bucket"
    prefix = "logscale/gcp/terraform/tf.state"
  }
}
Primary reads secondary's outputs (for GLB only)
# In primary's tfvars (only when enable_global_lb = true):
secondary_remote_state_config = {
  backend   = "gcs"
  workspace = "default"
  config = {
    bucket = "secondary-tf-state-bucket"
    prefix = "logscale/gcp/terraform/tf.state"
  }
}
What remote state provides
| Output | Consumed By | Purpose |
|---|---|---|
| gcs_bucket_id | Secondary | Discover primary's bucket name for GCP_RECOVER_FROM_BUCKET |
| gcs_bucket_region | Secondary | Build GCP_RECOVER_FROM_REPLACE_REGION path translation |
| gcs_storage_encryption_key | Secondary | Encryption key sync (sensitive) |
| instance_group_urls | Primary (GLB) | Add secondary as backend target in load balancer |
| gce_ingress_ip_address | Primary (GLB) | Secondary's static IP for per-cluster DNS A record |
| cluster_location | Primary (GLB) | Secondary's region for backend service config |
| global_lb_backend_service_name | Secondary (Cloud Function) | GLB backend name for health-based failover alert |
| global_dns_zone_name | Secondary | DNS zone name discovery (avoids duplicating in tfvars) |
| cluster_name | Secondary | Used in DR recovery replace patterns |
| primary_health_check_id | Secondary (Cloud Function) | Reuse primary's health check instead of creating duplicate |
| gcs_encryption_key_secret_name | Secondary | K8s secret name for recovery encryption key |
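On the consuming side, the remote state wiring described above could look roughly like the following. The variable shape mirrors the tfvars examples in this section; the local names are hypothetical, and the real root module may declare the variable differently.

```hcl
# Sketch only: the object type is inferred from the tfvars examples, and the
# local names are hypothetical.
variable "primary_remote_state_config" {
  type = object({
    backend   = string
    workspace = string
    config    = map(string)
  })
  default = null
}

# Created only when a remote state config is supplied (secondary cluster).
data "terraform_remote_state" "primary" {
  count     = var.primary_remote_state_config == null ? 0 : 1
  backend   = var.primary_remote_state_config.backend
  workspace = var.primary_remote_state_config.workspace
  config    = var.primary_remote_state_config.config
}

locals {
  # try() falls back to null when the data source is absent or the primary
  # has not exported the output yet.
  primary_bucket = try(data.terraform_remote_state.primary[0].outputs.gcs_bucket_id, null)
  primary_region = try(data.terraform_remote_state.primary[0].outputs.gcs_bucket_region, null)
}
```

Wrapping each lookup in `try()` keeps a plan from failing when an older primary state lacks one of the outputs; the caller then decides how to handle a null value.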
DR Environment Variables on Standby
When dr = "standby", the root main.tf builds user_logscale_envvars that get injected into the HumioCluster CR's pod spec. These are consumed by LogScale's recovery subsystem when the standby cluster starts or is promoted.
Base variables (all clusters)
| Variable | Value | Source |
|---|---|---|
| GCP_STORAGE_WORKLOAD_IDENTITY | "true" | Hardcoded |
| GCP_STORAGE_BUCKET | Secondary's own bucket name | module.gke.gke_storage_bucket |
| ENABLE_ALERTS | "false" | Set false when dr == "standby" |
| GCP_STORAGE_ENCRYPTION_KEY | From K8s secret | secretKeyRef to post-install encryption secret |
Recovery variables (standby only)
| Variable | Value | Source |
|---|---|---|
| GCP_RECOVER_FROM_BUCKET | Primary's bucket name | local.final_gcp_recover_from_bucket (remote state or tfvars) |
| GCP_RECOVER_FROM_WORKLOAD_IDENTITY | "true" | Hardcoded |
| GCP_RECOVER_FROM_REPLACE_REGION | "{primary-region}/{secondary-region}" | local.final_gcp_recover_from_replace_region |
| GCP_RECOVER_FROM_REPLACE_BUCKET | "{primary-bucket}/{secondary-bucket}" | local.final_gcp_recover_from_replace_bucket or auto-constructed |
| GCP_RECOVER_FROM_ENCRYPTION_KEY | From K8s secret | secretKeyRef to DR recovery encryption secret |
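To make the replace patterns in the table concrete, here is a sketch of how the plain-valued recovery variables might be assembled. The input variable names are hypothetical, and the encryption key entry is left out because it uses a secretKeyRef rather than an inline value.

```hcl
# Sketch only: input names are hypothetical; the env var names and the
# "{from}/{to}" pattern format come from the table above.
variable "dr" { type = string }
variable "primary_bucket" { type = string }   # e.g. "dr-primary-us-west1-proj123"
variable "secondary_bucket" { type = string } # e.g. "dr-secondary-us-east1-proj123"
variable "primary_region" { type = string }   # e.g. "us-west1"
variable "secondary_region" { type = string } # e.g. "us-east1"

locals {
  # Candidate entries share one shape so the list has a single element type.
  recovery_candidates = [
    { name = "GCP_RECOVER_FROM_BUCKET", value = var.primary_bucket },
    { name = "GCP_RECOVER_FROM_WORKLOAD_IDENTITY", value = "true" },
    { name = "GCP_RECOVER_FROM_REPLACE_REGION", value = "${var.primary_region}/${var.secondary_region}" },
    { name = "GCP_RECOVER_FROM_REPLACE_BUCKET", value = "${var.primary_bucket}/${var.secondary_bucket}" },
  ]

  # Emit the entries only on standby; the primary gets an empty list.
  recovery_envvars = [for e in local.recovery_candidates : e if var.dr == "standby"]
}
```

With the example values in the comments, GCP_RECOVER_FROM_REPLACE_REGION becomes `us-west1/us-east1` and GCP_RECOVER_FROM_REPLACE_BUCKET becomes `dr-primary-us-west1-proj123/dr-secondary-us-east1-proj123`.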
Notable design decisions
GCP_RECOVER_FROM_REGION is NOT set. GCS buckets are globally addressable, so no region is needed for cross-region access (unlike S3). LogScale ignores this variable for the GCS bucket provider.
GCP_RECOVER_FROM_REPLACE_REGION IS set despite the above. This is for path translation in stored snapshot references, not for bucket access. LogScale rewrites paths like us-west1/bucket/object to us-east1/bucket/object.
ALLOW_KAFKA_RESET_UNTIL_TIMESTAMP_MS is NOT required. LogScale automatically enables allowKafkaReset when bucketStorageRecoverFrom is configured.
ENABLE_ALERTS is "false" on standby to prevent the standby cluster from firing duplicate alerts before it is promoted. On promotion (changing dr from "standby" to "active"), this flips to "true".
secretKeyRef resolution
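The encryption key env vars use secretKeyRef rather than inline values so that the key never appears in the HumioCluster CR. The key value still exists in Terraform state (the random_password resource and the sensitive output both store it), so access to the state backend must be restricted. The references: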
GCP_STORAGE_ENCRYPTION_KEY:
  secretKeyRef:
    name: {logscale_cluster_name}-gcp-storage-encryption-key
    key: gcp-storage-encryption-key
GCP_RECOVER_FROM_ENCRYPTION_KEY:
  secretKeyRef:
    name: dr-secondary-gcs-storage-encryption  # default, configurable
    key: gcp-storage-encryption-key            # default, configurable
Both secrets are created by modules/kubernetes/post-install. On standby, both contain the same key value (imported from primary), but they are separate secrets to maintain the contract that LogScale expects different secret names for own-bucket vs recovery-bucket encryption.
DR State Impact Summary
How each component behaves based on the dr variable value:
| Component | dr = "active" (Primary) | dr = "standby" (Secondary) |
|---|---|---|
| GCS Bucket | Own bucket, full R/W | Own bucket (R/W) + read-only on primary's bucket |
| Encryption Key | Generated (random, 64 chars) | Imported from primary (remote state or explicit) |
| HumioCluster Alerts | Enabled (ENABLE_ALERTS=true) | Disabled (ENABLE_ALERTS=false) |
| Recovery Env Vars | Not set | Set (GCP_RECOVER_FROM_*) |
| Global Load Balancer | Created (if enable_global_lb=true) | Not created |
| Cloud Function | Not created | Created (if dr_cloud_function_enabled=true) |
| DNS Failover (WRR) | Manages global CNAME records | No global DNS management |
| DNS Failover (A record) | Creates per-cluster A record | Creates per-cluster A record |
| Node Pool Routing | Dedicated pool selectors (default) | Configurable via dr_use_dedicated_routing |
| Workload Identity | Binds all 3 K8s SAs | Binds all 3 K8s SAs |
| Cross-Region IAM | Not created (no need) | Grants read on primary's bucket |
| Access Logs Bucket | Own logs bucket | Own logs bucket (separate from primary) |
Failover Automation (Cloud Function)
The DR failover Cloud Function (dr-failover-function module) provides automated failover when the primary cluster becomes unhealthy.
Trigger Chain:
Primary cluster goes down
|
v
Uptime Check fails (every 60s, checking /api/v1/status on primary FQDN)
|
v
Alert Policy fires (after 60s sustained failure)
|
v
Notification Channel publishes to Pub/Sub topic ({cluster}-dr-alerts)
|
v
Cloud Function triggered (failover_handler)
|
v
Function validates:
  1. Primary has been failing for >= pre_failover_failure_seconds (default 180s)
  2. Cooldown period (failover_cooldown_seconds) has elapsed since the last failover
|
v
Function scales GKE node pool to target_node_count
Function patches HumioCluster CR to enable standby promotion
GLB Health-Based Trigger (Alternative)
When GLB is enabled, a second alert policy monitors the GLB backend service directly:
GLB detects primary backend unhealthy (via health check)
|
v
Alert on: 5xx responses OR zero 200 responses for 60s
|
v
Same Pub/Sub -> Cloud Function chain as above
This provides faster detection than the uptime check because the GLB health check runs at the infrastructure level.
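The first links of the uptime-check trigger chain can be sketched with standard Google provider resources. The resource labels, `var.primary_fqdn`, `var.cluster_name`, the 10s timeout, and HTTPS on port 443 are all assumptions. The alert policy that joins the check to the channel is omitted because its condition filter depends on details this section does not give, as is the IAM grant that lets Cloud Monitoring publish to the topic.

```hcl
# Sketch only: labels, variable names, timeout, and the HTTPS/443 choice are
# assumptions. The alert policy and the publisher IAM binding are omitted.
variable "project_id" { type = string }
variable "cluster_name" { type = string }
variable "primary_fqdn" { type = string }

# Topic that the notification channel publishes to ({cluster}-dr-alerts).
resource "google_pubsub_topic" "dr_alerts" {
  project = var.project_id
  name    = "${var.cluster_name}-dr-alerts"
}

# Notification channel of type "pubsub" pointing at the topic.
resource "google_monitoring_notification_channel" "dr_pubsub" {
  project      = var.project_id
  display_name = "${var.cluster_name} DR failover"
  type         = "pubsub"

  labels = {
    topic = google_pubsub_topic.dr_alerts.id
  }
}

# Uptime check that probes /api/v1/status on the primary FQDN every 60s.
resource "google_monitoring_uptime_check_config" "primary_status" {
  project      = var.project_id
  display_name = "${var.cluster_name}-primary-status"
  timeout      = "10s"
  period       = "60s"

  http_check {
    path    = "/api/v1/status"
    port    = 443
    use_ssl = true
  }

  monitored_resource {
    type = "uptime_url"
    labels = {
      project_id = var.project_id
      host       = var.primary_fqdn
    }
  }
}
```

Routing both the uptime alert and the GLB alert through one topic means the Cloud Function needs only a single trigger, whichever detector fires first.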
Function Configuration
| Parameter | Default | Description |
|---|---|---|
| function_timeout | 300s | Max execution time |
| function_memory_mb | 256 Mi | Memory allocation |
| target_node_count | 1 | Nodes to scale to on failover |
| pre_failover_failure_seconds | 180s | Minimum consecutive failure before acting |
| max_retries | Configurable | Retry count for GKE API calls |
| base_delay_seconds | Configurable | Initial retry backoff |
| failover_cooldown_seconds | Configurable | Minimum time between failover events |
Remote State Bootstrapping Order
DR deployments must be applied in a specific order because each cluster reads the other's state.
Initial Deployment (No GLB)
Step 1: Deploy PRIMARY cluster
No remote state config needed
Generates encryption key
Creates GCS bucket with deterministic name
Exports outputs to state backend
Step 2: Deploy SECONDARY cluster
Set primary_remote_state_config pointing to primary's state
Reads: encryption key, bucket name, bucket region, DNS zone
Creates own bucket + cross-region IAM on primary's bucket
Sets GCP_RECOVER_FROM_* env vars on HumioCluster
Initial Deployment (With GLB)
Step 1: Deploy PRIMARY cluster
enable_global_lb = true
No secondary_remote_state_config yet (secondary doesn't exist)
GLB created with primary backend only
Step 2: Deploy SECONDARY cluster
primary_remote_state_config set
enable_glb_named_port = true
Exports instance_group_urls via state
Step 3: Re-apply PRIMARY cluster
Add secondary_remote_state_config (points to secondary's state)
GLB picks up secondary's instance groups as second backend
Primary functions correctly with one backend during steps 1-2
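The three-step GLB rollout works only if the primary tolerates a missing secondary state. One way to get that behavior is sketched below; `var.primary_instance_group_urls` and the local names are hypothetical, and the real module may assemble its backend list differently.

```hcl
# Sketch only: var.primary_instance_group_urls and the local names are
# hypothetical; the remote state variable mirrors the tfvars example.
variable "secondary_remote_state_config" {
  type = object({
    backend   = string
    workspace = string
    config    = map(string)
  })
  default = null
}

variable "primary_instance_group_urls" {
  type = list(string)
}

# Absent during steps 1-2, present after the step 3 re-apply.
data "terraform_remote_state" "secondary" {
  count     = var.secondary_remote_state_config == null ? 0 : 1
  backend   = var.secondary_remote_state_config.backend
  workspace = var.secondary_remote_state_config.workspace
  config    = var.secondary_remote_state_config.config
}

locals {
  # Falls back to an empty list until the secondary's state is readable.
  secondary_instance_group_urls = try(
    data.terraform_remote_state.secondary[0].outputs.instance_group_urls,
    []
  )

  # One backend group set in steps 1-2, two after step 3.
  glb_backend_groups = concat(
    var.primary_instance_group_urls,
    local.secondary_instance_group_urls,
  )
}
```

Because the fallback is an empty list rather than an error, the same configuration applies cleanly in step 1 and again in step 3 with no conditional logic in the load balancer resources.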
Variable Cross-Reference
Quick reference for which variables feed into which components.
Variables that affect DR behavior
| Variable | Used By | Effect |
|---|---|---|
| dr | All modules | Master switch: "active" or "standby" |
| dr_use_dedicated_routing | module.logscale | Service selector strategy during promotion |
| primary_remote_state_config | Root data sources | Enables secondary -> primary state reading |
| secondary_remote_state_config | Root data sources | Enables primary -> secondary state reading (GLB) |
| dr_primary_gcs_bucket | module.gke, locals | Explicit primary bucket name override |
| existing_gcs_encryption_key | module.kubernetes_post_install | Direct key injection (skips remote state) |
| gcp_recover_from_bucket | Root locals | Fallback primary bucket name for recovery |
| gcp_recover_from_replace_region | Root locals | Explicit region replacement pattern |
| gcp_recover_from_replace_bucket | Root locals | Explicit bucket replacement pattern |
| enable_global_lb | module.global_lb, module.dns_failover, module.kubernetes_post_install | GLB vs DNS failover |
| enable_glb_named_port | module.kubernetes_post_install | Named port on instance groups for GLB |
| dr_cloud_function_enabled | module.dr_failover_function | Automated failover function |
| manage_global_dns | module.dns_failover | Global WRR CNAME management |
Deprecated variables (kept for backwards compatibility)
| Variable | Reason |
|---|---|
| gcp_recover_from_region | GCS does not use region for bucket access. LogScale returns "region-not-set". |