DR Failover

This section covers the three-stage failover process: standby readiness, failover execution, and promotion to active.

Stage 1: DR Provisioning and Standby Readiness

Stage 1 is the normal operational state before any disaster event. The primary cluster serves all production traffic while the secondary cluster is fully provisioned but kept idle — EKS, Kafka, and ingress are running, but LogScale pods remain at zero replicas. The Lambda function monitors the primary health check and is ready to trigger failover automatically. The diagram below shows the full architecture across both regions, including the Route53 failover routing, Terraform workspace layout, S3 bucket relationships, and the environment variables pre-configured on the standby cluster.

What is deployed and running:

Component	Primary	Secondary
EKS cluster	Running	Running
Kafka brokers	Running	Running
LogScale pods	Full cluster_size	0 replicas (operator off)
Humio operator	1 replica	0 replicas
Ingress / ALB	Active	Active (minimal)
Lambda	N/A	Watching primary health check
Route53	Global FQDN → PRIMARY ALB	SECONDARY record (failover target)

AWS DR Recovery Environment Variables

All variables listed in this section are set automatically by Terraform -- no manual configuration is required.

The following S3_RECOVER_FROM_* environment variables are deployed only on the secondary (standby) cluster. They are automatically set by Terraform when dr = "standby" - no manual configuration is required. Their values are derived from the primary cluster's remote state or fallback tfvars variables.

These variables tell the standby LogScale instance where to find the primary cluster's S3 bucket and how to authenticate to it. When the standby cluster is activated during failover (humio-operator scaled to 1), LogScale reads these variables at startup and performs the recovery sequence described in Secondary Readiness Required Steps.

For the complete variable reference table, source details, and Terraform resolution logic, see AWS DR Recovery Environment Variables.

Stage 2: Failover -- Scale up Humio and read global snapshot

Stage 2 begins when the primary cluster becomes unreachable. The failover can be triggered automatically by the Lambda (via CloudWatch alarm and SNS) or manually by an operator. During this stage, the Humio operator is scaled from 0 to 1 on the secondary cluster, the LogScale pod starts and recovers data from the primary's S3 bucket using the global snapshot, and the primary health check FQDN is locked to prevent DNS failback. The diagram below shows the system mid-failover: the primary is down (red), Route53 has switched the global FQDN to the secondary ALB, and the secondary LogScale pod is reading from the primary's S3 bucket.

Stage 2 is triggered either automatically by the DR Lambda or manually by an operator.

Automatic failover (Lambda):

Primary Route53 health check fails (3 consecutive failures = 30 seconds)
CloudWatch alarm fires
SNS topic notifies the Lambda
Lambda validates the primary has been failing for at least pre_failover_failure_seconds (default: 180s)
Lambda cleans up stale TLS secrets (prevents CA mismatch)
Lambda scales humio-operator from 0 to 1
Lambda locks the primary health check by swapping its FQDN to failover-locked.invalid to prevent automatic DNS failback

Manual failover (operator):

shell

# Scale the operator manually
kubectl --context dr-secondary -n logging scale deploy/humio-operator --replicas=1
# Lock the primary health check to prevent automatic failback
PRIMARY\_HC\_ID=$(terraform workspace select primary && terraform output -raw primary\_health\_check\_id)
aws route53 update-health-check \
--health-check-id "$PRIMARY\_HC\_ID" \
--fully-qualified-domain-name "failover-locked.invalid"

Secondary Readiness Required Steps

On standby, the HumioCluster already declares nodeCount=1, but the Humio operator is scaled to 0. When the Humio operator is scaled to 1 (by the Lambda on health check failure or manually), it reconciles the HumioCluster and starts a single LogScale pod.

Scale the Humio operator on secondary:

With Lambda enabled (default): Route53 Health Check failure → CloudWatch Alarm → SNS → Lambda scales humio-operator replicas to 1. No further action needed.

Manually (e.g., for tests or if Lambda is disabled):

shell

kubectl --context dr-secondary -n logging scale deploy/humio-operator --replicas=1

Do not patch spec.nodeCount it is already set to 1. The operator must be running for the Humio pod to start.

What Happens After Operator Starts:

The Humio operator reconciles and creates the Humio pod
The pod reads S3_RECOVER_FROM_* env vars
It lists and downloads the latest global-snapshot.json from the primary S3 bucket
It patches the snapshot to reference the secondary bucket/region using S3_RECOVER_FROM_REPLACE_* values
It loads the patched snapshot into memory
The cluster starts up with the recovered metadata state; data segments remain in the primary bucket (read-only access)

Kafka Epoch Key and Automatic Reset:

During DR recovery, the secondary cluster connects to a different Kafka cluster than the primary. LogScale detects this via the Kafka "epoch key" (derived from the Kafka cluster ID + topic prefix) stored in the global snapshot. When the epoch key changes, LogScale requires a "Kafka reset" to proceed.

Important

Do NOT set ALLOW_KAFKA_RESET_UNTIL_TIMESTAMP_MS. When S3_RECOVER_FROM_BUCKET is configured (as it is on all standby clusters), LogScale automatically enables Kafka reset. Setting ALLOW_KAFKA_RESET_UNTIL_TIMESTAMP_MS overrides this automatic behavior due to a precedence rule in LogScale's Config.scala -- if the env var is set (even with an expired timestamp), the automatic S3_RECOVER_FROM_BUCKET fallback never activates. This can cause the pod to crash with FATAL: The Kafka epoch key has changed if the timestamp expires before failover occurs.

What Data is Transferred in the Global Snapshot:

The global snapshot is a JSON-based export of LogScale's internal cluster state at boot time.

Transferred in the snapshot:

Dataspaces (repositories): All repository definitions, views, retention policies, and metadata
Bucket storage configurations: Provider info (S3/GCS/Azure/OCI), regions, bucket names, encryption settings, key prefixes. During DR recovery, these are patched with new credentials and marked as readOnly=true
Segment metadata: References to log data locations including bucket IDs, byte sizes, date ranges, epoch/offset information. Only the metadata about segments is transferred, not the actual compressed log data files
Datasource configurations, license information, cluster identifiers, system configuration

Cleared during DR recovery patching:

All host entries (dropped via dropAllHostsFromClusterForDisasterRecoveryBoot())
All partition assignments (ingest, segment, and query coordination partitions deleted)
Segment ownership (ownerHosts, currentHosts, topEpoch, topOffset)
Datasource runtime state (currentSegments, ingestEpoch, ingestOffset cleared; ingestIdle set to true)

NOT in the snapshot (must be synced separately):

Actual log data (compressed segments remain in primary bucket, accessed read-only by secondary)
Kubernetes Secrets: license, TLS/CA certificates, OAuth/SAML secrets, SMTP credentials, image pull secrets
Storage encryption keys (synchronized via Terraform remote state)
Runtime state: live Kafka consumer positions, query execution state, cache contents

Key insight: The global snapshot is LogScale's configuration and metadata state (~MBs), not your log data (~TBs). During DR, the secondary cluster reads the actual log events directly from the primary's S3 bucket using the segment metadata as a map.

Spot-check pods on secondary:

shell

kubectl --context dr-secondary -n logging get pods
# Expect humio-operator (1/1), one Humio pod once recovery starts, and Kafka components running

Check recovery logs:

shell

kubectl --context dr-secondary -n logging logs -f <humio-pod> | grep -i "recover\|snapshot\|DataSnapshotLoader"
# Should see: "Fetching global-snapshot.json from s3://..."
# Should see: "Successfully patched global-snapshot.json"

DNS Architecture and Traffic Flow

Route53 Failover Routing with Health Check FQDN Locking:

Automatic Failback Prevention: During failover, the Lambda swaps the primary Route53 health check FQDN to failover-locked.invalid, making the primary permanently unhealthy in Route53. This prevents automatic DNS failback. A second safeguard is lifecycle { ignore_changes } on the health check FQDN, which prevents terraform apply from reverting the lock.

For the full design rationale, health check state transition table, and why FQDN locking was chosen over health check inversion, see Section 18: DNS Architecture — FQDN Locking Details in the Technical Reference.

Post-Failover Verification

Important

Verify the primary health check FQDN is locked immediately after any failover.

shell

# Primary health check should show FQDN = failover-locked.invalid
PRIMARY\_HC\_ID=$(terraform workspace select primary && terraform output -raw primary\_health\_check\_id)
aws route53 get-health-check --health-check-id "$PRIMARY\_HC\_ID" \
--query 'HealthCheck.HealthCheckConfig.FullyQualifiedDomainName'
# Should return: "failover-locked.invalid"
# If FQDN still shows the original hostname, lock it manually:
aws route53 update-health-check \
--health-check-id "$PRIMARY\_HC\_ID" \
--fully-qualified-domain-name "failover-locked.invalid"
# Secondary should be serving traffic via global FQDN
curl -sk "https://<global\_logscale\_hostname>.<zone\_name>/api/v1/status" | jq .status
# => "OK"
# DNS should resolve to secondary ALB
dig +short <global\_logscale\_hostname>.<zone\_name>

Manual Failback Procedure

Failback is never automatic. After confirming the primary is fully healthy and data-consistent:

# 1. Verify primary is fully ready
curl -sk "https://<primary\_logscale\_hostname>.<zone\_name>/api/v1/status" | jq .status
# =gt; "OK"
# 2. Restore the primary health check FQDN to the original hostname
PRIMARY\_HC\_ID=$(terraform workspace select primary && terraform output -raw primary\_health\_check\_id)
aws route53 update-health-check \
--health-check-id "$PRIMARY\_HC\_ID" \
--fully-qualified-domain-name "<primary\_logscale\_hostname>.<zone\_name>"
# 3. Verify DNS has switched back
dig +short <global\_logscale\_hostname>.<zone\_name>
# Should resolve to primary ALB

Important

Terraform includes lifecycle { ignore_changes = [fqdn, invert_healthcheck] } on the primary health check. This means terraform plan will not show a diff when the health check FQDN is locked -- the FQDN swap is purely a runtime state managed by the Lambda and operators, not by Terraform.

Warning

Do not run terraform apply on the primary state during an active failover unless you have verified that the health check FQDN protection is in place via lifecycle { ignore_changes }. It is safe to run after failback when the primary is already serving traffic.

Traffic Routing During Failover

AWS DR - Failover Traffic Routing Sequence

Verify DR Recovery Succeeded

After failover, verify that the secondary cluster has successfully recovered:

shell

# 1. Check LogScale pod status
kubectl --context dr-secondary -n logging get pods -l app.kubernetes.io/name=humio
# STATUS: Running, READY: 1/1
# 2. Check recovery logs for snapshot fetch
kubectl --context dr-secondary -n logging logs <humio-pod> | grep -i "global-snapshot"
# Should show: Fetching global-snapshot.json from s3://<primary-bucket>/...
# 3. Verify LogScale API responds via global FQDN
curl -sk "https://<global\_logscale\_hostname>.<zone\_name>/api/v1/status" | jq .
# Should return: {"status": "OK", ...}
# 4. Verify DNS resolution
dig +short <global\_logscale\_hostname>.<zone\_name>
# Should resolve to secondary ALB
# 5. Check health check FQDN is locked (confirm failback is prevented)
aws route53 get-health-check --health-check-id "$PRIMARY\_HC\_ID" \
--query 'HealthCheck.HealthCheckConfig.FullyQualifiedDomainName'
# Should return: "failover-locked.invalid"

Verify via LogScale UI

logscale

DataSnapshotLoader
| #kind != threaddumps

You should see messages similar to:

text

Checking bucket storage localAndHttpWereEmpty=true
Trying to fetch a global snapshot from bucket storage s3 if one exists in bucket=logscale-s3-dr-secondary-us-east-2-<account\_id>
Fetching global snapshot from bucket storage s3 found no snapshot to fetch.
Trying to fetch a global snapshot as recovery source from bucket storage in s3
Trying to fetch a global snapshot from bucket storage s3 if one exists in bucket=logscale-s3-dr-primary-us-west-2-<account\_id>
Fetched global snapshot from bucket storage s3 found snapshot with epochOffset={epoch=0 offset=699094}
Fetched a global snapshot as recovery source from bucket storage in s3 and got snapshot with epochOffset={epoch=0 offset=699094} now patching...
Snapshots to choose from, last is better...: List(({epoch=0 offset=699094},s3)) using kafkaMinOffsetOpt of Some(KafkaMinOffset(...))
Selecting snapshot from source=s3 with epochOffset={epoch=0 offset=699094}
updateSnapshotForDisasterRecovery: Patching region using from=us-west-2 to=us-east-2 on bucketId=1
updateSnapshotForDisasterRecovery: Patching bucket using from=logscale-s3-dr-primary-us-west-2-<account\_id> to=logscale-s3-dr-secondary-us-east-2-<account\_id> on bucketId=1
updateSnapshotForDisasterRecovery: Patching access configs from RECOVER\_FROM on bucketId=1
updateSnapshotForDisasterRecovery: setting readOnly=true on bucketId=1 keyPrefix= new value for bucket=logscale-s3-dr-secondary-us-east-2-<account\_id>

Real LogScale logs showing successful DR standby synchronization:

Note: The logs show:

First checks the secondary bucket (logscale-s3-dr-secondary-...) - finds no snapshot
Then fetches from the primary bucket (logscale-s3-dr-primary-...) as the recovery source
Patches the snapshot to use the secondary bucket for new writes
Sets the primary bucket reference to readOnly=true

Ready to promote when:

Operator is 1/1 on secondary
Kafka components exist on secondary
DataSnapshotLoader logs match the expected sequence above
Snapshot file shows patched region/bucket pointing to secondary; encryption keys match

Stage 3: Promote Secondary to Active

Stage 3 converts the secondary cluster from a minimal failover state into a full production cluster. This involves changing dr from "standby" to "active" and running a two-phase Terraform apply. The cluster scales up to full capacity (additional node groups, production replica counts, alerting enabled), and the DR Lambda and its associated resources are removed since they are no longer needed. The diagram below shows the final promoted state: the secondary is now the active cluster at full scale, the primary remains down with its health check locked, and all traffic flows through the promoted cluster.

Once the secondary cluster is verified and serving traffic, you may want to promote it to a full active cluster. This involves changing dr from "standby" to "active" and applying Terraform.

What promotion changes:

Setting	Before (dr="standby")	After (dr="active")
Humio operator replicas	0 (or 1 if failover happened)	1
HumioCluster nodeCount	1	cluster_size digest count
Additional node groups	None (no UI/Ingest)	UI, Ingest node groups created
Replication factor	1	Production value
ENABLE_ALERTS	false	true
DR Lambda	Deployed	Removed

Zero-Downtime Promotion (Two-Phase Apply)

For zero-downtime DR promotion, use the two-phase terraform apply approach with the dr_use_dedicated_routing variable. This ensures traffic continues to flow to the existing digest pod while UI/Ingest pods scale up.

Understanding dr_use_dedicated_routing:

dr_use_dedicated_routing = true (default): Services look for specific pod types. The UI service only routes to UI pods, and the ingest service only routes to ingest pods.
dr_use_dedicated_routing = false: Services look for ANY LogScale pod, regardless of type.

Why two phases are needed:

When promoting from dr="standby" to dr="active", the HumioCluster's node pool configuration changes from digest-only (1 pod) to the full production topology (digest + UI + ingest pods). Without two phases:

Service selectors immediately change to look for UI pods (humio.com/node-pool=<prefix>-ui)
UI pods don't exist yet
Services have zero endpoints → 503 errors

With two phases:

Phase 1: Selectors use app.kubernetes.io/name=humio to match ALL LogScale pods (including existing digest pod)
Traffic continues to existing digest pod during UI/Ingest scale-up
Phase 2: After UI/Ingest pods are ready, selectors switch to pool-specific routing

dr_use_dedicated_routing Behavior Matrix:

dr	dr_use_dedicated_routing	Selector Used	Use Case
"active"	false	Generic (app.kubernetes.io/name=humio)	Phase 1 of promotion - zero downtime
"active"	true	Pool-specific	Phase 2 of promotion / normal production routing
"standby"	false	Generic (app.kubernetes.io/name=humio)	Standby waiting for failover
"standby"	true	Pool-specific	Standby with dedicated routing (rare)

Phase 1: Promote with Generic Selectors (Zero-Downtime)

shell

# Edit tfvars for Phase 1
vi secondary-us-east-2.tfvars
dr = "active"
dr\_use\_dedicated\_routing = false # Generic selector - matches ALL pods
# Apply Phase 1
terraform workspace select secondary
terraform apply -var-file=secondary-us-east-2.tfvars
# Verify UI and Ingest pods are coming up
kubectl --context dr-secondary -n logging get pods -l humio.com/node-pool
# Wait until UI and Ingest pods show Running and Ready

Phase 2: Enable Dedicated Routing (After Pods Ready)

shell

# Option A: Stay in DR mode with optimal routing
vi secondary-us-east-2.tfvars
dr = "active"
dr\_use\_dedicated\_routing = true # Pool-specific selectors - optimal routing
# Apply Phase 2
terraform workspace select secondary
terraform apply -var-file=secondary-us-east-2.tfvars
# Verify services have correct endpoints
kubectl --context dr-secondary -n logging get endpoints

Standard Promotion (Single Apply)

If downtime during promotion is acceptable:

shell

vi secondary-us-east-2.tfvars
dr = "active"
# dr\_use\_dedicated\_routing defaults to true
terraform workspace select secondary
terraform apply -var-file=secondary-us-east-2.tfvars

What changes automatically on promotion:

Scales node groups to production sizes
Sets production replication factor and enables auto-rebalance
Enables alerts by setting ENABLE_ALERTS=true
Humio operator scales to 1 and HumioCluster nodeCount follows production values

Resources destroyed during promotion

When promoting from dr="standby" to dr="active", Terraform destroys the entire module.dr-failover-lambda because automated failover is no longer needed. The following resources are removed:

Lambda Function (<cluster>-dr-failover-handler) — the automated failover function
CloudWatch Alarm (<cluster>-dr-failover-primary-unhealthy) — the Route53 health check monitor
SNS Topic + Subscription (<cluster>-dr-failover-sns) — the cross-region alarm-to-Lambda bridge
Lambda IAM Role + Policy — the role and inline policy granting EKS, Route53, CloudWatch, KMS, and SSM access
EKS Access Entry — Lambda's cluster access entry (namespace-scoped to logging)
KMS Key — the KMS key used to encrypt Lambda environment variables
CloudWatch Log Group (/aws/lambda/<cluster>-dr-failover-handler) — Lambda execution logs

Global DNS note: Promoting the standby to dr="active" does not automatically move ownership of the Route53 global DNS resources. Keep manage_global_dns=true only in a single workspace to avoid two states managing the same failover records/zone.

S3_RECOVER_FROM_* Environment Variable Preservation

When promoting dr="standby" to dr="active", Terraform removes the S3_RECOVER_FROM_* environment variables from the HumioCluster manifest. This is intentional:

During standby: These variables tell LogScale where to fetch recovery data (primary's S3 bucket)
After promotion: The cluster is fully independent and writes to its own S3 bucket; recovery variables are no longer needed
If data migration is still in progress: Keep dr="standby" until LogScale has finished copying all segments from the source bucket to its own bucket. Monitor the DataSegmentUploader logs

Operational implication: Removing these env vars changes the HumioCluster spec and may cause the operator to roll/recreate pods during promotion. Plan for a restart window as part of the promotion process.

Verify promotion:

shell

# Confirm S3\_RECOVER\_FROM\_\* env vars are removed
kubectl get humiocluster -n logging --context dr-secondary -o yaml | grep S3\_RECOVER
# Should return empty
# Verify nodeCount is at production value
kubectl get humiocluster -n logging --context dr-secondary -o jsonpath='{.items[0].spec.nodeCount}'
# => production value (e.g., 3)
# All pods running
kubectl get pods -n logging --context dr-secondary
# => all pods Running

Failover Timing Summary

Important

The timing values in this section were measured on a new test cluster with very low ingestion volume. Production clusters with higher data volumes, larger node counts, and bigger snapshots will likely experience longer recovery times. Use these figures as a baseline reference, not as guarantees.

End-to-End Failover Timeline

Total failover time depends on three phases: detection, Lambda execution, and LogScale recovery.

Detection phase:

Component	Duration	Notes
Route53 health check failure	~30s	3 failures at 10s interval
CloudWatch alarm evaluation	~60s	1 evaluation period
Pre-failover validation	0-180s	Configurable (default 180s)

Execution phase:

Component	Duration	Notes
SNS → Lambda invocation	<1s	Near-instant
Validate health check + clean TLS secrets	~4s	Route53 + Kubernetes API calls
Scale operator 0 → 1	~3s	Kubernetes API call with retry
Lock primary health check FQDN	~1s	Route53 API call

Recovery phase:

Components	Duration	Notes
Operator starts LogScale pod	~30-60s	Pod scheduling and image pull
Snapshot fetch and patch	~30-120s	Depends on snapshot size and S3 latency
Kafka topic creation + LogScale ready	~60-120s	Full initialization
Route53 DNS propagation	~60s	TTL-dependent (default 60s)

Total expected time:

Scenario	Detection	Lambda	Recovery + DNS	Total
Production (180s validation)	~90s	~188s	~3-6 min	~7-10 min
Testing (0s validation)	~90s	~8s	~3-6 min	~5-8 min

Configuration Variables

Variable	Default	Testing	Description
`dr_failover_lambda_pre_failover_failure_seconds`	180	0	Minimum consecutive failure seconds before triggering
`route53_record_ttl`	(required)	60	DNS record TTL in seconds
Health check interval (primary / secondary)	10s / 30s	Hardcoded	HTTPS probe for primary, TCP for secondary
Failure threshold (primary / secondary)	3 / 2	Hardcoded	Consecutive failures before marking unhealthy

Pre-failover holdoff: The Lambda queries CloudWatch to verify the primary has been continuously failing for at least pre_failover_failure_seconds before proceeding. If insufficient time has passed, the Lambda exits without action and CloudWatch re-invokes on the next alarm cycle. Set to 0 for testing only — always use the default 180 in production to prevent false failovers.

Cooldown: The Lambda records its last failover timestamp to SSM Parameter Store (/logscale-dr/<function-name>/last-failover-timestamp) and skips execution if within the cooldown period (module-internal default: 300s, not configurable via tfvars). The SSM persistence ensures the cooldown survives Lambda cold starts and redeployments.

Testing vs Production

terraform

# Fast failover for testing (NOT for production)
dr_failover_lambda_pre_failover_failure_seconds = 0
# Testing timeline: ~5-8 minutes

# Production defaults (prevent false failovers)
dr_failover_lambda_pre_failover_failure_seconds = 180
# Production timeline: ~7-10 minutes

Cross-Cloud Timing Comparison

Phase	AWS (Route53)	Azure (Traffic Manager)	OCI (DNS Steering)
Health check interval	10s	30s	60s
Failure threshold	3 failures	3 failures	N/A (alarm-based)
Detection time	~30s	~90s	~55s
Function execution	Lambda ~8s	Azure Function ~10-20s	OCI Function ~49s
DNS propagation	~60s (TTL)	~60s (TTL)	~60s (TTL)
Failback prevention	FQDN locked to `failover-locked.invalid`	Primary endpoint disabled	Steering policy answer disabled

DR Promotion Scaling Timeline

Expected phases when promoting from dr="standby" to dr="active". Values are estimated — measured on a test cluster with minimal ingestion.

Phase 1:

Action	Estimated Duration
terraform apply (set `dr="active"`, `dr_use_dedicated_routing=false`)	2-5 min
EKS UI/Ingest node groups provision (EC2 instances boot)	3-6 min
Pod scheduling + readiness	3-7 min

Phase 2:

Action	Estimated Duration
terraform apply (set `dr_use_dedicated_routing=true`)	1-2 min
Ingress reload + endpoint update	30-60s
DNS propagation (if applicable)	60-180s

Total: ~10 - 20 minutes

Verification:

shell

# Check all pods running after Phase 1
kubectl get pods -n logging --context dr-secondary
# Verify UI/Ingest nodes
kubectl get nodes --context dr-secondary -l purpose=logscale-ui
kubectl get nodes --context dr-secondary -l purpose=logscale-ingest
# Check ingress routing after Phase 2
kubectl get ingress -n logging --context dr-secondary -o yaml | grep -A 5 "rules:"

Versions of this Page

Deployment Overview

Planning Your Deployment

Instance Sizing

Authentication and identity providers

Storage Architecture

Installing Using Containers

Installing On Bare Metal or Cloud Instance

Reference Architectures

Installing Load Balancers

Deploying Auxiliary Services

Configuration Settings

Managing Your Deployment

Testing Your Deployment

DR Failover

Stage 1: DR Provisioning and Standby Readiness

Stage 2: Failover -- Scale up Humio and read global snapshot

Secondary Readiness Required Steps

Important

DNS Architecture and Traffic Flow

Important

Important

Warning

Traffic Routing During Failover

Verify DR Recovery Succeeded

Stage 3: Promote Secondary to Active

Zero-Downtime Promotion (Two-Phase Apply)

S3_RECOVER_FROM_* Environment Variable Preservation

Failover Timing Summary

Important

Enter search term