Operations Guide
This Operations Guide contains the following sections:
This section states the intent, audience, and boundaries of the DR runbook.
This section provides an overview of how Disaster Recovery (DR) is structured and what the primary and standby roles do.
This section provides an overview of the Disaster Recovery Architecture.
This section explains the building blocks behind DR and how DNS, certificates, and automation fit together.
This section explains the order in which the deployment steps must be performed for a successful deployment.
This section explains how to reach both private OKE clusters safely and manage kubeconfig contexts.
This section covers TLS certificate strategy for the global DR hostname and why DNS-01 is typically required.
Dynamic Secondary IP Lookup via Remote State
This section explains how primary and secondary exchange nginx-ingress LoadBalancer IPs through Terraform remote state.
This section covers the complete DR deployment process, from prerequisites through the three stages: DR configuration, failover, and promotion.
This section documents the expected time from primary failure detection to secondary cluster activation. Pre-failover validation runs for the number of seconds set in dr_failover_function_pre_failover_failure_seconds (set it to 0 for testing only).
This section provides quick-reference information for DR operations.
Known Issues and Recommendations
This section lists known issues and recommended mitigations for DR operations.
Disaster Recovery Additional Resources
This section links to related documentation.