TLS CA Certificate Mismatch Fix

Issue

DR standby clusters failed with "x509: certificate signed by unknown authority" errors when the humio-operator tried to install the license after failover.

Root Cause

When a DR standby cluster's operator is scaled to 0 and later scaled back up:

  1. The CA keypair ({cluster-name}-ca-keypair) may be regenerated

  2. The cluster TLS secret ({cluster-name}) retains the OLD CA certificate

  3. New pod certificates are signed by the NEW CA

  4. The operator reads the CA from the cluster TLS secret (NOT the CA keypair)

  5. TLS verification fails, so the operator cannot communicate with the LogScale pods

Key insight: The operator reads the CA certificate from the cluster TLS secret, not from the CA keypair secret. This is the root cause of the mismatch after operator restarts.
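One way to confirm this mismatch is to compare the CA certificate stored in each secret. The following is a minimal sketch, not part of the fix itself. It assumes that the cluster TLS secret holds the CA under the key ca.crt, that the CA keypair secret holds it under tls.crt, and that each value is a single base64-encoded PEM certificate. Those key names and the function names are assumptions for illustration; check them against the actual secrets.

```python
import base64
import hashlib
import ssl


def ca_fingerprint(b64_pem):
    """Return the SHA-256 fingerprint of one base64-encoded PEM certificate."""
    pem = base64.b64decode(b64_pem).decode("ascii").strip()
    # Convert to DER so differences in PEM line wrapping do not affect the hash.
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def ca_mismatch(cluster_secret_data, ca_keypair_data,
                cluster_key="ca.crt", keypair_key="tls.crt"):
    """Return True when the two secrets carry different CA certificates.

    Both arguments are the 'data' mappings of the Kubernetes secrets.
    The default key names are assumptions, not confirmed by this document.
    """
    cluster_ca = ca_fingerprint(cluster_secret_data[cluster_key])
    keypair_ca = ca_fingerprint(ca_keypair_data[keypair_key])
    return cluster_ca != keypair_ca
```

If ca_mismatch() returns True after the operator has been scaled back up, the cluster TLS secret is still carrying the old CA described in step 2.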

Solution

The DR failover Lambda now includes a _cleanup_stale_tls_secret() function. It checks for an existing TLS secret whose name matches the HumioCluster name. If the secret is found, the function deletes it so that the operator regenerates it with the correct CA on the next reconciliation. If the secret does not exist (HTTP 404), cleanup is skipped. Any other API error is logged as a warning and does not block failover.
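The behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Lambda's actual code. It assumes the caller passes an object exposing read_namespaced_secret() and delete_namespaced_secret(), as the Kubernetes Python client's CoreV1Api does, and that API errors carry an HTTP status attribute. The function name, its parameters, and the returned strings are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def cleanup_stale_tls_secret_sketch(core_v1, cluster_name, namespace):
    """Delete the cluster TLS secret if it exists; never raise.

    Returns "deleted", "skipped" (secret absent), or "error" (logged only).
    """
    try:
        # Check whether a secret named after the HumioCluster exists.
        core_v1.read_namespaced_secret(name=cluster_name, namespace=namespace)
    except Exception as exc:
        # Assumption: API errors expose the HTTP status as exc.status.
        if getattr(exc, "status", None) == 404:
            logger.info("TLS secret %s not found, skipping cleanup", cluster_name)
            return "skipped"
        logger.warning("Could not read TLS secret %s: %s", cluster_name, exc)
        return "error"

    try:
        # Deleting the secret lets the operator regenerate it on reconciliation.
        core_v1.delete_namespaced_secret(name=cluster_name, namespace=namespace)
    except Exception as exc:
        if getattr(exc, "status", None) == 404:
            # The secret disappeared between the read and the delete.
            return "skipped"
        logger.warning("Could not delete TLS secret %s: %s", cluster_name, exc)
        return "error"

    logger.info("Deleted stale TLS secret %s", cluster_name)
    return "deleted"
```

Returning a status instead of raising keeps the failover path running even when cleanup fails, which matches the non-blocking behavior described above.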

Configuration

The HumioCluster name is automatically derived from the LogScale module output (cluster_name_prefix) and passed to the Lambda module configuration. No manual configuration is required.