Troubleshooting

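Work through Steps 1 to 5 in order, starting at Step 1; most issues are resolved by Steps 2 to 4. Steps 6 to 8 each cover a specific failover symptom, so go directly to the one that matches what you observe.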

Step 1 - DNS Resolution

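Check that the global hostname resolves, both from the client and from within the cluster:

```shell
dig <global-logscale-hostname>.<zone> @<route53-nameserver>
nslookup <global-logscale-hostname>.<zone>
# From within the cluster, using a throwaway busybox pod (any image with nslookup works)
kubectl run dns-check --rm -i --restart=Never --image=busybox:1.36 -n logging --context <cluster-context> -- nslookup <global-logscale-hostname>.<zone>
```
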
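After a failover, clients can keep resolving to the old ALB until their cached answer expires. The following is a minimal bash sketch that prints raw answer records from either the Route53 nameserver or the local resolver, and reports the largest TTL in a set of answers. This gives a rough idea of how long a cached answer may persist (a caching resolver usually reports the remaining TTL). `dig_answers` and `max_ttl` are hypothetical helper names, not part of any tool.

```shell
# Hypothetical helpers for Step 1 (bash). Sourcing this only defines functions.

# Print raw answer records ("name TTL class type value") for a hostname.
# With a second argument, query that nameserver directly (e.g. the Route53
# authoritative server); without it, use the local resolver.
dig_answers() {
  local host="$1" nameserver="${2:-}"
  if [ -n "$nameserver" ]; then
    dig +noall +answer "$host" "@$nameserver"
  else
    dig +noall +answer "$host"
  fi
}

# Read dig answer lines on stdin and print the largest TTL in seconds (0 if none).
max_ttl() {
  awk 'NF >= 5 && $2 + 0 > max { max = $2 + 0 } END { print max + 0 }'
}
```

For example, compare `dig_answers <global-logscale-hostname>.<zone> <route53-nameserver>` with `dig_answers <global-logscale-hostname>.<zone>`, and pipe either into `max_ttl`. The set of addresses behind an ALB can change over time, so check the returned addresses against `dig +short <alb-hostname>` rather than expecting identical lists.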
Step 2 - External Connectivity

Verify TCP connectivity to the ALB and inspect the TLS certificate it serves:

```shell
# Check TCP connectivity (no TLS handshake)
nc -zv <alb-hostname> 443
# Or using bash /dev/tcp (if nc is unavailable)
timeout 5 bash -c "</dev/tcp/<alb-hostname>/443" && echo "Connected" || echo "Failed"
# Show certificate subject, issuer and expiry, plus the negotiated TLS version and cipher.
# -k skips verification, so the details print even if the certificate does not match <alb-hostname>.
curl -kv https://<alb-hostname>/ 2>&1 | grep -E "subject:|issuer:|expire date:|SSL"
```

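Because `curl -k` does not verify the certificate, and the ALB selects a certificate by SNI (falling back to its default certificate), it can help to request the certificate for the hostname clients actually use. The following is a minimal bash sketch using `openssl`. `inspect_alb_cert` and `cert_valid_for` are hypothetical helper names.

```shell
# Hypothetical helpers for Step 2 (bash). Sourcing this only defines functions.

# Fetch the certificate the ALB presents for a given SNI name and print its
# subject, issuer and expiry date. </dev/null makes s_client exit after the handshake.
inspect_alb_cert() {
  local alb_host="$1" sni_host="$2"
  openssl s_client -connect "${alb_host}:443" -servername "$sni_host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -enddate
}

# Succeed if the PEM certificate on stdin is still valid N seconds from now
# (default 0, i.e. not expired yet).
cert_valid_for() {
  openssl x509 -noout -checkend "${1:-0}" >/dev/null
}
```

For example, `inspect_alb_cert <alb-hostname> <global-logscale-hostname>.<zone>` shows which certificate clients receive. Piping the same `openssl s_client` output into `cert_valid_for 604800` fails if the certificate expires within seven days.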
Step 3 - ALB Health

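Check the health of the targets registered in the ALB target group:

```shell
# Use the region where the ALB is deployed (us-west-2 in this command)
aws elbv2 describe-target-health --target-group-arn <target-group-arn> --region us-west-2
```
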
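When a target group has many targets, it is easier to list only those that are not `healthy`. The following is a minimal bash sketch using the AWS CLI `--query` and `--output text` options. `list_target_health` and `unhealthy_targets` are hypothetical helper names, and the default region mirrors the command above.

```shell
# Hypothetical helpers for Step 3 (bash). Sourcing this only defines functions.

# Print one "target-id  state  reason" line per registered target.
# The reason column prints as "None" for healthy targets.
list_target_health() {
  local tg_arn="$1" region="${2:-us-west-2}"
  aws elbv2 describe-target-health --target-group-arn "$tg_arn" --region "$region" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output text
}

# Read that output on stdin and keep only targets whose state is not "healthy"
# (for example initial, unhealthy, draining or unused). Blank lines are skipped.
unhealthy_targets() {
  awk 'NF && $2 != "healthy"'
}
```

For example, `list_target_health <target-group-arn> | unhealthy_targets`. The reason column (such as `Target.FailedHealthChecks` or `Target.Timeout`) usually indicates why a target is failing.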
Step 4 - Route53 Health Check Status

Verify that the Route53 health checks are passing:

```shell
# Route53 is a global service; its health check metrics are published in us-east-1
aws route53 get-health-check-status --health-check-id <health-check-id> --region us-east-1
```

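`get-health-check-status` returns one observation per Route53 health checker. Route53 considers an endpoint health check healthy when more than 18% of its checkers report success, so a few failing checker regions do not necessarily change the overall status. The following is a minimal bash sketch that summarizes the observations as a percentage. The helper names are hypothetical.

```shell
# Hypothetical helpers for Step 4 (bash). Sourcing this only defines functions.

# Print "checker-region  status" for each Route53 health checker observation.
health_checker_statuses() {
  aws route53 get-health-check-status --health-check-id "$1" --region us-east-1 \
    --query 'HealthCheckObservations[].[Region,StatusReport.Status]' \
    --output text
}

# Read that output on stdin and print the integer percentage of checkers whose
# status starts with "Success" (0 when there are no observations).
percent_healthy() {
  awk 'NF { total++; if ($2 ~ /^Success/) ok++ }
       END { if (total == 0) { print 0 } else { printf "%d\n", ok * 100 / total } }'
}
```

For example, `health_checker_statuses <health-check-id> | percent_healthy`. Lines starting with `Failure:` in the raw output usually state the cause, such as a timeout or an unexpected status code.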
Step 5 - Kubernetes Components

Verify the pods, services, endpoints, and ingress configuration:

```shell
kubectl get pods -n logging --context <cluster-context>
kubectl get svc -n logging --context <cluster-context>
kubectl get endpoints -n logging --context <cluster-context>
kubectl get ingress -n logging --context <cluster-context> -o yaml
```

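Two problems worth ruling out are pods that are running but not fully ready, and Services with no ready endpoints, which leave the ingress with nothing to route to. The following is a minimal bash sketch that filters the default `kubectl get` table output. It assumes the standard column order (NAME, READY, STATUS, ... for pods; NAME, ENDPOINTS, AGE for endpoints). The helper names are hypothetical.

```shell
# Hypothetical helpers for Step 5 (bash). Sourcing this only defines functions.

# Read `kubectl get pods --no-headers` output on stdin and print pods whose
# READY count is short (e.g. 1/2) or whose STATUS is not Running.
# Completed Job pods are also listed; ignore them if they are expected.
pods_not_ready() {
  awk 'NF { split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1, $2, $3 }'
}

# Read `kubectl get endpoints --no-headers` output on stdin and print Services
# that have no ready endpoints.
services_without_endpoints() {
  awk 'NF && $2 == "<none>" { print $1 }'
}
```

For example, `kubectl get pods -n logging --context <cluster-context> --no-headers | pods_not_ready` and `kubectl get endpoints -n logging --context <cluster-context> --no-headers | services_without_endpoints`.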
Step 6 - Lambda Not Invoked

If the failover Lambda does not run, check each link in order:

  1. Check the CloudWatch alarm state:

    ```shell
    aws cloudwatch describe-alarms --alarm-names "<secondary-cluster>-dr-failover-primary-unhealthy" --region us-east-1
    ```

  2. Check the SNS topic subscriptions:

    ```shell
    aws sns list-subscriptions-by-topic --topic-arn arn:aws:sns:us-east-1:<account-id>:<secondary-cluster>-dr-failover-sns --region us-east-1
    ```

  3. Check the Lambda logs:

    ```shell
    # By default only the last 10 minutes are shown; --since widens the window
    aws logs tail /aws/lambda/<secondary-cluster>-dr-failover-handler --region us-east-2 --since 1h
    ```

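The three checks above follow the path from the alarm to the function. The following is a minimal bash sketch that extracts only the relevant fields at each hop. It also prints the Lambda resource-based policy, which must allow `sns.amazonaws.com` to invoke the function. It assumes the alarm notifies the SNS topic when it enters `ALARM` and that the topic invokes the Lambda. All function names are hypothetical.

```shell
# Hypothetical helpers for Step 6 (bash). Sourcing this only defines functions.

# Print the alarm state (OK, ALARM or INSUFFICIENT_DATA) and the reason text.
alarm_state() {
  aws cloudwatch describe-alarms --alarm-names "$1" --region us-east-1 \
    --query 'MetricAlarms[0].[StateValue,StateReason]' --output text
}

# Print the action ARNs (e.g. SNS topics) the alarm triggers when it enters ALARM.
alarm_actions() {
  aws cloudwatch describe-alarms --alarm-names "$1" --region us-east-1 \
    --query 'MetricAlarms[0].AlarmActions' --output text
}

# Print the Lambda function ARNs subscribed to an SNS topic.
lambda_subscribers() {
  aws sns list-subscriptions-by-topic --topic-arn "$1" --region us-east-1 \
    --query "Subscriptions[?Protocol=='lambda'].Endpoint" --output text
}

# Print the function's resource-based policy; it should allow sns.amazonaws.com.
lambda_resource_policy() {
  aws lambda get-policy --function-name "$1" --region us-east-2 \
    --query 'Policy' --output text
}

# Map an alarm state to the next thing to check (assumes notification on ALARM only).
next_check_for_state() {
  case "$1" in
    ALARM) echo "alarm fired: check SNS subscriptions, the resource policy and Lambda logs" ;;
    OK) echo "alarm not firing: the Lambda is not expected to run yet" ;;
    INSUFFICIENT_DATA) echo "no data: check the metric the alarm watches" ;;
    *) echo "unexpected state: $1" ;;
  esac
}
```

For example, `next_check_for_state "$(alarm_state "<secondary-cluster>-dr-failover-primary-unhealthy" | cut -f1)"`.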
Step 7 - Operator Not Scaling

If the Lambda runs but the secondary cluster does not scale up:

  1. Verify that the Lambda role has an EKS access entry:

    ```shell
    aws eks list-access-entries --cluster-name <secondary-cluster> --region us-east-2
    ```

  2. Check the inline policy on the Lambda IAM role:

    ```shell
    aws iam get-role-policy --role-name <secondary-cluster>-dr-failover-lambda --policy-name <secondary-cluster>-dr-failover-lambda-access
    ```

  3. Verify that the HumioCluster exists and that its name matches the one the Lambda targets:

    ```shell
    kubectl get humiocluster -n logging --context <secondary-cluster>
    ```

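One thing worth ruling out is an access entry created for a different principal than the role the Lambda actually runs as. The following is a minimal bash sketch that resolves the role ARN and checks it against the cluster's access entries. An access entry alone grants no Kubernetes permissions unless it is associated with an access policy (see `aws eks list-associated-access-policies`) or mapped to Kubernetes groups with RBAC bindings. The role name follows the pattern in the commands above, and the helper names are hypothetical.

```shell
# Hypothetical helpers for Step 7 (bash). Sourcing this only defines functions.

# Print the ARN of an IAM role, e.g. the Lambda execution role.
role_arn() {
  aws iam get-role --role-name "$1" --query 'Role.Arn' --output text
}

# Print the principal ARNs that have access entries on an EKS cluster.
access_entry_principals() {
  aws eks list-access-entries --cluster-name "$1" --region us-east-2 \
    --query 'accessEntries' --output text
}

# Read principal ARNs (tab- or newline-separated) on stdin and succeed only if
# the exact ARN given as $1 is among them.
has_access_entry() {
  tr '\t' '\n' | grep -Fxq -- "$1"
}
```

For example, run `access_entry_principals <secondary-cluster> | has_access_entry "$(role_arn <secondary-cluster>-dr-failover-lambda)" && echo "access entry found"`. `get-role-policy` shows only inline policies, so use `aws iam list-attached-role-policies --role-name <secondary-cluster>-dr-failover-lambda` for managed ones.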
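Step 8 - TLS Certificate Errors

If the secondary cluster reports TLS certificate errors:

  1. Check whether the TLS secrets exist (service account token secrets are filtered out):

    ```shell
    kubectl get secret -n logging --context <secondary-cluster> | grep -v token
    ```
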
  2. Verify the CA keypair secret:

    ```shell
    kubectl get secret <cluster-name>-ca-keypair -n logging --context <secondary-cluster> -o yaml
    ```

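  3. Check the cert-manager controller logs:

    ```shell
    # With a label selector, kubectl shows only the last 10 lines per pod unless --tail is set
    kubectl logs -n cert-manager -l app=cert-manager --context <secondary-cluster> --tail=200
    ```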
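To see which certificate a secret actually holds, decode it and let `openssl` print the subject, issuer and expiry. If cert-manager manages the certificate, `kubectl get certificate -n logging --context <secondary-cluster>` also shows whether each `Certificate` resource is `READY`. The following is a minimal bash sketch. It assumes the secret uses the `kubernetes.io/tls` layout (keys `tls.crt` and `tls.key`), and the helper names are hypothetical.

```shell
# Hypothetical helpers for Step 8 (bash). Sourcing this only defines functions.

# Print subject, issuer and expiry of the PEM certificate read on stdin.
cert_summary() {
  openssl x509 -noout -subject -issuer -enddate
}

# Decode tls.crt from a TLS secret and summarise it.
# The backslash in the jsonpath escapes the dot inside the key name "tls.crt".
secret_cert_summary() {
  local secret="$1" context="$2"
  kubectl get secret "$secret" -n logging --context "$context" \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_summary
}
```

For example, `secret_cert_summary <cluster-name>-ca-keypair <secondary-cluster>` prints the CA certificate details. A `notAfter=` date in the past points to an expired certificate.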