Troubleshooting Checklist

Note

Troubleshooting workflow: Start at Step 1 (Identify LB Type) and work through each step sequentially. Each step builds on the previous one.

Most issues are resolved by Steps 3–4 (External Connectivity/Traffic Management) or Step 8 (Security Checks).

Step 1: Identify LB Type

First confirm what OCI is actually provisioning from the Service annotations:

shell
kubectl --context oci-primary -n logging-ingress get svc -o wide
kubectl --context oci-primary -n logging-ingress get svc <service-name> -o yaml | rg -n "load-balancer-type|oci-load-balancer"

  • Classic LB: oci.oraclecloud.com/load-balancer-type: lb

  • Network LB: oci.oraclecloud.com/load-balancer-type: nlb

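If the Service is provisioning the wrong type, the annotation can be changed in place. A minimal sketch with placeholder names (the annotation and its lb/nlb values follow the OCI cloud-controller-manager convention; verify against your provider version):

```shell
# Switch the Service to a Classic LB (the value "nlb" selects a Network LB instead)
kubectl --context oci-primary -n logging-ingress annotate svc <service-name> \
  oci.oraclecloud.com/load-balancer-type=lb --overwrite
```
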
Step 2: DNS Resolution
shell
dig logscale.<zone_name>
nslookup logscale.<zone_name>
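To spot TTL or steering surprises, capture the resolved answer once and reuse it; a sketch with placeholder hostnames:

```shell
# Capture the first A record, then show TTL and the authority section,
# which matters when DNS Traffic Management steering is in play
RESOLVED="$(dig +short logscale.<zone_name> A | head -n1)"
echo "resolved: $RESOLVED"
dig +noall +answer +authority logscale.<zone_name>
```
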
Step 3: External Connectivity (TCP vs TLS)
shell
IP="$(dig +short logscale.<zone_name> A | head -n1)"
nc -zv "$IP" 443
curl -vk --connect-timeout 8 "https://logscale.<zone_name>/"

  • If nc fails: ingress security rules or public routing/DNS.

  • If nc succeeds but TLS hangs: LB is accepting TCP but something breaks on the dataplane path to backends (or the return path).

MTU/fragmentation sanity check (rules out "SYN works but payload doesn't")

shell
# A full TLS handshake forces large packets (the certificate chain), unlike a bare SYN
openssl s_client -connect "$IP:443" -servername logscale.<zone_name> </dev/null
# Probe path MTU directly; on Linux, -M do sets the DF (don't fragment) bit
ping -M do -s 1400 -c 3 "$IP"
Step 4: OCI Health Check Status (DNS Traffic Management)
shell
oci health-checks http-monitor-result list --monitor-id <monitor-id> --profile <profile>
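If the monitor OCID is not at hand, it can be listed first; a sketch (compartment OCID and profile are placeholders):

```shell
# Find the monitor OCID before querying its results
oci health-checks http-monitor list --compartment-id <compartment-id> --profile <profile>
```
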
Step 5: OCI Load Balancer Status

Classic LB

shell
oci lb load-balancer list --compartment-id <compartment-id> --profile <profile>
oci lb backend-health get --backend-set-name <name> --backend-name <name> --load-balancer-id <lb-id> --profile <profile>
Step 6: Kubernetes Components (Ingress + ingress-nginx)
shell
export KUBECONFIG="$(pwd)/kubeconfig-dr.yaml"
kubectl --context oci-primary -n logging-ingress get pods
kubectl --context oci-primary -n logging-ingress get svc -o wide
kubectl --context oci-primary -n logging-ingress describe svc <nginx-service-name>
kubectl --context oci-primary -n logging-ingress get endpoints <nginx-service-name> -o wide
kubectl --context oci-primary -n logging get ingress
kubectl --context oci-primary -n logging get pods
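Controller logs usually narrow down backend or TLS problems faster than pod status alone. A sketch; the label selector assumes the standard ingress-nginx chart labels:

```shell
# Tail recent controller logs across all replicas
kubectl --context oci-primary -n logging-ingress logs \
  -l app.kubernetes.io/name=ingress-nginx --tail=100
```
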
Step 7: NodePort Sanity (From Inside the VCN)

From a bastion host (or any VM inside the VCN), validate that the worker NodePorts respond:

shell
nc -zv <worker-node-ip> <nodeport>

If NodePort is unreachable from inside the VCN, fix Kubernetes/kube-proxy/service endpoints first (before blaming OCI dataplane).
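Beyond the bare TCP check, the HTTP path can be exercised with the expected Host header so ingress-nginx routes to the right backend (-k is required because SNI will not match the node IP); node IP and NodePort are placeholders:

```shell
curl -vk --connect-timeout 8 "https://<worker-node-ip>:<nodeport>/" \
  -H "Host: logscale.<zone_name>"
```
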

Step 8: Security Checks

  • Verify your public IP is present in public_lb_cidrs (for inbound 80/443 to the public LB).

  • Verify worker nodes allow inbound NodePort range (30000-32767) from:

    • the LB subnet CIDR (or, more broadly, the VCN CIDR), and/or

    • the LB NSG (NSG-to-NSG rules).

  • If is-preserve-source=true, worker nodes must also allow the original client IP CIDRs on the NodePort range.
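The effective rules can be dumped from the CLI to verify the points above; a sketch with placeholder OCIDs:

```shell
# NSG rules attached to the workers (or the LB)
oci network nsg rules list --nsg-id <nsg-id> --profile <profile>
# Security list on the worker subnet
oci network security-list get --security-list-id <security-list-id> --profile <profile>
```
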

Step 9: Packet Proof: Where Do Packets Stop?

When nc connects but TLS times out, stop guessing and capture evidence.

A) tcpdump on a backend node (during an external curl)

On a backend node (one of the LB backend IPs):

shell
sudo tcpdump -ni any '(tcp port <nodeport>)' -nn -vv

In parallel, from your PC or laptop:

shell
curl -vk --connect-timeout 8 https://logscale.<zone_name>/

Decision points:

  • No packets arrive at the NodePort on any node: the LB is not forwarding, or traffic is blocked before it reaches the node (subnet SL/NSG/route, or the OCI dataplane).

  • SYNs arrive but no SYN-ACK: node is not responding (kube-proxy/service endpoints issue) or ingress is blocked at node SL/NSG.

  • Full 3-way handshake but no payload (or no reply): look at conntrack/iptables/kube-proxy and pod-level capture.
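For the conntrack/iptables leg, a quick sketch on the node (tool availability varies by OS image):

```shell
# Look for stuck or unreplied conntrack entries involving the NodePort
sudo conntrack -L -p tcp 2>/dev/null | grep "<nodeport>"
# Confirm kube-proxy actually programmed rules for the NodePort
sudo iptables-save | grep -w "<nodeport>"
```
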

B) VCN Flow Logs

Enable Flow Logs on:

  • LB subnet (10.0.2.0/24 in this design)

  • Worker subnets (10.0.160.0/20, 10.0.176.0/20, 10.0.192.0/20)

Then reproduce the curl and confirm whether flows show:

  • Client → LB VIP:443 (should be ACCEPT)

  • LB private IP → nodeIP:NodePort (must be ACCEPT)

If there is no LB → node flow while the client connects, open an OCI Support Request with:

  • LB OCID + region + compartment

  • timestamps of curl attempts

  • flow log excerpts showing missing/one-way flows

  • tcpdump showing absence/presence of packets at backends

Bastion Tunnel
shell
# Check if tunnel is running
lsof -i :16443

# Establish tunnel
LOCAL_PORT=16443 ./scripts/setup-bastion-tunnel-v3.sh --workspace primary kubectl
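Once the tunnel is up, a simple API call confirms end-to-end reachability (assuming the oci-primary context in the kubeconfig points at the tunneled local port):

```shell
kubectl --context oci-primary get nodes -o wide
```
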