Networking

This section covers everything network-related: the external load balancer, the CNI (Cilium), and network isolation via NetworkPolicy.


HAProxy — External Load Balancer

What it does

HAProxy runs on a separate Debian server and serves as the single entry point for all cluster traffic. It load-balances across all three nodes and handles TLS passthrough for both the API server and Traefik.

Internet / LAN
HAProxy (192.168.0.45)
  :6443 → k3s API Server   (mode tcp, TLS passthrough)
  :80   → Traefik HTTP     (mode http)
  :443  → Traefik HTTPS    (mode tcp, TLS passthrough)

IP alias setup on Debian

HAProxy listens on a virtual IP 192.168.0.45, separate from the host's main IP 192.168.0.46. This is done via an interface alias in /etc/network/interfaces:

auto enp1s0:0
iface enp1s0:0 inet static
    address 192.168.0.45
    netmask 255.255.255.0

Apply without full network restart:

sudo ifup enp1s0:0

Don't use systemctl restart networking over SSH — you'll lose the connection.

HAProxy config

frontend k8s-api
    bind 192.168.0.45:6443
    mode tcp          # TLS passthrough — HAProxy doesn't see the content

frontend ingress-http
    bind 192.168.0.45:80
    mode http         # HTTP — Traefik handles redirects

frontend ingress-https
    bind 192.168.0.45:443
    mode tcp          # TLS passthrough — Traefik terminates TLS

mode tcp vs mode http

  • mode tcp — HAProxy passes raw bytes through, never sees packet content. TLS is terminated by the backend (API server / Traefik). HAProxy doesn't need a certificate.
  • mode http — HAProxy understands HTTP, can modify headers, but requires decrypting TLS (needs the private key).

For k8s API and HTTPS ingress → always mode tcp.
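Each frontend shown earlier pairs with a backend via default_backend. A sketch of the API-server side — the backend name and the 192.168.55.x node addresses are placeholders, not taken from the real config:

```
backend k8s-api-nodes
    mode tcp
    balance roundrobin
    # plain TCP connect checks — no TLS or HTTP parsing involved
    server node1 192.168.55.10:6443 check
    server node2 192.168.55.11:6443 check
    server node3 192.168.55.12:6443 check
```

The :80 and :443 backends follow the same pattern, pointing at the nodes' Traefik ports.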

Health check pitfall

option httpchk GET /healthz doesn't work here: against a TLS-passthrough backend the check goes out as plaintext HTTP to a port that only speaks TLS, so every server is marked down. On top of that, Traefik returns 404 on unknown paths, which the default httpchk treats as unhealthy.

Solutions:
  • Remove check from backends entirely (simplest for a homelab)
  • Use http-check expect status 200,301,302,404

The debugging lesson: remove health checks first to confirm routing works, then fix health checks separately.
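The http-check expect fix, sketched for the HTTP backend (backend name and node addresses are illustrative):

```
backend traefik-http
    mode http
    option httpchk GET /
    # Traefik answers 404 for unmatched routes — count that as healthy
    http-check expect status 200,301,302,404
    server node1 192.168.55.10:80 check
    server node2 192.168.55.11:80 check
    server node3 192.168.55.12:80 check
```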


Cilium CNI

Why Cilium (and why we migrated from Flannel)

Flannel was the default CNI in k3s and worked fine — until a hard power-off. Flannel stores its subnet.env in /run/flannel/, which is a tmpfs (lives in RAM). After a hard shutdown that file disappears and Flannel can't initialize the network on restart, leaving all pods stuck in ContainerCreating.

Cilium doesn't have this problem. It also brings:
  • eBPF instead of iptables — better performance, lower overhead
  • Built-in NetworkPolicy support
  • Hubble for network observability
  • Production standard in enterprise environments

Migration from Flannel to Cilium

Step 1 — Snapshot etcd first

ssh master "sudo k3s etcd-snapshot save --name pre-cilium-migration"

Step 2 — Stop k3s on all nodes

ansible all -m systemd -a "name=k3s state=stopped" -b

Step 3 — Clean up Flannel

ansible all -m shell -a "
rm -rf /run/flannel
rm -rf /var/lib/cni
rm -rf /etc/cni/net.d/*
ip link delete flannel.1 2>/dev/null || true
ip link delete cni0 2>/dev/null || true
" -b

Step 4 — Update k3s.service flags (on each node)

Add to the ExecStart block:

'--flannel-backend=none'
'--disable-network-policy'
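After the edit, the tail of the ExecStart block in /etc/systemd/system/k3s.service looks roughly like this (a sketch — the node's existing flags stay in place, only the two new lines are added):

```
ExecStart=/usr/local/bin/k3s \
    server \
        '--flannel-backend=none' \
        '--disable-network-policy'
```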

Step 5 — Start the cluster (HA quorum matters!)

In a 3-node HA cluster, etcd needs quorum (2 of 3 members) before it can elect a leader. Don't wait too long between starting the master and the workers.

# Start master
ssh master "sudo systemctl daemon-reload && sudo systemctl start k3s"

# Start workers in parallel soon after
ssh worker1 "sudo systemctl daemon-reload && sudo systemctl start k3s" &
ssh worker2 "sudo systemctl daemon-reload && sudo systemctl start k3s" &

Watch for "prober detected unhealthy status" in the etcd logs — that's your cue to start the workers.

Step 6 — Install Cilium via Helm

helm repo add cilium https://helm.cilium.io/
helm repo update

helm install cilium cilium/cilium \
  --version 1.19.1 \
  --namespace kube-system \
  --set k8sServiceHost=192.168.55.10 \
  --set k8sServicePort=6443 \
  --set operator.replicas=1

Why a direct IP instead of a hostname? During bootstrap Cilium can't use DNS, because:
  • DNS runs as CoreDNS pods
  • Pods can't start without a CNI
  • The CNI (Cilium) can't connect without DNS → deadlock

Direct IP breaks the cycle.

Step 7 — Move to Flux as HelmRelease

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cilium
  namespace: kube-system
  annotations:
    meta.helm.sh/release-name: cilium
    meta.helm.sh/release-namespace: kube-system
spec:
  interval: 30m
  chart:
    spec:
      chart: cilium
      version: "1.19.1"
      sourceRef:
        kind: HelmRepository
        name: cilium
        namespace: flux-system
      interval: 12h
  install:
    createNamespace: false
  upgrade:
    remediation:
      retries: 3
  values:
    k8sServiceHost: 192.168.55.10
    k8sServicePort: 6443
    operator:
      replicas: 1

Verifying Cilium

cilium status
cilium connectivity test --test no-policies

Expected output:

Cilium:          OK
Operator:        OK
Envoy DaemonSet: OK
Cluster Pods:    53/53 managed by Cilium

Routing modes — important context

Mode             Description         Pod→NodeIP             kube-proxy needed
VXLAN (tunnel)   L3 encapsulation    works everywhere       Yes
Native routing   Direct L2 routing   same subnet required   No

The cluster currently runs in VXLAN mode with k3s kube-proxy. Native routing requires kubeProxyReplacement: true + routingMode: native + --disable-kube-proxy in k3s — and must be set up from scratch or during a full cluster restart, not as a rolling update.

⚠️ Attempting to switch routing modes via rolling DaemonSet update will break the cluster. One node ends up with native routing while others still use VXLAN — traffic falls apart. Ask me how I know.
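For a future from-scratch setup, the native-mode Helm values would differ roughly as follows (a sketch: 10.42.0.0/16 is the k3s default cluster CIDR — verify against the actual cluster before using):

```yaml
routingMode: native
ipv4NativeRoutingCIDR: 10.42.0.0/16   # must match the pod CIDR
autoDirectNodeRoutes: true            # nodes share one L2 segment
kubeProxyReplacement: true            # pairs with --disable-kube-proxy in k3s
k8sServiceHost: 192.168.55.10
k8sServicePort: 6443
```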

Incident: Cilium crashloop taking down the network

During the initial Cilium install, the DaemonSet entered a crashloop (couldn't connect to the API server) and corrupted network interfaces in the process. Master and worker1 became unreachable even by ping.

Recovery was done from worker2:

# From worker2 — poll until the API comes back up
while true; do
  kubectl delete daemonset cilium cilium-envoy -n kube-system \
    --force --grace-period=0 2>/dev/null && echo "DONE!" && break
  sleep 2
done
# Then physically restart the affected node

Lesson: in a 3-node cluster, always keep at least one node out of dangerous operations. worker2 saved the day here.

Namespace stuck in Terminating

kubectl get namespace <name> -o json | \
  python3 -c "import sys,json; d=json.load(sys.stdin); d['spec']['finalizers']=[]; print(json.dumps(d))" | \
  kubectl replace --raw /api/v1/namespaces/<name>/finalize -f -

NetworkPolicy

Default behavior without NetworkPolicy

Every pod can connect to every other pod across all namespaces. Zero isolation.

How NetworkPolicy works in Cilium

Cilium enforces NetworkPolicy at the eBPF level. When a policy exists for a pod, Cilium uses DROP (silently discard) by default — not REJECT. This means blocked connections just time out rather than getting an immediate "connection refused", which leaks less information.
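From a client's point of view the difference is easy to observe. A small bash helper (hypothetical, nothing Cilium-specific) that classifies the three outcomes:

```shell
# Classify how host:port responds: open, silently dropped, or actively refused.
probe() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  elif [ $? -eq 124 ]; then
    # timeout(1) exits 124 when the deadline fires — DROP leaves us hanging
    echo "timeout (DROP-like)"
  else
    # an immediate failure means a RST or ICMP error came back (REJECT-like)
    echo "refused (REJECT-like)"
  fi
}

probe 127.0.0.1 1   # a closed local port refuses instantly
```

Against a pod blocked by a Cilium policy you land in the timeout branch; an immediate "connection refused" would mean something other than the policy is in the way.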

Example: isolating the database

Only the clients-api pods should be able to reach the PostgreSQL database:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: clients-db-allow-only-api
  namespace: clients
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: clients-db   # targets the DB pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: clients-api      # only allow from clients-api
      ports:
        - protocol: TCP
          port: 5432
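Note that this policy only protects the pods it selects — every other pod in the namespace remains wide open. A namespace-wide default-deny ingress policy is the usual companion (a sketch, not necessarily deployed in this cluster):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: clients
spec:
  podSelector: {}      # selects every pod in the namespace
  policyTypes:
    - Ingress          # no ingress rules listed → all inbound traffic dropped
```

Allow-policies like the one above then punch explicit holes through it.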

Example: application ingress policy

Allow traffic only from Traefik (kube-system) and Prometheus (monitoring), plus intra-namespace traffic for helm tests:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: clients-api-ingress
  namespace: clients
spec:
  podSelector:
    matchLabels:
      app: clients-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - podSelector: {}   # all pods in same namespace (for helm tests)
      ports:
        - protocol: TCP
          port: 8080

Note: podSelector: {} is an empty selector matching all pods in the namespace where the NetworkPolicy lives.

Verifying isolation

# Run a pod without app=clients-api label
kubectl run test-isolation --rm -it \
  --image=postgres:17 \
  --restart=Never \
  -n clients \
  -- psql -h clients-db-rw -U app -d clients_db -c "SELECT 1"

# If it hangs without responding → NetworkPolicy is working ✅
# Cilium DROP = silence, not "connection refused"

UFW Port Reference

Ports needed for k3s + Cilium. Managed via Ansible playbook — see security/ufw.md.

Port    Protocol   Purpose                                           Source
22      TCP        SSH                                               Management network
6443    TCP        k8s API Server                                    HAProxy + cluster nodes
80      TCP        HTTP Ingress                                      HAProxy
443     TCP        HTTPS Ingress                                     HAProxy
8472    UDP        VXLAN overlay (Cilium tunnel; formerly Flannel)   Between nodes
10250   TCP        Kubelet                                           Between nodes
2379    TCP        etcd client                                       Between nodes
2380    TCP        etcd peer                                         Between nodes
9100    TCP        node-exporter                                     Between nodes (Prometheus)
4443    TCP        metrics-server                                    Between nodes
4244    TCP        Hubble gRPC                                       Between nodes
4240    TCP        Cilium health checks                              Between nodes
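An Ansible task opening one of these ports could look like the sketch below (community.general.ufw is the real module; the k3s_nodes group name and inventory layout are assumptions — the actual rules live in security/ufw.md):

```yaml
- name: Allow Cilium health checks from the other nodes
  community.general.ufw:
    rule: allow
    port: "4240"
    proto: tcp
    from_ip: "{{ hostvars[item].ansible_host }}"
  loop: "{{ groups['k3s_nodes'] }}"
  become: true
```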

Useful Commands

# Cilium
cilium status
cilium status --wait
cilium monitor --type drop

# Force clean pod state after crash
kubectl delete pods -n <ns> --field-selector status.phase=Unknown --force
kubectl delete pods -n <ns> --field-selector status.phase=Failed --force

# NetworkPolicy
kubectl get networkpolicy -n <namespace>
kubectl describe networkpolicy <name> -n <namespace>

# Check Cilium config
kubectl get configmap -n kube-system cilium-config -o yaml | \
  grep -E "routing-mode|kube-proxy|cluster-pool-ipv4-cidr"

# Emergency: revert Cilium to VXLAN
kubectl patch configmap cilium-config -n kube-system \
  --type merge \
  -p '{"data":{"routing-mode":"tunnel","kube-proxy-replacement":"false"}}'
kubectl delete pods -n kube-system -l k8s-app=cilium --force --grace-period=0