Did you configure a network load balancer for your secondary network interfaces? Are you struggling to get the health checks to work, even after verifying the listening IPs and ports and the firewall rules?
Your health check responses are most likely returned via the primary interface, so the packets get lost or dropped. Use this post to verify and resolve the issue.
How Passthrough Network Load Balancers Work
A passthrough Network Load Balancer routes connections directly from clients to the healthy backends, without terminating them in between. To accept load-balanced traffic, Google Cloud configures each backend VM with the IP address of the load balancer using a local route:
laurensknoll@squid-proxy-c837:~$ ip route get 10.0.1.2
local 10.0.1.2 dev lo table local src 10.0.1.2 uid 1001
cache <local>
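The same check can be scripted against the whole local route table. This is a minimal sketch, assuming that `ip route show table local` prints one route per line starting with `local <address>`; the helper name `has_local_route` is made up for this example.

```shell
#!/bin/sh
# has_local_route IP
# Reads "ip route show table local" style output on stdin and succeeds
# (exit status 0) when a local route for IP is present.
has_local_route() {
  awk -v ip="$1" '$1 == "local" && $2 == ip { found = 1 } END { exit !found }'
}

# Usage sketch (10.0.1.2 is the load balancer address used in this post):
#   ip route show table local | has_local_route 10.0.1.2 && echo "local route present"
```

If the helper reports no route, the VM is not yet set up to accept traffic for the load balancer address, and the routing changes further down will not help on their own.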
Detecting The Issue
Detect incoming load balancer health checks using tcpdump:
laurensknoll@squid-proxy-c837:~$ sudo tcpdump host 10.0.1.2 -i any
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
13:21:33.030400 ens5 In IP 235-0-211-130.1e100.net.59030 > 10.0.1.2.3128: Flags [S], seq 1186327794, win 65535, options [mss 1420,sackOK,TS val 3126546969 ecr 0,nop,wscale 8], length 0
13:21:33.030441 ens4 Out IP 10.0.1.2.3128 > 235-0-211-130.1e100.net.59030: Flags [S.], seq 3824338898, ack 1186327795, win 64768, options [mss 1420,sackOK,TS val 2826694419 ecr 3126546969,nop,wscale 7], length 0
13:21:33.283290 ens4 Out IP 10.0.1.2.3128 > 235-0-211-130.1e100.net.56104: Flags [S.], seq 63151303, ack 507892428, win 64768, options [mss 1420,sackOK,TS val 2826694672 ecr 3126536969,nop,wscale 7], length 0
13:21:33.355735 ens5 In IP 245-0-211-130.1e100.net.40674 > 10.0.1.2.3128: Flags [S], seq 2840299953, win 65535, options [mss 1420,sackOK,TS val 3570372660 ecr 0,nop,wscale 8], length 0
13:21:33.355790 ens4 Out IP 10.0.1.2.3128 > 245-0-211-130.1e100.net.40674: Flags [S.], seq 3518750145, ack 2840299954, win 64768, options [mss 1420,sackOK,TS val 930125414 ecr 3570372660,nop,wscale 7], length 0
13:21:33.539284 ens4 Out IP 10.0.1.2.3128 > 245-0-211-130.1e100.net.34284: Flags [S.], seq 1263115617, ack 440200841, win 64768, options [mss 1420,sackOK,TS val 930125598 ecr 3570362659,nop,wscale 7], length 0
13:21:33.860639 ens5 In IP 247-0-211-130.1e100.net.54780 > 10.0.1.2.3128: Flags [S], seq 2976663686, win 65535, options [mss 1420,sackOK,TS val 3399067719 ecr 0,nop,wscale 8], length 0
13:21:33.860680 ens4 Out IP 10.0.1.2.3128 > 247-0-211-130.1e100.net.54780: Flags [S.], seq 2634616183, ack 2976663687, win 64768, options [mss 1420,sackOK,TS val 921309107 ecr 3399067719,nop,wscale 7], length 0
...
The tcpdump output shows incoming health checks (130.211.0.0/22) on the secondary interface: ens5 In. Responses, however, are sent via the primary interface: ens4 Out. Since the health check did not connect to this interface, the packet is not properly returned, causing the Unhealthy backend state.
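In a longer capture the asymmetry is easier to spot when packets are tallied per interface and direction. The following is a minimal sketch, assuming the LINUX_SLL2 line layout shown above (interface name in the second field, direction in the third); the function name `summarise_interfaces` is made up for this example.

```shell
#!/bin/sh
# Count captured packets per interface and direction.
# Reads tcpdump "-i any" output on stdin and prints "<interface> <In|Out> <count>".
# Lines that do not carry an In/Out direction in field 3 are ignored.
summarise_interfaces() {
  awk '$3 == "In" || $3 == "Out" { count[$2 " " $3]++ }
       END { for (k in count) print k, count[k] }' | sort
}

# Usage sketch: capture 20 packets for the load balancer address and tally them.
#   sudo tcpdump -l -n -c 20 -i any host 10.0.1.2 2>/dev/null | summarise_interfaces
```

For the problem described here, the tally would show only In packets on ens5 and only Out packets on ens4. Once routing is fixed, both directions should appear on ens5.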
Not seeing any return traffic? The reverse path filter is probably dropping the packets. Check whether the IPReversePathFilter counter reported by netstat -s | grep IPReversePathFilter increases.
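Whether the counter increases is easiest to judge by sampling it twice. This is a minimal sketch, assuming the counter appears in the netstat -s output as a line ending in its numeric value; the helper name `rpf_count` is made up for this example.

```shell
#!/bin/sh
# Extract the IPReversePathFilter counter from "netstat -s" style text on stdin.
# Prints 0 when no such line is present.
rpf_count() {
  awk '/IPReversePathFilter/ { n = $NF } END { print n + 0 }'
}

# Usage sketch: sample the counter twice, ten seconds apart, and compare.
#   before=$(netstat -s | rpf_count)
#   sleep 10
#   after=$(netstat -s | rpf_count)
#   echo "packets dropped by the reverse path filter: $((after - before))"
```

A difference greater than zero while health checks are arriving suggests the kernel is discarding them before the service ever sees them.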
Steering The Load Balanced Traffic
A custom route table and a routing policy are configured to return load-balanced traffic via the secondary network interface:
- Add route table with secondary interface as default gateway
Identify the secondary interface gateway address:
laurensknoll@squid-proxy-c837:~$ ip route | grep 'dev ens5'
..
10.0.1.1 dev ens5 scope link
10.0.1.1 dev ens5 proto dhcp scope link src 10.0.1.4 metric 100
..
Create the nlb route table that uses the secondary interface as default gateway:
laurensknoll@squid-proxy-c837:~$ echo "10 nlb" | sudo tee -a /etc/iproute2/rt_tables
laurensknoll@squid-proxy-c837:~$ sudo ip route add default via 10.0.1.1 dev ens5 table nlb
- Configure load-balanced traffic routing policy
Tell the kernel to route packets with the load balancer source address to the nlb route table:
laurensknoll@squid-proxy-c837:~$ sudo ip rule add from 10.0.1.2 table nlb
- Validating health check traffic routing
Verify that the kernel routes load-balanced traffic to the secondary interface gateway:
laurensknoll@squid-proxy-c837:~$ ip route get from 10.0.1.2 to 130.211.0.0
130.211.0.0 from 10.0.1.2 via 10.0.1.1 dev ens5 table nlb uid 1001
cache
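The steps above can be collected into one small script. This is a sketch rather than a finished tool: the variable names, the `run` helper and the `APPLY` switch are made up for this example, the default values are the ones from this post's VM, and the nlb table is assumed to be registered in /etc/iproute2/rt_tables already. By default it only prints the commands.

```shell
#!/bin/sh
# Print (or, with APPLY=1 and root privileges, run) the routing commands.
# All defaults are example values from this post; adjust them for your VM.
LB_IP="${LB_IP:-10.0.1.2}"      # load balancer IP address
GATEWAY="${GATEWAY:-10.0.1.1}"  # gateway of the secondary interface
IFACE="${IFACE:-ens5}"          # secondary interface
TABLE="${TABLE:-nlb}"           # route table name, registered in rt_tables

# Echo the command in dry-run mode, execute it when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

steer_lb_traffic() {
  # "replace" is used instead of "add" so that re-running does not fail
  # when the default route already exists in the table.
  run ip route replace default via "$GATEWAY" dev "$IFACE" table "$TABLE"
  run ip rule add from "$LB_IP" table "$TABLE"
  # Validation: the result should mention the secondary interface and the table.
  run ip route get from "$LB_IP" to 130.211.0.0
}

steer_lb_traffic
```

Note that ip route and ip rule only change the running kernel state, so the commands have to be applied again after a reboot, for example from a startup script.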