HAProxy Master-Slave: Active-Passive Load Balancing with Keepalived and a Floating VIP
A single load balancer is a single point of failure dressed up in a fancy hat. The first time the HAProxy box in front of our payment service rebooted for a kernel patch at 2 AM, every retry from every upstream client piled up against a dead TCP socket for the four minutes it took to come back. That outage is what pushed us from “one HAProxy, hope for the best” to a proper master-slave (active-passive) pair with a floating IP.
This is the setup, the config that actually shipped, and the failure modes I learned the hard way.
The shape of the thing
Two HAProxy nodes. One virtual IP (VIP) that sits on whichever node is currently the master. Keepalived runs on both nodes and uses VRRP (Virtual Router Redundancy Protocol) to negotiate who holds the VIP. DNS points at the VIP, never at either node directly.
                          clients
                             │
                      ┌──────▼──────┐
                      │     VIP     │   10.0.0.100 (floats)
                      └──────┬──────┘
                             │
          ┌──────────────────┼──────────────────┐
          │                                     │
    ┌─────▼──────┐                        ┌─────▼──────┐
    │   lb-01    │ ◀── VRRP heartbeat ──▶ │   lb-02    │
    │   MASTER   │                        │   BACKUP   │
    │  HAProxy   │                        │  HAProxy   │
    │ Keepalived │                        │ Keepalived │
    └─────┬──────┘                        └─────┬──────┘
          │                                     │
          └──────────────────┬──────────────────┘
                             │
                      ┌──────▼──────┐
                      │  app-01..N  │  (real backends)
                      └─────────────┘
The key idea: only one node holds the VIP at a time. HAProxy on the backup is running, configured identically, with health checks live, but no traffic reaches it because the VIP isn’t bound to its interface. When the master dies, Keepalived on the backup notices the missed VRRP advertisements, promotes itself, and brings up the VIP on its own NIC. Noticing takes about three advertisement intervals (roughly 3.6 seconds with advert_int 1); moving the VIP after that is sub-second on a healthy network.
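How long the backup takes to "notice the missed VRRP advertisements" is set by VRRP’s timers, not by network speed. Here is a minimal sketch of the Master_Down_Interval formula, assuming the VRRPv3-style skew (RFC 5798) that scales with the advert interval; at advert_int 1, as in the Keepalived configs in this post, the VRRPv2 (RFC 3768) skew gives the same number. The function name is hypothetical.

```shell
#!/bin/sh
# How long a BACKUP waits before declaring the MASTER dead (sketch).
#   Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
#   Skew_Time            = (256 - priority) / 256 * Advertisement_Interval   (VRRPv3 style)
# At advert_int 1 the VRRPv2 skew, (256 - priority) / 256 seconds, is identical.

master_down_interval() {  # master_down_interval ADVERT_INT_SECONDS PRIORITY
  awk -v a="$1" -v p="$2" 'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 * a }'
}

master_down_interval 1 100   # lb-02 (priority 100) watching lb-01: prints 3.609
master_down_interval 1 110   # lb-01 as backup after a failover: prints 3.570
```

A higher priority means a shorter skew, so when several backups share a VRID, the highest-priority one times out first and takes over without a contested election.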
Why floating VIP and not DNS round-robin
DNS round-robin sounds simpler. Two A records, both LBs in DNS, clients pick one. The problem is DNS caching. Resolvers, OS stub resolvers, JVM InetAddress cache, language runtimes — they all cache, often ignoring TTL. When a node dies, a non-trivial fraction of clients keep hammering the dead address for minutes to hours.
A floating VIP lives at the network layer: the IP itself moves. Existing TCP sessions on the dead node are gone (you can’t preserve those without session sync, which HAProxy doesn’t do for TCP), but new connections to the same IP land on the new master as soon as neighbors update their ARP caches, which Keepalived triggers by sending gratuitous ARP.
The trade-off: VIP requires both nodes to be on the same L2 segment. If your LBs are in different VPCs or different AZs without an L2 stretch, you need a different approach (BGP/anycast, cloud LB in front, or DNS with low TTL and aggressive client-side retries).
HAProxy config
Same config on both nodes. Identical. The only thing that differs is Keepalived. Here’s a trimmed version of what we run:
# /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 50000
    user haproxy
    group haproxy
    daemon
    stats socket /run/haproxy/admin.sock mode 660 level admin
    # Bind to a non-local address so HAProxy can listen on the VIP
    # even when Keepalived hasn't assigned it yet.
    # Without this, starting HAProxy on the backup fails with EADDRNOTAVAIL.
    # Set via sysctl: net.ipv4.ip_nonlocal_bind = 1
defaults
    log global
    mode http
    option httplog
    option dontlognull
    option forwardfor
    option http-server-close
    timeout connect 5s
    timeout client 60s
    timeout server 60s
    timeout http-request 10s
    retries 3
frontend fe_https
    bind 10.0.0.100:443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
    bind 10.0.0.100:80
    http-request redirect scheme https code 301 unless { ssl_fc }
    default_backend be_app
backend be_app
    balance roundrobin
    option httpchk GET /healthz
    http-check expect status 200
    default-server inter 2s fall 3 rise 2 maxconn 1000
    server app-01 10.0.1.11:8080 check
    server app-02 10.0.1.12:8080 check
    server app-03 10.0.1.13:8080 check
Two things worth pinning to the wall:
net.ipv4.ip_nonlocal_bind = 1 on both nodes. HAProxy on the backup needs to bind to 10.0.0.100:443 even though the VIP isn’t on its interface yet. Without this sysctl, HAProxy fails to start on the backup with Cannot assign requested address. Set it permanently:
echo 'net.ipv4.ip_nonlocal_bind = 1' | sudo tee /etc/sysctl.d/99-haproxy.conf
sudo sysctl --system
Health checks (option httpchk) are non-negotiable. Without them HAProxy will happily route requests to a backend that’s CPU-pegged, deadlocked, or returning 500s, because as far as TCP is concerned the socket is open. option httpchk makes HAProxy send a real GET /healthz to each server, and http-check expect status 200 tightens the pass condition to exactly 200 (by default any 2xx or 3xx response counts as healthy).
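Both of those points are easy to check before a node goes into rotation. Here is a small preflight sketch; the helper names are hypothetical. It reads the sysctl through /proc and probes /healthz with curl, treating anything but a 200 as a failure to mirror http-check expect status 200. The backend addresses in the usage comment are the ones from the config above.

```shell
#!/bin/sh
# Preflight sketch for the two settings above (helper names are hypothetical).

# 1. ip_nonlocal_bind must be 1, or HAProxy on the backup can't bind the VIP.
check_nonlocal_bind() {  # check_nonlocal_bind [PROC_FILE]
  local proc_file value
  proc_file=${1:-/proc/sys/net/ipv4/ip_nonlocal_bind}
  if ! value=$(cat "$proc_file" 2>/dev/null); then
    echo "cannot read $proc_file" >&2
    return 2
  fi
  if [ "$value" = 1 ]; then
    echo "ok: ip_nonlocal_bind=1"
  else
    echo "FAIL: ip_nonlocal_bind=$value (HAProxy on the backup will hit EADDRNOTAVAIL)" >&2
    return 1
  fi
}

# 2. Probe /healthz the way httpchk + "expect status 200" does: only a 200 passes.
probe_healthz() {  # probe_healthz HOST [PORT]
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "http://$1:${2:-8080}/healthz")
  [ "$code" = 200 ]
}

# Usage:
#   check_nonlocal_bind
#   for h in 10.0.1.11 10.0.1.12 10.0.1.13; do probe_healthz "$h" || echo "$h unhealthy"; done
```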
Keepalived config — master
# /etc/keepalived/keepalived.conf on lb-01 (master)
global_defs {
    router_id LB_01
    enable_script_security
    script_user root
}
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"   # exit 0 if process exists
    interval 2
    weight -20                             # if HAProxy dies, drop priority by 20
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass <change-me>              # PASS auth uses only the first 8 characters
    }
    virtual_ipaddress {
        10.0.0.100/24 dev eth0
    }
    track_script {
        chk_haproxy
    }
}
Keepalived config — backup
Identical except for state, priority, and router_id:
# /etc/keepalived/keepalived.conf on lb-02 (backup)
global_defs {
    router_id LB_02
    enable_script_security
    script_user root
}
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    weight -20
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51       # MUST match master
    priority 100               # lower than master
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass <change-me>  # MUST match master
    }
    virtual_ipaddress {
        10.0.0.100/24 dev eth0
    }
    track_script {
        chk_haproxy
    }
}
The vrrp_script is what makes this an active-passive load balancer pair and not just an active-passive IP failover. Without it, Keepalived only watches the network — if HAProxy crashes but the kernel is fine, the VIP stays on the dead master and traffic blackholes. With track_script, a dead HAProxy drops the master’s effective priority below the backup’s, the backup wins the election, and the VIP migrates.
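To make "drops the master’s effective priority below the backup’s" concrete, here is a minimal sketch of the arithmetic. It assumes both nodes still receive each other’s adverts and models only negative weights applied on failure; ties, which VRRP breaks by comparing IP addresses, are reported rather than resolved. The helper names are hypothetical.

```shell
#!/bin/sh
# Election arithmetic for chk_haproxy (hypothetical helper names).
# A failing track_script adds its negative weight to the base priority;
# the node with the higher effective priority holds the VIP.

effective_priority() {  # effective_priority BASE WEIGHT ok|fail
  if [ "$3" = ok ]; then echo "$1"; else echo $(( $1 + $2 )); fi
}

winner() {  # winner LB01_PRIO LB02_PRIO
  if [ "$1" -gt "$2" ]; then echo lb-01
  elif [ "$2" -gt "$1" ]; then echo lb-02
  else echo tie
  fi
}

winner "$(effective_priority 110 -20 fail)" 100  # prints lb-02: 90 < 100, the VIP moves
winner "$(effective_priority 110 -5 fail)" 100   # prints lb-01: 105 > 100, the VIP stays on the dead HAProxy
```

The second line is the trap: the weight has to be larger than the priority gap between the nodes (10 here), or a dead HAProxy never loses the election.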
Testing failover, properly
Three failure modes to exercise. If you only test one, you don’t have HA, you have a placebo.
1. Process death. SSH to the master, sudo systemctl stop haproxy. Within ~4 seconds (interval 2 × fall 2) the VIP should move to the backup. Confirm with ip addr show eth0 on both nodes and arping 10.0.0.100 from a third host.
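2. Node death. Hard reboot the master (sudo reboot is too graceful, because Keepalived shuts down cleanly and hands the VIP over; try echo b | sudo tee /proc/sysrq-trigger instead). Backup should take over within about 3.6 seconds: three missed VRRP adverts at advert_int 1, plus a skew of (256 - 100)/256 ≈ 0.6 s for its priority of 100.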
3. Network partition. This is the nasty one. Block VRRP between the nodes with iptables:
# on master
sudo iptables -A INPUT -p vrrp -j DROP
sudo iptables -A OUTPUT -p vrrp -j DROP
Both nodes now think they’re master. Both bring up the VIP. This is split-brain. Routers and hosts on the segment see two MACs claiming the same IP, and their ARP caches either flap or pin to whichever reply they saw last. Traffic distribution becomes random.
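For all three tests it helps to script the "who holds the VIP?" question instead of eyeballing ip addr output. A minimal sketch, assuming the iproute2 one-line format (ip -o -4 addr show), where the fourth field is ADDRESS/PREFIX; has_vip is a hypothetical helper.

```shell
#!/bin/sh
# Does this node hold the VIP? (has_vip is a hypothetical helper.)
# Reads `ip -o -4 addr show dev IFACE` on stdin; field 4 is ADDRESS/PREFIX.
has_vip() {  # has_vip VIP
  awk -v vip="$1" '{ split($4, a, "/"); if (a[1] == vip) found = 1 }
                   END { exit (found ? 0 : 1) }'
}

# Usage, on each node during every test:
#   ip -o -4 addr show dev eth0 | has_vip 10.0.0.100 && echo "this node holds the VIP"
```

Exactly one node should succeed at any moment; two successes during the partition test is the split-brain described above.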
Mitigations:
- Use unicast_peer instead of multicast VRRP. It still doesn’t fix true partitions, but it removes one class of switch-level VRRP issues: networks that filter or don’t carry the 224.0.0.18 multicast group that VRRP advertises to.
- Add a second tracking script that checks reachability of a third host (a gateway, a DB) so a node that’s network-isolated demotes itself instead of claiming master.
- Monitor for “two nodes, state MASTER” in your alerting. If both LBs report MASTER for more than a few seconds, page someone.
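One way to get the per-node state for that alert is Keepalived’s notify hook, which runs a script on every state transition. A minimal sketch, assuming a hypothetical script name and state directory; with enable_script_security on (as in both configs), Keepalived refuses to run it as root if the script or a directory on its path is writable by a non-root user.

```shell
#!/bin/sh
# vrrp-notify.sh (hypothetical name). Hook it into vrrp_instance VI_1 with:
#     notify /usr/local/bin/vrrp-notify.sh
# Keepalived then calls: vrrp-notify.sh INSTANCE VI_1 <STATE> <PRIORITY>
# STATE_DIR is an assumption.
STATE_DIR=${STATE_DIR:-/run/keepalived-state}

record_state() {  # record_state TYPE NAME STATE [PRIORITY]
  mkdir -p "$STATE_DIR" || return 1
  printf '%s\n' "$3" > "$STATE_DIR/$2"
}

count_masters() {  # count_masters STATE... -> how many say MASTER
  local n=0 s
  for s in "$@"; do
    if [ "$s" = MASTER ]; then n=$(( n + 1 )); fi
  done
  echo "$n"
}

# When Keepalived runs this file, record the new state.
if [ "$#" -ge 3 ]; then record_state "$@"; fi

# On a monitoring host (ssh to both nodes assumed), alert when both claim MASTER:
#   states=$(for h in lb-01 lb-02; do ssh "$h" cat /run/keepalived-state/VI_1; done)
#   [ "$(count_masters $states)" -gt 1 ] && echo "split-brain: both LBs are MASTER"
```

Sample it twice a few seconds apart before paging, since a normal failover can briefly show both nodes as MASTER.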
Here’s the gateway-reachability variant that saved us once:
vrrp_script chk_gateway {
    script "/bin/ping -c 1 -W 1 10.0.0.1"
    interval 3
    weight -30
    fall 2
    rise 2
}
vrrp_instance VI_1 {
    # ... same as before ...
    track_script {
        chk_haproxy
        chk_gateway
    }
}
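A node that can’t reach the default gateway drops priority by 30 and stops fighting for master. That only decides the election while the two nodes can still hear each other’s adverts: a node that hears none promotes itself whatever its priority. If you want a failing check to release the VIP outright, give the script weight 0 (the Keepalived default), which puts the instance into FAULT state while the check fails, at the cost of both nodes faulting if the gateway itself goes down.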
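Running the numbers for both scripts shows which node ends up with the VIP in a few failure combinations. A minimal sketch with hypothetical helper names, using the priorities and weights from the configs above and assuming the nodes still exchange adverts (during a true partition each side elects itself regardless of priority).

```shell
#!/bin/sh
# Effective priorities with both track_scripts (hypothetical helper names).
W_HAPROXY=-20
W_GATEWAY=-30

prio() {  # prio BASE HAPROXY_OK GATEWAY_OK   (1 = check passing, 0 = failing)
  local p=$1
  [ "$2" = 1 ] || p=$(( p + $W_HAPROXY ))
  [ "$3" = 1 ] || p=$(( p + $W_GATEWAY ))
  echo "$p"
}

holder() {  # holder LB01_PRIO LB02_PRIO
  if [ "$1" -gt "$2" ]; then echo lb-01
  elif [ "$2" -gt "$1" ]; then echo lb-02
  else echo tie
  fi
}

# scenario | lb-01 haproxy | lb-01 gateway | lb-02 haproxy | lb-02 gateway
while IFS='|' read -r label m_hap m_gw b_hap b_gw; do
  m=$(prio 110 "$m_hap" "$m_gw")
  b=$(prio 100 "$b_hap" "$b_gw")
  printf '%-24s lb-01=%-4s lb-02=%-4s VIP on %s\n' "$label" "$m" "$b" "$(holder "$m" "$b")"
done <<'EOF'
all healthy|1|1|1|1
lb-01 HAProxy dead|0|1|1|1
lb-01 lost gateway|1|0|1|1
gateway down for both|1|0|1|0
EOF
```

The last row is why a relative penalty is attractive here: when the gateway itself is down, both nodes lose 30 points (80 vs 70) and the VIP stays where it is instead of flapping.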
Gotchas that cost me sleep
Gratuitous ARP gets dropped by some switches. When the VIP migrates, Keepalived sends GARP packets so upstream switches update their CAM tables. Some enterprise switches with aggressive ARP inspection drop these. Symptom: VIP is on the new master, but traffic still goes to the old one for 5+ minutes until ARP entries naturally expire. Fix: disable dynamic ARP inspection for the LB ports, or shorten ARP cache timeout on upstream devices.
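When you suspect this, check from a third host on the same VLAN rather than from the LBs themselves. A minimal sketch, assuming iproute2’s ip neigh output format (IP dev IFACE lladdr MAC STATE) and that you can read the new master’s MAC from /sys/class/net/eth0/address; cached_mac is a hypothetical helper.

```shell
#!/bin/sh
# Which MAC does this host have cached for the VIP? (hypothetical helper)
# Reads `ip neigh show <ip>` on stdin and prints the lladdr field, lowercased.
cached_mac() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print tolower($(i + 1)); exit } }'
}

# Usage on the third host, right after a failover:
#   expected=$(ssh lb-02 cat /sys/class/net/eth0/address)
#   actual=$(ip neigh show 10.0.0.100 | cached_mac)
#   [ "$actual" = "$expected" ] || echo "stale ARP entry: $actual (want $expected)"
# arping -I eth0 -c 3 10.0.0.100 shows which MAC is answering right now.
```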
virtual_router_id collision. VRID is a single byte that identifies the VRRP group on the L2 segment. If another team runs Keepalived on the same VLAN and picked the same VRID, you’ll get bizarre, intermittent failover. Pick a VRID, document it somewhere shared, and grep your network for collisions before deploying.
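Before picking a VRID, it’s worth sniffing the segment for a while. A rough sketch that parses tcpdump’s text output, assuming its usual VRRP line format (IP SRC > 224.0.0.18: VRRPv2, Advertisement, vrid N, prio P, ...); the helper names are hypothetical, and the format can vary between tcpdump versions, so eyeball the capture too.

```shell
#!/bin/sh
# Capture first, e.g.:  sudo tcpdump -i eth0 -nn -c 50 vrrp > vrrp.txt
# Then:                 vrids_seen < vrrp.txt      or      vrid_free 51 < vrrp.txt

# Prints "VRID SOURCE_IP" pairs seen in tcpdump text on stdin, de-duplicated.
vrids_seen() {
  awk '{
    src = ""
    for (i = 1; i < NF; i++) {
      if ($i == "IP") src = $(i + 1)
      if ($i == "vrid") { v = $(i + 1); sub(/,$/, "", v); print v, src }
    }
  }' | sort -u -k1,1n -k2,2
}

# Succeeds if nobody in the capture advertises VRID $1.
vrid_free() {  # vrid_free VRID  (capture on stdin)
  ! vrids_seen | awk -v w="$1" '$1 == w { found = 1 } END { exit (found ? 0 : 1) }'
}
```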
Asymmetric configs drift. Six months in, someone tweaks maxconn on lb-01 during a fire and forgets to update lb-02. When failover happens, capacity changes silently. Run a config-diff cron between the two nodes, or — better — manage both with the same Ansible/Salt role and disallow manual edits.
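A minimal sketch of the config-diff cron, comparing SHA-256 digests of the files that must be identical on both nodes. PEER, the file list, passwordless ssh from lb-01, and the script path are assumptions; keepalived.conf is left out on purpose because it is supposed to differ.

```shell
#!/bin/sh
# Config-drift check sketch, run from cron on lb-01. Paths must not contain spaces.
PEER=${PEER:-lb-02}
FILES=${FILES:-"/etc/haproxy/haproxy.cfg /etc/haproxy/certs/site.pem"}

checksum() { sha256sum "$1" | awk '{ print $1 }'; }

# Same digest computed on the peer over ssh.
remote_checksum() { ssh "$PEER" sha256sum "$1" | awk '{ print $1 }'; }

check_drift() {
  local rc=0 f l r
  for f in $FILES; do
    l=$(checksum "$f")
    r=$(remote_checksum "$f")
    # An empty local digest means the file is missing or unreadable: flag it too.
    if [ -z "$l" ] || [ "$l" != "$r" ]; then
      echo "DRIFT: $f differs (or is unreadable) between ${HOSTNAME:-this node} and $PEER" >&2
      rc=1
    fi
  done
  return "$rc"
}

# crontab example (script path hypothetical):
#   */10 * * * * /usr/local/bin/lb-drift-check.sh run || logger -t lb-drift "config drift detected"
if [ "${1:-}" = run ]; then check_drift; fi
```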
HAProxy reloads vs restarts. systemctl reload haproxy is graceful: new workers take over the listening sockets while the old ones finish their in-flight connections. restart drops every in-flight TCP connection. Roll out config changes with reload, do them on the backup first, fail over, then do the master.
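A minimal sketch of that rollout, assuming the distro’s systemd unit and using haproxy -c -f (HAProxy’s config-check mode) as a gate so a typo never reaches a reload. HAPROXY and SYSTEMCTL are overridable only so the sketch can be dry-run; safe_reload is a hypothetical name.

```shell
#!/bin/sh
# Rollout sketch: validate, then reload (never restart).
HAPROXY=${HAPROXY:-haproxy}
SYSTEMCTL=${SYSTEMCTL:-systemctl}
CFG=${CFG:-/etc/haproxy/haproxy.cfg}

safe_reload() {
  # haproxy -c -f FILE only parses the config and exits non-zero on errors.
  if ! "$HAPROXY" -c -f "$CFG"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  "$SYSTEMCTL" reload haproxy
}

# Order of operations:
#   1. lb-02 (backup):  safe_reload
#   2. lb-01:           systemctl stop keepalived    (VIP moves to lb-02)
#   3. lb-01:           safe_reload && systemctl start keepalived
#                       (priority 110 preempts, so the VIP moves back)
```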
SSL cert sync. Both nodes need the same cert at the same path. If only the master has the new cert and failover happens mid-renewal, clients see cert errors. Automate cert distribution (we use a small rsync job triggered by certbot’s deploy hook).
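A minimal sketch of such a deploy hook, assuming certbot’s documented RENEWED_LINEAGE variable (set for deploy hooks), the cert path from the HAProxy config above, and passwordless ssh/rsync to the peer; the script name is hypothetical. HAProxy’s crt wants the chain and the key in one PEM, so the hook concatenates fullchain.pem and privkey.pem and swaps the file in atomically.

```shell
#!/bin/sh
# certbot deploy-hook sketch (push-certs.sh is a hypothetical name).
# certbot sets RENEWED_LINEAGE, e.g. /etc/letsencrypt/live/example.com.
PEER=${PEER:-lb-02}
TARGET=${TARGET:-/etc/haproxy/certs/site.pem}

# HAProxy's crt wants certificate chain and private key in one PEM file.
build_pem() {  # build_pem LINEAGE_DIR OUT_FILE
  local tmp
  tmp=$(mktemp "$2.XXXXXX") || return 1
  if ! cat "$1/fullchain.pem" "$1/privkey.pem" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  chmod 600 "$tmp"
  mv -f "$tmp" "$2"   # rename within one directory: readers never see a half-written file
}

deploy() {
  build_pem "$RENEWED_LINEAGE" "$TARGET" || return 1
  rsync -a "$TARGET" "$PEER:$TARGET" || return 1
  ssh "$PEER" systemctl reload haproxy && systemctl reload haproxy
}

# Only act when certbot invokes the hook.
if [ -n "${RENEWED_LINEAGE:-}" ]; then deploy; fi
```

Put it in /etc/letsencrypt/renewal-hooks/deploy/ (or pass it with certbot renew --deploy-hook) on whichever node runs certbot.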
When this isn’t the right answer
Active-passive HAProxy + Keepalived is great for: on-prem, single-DC, single-VLAN, predictable traffic, control over the network gear. It is not great for: multi-AZ cloud (use the cloud LB), global traffic (use anycast or GeoDNS), or extreme throughput where one active node bottlenecks (you want active-active with ECMP or a real LB cluster).
For a backend service handling a few thousand RPS in one datacenter, though, this setup has been boring and reliable for years — which is exactly what you want from a load balancer.