Go HTTP Timeouts: One Timeout Is Not Enough
Our API had a timeout. Still, goroutines piled up and latency kept rising.
Why? Because we configured only one timeout and assumed it covered everything.
Timeout Layers You Need
Different failure modes happen at different phases:
- TCP dial hangs
- TLS handshake stalls
- Server delays response headers
- Body read drags forever
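To see which of these phases a request actually stalls in, the standard library's net/http/httptrace package can timestamp each one. The sketch below is a minimal illustration, not the tooling from the incident: `phaseTimes`, `tracedGet`, `demo`, and `demoDelay` are hypothetical names, and the local httptest server stands in for a real upstream.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync"
	"time"
)

// demoDelay is how long the demo server waits before sending headers.
const demoDelay = 20 * time.Millisecond

// phaseTimes holds the elapsed time since the request started at each milestone.
type phaseTimes struct {
	Connected time.Duration // TCP connect finished
	TLSDone   time.Duration // TLS handshake finished (zero for plain HTTP)
	FirstByte time.Duration // first response byte arrived
	Total     time.Duration // body fully read
}

// tracedGet performs a GET and records when each phase completed.
func tracedGet(client *http.Client, url string) (phaseTimes, error) {
	var (
		mu sync.Mutex // trace hooks may run on other goroutines
		pt phaseTimes
	)
	start := time.Now()
	mark := func(d *time.Duration) {
		mu.Lock()
		*d = time.Since(start)
		mu.Unlock()
	}
	trace := &httptrace.ClientTrace{
		ConnectDone:          func(network, addr string, err error) { mark(&pt.Connected) },
		TLSHandshakeDone:     func(tls.ConnectionState, error) { mark(&pt.TLSDone) },
		GotFirstResponseByte: func() { mark(&pt.FirstByte) },
	}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return phaseTimes{}, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := client.Do(req)
	if err != nil {
		return phaseTimes{}, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return phaseTimes{}, err
	}
	mark(&pt.Total)

	mu.Lock()
	defer mu.Unlock()
	return pt, nil
}

// demo runs tracedGet against a local server that delays its headers.
func demo() (phaseTimes, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(demoDelay) // simulate a server that is slow to respond
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()
	return tracedGet(srv.Client(), srv.URL)
}

func main() {
	pt, err := demo()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("connect=%v tls=%v first-byte=%v total=%v\n",
		pt.Connected, pt.TLSDone, pt.FirstByte, pt.Total)
}
```

A large gap between two consecutive milestones points at the phase that needs its own limit.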
Better Client Configuration
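    // Needs imports: net, net/http, time.
    transport := &http.Transport{
        DialContext: (&net.Dialer{
            Timeout:   2 * time.Second,  // TCP connect
            KeepAlive: 30 * time.Second, // interval between keep-alive probes, not a timeout
        }).DialContext,
        TLSHandshakeTimeout:   2 * time.Second, // TLS handshake
        ResponseHeaderTimeout: 3 * time.Second, // from request fully written to response headers read
        ExpectContinueTimeout: 1 * time.Second, // only applies to requests sent with "Expect: 100-continue"
    }
    client := &http.Client{
        Timeout:   5 * time.Second, // total request budget, including redirects and reading the body
        Transport: transport,
    }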
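Then wrap each request in a context deadline that matches the business SLA for that call. The client's Timeout stays in place as the outer cap; whichever limit expires first cancels the request.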
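A per-request deadline can be sketched with context.WithTimeout and http.NewRequestWithContext. This is a minimal illustration under assumptions: `fetchWithDeadline` and `demo` are hypothetical names, the budgets are placeholders rather than real SLA values, and the httptest server with its `/fast` and `/slow` paths exists only for the demo.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchWithDeadline issues a GET that must finish within budget.
// The deadline covers the whole exchange, including reading the body.
func fetchWithDeadline(ctx context.Context, client *http.Client, url string, budget time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel() // release the timer as soon as we return

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// demo calls a fast and a slow endpoint on a local test server.
// It returns the fast body and whether the slow call hit its deadline.
func demo() (body string, timedOut bool, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/slow" {
			select {
			case <-time.After(500 * time.Millisecond): // slower than the caller's budget
			case <-r.Context().Done(): // client gave up
				return
			}
		}
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second} // outer cap for every request

	b, err := fetchWithDeadline(context.Background(), client, srv.URL+"/fast", time.Second)
	if err != nil {
		return "", false, err
	}
	_, slowErr := fetchWithDeadline(context.Background(), client, srv.URL+"/slow", 50*time.Millisecond)
	return string(b), errors.Is(slowErr, context.DeadlineExceeded), nil
}

func main() {
	body, timedOut, err := demo()
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Printf("fast body=%q, slow call timed out=%v\n", body, timedOut)
}
```

Passing the caller's context through, instead of starting from context.Background() as the demo does, lets an upstream cancellation or a tighter deadline propagate to the outbound call.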
What Went Wrong in My Incident
- What alerted first: tail latency and goroutine counts rose during partial network degradation.
- What misled us: a global timeout was set, so we assumed timeout coverage was complete.
- What confirmed the root cause: phase-level tracing showed stalls in the dial, handshake, and header phases, none of which had its own limit.
Lesson Learned
Timeouts are not one number. They are a contract for each network phase.
If you do not define that contract, the kernel and defaults will define it for you.