Go Context Cancellation: Why Your Workers Never Stop
Our service's memory kept growing after every deploy.
Goroutine dumps and heap profiles showed old worker goroutines still alive long after shutdown.
Leak Pattern
Workers were spawned with context.Background() instead of a context tied to the request or service lifecycle:
go runWorker(context.Background(), jobs) // detached forever
These workers ignored shutdown signals and kept polling, logging, and allocating.
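Part of why this fails silently: context.Background() is never cancelled, and its Done() method returns a nil channel. A select case receiving from a nil channel is never ready, so the cancellation branch can never run, even in a worker that otherwise does the right thing:

// context.Background() has no cancellation: Done() returns a nil channel,
// and a select case on a nil channel is never ready, so a worker's
// "case <-ctx.Done():" branch never fires when rooted in Background.
ctx := context.Background()
fmt.Println(ctx.Done() == nil) // prints: true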
Correct Lifecycle Wiring
Tie worker context to the process context and respect ctx.Done().
func runWorker(ctx context.Context, jobs <-chan Job) {
	for {
		select {
		case <-ctx.Done():
			// The owner cancelled our context: stop working.
			return
		case job, ok := <-jobs:
			if !ok {
				// Jobs channel closed: no more work will arrive.
				return
			}
			handle(job)
		}
	}
}
At startup:
// Root process context; cancelling it signals every worker to stop.
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go runWorker(ctx, jobs)
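In a long-running service the root context is usually derived from process signals, so a SIGTERM from the orchestrator cancels every worker. A minimal sketch using the standard os/signal package, assuming the runWorker and Job definitions above:

package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Cancel the root context when the process receives SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	jobs := make(chan Job) // Job and runWorker as defined above
	go runWorker(ctx, jobs)

	// Block until a shutdown signal arrives; workers then exit via ctx.Done().
	<-ctx.Done()
}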
Production Guardrails
- Expose a runtime.NumGoroutine() metric.
- Add a shutdown test asserting the goroutine count returns near baseline (see the sketch after this list).
- Never create long-lived goroutines from request handlers without explicit ownership.
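For illustration, a minimal sketch of such a shutdown test; the worker count, timeout, and slack values are assumptions, and runWorker and Job are the definitions above:

package worker

import (
	"context"
	"runtime"
	"testing"
	"time"
)

func TestWorkersExitOnCancel(t *testing.T) {
	baseline := runtime.NumGoroutine()

	ctx, cancel := context.WithCancel(context.Background())
	jobs := make(chan Job)
	for i := 0; i < 10; i++ {
		go runWorker(ctx, jobs)
	}

	cancel() // request shutdown

	// Poll until the workers have exited or we give up.
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		if runtime.NumGoroutine() <= baseline+2 { // small slack for runtime goroutines
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
	t.Fatalf("goroutine count did not return to baseline: got %d, baseline %d",
		runtime.NumGoroutine(), baseline)
}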
What Went Wrong in My Incident
- What alerted first: Memory and goroutine counts climbed after each deploy instead of resetting.
- What misled us: We initially blamed traffic growth and GC tuning.
- What confirmed the root cause: Goroutine dumps showed long-lived workers rooted in context.Background() with no cancellation path (see the sketch below).
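For reference, a common way to take such a dump is a net/http/pprof goroutine profile; this is a generic sketch, not our exact tooling:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiling endpoints on a private port, then inspect live
	// goroutines with:
	//   curl http://localhost:6060/debug/pprof/goroutine?debug=2
	// Leaked workers show up as goroutines parked in runWorker's select.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}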
In Go, every goroutine needs an owner and an exit path.