Our service's memory kept growing after every deploy.

Heap profiles pointed at allocations held by old workers, and goroutine dumps showed those worker goroutines still alive long after they should have shut down.

Leak Pattern

Workers were spawned with context.Background() instead of a context tied to the request or service lifecycle:

go runWorker(context.Background(), jobs) // detached forever

These workers ignored shutdown signals and kept polling, logging, and allocating.
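
A minimal reproduction makes the leak visible. This is a standalone sketch, not our production code: the Job type, handle, and the restart loop are hypothetical stand-ins for our real worker pool. Each simulated restart strands the previous workers, and the goroutine count never returns to baseline.

package main

import (
    "context"
    "fmt"
    "runtime"
    "time"
)

type Job struct{}

func handle(Job) { time.Sleep(time.Millisecond) }

func runWorker(ctx context.Context, jobs <-chan Job) {
    for {
        select {
        case <-ctx.Done():
            return
        case job := <-jobs:
            handle(job)
        }
    }
}

func main() {
    jobs := make(chan Job)
    for i := 0; i < 3; i++ {
        // Each simulated "restart" spawns a fresh pool of detached workers.
        // Nothing ever cancels the previous pool.
        for w := 0; w < 10; w++ {
            go runWorker(context.Background(), jobs)
        }
        time.Sleep(50 * time.Millisecond)
        fmt.Printf("after restart %d: %d goroutines\n", i+1, runtime.NumGoroutine())
    }
}

The count climbs by roughly ten per iteration; the old workers sit parked on the channel receive forever.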

Correct Lifecycle Wiring

Tie worker context to the process context and respect ctx.Done().

func runWorker(ctx context.Context, jobs <-chan Job) {
    for {
        select {
        case <-ctx.Done():
            // Owner cancelled our context: exit and free the goroutine.
            return
        case job, ok := <-jobs:
            if !ok {
                // Producer closed the channel: no more work is coming.
                // Without this check, a closed channel delivers zero
                // values in a tight loop.
                return
            }
            handle(job)
        }
    }
}

At startup:

ctx, cancel := context.WithCancel(context.Background())
defer cancel() // signals every worker derived from ctx when we return
go runWorker(ctx, jobs)
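
In a real service the process context usually comes from OS signals, and shutdown should wait until workers have actually exited, not just been signalled. Here is a sketch of that wiring with the standard library's signal.NotifyContext and a sync.WaitGroup, assuming the Job type and the runWorker above live in the same package:

package main

import (
    "context"
    "os/signal"
    "sync"
    "syscall"
)

func main() {
    // ctx is cancelled automatically when SIGINT or SIGTERM arrives.
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    jobs := make(chan Job)
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            runWorker(ctx, jobs)
        }()
    }

    <-ctx.Done() // block until a shutdown signal arrives
    wg.Wait()    // every worker has observed cancellation and returned
}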

Production Guardrails

  • Expose a runtime.NumGoroutine() metric (first sketch after this list).
  • Add a shutdown test asserting the goroutine count returns near baseline (second sketch after this list).
  • Never create long-lived goroutines from request handlers without explicit ownership.
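
For the metric, the standard library's expvar package is enough. A minimal sketch; the port and the metric name "goroutines" are my choices, not a convention:

package main

import (
    "expvar"
    "net/http"
    "runtime"
)

func main() {
    // expvar.Func is re-evaluated on every scrape of /debug/vars.
    expvar.Publish("goroutines", expvar.Func(func() any {
        return runtime.NumGoroutine()
    }))
    // expvar registers its JSON handler on http.DefaultServeMux.
    http.ListenAndServe("localhost:8080", nil)
}

curl localhost:8080/debug/vars then reports "goroutines": N. Graph it and alert when it trends upward across deploys instead of resetting.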
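
The shutdown test can be as small as the sketch below, again assuming runWorker and Job from above. The +1 tolerance and the poll loop are judgment calls: the runtime and test framework own a few goroutines of their own, and worker exits are asynchronous.

package worker

import (
    "context"
    "runtime"
    "testing"
    "time"
)

func TestWorkersExitOnCancel(t *testing.T) {
    baseline := runtime.NumGoroutine()

    ctx, cancel := context.WithCancel(context.Background())
    jobs := make(chan Job)
    for i := 0; i < 10; i++ {
        go runWorker(ctx, jobs)
    }

    cancel()

    // Poll rather than sleep once: workers exit asynchronously.
    deadline := time.Now().Add(2 * time.Second)
    for time.Now().Before(deadline) {
        if runtime.NumGoroutine() <= baseline+1 {
            return
        }
        time.Sleep(10 * time.Millisecond)
    }
    t.Fatalf("goroutine count stuck at %d, want <= %d", runtime.NumGoroutine(), baseline+1)
}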

What Went Wrong in My Incident

  • What alerted first: Memory and goroutine counts climbed after each deploy instead of resetting.
  • What misled us: We initially blamed traffic growth and GC tuning.
  • What confirmed root cause: Goroutine dumps showed long-lived workers rooted in context.Background() with no cancellation path (one way to capture such dumps is sketched after this list).
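
Capturing those dumps needs nothing beyond the standard library's net/http/pprof. A minimal setup; the loopback port 6060 is arbitrary:

package main

import (
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
    // Keep pprof on a loopback-only port, away from production traffic.
    go http.ListenAndServe("localhost:6060", nil)

    // ... start workers and run the service ...
    select {}
}

curl 'localhost:6060/debug/pprof/goroutine?debug=2' then prints a stack for every live goroutine; leaked workers show up parked in their select with runWorker on the stack.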

In Go, every goroutine needs an owner and an exit path.