We once shipped a harmless-looking deploy and got a wave of user complaints: clicks that appeared to succeed, results that never arrived.

No panic. No crash. Just silently dropped requests.

What Actually Happened

Our Kubernetes pod received SIGTERM, but the Go process exited too fast:

  1. Load balancer still routed traffic for a short window.
  2. Existing handlers were still running.
  3. Process terminated before handlers finished.

From the client side, this looked random and impossible to reproduce.

The Wrong Shutdown Pattern

func main() {
    srv := &http.Server{Addr: ":8080", Handler: routes()}
    go srv.ListenAndServe() // returned error ignored: also a bug, but not today's

    // Wait for SIGINT/SIGTERM...
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
    <-sigCh

    os.Exit(0) // Abrupt exit: active requests are cut off
}
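
os.Exit is the culprit mechanically as well as conceptually: it returns control to the OS immediately, so deferred functions never run, goroutines are not waited for, and a handler halfway through writing a response is killed mid-write. From the client's side that is a reset connection or a truncated body.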

The Production-Safe Pattern

Use http.Server.Shutdown with a deadline: it stops accepting new connections, then waits for in-flight requests to finish, returning early only if the context expires.

func main() {
    srv := &http.Server{Addr: ":8080", Handler: routes()}

    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("server failed: %v", err)
        }
    }()

    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
    <-sigCh

    ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
    defer cancel()

    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("graceful shutdown failed: %v", err)
        _ = srv.Close() // hard close fallback
    }
}
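
One number to size deliberately: the 20-second drain budget has to fit inside the platform's kill window. On Kubernetes, terminationGracePeriodSeconds (30 seconds by default) caps the time between SIGTERM and SIGKILL, so the Shutdown timeout plus any pre-shutdown delay must stay comfortably under it.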

Two Extra Details That Matter

  • Readiness first: fail the readiness probe before calling Shutdown so the platform stops routing new requests to the pod (first sketch below).
  • Background jobs: stop consumers/workers too, or they may keep mutating state while HTTP is draining (second sketch below).
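
A minimal sketch of the readiness flip, assuming your Kubernetes readiness probe points at a /readyz route. The route, handler name, and 5-second delay are illustrative; it uses sync/atomic's Bool (Go 1.19+).

var ready atomic.Bool // true while the pod should receive traffic

func readyzHandler(w http.ResponseWriter, r *http.Request) {
    if !ready.Load() {
        http.Error(w, "shutting down", http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
}

// Woven into main's shutdown sequence:
ready.Store(true) // once the server is up

<-sigCh
ready.Store(false)          // probe starts failing; endpoints are updated
time.Sleep(5 * time.Second) // illustrative: about one probe period plus propagation
// ...then srv.Shutdown(ctx) as above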
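
For the background-job side, one common shape is a cancelable context plus a sync.WaitGroup. consumeQueue is a hypothetical worker that must return once its context is canceled:

workerCtx, stopWorkers := context.WithCancel(context.Background())
var wg sync.WaitGroup

wg.Add(1)
go func() {
    defer wg.Done()
    consumeQueue(workerCtx) // hypothetical consumer; exits when workerCtx is canceled
}()

// During shutdown, after srv.Shutdown has drained HTTP:
stopWorkers() // tell workers to finish the item in hand and stop
wg.Wait()     // do not let main return until they actually have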

What Went Wrong in Our Incident

  • What alerted first: A spike in client retries right after deploy start.
  • What misled us: App logs looked clean, so we blamed transient network issues.
  • What confirmed root cause: Correlating pod termination timestamps with failed requests showed traffic was still routed during shutdown.

Graceful shutdown is not a polish feature. It is data integrity during deploys.