Transaction Isolation in Go: The Write-Skew Incident
We had a quota rule: at least one on-call engineer must remain active per team.
Two admins updated different rows at the same time. Both transactions passed validation. End result: zero active on-call engineers.
Why Validation Failed
Each transaction did:
SELECT count(*) WHERE active = true- If count > 1, deactivate one engineer
- Commit
Under READ COMMITTED, both saw the same pre-update snapshot and both committed.
Safer Pattern
Lock the rows participating in the invariant.
tx, _ := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
defer tx.Rollback()
rows, err := tx.QueryContext(ctx, `
SELECT id FROM team_oncall
WHERE team_id = $1 AND active = true
FOR UPDATE
`, teamID)
if err != nil {
return err
}
// validate then update inside same locked scope
Practical Notes
SERIALIZABLEis safest but can raise retryable serialization errors.FOR UPDATEcan be enough for targeted invariants.- Always encode critical business invariants close to the write path, not only in async jobs.
What Went Wrong in My Incident
- What alerted first: An invariant alert fired showing impossible team state in production.
- What misled us: Manual replay of each transaction separately always passed validation.
- What confirmed root cause: Running concurrent transactions under
READ COMMITTEDreproduced write skew consistently.
Race conditions are not only in Go code. They are also in SQL behavior.