Go and Third-Party APIs: Surviving Partial Failures
Payment succeeded. Inventory reserved. Shipping label creation failed.
Welcome to partial failure: the normal state of distributed workflows.
Why This Hurts
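If you model a multi-step workflow as one happy-path transaction, you end up with stranded state when any external step fails: the side effects of earlier steps, such as a captured payment, stay committed with nothing to reverse them.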
Practical Saga Pattern
Model each step with a forward action and a compensation action:
- Reserve inventory
- Charge payment
- Create shipment
If step 3 (create shipment) fails, compensate the completed steps in reverse order:
- Refund payment
- Release inventory
Go Structure
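import (
	"context"
	"errors"
)

// Step pairs a forward action with the compensation that reverses it.
type Step struct {
	Do   func(context.Context) error
	Undo func(context.Context) error
}

// RunSaga runs steps in order. If one fails, it compensates the completed
// steps in reverse order and returns the failure joined with any Undo errors.
func RunSaga(ctx context.Context, steps []Step) error {
	completed := make([]Step, 0, len(steps))
	for _, s := range steps {
		if err := s.Do(ctx); err != nil {
			// Detach from cancellation so compensations still run when ctx
			// was cancelled or timed out (context.WithoutCancel needs Go 1.21+).
			undoCtx := context.WithoutCancel(ctx)
			for i := len(completed) - 1; i >= 0; i-- {
				if uerr := completed[i].Undo(undoCtx); uerr != nil {
					err = errors.Join(err, uerr)
				}
			}
			return err
		}
		completed = append(completed, s)
	}
	return nil
}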
Production Tips
- Persist saga state so recovery survives process restarts.
- Make compensations idempotent, because retries and crash recovery can run the same compensation more than once.
- Alert on stuck sagas, not just failed requests.
What Went Wrong in My Incident
- What alerted first: Support tickets reported failed orders with successful payment captures.
- What misled us: Request status showed failure, so we assumed no side effects were committed.
- What confirmed root cause: Step-by-step audit logs exposed completed upstream actions without matching compensations.
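The audit-log check that confirmed the root cause can be automated. The sketch below is a hypothetical reconstruction, not the tooling used in the incident: AuditEvent, its field names, the "done" and "compensated" kinds, and Uncompensated are all assumptions about what such a log might contain.

```go
package main

// AuditEvent is a hypothetical audit log entry; the field names are assumptions.
type AuditEvent struct {
	SagaID string
	Step   string // e.g. "charge_payment"
	Kind   string // "done" or "compensated"
}

// Uncompensated returns, for each failed saga, the steps logged as done
// without a matching compensated event, in the order they were logged.
// Sagas with nothing left to compensate are omitted from the result.
func Uncompensated(events []AuditEvent, failed map[string]bool) map[string][]string {
	done := map[string][]string{}
	compensated := map[string]map[string]bool{}
	for _, e := range events {
		if !failed[e.SagaID] {
			continue // successful sagas are expected to keep their side effects
		}
		switch e.Kind {
		case "done":
			done[e.SagaID] = append(done[e.SagaID], e.Step)
		case "compensated":
			if compensated[e.SagaID] == nil {
				compensated[e.SagaID] = map[string]bool{}
			}
			compensated[e.SagaID][e.Step] = true
		}
	}
	out := map[string][]string{}
	for id, steps := range done {
		for _, step := range steps {
			if !compensated[id][step] {
				out[id] = append(out[id], step)
			}
		}
	}
	return out
}
```

Run against the events of requests that reported failure, a non-empty result is exactly the case that misled us: the request failed, but a step such as a payment capture was committed and never reversed.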
Strong systems are not the ones that avoid partial failure. They are the ones designed to recover from it.