Message Ordering Bugs in Go: Kafka Partitions Surprise
We assumed event order was guaranteed. It was not.
Kafka preserves order only within a partition, not across all partitions.
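To make the guarantee concrete, here is a toy model, not real Kafka client code. A topic is a set of append-only partition logs. Offsets within one log are always read in sequence, but the order in which a consumer visits partitions depends on fetch timing, lag, and rebalances, so the sketch takes that order as a parameter. `Record`, `Topic`, and `Consume` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Record is a simplified, hypothetical stand-in for a Kafka record.
type Record struct {
	Key   string
	Value string
}

// Topic models a topic as independent append-only partition logs.
// Order is preserved inside each log; nothing relates one log to another.
type Topic struct {
	partitions [][]Record
}

func NewTopic(n int) *Topic {
	return &Topic{partitions: make([][]Record, n)}
}

// Append writes a record to partition p and returns its offset.
func (t *Topic) Append(p int, r Record) int {
	t.partitions[p] = append(t.partitions[p], r)
	return len(t.partitions[p]) - 1
}

// Consume drains each partition in offset order, visiting partitions in the
// caller-supplied order (a stand-in for real-world fetch timing).
func (t *Topic) Consume(partitionOrder []int) []Record {
	var out []Record
	for _, p := range partitionOrder {
		out = append(out, t.partitions[p]...)
	}
	return out
}

func main() {
	topic := NewTopic(2)
	// The producer writes Created first, then Paid, but with different
	// (random) keys, so they land in different partitions.
	topic.Append(0, Record{Key: "k1", Value: "InvoiceCreated"})
	topic.Append(1, Record{Key: "k2", Value: "InvoicePaid"})

	// If partition 1 happens to be fetched first, Paid is seen before Created.
	for _, r := range topic.Consume([]int{1, 0}) {
		fmt.Println(r.Value)
	}
}
```

Nothing in the model is "wrong" per partition; the reordering comes entirely from spreading one entity's events over two logs.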
The Real Bug
Our producer used a random key for throughput:
key := uuid.NewString() // bad for ordered entity updates
producer.Publish("invoice.events", key, payload)
Events for the same invoice landed in different partitions, so consumers received InvoicePaid before InvoiceCreated.
Downstream projections drifted and failed validation checks.
The Fix
Partition by entity identity so all related events share a partition.
key := invoiceID // stable key per aggregate/entity
producer.Publish("invoice.events", key, payload)
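Why a stable key works: keyed records are assigned to a partition by hashing the key, so the same key always maps to the same partition (for a fixed partition count). The sketch below uses FNV-1a from Go's standard library purely for illustration; real Kafka clients use their own hash (the Java client's default is murmur2), so the partition numbers here will not match an actual cluster. `partitionFor` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to a partition by hashing it.
// FNV-1a is an illustrative choice, not what any particular client uses.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	const numPartitions = 12
	invoiceID := "inv-42"

	// Stable key: every event for inv-42 hashes to the same partition, so
	// per-partition ordering covers the invoice's whole history.
	// With a fresh UUID per event, each key would hash independently and the
	// same invoice's events could scatter across partitions.
	for _, evt := range []string{"InvoiceCreated", "InvoicePaid"} {
		fmt.Printf("%-14s key=%s partition=%d\n",
			evt, invoiceID, partitionFor(invoiceID, numPartitions))
	}
}
```

One caveat worth keeping in mind: adding partitions to a topic changes the key-to-partition mapping for most keys, so events produced before and after a resize are not guaranteed to stay in one partition.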
Consumer Hardening
Even with proper partitioning, we added:
- Version checks (eventVersion monotonic per entity)
- Idempotency on eventID
- Dead-letter queue for impossible transitions
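The three safeguards above can be combined in one handler. This is a minimal sketch, not our production consumer: `Event`, `Consumer`, and `Handle` are hypothetical, the maps stand in for durable storage, and the slice stands in for a real DLQ topic. It also goes one step beyond a plain monotonic check by requiring exactly the next version (last+1), so a gap is treated as an impossible transition; that stricter rule is an assumption. Event types other than InvoiceCreated and InvoicePaid are made up for the example.

```go
package main

import "fmt"

// Event is a hypothetical envelope carrying eventID and eventVersion.
type Event struct {
	EventID      string
	EntityID     string
	EventVersion int64 // assumed to start at 1 and increase by 1 per entity
	Type         string
}

type Outcome string

const (
	Applied      Outcome = "applied"
	Duplicate    Outcome = "duplicate"
	DeadLettered Outcome = "dead-lettered"
)

// Consumer tracks processed eventIDs and the last applied version per entity.
// In production these would be persisted with the projection update.
type Consumer struct {
	seen       map[string]bool  // eventIDs already applied
	versions   map[string]int64 // last applied version per entity
	deadLetter []Event          // stand-in for a DLQ topic
}

func NewConsumer() *Consumer {
	return &Consumer{seen: map[string]bool{}, versions: map[string]int64{}}
}

func (c *Consumer) Handle(e Event) Outcome {
	// Idempotency: a redelivered eventID is acknowledged without reapplying.
	if c.seen[e.EventID] {
		return Duplicate
	}
	// Version check (strict, assumed): only last+1 is accepted. Gaps and
	// regressions are parked in the DLQ and left unseen so they can be retried.
	if e.EventVersion != c.versions[e.EntityID]+1 {
		c.deadLetter = append(c.deadLetter, e)
		return DeadLettered
	}
	// ... apply e to the projection here ...
	c.versions[e.EntityID] = e.EventVersion
	c.seen[e.EventID] = true
	return Applied
}

func main() {
	c := NewConsumer()
	events := []Event{
		{EventID: "e1", EntityID: "inv-1", EventVersion: 1, Type: "InvoiceCreated"},
		{EventID: "e1", EntityID: "inv-1", EventVersion: 1, Type: "InvoiceCreated"}, // redelivery
		{EventID: "e3", EntityID: "inv-1", EventVersion: 3, Type: "InvoiceVoided"},  // v2 not seen yet
		{EventID: "e2", EntityID: "inv-1", EventVersion: 2, Type: "InvoicePaid"},
	}
	for _, e := range events {
		fmt.Printf("%s v%d -> %s\n", e.Type, e.EventVersion, c.Handle(e))
	}
}
```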
What Went Wrong in My Incident
- What alerted first: Projection mismatches appeared for a small subset of high-traffic entities.
- What misled us: Individual events were valid, so we suspected consumer logic before producer keys.
- What confirmed root cause: Partition/offset analysis showed events for the same entity split across partitions and arriving out of sequence.
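The partition/offset analysis can be approximated with a small script over consumed records. This is a sketch under assumptions: `ConsumedRecord` and `SplitEntities` are hypothetical, the input would come from whatever consumer logs or topic dump you have, and the sample values in `main` are invented. Any entity whose events appear in more than one partition has no ordering guarantee across those events.

```go
package main

import (
	"fmt"
	"sort"
)

// ConsumedRecord is a hypothetical row from a consumer log or topic dump.
type ConsumedRecord struct {
	EntityID  string
	Partition int32
	Offset    int64 // kept so flagged entities can be traced back to exact records
}

// SplitEntities returns, for each entity seen in more than one partition,
// the sorted list of partitions its events were written to.
func SplitEntities(records []ConsumedRecord) map[string][]int32 {
	byEntity := make(map[string]map[int32]bool)
	for _, r := range records {
		if byEntity[r.EntityID] == nil {
			byEntity[r.EntityID] = make(map[int32]bool)
		}
		byEntity[r.EntityID][r.Partition] = true
	}
	split := make(map[string][]int32)
	for entity, parts := range byEntity {
		if len(parts) < 2 {
			continue // single partition: ordering is preserved
		}
		list := make([]int32, 0, len(parts))
		for p := range parts {
			list = append(list, p)
		}
		sort.Slice(list, func(i, j int) bool { return list[i] < list[j] })
		split[entity] = list
	}
	return split
}

func main() {
	// Invented sample data for illustration.
	records := []ConsumedRecord{
		{EntityID: "inv-1", Partition: 3, Offset: 1041},
		{EntityID: "inv-1", Partition: 7, Offset: 988},
		{EntityID: "inv-2", Partition: 5, Offset: 77},
		{EntityID: "inv-2", Partition: 5, Offset: 78},
	}
	for entity, parts := range SplitEntities(records) {
		fmt.Printf("%s spans partitions %v\n", entity, parts)
	}
}
```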
Ordering bugs are nasty because each event is valid in isolation. The sequence is where corruption begins.