Event-driven retail backend — clearing 10K+ transactions/day
A multi-channel commerce backend where every state change is idempotent and auditable. Go + TypeScript on Kafka, outbox pattern, exactly-once effects.
The problem
In multi-channel commerce, the same order can arrive twice, a partner webhook can fire three times, and a network blip can drop an event mid-flight. The ledger still has to stay exactly right and auditable under load.
The solution
BFF + Broker + Dispatcher. The broker decouples producers from consumers. The transactional outbox writes each event in the same transaction as the state change, so every committed write yields exactly one event, and idempotency keys dedupe any redelivery at every effect boundary. Kafka carries the event log; Go and TypeScript services consume it. Retries are safe by construction, so partner flakiness degrades gracefully instead of corrupting state.
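A minimal sketch of deduping at an effect boundary with an idempotency key. It is an in-memory illustration, not the production code: `LedgerEntry`, `IdempotentLedger`, `applyOnce`, and the key format are hypothetical names, and a real service would enforce the key with a unique constraint in the same database transaction as the effect.

```typescript
// Hypothetical names throughout; an in-memory stand-in for a ledger table.
type LedgerEntry = { orderId: string; amountCents: number };

class IdempotentLedger {
  // Idempotency keys of effects that have already been applied.
  private readonly seen = new Set<string>();
  readonly entries: LedgerEntry[] = [];

  // Applies the entry only if this key has not been seen before.
  // Returns true when the effect was applied, false for a duplicate.
  applyOnce(idempotencyKey: string, entry: LedgerEntry): boolean {
    if (this.seen.has(idempotencyKey)) return false;
    this.seen.add(idempotencyKey);
    this.entries.push(entry);
    return true;
  }

  balanceCents(): number {
    return this.entries.reduce((sum, e) => sum + e.amountCents, 0);
  }
}

// A partner webhook that fires three times for the same capture.
const ledger = new IdempotentLedger();
const results = [1, 2, 3].map(() =>
  ledger.applyOnce("order-42:capture", { orderId: "order-42", amountCents: 1999 })
);
```

Only the first call changes the ledger; the two repeats return `false` and leave the balance untouched, which is what makes a retry safe.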
- Constraint: A 1M+ LOC monolith couldn't scale checkout under load and lacked a regulator-traceable audit path. The cutover could not stop the business from selling. The same order can arrive twice and a partner webhook can fire repeatedly, yet the ledger must stay exactly right (checkout p99 ≤500ms, BACEN 3.978-traceable) across an 18-month migration.
- Decision: Strangler-fig the monolith. Carve domain capabilities into a BFF + Broker + Dispatcher. Decouple producers from consumers, publish via a transactional outbox (exactly-once per committed write), and make every downstream effect idempotent so retries are safe by construction. Rejected: a big-bang rewrite (the business can't stop selling), synchronous point-to-point calls (cascading failure), and at-least-once delivery without dedupe (double effects).
- Outcome: A 1M+ LOC monolith migrated over 18 months without halting sales. 10K+ transactions/day (~250 orders/min peak), checkout p99 ≤500ms, BACEN 3.978-traceable. Retry-safe exactly-once effects mean partner flakiness degrades gracefully instead of corrupting the ledger.
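The transactional outbox in the decision above can be sketched roughly as follows. Everything here is an assumption for illustration: `Store`, `OutboxRow`, `relay`, and `flakyPublish` are hypothetical, the store is in memory rather than a database, and the publish callback stands in for a Kafka producer call.

```typescript
// Hypothetical in-memory stand-ins: a real system would write the order row
// and the outbox row in one database transaction and publish to Kafka.
type OutboxRow = { eventId: string; topic: string; payload: string; published: boolean };

class Store {
  readonly orders = new Map<string, number>(); // orderId -> amount in cents
  readonly outbox: OutboxRow[] = [];

  // Writes the order and its event together, or neither if the order
  // already exists, mimicking a single committed transaction.
  placeOrder(orderId: string, amountCents: number): boolean {
    if (this.orders.has(orderId)) return false; // duplicate submission
    this.orders.set(orderId, amountCents);
    this.outbox.push({
      eventId: `order-placed:${orderId}`,
      topic: "orders",
      payload: JSON.stringify({ orderId, amountCents }),
      published: false,
    });
    return true;
  }
}

// Returns false when the broker call fails.
type Publish = (row: OutboxRow) => boolean;

// Publishes unpublished rows in order and marks each one only after success.
// A crash between publish and mark would redeliver, hence consumer dedupe.
function relay(store: Store, publish: Publish): number {
  let sent = 0;
  for (const row of store.outbox) {
    if (row.published) continue;
    if (!publish(row)) break; // stop on failure to keep ordering; retry later
    row.published = true;
    sent++;
  }
  return sent;
}

const store = new Store();
store.placeOrder("order-1", 1000);
store.placeOrder("order-1", 1000); // the same order arriving twice
store.placeOrder("order-2", 2500);

const delivered: string[] = [];
let failNext = true;
const flakyPublish: Publish = (row) => {
  if (failNext) { failNext = false; return false; } // simulated network blip
  delivered.push(row.eventId);
  return true;
};

const firstPass = relay(store, flakyPublish);  // blip: nothing is marked published
const secondPass = relay(store, flakyPublish); // retry: both pending events go out
```

The duplicate order never produces a second outbox row, and the failed first pass loses nothing because rows stay pending until a publish succeeds. The relay itself only guarantees that each committed event is eventually published; downstream idempotency covers any redelivery.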
Overview
An event-driven backend clearing 10K+ transactions/day (~250 orders/min at peak) across multiple commerce surfaces, carved out of a 1M+ LOC Rails monolith over an 18-month strangler-fig migration. A BFF fronts the channels. A broker fans domain events out to dispatchers. The outbox publishes each event exactly once per committed transaction, and idempotency keys make every downstream effect safe to retry. Checkout p99 stayed ≤500ms under load. The audit path traces to BACEN Circular 3.978/2020 throughout.
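As a rough illustration of the broker fanning one domain event out to several dispatchers, here is an in-process sketch. It does not use the Kafka client API; `Broker`, `Dispatcher`, `DomainEvent`, and the `order.placed` event type are hypothetical names chosen for the example.

```typescript
// Hypothetical names; an in-process stand-in for topic-based fan-out.
type DomainEvent = { id: string; type: string; payload: Record<string, unknown> };
type Dispatcher = (event: DomainEvent) => void;

class Broker {
  private readonly subscribers = new Map<string, Dispatcher[]>();

  subscribe(eventType: string, dispatcher: Dispatcher): void {
    const list = this.subscribers.get(eventType) ?? [];
    list.push(dispatcher);
    this.subscribers.set(eventType, list);
  }

  // Delivers the event to every dispatcher registered for its type.
  // A failing dispatcher is recorded for retry and does not block the others.
  publish(event: DomainEvent): string[] {
    const failed: string[] = [];
    (this.subscribers.get(event.type) ?? []).forEach((dispatch, index) => {
      try {
        dispatch(event);
      } catch {
        failed.push(`${event.type}#${index}`);
      }
    });
    return failed;
  }
}

const broker = new Broker();
const invoiced: string[] = [];
const audited: string[] = [];

broker.subscribe("order.placed", (e) => { invoiced.push(e.id); });
broker.subscribe("order.placed", () => { throw new Error("partner timeout"); });
broker.subscribe("order.placed", (e) => { audited.push(e.id); });

const failures = broker.publish({
  id: "evt-1",
  type: "order.placed",
  payload: { orderId: "order-1" },
});
```

The failing partner dispatcher ends up in `failures` while invoicing and audit still receive the event, which is the isolation that synchronous point-to-point calls would not provide.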