r/apachekafka
Kafka Streams at-least-once delivery - How to prevent duplicate calls to non-idempotent services?
I'm building a Kafka pipeline on Kubernetes and I'm concerned about duplicate deliveries to non-idempotent downstream services.
My flow:
Kafka Streams → produces to topics → Kafka Connect → destinations
The problem (at-least-once delivery):
1. Kafka Streams processes message
2. Produces to output topic
3. Kafka Connect writes to MongoDB
4. Kafka Connect calls backend service API
5. Pod dies BEFORE offset commit
6. On restart: Kafka redelivers (at-least-once)
7. MongoDB: idempotent upsert (fine)
8. Backend service: Gets called AGAIN (duplicate!)
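To make the failure sequence above concrete, here is a minimal in-memory sketch of it. This is not Kafka code: `run_consumer`, `log`, and `crash_before_commit_at` are hypothetical names for illustration, and the only assumption is that the offset is committed after the side effect, as in the steps listed.

```python
def run_consumer(log, committed, sink_calls, crash_before_commit_at=None):
    """Process records starting at the committed offset.

    Returns the committed offset at the time the consumer stops.
    """
    offset = committed
    while offset < len(log):
        record = log[offset]
        sink_calls.append(record)      # side effect: the backend API call
        if crash_before_commit_at == offset:
            return committed           # pod dies: this offset is never committed
        committed = offset + 1         # commit only after the side effect
        offset += 1
    return committed


log = ["order-1", "order-2"]
calls = []

# First run crashes after calling the backend for offset 1, before the commit.
committed = run_consumer(log, 0, calls, crash_before_commit_at=1)

# Restart resumes from the last committed offset, so offset 1 is replayed.
committed = run_consumer(log, committed, calls)

print(calls)  # ['order-1', 'order-2', 'order-2']
```

The second `order-2` is the duplicate backend call from step 8.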
My question:
With Kafka's at-least-once delivery guarantee, messages can be redelivered on failures.
- MongoDB/Elasticsearch have idempotent upserts (fine)
- But I also call other backend services (REST APIs, payment processing, notifications) that are NOT idempotent
How do I prevent duplicate calls to non-idempotent services when Kafka redelivers?
Options I'm considering:
- A) Outbox pattern with deduplication table?
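For reference, the deduplication-table half of option A could look roughly like the sketch below. It uses SQLite only to stay self-contained. The `processed` table, `make_store`, `handle_once`, and the `topic-partition-offset` style key are all assumptions for illustration, not something Kafka Connect provides. A key built from the record's coordinates only catches redelivery of the same record; if the upstream producer can write the same event twice, a business-level ID would be needed instead.

```python
import sqlite3


def make_store(path=":memory:"):
    """Open the (hypothetical) dedup store and create its table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed ("
        "msg_key TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def handle_once(conn, msg_key, call_backend):
    """Call call_backend unless msg_key is already recorded as done.

    Returns True if the backend was called, False if the message was skipped.
    """
    row = conn.execute(
        "SELECT status FROM processed WHERE msg_key = ?", (msg_key,)
    ).fetchone()
    if row is not None and row[0] == "done":
        return False  # redelivered message: skip the non-idempotent call

    call_backend(msg_key)

    # Record completion; "with conn" commits the insert or rolls it back.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO processed (msg_key, status) "
            "VALUES (?, 'done')",
            (msg_key,),
        )
    return True
```

Note that this narrows the duplicate window but does not close it: a crash between `call_backend` and the insert still leads to a second call on redelivery. As far as I understand, closing it fully requires either the backend accepting the key as an idempotency token, or the call's effect and the dedup record being written in the same transaction.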
Requirements:
- Zero data loss
- No duplicate API calls to backend services
What's the standard production approach? How do you handle at-least-once delivery with non-idempotent downstream systems?
Is trusting Kafka Streams' built-in reliability enough, or should I add additional safeguards like an outbox pattern?
Looking for real-world experience from folks running Kafka Streams in production Kubernetes environments.
u/Careless_Treacle2713 — 3 days ago