Event-Driven Architecture with Apache Kafka: A Complete Guide for Developers
A complete guide to Event-Driven Architecture using Apache Kafka. Learn topics, partitions, producers, consumers, real-world patterns, and production best practices with code examples.
Event-Driven Architecture with Apache Kafka is one of the most powerful patterns for building scalable, decoupled backend systems. It's the backbone of platforms like LinkedIn, Uber, and Netflix.
I'll be honest with you. When I first heard the phrase "Event-Driven Architecture", I thought it was one of those buzzwords that big tech companies throw around to sound impressive.
But the more systems I built, the more I realized something was wrong with the traditional approach. Services were tightly coupled. A single API failure would cascade into chaos. Scaling one component meant scaling everything.
That's when Kafka changed the way I think about building systems.
In this article, I'll walk you through Event-Driven Architecture (EDA) and show you how Apache Kafka makes it practical, scalable, and maintainable. We'll use real diagrams and examples the whole way through.
What is Event-Driven Architecture (EDA)?
In a traditional request-response system, Service A directly calls Service B and waits for a response.
Traditional (Request-Response):
[Order Service] ──────► [Payment Service]
       │
       ▼
(waits for response)
This works fine until:
- Payment Service is slow
- Payment Service is down
- You need 5 more services to react to the same order
Event-Driven Architecture flips this model.
Instead of calling services directly, a service publishes an event ("something happened") and moves on. Other services subscribe and react to those events independently.
Event-Driven:
[Order Service] ──► [Event Bus / Kafka] ──► [Payment Service]
                                        ──► [Inventory Service]
                                        ──► [Notification Service]
                                        ──► [Analytics Service]
The Order Service doesn't know or care who reacts. It just publishes the event and continues.
This is the fundamental idea.
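To make the idea concrete, here is a minimal sketch of publish/subscribe using a toy in-memory bus. The `EventBus` class and its methods are made up for illustration and are not part of Kafka or any library. Unlike this toy, Kafka persists events and lets consumers read them at their own pace.

```typescript
// A toy in-memory event bus; it only illustrates the publish/subscribe shape.
type Handler = (payload: { orderId: string }) => void;

class EventBus {
  // topic name -> handlers subscribed to it
  private handlers: Record<string, Handler[]> = {};

  subscribe(topic: string, handler: Handler): void {
    if (!this.handlers[topic]) this.handlers[topic] = [];
    this.handlers[topic].push(handler);
  }

  // Returns how many subscribers were notified.
  publish(topic: string, payload: { orderId: string }): number {
    const subscribers = this.handlers[topic] || [];
    subscribers.forEach((handler) => handler(payload));
    return subscribers.length;
  }
}

const bus = new EventBus();
const reactions: string[] = [];

bus.subscribe("order.created", (e) => reactions.push(`payment:${e.orderId}`));
bus.subscribe("order.created", (e) => reactions.push(`inventory:${e.orderId}`));

// The publisher names a topic, never a subscriber.
const notified = bus.publish("order.created", { orderId: "ORD-1001" });
console.log(notified, reactions);
```

Adding a third subscriber is one more `subscribe` call; the publishing code does not change.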
Why Kafka?
There are plenty of message brokers out there: RabbitMQ, AWS SQS, Google Pub/Sub. But Kafka has some unique properties that make it the preferred choice for high-throughput, large-scale systems:
| Feature | Kafka | RabbitMQ | AWS SQS |
|---|---|---|---|
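| Throughput | Millions of messages/sec (cluster-wide) | Tens of thousands/sec per queue (typical) | Nearly unlimited on standard queues; capped on FIFO queues |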
| Message Retention | Configurable (7 days by default) | Until consumed (classic queues) | 14 days max |
| Replay Events | ✅ Yes | ❌ Not with classic queues (streams can replay) | ❌ No |
| Message Ordering | Per partition | Per queue | Only FIFO queues |
| Used by | Netflix, Uber, LinkedIn | Traditional apps | AWS-native apps |
Kafka was originally built by LinkedIn to handle billions of events per day. It was later open-sourced and is now one of the most battle-tested distributed systems in the world.
Core Concepts of Apache Kafka
Before jumping into code, let's understand the building blocks.
Topics
A Topic is like a category or a folder for your events. Think of it as a named stream of related messages.
Topics:
┌─────────────────────┐
│ Topic: "orders" │ ← all order-related events
└─────────────────────┘
┌─────────────────────┐
│ Topic: "payments" │ ← all payment-related events
└─────────────────────┘
┌─────────────────────┐
│ Topic: "users" │ ← all user-related events
└─────────────────────┘
Producers
A Producer is any service that publishes messages to a Kafka topic.
Producer:
[Order Service] ──publishes──► Topic: "orders"
Consumers
A Consumer is any service that reads messages from a Kafka topic.
Consumer:
Topic: "orders" ──reads──► [Payment Service]
Topic: "orders" ──reads──► [Inventory Service]
Consumer Groups
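Multiple consumers can be grouped into a Consumer Group. Kafka assigns each partition of a topic to exactly one consumer in the group, so each message is delivered to only one member of that group. Different consumer groups each receive their own copy of every message.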
Consumer Group: "payment-processors"
Topic: "orders"
┌──────────┐
│ Msg 1 │ ──► [Payment Worker 1]
│ Msg 2 │ ──► [Payment Worker 2]
│ Msg 3 │ ──► [Payment Worker 1]
│ Msg 4 │ ──► [Payment Worker 2]
└──────────┘
Each message processed by only ONE worker.
Workers can scale independently.
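Under the hood, the group shares work at the partition level. The sketch below shows a simple round-robin style assignment. `assignPartitions` is a hypothetical helper written for this article; Kafka's real assignors (range, round-robin, sticky) are more elaborate.

```typescript
// Hypothetical helper: spread partition numbers over the members of one group.
function assignPartitions(
  partitionCount: number,
  consumers: string[]
): Record<string, number[]> {
  const assignment: Record<string, number[]> = {};
  if (consumers.length === 0) return assignment;
  consumers.forEach((c) => {
    assignment[c] = [];
  });
  for (let p = 0; p < partitionCount; p++) {
    // Each partition gets exactly one owner within the group.
    const owner = consumers[p % consumers.length];
    assignment[owner].push(p);
  }
  return assignment;
}

const twoWorkers = assignPartitions(4, ["worker-1", "worker-2"]);
console.log(twoWorkers); // worker-1 owns partitions 0 and 2, worker-2 owns 1 and 3

// With more consumers than partitions, the extra consumer owns nothing.
const fiveWorkers = assignPartitions(4, ["w1", "w2", "w3", "w4", "w5"]);
console.log(fiveWorkers["w5"]); // []
```

This is why adding workers only helps up to the number of partitions in the topic.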
Partitions
A Topic is split into Partitions. This is Kafka's secret weapon for scalability.
Topic: "orders" with 3 Partitions
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Partition 0 │ │ Partition 1 │ │ Partition 2 │
│ Msg 1 │ │ Msg 2 │ │ Msg 3 │
│ Msg 4 │ │ Msg 5 │ │ Msg 6 │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
▼ ▼ ▼
[Consumer 1] [Consumer 2] [Consumer 3]
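More partitions = more parallelism = higher throughput, as long as there are enough consumers to read them. Within a consumer group each partition is read by one consumer, so consumers beyond the partition count sit idle.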
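How does a message end up in a particular partition? When a message has a key, the producer hashes the key and takes it modulo the partition count, so the same key always lands in the same partition. Here is a minimal sketch of that idea. `toyHash` is a stand-in chosen for brevity; Kafka's default partitioner uses murmur2.

```typescript
// Stand-in hash for illustration only; Kafka's default partitioner uses murmur2.
function toyHash(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Map a message key to a partition index in [0, partitionCount).
function partitionFor(key: string, partitionCount: number): number {
  return toyHash(key) % partitionCount;
}

const first = partitionFor("ORD-1001", 3);
const second = partitionFor("ORD-1001", 3);
console.log(first === second); // true: same key, same partition

// Changing the partition count can move a key to a different partition.
console.log(partitionFor("ORD-1001", 3), partitionFor("ORD-1001", 6));
```

Because ordering is only guaranteed within a partition, keying by something like the order ID keeps all events for one order in sequence.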
The Full Architecture
Now let's zoom out and see how everything fits together in a real system.
Event-Driven System with Kafka:
┌──────────────────────────────────────────────────────────────┐
│ PRODUCERS │
│ [Order Service] [User Service] [Payment Service] │
└──────────┬──────────────┬──────────────┬────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────┐
│ APACHE KAFKA │
│ │
│ Topic: orders Topic: users Topic: payments │
│ [P0][P1][P2] [P0][P1] [P0][P1][P2][P3] │
└──────────┬──────────────┬──────────────┬────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────┐
│ CONSUMERS │
│ [Inventory] [Notifications] [Analytics] [Audit Logger] │
└──────────────────────────────────────────────────────────────┘
Each producer is completely decoupled from each consumer. Adding a new consumer (e.g., an audit logger) requires zero changes to any producer.
Real-World Example: E-Commerce Order Flow with Kafka
Let's walk through a concrete example. We'll use an e-commerce platform processing an order.
Without Kafka (Tightly Coupled)
User places order
      │
      ▼
[Order Service]
      │
      ├──► calls [Payment Service] (sync - waits)
      │         │
      │         ├──► calls [Inventory Service] (sync - waits)
      │         │
      │         └──► calls [Email Service] (sync - waits)
      │
      └──► responds to User (after ALL services complete)
Problems:
- If Email Service is down, the order fails
- Total latency = sum of all service latencies
- Hard to add new services (you have to modify Order Service)
With Kafka (Decoupled)
User places order
      │
      ▼
[Order Service] ──publishes──► Topic: "order.created"
      │
      └──► immediately responds to User: "Order received!"
Meanwhile, independently:
Topic: "order.created"
      │
      ├──► [Payment Service] processes payment
      ├──► [Inventory Service] reserves stock
      ├──► [Email Service] sends confirmation email
      └──► [Analytics Service] records the event
Benefits:
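- Order Service responds as soon as the event is published, without waiting on downstream services
- If Email Service is down, it catches up when it restarts, as long as the events are still within the topic's retention period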
- Adding a new service requires zero changes to Order Service
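The "catches up later" behaviour works because the broker keeps the log and remembers how far each consumer group has read. Here is a toy model of one partition to show the mechanics. `PartitionLog` is invented for this sketch, and it commits offsets as soon as records are polled, whereas a real consumer commits after processing.

```typescript
// Toy model of one partition: an append-only log plus one committed offset per group.
class PartitionLog {
  private records: string[] = [];
  private committed: Record<string, number> = {};

  append(value: string): number {
    this.records.push(value);
    return this.records.length - 1; // offset of the new record
  }

  // Return everything the group has not consumed yet, then advance its offset.
  poll(groupId: string): string[] {
    const from = this.committed[groupId] || 0;
    const batch = this.records.slice(from);
    this.committed[groupId] = this.records.length;
    return batch;
  }
}

const ordersLog = new PartitionLog();
ordersLog.append("order.created ORD-1");
const emailFirst = ordersLog.poll("email-service"); // email service is up

// Email service goes down; orders keep arriving and payment keeps consuming.
ordersLog.append("order.created ORD-2");
ordersLog.append("order.created ORD-3");
const paymentBatch = ordersLog.poll("payment-service");

// Email service restarts and resumes from its own committed offset.
const emailAfterRestart = ordersLog.poll("email-service");
console.log(emailAfterRestart); // the two orders it missed
```

Each group has its own offset, so a slow or offline consumer never holds back the others.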
Kafka Code Example: Producer and Consumer in Node.js
Let's write a simple Kafka producer and consumer using Node.js with the kafkajs library.
Setup
npm install kafkajs
Kafka Connection
// kafka.ts
import { Kafka } from "kafkajs";
export const kafka = new Kafka({
  clientId: "ecommerce-app",
  brokers: ["localhost:9092"],
});
Producer: Publishing an Order Event
// order-producer.ts
import { kafka } from "./kafka";
const producer = kafka.producer();
interface OrderEvent {
  orderId: string;
  userId: string;
  items: { productId: string; quantity: number }[];
  totalAmount: number;
}
async function publishOrderCreated(order: OrderEvent) {
  // Connecting per call keeps the example short; a long-running service
  // would connect once at startup and reuse the producer.
  await producer.connect();
  try {
    await producer.send({
      topic: "order.created",
      messages: [
        {
          // Keying by orderId sends all events for one order to the same partition
          key: order.orderId,
          value: JSON.stringify(order),
        },
      ],
    });
    console.log(`Order event published: ${order.orderId}`);
  } finally {
    // Disconnect even if send() throws
    await producer.disconnect();
  }
}
publishOrderCreated({
  orderId: "ORD-1001",
  userId: "USR-42",
  items: [{ productId: "PROD-5", quantity: 2 }],
  totalAmount: 1999,
}).catch(console.error);
Consumer: Payment Service Reacting to Order Events
// payment-consumer.ts
import { kafka } from "./kafka";
const consumer = kafka.consumer({ groupId: "payment-service" });
// One long-lived producer, reused for every message instead of reconnecting each time
const producer = kafka.producer();
async function startPaymentConsumer() {
  await producer.connect();
  await consumer.connect();
  await consumer.subscribe({ topic: "order.created", fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      const order = JSON.parse(message.value.toString());
      console.log(`Processing payment for order: ${order.orderId}`);
      console.log(`Amount: ₹${order.totalAmount}`);
      // Process payment logic here
      await processPayment(order.orderId, order.totalAmount);
    },
  });
}
async function processPayment(orderId: string, amount: number) {
  console.log(`Payment of ₹${amount} processed for order ${orderId}`);
  // After processing, publish a new event for downstream services
  await producer.send({
    topic: "payment.completed",
    messages: [
      {
        key: orderId,
        value: JSON.stringify({ orderId, status: "success" }),
      },
    ],
  });
}
startPaymentConsumer().catch(console.error);
Consumer: Notification Service
// notification-consumer.ts
import { kafka } from "./kafka";
const consumer = kafka.consumer({ groupId: "notification-service" });
async function startNotificationConsumer() {
  await consumer.connect();
  // Subscribe to multiple topics
  await consumer.subscribe({ topic: "order.created", fromBeginning: false });
  await consumer.subscribe({
    topic: "payment.completed",
    fromBeginning: false,
  });
  await consumer.run({
    eachMessage: async ({ topic, message }) => {
      if (!message.value) return;
      const payload = JSON.parse(message.value.toString());
      if (topic === "order.created") {
        console.log(`Sending order confirmation email for ${payload.orderId}`);
      }
      if (topic === "payment.completed") {
        console.log(`Sending payment receipt for ${payload.orderId}`);
      }
    },
  });
}
startNotificationConsumer().catch(console.error);
Notice how the Notification Service and the Payment Service are completely independent. Neither one knows the other exists. Both react to the same "order.created" event, each through its own consumer group.
Event Sourcing with Kafka: Taking It Further
One of the most powerful patterns enabled by Kafka is Event Sourcing.
Instead of storing just the current state, you store every event that led to that state.
Traditional Database:
Order Table:
┌─────────┬──────────┬────────┐
│ orderId │ status │ total │
├─────────┼──────────┼────────┤
│ ORD-001 │ SHIPPED │ ₹1999 │
└─────────┴──────────┴────────┘
(only current state, history lost)
Event Sourcing with Kafka:
Topic: "order.events"
┌─────────────────────────────────────────────┐
│ order.created → orderId: ORD-001 │
│ payment.done → orderId: ORD-001 │
│ order.packed → orderId: ORD-001 │
│ order.shipped → orderId: ORD-001 │
└─────────────────────────────────────────────┘
(full history, can replay and rebuild state)
This gives you:
- Full audit trail of everything that happened
- Time travel: rebuild system state at any point in time
- Debug production issues by replaying events
- New services can read the entire event history and build their own view
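Rebuilding state from events is just a fold over the event list. Here is a minimal sketch using the four events from the diagram above. The type names and the `rebuild` helper are made up for this example, and it assumes the topic's retention is configured to keep the full history.

```typescript
type OrderHistoryEvent =
  | { type: "order.created"; orderId: string; total: number }
  | { type: "payment.done"; orderId: string }
  | { type: "order.packed"; orderId: string }
  | { type: "order.shipped"; orderId: string };

interface OrderState {
  orderId: string;
  status: "CREATED" | "PAID" | "PACKED" | "SHIPPED";
  total: number;
}

// Apply one event to the previous state and return the next state.
function applyEvent(state: OrderState | null, event: OrderHistoryEvent): OrderState | null {
  if (event.type === "order.created") {
    return { orderId: event.orderId, status: "CREATED", total: event.total };
  }
  if (state === null) return null; // ignore events for an order we never saw created
  if (event.type === "payment.done") return { ...state, status: "PAID" };
  if (event.type === "order.packed") return { ...state, status: "PACKED" };
  return { ...state, status: "SHIPPED" }; // order.shipped
}

// Replay the first `upTo` events; by default, replay all of them.
function rebuild(events: OrderHistoryEvent[], upTo: number = events.length): OrderState | null {
  return events.slice(0, upTo).reduce<OrderState | null>(applyEvent, null);
}

const orderHistory: OrderHistoryEvent[] = [
  { type: "order.created", orderId: "ORD-001", total: 1999 },
  { type: "payment.done", orderId: "ORD-001" },
  { type: "order.packed", orderId: "ORD-001" },
  { type: "order.shipped", orderId: "ORD-001" },
];

const current = rebuild(orderHistory); // status: "SHIPPED"
const afterPayment = rebuild(orderHistory, 2); // "time travel": status: "PAID"
console.log(current, afterPayment);
```

The "time travel" benefit is the `upTo` argument: stop replaying early and you get the state as it was at that point.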
Common Apache Kafka Patterns
Pattern 1: Fan-Out
One producer, multiple independent consumers.
[Order Service] ──► "order.created" ──► [Payment Service]
                                    ──► [Inventory Service]
                                    ──► [Email Service]
                                    ──► [Analytics Service]
Use when: Multiple services need to react to the same event independently.
Pattern 2: Event Pipeline (Chain)
Events flow through a series of processing stages.
[Raw Data] ──► "raw.events" ──► [Transformer] ──► "clean.events" ──► [Aggregator] ──► "reports"
Use when: You need to process, enrich, or transform data through multiple stages.
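Each stage in a pipeline is a consumer of one topic and a producer to the next. Stripped of the Kafka plumbing, the stages are plain functions. Here is a rough sketch with made-up event shapes (`RawEvent`, `CleanEvent`), since the article does not define what the raw data looks like.

```typescript
interface RawEvent { user: string; amount: string }
interface CleanEvent { user: string; amount: number }

// Stage 1 ("raw.events" -> "clean.events"): normalise and drop unparsable records.
function transform(raw: RawEvent[]): CleanEvent[] {
  return raw
    .map((e) => ({ user: e.user.trim().toLowerCase(), amount: Number(e.amount) }))
    .filter((e) => !isNaN(e.amount));
}

// Stage 2 ("clean.events" -> "reports"): total amount per user.
function aggregate(clean: CleanEvent[]): Record<string, number> {
  const totals: Record<string, number> = {};
  clean.forEach((e) => {
    totals[e.user] = (totals[e.user] || 0) + e.amount;
  });
  return totals;
}

const rawEvents: RawEvent[] = [
  { user: " Alice ", amount: "10" },
  { user: "alice", amount: "5" },
  { user: "Bob", amount: "oops" },
];

const report = aggregate(transform(rawEvents));
console.log(report); // { alice: 15 }
```

Because each stage only depends on its input topic, you can redeploy or scale the transformer without touching the aggregator.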
Pattern 3: CQRS (Command Query Responsibility Segregation)
Separate the write model (commands) from the read model (queries).
User Action
     │
     ▼
[Write API] ──► Kafka ──► [Event Processor] ──► Write DB (Postgres)
                      ──► [Read Model Builder] ──► Read DB (Elasticsearch/Redis)
                                                      │
                                                      ▼
                                                 [Read API] ◄── User Query
Use when: Your read patterns and write patterns have very different requirements.
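Here is a small in-memory sketch of the split. An array stands in for Kafka and the write DB, and a plain object stands in for the read DB. All names (`placeOrder`, `projectNewEvents`, `getUserSpend`) are hypothetical.

```typescript
interface OrderPlaced { orderId: string; userId: string; total: number }

// Write side: validates commands and records them as events.
const eventLog: OrderPlaced[] = [];
function placeOrder(cmd: OrderPlaced): void {
  if (cmd.total <= 0) throw new Error("total must be positive");
  eventLog.push(cmd);
}

// Read side: a denormalised view shaped for one query, "how much has this user spent?"
interface UserSpendView { orderCount: number; totalSpent: number }
const readModel: Record<string, UserSpendView> = {};
let projectedUpTo = 0; // how far the read model builder has read

function projectNewEvents(): void {
  for (; projectedUpTo < eventLog.length; projectedUpTo++) {
    const e = eventLog[projectedUpTo];
    const view = readModel[e.userId] || { orderCount: 0, totalSpent: 0 };
    readModel[e.userId] = {
      orderCount: view.orderCount + 1,
      totalSpent: view.totalSpent + e.total,
    };
  }
}

// Read API: answers from the view only, never from the event log.
function getUserSpend(userId: string): UserSpendView {
  return readModel[userId] || { orderCount: 0, totalSpent: 0 };
}

placeOrder({ orderId: "ORD-1", userId: "USR-42", total: 1999 });
placeOrder({ orderId: "ORD-2", userId: "USR-42", total: 500 });

const beforeProjection = getUserSpend("USR-42"); // still empty: the view lags the writes
projectNewEvents();
const afterProjection = getUserSpend("USR-42"); // { orderCount: 2, totalSpent: 2499 }
console.log(beforeProjection, afterProjection);
```

Note the gap between the two reads. With CQRS the read side is eventually consistent, which is the price you pay for shaping each side independently.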
When to Use (and Not Use) Apache Kafka
Use Kafka When:
- You have multiple services that need to react to the same events
- You need high throughput (millions of messages/second)
- You need message replay to re-read old messages
- You need decoupling between services
- You're building microservices that should evolve independently
Don't Use Kafka When:
- You have a simple monolith (Kafka is overkill)
- You need immediate synchronous responses (use REST/gRPC instead)
- Your team is small and the operational overhead outweighs benefits
- You're building a simple CRUD app
Kafka solves real problems, but it also introduces operational complexity. Don't reach for it just because Netflix uses it.
Apache Kafka in Production: Key Things to Get Right
Running Kafka in production is not trivial. Here are the things I've learned the hard way:
1. Message Schema Management
Always use a schema for your messages. Without one, a change on the producer side can silently break consumers that still expect the old shape.
// Define strict schemas for your events
interface OrderCreatedEvent {
  eventType: "order.created";
  eventVersion: "1.0";
  orderId: string;
  userId: string;
  totalAmount: number;
  createdAt: string; // ISO timestamp
}
Use Apache Avro or JSON Schema with a Schema Registry for larger teams.
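A TypeScript interface disappears at runtime, so a consumer still has to check what actually arrives on the wire. Here is a minimal hand-rolled sketch of that check. `isOrderCreatedEvent` and `parseOrderCreated` are hypothetical helpers, and a schema library or a Schema Registry would replace them in a bigger setup.

```typescript
interface OrderCreatedEvent {
  eventType: "order.created";
  eventVersion: "1.0";
  orderId: string;
  userId: string;
  totalAmount: number;
  createdAt: string; // ISO timestamp
}

// Runtime check that an unknown value really has the shape we expect.
function isOrderCreatedEvent(value: unknown): value is OrderCreatedEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.eventType === "order.created" &&
    v.eventVersion === "1.0" &&
    typeof v.orderId === "string" &&
    typeof v.userId === "string" &&
    typeof v.totalAmount === "number" &&
    typeof v.createdAt === "string"
  );
}

// Returns the typed event, or null if the payload is malformed or a different version.
function parseOrderCreated(json: string): OrderCreatedEvent | null {
  try {
    const parsed: unknown = JSON.parse(json);
    return isOrderCreatedEvent(parsed) ? parsed : null;
  } catch {
    return null; // not valid JSON at all
  }
}

const good = parseOrderCreated(
  '{"eventType":"order.created","eventVersion":"1.0","orderId":"ORD-1","userId":"USR-42","totalAmount":1999,"createdAt":"2024-01-01T00:00:00Z"}'
);
const wrongType = parseOrderCreated(
  '{"eventType":"order.created","eventVersion":"1.0","orderId":"ORD-1","userId":"USR-42","totalAmount":"1999","createdAt":"2024-01-01T00:00:00Z"}'
);
console.log(good !== null, wrongType === null); // true true
```

A rejected payload is a good candidate for a dead letter topic rather than a crash.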
2. Idempotent Consumers
Networks fail. Kafka consumers typically get at-least-once delivery, so after a retry, crash, or rebalance the same message may be delivered more than once. Your consumers must handle duplicate events gracefully.
async function processPayment(orderId: string, amount: number) {
  // `db` stands for this service's own data-access layer (not shown here)
  // Check if already processed (idempotency check)
  const existing = await db.payments.findOne({ orderId });
  if (existing) {
    console.log(`Payment for ${orderId} already processed. Skipping.`);
    return;
  }
  // Process and save. A unique constraint on orderId is what protects against
  // two workers passing the check above at the same time.
  await db.payments.create({ orderId, amount, status: "completed" });
}
3. Dead Letter Queue (DLQ)
When a message fails to process repeatedly, don't lose it. Route it to a Dead Letter Topic for investigation.
Normal Flow:
Topic: "orders" ──► [Consumer] ──► processes successfully
Failure Flow:
Topic: "orders" ──► [Consumer] ──► fails 3 times ──► Topic: "orders.dlq"
│
▼
[Ops Team alerts,
manual review]
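The failure flow above can be sketched as a small wrapper around the message handler. This version is synchronous and pushes to an array so it runs without a broker. `processWithDlq` is a hypothetical helper; in a real service the handler would be async and the final step would publish to the "orders.dlq" topic.

```typescript
interface DeadLetter { value: string; error: string; attempts: number }

// Hypothetical helper: try the handler up to maxAttempts times, then park the message.
function processWithDlq(
  value: string,
  handler: (value: string) => void,
  deadLetters: DeadLetter[],
  maxAttempts: number = 3
): boolean {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(value);
      return true; // processed successfully
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // In a real service this would be a producer.send() to "orders.dlq".
  deadLetters.push({ value, error: lastError, attempts: maxAttempts });
  return false;
}

const dlq: DeadLetter[] = [];
let calls = 0;
const parseOrder = (value: string): void => {
  calls++;
  JSON.parse(value); // throws on malformed input
};

const okResult = processWithDlq('{"orderId":"ORD-1"}', parseOrder, dlq);
const badResult = processWithDlq("not-json", parseOrder, dlq);
console.log(okResult, badResult, dlq.length, calls); // true false 1 4
```

Keeping the original value and the last error on the dead letter makes the manual review step much easier.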
4. Monitor Consumer Lag
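Consumer lag is the gap between the latest offset written to a partition and the offset a consumer group has committed. In other words, it is the number of messages the group has not processed yet. If it keeps growing, your consumers can't keep up.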
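The arithmetic is simple enough to sketch. The numbers below are invented. The field names mirror the LOG-END-OFFSET and CURRENT-OFFSET columns that `kafka-consumer-groups.sh --describe` prints per partition.

```typescript
interface PartitionOffsets {
  partition: number;
  logEndOffset: number;    // offset the next produced message will get
  committedOffset: number; // next offset the group will read
}

// Total lag for a group is the sum of the per-partition gaps.
function totalLag(offsets: PartitionOffsets[]): number {
  return offsets.reduce(
    (sum, p) => sum + Math.max(0, p.logEndOffset - p.committedOffset),
    0
  );
}

const snapshot: PartitionOffsets[] = [
  { partition: 0, logEndOffset: 1200, committedOffset: 1200 }, // fully caught up
  { partition: 1, logEndOffset: 980, committedOffset: 950 },   // 30 behind
  { partition: 2, logEndOffset: 1500, committedOffset: 1100 }, // 400 behind
];

console.log(totalLag(snapshot)); // 430
```

A single snapshot tells you little. What matters is whether the number trends upward over time.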
# Check consumer group lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group payment-service
Summary: Event-Driven Architecture with Kafka
Let's recap what we covered:
| Concept | What it does |
|---|---|
| Topic | Named stream of related events |
| Producer | Publishes events to a topic |
| Consumer | Reads and reacts to events |
| Consumer Group | Distributes messages across multiple workers |
| Partition | Enables parallelism and high throughput |
| Event Sourcing | Store events instead of just state |
| Dead Letter Queue | Handle unprocessable messages safely |
Final Thoughts
Event-Driven Architecture with Kafka is not about adding complexity. It's about removing the wrong kind of complexity.
Tight coupling between services is the kind of complexity that quietly grows over time and eventually brings a system to its knees. Kafka replaces that with an event log, a single source of truth that every service can read at its own pace.
It does take some upfront investment to understand topics, partitions, consumer groups, and schemas. But once it clicks, you'll find yourself naturally thinking in events. You'll ask "what happened?" instead of "what should I call?"
Start small. Pick one flow in your system. Replace a synchronous call with an event. See how it feels.
That's how it starts for most engineers. One event at a time.
Happy Coding! 🚀