Kappa Architecture – Another Way of Data Processing

Jul 20, 2019

Kappa architecture was proposed by Jay Kreps (co-creator of Apache Kafka) as a simplification of the Lambda architecture. The core idea: remove the batch layer entirely and treat everything as a stream.

[Figure: Kappa architecture diagram]

The Core Concept

In Lambda architecture, you maintain two separate processing paths, batch and streaming, so the same logic has to be implemented, deployed, and kept consistent in two systems. Kappa asks: what if we just used the speed layer for everything?

In Kappa:

  • All data is processed as a stream
  • Historical reprocessing is done by replaying the stream from the beginning
  • A single codebase handles both real-time and historical data
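
The three points above can be sketched with a toy, in-memory log. This is only an illustration under stated assumptions: `Event`, `Log`, and `Job` are hypothetical names invented for this example, and the list-backed log stands in for a real durable log such as a Kafka topic.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, chosen only for this sketch.
    user: str
    amount: int


@dataclass
class Log:
    """Append-only, replayable log (in-memory stand-in for a durable log)."""
    records: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self.records.append(event)
        return len(self.records) - 1  # offset of the record just written

    def read_from(self, offset: int) -> Iterator[Tuple[int, Event]]:
        # Yield (offset, event) pairs starting at the requested offset.
        for i in range(offset, len(self.records)):
            yield i, self.records[i]


class Job:
    """One processing path: the same code handles history and live events."""

    def __init__(self, log: Log) -> None:
        self.log = log
        self.next_offset = 0              # a new job starts at the beginning
        self.totals: Dict[str, int] = {}  # running sum of amounts per user

    def poll(self) -> int:
        """Process every record not yet seen; return how many were handled."""
        handled = 0
        for offset, event in self.log.read_from(self.next_offset):
            self.totals[event.user] = self.totals.get(event.user, 0) + event.amount
            self.next_offset = offset + 1
            handled += 1
        return handled


log = Log()
log.append(Event("alice", 10))
log.append(Event("bob", 5))

job = Job(log)
caught_up = job.poll()         # first poll works through the existing history
log.append(Event("alice", 7))
live = job.poll()              # later polls pick up new events with the same code

replayed = Job(log)            # a fresh job replays the stream from offset 0
replayed.poll()
```

Because the job's state is derived only from the log, a fresh job that replays from offset 0 ends up with the same totals as the long-running one.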

Key Components

Stream Processing Layer: Handles all data processing as events arrive. Technologies like Kafka Streams, Apache Flink, or Spark Streaming are commonly used.

Serving Layer: Stores the output of stream processing for querying. This can be a database, data warehouse, or any queryable store.

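Message Queue / Log: A durable, replayable log (like Kafka) is essential. Unlike a traditional queue that discards messages once they are consumed, it retains them, so you can reprocess historical data by replaying from any offset or timestamp that is still within the retention window.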
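
As a rough sketch of what replaying from a point in time involves, the example below maps a timestamp to the first matching offset and reads forward from there. `TimestampedLog` and its methods are hypothetical names for this illustration, and it assumes records arrive with non-decreasing timestamps. Kafka's consumer API offers a comparable timestamp-to-offset lookup, but nothing here uses it.

```python
import bisect
from typing import List, Tuple


class TimestampedLog:
    """In-memory log that can be replayed from a timestamp (illustrative only)."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []  # milliseconds, kept in sorted order
        self._values: List[str] = []

    def append(self, timestamp_ms: int, value: str) -> int:
        # Assumption of this sketch: timestamps never go backwards.
        if self._timestamps and timestamp_ms < self._timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing in this sketch")
        self._timestamps.append(timestamp_ms)
        self._values.append(value)
        return len(self._values) - 1

    def offset_for_time(self, timestamp_ms: int) -> int:
        """Offset of the first record whose timestamp is >= timestamp_ms."""
        return bisect.bisect_left(self._timestamps, timestamp_ms)

    def replay_from_time(self, timestamp_ms: int) -> List[Tuple[int, str]]:
        """Return (offset, value) pairs from the given time to the end of the log."""
        start = self.offset_for_time(timestamp_ms)
        return [(i, self._values[i]) for i in range(start, len(self._values))]


events = TimestampedLog()
events.append(1000, "a")
events.append(2000, "b")
events.append(2000, "c")
events.append(3000, "d")

from_two_seconds = events.replay_from_time(2000)  # starts at the first record at t=2000
tail_only = events.replay_from_time(2500)         # only records after t=2500
```

A real log can only serve this lookup for data it still retains, which is why the retention settings matter as much as the replay mechanism.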

Advantages Over Lambda

  • Simpler codebase: One processing path instead of two
  • Easier maintenance: No need to keep batch and streaming logic in sync
  • Reprocessing: Just replay the stream with updated logic
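
One common way to carry out such a reprocess is to run the updated job from the start of the log into a new output table and switch readers over once it has caught up. The sketch below assumes that approach. `run_job`, `logic_v1`, `logic_v2`, and `ServingLayer` are hypothetical names, and the rule change (ignoring negative amounts) is invented purely for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

Event = Tuple[str, int]   # (user, amount); hypothetical event shape
Table = Dict[str, int]    # output table: total amount per user


def run_job(log: Iterable[Event], logic: Callable[[Table, Event], None]) -> Table:
    """Replay the whole log through one version of the logic into a fresh table."""
    table: Table = {}
    for event in log:
        logic(table, event)
    return table


def logic_v1(table: Table, event: Event) -> None:
    # Original logic: sum every amount, including negative ones.
    user, amount = event
    table[user] = table.get(user, 0) + amount


def logic_v2(table: Table, event: Event) -> None:
    # Updated logic (invented for this example): ignore non-positive amounts.
    user, amount = event
    if amount > 0:
        table[user] = table.get(user, 0) + amount


class ServingLayer:
    """Holds a reference to whichever output table is currently live."""

    def __init__(self, table: Table) -> None:
        self._live = table

    def switch_to(self, table: Table) -> None:
        self._live = table

    def query(self, user: str) -> int:
        return self._live.get(user, 0)


log = [("alice", 10), ("alice", -3), ("bob", 5)]

serving = ServingLayer(run_job(log, logic_v1))
before = serving.query("alice")     # answer produced by the old logic

new_table = run_job(log, logic_v2)  # replay the same log with the new logic
serving.switch_to(new_table)        # cut readers over once the replay is done
after = serving.query("alice")      # answer produced by the new logic
```

The old table stays untouched until the switch, so readers never see a half-rebuilt result and the old output can be dropped afterwards.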

Trade-offs

  • Requires a robust, scalable log whose retention covers all the history you may need to reprocess
  • May not be as cost-effective for very large historical datasets compared to batch processing on cheap storage
  • A full historical reprocess can take a long time, since the entire retained log has to be read and run through the job again
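
To get a feel for the reprocessing cost, a back-of-the-envelope estimate divides the retained data volume by the sustained replay throughput. The figures below (10 TB retained, 200 MB/s aggregate throughput) are made-up inputs for illustration, not measurements, and `replay_hours` is a hypothetical helper.

```python
def replay_hours(retained_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Estimate the hours needed to read the full log at a sustained throughput."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return retained_bytes / throughput_bytes_per_sec / 3600


TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte

# Hypothetical inputs: 10 TB of retained history, 200 MB/s aggregate replay rate.
hours = replay_hours(10 * TB, 200 * MB)
print(f"Estimated full replay: {hours:.1f} hours")
```

With these assumed inputs the replay takes roughly 14 hours. In practice the bottleneck may be the processing job or the serving store rather than the log itself, so the real figure can be higher.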

When to Use Kappa

Kappa is a good fit when:

  • Your use case is primarily stream-oriented
  • You want to minimize operational complexity
  • You have a reliable, replayable message queue in place

See also: Lambda Architecture and Delta Architecture.