Session Type
Breakout Session
Name
Towards Transactional Buffering of Change Data Capture Events
Date
Wednesday, May 21, 2025
Time
3:00 PM - 3:45 PM
Location Name
Breakout Room 3
Description

Data pipelines built on top of change data capture (CDC) are gaining traction and power many real-time applications. CDC solutions typically propagate captured data changes as separate events, which downstream systems then consume one by one and as is. In this talk, we take a deep dive into CDC pipelines for transactional systems to understand how directly consuming individually published CDC events affects data consistency on the sink side of a data flow. In particular, we'll learn why the lack of transactional boundaries in change event streams can lead to temporarily inconsistent state (such as partial updates from multi-table transactions) that never existed in the source database.

A promising way to mitigate this issue is to aggregate CDC events based on their original transactional context. To demonstrate the practical aspects of this approach, we'll go through a concrete end-to-end example showing:

  • how to configure Debezium to enrich captured change events from a relational database with transaction-related metadata
  • an experimental Apache Flink stream processing job to buffer CDC events based on transactional boundaries
  • a bespoke downstream consumer to atomically apply transactional CDC event buffers to a target system
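The following is a minimal plain-Java sketch of the buffering idea. It is not the Flink job or the consumer presented in the session. It assumes simplified, hypothetical record types (`ChangeEvent`, `TxMarker`) loosely modeled on the transaction metadata Debezium emits when the connector property `provide.transaction.metadata` is enabled: BEGIN/END boundary events, where END reports the transaction's `event_count`, plus a transaction id attached to each data change event. A single source transaction moves 30 units from a customer's checking row to their savings row (two tables). One sink applies events one by one; the other buffers them per transaction and applies the batch once the END marker's count is reached. In an actual Flink job, the per-transaction buffer would more likely live in keyed state keyed by transaction id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class TxBufferingSketch {

    // Hypothetical, simplified stand-ins for Debezium records (not Debezium classes).
    interface CdcRecord {}

    // A data change event tagged with the id of the source transaction that produced it.
    record ChangeEvent(String txId, String table, String key, long newBalance) implements CdcRecord {}

    // A transaction boundary marker: status is "BEGIN" or "END"; eventCount is only meaningful on END.
    record TxMarker(String txId, String status, int eventCount) implements CdcRecord {}

    // Collects change events per transaction until the END marker's event count is reached.
    static final class TxBuffer {
        private final Map<String, List<ChangeEvent>> pending = new HashMap<>();
        private final Map<String, Integer> expected = new HashMap<>();

        Optional<List<ChangeEvent>> onEvent(ChangeEvent e) {
            pending.computeIfAbsent(e.txId(), id -> new ArrayList<>()).add(e);
            return tryRelease(e.txId());
        }

        Optional<List<ChangeEvent>> onMarker(TxMarker m) {
            if ("END".equals(m.status())) {
                expected.put(m.txId(), m.eventCount());
                return tryRelease(m.txId());
            }
            return Optional.empty(); // BEGIN carries no count, so nothing can be released yet
        }

        // Data events and the END marker may arrive in either order, so check after both.
        private Optional<List<ChangeEvent>> tryRelease(String txId) {
            Integer count = expected.get(txId);
            List<ChangeEvent> events = pending.getOrDefault(txId, List.of());
            if (count == null || events.size() < count) {
                return Optional.empty();
            }
            expected.remove(txId);
            pending.remove(txId);
            return Optional.of(events);
        }
    }

    // Stand-in for one sink-side database transaction: the whole batch becomes visible at once.
    static void applyAtomically(Map<String, Long> sink, List<ChangeEvent> tx) {
        Map<String, Long> staged = new HashMap<>();
        for (ChangeEvent e : tx) {
            staged.put(e.table() + "/" + e.key(), e.newBalance());
        }
        sink.putAll(staged);
    }

    static long total(Map<String, Long> sink) {
        return sink.values().stream().mapToLong(Long::longValue).sum();
    }

    public record Result(List<Long> naiveTotals, List<Long> bufferedTotals,
                         Map<String, Long> naiveSink, Map<String, Long> bufferedSink) {}

    // Replays one source transaction (moving 30 from checking to savings) against two sinks.
    public static Result simulate() {
        Map<String, Long> naive = new HashMap<>(Map.of("checking/42", 100L, "savings/42", 50L));
        Map<String, Long> buffered = new HashMap<>(naive);
        List<CdcRecord> stream = List.of(
                new TxMarker("tx-1", "BEGIN", 0),
                new ChangeEvent("tx-1", "checking", "42", 70L),
                new ChangeEvent("tx-1", "savings", "42", 80L),
                new TxMarker("tx-1", "END", 2));

        TxBuffer buffer = new TxBuffer();
        List<Long> naiveTotals = new ArrayList<>();
        List<Long> bufferedTotals = new ArrayList<>();
        for (CdcRecord r : stream) {
            if (r instanceof ChangeEvent e) {
                naive.put(e.table() + "/" + e.key(), e.newBalance()); // applied one by one, as is
                naiveTotals.add(total(naive));
                buffer.onEvent(e).ifPresent(tx -> applyAtomically(buffered, tx));
            } else if (r instanceof TxMarker m) {
                buffer.onMarker(m).ifPresent(tx -> applyAtomically(buffered, tx));
            }
            bufferedTotals.add(total(buffered));
        }
        return new Result(naiveTotals, bufferedTotals, naive, buffered);
    }

    public static void main(String[] args) {
        Result r = simulate();
        System.out.println("naive totals:    " + r.naiveTotals());
        System.out.println("buffered totals: " + r.bufferedTotals());
    }
}
```

Running `main` prints `[120, 150]` for the naive sink: after the first event, the total of 120 reflects a state that never existed in the source database. The buffered sink stays at 150 after every record. Checking for release on both data events and the END marker matters because, when change events and boundary events are read from separate topics, their relative order is not guaranteed.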

If you have ever wondered how to tackle the often-neglected problem of temporarily inconsistent state when consuming change event streams originating from relational databases, this session is for you!

Speaker
Hans-Peter Grahsl
Level
Intermediate
Target Audience
Architect, Developer
Tags
Apache Flink, Event-Driven Systems, Integration, Stream Processing