Session Type
Breakout Session
Name
Blur the line between real-time and batch with Apache Kafka, Druid, and Iceberg
Date
Wednesday, May 21, 2025
Time
9:00 AM - 9:45 AM
Location Name
Breakout Room 5
Description

Ever since Apache Kafka spearheaded the real-time revolution, there has been a real-time vs. batch divide in the data engineering community. The tools, architectures, and mindsets were so different that most people worked with one or the other, and companies effectively had to maintain two data engineering teams to meet their data processing needs.

The rise of Apache Iceberg is bringing a dramatic shift to this landscape. Batch data powerhouses like Snowflake and Databricks are racing to adopt Iceberg support, streaming tools like Apache Flink have followed, and Confluent, arguably the leader in real-time data, has embraced Iceberg with its Tableflow product. Now real-time databases like Apache Druid are integrating Iceberg as well, so that we can query both our real-time and batch data with a single tool, often in a single query. I believe we really are seeing a revolution in data engineering.

In this session, we'll look at three key players in this revolution: Kafka, Druid, and Iceberg. We'll start with a brief introduction to each tool and then examine example architectures that let us get the most value from our data, regardless of how old it is. Finally, we'll talk about where this might be heading and how we, as data engineers, can thrive in this brave new world. My hope is that you'll leave this session with an understanding of key tools, architectural patterns, and ways of looking at data that will equip you to more efficiently deliver the quality data your organization needs.

Speaker
Dave Klein
Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer
Tags
Apache Kafka