Session Type
Breakout Session
Name
Blur the line between real-time and batch with Apache Kafka, Druid, and Iceberg
Date
Wednesday, May 21, 2025
Time
9:00 AM - 9:45 AM
Location Name
Breakout Room 5
Description

Ever since Apache Kafka spearheaded the real-time revolution, there has been a real-time vs. batch divide in the data engineering community. The tools, architectures, and mindsets were so different that most people worked with one or the other, and companies effectively had to maintain two data engineering teams to meet their data processing needs.

The rise of Apache Iceberg is bringing a dramatic shift to this landscape. Batch data powerhouses like Snowflake and Databricks are racing to adopt Iceberg support, streaming tools like Apache Flink have followed, and Confluent, arguably the leader in real-time data, has embraced Iceberg with its Tableflow product. Now real-time databases like Apache Druid are integrating Iceberg as well, so that we can query both our real-time and batch data with a single tool, often in a single query. I believe we really are seeing a revolution in data engineering.

In this session, we'll look at three key players in this revolution: Kafka, Druid, and Iceberg. We'll start with a brief introduction to each tool and then examine example architectures that let us get the most value from our data, regardless of how old it is. Finally, we'll talk about where this might be heading and how we, as data engineers, can thrive in this brave new world. My hope is that you'll leave this session with an understanding of key tools, architectural patterns, and ways of looking at data that will equip you to more efficiently deliver the quality data your organization needs.

Speaker
Dave Klein
Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer
Tags
Apache Kafka