Session Type
Breakout Session
Name
Streaming with Apache Iceberg
Date
Tuesday, May 20, 2025
Time
1:00 PM - 1:45 PM
Location Name
Breakout Room 4
Description
Streaming data is a critical component of modern data architectures. This talk explores how to determine your streaming needs and design a robust solution using Apache Iceberg, a next-generation table format built for flexibility and scalability. We’ll dive into the foundational tools that enable streaming pipelines, including Apache Flink, Apache Kafka, Debezium, Kafka Connect, and Apache Spark, and break down their roles and use cases in processing, transporting, and transforming streaming data. The talk will also highlight Iceberg-specific considerations, such as managing compaction to optimize query performance and handling the delete files that record-level updates and deletes produce. Whether you’re building real-time analytics, powering machine learning models, or streaming raw data into your data lakehouse, this session will provide actionable insights and best practices for building reliable and efficient streaming workflows with Apache Iceberg.
Speaker
Will Martin
Level
Introductory
Target Audience
Architect, Data Engineer/Scientist, Developer
Industry
Advertising/Media, Banking/Finance, Education, Energy/Utilities, Entertainment, Gaming, Healthcare, Hospitality, Government, Retail/E-Commerce, Technology, Manufacturing, Telecommunications, Transportation, Insurance, IT
Tags
Analytics, Apache Flink, Apache Iceberg, Apache Kafka, Architecture, Kafka Connect, CDC, Kafka Streams