Session Type
Breakout Session
Name
Tableflow: Not Just Another Kafka-to-Iceberg Connector!
Date
Tuesday, May 20, 2025
Time
4:30 PM - 5:15 PM
Location Name
Breakout Room 3
Description

Ingesting data from Apache Kafka into Apache Iceberg presents a recurring challenge in modern ETL workflows. The conventional approach relies on connectors, yet this method introduces operational hurdles due to the fundamental differences between these systems. Kafka excels at real-time streaming workloads, while Iceberg is optimized for analytical data storage and batch ingestion. Bridging these paradigms creates several inefficiencies:

  1. Batch Operations on Streaming Storage: Running batch operations against Kafka, a system designed for streaming, creates ingestion bottlenecks and puts extra load on the brokers. One example is initial table hydration, where retrieving historical data often means uncached reads. These reads significantly delay topic-to-table hydration and degrade broker performance, which hurts most in latency-sensitive environments.
  2. Streaming Operations on Batch Storage: Applying streaming-like ingestion patterns to Iceberg generates numerous small Parquet files. These files pollute Iceberg’s metadata, degrade query performance, and increase the need for maintenance operations.
  3. Lack of Unified Table Maintenance: Frequent commits of small files containing updates conflict with maintenance operations running in the background, leading to wasted retries.
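
To make the small-file effect in item 2 concrete, here is a back-of-the-envelope sketch in Python. It assumes, hypothetically, that every commit writes exactly one Parquet file per table partition. The function names, throughput, and commit intervals are illustrative and are not taken from Tableflow or from any connector.

```python
SECONDS_PER_DAY = 86_400


def files_per_day(commit_interval_s: int, partitions: int) -> int:
    """Data files written per day, assuming one file per partition per commit."""
    commits = SECONDS_PER_DAY // commit_interval_s
    return commits * partitions


def avg_file_size_mb(throughput_mb_s: float, commit_interval_s: int, partitions: int) -> float:
    """Average file size when the ingested data is spread evenly across partitions."""
    return throughput_mb_s * commit_interval_s / partitions


if __name__ == "__main__":
    # Hypothetical workload: 4 MB/s into a table with 4 partitions.
    for interval in (10, 600):
        count = files_per_day(interval, partitions=4)
        size = avg_file_size_mb(4.0, interval, partitions=4)
        print(f"commit every {interval:>3}s: {count} files/day, ~{size:.0f} MB each")
```

Under these assumptions, committing every 10 seconds yields 60 times more files than committing every 10 minutes, each 60 times smaller, and every one of them adds an entry to the table's metadata.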

In this talk, Alex will share insights and lessons learned from building Tableflow, a unified batch/streaming storage system that allowed us to address all three problems. He will talk about specific solutions implemented in the Kora storage engine that mitigate these issues, making both systems work cohesively. Attendees will gain actionable knowledge on overcoming operational challenges, implementing innovative solutions, and designing scalable pipelines that maximize the potential of both Kafka and Iceberg.

Alex Sorokoumov
Level
Advanced
Target Audience
Architect, Developer, Operator/Administrator
Industry
Technology
Tags
Apache Iceberg, Apache Kafka, Architecture, Storage