Session Type
Breakout Session
Name
The Latency-Cost equation for Disaggregated Architectures
Date
Tuesday, May 20, 2025
Time
3:30 PM - 4:15 PM
Location Name
Breakout Room 2
Description
There's a shift towards disaggregated architectures built on object storage and open table formats. The benefits of this new paradigm include cost efficiency, avoidance of vendor lock-in, standardization, and proper governance with a single source of truth. However, there are also challenges. Most of our systems were designed to work with physical disks and come with optimization and debugging methods built around them. Object storage works very differently from physical disks and requires a new set of capabilities to minimize latency and reduce cloud costs. In this talk, Antón will share the lessons learned from moving data and systems from block storage to object storage. Using Apache Flink, a popular stream processing engine often used for data lake ingestion, as a case study, we'll start with an overview of Iceberg and its pluggable FileIO module for reading, writing, and deleting files. We'll continue with the journey of cost optimization with the Flink File Connector. Then we'll delve into the creation of a custom Flink connector for object storage that addresses the limitations of the built-in File Connector. This custom connector uses techniques like metadata synchronization and optimized partitioning to reduce the number of requests without introducing additional latency. This talk is ideal for data engineers and architects who are building data lakes on object storage and using Apache Flink for data processing. You'll learn practical strategies and best practices for optimizing performance and cost in disaggregated architectures, including how to build custom Flink connectors tailored to object storage.
Antón Rodríguez
Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer, Operator/Administrator, Executive (Technical), Executive (Non-technical)
Industry
Technology, Telecommunications, IT
Tags
Apache Flink, Architecture, Cloud, Integration, Operations, Storage, Stream Processing, Systems, Tales from the trenches, Apache Iceberg