Session Type
Breakout Session
Name
Five Performance Optimization Techniques you need in a Lakehouse (Iceberg, Hudi, Delta)
Date
Tuesday, May 20, 2025
Time
5:30 PM - 6:15 PM
Location Name
Breakout Room 3
Description

Optimizing performance for large-scale datasets stored in table formats such as Delta Lake, Apache Iceberg, and Apache Hudi is a tough problem. As data volumes grow, querying efficiently requires deliberate tuning and optimization strategies. Your queries might perform well today, but they may not stay fast, because over time:

  • Query patterns evolve.
  • New, more complex queries are introduced.
  • Poorly organized or excessively small files accumulate and slow things down.

That’s why it’s essential to adopt techniques that structure your data effectively in storage. The goal is simple: reduce the number of files your query engine has to scan. After all, the less data you read, the faster your queries can run! In this session, we will go over five optimization methods (partitioning, compaction, clustering, cleaning, and data skipping) that apply to open table formats and enhance query performance, based on real-world learnings.
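
To make the "scan fewer files" goal concrete, here is a minimal sketch of the idea behind data skipping, assuming each data file carries min/max statistics for a column. The `DataFile` class, the `prune_files` helper, and the file names are hypothetical and only illustrate the concept; they are not the API of Iceberg, Hudi, or Delta Lake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical per-file metadata: a path plus min/max stats for one column."""
    path: str
    min_ts: int
    max_ts: int


def prune_files(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


# Illustrative (made-up) file layout where each file covers a disjoint range.
files = [
    DataFile("part-000.parquet", 0, 99),
    DataFile("part-001.parquet", 100, 199),
    DataFile("part-002.parquet", 200, 299),
]

# A query filtering on 120 <= ts <= 150 only needs to open one of the three files.
to_scan = prune_files(files, 120, 150)
print([f.path for f in to_scan])
```

Pruning is only effective when the per-file ranges overlap as little as possible, which is the kind of layout that partitioning, clustering, and compaction aim to produce.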

Dipankar Mazumdar
Level
Introductory
Target Audience
Architect, Data Engineer/Scientist
Tags
Apache Iceberg, Architecture, Storage