Session Details: Current London 2025

Session Type

Breakout Session

Name

Five Performance Optimization Techniques you need in a Lakehouse (Iceberg, Hudi, Delta)

Date

Tuesday, May 20, 2025

Time

5:30 PM - 6:15 PM

Location Name

Breakout Room 3

Description

Optimizing performance on large-scale datasets stored in table formats such as Delta Lake, Apache Iceberg & Apache Hudi is a tough problem to solve. As data volumes grow, keeping queries efficient requires deliberate tuning and optimization strategies. Queries that perform well today may not stay fast, because over time:

Query patterns evolve.
New, more complex queries are introduced.
Poorly organized or excessively small files can slow things down.

That’s why it’s essential to adopt techniques that structure your data effectively in storage. The goal is simple: reduce the number of files your query engine has to scan. After all, the less data you read, the faster your queries can run! In this session, we will go over five optimization methods that apply to open table formats (partitioning, compaction, clustering, cleaning, and data skipping) and how each one improves query performance, based on real-world learnings.
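
As a rough illustration of the "scan fewer files" goal, the sketch below mimics data skipping with per-file min/max statistics. It is not the API of Iceberg, Hudi, or Delta Lake: the `DataFile` record, the `files_to_scan` helper, the `ts` column, and the file names are all hypothetical, and real table formats keep comparable statistics in their own metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical per-file metadata: path plus min/max of a 'ts' column."""
    path: str
    min_ts: int
    max_ts: int


def files_to_scan(files, lo, hi):
    """Return only the files whose [min_ts, max_ts] range overlaps [lo, hi].

    Files that cannot contain a matching row are skipped without being opened.
    """
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


# Illustrative file listing (made-up names and ranges).
files = [
    DataFile("part-0.parquet", 0, 99),
    DataFile("part-1.parquet", 100, 199),
    DataFile("part-2.parquet", 200, 299),
]

# A filter such as "ts BETWEEN 120 AND 150" only needs one of the three files.
selected = files_to_scan(files, 120, 150)
```

The pruning only helps when the min/max ranges are narrow and rarely overlap, which is what organizing the data in storage (for example by partitioning or clustering on the filtered column) is meant to achieve.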

Speakers

Dipankar Mazumdar, Onehouse

Level

Introductory

Target Audience

Architect, Data Engineer/Scientist

Tags

Apache Iceberg, Architecture, Storage