Session Type
Breakout Session
Name
Towards an Open Apache Kafka Performance Model
Date
Wednesday, May 21, 2025
Time
3:00 PM - 3:45 PM
Location Name
Breakout Room 7
Description

Imagine having a powerful, customizable model that brings the end-to-end flow of records through your Kafka applications and clusters to life. Picture a tool that allows you to swiftly and affordably understand and predict the performance, scalability, and resource demands of your entire system. With this model, you can explore “what if” scenarios, such as changes to workloads, application and cluster hardware, Kafka configurations, and even dependencies on external systems.

This vision is closer than you think. In this talk, we’ll introduce a simple Kafka performance model and demonstrate how to apply it to Kafka tiered storage sizing. Whether you use SSD or EBS for local storage or S3 for remote storage, the model can predict I/O and network requirements, the size and number of brokers, and storage space needs.
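To make the idea of a sizing model concrete, here is a minimal back-of-the-envelope sketch. It is not the model presented in the talk: the formulas, the class and function names, and the 70% headroom factor are all illustrative assumptions. It assumes load is spread evenly across brokers, every replica writes each record to local storage, only one copy of each segment is uploaded to S3 (which handles its own durability), consumers are served from the page cache so their reads add network load but not disk load, and the broker count is set by whichever resource is most constrained. The two retention parameters roughly correspond to Kafka’s `local.retention.ms` and `retention.ms` topic configs.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description (rates in MB/s, retention in hours)."""
    ingress_mb_s: float                # producer write rate into the cluster
    fanout: int                        # consumer groups reading every record
    replication_factor: int = 3
    local_retention_h: float = 12.0    # hot data kept on broker SSD/EBS
    remote_retention_h: float = 168.0  # total retention, served from S3


@dataclass
class BrokerSpec:
    """Hypothetical usable capacity of a single broker."""
    network_mb_s: float  # per direction, assuming full-duplex networking
    disk_mb_s: float     # local write throughput
    disk_gb: float       # local storage capacity


def size_cluster(w: Workload, b: BrokerSpec, headroom: float = 0.7) -> dict:
    """Estimate cluster-wide resource demand and the broker count (a sketch)."""
    rf = w.replication_factor
    # Each record arrives once from a producer and (rf - 1) more times at followers.
    net_in = w.ingress_mb_s * rf
    # Each record is sent to every consumer group, to (rf - 1) followers, and once to S3.
    net_out = w.ingress_mb_s * (w.fanout + (rf - 1) + 1)
    # Every replica persists every record on local storage.
    disk_write = w.ingress_mb_s * rf
    # MB/s * seconds gives MB; divide by 1000 for (decimal) GB.
    local_gb = w.ingress_mb_s * 3600 * w.local_retention_h * rf / 1000
    remote_gb = w.ingress_mb_s * 3600 * w.remote_retention_h / 1000
    # Brokers needed per resource, using only `headroom` of each broker's capacity.
    needs = [
        net_in / (b.network_mb_s * headroom),
        net_out / (b.network_mb_s * headroom),
        disk_write / (b.disk_mb_s * headroom),
        local_gb / (b.disk_gb * headroom),
    ]
    # Never fewer brokers than the replication factor.
    brokers = max(rf, math.ceil(max(needs)))
    return {
        "network_in_mb_s": net_in,
        "network_out_mb_s": net_out,
        "disk_write_mb_s": disk_write,
        "local_storage_gb": local_gb,
        "remote_storage_gb": remote_gb,
        "brokers": brokers,
    }


if __name__ == "__main__":
    workload = Workload(ingress_mb_s=100, fanout=2)
    broker = BrokerSpec(network_mb_s=500, disk_mb_s=250, disk_gb=4000)
    print(size_cluster(workload, broker))
```

With 100 MB/s of ingress, two consumer groups, and the hypothetical broker spec in the example, the sketch returns 5 brokers, driven by local storage rather than network or disk throughput. Changing one input at a time (shorter local retention, larger disks, more consumer groups) is a cheap way to explore “what if” scenarios.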

But this is just the beginning. We’ll unveil the potential of a fully featured open Kafka performance model. Discover how it could work, what it could do, and the approaches we’re investigating to build and parameterize it. These include benchmarking workloads separately, applying multivariate regression over metrics from our largest managed Kafka clusters, leveraging Kafka client metrics (KIP-714), and utilizing OpenTelemetry traces. For visualization, we’re exploring Sankey Diagrams and integrating OpenTelemetry data into an open-source GUI.
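The regression approach can be illustrated with a small, dependency-free sketch. It is a generic ordinary least squares fit (one common way to regress a resource metric on several input metrics), not the speakers’ actual method; the function names, metric names, units, and sample values are hypothetical. It models broker CPU utilization as a linear function of message rate and byte throughput, solving the normal equations with Gaussian elimination.

```python
from typing import Sequence


def fit_linear(rows: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    Each row holds one observation's input metrics; an intercept column of 1s
    is prepended, so beta[0] is the baseline and beta[1:] the per-metric costs.
    """
    X = [[1.0, *map(float, r)] for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Augmented matrix [X^T X | X^T y], solved by elimination with partial pivoting.
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: metrics collinear or too few samples")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = a[i][k] - sum(a[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / a[i][i]
    return beta


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    """Apply fitted coefficients to a new observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


if __name__ == "__main__":
    # Hypothetical samples: (thousands of messages/s, MB/s) -> broker CPU %.
    metrics = [(1, 10), (2, 30), (3, 20), (4, 50), (5, 40)]
    cpu = [12.0, 24.0, 21.0, 38.0, 35.0]
    coeffs = fit_linear(metrics, cpu)
    print("coefficients:", coeffs)
    print("predicted CPU % at (6, 60):", predict(coeffs, (6, 60)))
```

A real pipeline would use a numerical library and many more metrics; the normal equations can also be ill-conditioned when metrics are strongly correlated, and the pivot check in the sketch only catches (near-)singular systems.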

Our goal is to democratize access to an open Kafka performance model, empowering anyone using, developing, or running Apache Kafka clusters and applications. This model will help predict end-to-end application performance, client and cluster resources, and performance SLAs. It will also aid in capacity planning, cluster sizing and resizing, and understanding dynamic changes for variable workloads, elastic cluster resizing, cluster failures, maintenance operations, and more. The scope could even expand to include Kafka stream processing, multiple clusters, and heterogeneous integration scenarios with Kafka Connect.

Speaker
Paul Brebner
Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer, Executive (Technical), Operator/Administrator
Industry
IT, Technology
Tags
Apache Kafka, Architecture, Integration, Operations, Storage, Systems