Session Type
Breakout Session
Name
Kafka Topics as a Service in Data Mesh at Netflix
Date
Tuesday, May 20, 2025
Time
4:15 PM - 5:15 PM
Location Name
Breakout Room 5
Description

Datamesh at Netflix currently manages ~130 multi-tenant Kafka clusters in production, which together process ~120 gigabytes of data per second. Managing Kafka topics in multi-tenant environments presents significant challenges, particularly around scalability, diversity of use cases, and resource optimization. This talk introduces the architecture implemented by Datamesh to automatically provision and manage Kafka topics across multi-tenant managed clusters.

Key components of our service architecture include:

1. Bulkheading: Clusters are segmented along three critical dimensions: throughput, position in the streaming pipeline, and use case requirements. This bulkhead approach ensures that each cluster is optimized for its workload, enhancing efficiency and reliability.
2. Dynamic Topic Provisioning: The service automates the creation and configuration of Kafka topics, tailored to the specific needs of each bulkhead.
3. Robust Topic Mapping: A simple mapping system enables producers to automatically discover the appropriate cluster for their topics.
4. Seamless Topic Migration: Our architecture supports transparent topic moves between clusters, minimizing disruption to producers and consumers and maintaining high availability. This allows us to continuously rebalance traffic across our clusters in a cost-efficient manner.

This presentation will delve into the technical details of our service architecture. We will discuss the simplicity of the user-facing abstraction as well as the scalability, efficiency, and reliability benefits for multi-tenant Kafka deployments.
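To make the bulkheading, topic mapping, and topic migration ideas above more concrete, here is a minimal Python sketch. It is an illustration only and not the Datamesh implementation: the `Bulkhead` and `TopicDirectory` classes, the tier and stage names, the cluster names, and the "fewest topics wins" placement rule are all hypothetical assumptions. The mapping is held in memory, and copying data or cutting over clients during a migration is out of scope.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Bulkhead:
    """Hypothetical bulkhead key built from the three dimensions in the abstract."""
    throughput: str      # illustrative tier, e.g. "high" or "low"
    pipeline_stage: str  # illustrative stage, e.g. "ingest" or "derived"
    use_case: str        # illustrative use case, e.g. "cdc" or "events"


@dataclass
class TopicDirectory:
    """Toy in-memory topic-to-cluster mapping (an assumption, not a real API)."""
    clusters: Dict[Bulkhead, List[str]] = field(default_factory=dict)
    placement: Dict[str, str] = field(default_factory=dict)

    def register_cluster(self, bulkhead: Bulkhead, cluster: str) -> None:
        """Add a cluster to the pool that serves one bulkhead."""
        self.clusters.setdefault(bulkhead, []).append(cluster)

    def provision(self, topic: str, bulkhead: Bulkhead) -> str:
        """Place a topic on the cluster in its bulkhead with the fewest topics."""
        if topic in self.placement:
            return self.placement[topic]  # provisioning is idempotent
        candidates = self.clusters.get(bulkhead)
        if not candidates:
            raise LookupError(f"no cluster registered for {bulkhead}")
        load = Counter(self.placement.values())
        cluster = min(candidates, key=lambda name: load[name])
        self.placement[topic] = cluster
        return cluster

    def lookup(self, topic: str) -> str:
        """Resolve the cluster a producer or consumer should connect to."""
        return self.placement[topic]

    def migrate(self, topic: str, target: str) -> str:
        """Repoint a topic to another known cluster and return the old one."""
        known = {name for pool in self.clusters.values() for name in pool}
        if target not in known:
            raise LookupError(f"unknown cluster {target}")
        previous = self.placement[topic]
        self.placement[topic] = target
        return previous


if __name__ == "__main__":
    directory = TopicDirectory()
    cdc = Bulkhead(throughput="high", pipeline_stage="ingest", use_case="cdc")
    directory.register_cluster(cdc, "kafka-a")
    directory.register_cluster(cdc, "kafka-b")
    directory.provision("orders", cdc)
    directory.provision("payments", cdc)
    directory.migrate("orders", "kafka-b")
    print(directory.lookup("orders"))
```

In this sketch, clients only ever call `lookup`, so a migration changes the answer they receive without changing the topic name they use. That is one plausible reading of the "transparent topic moves" the abstract describes.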

Timothy Farkas
Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer, Operator/Administrator
Industry
IT, Technology
Tags
Apache Flink, Apache Kafka, Architecture, CDC, Event-Driven Systems, Operations, Stream Processing, Systems