Session Details: Current London 2025

Lightning Talk (15 minutes)

Name

Ensuring Client Continuity in Kafka: High Availability in Confluent Kafka

Date

Wednesday, May 21, 2025

Time

11:00 AM - 11:15 AM

Location Name

Breakout Room 5

Description

Managing large-scale Kafka clusters is both a technical challenge and an art. At Trendyol, our Data Streaming team operates Kafka as the backbone of a vast event-driven ecosystem, ensuring stability and seamless client experiences. However, we faced recurring issues during broker restarts: applications experienced connectivity errors caused by misconfigured topics and improper bootstrap server configurations. To address this, we deployed Confluent Stretch Kafka across multiple data centers, so that leader elections happen automatically without service disruption. We also enforced topic creation and alter policies and built a custom Prometheus exporter that detects misconfigured topics in real time, allowing us to notify topic owners and take corrective action proactively. Through rigorous alerting and enforcement via our Internal Development Platform (IDP), we have eliminated disruptions during broker restarts, enabling smooth cluster upgrades and chaos testing. This session will provide practical insights into architecting resilient Kafka deployments, enforcing best practices, and ensuring high availability in a production environment handling thousands of clients.
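To make the idea of detecting misconfigured topics concrete, here is a minimal Python sketch. It is not the exporter described in the session. The `TopicInfo` type, the two rules, and the metric name `kafka_topic_misconfigured` are all assumptions chosen for illustration. The sketch flags topics that a single broker restart could disrupt and renders the result in the Prometheus text exposition format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicInfo:
    """Hypothetical snapshot of one topic's durability settings."""
    name: str
    replication_factor: int
    min_insync_replicas: int


def restart_risks(topic: TopicInfo) -> list[str]:
    """Return reasons why a single broker restart could disrupt this topic."""
    reasons = []
    if topic.replication_factor < 2:
        # Only one replica: while its broker is down the partition has no leader.
        reasons.append("single_replica")
    if topic.min_insync_replicas >= topic.replication_factor:
        # Losing any one replica drops the ISR below min.insync.replicas,
        # so producers using acks=all are rejected until the replica returns.
        reasons.append("min_isr_not_below_rf")
    return reasons


def render_metrics(topics: list[TopicInfo]) -> str:
    """Render flagged topics in the Prometheus text exposition format."""
    lines = [
        "# HELP kafka_topic_misconfigured Topic is at risk during a broker restart.",
        "# TYPE kafka_topic_misconfigured gauge",
    ]
    for topic in topics:
        for reason in restart_risks(topic):
            # Metric and label names are illustrative assumptions.
            lines.append(
                f'kafka_topic_misconfigured{{topic="{topic.name}",reason="{reason}"}} 1'
            )
    return "\n".join(lines) + "\n"
```

In a real exporter the `TopicInfo` values would come from the cluster's admin API and the text would be served over HTTP for Prometheus to scrape. Both steps are omitted here to keep the sketch self-contained.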

Attendees will learn:

How multi-DC Kafka clusters ensure client continuity
The impact of misconfigured replication factors and how to prevent such misconfigurations
How real-time monitoring and alerts reduce operational risks
Practical strategies to enforce resilient topic configurations
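
As an illustration of the last point, the sketch below shows the kind of rule a topic creation policy might enforce. Kafka brokers can load Java classes implementing `CreateTopicPolicy` and `AlterConfigPolicy` through the `create.topic.policy.class.name` and `alter.config.policy.class.name` settings. The session does not say how Trendyol's policies are written, so this Python version only models the decision logic. The thresholds, the `PolicyViolation` type, and the `validate_topic_request` function are hypothetical.

```python
class PolicyViolation(Exception):
    """Raised when a topic request breaks the (hypothetical) policy."""


# Assumed thresholds for illustration; choose values that fit the cluster.
MIN_REPLICATION_FACTOR = 3
MIN_INSYNC_REPLICAS = 2


def validate_topic_request(name: str, replication_factor: int,
                           configs: dict[str, str]) -> None:
    """Reject a topic request that could not survive one broker restart."""
    if replication_factor < MIN_REPLICATION_FACTOR:
        raise PolicyViolation(
            f"{name}: replication factor {replication_factor} "
            f"is below {MIN_REPLICATION_FACTOR}"
        )
    # Assumes the broker-level default of 1 when the topic sets no override.
    min_isr = int(configs.get("min.insync.replicas", "1"))
    if min_isr < MIN_INSYNC_REPLICAS:
        raise PolicyViolation(
            f"{name}: min.insync.replicas {min_isr} is below {MIN_INSYNC_REPLICAS}"
        )
    if min_isr >= replication_factor:
        # With min ISR equal to RF, one offline replica blocks acks=all writes.
        raise PolicyViolation(
            f"{name}: min.insync.replicas {min_isr} must be lower than "
            f"replication factor {replication_factor}"
        )
```

For example, `validate_topic_request("orders", 3, {"min.insync.replicas": "2"})` returns without error, while a request with a replication factor of 1 raises `PolicyViolation`.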

Speakers

Yalın Doğu Şahin, DSM GRUP DANIŞMANLIK İLETİŞİM VE SATIŞ TİCARET ANONİM ŞİRKETİ
Mehmetcan Güleşçi, Trendyol Group

Level

Intermediate

Audience

Architect, Developer, Operator/Administrator

Industry

IT, Retail/E-Commerce, Technology

Tags

Apache Kafka, Architecture, Event-Driven Systems, Integration, Operations, Systems