Session Type
Breakout Session
Name
Unified Schema Registry: Ensure schema consistency across data systems
Date
Wednesday, May 21, 2025
Time
1:00 PM - 1:45 PM
Location Name
Breakout Room 5
Description

Schema Registry is the backbone of safe schema evolution and efficient data transfer in the Kafka ecosystem. LinkedIn’s data infrastructure spans online APIs (gRPC/Rest.li), databases (MySQL/Oracle/Espresso/TiDB), and streaming and ingestion frameworks built on Kafka. From real-time ETL to OLAP systems like Pinot, from tracking and metrics pipelines feeding into Hadoop to offline jobs pushing insights into derived stores like Venice, Kafka is the key cog in our data flywheel.

As data moves through LinkedIn’s ecosystem, it traverses multiple schema languages and serialization formats, evolving sometimes seamlessly and sometimes through transformation. But what happens when an upstream schema change inadvertently breaks downstream systems? Traditional siloed validation, where each system checks only its own schemas, falls short, leading to data corruption, operational disruptions, and painful debugging.

This session dives into LinkedIn’s Universal Schema Registry (USR), a solution for large-scale schema validation:

1. End-to-End Schema Compatibility: USR validates schemas holistically across RPC, Kafka, Espresso, Venice, and more, safeguarding the entire data lineage.
2. CI-Integrated Early Validation: A shift-left approach catches issues at schema authoring time, saving thousands of developer hours and avoiding costly data migrations.
3. Multi-Format Schema Mapping: USR supports complex transformations across Proto, Avro, and more, enabling automated migrations like LinkedIn’s move from Rest.li to gRPC for online RPCs.

Join us as we share LinkedIn’s journey of building and scaling USR to handle millions of validations every week, the challenges faced, and the best practices that keep our data ecosystem resilient.

Souman Mandal, Sarthak Jain
Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer
Industry
IT, Retail/E-Commerce, Technology
Tags
Apache Flink, Apache Iceberg, Apache Kafka, Architecture, CDC, Data Catalog, Integration, Stream Processing