Session Details: Current London 2025

Session Type

Breakout Session

Name

Unified Schema Registry: Ensure schema consistency across data systems

Date

Wednesday, May 21, 2025

Time

1:00 PM - 1:45 PM

Location Name

Breakout Room 5

Description

Schema Registry is the backbone of safe schema evolution and efficient data transfer in the Kafka ecosystem. At LinkedIn, our data infrastructure spans online APIs (gRPC/Rest.li), databases (MySQL/Oracle/Espresso/TiDB), and streaming and ingestion frameworks built on Kafka. From real-time ETL to OLAP systems like Pinot, from tracking and metrics pipelines feeding into Hadoop to offline jobs pushing insights into derived stores like Venice, Kafka is the key cog in our data flywheel.

As data moves through LinkedIn’s ecosystem, it traverses multiple schema languages and serialization formats, evolving sometimes seamlessly and sometimes with transformation. But what happens when an upstream schema change inadvertently breaks downstream systems? Traditional siloed validation falls short, leading to data corruption, operational disruptions, and painful debugging.

Join us as we dive into LinkedIn’s Universal Schema Registry (USR) solution for seamless, large-scale schema validation:

1. End-to-End Schema Compatibility: USR validates schemas holistically across RPC, Kafka, Espresso, Venice, and more, safeguarding the entire data lineage.
2. CI-Integrated Early Validation: A shift-left approach catches issues at schema authoring time, saving thousands of developer hours and avoiding costly data migrations.
3. Multi-Format Schema Mapping: Supports complex transformations across Proto, Avro, and more, enabling automated migrations like LinkedIn’s move from Rest.li to gRPC for online RPCs.

We will share LinkedIn’s journey of building and scaling USR to handle millions of validations every week, the challenges faced, and the best practices that keep our data ecosystem resilient.
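
To make the end-to-end compatibility idea concrete, here is a minimal sketch of checking a proposed upstream schema against every downstream consumer before the change ships. It is an illustration only, not USR's implementation: the Field and Schema types, the validate_change function, the simplified compatibility rule (a consumer's field must exist upstream with the same type unless it has a default), and the system names in the example are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    # Hypothetical, format-neutral view of one schema field.
    name: str
    type: str
    has_default: bool = False


@dataclass
class Schema:
    # 'system' is a hypothetical label such as "kafka:<topic>" or "venice:<store>".
    system: str
    fields: List[Field]


def incompatibilities(writer: Schema, reader: Schema) -> List[str]:
    """Return reasons why 'reader' could not read data produced with 'writer'.

    Simplified rule (an assumption, not a full Avro/Proto resolution):
    every reader field must exist in the writer with the same type,
    unless the reader declares a default for it.
    """
    writer_fields = {f.name: f for f in writer.fields}
    problems = []
    for f in reader.fields:
        upstream = writer_fields.get(f.name)
        if upstream is None:
            if not f.has_default:
                problems.append(
                    f"{reader.system}: required field '{f.name}' "
                    f"is missing from {writer.system}"
                )
        elif upstream.type != f.type:
            problems.append(
                f"{reader.system}: field '{f.name}' expects {f.type} "
                f"but {writer.system} provides {upstream.type}"
            )
    return problems


def validate_change(proposed: Schema, consumers: List[Schema]) -> List[str]:
    """Check a proposed upstream schema against all downstream consumers."""
    problems = []
    for reader in consumers:
        problems.extend(incompatibilities(proposed, reader))
    return problems


# Hypothetical example: an upstream change drops 'memberId',
# which one downstream store still requires.
proposed = Schema("kafka:PageViewEvent", [Field("pageKey", "string")])
consumers = [
    Schema("venice:page_stats",
           [Field("pageKey", "string"), Field("memberId", "long")]),
    Schema("pinot:page_views",
           [Field("pageKey", "string"),
            Field("region", "string", has_default=True)]),
]

for problem in validate_change(proposed, consumers):
    print(problem)
```

Running a check of this kind in CI at schema authoring time is the shift-left idea from point 2: the author sees the broken downstream consumer before the change is merged, rather than after data has been written.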

Speakers

Souman Mandal, LinkedIn
Sarthak Jain, LinkedIn

Level

Intermediate

Target Audience

Architect, Data Engineer/Scientist, Developer

Industry

IT, Retail/E-Commerce, Technology

Tags

Apache Flink, Apache Iceberg, Apache Kafka, Architecture, CDC, Data Catalog, Integration, Stream Processing