Session Type
Breakout Session
Name
FlinkSQL Powered Asynchronous Data Processing in Pinterest’s Rule Engine Platform
Date
Tuesday, May 20, 2025
Time
4:30 PM - 5:15 PM
Location Name
Breakout Room 2
Description
Pinterest's rule engine platform, also known as Guardian, allows Subject Matter Experts (SMEs) to analyze real-time event streams for patterns of abuse and create rules to block those patterns. Guardian addresses a range of domain-specific challenges, including spam and fraud enforcement, Media Rating Council (MRC) requirements, account takeover (ATO) attacks, risk monitoring, and unsafe content enforcement fanout. However, the legacy Guardian platform was built on a monolithic architecture and cannot keep up with the data scale or with the increasing demands and risks faced by stakeholders.
To tackle these challenges, we redesigned Guardian as a next-generation platform with an event-driven architecture, choosing FlinkSQL for scalable event processing and integrating it with data storage systems such as Kafka, StarRocks, Iceberg, and an internal KV store, each of which caters to specific data access requirements. In this talk, we will share the design of the new system and what we learned while building it. Specifically, we'll focus on how FlinkSQL interacts with the different storage systems and how it supports asynchronous data processing needs, including stream splitting and pruning, data ingestion, rule enforcement, and rewind and replay.
Our revamped architecture has yielded significant improvements in scalability, efficiency, development velocity, and data compliance. Additionally, we will touch on ongoing efforts toward safe schema evolution, which has become more challenging under the event-driven design now that multiple storage systems and FlinkSQL are involved.
Speakers


Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer, Executive (Technical)
Industry
Advertising/Media
Tags
Event-Driven Systems, Stream Processing, Storage, Architecture, Apache Flink