At Pinterest, counters are at the core of feature engineering, enabling teams to uncover event patterns and turn those discoveries into actionable features. Our journey to build a robust counter framework surfaced several distinctive challenges:

1. The demand for a scalable architecture capable of managing hundreds of counters.
2. The ability to explore multiple window sizes, from a minute to a week, for the same counter with frequent updates, to gain richer and faster insights.
3. The continual onboarding of new counters to stay ahead of emerging trends.

In this session, we will delve into how we tackled these challenges by building a scalable and efficient real-time event counter framework with Apache Kafka, Apache Flink, and a wide-column store. Our approach involves a two-stage data processing layer:

- Stage 1: Flink jobs read event streams, apply filtering, enrich events with metadata outlining the aggregation logic, and write intermediate records to Kafka. Stateless FlinkSQL queries, dynamically generated from user-supplied SQL scripts, ensure seamless addition and swift deployment of new counters.
- Stage 2: A stateful Flink job consumes the intermediate records, computes counter results, and writes them to a wide-column store for online serving.

To support multiple window sizes with frequent updates, we leveraged a chain-of-window technique that efficiently cascades aggregated results from smaller to larger windows, thereby minimizing redundant computation and reducing data shuffling. We group counter results to emit multiple records in a single write. To avert write-traffic surges as windows close, a custom rate limiter intelligently spreads writes out over time. Together, these optimizations reduce write requests and avoid traffic spikes to the wide-column store, lowering costs and improving the stability of the overall system.

Attendees will gain insights into Flink's SQL and windowing functionalities for scalable stream processing in real-world applications.
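The chain-of-window idea above can be illustrated outside of Flink with a minimal sketch: instead of recomputing each window size from raw events, larger windows are built by summing the already-aggregated smaller buckets. The function name and data layout below are hypothetical, not Pinterest's actual implementation, and the chain assumes each window size evenly divides the next (e.g. 1 min → 5 min → 60 min).

```python
from collections import defaultdict

def cascade_windows(minute_counts, sizes_minutes=(1, 5, 60)):
    """Hypothetical chain-of-window rollup.

    minute_counts: dict mapping minute index -> event count (the
    smallest, pre-aggregated buckets).
    Returns {window_size_minutes: {window_start_minute: count}}.
    Each window size is computed from the previous (smaller) size's
    output, never from the raw events, so each event is touched once.
    """
    results = {}
    prev = dict(minute_counts)          # start from the 1-minute buckets
    for size in sizes_minutes:
        agg = defaultdict(int)
        for start, count in prev.items():
            # roll the smaller bucket up into its enclosing larger window
            agg[(start // size) * size] += count
        results[size] = dict(agg)
        prev = results[size]            # feed this level into the next one
    return results
```

For example, cascading `{0: 2, 1: 3, 5: 1, 61: 4}` yields 5-minute buckets `{0: 5, 5: 1, 60: 4}` and 60-minute buckets `{0: 6, 60: 4}`; the hourly totals are derived from three 5-minute buckets rather than four raw-event minutes, which is where the savings in computation and shuffling come from at scale.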
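The write-smoothing idea behind the custom rate limiter can likewise be sketched in a few lines: when a window closes with many pending counter writes, assign each write a target timestamp spread across a smoothing interval instead of flushing the whole batch at once. This is a simplified illustration of the concept, not the real Flink operator; the function name and parameters are assumptions.

```python
def schedule_writes(num_writes, window_close_ts, spread_seconds):
    """Hypothetical write scheduler: spread num_writes evenly over
    [window_close_ts, window_close_ts + spread_seconds) instead of
    issuing them all at window close, flattening the traffic spike
    seen by the wide-column store.

    Returns the list of target emit timestamps, one per write.
    """
    if num_writes == 0:
        return []
    gap = spread_seconds / num_writes   # constant inter-write spacing
    return [window_close_ts + i * gap for i in range(num_writes)]
```

For instance, 4 writes closing at t=100 with an 8-second smoothing interval are emitted at t=100, 102, 104, and 106, capping the store's observed rate at `num_writes / spread_seconds` instead of an instantaneous burst.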

