In this session, Team Yubi demonstrates how an intelligent streaming data pipeline built on Apache Kafka creates a unified analytical platform, delivering near real-time insights from a centralized Redshift data warehouse.

Business operations teams face challenges approving large-ticket trades because the required data is fragmented across multiple systems managed by different teams. Fetching and reconciling this data often means writing complex queries, expertise many operations teams lack, which delays due diligence and decision-making.

To solve this, we built a robust streaming data pipeline that centralizes these disparate data sources into Redshift. The pipeline uses Apache Kafka for streaming, Kubernetes for scalability, dbt for data transformations, and Redshift workload management (WLM) with data sharing for optimized query execution. Our custom Kafka sink connectors process data efficiently in two modes within a single flush cycle: snapshot, which replicates the source RDS, and CDC, which captures incremental changes (a simplified sketch follows below). This approach keeps the warehouse up to date, reduces ETL load, lowers infrastructure costs, and enables quick data refresh cycles.

The unified platform also lays the foundation for AI-based Text-to-SQL (TTS) capabilities, allowing teams to generate SQL queries from natural language for ad-hoc requests and reports. With real-time streaming in place, operations teams can process high-value transactions, disbursing amounts worth hundreds of crores, quickly and efficiently. The ability to seamlessly reinitiate actions after failures minimizes operational bottlenecks, keeps transaction workflows running smoothly, and reduces revenue impact. Join us to learn how real-time data streaming transforms operational efficiency and decision-making.
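To make the dual-mode flush concrete, here is a minimal Kafka Connect sink-task sketch of the idea, not Yubi's actual connector. The `op` header convention, the table names, the config keys, and the staged COPY/merge SQL are all assumptions for illustration; in a real connector the buffered rows would first be staged (e.g., to S3) before the COPY.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.header.Header;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

/**
 * Hypothetical dual-mode Redshift sink task: buffers snapshot and CDC
 * records separately, then applies both batches in one flush cycle.
 */
public class DualModeRedshiftSinkTask extends SinkTask {

    private final List<SinkRecord> snapshotBuffer = new ArrayList<>();
    private final List<SinkRecord> cdcBuffer = new ArrayList<>();
    private Connection redshift;

    @Override
    public void start(Map<String, String> props) {
        try {
            // JDBC URL and credentials come from connector config (key names assumed).
            redshift = DriverManager.getConnection(
                    props.get("redshift.jdbc.url"),
                    props.get("redshift.user"),
                    props.get("redshift.password"));
        } catch (SQLException e) {
            throw new ConnectException(e);
        }
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            // Assumed convention: an "op" header marks CDC events;
            // records without it are full-table snapshot rows.
            Header op = record.headers().lastWithName("op");
            if (op != null) {
                cdcBuffer.add(record);
            } else {
                snapshotBuffer.add(record);
            }
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        try (Statement stmt = redshift.createStatement()) {
            if (!snapshotBuffer.isEmpty()) {
                // Snapshot mode: bulk-load staged rows, then swap them in,
                // replicating the source RDS table wholesale. Uploading the
                // buffered rows to the staging location is omitted here.
                stmt.execute("COPY stage_orders FROM 's3://example-bucket/snapshot/' "
                        + "IAM_ROLE 'arn:aws:iam::123456789012:role/example' FORMAT AS PARQUET");
                stmt.execute("BEGIN; TRUNCATE orders; "
                        + "INSERT INTO orders SELECT * FROM stage_orders; COMMIT;");
                snapshotBuffer.clear();
            }
            if (!cdcBuffer.isEmpty()) {
                // CDC mode: merge incremental changes into the same target
                // within the same flush cycle (delete-then-insert upsert).
                stmt.execute("BEGIN; "
                        + "DELETE FROM orders USING stage_orders_cdc c WHERE orders.id = c.id; "
                        + "INSERT INTO orders SELECT * FROM stage_orders_cdc; COMMIT;");
                cdcBuffer.clear();
            }
        } catch (SQLException e) {
            throw new ConnectException(e);
        }
    }

    @Override
    public void stop() {
        try {
            if (redshift != null) redshift.close();
        } catch (SQLException ignored) {
            // Best-effort cleanup on task shutdown.
        }
    }

    @Override
    public String version() {
        return "0.1";
    }
}
```

Handling both modes in one flush cycle is what lets a fresh snapshot and the trailing change stream land in the warehouse together, so downstream dbt models and WLM-managed queries always see a consistent, current table.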

