Session Type
Lightning Talk
Name
Simplifying Real-Time Vector Store Ingestion with Apache Flink
Date
Wednesday, May 21, 2025
Time
11:00 AM - 11:15 AM
Location Name
Breakout Room 2
Description
Retrieval-Augmented Generation (RAG) has become a foundational paradigm that augments the capabilities of language models, small or large, by grounding their responses in information retrieved from vector databases. While the concept is straightforward, keeping embeddings up to date as data constantly evolves across various source systems remains a persistent challenge.
This lightning talk explores how to build a real-time vector ingestion pipeline on top of Apache Flink and its extensive connector ecosystem to keep vector stores continuously fresh. To eliminate the need for custom code while preserving a reasonable level of configurability, the talk discusses a handful of composable user-defined functions (UDFs) that handle loading, parsing, chunking, and embedding of data directly from within Flink's Table API or Flink SQL jobs.
Easy-to-follow examples demonstrate how this approach significantly lowers the barrier to entry for RAG adoption and keeps retrieval consistent with your latest knowledge.
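To give a rough idea of what the chunking step mentioned above involves, here is a minimal sketch of fixed-size character chunking with overlap in plain Python. It is not taken from the talk: the function name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical, and the UDFs presented in the session may use a different strategy entirely. In a Flink job, logic along these lines would presumably be wrapped in a table function so that one input row yields one output row per chunk.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap.

    Hypothetical helper for illustration only; not the API of any
    Flink UDF discussed in the talk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text,
        # so no trailing chunk is emitted that only repeats the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps some shared context between neighbouring chunks, which is a common way to avoid cutting a relevant passage in half before it is embedded.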
Speakers

Level
Intermediate
Target Audience
Data Engineer/Scientist, Developer, Architect
Tags
ML/AI Application, Apache Flink