Session Type
Breakout Session
Name
Scaling Semantic Search: Apache Kafka Meets Vector Databases
Date
Tuesday, May 20, 2025
Time
4:30 PM - 5:15 PM
Location Name
Breakout Room 1
Description
This talk presents a performance-tuned Apache Kafka pipeline for generating embeddings on large-scale text data streams. To store embeddings, our implementation supports several vector databases, making it adaptable to a wide range of applications.
Text embeddings are fundamental for semantic search and recommendation, representing text in high-dimensional vector spaces for efficient similarity search using approximate k-nearest neighbors (kNN). By storing these embeddings and providing semantic search results given a query, vector databases are central to retrieval-augmented generation systems.
We present our Kafka pipeline for continuously embedding texts to enable semantic search on live data. We demonstrate its end-to-end implementation while addressing key technical challenges:
- First, the pipeline performs text chunking to adhere to the maximum input sequence length of the embedding model. We use an optimized overlapping text chunking strategy to ensure that context is maintained across chunks.
- Using Hugging Face’s Text Embeddings Inference (TEI) toolkit in a lightweight, containerized GPU environment, we achieve efficient, scalable text embedding computation. TEI supports a wide range of state-of-the-art embedding models.
- As an alternative to Kafka Streams, our solution processes small batches directly with the Kafka consumer and producer client APIs, allowing batched API calls to TEI. Our benchmark results confirm this choice, showing significantly higher throughput and lower latency than a Kafka Streams-based implementation.
- Finally, Kafka Connect allows real-time ingestion into vector databases like Qdrant, Milvus, or Vespa, making embeddings instantly available for semantic search and recommendation.
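The overlapping chunking strategy from the first step can be sketched as follows. This is a minimal illustration, not the talk's actual implementation: the function name and the default window and overlap sizes are assumptions chosen for the example.

```python
def chunk_tokens(tokens, max_len=512, overlap=64):
    """Split a token sequence into windows of at most `max_len` tokens,
    where consecutive windows share `overlap` tokens so that context
    spanning a chunk boundary is preserved."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already covers the tail of the sequence
    return chunks

# Example: windows of 4 tokens with an overlap of 2
# chunk_tokens(list(range(10)), max_len=4, overlap=2)
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

In practice the same windowing would be applied to tokenizer output (token IDs) rather than raw characters, so each chunk respects the embedding model's maximum input sequence length.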
With Kafka’s high-throughput streaming, optimized interactions with GPU-accelerated TEI, and efficient vector serialization, our pipeline achieves scalable embedding computation and ingestion into vector databases.
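For the final ingestion step, a Kafka Connect sink configuration might look like the sketch below. The standard Connect keys (`connector.class`, `topics`, `tasks.max`, converters) are generic; the Qdrant-specific connector class and URL property are assumptions and should be checked against the connector's own documentation, and the topic and connector names are placeholders.

```json
{
  "name": "embeddings-qdrant-sink",
  "config": {
    "connector.class": "io.qdrant.kafka.QdrantSinkConnector",
    "topics": "text-embeddings",
    "tasks.max": "1",
    "qdrant.grpc.url": "http://qdrant:6334",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}
```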
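The micro-batching step can be illustrated with a small sketch. Here `embed_batch` is a hypothetical stand-in for a batched call to TEI's `/embed` HTTP endpoint (which accepts a JSON body of the form `{"inputs": [...]}`); the Kafka consumer poll and producer send that surround it in the real pipeline are omitted.

```python
def embed_in_batches(texts, embed_batch, batch_size=32):
    """Group texts into fixed-size batches and embed each batch with a
    single call, amortizing per-request overhead on the GPU server."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        embeddings.extend(embed_batch(batch))  # one API call per batch
    return embeddings
```

Batching requests this way is what lets the GPU-backed TEI server process many texts per forward pass instead of one request per record.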
Speakers


Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer, Operator/Administrator, Executive (Technical), Executive (Non-technical)
Tags
Analytics, Apache Kafka, Architecture, Event-Driven Systems, Integration, Kafka Connect, Kafka Streams, Microservices, ML/AI Application, ML Platform, Operations, Stream Processing