Session Type
Breakout Session
Name
Scaling Semantic Search: Apache Kafka Meets Vector Databases
Date
Tuesday, May 20, 2025
Time
4:30 PM - 5:15 PM
Location Name
Breakout Room 1
Description
This talk presents a performance-tuned Apache Kafka pipeline for generating embeddings on large-scale text data streams. To store embeddings, our implementation supports several vector databases, making it adaptable to a wide range of applications.
Text embeddings are fundamental for semantic search and recommendation, representing text in high-dimensional vector spaces for efficient similarity search using approximate k-nearest neighbors (kNN). By storing these embeddings and providing semantic search results given a query, vector databases are central to retrieval-augmented generation systems.
We present our Kafka pipeline for continuously embedding texts to enable semantic search on live data. We demonstrate its end-to-end implementation while addressing key technical challenges:
- First, the pipeline performs text chunking to adhere to the maximum input sequence length of the embedding model. We use an optimized overlapping text chunking strategy to ensure that context is maintained across chunks.
- Using Hugging Face’s Text Embeddings Inference (TEI) toolkit in a lightweight, containerized GPU environment, we achieve efficient, scalable text embedding computation. TEI supports a wide range of state-of-the-art embedding models.
- As an alternative to Kafka Streams, our solution processes small batches directly with the Kafka consumer and producer client APIs, allowing batched API calls to TEI. Our benchmark results confirm this choice, showing significantly higher throughput and lower latency than a Kafka Streams-based implementation.
- Finally, Kafka Connect allows real-time ingestion into vector databases like Qdrant, Milvus, or Vespa, making embeddings instantly available for semantic search and recommendation.
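The overlapping chunking strategy from the first step can be sketched as follows. This is a minimal illustration, not the talk's actual implementation: the function name and the default window and overlap sizes are assumptions chosen for the example.

```python
def chunk_tokens(tokens, max_len=512, overlap=64):
    """Split a token sequence into windows of at most `max_len` tokens,
    where consecutive windows share `overlap` tokens so that context
    spanning a chunk boundary is preserved."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already covers the tail of the sequence
    return chunks

# Example: windows of 4 tokens with an overlap of 2
# chunk_tokens(list(range(10)), max_len=4, overlap=2)
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

In practice the same windowing would be applied to tokenizer output (token IDs) rather than raw characters, so each chunk respects the embedding model's maximum input sequence length.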
With Kafka’s high-throughput streaming, optimized interactions with GPU-accelerated TEI, and efficient vector serialization, our pipeline achieves scalable embedding computation and ingestion into vector databases.
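For the final ingestion step, a Kafka Connect sink configuration might look like the sketch below. The standard Connect keys (`connector.class`, `topics`, `tasks.max`, converters) are generic; the Qdrant-specific connector class and URL property are assumptions and should be checked against the connector's own documentation, and the topic and connector names are placeholders.

```json
{
  "name": "embeddings-qdrant-sink",
  "config": {
    "connector.class": "io.qdrant.kafka.QdrantSinkConnector",
    "topics": "text-embeddings",
    "tasks.max": "1",
    "qdrant.grpc.url": "http://qdrant:6334",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}
```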
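The micro-batching step can be illustrated with a small sketch. Here `embed_batch` is a hypothetical stand-in for a batched call to TEI's `/embed` HTTP endpoint (which accepts a JSON body of the form `{"inputs": [...]}`); the Kafka consumer poll and producer send that surround it in the real pipeline are omitted.

```python
def embed_in_batches(texts, embed_batch, batch_size=32):
    """Group texts into fixed-size batches and embed each batch with a
    single call, amortizing per-request overhead on the GPU server."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        embeddings.extend(embed_batch(batch))  # one API call per batch
    return embeddings
```

Batching requests this way is what lets the GPU-backed TEI server process many texts per forward pass instead of one request per record.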
Speakers


Level
Intermediate
Target Audience
Architect, Data Engineer/Scientist, Developer, Operator/Administrator, Executive (Technical), Executive (Non-technical)
Tags
Analytics, Apache Kafka, Architecture, Event-Driven Systems, Integration, Kafka Connect, Kafka Streams, Microservices, ML/AI Application, ML Platform, Operations, Stream Processing