Datastores
The Cross-Dataset Discovery service relies on a Vector Database for fast retrieval.
Vector Database (Qdrant)
The core retrieval functionality is powered by Qdrant, a high-performance vector search engine.
- Technology: Qdrant
- Purpose: Stores embeddings and metadata for all datasets. Supports:
  - Dense Vectors: 1024-dimensional vectors (generated by BAAI/bge-m3) for semantic search.
  - Sparse Vectors: Term-frequency vectors for keyword search (BM25).
  - Hybrid Search: Combines dense and sparse results using Reciprocal Rank Fusion (RRF).
- Location: Runs as a separate service within the Kubernetes cluster. Data is persisted on a volume bound through a Persistent Volume Claim (PVC).
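
To illustrate how the hybrid search step merges dense and sparse results, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not the service's or Qdrant's actual implementation: the function name `reciprocal_rank_fusion`, the dataset IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list of (id, score), best first.

    Each ID receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: IDs ordered best-first by each retriever.
dense_hits = ["ds_climate", "ds_ocean", "ds_soil"]   # semantic (dense) search
sparse_hits = ["ds_ocean", "ds_air", "ds_climate"]   # keyword (sparse) search

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only rank positions, the dense similarity scores and the BM25 scores never need to be normalized onto a common scale. A dataset that ranks well in both lists (here `ds_ocean`) rises above one that tops only a single list.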