Datastores

The Cross-Dataset Discovery service relies on a Vector Database for fast retrieval.

Vector Database (Qdrant)

The core retrieval functionality is powered by Qdrant, a high-performance vector search engine.

  • Technology: Qdrant
  • Purpose: Stores embeddings and metadata for all datasets. Supports:
    • Dense Vectors: 1024-dimensional vectors (generated by BAAI/bge-m3) for semantic search.
    • Sparse Vectors: Term-frequency-based sparse vectors for BM25 keyword search.
    • Hybrid Search: Combines dense and sparse results using Reciprocal Rank Fusion (RRF).
  • Location: Runs as a separate service within the Kubernetes cluster. Data is persisted on a volume bound through a Persistent Volume Claim (PVC).
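
To illustrate how Reciprocal Rank Fusion merges the dense and sparse result lists, here is a minimal standalone sketch. It is not the service's or Qdrant's actual implementation: the function name, the dataset IDs, and the smoothing constant k=60 (a commonly used value) are assumptions made for illustration. Each document receives 1 / (k + rank) from every ranking in which it appears, and the contributions are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists into one list of (id, score) pairs.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant (assumed value; 60 is a common choice).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a dense and a sparse query.
dense_hits = ["ds_a", "ds_b", "ds_c"]
sparse_hits = ["ds_b", "ds_d", "ds_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only rank positions, it does not require the dense similarity scores and the BM25 scores to be on comparable scales, which is one reason it is a common choice for hybrid search.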