
Picking a Vector DB in 2026: A Decision Framework

By OrionAI Build Editorial · Published 2026-05-10 · // guide

The four questions that prune the field

Before you even glance at a vendor’s brochure, answer these four questions. They collapse the combinatorial space of vector stores into a handful of viable candidates.

  1. Scale. Current vector count and 12‑month growth projection. Distinguish sub‑million, 1‑10 M, and 10 M+ regimes.
  2. Latency budget. Measure P95 under realistic concurrency (your typical request rate, not an isolated benchmark). Note the ceiling you can tolerate before the downstream LLM call dominates.
  3. Filtering complexity. Pure vector similarity, heavy metadata predicates, or geospatial constraints? Identify the dominant pattern.
  4. Operations appetite. How many full‑time equivalents can you devote to provisioning, monitoring, upgrades, backups, and disaster recovery?
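The scale question is the easiest one to make mechanical. A minimal sketch, assuming growth compounds at a fixed monthly rate (the function names and the 10 % example rate are ours, not measurements):

```python
def project_vectors(current: int, monthly_growth: float, months: int = 12) -> int:
    """Compound the current vector count forward at a fixed monthly growth rate."""
    return round(current * (1 + monthly_growth) ** months)


def scale_regime(vector_count: int) -> str:
    """Map a vector count onto the three regimes named in question 1."""
    if vector_count < 1_000_000:
        return "sub-million"
    if vector_count <= 10_000_000:
        return "1-10 M"
    return "10 M+"


# Hypothetical workload: 400k vectors today, growing 10 % per month.
current = 400_000
projected = project_vectors(current, monthly_growth=0.10)
print(scale_regime(current), "->", scale_regime(projected))
```

Classify both the current and the projected count. If they fall into different regimes, plan for the larger one.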

Decision matrix distilled

Sub‑million vectors, minimal filters

Use pgvector inside an existing PostgreSQL cluster. The extension adds a vector column type, indexes with IVFFlat or HNSW, and leverages the same authentication, backups, and tooling you already run. Adding a separate service for fewer than 10⁶ vectors rarely yields a net benefit.
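To show how little setup this path needs, here is a sketch that only builds the SQL statements, so it runs without a database. The table name, column name, and dimension are hypothetical, and the index parameters are untuned starting points:

```python
def pgvector_statements(table: str, dims: int, index: str = "hnsw") -> list[str]:
    """Return SQL that adds a vector column and an ANN index to an existing table.

    `table` must be a trusted identifier: it is interpolated directly.
    The column name `embedding` is a placeholder for illustration.
    """
    if index == "hnsw":
        index_sql = (
            f"CREATE INDEX ON {table} USING hnsw (embedding vector_cosine_ops) "
            "WITH (m = 16, ef_construction = 64);"
        )
    elif index == "ivfflat":
        index_sql = (
            f"CREATE INDEX ON {table} USING ivfflat (embedding vector_cosine_ops) "
            "WITH (lists = 100);"
        )
    else:
        raise ValueError(f"unsupported index type: {index}")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"ALTER TABLE {table} ADD COLUMN embedding vector({dims});",
        index_sql,
    ]


# Nearest neighbours by cosine distance (the <=> operator); %s is a driver placeholder.
NEAREST_SQL = "SELECT id FROM documents ORDER BY embedding <=> %s::vector LIMIT 10;"

for statement in pgvector_statements("documents", dims=768):
    print(statement)
```

Run the statements through whatever migration tooling you already use for Postgres. That reuse is the main reason to pick this option.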

1 M–10 M vectors, light filtering

Self‑hosted Qdrant or Weaviate on a modest VM (2 vCPU, 8 GiB RAM, NVMe) typically costs $20–$60 / month on providers such as Hetzner or Vast.ai. Both expose REST search endpoints, support HNSW out‑of‑the‑box, and provide optional ONNX runtime acceleration. In our experience, pgvector begins to show >30 % slowdown around the 5 M‑vector mark unless you manually tune lists (for IVFFlat) or m and ef_construction (for HNSW).

1 M–10 M vectors, heavy metadata filtering

Qdrant’s “filterable HNSW” index lets you combine arbitrary boolean predicates with vector similarity in a single pass. This is essential when you need to enforce tenant isolation, product version gating, or time‑range cuts. Weaviate offers a similar GraphQL filter layer, but its performance degrades noticeably when the filter cardinality exceeds a few hundred thousand rows.
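As an illustration of such a filtered query, the sketch below builds the JSON body for Qdrant's REST search endpoint (POST /collections/{collection}/points/search) as we understand it. The payload field names tenant_id, product_version, and created_at are hypothetical. Check the request shape against the Qdrant version you run.

```python
import json


def qdrant_search_body(query_vector, tenant_id, min_version, since_ts, limit=10):
    """Build a search request that applies three metadata predicates at once.

    Every condition under "must" has to hold for a point to be returned.
    """
    return {
        "vector": query_vector,
        "limit": limit,
        "filter": {
            "must": [
                # Tenant isolation: exact match on a keyword field.
                {"key": "tenant_id", "match": {"value": tenant_id}},
                # Product version gating: numeric lower bound.
                {"key": "product_version", "range": {"gte": min_version}},
                # Time-range cut: Unix timestamp lower bound.
                {"key": "created_at", "range": {"gte": since_ts}},
            ]
        },
    }


body = qdrant_search_body([0.12, -0.03, 0.88], "acme", 3, 1_700_000_000)
print(json.dumps(body, indent=2))
```

Create payload indexes for the filtered fields, otherwise the predicates fall back to slower evaluation.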

10 M+ vectors

At this scale the operational overhead of self‑hosting outweighs the raw cost savings. Managed services (Qdrant Cloud, Pinecone, Weaviate Cloud) provide auto‑scaling, multi‑zone replication, and built‑in monitoring. Their pricing tiers start around $500 / month for 10 M vectors with guaranteed <10 ms P95 latency. Choose the provider whose SLA aligns with your availability requirements; the underlying algorithmic differences are marginal for most production workloads.
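The scale-and-filtering rows above fit in a small lookup table. This sketch encodes only the combinations the matrix states and returns None for anything else, so gaps stay visible instead of being guessed. The key labels are ours:

```python
# (scale regime, filtering pattern) -> candidate, taken from the sections above.
MATRIX = {
    ("sub-million", "minimal"): "pgvector in the existing PostgreSQL cluster",
    ("1-10 M", "light"): "self-hosted Qdrant or Weaviate",
    ("1-10 M", "heavy"): "self-hosted Qdrant (filterable HNSW)",
    ("10 M+", "any"): "managed service: Qdrant Cloud, Pinecone, or Weaviate Cloud",
}


def lookup(regime: str, filtering: str):
    """Return the matrix entry, or None when the matrix does not cover the case."""
    if regime == "10 M+":
        # Above 10 M the advice does not depend on the filtering pattern.
        return MATRIX[("10 M+", "any")]
    return MATRIX.get((regime, filtering))


print(lookup("1-10 M", "heavy"))
print(lookup("sub-million", "heavy"))  # not covered by the matrix
```

A None result is a prompt to run the 15‑minute sanity check rather than to extrapolate from a neighbouring cell.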

Multi‑tenant SaaS pattern

Namespaces (Pinecone) or collection isolation (Qdrant Cloud) let you partition data per customer without spinning up separate clusters. Self‑hosted multi‑tenant architectures demand per‑tenant resource quotas, quota‑aware routing, and a robust RBAC model, which in effect means a dedicated DevOps team. In practice, we have seen teams burn 0.5–1 FTE just to keep tenant isolation safe.

Geospatial‑aware search

If you need to combine distance‑based filters with vector similarity, pair pgvector with PostGIS. When the query planner applies the ST_DWithin predicate first, it shrinks the candidate set before vectors are ranked. Qdrant now ships a geo_filter primitive that indexes latitude/longitude in a separate HNSW layer; it performs well up to a few hundred thousand geo points. Pinecone’s geo support is experimental and currently lacks radius‑based queries.
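A sketch of the pgvector plus PostGIS pairing follows. It assumes a hypothetical places table with a geography(Point, 4326) column named location and a vector column named embedding. The named placeholders follow the psycopg style. Whether the spatial predicate is evaluated first depends on the planner, so confirm with EXPLAIN on your own data.

```python
# Radius filter plus cosine-distance ranking in one statement.
GEO_VECTOR_QUERY = """
SELECT id
FROM places
WHERE ST_DWithin(
    location,
    ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
    %(radius_m)s
)
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""


def geo_query_params(lon, lat, radius_m, query_vec, k=10):
    """Bundle parameters; the vector is sent in pgvector's '[x,y,...]' text form."""
    return {
        "lon": lon,            # ST_MakePoint takes x (longitude) first
        "lat": lat,
        "radius_m": radius_m,  # metres, because both arguments are geography
        "query_vec": "[" + ",".join(str(x) for x in query_vec) + "]",
        "k": k,
    }


params = geo_query_params(13.405, 52.52, 5000, [0.1, 0.2, 0.3])
print(params["query_vec"])
```

With a driver such as psycopg, the query and the parameter dictionary would be passed together to cursor.execute.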

Common optimisation rabbit holes

What teams obsess over rarely moves the needle in production.

Under‑measured factors that decide winners

These operational dimensions surface only after weeks of real usage.

The 15‑minute sanity check

Turn theory into data with a quick, reproducible experiment.

  1. Export 1,000 representative vectors from your production pipeline (include a mix of recent and legacy embeddings).
  2. Spin up three candidates: pgvector on a local Postgres, Qdrant Docker, and a managed Pinecone sandbox.
  3. Index the sample set, then fire 100 real‑world queries (the exact payloads you expect in production).
  4. Record P95 latency, recall on a hand‑labelled relevance set, and ops overhead (setup time, required config files, monitoring hooks).
  5. If two solutions converge on quality and latency, pick the one with the smallest ops surface. If one fails dramatically, eliminate it before you invest weeks of engineering.
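The two numeric measurements in step 4 can be computed with a few lines of standard-library Python. This sketch assumes you have already collected per-query latencies and result IDs for each candidate. It uses the nearest-rank definition of P95 and plain recall against your hand-labelled set. All sample values are made up.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the labelled relevant items found in the top-k results."""
    if not relevant_ids:
        raise ValueError("need at least one labelled relevant item")
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Made-up measurements for one candidate and one query.
latencies = [12.1, 9.8, 15.3, 11.0, 48.7, 10.4, 13.9, 9.5, 14.2, 10.9]
print("P95 ms:", p95(latencies))
print("recall@3:", recall_at_k(["a", "b", "c", "d"], ["a", "c", "z"], k=3))
```

With only 100 queries the P95 rests on the five slowest samples, so treat small differences between candidates as noise.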

Scaling beyond the matrix

When you outgrow the sweet spot of the matrix—say you hit 50 M vectors or need sub‑5 ms P95 at 10 k QPS—you must revisit two levers.

Cost‑vs‑performance sanity check

Translate the matrix into a dollar figure for your budget holder.

  1. Estimate monthly vector count and growth (e.g., 8 M now, +20 %/mo).
  2. Pick the tier that satisfies your latency budget (e.g., Qdrant Cloud “Standard” at $0.30 per GB‑month).
  3. Add ops overhead: 0.2 FTE for monitoring, 0.1 FTE for backups, 0.05 FTE for CI/CD integration.
  4. Sum hardware, SaaS, and labour. For one recent client, the total landed at $1,200 / month, 30 % less than the initial “Pinecone‑only” estimate, with equal latency.
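The four steps reduce to simple arithmetic. In the sketch below, the embedding dimension, the reading of the per‑GB price as raw float32 storage, and the monthly cost of one FTE are all placeholder assumptions, not the client figures quoted above. Substitute your own numbers.

```python
def storage_gb(vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw float32 embedding size in decimal GB, ignoring index overhead."""
    return vectors * dims * bytes_per_value / 1e9


def monthly_cost(saas_usd, hardware_usd, fte_fractions, fte_monthly_usd):
    """Step 4: sum SaaS, hardware, and labour into one monthly figure."""
    labour_usd = sum(fte_fractions) * fte_monthly_usd
    return saas_usd + hardware_usd + labour_usd


# Placeholder inputs: 8 M vectors at 768 dimensions, $0.30 per GB-month.
gb = storage_gb(8_000_000, dims=768)
saas = gb * 0.30
total = monthly_cost(
    saas_usd=saas,
    hardware_usd=0.0,
    fte_fractions=[0.2, 0.1, 0.05],  # monitoring, backups, CI/CD (step 3)
    fte_monthly_usd=10_000.0,        # hypothetical loaded cost of one FTE
)
print(f"{gb:.1f} GB stored, ${saas:.2f} SaaS, ${total:.2f} total per month")
```

Even with rough inputs, this usually shows that labour dominates storage, which is why the ops line deserves as much scrutiny as the vendor's price list.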

Final checklist before you commit

Run through this list after the 15‑minute test and before signing any contract.

Answering these questions gives you a defensible recommendation in under a quarter hour, sparing weeks of trial‑and‑error and keeping your vector search stack aligned with real production constraints.
