Best Vector Databases in 2026: What’s Free, What’s Paid, and What’s Fast

Updated: January 2026

Vector databases are the default storage layer for modern AI search: RAG, agent memory, semantic search, recommendations, and multimodal retrieval.
But in 2026, “vector database” can mean three very different things:

  • Purpose-built vector DBs (Milvus, Qdrant, Weaviate, Pinecone)
  • Vector search inside what you already run (Postgres + pgvector, Redis, MongoDB, OpenSearch, Elasticsearch)
  • Managed vector search platforms (Azure AI Search, Vertex AI Vector Search, Databricks Mosaic AI Vector Search)

This guide breaks down the best vector databases of 2026, which ones are free vs paid, and what benchmarks suggest about real performance.
It also includes a practical code guide you can adapt to your stack.



How to choose a vector database in 2026

Most teams pick a database for “speed”, then discover that the real constraints are:

  • Filters (tenant_id, access control, time ranges, tags)
  • Ingestion under load (updates while serving queries)
  • Memory footprint (compression, quantization, storage tiering)
  • Hybrid retrieval (keyword + vector + rerank)
  • Ops overhead (backups, scaling, upgrades, observability)

Here is the standard retrieval pipeline most production systems end up with:

Documents (chunks + metadata)
  → Embeddings (text / image vectors)
  → Vector DB (ANN + filters)
  → Rerank / LLM (better answers)

Your database choice should match where your complexity lives: ops, filters, hybrid search, or scale.
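To make the pipeline concrete, here is a toy end-to-end sketch in plain Python. Everything in it (the hash-style `embed`, `ToyVectorIndex`, the scoring) is an illustrative stand-in, not any specific library's API:

```python
# Toy retrieval pipeline: chunk -> embed -> index -> filtered search.
# All names and the character-code "embedding" are illustrative placeholders.
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: 8-dim normalized vector.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class ToyVectorIndex:
    def __init__(self):
        self.rows = []  # (id, vector, metadata)

    def upsert(self, doc_id, vector, metadata):
        self.rows.append((doc_id, vector, metadata))

    def search(self, query_vec, top_k=3, topic=None):
        # Brute-force cosine similarity with an optional metadata filter.
        hits = []
        for doc_id, vec, meta in self.rows:
            if topic is not None and meta.get("topic") != topic:
                continue
            score = sum(a * b for a, b in zip(query_vec, vec))
            hits.append((score, doc_id, meta))
        return sorted(hits, key=lambda h: h[0], reverse=True)[:top_k]

index = ToyVectorIndex()
for doc_id, text, topic in [
    ("doc-1", "Refund policy for annual subscriptions", "billing"),
    ("doc-2", "How to reset your password", "account"),
]:
    index.upsert(doc_id, embed(text), {"topic": topic, "text": text})

results = index.search(embed("refund for yearly plan"), top_k=1, topic="billing")
print(results[0][1])  # doc-1
```

A real system swaps each piece for a model and a database client, but the shape stays the same: the filter runs inside the search, not as a post-processing step.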


Top vector databases comparison table (2026)

This table focuses on what matters in real projects: cost model, hosting, and “why teams pick it”.

Database / Service | Free? | Best for | Strengths | Tradeoffs
Pinecone | Paid (managed-first) | Fast production launch with minimal ops | Managed experience, easy scaling, clean API | Cost for convenience, managed constraints
Milvus (OSS) / Zilliz Cloud (managed) | Milvus: Yes (OSS) / Zilliz: Paid + tiers | Large-scale vector retrieval, enterprise workloads | Strong scaling story, rich ecosystem | Self-hosting adds operational complexity
Qdrant | Yes (OSS) + paid cloud | Filter-heavy apps, practical production RAG | Great developer ergonomics, strong filtering, optional quantization | Bench your workload and index choices
Weaviate | Yes (OSS) + paid cloud | Teams wanting OSS plus managed options | Flexible collections, compression options (PQ and more) | Configuration choices matter a lot
Postgres + pgvector | Yes (OSS) | “Keep it simple” when you already use Postgres | Relational + vectors together, familiar tooling | At big scale, specialized DBs can win on ops/perf
MongoDB Atlas Vector Search | Paid (managed) + plans | Vectors next to operational documents | Document model, integrated search workflows | Costs and performance depend on cluster choices
Redis Vector Search | Depends on edition/license + paid cloud | Ultra-low latency, “index next to cache” patterns | Multiple vector index types (FLAT, HNSW, SVS-VAMANA) | Licensing and memory economics require planning
OpenSearch | Yes (OSS) | Search teams adding vectors to an existing search stack | knn_vector field type, mature search ecosystem | More tuning, search-engine style ops
Elasticsearch | Paid distributions (varies) | Enterprise search with vectors built in | dense_vector field for kNN workflows | Cost and licensing depend on deployment model
Azure AI Search (Vector + Hybrid) | Paid (managed) | Hybrid retrieval (keyword + vector) at enterprise scale | Built-in hybrid patterns (RRF), tight Azure integration | Azure-native workflow, index-centric model
Vertex AI Vector Search | Paid (managed) | Large-scale managed ANN on Google Cloud | Built on ScaNN, high-scale managed indexing | GCP-native stack considerations
Databricks Mosaic AI Vector Search | Paid (managed) | Organizations already living in Databricks | Governance + platform integration | Best if you are already committed to Databricks
Chroma | Yes (OSS) + cloud offerings | Local-first development and prototypes | Simple dev experience, popular in AI tooling | Confirm scaling and ops needs for production
LanceDB | Yes (OSS) + cloud offerings | Embedded, local tables, offline workflows | Apache 2.0 OSS, good local persistence story | Evaluate for multi-tenant, high-concurrency production
Vespa | Yes (OSS) | Search + recommendation systems with complex ranking | Powerful ranking and retrieval stack | Steeper learning curve than “pure” vector DBs

Tip: if you need a “one sentence rule”, use this:

If you want the lowest ops burden, pick a managed vector DB or managed search platform.
If you want maximum control and predictable infra, pick an OSS vector DB.
If your company already runs Postgres, MongoDB, Elastic/OpenSearch, or Redis deeply, start there and prove you need to add another system.


Performance comparison (benchmarks that matter)

Vector performance is not one number. The same database can win on one workload and lose on another depending on:

  • index type and parameters
  • target recall
  • filter selectivity and complexity
  • ingestion happening at the same time as queries
  • hardware and cost constraints

A useful public reference point is the VDBBench leaderboard, which reports metrics like P99 latency and QPS under defined setups.
The values below are copied from their leaderboard view for a fixed monthly cost and dataset size.

1) Latency and QPS at a fixed budget

Scenario: “Vector Search Latency and QPS at $1,000 Monthly Cost” on a 1M dataset

System (as listed) | P99 latency (ms) | QPS
ZillizCloud-8cu-perf | 2.5 | 9704.42
Milvus-16c64g-sq8 | 2.2 | 3465.17
OpenSearch-16c128g-force_merge | 7.2 | 3055.01
ElasticCloud-8c60g-force_merge | 11.3 | 1925.3
QdrantCloud-16c64g | 6.4 | 1242.43
Pinecone-p2.x8-1node | 13.7 | 1146.53

What to take from this:

  • This is one workload slice, not a universal ranking.
  • Budget constraints can reshuffle winners compared to “unlimited hardware” benchmarks.
  • P99 latency matters more than average latency for user-facing apps.
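The P99 point is easy to demonstrate with a stdlib-only sketch; the latency samples below are synthetic, invented purely for illustration:

```python
# Average latency hides the tail; P99 is what the slowest 1 in 100 users feels.
# The samples are synthetic: 98% fast requests plus 2% slow outliers.
import statistics

latencies_ms = [5.0] * 980 + [250.0] * 20

avg = statistics.mean(latencies_ms)
# Nearest-rank P99: the value at the 99th-percentile position.
p99 = sorted(latencies_ms)[int(0.99 * len(latencies_ms)) - 1]

print(f"avg={avg:.2f} ms, p99={p99:.2f} ms")  # avg=9.90 ms, p99=250.00 ms
```

An average of ~10 ms looks fine on a dashboard while 2% of users wait a quarter of a second, which is why latency SLOs are usually written against P95/P99, not the mean.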

Reference: VDBBench leaderboard

2) Streaming performance (search while ingesting)

If you update your index constantly (new docs, chat history, product catalog changes), streaming performance is often the deciding factor.
This table summarizes “Streaming Performance” values shown for a 10M dataset under constant ingestion.

System (as listed) | Static QPS | QPS @ 500 rows/s ingestion | QPS @ 1000 rows/s ingestion
ZillizCloud (8cu-perf) | 3957 | 2119 | 1860
Pinecone (p2.x8-1node) | 1131 | 367.4 | 369.7
OpenSearch (16c128g) | 505.7 | 161.7 | 149.7
QdrantCloud (16c64g) | 446.9 | 393.8 | 347.6
Milvus (16c64g-sq8) | 437.2 | 306 | 156
ElasticCloud (8c60g) | 376.4 | 61.67 | 61.82

Reference: VDBBench leaderboard
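If you want to reproduce the shape of this experiment against your own stack, the pattern is: one thread ingests at a fixed rate while the main thread measures query throughput, then compare against the static baseline. A minimal threading sketch, with a toy in-memory list standing in for your real vector DB client:

```python
# Measure search QPS while a background writer ingests rows at a fixed rate.
# The locked list is a stand-in for a real vector DB client.
import threading
import time

index = []                 # toy "index": list of (id, vector) rows
lock = threading.Lock()
stop = threading.Event()

def writer(rows_per_sec: int):
    # Simulates continuous ingestion at a target rate.
    i = 0
    while not stop.is_set():
        with lock:
            index.append((i, [0.1] * 8))
        i += 1
        time.sleep(1.0 / rows_per_sec)

def run_queries(duration_s: float) -> float:
    # Runs brute-force "searches" in a loop and returns achieved QPS.
    done = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        with lock:
            _ = sorted(index, key=lambda row: sum(row[1]))[:5]
        done += 1
    return done / duration_s

t = threading.Thread(target=writer, args=(200,), daemon=True)
t.start()
qps = run_queries(0.5)
stop.set()
t.join()

print(f"~{qps:.0f} QPS while ingesting; index grew to {len(index)} rows")
```

Against a real database, replace the writer body with batched upserts and the query loop with your actual top-k search; the interesting number is the ratio between static QPS and QPS under ingestion.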


Deep dive: best picks by category

Best managed-first vector DB

  • Pinecone: great when you want production speed and minimal cluster work.

Best open-source vector DBs (general purpose)

  • Milvus: strong scaling story and ecosystem, good for big deployments.
  • Qdrant: excellent for metadata filtering and practical production usage.
  • Weaviate: flexible collections and compression options like PQ to reduce memory usage.

Best “use what you already run” options

  • Postgres + pgvector: best when “one database” simplicity matters.
  • MongoDB Atlas Vector Search: best if your app is document-first.
  • Redis Vector Search: best for low-latency retrieval near cache and sessions.
  • OpenSearch / Elasticsearch: best for search teams that already operate a search stack.

Best managed search platforms with vectors

  • Azure AI Search: great for hybrid retrieval (keyword + vector), enterprise patterns.
  • Vertex AI Vector Search: strong managed ANN on Google Cloud (ScaNN-based).
  • Databricks Mosaic AI Vector Search: ideal if Databricks is your data home.

Best local-first / embedded

  • Chroma: easy dev workflow and popular in AI tooling.
  • LanceDB: strong local persistence story and OSS.

Code guide: ingest + search (copy-ready examples)

Below is a minimal, practical setup:

  1. Generate embeddings (one-time per chunk).
  2. Store vectors + metadata.
  3. Query by vector, apply filters, return top-k.

Step 0: Create embeddings (example in Python)

Use any embedding model you like. This example uses sentence-transformers.

# pip install sentence-transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

texts = [
    "Refund policy for annual subscriptions",
    "How to reset your password",
    "Troubleshooting login issues",
]
vectors = model.encode(texts, normalize_embeddings=True).tolist()

Pinecone (managed)

# pip install pinecone

import os
from pinecone import Pinecone, ServerlessSpec, CloudProvider, AwsRegion, VectorType

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "docs-index"
if not pc.has_index(index_name):  # make re-runs idempotent
    pc.create_index(
        name=index_name,
        dimension=384,  # must match the embedding model (all-MiniLM-L6-v2)
        spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),
        vector_type=VectorType.DENSE,
    )

idx = pc.Index(host=pc.describe_index(index_name).host)

# Upsert vectors with metadata
idx.upsert(
    vectors=[
        ("doc-1", vectors[0], {"topic": "billing"}),
        ("doc-2", vectors[1], {"topic": "account"}),
        ("doc-3", vectors[2], {"topic": "account"}),
    ],
    namespace="kb",
)

# Query with an optional metadata filter
query_vec = model.encode(["refund for yearly plan"], normalize_embeddings=True).tolist()[0]
res = idx.query(
    vector=query_vec,
    top_k=5,
    include_metadata=True,
    filter={"topic": {"$eq": "billing"}},
    namespace="kb",
)

print(res)

Qdrant (OSS or Cloud)

# pip install qdrant-client

import os

from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue

# For local dev: QdrantClient(":memory:") or QdrantClient(path="qdrant.db")
client = QdrantClient(url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
                      api_key=os.environ.get("QDRANT_API_KEY"))

collection = "kb"
if not client.collection_exists(collection):
    client.create_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

client.upsert(
    collection_name=collection,
    points=[
        # Qdrant point IDs must be unsigned integers or UUID strings,
        # so keep human-readable IDs in the payload instead.
        PointStruct(id=1, vector=vectors[0], payload={"doc_id": "doc-1", "topic": "billing"}),
        PointStruct(id=2, vector=vectors[1], payload={"doc_id": "doc-2", "topic": "account"}),
        PointStruct(id=3, vector=vectors[2], payload={"doc_id": "doc-3", "topic": "account"}),
    ],
)

query_vec = model.encode(["refund for yearly plan"], normalize_embeddings=True).tolist()[0]
# query_points is the current API; the older client.search() is deprecated
hits = client.query_points(
    collection_name=collection,
    query=query_vec,
    limit=5,
    query_filter=Filter(
        must=[FieldCondition(key="topic", match=MatchValue(value="billing"))]
    ),
).points

for h in hits:
    print(h.id, h.score, h.payload)

Weaviate (OSS or Cloud)

# pip install -U weaviate-client

import os
import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.init import Auth

weaviate_url = os.environ["WEAVIATE_URL"]
weaviate_api_key = os.environ["WEAVIATE_API_KEY"]

with weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_api_key),
) as client:
    # Create a collection (this example uses a built-in vectorizer; you can also import your own vectors)
    if not client.collections.exists("Docs"):
        client.collections.create(
            name="Docs",
            vector_config=Configure.Vectors.text2vec_weaviate(),
        )

    docs = client.collections.use("Docs")
    with docs.batch.fixed_size(batch_size=200) as batch:
        batch.add_object(properties={
            "title": "Refund policy",
            "description": "Refund policy for annual subscriptions",
            "topic": "billing",
        })

    # Search example: with a built-in vectorizer, the query text is vectorized for you.
    # For custom vectors, store your own embeddings and query by embedding instead.

Postgres + pgvector (free and very practical)

-- 1) Install extension (varies by hosting)
CREATE EXTENSION IF NOT EXISTS vector;

-- 2) Table with an embedding column
CREATE TABLE IF NOT EXISTS kb_docs (
  id TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  topic TEXT NOT NULL,
  embedding vector(384) NOT NULL
);

-- 3) Create an ANN index (choose HNSW or IVFFlat)
-- HNSW example:
CREATE INDEX IF NOT EXISTS kb_docs_embedding_hnsw
ON kb_docs
USING hnsw (embedding vector_cosine_ops);

-- 4) Query: order by cosine distance (lower = more similar)
-- Replace :query_embedding with your 384-d vector
-- (most drivers pass it as a string like '[0.1, 0.2, ...]' cast with ::vector)
SELECT id, content, topic
FROM kb_docs
WHERE topic = 'billing'
ORDER BY embedding <=> :query_embedding
LIMIT 5;
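As a sanity check on what `<=>` returns: pgvector's cosine distance is 1 minus cosine similarity, which for the normalized embeddings from Step 0 reduces to 1 minus the dot product. Plain Python, no database needed:

```python
# pgvector's <=> operator returns cosine distance: 1 - cosine similarity.
# For unit-normalized vectors that is simply 1 - dot(a, b).
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 4.0, 6.0])   # same direction -> distance ~ 0.0
c = normalize([3.0, -1.0, 0.0])  # different direction -> distance in (0, 2)

print(round(cosine_distance(a, b), 6))  # 0.0
```

This is why the SQL above can simply `ORDER BY embedding <=> :query_embedding` ascending: identical directions score 0, opposite directions score 2.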

OpenSearch (if you already run it)

PUT my-index
{
  "settings": { "index": { "knn": true } },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 384
      },
      "topic": { "type": "keyword" },
      "content": { "type": "text" }
    }
  }
}
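For completeness, a matching k-NN search request against that mapping. The 384-dimension query vector is truncated with `...` for readability, and the `filter` clause inside `knn` (efficient filtering) depends on your OpenSearch version and engine, so treat this as a sketch:

```
GET my-index/_search
{
  "size": 5,
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.12, -0.03, ...],
        "k": 5,
        "filter": {
          "term": { "topic": "billing" }
        }
      }
    }
  }
}
```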

Final checklist before you commit

  • Do you need strict multi-tenancy? Confirm filter performance and isolation strategy.
  • Do you ingest continuously? Benchmark “search while ingesting”, not just static search.
  • How big is your metadata? Some systems are fast until filters get complex.
  • Do you need hybrid search? If yes, a search platform (Azure AI Search / Elastic / OpenSearch) can simplify.
  • Do you want to run it yourself? If not, managed-first usually wins time-to-production.

If you want one safe, boring recommendation:

  • Enterprise managed: Azure AI Search (hybrid) or Vertex AI Vector Search (GCP) depending on your cloud.
  • Managed vector DB: Pinecone or a managed Milvus option.
  • Open source: Qdrant, Weaviate, or Milvus.
  • Simplest path: Postgres + pgvector if your scale is not extreme and you want fewer moving parts.

Happy benchmarking.


Author update

I will add live benchmarks as newer vector DB versions ship. If you want a comparison on your workload shape, share the index size and query pattern.
