
Vector Databases: pgvector vs Dedicated Stores

Metasphere Engineering · 13 min read

A team follows a LangChain tutorial one afternoon and spins up Pinecone for their RAG application. Six months later, they’re running Pinecone for vector search, PostgreSQL for user data, Redis for caching, and a Lambda function that syncs document metadata between Pinecone and Postgres every 5 minutes. Three databases. Three operational domains. Three failure modes. Three filing cabinets that are supposed to contain the same documents. And a consistency bug where deleted documents stay searchable for up to 5 minutes because the sync job hasn’t run yet. The document shredded in one cabinet. Still in the other two.

A user asks a question, gets an answer grounded in a document that was deleted an hour ago, and files a support ticket. Nobody can figure out why the AI is “hallucinating” until someone checks the sync job timing. (It wasn’t hallucinating. It was reading from the cabinet nobody cleaned out.)

Key takeaways
  • PostgreSQL + pgvector handles most RAG workloads under 10M vectors. Same database, same transactions, same consistency guarantees. One cabinet. No sync jobs. No consistency bugs.
  • Dedicated vector databases earn their complexity above 10M vectors or when sub-10ms P99 latency on high-dimensional queries is a hard requirement.
  • Deleted documents stay searchable until the sync runs. When documents live in one database and embeddings in another, this consistency gap is the default failure mode.
  • Metadata filtering before vector search sharpens relevance fast. Filter by tenant, document type, or date range before computing similarity so the ANN index only scores vectors that could actually matter.
  • Multi-tenant isolation requires explicit namespace or partition separation. A missing tenant filter returns other tenants’ documents with HTTP 200 and no warning.

Somewhere along the way, every LangChain tutorial started with “Step 1: Set up Pinecone.” Teams treated it as a requirement. It was always a choice. And for more production RAG applications than most teams realize, pgvector on PostgreSQL works well enough with way less operational headache.

[Diagram: Silent cross-tenant data leakage. A query from Tenant A against a shared vector index with no tenant_id filter scans all vectors and returns Tenant B's and C's confidential documents with HTTP 200 — no error, no warning. The fix adds a tenant filter (WHERE tenant_id = "A"). One missing filter, full data exposure.]

What Vector Search Actually Does

Strip away the marketing and a vector database does one thing: given a query vector (a list of 768 or 1536 floating-point numbers), find the k stored vectors most similar by cosine distance or dot product. Everything else (document storage, metadata filtering, hybrid search) is built on top of that single primitive.

Exact nearest neighbor search compares the query against every stored vector. Perfectly accurate. Scales as O(n). At 100K vectors, it finishes fast. At 1 million, response times cross 100ms. At 10 million, you are measuring in seconds. Too slow for anything interactive.
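The brute-force version is short enough to sketch in full. This is a minimal, pure-Python stand-in (names like `exact_knn` are illustrative, not a library API) that makes the O(n) cost visible: every stored vector gets scored against the query.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; smaller means more similar
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def exact_knn(query, vectors, k):
    # O(n): scores every stored vector, then sorts by distance
    ranked = sorted(range(len(vectors)), key=lambda i: cosine_distance(query, vectors[i]))
    return ranked[:k]

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(exact_knn([1.0, 0.05], docs, k=2))  # → [0, 1]
```

The `sorted` over all of `vectors` is exactly the scan that stops being interactive somewhere past a million rows.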

Approximate nearest neighbor (ANN) algorithms trade perfect accuracy for speed. They build indexes that skip most of the vectors, finding near-perfect matches without scanning everything. HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each node connects to its nearest neighbors. Queries navigate the graph from a random entry point, getting closer to the target at each hop. At ef=128 (the search beam width), recall sits at 95-99%. For RAG, that 1-5% miss rate is irrelevant. The context you missed gets replaced by context that works just as well. Nobody notices.

IVF (Inverted File Index) partitions the vector space into clusters and only searches the nearest clusters at query time. Faster index builds than HNSW, but lower recall at equivalent query speed. Better suited for workloads with frequent bulk inserts where rebuilding an HNSW graph is expensive.
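The IVF idea fits in a toy class. This sketch assumes pre-chosen centroids (a real index trains them with k-means) and uses Euclidean distance for simplicity; `TinyIVF` and its methods are hypothetical names, not any library's API.

```python
import math
from collections import defaultdict

class TinyIVF:
    """Toy IVF index: fixed centroids stand in for trained cluster centers."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.clusters = defaultdict(list)  # centroid index -> [(vec_id, vec)]

    def add(self, vec_id, vec):
        # Each vector lives in the cluster of its nearest centroid
        c = min(range(len(self.centroids)),
                key=lambda i: math.dist(vec, self.centroids[i]))
        self.clusters[c].append((vec_id, vec))

    def search(self, query, k, nprobe=1):
        # Scan only the nprobe closest clusters, not the whole corpus
        order = sorted(range(len(self.centroids)),
                       key=lambda i: math.dist(query, self.centroids[i]))
        candidates = [item for c in order[:nprobe] for item in self.clusters[c]]
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]
```

With `nprobe=1` only one cluster is scanned, which is where the recall sensitivity the table below mentions comes from: a true neighbor sitting just across a cluster boundary is simply never scored.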

| Index Type | Build Time | Query Latency | Recall at Speed | Best For |
| --- | --- | --- | --- | --- |
| Exact (flat) | None | O(n), slow above 1M | 100% | Small corpora, validation benchmarks |
| HNSW | Slow (graph construction) | Sub-10ms at millions | 95-99% at ef=128 | Most production RAG, interactive search |
| IVF | Moderate (clustering) | Low, but recall-sensitive to nprobes | 90-97% depending on nprobes | High-churn corpora, frequent bulk inserts |

Picking the right approach means weighing accuracy, latency, and operational requirements against your actual AI and ML maturity. Do not optimize for ten million vectors when you have fifty thousand.

[Diagram: Vector search pipeline — embed, index, search, rerank. A document's text becomes a 768-dim vector via a sentence transformer and is stored in an HNSW index. At query time, ANN search returns the top 100 candidates fast; a cross-encoder reranks them to a top 10 with true relevance. Embedding finds candidates fast; reranking makes them accurate.]

The Case for pgvector

For applications with fewer than 5 million documents, pgvector has advantages that vector database marketing doesn’t want to talk about.

ACID consistency: insert a document, its metadata, and its embedding in a single BEGIN...COMMIT. When the transaction commits, the document is searchable. Roll it back and it vanishes. No sync jobs, no consistency windows, no phantom results from deleted documents hanging around in a separate store. An entire class of consistency bugs just disappears.

-- Document and embedding in one transaction. Searchable immediately.
BEGIN;
INSERT INTO documents (id, title, body, user_id, created_at)
VALUES ($1, $2, $3, $4, now());

INSERT INTO document_embeddings (document_id, embedding)
VALUES ($1, $5::vector);  -- pgvector type

COMMIT;

-- Search with full SQL expressiveness
SELECT d.title, d.body,
       e.embedding <=> $query_vector AS distance
FROM document_embeddings e
JOIN documents d ON d.id = e.document_id
WHERE d.user_id = $current_user
  AND d.created_at > now() - interval '30 days'
ORDER BY distance
LIMIT 10;

Full SQL expressiveness: filter vectors by user_id, date range, category, or any relational predicate. Joins work. Foreign keys work. Try writing that query against Pinecone’s API.

Operational simplicity: your team already knows PostgreSQL. Adding pgvector is CREATE EXTENSION vector; and a new index type. No new operational domain to learn, no new failure modes to debug, no new on-call runbook to write. Operational simplicity has real engineering value that never shows up in benchmark comparisons.

The pgvector HNSW index (added in v0.5.0) performs well for workloads under a few million vectors. Benchmarks show 95-99% recall at under 10ms latency for typical RAG queries. The point where a dedicated vector database actually justifies its cost is higher than most teams think. Production RAG systems on pgvector handle millions of vectors without breaking a sweat.

[Diagram: pgvector vs dedicated vector DB. Under 1M vectors and moderate QPS, pgvector in your existing Postgres: no new infrastructure, ACID transactions with business data, familiar SQL and ops, at the cost of HNSW index rebuilds on large updates — start here, most teams never outgrow it. Over 10M vectors or sub-10ms P99, a dedicated vector DB like Pinecone, Weaviate, or Qdrant: purpose-built indexing, horizontal scaling, metadata filtering at scale, but another system to operate. Add infrastructure when data proves you need it, not when vendors say so.]

The Multi-Tenancy Trap

Multi-tenancy in vector databases needs serious thought because the failure mode isn’t “wrong results.” It’s silent data leakage.

The metadata filter approach stores all tenant embeddings in a single index and applies a tenant_id filter at query time. Operationally simple. Works well. Right up until someone writes a query path that forgets the filter. Maybe a new endpoint. Maybe a refactor that accidentally drops the filter parameter. Maybe a test that hardcodes the query without the filter and accidentally ships to production. The query does not fail. It returns HTTP 200. It returns results. Those results include documents from other tenants. No error. No warning. No audit log entry.
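The failure mode is easy to reproduce in miniature. This is a hypothetical in-memory stand-in for a shared index (no real similarity scoring, invented names); the bug surface is the optional filter parameter, which is exactly what an application-level `tenant_id` convention looks like in code.

```python
# Hypothetical stand-in for a shared vector index with per-row metadata
INDEX = [
    {"tenant_id": "A", "doc": "Q3 revenue model"},
    {"tenant_id": "B", "doc": "M&A projections"},
    {"tenant_id": "C", "doc": "Board financials"},
]

def search(query, tenant_id=None):
    # BUG surface: tenant_id is optional, so a caller can silently omit it
    return [row["doc"] for row in INDEX
            if tenant_id is None or row["tenant_id"] == tenant_id]

leaky = search("revenue forecast")                   # forgot the filter: all tenants
scoped = search("revenue forecast", tenant_id="A")   # correct: Tenant A only
```

Both calls succeed. Both "return HTTP 200." Only one of them is safe, and nothing in the return value distinguishes them — which is the argument for pushing enforcement into the database layer rather than remembering the parameter everywhere.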

Anti-pattern

Don’t: Rely on application-level tenant_id filters as your only tenant isolation boundary. Every query path, every new endpoint, every refactor must remember to include it. One miss and you leak data silently.

Do: Use database-level enforcement. With pgvector, PostgreSQL’s RLS (Row Level Security) policies enforce the tenant filter at the database layer. The application cannot bypass it even if code omits the parameter. For dedicated vector stores, use namespace separation per tenant.

In data engineering contexts where AI applications handle sensitive customer data, cross-tenant exposure is a liability issue, not a quality issue.

Namespace isolation (separate collections or indexes per tenant) gives you hard boundaries but makes index management harder as tenant count grows. For applications where data isolation is a security or compliance requirement, the extra complexity is worth it.

If your compliance framework requires “demonstrate data isolation,” metadata filtering will fail the audit. Namespace isolation passes. Cluster isolation passes without discussion.

| Approach | How It Works | Data Isolation | Risk | Performance |
| --- | --- | --- | --- | --- |
| Metadata filter | All tenants share one index. Query includes WHERE tenant_id = X | Logical only. Data co-located in same index | If tenant_id filter is omitted (bug, misconfiguration), query returns other tenants' data. Silent leakage | Fastest. Single index, no overhead |
| Namespace/collection per tenant | Separate collection or namespace per tenant within same cluster | Strong logical isolation. No filter required | Misconfigured permissions could still cross namespaces. But no silent leakage from missing filter | Slight overhead. More collections to manage |
| Cluster per tenant | Dedicated vector DB cluster per tenant | Physical isolation. No shared infrastructure | Highest cost. Operational overhead scales linearly with tenants | Best. No noisy neighbor. No cross-tenant risk |

Make this architecture decision at design time with your security requirements, not after a customer discovers documents from another tenant in their search results.

Embedding Model Upgrades: The Migration Nobody Plans For

Embedding model upgrades invalidate your entire index. All of it. Immediately. No migration path. Every team gets caught off guard by this one.

A new model produces vectors in a different embedding space, often with a different dimension. Either way, your existing index is incompatible: mixing old and new embeddings makes cosine similarity meaningless. Comparing Fahrenheit to Celsius. Numbers present, meaning absent.

Re-embedding a multi-million-document corpus is not catastrophic on compute. But the operational coordination is real. Build a new index in parallel. Validate search quality against a test suite. Cut over. For a multi-tenant system with dozens of customers, that coordination multiplies.

Prerequisites
  1. Source text stored alongside every embedding so re-embedding is a batch job
  2. Model version and embedding timestamp recorded per vector
  3. Search quality test suite with known-good query/result pairs for regression testing
  4. Parallel index infrastructure capable of running old and new indexes at the same time during cutover
  5. Embedding pipeline that can reprocess the full corpus within your maintenance window
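With prerequisites 1 and 2 in place, the migration itself reduces to a batch job. A minimal sketch, assuming rows carry source text and a model version tag; `embed_v2` is a hypothetical stand-in for the new model's API.

```python
# Each row stores source text and model version alongside its embedding
corpus = [
    {"id": 1, "text": "quarterly revenue report", "model": "v1", "embedding": [0.1, 0.2]},
    {"id": 2, "text": "board meeting notes",      "model": "v2", "embedding": [0.3, 0.4]},
]

def embed_v2(text):
    # Hypothetical stand-in for the new embedding model
    return [float(len(text)), 0.0]

def reembed(rows, target_model="v2", batch_size=1000):
    """Re-embed only rows behind the target model, in batches."""
    stale = [r for r in rows if r["model"] != target_model]
    for i in range(0, len(stale), batch_size):
        for row in stale[i:i + batch_size]:
            row["embedding"] = embed_v2(row["text"])
            row["model"] = target_model
    return len(stale)  # number of rows migrated
```

Because the model version lives on every row, the job is idempotent and resumable: rerunning it skips rows already on the target model, which is what makes "re-embed the corpus" a weekend batch job rather than a crisis.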

The high-cardinality filter problem most teams discover too late

ANN indexes optimize for global nearest neighbor search. When you add a high-cardinality filter like user_id across millions of users, the index cannot efficiently prune the search space. Each user’s search space is tiny relative to the full index, but the ANN algorithm still navigates the full graph before applying the filter.

The crossover point is typically 10K-50K unique filter values. Below that, filtered search against a single index works fine. Above it, per-user (or per-group) collections often run 10x faster than filtered search. This is an architecture decision that must happen at design time. Migrating from a single filtered index to per-tenant collections after the product is live means re-indexing every vector with a new collection assignment.
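The pruning effect is just arithmetic, but it is worth seeing. A toy comparison (synthetic data, invented names) of the candidate set a filtered flat index touches versus a per-user collection lookup:

```python
from collections import defaultdict

# One flat index: 200K vectors across 50K users; a filtered query still
# has the whole index as its search space
flat_index = [("u%d" % (i % 50000), [float(i)]) for i in range(200000)]

# Per-user collections: the lookup prunes the search space up front
collections = defaultdict(list)
for user, vec in flat_index:
    collections[user].append(vec)

candidates_flat = len(flat_index)            # 200000 vectors in scope
candidates_scoped = len(collections["u42"])  # 4 vectors in scope
```

The toy flat scan can at least filter cheaply; a real ANN graph is worse off, because it navigates the full structure and only then discovers that most of the nodes it visited fail the filter.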

The 5-Minute Ghost

The consistency gap that haunts every architecture where documents live in one database and embeddings live in another. A user deletes a document. The relational database reflects the deletion immediately. The vector store still returns that document in search results until the next sync job runs. The user asks a question, gets an answer grounded in a document that was deleted an hour ago, and files a support ticket about “hallucination.” Not hallucination. Architecture.
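The ghost window can be reproduced in a few lines. A hypothetical two-store setup (dict stand-ins, invented function names) where deletion only touches the relational side and a sync job reconciles later:

```python
# Two stores that are supposed to agree; a periodic sync reconciles them
documents = {1: "pricing policy", 2: "onboarding guide"}   # relational store
vector_store = {1: [0.1], 2: [0.2]}                        # vector index

def delete_document(doc_id):
    del documents[doc_id]      # relational store updates immediately
    # vector_store is NOT touched here; that's the sync job's problem

def vector_search():
    return sorted(vector_store)  # doc ids still present in the vector index

def sync():
    # The batch job that runs "every 5 minutes"
    for doc_id in list(vector_store):
        if doc_id not in documents:
            del vector_store[doc_id]

delete_document(1)
ghost = vector_search()   # doc 1 still returned: the ghost window
sync()
after = vector_search()   # ghost gone only once the sync job has run
```

In the single-database design, `delete_document` and the index update are one transaction, and `ghost` can never exist.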

What the Industry Gets Wrong About Vector Databases

“Every RAG application needs a dedicated vector database.” Most LangChain tutorials begin with “Step 1: Set up Pinecone.” pgvector in PostgreSQL handles most RAG workloads under 10 million vectors with full ACID consistency, no sync jobs, and zero additional operational domains. The dedicated vector database earns its complexity at scale. Most teams will not reach that scale for years.

“Metadata filtering provides adequate multi-tenant isolation.” It does, until a code path omits the tenant filter. The query returns HTTP 200 with results from other tenants. No error, no warning, no audit log entry. For security-sensitive workloads, namespace isolation or database-level RLS is the only defensible architecture.

“Embedding model upgrades are seamless.” Every model upgrade invalidates your entire index immediately. New models produce vectors in different dimensional spaces. You cannot mix old and new embeddings. If you have not stored source text alongside embeddings, re-embedding becomes a data archaeology expedition instead of a batch job.

Our take

The vector database market is selling infrastructure complexity to teams that need a PostgreSQL extension. Most vendor comparisons benchmark at scales their customers will never reach, on workloads their customers don’t run. The honest evaluation: run your actual queries against pgvector. If it meets your latency requirements, stop shopping. The dedicated vector database conversation should start with measured evidence of pgvector failing, not with a vendor pitch about theoretical throughput at 100 million vectors.

Three databases, three failure modes, one sync job quietly producing ghost results. Or one database, one transaction, and documents that are searchable the instant they exist and gone the instant they’re deleted. Store source text alongside embeddings. Store model version and timestamp. When the embedding model upgrade arrives, re-embedding is a weekend batch job, not a crisis.

Three Databases, Three Failure Modes, One Sync Bug

Wrong vector storage architecture creates operational debt that compounds with data volume. Some teams over-engineer for scale years away. Others under-design for isolation they need today. Matching vector search to actual requirements prevents both traps.


Frequently Asked Questions

What is approximate nearest neighbor search and when is exact search better?


ANN algorithms like HNSW and IVF find similar vectors without scanning every stored vector. ANN is essential above roughly 1 million vectors, where exact search response times exceed 100ms. Below that threshold, pgvector’s HNSW index provides under 10ms latency with full ACID semantics and no additional infrastructure. The recall trade-off is typically 95-99% at ef=128, meaning you occasionally miss the absolute best match but find very close alternatives.

When is pgvector the right choice over a dedicated vector database?


pgvector wins when your dataset is under 5 million vectors, you need ACID transactions combining vector search with relational queries, your team already operates PostgreSQL, or operational simplicity outweighs peak performance. Dedicated databases like Pinecone, Weaviate, and Qdrant scale better above 10M vectors but add a separate operational domain, failure modes, and consistency challenges.

What happens when you upgrade your embedding model?


All stored embeddings become invalid immediately. The new model produces vectors in a different dimensional space, mathematically incompatible with your existing index. You must re-embed the entire corpus. Plan for this from day one: store source text alongside embeddings so re-embedding is a batch job, not a data reconstruction crisis.

What is the multi-tenancy data leakage risk in vector databases?


Most vector databases support multi-tenancy through metadata filtering, storing all tenant embeddings in one index with filter predicates at query time. The risk: code that omits the tenant filter returns data from other tenants with no error, no warning, just HTTP 200. This class of bug has caused real data leakage in production RAG applications. For security-grade isolation, use namespace separation per tenant.

What is the cardinality problem in filtered vector search?


High-cardinality filters like user_id across millions of users degrade ANN index performance. ANN indexes optimize for global nearest neighbor search. When each user’s search space is isolated, per-user collections often outperform filtered search against a single large index. Make this architecture decision at design time, not after degraded performance at scale. The crossover point is typically around 10K-50K unique filter values.