Vector Databases: pgvector vs Dedicated Stores
A team follows a LangChain tutorial one afternoon and spins up Pinecone for their RAG application. Six months later, they’re running Pinecone for vector search, PostgreSQL for user data, Redis for caching, and a Lambda function that syncs document metadata between Pinecone and Postgres every 5 minutes. Three databases. Three operational domains. Three failure modes. Three filing cabinets that are supposed to contain the same documents. And a consistency bug where deleted documents stay searchable for up to 5 minutes because the sync job hasn’t run yet. The document shredded in one cabinet. Still in the other two.
A user asks a question, gets an answer grounded in a document that was deleted an hour ago, and files a support ticket. Nobody can figure out why the AI is “hallucinating” until someone checks the sync job timing. (It wasn’t hallucinating. It was reading from the cabinet nobody cleaned out.)
- PostgreSQL + pgvector handles most RAG workloads under 10M vectors. Same database, same transactions, same consistency guarantees. One cabinet. No sync jobs. No consistency bugs.
- Dedicated vector databases earn their complexity above 10M vectors or when sub-10ms P99 latency on high-dimensional queries is a hard requirement.
- Deleted documents stay searchable until the sync runs. When documents live in one database and embeddings in another, this consistency gap is the default failure mode.
- Metadata filtering before vector search sharpens relevance fast. Filter by tenant, document type, or date range before computing similarity so the ANN index only scores vectors that could actually matter.
- Multi-tenant isolation requires explicit namespace or partition separation. A missing tenant filter returns other tenants’ documents with HTTP 200 and no warning.
Somewhere along the way, every LangChain tutorial started with “Step 1: Set up Pinecone.” Teams treated it as a requirement. It was always a choice. And for more production RAG applications than most teams realize, pgvector on PostgreSQL works well enough with way less operational headache.
What Vector Search Actually Does
Strip away the marketing and a vector database does one thing. Given a query vector (a list of 768 or 1536 floating-point numbers), find the k stored vectors most similar by cosine distance or dot product. Everything else (document storage, metadata filtering, hybrid search) is built on top of that single primitive.
Exact nearest neighbor search compares the query against every stored vector. Perfectly accurate. Scales as O(n). At 100K vectors, it finishes fast. At 1 million, response times cross 100ms. At 10 million, you are measuring in seconds. Too slow for anything interactive.
Approximate nearest neighbor (ANN) algorithms trade perfect accuracy for speed. They build indexes that skip most of the vectors, finding near-perfect matches without scanning everything. HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each node connects to its nearest neighbors. Queries navigate the graph from a random entry point, getting closer to the target at each hop. At ef=128 (the search beam width), recall sits at 95-99%. For RAG, that 1-5% miss rate is irrelevant. The context you missed gets replaced by context that works just as well. Nobody notices.
IVF (Inverted File Index) partitions the vector space into clusters and only searches the nearest clusters at query time. Faster index builds than HNSW, but lower recall at equivalent query speed. Better suited for workloads with frequent bulk inserts where rebuilding an HNSW graph is expensive.
| Index Type | Build Time | Query Latency | Recall at Speed | Best For |
|---|---|---|---|---|
| Exact (flat) | None | O(n), slow above 1M | 100% | Small corpora, validation benchmarks |
| HNSW | Slow (graph construction) | Sub-10ms at millions | 95-99% at ef=128 | Most production RAG, interactive search |
| IVF | Moderate (clustering) | Low, but recall-sensitive to nprobes | 90-97% depending on nprobes | High-churn corpora, frequent bulk inserts |
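To ground the table in pgvector terms, here is roughly what each option looks like. A sketch with illustrative table and column names; in practice you build one of the two index types, not both:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- No vector index yet: this ORDER BY is an exact sequential scan. 100% recall, O(n).
SELECT document_id
FROM document_embeddings
ORDER BY embedding <=> $1  -- $1 = query embedding, <=> = cosine distance
LIMIT 10;

-- HNSW: slow graph build, fast approximate queries (pgvector defaults shown)
CREATE INDEX ON document_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
SET hnsw.ef_search = 128;  -- the query-time beam width from the table above

-- IVF: faster build; recall depends on how many clusters each query probes
CREATE INDEX ON document_embeddings
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 1000);
SET ivfflat.probes = 10;   -- more probes = higher recall, higher latency
```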
Picking the right approach means weighing accuracy, latency, and operational requirements against your actual AI and ML maturity. Do not optimize for ten million vectors when you have fifty thousand.
The Case for pgvector
For applications with fewer than 5 million documents, pgvector has advantages that vector database marketing doesn’t want to talk about.
ACID consistency: insert a document, its metadata, and its embedding in a single BEGIN...COMMIT. When the transaction commits, the document is searchable. Roll it back and it vanishes. No sync jobs, no consistency windows, no phantom results from deleted documents hanging around in a separate store. An entire class of consistency bugs just disappears.
```sql
-- Document and embedding in one transaction. Searchable immediately.
BEGIN;
INSERT INTO documents (id, title, body, user_id, created_at)
VALUES ($1, $2, $3, $4, now());
INSERT INTO document_embeddings (document_id, embedding)
VALUES ($1, $5::vector); -- pgvector type
COMMIT;

-- Search with full SQL expressiveness
-- $1 = query embedding, $2 = current user id
SELECT d.title, d.body,
       e.embedding <=> $1 AS distance  -- <=> is cosine distance
FROM document_embeddings e
JOIN documents d ON d.id = e.document_id
WHERE d.user_id = $2
  AND d.created_at > now() - interval '30 days'
ORDER BY distance
LIMIT 10;
```
Full SQL expressiveness: filter vectors by user_id, date range, category, or any relational predicate. Joins work. Foreign keys work. Try writing that query against Pinecone’s API.
Operational simplicity: your team already knows PostgreSQL. Adding pgvector is CREATE EXTENSION vector; and a new index type. No new operational domain to learn, no new failure modes to debug, no new on-call runbook to write. Operational simplicity has real engineering value that never shows up in benchmark comparisons.
The pgvector HNSW index (added in v0.5.0) performs well for workloads under a few million vectors. Benchmarks show 95-99% recall at under 10ms latency for typical RAG queries. The point where a dedicated vector database actually justifies its cost is higher than most teams think. Production RAG systems on pgvector handle millions of vectors without breaking a sweat.
The Multi-Tenancy Trap
Multi-tenancy in vector databases needs serious thought because the failure mode isn’t “wrong results.” It’s silent data leakage.
The metadata filter approach stores all tenant embeddings in a single index and applies a tenant_id filter at query time. Operationally simple. Works well. Right up until someone writes a query path that forgets the filter. Maybe a new endpoint. Maybe a refactor that accidentally drops the filter parameter. Maybe a test that hardcodes the query without the filter and accidentally ships to production. The query does not fail. It returns HTTP 200. It returns results. Those results include documents from other tenants. No error. No warning. No audit log entry.
Don’t: Rely on application-level tenant_id filters as your only tenant isolation boundary. Every query path, every new endpoint, every refactor must remember to include it. One miss and you leak data silently.
Do: Use database-level enforcement. With pgvector, PostgreSQL’s RLS (Row Level Security) policies enforce the tenant filter at the database layer. The application cannot bypass it even if code omits the parameter. For dedicated vector stores, use namespace separation per tenant.
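A minimal sketch of the RLS approach, assuming a tenant_id column on documents and a session setting named app.tenant_id (both names illustrative):

```sql
-- Enforce isolation at the database layer, not in application code
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;  -- apply even to the table owner

CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- The application sets the tenant once per transaction...
SELECT set_config('app.tenant_id', $1, true);  -- true = resets at transaction end

-- ...and every query path now sees only that tenant's rows, including
-- vector searches that join through documents. A code path that forgets
-- the filter returns nothing, not other tenants' data.
```

Because the search query joins document_embeddings to documents, a policy on documents is enough to filter vector results; a matching policy on the embeddings table adds defense in depth.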
When AI applications handle sensitive customer data, cross-tenant exposure is a liability issue, not a quality issue.
Namespace isolation (separate collections or indexes per tenant) gives you hard boundaries but makes index management harder as tenant count grows. For applications where data isolation is a security or compliance requirement, the extra complexity is worth it.
If your compliance framework requires “demonstrate data isolation,” metadata filtering will fail the audit. Namespace isolation passes. Cluster isolation passes without discussion.
| Approach | How It Works | Data Isolation | Risk | Performance |
|---|---|---|---|---|
| Metadata filter | All tenants share one index. Query includes WHERE tenant_id = X | Logical only. Data co-located in same index | If tenant_id filter is omitted (bug, misconfiguration), query returns other tenants’ data. Silent leakage | Fastest. Single index, no overhead |
| Namespace/collection per tenant | Separate collection or namespace per tenant within same cluster | Strong logical isolation. No filter required | Misconfigured permissions could still cross namespaces. But no silent leakage from missing filter | Slight overhead. More collections to manage |
| Cluster per tenant | Dedicated vector DB cluster per tenant | Physical isolation. No shared infrastructure | Highest cost. Operational overhead scales linearly with tenants | Best. No noisy neighbor. No cross-tenant risk |
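In Postgres terms, the middle row maps naturally onto declarative partitioning: one physical partition (and one vector index) per tenant, with no per-query filter to forget. A sketch, with an illustrative tenant UUID:

```sql
CREATE TABLE document_embeddings (
    tenant_id   uuid         NOT NULL,
    document_id bigint       NOT NULL,
    embedding   vector(1536) NOT NULL,
    PRIMARY KEY (tenant_id, document_id)
) PARTITION BY LIST (tenant_id);

-- One partition and one HNSW graph per tenant
CREATE TABLE document_embeddings_acme
  PARTITION OF document_embeddings
  FOR VALUES IN ('a1b2c3d4-0000-0000-0000-000000000001');
CREATE INDEX ON document_embeddings_acme
  USING hnsw (embedding vector_cosine_ops);
```

A query with tenant_id = X gets pruned to that tenant's partition, so the ANN search walks only their index, which also sidesteps the high-cardinality filter problem discussed below.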
Make this architecture decision at design time with your security requirements, not after a customer discovers documents from another tenant in their search results.
Embedding Model Upgrades: The Migration Nobody Plans For
Embedding model upgrades invalidate your entire index. All of it. Immediately. No migration path. Every team gets caught off guard by this one.
A new model produces vectors in a different embedding space, usually with a different dimension. Either way, your existing index is mathematically incompatible with the new vectors. Mixing old and new embeddings makes cosine similarity meaningless. Comparing Fahrenheit to Celsius. Numbers present, meaning absent.
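pgvector makes the incompatibility concrete at the type level: the column carries the dimension, so the new model's output does not even insert. A small sketch (the error text is approximate and may vary by version):

```sql
CREATE TABLE document_embeddings (
    document_id bigint PRIMARY KEY,
    embedding   vector(768)  -- dimension is fixed at DDL time
);

-- A 1536-dimension vector from the new model is rejected outright
INSERT INTO document_embeddings (document_id, embedding)
VALUES (1, $1::vector);
-- ERROR:  expected 768 dimensions, not 1536
```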
Re-embedding a multi-million-document corpus is not catastrophic on compute. But the operational coordination is real. Build a new index in parallel. Validate search quality against a test suite. Cut over. For a multi-tenant system with dozens of customers, that coordination multiplies. What keeps it a batch job instead of a crisis:
- Source text stored alongside every embedding so re-embedding is a batch job
- Model version and embedding timestamp recorded per vector
- Search quality test suite with known-good query/result pairs for regression testing
- Parallel index infrastructure capable of running old and new indexes at the same time during cutover
- Embedding pipeline that can reprocess the full corpus within your maintenance window
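The first two items come down to schema. A sketch with illustrative names; the model_version column is what lets old and new embeddings coexist during the parallel run:

```sql
CREATE TABLE document_embeddings (
    document_id   bigint       NOT NULL REFERENCES documents (id),
                                          -- source text lives in documents.body, via the FK
    model_version text         NOT NULL,  -- e.g. 'embed-v1' vs 'embed-v2'
    embedded_at   timestamptz  NOT NULL DEFAULT now(),
    embedding     vector(1536) NOT NULL,  -- assumes the new model keeps 1536 dims;
                                          -- a dimension change needs a new column or table
    PRIMARY KEY (document_id, model_version)
);

-- One partial ANN index per model version keeps the graphs separate
CREATE INDEX ON document_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WHERE model_version = 'embed-v2';

-- Re-embedding writes rows under the new version while queries keep filtering
-- on the old one. Cutover is changing a single predicate:
SELECT document_id
FROM document_embeddings
WHERE model_version = 'embed-v2'  -- was 'embed-v1'
ORDER BY embedding <=> $1
LIMIT 10;
```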
The High-Cardinality Filter Problem Most Teams Discover Too Late
ANN indexes optimize for global nearest neighbor search. When you add a high-cardinality filter like user_id across millions of users, the index cannot efficiently prune the search space. Each user’s search space is tiny relative to the full index, but the ANN algorithm still navigates the full graph before applying the filter.
The crossover point is typically 10K-50K unique filter values. Below that, filtered search against a single index works fine. Above it, per-user (or per-group) collections often run 10x faster than filtered search. This is an architecture decision that must happen at design time. Migrating from a single filtered index to per-tenant collections after the product is live means re-indexing every vector with a new collection assignment.
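The failure is easy to reproduce and easy to misread, because nothing errors. A sketch of the problem query and the blunt mitigation (values illustrative):

```sql
-- Looks innocent: an ANN search scoped to one user out of millions
SELECT document_id
FROM document_embeddings
WHERE user_id = $1            -- high-cardinality filter
ORDER BY embedding <=> $2
LIMIT 10;

-- The HNSW scan walks the global graph and the filter is applied to its
-- candidates afterward. If this user owns 50 of 10M vectors, nearly every
-- candidate is discarded and the query can return fewer than 10 rows.

-- Blunt mitigation: widen the candidate beam (more latency, same architecture)
SET hnsw.ef_search = 400;

-- The design-time fix is structural: a partition or collection per user/group,
-- so each search walks a small, dedicated index.
```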
What the Industry Gets Wrong About Vector Databases
“Every RAG application needs a dedicated vector database.” Most LangChain tutorials begin with “Step 1: Set up Pinecone.” pgvector in PostgreSQL handles most RAG workloads under 10 million vectors with full ACID consistency, no sync jobs, and zero additional operational domains. The dedicated vector database earns its complexity at scale. Most teams will not reach that scale for years.
“Metadata filtering provides adequate multi-tenant isolation.” It does, until a code path omits the tenant filter. The query returns HTTP 200 with results from other tenants. No error, no warning, no audit log entry. For security-sensitive workloads, namespace isolation or database-level RLS is the only defensible architecture.
“Embedding model upgrades are seamless.” Every model upgrade invalidates your entire index immediately. New models produce vectors in different dimensional spaces. You cannot mix old and new embeddings. If you have not stored source text alongside embeddings, re-embedding becomes a data archaeology expedition instead of a batch job.
Three databases, three failure modes, one sync job quietly producing ghost results. Or one database, one transaction, and documents that are searchable the instant they exist and gone the instant they’re deleted. Store source text alongside embeddings. Store model version and timestamp. When the embedding model upgrade arrives, re-embedding is a weekend batch job, not a crisis.