Vector Databases for Enterprise: pgvector vs Dedicated Stores
A team follows a LangChain tutorial on a Friday afternoon and spins up Pinecone for their RAG application. Six months later, they are operating Pinecone for vector search, PostgreSQL for user data, Redis for caching, and a Lambda function that syncs document metadata between Pinecone and Postgres every 5 minutes. Three databases. Three operational domains. Three failure modes. And a consistency bug where deleted documents are still searchable for up to 5 minutes because the sync job has not run yet. A user asks a question, gets an answer grounded in a document that was deleted an hour ago, and files a support ticket. Nobody can figure out why the AI is “hallucinating” until someone checks the sync job timing.
The coupling between “building an LLM application” and “needing a dedicated vector database” deserves real scrutiny. Vector databases are the right tool for specific problems at specific scales. They add genuine operational overhead. And for a significant portion of production RAG applications (far more than most teams realize), a well-configured PostgreSQL instance with pgvector provides adequate performance with dramatically less complexity. The document and its embedding live in the same database, in the same transaction, with the same consistency guarantees your team already knows how to operate.
What Vector Search Actually Does
Strip away the marketing and a vector database does one thing: given a query vector (a list of 768 or 1536 floating-point numbers), find the k stored vectors that are most similar by cosine distance or dot product. That is it. Everything else (the document storage, the metadata filtering, the hybrid search) is built on top of that single primitive.
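In pgvector terms, that single primitive is literally one ORDER BY clause. A minimal sketch, using a hypothetical documents table (table and column names are illustrative; <=> is pgvector's cosine distance operator):

```sql
-- Enable the extension (once per database).
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical schema: source text and its embedding side by side.
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536)   -- dimension must match your embedding model
);

-- The primitive: k nearest neighbors by cosine distance.
-- $1 is the query vector supplied by the application.
SELECT id, content
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
```

Without an index, that ORDER BY is a sequential scan over every stored vector, which is exact nearest neighbor search.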
Exact nearest neighbor search compares the query against every stored vector. Perfectly accurate. Scales as O(n). At 100K vectors, it is fast enough. At 1 million vectors, response times cross 100ms. At 10 million, you are measuring in seconds. Interactive applications cannot wait that long.
ANN algorithms like HNSW (Hierarchical Navigable Small World) and IVF (Inverted File Index) build index structures that enable sub-linear query time at the cost of occasionally missing the true nearest neighbor. HNSW builds a multi-layer graph where each node connects to its nearest neighbors. Queries enter at the sparse top layer and descend layer by layer, getting closer to the target at each hop. At ef=128 (the query-time beam width), recall is typically 95-99%. For RAG applications, that 1-5% miss rate is irrelevant. The context you missed gets replaced by slightly-less-similar context that works just as well. Nobody notices.
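In pgvector, those HNSW trade-offs surface as index and session parameters. A sketch with illustrative values (m, ef_construction, and hnsw.ef_search are real pgvector knobs; the specific numbers are typical starting points, not recommendations):

```sql
-- Build an HNSW index for cosine distance.
-- m = graph connectivity per node, ef_construction = build-time beam width.
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query-time beam width: higher means better recall, slower queries.
SET hnsw.ef_search = 128;

-- Same query as before, now served by the index instead of a full scan.
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
```

Tuning is mostly a matter of raising hnsw.ef_search until recall on a held-out query set stops improving.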
Selecting the right retrieval architecture means weighing measured recall, latency, and operational requirements against your actual scale and your team's operational maturity, not against vendor benchmarks.
The Case for pgvector
For applications with fewer than 5 million documents, pgvector in PostgreSQL provides advantages that the vector database marketing consistently underplays. Here is what you actually get.
Transactional consistency: insert a document, its metadata, and its embedding in a single BEGIN...COMMIT. When the transaction commits, the document is searchable. When it rolls back, it is not. No sync jobs. No consistency windows. No phantom results where a deleted document is still searchable because the vector store has not caught up with the relational database. This alone eliminates an entire class of bugs that teams with separate stores spend weeks debugging. Remember that support ticket from the opening? This is how you never get it.
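A sketch of that single-transaction write path, with hypothetical table and column names:

```sql
BEGIN;

-- Document, embedding, and metadata land together or not at all.
INSERT INTO documents (id, content, embedding)
VALUES ($1, $2, $3);

INSERT INTO document_metadata (document_id, source_url, uploaded_by)
VALUES ($1, $4, $5);

COMMIT;  -- the document is searchable the instant this returns

-- Deletion is the same story: one statement, no sync window.
DELETE FROM documents WHERE id = $1;
```

There is no window in which the embedding is searchable but the metadata is missing, or the document is deleted but its vector lingers.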
Full SQL expressiveness: filter vectors by user_id, date range, category, or any relational predicate. Joins work. Foreign keys work. You can write SELECT * FROM documents WHERE user_id = $current_user AND created_at > now() - interval '30 days' ORDER BY embedding <=> $query_vector LIMIT 10. Try doing that in Pinecone.
Operational simplicity: your team already knows how to operate, back up, and monitor PostgreSQL. Adding pgvector is CREATE EXTENSION vector; and a new index type. It does not add a new operational domain, new failure modes, or new on-call procedures. That operational simplicity has real engineering value that never shows up in benchmark comparisons.
The pgvector HNSW index (added in v0.5.0) provides competitive ANN performance for workloads under a few million vectors. Benchmarks show 95-99% recall at under 10ms latency for typical RAG queries. The performance crossover point, where a dedicated vector database becomes worth its operational cost, is higher than most teams assume. Production RAG systems on pgvector handle 3 million vectors at sub-15ms p99 latency. Most teams will not hit that scale for years, if ever. Do not over-engineer for a future that may never arrive.
The Multi-Tenancy Risk
Multi-tenancy in vector databases deserves explicit engineering attention because the failure mode is not “wrong results.” It is data leakage. And the failure is completely silent.
The metadata filter approach stores all tenant embeddings in a single index and applies a tenant_id filter at query time. Operationally simple. Works well. Until someone writes a query path that forgets the filter. Maybe it is a new endpoint. Maybe it is a refactor that accidentally drops the filter parameter. Maybe it is a test that hardcodes the query without the filter and accidentally ships to production. The query does not fail. It returns HTTP 200. It returns results. Those results include documents from other tenants. No error. No warning. No audit log entry saying “cross-tenant access detected.” Just a customer asking “why is my AI assistant citing a document from Company X?”
This is not theoretical. This class of bug has caused real data leakage incidents in production RAG applications, and it recurs precisely because nothing in the system flags it. When the application handles sensitive customer data, a missing tenant filter is a liability issue, not a quality issue.
Namespace isolation (separate collections or indexes per tenant) provides hard boundaries at the cost of more complex index management as tenant count grows. For applications where tenant data isolation is a security or compliance requirement, the operational complexity is the correct trade-off. Do not optimize for simplicity here. With pgvector, you can implement tenant isolation using PostgreSQL’s Row Level Security policies, which enforce the filter at the database level rather than trusting application code to always include it. Make this decision at design time with your security requirements, not after a customer discovers documents from another tenant in their search results.
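A minimal sketch of the RLS approach, assuming a tenant_id column and an app.tenant_id session setting that the application establishes per connection (the names are illustrative):

```sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;  -- applies to the table owner too

-- Every query sees only the current tenant's rows,
-- whether or not the application remembers to filter.
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = current_setting('app.tenant_id')::bigint);

-- The application sets the tenant once per session or transaction:
SET app.tenant_id = '42';

-- This query is now safe even though it "forgets" the tenant filter.
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
```

The query path that forgets the filter now returns only the current tenant's rows instead of silently leaking someone else's.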
The consequences of choosing the wrong isolation model are asymmetric. Namespace isolation adds operational complexity that teams can manage. A metadata filter omission causes silent cross-tenant data exposure that can take weeks to discover and may have regulatory implications.
Planning for Embedding Model Upgrades
Here is the operational reality that catches teams off guard: every embedding model upgrade invalidates your entire index. All of it. Immediately. OpenAI releases text-embedding-3-large to replace text-embedding-ada-002. The new model produces 3072-dimensional vectors instead of 1536. Your existing index is mathematically incompatible. You cannot mix old and new embeddings in the same index because cosine similarity between vectors from different models is meaningless.
This means re-embedding your entire corpus. For a 10-million-document corpus, the re-embedding compute and API cost is modest. Not catastrophic. But the operational cost is real: you need to run the re-embedding batch job, build a new index in parallel, validate search quality against a test set, then cut over. For a multi-tenant system with 50 customers, that is a coordinated migration.
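One common shape for that parallel build is a second embedding column next to the old one (column and index names here are hypothetical):

```sql
-- New column sized for the new model's dimensionality.
ALTER TABLE documents ADD COLUMN embedding_v2 vector(3072);

-- A batch job backfills embedding_v2 from the stored source text,
-- then the new index is built while the old one keeps serving queries.
CREATE INDEX CONCURRENTLY documents_embedding_v2_idx
ON documents USING hnsw (embedding_v2 vector_cosine_ops);

-- After validating search quality, queries switch to embedding_v2;
-- only then is the old column (and its index) dropped:
-- ALTER TABLE documents DROP COLUMN embedding;
```

CREATE INDEX CONCURRENTLY avoids locking the table against writes during the build, which matters when the backfill runs for hours.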
The architectural lesson is simple: always store the source text alongside the embedding. If your vector store only contains the embedding and a document ID that points to another system, re-embedding requires reconstructing the text from the source. If the source has changed since the original embedding, your data pipeline just got a lot more complicated. Store the text. Store the model version used. Store the embedding timestamp. When the upgrade comes (and it will), re-embedding is a batch job you run over a weekend, not a data archaeology expedition.
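Sketched as schema, assuming the documents table already stores its source text (the column names and model strings are illustrative):

```sql
-- Provenance columns so the upgrade is a batch job, not archaeology.
ALTER TABLE documents
    ADD COLUMN embedding_model text NOT NULL DEFAULT 'text-embedding-ada-002',
    ADD COLUMN embedded_at     timestamptz NOT NULL DEFAULT now();

-- Finding re-embedding candidates becomes a simple predicate:
SELECT id, content
FROM documents
WHERE embedding_model <> 'text-embedding-3-large';
```

Because the source text lives in the same row, the weekend batch job reads content, calls the new model, and updates the row in place.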