TECHNICAL DEEP DIVE

AI Embedding Drift: Why Vector Search Quality Degrades Over Time

By Institute of AI PM · 13 min read · May 7, 2026

TL;DR

Vector search feels deterministic — until it isn't. Three forces silently erode RAG quality over time: query distribution shifts (users ask about new things), corpus drift (your content evolves), and embedding model changes (vendors update). Most teams discover this after months of degraded retrieval. This guide explains each force, the detection signals, and the playbook to keep vector search fresh.

The Three Sources of Drift

Query distribution drift

Users start asking different questions than they did at launch. Your eval set is stale before you notice.

Corpus drift

Documents are added, edited, removed. The embeddings for old chunks no longer reflect current content.

Embedding model drift

Vendor releases a new embedding model. Vectors from the old model are not comparable with vectors from the new one; mixing them in one index silently crashes quality.

Tokenizer/preprocessing drift

Subtle changes to how text is cleaned, chunked, or normalized between when you indexed and when you query. Hard to spot, brutal in effect.

Detection Signals

Drift rarely announces itself. The signals are subtle: lower citation accuracy, more "I don't know" responses, longer query reformulations, more user retries. The teams that catch drift early track these signals continuously.

Top-k similarity scores trending down

If the average similarity of the top retrieved chunk is dropping, queries are matching less well. Often the earliest signal.

Empty or low-confidence retrievals

Track the % of queries where no chunk passes a similarity threshold. Trending up = drift.
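Both of these signals can be tracked with a rolling window over retrieval logs. A minimal sketch, assuming you log the top-1 similarity score per query (the window size and confidence cutoff here are illustrative, not recommendations):

```python
from collections import deque

class DriftSignals:
    """Rolling monitor for two early drift signals:
    mean top-1 similarity and the low-confidence retrieval rate."""

    def __init__(self, window=1000, threshold=0.75):
        self.scores = deque(maxlen=window)  # top-1 similarity per query
        self.threshold = threshold          # assumed "confident match" cutoff

    def record(self, top1_score: float) -> None:
        self.scores.append(top1_score)

    def mean_top1(self) -> float:
        return sum(self.scores) / len(self.scores)

    def low_confidence_rate(self) -> float:
        below = sum(1 for s in self.scores if s < self.threshold)
        return below / len(self.scores)

m = DriftSignals(window=3, threshold=0.75)
for score in [0.9, 0.8, 0.6]:
    m.record(score)
print(round(m.mean_top1(), 3))          # 0.767
print(round(m.low_confidence_rate(), 3))  # 0.333
```

Alert when either metric trends past a baseline you capture at launch; the absolute values matter less than the direction.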

User reformulation rate

If users increasingly retype near-identical queries two or three times in a row, retrieval isn't serving them. This is a behavioral drift signal.

Citation accuracy decline

If your eval set shows citation correctness dropping over time without prompt changes, retrieval is the suspect.

Retrieval latency creep

Indexes growing without optimization eventually slow down. Latency creep often correlates with quality drift.

Re-indexing Strategy

Re-indexing is the brute-force fix. Done right, it keeps quality high. Done wrong, it's expensive and disruptive. The strategy depends on which drift you're fighting.

Incremental re-indexing

Add new chunks; update changed chunks; delete removed chunks. The default daily-or-weekly motion. Cheapest.
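One way to drive that add/update/delete motion is a content-hash diff between the current corpus and what the index last embedded. A sketch with illustrative names (the `indexed` hash map is an assumption about what you store per document):

```python
import hashlib

def plan_incremental_reindex(corpus: dict, indexed: dict):
    """Diff current documents against what the index holds.

    corpus:  {doc_id: text}           current source of truth
    indexed: {doc_id: content_hash}   hashes recorded at last embed
    Returns (to_embed, to_delete) lists of doc ids.
    """
    hashes = {
        doc_id: hashlib.sha256(text.encode()).hexdigest()
        for doc_id, text in corpus.items()
    }
    to_embed = [d for d, h in hashes.items() if indexed.get(d) != h]  # new or changed
    to_delete = [d for d in indexed if d not in hashes]               # removed
    return to_embed, to_delete

to_embed, to_delete = plan_incremental_reindex(
    corpus={"doc1": "v2 text", "doc2": "same"},
    indexed={
        "doc1": "stale-hash",
        "doc2": hashlib.sha256(b"same").hexdigest(),
        "doc3": "any-hash",
    },
)
# to_embed == ["doc1"], to_delete == ["doc3"]
```

Running this daily keeps embed costs proportional to what actually changed, not to corpus size.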

Full re-index on model upgrade

When the embedding model changes, you must re-embed everything. Plan for it; budget compute.

Shadow re-index for migration

Re-embed in parallel; compare quality on a sample before swapping. Avoids the "upgraded and crashed" risk.

Tiered re-indexing

Hot content (last 90 days) re-indexed weekly; cold content monthly. Balances cost and freshness.
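The tiered policy above can be expressed as a small scheduling predicate. A sketch, assuming each document carries a last-modified and last-indexed date (cutoffs mirror the 90-day/weekly/monthly split in the text):

```python
from datetime import date, timedelta

def reindex_due(last_modified: date, last_indexed: date, today: date) -> bool:
    """Tiered policy: docs modified within 90 days are 'hot' and get a
    weekly cadence; everything else is 'cold' and gets a monthly one."""
    hot = (today - last_modified) <= timedelta(days=90)
    cadence = timedelta(days=7) if hot else timedelta(days=30)
    return (today - last_indexed) >= cadence

today = date(2026, 5, 7)
# Hot doc, last indexed 9 days ago -> due under weekly cadence
print(reindex_due(date(2026, 4, 1), date(2026, 4, 28), today))   # True
# Cold doc, last indexed 17 days ago -> not due under monthly cadence
print(reindex_due(date(2025, 1, 1), date(2026, 4, 20), today))   # False
```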


Embedding Model Migrations

Why migrations are dangerous

Old vectors and new vectors are not directly comparable. Mixing them in the same index produces gibberish similarity scores. Migrate everything or nothing.

The shadow migration playbook

Re-embed your corpus with the new model into a parallel index. Run queries against both. Compare retrieval quality on a golden set. Switch over only when the new index is clearly better.
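The comparison step can be as simple as recall@k on the golden set against both indexes. A sketch with assumed interfaces (`retrieve` is any function returning ranked chunk ids; your vector store's search API stands in for it):

```python
def recall_at_k(golden, retrieve, k=5):
    """golden: list of (query, {relevant_chunk_ids}) pairs.
    retrieve: fn(query, k) -> ranked list of chunk ids.
    Returns the fraction of queries with a relevant chunk in the top k."""
    hits = sum(
        1 for query, relevant in golden
        if relevant & set(retrieve(query, k))
    )
    return hits / len(golden)

# Run the same golden set against both indexes before switching, e.g.:
#   old_recall = recall_at_k(golden_set, old_index.search, k=5)
#   new_recall = recall_at_k(golden_set, new_index.search, k=5)
# and migrate only if new_recall beats old_recall by a clear margin.
```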

Cost considerations

Re-embedding 10M chunks at ~$0.02/1K tokens is real money. Budget for migrations every 12-24 months for major model changes.
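At the rate quoted above, a back-of-envelope estimate (the 500-tokens-per-chunk average is an assumption; substitute your own):

```python
chunks = 10_000_000
avg_tokens_per_chunk = 500        # assumed average; measure your corpus
price_per_1k_tokens = 0.02        # rate quoted above; check your vendor's pricing

cost = chunks * avg_tokens_per_chunk / 1_000 * price_per_1k_tokens
print(f"${cost:,.0f}")            # $100,000
```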

Versioned embeddings

Store the model version alongside each embedding. Lets you detect mixed-vintage indexes immediately.
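A version tag makes mixed-vintage detection a one-pass check over index metadata. A sketch, where the `embedding_model` field name is an assumption about your metadata schema:

```python
def check_index_vintage(records):
    """records: iterable of per-vector metadata dicts, each expected to
    carry the model tag written at index time. Raises on mixed versions."""
    versions = {r.get("embedding_model", "<untagged>") for r in records}
    if len(versions) > 1:
        raise ValueError(f"mixed-vintage index: {sorted(versions)}")
    return versions.pop()

check_index_vintage([{"embedding_model": "embed-v2"}] * 3)  # ok: "embed-v2"
```

Run it as a startup assertion or a scheduled job; it turns a silent quality crash into a loud failure.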

Mistakes That Make Drift Worse

No eval on retrieval

If you only eval end-to-end answer quality, you can't isolate retrieval drift. Eval retrieval as its own component.

Mixing embedding model versions

Old and new vectors in the same index destroy quality silently. Always version-tag.

Over-indexing on similarity scores

A 0.85 similarity today and a 0.85 similarity in six months may not be comparable. Watch trends, not absolute numbers.

No corpus refresh discipline

If old documents stay indexed forever, they pollute results. Implement TTLs or staleness flags.

Refusing to re-index

Some teams hold off on re-indexing because it's expensive — and pay 5x in support load and lost trust. Budget the work.

Keep RAG Working Over the Long Run

The Masterclass covers RAG operations end-to-end — taught by a Salesforce Sr. Director PM with shipped RAG products in production.