RAG Is Not Architecture
I get this deck at least once a quarter. Slide one: the company's problem. Slide two: an arrow pointing at a vector database. Slide three: 'RAG'. Slide four: 'launch'.
That is not a strategy. That is a vendor list.
Retrieval-augmented generation is, at its best, a small technique inside a larger system, the same way caching is a technique inside a web stack. You would not have walked into a CTO meeting in 2012 and said your mobile strategy was memcached. You should not do it now.
The actual architectural decisions, in order: (1) what is the surface area of the product, (2) what decisions is the AI making on behalf of the user, (3) how do we know when those decisions are wrong, (4) where is the human in the loop, (5) what happens when the model is unavailable, (6) what data crosses which boundary, and (7) what are the unit economics at scale. Retrieval shows up around point four, and it is in service of every other point.
When a team leads with RAG, what they are telling me is: we could not get the model to do the thing, so we are going to ship the model with a library card and hope. This sometimes works. More often, you end up with a system that confidently cites real documents to support false conclusions, which is the worst failure mode in all of applied AI precisely because every layer of the stack looks healthy.
If you want to lead with something, lead with the eval. Tell me the 300 questions your system is expected to answer, the correct answers, and how often it gets them right today. Then we can talk about whether you need retrieval, fine-tuning, a tool call, or a small model trained on your own data. Usually the answer is three of the four.
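To make "lead with the eval" concrete, here is a minimal sketch of what that artifact can be: a fixed list of question/answer pairs and a scoring loop. All names here (`EvalCase`, `run_eval`, `toy_system`) are hypothetical illustrations, not any particular eval framework, and exact-match scoring is the simplest possible grader, not a recommendation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    expected: str  # the known-correct answer

def run_eval(cases: list[EvalCase], answer_fn: Callable[[str], str]) -> float:
    """Return the fraction of cases answered correctly (naive exact match)."""
    correct = sum(
        1 for case in cases
        if answer_fn(case.question).strip().lower() == case.expected.strip().lower()
    )
    return correct / len(cases)

# Toy stand-in for the real system under test; in practice this would call
# your model, with or without retrieval, tools, or fine-tuning.
def toy_system(question: str) -> str:
    return {"What is 2 + 2?": "4"}.get(question, "I don't know")

cases = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("What is the capital of France?", "Paris"),
]
print(run_eval(cases, toy_system))  # 1 of 2 correct -> 0.5
```

The point of a harness this small is that it turns "should we add retrieval?" into a measurable before/after question: run it, change one thing, run it again.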