RAG Is Not Architecture
I get this deck at least once a quarter. Slide one: the company's problem. Slide two: an arrow pointing at a vector database. Slide three: 'RAG'. Slide four: 'launch'.
That is not a strategy. That is a vendor list.
Retrieval-augmented generation is, at its best, a small technique inside a larger system, the same way caching is a technique inside a web stack. You would not have walked into a CTO meeting in 2012 and said your mobile strategy was memcached. You should not do it now.
The actual architectural decisions, in order: (1) what is the surface area of the product, (2) what decisions is the AI making on behalf of the user, (3) how do we know when those decisions are wrong, (4) where is the human in the loop, (5) what happens when the model is unavailable, (6) what data crosses which boundary, and (7) what are the unit economics at scale. Retrieval shows up around point four, and it is in service of every other point.
When a team leads with RAG, what they are telling me is: we could not get the model to do the thing, so we are going to ship the model with a library card and hope. This sometimes works. More often, you end up with a system that confidently cites real documents to support false conclusions, which is the worst failure mode in all of applied AI precisely because every layer of the stack looks healthy.
If you want to lead with something, lead with the eval. Tell me the 300 questions your system is expected to answer, the correct answers, and how often it gets them right today. Then we can talk about whether you need retrieval, fine-tuning, a tool call, or a small model trained on your own data. Usually the answer is three of the four.
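To make "lead with the eval" concrete, here is a minimal sketch of what that artifact can be: a fixed list of question/answer pairs and a scoring loop. All names here (`EvalCase`, `run_eval`, `toy_system`) are hypothetical illustrations, not any particular eval framework, and exact-match scoring is the simplest possible grader, not a recommendation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    expected: str  # the known-correct answer

def run_eval(cases: list[EvalCase], answer_fn: Callable[[str], str]) -> float:
    """Return the fraction of cases answered correctly (naive exact match)."""
    correct = sum(
        1 for case in cases
        if answer_fn(case.question).strip().lower() == case.expected.strip().lower()
    )
    return correct / len(cases)

# Toy stand-in for the real system under test; in practice this would call
# your model, with or without retrieval, tools, or fine-tuning.
def toy_system(question: str) -> str:
    return {"What is 2 + 2?": "4"}.get(question, "I don't know")

cases = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("What is the capital of France?", "Paris"),
]
print(run_eval(cases, toy_system))  # 1 of 2 correct -> 0.5
```

The point of a harness this small is that it turns "should we add retrieval?" into a measurable before/after question: run it, change one thing, run it again.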