Fine-Tuning vs. RAG vs. Prompting: A Decision Framework
Three ways to adapt an LLM to your domain, each with a different cost-quality profile. A practical framework for choosing.
Teams reach for fine-tuning when they should reach for prompting, and for RAG when they need fine-tuning. The three techniques solve different problems, and choosing wrong wastes weeks.
Start with prompting
Prompt engineering — clear instructions, examples, and structured outputs — is the cheapest, fastest lever and surprisingly often enough. Before anything more involved, exhaust what good prompting and few-shot examples can do. The iteration loop is minutes, not days.
Use RAG for knowledge
If the problem is that the model doesn't know your facts — your docs, your data, your policies — that's a retrieval problem, not a training problem. RAG injects the right knowledge at inference time and keeps it current without retraining. Fine-tuning facts into a model is expensive and goes stale.
Fine-tune for behaviour
Fine-tuning shines when you need a consistent format, tone, or task behaviour that prompting can't reliably enforce, or when you want a smaller, cheaper model to match a larger one on a narrow task. It changes how the model behaves, not what it knows.
- Prompting: cheapest, fastest, try first
- RAG: for current, proprietary knowledge
- Fine-tuning: for consistent behaviour and cost reduction
- Often the answer is a combination of all three
RAG changes what the model knows; fine-tuning changes how it behaves. Don't confuse the two.
Keep reading
The Discipline of LLM Evaluation
If you can't measure your LLM application's quality, you can't improve it. Building evaluation into the core of your development loop.
Architecting Production AI Agents That Don't Break
The gap between an agent demo and a production agent is enormous. Here's the architecture that closes it: planning, typed tools, memory, and guardrails.
RAG That Actually Works: Beyond the Naive Pipeline
Naive RAG — embed, retrieve top-k, stuff into a prompt — fails the moment it meets a real corpus. Here's what production retrieval requires.