All articles
LLMs
December 10, 20255 min read

Fine-Tuning vs. RAG vs. Prompting: A Decision Framework

Three ways to adapt an LLM to your domain, each with a different cost-quality profile. A practical framework for choosing.

A
Arrayz Engineering
Get It Deployed Engineering

Teams reach for fine-tuning when they should reach for prompting, and for RAG when they need fine-tuning. The three techniques solve different problems, and choosing wrong wastes weeks.

Start with prompting

Prompt engineering — clear instructions, examples, and structured outputs — is the cheapest, fastest lever and surprisingly often enough. Before anything more involved, exhaust what good prompting and few-shot examples can do. The iteration loop is minutes, not days.

Use RAG for knowledge

If the problem is that the model doesn't know your facts — your docs, your data, your policies — that's a retrieval problem, not a training problem. RAG injects the right knowledge at inference time and keeps it current without retraining. Fine-tuning facts into a model is expensive and goes stale.

Fine-tune for behaviour

Fine-tuning shines when you need a consistent format, tone, or task behaviour that prompting can't reliably enforce, or when you want a smaller, cheaper model to match a larger one on a narrow task. It changes how the model behaves, not what it knows.

  • Prompting: cheapest, fastest, try first
  • RAG: for current, proprietary knowledge
  • Fine-tuning: for consistent behaviour and cost reduction
  • Often the answer is a combination of all three

RAG changes what the model knows; fine-tuning changes how it behaves. Don't confuse the two.

#llm#fine-tuning#rag

Let's build something that ships.

Bring us a problem. We'll tell you honestly whether AI is the right tool — and exactly how we'd build it.