Fine-Tuning vs. RAG vs. Prompting: A Decision Framework

Three ways to adapt an LLM to your domain, each with a different cost-quality profile. A practical framework for choosing.

Arrayz Engineering

Get It Deployed Engineering

Teams reach for fine-tuning when they should reach for prompting, and for RAG when they need fine-tuning. The three techniques solve different problems, and choosing wrong wastes weeks.

Start with prompting

Prompt engineering — clear instructions, examples, and structured outputs — is the cheapest, fastest lever and surprisingly often enough. Before anything more involved, exhaust what good prompting and few-shot examples can do. The iteration loop is minutes, not days.

Use RAG for knowledge

If the problem is that the model doesn't know your facts — your docs, your data, your policies — that's a retrieval problem, not a training problem. RAG injects the right knowledge at inference time and keeps it current without retraining. Fine-tuning facts into a model is expensive and goes stale.

Fine-tune for behaviour

Fine-tuning shines when you need a consistent format, tone, or task behaviour that prompting can't reliably enforce, or when you want a smaller, cheaper model to match a larger one on a narrow task. It changes how the model behaves, not what it knows.

Prompting: cheapest, fastest, try first
RAG: for current, proprietary knowledge
Fine-tuning: for consistent behaviour and cost reduction
Often the answer is a combination of all three

RAG changes what the model knows; fine-tuning changes how it behaves. Don't confuse the two.

#llm#fine-tuning#rag

Keep reading

LLMs

5 min read

The Discipline of LLM Evaluation

If you can't measure your LLM application's quality, you can't improve it. Building evaluation into the core of your development loop.

November 15, 2025Read

AI Agents

6 min read

Architecting Production AI Agents That Don't Break

The gap between an agent demo and a production agent is enormous. Here's the architecture that closes it: planning, typed tools, memory, and guardrails.

January 22, 2026Read

RAG

6 min read

RAG That Actually Works: Beyond the Naive Pipeline

Naive RAG — embed, retrieve top-k, stuff into a prompt — fails the moment it meets a real corpus. Here's what production retrieval requires.

January 15, 2026Read

View all articles