Fine-Tuning vs RAG

Section: 01-AI-Foundations | Status: COMPLETE | Last Updated: 2026-06-30 | Difficulty: Intermediate

Trade-offs and Considerations

Total Cost of Ownership Comparison


Note on cost figures: Specific pricing is not quoted here because AI infrastructure costs change frequently. Verify current rates for embedding APIs, vector database hosting, and fine-tuning compute in official vendor documentation. The structural cost comparison below is the durable insight.

RAG (moderate one-time setup, plus ongoing hosting and updates):

Initial: document ingestion, embedding, and indexing costs scale with corpus size
Ongoing: incremental index updates + vector store hosting (cloud-managed options available from all major providers)
No GPU infrastructure required when embeddings come from a hosted API
No data curation labor beyond document quality review

Fine-tuning (high upfront; ongoing cost driven by how often you retrain):

Data curation: 500–5,000 high-quality examples × clinical review labor. This is typically the dominant cost: the human review, not the compute.
Training compute: billed per fine-tuning run when using a provider's fine-tuning API (consult provider documentation for current rates)
Evaluation: clinical review of test set outputs (often requires specialized clinical informatics staff)
Retraining on knowledge updates: each update repeats curation, training, and evaluation, so this recurring cost accumulates over time
Total: a single clinical fine-tuning project typically costs substantially more than a RAG setup

For most enterprise clinical AI use cases, RAG has dramatically lower TCO and higher knowledge freshness. Fine-tuning is justified only when the behavioral improvement cannot be achieved through prompt engineering and the use case is stable enough to amortize the training investment.
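The structural comparison above can be sketched as a simple cumulative-cost model. This is an illustration under stated assumptions, not a pricing tool: every field name and every number below is a hypothetical placeholder in arbitrary cost units, and real inputs should come from current vendor rates and your own labor estimates.

```python
from dataclasses import dataclass


@dataclass
class RagCosts:
    """Hypothetical RAG cost inputs, in arbitrary cost units."""
    setup: float            # one-time ingestion, embedding, and indexing
    monthly_hosting: float  # vector store hosting
    monthly_updates: float  # incremental re-embedding of changed documents


@dataclass
class FineTuneCosts:
    """Hypothetical fine-tuning cost inputs, in arbitrary cost units."""
    curation: float            # initial example curation + clinical review
    training_run: float        # compute for one fine-tuning run
    evaluation: float          # clinical review of test-set outputs
    retrain_curation: float    # curating new/changed examples for each retrain
    retrain_every_months: int  # retraining cadence driven by knowledge updates (0 = never)


def rag_tco(c: RagCosts, months: int) -> float:
    """One-time setup plus costs that grow linearly with time."""
    return c.setup + months * (c.monthly_hosting + c.monthly_updates)


def fine_tune_tco(c: FineTuneCosts, months: int) -> float:
    """Large initial cycle plus a repeated curate/train/evaluate cycle per retrain."""
    initial = c.curation + c.training_run + c.evaluation
    retrains = months // c.retrain_every_months if c.retrain_every_months > 0 else 0
    per_retrain = c.retrain_curation + c.training_run + c.evaluation
    return initial + retrains * per_retrain


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the shape of the two curves.
    rag = RagCosts(setup=10, monthly_hosting=2, monthly_updates=1)
    ft = FineTuneCosts(curation=100, training_run=5, evaluation=20,
                       retrain_curation=30, retrain_every_months=3)
    for months in (1, 6, 12, 24):
        print(f"{months:>2} months: RAG={rag_tco(rag, months):.0f}  "
              f"FT={fine_tune_tco(ft, months):.0f}")
```

Under this model, RAG cost grows linearly with hosting, while fine-tuning cost steps up with every retrain. The size of each step is set by per-cycle curation and clinical review, not by compute alone.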

Fine-Tuning Risks

Risk	Description	Mitigation
Catastrophic forgetting	Fine-tuning on a narrow dataset degrades performance on general tasks	Use low learning rates; evaluate on general benchmarks post-tuning
Training data poisoning	Malicious examples in training data can embed adversarial behaviors	Human review of all training examples before fine-tuning
Knowledge staleness	Fine-tuned knowledge becomes outdated	Hybrid approach: fine-tune for behavior, RAG for knowledge
Hallucination amplification	Fine-tuning can make hallucinated outputs sound more confident	Never fine-tune on factually incorrect examples; evaluate rigorously
Overfitting	Too few examples → model memorizes rather than generalizes	Rule of thumb: at least ~200 examples per class; monitor validation loss to detect
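Using validation loss to detect overfitting is essentially early stopping. Here is a minimal sketch, assuming you can obtain per-epoch training and validation losses from your training job (how you get them depends on the provider). The function name and the `patience`/`min_delta` parameters are illustrative choices, not any provider's API.

```python
from typing import Optional, Sequence


def detect_overfitting(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    patience: int = 2,
    min_delta: float = 0.0,
) -> Optional[int]:
    """Return the epoch with the best validation loss if, after it, validation loss
    fails to improve for `patience` consecutive epochs while training loss keeps
    falling (the classic overfitting signature). Return None otherwise."""
    if len(train_loss) != len(val_loss):
        raise ValueError("train_loss and val_loss must have the same length")
    best_epoch, best_val, stale = 0, float("inf"), 0
    for epoch, loss in enumerate(val_loss):
        if loss < best_val - min_delta:
            # Validation loss improved: this is the new best checkpoint.
            best_epoch, best_val, stale = epoch, loss, 0
        else:
            stale += 1
            # Training loss still below its value at the best epoch means the model
            # keeps fitting the training set while generalization stalls or degrades.
            if stale >= patience and train_loss[epoch] < train_loss[best_epoch]:
                return best_epoch
    return None


if __name__ == "__main__":
    train = [1.00, 0.80, 0.60, 0.40, 0.30]
    val = [1.00, 0.85, 0.90, 0.95, 1.00]
    print(detect_overfitting(train, val))  # → 1 (keep the epoch-1 checkpoint)
```

Returning the best epoch, rather than just a boolean, lets you roll back to the last checkpoint before the model started memorizing.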

Comparison Table

Dimension	Prompt Engineering	RAG	Fine-Tuning	Hybrid (RAG + FT)
Time to production	Days	Weeks	Months	Months
Knowledge freshness	Static (prompt)	As fresh as the index	Frozen at training time	As fresh as the index
Behavioral control	Moderate	Moderate	High	Very High
Source attribution	Manual	Natural	None	Natural
Training cost	None	None	High	High
Inference cost	Baseline	Slightly higher	Baseline	Slightly higher
Clinical use case fit	Simple tasks	Knowledge Q&A	Format/style	Complex clinical AI

Interview Questions

Q1: A healthcare AI company wants to build a clinical note generation system. Should they fine-tune their model or use RAG? What additional information do you need?

Category: Architecture / System Design | Difficulty: Senior | Role: AI Architect

Answer Framework:

My default position is to try prompt engineering first, then RAG, then fine-tuning, in that order, because each step adds substantial cost and complexity. But first I would ask the clarifying questions that could change this answer:

What specifically is failing with the baseline model? If the problem is that the model doesn't know the hospital's specific formulary restrictions → RAG. If the model doesn't know the latest clinical guidelines → RAG. If the model doesn't produce output in the required clinical documentation format → try few-shot prompt engineering first; if compliance rate is still insufficient (below ~90%) → fine-tuning. If both knowledge and format are problems → hybrid.
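The ~90% threshold assumes you measure format compliance on a sample of few-shot outputs. Here is a minimal sketch of that measurement. It assumes, hypothetically, that the required format is a SOAP note whose sections must appear in order as `Header:` lines. A real clinical documentation spec would need richer checks (required fields, section content, length limits).

```python
import re
from typing import Iterable, Sequence

# Hypothetical target format: a SOAP note whose sections appear in this order,
# each starting a line as "Header:".
REQUIRED_SECTIONS = ("Subjective", "Objective", "Assessment", "Plan")


def is_compliant(note: str, sections: Sequence[str] = REQUIRED_SECTIONS) -> bool:
    """True if every required header starts a line and the headers appear in order."""
    position = 0
    for section in sections:
        pattern = re.compile(rf"^{re.escape(section)}:", re.MULTILINE)
        match = pattern.search(note, position)
        if match is None:
            return False
        position = match.end()
    return True


def compliance_rate(notes: Iterable[str]) -> float:
    """Fraction of generated notes that satisfy the format check."""
    notes = list(notes)
    if not notes:
        return 0.0
    return sum(is_compliant(n) for n in notes) / len(notes)


if __name__ == "__main__":
    outputs = [
        "Subjective: cough x3 days\nObjective: afebrile\nAssessment: viral URI\nPlan: fluids, rest",
        "Patient reports cough. Plan: fluids.",
    ]
    rate = compliance_rate(outputs)
    print(f"compliance={rate:.0%}, consider fine-tuning: {rate < 0.90}")
```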

How stable is the target knowledge? Drug formularies can change monthly, clinical guidelines may be revised quarterly, and ICD-10 codes are updated annually. Any of these should live in RAG, not in fine-tuned weights. Only stable, behavioral properties (output format, tone, reasoning structure, safety constraints) are appropriate for fine-tuning.

Do we need source attribution? If every clinical recommendation must cite the guideline it came from (required for liability and regulatory compliance in many healthcare AI contexts) → RAG is the only option. Fine-tuned knowledge cannot be attributed to a specific source.

What is the latency requirement? Real-time intra-encounter documentation (<500 ms) may require a small fine-tuned model. Post-encounter documentation (batch, no strict latency SLA) can use a large RAG-augmented model.
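The four questions above can be condensed into a rough decision helper. This is a sketch of the answer framework, not a complete policy. The field names are illustrative, and the 0.90 default mirrors the ~90% compliance figure used in this answer.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    PROMPT_ENGINEERING = "prompt engineering"
    RAG = "RAG"
    FINE_TUNING = "fine-tuning"
    HYBRID = "hybrid (RAG + fine-tuning)"


@dataclass
class Requirements:
    knowledge_gap: bool                # missing formulary, guideline, or patient-specific knowledge
    needs_attribution: bool            # every recommendation must cite its source
    few_shot_format_compliance: float  # format compliance with few-shot prompting, 0.0-1.0
    realtime_latency: bool             # e.g. intra-encounter documentation with a <500 ms target


def choose_approach(r: Requirements, compliance_threshold: float = 0.90) -> Approach:
    # Knowledge that changes (formularies, guidelines, code sets) and any need for
    # citations both point to retrieval rather than training.
    needs_rag = r.knowledge_gap or r.needs_attribution
    # Fine-tune only for stable behavior: format still failing after few-shot
    # prompting, or a latency target that may call for a small tuned model.
    needs_ft = r.few_shot_format_compliance < compliance_threshold or r.realtime_latency
    if needs_rag and needs_ft:
        return Approach.HYBRID
    if needs_rag:
        return Approach.RAG
    if needs_ft:
        return Approach.FINE_TUNING
    return Approach.PROMPT_ENGINEERING


if __name__ == "__main__":
    # Illustrative inputs only: knowledge and attribution needs, plus weak few-shot format compliance.
    scenario = Requirements(knowledge_gap=True, needs_attribution=True,
                            few_shot_format_compliance=0.75, realtime_latency=False)
    print(choose_approach(scenario).value)
```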

In the typical HMS clinical note generation scenario, the right answer is hybrid: fine-tune a mid-tier Claude model on 500+ physician-authored note examples (for format and style), plus RAG retrieval of current clinical guidelines and patient-specific EHR context (for knowledge freshness and source attribution).
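Here is a minimal sketch of the knowledge half of that hybrid. It retrieves current guideline snippets and passes them, tagged with source IDs, to the fine-tuned note model so that the output can cite them. The bag-of-words similarity stands in for a real embedding model and vector store. All names (`Chunk`, `retrieve`, `build_prompt`, the source IDs) are hypothetical, and the actual model call is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    source_id: str  # hypothetical guideline identifier, used for citation
    text: str


def _vector(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[Chunk], k: int = 2) -> List[Chunk]:
    """Rank chunks by similarity to the query and keep the top k."""
    q = _vector(query)
    return sorted(chunks, key=lambda c: _cosine(q, _vector(c.text)), reverse=True)[:k]


def build_prompt(encounter: str, ehr_context: str, chunks: List[Chunk], k: int = 2) -> str:
    """Fine-tuned weights supply format and style; retrieved chunks supply current
    knowledge, each tagged with its source ID so the note can cite it."""
    sources = "\n".join(f"[{c.source_id}] {c.text}" for c in retrieve(encounter, chunks, k))
    return (
        f"Patient context (from EHR):\n{ehr_context}\n\n"
        f"Current guidelines (cite by source ID):\n{sources}\n\n"
        f"Encounter:\n{encounter}\n\n"
        "Write the clinical note."
    )


if __name__ == "__main__":
    guidelines = [
        Chunk("GL-HTN-1", "placeholder hypertension guideline text on therapy"),
        Chunk("GL-DM-2", "placeholder diabetes guideline text on glycemic targets"),
    ]
    print(build_prompt("new hypertension diagnosis, discuss therapy",
                       "placeholder EHR summary", guidelines, k=1))
```

Keeping the guidelines out of the weights means a formulary or guideline change only requires re-indexing, not another fine-tuning cycle.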
