Prompt Engineering
Section: 01-AI-Foundations Status: COMPLETE Last Updated: 2026-06-30 Difficulty: Intermediate
Trade-offs and Considerations
Prompt Length vs. Cost
System prompts are cached by Claude and many other providers after the first call. A longer, more detailed system prompt costs more on the first call but is effectively free on subsequent calls within the cache TTL (5 minutes for Claude). This means:
- Don't truncate the system prompt to save tokens at the cost of output quality
- Do structure the system prompt so stable content comes first (cached) and dynamic content comes last
- Monitor cache hit rates — low cache hit rates indicate the system prompt is changing too frequently or calls are too infrequent to benefit from caching
Prompt Versioning
Prompts must be version-controlled like code:
# prompts/claude/clinical/discharge-summary-v2.1.yaml
# Educational Example — Illustrative prompt versioning structure.
# Verify current model IDs in official documentation before use.
name: discharge-summary
version: "2.1"
model: claude-opus-4-8 # Verify current model ID at docs.anthropic.com
last_updated: "2026-06-30"
reviewed_by: "[CMIO or designated clinical reviewer]"
clinical_approval: true
change_log:
- version: "2.1"
date: "2026-06-30"
change: "Added explicit contraindication handling for high-alert medications"
- version: "2.0"
date: "2026-04-15"
change: "Restructured output format to match Joint Commission documentation standards"
system_prompt: |
[full system prompt text]
test_cases:
- input: "..."
expected_output_contains: ["Assessment", "Plan", "Physician review required"]Comparison Table
| Technique | When to Use | Overhead | Quality Gain |
|---|---|---|---|
| Zero-shot | Well-defined, simple tasks | None | Baseline |
| Few-shot (2–5 examples) | Format-sensitive, judgment-requiring tasks | ~500–1000 tokens | High |
| Chain-of-thought | Complex multi-step reasoning | ~200–500 output tokens | Very High |
| Structured output (JSON schema) | Programmatically parsed outputs | Minimal | Critical for reliability |
| Role + constraints | Any production use case | Minimal | High |
| Prompt caching | High-frequency, stable system prompts | None (reduces cost) | None (cost optimization) |
Interview Questions
Q1: A clinical AI system is producing inconsistent output formats that break the downstream EHR integration. What is the root cause and how do you fix it?
Category: Technical Depth Difficulty: Senior Role: AI Architect / ML Engineer
Answer Framework:
The root cause is almost certainly insufficient output format specification in the system prompt. LLMs have significant latitude in how they structure responses unless the format is specified explicitly and unambiguously. "Summarize the patient's medications" will produce wildly different formats across calls — bulleted list, narrative paragraph, numbered list, table — because all are valid interpretations of "summary."
Fix in order of increasing constraint: First, specify the format in natural language: "Return a bulleted list with one medication per line in the format: [Drug Name] [Dose] [Frequency] [Route]." This reduces variance significantly. Second, specify JSON schema if the output is machine-parsed: provide the exact schema, enable JSON mode if available, and set temperature=0 for determinism. Third, add a few-shot example showing exactly what the output should look like. Fourth, add a validation step — if the output doesn't match the expected schema, either retry with an explicit correction instruction or reject and alert.
For the EHR integration specifically: the downstream parser should also be robust to minor format variations, not brittle to exact string matching. Defense in depth: good prompts reduce format variance; robust parsers handle the residual variance.
Red Flags: "Use a regex to clean up the output" — this treats symptoms not causes; the output format should be controllable through prompt design.
Q2: How do you design a prompt that is resistant to prompt injection in a RAG clinical system?
Category: Security / Architecture Difficulty: Senior Role: AI Architect
Answer Framework:
Prompt injection in RAG occurs when malicious content embedded in a retrieved document attempts to override the system prompt's instructions. In a clinical system, this is a high-severity security concern because overridden clinical instructions could lead to dangerous recommendations.
The defense is multi-layered. At the prompt structure layer: place the retrieved content in clearly delimited blocks (<retrieved_context> tags) and explicitly instruct the model in the system prompt that content within those tags is untrusted external material that must not be treated as instructions. The model should summarize or quote the retrieved content, not follow instructions within it.
At the input layer: implement a pre-retrieval scanner that flags chunks containing common injection patterns ("ignore previous instructions," "you are now," "disregard your system prompt"). Flag these chunks for human review rather than injecting them into the LLM context.
At the output layer: validate that the generated response conforms to expected structure and scope. If a response suddenly begins recommending maximum medication doses after a retrieval step, this is a behavioral anomaly that should trigger an alert.
At the model layer: Claude specifically has resistance to many prompt injection patterns built into its training (constitutional AI). However, this should be treated as defense-in-depth, not as the primary defense.
No single defense is sufficient. The combination of: (1) instructed untrusted content zones, (2) pre-retrieval scanning, (3) output validation, and (4) model-native resistance provides adequate protection for a clinical RAG system.