Interview Preparation — Quick Reference

One-Line Definition

The AI Architect interview tests whether you can design, evaluate, and operate enterprise AI systems — not just implement them.

Interview Stage Quick Reference

Stage	Duration	What's Tested	Primary Differentiator
Recruiter Screen	30 min	Role fit, communication	Your narrative: what AI systems have you built?
Technical Phone	60 min	RAG, agents, infra depth	Two levels deeper than what you claim
System Design	60–90 min	End-to-end architecture	Structure before details; trade-offs unprompted
Architecture Review	60 min	Finding failure modes	Ask clarifying questions first; prioritize findings
Coding	45–60 min	Python AI engineering patterns	State approach before writing; identify edge cases
Behavioral	45–60 min	Leadership, failure, influence	STAR framework; first person singular

8-Step System Design Framework

1. Clarify requirements (5 min)
   → Functional? Scale? Latency? Quality? Constraints?

2. Identify AI problem type (2 min)
   → RAG / Classification / Generation / Agentic?

3. High-level architecture (10 min)
   → Draw zones/layers BEFORE named components

4. Walk a representative request (10 min)
   → Trace one request end-to-end through every component

5. Deep dive on critical components (15 min)
   → Let interviewer guide; have depth on: pipeline, retrieval, caching, LLM layer

6. Trade-offs and alternatives (10 min)
   → What changes at 10x? What did you NOT choose and why?

7. Non-functional requirements (5 min)
   → Security, observability, cost model, DR

8. Invite dialogue (throughout)
   → "I'm assuming X — let me know if you'd like me to change that"

Architecture Checklist — Components to Cover

For every AI system design, check these off before finishing:

Infrastructure:
☐ Authentication and authorization (JWT, SMART on FHIR for healthcare)
☐ Rate limiting (tokens per minute, not requests per minute)
☐ Circuit breaker (per-provider, Redis-backed)
☐ Semantic cache (threshold, TTL, invalidation)

Quality:
☐ Evaluation pipeline (golden queries, MRR, LLM-as-judge)
☐ Model version pinning
☐ Rollback mechanism

Security (add all for healthcare):
☐ PHI scope (is PHI in the prompt?)
☐ BAA status (who needs it?)
☐ HIPAA audit log (patient_id + user_id + action, never prompt text)
☐ Prompt injection defense (structural prompting + input validation)

Operations:
☐ Latency budget (breakdown: embed + search + LLM + network)
☐ Failure handling (what happens when each component fails?)
☐ Observability (which metrics alert you before users notice?)
☐ Cost model (token usage × model tier × volume = monthly estimate)
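The latency budget and cost model items in the checklist are simple arithmetic, and it helps to have the formula ready. The sketch below is one way to express them; `ModelTier`, its per-million-token prices, and the component timings are hypothetical placeholders, not real provider rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """Hypothetical per-million-token prices in USD; substitute real rates."""
    input_per_million: float
    output_per_million: float


def monthly_cost(requests_per_day, input_tokens, output_tokens, tier, days=30):
    """Token usage x model tier x volume = monthly estimate."""
    per_request = (
        input_tokens * tier.input_per_million
        + output_tokens * tier.output_per_million
    ) / 1_000_000
    return per_request * requests_per_day * days


def latency_budget(components_ms, sla_ms):
    """Return (total latency, remaining headroom) against the SLA, in ms."""
    total = sum(components_ms.values())
    return total, sla_ms - total


if __name__ == "__main__":
    tier = ModelTier(input_per_million=1.0, output_per_million=2.0)  # made-up prices
    print(monthly_cost(1000, 2000, 500, tier))
    print(latency_budget({"embed": 50, "search": 100, "llm": 1500, "network": 100}, 2000))
```

A negative headroom value means the design already misses the SLA before any retries or queueing are counted.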

Common Interview Failure Modes

Mistake	What to do instead
Jump into components without requirements	Spend 5 min on clarification first
Design without latency budget	Name the SLA; estimate each component's contribution
Mention HIPAA without specifics	Name: BAA, audit log fields, minimum necessary, WORM retention
Skip trade-offs until asked	Bring up trade-offs before the interviewer asks
"We" instead of "I" in behavioral answers	First person singular throughout
No result in STAR answer	Quantify or qualify the outcome
Describe only what the system does, not how it fails	Always describe: what happens when component X fails?

Technical Vocabulary Quick Reference

Use	Not
Retrieval-Augmented Generation (RAG)	AI-enhanced search
Embedding model	Vector model
Agentic workflow	AI automation
Inference endpoint	AI API call
Context window	Memory limit
Tool call / function call	Plugin call
Multi-agent system	AI team
Orchestration layer	AI controller
PHI (Protected Health Information)	Patient data
BAA (Business Associate Agreement)	HIPAA contract
Minimum necessary standard	Data minimization
CDS Hooks	EHR integration API
SMART on FHIR	Healthcare OAuth
Continuous batching	Batch inference
PagedAttention (vLLM)	Memory management
HNSW	Vector index

Interview Questions for You to Ask

Asking strong questions signals principal-level thinking:

About the AI systems:

"What is the scale of your current AI workloads? Tokens per day? Concurrent users?"
"What is the PHI surface area of your AI systems? How do you handle HIPAA compliance today?"
"How do you detect quality regressions before users notice?"

About the team and platform:

"How do product teams consume AI capabilities — directly via API, or through a platform layer?"
"Who owns evaluation? Is it on the model team, the product team, or the platform team?"

About the role:

"What does success look like for this role in the first 90 days?"
"What architectural decision that was already made would you reconsider if you could?"

Common Interview Questions — One-Line Answers

"What is RAG?" Retrieve relevant documents at inference time and inject them into the LLM prompt — grounds generation in current, organization-specific knowledge with citations.
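A minimal sketch of the retrieve-then-inject flow described above. The keyword-overlap scoring is a stand-in for real embedding search, and the document ids, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration.

```python
def retrieve(query, docs, k=2):
    """Rank documents by shared words with the query (toy stand-in for vector search)."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc["text"].lower().split()))

    return sorted(docs, key=overlap, reverse=True)[:k]


def build_prompt(query, passages):
    """Inject retrieved passages into the prompt with numbered, citable sources."""
    sources = "\n".join(
        f"[{i}] ({p['id']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    docs = [
        {"id": "kb-1", "text": "Password resets require manager approval"},
        {"id": "kb-2", "text": "VPN access is requested through the service desk"},
    ]
    question = "how do I request VPN access"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```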

"When would you fine-tune vs. use RAG?" RAG for knowledge gaps (current, org-specific info); fine-tune for format/vocabulary adherence on high-volume tasks. Try RAG first.

"What is the CDS Hooks 5-second SLA?" The EHR will time out the CDS service call at 5 seconds; if you cannot answer in time, return a 200 response with an empty card list ({"cards": []}), never a 500 error.

"What is PHI-safe logging?" Log metadata only: user_id, patient_id, action, model, token counts. Never log prompt text or response text.
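An allowlist is a simple way to guarantee that free text never reaches the log. The following is a sketch only; the field names in `ALLOWED_FIELDS` and the event shape are assumptions for illustration, not a prescribed audit schema.

```python
import json
import logging

# Assumed metadata fields; anything not listed here is dropped before logging.
ALLOWED_FIELDS = (
    "user_id", "patient_id", "action", "model", "input_tokens", "output_tokens",
)


def audit_record(event):
    """Keep only allowlisted metadata; prompt and response text never pass through."""
    return {key: event[key] for key in ALLOWED_FIELDS if key in event}


def log_event(logger, event):
    """Write the filtered record as one JSON line."""
    logger.info(json.dumps(audit_record(event), sort_keys=True))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_event(logging.getLogger("audit"), {
        "user_id": "u-17",
        "patient_id": "p-42",
        "action": "drug_interaction_check",
        "model": "example-model",
        "input_tokens": 812,
        "output_tokens": 96,
        "prompt": "free text that must not be logged",
    })
```

Filtering by allowlist, not by blocklist, means a newly added free-text field is excluded by default.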

"What is the minimum necessary standard?" Only include in AI context the PHI fields the use case actually requires. Drug interaction check: medications + allergies only, no name/address.
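The drug interaction example above can be sketched as a per-use-case field allowlist applied before any prompt is built. The `REQUIRED_FIELDS` mapping, the use-case key, and the patient record shape are hypothetical.

```python
# Hypothetical mapping from use case to the PHI fields it actually needs.
REQUIRED_FIELDS = {
    "drug_interaction_check": ("medications", "allergies"),
}


def minimum_necessary(patient, use_case):
    """Return only the fields the use case requires.

    An unknown use case raises KeyError, so the default is to send nothing.
    """
    fields = REQUIRED_FIELDS[use_case]
    return {name: patient[name] for name in fields if name in patient}


if __name__ == "__main__":
    patient = {
        "name": "Jane Example",
        "address": "1 Example Street",
        "medications": ["warfarin"],
        "allergies": ["penicillin"],
    }
    print(minimum_necessary(patient, "drug_interaction_check"))
```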

"What is a circuit breaker?" After N failures, stop sending requests to the failing provider and route to secondary. Reset after a cooldown period.
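A minimal in-memory sketch of that behavior for a single provider. A production version would typically share state across instances (for example in Redis); the class name, thresholds, and `call_with_fallback` helper here are illustrative assumptions.

```python
import time


class CircuitBreaker:
    """Opens after max_failures consecutive failures; resets after cooldown_s."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """True if requests may be sent to the primary provider."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: reset and let traffic through again.
            self.failures = 0
            self.opened_at = None
            return True
        return False

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_fallback(primary, secondary, breaker):
    """Try the primary unless the breaker is open; otherwise use the secondary."""
    if breaker.allow():
        try:
            result = primary()
            breaker.record_success()
            return result
        except Exception:
            breaker.record_failure()
    return secondary()
```

This sketch resets fully after the cooldown; a common refinement is a half-open state that lets a single probe request through before closing the breaker.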

"What is semantic caching?" Cache query-response pairs; on new query, embed and compare to cached embeddings — return cached response if cosine similarity ≥ threshold.
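The lookup described above can be sketched with a linear scan over cached embeddings. The `embed` callable, the 0.95 default threshold, and the class name are assumptions; a real system would use an embedding model and a vector index, and would also need TTL and invalidation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # callable: text -> list of floats (assumed)
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (embedding, response) pairs

    def put(self, query, response):
        self.entries.append((self.embed(query), response))

    def get(self, query):
        """Return the closest cached response if it meets the threshold, else None."""
        vector = self.embed(query)
        best_score, best_response = -1.0, None
        for cached_vector, response in self.entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

The threshold is the main tuning knob: set it too low and users receive answers to a different question, set it too high and the hit rate collapses.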
