Interview Preparation — Quick Reference
One-Line Definition
The AI Architect interview tests whether you can design, evaluate, and operate enterprise AI systems — not just implement them.
Interview Stage Quick Reference
| Stage | Duration | What's Tested | Primary Differentiator |
|---|---|---|---|
| Recruiter Screen | 30 min | Role fit, communication | Your narrative: what AI systems have you built? |
| Technical Phone | 60 min | RAG, agents, infra depth | Be ready to go two levels deeper than anything you claim |
| System Design | 60–90 min | End-to-end architecture | Structure before details; trade-offs unprompted |
| Architecture Review | 60 min | Finding failure modes | Ask clarifying questions first; prioritize findings |
| Coding | 45–60 min | Python AI engineering patterns | State approach before writing; identify edge cases |
| Behavioral | 45–60 min | Leadership, failure, influence | STAR framework; first person singular |
8-Step System Design Framework
1. Clarify requirements (5 min)
→ Functional? Scale? Latency? Quality? Constraints?
2. Identify AI problem type (2 min)
→ RAG / Classification / Generation / Agentic?
3. High-level architecture (10 min)
→ Draw zones/layers BEFORE named components
4. Walk a representative request (10 min)
→ Trace one request end-to-end through every component
5. Deep dive on critical components (15 min)
→ Let interviewer guide; have depth on: pipeline, retrieval, caching, LLM layer
6. Trade-offs and alternatives (10 min)
→ What changes at 10x? What did you NOT choose and why?
7. Non-functional requirements (5 min)
→ Security, observability, cost model, DR
8. Invite dialogue (throughout)
→ "I'm assuming X; let me know if you'd like me to change that"
Architecture Checklist: Components to Cover
For every AI system design, check these off before finishing:
Infrastructure:
☐ Authentication and authorization (JWT, SMART on FHIR for healthcare)
☐ Rate limiting (tokens per minute, not requests per minute)
☐ Circuit breaker (per-provider, Redis-backed)
☐ Semantic cache (threshold, TTL, invalidation)
Quality:
☐ Evaluation pipeline (golden queries, MRR, LLM-as-judge)
☐ Model version pinning
☐ Rollback mechanism
Security (add all for healthcare):
☐ PHI scope (is PHI in the prompt?)
☐ BAA status (who needs it?)
☐ HIPAA audit log (patient_id + user_id + action, never prompt text)
☐ Prompt injection defense (structural prompting + input validation)
Operations:
☐ Latency budget (breakdown: embed + search + LLM + network)
☐ Failure handling (what happens when each component fails?)
☐ Observability (which metrics alert you before users notice?)
☐ Cost model (token usage × model tier × volume = monthly estimate)
Common Interview Failure Modes
| Mistake | What to do instead |
|---|---|
| Jump into components without requirements | Spend 5 min on clarification first |
| Design without latency budget | Name the SLA; estimate each component's contribution |
| Mention HIPAA without specifics | Name: BAA, audit log fields, minimum necessary, WORM retention |
| Skip trade-offs until asked | Bring up trade-offs before the interviewer asks |
| "We" instead of "I" in behavioral answers | First person singular throughout |
| No result in STAR answer | Quantify the outcome, or describe it qualitatively if no numbers exist |
| Describe only what the system does, not how it fails | Always describe: what happens when component X fails? |
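
The latency-budget and cost-model advice above comes down to simple arithmetic you can do on the whiteboard. The following is a minimal sketch; every number in it (the SLA, the per-component latencies, the token counts, and the per-1K-token prices) is a made-up placeholder, not a measurement or a real vendor price, and the function names are hypothetical.

```python
# Hypothetical latency budget for one RAG request, in milliseconds.
SLA_MS = 3000
latency_budget_ms = {
    "embed": 50,
    "search": 100,
    "llm": 2000,
    "network": 150,
}


def budget_headroom_ms(budget: dict, sla_ms: int) -> int:
    """Return the slack left under the SLA (negative means over budget)."""
    return sla_ms - sum(budget.values())


def monthly_cost_usd(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
    days: int = 30,
) -> float:
    """Token usage x model tier price x volume = monthly estimate."""
    per_request = (
        (input_tokens / 1000) * usd_per_1k_input
        + (output_tokens / 1000) * usd_per_1k_output
    )
    return requests_per_day * days * per_request


headroom = budget_headroom_ms(latency_budget_ms, SLA_MS)
estimate = monthly_cost_usd(10_000, 2_000, 500, 0.001, 0.002)
```

Stating the headroom out loud shows which component (usually the LLM call) dominates the budget and where caching or a smaller model tier would help.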
Technical Vocabulary Quick Reference
| Use | Not |
|---|---|
| Retrieval-Augmented Generation (RAG) | AI-enhanced search |
| Embedding model | Vector model |
| Agentic workflow | AI automation |
| Inference endpoint | AI API call |
| Context window | Memory limit |
| Tool call / function call | Plugin call |
| Multi-agent system | AI team |
| Orchestration layer | AI controller |
| PHI (Protected Health Information) | Patient data |
| BAA (Business Associate Agreement) | HIPAA contract |
| Minimum necessary standard | Data minimization |
| CDS Hooks | EHR integration API |
| SMART on FHIR | Healthcare OAuth |
| Continuous batching | Batch inference |
| PagedAttention (vLLM) | Memory management |
| HNSW | Vector index |
Interview Questions for You to Ask
Asking strong questions signals principal-level thinking:
About the AI systems:
- "What is the scale of your current AI workloads? Tokens per day? Concurrent users?"
- "What is the PHI surface area of your AI systems? How do you handle HIPAA compliance today?"
- "How do you detect quality regressions before users notice?"
About the team and platform:
- "How do product teams consume AI capabilities — directly via API, or through a platform layer?"
- "Who owns evaluation? Is it on the model team, the product team, or the platform team?"
About the role:
- "What does success look like for this role in the first 90 days?"
- "What architectural decision that was already made would you reconsider if you could?"
Common Interview Questions — One-Line Answers
"What is RAG?" Retrieve relevant documents at inference time and inject them into the LLM prompt — grounds generation in current, organization-specific knowledge with citations.
"When would you fine-tune vs. use RAG?" RAG for knowledge gaps (current, org-specific info); fine-tune for format/vocabulary adherence on high-volume tasks. Try RAG first.
"What is the CDS Hooks 5-second SLA?" EHR will timeout the CDS service at 5 seconds; return empty cards {cards: []} on timeout, never a 500 error.
"What is PHI-safe logging?" Log metadata only: userid, patientid, action, model, token counts. Never log prompt text or response text.
"What is the minimum necessary standard?" Only include in AI context the PHI fields the use case actually requires. Drug interaction check: medications + allergies only, no name/address.
"What is a circuit breaker?" After N failures, stop sending requests to the failing provider and route to secondary. Reset after a cooldown period.
"What is semantic caching?" Cache query-response pairs; on new query, embed and compare to cached embeddings — return cached response if cosine similarity ≥ threshold.