Interview Preparation — Quick Reference

One-Line Definition

The AI Architect interview tests whether you can design, evaluate, and operate enterprise AI systems — not just implement them.


Interview Stage Quick Reference

| Stage | Duration | What's Tested | Primary Differentiator |
|---|---|---|---|
| Recruiter Screen | 30 min | Role fit, communication | Your narrative: what AI systems have you built? |
| Technical Phone | 60 min | RAG, agents, infra depth | Two levels deeper than what you claim |
| System Design | 60–90 min | End-to-end architecture | Structure before details; trade-offs unprompted |
| Architecture Review | 60 min | Finding failure modes | Ask clarifying questions first; prioritize findings |
| Coding | 45–60 min | Python AI engineering patterns | State approach before writing; identify edge cases |
| Behavioral | 45–60 min | Leadership, failure, influence | STAR framework; first person singular |

8-Step System Design Framework

1. Clarify requirements (5 min)
   → Functional? Scale? Latency? Quality? Constraints?

2. Identify AI problem type (2 min)
   → RAG / Classification / Generation / Agentic?

3. High-level architecture (10 min)
   → Draw zones/layers BEFORE named components

4. Walk a representative request (10 min)
   → Trace one request end-to-end through every component

5. Deep dive on critical components (15 min)
   → Let interviewer guide; have depth on: pipeline, retrieval, caching, LLM layer

6. Trade-offs and alternatives (10 min)
   → What changes at 10x? What did you NOT choose and why?

7. Non-functional requirements (5 min)
   → Security, observability, cost model, DR

8. Invite dialogue (throughout)
   → "I'm assuming X — let me know if you'd like me to change that"

Architecture Checklist — Components to Cover

For every AI system design, check these off before finishing:

Infrastructure:
☐ Authentication and authorization (JWT, SMART on FHIR for healthcare)
☐ Rate limiting (tokens per minute, not requests per minute)
☐ Circuit breaker (per-provider, Redis-backed)
☐ Semantic cache (threshold, TTL, invalidation)

Quality:
☐ Evaluation pipeline (golden queries, MRR, LLM-as-judge)
☐ Model version pinning
☐ Rollback mechanism

Security (add all for healthcare):
☐ PHI scope (is PHI in the prompt?)
☐ BAA status (who needs it?)
☐ HIPAA audit log (patient_id + user_id + action, never prompt text)
☐ Prompt injection defense (structural prompting + input validation)

Operations:
☐ Latency budget (breakdown: embed + search + LLM + network)
☐ Failure handling (what happens when each component fails?)
☐ Observability (which metrics alert you before users notice?)
☐ Cost model (token usage × model tier × volume = monthly estimate)
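
The latency budget and cost model items above reduce to simple arithmetic, and it helps to show that arithmetic explicitly. The following is a minimal sketch; every number in it (component latencies, SLA, token counts, price per 1,000 tokens) is a made-up placeholder, and the function names are hypothetical.

```python
# Hypothetical per-component latency estimates in milliseconds (assumed, not measured).
LATENCY_BUDGET_MS = {
    "embed": 50,
    "search": 100,
    "llm": 1500,
    "network": 150,
}


def check_latency_budget(budget_ms: dict[str, int], sla_ms: int) -> tuple[int, bool]:
    """Return (total estimated latency, whether the total fits inside the SLA)."""
    total = sum(budget_ms.values())
    return total, total <= sla_ms


def monthly_cost_usd(
    tokens_per_request: int,
    requests_per_day: int,
    price_per_1k_tokens_usd: float,
    days: int = 30,
) -> float:
    """Token usage x model price x volume, as a rough monthly estimate."""
    cost_per_request = tokens_per_request / 1000 * price_per_1k_tokens_usd
    return cost_per_request * requests_per_day * days


total_ms, within_sla = check_latency_budget(LATENCY_BUDGET_MS, sla_ms=2000)
print(f"estimated latency: {total_ms} ms, within SLA: {within_sla}")

# Placeholder volume and price; substitute the real model tier's pricing.
estimate = monthly_cost_usd(2000, 10_000, price_per_1k_tokens_usd=0.01)
print(f"estimated monthly cost: ${estimate:,.2f}")
```

In an interview, stating the inputs out loud matters more than the exact result: the interviewer can then challenge an assumption instead of a bare total.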

Common Interview Failure Modes

| Mistake | What to do instead |
|---|---|
| Jump into components without requirements | Spend 5 min on clarification first |
| Design without latency budget | Name the SLA; estimate each component's contribution |
| Mention HIPAA without specifics | Name: BAA, audit log fields, minimum necessary, WORM retention |
| Skip trade-offs until asked | Bring up trade-offs before the interviewer asks |
| "We" instead of "I" in behavioral answers | First person singular throughout |
| No result in STAR answer | Quantify or qualify outcome |
| Describe only what the system does, not how it fails | Always describe: what happens when component X fails? |

Technical Vocabulary Quick Reference

| Use | Not |
|---|---|
| Retrieval-Augmented Generation (RAG) | AI-enhanced search |
| Embedding model | Vector model |
| Agentic workflow | AI automation |
| Inference endpoint | AI API call |
| Context window | Memory limit |
| Tool call / function call | Plugin call |
| Multi-agent system | AI team |
| Orchestration layer | AI controller |
| PHI (Protected Health Information) | Patient data |
| BAA (Business Associate Agreement) | HIPAA contract |
| Minimum necessary standard | Data minimization |
| CDS Hooks | EHR integration API |
| SMART on FHIR | Healthcare OAuth |
| Continuous batching | Batch inference |
| PagedAttention (vLLM) | Memory management |
| HNSW | Vector index |

Interview Questions for You to Ask

Asking strong questions signals principal-level thinking:

About the AI systems:

  • "What is the scale of your current AI workloads? Tokens per day? Concurrent users?"
  • "What is the PHI surface area of your AI systems? How do you handle HIPAA compliance today?"
  • "How do you detect quality regressions before users notice?"

About the team and platform:

  • "How do product teams consume AI capabilities — directly via API, or through a platform layer?"
  • "Who owns evaluation? Is it on the model team, the product team, or the platform team?"

About the role:

  • "What does success look like for this role in the first 90 days?"
  • "What architectural decision that was already made would you reconsider if you could?"

Common Interview Questions — One-Line Answers

"What is RAG?" Retrieve relevant documents at inference time and inject them into the LLM prompt — grounds generation in current, organization-specific knowledge with citations.

"When would you fine-tune vs. use RAG?" RAG for knowledge gaps (current, org-specific info); fine-tune for format/vocabulary adherence on high-volume tasks. Try RAG first.

"What is the CDS Hooks 5-second SLA?" EHR will timeout the CDS service at 5 seconds; return empty cards {cards: []} on timeout, never a 500 error.

"What is PHI-safe logging?" Log metadata only: userid, patientid, action, model, token counts. Never log prompt text or response text.

"What is the minimum necessary standard?" Only include in AI context the PHI fields the use case actually requires. Drug interaction check: medications + allergies only, no name/address.

"What is a circuit breaker?" After N failures, stop sending requests to the failing provider and route to secondary. Reset after a cooldown period.

"What is semantic caching?" Cache query-response pairs; on new query, embed and compare to cached embeddings — return cached response if cosine similarity ≥ threshold.


See Also