Architecture Review Facilitation
Conceptual Explanation
An architecture review has two phases that must be kept separate:
Phase 1 — Current State Elicitation: The FDE maps the client's actual current-state architecture. This requires listening and questioning, not presenting. The FDE's job in this phase is to build an accurate model of what exists, not to evaluate it.
Phase 2 — Gap and Risk Analysis: The FDE compares the current state against the required state for the target AI deployment. Gaps are the differences. Risks are the consequences of not closing those gaps.
Mixing these phases — critiquing the architecture while still eliciting information — causes the client to become defensive and stop sharing accurate information. The phases must be distinct.
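The split between the two phases can be illustrated in code: Phase 1 only records what exists, and the comparison happens afterwards. The following is a minimal sketch, assuming both states can be reduced to flat dictionaries keyed by capability. The helper `find_gaps`, the keys, and the values are all hypothetical and chosen for illustration.

```python
def find_gaps(current: dict[str, str], required: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {capability: (current_value, required_value)} for every mismatch."""
    gaps = {}
    for capability, needed in required.items():
        # Anything never elicited in Phase 1 is reported as "unknown"
        actual = current.get(capability, "unknown")
        if actual != needed:
            gaps[capability] = (actual, needed)
    return gaps


# Phase 1 output: what exists today (illustrative values)
current_state = {
    "ai_gateway": "not deployed",
    "tls_policy": "TLS 1.2+",
    "model_governance": "none",
}

# Required state for the target deployment (illustrative values)
required_state = {
    "ai_gateway": "deployed",
    "tls_policy": "TLS 1.2+",
    "model_governance": "formal process",
    "prompt_registry": "versioned",
}

# Phase 2: the comparison runs only after the current state is recorded
gaps = find_gaps(current_state, required_state)
for capability, (actual, needed) in gaps.items():
    print(f"GAP {capability}: current={actual!r}, required={needed!r}")
```

Keeping `current_state` free of any evaluation mirrors the facilitation rule: the record is completed first, and judgment is applied only in the comparison step.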
Core Architecture: The Review Framework
Pre-Review Preparation (FDE Working Independently)
Before the review session, the FDE builds a pre-review architecture map from discovery and assessment artifacts:
from dataclasses import dataclass


@dataclass
class PreReviewArchitectureMap:
    """
    FDE's working model of the client's architecture before the review session.
    All items should be marked as CONFIRMED (verified) or ASSUMED (to confirm in review).
    """
    # EHR Layer
    ehr_system: str                 # "Epic FHIR R4, version 2023.1 - CONFIRMED"
    ehr_integration_standard: str   # "FHIR R4 + HL7 v2 ADT - CONFIRMED"
    smart_on_fhir_status: str       # "Available, App Orchard pending - CONFIRMED"
    cds_hooks_availability: str     # "Available in Epic - ASSUMED: confirm in review"

    # Cloud / Network
    cloud_provider: str             # "Azure - CONFIRMED"
    phi_in_cloud_policy: str        # "Approved for HIPAA BAA vendors - CONFIRMED"
    llm_outbound_access: str        # "Permitted - ASSUMED: test not yet run"
    ai_gateway_status: str          # "Not deployed - CONFIRMED"
    network_topology: str           # "Single datacenter + Azure region - ASSUMED"

    # Security
    tls_policy: str                 # "TLS 1.2+ required - CONFIRMED"
    audit_logging_capability: str   # "Splunk SIEM - CONFIRMED"
    baa_status: str                 # "Azure BAA signed; Anthropic BAA pending - CONFIRMED"
    security_review_process: str    # "IT Security review + Compliance sign-off - CONFIRMED"

    # Organizational
    ai_gateway_owner: str           # "IT Infrastructure team - CONFIRMED"
    model_governance: str           # "No formal process - ASSUMED: confirm in review"
    prompt_management: str          # "Ad hoc - ASSUMED"

    # Unknown / To Elicit
    open_questions: list[str]       # Items to resolve in the review session

Review preparation checklist:
[ ] Current state architecture map drafted with CONFIRMED / ASSUMED tags
[ ] Discovery Summary and Assessment Report reviewed
[ ] Integration risk patterns identified (from standard pattern library)
[ ] Open questions documented (to resolve in review session)
[ ] Review agenda prepared and shared with client 48h before session
[ ] Whiteboard or diagramming tool ready for live architecture drawing
[ ] Architecture Review Report template prepared (fill during session)
Review Session Structure
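Going into the session, the entries still tagged ASSUMED in the pre-review map are the ones to settle first. A minimal sketch of pulling them out follows. It assumes the map is a dataclass whose string fields carry the CONFIRMED / ASSUMED tag in their value; `MiniMap` is a three-field stand-in for the full map and `items_to_confirm` is a hypothetical helper.

```python
from dataclasses import dataclass, fields


@dataclass
class MiniMap:
    """Three-field stand-in for the full pre-review map (illustrative only)."""
    cloud_provider: str
    llm_outbound_access: str
    model_governance: str


def items_to_confirm(arch_map) -> list[str]:
    """Return 'field: value' for every string field whose value is tagged ASSUMED."""
    pending = []
    for f in fields(arch_map):
        value = getattr(arch_map, f.name)
        if isinstance(value, str) and "ASSUMED" in value:
            pending.append(f"{f.name}: {value}")
    return pending


draft = MiniMap(
    cloud_provider="Azure - CONFIRMED",
    llm_outbound_access="Permitted - ASSUMED: test not yet run",
    model_governance="No formal process - ASSUMED: confirm in review",
)

# Each line becomes a statement to put in front of the client for correction
for line in items_to_confirm(draft):
    print("VALIDATE IN BLOCK 1 ->", line)
```

The resulting list can seed the `open_questions` field and the first block of the agenda.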
Duration: 3–4 hours for a major architecture review. Break at 90-minute intervals.
Participants:
- FDE (facilitator)
- Client: IT Architect, IT Director, Clinical Informatics Engineer (for healthcare)
- Optional: Cloud architect, security architect (for infrastructure-heavy reviews)
- Not in the room: Executives, sales, non-technical stakeholders (they get the report)
Agenda:
BLOCK 1 — Current State Architecture Walk (60–90 min)
Objective: Build an accurate current-state diagram collaboratively.
Method: FDE draws on whiteboard; client corrects and adds.
Opening: "I want to start by making sure I have an accurate picture of your
current architecture. I've built a working map from our prior conversations —
let's validate and correct it together."
Technique: Draw the FDE's pre-review map on the whiteboard. Ask the client
to correct it. Incorrect assumptions surface more information than open-ended
questions. "I have it that your integration engine sends ADT^A01 messages
to Epic — is that right, or does it go the other direction?"
Cover:
- Data flows for the target use case
- Authentication and authorization paths
- Network topology (on-prem / cloud / peering)
- Security architecture (where PHI flows, what controls exist)
- Existing integration points relevant to the AI use case
- Current monitoring and observability
BREAK — 15 minutes
BLOCK 2 — Target State and Gap Identification (60 min)
Objective: Map where the AI system needs to connect and identify the gaps.
Technique: Add the target AI architecture to the current-state diagram.
Draw the connections that need to exist for the AI system to work.
Ask: "What has to be true for this connection to work? Does that condition
currently hold?"
Cover:
- AI system component placement (SMART app, CDS Hooks service, AI gateway, LLM)
- Data paths from EHR to AI to LLM and back
- Authentication path (SMART token, AI gateway virtual key, LLM API key)
- Security controls on each path
- Failure modes for each connection
BREAK — 15 minutes
BLOCK 3 — Risk Assessment and Prioritization (45 min)
Objective: Identify, categorize, and prioritize the architectural risks.
Technique: Walk through each gap identified in Block 2. For each gap:
1. Is it a risk? (Could it cause failure or security incident?)
2. What is the likelihood? (High / Medium / Low given client's environment)
3. What is the consequence? (Service unavailability / HIPAA incident / clinical harm)
4. What is the mitigation?
Cover:
- PHI data path risks
- Availability risks (LLM downtime, latency, rate limits)
- Security risks (authentication gaps, audit log gaps)
- Governance risks (model updates without re-evaluation)
- Clinical safety risks (AI failure affecting clinical workflow)
BLOCK 4 — Recommendations and Next Steps (30 min)
Objective: Agree on which risks to mitigate, in what order, and who owns each.
Output: Architecture Review Report action items with owners and dates.
Risk Pattern Library
Experienced FDEs recognize a small set of recurring architectural risk patterns in enterprise AI systems. Building these into a pattern library allows faster and more consistent risk identification:
ARCHITECTURE_RISK_PATTERNS = [
    {
        "pattern": "PHI in observability traces",
        "description": "LLM inference requests containing PHI are logged verbatim in observability systems",
        "detection_question": "Where do LLM inference request and response payloads go? Are they logged?",
        "consequence": "HIPAA Security Rule violation; PHI accessible to any engineer with logging access",
        "mitigation": "AI gateway scrubs PHI from all traces; only hashed patient IDs and metadata in logs"
    },
    {
        "pattern": "No AI gateway: direct LLM API calls",
        "description": "Application code calls LLM APIs directly without a centralized gateway",
        "detection_question": "Where do LLM API keys live? Are they per-application or centralized?",
        "consequence": "No cost attribution; no audit logging; no rate limiting; no prompt versioning",
        "mitigation": "Deploy AI gateway (LiteLLM, Azure AI Foundry) before production"
    },
    {
        "pattern": "CDS Hook with blocking LLM dependency",
        "description": "CDS Hook service makes synchronous LLM call without circuit breaker",
        "detection_question": "What happens in your CDS Hook service if the LLM API is unavailable or slow?",
        "consequence": "EHR workflow blocked when LLM API times out; patient care delay",
        "mitigation": "3-second timeout with circuit breaker; return empty card array on timeout"
    },
    {
        "pattern": "No model version pinning",
        "description": "Production system calls LLM API with 'latest' model or unpinned version",
        "detection_question": "How is the model version specified in production API calls?",
        "consequence": "Unexpected behavior change when vendor updates model; production incident",
        "mitigation": "Pin exact model version; define evaluation and approval process for updates"
    },
    {
        "pattern": "AI output without physician review gate",
        "description": "AI-generated clinical content is written to EHR without physician approval",
        "detection_question": "What is the workflow from AI generation to note appearing in the EHR?",
        "consequence": "AI error enters medical record; potential patient harm; liability",
        "mitigation": "Physician review and approval required before DocumentReference write; no auto-filing"
    },
    {
        "pattern": "No prompt version management",
        "description": "Prompts are modified directly in production without version control or evaluation",
        "detection_question": "How are prompts changed in production? Who approves changes?",
        "consequence": "Undetected quality regression; no rollback capability",
        "mitigation": "Prompt Registry with versioning; evaluation before production deployment"
    },
    {
        "pattern": "FHIR access with over-broad scopes",
        "description": "SMART application requests system-level FHIR scopes instead of patient-level",
        "detection_question": "What FHIR scopes does the application request: system/* or patient/*?",
        "consequence": "HIPAA Minimum Necessary violation; access to all patients instead of current patient",
        "mitigation": "Minimum Necessary scopes: patient/{resource}.read per use case"
    },
    {
        "pattern": "No fallback when AI service is unavailable",
        "description": "Clinical workflow has no fallback path when AI system is unavailable",
        "detection_question": "What do clinicians do if the AI tool is down? Is there a manual fallback?",
        "consequence": "Clinical workflow disruption; workarounds that bypass safety controls",
        "mitigation": "Design fallback workflow before launch; never make AI availability a hard dependency"
    },
]
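One way to apply the library consistently is to turn the detection questions into a numbered sheet and record which patterns were found. The sketch below is self-contained and uses a two-entry stand-in list with the same keys as the library entries. `build_question_sheet` and `flagged` are hypothetical helper names.

```python
# Two-entry stand-in for the pattern library (same keys as the full entries)
patterns = [
    {
        "pattern": "No model version pinning",
        "detection_question": "How is the model version specified in production API calls?",
    },
    {
        "pattern": "No prompt version management",
        "detection_question": "How are prompts changed in production? Who approves changes?",
    },
]


def build_question_sheet(library: list[dict]) -> list[str]:
    """Number each detection question so answers can be recorded against it."""
    return [
        f"Q{i}. [{p['pattern']}] {p['detection_question']}"
        for i, p in enumerate(library, start=1)
    ]


def flagged(library: list[dict], present: set[str]) -> list[dict]:
    """Return the entries whose pattern the reviewer marked as present."""
    return [p for p in library if p["pattern"] in present]


sheet = build_question_sheet(patterns)
print("\n".join(sheet))

# After the session: the reviewer found one of the two patterns
hits = flagged(patterns, {"No prompt version management"})
print([p["pattern"] for p in hits])
```

Asking every question in order, rather than only the ones that seem relevant, is what makes the identification consistent between reviews.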
Architecture Diagram
[Current state + target state diagram with AI system components added. Mark each connection with its security classification (PHI path / internal / external)]
Common Mistakes
1. Skipping the current state elicitation and jumping to recommendations. FDEs who arrive at the review with a pre-determined architecture recommendation and spend the session defending it are running a sales presentation, not an architecture review. Current state must be accurately mapped first.
2. Letting executives in the review room. Executive presence changes the dynamic — technical staff become less forthcoming about problems and limitations. Architecture reviews are engineering sessions. Executives get the report.
3. Not documenting CONFIRMED vs. ASSUMED. Architecture maps that do not distinguish confirmed facts from assumptions create false confidence. Assumptions that turn out to be wrong in production are architectural failures that were foreseeable.
4. Missing the AI-specific risk patterns. Reviewers who apply only traditional software architecture risk patterns miss the AI-specific risks (no prompt version management, no circuit breaker on LLM dependency, PHI in traces). The pattern library must include AI-specific patterns.
5. Producing a report that is too long to be read. An Architecture Review Report that is 40 pages long will not be read by the IT Director. The report must be scannable — Risk Register table, Action Plan with owners and dates, Architecture Diagram. Supporting detail goes in appendices.
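The Risk Register mentioned in item 5 stays scannable if it is kept as structured data and rendered as a short ranked table. The following is a sketch under stated assumptions: the three-level severity scale and the likelihood-times-severity score are illustrative choices, not something the review framework prescribes, and the owners and dates are made-up sample values.

```python
from dataclasses import dataclass

LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class RiskEntry:
    risk: str
    likelihood: str  # High / Medium / Low, as rated during the session
    severity: str    # High / Medium / Low rating of the consequence (assumed scale)
    owner: str
    due: str         # target date as an ISO string

    @property
    def score(self) -> int:
        # Simple ordinal product; only used to rank rows
        return LEVELS[self.likelihood] * LEVELS[self.severity]


def render_register(entries: list[RiskEntry]) -> str:
    """Render a plain-text table, highest score first."""
    ranked = sorted(entries, key=lambda e: e.score, reverse=True)
    rows = ["Score | Risk | Owner | Due"]
    rows += [f"{e.score:>5} | {e.risk} | {e.owner} | {e.due}" for e in ranked]
    return "\n".join(rows)


# Sample entries (owners and dates are illustrative)
register = [
    RiskEntry("No prompt version management", "Medium", "Medium", "Clinical Informatics", "2025-03-01"),
    RiskEntry("PHI in observability traces", "High", "High", "IT Security", "2025-02-01"),
    RiskEntry("No model version pinning", "Low", "High", "IT Infrastructure", "2025-03-15"),
]

table = render_register(register)
print(table)
```

A table like this fits on the first page of the report, with the supporting detail for each row moved to an appendix.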
Best Practices
- Prepare a pre-review architecture map with CONFIRMED / ASSUMED tags before the session
- Draw on the whiteboard; let the client correct; never present a complete diagram and ask for agreement
- Run Block 1 (current state) and Block 2 (gap analysis) as separate blocks; do not critique the architecture while still eliciting it
- Apply the AI-specific risk pattern library systematically
- Produce the Architecture Review Report within 48 hours of the session
- Require blocking issues to be resolved before production go-live — not after
- Schedule the next architecture review at 30 days post-launch
Trade-offs
Depth vs. breadth: A review that covers only the integration path (narrow) misses organizational and governance risks. A review that covers everything (broad) loses focus. The right scope is determined by the highest-risk dimensions identified in the assessment.
Directness vs. client relationship: Surfacing a significant architectural flaw that the client's team designed can create awkwardness. The FDE's credibility is built on directness — but the delivery must be constructive. Frame risks as "here's what we need to solve together" rather than "this is wrong."
Interview Questions
Q: What AI-specific architectural risks do you look for that traditional software architecture reviews miss?
Category: Architecture | Difficulty: Principal | Role: FDE / AI Architect
Answer Framework:
Traditional software architecture reviews miss several risk patterns that are specific to AI systems:
PHI in observability: LLM requests contain PHI in the prompt payload. Traditional monitoring would log the full request. In healthcare, this creates a HIPAA incident. AI systems need gateway-level PHI scrubbing before logs are written.
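A minimal sketch of what gateway-level scrubbing could look like before a log line is written. It assumes the gateway sees the request as a dictionary with `patient_id`, `model`, and `prompt` keys. Those key names, the helper `to_log_record`, and the use of a keyed HMAC-SHA256 hash rather than a plain hash are assumptions made for illustration.

```python
import hashlib
import hmac


def to_log_record(request: dict, hash_key: bytes) -> dict:
    """Build the only record allowed to reach logs: a keyed hash of the
    patient ID plus non-PHI metadata. Prompt text is dropped entirely."""
    digest = hmac.new(hash_key, request["patient_id"].encode("utf-8"), hashlib.sha256)
    return {
        "patient_hash": digest.hexdigest(),
        "model": request["model"],
        "prompt_chars": len(request["prompt"]),  # size only, never the text
    }


request = {
    "patient_id": "MRN-000123",            # illustrative identifier
    "model": "vendor-model-2025-01-15",    # hypothetical model name
    "prompt": "Summarize the visit for Jane Doe, DOB 1980-01-01 ...",
}

record = to_log_record(request, hash_key=b"rotate-me")
print(record)
```

Building an allow-listed record, instead of trying to redact the raw payload, means a new PHI-bearing field cannot leak into logs by default.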
Non-determinism and drift: Traditional systems have deterministic behavior. AI systems degrade gradually — model version updates, prompt changes, and data distribution shifts all cause quality drift that is invisible without monitoring. The review must ask: how will you know when the output quality has degraded?
LLM API as a single point of failure in clinical workflows: A synchronous LLM dependency in a clinical workflow (CDS Hook) will block the workflow when the API experiences latency. The circuit breaker pattern is required — but traditional architecture reviews do not ask about it.
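The circuit breaker idea can be sketched with the standard library alone. This is a simplified, single-threaded illustration and not a production implementation: the class name `LLMCircuitBreaker`, its parameters, and the failure threshold are assumptions, and the demo uses a short timeout so it finishes quickly. The fallback is a response with an empty `cards` array, so the EHR workflow continues without AI output.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class LLMCircuitBreaker:
    """Wraps an LLM call with a timeout and a simple open/half-open breaker."""

    def __init__(self, timeout_s: float = 3.0, max_failures: int = 3, cooldown_s: float = 30.0):
        self.timeout_s = timeout_s
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker opened
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            # Half-open: allow one trial call; a single failure re-opens
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, llm_fn, *args) -> dict:
        if self._is_open():
            return {"cards": []}  # fail fast without touching the LLM
        future = self._pool.submit(llm_fn, *args)
        try:
            cards = future.result(timeout=self.timeout_s)
        except Exception:  # timeout or any LLM client error
            # Note: the worker thread is not killed; it finishes in the background
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return {"cards": []}
        self.failures = 0
        return {"cards": cards}


def slow_llm(_context):
    time.sleep(0.3)  # simulates a stalled LLM API
    return [{"summary": "too late"}]


breaker = LLMCircuitBreaker(timeout_s=0.05, max_failures=2, cooldown_s=60.0)
results = [breaker.call(slow_llm, {}) for _ in range(3)]
print(results, "breaker open:", breaker.opened_at is not None)
```

The first two calls time out and open the breaker; the third returns immediately without submitting work, which is the behavior that protects the clinical workflow during a sustained outage.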
Model version as a governance artifact: When an LLM vendor updates a model, the system behavior changes. The architecture must include a model version registry and a re-evaluation process before updating. Traditional software architecture does not have an analog for this.
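A registry check of this kind can be small. The sketch below assumes the registry is a set of approved, exactly pinned identifiers and that floating aliases can be recognized by the substring "latest". The model names are hypothetical and `check_model_pin` is an illustrative helper, not a vendor API.

```python
# Hypothetical registry: identifiers that passed evaluation and approval
APPROVED_MODEL_VERSIONS = {"vendor-model-2025-01-15"}


def check_model_pin(model_id: str, approved: set[str]) -> tuple[bool, str]:
    """Return (ok, reason). Floating aliases and unapproved versions both fail."""
    if "latest" in model_id.lower():
        return False, f"{model_id!r} is a floating alias, not a pinned version"
    if model_id not in approved:
        return False, f"{model_id!r} has not passed evaluation and approval"
    return True, "pinned and approved"


for candidate in ("vendor-model-latest", "vendor-model-2025-06-01", "vendor-model-2025-01-15"):
    ok, reason = check_model_pin(candidate, APPROVED_MODEL_VERSIONS)
    print(f"{candidate}: {'OK' if ok else 'BLOCK'} ({reason})")
```

Running a check like this at deploy time or at gateway startup turns the governance rule into something a pipeline can enforce.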
Key Points to Hit:
- PHI scrubbing at gateway level for observability
- Quality drift monitoring as a production requirement
- Circuit breaker on LLM dependencies in clinical workflows
- Model version governance as an architectural requirement
Red Flags:
- Not distinguishing AI risks from traditional software risks
- Not mentioning PHI/HIPAA considerations
Key Takeaways
- Architecture reviews are among the highest-leverage technical activities an FDE performs
- Current state elicitation and gap/risk analysis must be run as separate phases
- The pre-review architecture map should distinguish CONFIRMED from ASSUMED
- Eight AI-specific risk patterns require systematic evaluation in every review
- PHI data flow must be explicitly traced and every control confirmed
- Architecture Review Report should be delivered within 48 hours of the session
- Blocking risks must be resolved before production go-live — not tracked as open items