Architecture Review Facilitation

Executive Summary

The architecture review is the highest-leverage technical activity an FDE performs — it is where the FDE's combined knowledge of the AI product, the client's environment, and industry patterns produces insights that the client's team cannot generate alone. A well-facilitated architecture review surfaces integration risks before they become production incidents, builds client architectural confidence, and positions the FDE as a trusted technical advisor rather than a vendor representative. This chapter defines the full architecture review lifecycle — preparation, facilitation, risk identification, recommendation framing, and follow-up — with the specific techniques that distinguish a principal-level review from a slide deck walkthrough. The healthcare context receives dedicated treatment, as clinical AI architecture reviews involve regulatory, safety, and clinical workflow dimensions that require specialized facilitation.

Learning Objectives

  • Prepare for an architecture review by building a current-state map from discovery and assessment artifacts
  • Facilitate a structured review session that elicits accurate current-state information without leading the client
  • Identify integration risks using a systematic pattern-matching framework
  • Frame architectural recommendations in terms of risk and consequence, not vendor preference
  • Produce an Architecture Review Report that serves as an actionable engineering document
  • Adapt the review framework for healthcare clients where HIPAA, FHIR, and clinical safety introduce additional review dimensions

Business Problem

Enterprise AI integrations fail predictably at a small set of architectural pressure points: PHI traversing an unsecured path, LLM latency exceeding a CDS Hook timeout, a model version update that breaks a production prompt, or an AI system that cannot degrade gracefully when the LLM API is unavailable. These failures are not edge cases — they are the expected outcomes when architectural design decisions are made without systematic review.

The architecture review is the mechanism for identifying these failure modes before they occur in production. A review that does not surface at least two or three genuine risks has not looked hard enough.

Why Architecture Reviews Matter

Architecture reviews in AI systems have characteristics that differ from traditional software architecture reviews:

Non-determinism: AI system behavior is probabilistic. The review must address not just "does this work?" but "what happens when the AI produces low-quality output?"

Regulatory surface: In healthcare, the architecture review must cover HIPAA compliance, FDA regulatory classification, and clinical safety patterns that do not appear in general-purpose software reviews.

Vendor dependency: AI systems have significant vendor dependencies (LLM API providers, embedding model vendors, vector database vendors). The review must surface the contractual, operational, and technical risks of these dependencies.

Rapid evolution: AI system components evolve faster than traditional software. The review must assess the change management implications of model updates, API changes, and capability changes.

Conceptual Explanation

An architecture review has two phases that must be kept separate:

Phase 1 — Current State Elicitation: The FDE maps the client's actual current-state architecture. This requires listening and questioning, not presenting. The FDE's job in this phase is to build an accurate model of what exists, not to evaluate it.

Phase 2 — Gap and Risk Analysis: The FDE compares the current state against the required state for the target AI deployment. Gaps are the differences. Risks are the consequences of not closing those gaps.

Mixing these phases — critiquing the architecture while still eliciting information — causes the client to become defensive and stop sharing accurate information. The phases must be distinct.

Core Architecture: The Review Framework

Pre-Review Preparation (FDE Working Independently)

Before the review session, the FDE builds a pre-review architecture map from discovery and assessment artifacts:

python
from dataclasses import dataclass


@dataclass
class PreReviewArchitectureMap:
    """
    FDE's working model of the client's architecture before the review session.
    All items should be marked as CONFIRMED (verified) or ASSUMED (to confirm in review).
    """
    
    # EHR Layer
    ehr_system: str                    # "Epic FHIR R4, version 2023.1 — CONFIRMED"
    ehr_integration_standard: str     # "FHIR R4 + HL7 v2 ADT — CONFIRMED"
    smart_on_fhir_status: str         # "Available, App Orchard pending — CONFIRMED"
    cds_hooks_availability: str       # "Available in Epic — ASSUMED: confirm in review"
    
    # Cloud / Network
    cloud_provider: str               # "Azure — CONFIRMED"
    phi_in_cloud_policy: str          # "Approved for HIPAA BAA vendors — CONFIRMED"
    llm_outbound_access: str          # "Permitted — ASSUMED: test not yet run"
    ai_gateway_status: str            # "Not deployed — CONFIRMED"
    network_topology: str             # "Single datacenter + Azure region — ASSUMED"
    
    # Security
    tls_policy: str                   # "TLS 1.2+ required — CONFIRMED"
    audit_logging_capability: str     # "Splunk SIEM — CONFIRMED"
    baa_status: str                   # "Azure BAA signed; Anthropic BAA pending — CONFIRMED"
    security_review_process: str      # "IT Security review + Compliance sign-off — CONFIRMED"
    
    # Organizational
    ai_gateway_owner: str             # "IT Infrastructure team — CONFIRMED"
    model_governance: str             # "No formal process — ASSUMED: confirm in review"
    prompt_management: str            # "Ad hoc — ASSUMED"
    
    # Unknown / To Elicit
    open_questions: list[str]         # Items to resolve in the review session
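
The CONFIRMED / ASSUMED tags can also be read mechanically. The following is a minimal sketch, not part of the original framework: it assumes the tags appear as plain substrings in each field value, and it uses a cut-down, hypothetical `MiniArchitectureMap` so the example stays self-contained. The helper name `assumed_items` is likewise an assumption.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MiniArchitectureMap:
    """Hypothetical, cut-down stand-in for PreReviewArchitectureMap."""
    cds_hooks_availability: str
    cloud_provider: str
    llm_outbound_access: str
    open_questions: list[str] = field(default_factory=list)


def assumed_items(arch_map) -> list[str]:
    """Return 'field_name: value' for every string field still tagged ASSUMED."""
    items = []
    for f in fields(arch_map):
        value = getattr(arch_map, f.name)
        if isinstance(value, str) and "ASSUMED" in value:
            items.append(f"{f.name}: {value}")
    return items


example = MiniArchitectureMap(
    cds_hooks_availability="Available in Epic; ASSUMED: confirm in review",
    cloud_provider="Azure; CONFIRMED",
    llm_outbound_access="Permitted; ASSUMED: test not yet run",
)
# Seed the session's open questions from whatever is still unverified.
example.open_questions.extend(assumed_items(example))
```

Deriving the open questions from the map itself keeps the two from drifting apart as items are confirmed during the session.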

Review preparation checklist:

text
[ ] Current state architecture map drafted with CONFIRMED / ASSUMED tags
[ ] Discovery Summary and Assessment Report reviewed
[ ] Integration risk patterns identified (from standard pattern library)
[ ] Open questions documented (to resolve in review session)
[ ] Review agenda prepared and shared with client 48h before session
[ ] Whiteboard or diagramming tool ready for live architecture drawing
[ ] Architecture Review Report template prepared (fill during session)

Review Session Structure

Duration: 3–4 hours for a major architecture review. Break at 90-minute intervals.

Participants:

  • FDE (facilitator)
  • Client: IT Architect, IT Director, Clinical Informatics Engineer (for healthcare)
  • Optional: Cloud architect, security architect (for infrastructure-heavy reviews)
  • Not in the room: Executives, sales, non-technical stakeholders (they get the report)

Agenda:

text
BLOCK 1 — Current State Architecture Walk (60–90 min)

Objective: Build an accurate current-state diagram collaboratively.
Method: FDE draws on whiteboard; client corrects and adds.

Opening: "I want to start by making sure I have an accurate picture of your
 current architecture. I've built a working map from our prior conversations —
 let's validate and correct it together."

Technique: Draw the FDE's pre-review map on the whiteboard. Ask the client
to correct it. Incorrect assumptions surface more information than open-ended
questions. "I have it that your integration engine sends ADT^A01 messages 
to Epic — is that right, or does it go the other direction?"

Cover:
  - Data flows for the target use case
  - Authentication and authorization paths
  - Network topology (on-prem / cloud / peering)
  - Security architecture (where PHI flows, what controls exist)
  - Existing integration points relevant to the AI use case
  - Current monitoring and observability

BREAK — 15 minutes

BLOCK 2 — Target State and Gap Identification (60 min)

Objective: Map where the AI system needs to connect and identify the gaps.

Technique: Add the target AI architecture to the current-state diagram.
Draw the connections that need to exist for the AI system to work.
Ask: "What has to be true for this connection to work? Does that condition
currently hold?"

Cover:
  - AI system component placement (SMART app, CDS Hooks service, AI gateway, LLM)
  - Data paths from EHR to AI to LLM and back
  - Authentication path (SMART token, AI gateway virtual key, LLM API key)
  - Security controls on each path
  - Failure modes for each connection

BREAK — 15 minutes

BLOCK 3 — Risk Assessment and Prioritization (45 min)

Objective: Identify, categorize, and prioritize the architectural risks.

Technique: Walk through each gap identified in Block 2. For each gap:
  1. Is it a risk? (Could it cause failure or security incident?)
  2. What is the likelihood? (High / Medium / Low given client's environment)
  3. What is the consequence? (Service unavailability / HIPAA incident / clinical harm)
  4. What is the mitigation?

Cover:
  - PHI data path risks
  - Availability risks (LLM downtime, latency, rate limits)
  - Security risks (authentication gaps, audit log gaps)
  - Governance risks (model updates without re-evaluation)
  - Clinical safety risks (AI failure affecting clinical workflow)

BLOCK 4 — Recommendations and Next Steps (30 min)

Objective: Agree on which risks to mitigate, in what order, and who owns each.

Output: Architecture Review Report action items with owners and dates.
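
The Block 3 questions can be turned into a repeatable prioritization rule. The sketch below is one possible rule, not one the framework prescribes: it assumes priority starts from likelihood (High, Medium, Low mapping to BLOCKING, IMPORTANT, ADVISORY) and is raised one level when the consequence includes clinical harm. The `Gap` class, the `clinical_harm` flag, and the escalation step are all illustrative assumptions.

```python
from dataclasses import dataclass

PRIORITIES = ["ADVISORY", "IMPORTANT", "BLOCKING"]  # ascending urgency
BASE_LEVEL = {"Low": 0, "Medium": 1, "High": 2}     # assumed likelihood mapping


@dataclass
class Gap:
    """Hypothetical record for one gap discussed in Block 3."""
    name: str
    likelihood: str            # "High" / "Medium" / "Low"
    consequence: str           # free text captured in the session
    clinical_harm: bool = False


def prioritize(gap: Gap) -> str:
    """Map likelihood to a priority, escalating one level for clinical harm."""
    level = BASE_LEVEL[gap.likelihood]
    if gap.clinical_harm:
        level = min(level + 1, len(PRIORITIES) - 1)
    return PRIORITIES[level]


gaps = [
    Gap("PHI in observability traces", "High", "HIPAA incident"),
    Gap("CDS Hook without circuit breaker", "Medium", "Order signing blocked", clinical_harm=True),
    Gap("FHIR over-broad scopes", "Low", "HIPAA Minimum Necessary"),
]
ranked = sorted(gaps, key=lambda g: PRIORITIES.index(prioritize(g)), reverse=True)
```

Whatever rule is chosen, writing it down before the session makes the priority column defensible when a client questions why one item is blocking and another is not.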

Risk Pattern Library

Experienced FDEs recognize a small set of recurring architectural risk patterns in enterprise AI systems. Building these into a pattern library allows faster and more consistent risk identification:

python
ARCHITECTURE_RISK_PATTERNS = [
    {
        "pattern": "PHI in observability traces",
        "description": "LLM inference requests containing PHI are logged verbatim in observability systems",
        "detection_question": "Where do LLM inference request and response payloads go? Are they logged?",
        "consequence": "HIPAA Security Rule violation; PHI accessible to any engineer with logging access",
        "mitigation": "AI gateway scrubs PHI from all traces; only hashed patient IDs and metadata in logs"
    },
    {
        "pattern": "No AI gateway — direct LLM API calls",
        "description": "Application code calls LLM APIs directly without a centralized gateway",
        "detection_question": "Where do LLM API keys live? Are they per-application or centralized?",
        "consequence": "No cost attribution; no audit logging; no rate limiting; no prompt versioning",
        "mitigation": "Deploy AI gateway (LiteLLM, Azure AI Foundry) before production"
    },
    {
        "pattern": "CDS Hook with blocking LLM dependency",
        "description": "CDS Hook service makes synchronous LLM call without circuit breaker",
        "detection_question": "What happens in your CDS Hook service if the LLM API is unavailable or slow?",
        "consequence": "EHR workflow blocked when LLM API times out; patient care delay",
        "mitigation": "3-second timeout with circuit breaker; return empty card array on timeout"
    },
    {
        "pattern": "No model version pinning",
        "description": "Production system calls LLM API with 'latest' model or unpinned version",
        "detection_question": "How is the model version specified in production API calls?",
        "consequence": "Unexpected behavior change when vendor updates model; production incident",
        "mitigation": "Pin exact model version; define evaluation and approval process for updates"
    },
    {
        "pattern": "AI output without physician review gate",
        "description": "AI-generated clinical content is written to EHR without physician approval",
        "detection_question": "What is the workflow from AI generation to note appearing in the EHR?",
        "consequence": "AI error enters medical record; potential patient harm; liability",
        "mitigation": "Physician review and approval required before DocumentReference write; no auto-filing"
    },
    {
        "pattern": "No prompt version management",
        "description": "Prompts are modified directly in production without version control or evaluation",
        "detection_question": "How are prompts changed in production? Who approves changes?",
        "consequence": "Undetected quality regression; no rollback capability",
        "mitigation": "Prompt Registry with versioning; evaluation before production deployment"
    },
    {
        "pattern": "FHIR access with over-broad scopes",
        "description": "SMART application requests system-level FHIR scopes instead of patient-level",
        "detection_question": "What FHIR scopes does the application request? system/* or patient/*?",
        "consequence": "HIPAA Minimum Necessary violation; access to all patients instead of current patient",
        "mitigation": "Minimum Necessary scopes: patient/{resource}.read per use case"
    },
    {
        "pattern": "No fallback when AI service is unavailable",
        "description": "Clinical workflow has no fallback path when AI system is unavailable",
        "detection_question": "What do clinicians do if the AI tool is down? Is there a manual fallback?",
        "consequence": "Clinical workflow disruption; workarounds that bypass safety controls",
        "mitigation": "Design fallback workflow before launch; never make AI availability a hard dependency of the clinical workflow"
    },
]
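
One way to apply the library systematically is to generate the session checklist and a draft risk register from it. This is a sketch under stated assumptions: `SAMPLE_PATTERNS` repeats two library entries so the block runs on its own, and the function names `detection_checklist` and `draft_register` are hypothetical.

```python
SAMPLE_PATTERNS = [  # two entries copied from the pattern library
    {
        "pattern": "No model version pinning",
        "detection_question": "How is the model version specified in production API calls?",
        "consequence": "Unexpected behavior change when vendor updates model; production incident",
        "mitigation": "Pin exact model version; define evaluation and approval process for updates",
    },
    {
        "pattern": "No prompt version management",
        "detection_question": "How are prompts changed in production? Who approves changes?",
        "consequence": "Undetected quality regression; no rollback capability",
        "mitigation": "Prompt Registry with versioning; evaluation before production deployment",
    },
]


def detection_checklist(patterns):
    """One numbered detection question per pattern, for use during the session."""
    return [
        f"{i}. {p['detection_question']} [{p['pattern']}]"
        for i, p in enumerate(patterns, start=1)
    ]


def draft_register(patterns, observed):
    """Build draft risk-register rows for the patterns observed in the session."""
    rows = []
    for p in patterns:
        if p["pattern"] in observed:
            rows.append({
                "risk_id": f"R-{len(rows) + 1:02d}",  # numbered in library order
                "pattern": p["pattern"],
                "consequence": p["consequence"],
                "mitigation": p["mitigation"],
            })
    return rows


checklist = detection_checklist(SAMPLE_PATTERNS)
rows = draft_register(SAMPLE_PATTERNS, observed={"No prompt version management"})
```

Likelihood, owner, and resolution date are deliberately left out of the draft rows; those come from the discussion with the client, not from the library.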

Architecture Diagram

Implementation Patterns

Architecture Review Report Template

markdown
# Architecture Review Report
**Client:** Reference Healthcare Organization
**Use Case:** Discharge Summary AI
**Review Date:** [Date]
**FDE:** [Name]
**Attendees:** [Roles, not names]

---

## Executive Summary

[3-sentence summary for CIO/CMIO: overall architectural readiness, 
 primary risks identified, blocking items before production]

## Architecture Diagram

[Current state + target state diagram with AI system components added.
 Mark each connection with its security classification (PHI path / internal / external)]

## Risk Register

| Risk ID | Pattern | Likelihood | Consequence | Priority | Owner | Resolution Date |
|---------|---------|-----------|-------------|----------|-------|----------------|
| R-01 | PHI in observability traces | High | HIPAA incident | BLOCKING | IT Security | [Date] |
| R-02 | No AI gateway | High | No audit log; no cost control | BLOCKING | IT Infra | [Date] |
| R-03 | CDS Hook without circuit breaker | Medium | EHR workflow disruption | IMPORTANT | Engineering | [Date] |
| R-04 | No prompt version management | Medium | Quality regression | IMPORTANT | AI Team | [Date] |
| R-05 | FHIR over-broad scopes | Low | HIPAA Minimum Necessary | ADVISORY | Engineering | [Date] |

## Action Plan

### BLOCKING — Must Resolve Before Production

**R-01: PHI in Observability Traces**
Current State: LLM inference payloads logged verbatim in Splunk
Required State: AI gateway scrubs PHI; only hashed patient IDs and metadata logged
Owner: IT Security + AI Engineering team
Target Date: [Date]
Validation: Security architect sign-off on log inspection

**R-02: AI Gateway Not Deployed**
Current State: Application planned to call Anthropic API directly
Required State: LiteLLM Proxy deployed in HIPAA-eligible Azure VNet
Owner: IT Infrastructure
Target Date: [Date]
Validation: Successful inference through gateway; audit log confirmed

### IMPORTANT — Address Before Production Launch (Not Blocking POC)

**R-03: CDS Hook Circuit Breaker**
...

### ADVISORY — Address Within 60 Days of Launch

**R-05: FHIR Scope Review**
...

## Architecture Decisions Confirmed

[List of architectural decisions that were explicitly confirmed as correct in the review,
 to prevent them from being revisited without cause later in the engagement]

## Next Review

[Date of next architecture review — recommended at 30 days post-launch]

Facilitation Techniques for Healthcare Reviews

Getting accurate HIPAA information: Compliance officers often give conservative answers. The FDE should ask operational questions, not compliance questions: "Walk me through what happens when an LLM inference request is made — where does the request log go?" surfaces the actual behavior better than "Do you have HIPAA-compliant logging?"

Clinical workflow accuracy: Ask a clinician to describe their workflow, not IT. IT descriptions of clinical workflows are often idealized; clinicians describe what actually happens.

Surfacing undocumented constraints: "Is there anything about your environment that would make the architecture we've been discussing not work?" Ask this at the end of Block 2. This open-ended question often surfaces constraints that were not mentioned because the client assumed the FDE already knew.

Enterprise Considerations

Multiple-system reviews: Enterprise clients with complex environments may need separate review sessions for different layers — clinical systems (EHR, clinical data platform), cloud infrastructure, security architecture, and governance. Plan accordingly.

Review cadence: Architecture reviews are not one-time events. A review should occur at POC design, before production launch, and at 30 days post-launch. Scheduled reviews are more effective than reactive reviews.

Review as trust investment: A rigorous architecture review that surfaces real risks — including risks that create more work for the FDE — builds more trust than a review that validates the client's existing plan. Clients who have been through a rigorous review understand that the FDE's recommendations are based on technical reality, not sales convenience.

Security Considerations

Architecture reviews for healthcare AI systems must cover:

PHI data flow audit: Every path that PHI traverses must be explicitly traced and every control on that path confirmed. The review must produce a PHI data flow diagram that the client's privacy officer can review.

AI-specific threat model: Traditional security reviews do not cover AI-specific attack surfaces: prompt injection, training data extraction, model inversion. The review should include a brief AI threat model assessment.

Audit log completeness: HIPAA requires audit logs for access to PHI. The review must confirm that audit logging covers every PHI access in the AI integration path, including AI gateway requests.
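
The "only hashed patient IDs and metadata in logs" mitigation can be illustrated with an allow-list scrubber applied before a trace is written. This is a minimal sketch, not a compliance control: the field names in `ALLOWED_METADATA`, the `patient_id` key, and the function name `scrub_trace` are assumptions, and whether a keyed hash of a patient identifier is acceptable in logs is a decision for the client's privacy officer.

```python
import hashlib
import hmac

# Hypothetical metadata fields considered safe to log; everything else is dropped.
ALLOWED_METADATA = {"model", "prompt_version", "latency_ms", "status"}


def scrub_trace(record: dict, secret: bytes) -> dict:
    """Return a log-safe copy of a trace record.

    Uses an allow-list, so prompt and response payloads (which may contain PHI)
    are excluded by default rather than filtered out by name.
    """
    safe = {k: v for k, v in record.items() if k in ALLOWED_METADATA}
    patient_id = record.get("patient_id")
    if patient_id is not None:
        # Keyed hash (HMAC-SHA256) lets traces be correlated per patient
        # without writing the raw identifier to the log.
        safe["patient_id_hash"] = hmac.new(
            secret, str(patient_id).encode("utf-8"), hashlib.sha256
        ).hexdigest()
    return safe


trace = {
    "patient_id": "MRN-00012345",
    "prompt": "Summarize the discharge for MRN-00012345 ...",
    "response": "Patient was admitted with ...",
    "model": "example-model",
    "latency_ms": 840,
}
log_entry = scrub_trace(trace, secret=b"rotate-me")
```

An allow-list fails safe: a new payload field added later is dropped from logs until someone explicitly approves it.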

Healthcare Example

Educational Example — Illustrative Architecture Review. Not intended as clinical guidance.

In a Reference Healthcare Organization architecture review for a CDS medication safety integration, the review surfaces the "CDS Hook with blocking LLM dependency" pattern (R-03 in the report template above). Because the hook sits in the medication order-signing path, the risk is classified as blocking here rather than important. The current design makes a synchronous LLM call inside the order-sign CDS Hook with a 30-second timeout. Epic's CDS Hook timeout is 5 seconds.

This is a clinical safety risk: if the LLM API experiences latency (common during peak inference periods), the medication order workflow will be blocked — a physician trying to sign an urgent order will see a spinning loader. The mitigation is a 3-second timeout with a circuit breaker that returns an empty card array, allowing the order to proceed without AI assistance.
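
The mitigation described above can be sketched as a small asyncio wrapper. This is an illustrative sketch, not a production implementation: `call_llm` stands in for a hypothetical coroutine that returns a list of CDS cards, the 3-second default comes from the text, and the failure threshold and cool-down values are arbitrary assumptions.

```python
import asyncio
import time
from typing import Optional


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()


async def cards_with_fallback(call_llm, breaker: CircuitBreaker,
                              timeout_s: float = 3.0) -> dict:
    """Return AI cards, or an empty card array if the LLM is slow or failing."""
    if not breaker.allow_request():
        return {"cards": []}  # breaker open: skip the LLM call entirely
    try:
        cards = await asyncio.wait_for(call_llm(), timeout=timeout_s)
    except Exception:  # includes the timeout raised by wait_for
        breaker.record_failure()
        return {"cards": []}
    breaker.record_success()
    return {"cards": cards}
```

The timeout bounds the latency of any single hook invocation, while the breaker keeps a degraded LLM API from adding that latency to every order: once open, the service answers immediately with no cards until the cool-down expires.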

Without the architecture review, this risk would most likely have surfaced only as an incident in the first weeks of production. Discovered in the review, it is addressed in the engineering design before any code is written.

Common Mistakes

1. Skipping the current state elicitation and jumping to recommendations. FDEs who arrive at the review with a pre-determined architecture recommendation and spend the session defending it are running a sales presentation, not an architecture review. Current state must be accurately mapped first.

2. Letting executives in the review room. Executive presence changes the dynamic — technical staff become less forthcoming about problems and limitations. Architecture reviews are engineering sessions. Executives get the report.

3. Not documenting CONFIRMED vs. ASSUMED. Architecture maps that do not distinguish confirmed facts from assumptions create false confidence. Assumptions that turn out to be wrong in production are architectural failures that were foreseeable.

4. Missing the AI-specific risk patterns. Reviewers who apply only traditional software architecture risk patterns miss the AI-specific risks (no prompt version management, no circuit breaker on LLM dependency, PHI in traces). The pattern library must include AI-specific patterns.

5. Producing a report that is too long to be read. An Architecture Review Report that is 40 pages long will not be read by the IT Director. The report must be scannable — Risk Register table, Action Plan with owners and dates, Architecture Diagram. Supporting detail goes in appendices.

Best Practices

  • Prepare a pre-review architecture map with CONFIRMED / ASSUMED tags before the session
  • Draw on the whiteboard; let the client correct; never present a complete diagram and ask for agreement
  • Keep current-state elicitation (Block 1) and gap and risk analysis (Blocks 2 and 3) as distinct phases of the session; do not critique while eliciting
  • Apply the AI-specific risk pattern library systematically
  • Produce the Architecture Review Report within 48 hours of the session
  • Require blocking issues to be resolved before production go-live — not after
  • Schedule the next architecture review at 30 days post-launch

Trade-offs

Depth vs. breadth: A review that covers only the integration path (narrow) misses organizational and governance risks. A review that covers everything (broad) loses focus. The right scope is determined by the highest-risk dimensions identified in the assessment.

Directness vs. client relationship: Surfacing a significant architectural flaw that the client's team designed can create awkwardness. The FDE's credibility is built on directness — but the delivery must be constructive. Frame risks as "here's what we need to solve together" rather than "this is wrong."

Interview Questions

Q: What AI-specific architectural risks do you look for that traditional software architecture reviews miss?

Category: Architecture | Difficulty: Principal | Role: FDE / AI Architect

Answer Framework:

Traditional software architecture reviews miss several risk patterns that are specific to AI systems:

PHI in observability: LLM requests contain PHI in the prompt payload. Traditional monitoring would log the full request. In healthcare, this creates a HIPAA incident. AI systems need gateway-level PHI scrubbing before logs are written.

Non-determinism and drift: Traditional systems have deterministic behavior. AI systems degrade gradually — model version updates, prompt changes, and data distribution shifts all cause quality drift that is invisible without monitoring. The review must ask: how will you know when the output quality has degraded?

LLM API as a single point of failure in clinical workflows: A synchronous LLM dependency in a clinical workflow (CDS Hook) will block the workflow when the API experiences latency. The circuit breaker pattern is required — but traditional architecture reviews do not ask about it.

Model version as a governance artifact: When an LLM vendor updates a model, the system behavior changes. The architecture must include a model version registry and a re-evaluation process before updating. Traditional software architecture does not have an analog for this.

Key Points to Hit:

  • PHI scrubbing at gateway level for observability
  • Quality drift monitoring as a production requirement
  • Circuit breaker on LLM dependencies in clinical workflows
  • Model version governance as an architectural requirement

Red Flags:

  • Not distinguishing AI risks from traditional software risks
  • Not mentioning PHI/HIPAA considerations

Key Takeaways

  • Architecture reviews are the highest-leverage technical activity an FDE performs
  • Current state elicitation and gap/risk analysis must be run as separate phases
  • The pre-review architecture map should distinguish CONFIRMED from ASSUMED
  • Eight AI-specific risk patterns require systematic evaluation in every review
  • PHI data flow must be explicitly traced and every control confirmed
  • Architecture Review Report should be delivered within 48 hours of the session
  • Blocking risks must be resolved before production go-live — not tracked as open items

Glossary

Circuit Breaker: A design pattern that stops calls to a failing service after a threshold of failures, allowing the calling system to degrade gracefully rather than blocking indefinitely.

HIPAA Minimum Necessary: The HIPAA Privacy Rule requirement that uses and disclosures of PHI be limited to the minimum necessary to accomplish the purpose.

PHI Data Flow Diagram: An architectural artifact that traces every path through which Protected Health Information moves in a system, with the security controls applied at each path.

Prompt Registry: A versioned repository of production prompt templates with evaluation results, approval records, and model compatibility metadata.

Further Reading