Architecture Review Facilitation

Executive Summary

The architecture review is the highest-leverage technical activity an FDE performs — it is where the FDE's combined knowledge of the AI product, the client's environment, and industry patterns produces insights that the client's team cannot generate alone. A well-facilitated architecture review surfaces integration risks before they become production incidents, builds client architectural confidence, and positions the FDE as a trusted technical advisor rather than a vendor representative. This chapter defines the full architecture review lifecycle — preparation, facilitation, risk identification, recommendation framing, and follow-up — with the specific techniques that distinguish a principal-level review from a slide deck walkthrough. The healthcare context receives dedicated treatment, as clinical AI architecture reviews involve regulatory, safety, and clinical workflow dimensions that require specialized facilitation.

Learning Objectives

  • Prepare for an architecture review by building a current-state map from discovery and assessment artifacts
  • Facilitate a structured review session that elicits accurate current-state information without leading the client
  • Identify integration risks using a systematic pattern-matching framework
  • Frame architectural recommendations in terms of risk and consequence, not vendor preference
  • Produce an Architecture Review Report that serves as an actionable engineering document
  • Adapt the review framework for healthcare clients where HIPAA, FHIR, and clinical safety introduce additional review dimensions

Business Problem

Enterprise AI integrations fail predictably at a small set of architectural pressure points: PHI traversing an unsecured path, LLM latency exceeding a CDS Hook timeout, a model version update that breaks a production prompt, or an AI system that cannot degrade gracefully when the LLM API is unavailable. These failures are not edge cases — they are the expected outcomes when architectural design decisions are made without systematic review.

The architecture review is the mechanism for identifying these failure modes before they occur in production. A review that does not surface at least two or three genuine risks has not looked hard enough.

Conceptual Explanation

An architecture review has two phases that must be kept separate:

Phase 1 — Current State Elicitation: The FDE maps the client's actual current-state architecture. This requires listening and questioning, not presenting. The FDE's job in this phase is to build an accurate model of what exists, not to evaluate it.

Phase 2 — Gap and Risk Analysis: The FDE compares the current state against the required state for the target AI deployment. Gaps are the differences. Risks are the consequences of not closing those gaps.

Mixing these phases — critiquing the architecture while still eliciting information — causes the client to become defensive and stop sharing accurate information. The phases must be distinct.
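The distinction between a gap and a risk can be made concrete with a small data structure. The following is a minimal sketch, not a prescribed tool: the `Requirement` class, the `find_gaps` helper, and the capability names are all hypothetical and chosen only for illustration. It assumes the Phase 1 output can be reduced to a set of yes/no capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str                  # capability the target deployment needs
    consequence_if_unmet: str  # the risk, stated as a consequence


def find_gaps(required: list[Requirement], current_state: dict[str, bool]) -> list[Requirement]:
    """Return requirements the current state does not meet.

    A capability missing from current_state is treated as unmet.
    """
    return [r for r in required if not current_state.get(r.name, False)]


# Hypothetical required state for the target AI deployment.
required = [
    Requirement("circuit_breaker_on_llm_call", "order signing blocks when the LLM API is slow"),
    Requirement("phi_scrubbed_before_logging", "PHI is written to observability logs"),
]

# Hypothetical current state, as elicited in Phase 1.
current_state = {"phi_scrubbed_before_logging": True}

for gap in find_gaps(required, current_state):
    print(f"GAP: {gap.name} -> RISK: {gap.consequence_if_unmet}")
```

Keeping `current_state` as a separate input mirrors the phase separation: it is filled in during elicitation and only compared against requirements afterwards.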

Core Architecture: The Review Framework

Pre-Review Preparation (FDE Working Independently)

Before the review session, the FDE builds a pre-review architecture map from discovery and assessment artifacts:

Implementation code omitted in the Playbook edition. For complete code examples, production patterns, and advanced implementation details, see the Enterprise AI Technical Reference.
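As a rough illustration of what a pre-review map might look like in code, the sketch below tags each claim as CONFIRMED or ASSUMED and turns every assumption into a question for the session. The `MapEntry` structure, the `open_questions` helper, and the sample entries are assumptions made for this example; they are not taken from the Technical Reference.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    CONFIRMED = "CONFIRMED"  # verified in an artifact or by the client
    ASSUMED = "ASSUMED"      # inferred; must be checked in the session


@dataclass(frozen=True)
class MapEntry:
    component: str      # e.g. "EHR", "API gateway"
    claim: str          # what the FDE currently believes about it
    evidence: Evidence
    source: str         # artifact or person the claim came from


def open_questions(entries: list[MapEntry]) -> list[str]:
    """Turn every ASSUMED entry into a question for the review session."""
    return [
        f"{e.component}: please confirm that {e.claim}"
        for e in entries
        if e.evidence is Evidence.ASSUMED
    ]


# Hypothetical entries for illustration only.
pre_review_map = [
    MapEntry("EHR", "CDS Hooks are enabled for order-sign",
             Evidence.CONFIRMED, "discovery notes"),
    MapEntry("API gateway", "request bodies are not logged",
             Evidence.ASSUMED, "inferred from assessment"),
    MapEntry("LLM client", "calls are synchronous",
             Evidence.ASSUMED, "inferred from design doc"),
]

for question in open_questions(pre_review_map):
    print(question)
```

Recording a source for each entry makes it easier to explain during the session why a claim is still tagged ASSUMED.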

Review preparation checklist:

Implementation code omitted in the Playbook edition. For complete code examples, production patterns, and advanced implementation details, see the Enterprise AI Technical Reference.

Review Session Structure

Duration: 3–4 hours for a major architecture review. Break at 90-minute intervals.

Participants:

  • FDE (facilitator)
  • Client: IT Architect, IT Director, Clinical Informatics Engineer (for healthcare)
  • Optional: Cloud architect, security architect (for infrastructure-heavy reviews)
  • Not in the room: Executives, sales, non-technical stakeholders (they get the report)

Agenda:

Implementation code omitted in the Playbook edition. For complete code examples, production patterns, and advanced implementation details, see the Enterprise AI Technical Reference.

Risk Pattern Library

Experienced FDEs recognize a small set of recurring architectural risk patterns in enterprise AI systems. Building these into a pattern library allows faster and more consistent risk identification:

Implementation code omitted in the Playbook edition. For complete code examples, production patterns, and advanced implementation details, see the Enterprise AI Technical Reference.
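One possible way to encode such a library is as a list of named checks that run against a description of the design. The sketch below is illustrative only: it covers just the patterns this chapter names explicitly (circuit breaker, PHI in traces, prompt versioning, model version registry), and the pattern IDs, the `blocking` flags, and the flat `Architecture` dictionary are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical flat description of the design: capability name -> present?
Architecture = dict[str, bool]


@dataclass(frozen=True)
class RiskPattern:
    pattern_id: str                           # illustrative ID, not from the source
    title: str
    applies: Callable[[Architecture], bool]   # True when the risk is present
    blocking: bool                            # assumed severity for this sketch


PATTERNS = [
    RiskPattern(
        "P-1", "No circuit breaker on LLM dependency",
        lambda a: a.get("sync_llm_call", False) and not a.get("circuit_breaker", False),
        blocking=True,
    ),
    RiskPattern(
        "P-2", "PHI in traces",
        lambda a: a.get("phi_in_prompts", False) and not a.get("phi_scrubbing_at_gateway", False),
        blocking=True,
    ),
    RiskPattern(
        "P-3", "No prompt version management",
        lambda a: not a.get("prompt_versioning", False),
        blocking=False,
    ),
    RiskPattern(
        "P-4", "No model version registry",
        lambda a: not a.get("model_version_registry", False),
        blocking=False,
    ),
]


def evaluate(arch: Architecture) -> list[RiskPattern]:
    """Apply every pattern; return those whose risk is present."""
    return [p for p in PATTERNS if p.applies(arch)]


# Hypothetical design under review.
design = {
    "sync_llm_call": True,
    "phi_in_prompts": True,
    "phi_scrubbing_at_gateway": True,
    "prompt_versioning": True,
}

for finding in evaluate(design):
    label = "BLOCKING" if finding.blocking else "non-blocking"
    print(f"{finding.pattern_id} [{label}] {finding.title}")
```

Running every pattern against every design, rather than relying on memory, is what makes the identification systematic.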

Architecture Review Report Template

Executive Summary

[3-sentence summary for CIO/CMIO: overall architectural readiness, primary risks identified, blocking items before production]

Architecture Diagram

[Current state + target state diagram with AI system components added. Mark each connection with its security classification (PHI path / internal / external)]
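The connection markings on the diagram can also be kept as data, so that PHI paths without confirmed controls are easy to list. This is a minimal sketch under stated assumptions: the `Connection` structure, the `controls_confirmed` field, and the sample connections are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PathClass(Enum):
    PHI = "PHI path"
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    path_class: PathClass
    controls_confirmed: bool  # True only if the client confirmed the controls


def unconfirmed_phi_paths(connections: list[Connection]) -> list[Connection]:
    """Return PHI-carrying connections whose controls are not yet confirmed."""
    return [
        c for c in connections
        if c.path_class is PathClass.PHI and not c.controls_confirmed
    ]


# Hypothetical diagram edges for illustration only.
diagram = [
    Connection("EHR", "CDS service", PathClass.PHI, controls_confirmed=True),
    Connection("CDS service", "LLM API", PathClass.PHI, controls_confirmed=False),
    Connection("CDS service", "metrics store", PathClass.INTERNAL, controls_confirmed=False),
]

for c in unconfirmed_phi_paths(diagram):
    print(f"Unconfirmed PHI path: {c.source} -> {c.target}")
```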

Enterprise Considerations

Multiple-system reviews: Enterprise clients with complex environments may need separate review sessions for different layers — clinical systems (EHR, clinical data platform), cloud infrastructure, security architecture, and governance. Plan accordingly.

Review cadence: Architecture reviews are not one-time events. Schedule a review at POC design, before production launch, and at 30 days post-launch. Scheduled reviews are more effective than reactive ones because they examine the design before an incident forces the question.

Review as trust investment: A rigorous architecture review that surfaces real risks — including risks that create more work for the FDE — builds more trust than a review that validates the client's existing plan. Clients who have been through a rigorous review understand that the FDE's recommendations are based on technical reality, not sales convenience.

Healthcare Example

Educational Example — Illustrative Architecture Review. Not intended as clinical guidance.

In a Reference Healthcare Organization architecture review for a CDS medication safety integration, the review surfaces Risk R-03 (CDS Hook without circuit breaker) as a blocking issue. The current design makes a synchronous LLM call inside the order-sign CDS Hook with a 30-second timeout. Epic's CDS Hook timeout is 5 seconds.

This is a clinical safety risk: if the LLM API experiences latency (common during peak inference periods), the medication order workflow will be blocked — a physician trying to sign an urgent order will see a spinning loader. The mitigation is a 3-second timeout with a circuit breaker that returns an empty card array, allowing the order to proceed without AI assistance.
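The mitigation described above can be sketched as follows. This is an illustrative outline rather than the reference implementation: the `CircuitBreaker` class, the `handle_order_sign` function, the failure threshold, and the reset window are assumptions, and `call_llm` stands in for whatever client the integration actually uses. The only behavior taken from the text is the 3-second timeout and the empty card array returned on failure.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class CircuitBreaker:
    """Opens after repeated failures so later calls skip the LLM entirely."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold  # assumed value
        self.reset_after_s = reset_after_s          # assumed value
        self.failures = 0
        self.opened_at = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.reset_after_s:
            # Half-open: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()


# Shared pool so a timed-out call does not block the hook response.
_executor = ThreadPoolExecutor(max_workers=4)


def handle_order_sign(call_llm, request, breaker, timeout_s: float = 3.0) -> dict:
    """Return CDS cards, or an empty card array if the LLM is slow or failing."""
    if breaker.is_open():
        return {"cards": []}  # order proceeds without AI assistance
    future = _executor.submit(call_llm, request)
    try:
        cards = future.result(timeout=timeout_s)
    except Exception:  # timeout or any LLM client error
        future.cancel()  # best effort; a running call is not interrupted
        breaker.record_failure()
        return {"cards": []}
    breaker.record_success()
    return {"cards": cards}
```

A hook handler would create one `CircuitBreaker` per LLM dependency and pass it to every call, so that once the breaker opens, subsequent order-sign requests return immediately instead of each waiting out the timeout.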

Without the architecture review, this risk would have been discovered in week 3 of production deployment. Discovered in the review, it is addressed in the engineering design before any code is written.

Common Mistakes

1. Skipping the current state elicitation and jumping to recommendations. FDEs who arrive at the review with a pre-determined architecture recommendation and spend the session defending it are running a sales presentation, not an architecture review. Current state must be accurately mapped first.

2. Letting executives in the review room. Executive presence changes the dynamic — technical staff become less forthcoming about problems and limitations. Architecture reviews are engineering sessions. Executives get the report.

3. Not documenting CONFIRMED vs. ASSUMED. Architecture maps that do not distinguish confirmed facts from assumptions create false confidence. Assumptions that turn out to be wrong in production are architectural failures that were foreseeable.

4. Missing the AI-specific risk patterns. Reviewers who apply only traditional software architecture risk patterns miss the AI-specific risks (no prompt version management, no circuit breaker on LLM dependency, PHI in traces). The pattern library must include AI-specific patterns.

5. Producing a report that is too long to be read. An Architecture Review Report that is 40 pages long will not be read by the IT Director. The report must be scannable — Risk Register table, Action Plan with owners and dates, Architecture Diagram. Supporting detail goes in appendices.
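To illustrate what "scannable" can mean for the Risk Register, the sketch below renders risks as a fixed-width plain-text table with blocking items first. The `RiskItem` fields, the owners, the due dates, and every entry other than R-03 are hypothetical; the layout is one option among many.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RiskItem:
    risk_id: str
    description: str
    blocking: bool
    owner: str
    due: date


def render_register(items: list[RiskItem]) -> str:
    """Render a plain-text table: blocking risks first, then by due date."""
    ordered = sorted(items, key=lambda r: (not r.blocking, r.due))
    lines = [f"{'ID':<6}{'Blocking':<10}{'Owner':<16}{'Due':<12}Risk"]
    for r in ordered:
        flag = "YES" if r.blocking else "no"
        lines.append(
            f"{r.risk_id:<6}{flag:<10}{r.owner:<16}{r.due.isoformat():<12}{r.description}"
        )
    return "\n".join(lines)


# Illustrative entries; owners and dates are placeholders.
register = [
    RiskItem("R-07", "No prompt version management", False,
             "AI Engineer", date(2025, 3, 14)),
    RiskItem("R-03", "CDS Hook without circuit breaker", True,
             "IT Architect", date(2025, 3, 7)),
]

print(render_register(register))
```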

Best Practices

  • Prepare a pre-review architecture map with CONFIRMED / ASSUMED tags before the session
  • Draw on the whiteboard; let the client correct; never present a complete diagram and ask for agreement
  • Run Block 1 (current state elicitation) and Block 2 (gap and risk analysis) separately; do not critique while still eliciting
  • Apply the AI-specific risk pattern library systematically
  • Produce the Architecture Review Report within 48 hours of the session
  • Require blocking issues to be resolved before production go-live — not after
  • Schedule the next architecture review at 30 days post-launch

Trade-offs

Depth vs. breadth: A review that covers only the integration path (narrow) misses organizational and governance risks. A review that covers everything (broad) loses focus. The right scope is determined by the highest-risk dimensions identified in the assessment.

Directness vs. client relationship: Surfacing a significant architectural flaw that the client's team designed can create awkwardness. The FDE's credibility is built on directness — but the delivery must be constructive. Frame risks as "here's what we need to solve together" rather than "this is wrong."

Interview Questions

Q: What AI-specific architectural risks do you look for that traditional software architecture reviews miss?

Category: Architecture | Difficulty: Principal | Role: FDE / AI Architect

Answer Framework:

Traditional software architecture reviews miss several risk patterns that are specific to AI systems:

PHI in observability: LLM requests contain PHI in the prompt payload. Traditional monitoring that logs the full request would write that PHI into logs, which in healthcare can constitute a HIPAA incident. AI systems need gateway-level PHI scrubbing before logs are written.

Non-determinism and drift: Traditional systems have deterministic behavior. AI systems degrade gradually — model version updates, prompt changes, and data distribution shifts all cause quality drift that is invisible without monitoring. The review must ask: how will you know when the output quality has degraded?

LLM API as a single point of failure in clinical workflows: A synchronous LLM dependency in a clinical workflow (CDS Hook) will block the workflow when the API experiences latency. The circuit breaker pattern is required — but traditional architecture reviews do not ask about it.

Model version as a governance artifact: When an LLM vendor updates a model, the system behavior changes. The architecture must include a model version registry and a re-evaluation process before updating. Traditional software architecture does not have an analog for this.

Key Points to Hit:

  • PHI scrubbing at gateway level for observability
  • Quality drift monitoring as a production requirement
  • Circuit breaker on LLM dependencies in clinical workflows
  • Model version governance as an architectural requirement

Red Flags:

  • Not distinguishing AI risks from traditional software risks
  • Not mentioning PHI/HIPAA considerations
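
The model version governance point in the answer above can be illustrated with a small deployment gate. This is a sketch under stated assumptions: the `ModelVersionRegistry` class, the single `eval_score` metric, the threshold, and the version strings are hypothetical, and a real re-evaluation process would likely track more than one score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelApproval:
    model_version: str
    eval_score: float  # score on the client's own evaluation set (assumed metric)
    approved: bool


class ModelVersionRegistry:
    """Tracks which model versions have passed re-evaluation."""

    def __init__(self, min_score: float):
        self.min_score = min_score
        self._approvals: dict[str, ModelApproval] = {}

    def record_evaluation(self, model_version: str, eval_score: float) -> ModelApproval:
        approval = ModelApproval(model_version, eval_score, eval_score >= self.min_score)
        self._approvals[model_version] = approval
        return approval

    def can_deploy(self, model_version: str) -> bool:
        """Only versions that were evaluated and passed may be deployed."""
        approval = self._approvals.get(model_version)
        return approval is not None and approval.approved


registry = ModelVersionRegistry(min_score=0.90)
registry.record_evaluation("model-v1", 0.93)

print(registry.can_deploy("model-v1"))  # evaluated and above threshold
print(registry.can_deploy("model-v2"))  # never evaluated, so not deployable
```

The design choice worth noting is the default: a version that has never been evaluated is treated the same as one that failed.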

Key Takeaways

  • Architecture reviews are the highest-leverage technical activity an FDE performs
  • Current state elicitation and gap/risk analysis must be run as separate phases
  • The pre-review architecture map should distinguish CONFIRMED from ASSUMED
  • Eight AI-specific risk patterns require systematic evaluation in every review
  • PHI data flow must be explicitly traced and every control confirmed
  • Architecture Review Report should be delivered within 48 hours of the session
  • Blocking risks must be resolved before production go-live — not tracked as open items

Further Reading