Demo Engineering

Executive Summary

A technical demo is not a product walkthrough. It is an engineered artifact designed to produce a specific audience response under controlled conditions. Demo engineering is the discipline of designing, building, and operating live technical demonstrations that convey architectural credibility, surface product capability at enterprise scale, and survive the failures that live settings inevitably produce. This chapter covers the full demo engineering lifecycle: demo architecture design, environment isolation, data strategy, failure handling, and the specific considerations for healthcare AI demos, where regulatory sensitivity requires careful content design. FDEs who treat demos as ad hoc screen shares consistently underperform FDEs who engineer their demos with the same rigor they apply to production systems.

Learning Objectives

  • Design a demo architecture that is isolated, repeatable, and failure-resistant
  • Select appropriate data strategies for different demo contexts (synthetic, de-identified, sandboxed live)
  • Structure a demo script that leads with the problem, not the product
  • Build failure handling into the demo design rather than hoping failure does not occur
  • Apply healthcare-specific constraints to demo content including PHI avoidance and medical disclaimers

Business Problem

Enterprise AI demos fail in three ways: they work technically but do not convince the audience; they work technically but the audience cannot see the relevance to their specific problem; or they fail live in ways that erode trust. The first two failures are design problems. The third is an engineering problem.

The business consequence of a failed demo in an enterprise AI engagement is disproportionate: a live failure in front of a client's executive team can set back an engagement by months, even if the failure was non-representative of the product's production reliability. Stakeholders who observe a failure during a demo form a strong and durable negative impression that factual correction rarely overcomes.

Demo engineering treats demo reliability as a first-class engineering constraint — not as an afterthought.

Why This Discipline Exists

The sales engineer tradition treats demos as product-centric walkthroughs: here is the product, here are its features. This works when the product is simple enough that feature demonstration produces conviction.

Enterprise AI products are not simple. A clinical documentation AI demo that shows a generic patient encounter and produces a generic discharge summary does not answer the audience's actual question: "Will this work with our Epic FHIR data, our clinical workflow, our security architecture, and our physicians' documentation habits?"

Demo engineering addresses this by making demos specific to the client's environment and problem. The most effective demo in an enterprise AI engagement shows the product working on the client's use case, with context that matches the client's environment, producing output that is recognizable as useful by the clinician in the room.

Conceptual Explanation

A demo has three layers, each of which must be engineered deliberately:

Layer 1 — The Environment: Where does the demo run? Is it isolated from production? Is it reproducible? What happens if the primary environment fails?

Layer 2 — The Data: What data does the demo process? Synthetic? De-identified client data? Sandbox data from the client's EHR? The data choice determines how convincing the demo is and what regulatory constraints apply.

Layer 3 — The Narrative: What story does the demo tell? Demos that start with features fail. Demos that start with the client's specific problem and show how the product solves it succeed.

All three layers must be designed together. A brilliant narrative with unreliable infrastructure produces a failed demo. A reliable infrastructure with generic data produces a forgettable demo.

Core Architecture: Demo Infrastructure

Environment Design Principles

Isolation: The demo environment must be completely isolated from any client data, production systems, or shared infrastructure that could introduce failures. A demo that depends on a production LLM endpoint that is experiencing latency will fail at exactly the moment the audience's attention is highest.

Reproducibility: Every component of the demo must be reproducible on demand. Scripts, not clicks. Infrastructure as code, not manual setup. A demo that requires 45 minutes of manual setup before each session is not demo-ready.

Fallback tiers: Every demo runs on three tiers, the last two of which are fallbacks:

  • Tier 1: Live demo against the primary environment
  • Tier 2: Live demo against a local backup environment (no internet dependency)
  • Tier 3: Recorded walkthrough (never the first choice, but always available)
python
# Demo environment configuration pattern
# All values are environment variables — never hardcoded in demo scripts

import os
from dataclasses import dataclass
from enum import Enum

class DemoEnvironment(Enum):
    PRIMARY = "primary"     # Cloud environment with optimal performance
    LOCAL_BACKUP = "local"  # Local container fallback — no internet dependency
    RECORDED = "recorded"   # Pre-recorded walkthrough — last resort

@dataclass
class DemoConfig:
    environment: DemoEnvironment
    llm_endpoint: str        # Primary: cloud API; Local: Ollama or LM Studio
    fhir_endpoint: str       # Synthetic FHIR server — never production
    demo_data_path: str      # Pre-loaded synthetic patient encounters
    model_id: str            # Specific model version — pinned, not "latest"
    max_response_timeout: float  # Seconds to wait before showing the fallback (8.0 primary, 30.0 local)
    
    @classmethod
    def from_environment(cls) -> "DemoConfig":
        env = DemoEnvironment(os.getenv("DEMO_ENV", "primary"))
        
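        if env == DemoEnvironment.PRIMARY:
            # os.environ[...] raises KeyError at startup if a required value
            # is missing, instead of passing None into the demo.
            return cls(
                environment=env,
                llm_endpoint=os.environ["LLM_API_ENDPOINT"],
                fhir_endpoint=os.environ["FHIR_DEMO_ENDPOINT"],
                demo_data_path=os.getenv("DEMO_DATA_PATH", "./demo-data"),
                model_id=os.environ["DEMO_MODEL_ID"],  # Pin exact version
                max_response_timeout=8.0
            )
        elif env == DemoEnvironment.LOCAL_BACKUP:
            return cls(
                environment=env,
                llm_endpoint="http://localhost:11434",  # Ollama default port
                fhir_endpoint="http://localhost:8080",  # Local HAPI FHIR
                demo_data_path="./demo-data",
                model_id="llama3:8b",  # Explicit local tag; a bare "llama3" resolves to "latest"
                max_response_timeout=30.0  # Local inference is slower
            )
        # RECORDED involves no live services, so fail loudly rather than
        # silently returning None to the caller.
        raise ValueError("The recorded walkthrough has no live configuration")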
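
One way to act on the three tiers is to probe each live tier shortly before the session and start on the first one that responds. The sketch below is illustrative only: `choose_tier` and the probe callables are hypothetical names, and what a probe actually checks (an HTTP health endpoint, a test generation) is left as an assumption.

```python
from typing import Callable, Sequence

# Ordered from most to least preferred; names mirror the tiers above.
TIER_ORDER = ("primary", "local", "recorded")

def choose_tier(probes: dict[str, Callable[[], bool]],
                order: Sequence[str] = TIER_ORDER) -> str:
    """Return the first tier whose probe succeeds.

    The recorded walkthrough needs no probe and is always the final answer.
    A probe that raises is treated the same as a probe that returns False.
    """
    for tier in order:
        if tier == "recorded":
            return tier
        probe = probes.get(tier)
        if probe is None:
            continue  # No probe registered for this tier, so skip it
        try:
            if probe():
                return tier
        except Exception:
            # A failing health check must never crash demo startup.
            continue
    return "recorded"
```

Running this check 15 to 30 minutes before the meeting, rather than at the moment of failure, leaves time to switch tiers calmly.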

Pinned versions: Every demo dependency (LLM model version, API version, library version) must be pinned. A dependency on "latest" leaves the demo's behavior undefined: a model update that changes output style overnight can render a carefully scripted demo incoherent.
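
The pinning rule can be enforced with a small startup check. The following is a minimal sketch under stated assumptions: `find_unpinned` is a hypothetical helper, the set of floating tags is a guess that should be extended for the tooling in use, and the version strings in the usage note are placeholders.

```python
# Tags that float over time. This list is an assumption; extend it as needed.
FLOATING_TAGS = {"", "latest", "stable", "main", "master", "default"}

def find_unpinned(versions: dict[str, str]) -> list[str]:
    """Return the names of dependencies whose version is missing or floating."""
    unpinned = []
    for name, version in versions.items():
        tag = (version or "").strip().lower()
        # Treat "model:latest" style references as floating too.
        suffix = tag.rsplit(":", 1)[-1]
        if tag in FLOATING_TAGS or suffix in FLOATING_TAGS:
            unpinned.append(name)
    return sorted(unpinned)
```

For example, `find_unpinned({"sdk": "1.4.2", "fhir_server": "latest"})` reports `fhir_server`, and the demo launcher can refuse to start until the list is empty.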

Data Strategy

The data strategy for a demo determines its credibility and its regulatory risk:

| Data Type | Credibility | Regulatory Risk | Best For |
|---|---|---|---|
| Synthetic (generated) | Low (obviously fake) | None | General product demos, early evaluation |
| De-identified client data | High (client recognizes their patterns) | Low (if properly de-identified) | Client-specific demos post-assessment |
| Sandbox EHR data | High (authentic data structure) | Low (sandbox = not real PHI) | Technical integration demos |
| Production data | Maximum | High (HIPAA risk; avoid) | Never in demo contexts |

For healthcare demos, the rule is absolute: no real PHI in demo environments. Synthetic patient data generators (Synthea, custom scripts) can produce realistic clinical encounters that demonstrate the AI's capability without creating HIPAA risk.

python
# Synthetic patient encounter builder for HMS demo
# Educational Example — Not intended for clinical use

from typing import Optional
from datetime import date, timedelta
import random

def build_synthetic_discharge_encounter(
    primary_diagnosis: str = "Community-acquired pneumonia",
    age_range: tuple = (65, 80),
    los_days_range: tuple = (3, 7),
    comorbidities: Optional[list[str]] = None,
    seed: Optional[int] = None,
) -> dict:
    """
    Build a synthetic inpatient encounter for demo purposes.
    All values are synthetic; no real patient data.
    Pass a fixed seed to get the same age and length of stay on every run.
    """
    if comorbidities is None:
        comorbidities = ["Type 2 diabetes mellitus", "Hypertension", "Chronic kidney disease, Stage 3"]
    
    rng = random.Random(seed)  # Local generator keeps the demo data reproducible
    admit_date = date.today() - timedelta(days=rng.randint(*los_days_range))
    discharge_date = date.today()
    
    return {
        "patient": {
            "id": "DEMO-PT-001",  # Clearly synthetic ID
            "name": "Demo Patient",  # Generic name, never a real name
            "age": rng.randint(*age_range),
            "gender": "Male",
            "mrn": "DEMO-00001"  # Clearly demo MRN
        },
        "encounter": {
            "id": "DEMO-ENC-001",
            "type": "Inpatient",
            "admit_date": admit_date.isoformat(),
            "discharge_date": discharge_date.isoformat(),
            "los_days": (discharge_date - admit_date).days,
            "attending_physician": "Demo Attending, MD"
        },
        "diagnoses": [
            {"code": "J18.9", "description": primary_diagnosis, "type": "Primary"},
            *[{"code": "DEMO", "description": c, "type": "Secondary"} for c in comorbidities]
        ],
        "medications": [
            {"name": "Azithromycin", "dose": "500mg", "route": "PO", "frequency": "Daily"},
            {"name": "Ceftriaxone", "dose": "1g", "route": "IV", "frequency": "Q24H"},
            {"name": "Metformin", "dose": "1000mg", "route": "PO", "frequency": "BID"},
            {"name": "Lisinopril", "dose": "10mg", "route": "PO", "frequency": "Daily"},
        ],
        "vitals_summary": {
            "admit": {"temp": 38.8, "hr": 102, "sbp": 128, "dbp": 78, "o2_sat": 91, "rr": 22},
            "current": {"temp": 37.1, "hr": 82, "sbp": 124, "dbp": 76, "o2_sat": 97, "rr": 16}
        },
        "labs_summary": {
            "wbc_admit": 14.2,
            "wbc_current": 9.1,
            "creatinine_admit": 1.6,
            "creatinine_current": 1.4,
            "cxr_finding": "Right lower lobe infiltrate, improving compared to admission"
        },
        "disclaimer": "SYNTHETIC DEMO DATA — NOT REAL PATIENT INFORMATION"
    }
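
The "no real PHI" rule can be backed by a guard that refuses to load any record lacking synthetic markers. The sketch below assumes the record shape produced by the builder above and the `DEMO-` identifier prefix; `synthetic_marker_problems` is a hypothetical helper. It only checks markers and is not a PHI detector.

```python
SYNTHETIC_PREFIX = "DEMO-"  # Assumption: every demo identifier uses this prefix

def synthetic_marker_problems(record: dict) -> list[str]:
    """List the reasons a record does not look like marked synthetic data."""
    problems = []
    id_fields = {
        "patient.id": record.get("patient", {}).get("id"),
        "patient.mrn": record.get("patient", {}).get("mrn"),
        "encounter.id": record.get("encounter", {}).get("id"),
    }
    for field, value in id_fields.items():
        if not isinstance(value, str) or not value.startswith(SYNTHETIC_PREFIX):
            problems.append(f"{field} is missing the {SYNTHETIC_PREFIX} prefix")
    if "SYNTHETIC" not in str(record.get("disclaimer", "")).upper():
        problems.append("disclaimer does not state that the data is synthetic")
    return problems
```

A demo loader can call this on every record at startup and abort if the returned list is non-empty, so an unmarked file never reaches the screen.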

Demo Script Engineering

A demo script is not a spoken script — it is a structured guide that defines:

  • The problem statement opening
  • The transition from problem to product
  • The specific actions performed in the product (keystrokes, clicks, inputs)
  • The expected outputs and how to narrate them
  • The audience questions anticipated at each stage
  • The fallback response if the expected output does not appear

Demo script structure for HMS discharge summary demo:

markdown
# Demo Script: Discharge Summary AI
# Audience: CMIO + Physician Champion + IT Director
# Duration: 20 minutes + Q&A
# Environment: Demo environment, synthetic patient data

## OPENING — The Problem (3 minutes)

[Don't touch the product yet]

"Thank you for having us. Before I show you anything technical, I want to
 make sure we're solving the right problem together.

 You mentioned in our discovery session that your hospitalists are spending
 20–30 minutes per discharge summary, which works out to [X] hours per day
 across your hospitalist group. And that the quality is inconsistent —
 some sections are complete, others are thin.

 Let me show you what an AI-assisted version of this workflow looks like —
 running against a synthetic patient who looks a lot like the pneumonia
 admissions you see every day."

## TRANSITION — Context Setting (2 minutes)

"This is the discharge summary tool, which would sit inside Epic as a
 SMART on FHIR application. The physician opens it from within the
 patient's chart — no context switching, no copy-paste.

 [Open the demo application]

 The tool has already pulled the patient's active diagnoses, medications,
 vitals trend, and key lab results from FHIR. The physician sees the
 patient context on the left — identical to what they'd see in the chart.
 On the right, they can generate an AI draft."

## CORE DEMO — Generate the Draft (5 minutes)

[Click: Generate Draft]
[Note: Expected response time 3–5 seconds; if it runs longer, narrate:
 "The generation typically completes in 3–5 seconds. Let's give it a moment."
 At 8 seconds the pre-generated fallback loads automatically.]

"The AI is pulling together the clinical narrative from the structured data.
 What it produces is a complete draft covering the [list sections visible].

 [Point to specific sections as they render]

 Notice that it's generated an accurate summary of the clinical course —
 the initial presentation, the treatment, the response, the current status.
 It's also included the discharge medications, which it pulled directly
 from the medication list, with the correct doses.

 Now, this is a draft. The physician owns this document. They can edit
 any section directly."

## AUDIENCE ENGAGEMENT — Let Them Drive (5 minutes)

"[Physician champion's name], would you be willing to look at this summary
 and tell me what you'd change? What's accurate, what's missing?"

[Let the physician review and comment — this is the highest-value
 part of the demo. Their engagement produces internal advocacy.]

## TECHNICAL DEPTH — For IT Director (3 minutes)

"From an integration standpoint — this runs over FHIR R4, authenticated
 via SMART on FHIR, which you already have available in your Epic instance.
 The AI inference goes through an AI gateway that provides audit logging,
 so every generation is logged with the encounter ID and the prompt version
 used — no raw PHI in the logs.

 The output is written back to Epic as a DocumentReference — it becomes
 a draft in the physician's in-basket, not a signed note."

## CLOSE — Next Step (2 minutes)

"What we're proposing as a next step is a 4-week POC where we run this
 against 50 real encounters from your Epic environment — with a clear
 success criterion: physician edit rate below 30% and section completeness
 above 95%, evaluated by your hospitalist champion.

 What questions do you have before we discuss what that would look like?"
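
The script's time budget can be checked mechanically before rehearsal. This is a minimal sketch: `ScriptSection` and `check_script` are hypothetical names, the durations are copied from the section headings of the script above, and the fallback line is an illustrative placeholder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScriptSection:
    name: str
    minutes: int
    fallback_line: str = ""  # What to say if the expected output does not appear

# Durations copied from the section headings of the script above.
SECTIONS = [
    ScriptSection("Opening: the problem", 3),
    ScriptSection("Transition: context setting", 2),
    ScriptSection("Core demo: generate the draft", 5,
                  fallback_line="Let me show you a generation I captured earlier."),
    ScriptSection("Audience engagement", 5),
    ScriptSection("Technical depth", 3),
    ScriptSection("Close: next step", 2),
]

def check_script(sections: list[ScriptSection], budget_minutes: int) -> list[str]:
    """Return warnings for an over-budget script or sections with no duration."""
    warnings = []
    total = sum(s.minutes for s in sections)
    if total > budget_minutes:
        warnings.append(f"script runs {total} min against a {budget_minutes} min budget")
    warnings += [f"'{s.name}' has no duration" for s in sections if s.minutes <= 0]
    return warnings
```

Here the six sections sum to exactly the 20 minutes stated in the script header, so `check_script(SECTIONS, budget_minutes=20)` returns no warnings.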

Architecture Diagram

Implementation Patterns

Failure Handling During Live Demo

Every demo must have explicit handling for the three most common failure modes:

Failure Mode 1 — LLM latency / timeout:

python
import asyncio

async def generate_with_demo_fallback(
    prompt: str,
    llm_client,
    fallback_output_path: str,
    timeout_seconds: float = 8.0
) -> tuple[str, bool]:
    """
    Generate LLM output with automatic fallback to pre-generated response.
    
    Returns (output_text, used_fallback).
    Educational example, not for production clinical use.
    """
    try:
        response = await asyncio.wait_for(
            llm_client.generate(prompt),
            timeout=timeout_seconds
        )
        return response.text, False
    except Exception:
        # Timeouts (asyncio.TimeoutError) and any other client error take the
        # same path: load the fallback captured from a previous successful run.
        with open(fallback_output_path, "r") as f:
            return f.read(), True

Demo narration for fallback activation: "Our demo environment is occasionally a bit slow when multiple demos are running — let me show you a generation I captured earlier, which is representative of typical output."

Failure Mode 2 — FHIR data not loading: Pre-load all FHIR data into demo application state at startup. The demo never makes live FHIR calls during the presentation — all FHIR data is pre-fetched and cached. If the FHIR server is unavailable, the demo runs on cached data with no visible impact.
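
The pre-load pattern might look like the following sketch. It assumes a setup script has already saved each FHIR resource as a JSON file in one directory; the class name `PreloadedFhirCache` and that file layout are hypothetical. Only the standard `resourceType` and `id` fields of a FHIR resource are relied on.

```python
import json
from pathlib import Path

class PreloadedFhirCache:
    """Serve FHIR resources from memory so the demo makes no live FHIR calls.

    Assumption: every *.json file in cache_dir holds exactly one resource.
    """

    def __init__(self, cache_dir: str):
        self._resources: dict[tuple[str, str], dict] = {}
        for path in sorted(Path(cache_dir).glob("*.json")):
            resource = json.loads(path.read_text(encoding="utf-8"))
            key = (resource["resourceType"], resource["id"])
            self._resources[key] = resource

    def get(self, resource_type: str, resource_id: str) -> dict:
        """Return a cached resource, or raise LookupError if it was never loaded."""
        try:
            return self._resources[(resource_type, resource_id)]
        except KeyError:
            raise LookupError(
                f"{resource_type}/{resource_id} was not pre-loaded"
            ) from None

    def __len__(self) -> int:
        return len(self._resources)
```

Because a missing resource raises immediately, gaps in the cache show up during rehearsal rather than in front of the audience.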

Failure Mode 3 — Audience asks for a scenario not in the demo: Prepare 3–5 pre-generated scenarios (different primary diagnoses, different patient demographics) so that "Can you show me a cardiac patient?" can be answered by switching to a pre-built scenario.
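
A scenario switcher can be as simple as a keyword lookup over the pre-built set. In this sketch the scenario names, keywords, and file paths are all hypothetical placeholders, and `find_scenario` is an illustrative helper rather than part of any product.

```python
import re
from typing import Optional

# Hypothetical registry: scenario name -> trigger keywords and the path of a
# pre-generated encounter file. Names and paths are placeholders.
SCENARIOS = {
    "pneumonia": {"keywords": {"pneumonia", "respiratory"},
                  "path": "scenarios/pneumonia.json"},
    "heart_failure": {"keywords": {"cardiac", "heart"},
                      "path": "scenarios/heart_failure.json"},
    "hip_fracture": {"keywords": {"fracture", "orthopedic", "hip"},
                     "path": "scenarios/hip_fracture.json"},
}

def find_scenario(request: str, scenarios: dict = SCENARIOS) -> Optional[str]:
    """Return the first scenario whose keywords appear in the request, else None."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    for name, spec in scenarios.items():
        if words & spec["keywords"]:
            return name
    return None
```

A `None` result is the cue to say plainly that the scenario is not prepared and to offer the closest one that is.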

Healthcare Considerations

Healthcare demos require specific content design beyond standard enterprise AI demos:

Medical disclaimer placement: Every healthcare demo should open with a verbal and visible disclaimer: "This is an educational demonstration using synthetic patient data. All patient information shown is fictional. This tool, in production, is designed to assist physicians — all output requires physician review and approval before entering the medical record."

Clinical realism: Healthcare demos that show clinically implausible scenarios lose the physician audience immediately. The synthetic patient data must be medically coherent: correct drug classes for the diagnoses shown, realistic lab values, appropriate clinical course. A hospitalist who sees a pneumonia patient prescribed a cardiac medication with no cardiac diagnosis on the problem list loses confidence in the AI's output quality.
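
Part of this coherence check can be automated before a clinician reviews the data. The sketch below assumes a hand-maintained, clinician-reviewed map from medication to expected diagnosis keywords; the map contents and the `unexplained_medications` helper are illustrative assumptions, not a clinical reference.

```python
# Hypothetical, hand-maintained map reviewed by a clinician: medication name
# to keywords that should appear in at least one diagnosis description.
INDICATIONS = {
    "azithromycin": ["pneumonia"],
    "ceftriaxone": ["pneumonia"],
    "metformin": ["diabetes"],
    "lisinopril": ["hypertension", "heart failure"],
}

def unexplained_medications(encounter: dict,
                            indications: dict = INDICATIONS) -> list[str]:
    """Return medications with no matching diagnosis in the encounter."""
    diagnoses = " ".join(d["description"].lower() for d in encounter["diagnoses"])
    flagged = []
    for med in encounter["medications"]:
        keywords = indications.get(med["name"].lower())
        # Unknown medications are flagged too, so the map is extended on purpose.
        if not keywords or not any(k in diagnoses for k in keywords):
            flagged.append(med["name"])
    return flagged
```

An automated pass like this catches obvious mismatches only; it complements clinician review and does not replace it.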

Physician language: Clinical terminology must be correct. "Comorbidity" not "co-condition." "Attending physician" not "doctor in charge." "Discharge medications" not "pills to take home." Incorrect clinical language signals to the physician audience that the product was not built by people who understand clinical workflows.
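
A small linter can scan narration text and UI copy for lay phrasing. This sketch uses only the three term pairs given above; `lint_terminology` is a hypothetical helper, and a real list would need clinician input.

```python
import re

# Lay phrase -> preferred clinical term, taken from the examples above.
PREFERRED_TERMS = {
    "co-condition": "comorbidity",
    "doctor in charge": "attending physician",
    "pills to take home": "discharge medications",
}

def lint_terminology(text: str,
                     terms: dict = PREFERRED_TERMS) -> list[tuple[str, str]]:
    """Return (found_phrase, suggested_term) pairs for lay phrasing in text."""
    findings = []
    for lay, clinical in terms.items():
        # A leading word boundary only, so plurals such as "co-conditions" match.
        if re.search(r"\b" + re.escape(lay), text, flags=re.IGNORECASE):
            findings.append((lay, clinical))
    return findings
```

Running it over the demo script and any static labels in the application takes seconds and removes one avoidable source of lost credibility.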

The "will this hurt patients?" question: Every healthcare demo must be prepared for this question. The answer is always framed around physician-in-the-loop: "The AI produces a draft that the physician reviews, edits, and signs. The physician's signature attests to the accuracy and completeness of the document. The AI does not make clinical decisions."

Common Mistakes

1. Using "latest" as the model version. A model update the night before the demo can change output style, introduce unexpected formatting, or produce unexpected content. Pin the exact model version used during rehearsal.

2. Depending on live internet during the demo. Hotel WiFi, conference center networks, and even enterprise office networks are unreliable during demos. The local backup environment must be ready and pre-tested.

3. Starting with the product. FDEs who open the demo application as their first action lose the audience who needed to understand the problem first. The problem statement is the opening.

4. Not running rehearsal on the actual demo network. Rehearsals on the FDE's home network do not reveal corporate proxy issues, DNS resolution failures, or port blocks that appear in the client's environment.

5. Using obviously fake clinical content. "Patient Name: Test User, Diagnosis: Test Diagnosis" destroys the demo's credibility. Identifiers can be visibly synthetic, but the clinical content must look realistic.

6. Not preparing for the "Can you show me X?" question. Having only one pre-built scenario means that any question outside the scripted path produces an "I'll show you that later" deflection — which signals lack of depth.
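
For mistake 4, a preflight script run from the actual demo network can reveal blocked ports and DNS failures before the meeting. The following is a minimal sketch using only the standard library; `can_connect` and `preflight` are hypothetical helpers, and the targets in the usage note are placeholders. It tests TCP reachability only and will not detect problems such as a TLS-intercepting proxy.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    DNS resolution failures, blocked ports, timeouts, and refused connections
    all surface as OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def preflight(targets: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Check every named dependency and report which ones are reachable."""
    return {name: can_connect(host, port) for name, (host, port) in targets.items()}
```

A call such as `preflight({"local_fhir": ("localhost", 8080), "local_llm": ("localhost", 11434)})` returns a name-to-boolean map that can be read at a glance during setup.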

Best Practices

  • Pin every version in the demo environment — model, library, API
  • Always have a local backup environment ready and pre-tested on the day
  • Open with the client's specific problem, not the product
  • Synthetic patient data must be clinically coherent — have a clinician review it
  • Prepare 3–5 pre-built scenarios for different clinical contexts
  • Run a full rehearsal on the client's network (or representative network) the day before
  • Have pre-generated fallback outputs for every generation step
  • Never show real PHI in a demo, and treat de-identified data that could be traced back to a patient as PHI
  • End every demo with a specific next step — not "let us know if you have questions"

Alternatives

| Demo Approach | When to Use | Trade-off |
|---|---|---|
| Live interactive demo (primary) | When audience includes technical stakeholders; when client-specific data is available | Highest credibility; failure risk |
| Recorded walkthrough | When live demo is too risky (key executive meeting, spotty network) | No failure risk; lowest credibility |
| Client-data sandbox demo | When assessment is complete and sandbox access is available | Maximum relevance; setup complexity |
| Collaborative build session | When technical audience wants to see the engineering | Deepest credibility; time-intensive |

Trade-offs

Realism vs. risk: The more realistic the demo (real client data structures, authentic clinical scenarios), the higher the audience engagement — but also the higher the setup complexity and the failure risk. FDEs must calibrate the realism level to the stakes of the meeting.

Depth vs. breadth: A 20-minute demo that goes deep on one use case is more credible than a 20-minute demo that skims five features. Healthcare FDEs should default to depth on the specific use case that matches the client's primary pain point.

Interview Questions

Q: How do you design a demo environment to be reliable for live client presentations?

Category: System Design | Difficulty: Senior | Role: FDE

Answer Framework:

Demo reliability is an engineering problem, not a hope. The design principles are isolation, reproducibility, and fallback tiers.

Isolation means the demo environment has no dependencies on production systems, shared infrastructure, or the client's network that the FDE does not control. All FHIR data is pre-fetched and cached. The LLM model version is pinned. All API credentials are demo-specific.

Reproducibility means everything that runs in the demo was created by a script, not manually. The environment can be torn down and rebuilt in under 30 minutes if something goes wrong.

Fallback tiers mean there are always at least two alternatives if the primary environment fails. A local container environment (Ollama for LLM inference, HAPI FHIR for data) can run the same demo without internet. Pre-recorded walkthroughs exist for catastrophic failure scenarios.

Additionally, all LLM generation steps have timeout handling that gracefully falls back to a pre-generated response. The narration for fallback activation is scripted in advance.

Key Points to Hit:

  • Isolation from production/shared infrastructure
  • Pinned versions for all dependencies
  • Local backup environment always ready
  • Pre-generated fallbacks for all generation steps
  • Rehearsal on client-representative network

Red Flags:

  • "We rely on the live API — it's usually fast enough"
  • Not having a local backup environment

Key Takeaways

  • A demo is an engineered artifact, not a screen share — treat demo reliability as a first-class constraint
  • Three layers each need deliberate engineering and must be designed together: environment, data, and narrative
  • Pin every version — model, library, API — nothing "latest"
  • Always have a local backup environment ready and pre-tested on demo day
  • Open with the client's problem, not the product
  • Synthetic data must be clinically coherent to retain healthcare audience credibility
  • All healthcare demos require a medical disclaimer — verbal and visible
  • Pre-generate fallback outputs for every LLM generation step

Glossary

Synthea: An open-source synthetic patient data generator that produces realistic clinical records following standard medical coding conventions. Commonly used for healthcare AI demo data.

HAPI FHIR: An open-source Java-based FHIR server that can be run locally for development and demo purposes.

Ollama: An open-source local LLM runtime that can serve language models on consumer hardware without internet connectivity — suitable for demo fallback environments.

SMART on FHIR: A standard for building applications that launch from within EHR systems with patient context automatically provided. The integration pattern used in clinical AI demos.

Further Reading