Enterprise AI Operations — Quick Reference

Last Updated: 2026-06-30 Full Chapters: docs/03-Enterprise-AI/


AI Strategy — Use Case Scoring

Dimension Weight Score 1–5
Clinical / business impact 30% Incremental → Transformational
Technical feasibility 25% Low confidence → High confidence
Data readiness 20% No data → Clean, labeled, available
Regulatory / compliance risk 15% High risk → Low risk
Time to value 10% > 18 months → < 3 months

Priority Tiers: Score ≥ 4.0 → Tier 1 (Strategic); 3.0–3.9 → Tier 2 (Tactical); < 3.0 → Defer


AI Governance — Model Risk Tiers

Tier Description Examples Governance Requirement
1 — Clinical Directly influences patient care Discharge summary AI, drug interaction alerts Model Review Board approval, clinical panel validation, explicit override logging
2 — Administrative Influences clinical operations Prior auth, scheduling, coding assist Department manager approval, automated quality evaluation
3 — Informational No direct care influence Staff policy search, training material summarization Standard software change management

Tier 1 AI must have: Model card, bias evaluation across demographic subgroups, clinical validation study, signed-off training dataset lineage


Production Deployment — Rollout Stages

Stage Traffic Proceed When Abort If
Shadow mode 0% delivered Quality score < 0.90 vs. baseline
Canary 5% 48h, zero critical errors Error rate > 2× baseline
Blue-green 50% 7d stable P95 latency > 2× SLA
Full production 100% 14d stable Any Tier 1 safety event

Rollback trigger: Quality score drops > 15% from 30-day baseline → automated rollback to previous model version


Cost Management — Token Economics

Layer Action Typical Savings
Prompt caching Cache stable system prompt prefix 60–80% cost reduction on cached tokens
Model tier routing Economy for classification, Premium for clinical synthesis 40–60% blended cost reduction
Output length control max_tokens per use case, not global default 15–25% reduction
Batch processing Async batch API where latency allows 25–50% reduction on batch-eligible workloads

Token budget alert trigger: Burn rate > 110% of daily budget for 3 consecutive days → alert to engineering lead


Observability — Key Metrics

Metric Warning Threshold Critical Threshold Owner
Quality score (7d rolling vs. 30d baseline) > 10% drop > 20% drop AI Platform
Override rate (clinical) > 30% > 40% Clinical Informatics
P95 latency vs. SLA > 120% > 150% AI Platform
Hallucination rate (NLI score) > 5% > 10% AI Platform + Governance
Daily cost vs. budget > 110% > 130% AI Platform + Finance
Human review flag rate > 5% > 10% Clinical Informatics

AI Platform — Gateway Virtual Key Checklist

Before issuing a virtual key for a new clinical AI use case:

  • [ ] Application has signed use case intent (department, clinical owner)
  • [ ] Use case classified in model risk tier registry
  • [ ] Allowed model tiers documented (economy / standard / premium)
  • [ ] Monthly token budget approved by department budget owner
  • [ ] Rate limit per minute set (default: 60 RPM for Tier 2, custom for Tier 1)
  • [ ] HIPAA BAA covers this application's data scope
  • [ ] PHI handling review complete (no raw PHI in logs — hashed identifiers only)

Vendor Evaluation — Qualification Gate

Must pass ALL criteria before technical evaluation:

Criterion Requirement
HIPAA BAA Signed BAA available; covers the specific service
PHI used for training Confirmed NOT used for training by default
Data residency Inference in required region (US-only if applicable)
SOC 2 Type II Current certification (within 12 months)
Data retention Confirmed retention policy; PHI not retained for training

After qualification: Evaluate model quality on use-case-specific de-identified test set (minimum 100 cases), latency (P50 and P95), and cost at production scale.


AI Platform — Architecture Components

Component Purpose Build vs. Buy
AI Gateway Auth, rate limit, routing, audit log Buy (LiteLLM) or Build (FastAPI)
Prompt Registry Versioned prompts, governance lifecycle Build (version-controlled YAML + API)
Model Registry Approved models, BAA status, eval results Build (lightweight DB + API)
Embedding Service Shared clinical vector store Build (wrapper) + Buy (vendor model)
Evaluation Pipeline CI/CD for AI quality Build on CI/CD platform
Observability Traces, metrics, dashboards Buy (OpenTelemetry + vendor backend)

Change Management — Adoption Health Indicators

Signal Healthy Investigate
Override rate 5–20% (use case dependent) < 2% (rubber-stamping?) or > 30% (rejection?)
AI literacy completion > 95% before access granted < 80% after first 30 days
Feedback submissions Steady low volume Zero (feedback channel broken?) or spike (quality event?)
Champion engagement Monthly check-in, active No feedback in 4 weeks
Shadow AI tool use None reported Any reported use of non-approved AI for clinical tasks

Change Management — Rollout Phase Gates

Phase Duration Advance When
Champion pilot ≥ 14 days Quality meets baseline; champions satisfied; ≥ 1 integration design issue resolved
Department rollout ≥ 21 days Override rate stable; satisfaction ≥ 3.5/5.0; zero unresolved safety flags
Hospital-wide ≥ 30 days All department metrics met; Model Review Board sign-off

Interview Quick Reference

AI Strategy:

  • Build/Buy/Partner decision turns on: strategic differentiation value, time-to-value, internal ML engineering capacity
  • Use case scoring must include regulatory risk — high-risk clinical AI changes the ROI denominator

AI Governance:

  • Tier 1 AI requires Model Review Board approval, not just engineering sign-off
  • Audit records must use hashed patient identifiers — never raw PHI in AI logs

Production Deployment:

  • Shadow mode first, always — validate before clinicians see output
  • Rollback policy must be defined in advance, not in response to an incident

Cost Management:

  • Prompt caching: cache_control with type="ephemeral" on stable system prompt prefix
  • Model tier routing: classify request complexity first, then route — don't send everything to the premium model

Observability:

  • Quality drift detection: compare 7-day rolling average to 30-day baseline, not to a fixed threshold
  • Override rate is a quality signal — both very high and very low rates are problems

AI Platform:

  • AI gateway is the security boundary; enforce at network layer, not by convention
  • Prompt registry is a governance requirement, not a developer convenience

Vendor Evaluation:

  • BAA signed before PHI can be transmitted — this is a legal prerequisite
  • Model training opt-out must be confirmed in writing, not assumed

Change Management:

  • 3% override rate on a clinical document tool = rubber-stamping risk, not success
  • Alert fatigue is a design failure — lower volume, higher specificity

See Also