Enterprise AI Operations — Quick Reference
Last Updated: 2026-06-30 Full Chapters: docs/03-Enterprise-AI/
AI Strategy — Use Case Scoring
| Dimension | Weight | Score 1–5 |
|---|---|---|
| Clinical / business impact | 30% | Incremental → Transformational |
| Technical feasibility | 25% | Low confidence → High confidence |
| Data readiness | 20% | No data → Clean, labeled, available |
| Regulatory / compliance risk | 15% | High risk → Low risk |
| Time to value | 10% | > 18 months → < 3 months |
Priority Tiers: Score ≥ 4.0 → Tier 1 (Strategic); 3.0–3.9 → Tier 2 (Tactical); < 3.0 → Defer
AI Governance — Model Risk Tiers
| Tier | Description | Examples | Governance Requirement |
|---|---|---|---|
| 1 — Clinical | Directly influences patient care | Discharge summary AI, drug interaction alerts | Model Review Board approval, clinical panel validation, explicit override logging |
| 2 — Administrative | Influences clinical operations | Prior auth, scheduling, coding assist | Department manager approval, automated quality evaluation |
| 3 — Informational | No direct care influence | Staff policy search, training material summarization | Standard software change management |
Tier 1 AI must have: Model card, bias evaluation across demographic subgroups, clinical validation study, signed-off training dataset lineage
Production Deployment — Rollout Stages
| Stage | Traffic | Proceed When | Abort If |
|---|---|---|---|
| Shadow mode | 0% delivered | — | Quality score < 0.90 vs. baseline |
| Canary | 5% | 48h, zero critical errors | Error rate > 2× baseline |
| Blue-green | 50% | 7d stable | P95 latency > 2× SLA |
| Full production | 100% | 14d stable | Any Tier 1 safety event |
Rollback trigger: Quality score drops > 15% from 30-day baseline → automated rollback to previous model version
Cost Management — Token Economics
| Layer | Action | Typical Savings |
|---|---|---|
| Prompt caching | Cache stable system prompt prefix | 60–80% cost reduction on cached tokens |
| Model tier routing | Economy for classification, Premium for clinical synthesis | 40–60% blended cost reduction |
| Output length control | max_tokens per use case, not global default |
15–25% reduction |
| Batch processing | Async batch API where latency allows | 25–50% reduction on batch-eligible workloads |
Token budget alert trigger: Burn rate > 110% of daily budget for 3 consecutive days → alert to engineering lead
Observability — Key Metrics
| Metric | Warning Threshold | Critical Threshold | Owner |
|---|---|---|---|
| Quality score (7d rolling vs. 30d baseline) | > 10% drop | > 20% drop | AI Platform |
| Override rate (clinical) | > 30% | > 40% | Clinical Informatics |
| P95 latency vs. SLA | > 120% | > 150% | AI Platform |
| Hallucination rate (NLI score) | > 5% | > 10% | AI Platform + Governance |
| Daily cost vs. budget | > 110% | > 130% | AI Platform + Finance |
| Human review flag rate | > 5% | > 10% | Clinical Informatics |
AI Platform — Gateway Virtual Key Checklist
Before issuing a virtual key for a new clinical AI use case:
- [ ] Application has signed use case intent (department, clinical owner)
- [ ] Use case classified in model risk tier registry
- [ ] Allowed model tiers documented (economy / standard / premium)
- [ ] Monthly token budget approved by department budget owner
- [ ] Rate limit per minute set (default: 60 RPM for Tier 2, custom for Tier 1)
- [ ] HIPAA BAA covers this application's data scope
- [ ] PHI handling review complete (no raw PHI in logs — hashed identifiers only)
Vendor Evaluation — Qualification Gate
Must pass ALL criteria before technical evaluation:
| Criterion | Requirement |
|---|---|
| HIPAA BAA | Signed BAA available; covers the specific service |
| PHI used for training | Confirmed NOT used for training by default |
| Data residency | Inference in required region (US-only if applicable) |
| SOC 2 Type II | Current certification (within 12 months) |
| Data retention | Confirmed retention policy; PHI not retained for training |
After qualification: Evaluate model quality on use-case-specific de-identified test set (minimum 100 cases), latency (P50 and P95), and cost at production scale.
AI Platform — Architecture Components
| Component | Purpose | Build vs. Buy |
|---|---|---|
| AI Gateway | Auth, rate limit, routing, audit log | Buy (LiteLLM) or Build (FastAPI) |
| Prompt Registry | Versioned prompts, governance lifecycle | Build (version-controlled YAML + API) |
| Model Registry | Approved models, BAA status, eval results | Build (lightweight DB + API) |
| Embedding Service | Shared clinical vector store | Build (wrapper) + Buy (vendor model) |
| Evaluation Pipeline | CI/CD for AI quality | Build on CI/CD platform |
| Observability | Traces, metrics, dashboards | Buy (OpenTelemetry + vendor backend) |
Change Management — Adoption Health Indicators
| Signal | Healthy | Investigate |
|---|---|---|
| Override rate | 5–20% (use case dependent) | < 2% (rubber-stamping?) or > 30% (rejection?) |
| AI literacy completion | > 95% before access granted | < 80% after first 30 days |
| Feedback submissions | Steady low volume | Zero (feedback channel broken?) or spike (quality event?) |
| Champion engagement | Monthly check-in, active | No feedback in 4 weeks |
| Shadow AI tool use | None reported | Any reported use of non-approved AI for clinical tasks |
Change Management — Rollout Phase Gates
| Phase | Duration | Advance When |
|---|---|---|
| Champion pilot | ≥ 14 days | Quality meets baseline; champions satisfied; ≥ 1 integration design issue resolved |
| Department rollout | ≥ 21 days | Override rate stable; satisfaction ≥ 3.5/5.0; zero unresolved safety flags |
| Hospital-wide | ≥ 30 days | All department metrics met; Model Review Board sign-off |
Interview Quick Reference
AI Strategy:
- Build/Buy/Partner decision turns on: strategic differentiation value, time-to-value, internal ML engineering capacity
- Use case scoring must include regulatory risk — high-risk clinical AI changes the ROI denominator
AI Governance:
- Tier 1 AI requires Model Review Board approval, not just engineering sign-off
- Audit records must use hashed patient identifiers — never raw PHI in AI logs
Production Deployment:
- Shadow mode first, always — validate before clinicians see output
- Rollback policy must be defined in advance, not in response to an incident
Cost Management:
- Prompt caching: cache_control with type="ephemeral" on stable system prompt prefix
- Model tier routing: classify request complexity first, then route — don't send everything to the premium model
Observability:
- Quality drift detection: compare 7-day rolling average to 30-day baseline, not to a fixed threshold
- Override rate is a quality signal — both very high and very low rates are problems
AI Platform:
- AI gateway is the security boundary; enforce at network layer, not by convention
- Prompt registry is a governance requirement, not a developer convenience
Vendor Evaluation:
- BAA signed before PHI can be transmitted — this is a legal prerequisite
- Model training opt-out must be confirmed in writing, not assumed
Change Management:
- 3% override rate on a clinical document tool = rubber-stamping risk, not success
- Alert fatigue is a design failure — lower volume, higher specificity