AI Platform Architecture
Executive Summary
An internal AI platform is the shared infrastructure that allows an enterprise to build, deploy, govern, and operate multiple AI use cases without rebuilding foundational capabilities for each one. The difference between an organization with a well-designed AI platform and one without is observable at the use case level: with a platform, the second clinical AI system takes 40% of the time the first one did; without a platform, every system is built from scratch with duplicated infrastructure, inconsistent governance, and unmeasurable costs. This chapter covers the architecture of an enterprise clinical AI platform (the AI gateway, prompt registry, model registry, embedding service, evaluation pipeline, and observability stack) and the design decisions that determine whether the platform enables the AI program or constrains it.
Learning Objectives
After reading this chapter, you will be able to:
- Design an internal AI platform architecture appropriate for a healthcare organization deploying multiple clinical AI use cases
- Identify the shared services that provide the highest leverage when extracted from individual use cases into platform infrastructure
- Explain the role of the AI gateway as the central control plane for security, cost, and governance
- Design a prompt registry that supports version control, A/B testing, and rollback for production prompts
- Evaluate the build-versus-buy decision for each AI platform component
Business Problem
Organizations that deploy their first AI use case successfully often encounter a hidden problem when building the second: the first use case was built with hardcoded LLM API calls, manually managed prompts, no shared authentication, no cost attribution, and no reusable observability. Building the second use case requires starting over. By the third use case, security reviews are rejecting deployments because each team has implemented its own API key management, the CFO cannot attribute AI costs to specific departments, and the governance committee cannot identify which model versions are running in production.
This is the "AI sprawl" problem that dedicated AI platform infrastructure solves. It is not a technology problem; the first use case worked correctly. It is an organizational scalability problem: the practices that produce a working single use case do not compose into a governed, auditable, cost-attributed multi-use-case AI program.
In healthcare, AI sprawl carries additional risk. Multiple teams accessing clinical data through independently managed LLM integrations creates HIPAA surface area that no individual team is responsible for auditing. The AI platform is the architectural boundary that concentrates security, governance, and compliance controls into a single, auditable layer.
Why This Technology Exists
Enterprise AI platforms emerged from the same pattern that produced API gateways, service meshes, and data platforms: when multiple applications need the same cross-cutting capability (security, routing, logging, billing), building it once as shared infrastructure is more efficient and more reliable than building it N times in N applications.
The specific impetus for internal AI platforms was the recognition that LLM API management has cross-cutting requirements that exceed what individual application teams should own: vendor key management at the organization level (not the team level), cost attribution by department and use case, prompt versioning and governance, evaluation pipeline infrastructure, and observability that spans all AI use cases. These requirements are fundamentally organizational, not application-level, and they require centralized infrastructure.
Conceptual Explanation
An AI platform is not a monolith. It is a set of shared services, each providing a specific capability, that individual AI applications use through well-defined interfaces. The key insight is that the platform does not own the AI use cases; the product teams do. The platform provides the infrastructure that makes each use case more secure, more observable, and cheaper to build and operate.
The platform's responsibilities cluster around four concerns:
Control Plane: Governance of AI access. Who can call which model? With which prompts? Enforced at the AI gateway.
Data Plane: The AI requests and responses themselves. The platform handles routing, retry, fallback, and load balancing without the application team managing it.
Management Plane: Configuration, versioning, and lifecycle management of prompts, models, and evaluation datasets.
Observability Plane: Unified tracing, quality metrics, cost attribution, and alerting across all AI use cases.
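To make the control-plane and data-plane split concrete, here is a minimal sketch rather than a production design. The names `Gateway`, `VirtualKey`, and `GatewayError` are hypothetical, and each backend is modeled as a plain callable standing in for a vendor client.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for a vendor client: takes a prompt, returns text.
Backend = Callable[[str], str]


class GatewayError(Exception):
    """Raised when the gateway refuses or cannot complete a request."""


@dataclass(frozen=True)
class VirtualKey:
    application: str
    allowed_models: frozenset


@dataclass
class Gateway:
    keys: dict    # virtual key id -> VirtualKey
    routes: dict  # model name -> ordered list of backends, primary first

    def complete(self, key_id: str, model: str, prompt: str) -> str:
        # Control plane: is this application allowed to call this model?
        key = self.keys.get(key_id)
        if key is None or model not in key.allowed_models:
            raise GatewayError(f"key {key_id!r} is not authorized for {model!r}")

        # Data plane: try each backend in order, falling back on failure.
        last_error: Optional[Exception] = None
        for backend in self.routes.get(model, []):
            try:
                return backend(prompt)
            except Exception as exc:  # a real gateway would catch narrower errors
                last_error = exc
        raise GatewayError(f"all backends failed for {model!r}") from last_error


def unavailable_primary(prompt: str) -> str:
    raise TimeoutError("primary backend unavailable")


def healthy_secondary(prompt: str) -> str:
    return f"secondary: {prompt}"


gateway = Gateway(
    keys={"vk-coding": VirtualKey("clinical-coding", frozenset({"general-llm"}))},
    routes={"general-llm": [unavailable_primary, healthy_secondary]},
)
answer = gateway.complete("vk-coding", "general-llm", "summarize the note")
```

The application only ever sees `complete`; authorization and fallback happen inside the gateway, which is the point of keeping those concerns out of product code.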
Core Architecture
Enterprise Considerations
Platform Governance Model: The AI platform team provides infrastructure; product teams own their AI use cases. This separation of concerns requires clear interface contracts: what does the platform guarantee (availability, latency, security controls), and what does the product team own (business logic, prompt quality, clinical validation)? Blurring this boundary produces either a platform that tries to govern use case content (overreach) or use cases that bypass the platform (underreach).
Platform Adoption: An AI platform is only valuable if product teams use it. Forced adoption through policy is less effective than making the platform genuinely easier to use than the alternative. The platform team's primary success metric should be: how long does it take a new clinical AI use case to go from approved business case to production deployment? If the platform reduces this from 6 months to 6 weeks, adoption is self-reinforcing.
Multi-Vendor Strategy: The AI platform should abstract over multiple LLM vendors so that vendor switching is an infrastructure decision rather than an application rewrite. Model routing, fallback, and cost optimization are easier to implement at the gateway layer when the gateway supports multiple backends.
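One way to picture the vendor abstraction is an alias table at the gateway: applications ask for a logical model name, and the gateway resolves it to a vendor and model. The sketch below is illustrative only; `LLMBackend`, `StubBackend`, `Route`, and `AliasRouter` are hypothetical names, and the stub echoes its inputs instead of calling a real vendor.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMBackend(Protocol):
    """Vendor-neutral interface the gateway expects from every backend."""

    def complete(self, model: str, prompt: str) -> str: ...


@dataclass
class StubBackend:
    """Hypothetical backend that echoes its inputs instead of calling a vendor."""
    vendor: str

    def complete(self, model: str, prompt: str) -> str:
        return f"[{self.vendor}/{model}] {prompt}"


@dataclass(frozen=True)
class Route:
    vendor: str
    model: str


class AliasRouter:
    """Maps application-facing aliases to a concrete vendor and model."""

    def __init__(self, backends: dict, routes: dict) -> None:
        self.backends = backends  # vendor name -> LLMBackend
        self.routes = routes      # alias -> Route

    def complete(self, alias: str, prompt: str) -> str:
        route = self.routes[alias]
        return self.backends[route.vendor].complete(route.model, prompt)


router = AliasRouter(
    backends={"vendor-a": StubBackend("vendor-a"), "vendor-b": StubBackend("vendor-b")},
    routes={"clinical-summary": Route("vendor-a", "model-x")},
)
before = router.complete("clinical-summary", "hello")

# Switching vendors is a routing-table change; the application call is unchanged.
router.routes["clinical-summary"] = Route("vendor-b", "model-y")
after = router.complete("clinical-summary", "hello")
```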
Self-Serve Onboarding: Platform teams that manually onboard every new use case become bottlenecks. Design the onboarding process to be self-serve: a clinical AI team should be able to register a new use case, receive a virtual key, access the vector store, and deploy their first inference request through the platform without a platform team member being involved.
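A self-serve registration step might look roughly like the following sketch. `OnboardingService` and its fields are hypothetical, and the sketch assumes no approval step and an in-memory store; a real platform would persist registrations and likely attach a review workflow.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UseCase:
    name: str
    owner_team: str
    cost_center: str


class OnboardingService:
    """Hypothetical self-serve registration: no platform engineer in the loop."""

    def __init__(self) -> None:
        self._by_name: dict = {}      # use case name -> UseCase
        self._by_key_hash: dict = {}  # SHA-256 of virtual key -> UseCase

    @staticmethod
    def _hash(virtual_key: str) -> str:
        return hashlib.sha256(virtual_key.encode("utf-8")).hexdigest()

    def register(self, name: str, owner_team: str, cost_center: str) -> str:
        if name in self._by_name:
            raise ValueError(f"use case {name!r} is already registered")
        use_case = UseCase(name, owner_team, cost_center)
        virtual_key = "vk-" + secrets.token_urlsafe(24)
        self._by_name[name] = use_case
        # Store only a hash so the registry never holds the key itself.
        self._by_key_hash[self._hash(virtual_key)] = use_case
        return virtual_key  # shown to the requesting team exactly once

    def resolve(self, virtual_key: str) -> Optional[UseCase]:
        """Look up which use case a presented virtual key belongs to."""
        return self._by_key_hash.get(self._hash(virtual_key))


service = OnboardingService()
key = service.register("clinical-coding", "HIM analytics", "CC-1234")
owner = service.resolve(key)
```

Because the key resolves to a use case, owner team, and cost center, every later request can be attributed without the product team doing anything further.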
Healthcare Example
Educational Example: Illustrative Workflow. Not intended for clinical decision making.
The Reference Healthcare Organization begins building its third clinical AI use case, clinical coding assistance, and finds that the investment in shared AI platform infrastructure from the first two use cases reduces the implementation timeline by 8 weeks compared to building from scratch.
Platform capabilities available at the start of the third use case:
- AI gateway: already deployed, virtual key issuance takes 30 minutes after platform team review
- Prompt registry: already deployed, prompt deployment workflow documented
- Embedding service: clinical knowledge base already indexed; coding use case adds CPT/ICD code library to the shared vector store (additional cost, no additional infrastructure)
- Evaluation pipeline: golden dataset evaluation framework already operating; coding team adds their use case to the configuration
- Observability: distributed tracing, quality scoring, and cost attribution already collecting data from day one of deployment
Time comparison:
| Capability | Use Case 1 (no platform) | Use Case 2 (partial platform) | Use Case 3 (full platform) |
|---|---|---|---|
| LLM integration | 3 weeks | 0.5 weeks | 0.5 days |
| Authentication setup | 1 week | 1 day | 30 minutes |
| Observability | 2 weeks | 1 week | 0 days (inherited) |
| Cost attribution | 1 week | 0.5 weeks | 0 days (inherited) |
| Evaluation pipeline | 3 weeks | 1 week | 3 days (configuration only) |
| Total infrastructure | 10 weeks | 3 weeks | < 1 week |
The platform investment, made during use cases 1 and 2, produces compounding returns for every subsequent use case. The platform team has become a force multiplier for the clinical AI program.
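The totals in the time comparison can be checked by converting every entry to working days. The sketch below assumes a 5-day working week and an 8-hour working day, neither of which is stated in the example.

```python
DAYS_PER_WEEK = 5  # assumption: working days per week
HOURS_PER_DAY = 8  # assumption: working hours per day


def to_days(weeks: float = 0.0, days: float = 0.0, hours: float = 0.0) -> float:
    """Convert a mixed duration to working days."""
    return weeks * DAYS_PER_WEEK + days + hours / HOURS_PER_DAY


# Rows: LLM integration, authentication, observability, cost attribution, evaluation.
effort_days = {
    "Use Case 1": [to_days(weeks=3), to_days(weeks=1), to_days(weeks=2),
                   to_days(weeks=1), to_days(weeks=3)],
    "Use Case 2": [to_days(weeks=0.5), to_days(days=1), to_days(weeks=1),
                   to_days(weeks=0.5), to_days(weeks=1)],
    "Use Case 3": [to_days(days=0.5), to_days(hours=0.5), 0.0,
                   0.0, to_days(days=3)],
}

totals_days = {name: sum(rows) for name, rows in effort_days.items()}

for name, total in totals_days.items():
    print(f"{name}: {total:.2f} days ({total / DAYS_PER_WEEK:.2f} weeks)")
```

Under these assumptions the columns sum to 50, 16, and roughly 3.6 working days, which is consistent with the table's rounded totals of 10 weeks, 3 weeks, and under 1 week.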
Common Mistakes
Building the Platform Before the First Use Case. Platform design requires real requirements, which only emerge from building and operating actual AI use cases. Organizations that invest 6 months in platform design before deploying their first use case build platforms that do not match real needs. The correct approach: build use case 1 without a platform, identify the repeated patterns, extract them into a platform before use case 2.
Platform as a Bottleneck. If every use case requires a platform team member to make changes (onboarding, prompt deployment, model version updates), the platform becomes a bottleneck. Platform design must prioritize self-serve workflows. If a clinical AI team cannot add their prompts to the registry without platform team intervention, the registry is not a platform component; it is a managed service.
Ignoring Prompt Registry. The most commonly underestimated platform component is the prompt registry. Organizations that treat prompts as application configuration find themselves unable to audit which prompt was in production at the time of a clinical incident, unable to roll back a prompt change without a code deployment, and unable to run A/B tests on prompt variants. Prompt management is not a development convenience; it is a governance requirement.
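The audit and rollback requirements can be illustrated with a small in-memory registry. This is a sketch under stated assumptions: `PromptRegistry`, `Activation`, and the method names are hypothetical, activations are assumed to be recorded in chronological order, and the dates are arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Activation:
    version: int
    activated_at: datetime
    activated_by: str


class PromptRegistry:
    def __init__(self) -> None:
        self._versions: dict = {}  # prompt name -> list of texts (version = index + 1)
        self._history: dict = {}   # prompt name -> activations, oldest first

    def publish(self, name: str, text: str) -> int:
        """Store a new version (never edited afterwards) and return its number."""
        versions = self._versions.setdefault(name, [])
        versions.append(text)
        return len(versions)

    def activate(self, name: str, version: int, by: str,
                 at: Optional[datetime] = None) -> None:
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise KeyError(f"{name!r} has no version {version}")
        at = at or datetime.now(timezone.utc)
        self._history.setdefault(name, []).append(Activation(version, at, by))

    def rollback(self, name: str, by: str, at: Optional[datetime] = None) -> int:
        """Re-activate the previously active version; no code deployment needed."""
        history = self._history.get(name, [])
        if len(history) < 2:
            raise ValueError("no earlier activation to roll back to")
        previous = history[-2].version
        self.activate(name, previous, by, at)
        return previous

    def active_at(self, name: str, when: datetime) -> Optional[tuple]:
        """Return (version, text) that was live at `when`, for incident review."""
        live = None
        for activation in self._history.get(name, []):  # assumes chronological order
            if activation.activated_at <= when:
                live = activation
        if live is None:
            return None
        return live.version, self._versions[name][live.version - 1]


def utc(day: int) -> datetime:
    return datetime(2024, 1, day, tzinfo=timezone.utc)


registry = PromptRegistry()
v1 = registry.publish("coding-assist", "Suggest codes for: {note}")
v2 = registry.publish("coding-assist", "Suggest codes with rationale for: {note}")
registry.activate("coding-assist", v1, by="alice", at=utc(1))
registry.activate("coding-assist", v2, by="alice", at=utc(10))
rolled_back_to = registry.rollback("coding-assist", by="bob", at=utc(20))
```

Keeping the activation history append-only is what makes the "which prompt was live at the time of the incident" question answerable after a rollback.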
One Registry for All Vendors. If the model registry tracks approved model versions per vendor, it must also track HIPAA BAA status per model-per-vendor. A model covered by a BAA on one vendor's platform is not thereby covered when served from another vendor's platform. The registry must prevent accidental PHI routing to non-BAA-covered endpoints.
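A registry check of this kind could be sketched as follows. `ModelRegistry`, `ModelEntry`, and the vendor and model names are hypothetical; the key idea is that BAA status is keyed on the vendor, model, and version together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    vendor: str
    model: str
    version: str
    baa_covered: bool  # recorded per vendor, not per model family


class ModelRegistry:
    def __init__(self, entries) -> None:
        self._entries = {(e.vendor, e.model, e.version): e for e in entries}

    def authorize(self, vendor: str, model: str, version: str,
                  contains_phi: bool) -> ModelEntry:
        """Return the entry if the route is allowed; raise otherwise."""
        entry = self._entries.get((vendor, model, version))
        if entry is None:
            raise PermissionError(f"{vendor}/{model}@{version} is not an approved model")
        if contains_phi and not entry.baa_covered:
            raise PermissionError(f"{vendor}/{model}@{version} has no BAA; PHI blocked")
        return entry


model_registry = ModelRegistry([
    # Same model family served by two vendors, with different BAA status.
    ModelEntry("vendor-a", "model-x", "2024-01", baa_covered=True),
    ModelEntry("vendor-b", "model-x", "2024-01", baa_covered=False),
])

allowed = model_registry.authorize("vendor-a", "model-x", "2024-01", contains_phi=True)
non_phi = model_registry.authorize("vendor-b", "model-x", "2024-01", contains_phi=False)
```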
Best Practices
- Deploy the AI gateway as the first platform component, before the first production AI use case
- Issue virtual keys per application, never share keys across applications
- Store all LLM vendor master keys in a secrets manager with rotation policy
- Design the prompt registry for self-serve deployment: the prompt owner, not the platform team, deploys prompt versions
- Build the evaluation pipeline as a CI/CD step that runs automatically on every prompt or model version change
- Index all clinical knowledge sources (guidelines, formularies, criteria) into a single shared vector store, not per-use-case stores
- Measure platform value in use case delivery speed, not platform feature completeness
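The practice of running evaluation automatically on every prompt or model change can be sketched as a gate function that a CI job calls. Everything here is illustrative: `evaluation_gate` and `GoldenExample` are hypothetical names, exact-match accuracy stands in for whatever quality metric the team actually uses, and the labels are placeholders rather than real clinical codes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenExample:
    model_input: str
    expected: str


def evaluation_gate(candidate: Callable[[str], str],
                    golden: Sequence[GoldenExample],
                    threshold: float) -> tuple:
    """Return (passed, accuracy) for a candidate prompt/model configuration."""
    if not golden:
        raise ValueError("golden dataset is empty")
    correct = sum(
        1 for ex in golden if candidate(ex.model_input).strip() == ex.expected
    )
    accuracy = correct / len(golden)
    return accuracy >= threshold, accuracy


# Placeholder golden dataset; labels are not real clinical codes.
golden_set = [
    GoldenExample("note A", "CODE-1"),
    GoldenExample("note B", "CODE-2"),
    GoldenExample("note C", "CODE-3"),
]


def candidate(model_input: str) -> str:
    """Stand-in for the new prompt version plus model call."""
    answers = {"note A": "CODE-1", "note B": "CODE-2", "note C": "CODE-9"}
    return answers[model_input]


passed, accuracy = evaluation_gate(candidate, golden_set, threshold=0.9)
if not passed:
    # In a CI job this branch would exit non-zero to block the deployment.
    print(f"gate failed: accuracy {accuracy:.2f} is below the threshold")
```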
Alternatives
Cloud-Native AI Platforms: Major cloud providers offer managed AI platform capabilities: Azure AI Studio, AWS SageMaker, Google Vertex AI. These reduce infrastructure management overhead but increase vendor lock-in and may not support the full governance customization required for clinical AI. Evaluate against the organization's cloud strategy and BAA coverage.
LangChain + LangSmith: A popular open-source ecosystem that provides many platform capabilities (prompt management, tracing, evaluation) with a lower initial investment than a fully custom platform. The LangSmith managed service handles observability; LangChain provides the orchestration layer. Appropriate for organizations committed to the LangChain ecosystem.
LiteLLM Proxy: An open-source AI gateway that supports multiple LLM vendors, virtual keys, rate limiting, and spend tracking. A practical starting point that can be extended with custom middleware for clinical AI requirements.
Trade-offs
| Dimension | No Platform | Partial Platform | Full Platform |
|---|---|---|---|
| Use case 1 delivery speed | Fastest | Fast | Slower (platform investment) |
| Use case N delivery speed | Slow (rebuild) | Medium | Fastest |
| Security consistency | Low | Medium | High |
| Governance auditability | Low | Medium | High |
| Team autonomy | High | Medium | High (with self-serve) |
| Cost attribution accuracy | None | Partial | Complete |
| Platform maintenance burden | None | Low | Medium |
Interview Questions
Q: Design an internal AI platform architecture for a hospital system that needs to operate 10 clinical AI use cases securely.
Category: System Design | Difficulty: Principal | Role: AI Architect
Answer Framework:
Begin with the control plane. The AI gateway is the non-negotiable first component: it is the security boundary, the cost attribution point, the governance enforcement layer, and the abstraction that allows vendor flexibility. Every LLM call from every clinical AI application must traverse the gateway, enforced at the network layer, not by convention.
The gateway issues virtual API keys per application, maps to master vendor keys stored in a secrets manager, and enforces rate limits and token budgets per key. All LLM audit records flow through the gateway to a HIPAA-compliant log store.
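Per-key rate limits and token budgets can be sketched with a sliding one-minute window and a running token counter. The names `KeyLimits`, `admit`, and `LimitExceeded` are hypothetical, the token estimate is assumed to be supplied by the caller, and the clock is passed in explicitly to keep the example deterministic.

```python
from collections import deque
from dataclasses import dataclass, field


class LimitExceeded(Exception):
    """Raised when a virtual key exceeds its rate limit or token budget."""


@dataclass
class KeyLimits:
    requests_per_minute: int
    token_budget: int
    tokens_used: int = 0
    recent_requests: deque = field(default_factory=deque)  # request timestamps (s)


def admit(limits: KeyLimits, estimated_tokens: int, now: float) -> None:
    """Admit a request for this key or raise LimitExceeded."""
    # Drop timestamps that have aged out of the 60-second window.
    while limits.recent_requests and now - limits.recent_requests[0] >= 60.0:
        limits.recent_requests.popleft()

    if len(limits.recent_requests) >= limits.requests_per_minute:
        raise LimitExceeded("rate limit reached for this key")
    if limits.tokens_used + estimated_tokens > limits.token_budget:
        raise LimitExceeded("token budget exhausted for this key")

    # Record the request only after both checks pass.
    limits.recent_requests.append(now)
    limits.tokens_used += estimated_tokens


limits = KeyLimits(requests_per_minute=2, token_budget=1000)
admit(limits, estimated_tokens=500, now=0.0)
admit(limits, estimated_tokens=400, now=1.0)
```

Because the limits hang off the virtual key rather than the master vendor key, one application exhausting its budget does not affect any other application.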
The management plane provides the prompt registry and model registry. The prompt registry is version-controlled and self-serve: clinical AI teams deploy prompt versions through a CI/CD workflow, not by contacting the platform team. The model registry lists approved model versions with their HIPAA BAA status and approved use cases.
Shared AI services: a single embedding service with a shared clinical knowledge vector store. All 10 use cases that need semantic search query the same vector store, with no per-use-case indexes for common clinical content.
The evaluation pipeline runs quality evaluation for every proposed prompt or model version change, gating deployment if metrics fall below threshold. This is the CI/CD system for AI quality.
Key Points to Hit:
- AI gateway as the mandatory network boundary, enforced at infrastructure level
- Virtual keys per application enable revocation and per-app rate limiting
- Prompt registry is a governance requirement, not a convenience
- Shared embedding service prevents N divergent vector stores
- Evaluation pipeline makes governance automated rather than manual
- Self-serve design prevents the platform team from becoming a bottleneck
Q: What are the top three platform capabilities that provide the most leverage for a healthcare organization scaling from 2 to 10 clinical AI use cases?
Category: Architecture | Difficulty: Senior | Role: AI Architect / FDE
Answer Framework:
First: the AI gateway with virtual keys and cost attribution. Without the gateway, each additional use case adds a new HIPAA surface area, a new API key management problem, and a new cost tracking gap. With the gateway, each new use case automatically inherits the security, cost attribution, and audit logging that were built once.
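Cost attribution at the gateway amounts to joining usage records to the owner of each virtual key. The sketch below is illustrative only: the per-token prices, key names, and department names are made up, and `attribute_costs` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRecord:
    virtual_key: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices per 1,000 tokens; real prices vary by vendor and model.
PRICE_PER_1K_INPUT = Decimal("0.003")
PRICE_PER_1K_OUTPUT = Decimal("0.015")

# Virtual key -> (department, use case), as captured at onboarding.
KEY_OWNERS = {
    "vk-coding": ("Health Information Management", "clinical-coding"),
    "vk-discharge": ("Hospital Medicine", "discharge-summary"),
}


def attribute_costs(records, key_owners) -> dict:
    """Sum cost per (department, use case) from gateway usage records."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for record in records:
        cost = (Decimal(record.input_tokens) / 1000 * PRICE_PER_1K_INPUT
                + Decimal(record.output_tokens) / 1000 * PRICE_PER_1K_OUTPUT)
        totals[key_owners[record.virtual_key]] += cost
    return dict(totals)


usage = [
    UsageRecord("vk-coding", input_tokens=2000, output_tokens=1000),
    UsageRecord("vk-coding", input_tokens=1000, output_tokens=0),
    UsageRecord("vk-discharge", input_tokens=4000, output_tokens=2000),
]
costs = attribute_costs(usage, KEY_OWNERS)
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift in the report.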
Second: the prompt registry. As use cases multiply, prompts become ungovernable without a registry. Clinical incidents will require reconstructing which prompt was in production at a given time. Prompt changes will be deployed informally without evaluation. The registry enforces discipline and makes incident investigation tractable.
Third: the shared evaluation pipeline. Manual evaluation cannot scale to 10 use cases without dedicated QA staff. An automated pipeline that runs quality evaluation on every model and prompt change, with results reported to the governance committee, is the scalable alternative to manual review.
Key Points to Hit:
- AI gateway: security and cost attribution that scales horizontally across use cases
- Prompt registry: governance enforcement that becomes non-negotiable at scale
- Evaluation pipeline: quality assurance that cannot remain manual beyond 2 to 3 use cases
Key Takeaways
- An AI platform provides shared infrastructure that makes each additional AI use case faster, cheaper, and more governable to build than the previous one
- The AI gateway is the most critical platform component: it enforces security, governance, cost attribution, and audit logging at the organizational boundary
- Virtual API keys per application enable fine-grained rate limiting, cost attribution, and revocation without rotating master vendor keys
- The prompt registry is a governance requirement for clinical AI, not a development convenience: every production prompt must be versioned, approved, and rollback-capable
- The evaluation pipeline is the CI/CD system for AI quality: it must gate every model and prompt deployment
- Self-serve platform design is the primary defense against the platform team becoming a bottleneck to the AI program
- Platform investment produces compounding returns: the cost per new use case drops significantly after the first two
Further Reading
- Chapter 4: Cost Management - Model routing and cost attribution implemented at the gateway layer
- Chapter 5: Observability and Monitoring - The observability plane that runs on top of the platform infrastructure
- Chapter 2: AI Governance - Governance requirements that the platform enforces
- LiteLLM Proxy Documentation - Open-source AI gateway implementation reference
- Phase 5: HMS Reference Architecture - How the AI platform fits into the complete HMS architecture