AI Platform Architecture

Executive Summary

An internal AI platform is the shared infrastructure that allows an enterprise to build, deploy, govern, and operate multiple AI use cases without rebuilding foundational capabilities for each one. The difference between an organization with a well-designed AI platform and one without is observable at the use case level: with a platform, the second clinical AI system takes 40% of the time the first one did; without a platform, every system is built from scratch with duplicated infrastructure, inconsistent governance, and unmeasurable costs. This chapter covers the architecture of an enterprise clinical AI platform (the AI gateway, prompt registry, model registry, embedding service, evaluation pipeline, and observability stack) and the design decisions that determine whether the platform enables the AI program or constrains it.

Learning Objectives

After reading this chapter, you will be able to:

  • Design an internal AI platform architecture appropriate for a healthcare organization deploying multiple clinical AI use cases
  • Identify the shared services that provide the highest leverage when extracted from individual use cases into platform infrastructure
  • Explain the role of the AI gateway as the central control plane for security, cost, and governance
  • Design a prompt registry that supports version control, A/B testing, and rollback for production prompts
  • Evaluate the build-versus-buy decision for each AI platform component

Business Problem

Organizations that deploy their first AI use case successfully often encounter a hidden problem when building the second: the first use case was built with hardcoded LLM API calls, manually managed prompts, no shared authentication, no cost attribution, and no reusable observability. Building the second use case requires starting over. By the third use case, security reviews are rejecting deployments because each team has implemented its own API key management, the CFO cannot attribute AI costs to specific departments, and the governance committee cannot identify which model versions are running in production.

This is the "AI sprawl" problem that dedicated AI platform infrastructure solves. It is not a technology problem: the first use case worked correctly. It is an organizational scalability problem: the practices that produce a working single use case do not compose into a governed, auditable, cost-attributed multi-use-case AI program.

In healthcare, AI sprawl carries additional risk. Multiple teams accessing clinical data through independently managed LLM integrations creates HIPAA surface area that no individual team is responsible for auditing. The AI platform is the architectural boundary that concentrates security, governance, and compliance controls into a single, auditable layer.

Why This Technology Exists

Enterprise AI platforms emerged from the same pattern that produced API gateways, service meshes, and data platforms: when multiple applications need the same cross-cutting capability (security, routing, logging, billing), building it once as shared infrastructure is more efficient and more reliable than building it N times in N applications.

The specific impetus for internal AI platforms was the recognition that LLM API management has cross-cutting requirements that exceed what individual application teams should own: vendor key management at the organization level (not the team level), cost attribution by department and use case, prompt versioning and governance, evaluation pipeline infrastructure, and observability that spans all AI use cases. These requirements are fundamentally organizational, not application-level, and they require centralized infrastructure.

Conceptual Explanation

An AI platform is not a monolith. It is a set of shared services, each providing a specific capability, that individual AI applications use through well-defined interfaces. The key insight is that the platform does not own the AI use cases; the product teams do. The platform provides the infrastructure that makes each use case more secure, more observable, and cheaper to build and operate.

The platform's responsibilities cluster around four concerns:

Control Plane: Governance of AI access. Who can call which model? With which prompts? Enforced at the AI gateway.

Data Plane: The AI requests and responses themselves. The platform handles routing, retry, fallback, and load balancing without the application team managing it.

Management Plane: Configuration, versioning, and lifecycle management of prompts, models, and evaluation datasets.

Observability Plane: Unified tracing, quality metrics, cost attribution, and alerting across all AI use cases.

Core Architecture

Components

AI Gateway

The AI gateway is the single entry point through which all clinical AI applications access LLM capabilities. It is the most critical platform component because it is the enforcement point for every platform policy.

Responsibilities of the AI gateway:

  • Authentication and authorization: Validates that the calling application has permission to invoke the requested model and use case. Uses service-level API keys managed centrally, not application-level keys distributed to individual teams.
  • Rate limiting: Enforces per-use-case and per-department request rate limits to prevent a single high-volume workflow from exhausting organizational rate limits or cost budgets.
  • Model routing: Routes requests to the appropriate model tier based on use case classification (economy, standard, premium; see Chapter 4).
  • Prompt resolution: Retrieves the current active prompt version for the requested use case from the prompt registry and inserts it into the outbound request. Applications do not manage prompts. (This is unrelated to prompt injection attacks, which are covered under Security Considerations.)
  • Cost attribution: Tags every request with department and use case metadata before forwarding, enabling downstream cost attribution.
  • Audit logging: Writes an immutable audit record for every inference (see Chapter 2).
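
The responsibilities above can be read as an ordered pipeline that every request passes through. The following is a minimal sketch under stated assumptions, not a reference implementation: the key table, prompt table, and audit log are in-memory stand-ins, and names such as `KeyPolicy`, `GatewayRequest`, `handle_request`, `KEYS`, and `ACTIVE_PROMPTS` are hypothetical. Rate limiting and the actual vendor call are omitted to keep the flow visible.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPolicy:
    department: str
    use_case: str
    allowed_tiers: frozenset[str]


@dataclass(frozen=True)
class GatewayRequest:
    virtual_key: str
    model_tier: str
    user_input: str


class GatewayError(Exception):
    """Raised when a request fails a gateway policy check."""


# Hypothetical in-memory stand-ins for the key store and prompt registry.
KEYS = {
    "vk-demo": KeyPolicy("cardiology", "discharge-summary", frozenset({"standard"})),
}
ACTIVE_PROMPTS = {
    "discharge-summary": ("3.2.0", "Summarize the encounter for discharge."),
}
AUDIT_LOG: list[dict] = []


def handle_request(req: GatewayRequest) -> dict:
    """Apply gateway policy and return the outbound request for the vendor."""
    # 1. Authentication: the virtual key must be known to the gateway.
    policy = KEYS.get(req.virtual_key)
    if policy is None:
        raise GatewayError("unknown virtual key")

    # 2. Authorization: the key must be allowed to use the requested tier.
    if req.model_tier not in policy.allowed_tiers:
        raise GatewayError(f"tier {req.model_tier!r} not allowed for this key")

    # 3. Look up the active prompt version for the key's use case.
    version, system_prompt = ACTIVE_PROMPTS[policy.use_case]

    # 4. Tag the outbound request for downstream cost attribution.
    outbound = {
        "tier": req.model_tier,
        "system": system_prompt,
        "user": req.user_input,
        "tags": {"department": policy.department, "use_case": policy.use_case},
    }

    # 5. Record an audit entry (metadata only in this sketch).
    AUDIT_LOG.append({
        "ts": time.time(),
        "virtual_key": req.virtual_key,
        "use_case": policy.use_case,
        "prompt_version": version,
        "tier": req.model_tier,
    })
    return outbound
```

Because the department and use case come from the key's policy rather than from the caller, an application cannot mislabel its own spend in this design.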

Implementation options:

  • LiteLLM Proxy: An open-source LLM proxy that supports multiple vendor backends, virtual keys, rate limiting, and spend tracking. A practical starting point for organizations building their first AI gateway.
  • Azure API Management + Azure OpenAI: Native Microsoft stack for organizations standardized on Azure, with built-in policy enforcement, rate limiting, and cost management.
  • Custom FastAPI gateway: Full control over behavior; higher initial development cost; appropriate when platform requirements exceed what off-the-shelf proxies support.

Prompt Registry

A version-controlled store of prompt templates used in production, with governance metadata attached to each version. The prompt registry enforces that prompts are:

  • Deployed through the same review and approval process as code
  • Associated with a specific model version they were validated against
  • Rollback-capable without a code deployment
  • Auditable (every inference records the prompt version used)

Each prompt version record includes: prompt text, model version compatibility, evaluation metrics achieved at validation time, deployment status (development / staging / production), clinical validation status (for Tier 1 prompts), and approver identity.
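
Rollback without a code deployment follows from treating promotion as data rather than code. The sketch below is one possible shape, assuming an in-memory store; `InMemoryPromptRegistry` and its method names are hypothetical, and a real registry would also persist the governance metadata listed above and keep the full promotion log for audit.

```python
from dataclasses import dataclass, field


@dataclass
class PromptHistory:
    """Versions of one prompt; the last entry of `production` is live."""
    versions: dict[str, str] = field(default_factory=dict)  # version -> text
    production: list[str] = field(default_factory=list)     # oldest first


class InMemoryPromptRegistry:
    def __init__(self) -> None:
        self._prompts: dict[str, PromptHistory] = {}

    def register(self, prompt_id: str, version: str, text: str) -> None:
        """Store a new version. Existing versions are never overwritten."""
        history = self._prompts.setdefault(prompt_id, PromptHistory())
        if version in history.versions:
            raise ValueError(f"{prompt_id} {version} already exists")
        history.versions[version] = text

    def promote(self, prompt_id: str, version: str) -> None:
        """Make a registered version the live production prompt."""
        history = self._prompts[prompt_id]
        if version not in history.versions:
            raise KeyError(version)
        history.production.append(version)

    def rollback(self, prompt_id: str) -> str:
        """Revert to the previously promoted version and return it."""
        history = self._prompts[prompt_id]
        if len(history.production) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        history.production.pop()
        return history.production[-1]

    def active(self, prompt_id: str) -> tuple[str, str]:
        """Return (version, text) of the live production prompt."""
        history = self._prompts[prompt_id]
        if not history.production:
            raise LookupError(f"no production version for {prompt_id}")
        version = history.production[-1]
        return version, history.versions[version]
```

Applications that resolve prompts through `active()` pick up a rollback on their next lookup, with no redeploy.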

Model Registry

A catalog of approved model versions available for production use, with governance metadata. The model registry enforces that only approved model versions can be invoked through the AI gateway, preventing individual teams from calling unreleased model versions or deprecated models without governance review.

For each registered model: model identifier, vendor, version, capabilities, HIPAA BAA status, data residency, approved use cases, evaluation results, and approval date.

Embedding Service

A shared inference endpoint that generates text embeddings for all clinical AI use cases. Centralizing embedding generation provides three benefits: consistent embedding model version across all use cases (preventing semantic incompatibility between independently managed vector stores), shared caching for frequently embedded content (clinical guidelines, formularies, standard procedures), and centralized cost attribution for embedding API calls.
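
The caching and version-consistency benefits can be sketched together by keying the cache on both the embedding model version and a hash of the content. This is an illustrative sketch only: `CachedEmbeddingService` is a hypothetical name, the cache is an in-process dictionary, and `toy_embedder` is a placeholder that stands in for a real embedding model call.

```python
import hashlib
from typing import Callable

# An embedder takes text and returns a vector.
Embedder = Callable[[str], list[float]]


class CachedEmbeddingService:
    """Wraps one embedding backend and caches results by content hash."""

    def __init__(self, model_version: str, embed_fn: Embedder) -> None:
        self.model_version = model_version
        self._embed_fn = embed_fn
        self._cache: dict[str, list[float]] = {}
        self.backend_calls = 0  # counts cache misses, useful for cost reporting

    def _key(self, text: str) -> str:
        # Including the model version means a model upgrade never serves
        # vectors produced by the previous model.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{self.model_version}:{digest}"

    def embed(self, text: str) -> list[float]:
        key = self._key(text)
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]


def toy_embedder(text: str) -> list[float]:
    # Placeholder only: NOT a real embedding model.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]
```

Frequently embedded content such as a formulary entry then costs one backend call no matter how many use cases request it.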

Evaluation Pipeline

A CI/CD pipeline for AI quality, analogous to automated testing in software development. When a new prompt version or model version is proposed, the evaluation pipeline:

  1. Runs the new version against the golden dataset for each affected use case
  2. Computes quality metrics and compares them to baseline
  3. Checks for demographic performance disparities
  4. Reports results in a structured format for governance review
  5. Gates deployment promotion if metrics fall below threshold

The evaluation pipeline is the technical enforcement of the governance requirement that every change to a production AI system is evaluated before deployment.
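
The comparison, disparity check, and gating steps can be sketched as a single decision function. This is a minimal sketch assuming one scalar quality score per run and per demographic subgroup; `GateConfig`, `GateResult`, `evaluate_gate`, and the threshold fields are hypothetical names, and suitable threshold values are a governance decision that the chapter does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    min_score: float         # absolute floor for the candidate
    max_regression: float    # allowed drop relative to the baseline
    max_subgroup_gap: float  # allowed spread between best and worst subgroup


@dataclass(frozen=True)
class GateResult:
    passed: bool
    reasons: tuple[str, ...]  # empty when the gate passes


def evaluate_gate(
    candidate_score: float,
    baseline_score: float,
    subgroup_scores: dict[str, float],
    config: GateConfig,
) -> GateResult:
    """Decide whether a candidate prompt or model version may be promoted."""
    reasons: list[str] = []

    if candidate_score < config.min_score:
        reasons.append("score below absolute threshold")

    if baseline_score - candidate_score > config.max_regression:
        reasons.append("regression against baseline exceeds limit")

    if subgroup_scores:
        gap = max(subgroup_scores.values()) - min(subgroup_scores.values())
        if gap > config.max_subgroup_gap:
            reasons.append("subgroup performance gap exceeds limit")

    return GateResult(passed=not reasons, reasons=tuple(reasons))
```

Returning the reasons alongside the pass/fail decision gives the governance review a structured record of why a promotion was blocked.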

Vector Database (Clinical)

A shared vector database that provides semantic search across clinical knowledge sources: clinical guidelines, formulary, prior authorization criteria, diagnostic protocols, and ICD/CPT code libraries. Centralizing the vector store prevents individual use cases from maintaining duplicated, potentially divergent copies of the same clinical knowledge.

Implementation Patterns

The Virtual Key Pattern

Rather than distributing LLM vendor API keys directly to application teams, the AI gateway issues virtual API keys that map to the platform's master vendor key. This pattern provides three critical capabilities:

  1. Revocation without disruption: If a virtual key is compromised, it can be revoked without rotating the master vendor key and updating every application.
  2. Per-application rate limits: Each virtual key has its own rate limit, preventing one use case from exhausting the shared organizational rate limit.
  3. Spend tracking per key: Cost attribution is derived from virtual key usage, not from post-hoc tagging of requests.
python
# Educational Example: AI Gateway Virtual Key Management
# Illustrative registration pattern for a clinical AI gateway

from dataclasses import dataclass, field
from typing import Optional
import secrets
import time


@dataclass
class VirtualKey:
    """
    A virtual key issued by the AI gateway to a clinical AI application.
    Maps to the platform's master vendor key but carries per-application policy.
    """
    key_id: str
    application_name: str
    department: str
    use_case: str
    allowed_model_tiers: list[str]       # e.g., ["standard", "premium"]
    monthly_token_budget: Optional[int]  # None = unlimited (requires approval)
    requests_per_minute_limit: int
    created_at: float = field(default_factory=time.time)
    active: bool = True


def issue_virtual_key(
    application_name: str,
    department: str,
    use_case: str,
    allowed_tiers: list[str],
    monthly_token_budget: Optional[int],
    rpm_limit: int,
) -> VirtualKey:
    """
    Issue a virtual key for a new clinical AI application.
    Called during the onboarding process, after platform team review.
    """
    key_id = f"vk-{secrets.token_urlsafe(16)}"
    return VirtualKey(
        key_id=key_id,
        application_name=application_name,
        department=department,
        use_case=use_case,
        allowed_model_tiers=allowed_tiers,
        monthly_token_budget=monthly_token_budget,
        requests_per_minute_limit=rpm_limit,
    )
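
Issuing a key is only half of the pattern; the gateway also has to enforce the limits the key carries. The following sketch shows one way to do that, assuming a sliding 60-second window for the request limit and a simple running counter for the token budget. `KeyLimiter` and its methods are hypothetical, state is in-process, and the monthly budget reset is left out.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class KeyLimiter:
    """Sliding-window request limit plus a running token budget, per key."""

    def __init__(self, rpm_limit: int, monthly_token_budget: Optional[int]) -> None:
        self.rpm_limit = rpm_limit
        self.monthly_token_budget = monthly_token_budget  # None = unlimited
        self._recent: dict[str, deque[float]] = defaultdict(deque)
        self._tokens_used: dict[str, int] = defaultdict(int)

    def allow(self, key_id: str, now: Optional[float] = None) -> bool:
        """Return True and count the request if the key is within its limits."""
        now = time.monotonic() if now is None else now
        window = self._recent[key_id]

        # Drop timestamps that have aged out of the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()

        if len(window) >= self.rpm_limit:
            return False

        budget = self.monthly_token_budget
        if budget is not None and self._tokens_used[key_id] >= budget:
            return False

        window.append(now)
        return True

    def record_usage(self, key_id: str, tokens: int) -> None:
        """Add the tokens consumed by a completed request to the key's total."""
        self._tokens_used[key_id] += tokens

    def tokens_used(self, key_id: str) -> int:
        return self._tokens_used[key_id]
```

Because every counter is keyed by `key_id`, one application reaching its limit has no effect on another, and `tokens_used` doubles as the per-key spend figure.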

The Prompt Registry Deployment Pattern

python
# Educational Example: Prompt Registry Client
# Illustrates how clinical applications retrieve versioned prompts

import time
from dataclasses import dataclass
from typing import Optional

CACHE_TTL_SECONDS = 300  # 5 minutes


@dataclass
class PromptVersion:
    """A versioned prompt record from the prompt registry."""
    prompt_id: str                 # e.g., "discharge-summary-system"
    version: str                   # Semantic version, e.g., "3.2.0"
    model_compatibility: str       # e.g., "claude-opus-4-8"
    prompt_text: str
    status: str                    # "production" | "staging" | "development"
    clinical_validated: bool
    quality_score_at_validation: float
    approved_by: Optional[str]


class PromptRegistryClient:
    """
    Client for retrieving prompts from the platform prompt registry.
    Applications always fetch the current production prompt and never hardcode it.
    """

    def __init__(self, registry_url: str, virtual_key: str):
        self.registry_url = registry_url
        self.virtual_key = virtual_key
        # Maps cache key -> (fetch time on the monotonic clock, prompt)
        self._cache: dict[str, tuple[float, PromptVersion]] = {}

    def get_production_prompt(
        self,
        prompt_id: str,
        model_version: str,
        use_cache: bool = True,
    ) -> PromptVersion:
        """
        Retrieve the current production prompt for a use case.
        Caches result for 5 minutes to reduce registry latency overhead.
        A promotion or rollback can therefore take up to 5 minutes to reach
        a client unless it passes use_cache=False.
        """
        cache_key = f"{prompt_id}:{model_version}"
        now = time.monotonic()
        if use_cache and cache_key in self._cache:
            fetched_at, cached = self._cache[cache_key]
            # Serve from cache only while the entry is younger than the TTL
            if now - fetched_at < CACHE_TTL_SECONDS:
                return cached

        # In production, this makes an HTTP call to the registry service
        # Raises PromptNotFoundError if no production prompt exists for this use case
        prompt = self._fetch_from_registry(prompt_id, model_version)
        self._cache[cache_key] = (now, prompt)
        return prompt

    def _fetch_from_registry(self, prompt_id: str, model_version: str) -> PromptVersion:
        # Placeholder: a real implementation calls the registry API
        raise NotImplementedError("Implement registry API call")

Enterprise Considerations

Platform Governance Model: The AI platform team provides infrastructure; product teams own their AI use cases. This separation of concerns requires clear interface contracts: what does the platform guarantee (availability, latency, security controls), and what does the product team own (business logic, prompt quality, clinical validation)? Blurring this boundary produces either a platform that tries to govern use case content (overreach) or use cases that bypass the platform (underreach).

Platform Adoption: An AI platform is only valuable if product teams use it. Forced adoption through policy is less effective than making the platform genuinely easier to use than the alternative. The platform team's primary success metric should be: how long does it take a new clinical AI use case to go from approved business case to production deployment? If the platform reduces this from 6 months to 6 weeks, adoption is self-reinforcing.

Multi-Vendor Strategy: The AI platform should abstract over multiple LLM vendors so that vendor switching is an infrastructure decision rather than an application rewrite. Model routing, fallback, and cost optimization are easier to implement at the gateway layer when the gateway supports multiple backends.
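
Fallback at the gateway layer can be as simple as trying an ordered list of backends behind one common interface. The sketch below assumes each vendor is wrapped in an adapter with the same call signature; `Backend`, `BackendError`, `complete_with_fallback`, and the two simulated adapters are hypothetical and do not correspond to any vendor SDK.

```python
from typing import Callable, Sequence


class BackendError(Exception):
    """Raised by a backend adapter when the vendor call fails."""


# A backend adapter takes a prompt and returns the completion text.
Backend = Callable[[str], str]


def complete_with_fallback(
    prompt: str,
    backends: Sequence[tuple[str, Backend]],
) -> tuple[str, str]:
    """Try each (name, backend) in order; return (backend name, completion)."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except BackendError as exc:
            # Remember the failure and move on to the next backend.
            errors.append(f"{name}: {exc}")
    raise BackendError("all backends failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    # Simulated adapter that is always unavailable.
    raise BackendError("simulated outage")


def stable_secondary(prompt: str) -> str:
    # Simulated adapter that always succeeds.
    return f"[secondary] {prompt}"
```

Returning the backend name with the completion lets the audit record show which vendor actually served the request. For requests containing PHI, the backend list should contain only endpoints covered by a BAA.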

Self-Serve Onboarding: Platform teams that manually onboard every new use case become bottlenecks. Design the onboarding process to be self-serve: a clinical AI team should be able to register a new use case, receive a virtual key, access the vector store, and send their first inference request through the platform without a platform team member being involved.

Security Considerations

The AI gateway is the organization's primary security boundary for AI access. Its security posture determines the security posture of every clinical AI use case:

  • Network boundary: The AI gateway must be the only egress path for LLM API calls. Direct application-to-vendor API connections must be blocked at the network layer.
  • Key management: Master LLM vendor API keys must be stored in a secrets manager (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault) with rotation policy, not in environment variables or configuration files.
  • TLS enforcement: All connections through the AI gateway (application → gateway, gateway → vendor) must use TLS 1.2 or higher.
  • Input validation: The AI gateway should validate that requests do not contain patterns associated with prompt injection, while recognizing that prompt injection detection at the gateway layer is a defense-in-depth measure, not a complete defense.
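
For the TLS requirement, a gateway written in Python can set the protocol floor on its outbound connections using the standard library `ssl` module. This is a minimal sketch of the client side only; the function name `make_gateway_tls_context` is hypothetical, and inbound TLS is usually configured on the server or load balancer instead.

```python
import ssl


def make_gateway_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to standard-library clients, for example `http.client.HTTPSConnection(host, context=make_gateway_tls_context())`.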

Healthcare Example

Educational Example โ€” Illustrative Workflow. Not intended for clinical decision making.

The Reference Healthcare Organization begins building its third clinical AI use case, clinical coding assistance, and finds that the investment in shared AI platform infrastructure from the first two use cases reduces the implementation timeline by 8 weeks compared to building from scratch.

Platform capabilities available at the start of the third use case:

  • AI gateway: already deployed, virtual key issuance takes 30 minutes after platform team review
  • Prompt registry: already deployed, prompt deployment workflow documented
  • Embedding service: clinical knowledge base already indexed; coding use case adds CPT/ICD code library to the shared vector store (additional cost, no additional infrastructure)
  • Evaluation pipeline: golden dataset evaluation framework already operating; coding team adds their use case to the configuration
  • Observability: distributed tracing, quality scoring, and cost attribution already collecting data from day one of deployment

Time comparison:

Capability           | Use Case 1 (no platform) | Use Case 2 (partial platform) | Use Case 3 (full platform)
---------------------|--------------------------|-------------------------------|----------------------------
LLM integration      | 3 weeks                  | 0.5 weeks                     | 0.5 days
Authentication setup | 1 week                   | 1 day                         | 30 minutes
Observability        | 2 weeks                  | 1 week                        | 0 days (inherited)
Cost attribution     | 1 week                   | 0.5 weeks                     | 0 days (inherited)
Evaluation pipeline  | 3 weeks                  | 1 week                        | 3 days (configuration only)
Total infrastructure | 10 weeks                 | 3 weeks                       | < 1 week

The platform investment, made during use cases 1 and 2, produces compounding returns for every subsequent use case. The platform team has become a force multiplier for the clinical AI program.

Common Mistakes

Building the Platform Before the First Use Case. Platform design requires real requirements, which only emerge from building and operating actual AI use cases. Organizations that invest 6 months in platform design before deploying their first use case build platforms that do not match real needs. The correct approach: build use case 1 without a platform, identify the repeated patterns, and extract them into a platform before use case 2.

Platform as a Bottleneck. If every use case requires a platform team member to make changes (onboarding, prompt deployment, model version updates), the platform becomes a bottleneck. Platform design must prioritize self-serve workflows. If a clinical AI team cannot add their prompts to the registry without platform team intervention, the registry is not a platform component; it is a managed service.

Ignoring the Prompt Registry. The most commonly underestimated platform component is the prompt registry. Organizations that treat prompts as application configuration find themselves unable to audit which prompt was in production at the time of a clinical incident, unable to roll back a prompt change without a code deployment, and unable to run A/B tests on prompt variants. Prompt management is not a development convenience; it is a governance requirement.

One Registry for All Vendors. If the model registry tracks approved model versions per vendor, it must also track HIPAA BAA status per model-per-vendor. A model covered by a BAA on one vendor's platform is not thereby covered on another vendor's platform. The registry must prevent accidental PHI routing to non-BAA-covered endpoints.
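
One way to make this mistake hard to commit is to key the registry on the (vendor, model) pair and check BAA coverage before any request that contains PHI is routed. The sketch below is illustrative only: `ModelRecord`, `ModelRegistry`, `check_route`, and `RoutingBlocked` are hypothetical names, and the vendor and model identifiers in any usage are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModelRecord:
    vendor: str
    model_id: str
    baa_covered: bool  # BAA status for this model on this vendor's platform


class RoutingBlocked(Exception):
    """Raised when a request must not be routed to the requested endpoint."""


class ModelRegistry:
    def __init__(self, records: Iterable[ModelRecord]) -> None:
        # Keyed by (vendor, model_id): the same model hosted by two vendors
        # is two separate records with separate BAA status.
        self._records = {(r.vendor, r.model_id): r for r in records}

    def check_route(self, vendor: str, model_id: str, contains_phi: bool) -> ModelRecord:
        """Return the record if routing is permitted, otherwise raise."""
        record = self._records.get((vendor, model_id))
        if record is None:
            raise RoutingBlocked(f"{vendor}/{model_id} is not an approved model")
        if contains_phi and not record.baa_covered:
            raise RoutingBlocked(f"{vendor}/{model_id} has no BAA; PHI not allowed")
        return record
```

A gateway that calls `check_route` before forwarding fails closed: unknown endpoints and non-BAA endpoints are both rejected for PHI traffic.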

Best Practices

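  • Deploy the AI gateway as the first platform component: a thin gateway can be in place before the first production AI use case, even though the rest of the platform is extracted from real use cases afterward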
  • Issue virtual keys per application; never share keys across applications
  • Store all LLM vendor master keys in a secrets manager with rotation policy
  • Design the prompt registry for self-serve deployment: the prompt owner, not the platform team, deploys prompt versions
  • Build the evaluation pipeline as a CI/CD step that runs automatically on every prompt or model version change
  • Index all clinical knowledge sources (guidelines, formularies, criteria) into a single shared vector store, not per-use-case stores
  • Measure platform value in use case delivery speed, not platform feature completeness

Alternatives

Cloud-Native AI Platforms: Major cloud providers offer managed AI platform capabilities: Azure AI Foundry (formerly Azure AI Studio), Amazon SageMaker, Google Vertex AI. These reduce infrastructure management overhead but increase vendor lock-in and may not support the full governance customization required for clinical AI. Evaluate against the organization's cloud strategy and BAA coverage.

LangChain + LangSmith: A popular ecosystem built around the open-source LangChain framework that provides many platform capabilities (prompt management, tracing, evaluation) with a lower initial investment than a fully custom platform. The LangSmith managed service handles observability; LangChain provides the orchestration layer. Appropriate for organizations committed to the LangChain ecosystem.

LiteLLM Proxy: An open-source AI gateway that supports multiple LLM vendors, virtual keys, rate limiting, and spend tracking. A practical starting point that can be extended with custom middleware for clinical AI requirements.

Trade-offs

Dimension                   | No Platform    | Partial Platform | Full Platform
----------------------------|----------------|------------------|------------------------------
Use case 1 delivery speed   | Fastest        | Fast             | Slower (platform investment)
Use case N delivery speed   | Slow (rebuild) | Medium           | Fastest
Security consistency        | Low            | Medium           | High
Governance auditability     | Low            | Medium           | High
Team autonomy               | High           | Medium           | High (with self-serve)
Cost attribution accuracy   | None           | Partial          | Complete
Platform maintenance burden | None           | Low              | Medium

Interview Questions

Q: Design an internal AI platform architecture for a hospital system that needs to operate 10 clinical AI use cases securely.

Category: System Design | Difficulty: Principal | Role: AI Architect

Answer Framework:

Begin with the control plane. The AI gateway is the non-negotiable first component: it is the security boundary, the cost attribution point, the governance enforcement layer, and the abstraction that allows vendor flexibility. Every LLM call from every clinical AI application must traverse the gateway, and this is enforced at the network layer, not by convention.

The gateway issues virtual API keys per application, maps them to master vendor keys stored in a secrets manager, and enforces rate limits and token budgets per key. All LLM audit records flow through the gateway to a HIPAA-compliant log store.

The management plane provides the prompt registry and model registry. The prompt registry is version-controlled and self-serve: clinical AI teams deploy prompt versions through a CI/CD workflow, not by contacting the platform team. The model registry lists approved model versions with their HIPAA BAA status and approved use cases.

Shared AI services: a single embedding service with a shared clinical knowledge vector store. All 10 use cases that need semantic search query the same vector store, with no per-use-case indexes for common clinical content.

The evaluation pipeline runs quality evaluation for every proposed prompt or model version change, gating deployment if metrics fall below threshold. This is the CI/CD system for AI quality.

Key Points to Hit:

  • AI gateway as the mandatory network boundary, enforced at infrastructure level
  • Virtual keys per application enable revocation and per-app rate limiting
  • Prompt registry is a governance requirement, not a convenience
  • Shared embedding service prevents N divergent vector stores
  • Evaluation pipeline makes governance automated rather than manual
  • Self-serve design prevents the platform team from becoming a bottleneck

Q: What are the top three platform capabilities that provide the most leverage for a healthcare organization scaling from 2 to 10 clinical AI use cases?

Category: Architecture | Difficulty: Senior | Role: AI Architect / FDE

Answer Framework:

First: the AI gateway with virtual keys and cost attribution. Without the gateway, each additional use case adds a new HIPAA surface area, a new API key management problem, and a new cost tracking gap. With the gateway, each new use case automatically inherits the security, cost attribution, and audit logging that were built once.

Second: the prompt registry. As use cases multiply, prompts become ungovernable without a registry. Clinical incidents will require reconstructing which prompt was in production at a given time. Prompt changes will be deployed informally without evaluation. The registry enforces discipline and makes incident investigation tractable.

Third: the shared evaluation pipeline. Manual evaluation cannot scale to 10 use cases without dedicated QA staff. An automated pipeline that runs quality evaluation on every model and prompt change, with results reported to the governance committee, is the scalable alternative to manual review.

Key Points to Hit:

  • AI gateway: security and cost attribution that scales horizontally across use cases
  • Prompt registry: governance enforcement that becomes non-negotiable at scale
  • Evaluation pipeline: quality assurance that cannot remain manual beyond 2 to 3 use cases

Key Takeaways

  • An AI platform provides shared infrastructure that makes each additional AI use case faster, cheaper, and more governable to build than the previous one
  • The AI gateway is the most critical platform component: it enforces security, governance, cost attribution, and audit logging at the organizational boundary
  • Virtual API keys per application enable fine-grained rate limiting, cost attribution, and revocation without rotating master vendor keys
  • The prompt registry is a governance requirement for clinical AI, not a development convenience: every production prompt must be versioned, approved, and rollback-capable
  • The evaluation pipeline is the CI/CD system for AI quality: it must gate every model and prompt deployment
  • Self-serve platform design is the primary defense against the platform team becoming a bottleneck to the AI program
  • Platform investment produces compounding returns: the cost per new use case drops significantly after the first two

Glossary

AI gateway: A network proxy that serves as the single entry point for all LLM API calls, enforcing authentication, rate limiting, cost attribution, and audit logging.

Virtual key: A synthetic API key issued by the AI gateway that maps to a master vendor key and carries per-application policy constraints.

Prompt registry: A version-controlled store of prompt templates used in production, with governance metadata and deployment lifecycle management.

Model registry: A catalog of approved AI model versions available for production use, with HIPAA BAA status, capability documentation, and governance approval records.

Evaluation pipeline: A CI/CD automation layer that runs quality evaluation for every proposed AI model or prompt change, gating promotion if metrics fall below threshold.

Self-serve platform: A platform designed so that consuming teams can onboard, configure, and operate independently without requiring the platform team to make changes on their behalf.

Further Reading