Zero Trust Architecture for AI Systems

Executive Summary

Zero Trust security replaces the perimeter-based model ("trust everything inside the network") with continuous verification of every request regardless of network location. AI systems are a natural fit for Zero Trust because they call external LLM APIs, process PHI within internal networks, run on shared cloud infrastructure, and involve service-to-service communication across multiple components. This chapter applies Zero Trust principles to the enterprise AI architecture: every AI component authenticates explicitly, every data access is authorized to minimum necessary scope, and every action is logged and monitored.

Learning Objectives

Apply the three Zero Trust principles (verify explicitly, use least privilege, assume breach) to AI infrastructure
Design network segmentation for AI components that handles external LLM API calls without exposing the PHI data layer
Implement mTLS for AI service-to-service communication
Configure cloud-native Zero Trust controls (AWS Security Groups, Azure Private Link, GCP VPC Service Controls) for AI workloads

Why Zero Trust Matters for AI

Traditional perimeter security assumes that once a request is inside the corporate network, it can be trusted to access most internal resources. AI systems invalidate this model in two ways:

External API dependencies: LLM inference calls exit the corporate network to reach Anthropic, Azure, or Google APIs. A perimeter firewall can allow or block these destinations, but it cannot see what is inside the encrypted requests (for example, whether a prompt contains PHI) without intercepting TLS.

Lateral movement via agentic workflows: A compromised AI agent that has accumulated tool access can move laterally through internal systems — reading the EHR, writing to messaging systems, querying the data warehouse — without triggering traditional network-based detection, because all these calls originate from inside the network from a trusted service account.

Zero Trust addresses both: every LLM API call is authenticated and logged, and every tool call by an AI agent is authorized against a minimum-necessary policy.
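A minimal sketch of minimum-necessary tool authorization, assuming each agent runs under its own service account. The ACL contents, the service account names, and the `ToolCallAuthorizer` class are hypothetical, and the in-memory list stands in for the immutable audit log. Unknown identities and unlisted tools are denied by default, and denied calls are logged as well as allowed ones.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Educational example, not for clinical use.
# Hypothetical per-agent tool ACL: service account -> allowed (tool, action) pairs.
# Anything not listed is denied (deny by default).
AGENT_TOOL_ACL: dict[str, set[tuple[str, str]]] = {
    "svc-discharge-agent": {("fhir_proxy", "read"), ("knowledge_base", "query")},
    "svc-med-review-agent": {("fhir_proxy", "read")},
}


@dataclass
class ToolCallAuthorizer:
    """Authorizes each agent tool call against the ACL and records every decision."""
    acl: dict[str, set[tuple[str, str]]]
    audit_log: list[dict] = field(default_factory=list)  # stand-in for the immutable audit store

    def authorize(self, service_account: str, tool: str, action: str) -> bool:
        allowed = (tool, action) in self.acl.get(service_account, set())
        # Denied calls are logged too: they are the signal of attempted lateral movement.
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "service_account": service_account,
            "tool": tool,
            "action": action,
            "allowed": allowed,
        })
        return allowed


authorizer = ToolCallAuthorizer(AGENT_TOOL_ACL)
print(authorizer.authorize("svc-discharge-agent", "fhir_proxy", "read"))  # True
print(authorizer.authorize("svc-discharge-agent", "messaging", "send"))   # False: tool not granted
print(authorizer.authorize("unknown-agent", "fhir_proxy", "read"))        # False: unknown identity
```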

Zero Trust Principles Applied to AI

python

from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Educational example — not for clinical use

class TrustLevel(Enum):
    NONE = 0
    AUTHENTICATED = 1      # Identity verified
    AUTHORIZED = 2         # Permission confirmed for this specific action
    VERIFIED = 3           # Identity + permission + request context validated


@dataclass
class ZeroTrustDecision:
    """Result of a Zero Trust policy evaluation for an AI request."""
    trust_level: TrustLevel
    allowed: bool
    component: str
    action: str
    reason: str
    requires_mfa: bool = False
    session_duration_minutes: Optional[int] = None


class ZeroTrustPolicyEngine:
    """
    Zero Trust policy evaluation for AI platform requests.
    
    Every AI component request is evaluated against these policies:
    1. Verify identity (authentication)
    2. Verify authorization (minimum necessary access)
    3. Verify context (device health, network location, time of day)
    
    Educational example — not for clinical use.
    """
    
    def evaluate_clinician_request(
        self,
        user_id: str,
        role: str,
        resource: str,
        action: str,
        device_compliance_status: str,
        network_location: str,    # "corporate_network" | "vpn" | "external"
    ) -> ZeroTrustDecision:
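        """
        Evaluate a clinician's request to access the AI platform.
        Assumes user_id was already authenticated upstream by the identity
        provider; this method adds the device, network, and role checks.
        Internal requests go through exactly the same checks as external ones.
        """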
        # 1. Device compliance required
        if device_compliance_status != "compliant":
            return ZeroTrustDecision(
                trust_level=TrustLevel.NONE,
                allowed=False,
                component="ai_gateway",
                action=action,
                reason="Device not compliant with MDM policy. Enroll device before accessing PHI-enabled AI features."
            )
        
        # 2. Network location risk scoring
        if network_location == "external" and resource.startswith("phi"):
            return ZeroTrustDecision(
                trust_level=TrustLevel.AUTHENTICATED,
                allowed=False,
                component="ai_gateway",
                action=action,
                reason="PHI-enabled AI features require VPN or corporate network access from external devices."
            )
        
        # 3. Role-based authorization
        allowed_resources = self._get_allowed_resources(role)
        if resource not in allowed_resources:
            return ZeroTrustDecision(
                trust_level=TrustLevel.AUTHENTICATED,
                allowed=False,
                component="ai_gateway",
                action=action,
                reason=f"Role '{role}' is not authorized for resource '{resource}'."
            )
        
        return ZeroTrustDecision(
            trust_level=TrustLevel.VERIFIED,
            allowed=True,
            component="ai_gateway",
            action=action,
            reason="Request verified",
            session_duration_minutes=60,
        )
    
    def _get_allowed_resources(self, role: str) -> list[str]:
        """Return allowed AI resources per clinical role."""
        ROLE_RESOURCE_MAP = {
            "attending_physician": [
                "phi_clinical_rag",
                "phi_discharge_summary",
                "phi_medication_review",
                "knowledge_base_query",
            ],
            "nurse": [
                "phi_clinical_rag",
                "knowledge_base_query",
            ],
            "pharmacist": [
                "phi_medication_review",
                "knowledge_base_query",
            ],
            "administrative": [
                "knowledge_base_query",  # Non-PHI queries only
            ],
        }
        return ROLE_RESOURCE_MAP.get(role, [])
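The engine returns `session_duration_minutes=60` on success. Under continuous verification, that value bounds how long a decision can be reused before the gateway evaluates the request again. A minimal sketch of that check; `needs_reverification` is a hypothetical helper, and treating a missing duration as "evaluate every request" is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Educational example, not for clinical use.
# Hypothetical helper: can a previously verified session be reused, or must the
# gateway run the full policy evaluation again?


def needs_reverification(
    verified_at: datetime,
    session_duration_minutes: Optional[int],
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the request must be re-evaluated by the policy engine."""
    if session_duration_minutes is None:
        # No session was granted (e.g. a denied decision), so evaluate every request.
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= verified_at + timedelta(minutes=session_duration_minutes)


t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(needs_reverification(t0, 60, now=t0 + timedelta(minutes=30)))  # False
print(needs_reverification(t0, 60, now=t0 + timedelta(minutes=60)))  # True
print(needs_reverification(t0, None, now=t0))                        # True
```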

Network Segmentation Architecture

graph TD
    subgraph "External Zone"
        CLINICIAN["Clinician Browser\n(VPN required for PHI AI)"]
        LLM_API["LLM Provider APIs\n(Anthropic, Azure, Vertex)"]
    end
    subgraph "DMZ (AI Gateway Zone)"
        ZT_GW["Zero Trust Gateway\nmTLS termination\nIdentity verification\nRate limiting"]
    end
    subgraph "Internal (AI Processing Zone)"
        RAG["RAG Service\nNo public internet access"]
        AGENT["Agent Orchestrator\nTool call ACL enforced"]
        EMBED["Embedding Service\nNo PHI persistence"]
    end
    subgraph "Protected (PHI Data Zone)"
        FHIR_PROXY["FHIR Proxy\nPHI access logging"]
        VECTOR_DB["Vector Store\nClinical knowledge (no patient PHI)"]
        AUDIT_DB["HIPAA Audit Log\nImmutable, encrypted"]
    end
    subgraph "Egress Control Zone"
        EGRESS["Egress Proxy\nInspects + logs all LLM API calls\nEnforces no-PHI-in-prompt policy for\nnon-BAA providers"]
    end
    CLINICIAN -->|"mTLS"| ZT_GW
    ZT_GW -->|"JWT + mTLS"| RAG & AGENT
    RAG & AGENT -->|"Service Account"| FHIR_PROXY
    RAG --> VECTOR_DB
    RAG & AGENT -->|"All calls logged"| AUDIT_DB
    AGENT -->|"Egress inspected"| EGRESS
    EGRESS -->|"TLS 1.3"| LLM_API
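The zone boundaries in the diagram can also be kept as a deny-by-default allowlist and checked in CI before firewall or security group changes are applied. This is a sketch: the zone names are shorthand for the diagram's subgraphs, and `is_flow_allowed` and `violations` are hypothetical helpers, not part of any tool.

```python
# Educational example, not for clinical use.
# Zone-to-zone flows taken from the segmentation diagram; everything else is denied.
ALLOWED_FLOWS: set[tuple[str, str]] = {
    ("external", "dmz_gateway"),         # clinician -> Zero Trust gateway (mTLS)
    ("dmz_gateway", "ai_processing"),    # gateway -> RAG / agent (JWT + mTLS)
    ("ai_processing", "phi_data"),       # RAG / agent -> FHIR proxy, vector store, audit log
    ("ai_processing", "egress_control"), # agent -> egress proxy
    ("egress_control", "external"),      # egress proxy -> LLM provider APIs (TLS 1.3)
}


def is_flow_allowed(src_zone: str, dst_zone: str) -> bool:
    """Deny by default: only flows drawn in the diagram are permitted."""
    return (src_zone, dst_zone) in ALLOWED_FLOWS


def violations(proposed_flows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return proposed flows (e.g. from a firewall change request) that break the model."""
    return [flow for flow in proposed_flows if not is_flow_allowed(*flow)]


proposed = [
    ("dmz_gateway", "ai_processing"),
    ("phi_data", "external"),   # the PHI zone must have no egress
    ("external", "phi_data"),   # no direct path from outside to PHI
]
print(violations(proposed))
# [('phi_data', 'external'), ('external', 'phi_data')]
```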

mTLS for AI Service-to-Service Authentication

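Mutual TLS (mTLS) is the Zero Trust authentication mechanism for AI service-to-service communication: both client and server present certificates, so each side proves its identity on every connection. Unlike a bearer API key, the private key is never sent over the network, and each certificate is bound to one service identity. mTLS does not make credentials unstealable: a private key read from disk can still be exfiltrated, which is why short certificate lifetimes and automated rotation matter.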

python

import ssl
import httpx

# Educational example — not for clinical use

def create_mtls_client(
    client_cert_path: str,
    client_key_path: str,
    ca_cert_path: str,      # CA that issued the server's certificate
    service_url: str,
) -> httpx.Client:
    """
    Create an httpx client configured for mTLS.
    Used for RAG service → FHIR proxy communication where both
    the client and server must present valid certificates.
    
    Educational example — not for clinical use.
    """
    ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_cert_path)
    ssl_context.load_cert_chain(certfile=client_cert_path, keyfile=client_key_path)
    ssl_context.minimum_version = ssl.TLSVersion.TLSv1_3
    
    return httpx.Client(
        base_url=service_url,
        verify=ssl_context,
    )


# Certificate rotation automation (illustrative approach)
MTLS_CERT_MANAGEMENT = {
    "certificate_lifetime_days": 90,
    "rotation_trigger_days_before_expiry": 30,
    "certificate_authority": "Vault PKI (HashiCorp Vault internal CA)",
    "rotation_mechanism": "Automated via cert-manager (Kubernetes) or Vault agent",
    "components": [
        "ai-gateway",
        "rag-service",
        "fhir-proxy",
        "embedding-service",
        "async-ai-workers",
    ],
}
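The client above covers one side of the handshake. The receiving service (for example the FHIR proxy) must also require a client certificate, or the connection is ordinary one-way TLS. A sketch using the standard-library `ssl` module, assuming service identities are carried as DNS or URI subjectAltName entries; the host names and the caller allowlist are hypothetical. A valid certificate only authenticates the caller, so the last function makes the authorization decision explicit. In a service mesh, the sidecar proxy performs the handshake part on the service's behalf.

```python
import ssl
from typing import Optional

# Educational example, not for clinical use.


def create_mtls_server_context(
    client_ca_path: Optional[str] = None,
    server_cert_path: Optional[str] = None,
    server_key_path: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side context that rejects any client without a certificate from the CA.

    In deployment, always pass client_ca_path: with no CA loaded, every client
    certificate fails verification (fail closed).
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=client_ca_path)
    ctx.verify_mode = ssl.CERT_REQUIRED  # mutual TLS: a client certificate is mandatory
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if server_cert_path and server_key_path:
        ctx.load_cert_chain(certfile=server_cert_path, keyfile=server_key_path)
    return ctx


# Hypothetical allowlist: which service identities may call the FHIR proxy.
FHIR_PROXY_ALLOWED_CALLERS = {"rag-service.ai.internal", "agent-orchestrator.ai.internal"}


def caller_identities(peercert: dict) -> list[str]:
    """Extract DNS/URI SAN entries from the dict returned by SSLSocket.getpeercert()."""
    return [value for kind, value in peercert.get("subjectAltName", ()) if kind in ("DNS", "URI")]


def is_caller_authorized(peercert: dict, allowed: set[str]) -> bool:
    """Authorization for an already-authenticated caller: identity must be on the allowlist."""
    return any(identity in allowed for identity in caller_identities(peercert))


# Abbreviated getpeercert() output, for illustration.
rag_peer = {"subjectAltName": (("DNS", "rag-service.ai.internal"),)}
embed_peer = {"subjectAltName": (("DNS", "embedding-service.ai.internal"),)}
print(is_caller_authorized(rag_peer, FHIR_PROXY_ALLOWED_CALLERS))    # True
print(is_caller_authorized(embed_peer, FHIR_PROXY_ALLOWED_CALLERS))  # False
```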

Cloud-Native Zero Trust Controls

python

# AWS Security Group and VPC configuration for Zero Trust AI deployment
# Educational example — illustrative configuration patterns only

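AWS_ZERO_TRUST_AI_CONFIG = {
    "vpc": {
        "description": "AI Platform VPC: only the ALB is internet-facing; all workload subnets are private",
        "subnets": {
            "ai_gateway_subnet": {
                "access": "private; reachable from the internet only through the ALB",
                "security_group_ingress": ["443 from alb_sg (the ALB's own SG accepts 443 from 0.0.0.0/0)"],
                "security_group_egress": ["443 to ai_processing_subnet_sg"],
            },
            "ai_processing_subnet": {
                "access": "private: no inbound internet access, no direct outbound route",
                "security_group_ingress": [
                    "443 from ai_gateway_subnet_sg",
                    "443 from ai_processing_subnet_sg (service mesh)",
                ],
                "security_group_egress": [
                    "443 to phi_data_subnet_sg",
                    "443 to bedrock_endpoint_sg (interface VPC endpoint, no internet path)",
                    "443 to egress_proxy_subnet_sg for non-Bedrock LLM APIs (egress inspection)",
                ],
            },
            "egress_proxy_subnet": {
                # Security groups cannot reference a NAT Gateway; routing, not the SG,
                # sends this subnet's outbound traffic through NAT.
                "access": "private; default route to NAT Gateway (the only internet egress path)",
                "security_group_ingress": ["443 from ai_processing_subnet_sg"],
                "security_group_egress": ["443 to 0.0.0.0/0 (proxy enforces an LLM-provider domain allowlist)"],
            },
            "phi_data_subnet": {
                "access": "private: no internet access",
                "security_group_ingress": ["443 from ai_processing_subnet_sg only"],
                # New security groups include an allow-all egress rule; delete it.
                # SGs are stateful, so responses to allowed inbound requests still flow.
                "security_group_egress": [],
            },
        },
        "vpc_endpoints": [
            "com.amazonaws.us-east-1.bedrock-runtime",    # Private Bedrock access
            "com.amazonaws.us-east-1.secretsmanager",
            "com.amazonaws.us-east-1.logs",
        ],
    }
}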

Enterprise Considerations

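Egress inspection for LLM API calls: A Zero Trust egress proxy that inspects outbound LLM API calls can enforce the policy that PHI is not sent to providers without a BAA. Inspection requires the proxy to see plaintext, which means terminating and re-encrypting TLS at the proxy, with the proxy's CA trusted by the AI services. Pattern matching on prompt payloads catches structured identifiers (SSNs, MRNs, dates of birth) but misses free-text PHI such as names in a clinical narrative, so it is a safety net behind application-level controls, not a replacement for them.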
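A minimal sketch of the egress decision, assuming the proxy already has the decrypted request body. The host names, the BAA allowlist, and the MRN format are hypothetical placeholders. The patterns catch structured identifiers only; a patient name or free-text history passes straight through.

```python
import re

# Educational example, not for clinical use.
# Hypothetical policy inputs: hosts of providers with a signed BAA, and
# structured-identifier patterns that must not reach any other provider.
BAA_COVERED_HOSTS = {"llm.baa-covered-provider.example"}
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),  # hypothetical MRN format
    "dob": re.compile(r"\b(?:DOB|date of birth)[:\s]*\d{1,2}/\d{1,2}/\d{4}\b", re.IGNORECASE),
}


def egress_decision(host: str, prompt_text: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched pattern names) for one outbound LLM request."""
    if host in BAA_COVERED_HOSTS:
        return True, []  # BAA in place: PHI permitted (the proxy should still log the call)
    hits = [name for name, pattern in PHI_PATTERNS.items() if pattern.search(prompt_text)]
    return not hits, hits


print(egress_decision("api.other-llm.example", "Summarize: MRN 12345678, SSN 123-45-6789"))
# (False, ['ssn', 'mrn'])
print(egress_decision("api.other-llm.example", "Max daily dose of metformin?"))
# (True, [])
```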

Service mesh for AI microservices: In Kubernetes-based AI deployments, a service mesh (Istio, Linkerd) provides mTLS between all AI platform microservices without requiring each service to implement mTLS itself. Service mesh is the preferred implementation for Zero Trust service-to-service authentication at scale.
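Istio's default mTLS mode is PERMISSIVE, which accepts plaintext as well as mTLS traffic, so the mesh does not reject unauthenticated callers until STRICT is configured. A sketch that generates a namespace-wide PeerAuthentication in STRICT mode; the namespace name is hypothetical, and the `apiVersion` should be checked against what your Istio release serves. Applied in Istio's root namespace (`istio-system` by default), the same resource sets the mesh-wide policy.

```python
import json

# Educational example, illustrative only.


def strict_mtls_peer_authentication(namespace: str) -> dict:
    """Istio PeerAuthentication requiring mTLS for every workload in `namespace`."""
    return {
        "apiVersion": "security.istio.io/v1beta1",  # check the versions your Istio release serves
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "STRICT"}},  # reject plaintext connections to sidecars
    }


manifest = strict_mtls_peer_authentication("ai-platform")  # hypothetical namespace
# kubectl accepts JSON manifests: save this output and run `kubectl apply -f <file>`.
print(json.dumps(manifest, indent=2))
```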

Common Mistakes

1. Implementing network perimeter security and calling it Zero Trust. Placing AI services behind a VPN or private subnet does not implement Zero Trust. Zero Trust requires identity verification on every request regardless of network location.

2. Not applying Zero Trust principles to AI agent tool calls. The AI agent that calls EHR APIs is itself a component that must authenticate (service account), be authorized (minimum necessary tool ACL), and log every action (audit log). Agents without Zero Trust controls become the most significant lateral movement risk in clinical AI.

3. No certificate rotation for mTLS. Certificates with no expiry, or with only a manual rotation process, in practice never get rotated. Automate rotation with cert-manager or Vault PKI, and issue certificates that expire in 90 days or fewer.
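The rotation policy reduces to two date checks. A sketch of a monitoring job, assuming the 90-day limit and the 30-day trigger from `MTLS_CERT_MANAGEMENT`; `not_before` and `not_after` would come from each certificate's validity period (for example via the `cryptography` package), and `cert_findings` is a hypothetical helper. If rotation automation is healthy this check never fires; repeated "due for rotation" findings mean the automation is broken.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Educational example, not for clinical use.
MAX_LIFETIME = timedelta(days=90)     # certificates should expire in 90 days or fewer
ROTATION_WINDOW = timedelta(days=30)  # rotation_trigger_days_before_expiry


def cert_findings(
    name: str,
    not_before: datetime,
    not_after: datetime,
    now: Optional[datetime] = None,
) -> list[str]:
    """Flag certificates that violate the lifetime policy or should already have rotated."""
    now = now if now is not None else datetime.now(timezone.utc)
    findings: list[str] = []
    if not_after - not_before > MAX_LIFETIME:
        findings.append(f"{name}: lifetime exceeds 90 days")
    if now >= not_after:
        findings.append(f"{name}: expired")
    elif now >= not_after - ROTATION_WINDOW:
        findings.append(f"{name}: inside 30-day window, due for rotation")
    return findings


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(cert_findings("rag-service", issued, expires, now=issued + timedelta(days=10)))
# []
print(cert_findings("rag-service", issued, expires, now=issued + timedelta(days=65)))
# ['rag-service: inside 30-day window, due for rotation']
```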

Key Takeaways

Zero Trust replaces network perimeter trust with identity-based, per-request authorization for every AI component
AI agentic workflows are the highest Zero Trust risk: an authenticated agent with multiple tools can move laterally without triggering network-based detection
mTLS is the Zero Trust authentication mechanism for service-to-service communication: stronger than API keys because the private key never crosses the network and each short-lived certificate is bound to one service identity
Egress inspection on LLM API calls is the safety net for the policy that PHI must not be sent to providers without a BAA
Network segmentation should isolate the PHI data zone from the AI processing zone; the PHI data zone has no egress
