HIPAA and AI

Executive Summary

Any AI system that processes, stores, transmits, or derives information from patient data on behalf of a Covered Entity or Business Associate operates under the Health Insurance Portability and Accountability Act (HIPAA). For AI architects, HIPAA is not a compliance checkbox. It is a set of technical constraints that directly shape system design: what data can be sent to which vendors, how inference logs are stored, what metadata can appear in observability dashboards, whether a vector database index constitutes a PHI data store, and what de-identification methods are required before data can be used for AI model training or evaluation. This chapter translates HIPAA's technical safeguard requirements into actionable architectural patterns for clinical AI systems.

Learning Objectives

After reading this chapter, you will be able to:

Identify which components of a clinical AI system constitute Protected Health Information (PHI) under HIPAA and which fall outside the definition
Design a clinical AI inference pipeline that processes PHI in compliance with the Minimum Necessary standard and HIPAA Security Rule technical safeguards
Evaluate a Business Associate Agreement for the specific provisions relevant to AI API processing of PHI
Apply the Safe Harbor and Expert Determination de-identification methods to prepare clinical data for AI training and evaluation datasets

Business Problem

Healthcare organizations building clinical AI systems face a problem that general enterprise AI architects do not: the primary fuel for AI, patient data, is subject to a federal privacy and security framework that applies to every system component that touches it. An organization that builds a clinical AI system without understanding which components are PHI data stores, which vendors require BAAs, and which de-identification methods are legally sufficient creates regulatory liability. Civil penalties are tiered by culpability and can reach roughly $2 million per violation category per year (the cap is adjusted annually for inflation).

The specific challenges that arise when HIPAA meets AI:

LLM inference API calls contain patient context in the request payload — this is PHI transmission to a vendor
Vector database indexes built from clinical notes contain PHI in the vector space — the index itself is a PHI data store
Observability traces that capture prompt/response payloads contain PHI
Golden evaluation datasets built from real patient encounters are PHI collections requiring the same protection as source records

Why This Technology Exists

HIPAA was enacted in 1996 to address two healthcare information problems: ensuring health insurance portability for workers changing jobs (the insurance portability component), and establishing a national standard for the protection of patient health information (the accountability component). The Security Rule (2003) added specific technical safeguard requirements for electronic PHI (ePHI).

Neither the HIPAA statute nor its implementing rules were written with AI in mind. The Privacy Rule's definition of PHI, the Security Rule's safeguard categories, and the Breach Notification Rule's requirements were all established before large-scale ML systems existed. Applying them to AI systems requires careful interpretation, which HHS Office for Civil Rights (OCR) guidance documents and enforcement actions have clarified incrementally. As of 2025, HHS has not issued comprehensive AI-specific HIPAA guidance, which means architects must reason from existing principles applied to AI contexts.

Conceptual Explanation

What Is PHI

Protected Health Information is individually identifiable health information held or transmitted by a Covered Entity or Business Associate. The definition has three components:

Health information: Relates to physical or mental health condition, healthcare services provided, or payment for healthcare services
Individually identifiable: Identifies the individual, or there is a reasonable basis to believe it could be used to identify the individual
Held or transmitted: By a Covered Entity or Business Associate in any medium

The 18 HIPAA identifiers that, when combined with health information, constitute PHI:

Names
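Geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code); the first three digits of a ZIP code may be kept when the area formed by all ZIP codes sharing those digits contains more than 20,000 people
Dates (except year) directly related to the individual, such as birth, admission, discharge, and death dates; all ages over 89 must be aggregated into a single category of 90 or older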
Phone numbers
Fax numbers
Email addresses
Social Security numbers
Medical record numbers
Health plan beneficiary numbers
Account numbers
Certificate/license numbers
Vehicle identifiers and serial numbers
Device identifiers and serial numbers
URLs
IP addresses
Biometric identifiers (including finger and voice prints)
Full-face photographs and comparable images
Any other unique identifying number, characteristic, or code
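
The list above can be carried into code as a fixed enumeration, so that every schema field known to hold an identifier is tagged with its category. The following is a minimal sketch: the enum names and the `FIELD_CATEGORIES` mapping are hypothetical and would need to be replaced with the field names of your own data model.

```python
from enum import Enum


class HipaaIdentifier(Enum):
    """The 18 identifier categories, numbered in the order listed above."""
    NAME = 1
    GEOGRAPHIC = 2
    DATE = 3
    PHONE = 4
    FAX = 5
    EMAIL = 6
    SSN = 7
    MEDICAL_RECORD_NUMBER = 8
    HEALTH_PLAN_BENEFICIARY_NUMBER = 9
    ACCOUNT_NUMBER = 10
    CERTIFICATE_LICENSE_NUMBER = 11
    VEHICLE_IDENTIFIER = 12
    DEVICE_IDENTIFIER = 13
    URL = 14
    IP_ADDRESS = 15
    BIOMETRIC = 16
    FULL_FACE_PHOTO = 17
    OTHER_UNIQUE_IDENTIFIER = 18


# Hypothetical mapping from schema field names to identifier categories.
FIELD_CATEGORIES: dict[str, HipaaIdentifier] = {
    "patient_name": HipaaIdentifier.NAME,
    "street_address": HipaaIdentifier.GEOGRAPHIC,
    "date_of_birth": HipaaIdentifier.DATE,
    "mrn": HipaaIdentifier.MEDICAL_RECORD_NUMBER,
    "email": HipaaIdentifier.EMAIL,
}


def identifier_fields(record: dict) -> dict[str, HipaaIdentifier]:
    """Return the populated fields of `record` that map to an identifier category."""
    return {
        field: FIELD_CATEGORIES[field]
        for field, value in record.items()
        if field in FIELD_CATEGORIES and value not in (None, "")
    }


record = {"patient_name": "Jane Doe", "mrn": "1234567", "diagnosis": "I10", "email": ""}
flagged = identifier_fields(record)  # patient_name and mrn are flagged
```

A field-name mapping only covers structured data. Identifiers embedded in free text (a name inside a clinical note, for example) are invisible to this check.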

De-Identification

De-identified information is not PHI and is not subject to HIPAA. HIPAA recognizes two de-identification methods:

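Safe Harbor: Remove all 18 identifiers of the individual and of the individual's relatives, employers, and household members, AND have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual. Safe Harbor is rule-based rather than statistical: it specifies exactly what must be removed. Removing the 18 identifiers is still not sufficient if the covered entity actually knows the residual data can identify someone.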

Expert Determination: A qualified statistical or scientific expert applies generally accepted statistical and scientific principles to determine that the risk of identification is very small. The expert's methods and results are documented.

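For AI training and evaluation datasets, Safe Harbor is the usual default because it does not require a statistical expert and is auditable: each of the 18 identifier categories can be checked against a fixed list. Verifying absence is harder in free-text clinical notes than in structured fields, so automated scrubbing of notes is normally paired with human review.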
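
Safe Harbor permits a few generalized values to remain: the year of a date, the first three digits of a ZIP code for sufficiently populous areas, and ages up to 89. The sketch below shows those three generalizations. The function names are illustrative, and `RESTRICTED_ZIP3` holds made-up placeholder values; the real set of low-population prefixes has to be derived from current Census data.

```python
from datetime import date

# Hypothetical placeholder values. Populate with the 3-digit ZIP prefixes whose
# combined population is 20,000 or fewer, taken from current Census data.
RESTRICTED_ZIP3: frozenset = frozenset({"998", "999"})


def generalize_zip(zip_code: str, restricted: frozenset = RESTRICTED_ZIP3) -> str:
    """Keep the first three ZIP digits, or return '000' for restricted or malformed input."""
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit() or prefix in restricted:
        return "000"
    return prefix


def generalize_date(d: date) -> str:
    """Reduce a date to its year, the only date element Safe Harbor allows."""
    return str(d.year)


def generalize_age(age_years: int) -> str:
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age_years > 89 else str(age_years)
```

For individuals over 89, the year of birth also reveals the age, so a pipeline built on this sketch would need to suppress or generalize birth years for that group rather than call `generalize_date` on them.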

Core Architecture

graph TD
    subgraph "PHI Data Sources"
        E1["EHR\n(Epic FHIR R4)"]
        E2["Laboratory\nResults"]
        E3["Radiology\nReports"]
        E4["Clinical Notes\nUnstructured"]
    end
    subgraph "PHI Processing Boundary"
        GW["AI Gateway\n(PHI-Authorized)"]
        PR["Prompt Builder\nMinimum Necessary"]
        AU["Audit Logger\nHashed Identifiers Only"]
        CF["Cache Manager\nEncrypted PHI Cache"]
    end
    subgraph "External Vendors — BAA Required"
        LLM["LLM API\nAnthropic / Azure OAI"]
        VS["Vector Database\nClinical Index"]
    end
    subgraph "De-identified Zone"
        DI["De-identification\nPipeline"]
        ED["Evaluation\nDatasets"]
        TR["Training\nDatasets"]
        OB["Observability\nTraces (no PHI)"]
    end
    subgraph "Security Controls"
        TLS["TLS 1.2+\nAll transmissions"]
        AES["AES-256\nAt rest encryption"]
        RBAC["RBAC\nMinimum necessary access"]
        VPC["VPC / Private Link\nNetwork isolation"]
    end
    E1 & E2 & E3 & E4 --> GW
    GW --> PR --> LLM
    GW --> VS
    GW --> AU
    GW --> CF
    GW --> DI --> ED & TR
    AU --> OB
    TLS --> GW & LLM
    AES --> VS & CF
    RBAC --> GW & AU
    VPC --> GW & LLM

Components

The Minimum Necessary Standard

The HIPAA Minimum Necessary standard requires that disclosures of PHI be limited to the information reasonably necessary to accomplish the intended purpose. For clinical AI, this translates to: do not include patient identifiers in the LLM prompt context unless the identifiers are required for the AI's function.

For a discharge summary generation AI, the patient's name and medical record number may be required (to personalize the output and enable it to be placed in the correct record). For a clinical knowledge RAG query (retrieving drug interaction information for a medication), the patient's name is not required — only the medication names and relevant clinical context.

A prompt engineering discipline that applies Minimum Necessary extracts only the clinical facts necessary for the AI task, strips or hashes identifiers where the task does not require them, and documents the Minimum Necessary determination for each use case.

Business Associate Agreements for AI Vendors

Any vendor that processes PHI on behalf of a Covered Entity is a Business Associate and requires a signed BAA. For AI systems, BAA-required vendors include:

LLM API providers (when inference requests contain PHI)
Vector database vendors (when the index contains PHI)
Cloud infrastructure providers (when PHI is stored on their infrastructure)
Observability vendors (when traces capture PHI payloads)

Vendors that do not receive PHI do not require BAAs: an AI gateway that de-identifies prompts before forwarding them to the LLM API shields the LLM vendor from PHI — but only if the de-identification happens before the vendor receives any data. If the gateway logs full prompts before de-identification, the logging service becomes a BAA-required vendor.
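
One way to make the BAA requirement enforceable is a gate in the AI gateway that refuses to send PHI to any vendor without a current BAA on file. The sketch below assumes a hypothetical `Vendor` record and `assert_phi_allowed` check; neither comes from a real library, and the BAA dates would come from your contract management system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Vendor:
    name: str
    baa_signed_on: Optional[date] = None   # None means no BAA on file
    baa_expires_on: Optional[date] = None  # None means no expiry recorded


class MissingBaaError(RuntimeError):
    """Raised when PHI would be sent to a vendor without a current BAA."""


def assert_phi_allowed(vendor: Vendor, payload_contains_phi: bool, today: date) -> None:
    """Raise MissingBaaError if PHI is about to reach a vendor lacking a current BAA."""
    if not payload_contains_phi:
        return  # De-identified payloads do not require a BAA
    if vendor.baa_signed_on is None or vendor.baa_signed_on > today:
        raise MissingBaaError(f"No signed BAA on file for {vendor.name}")
    if vendor.baa_expires_on is not None and vendor.baa_expires_on < today:
        raise MissingBaaError(f"BAA for {vendor.name} expired on {vendor.baa_expires_on}")


llm_vendor = Vendor("example-llm-api", baa_signed_on=date(2024, 1, 15))
tracing_vendor = Vendor("example-tracing")  # no BAA on file

assert_phi_allowed(llm_vendor, payload_contains_phi=True, today=date(2025, 6, 1))
assert_phi_allowed(tracing_vendor, payload_contains_phi=False, today=date(2025, 6, 1))
```

The gate is only as reliable as the `payload_contains_phi` flag. A conservative design treats every payload as PHI unless it has passed through the de-identification step.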

Audit Logging Requirements

The HIPAA Security Rule requires audit controls that record and examine activity in information systems containing ePHI. For AI systems, this means:

Every AI inference request that involves PHI must be logged
Logs must be retained per the organization's record retention policy (typically 6 years)
Logs must include: timestamp, user/system identity, data accessed, action taken
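Logs must NOT include raw PHI beyond what is required for audit purposes. A hashed patient identifier (SHA-256 of the medical record number plus a system-level secret salt) lets auditors correlate requests for one patient without placing raw identifiers in the log store. Because anyone holding the salt can recompute the hash from a known MRN, treat both the salt and the hashed logs as sensitive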
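
The hashed identifier described above is a plain SHA-256 over the MRN and a salt. As one possible hardening, the sketch below swaps in HMAC-SHA-256 from the Python standard library, a keyed construction designed for this purpose. This is an alternative technique, not the method the chapter prescribes; `pseudonymize_mrn` and the key handling are assumptions.

```python
import hashlib
import hmac


def pseudonymize_mrn(medical_record_number: str, secret_key: bytes) -> str:
    """Return a stable keyed pseudonym for an MRN (64 hex characters)."""
    # Normalize so "ab123 " and "AB123" map to the same pseudonym
    normalized = medical_record_number.strip().upper()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# The key below is a placeholder. In practice it would come from a secrets manager.
key = b"replace-with-a-secret-from-your-kms"
pseudonym = pseudonymize_mrn("1234567", key)
```

MRNs are short and drawn from a small space, so the pseudonym is only as strong as the secrecy of the key: an attacker who obtains the key can enumerate every possible MRN. Rotating the key also breaks correlation with older log entries, which needs to be planned for within the retention period.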

Breach Notification

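The HIPAA Breach Notification Rule requires Covered Entities to notify affected individuals without unreasonable delay, and no later than 60 days after discovering a breach of unsecured PHI. They must also notify HHS, and must notify prominent media outlets when a breach affects more than 500 residents of a state or jurisdiction. Business Associates must in turn notify the Covered Entity. For AI systems, breach scenarios include: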

LLM vendor data breach exposing stored inference request content (which is why inference data retention by AI vendors should be minimized by contract)
Misconfigured vector database index exposed to unauthorized access
Audit logs containing PHI exposed in a logging platform breach
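
The notification thresholds above are simple enough to encode in incident response tooling. This is a minimal sketch using the 60-day and 500-individual figures stated in this section; `assess_breach` and its input shape are hypothetical, and it does not model HHS reporting timelines or state-law deadlines.

```python
from dataclasses import dataclass
from datetime import date, timedelta

INDIVIDUAL_NOTICE_DAYS = 60  # outer limit, counted in calendar days from discovery
MEDIA_THRESHOLD = 500        # media notice applies above this count within one state


@dataclass(frozen=True)
class BreachAssessment:
    individual_notice_deadline: date
    media_notice_required: bool


def assess_breach(discovered_on: date, affected_by_state: dict) -> BreachAssessment:
    """Compute the latest individual-notice date and whether media notice applies."""
    return BreachAssessment(
        individual_notice_deadline=discovered_on + timedelta(days=INDIVIDUAL_NOTICE_DAYS),
        media_notice_required=any(
            count > MEDIA_THRESHOLD for count in affected_by_state.values()
        ),
    )


assessment = assess_breach(date(2025, 1, 1), {"CA": 501, "NY": 10})
```

The computed date is a ceiling, not a target: the rule's primary requirement is notification without unreasonable delay.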

Implementation Patterns

HIPAA-Compliant Inference Request Pattern

python

# Educational Example — HIPAA-Compliant Clinical AI Inference Request
# Illustrates PHI handling patterns for LLM API calls in a HIPAA context
# Not a production security implementation — consult your security team

import hashlib
import json
from dataclasses import dataclass
from typing import Optional
import anthropic


@dataclass(frozen=True)
class AuditRecord:
    """
    Immutable audit record for a clinical AI inference request.
    Contains no raw PHI — uses hashed patient identifier for auditability.
    """
    request_id: str
    timestamp_utc: str
    hashed_patient_id: str          # SHA-256(mrn + system_salt) — not reversible
    use_case: str                    # e.g., "discharge_summary"
    model_id: str
    prompt_version: str
    input_token_count: int
    output_token_count: int
    override_applied: bool = False


def hash_patient_identifier(medical_record_number: str, system_salt: str) -> str:
    """
    One-way hash of patient MRN for audit logging.
    The system salt is a system-level secret shared by all records, not a
    per-record salt. MRNs are short and guessable, so the hash resists
    dictionary and rainbow-table attacks only while the salt stays secret.
    """
    combined = f"{medical_record_number}:{system_salt}"
    return hashlib.sha256(combined.encode()).hexdigest()


def build_minimum_necessary_prompt(
    clinical_context: dict,
    use_case: str,
    include_patient_identifier: bool = False,
) -> str:
    """
    Build an LLM prompt containing only the PHI necessary for the use case.
    For most clinical AI use cases, the patient name/MRN are NOT required —
    the AI needs clinical facts, not identity.
    """
    required_fields = {
        "discharge_summary": [
            "admission_diagnosis", "procedures", "medications_at_discharge",
            "follow_up_instructions", "diet_restrictions", "activity_restrictions",
        ],
        "prior_authorization": [
            "diagnosis_codes", "procedure_codes", "clinical_rationale",
            "failed_alternative_treatments",
        ],
        "clinical_coding": [
            "encounter_diagnoses", "procedures_performed",
            "discharge_disposition",
        ],
    }

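    if use_case not in required_fields:
        # Deny by default: an unknown use case must not fall back to sending every field
        raise ValueError(f"No Minimum Necessary field list defined for use case: {use_case!r}")
    fields = required_fields[use_case]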

    # Only include identifier if the use case requires it (e.g., personalized summary)
    if include_patient_identifier and "patient_name" in clinical_context:
        fields = ["patient_name"] + fields

    filtered_context = {
        k: v for k, v in clinical_context.items() if k in fields
    }

    return json.dumps(filtered_context, indent=2)


def generate_clinical_output_hipaa_compliant(
    patient_mrn: str,
    clinical_context: dict,
    use_case: str,
    system_prompt: str,
    model_id: str,
    prompt_version: str,
    system_salt: str,
    anthropic_client: anthropic.Anthropic,
    request_id: str,
    timestamp_utc: str,
) -> tuple[str, AuditRecord]:
    """
    Generate clinical AI output with HIPAA-compliant audit logging.
    Returns the clinical output and the audit record.
    The audit record contains no raw PHI.
    """
    prompt_content = build_minimum_necessary_prompt(
        clinical_context=clinical_context,
        use_case=use_case,
        include_patient_identifier=(use_case == "discharge_summary"),
    )

    response = anthropic_client.messages.create(
        model=model_id,
        max_tokens=2048,
        system=system_prompt,
        messages=[{"role": "user", "content": prompt_content}],
    )

    output_text = response.content[0].text

    audit_record = AuditRecord(
        request_id=request_id,
        timestamp_utc=timestamp_utc,
        hashed_patient_id=hash_patient_identifier(patient_mrn, system_salt),
        use_case=use_case,
        model_id=model_id,
        prompt_version=prompt_version,
        input_token_count=response.usage.input_tokens,
        output_token_count=response.usage.output_tokens,
    )

    return output_text, audit_record

Safe Harbor De-identification for Evaluation Datasets

python

# Educational Example — Safe Harbor De-identification
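# Illustrates regex-based redaction of four structured identifier types (dates, phones, MRNs, SSNs)
# as one step toward removing all 18 HIPAA identifiers from AI training/evaluation datasets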
# Organizations should have a qualified privacy officer review de-identification implementations

import re
from dataclasses import dataclass


HIPAA_DATE_PATTERN = re.compile(
    r"\b(0?[1-9]|1[0-2])[-/](0?[1-9]|[12]\d|3[01])[-/](\d{2,4})\b"
)
HIPAA_PHONE_PATTERN = re.compile(
    r"\b(\+?1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)(\d{3}[-.\s]?\d{4})\b"
)
HIPAA_MRN_PATTERN = re.compile(r"\bMRN[:\s#]*\d{5,10}\b", re.IGNORECASE)
HIPAA_SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class DeidentificationResult:
    original_char_count: int
    deidentified_text: str
    identifiers_removed: dict[str, int]  # identifier_type → count removed


def safe_harbor_deidentify(clinical_text: str) -> DeidentificationResult:
    """
    Apply a partial HIPAA Safe Harbor scrub to clinical text.
    Redacts four of the 18 identifier categories: dates, phone numbers,
    MRNs, and SSNs. The output is NOT yet Safe Harbor de-identified.

    NOTE: This is an illustrative pattern. Production Safe Harbor de-identification
    requires a comprehensive NLP pipeline (named entity recognition for names,
    addresses, dates) reviewed by a qualified privacy officer.
    """
    counts: dict[str, int] = {}
    text = clinical_text

    # Medical record numbers: run before the phone pattern, which would otherwise
    # consume a 10-digit MRN and mislabel it as a phone number
    mrns = HIPAA_MRN_PATTERN.findall(text)
    counts["medical_record_numbers"] = len(mrns)
    text = HIPAA_MRN_PATTERN.sub("[MRN REDACTED]", text)

    # Social Security numbers
    ssns = HIPAA_SSN_PATTERN.findall(text)
    counts["ssn"] = len(ssns)
    text = HIPAA_SSN_PATTERN.sub("[SSN REDACTED]", text)

    # Dates: replace with year only (Safe Harbor permits year)
    dates = HIPAA_DATE_PATTERN.findall(text)
    counts["dates"] = len(dates)
    text = HIPAA_DATE_PATTERN.sub(lambda m: m.group(3), text)

    # Phone numbers
    phones = HIPAA_PHONE_PATTERN.findall(text)
    counts["phone_numbers"] = len(phones)
    text = HIPAA_PHONE_PATTERN.sub("[PHONE REDACTED]", text)

    # Note: Names, addresses, and other identifiers require NLP-based NER
    # This pattern only handles structured identifiers detectable by regex

    return DeidentificationResult(
        original_char_count=len(clinical_text),
        deidentified_text=text,
        identifiers_removed=counts,
    )
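
Because a regex scrub only covers some identifier categories, it can help to run an independent residual scan over the output before a dataset is released. The sketch below checks for a few additional pattern-detectable categories (email addresses, URLs, IPv4 addresses) plus SSNs. The patterns and the `residual_identifier_report` name are illustrative assumptions, deliberately loose, and no substitute for NER or human review.

```python
import re

# Illustrative patterns only. They favor recall over precision.
RESIDUAL_PATTERNS: dict = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "url": re.compile(r"\bhttps?://\S+", re.IGNORECASE),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def residual_identifier_report(text: str) -> dict:
    """Return {pattern_name: match_count} for every pattern that still matches."""
    return {
        name: len(pattern.findall(text))
        for name, pattern in RESIDUAL_PATTERNS.items()
        if pattern.search(text)
    }


dirty = "Contact jane.doe@example.org or see https://portal.example.org/p/1 from 10.0.0.5"
clean = "Patient admitted in 2024 with pneumonia."
dirty_report = residual_identifier_report(dirty)
clean_report = residual_identifier_report(clean)
```

An empty report means only that these particular patterns found nothing. A non-empty report is a reason to block the dataset and inspect the de-identification pipeline.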

Enterprise Considerations

PHI Data Classification: Healthcare organizations must classify PHI within their AI systems. Not all PHI carries the same sensitivity: a patient's name combined with a diagnosis code is more sensitive than a patient's age-range combined with a procedure code. Many organizations implement tiered PHI classification that maps to access control and encryption requirements.

Incident Response for AI Systems: The HIPAA Breach Notification Rule requires notification without unreasonable delay and no later than 60 days after a breach of unsecured PHI is discovered. AI systems that process PHI must be included in the organization's incident response plan: who is notified when an AI API call log is exposed, who determines whether the exposure constitutes a reportable breach, and who contacts HHS OCR.

State Law Preemption: HIPAA sets a federal floor. It preempts contrary state laws except where the state law is more stringent in protecting privacy. California's CMIA (Confidentiality of Medical Information Act), New York's SHIELD Act, and several other state frameworks impose requirements beyond HIPAA. Healthcare organizations operating in multiple states must comply with HIPAA plus the more stringent requirements of each state whose law applies to the data in question.

Research Exemption and IRB: If clinical AI systems will use patient data for research (not just treatment or operations), the research use may require Institutional Review Board (IRB) approval and a separate waiver of authorization under the HIPAA Privacy Rule.

Security Considerations

Technical Safeguards Required by HIPAA Security Rule:

Access control: Unique user identification for all users who access ePHI; emergency access procedure; automatic logoff; encryption and decryption
Audit controls: Hardware, software, and procedural mechanisms to record and examine access to ePHI
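Integrity controls: Mechanisms to authenticate ePHI, that is, to corroborate that it has not been altered or destroyed in an unauthorized manner
Person or entity authentication: Procedures to verify that a person or entity seeking access to ePHI is the one claimed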
Transmission security: Technical security measures to guard against unauthorized access during electronic transmission

For clinical AI systems:

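All transmission of PHI-containing inference requests should use TLS 1.2 or higher; the Security Rule does not name a protocol version, but TLS 1.2+ is the common baseline
PHI stored in vector databases or caches should be encrypted at rest; the Security Rule treats encryption as an addressable specification and does not name an algorithm, and AES-256 is the common choice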
Access to AI systems processing PHI must require authentication — API keys are insufficient without additional access control
Vector database indexes are ePHI data stores if they contain clinical data derived from patient records — they require the same encryption and access control as the source data
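
The TLS floor in the list above can be enforced in client code rather than left to library defaults. A minimal sketch using Python's standard `ssl` module follows; `build_phi_tls_context` is a hypothetical helper name.

```python
import ssl


def build_phi_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


tls_context = build_phi_tls_context()
```

The resulting context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=tls_context)`. Vendor SDKs usually manage their own HTTP client, so check how each one exposes TLS configuration before assuming this setting applies to it.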

Healthcare Example

Educational Example — Illustrative Workflow. Not intended for clinical decision making.

The Reference Healthcare Organization implements HIPAA-compliant controls for its discharge summary AI system:

PHI flow through the system:

Hospitalist opens the discharge summary AI tool in the EHR
The EHR passes the patient encounter context (diagnosis, procedures, medications) via a FHIR R4 API call authenticated with the hospitalist's SMART on FHIR credentials
The AI gateway receives the FHIR bundle, extracts clinical facts using the Minimum Necessary standard (includes patient name for personalization; excludes SSN, insurance ID, and other identifiers not needed for discharge summary generation)
The gateway logs the request with the hashed patient identifier and sends the minimum-necessary prompt to the LLM API (signed BAA in place with the LLM vendor)
The LLM returns the draft discharge summary; the gateway returns it to the EHR
The hospitalist reviews, edits, and finalizes the summary — the final document is saved in the EHR (the legal medical record)
The AI draft that the hospitalist reviewed is not stored in any system other than the EHR
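
The extraction step in this flow can be sketched as a deny-by-default walk over the FHIR bundle: only resource types on a short allowlist contribute text, and everything else (including the Patient resource) is dropped. The sketch assumes FHIR R4 JSON in which `Condition.code`, `Procedure.code`, and `MedicationRequest.medicationCodeableConcept` carry the clinical facts. The function names and output keys are hypothetical.

```python
from typing import Optional


def concept_text(concept: Optional[dict]) -> Optional[str]:
    """Prefer CodeableConcept.text, else the first coding that has a display value."""
    if not concept:
        return None
    if concept.get("text"):
        return concept["text"]
    for coding in concept.get("coding", []):
        if coding.get("display"):
            return coding["display"]
    return None


# resourceType -> (output key, element holding the CodeableConcept)
EXTRACTION_RULES = {
    "Condition": ("diagnoses", "code"),
    "Procedure": ("procedures", "code"),
    "MedicationRequest": ("medications", "medicationCodeableConcept"),
}


def extract_clinical_facts(bundle: dict) -> dict:
    """Collect clinical facts from allowlisted resources and ignore all others."""
    facts = {"diagnoses": [], "procedures": [], "medications": []}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        rule = EXTRACTION_RULES.get(resource.get("resourceType"))
        if rule is None:
            continue  # Patient, Coverage, and unknown types never reach the prompt
        key, element = rule
        text = concept_text(resource.get(element))
        if text:
            facts[key].append(text)
    return facts
```

In the discharge summary flow the patient name is also sent. Under this design that would be an explicit, separately documented addition to the output rather than something the extractor picks up by default.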

PHI data stores in this system:

The EHR (existing, pre-existing controls)
The audit log in the AI gateway (hashed patient identifiers — not raw PHI, but treated as sensitive)
No PHI stored in the LLM vendor's infrastructure (contractually confirmed in BAA)

Vendors requiring BAAs:

LLM API vendor (receives PHI in inference requests)
Cloud infrastructure provider (hosts the AI gateway)
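No new BAA is required for the EHR in this example, because it is the Covered Entity's existing system; where an EHR vendor hosts or can access the PHI, that vendor is a Business Associate and should already be covered by a BAA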

Common Mistakes

Not Recognizing the Vector Database as a PHI Data Store. If a vector database index is built from clinical notes, discharge summaries, or other patient records, the index is a PHI data store even though the underlying data is represented as float vectors. The information it encodes is derived from PHI, retrieval indexes usually store the source text chunks alongside the embeddings, and embedding-inversion research indicates that text can sometimes be approximately reconstructed from the vectors alone. Apply the same access controls and BAA requirements as to the source data.

Observability Traces Containing Raw PHI. A common source of HIPAA exposure in clinical AI systems is an observability trace that captures the full LLM prompt payload, which contains patient clinical context, including identifiers. Audit logs and traces in clinical AI systems must either capture only metadata (hashed IDs, token counts, latency) or be subject to the same PHI access controls as the source EHR data.

Assuming Vendor SOC 2 Certification Means HIPAA Compliance. SOC 2 Type II and HIPAA are different standards. A vendor may have a clean SOC 2 report and still be non-compliant with HIPAA — the two frameworks have different control requirements. A signed BAA is the specific mechanism that creates a HIPAA compliance relationship with a vendor; a SOC 2 report does not.

Using Production PHI for AI Evaluation Datasets Without De-identification. Organizations that pull real patient records to test or evaluate their AI systems create PHI collections that require the same protections as the source medical records. All AI evaluation datasets built from real patient data must go through Safe Harbor or Expert Determination de-identification before use.

Best Practices

Treat every AI system component that touches patient-derived data as a PHI data store until confirmed otherwise
Apply the Minimum Necessary standard in prompt engineering — exclude identifiers from inference requests when the AI task does not require them
Log every clinical AI inference request with hashed patient identifiers, not raw PHI
Build evaluation and training datasets using Safe Harbor de-identification, documented with the de-identification method and the reviewer identity
Confirm in writing (via BAA) that LLM vendors do not use inference request content for model training and do not retain PHI beyond operational necessity
Include clinical AI systems in the organization's incident response plan with explicit breach notification procedures
Review observability tooling configuration to confirm that full prompt/response payloads are not captured in traces or logs without PHI access controls

Trade-offs

Design Choice	HIPAA Risk	Clinical Capability	Operational Complexity
Full patient context in prompt	High (more PHI in transit)	Highest (most context)	Low
Minimum necessary prompt (no identifiers)	Low	High (clinical facts only)	Medium (requires extraction)
De-identified proxy (synthetic context)	Lowest	Lower (may lose clinical nuance)	High
On-premise LLM (no external transmission)	Lowest (no PHI transmission)	Constrained (model quality)	Highest

Interview Questions

Q: You are reviewing the architecture of a clinical AI system. The observability platform captures full LLM prompt/response payloads in its traces for debugging. What is the HIPAA risk and what is the mitigation?

Category: Architecture / Security
Difficulty: Senior
Role: AI Architect / Healthcare AI Engineer

Answer Framework:

The HIPAA risk is that the observability platform has become a PHI data store — every trace that captures a prompt containing patient clinical context is storing ePHI. The observability platform vendor is now a Business Associate if it receives, stores, or processes the trace data. If the organization does not have a BAA with the observability vendor, every trace write is a potential unauthorized disclosure of PHI.

The risks compound: observability platforms are designed for engineering access (developers, SREs, platform teams), not for clinical access control. Patient clinical data in traces is likely accessible to engineering staff who are not authorized workforce members under HIPAA to access patient records.

Three mitigations:

Payload scrubbing at the trace boundary: Configure the AI gateway to emit traces with metadata only (request ID, hashed patient ID, token counts, latency, model ID) and no prompt/response content. This is the architecturally preferred solution.
PHI access control on the observability platform: Apply the same access controls to the observability platform as to the EHR — which typically means clinical-only access, eliminating the debugging utility for engineering staff.
Separate debug logging with PHI controls: Maintain a separate, PHI-controlled debug log that engineering staff can request access to through a formal process (equivalent to break-glass access), rather than including PHI in the general observability stream.
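
The first mitigation, payload scrubbing at the trace boundary, can be sketched as an allowlist filter applied to span attributes before they leave the gateway. The key names and the `scrub_trace_attributes` helper are hypothetical and not tied to any particular tracing library.

```python
# Hypothetical allowlist of metadata keys considered safe to emit in traces.
ALLOWED_TRACE_KEYS = frozenset({
    "request_id", "hashed_patient_id", "use_case", "model_id",
    "input_tokens", "output_tokens", "latency_ms",
})


def scrub_trace_attributes(attributes: dict) -> tuple:
    """Split span attributes into (safe attributes, sorted names of dropped keys)."""
    safe = {k: v for k, v in attributes.items() if k in ALLOWED_TRACE_KEYS}
    dropped = sorted(k for k in attributes if k not in ALLOWED_TRACE_KEYS)
    return safe, dropped


raw_attributes = {
    "request_id": "req-001",
    "model_id": "example-model",
    "latency_ms": 842,
    "prompt": "Patient Jane Doe, admitted with pneumonia...",
    "response": "Discharge summary for Jane Doe...",
}
safe_attributes, dropped_keys = scrub_trace_attributes(raw_attributes)
```

An allowlist fails closed: a newly added attribute stays out of the trace until someone reviews it and adds it to the list, whereas a denylist would leak it by default. Logging the dropped key names (not their values) makes it visible when instrumentation starts attaching payloads.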

Key Points to Hit:

Observability platform with PHI payloads = PHI data store requiring BAA and access controls
Engineering access to traces ≠ authorized HIPAA workforce access to patient records
Solution is payload scrubbing at the emission point, not access control on the downstream platform
This is a common design error in clinical AI systems — know it and catch it

Key Takeaways

HIPAA applies to every component that processes, stores, or transmits PHI — including LLM API inference requests, vector database indexes, audit logs, and observability traces
The Minimum Necessary standard requires limiting PHI in AI inference requests to what is actually needed for the clinical AI task — many use cases can operate without patient identifiers in the prompt
Every AI vendor that receives PHI in inference requests is a Business Associate requiring a signed BAA — this includes LLM API providers and vector database vendors
Observability traces that capture full prompt/response payloads are PHI data stores; configure tracing to emit metadata and hashed identifiers only
All AI evaluation and training datasets built from real patient data must be de-identified using Safe Harbor or Expert Determination before use
Hashed patient identifiers (SHA-256 of MRN + system salt) in audit logs support the HIPAA audit control requirement without placing raw patient identifiers in the log store

Glossary

Protected Health Information (PHI): Individually identifiable health information held or transmitted by a Covered Entity or Business Associate. Electronic PHI (ePHI) is PHI in electronic form.

Business Associate (BA): A person or entity that performs functions or activities on behalf of a Covered Entity that involve the use or disclosure of PHI.

Business Associate Agreement (BAA): A HIPAA-required contract between a Covered Entity and a Business Associate governing PHI use and disclosure.

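Safe Harbor De-identification: One of the two HIPAA de-identification methods. It requires removing all 18 specified identifier categories from health information and having no actual knowledge that the remaining information could identify the individual.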

Minimum Necessary Standard: The HIPAA requirement to limit disclosure of PHI to the minimum necessary to accomplish the intended purpose.

HIPAA Security Rule: The HIPAA component that establishes standards for protecting the confidentiality, integrity, and availability of ePHI, including technical safeguards.
