HIPAA and AI

Executive Summary

Every AI system that processes, stores, transmits, or derives information from patient data in a healthcare context operates under the Health Insurance Portability and Accountability Act (HIPAA). For AI architects, HIPAA is not a compliance checkbox — it is a set of technical constraints that directly shape system design: what data can be sent to which vendors, how inference logs are stored, what metadata can appear in observability dashboards, whether a vector database index constitutes a PHI data store, and what de-identification methods are required before data can be used for AI model training or evaluation. This chapter translates HIPAA's technical safeguard requirements into actionable architectural patterns for clinical AI systems.

Learning Objectives

After reading this chapter, you will be able to:

Identify which components of a clinical AI system constitute Protected Health Information (PHI) under HIPAA and which fall outside the definition
Design a clinical AI inference pipeline that processes PHI in compliance with the Minimum Necessary standard and HIPAA Security Rule technical safeguards
Evaluate a Business Associate Agreement for the specific provisions relevant to AI API processing of PHI
Apply the Safe Harbor and Expert Determination de-identification methods to prepare clinical data for AI training and evaluation datasets

Business Problem

Healthcare organizations building clinical AI systems face a problem that general enterprise AI architects do not: the primary fuel for AI, patient data, is subject to a federal privacy and security framework that applies to every system component that touches it. An organization that builds a clinical AI system without understanding which components are PHI data stores, which vendors require BAAs, and which de-identification methods are legally sufficient creates regulatory liability. HIPAA civil penalties are tiered by culpability, and the inflation-adjusted annual cap for the most serious tier is on the order of $2 million per violation category.

The specific challenges that arise when HIPAA meets AI:

LLM inference API calls contain patient context in the request payload — this is PHI transmission to a vendor
Vector database indexes built from clinical notes contain PHI in the vector space — the index itself is a PHI data store
Observability traces that capture prompt/response payloads contain PHI
Golden evaluation datasets built from real patient encounters are PHI collections requiring the same protection as source records

Why This Technology Exists

HIPAA was enacted in 1996 to address two healthcare information problems: ensuring health insurance portability for workers changing jobs (the insurance portability component), and establishing a national standard for the protection of patient health information (the accountability component). The Security Rule (2003) added specific technical safeguard requirements for electronic PHI (ePHI).

Neither HIPAA nor the Security Rule was written with AI in mind. The Privacy Rule's definition of PHI, the Security Rule's safeguard categories, and the Breach Notification Rule's requirements were all established before large-scale ML systems existed. Applying them to AI systems requires careful interpretation — which HHS Office for Civil Rights (OCR) guidance, enforcement actions, and informal guidance have incrementally clarified. As of 2025, HHS has not issued comprehensive AI-specific HIPAA guidance, which means architects must reason from existing principles applied to AI contexts.

Conceptual Explanation

What Is PHI

Protected Health Information is individually identifiable health information held or transmitted by a Covered Entity or Business Associate. The definition has three components:

Health information: Relates to an individual's past, present, or future physical or mental health or condition, the provision of healthcare, or payment for healthcare
Individually identifiable: Can identify the individual or there is reasonable basis to believe it could
Held or transmitted: By a Covered Entity or Business Associate in any medium

The 18 HIPAA identifiers that, when combined with health information, constitute PHI:

Names
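Geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code); the first three digits of a ZIP code may be kept when the area they cover contains more than 20,000 people
Dates (except year) directly related to the individual, plus all ages over 89 (which may be aggregated as 90 or older)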
Phone numbers
Fax numbers
Email addresses
Social Security numbers
Medical record numbers
Health plan beneficiary numbers
Account numbers
Certificate/license numbers
Vehicle identifiers and serial numbers
Device identifiers and serial numbers
URLs
IP addresses
Biometric identifiers (including finger and voice prints)
Full-face photographs and comparable images
Any other unique identifying number, characteristic, or code

De-Identification

De-identified information is not PHI and is not subject to HIPAA. HIPAA recognizes two de-identification methods:

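Safe Harbor: Remove all 18 identifiers of the individual and of the individual's relatives, employers, and household members, AND have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual. Safe Harbor is rule-based rather than statistical: compliance can be checked against a fixed list, but removing the 18 identifiers is not sufficient on its own if the covered entity actually knows the residual data can identify someone.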

Expert Determination: A qualified statistical or scientific expert applies generally accepted statistical and scientific principles to determine that the risk of identification is very small. The expert's methods and results are documented.

For AI training and evaluation datasets, Safe Harbor is the default method because it does not require a statistical expert and is auditable — you can verify that all 18 identifiers are absent.
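A minimal sketch of how the Safe Harbor rules might be applied to a structured record is shown here. The field names, the `safe_harbor_record` helper, and the contents of `LOW_POPULATION_ZIP3` are assumptions made for illustration; a real pipeline would map them to its own schema and to current Census data, and would need separate handling for free-text fields such as clinical notes.

```python
from datetime import date

# Hypothetical field names for direct identifiers; adapt to your schema.
DROP_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id",
    "photo", "other_unique_id",
    # Dropped outright: a birth year would reveal an age over 89.
    "birth_date",
}

# Illustrative subset only. ZIP3 areas with 20,000 or fewer people must be
# reported as "000"; build the real set from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def safe_harbor_record(record: dict) -> dict:
    """Return a copy of `record` with Safe Harbor generalizations applied."""
    out = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # direct identifier: remove entirely
        if key == "zip":
            prefix = str(value)[:3]
            out["zip3"] = "000" if prefix in LOW_POPULATION_ZIP3 else prefix
        elif key == "age":
            out["age"] = "90+" if value >= 90 else value
        elif isinstance(value, date):
            out[key + "_year"] = value.year  # keep the year only
        else:
            out[key] = value
    return out
```

Passing this function is not the same as being de-identified: the "no actual knowledge" condition still applies, and any field the schema mapping misses passes through unchanged.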

Core Architecture

graph TD
    subgraph "PHI Data Sources"
        E1["EHR\n(Epic FHIR R4)"]
        E2["Laboratory\nResults"]
        E3["Radiology\nReports"]
        E4["Clinical Notes\nUnstructured"]
    end
    subgraph "PHI Processing Boundary"
        GW["AI Gateway\n(PHI-Authorized)"]
        PR["Prompt Builder\nMinimum Necessary"]
        AU["Audit Logger\nHashed Identifiers Only"]
        CF["Cache Manager\nEncrypted PHI Cache"]
    end
    subgraph "External Vendors: BAA Required"
        LLM["LLM API\nAnthropic / Azure OAI"]
        VS["Vector Database\nClinical Index"]
    end
    subgraph "De-identified Zone"
        DI["De-identification\nPipeline"]
        ED["Evaluation\nDatasets"]
        TR["Training\nDatasets"]
        OB["Observability\nTraces (no PHI)"]
    end
    subgraph "Security Controls"
        TLS["TLS 1.2+\nAll transmissions"]
        AES["AES-256\nAt rest encryption"]
        RBAC["RBAC\nMinimum necessary access"]
        VPC["VPC / Private Link\nNetwork isolation"]
    end
    E1 & E2 & E3 & E4 --> GW
    GW --> PR --> LLM
    GW --> VS
    GW --> AU
    GW --> CF
    GW --> DI --> ED & TR
    AU --> OB
    TLS --> GW & LLM
    AES --> VS & CF
    RBAC --> GW & AU
    VPC --> GW & LLM

Enterprise Considerations

PHI Data Classification: Healthcare organizations must classify PHI within their AI systems. Not all PHI carries the same sensitivity: a patient's name combined with a diagnosis code is more sensitive than a patient's age-range combined with a procedure code. Many organizations implement tiered PHI classification that maps to access control and encryption requirements.

Incident Response for AI Systems: The HIPAA Breach Notification Rule requires covered entities to notify affected individuals without unreasonable delay, and no later than 60 calendar days after a breach of unsecured PHI is discovered. AI systems that process PHI must be included in the organization's incident response plan: who is notified when an AI API call log is exposed, who determines whether the exposure constitutes a reportable breach, and who contacts HHS OCR.
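The outer notification deadlines can be computed mechanically, which is useful when an incident ticket needs a due date. The sketch below assumes the standard Breach Notification Rule timelines (60 calendar days from discovery for individual notice, HHS notice at the same time for breaches affecting 500 or more individuals, and within 60 days after the end of the calendar year for smaller breaches). The function names are hypothetical, and the result is an upper bound, not a target: notice is still due "without unreasonable delay".

```python
from datetime import date, timedelta

NOTICE_WINDOW = timedelta(days=60)


def individual_notice_deadline(discovered_on: date) -> date:
    """Latest date for notifying affected individuals."""
    return discovered_on + NOTICE_WINDOW


def hhs_notice_deadline(discovered_on: date, affected_individuals: int) -> date:
    """Latest date for notifying HHS, which depends on breach size."""
    if affected_individuals >= 500:
        # Large breaches: HHS is notified on the same timeline as individuals.
        return discovered_on + NOTICE_WINDOW
    # Smaller breaches: reported within 60 days after the calendar year ends.
    return date(discovered_on.year, 12, 31) + NOTICE_WINDOW
```

For example, a breach discovered on 2025-03-10 gives an individual-notice deadline of 2025-05-09.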

State Law Preemption: HIPAA sets a federal floor; state laws that are more protective of privacy are not preempted. California's CMIA (Confidentiality of Medical Information Act), New York's SHIELD Act, and several other state frameworks impose requirements beyond HIPAA. Healthcare organizations operating in multiple states must comply with HIPAA plus each applicable state law wherever that law is more stringent.

Research Exemption and IRB: If clinical AI systems will use patient data for research (not just treatment or operations), the research use may require Institutional Review Board (IRB) approval and a separate waiver of authorization under the HIPAA Privacy Rule.

Healthcare Example

Educational Example — Illustrative Workflow. Not intended for clinical decision making.

The Reference Healthcare Organization implements HIPAA-compliant controls for its discharge summary AI system:

PHI flow through the system:

Hospitalist opens the discharge summary AI tool in the EHR
The EHR passes the patient encounter context (diagnosis, procedures, medications) via a FHIR R4 API call authenticated with the hospitalist's SMART on FHIR credentials
The AI gateway receives the FHIR bundle, extracts clinical facts using the Minimum Necessary standard (includes patient name for personalization; excludes SSN, insurance ID, and other identifiers not needed for discharge summary generation)
The gateway logs the request with the hashed patient identifier and sends the minimum-necessary prompt to the LLM API (signed BAA in place with the LLM vendor)
The LLM returns the draft discharge summary; the gateway returns it to the EHR
The hospitalist reviews, edits, and finalizes the summary — the final document is saved in the EHR (the legal medical record)
The AI draft that the hospitalist reviewed is not stored in any system other than the EHR
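The Minimum Necessary extraction step in this flow can be sketched as an allow-list applied before the prompt is assembled. The code assumes the FHIR bundle has already been flattened into a plain dictionary; the field names, the prompt wording, and the `build_discharge_prompt` helper are all hypothetical.

```python
# Only these fields may reach the prompt. Anything else in the encounter
# (SSN, insurance ID, address, ...) is never read.
ALLOWED_FIELDS = ("patient_name", "diagnoses", "procedures", "medications")


def build_discharge_prompt(encounter: dict) -> str:
    """Build a discharge-summary prompt from allow-listed fields only."""
    missing = [f for f in ALLOWED_FIELDS if f not in encounter]
    if missing:
        raise KeyError(f"encounter is missing required fields: {missing}")

    facts = {f: encounter[f] for f in ALLOWED_FIELDS}
    lines = [
        "Draft a discharge summary for the clinician to review.",
        f"Patient: {facts['patient_name']}",
        "Diagnoses: " + "; ".join(facts["diagnoses"]),
        "Procedures: " + "; ".join(facts["procedures"]),
        "Medications: " + "; ".join(facts["medications"]),
    ]
    return "\n".join(lines)
```

An allow-list is used rather than a deny-list so that a new field added upstream is excluded by default instead of leaking into vendor-bound requests.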

PHI data stores in this system:

The EHR (existing, pre-existing controls)
The audit log in the AI gateway (hashed patient identifiers — not raw PHI, but treated as sensitive)
No PHI stored in the LLM vendor's infrastructure (contractually confirmed in BAA)

Vendors requiring BAAs:

LLM API vendor (receives PHI in inference requests)
Cloud infrastructure provider (hosts the AI gateway)
No new BAA for the EHR in this example (the organization operates it as its own Covered Entity system; an EHR vendor that hosts or can access ePHI would be a Business Associate and would need one)

Common Mistakes

Not Recognizing the Vector Database as a PHI Data Store. If a vector database index is built from clinical notes, discharge summaries, or other patient records, the index is a PHI data store even though the underlying data is represented as float vectors. The information it encodes is derived from PHI, and in many cases the original text can be approximately reconstructed from the vectors. Apply the same access controls and BAA requirements as to the source data.

Observability Traces Containing Raw PHI. One of the most common HIPAA exposures in clinical AI systems is an observability trace that captures the full LLM prompt payload, which contains patient clinical context, including identifiers. Audit logs and traces in clinical AI systems must either capture only metadata (hashed IDs, token counts, latency) or be subject to the same PHI access controls as the source EHR data.
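One way to enforce the metadata-only option is to filter every trace event through an allow-list before it is handed to the tracing library. This is a sketch under assumptions: the attribute names and the `scrub_trace_event` helper are hypothetical and are not tied to any particular observability SDK.

```python
# Attributes considered safe to emit. Prompt and response bodies are
# deliberately absent, so they can never be exported by accident.
TRACE_ALLOWLIST = frozenset({
    "request_id",
    "patient_ref",        # keyed hash of the patient ID, never the raw MRN
    "model_id",
    "prompt_tokens",
    "completion_tokens",
    "latency_ms",
})


def scrub_trace_event(event: dict) -> dict:
    """Keep only allow-listed, scalar metadata from a raw gateway event."""
    return {
        key: value
        for key, value in event.items()
        if key in TRACE_ALLOWLIST and isinstance(value, (str, int, float))
    }
```

The scalar check is a second guard: even an allow-listed key cannot carry a nested payload into the trace.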

Assuming Vendor SOC 2 Certification Means HIPAA Compliance. SOC 2 Type II and HIPAA are different standards. A vendor may have a clean SOC 2 report and still be non-compliant with HIPAA — the two frameworks have different control requirements. A signed BAA is the specific mechanism that creates a HIPAA compliance relationship with a vendor; a SOC 2 report does not.

Using Production PHI for AI Evaluation Datasets Without De-identification. Organizations that pull real patient records to test or evaluate their AI systems create PHI collections that require the same protections as the source medical records. All AI evaluation datasets built from real patient data must go through Safe Harbor or Expert Determination de-identification before use.
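A cheap guard against this mistake is a scan that rejects dataset rows containing pattern-shaped identifiers before they are admitted to an evaluation set. The sketch below uses Python's `re` module with three illustrative patterns (US-style SSNs, phone numbers, and email addresses). The patterns and the `find_residual_identifiers` helper are assumptions, and a scan like this only catches identifiers with a recognizable shape; it does not detect names, addresses, or dates, so it complements a de-identification pipeline and does not replace one.

```python
import re

# Illustrative patterns only; tune and extend for your own data.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-. ])\d{3}[-. ]\d{4}(?!\d)"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def find_residual_identifiers(text: str) -> dict:
    """Return {category: [matches]} for each pattern that hits in `text`."""
    hits = {}
    for category, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[category] = found
    return hits
```

A dataset build step could then refuse any row for which this function returns a non-empty result and route it back for review.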

Best Practices

Treat every AI system component that touches patient-derived data as a PHI data store until confirmed otherwise
Apply the Minimum Necessary standard in prompt engineering — exclude identifiers from inference requests when the AI task does not require them
Log every clinical AI inference request with hashed patient identifiers, not raw PHI
Build evaluation and training datasets using Safe Harbor de-identification, documented with the de-identification method and the reviewer identity
Confirm in writing (via BAA) that LLM vendors do not use inference request content for model training and do not retain PHI beyond operational necessity
Include clinical AI systems in the organization's incident response plan with explicit breach notification procedures
Review observability tooling configuration to confirm that full prompt/response payloads are not captured in traces or logs without PHI access controls
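The hashed-identifier logging practice can be sketched with Python's standard `hmac` and `hashlib` modules. This example uses HMAC-SHA256 with a secret key rather than a plain salted hash, which is a substitution made here because MRNs have a small value space and an unkeyed hash can be reversed by enumeration. The record layout and the `patient_ref` and `audit_record` names are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def patient_ref(mrn: str, key: bytes) -> str:
    """Derive a stable pseudonym for an MRN using HMAC-SHA256."""
    return hmac.new(key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()


def audit_record(request_id: str, mrn: str, user_id: str,
                 model_id: str, key: bytes) -> str:
    """Serialize one inference audit entry without the raw MRN."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "patient_ref": patient_ref(mrn, key),
        "user_id": user_id,
        "model_id": model_id,
    }
    return json.dumps(entry, sort_keys=True)
```

The same MRN always maps to the same `patient_ref` under a given key, so auditors can group requests by patient, while anyone without the key cannot recompute the mapping. The key belongs in a secrets manager, not in the log store.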

Trade-offs

Design Choice	HIPAA Risk	Clinical Capability	Operational Complexity
Full patient context in prompt	High (more PHI in transit)	Highest (most context)	Low
Minimum necessary prompt (no identifiers)	Low	High (clinical facts only)	Medium (requires extraction)
De-identified proxy (synthetic context)	Lowest	Lower (may lose clinical nuance)	High
On-premise LLM (no external transmission)	Lowest (no PHI transmission)	Constrained (model quality)	Highest

Interview Questions

Q: You are reviewing the architecture of a clinical AI system. The observability platform captures full LLM prompt/response payloads in its traces for debugging. What is the HIPAA risk and what is the mitigation?

Category: Architecture / Security
Difficulty: Senior
Role: AI Architect / Healthcare AI Engineer

Answer Framework:

The HIPAA risk is that the observability platform has become a PHI data store — every trace that captures a prompt containing patient clinical context is storing ePHI. The observability platform vendor is now a Business Associate if it receives, stores, or processes the trace data. If the organization does not have a BAA with the observability vendor, every trace write is a potential unauthorized disclosure of PHI.

The risks compound: observability platforms are designed for engineering access (developers, SREs, platform teams), not for clinical access control. Patient clinical data in traces is likely accessible to engineering staff who are not authorized workforce members under HIPAA to access patient records.

Three mitigations:

Payload scrubbing at the trace boundary: Configure the AI gateway to emit traces with metadata only (request ID, hashed patient ID, token counts, latency, model ID) and no prompt/response content. This is the architecturally preferred solution.
PHI access control on the observability platform: Apply the same access controls to the observability platform as to the EHR — which typically means clinical-only access, eliminating the debugging utility for engineering staff.
Separate debug logging with PHI controls: Maintain a separate, PHI-controlled debug log that engineering staff can request access to through a formal process (equivalent to break-glass access), rather than including PHI in the general observability stream.

Key Points to Hit:

Observability platform with PHI payloads = PHI data store requiring BAA and access controls
Engineering access to traces ≠ authorized HIPAA workforce access to patient records
Solution is payload scrubbing at the emission point, not access control on the downstream platform
This is a common design error in clinical AI systems — know it and catch it

Key Takeaways

HIPAA applies to every component that processes, stores, or transmits PHI — including LLM API inference requests, vector database indexes, audit logs, and observability traces
The Minimum Necessary standard requires limiting PHI in AI inference requests to what is actually needed for the clinical AI task — many use cases can operate without patient identifiers in the prompt
Every AI vendor that receives PHI in inference requests is a Business Associate requiring a signed BAA — this includes LLM API providers and vector database vendors
Observability traces that capture full prompt/response payloads are PHI data stores; configure tracing to emit metadata and hashed identifiers only
All AI evaluation and training datasets built from real patient data must be de-identified using Safe Harbor or Expert Determination before use
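Hashed patient identifiers (for example, SHA-256 over the MRN with a secret system salt, or an HMAC) in audit logs support the HIPAA audit control requirement while keeping raw identifiers out of the log store. Because the value is derived from the MRN, protect the salt or key and continue to treat the log as sensitive.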
