HIPAA and AI
Executive Summary
Every AI system that processes, stores, transmits, or derives information from patient data on behalf of a Covered Entity or Business Associate operates under the Health Insurance Portability and Accountability Act (HIPAA). For AI architects, HIPAA is not a compliance checkbox; it is a set of technical constraints that directly shape system design: what data can be sent to which vendors, how inference logs are stored, what metadata can appear in observability dashboards, whether a vector database index constitutes a PHI data store, and what de-identification methods are required before data can be used for AI model training or evaluation. This chapter translates HIPAA's technical safeguard requirements into actionable architectural patterns for clinical AI systems.
Learning Objectives
After reading this chapter, you will be able to:
- Identify which components of a clinical AI system constitute Protected Health Information (PHI) under HIPAA and which fall outside the definition
- Design a clinical AI inference pipeline that processes PHI in compliance with the Minimum Necessary standard and HIPAA Security Rule technical safeguards
- Evaluate a Business Associate Agreement for the specific provisions relevant to AI API processing of PHI
- Apply the Safe Harbor and Expert Determination de-identification methods to prepare clinical data for AI training and evaluation datasets
Business Problem
Healthcare organizations building clinical AI systems face a problem that general enterprise AI architects do not: the primary fuel for AI, patient data, is subject to a federal privacy and security framework that applies to every system component that touches it. An organization that builds a clinical AI system without understanding which components are PHI data stores, which vendors require BAAs, and which de-identification methods are legally sufficient creates regulatory liability, with civil penalties that can reach roughly $2 million per violation category per year (the cap is adjusted annually for inflation).
The specific challenges that arise when HIPAA meets AI:
- LLM inference API calls contain patient context in the request payload — this is PHI transmission to a vendor
- Vector database indexes built from clinical notes contain PHI in the vector space — the index itself is a PHI data store
- Observability traces that capture prompt/response payloads contain PHI
- Golden evaluation datasets built from real patient encounters are PHI collections requiring the same protection as source records
Why This Technology Exists
HIPAA was enacted in 1996 to address two healthcare information problems: ensuring health insurance portability for workers changing jobs (the insurance portability component), and establishing a national standard for the protection of patient health information (the accountability component). The Security Rule (2003) added specific technical safeguard requirements for electronic PHI (ePHI).
Neither HIPAA nor the Security Rule was written with AI in mind. The Privacy Rule's definition of PHI, the Security Rule's safeguard categories, and the Breach Notification Rule's requirements were all established before large-scale ML systems existed. Applying them to AI systems requires careful interpretation, which HHS Office for Civil Rights (OCR) guidance and enforcement actions have clarified only incrementally. As of 2025, HHS has not issued comprehensive AI-specific HIPAA guidance, which means architects must reason from existing principles applied to AI contexts.
Conceptual Explanation
What Is PHI
Protected Health Information is individually identifiable health information held or transmitted by a Covered Entity or Business Associate. The definition has three components:
- Health information: Relates to physical or mental health condition, healthcare services provided, or payment for healthcare services
- Individually identifiable: Identifies the individual, or there is a reasonable basis to believe it could be used to identify the individual
- Held or transmitted: By a Covered Entity or Business Associate in any medium
The 18 HIPAA identifiers that, when combined with health information, constitute PHI:
- Names
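- Geographic data smaller than a state (street address, city, county, ZIP code); the first three ZIP digits may be kept when the area they cover contains more than 20,000 people
- Dates (except year) directly related to the individual, and all ages over 89 (which may be grouped as 90 or older)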
- Phone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers
- Device identifiers and serial numbers
- URLs
- IP addresses
- Biometric identifiers (finger and voice prints)
- Full face photos and comparable images
- Any other unique identifying number, characteristic, or code
De-Identification
De-identified information is not PHI and is not subject to HIPAA. HIPAA recognizes two de-identification methods:
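Safe Harbor: Remove all 18 identifiers of the individual and of the individual's relatives, employers, and household members, AND have no actual knowledge that the remaining information could identify the individual. Safe Harbor is rule-based rather than statistical: compliance is checked against a fixed list. Removing the 18 identifiers is not sufficient on its own, however, if the covered entity knows the remaining data could still identify someone.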
Expert Determination: A qualified statistical or scientific expert applies generally accepted statistical and scientific principles to determine that the risk of identification is very small. The expert's methods and results are documented.
For AI training and evaluation datasets, Safe Harbor is the default method because it does not require a statistical expert and is auditable — you can verify that all 18 identifiers are absent.
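Two Safe Harbor rules are generalizations rather than outright removals: ZIP codes may be truncated to their first three digits when the resulting area holds more than 20,000 people, and ages over 89 may be grouped into a single "90 or older" category. The sketch below illustrates both. The function names are hypothetical, and `RESTRICTED_ZIP3_PREFIXES` holds placeholder values only; the real list of sparsely populated prefixes has to be derived from current Census data.

```python
# Placeholder values for illustration: 3-digit ZIP prefixes whose combined
# population is 20,000 or fewer. Derive the real list from current Census data.
RESTRICTED_ZIP3_PREFIXES = frozenset({"036", "059", "102"})


def generalize_zip(zip_code: str, restricted_prefixes=RESTRICTED_ZIP3_PREFIXES) -> str:
    """Return the 3-digit ZIP prefix, or '000' for sparsely populated areas."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        return "000"  # malformed input: fail closed rather than pass it through
    prefix = digits[:3]
    return "000" if prefix in restricted_prefixes else prefix


def generalize_age(age_years: int) -> str:
    """Report ages above 89 as a single '90+' category."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return "90+" if age_years > 89 else str(age_years)
```

Failing closed on malformed ZIP input is a design choice in this sketch: an unparseable value is treated as if it came from a restricted area rather than copied into the dataset.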
Core Architecture
Components
The Minimum Necessary Standard
The HIPAA Minimum Necessary standard requires that disclosures of PHI be limited to the information reasonably necessary to accomplish the intended purpose. For clinical AI, this translates to: do not include patient identifiers in the LLM prompt context unless the identifiers are required for the AI's function.
For a discharge summary generation AI, the patient's name and medical record number may be required (to personalize the output and enable it to be placed in the correct record). For a clinical knowledge RAG query (retrieving drug interaction information for a medication), the patient's name is not required — only the medication names and relevant clinical context.
A prompt engineering discipline that applies Minimum Necessary extracts only the clinical facts necessary for the AI task, strips or hashes identifiers where the task does not require them, and documents the Minimum Necessary determination for each use case.
Business Associate Agreements for AI Vendors
Any vendor that processes PHI on behalf of a Covered Entity is a Business Associate and requires a signed BAA. For AI systems, BAA-required vendors include:
- LLM API providers (when inference requests contain PHI)
- Vector database vendors (when the index contains PHI)
- Cloud infrastructure providers (when PHI is stored on their infrastructure)
- Observability vendors (when traces capture PHI payloads)
Vendors that do not receive PHI do not require BAAs: an AI gateway that de-identifies prompts before forwarding them to the LLM API shields the LLM vendor from PHI — but only if the de-identification happens before the vendor receives any data. If the gateway logs full prompts before de-identification, the logging service becomes a BAA-required vendor.
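The ordering constraint described above (redact first, then log and forward) can be sketched as follows. This is a minimal illustration, not a de-identification implementation: the two regexes are stand-ins for a real redaction pipeline, and `forward_prompt`, `redact`, and the `send_to_llm` callable are hypothetical names.

```python
import logging
import re
from typing import Callable

logger = logging.getLogger("ai_gateway")

# Stand-in patterns only; real de-identification needs far broader coverage.
_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
_MRN = re.compile(r"\bMRN[:\s#]*\d{5,10}\b", re.IGNORECASE)


def redact(prompt: str) -> tuple[str, int]:
    """Return the redacted prompt and the number of replacements made."""
    prompt, n_ssn = _SSN.subn("[SSN]", prompt)
    prompt, n_mrn = _MRN.subn("[MRN]", prompt)
    return prompt, n_ssn + n_mrn


def forward_prompt(raw_prompt: str, send_to_llm: Callable[[str], str]) -> str:
    """Redact before anything downstream (log sink or vendor) sees the prompt."""
    redacted, n_redactions = redact(raw_prompt)
    # Log metadata only, and only after redaction; raw_prompt is never logged.
    logger.info("forwarding prompt: chars=%d redactions=%d", len(redacted), n_redactions)
    return send_to_llm(redacted)
```

Because the raw prompt never leaves `forward_prompt`, neither the logging service nor the LLM vendor receives the identifiers these patterns catch.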
Audit Logging Requirements
The HIPAA Security Rule requires audit controls that record and examine activity in information systems containing ePHI. For AI systems, this means:
- Every AI inference request that involves PHI must be logged
- Logs must be retained per the organization's record retention policy (typically 6 years)
- Logs must include: timestamp, user/system identity, data accessed, action taken
- Logs must NOT include raw PHI beyond what is required for audit purposes — hashed patient identifiers (SHA-256 of the medical record number + a system-level salt) satisfy the audit requirement without creating a PHI inventory in the log store
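A log entry carrying the four required fields can be sketched as below. Note one deliberate substitution: instead of concatenating the MRN with a salt and applying SHA-256, this sketch uses HMAC-SHA256 from the standard library, which is the conventional keyed construction for the same purpose. The function names and the entry's field names are assumptions made for illustration.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


def pseudonymize_mrn(mrn: str, secret_key: bytes) -> str:
    """Keyed one-way reference to a patient; stable for a given key."""
    return hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()


def build_audit_entry(
    mrn: str,
    user_id: str,
    action: str,
    resource: str,
    secret_key: bytes,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one audit entry: timestamp, identity, data accessed, action."""
    now = now or datetime.now(timezone.utc)
    entry = {
        "timestamp_utc": now.isoformat(),
        "actor": user_id,
        "patient_ref": pseudonymize_mrn(mrn, secret_key),  # never the raw MRN
        "data_accessed": resource,
        "action": action,
    }
    return json.dumps(entry, sort_keys=True)
```

MRNs are short and guessable, so the reference is only as strong as the secrecy of the key; anyone holding the key can enumerate MRNs and match them to log entries.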
Breach Notification
The HIPAA Breach Notification Rule requires Covered Entities to notify affected individuals (without unreasonable delay, and no later than 60 days after discovering the breach), HHS, and the media (for breaches affecting > 500 individuals in a state) when unsecured PHI is breached. For AI systems, breach scenarios include:
- LLM vendor data breach exposing stored inference request content (which is why inference data retention by AI vendors should be minimized by contract)
- Misconfigured vector database index exposed to unauthorized access
- Audit logs containing PHI exposed in a logging platform breach
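The notification arithmetic is simple enough to encode in an incident-response runbook. The sketch below computes the outer deadline for individual notice and the states where the media threshold is crossed, using only the figures given above. `NotificationPlan` and `plan_breach_notification` are hypothetical names, and the 60 days is an outer limit, not a target.

```python
from dataclasses import dataclass
from datetime import date, timedelta

INDIVIDUAL_NOTICE_DAYS = 60  # outer limit, counted from discovery
MEDIA_THRESHOLD = 500        # media notice when MORE than this many in one state


@dataclass(frozen=True)
class NotificationPlan:
    individual_notice_deadline: date
    media_notice_states: tuple[str, ...]


def plan_breach_notification(
    discovered_on: date, affected_by_state: dict[str, int]
) -> NotificationPlan:
    """Compute the latest individual-notice date and states needing media notice."""
    deadline = discovered_on + timedelta(days=INDIVIDUAL_NOTICE_DAYS)
    media_states = tuple(
        sorted(state for state, n in affected_by_state.items() if n > MEDIA_THRESHOLD)
    )
    return NotificationPlan(deadline, media_states)
```

For a breach discovered on 1 March 2025 that affects 1,200 patients in California and 500 in Nevada, the plan yields a deadline of 30 April 2025 and media notice in California only, since the threshold is strictly more than 500.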
Implementation Patterns
HIPAA-Compliant Inference Request Pattern
# Educational Example: HIPAA-Compliant Clinical AI Inference Request
# Illustrates PHI handling patterns for LLM API calls in a HIPAA context
# Not a production security implementation; consult your security team
import hashlib
import json
from dataclasses import dataclass

import anthropic


@dataclass(frozen=True)
class AuditRecord:
    """
    Immutable audit record for a clinical AI inference request.
    Contains no raw PHI; uses a hashed patient identifier for auditability.
    """
    request_id: str
    timestamp_utc: str
    hashed_patient_id: str  # SHA-256(mrn + system_salt); one-way while the salt stays secret
    use_case: str  # e.g., "discharge_summary"
    model_id: str
    prompt_version: str
    input_token_count: int
    output_token_count: int
    override_applied: bool = False


def hash_patient_identifier(medical_record_number: str, system_salt: str) -> str:
    """
    One-way hash of patient MRN for audit logging.
    The system salt is a system-level secret, not a per-record salt. MRNs are
    short and guessable, so the hash resists enumeration and rainbow table
    attacks only while the salt stays secret; store it in a secrets manager.
    """
    combined = f"{medical_record_number}:{system_salt}"
    return hashlib.sha256(combined.encode()).hexdigest()


def build_minimum_necessary_prompt(
    clinical_context: dict,
    use_case: str,
    include_patient_identifier: bool = False,
) -> str:
    """
    Build an LLM prompt containing only the PHI necessary for the use case.
    For most clinical AI use cases, the patient name/MRN are NOT required:
    the AI needs clinical facts, not identity.
    """
    required_fields = {
        "discharge_summary": [
            "admission_diagnosis", "procedures", "medications_at_discharge",
            "follow_up_instructions", "diet_restrictions", "activity_restrictions",
        ],
        "prior_authorization": [
            "diagnosis_codes", "procedure_codes", "clinical_rationale",
            "failed_alternative_treatments",
        ],
        "clinical_coding": [
            "encounter_diagnoses", "procedures_performed",
            "discharge_disposition",
        ],
    }
    if use_case not in required_fields:
        # Fail closed: an unregistered use case has no Minimum Necessary
        # determination, so no fields are released to the model.
        raise ValueError(f"No Minimum Necessary field list for use case: {use_case!r}")
    fields = list(required_fields[use_case])
    # Only include identifier if the use case requires it (e.g., personalized summary)
    if include_patient_identifier and "patient_name" in clinical_context:
        fields = ["patient_name"] + fields
    filtered_context = {
        k: v for k, v in clinical_context.items() if k in fields
    }
    return json.dumps(filtered_context, indent=2)


def generate_clinical_output_hipaa_compliant(
    patient_mrn: str,
    clinical_context: dict,
    use_case: str,
    system_prompt: str,
    model_id: str,
    prompt_version: str,
    system_salt: str,
    anthropic_client: anthropic.Anthropic,
    request_id: str,
    timestamp_utc: str,
) -> tuple[str, AuditRecord]:
    """
    Generate clinical AI output with HIPAA-compliant audit logging.
    Returns the clinical output and the audit record.
    The audit record contains no raw PHI.
    """
    prompt_content = build_minimum_necessary_prompt(
        clinical_context=clinical_context,
        use_case=use_case,
        include_patient_identifier=(use_case == "discharge_summary"),
    )
    response = anthropic_client.messages.create(
        model=model_id,
        max_tokens=2048,
        system=system_prompt,
        messages=[{"role": "user", "content": prompt_content}],
    )
    output_text = response.content[0].text
    audit_record = AuditRecord(
        request_id=request_id,
        timestamp_utc=timestamp_utc,
        hashed_patient_id=hash_patient_identifier(patient_mrn, system_salt),
        use_case=use_case,
        model_id=model_id,
        prompt_version=prompt_version,
        input_token_count=response.usage.input_tokens,
        output_token_count=response.usage.output_tokens,
    )
    return output_text, audit_record

Safe Harbor De-identification for Evaluation Datasets
# Educational Example: Safe Harbor De-identification (partial)
# Redacts the regex-detectable subset of the 18 HIPAA identifiers for AI training/evaluation dataset preparation
# Organizations should have a qualified privacy officer review de-identification implementations
import re
from dataclasses import dataclass

HIPAA_DATE_PATTERN = re.compile(
    r"\b(0?[1-9]|1[0-2])[-/](0?[1-9]|[12]\d|3[01])[-/](\d{2,4})\b"
)
HIPAA_PHONE_PATTERN = re.compile(
    r"\b(\+?1[-.\s]?)?(\(?\d{3}\)?[-.\s]?)(\d{3}[-.\s]?\d{4})\b"
)
HIPAA_MRN_PATTERN = re.compile(r"\bMRN[:\s#]*\d{5,10}\b", re.IGNORECASE)
HIPAA_SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class DeidentificationResult:
    original_char_count: int
    deidentified_text: str
    identifiers_removed: dict[str, int]  # identifier_type → count removed


def safe_harbor_deidentify(clinical_text: str) -> DeidentificationResult:
    """
    Apply a partial HIPAA Safe Harbor pass to clinical text.
    Redacts only the four regex-detectable categories below, not all 18.
    NOTE: This is an illustrative pattern. Production Safe Harbor de-identification
    requires a comprehensive NLP pipeline (named entity recognition for names,
    addresses, dates) reviewed by a qualified privacy officer.
    """
    counts: dict[str, int] = {}
    text = clinical_text
    # Dates: replace with year only (Safe Harbor permits year).
    # Only numeric M/D/Y forms are matched; written-out dates need NER.
    text, counts["dates"] = HIPAA_DATE_PATTERN.subn(lambda m: m.group(3), text)
    # Medical record numbers: run before the phone pattern, which would
    # otherwise consume a bare 10-digit MRN and leave this count at zero
    text, counts["medical_record_numbers"] = HIPAA_MRN_PATTERN.subn("[MRN REDACTED]", text)
    # Social Security numbers
    text, counts["ssn"] = HIPAA_SSN_PATTERN.subn("[SSN REDACTED]", text)
    # Phone numbers
    text, counts["phone_numbers"] = HIPAA_PHONE_PATTERN.subn("[PHONE REDACTED]", text)
    # Note: Names, addresses, and other identifiers require NLP-based NER
    # This pattern only handles structured identifiers detectable by regex
    return DeidentificationResult(
        original_char_count=len(clinical_text),
        deidentified_text=text,
        identifiers_removed=counts,
    )

Enterprise Considerations
PHI Data Classification: Healthcare organizations must classify PHI within their AI systems. Not all PHI carries the same sensitivity: a patient's name combined with a diagnosis code is more sensitive than a patient's age-range combined with a procedure code. Many organizations implement tiered PHI classification that maps to access control and encryption requirements.
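One way to make a tiered classification enforceable is to derive the tier from the fields a record contains and look up controls from the tier. The sketch below is a hypothetical two-tier scheme: the tier names, field names, roles, and control flags are all assumptions for illustration, not a HIPAA-defined taxonomy. Both tiers remain PHI.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class PhiTier(Enum):
    MODERATE = "moderate"  # quasi-identifiers only, e.g. age range + procedure code
    HIGH = "high"          # direct identifiers, e.g. name + diagnosis code


@dataclass(frozen=True)
class TierControls:
    allowed_roles: frozenset
    field_level_encryption: bool


# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIER_FIELDS = frozenset({"patient_name", "mrn", "ssn"})

CONTROLS = {
    PhiTier.MODERATE: TierControls(frozenset({"clinician", "analyst"}), False),
    PhiTier.HIGH: TierControls(frozenset({"clinician"}), True),
}


def classify_record(field_names: Iterable[str]) -> PhiTier:
    """A record is HIGH if it carries any direct identifier field."""
    if DIRECT_IDENTIFIER_FIELDS & set(field_names):
        return PhiTier.HIGH
    return PhiTier.MODERATE


def can_access(role: str, field_names: Iterable[str]) -> bool:
    """Check a role against the controls for the record's tier."""
    return role in CONTROLS[classify_record(field_names)].allowed_roles
```

Classifying by field presence keeps the rule auditable; a production scheme would also need to account for identifiers embedded in free-text fields.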
Incident Response for AI Systems: The HIPAA Breach Notification Rule requires a 60-day notification window for breaches affecting PHI. AI systems that process PHI must be included in the organization's incident response plan: who is notified when an AI API call log is exposed, who determines whether the exposure constitutes a reportable breach, and who contacts HHS OCR.
State Law Preemption: HIPAA sets a federal floor; states may enact stricter privacy laws, and HIPAA does not preempt them. California's CMIA (Confidentiality of Medical Information Act), New York's SHIELD Act (a data security and breach notification law), and several other state frameworks impose requirements beyond HIPAA. Healthcare organizations operating in multiple states must meet HIPAA and, wherever a state law is stricter, that state's requirements as well.
Research Exemption and IRB: If clinical AI systems will use patient data for research (not just treatment or operations), the research use generally requires either patient authorization or a waiver of authorization approved by an Institutional Review Board (IRB) or Privacy Board under the HIPAA Privacy Rule.
Security Considerations
Technical Safeguards Required by HIPAA Security Rule:
- Access control: Unique user identification for all users who access ePHI; emergency access procedure; automatic logoff; encryption and decryption
- Audit controls: Hardware, software, and procedural mechanisms to record and examine access to ePHI
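- Integrity controls: Policies and mechanisms that protect ePHI from improper alteration or destruction, including a way to confirm it has not been altered
- Person or entity authentication: Procedures to verify that a person or entity seeking access to ePHI is the one claimed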
- Transmission security: Technical security measures to guard against unauthorized access during electronic transmission
For clinical AI systems:
- All transmission of PHI-containing inference requests must use TLS 1.2 or higher
- PHI stored in vector databases or caches must be encrypted at rest (AES-256 minimum)
- Access to AI systems processing PHI must require authentication — API keys are insufficient without additional access control
- Vector database indexes are ePHI data stores if they contain clinical data derived from patient records — they require the same encryption and access control as the source data
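The TLS floor can be enforced in client code rather than left to library defaults. A minimal sketch using Python's standard `ssl` module follows; `build_phi_tls_context` is a hypothetical helper name.

```python
import ssl


def build_phi_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to standard-library clients, for example `http.client.HTTPSConnection(host, context=build_phi_tls_context())`. Most third-party HTTP clients accept an `SSLContext` through their own configuration.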
Healthcare Example
Educational Example — Illustrative Workflow. Not intended for clinical decision making.
The Reference Healthcare Organization implements HIPAA-compliant controls for its discharge summary AI system:
PHI flow through the system:
- Hospitalist opens the discharge summary AI tool in the EHR
- The EHR passes the patient encounter context (diagnosis, procedures, medications) via a FHIR R4 API call authenticated with the hospitalist's SMART on FHIR credentials
- The AI gateway receives the FHIR bundle, extracts clinical facts using the Minimum Necessary standard (includes patient name for personalization; excludes SSN, insurance ID, and other identifiers not needed for discharge summary generation)
- The gateway logs the request with the hashed patient identifier and sends the minimum-necessary prompt to the LLM API (signed BAA in place with the LLM vendor)
- The LLM returns the draft discharge summary; the gateway returns it to the EHR
- The hospitalist reviews, edits, and finalizes the summary — the final document is saved in the EHR (the legal medical record)
- The AI draft that the hospitalist reviewed is not stored in any system other than the EHR
PHI data stores in this system:
- The EHR (existing, pre-existing controls)
- The audit log in the AI gateway (hashed patient identifiers — not raw PHI, but treated as sensitive)
- No PHI stored in the LLM vendor's infrastructure (contractually confirmed in BAA)
Vendors requiring BAAs:
- LLM API vendor (receives PHI in inference requests)
- Cloud infrastructure provider (hosts the AI gateway)
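- No new BAA required for the EHR in this example, because the organization operates it as its own system; an EHR vendor that hosts the system or can access PHI would itself be a Business Associate requiring a BAA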
Common Mistakes
Not Recognizing the Vector Database as a PHI Data Store. If a vector database index is built from clinical notes, discharge summaries, or other patient records, the index is a PHI data store even though the underlying data is represented as float vectors. The information it encodes is derived from PHI, the original text can in some cases be approximately reconstructed from the vectors, and RAG indexes commonly store the source text chunks alongside the vectors for retrieval. Apply the same access controls and BAA requirements as to the source data.
Observability Traces Containing Raw PHI. A common source of HIPAA exposure in clinical AI systems is an observability trace that captures the full LLM prompt payload, which contains patient clinical context, including identifiers. Audit logs and traces in clinical AI systems must either capture only metadata (hashed IDs, token counts, latency) or be subject to the same PHI access controls as the source EHR data.
Assuming Vendor SOC 2 Certification Means HIPAA Compliance. SOC 2 Type II and HIPAA are different standards. A vendor may have a clean SOC 2 report and still be non-compliant with HIPAA — the two frameworks have different control requirements. A signed BAA is the specific mechanism that creates a HIPAA compliance relationship with a vendor; a SOC 2 report does not.
Using Production PHI for AI Evaluation Datasets Without De-identification. Organizations that pull real patient records to test or evaluate their AI systems create PHI collections that require the same protections as the source medical records. All AI evaluation datasets built from real patient data must go through Safe Harbor or Expert Determination de-identification before use.
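A cheap safeguard is a gate that scans candidate records for residual structured identifiers before they are admitted to an evaluation dataset. The sketch below is an assumption-laden illustration: the patterns cover only a few regex-detectable categories, the function names are hypothetical, and a clean scan does not by itself establish Safe Harbor compliance (names and addresses need NER and human review).

```python
import re

# Illustrative patterns for a few structured identifier categories only.
RESIDUAL_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "mrn": re.compile(r"\bMRN[:\s#]*\d{5,10}\b", re.IGNORECASE),
    "full_date": re.compile(r"\b\d{1,2}[-/]\d{1,2}[-/]\d{2,4}\b"),
}


def find_residual_identifiers(text: str) -> dict[str, int]:
    """Count matches per identifier category; empty dict means nothing found."""
    hits = {name: len(p.findall(text)) for name, p in RESIDUAL_PATTERNS.items()}
    return {name: count for name, count in hits.items() if count}


def admit_to_eval_dataset(records: list[str]) -> list[str]:
    """Reject the whole batch if any record still carries a detectable identifier."""
    for index, record in enumerate(records):
        hits = find_residual_identifiers(record)
        if hits:
            # Report categories only, never the matched text itself.
            raise ValueError(
                f"record {index} has residual identifiers: {sorted(hits)}"
            )
    return records
```

The error message names the record index and identifier categories but not the matched values, so the gate does not leak PHI into exception logs.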
Best Practices
- Treat every AI system component that touches patient-derived data as a PHI data store until confirmed otherwise
- Apply the Minimum Necessary standard in prompt engineering — exclude identifiers from inference requests when the AI task does not require them
- Log every clinical AI inference request with hashed patient identifiers, not raw PHI
- Build evaluation and training datasets using Safe Harbor de-identification, documented with the de-identification method and the reviewer identity
- Confirm in writing (via BAA) that LLM vendors do not use inference request content for model training and do not retain PHI beyond operational necessity
- Include clinical AI systems in the organization's incident response plan with explicit breach notification procedures
- Review observability tooling configuration to confirm that full prompt/response payloads are not captured in traces or logs without PHI access controls
Trade-offs
| Design Choice | HIPAA Risk | Clinical Capability | Operational Complexity |
|---|---|---|---|
| Full patient context in prompt | High (more PHI in transit) | Highest (most context) | Low |
| Minimum necessary prompt (no identifiers) | Low | High (clinical facts only) | Medium (requires extraction) |
| De-identified proxy (synthetic context) | Lowest | Lower (may lose clinical nuance) | High |
| On-premise LLM (no external transmission) | Lowest (no PHI transmission) | Constrained (model quality) | Highest |
Interview Questions
Q: You are reviewing the architecture of a clinical AI system. The observability platform captures full LLM prompt/response payloads in its traces for debugging. What is the HIPAA risk and what is the mitigation?
Category: Architecture / Security | Difficulty: Senior | Role: AI Architect / Healthcare AI Engineer
Answer Framework:
The HIPAA risk is that the observability platform has become a PHI data store — every trace that captures a prompt containing patient clinical context is storing ePHI. The observability platform vendor is now a Business Associate if it receives, stores, or processes the trace data. If the organization does not have a BAA with the observability vendor, every trace write is a potential unauthorized disclosure of PHI.
The risks compound: observability platforms are designed for engineering access (developers, SREs, platform teams), not for clinical access control. Patient clinical data in traces is likely accessible to engineering staff whose roles do not authorize them to access patient records under HIPAA.
Three mitigations:
- Payload scrubbing at the trace boundary: Configure the AI gateway to emit traces with metadata only (request ID, hashed patient ID, token counts, latency, model ID) and no prompt/response content. This is the architecturally preferred solution.
- PHI access control on the observability platform: Apply the same access controls to the observability platform as to the EHR — which typically means clinical-only access, eliminating the debugging utility for engineering staff.
- Separate debug logging with PHI controls: Maintain a separate, PHI-controlled debug log that engineering staff can request access to through a formal process (equivalent to break-glass access), rather than including PHI in the general observability stream.
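Payload scrubbing at the trace boundary is most robust as an allowlist: anything not explicitly approved is dropped, so a newly added attribute cannot leak PHI by default. A minimal sketch follows; the attribute names are hypothetical and would need to match whatever the gateway actually emits.

```python
# Hypothetical attribute names; only metadata that carries no prompt or
# response content is allowed through to the observability platform.
ALLOWED_TRACE_ATTRIBUTES = frozenset({
    "request_id",
    "hashed_patient_id",
    "model_id",
    "input_token_count",
    "output_token_count",
    "latency_ms",
})


def scrub_trace_attributes(attributes: dict) -> dict:
    """Keep allowlisted metadata; drop payloads and any unknown attribute."""
    return {
        key: value
        for key, value in attributes.items()
        if key in ALLOWED_TRACE_ATTRIBUTES
    }
```

An allowlist is chosen over a denylist here because a denylist fails open: it only removes the payload fields someone remembered to name.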
Key Points to Hit:
- Observability platform with PHI payloads = PHI data store requiring BAA and access controls
- Engineering access to traces ≠ authorized HIPAA workforce access to patient records
- Solution is payload scrubbing at the emission point, not access control on the downstream platform
- This is a common design error in clinical AI systems — know it and catch it
Key Takeaways
- HIPAA applies to every component that processes, stores, or transmits PHI — including LLM API inference requests, vector database indexes, audit logs, and observability traces
- The Minimum Necessary standard requires limiting PHI in AI inference requests to what is actually needed for the clinical AI task — many use cases can operate without patient identifiers in the prompt
- Every AI vendor that receives PHI in inference requests is a Business Associate requiring a signed BAA — this includes LLM API providers and vector database vendors
- Observability traces that capture full prompt/response payloads are PHI data stores; configure tracing to emit metadata and hashed identifiers only
- All AI evaluation and training datasets built from real patient data must be de-identified using Safe Harbor or Expert Determination before use
- Hashed patient identifiers (SHA-256 of MRN + system salt) in audit logs satisfy the HIPAA audit control requirement without creating a PHI inventory in the log store
Glossary
Protected Health Information (PHI): Individually identifiable health information held or transmitted by a Covered Entity or Business Associate. Electronic PHI (ePHI) is PHI in electronic form.
Business Associate (BA): A person or entity that performs functions or activities on behalf of a Covered Entity that involve the use or disclosure of PHI.
Business Associate Agreement (BAA): A HIPAA-required contract between a Covered Entity and a Business Associate governing PHI use and disclosure.
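Safe Harbor De-identification: One of the two HIPAA-recognized de-identification methods (the other is Expert Determination). It requires removing all 18 specific identifier categories from health information and having no actual knowledge that the remaining information could identify the individual.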
Minimum Necessary Standard: The HIPAA requirement to limit disclosure of PHI to the minimum necessary to accomplish the intended purpose.
HIPAA Security Rule: The HIPAA component that establishes standards for protecting the confidentiality, integrity, and availability of ePHI, including technical safeguards.
Further Reading
- Chapter 1: Healthcare AI Landscape — Regulatory framework overview including FDA SaMD
- Chapter 4: Clinical RAG — PHI considerations when indexing clinical knowledge bases
- Chapter 6: HMS Reference Architecture — HIPAA controls embedded in the complete HMS AI architecture
- HHS HIPAA Security Rule Guidance — Official Security Rule requirements
- HHS De-identification Guidance — Official Safe Harbor and Expert Determination guidance