HIPAA and AI

Conceptual Explanation

What Is PHI

Protected Health Information is individually identifiable health information held or transmitted by a Covered Entity or Business Associate. The definition has three components:

Health information: Relates to an individual's past, present, or future physical or mental health or condition, the provision of healthcare, or payment for healthcare
Individually identifiable: Identifies the individual, or there is a reasonable basis to believe it could be used to identify the individual
Held or transmitted: By a Covered Entity or Business Associate in any medium

The 18 HIPAA identifiers that, when combined with health information, constitute PHI:

Names
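Geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code), except the first three digits of a ZIP code when the area formed by all ZIP codes sharing those three digits contains more than 20,000 people
Dates (except year) directly related to the individual, such as birth, admission, discharge, and death dates; ages over 89 must be aggregated into a single category of 90 or older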
Phone numbers
Fax numbers
Email addresses
Social Security numbers
Medical record numbers
Health plan beneficiary numbers
Account numbers
Certificate/license numbers
Vehicle identifiers and serial numbers
Device identifiers and serial numbers
URLs
IP addresses
Biometric identifiers (including finger and voice prints)
Full face photos and comparable images
Any other unique identifying number, characteristic, or code
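
A few of these categories have regular surface forms and can be flagged with pattern matching. The sketch below is a rough illustration, not a complete detector: the pattern set and the function name `find_identifiers` are assumptions made for this example, and categories such as names, street addresses, and free-text dates need other techniques.

```python
import re

# Illustrative patterns for a handful of the 18 identifier categories.
# They are deliberately simple and will produce false negatives.
IDENTIFIER_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://\S+"),
}


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return the matches for each pattern that fires on the text."""
    hits = {}
    for name, pattern in IDENTIFIER_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits
```

A scan like this is best treated as a screening step that flags text for review; an empty result does not show that the text is free of identifiers.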

De-Identification

De-identified information is not PHI and is not subject to HIPAA. HIPAA recognizes two de-identification methods:

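Safe Harbor: Remove all 18 identifiers of the individual and of the individual's relatives, employers, and household members, AND the covered entity must have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual. Both conditions must hold. Removing the 18 identifiers is a mechanical, checkable step, but it does not by itself make the data de-identified if the entity knows the remaining data can identify someone.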

Expert Determination: A qualified statistical or scientific expert applies generally accepted statistical and scientific principles to determine that the risk of identification is very small. The expert's methods and results are documented.

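For AI training and evaluation datasets, Safe Harbor is often the default method because it does not require a statistical expert and is auditable: you can check that the 18 identifier categories are absent. Free-text fields such as clinical notes are harder to verify than structured fields, because identifiers can appear anywhere in the narrative.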
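
For structured records, the Safe Harbor transforms can be sketched as a field-level pass. This is a minimal illustration under stated assumptions: the field names, the function name `safe_harbor_record`, and the contents of `LOW_POPULATION_ZIP3` are hypothetical placeholders, and free-text fields are not handled at all.

```python
from datetime import date

# Hypothetical field names; map these to your own schema.
# birth_date is dropped outright because a birth year can reveal an age over 89.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "birth_date", "phone", "fax",
    "email", "ssn", "mrn", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}

# 3-digit ZIP prefixes covering 20,000 or fewer people must be reported as "000".
# Placeholder values only: load the real list from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def safe_harbor_record(record: dict) -> dict:
    """Drop direct identifiers and generalize dates, ZIP codes, and ages."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIER_FIELDS:
            continue  # remove the field entirely
        if isinstance(value, date):
            out[key + "_year"] = value.year  # keep only the year
        elif key == "zip":
            prefix = str(value)[:3]
            out["zip3"] = "000" if prefix in LOW_POPULATION_ZIP3 else prefix
        elif key == "age":
            out["age"] = "90+" if value >= 90 else value  # aggregate ages over 89
        else:
            out[key] = value
    return out
```

Passing a record with `name`, `mrn`, `zip`, `age`, and an admission date returns only the generalized fields and the clinical content. The "no actual knowledge" condition still has to be assessed separately; code cannot establish it.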

Core Architecture

graph TD
    subgraph "PHI Data Sources"
        E1["EHR\n(Epic FHIR R4)"]
        E2["Laboratory\nResults"]
        E3["Radiology\nReports"]
        E4["Clinical Notes\nUnstructured"]
    end
    subgraph "PHI Processing Boundary"
        GW["AI Gateway\n(PHI-Authorized)"]
        PR["Prompt Builder\nMinimum Necessary"]
        AU["Audit Logger\nHashed Identifiers Only"]
        CF["Cache Manager\nEncrypted PHI Cache"]
    end
    subgraph "External Vendors (BAA Required)"
        LLM["LLM API\nAnthropic / Azure OAI"]
        VS["Vector Database\nClinical Index"]
    end
    subgraph "De-identified Zone"
        DI["De-identification\nPipeline"]
        ED["Evaluation\nDatasets"]
        TR["Training\nDatasets"]
        OB["Observability\nTraces (no PHI)"]
    end
    subgraph "Security Controls"
        TLS["TLS 1.2+\nAll transmissions"]
        AES["AES-256\nAt rest encryption"]
        RBAC["RBAC\nMinimum necessary access"]
        VPC["VPC / Private Link\nNetwork isolation"]
    end
    E1 & E2 & E3 & E4 --> GW
    GW --> PR --> LLM
    GW --> VS
    GW --> AU
    GW --> CF
    GW --> DI --> ED & TR
    AU --> OB
    TLS --> GW & LLM
    AES --> VS & CF
    RBAC --> GW & AU
    VPC --> GW & LLM

Common Mistakes

Not Recognizing the Vector Database as a PHI Data Store. If a vector database index is built from clinical notes, discharge summaries, or other patient records, the index is a PHI data store even though the underlying data is represented as float vectors. The information it encodes is derived from PHI, vector stores often keep the source text chunks alongside the embeddings, and in some cases the original text can be approximately reconstructed from the vectors alone. Apply the same access controls and BAA requirements as to the source data.

Observability Traces Containing Raw PHI. A common HIPAA exposure in clinical AI systems is an observability trace that captures the full LLM prompt payload, which contains patient clinical context, including identifiers. Audit logs and traces in clinical AI systems must either capture only metadata (hashed IDs, token counts, latency) or be subject to the same PHI access controls as the source EHR data.

Assuming Vendor SOC 2 Certification Means HIPAA Compliance. SOC 2 Type II and HIPAA are different standards. A vendor may have a clean SOC 2 report and still be non-compliant with HIPAA — the two frameworks have different control requirements. A signed BAA is the specific mechanism that creates a HIPAA compliance relationship with a vendor; a SOC 2 report does not.

Using Production PHI for AI Evaluation Datasets Without De-identification. Organizations that pull real patient records to test or evaluate their AI systems create PHI collections that require the same protections as the source medical records. All AI evaluation datasets built from real patient data must go through Safe Harbor or Expert Determination de-identification before use.

Best Practices

Treat every AI system component that touches patient-derived data as a PHI data store until confirmed otherwise
Apply the Minimum Necessary standard in prompt engineering — exclude identifiers from inference requests when the AI task does not require them
Log every clinical AI inference request with hashed patient identifiers, not raw PHI
Build evaluation and training datasets using Safe Harbor de-identification, documented with the de-identification method and the reviewer identity
Confirm in writing (via BAA) that LLM vendors do not use inference request content for model training and do not retain PHI beyond operational necessity
Include clinical AI systems in the organization's incident response plan with explicit breach notification procedures
Review observability tooling configuration to confirm that full prompt/response payloads are not captured in traces or logs without PHI access controls
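
The Minimum Necessary practice above can be sketched as an allowlist in the prompt builder. This is an illustration only: the field names in `ALLOWED_FIELDS` and the function `build_prompt` are hypothetical, and the right allowlist depends on the clinical task.

```python
# Hypothetical allowlist: only the clinical fields this task needs.
# Identifiers such as name, MRN, and date of birth are absent by design.
ALLOWED_FIELDS = ("age_band", "sex", "active_problems", "medications", "recent_labs")


def build_prompt(task: str, patient: dict) -> str:
    """Build a prompt from allowlisted fields; everything else is ignored."""
    lines = [f"Task: {task}", "Clinical context:"]
    for field in ALLOWED_FIELDS:
        if field in patient:
            lines.append(f"- {field}: {patient[field]}")
    return "\n".join(lines)
```

An allowlist fails closed: a new field added to the patient record stays out of the prompt until someone decides the task needs it, whereas a blocklist would let it through by default. Free-text values in allowlisted fields can still carry identifiers and need their own screening.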

Trade-offs

Design Choice	HIPAA Risk	Clinical Capability	Operational Complexity
Full patient context in prompt	High (more PHI in transit)	Highest (most context)	Low
Minimum necessary prompt (no identifiers)	Low	High (clinical facts only)	Medium (requires extraction)
De-identified proxy (synthetic context)	Lowest	Lower (may lose clinical nuance)	High
On-premise LLM (no external transmission)	Lowest (no PHI transmission)	Constrained (model quality)	Highest

Interview Questions

Q: You are reviewing the architecture of a clinical AI system. The observability platform captures full LLM prompt/response payloads in its traces for debugging. What is the HIPAA risk and what is the mitigation?

Category: Architecture / Security
Difficulty: Senior
Role: AI Architect / Healthcare AI Engineer

Answer Framework:

The HIPAA risk is that the observability platform has become a PHI data store — every trace that captures a prompt containing patient clinical context is storing ePHI. The observability platform vendor is now a Business Associate if it receives, stores, or processes the trace data. If the organization does not have a BAA with the observability vendor, every trace write is a potential unauthorized disclosure of PHI.

The risks compound: observability platforms are designed for engineering access (developers, SREs, platform teams), not for clinical access control. Patient clinical data in traces is likely accessible to engineering staff who are not workforce members authorized under HIPAA to access patient records.

Three mitigations:

Payload scrubbing at the trace boundary: Configure the AI gateway to emit traces with metadata only (request ID, hashed patient ID, token counts, latency, model ID) and no prompt/response content. This is the architecturally preferred solution.
PHI access control on the observability platform: Apply the same access controls to the observability platform as to the EHR — which typically means clinical-only access, eliminating the debugging utility for engineering staff.
Separate debug logging with PHI controls: Maintain a separate, PHI-controlled debug log that engineering staff can request access to through a formal process (equivalent to break-glass access), rather than including PHI in the general observability stream.
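
The first mitigation, payload scrubbing at the trace boundary, can be sketched as an allowlist filter applied before a span leaves the gateway. The attribute names and the function `scrub_span` are assumptions for this example and do not correspond to any particular tracing library's API.

```python
# Hypothetical attribute names; only metadata keys are allowed through.
ALLOWED_TRACE_KEYS = frozenset({
    "request_id",
    "patient_ref",        # hashed patient identifier, never the raw MRN
    "model_id",
    "prompt_tokens",
    "completion_tokens",
    "latency_ms",
})


def scrub_span(attributes: dict) -> dict:
    """Keep allowlisted metadata and drop everything else, including payloads."""
    return {k: v for k, v in attributes.items() if k in ALLOWED_TRACE_KEYS}
```

Because the filter is an allowlist, a newly added attribute such as a raw prompt field is dropped unless it is explicitly approved, which is the behavior the emission point needs.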

Key Points to Hit:

Observability platform with PHI payloads = PHI data store requiring BAA and access controls
Engineering access to traces ≠ authorized HIPAA workforce access to patient records
Solution is payload scrubbing at the emission point, not access control on the downstream platform
This is a common design error in clinical AI systems — know it and catch it

Key Takeaways

HIPAA applies to every component that processes, stores, or transmits PHI — including LLM API inference requests, vector database indexes, audit logs, and observability traces
The Minimum Necessary standard requires limiting PHI in AI inference requests to what is actually needed for the clinical AI task — many use cases can operate without patient identifiers in the prompt
Every AI vendor that receives PHI in inference requests is a Business Associate requiring a signed BAA — this includes LLM API providers and vector database vendors
Observability traces that capture full prompt/response payloads are PHI data stores; configure tracing to emit metadata and hashed identifiers only
All AI evaluation and training datasets built from real patient data must be de-identified using Safe Harbor or Expert Determination before use
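Hashed patient identifiers (SHA-256 of MRN + system salt) in audit logs support the HIPAA audit control requirement while keeping raw identifiers out of the log store; because the hash is derived from PHI, keep the salt secret and restrict who can map hashes back to patients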
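
The hashed identifier in the last takeaway can be sketched as follows. Note the substitution: this example uses HMAC-SHA-256, a keyed construction built on the same hash, in place of plain SHA-256 over the MRN and salt. The function name `patient_ref` and the key handling are assumptions; in practice the key would come from a secrets manager.

```python
import hashlib
import hmac


def patient_ref(mrn: str, secret_key: bytes) -> str:
    """Derive a stable, keyed pseudonym for an MRN for use in audit logs."""
    return hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same MRN and key always yield the same reference, so audit entries for one patient can be correlated without storing the MRN. MRNs come from a small, guessable space, so anyone holding the key can recompute the mapping; the key needs the same protection as the identifiers themselves.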
