Audit and Logging for AI Systems
Executive Summary
Audit logging for AI systems serves three distinct purposes: HIPAA compliance (PHI access audit trail), security incident detection (anomaly detection on AI behavior), and AI quality assurance (tracking model outputs over time for regression detection). Each purpose has different requirements — HIPAA audit logs must be immutable and patient-attributed; security logs must be real-time and queryable; quality logs must include AI-specific metadata (model version, retrieved documents, confidence signals) that traditional observability systems do not capture. This chapter covers the audit and logging architecture for clinical AI systems across all three purposes.
Learning Objectives
- Design a multi-tier logging architecture that separates PHI audit logs from operational logs
- Implement HIPAA-compliant PHI access logging for AI systems without logging PHI content
- Configure SIEM integration for real-time detection of AI-specific anomalies
- Track AI model output quality metrics over time for regression detection
Architecture
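One way to keep the three purposes from the executive summary physically separate is to derive three independent records from each AI request at the gateway. That way, PHI-bearing fields can only ever reach the audit record. The sketch below makes several assumptions: `AIRequestEvent` is a hypothetical request shape and `split_into_log_tiers` is a hypothetical helper, and neither belongs to any library. It records only top-level FHIR field names, never their values, mirroring the `phi_fields_accessed` design of the audit entry in this chapter.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class AIRequestEvent:
    """Hypothetical view of one AI request as seen by the platform gateway."""
    request_id: str
    user_id: str
    patient_id: str
    model_version: str
    latency_ms: float
    status_code: int
    fhir_resource: dict[str, Any]  # PHI: never copied into any log record
    retrieved_document_ids: list[str]


def split_into_log_tiers(event: AIRequestEvent) -> dict[str, dict[str, Any]]:
    """Derive one record per log store; only the audit record carries patient identifiers."""
    audit = {
        "request_id": event.request_id,
        "user_id": event.user_id,
        "patient_id": event.patient_id,  # required for the HIPAA audit trail
        # Top-level FHIR field names only; values are deliberately dropped
        "phi_fields_accessed": sorted(event.fhir_resource.keys()),
    }
    operational = {
        "request_id": event.request_id,  # join key only, no patient identifiers
        "latency_ms": event.latency_ms,
        "status_code": event.status_code,
    }
    quality = {
        "request_id": event.request_id,
        "model_version": event.model_version,
        "retrieved_document_ids": list(event.retrieved_document_ids),
    }
    return {"audit": audit, "operational": operational, "quality": quality}
```

Only `request_id` appears in all three records. This lets an investigation correlate the tiers without copying patient identifiers into the operational store.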
HIPAA Audit Log Implementation
```python
import asyncio
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

import boto3  # AWS SDK, used here for CloudWatch Logs

# Educational example — not for clinical use


class AuditEventType(Enum):
    PHI_READ = "phi_read"
    PHI_WRITE = "phi_write"
    AI_INFERENCE_WITH_PHI = "ai_inference_with_phi"
    PHI_ACCESS_DENIED = "phi_access_denied"
    AI_INJECTION_ATTEMPT = "ai_injection_attempt"
    AI_PHI_OUTPUT_DETECTED = "ai_phi_output_detected"


@dataclass
class HIPAAAuditLogEntry:
    """
    HIPAA-compliant audit log entry for AI system PHI access.
    Contains ONLY metadata — never PHI content, prompts, or AI responses.
    The patient_id IS included (required for HIPAA audit trail).
    The request body is NOT included (would contain PHI content).
    """
    event_id: str
    event_type: str  # AuditEventType value
    timestamp: str  # ISO 8601 UTC
    user_id: str  # Authenticated clinician or service account ID
    user_role: str  # Clinical role at time of access
    patient_id: Optional[str]  # PHI patient ID (required for audit)
    resource_type: str  # FHIR resource type or "ai_prompt"
    resource_id: Optional[str]  # Specific resource ID if applicable
    action: str  # "read" | "write" | "infer" | "deny"
    use_case: str  # Clinical use case label
    source_ip: str  # Requestor IP
    service_component: str  # Which AI component performed the access
    request_id: str  # Correlates across AI platform logs
    ai_model_version: Optional[str]  # Model used for inference events
    phi_fields_accessed: Optional[list[str]]  # List of FHIR field names (not values)
    # EXPLICITLY EXCLUDED (must never appear in audit log):
    # - Prompt content (contains PHI)
    # - AI response content (may contain PHI)
    # - PHI field values
    # - Clinical note text

    def compute_integrity_hash(self) -> str:
        """
        Compute SHA-256 hash of log entry for tamper detection.
        Store the hash separately — compare on audit to detect modifications.
        """
        entry_bytes = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(entry_bytes).hexdigest()

    def to_cloudwatch_event(self) -> dict:
        """Format as a CloudTrail-style JSON event for CloudWatch Logs Insights queries."""
        return {
            "eventType": self.event_type,
            "eventTime": self.timestamp,
            "userIdentity": {
                "userId": self.user_id,
                "role": self.user_role,
            },
            "requestParameters": {
                "patientId": self.patient_id,
                "resourceType": self.resource_type,
                "action": self.action,
                "useCase": self.use_case,
            },
            "sourceIPAddress": self.source_ip,
            "userAgent": self.service_component,
            "requestID": self.request_id,
            "aiContext": {
                "modelVersion": self.ai_model_version,
                "phiFieldsAccessed": self.phi_fields_accessed,
            },
        }


class HIPAAAuditLogger:
    """
    HIPAA audit logger for the AI platform.
    Writes to encrypted CloudWatch Logs with S3 archival. Immutability is not
    automatic: it depends on IAM policies that deny deletion and on S3 Object
    Lock for the archived copy.
    Audit logs are retained for 6 years per HIPAA retention requirements.
    Educational example — not for clinical use.
    """

    def __init__(self, log_group_name: str, region: str = "us-east-1"):
        self.cloudwatch = boto3.client("logs", region_name=region)
        self.log_group_name = log_group_name
        self._known_streams: set[str] = set()

    def _current_stream(self) -> str:
        # One stream per UTC day, computed on every write so long-running
        # processes roll over at midnight; put_log_events fails if the stream is missing.
        name = f"ai-phi-audit-{datetime.now(timezone.utc).strftime('%Y-%m-%d')}"
        if name not in self._known_streams:
            try:
                self.cloudwatch.create_log_stream(
                    logGroupName=self.log_group_name, logStreamName=name
                )
            except self.cloudwatch.exceptions.ResourceAlreadyExistsException:
                pass  # another worker created it first
            self._known_streams.add(name)
        return name

    def _put(self, event_dict: dict) -> None:
        self.cloudwatch.put_log_events(
            logGroupName=self.log_group_name,
            logStreamName=self._current_stream(),
            logEvents=[{
                "timestamp": int(datetime.now(timezone.utc).timestamp() * 1000),
                "message": json.dumps(event_dict),
            }],
        )

    async def log_phi_access(self, entry: HIPAAAuditLogEntry) -> str:
        """Log a PHI access event and return its integrity hash for separate storage."""
        integrity_hash = entry.compute_integrity_hash()
        event_dict = entry.to_cloudwatch_event()
        # An inline hash only catches accidental corruption; for tamper detection
        # the caller must also persist the returned hash in a separate store.
        event_dict["integrityHash"] = integrity_hash
        # boto3 is synchronous: run the call in a worker thread so it does not block the event loop
        await asyncio.to_thread(self._put, event_dict)
        return integrity_hash
```
Security Anomaly Detection
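Before committing thresholds to a SIEM, it can help to replay exported audit events through the same logic offline and see who would have been flagged. The following sketch approximates the `unusual_phi_access_volume` and `after_hours_phi_access` rules in this section in plain Python. It rests on several assumptions: `users_over_hourly_volume` and `users_after_hours` are hypothetical helpers, and events are dicts with `user_id`, `event_type`, and an ISO 8601 UTC `timestamp`. Hours are evaluated in UTC, so a real clinical-hours rule would first shift them to the facility's local time.

```python
from collections import Counter
from datetime import datetime


def _parse_utc(ts: str) -> datetime:
    # Accept a trailing "Z", which datetime.fromisoformat rejects before Python 3.11
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def users_over_hourly_volume(events: list[dict], threshold: int = 200) -> set[str]:
    """Users with more than `threshold` phi_read events in any single clock hour."""
    counts: Counter = Counter()
    for e in events:
        if e["event_type"] != "phi_read":
            continue
        hour_bucket = _parse_utc(e["timestamp"]).replace(minute=0, second=0, microsecond=0)
        counts[(e["user_id"], hour_bucket)] += 1
    return {user for (user, _), n in counts.items() if n > threshold}


def users_after_hours(
    events: list[dict], threshold: int = 5, start_hour: int = 22, end_hour: int = 6
) -> set[str]:
    """Users with more than `threshold` phi_read events at or after start_hour or before end_hour."""
    counts: Counter = Counter()
    for e in events:
        if e["event_type"] != "phi_read":
            continue
        hour = _parse_utc(e["timestamp"]).hour
        if hour >= start_hour or hour < end_hour:
            counts[e["user_id"]] += 1
    return {user for user, n in counts.items() if n > threshold}
```

Running these over a few weeks of historical events gives a rough sense of how many alerts a given threshold would have produced before it is deployed.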
```python
# SIEM detection rules for AI-specific anomalies
# Format: Splunk SPL queries (illustrative — adapt to your SIEM)
# Field names (event_type, user_id, ...) assume ingestion normalizes event JSON;
# the audit entries above emit eventType and userIdentity.userId.
# Educational example — not for clinical use
SIEM_DETECTION_RULES = {
    "injection_attempt_spike": {
        "description": "Spike in injection attempt detections",
        "query": """
            index=ai_security sourcetype=ai_gateway_events
            event_type=ai_injection_attempt
            | timechart span=5m count as injection_attempts
            | where injection_attempts > 10
        """,
        "alert_threshold": "More than 10 injection attempts in 5 minutes",
        "severity": "HIGH",
        "response": "Investigate source IP; consider temporary block via AI gateway WAF rule",
    },
    "unusual_phi_access_volume": {
        "description": "User accessing unusually high volume of patient records via AI",
        "query": """
            index=ai_audit sourcetype=hipaa_audit event_type=phi_read
            | bin _time span=1h
            | stats count as phi_access_count by user_id, _time
            | where phi_access_count > 200
        """,
        "alert_threshold": "More than 200 PHI access events per user per hour",
        "severity": "HIGH",
        "response": "Review user_id for potential data snooping; escalate to Privacy Officer",
    },
    "ai_phi_output_detected": {
        "description": "AI output validation flagged PHI in AI response",
        "query": """
            index=ai_audit sourcetype=hipaa_audit event_type=ai_phi_output_detected
            | stats count as phi_output_count by ai_model_version, use_case
        """,
        "alert_threshold": ">0 events",
        "severity": "CRITICAL",
        "response": "Immediate review; potential PHI leakage or context mixing. Disable affected AI feature pending investigation.",
    },
    "after_hours_phi_access": {
        "description": "PHI accessed via AI outside clinical hours (potential unauthorized access)",
        "query": """
            index=ai_audit sourcetype=hipaa_audit event_type=phi_read
            | eval hour=tonumber(strftime(_time, "%H"))
            | where hour >= 22 OR hour < 6
            | stats count by user_id
            | where count > 5
        """,
        "alert_threshold": "More than 5 PHI accesses outside 06:00-22:00 by the same user in the search window",
        "severity": "MEDIUM",
        "response": "Review for legitimate emergency access vs. unauthorized access",
    },
}
```
AI Quality Logging
Quality logging for AI systems captures model-specific metadata that enables detecting output quality regressions over time.
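Regression detection comes down to comparing aggregated metrics for a current window against a baseline window. The following is a minimal sketch. It assumes metric dictionaries keyed like the output of `compute_ai_quality_metrics` and uses a hypothetical per-metric tolerance table, `detect_quality_regressions`, which is not a library function. Any tolerance values in a real deployment are a local policy decision; the ones in the usage example are placeholders.

```python
from typing import Optional


def detect_quality_regressions(
    baseline: dict[str, Optional[float]],
    current: dict[str, Optional[float]],
    tolerances: dict[str, tuple[float, bool]],
) -> dict[str, tuple[float, float]]:
    """Return {metric: (baseline, current)} for metrics that worsened by more than their tolerance.

    tolerances maps metric name -> (absolute tolerance, higher_is_better).
    """
    regressions = {}
    for metric, (tolerance, higher_is_better) in tolerances.items():
        before, after = baseline.get(metric), current.get(metric)
        if before is None or after is None:
            continue  # metric unavailable in one window (e.g. no clinician feedback yet)
        delta = after - before
        worsened_by = -delta if higher_is_better else delta
        if worsened_by > tolerance:
            regressions[metric] = (before, after)
    return regressions
```

For example, a tolerance table of `{"disclaimer_present_rate": (0.02, True), "p95_latency_ms": (500.0, False)}` flags a drop in disclaimer rate from 0.99 to 0.95, but not a p95 latency rise from 1200 ms to 1500 ms.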
```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Educational example — not for clinical use


@dataclass
class AIQualityLogEntry:
    """
    AI model quality log entry.
    Contains AI-specific metadata for quality monitoring.
    Does NOT contain PHI — prompt and response content are excluded.
    Includes retrieved document IDs (not content) for traceability.
    """
    request_id: str
    timestamp: str  # ISO 8601 with UTC offset, e.g. "2024-05-01T12:00:00Z"
    model_id: str
    model_version: str
    use_case: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    # Retrieval quality signals
    retrieved_document_ids: list[str]  # Document IDs, not content
    top_retrieval_score: Optional[float]  # Cosine similarity of top chunk
    cache_hit: bool
    # Output quality signals (no content — just signals)
    finish_reason: str  # "stop" | "max_tokens" | "content_filter"
    content_filtered: bool
    disclaimer_present: bool  # Whether AI included required clinical disclaimer
    citation_count: int  # How many citations were returned
    # User feedback (if available)
    clinician_thumbs_up: Optional[bool] = None
    clinician_flagged: Optional[bool] = None
    clinician_flag_reason: Optional[str] = None


def _parse_timestamp(ts: str) -> datetime:
    # Accept a trailing "Z", which datetime.fromisoformat rejects before Python 3.11
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def compute_ai_quality_metrics(
    quality_logs: list[AIQualityLogEntry],
    time_window_hours: int = 24,
    now: Optional[datetime] = None,
) -> dict:
    """
    Aggregate AI quality metrics over the last `time_window_hours`.
    Use as input to quality dashboards and regression alerting.
    Educational example — not for clinical use.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=time_window_hours)
    logs = [e for e in quality_logs if _parse_timestamp(e.timestamp) >= cutoff]
    if not logs:
        return {}
    n = len(logs)
    latencies = sorted(e.latency_ms for e in logs)
    # "is not None" keeps legitimate 0.0 scores and False feedback in the averages
    scores = [e.top_retrieval_score for e in logs if e.top_retrieval_score is not None]
    feedback = [e.clinician_thumbs_up for e in logs if e.clinician_thumbs_up is not None]
    return {
        "request_count": n,
        "avg_latency_ms": sum(latencies) / n,
        "p95_latency_ms": latencies[math.ceil(0.95 * n) - 1],  # nearest-rank p95
        "cache_hit_rate": sum(e.cache_hit for e in logs) / n,
        "content_filter_rate": sum(e.content_filtered for e in logs) / n,
        "disclaimer_present_rate": sum(e.disclaimer_present for e in logs) / n,
        "avg_citation_count": sum(e.citation_count for e in logs) / n,
        "avg_top_retrieval_score": sum(scores) / len(scores) if scores else None,
        "clinician_positive_rate": sum(feedback) / len(feedback) if feedback else None,
    }
```
Enterprise Considerations
Log retention: HIPAA's documentation retention rule (45 CFR §164.316(b)(2)) requires 6 years, and audit logs are generally held to the same period. Ensure the audit log store has a retention configuration that prevents deletion before the retention period expires, plus a lifecycle policy that moves logs to low-cost archival storage (S3 Glacier, Azure Archive) after 90 days.
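On AWS, the retention and archival pieces can be expressed as two API calls plus a periodic check. This is a sketch under the following assumptions. The boto3 `logs` and `s3` clients are passed in, and the helper names (`apply_audit_retention`, `log_groups_with_short_retention`, `audit_retention`) are hypothetical. 2192 days is used because it is the CloudWatch Logs retention setting that corresponds to 6 years. The 90-day transition to the S3 `GLACIER` storage class follows the paragraph above, and log events are assumed to be exported to the archive bucket already.

```python
SIX_YEARS_IN_DAYS = 2192  # CloudWatch Logs retention value corresponding to 6 years


def apply_audit_retention(logs_client, s3_client, log_group_name: str, archive_bucket: str) -> None:
    """Set 6-year retention on the audit log group and a 90-day Glacier transition on the archive."""
    logs_client.put_retention_policy(
        logGroupName=log_group_name, retentionInDays=SIX_YEARS_IN_DAYS
    )
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=archive_bucket,
        LifecycleConfiguration={
            "Rules": [{
                "ID": "audit-archive-to-glacier",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # No "Expiration": deleting archived audit logs is a separate, deliberate decision
            }]
        },
    )


def log_groups_with_short_retention(log_groups: list[dict], minimum_days: int = SIX_YEARS_IN_DAYS) -> list[str]:
    """Names of log groups whose retention is set below minimum_days.

    A group without "retentionInDays" never expires, so it is not flagged.
    """
    return [
        g["logGroupName"]
        for g in log_groups
        if "retentionInDays" in g and g["retentionInDays"] < minimum_days
    ]


def audit_retention(logs_client, prefix: str) -> list[str]:
    """Periodic check: audit log groups (matched by name prefix) with too-short retention."""
    groups: list[dict] = []
    for page in logs_client.get_paginator("describe_log_groups").paginate(logGroupNamePrefix=prefix):
        groups.extend(page["logGroups"])
    return log_groups_with_short_retention(groups)
```

Passing the clients in, rather than creating them inside the helpers, keeps the functions testable without AWS credentials.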
Audit log access control: The HIPAA audit log must be accessible to compliance and security staff but protected from modification by anyone, including the AI platform team. Implement write-once logging (S3 Object Lock with Compliance mode, Azure Immutable Blob Storage) and restrict read access to authorized audit personnel.
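For the S3 archive, a default Object Lock rule in Compliance mode can enforce the write-once property. In Compliance mode, AWS documents that no user, including the account root user, can delete a locked object version or shorten its retention before it expires. The following is a minimal sketch. It assumes a boto3 S3 client and a bucket with versioning already enabled, since Object Lock requires versioning. The helper names are hypothetical.

```python
def object_lock_configuration(years: int = 6) -> dict:
    """Default retention rule: new object versions are locked in COMPLIANCE mode."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
    }


def enable_compliance_lock(s3_client, bucket: str, years: int = 6) -> None:
    # Assumes versioning is already enabled on the bucket (Object Lock requires it).
    # The default retention applies to object versions written after the rule is set.
    s3_client.put_object_lock_configuration(
        Bucket=bucket,
        ObjectLockConfiguration=object_lock_configuration(years),
    )
```

Because Compliance-mode retention cannot be shortened, it is worth testing the configuration on a non-production bucket first.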
Operational log separation: PHI audit logs and operational logs (latency, throughput, error rates) should be in separate log stores with different access controls. Operations teams need operational logs for debugging but should not have access to PHI-containing audit logs.
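Within a single Python service, the same separation can start at the logger level. The sketch below uses the standard `logging` module with two hypothetical logger names, `ai_platform.phi_audit` and `ai_platform.operational`. Setting `propagate = False` keeps audit records from also reaching handlers on ancestor loggers, such as a root handler that ships operational logs. The in-memory streams are stand-ins for two separate log stores with different access controls.

```python
import io
import logging


def build_separated_loggers(audit_stream, ops_stream):
    """Return (audit_logger, ops_logger), each writing only to its own stream."""
    audit = logging.getLogger("ai_platform.phi_audit")
    ops = logging.getLogger("ai_platform.operational")
    for logger, stream in ((audit, audit_stream), (ops, ops_stream)):
        logger.setLevel(logging.INFO)
        logger.handlers.clear()  # idempotent if called more than once
        logger.addHandler(logging.StreamHandler(stream))
        # Do not hand records to ancestor loggers' handlers (e.g. a root
        # handler that forwards everything to the operational log pipeline)
        logger.propagate = False
    return audit, ops


if __name__ == "__main__":
    audit_buf, ops_buf = io.StringIO(), io.StringIO()
    audit_log, ops_log = build_separated_loggers(audit_buf, ops_buf)
    audit_log.info("phi_read patient_id=P123 user_id=dr_a")
    ops_log.info("latency_ms=850 status=200")
    print("audit store:", audit_buf.getvalue().strip())
    print("ops store:", ops_buf.getvalue().strip())
```

In production, the two `StreamHandler`s would be replaced by handlers that write to the separate audit and operational stores.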
Common Mistakes
1. Logging request and response content for PHI-handling AI features. Even "debug" logs that include prompt content contain PHI for clinical AI features. Every log store that receives AI request/response content becomes a HIPAA data store with full compliance implications.
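2. Not setting log retention to 6 years. CloudWatch log groups never expire by default, but retention is commonly shortened (for example to 30 or 90 days) to control cost, and a short setting on an audit log group silently deletes audit history. HIPAA requires 6-year retention for audit logs. Set explicit retention policies on creation and audit them quarterly.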
3. Mixing PHI audit logs with operational logs. Operational logs have different retention requirements, different access control requirements, and different compliance implications. Keep them separate.
4. No integrity checking on audit logs. An attacker who can modify the audit log can cover their tracks. Store integrity hashes (SHA-256 of each log entry) separately from the log entries, or use cloud-native tamper-evident logging (AWS CloudTrail with log file integrity validation).
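The separate-store check described in item 4 can be sketched as follows. It uses the same canonical-JSON SHA-256 scheme as `compute_integrity_hash` in the audit entry. The in-memory `stored_hashes` dict is an assumption, standing in for a store in a different account or bucket that the AI platform team cannot write to. `find_tampered` is a hypothetical helper.

```python
import hashlib
import json


def integrity_hash(entry: dict) -> str:
    # Canonical JSON (sorted keys) so identical content always yields the same hash
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def find_tampered(entries: dict[str, dict], stored_hashes: dict[str, str]) -> list[str]:
    """Event IDs that fail verification against hashes recorded at write time.

    Flags entries whose content changed or that have no recorded hash
    (possibly injected), plus recorded hashes with no entry (deleted).
    """
    modified = [eid for eid, entry in entries.items()
                if stored_hashes.get(eid) != integrity_hash(entry)]
    deleted = [eid for eid in stored_hashes if eid not in entries]
    return sorted(modified + deleted)
```

Per-entry hashes detect tampering only if an attacker cannot also rewrite the hash store, which is why the hashes must live elsewhere. Deletions show up as hashes with no matching entry.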
Key Takeaways
- HIPAA audit logs must include patient_id and user_id but must never include PHI content, prompt text, or AI response text
- PHI audit logs must be immutable (write-once), encrypted, and retained for 6 years
- Operational logs and HIPAA audit logs must be in separate stores with separate access controls
- AI quality logs capture model-specific metadata (model version, retrieval scores, citation counts, clinician feedback) that traditional observability systems do not
- Integrity hashing of audit log entries enables detection of tampered audit trails
Further Reading
- HIPAA Compliance — HIPAA audit controls requirement context
- AI Security Fundamentals — Threat model that drives detection rules
- Observability and Monitoring — Operational observability (distinct from audit logging)