Cloud AI Platforms
Executive Summary
AWS Bedrock, Azure OpenAI Service, and Google Vertex AI represent the three dominant managed AI inference platforms for enterprise deployment. Each provides access to frontier and open-weight language models through a secure, scalable, HIPAA-eligible API — but they differ meaningfully in model selection, enterprise integration depth, data residency controls, pricing structure, and operational model. For enterprise AI architects, the choice of cloud AI platform is rarely purely technical; it is typically constrained by the organization's existing cloud provider relationship, compliance requirements, and enterprise integration needs. This chapter provides the framework for evaluating these platforms objectively and documents the specific technical differences that matter for production enterprise AI deployments.
Learning Objectives
- Compare AWS Bedrock, Azure OpenAI Service, and Google Vertex AI across the dimensions relevant to enterprise selection
- Identify which platform characteristics are genuinely differentiating vs. which are marketing-level distinctions
- Design a cloud AI platform architecture that preserves optionality across providers through an AI gateway abstraction
- Evaluate HIPAA eligibility and data residency controls for each platform
- Apply the platform selection framework to a healthcare AI context
Business Problem
Selecting a cloud AI platform is a long-lived decision with significant switching costs. A healthcare organization that builds its clinical AI stack on Azure OpenAI — with Azure Private Link, Azure Active Directory integration, and Azure Monitor observability — faces significant re-engineering work to migrate to AWS Bedrock. Yet the model landscape evolves rapidly: today's best model for clinical documentation may not be on the same platform as next year's best model.
The business problem is: how do you select a cloud AI platform that serves current requirements without creating lock-in that constrains future optionality?
Why This Technology Exists
Before cloud AI platforms existed, enterprises accessing LLM APIs had two options: use the AI lab's native API (api.anthropic.com, api.openai.com) with its associated data handling terms, or self-host open-weight models with their associated operational burden. Neither option fit the enterprise requirement for native cloud integration, private networking, compliance certification, and consolidated billing.
Cloud AI platforms emerged to fill this gap: wrapping LLM API access in enterprise cloud packaging — VPC deployment, private endpoints, IAM integration, compliance certifications, SLA guarantees, and usage monitoring — while providing model access without requiring self-hosting.
Conceptual Explanation
All three platforms provide the same core service: managed LLM inference with enterprise packaging. The differentiation is in:
Model selection: Which models are available, when, and at what tier. Some model families are tied to particular platforms (OpenAI's GPT models on Azure OpenAI; Claude on Bedrock and Vertex AI), while other platforms emphasize a multi-vendor catalog (Bedrock offers models from Anthropic, Meta, Cohere, Amazon, and Stability AI).
Cloud integration depth: How tightly the AI service integrates with other services on the same cloud platform. Azure OpenAI integrates natively with Azure Cognitive Search, Azure Data Factory, and Azure Monitor. AWS Bedrock integrates with S3, Lambda, Bedrock Agents, and CloudWatch. GCP Vertex AI integrates with BigQuery, Dataflow, and Cloud Storage.
Data handling and residency: Where inference requests are processed, whether input data is used for model training, and what privacy controls apply.
Enterprise features: Private networking (VPC endpoints, Private Link), IAM integration, usage monitoring, content filtering, and SLA guarantees.
Core Architecture
Platform Architecture Comparison
Feature Matrix
| Dimension | AWS Bedrock | Azure OpenAI | Google Vertex AI |
|---|---|---|---|
| Key models | Claude (Anthropic), Llama, Cohere, Amazon Titan, Stability | GPT-4o, o1, o3 (OpenAI exclusive) | Gemini, Claude, PaLM, Llama |
| Model exclusivity | Claude available via Bedrock, Vertex AI, or Anthropic API | GPT-4/o1/o3 require Azure or OpenAI | Gemini requires GCP or Google AI |
| Private networking | VPC Endpoint (PrivateLink) | Azure Private Link | VPC Service Controls |
| Identity integration | IAM, AWS SSO | Azure AD / Entra ID | Google Cloud IAM |
| HIPAA BAA | Yes (included in AWS BAA) | Yes (Microsoft Azure BAA) | Yes (Google Cloud BAA) |
| Data residency | By AWS region | By Azure region | By GCP region |
| Training opt-out | Yes (Bedrock doesn't train on inputs by default) | Yes (Azure OpenAI) | Yes (Vertex AI) |
| Content filtering | Bedrock Guardrails | Azure Content Safety | Vertex AI Safety Filters |
| Observability | CloudWatch, CloudTrail | Azure Monitor, Azure Log Analytics | Cloud Monitoring, Cloud Logging |
| Batch inference | Bedrock Batch API | Azure Batch Deployments | Vertex AI Batch Prediction |
| Fine-tuning | Bedrock Fine-tuning (select models) | Azure OpenAI Fine-tuning | Vertex AI Fine-tuning |
| Agent/orchestration | Bedrock Agents | Azure AI Foundry / Prompt Flow | Vertex AI Agent Builder |
| SLA | 99.9% (varies by service) | 99.9% (varies by tier) | 99.9% (varies by service) |
Components
AWS Bedrock
Bedrock is Amazon's managed AI platform providing access to a multi-model catalog through a unified API. Its primary differentiator is model breadth — no other platform provides simultaneous access to Anthropic Claude, Meta Llama, Cohere, and Amazon's own Titan models.
import boto3
import json


def invoke_claude_via_bedrock(
    prompt: str,
    model_id: str = "anthropic.claude-sonnet-4-6-20250219-v1:0",  # verify current IDs
    max_tokens: int = 1024,
    region: str = "us-east-1"
) -> str:
    """
    Invoke Claude via AWS Bedrock using boto3.
    Uses IAM-based authentication — no API key required.
    Educational example.
    """
    client = boto3.client(
        "bedrock-runtime",
        region_name=region
    )
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    })
    response = client.invoke_model(
        modelId=model_id,
        body=body,
        contentType="application/json",
        accept="application/json"
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]

# Bedrock Guardrails: content filtering configuration
# Apply to sensitive use cases requiring output filtering
def create_bedrock_guardrail(client) -> str:
    """
    Create a content guardrail for clinical AI applications. Educational example.
    `client` must be the control-plane client, boto3.client("bedrock"),
    not the "bedrock-runtime" client used for inference.
    """
    response = client.create_guardrail(
        name="clinical-ai-guardrail",
        description="Guardrail for clinical documentation AI",
        topicPolicyConfig={
            "topicsConfig": [
                {
                    "name": "clinical-diagnosis",
                    "definition": "Providing specific medical diagnoses or treatment recommendations",
                    "type": "DENY"
                }
            ]
        },
        sensitiveInformationPolicyConfig={
            "piiEntitiesConfig": [
                {"type": "EMAIL", "action": "ANONYMIZE"},
                {"type": "PHONE", "action": "ANONYMIZE"},
                # Note: medical-specific PHI identifiers require custom regex
            ]
        },
        # Required by the API: messages returned when input or output is blocked
        blockedInputMessaging="This request cannot be processed.",
        blockedOutputsMessaging="The response was blocked by policy.",
    )
    return response["guardrailId"]

Azure OpenAI Service
Azure OpenAI provides access to OpenAI's frontier models (GPT-4o, o1, o3) through Azure's enterprise cloud platform. Its primary differentiators are: exclusive access to the GPT-4 and o1 model families in the enterprise cloud context, and native integration with the Microsoft enterprise ecosystem (Azure AD, Azure Monitor, Microsoft 365, Power Platform).
from openai import AzureOpenAI


def invoke_gpt4_via_azure(
    prompt: str,
    deployment_name: str = "gpt-4o",  # deployment name configured in Azure
    max_tokens: int = 1024,
) -> str:
    """
    Invoke GPT-4 via Azure OpenAI Service.
    This example uses an API key for brevity; enterprise deployments should
    use Azure AD (Entra ID) / managed identity authentication instead.
    Educational example.
    """
    # Production: use DefaultAzureCredential for managed identity auth
    # from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    # token_provider = get_bearer_token_provider(DefaultAzureCredential(), "...")
    client = AzureOpenAI(
        azure_endpoint="https://your-resource.openai.azure.com/",
        api_version="2024-02-01",  # verify current API version
        api_key="your-azure-api-key"  # use managed identity in production
    )
    response = client.chat.completions.create(
        model=deployment_name,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0.1
    )
    return response.choices[0].message.content

Google Vertex AI
Vertex AI is Google's unified ML platform, integrating LLM inference with MLOps, data pipeline tooling, and the Google Cloud data stack (BigQuery, Dataflow, Cloud Storage). Its primary differentiator is native integration with BigQuery and the analytics data layer — making it the natural choice for AI applications that need to query large structured datasets alongside LLM inference.
import vertexai
from vertexai.generative_models import GenerativeModel


def invoke_gemini_via_vertex(
    prompt: str,
    model_name: str = "gemini-1.5-pro-002",  # verify current model IDs
    project_id: str = "your-gcp-project",
    location: str = "us-central1",
) -> str:
    """
    Invoke Gemini via Google Vertex AI.
    Educational example.
    """
    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(model_name)
    response = model.generate_content(
        prompt,
        generation_config={
            "max_output_tokens": 1024,
            "temperature": 0.1,
        }
    )
    return response.text

Implementation Patterns
Platform Abstraction via AI Gateway
The most important architectural pattern for multi-cloud AI platform deployment is abstracting platform-specific APIs behind a common interface. This preserves optionality: switching from Azure OpenAI to AWS Bedrock requires changing gateway configuration, not rewriting application code.
# LiteLLM configuration demonstrating multi-platform abstraction
# All platforms accessible through a single API interface
# Python dict mirroring the structure of litellm_config.yaml (educational example)
LITELLM_MODEL_CONFIG = {
    "model_list": [
        # Claude via Anthropic API (direct)
        {
            "model_name": "claude-premium",
            "litellm_params": {
                "model": "anthropic/claude-opus-4-8",  # verify current model ID
                "api_key": "os.environ/ANTHROPIC_API_KEY"
            }
        },
        # Claude via AWS Bedrock
        # (same model_name as above, so requests are balanced across both deployments)
        {
            "model_name": "claude-premium",
            "litellm_params": {
                "model": "bedrock/anthropic.claude-opus-4-8-...",  # verify current ID
                "aws_region_name": "us-east-1"
            }
        },
        # GPT-4 via Azure OpenAI
        {
            "model_name": "gpt4-enterprise",
            "litellm_params": {
                "model": "azure/gpt-4o",
                "api_base": "https://your-resource.openai.azure.com/",
                "api_version": "2024-02-01",
                "api_key": "os.environ/AZURE_OPENAI_API_KEY"
            }
        },
        # Self-hosted via vLLM (fallback / cost reduction)
        {
            "model_name": "llm-standard",
            "litellm_params": {
                "model": "openai/llama-3.1-70b",
                "api_base": "http://vllm-server:8000/v1",
                "api_key": "os.environ/VLLM_API_KEY"
            }
        }
    ],
    # Routing: send each request to the least-busy deployment of the requested
    # model; fall back to another model group if the call fails (e.g. rate-limited)
    "router_settings": {
        "routing_strategy": "least-busy",
        "fallbacks": [
            {"claude-premium": ["gpt4-enterprise"]},
            {"gpt4-enterprise": ["llm-standard"]}
        ]
    }
}

Enterprise Considerations
Existing cloud commitment: Organizations with large existing cloud commitments (AWS Enterprise Discount Program, Azure Enterprise Agreement, GCP Committed Use Discounts) should evaluate whether AI platform usage can be applied against those commitments. AWS Bedrock usage applies to AWS consolidated billing; Azure OpenAI applies to Azure Enterprise Agreements.
Regional availability: Not all models are available in all cloud regions. HIPAA-eligible regions may be a subset of all regions. Validate that the required models are available in the required regions before committing to a platform.
Quota limits: Cloud AI platforms impose default quota limits (tokens per minute, requests per minute) that may be insufficient for enterprise production traffic. Request quota increases before production launch — they can take weeks to process.
Data handling verification: Verify current data handling terms directly with each provider before finalizing a compliance assessment. BAA terms, training data opt-out, and data retention policies change and the status described here may not reflect current terms.
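To make the quota point above concrete, the following is a minimal sizing sketch for a quota increase request. The traffic numbers, the 1.5x headroom factor, and the `TrafficProfile` and `required_quota` names are illustrative assumptions, not values from any provider. Each platform counts tokens against quota in its own way, so treat the result as a first estimate to check against the provider's documentation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficProfile:
    """Hypothetical peak-traffic assumptions for one workload."""
    peak_requests_per_minute: int
    avg_input_tokens: int
    avg_output_tokens: int


def required_quota(profile: TrafficProfile, headroom: float = 1.5) -> dict[str, int]:
    """Estimate the RPM and TPM quota to request, including headroom."""
    tokens_per_request = profile.avg_input_tokens + profile.avg_output_tokens
    rpm = math.ceil(profile.peak_requests_per_minute * headroom)
    tpm = math.ceil(profile.peak_requests_per_minute * tokens_per_request * headroom)
    return {"requests_per_minute": rpm, "tokens_per_minute": tpm}


def quota_gap(required: dict[str, int], default: dict[str, int]) -> dict[str, int]:
    """Return how far each default limit falls short (0 if already sufficient)."""
    return {key: max(0, required[key] - default.get(key, 0)) for key in required}


if __name__ == "__main__":
    # Assumed workload: 200 requests/min at peak, 1,500 input + 500 output tokens each
    profile = TrafficProfile(
        peak_requests_per_minute=200,
        avg_input_tokens=1500,
        avg_output_tokens=500,
    )
    need = required_quota(profile)
    print(need)  # {'requests_per_minute': 300, 'tokens_per_minute': 600000}

    # Assumed default limits, for illustration only
    defaults = {"requests_per_minute": 100, "tokens_per_minute": 200_000}
    print(quota_gap(need, defaults))  # {'requests_per_minute': 200, 'tokens_per_minute': 400000}
```

A non-zero gap in either dimension is the figure to put in the increase request, submitted well ahead of launch.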
Security Considerations
Private networking for PHI: All three platforms support private networking (VPC Endpoint, Private Link, VPC Service Controls) that routes inference traffic through the cloud provider's private network rather than the public internet. For healthcare deployments where PHI is in the inference payload, private networking is a required architecture element.
Training data opt-out: All three platforms provide an opt-out from using inference inputs for model training. Verify this is configured and documented in the BAA or data processing agreement before processing PHI.
Content filtering: Bedrock Guardrails, Azure Content Safety, and Vertex Safety Filters provide output filtering for harmful or inappropriate content. For clinical AI, content filtering must be calibrated carefully — clinical content includes descriptions of harm (injuries, medications, diagnoses) that general-purpose safety filters may incorrectly block.
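One way to approach the calibration described above is to measure how often a filter blocks legitimate clinical text. The sketch below assumes a small reviewer-labeled sample set and wraps the filter as a plain callable. `naive_keyword_filter` is a deliberately crude stand-in, not the behavior of any platform's filter, and the sample sentences are synthetic.

```python
from typing import Callable, Iterable, NamedTuple


class Sample(NamedTuple):
    text: str
    should_block: bool  # label assigned by clinical reviewers (hypothetical dataset)


def false_positive_rate(
    is_blocked: Callable[[str], bool],
    samples: Iterable[Sample],
) -> float:
    """Share of legitimate samples that the filter blocks."""
    legitimate = [s for s in samples if not s.should_block]
    if not legitimate:
        raise ValueError("need at least one legitimate sample")
    blocked = sum(1 for s in legitimate if is_blocked(s.text))
    return blocked / len(legitimate)


def naive_keyword_filter(text: str) -> bool:
    """Stand-in for a platform filter: blocks on harm-related keywords."""
    return any(word in text.lower() for word in ("overdose", "laceration"))


SAMPLES = [
    Sample("Patient presented with a 3 cm laceration on the left forearm.", False),
    Sample("Reviewed acetaminophen overdose thresholds with the patient.", False),
    Sample("Follow-up visit scheduled in two weeks.", False),
    Sample("Routine blood pressure check, no concerns.", False),
]

if __name__ == "__main__":
    rate = false_positive_rate(naive_keyword_filter, SAMPLES)
    print(f"False positive rate: {rate:.0%}")  # False positive rate: 50%
```

In practice `is_blocked` would wrap a call to the platform's filter, and the acceptable rate is a judgment for the clinical and compliance owners rather than a fixed threshold.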
Healthcare Example
Educational Example — Not intended for clinical use.
A Reference Healthcare Organization with an existing Azure Enterprise Agreement evaluates cloud AI platform selection for its HMS AI Platform. The evaluation criteria:
| Criterion | Azure OpenAI | AWS Bedrock | Decision Basis |
|---|---|---|---|
| Existing cloud relationship | Primary cloud | Secondary cloud | Azure strongly preferred |
| HIPAA BAA | ✅ Microsoft HIPAA BAA | ✅ AWS HIPAA BAA | Tie |
| Private networking | ✅ Azure Private Link (already configured) | Requires new VPC endpoint | Azure easier |
| Model access (Claude) | ❌ Not available | ✅ Available | Bedrock required for Claude |
| Epic FHIR integration | ✅ Azure Health Data Services (existing) | Requires setup | Azure preferred |
| GPT-4o access | ✅ Exclusive | ❌ Not available | Depends on model strategy |
Decision: Primary platform = Azure OpenAI (for GPT-4o, existing EA, Azure ecosystem integration). Secondary platform = AWS Bedrock (for Claude access when required by specific use cases). All inference routed through LiteLLM gateway to enable model switching without application code changes.
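An evaluation like the one above can be expressed as a weighted decision matrix so the reasoning is explicit and repeatable. The sketch below is illustrative only: the weights (integers summing to 100) and the 0-5 scores are hypothetical values chosen to mirror the table, not figures from the evaluation itself.

```python
from typing import Dict, List, Tuple

# Hypothetical weights (percent, summing to 100); adjust to the organization's priorities
WEIGHTS: Dict[str, int] = {
    "existing_cloud_relationship": 30,
    "hipaa_baa": 20,
    "private_networking": 15,
    "required_model_access": 20,
    "fhir_integration": 15,
}

# Hypothetical 0-5 scores per platform and criterion
SCORES: Dict[str, Dict[str, int]] = {
    "azure_openai": {
        "existing_cloud_relationship": 5,
        "hipaa_baa": 5,
        "private_networking": 5,
        "required_model_access": 3,
        "fhir_integration": 5,
    },
    "aws_bedrock": {
        "existing_cloud_relationship": 3,
        "hipaa_baa": 5,
        "private_networking": 3,
        "required_model_access": 3,
        "fhir_integration": 2,
    },
}


def weighted_score(scores: Dict[str, int], weights: Dict[str, int]) -> int:
    """Sum of weight * score over all criteria; fails loudly on missing criteria."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_platforms(
    all_scores: Dict[str, Dict[str, int]],
    weights: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Return (platform, total) pairs sorted from highest to lowest total."""
    totals = [(name, weighted_score(s, weights)) for name, s in all_scores.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for platform, total in rank_platforms(SCORES, WEIGHTS):
        print(f"{platform}: {total}")
    # azure_openai: 460
    # aws_bedrock: 325
```

A single total hides hard requirements: a use case that needs Claude still requires a platform that offers it, whatever the totals say, which is why the decision above keeps a secondary platform.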
Common Mistakes
1. Assuming model availability before verifying. Model availability on each platform changes frequently. Verify that the specific model version required is available in the required region before designing the architecture.
2. Building platform-specific integration without an abstraction layer. Code that calls azure_openai_client.chat.completions.create() directly cannot switch to Bedrock without rewriting. All inference calls should go through a platform-agnostic interface.
3. Not requesting quota increases before production launch. Default quotas are often insufficient for enterprise production traffic. Submit quota increase requests weeks before the target launch date.
4. Treating HIPAA BAA as a static guarantee. BAA terms and scope change. Verify current BAA terms annually and upon any significant platform update.
5. Ignoring content filter calibration. Default content filters are tuned for general-purpose use, so clinical content (injuries, medications, diagnoses) can trigger false positives. Calibrate and test content filters before clinical deployment.
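For the first mistake above, a pre-design availability check can be scripted. The sketch below is for Bedrock only and assumes one control-plane client per region, created with `boto3.client("bedrock", region_name=...)`, whose `list_foundation_models()` response carries a `modelSummaries` list. `StubCatalogClient` and the model ID `example.model-v1` are hypothetical stand-ins so the example runs without AWS credentials. Appearing in the catalog does not by itself confirm that the account has been granted access to the model.

```python
from typing import Any, Dict, List


def regions_missing_model(model_id: str, clients_by_region: Dict[str, Any]) -> List[str]:
    """Return the regions whose Bedrock catalog does not list model_id."""
    missing = []
    for region, client in clients_by_region.items():
        summaries = client.list_foundation_models().get("modelSummaries", [])
        available = {summary["modelId"] for summary in summaries}
        if model_id not in available:
            missing.append(region)
    return missing


class StubCatalogClient:
    """Hypothetical stand-in for boto3.client("bedrock", region_name=...)."""

    def __init__(self, model_ids: List[str]) -> None:
        self._model_ids = model_ids

    def list_foundation_models(self) -> Dict[str, Any]:
        return {"modelSummaries": [{"modelId": m} for m in self._model_ids]}


if __name__ == "__main__":
    # With real credentials, build the mapping as:
    #   {r: boto3.client("bedrock", region_name=r) for r in ("us-east-1", "eu-west-1")}
    clients = {
        "us-east-1": StubCatalogClient(["example.model-v1", "example.model-v2"]),
        "eu-west-1": StubCatalogClient(["example.model-v2"]),
    }
    print(regions_missing_model("example.model-v1", clients))  # ['eu-west-1']
```

Azure OpenAI and Vertex AI need their own equivalent checks; this sketch does not cover them.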
Best Practices
- Implement an AI gateway abstraction (LiteLLM or equivalent) before committing to any single platform — preserves optionality
- Align platform selection with existing cloud provider relationship when technically equivalent
- Enable private networking for any inference involving PHI
- Verify training data opt-out is configured and documented before processing PHI
- Request quota increases 4–6 weeks before production launch
- Test content filters against a representative clinical content sample before deployment
- Review platform data handling terms annually — they change
Trade-offs
Model selection vs. ecosystem integration: The best model for a specific use case may not be available on the organization's preferred cloud platform. AWS Bedrock offers the broadest model selection (multi-vendor); Azure OpenAI offers exclusive access to the GPT-4/o1 family; Vertex offers native BigQuery integration.
Managed features vs. lock-in: Platform-specific features (Bedrock Agents, Azure AI Foundry, Vertex Agent Builder) reduce development effort but create deeper lock-in. Generic infrastructure (LiteLLM gateway + vLLM) preserves optionality at higher operational cost.
Interview Questions
Q: An enterprise client wants to use both Claude and GPT-4 in different parts of their AI platform. How would you architect this without creating application-level dependencies on each platform?
Category: Architecture | Difficulty: Senior | Role: AI Architect
Answer Framework:
The answer is an AI gateway pattern — a layer that translates a common API format (typically OpenAI-compatible) into platform-specific API calls. All application code calls the gateway using the common format; the gateway routes to the appropriate platform based on the model name requested.
LiteLLM is a widely used open-source implementation. The gateway configuration maps logical model names ("claude-premium", "gpt4-enterprise") to platform-specific endpoints. Application code references only the logical name. Switching from Claude via the Anthropic API to Claude via Bedrock, or from GPT-4 on Azure to GPT-4 on the OpenAI API, requires only gateway configuration changes.
The pattern has additional benefits: centralized cost attribution, unified audit logging, failover routing (if the Claude API is rate-limited, fall back to an alternative), and automatic attachment of common request metadata (department attribution, audit trail).
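The routing idea in this answer can be shown with a small in-process sketch. This is not LiteLLM's API: `Gateway`, `RateLimited`, and the stub backends are hypothetical names used only to show logical model names resolving to backends with an ordered fallback list.

```python
from typing import Callable, Dict, List, Sequence

Backend = Callable[[str], str]  # takes a prompt, returns completion text


class RateLimited(Exception):
    """Raised by a backend when the platform rejects the call for quota reasons."""


class Gateway:
    """Maps logical model names to backends and applies fallbacks on rate limits."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}
        self._fallbacks: Dict[str, List[str]] = {}

    def register(self, logical_name: str, backend: Backend,
                 fallbacks: Sequence[str] = ()) -> None:
        self._backends[logical_name] = backend
        self._fallbacks[logical_name] = list(fallbacks)

    def complete(self, logical_name: str, prompt: str) -> str:
        # Try the requested model first, then each fallback in order
        for name in [logical_name, *self._fallbacks.get(logical_name, [])]:
            try:
                return self._backends[name](prompt)
            except RateLimited:
                continue
        raise RuntimeError(f"all backends exhausted for {logical_name!r}")


def claude_stub(prompt: str) -> str:
    """Stub that simulates a rate-limited platform."""
    raise RateLimited()


def gpt_stub(prompt: str) -> str:
    """Stub that simulates a healthy platform."""
    return f"[gpt4-enterprise] {prompt}"


if __name__ == "__main__":
    gateway = Gateway()
    gateway.register("claude-premium", claude_stub, fallbacks=["gpt4-enterprise"])
    gateway.register("gpt4-enterprise", gpt_stub)
    # Application code names only the logical model
    print(gateway.complete("claude-premium", "Summarize the note."))
    # [gpt4-enterprise] Summarize the note.
```

Application code depends only on `complete()` and a logical name, so swapping the platform behind "claude-premium" is a change to registration, not to callers.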
Key Points to Hit:
- AI gateway as the abstraction layer
- LiteLLM as the practical implementation
- Logical model names vs. platform-specific model IDs
- Benefits beyond model switching: cost, logging, failover
Key Takeaways
- AWS Bedrock offers the broadest multi-model catalog; Azure OpenAI offers exclusive GPT-4/o1 access; Google Vertex AI offers the deepest BigQuery and analytics integration
- All three platforms provide HIPAA BAAs and private networking for PHI-in-cloud deployments
- The AI gateway pattern (LiteLLM) is the architectural mechanism that preserves cross-platform optionality
- Align cloud AI platform selection with existing cloud provider relationship when technically equivalent
- Verify model availability in required regions before architecture commitment
- Request quota increases 4–6 weeks before production launch, since default quotas are often insufficient for enterprise production
- Training data opt-out must be verified and documented before processing PHI
Glossary
VPC Endpoint (AWS) / Private Link (Azure) / VPC Service Controls (GCP): Private networking mechanisms that route cloud service traffic through the cloud provider's internal network rather than the public internet — required for PHI data paths.
Quota: A platform-imposed limit on the rate of API requests (requests per minute, tokens per minute). Default quotas are set conservatively and must be increased for production workloads.
Content Filter / Guardrail: A platform-level mechanism that screens LLM inputs and outputs for harmful, inappropriate, or policy-violating content before they reach the model or are returned to the application.
Enterprise Agreement (EA): A volume licensing contract with a cloud provider that provides discounted pricing in exchange for committed spend — often determines which cloud AI platform is economically preferred.
Further Reading
- LLM Serving Infrastructure — Self-hosted alternative to managed platforms
- AI Platform Architecture — How cloud AI platforms fit into the enterprise AI platform design
- Cost Management — Token economics and cost optimization across cloud platforms
- HIPAA and AI — Healthcare-specific requirements for cloud AI platform selection