Orchestration and Workflow Automation for AI

Core Architecture

Common Mistakes

1. Using Airflow for long-running workflows. A standard Airflow task holds a worker slot for its entire duration and is subject to task timeouts, so a workflow waiting for physician approval can tie up a slot for days. Use Temporal for workflows with human-in-the-loop steps.

2. Not setting GPU resource limits. Without explicit GPU limits in Kubernetes, a single workload can monopolize all GPU capacity. Always declare nvidia.com/gpu under limits; if you also set requests, Kubernetes requires the request to equal the limit.

3. Over-broad retry policies on LLM activities. Retrying a non-idempotent LLM activity 10 times may produce 10 different outputs or consume significant API budget. Retry LLM activities on transient network errors, retry rate-limit responses only with backoff, and do not retry model errors.

4. Scheduling embedding and inference jobs at the same time. Nightly knowledge base refresh (embedding-intensive) and peak clinical usage (inference-intensive) should not compete for the same GPU resources. Schedule batch embedding jobs during off-peak inference hours.
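The GPU guidance in item 2 above can be sketched as a small manifest helper. This is a minimal illustration in plain Python, not a Kubernetes client call: the function names and the image name are hypothetical, and the dictionary mirrors only the `resources` portion of a container spec.

```python
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_container_spec(name: str, image: str, gpus: int) -> dict:
    """Build a container spec fragment that pins the GPU count explicitly.

    Requests and limits are set to the same value, because Kubernetes
    requires them to match for extended resources such as GPUs.
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    count = str(gpus)
    return {
        "name": name,
        "image": image,
        "resources": {
            "requests": {GPU_RESOURCE: count},
            "limits": {GPU_RESOURCE: count},
        },
    }


def has_explicit_gpu_bounds(container: dict) -> bool:
    """Return True only if the container sets matching GPU requests and limits."""
    resources = container.get("resources", {})
    requested = resources.get("requests", {}).get(GPU_RESOURCE)
    limited = resources.get("limits", {}).get(GPU_RESOURCE)
    return requested is not None and requested == limited
```

A check like `has_explicit_gpu_bounds` could run in CI or an admission step to reject GPU workloads that omit the bounds; where that check lives is a deployment choice this sketch does not prescribe.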

Best Practices

  • Use Airflow for scheduled, partitioned data pipelines; use Temporal for durable, long-running AI workflows with human-in-the-loop steps
  • Set explicit resource requests and limits for GPU workloads in Kubernetes
  • Use Temporal's retry policy to control retry semantics per activity type; don't apply a single global retry policy
  • Schedule batch embedding jobs during off-peak inference hours to avoid GPU contention
  • Implement quality evaluation as the final step in every knowledge base refresh pipeline
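The per-activity retry guidance above can be illustrated with a small stand-alone sketch. It does not use the Temporal SDK; Temporal's own retry policy exposes comparable settings (maximum attempts, initial interval, backoff coefficient, non-retryable error types). The exception classes, activity-type names, and numeric values below are assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass


class TransientNetworkError(Exception):
    """Connection reset, timeout, or similar transient failure."""


class RateLimitError(Exception):
    """Provider asked the caller to slow down."""


class ModelError(Exception):
    """The model rejected or could not process the request."""


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    base_delay: float   # seconds before the first retry
    retryable: tuple    # exception types that may be retried


# One policy per activity type rather than a single global policy.
POLICIES = {
    "llm_generation": RetryPolicy(3, 2.0, (TransientNetworkError, RateLimitError)),
    "embedding_batch": RetryPolicy(6, 1.0, (TransientNetworkError, RateLimitError)),
}


def run_activity(kind: str, fn, sleep=time.sleep):
    """Run fn under the policy for its activity type, with exponential backoff."""
    policy = POLICIES[kind]
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn()
        except policy.retryable:
            if attempt == policy.max_attempts:
                raise
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(policy.base_delay * 2 ** (attempt - 1))
        # Any other exception (for example ModelError) propagates immediately.
```

The LLM generation policy is deliberately tighter than the embedding policy, since each retried generation call may produce a different output and spends API budget.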

Trade-offs

Orchestrator         | Strengths                             | Weaknesses                                      | Best For
Airflow              | Mature, widely deployed, rich UI      | Workers block during task execution             | Scheduled batch pipelines
Temporal             | Durable execution, fine-grained retry | Operational complexity, infrastructure required | Long-running, event-driven AI workflows
Kubernetes Jobs      | Native GPU support, simple for batch  | No workflow logic                               | One-shot batch inference, fine-tuning runs
Step Functions (AWS) | Managed, integrates with AWS services | Vendor-locked, limited duration                 | AWS-native AI pipelines

Interview Questions

Q: A clinical AI workflow has a step where a physician reviews an AI-generated document. The review may take anywhere from 10 minutes to 3 days. How would you design the orchestration for this workflow?

Category: System Design | Difficulty: Senior | Role: AI Architect

Answer Framework:

This is a durable workflow problem. The key constraint is that the workflow must remain paused for up to 3 days while consuming no compute resources — the physician review step is an external, human-triggered event, not a blocking computation.

Wrong approach: Running a sleep-based polling loop (for example, time.sleep() in Python) inside an Airflow task, or re-invoking a Lambda function every minute for 3 days to check whether the review is done. Both consume compute resources continuously and produce no audit trail.

Correct approach: Temporal's durable execution model is built for exactly this pattern. The workflow suspends after creating the physician review task, persisting its state in the Temporal service. When the physician completes the review, a webhook handler or polling activity signals the workflow, which resumes from its exact suspension point. The full execution history, including suspension time, resume time, and physician identity, is recorded in Temporal's history log.
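To make the suspend-and-resume idea concrete, here is a toy event-sourced workflow in plain Python. It is not Temporal and does not use the Temporal SDK; it only illustrates the principle that workflow state lives in a persisted history, so no process has to stay alive while the review is pending. The class name, event names, and JSON file store are all hypothetical.

```python
import json
from pathlib import Path


class ReviewWorkflow:
    """Toy workflow whose state is derived entirely from a persisted event history."""

    def __init__(self, store: Path):
        self.store = store
        # Reload the history if an earlier process already started this workflow.
        self.history = json.loads(store.read_text()) if store.exists() else []

    def _record(self, event_type: str, **data) -> None:
        self.history.append({"type": event_type, **data})
        self.store.write_text(json.dumps(self.history))

    @property
    def status(self) -> str:
        types = [event["type"] for event in self.history]
        if "review_completed" in types:
            return "completed"
        if "review_requested" in types:
            return "awaiting_review"
        return "new"

    def start(self, document_id: str, at: str) -> None:
        """Request the review, then return; nothing keeps running afterwards."""
        if self.status == "new":
            self._record("review_requested", document_id=document_id, at=at)

    def signal_review_completed(self, physician_id: str, approved: bool, at: str) -> None:
        """Called by a webhook handler (assumed) when the physician finishes."""
        if self.status != "awaiting_review":
            raise RuntimeError(f"cannot complete review from state {self.status!r}")
        self._record("review_completed", physician_id=physician_id,
                     approved=approved, at=at)
```

A process can call `start`, exit, and a different process can later construct `ReviewWorkflow` from the same store and call `signal_review_completed`. The persisted history doubles as the audit record of who approved what and when, which is the property the answer relies on.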

Key Points to Hit:

  • Durable execution vs. stateful polling: the former suspends without consuming resources
  • Audit trail requirement for clinical AI compliance
  • Timeout policy: explicit escalation at 24h and 72h, not silent expiry at 7 days
  • Separation of concerns: physician review task is created in the EHR workflow system, not managed by Temporal
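The timeout policy in the key points can be expressed as data rather than scattered conditionals. The sketch below is one possible shape: the 24h and 72h thresholds come from the text above, while the action names are hypothetical placeholders.

```python
from datetime import timedelta

# Thresholds from the timeout policy; action names are illustrative only.
ESCALATIONS = [
    (timedelta(hours=24), "remind_physician"),
    (timedelta(hours=72), "escalate_to_supervisor"),
]


def due_escalations(elapsed: timedelta, already_fired: set) -> list:
    """Return the escalation actions that are due and have not yet fired."""
    return [
        action
        for threshold, action in ESCALATIONS
        if elapsed >= threshold and action not in already_fired
    ]
```

In a durable workflow, a function like this would typically be evaluated when a timer fires rather than in a polling loop, with each fired action recorded so it is never silently skipped or repeated.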

Key Takeaways

  • Airflow is appropriate for scheduled, batch-oriented AI data pipelines; Temporal is appropriate for durable, event-driven AI workflows
  • GPU workloads require explicit resource limits in Kubernetes to prevent resource monopolization
  • Human-in-the-loop steps require durable workflow orchestration — polling loops in Airflow workers are an antipattern
  • Clinical AI workflows require complete audit trails; Temporal's execution history supports this requirement, provided history retention or archival covers the required audit period
  • Quality evaluation (golden query evaluation) should be a first-class step in every knowledge base refresh pipeline
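A golden query evaluation step can be as simple as a gate that fails the refresh when retrieval quality drops. The sketch below assumes recall@k as the metric and a threshold of 0.8; both are illustrative choices, not something the text prescribes, and `search` stands in for whatever retrieval function the pipeline exposes.

```python
def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of the relevant document IDs found in the top-k results."""
    if not relevant:
        raise ValueError("each golden query needs at least one relevant document")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def evaluate_refresh(golden: dict, search, k: int = 5, threshold: float = 0.8):
    """Score the refreshed index against golden queries.

    golden maps each query string to the set of document IDs that should
    be retrieved for it. Returns (passed, mean_recall).
    """
    if not golden:
        raise ValueError("golden query set is empty")
    scores = [recall_at_k(search(query), relevant, k)
              for query, relevant in golden.items()]
    mean_recall = sum(scores) / len(scores)
    return mean_recall >= threshold, mean_recall
```

Running this as the last pipeline step means a refresh that degrades retrieval can be blocked from promotion instead of being discovered by users.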