Cloud Architecture

Multi-Cloud Healthcare Data Architecture: Patterns, Risks, and Best Practices

Healthcare organizations end up multi-cloud for reasons that are rarely strategic. Here is how to architect data infrastructure across clouds without creating a compliance and operational nightmare.

mdatool Team·April 21, 2026·8 min read
multi-cloud · healthcare data architecture · cloud architecture · data residency · security

Introduction

Healthcare organizations do not usually choose multi-cloud — they arrive there. A payer acquires a health system that runs on AWS while the parent organization is on Azure. A provider organization adopts Epic on Azure but runs its data warehouse on Snowflake on GCP. A Medicare Advantage plan inherits Salesforce Health Cloud (running on AWS) alongside a homegrown Databricks environment on Azure. Three years later, you have a multi-cloud architecture that nobody designed.

The question is not whether to be multi-cloud — most large healthcare organizations already are. The question is how to govern it, secure it, and build data pipelines across it without creating a fragmented, ungovernable PHI disaster.


Why Healthcare Goes Multi-Cloud

The most common drivers are not strategic decisions — they are tactical ones:

  • Acquisition: The acquired organization runs a different cloud
  • Vendor lock-in avoidance: Procurement mandates preventing single-vendor dependency
  • Best-of-breed services: AWS Comprehend Medical for NLP, GCP Healthcare API for [FHIR](/terms/FHIR) streaming, Azure for Epic integration
  • SaaS footprint: Salesforce, Workday, ServiceNow, and other SaaS platforms run on specific clouds and create implicit multi-cloud data flows
  • Geographic compliance: Some international healthcare data residency rules require specific cloud regions that only one provider covers

Common Multi-Cloud Patterns

Pattern 1: Active-Active (Symmetric Workloads)

Both clouds run production workloads simultaneously, with data synchronized between them. This provides the highest availability and eliminates single-cloud dependency.

Healthcare use case: A large regional health system runs Epic on Azure in the Midwest but acquired a coastal hospital running Cerner on AWS. Both produce clinical data that must be merged into a single enterprise analytics layer.

Architecture: Use a cloud-neutral data platform (Snowflake or Databricks, with dbt managing transformations) as the analytics tier, with each cloud feeding into it. The active-active pattern works best when you accept that the analytics layer is the synchronization point — not real-time data replication between cloud storage tiers.

Risk: Data consistency across clouds during pipeline failures. Active-active requires idempotent pipelines and careful conflict resolution for shared data (the patient master index in particular).
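The idempotency and conflict-resolution requirement can be sketched as a last-writer-wins merge keyed on a patient identifier. This is a minimal illustration, not a production master patient index; the field names (`patient_id`, `updated_at`, `source_cloud`) are assumptions for the example.

```python
from datetime import datetime, timezone

def merge_patient_records(existing: dict, incoming: list[dict]) -> dict:
    """Idempotently merge incoming records into a patient master index.

    Conflict resolution: last-writer-wins on `updated_at`. Replaying the
    same batch yields the same result, which is what makes a retried
    pipeline safe in an active-active topology.
    """
    merged = dict(existing)
    for rec in incoming:
        pid = rec["patient_id"]
        current = merged.get(pid)
        if current is None or rec["updated_at"] > current["updated_at"]:
            merged[pid] = rec
    return merged

# The same patient arriving from two clouds:
aws_rec = {"patient_id": "P1", "name": "Ann Lee",
           "updated_at": datetime(2026, 4, 1, tzinfo=timezone.utc),
           "source_cloud": "aws"}
azure_rec = {"patient_id": "P1", "name": "Ann T. Lee",
             "updated_at": datetime(2026, 4, 2, tzinfo=timezone.utc),
             "source_cloud": "azure"}

index = merge_patient_records({}, [aws_rec, azure_rec])
# Replaying the identical batch changes nothing (idempotency):
assert merge_patient_records(index, [aws_rec, azure_rec]) == index
```

The same property — replay safety — is what lets a failed pipeline run simply be re-executed rather than manually reconciled.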

Pattern 2: Primary + DR (Asymmetric)

One cloud runs primary production workloads. The second runs warm standby or cold DR.

Healthcare use case: Primary claims processing on AWS; DR replica on Azure for business continuity. HIPAA requires contingency plans and disaster recovery capability — this pattern satisfies that requirement with geographic and vendor diversity.

Architecture: Replicate PHI data stores across clouds using encrypted, HIPAA-compliant replication. In Snowflake, cross-cloud replication handles this natively. For S3-to-Azure Blob replication, use Azure Data Factory or Rclone on an encrypted channel.

Pattern 3: Workload-Specific (Best-of-Breed)

Different workloads live on different clouds based on service fit, not accident.

Healthcare use case:

  • AWS: Claims ingestion and processing (mature Glue/Step Functions ecosystem)
  • GCP: FHIR analytics (BigQuery sync from Cloud Healthcare API)
  • Azure: Epic integration and SMART on FHIR apps (Microsoft's preferred integration path)

This is the most intellectually honest multi-cloud architecture — deliberate choices per workload. It is also the hardest to govern and the most expensive to operate.


PHI Data Residency Risks in Multi-Cloud

Multi-cloud creates PHI data movement risks that single-cloud architectures do not have:

Cross-cloud PHI transfer: When data moves from AWS to GCP, it traverses the public internet unless you have established private interconnects (AWS Direct Connect + Google Cloud Interconnect, or through a co-location facility). Unencrypted PHI in transit across clouds is a HIPAA violation.

Control plane exposure: Multi-cloud governance tools (Prisma Cloud, Wiz, Lacework) that scan your infrastructure across clouds require read access to both environments. That read access must be scoped carefully — a misconfigured governance scanner can become a PHI exposure vector.

BAA coverage gaps: Verify that your BAAs with AWS, Azure, and GCP each cover the specific services you use — most cloud BAAs enumerate covered services, and a BAA with AWS does not cover GCP workloads. Maintain a BAA inventory that maps each vendor to the specific services and data types in scope.
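The BAA inventory idea can be made concrete as a gap check: compare the services actually in use per cloud against the services each BAA covers. The service names below are illustrative, not a statement of any vendor's actual BAA scope.

```python
def baa_gaps(in_use: dict[str, set[str]],
             baa_covered: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per cloud, services handling data that lack BAA coverage."""
    return {
        cloud: services - baa_covered.get(cloud, set())
        for cloud, services in in_use.items()
        if services - baa_covered.get(cloud, set())
    }

in_use = {
    "aws":   {"s3", "glue", "comprehend-medical"},
    "gcp":   {"bigquery", "healthcare-api"},
    "azure": {"data-factory"},
}
baa_covered = {
    "aws":   {"s3", "glue", "comprehend-medical"},
    "gcp":   {"bigquery"},          # healthcare-api missing from the BAA exhibit
    "azure": {"data-factory"},
}

print(baa_gaps(in_use, baa_covered))  # {'gcp': {'healthcare-api'}}
```

Running a check like this against your service inventory each quarter turns "living document" from an aspiration into a diff you can review.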


Security Architecture for Multi-Cloud PHI

Encryption Key Strategy

Use a hardware security module (HSM) that operates outside both clouds as the root of trust for encryption keys. AWS CloudHSM, Azure Dedicated HSM, and GCP Cloud HSM are all cloud-specific — for true multi-cloud key management, use a third-party HSM (Thales, Entrust) or HashiCorp Vault.
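One way to picture the key hierarchy: a single root key held in the external HSM, with per-cloud key-encryption keys derived from it, so that neither cloud's native KMS is the root of trust. The sketch below simulates the derivation with HMAC-SHA-256 purely for illustration; in a real deployment the root key never leaves the HSM and derivation happens inside it (for example via Vault's transit engine or a PKCS#11 call).

```python
import hashlib
import hmac
import secrets

def derive_cloud_kek(root_key: bytes, cloud: str) -> bytes:
    """Derive a per-cloud key-encryption key from the external root key.

    Illustration only: a production HSM performs this internally and
    never exposes `root_key` to application code.
    """
    return hmac.new(root_key, f"kek:{cloud}".encode(), hashlib.sha256).digest()

root = secrets.token_bytes(32)          # stands in for the HSM-held root key
aws_kek = derive_cloud_kek(root, "aws")
azure_kek = derive_cloud_kek(root, "azure")
assert aws_kek != azure_kek             # distinct key material per cloud
assert derive_cloud_kek(root, "aws") == aws_kek  # deterministic derivation
```

The point of the hierarchy is operational: rotating or revoking one cloud's KEK does not touch the other cloud, while the external root remains the single audit anchor.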

Identity Federation

A single identity provider (Okta, Microsoft Entra ID (formerly Azure AD), or Ping Identity) should federate to both clouds. Do not maintain separate IAM user pools in AWS and Azure — that is an access control disaster waiting to happen.

Network Architecture

[AWS VPC] ──── AWS Direct Connect ────┐
                                       ├── [Co-location / Private fabric]
[Azure VNet] ── ExpressRoute ──────────┤
                                       └── [GCP VPC] ── Cloud Interconnect

Private interconnects between clouds are required for PHI data movement at production scale. Public internet transfer is acceptable only for non-PHI data or small-volume test workloads.
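That rule can be expressed as a simple transfer-policy guard: PHI may only move over routes classified as private interconnect. The route names and the classification table are assumptions for the sketch; in practice the classification would come from your network inventory.

```python
# Illustrative route classification — in production, source this from
# your network inventory rather than a hard-coded table.
PRIVATE_ROUTES = {
    "direct-connect",      # AWS
    "expressroute",        # Azure
    "cloud-interconnect",  # GCP
    "colo-fabric",         # co-location private fabric
}

def transfer_allowed(route: str, contains_phi: bool) -> bool:
    """PHI requires a private interconnect; non-PHI may use any route."""
    return (route in PRIVATE_ROUTES) or not contains_phi

assert transfer_allowed("direct-connect", contains_phi=True)
assert not transfer_allowed("public-internet", contains_phi=True)
assert transfer_allowed("public-internet", contains_phi=False)  # e.g. synthetic test data
```

Embedding a guard like this in the pipeline framework, rather than relying on reviewers to remember the rule, is what makes the policy enforceable.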


Governance in a Multi-Cloud World

Schema governance becomes critical when the same data model must be maintained across cloud environments. A column naming convention that is enforced in Snowflake on GCP but not in Redshift on AWS creates downstream governance failures — your data catalog cannot reliably link the same concept across environments.

Before deploying any schema change across cloud environments, run it through the Schema Diff to verify that the DDL is consistent and that no PHI columns have been inadvertently dropped or renamed. The diff output becomes your audit evidence for the schema change.
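A minimal version of that check: diff the column sets of the same table across two environments and flag any PHI-tagged column present on one side but not the other. The table, column names, and PHI tag set are invented for the example; a real schema-diff tool would work from parsed DDL.

```python
def phi_schema_diff(env_a: dict[str, set[str]], env_b: dict[str, set[str]],
                    phi_columns: set[str]) -> dict[str, set[str]]:
    """Return PHI columns that exist in one environment's table but not the other's."""
    issues: dict[str, set[str]] = {}
    for table in env_a.keys() | env_b.keys():
        # Symmetric difference = columns on exactly one side.
        diff = (env_a.get(table, set()) ^ env_b.get(table, set())) & phi_columns
        if diff:
            issues[table] = diff
    return issues

snowflake_gcp = {"patient": {"patient_id", "birth_date", "zip_code"}}
redshift_aws = {"patient": {"patient_id", "birth_date"}}  # zip_code dropped
phi = {"birth_date", "zip_code", "ssn"}

print(phi_schema_diff(snowflake_gcp, redshift_aws, phi))  # {'patient': {'zip_code'}}
```

A non-empty result blocks the deployment and, as the article notes, the diff output itself becomes the audit evidence for the change.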


Practical Recommendations

  1. Designate one cloud as your analytics hub. Data can originate in multiple clouds, but it should consolidate into a single analytics layer; fragmented analytics across clouds costs far more to operate than a multi-cloud ingestion architecture feeding one hub.

  2. Use cloud-neutral tooling at the orchestration layer. Airflow (on any cloud), Prefect Cloud, or Dagster Cloud remove cloud-specific orchestration dependency. dbt Cloud is cloud-neutral at the transformation layer.

  3. Audit your PHI flow across clouds quarterly. Use your governance tools (Purview, BigID) to scan for PHI in unexpected locations — multi-cloud environments develop shadow data stores quickly.

  4. Treat your BAA inventory as a living document. Every new SaaS tool, every new cloud service, every new data integration is a potential BAA gap.


Key Takeaways

  • Most healthcare organizations are multi-cloud by accident. Audit the actual state before designing a future-state architecture.
  • The most sustainable multi-cloud pattern for analytics is workload-specific (best-of-breed per service) with a single consolidated analytics layer.
  • PHI data movement across clouds requires private interconnects, not public internet transfer.
  • BAA coverage must be verified per service, per cloud — a blanket AWS BAA does not cover GCP workloads.
  • Schema consistency across cloud environments requires active governance. Use Schema Diff to catch DDL divergence before it reaches production.

mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.
