
HIPAA-Compliant Data Architecture on GCP: A Practical Guide

Building HIPAA-compliant data infrastructure on Google Cloud requires more than checking a BAA checkbox. Here is the architecture — BigQuery, Cloud Healthcare API, Pub/Sub, Dataflow, and the security controls that make it defensible.

mdatool Team · April 21, 2026 · 9 min read
Tags: GCP, HIPAA, BigQuery, Cloud Healthcare API, data architecture, VPC Service Controls

Introduction

Google Cloud is a legitimate [HIPAA](/terms/HIPAA)-eligible platform: Google signs Business Associate Agreements (BAAs) for GCP, and a specific set of services falls within their scope. But HIPAA compliance on GCP is not automatic. It requires deliberate architectural choices around encryption, network isolation, audit logging, and access controls. This guide walks through a production-grade healthcare data architecture on GCP, from raw clinical data ingestion through analytics, with the security controls that make it defensible under HIPAA.


GCP Services in Scope for HIPAA

GCP's BAA covers a specific list of services. The most relevant for a healthcare data architecture are:

  • Cloud Healthcare API — [FHIR](/terms/FHIR) R4, HL7 v2, and DICOM storage
  • BigQuery — Data warehouse and analytics
  • Cloud Storage (GCS) — Object storage for raw files (837, 835, flat files)
  • Pub/Sub — Real-time event streaming
  • Dataflow — Managed Apache Beam for batch and streaming pipelines
  • Cloud Composer — Managed Airflow for orchestration
  • Secret Manager — Credential and secret storage
  • Cloud KMS — Encryption key management
  • VPC Service Controls — Network security perimeter
  • Cloud Audit Logs — Access and activity logging

Services not covered by GCP's BAA should never touch PHI. When in doubt, check GCP's current BAA addendum before using a new service.
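One way to enforce that rule is a pre-deployment check of the services a pipeline depends on. The sketch below is a minimal illustration that assumes a hypothetical allow-list (`HIPAA_SCOPE`) built from the API endpoints of the services listed above. It is not a copy of Google's covered-products list, which changes over time and remains the source of truth.

```python
# Hypothetical allow-list mirroring the services named in this article.
# NOT an authoritative copy of Google's HIPAA covered-products list.
HIPAA_SCOPE = {
    "healthcare.googleapis.com",
    "bigquery.googleapis.com",
    "storage.googleapis.com",
    "pubsub.googleapis.com",
    "dataflow.googleapis.com",
    "composer.googleapis.com",
    "secretmanager.googleapis.com",
    "cloudkms.googleapis.com",
    "accesscontextmanager.googleapis.com",  # VPC Service Controls
    "logging.googleapis.com",               # Cloud Audit Logs
}


def out_of_scope(services_touching_phi: list[str]) -> list[str]:
    """Return services not on the allow-list, de-duplicated and sorted."""
    return sorted(set(services_touching_phi) - HIPAA_SCOPE)


if __name__ == "__main__":
    pipeline = ["pubsub.googleapis.com", "dataflow.googleapis.com", "translate.googleapis.com"]
    flagged = out_of_scope(pipeline)
    if flagged:
        print("Not on the PHI allow-list:", ", ".join(flagged))
```

Running a check like this in CI keeps the allow-list decision explicit and reviewable instead of relying on memory.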


Architecture Overview

The reference architecture has five layers:

[Source Systems]
  Epic (FHIR R4) → Cloud Healthcare API → BigQuery (streaming sync)
  Clearinghouse (837/835 EDI) → GCS (raw zone) → Dataflow → BigQuery
  Lab vendor (HL7 v2) → Cloud Healthcare API HL7 Store → Pub/Sub → Dataflow

[Ingestion & Processing]
  Dataflow pipelines for transformation and PHI standardization

[Storage]
  BigQuery: bronze (raw), silver (standardized), gold (aggregated)
  GCS: raw file archive (encrypted, lifecycle-managed)

[Access & Governance]
  VPC Service Controls security perimeter
  IAM + Column-level security in BigQuery
  Cloud Audit Logs → BigQuery logging sink

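[Analytics & Serving]
  Looker / Looker Studio for operational reporting
  Vertex AI for risk models
  (Looker, Looker Studio, and Vertex AI are not in the service list above; confirm each one's BAA coverage before it touches PHI.)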

Step 1: Establish the Security Perimeter with VPC Service Controls

VPC Service Controls create a security perimeter around your PHI-handling GCP project. Resources inside the perimeter can communicate with each other; data egress to outside the perimeter is blocked by default.

Create a service perimeter that includes all PHI-handling services:

gcloud access-context-manager perimeters create phi-perimeter \
  --title="PHI Data Perimeter" \
  --resources=projects/[YOUR_PROJECT_NUMBER] \
  --restricted-services=bigquery.googleapis.com,healthcare.googleapis.com,storage.googleapis.com,pubsub.googleapis.com,dataflow.googleapis.com \
  --policy=[YOUR_ACCESS_POLICY_ID]

This is your first line of defense against data exfiltration. A service account or user with BigQuery access inside the perimeter cannot export PHI to an external GCS bucket or BigQuery dataset outside the perimeter.
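The default egress behavior can be reasoned about with a deliberately simplified model: for a restricted service, data may move only when both the source and destination projects sit inside the same perimeter. The sketch ignores ingress/egress rules, perimeter bridges, and access levels, and the project numbers are made up. It illustrates the behavior described above; it is not how VPC Service Controls is implemented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Perimeter:
    """Simplified service perimeter: member projects plus restricted APIs."""
    name: str
    projects: frozenset[str]
    restricted_services: frozenset[str] = field(default_factory=frozenset)


def transfer_allowed(perimeter: Perimeter, service: str, src_project: str, dst_project: str) -> bool:
    """Default-deny model for restricted services.

    Model simplification: services not on the restricted list are not governed by
    this perimeter, so they return True here.
    """
    if service not in perimeter.restricted_services:
        return True
    return src_project in perimeter.projects and dst_project in perimeter.projects


PHI = Perimeter(
    name="phi-perimeter",
    projects=frozenset({"projects/111111111111"}),  # hypothetical project number
    restricted_services=frozenset({"bigquery.googleapis.com", "storage.googleapis.com"}),
)

# Copying from the PHI project to a bucket in an outside project is denied.
print(transfer_allowed(PHI, "storage.googleapis.com", "projects/111111111111", "projects/999999999999"))  # False
```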


Step 2: Configure Cloud Healthcare API

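Create an R4 FHIR store that allows update-as-create (clients can create a resource with their own ID via an update) and publishes change notifications to Pub/Sub. The fhir-mutations topic must already exist, and the Cloud Healthcare Service Agent needs the Pub/Sub Publisher role on it:

gcloud healthcare fhir-stores create ehr-fhir-store \
  --dataset=clinical-dataset \
  --location=us-central1 \
  --version=R4 \
  --enable-update-create \
  --pubsub-topic=projects/[PROJECT_ID]/topics/fhir-mutations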
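A downstream consumer (for example, a Dataflow job reading fhir-mutations) usually starts by identifying which resource changed. The sketch below assumes the notification body is the changed resource's full name, in the form projects/P/locations/L/datasets/D/fhirStores/S/fhir/TYPE/ID, as in a name-only notification; in that mode the message carries an identifier rather than clinical content. Verify the payload format your store actually sends. `parse_notification` and `FhirChange` are hypothetical helpers, and the Pub/Sub client code is omitted.

```python
import re
from typing import NamedTuple

# Assumed shape of a FHIR resource name carried in a notification body.
_FHIR_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/datasets/(?P<dataset>[^/]+)/fhirStores/(?P<store>[^/]+)"
    r"/fhir/(?P<resource_type>[A-Za-z]+)/(?P<resource_id>[^/]+)$"
)


class FhirChange(NamedTuple):
    project: str
    location: str
    dataset: str
    store: str
    resource_type: str
    resource_id: str


def parse_notification(data: bytes) -> FhirChange:
    """Decode a name-only notification body; raise ValueError on anything else."""
    name = data.decode("utf-8").strip()
    match = _FHIR_NAME.match(name)
    if match is None:
        raise ValueError(f"not a FHIR resource name: {name!r}")
    return FhirChange(**match.groupdict())


change = parse_notification(
    b"projects/my-project/locations/us-central1/datasets/clinical-dataset/"
    b"fhirStores/ehr-fhir-store/fhir/Patient/example-123"
)
print(change.resource_type, change.resource_id)  # Patient example-123
```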

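Then enable BigQuery streaming sync by adding a streamConfigs entry to the FHIR store, for example in the body of a fhirStores.patch request with updateMask=streamConfigs: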

{
  "streamConfigs": [{
    "resourceTypes": ["Patient", "Encounter", "Condition", "Observation", "MedicationRequest"],
    "bigqueryDestination": {
      "datasetUri": "bq://[PROJECT_ID].clinical_fhir_bronze",
      "schemaConfig": {
        "schemaType": "ANALYTICS_V2",
        "recursiveStructureDepth": 5
      },
      "writeDisposition": "WRITE_APPEND"
    }
  }]
}

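The ANALYTICS_V2 schema maps each streamed FHIR resource type to its own BigQuery table with typed, nested columns, so FHIR resources can be queried with standard SQL instead of JSON parsing. Compared with the older ANALYTICS schema, it also handles extensions that occur more than once and stores contained resources as JSON strings. recursiveStructureDepth caps how many levels of recursive FHIR structures are expanded into columns.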
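To keep the stream configuration in version control rather than hand-edited JSON, it can be generated. The sketch below is a hypothetical helper, `stream_config_patch`, that builds the fhirStores.patch URL (with updateMask=streamConfigs) and the request body from the JSON above. It validates the bq://PROJECT_ID.DATASET_ID URI format and assumes an upper bound of 5 for recursiveStructureDepth based on the API reference; verify that limit against the current docs. Authentication and sending the request are left out.

```python
import json
import re

# Streaming destination format: bq://PROJECT_ID.DATASET_ID
_BQ_URI = re.compile(r"^bq://(?P<project>[^.]+)\.(?P<dataset>[A-Za-z0-9_]+)$")
MAX_RECURSIVE_DEPTH = 5  # assumption: verify against the current API reference


def stream_config_patch(project: str, location: str, dataset: str, store: str,
                        bq_dataset_uri: str, resource_types: list[str],
                        depth: int = 5) -> tuple[str, str]:
    """Return (url, json_body) for a fhirStores.patch call that sets streamConfigs."""
    if not _BQ_URI.match(bq_dataset_uri):
        raise ValueError(f"expected bq://PROJECT_ID.DATASET_ID, got {bq_dataset_uri!r}")
    if not 0 <= depth <= MAX_RECURSIVE_DEPTH:
        raise ValueError(f"recursiveStructureDepth must be 0..{MAX_RECURSIVE_DEPTH}")
    name = f"projects/{project}/locations/{location}/datasets/{dataset}/fhirStores/{store}"
    url = f"https://healthcare.googleapis.com/v1/{name}?updateMask=streamConfigs"
    body = {
        "streamConfigs": [{
            "resourceTypes": resource_types,
            "bigqueryDestination": {
                "datasetUri": bq_dataset_uri,
                "schemaConfig": {"schemaType": "ANALYTICS_V2", "recursiveStructureDepth": depth},
                "writeDisposition": "WRITE_APPEND",
            },
        }]
    }
    return url, json.dumps(body, indent=2)


url, body = stream_config_patch(
    project="my-project", location="us-central1", dataset="clinical-dataset",
    store="ehr-fhir-store", bq_dataset_uri="bq://my-project.clinical_fhir_bronze",
    resource_types=["Patient", "Encounter", "Condition", "Observation", "MedicationRequest"],
)
print("PATCH", url)
print(body)
```

The project and store names in the usage are placeholders; the request would be sent as an HTTP PATCH with an OAuth access token.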


Step 3: Encryption with Customer-Managed Keys

By default, GCP encrypts all data at rest using Google-managed keys. For PHI, use Customer-Managed Encryption Keys (CMEK) via Cloud KMS — this gives your organization control over the encryption lifecycle.

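# Create a key ring for PHI data
gcloud kms keyrings create phi-keyring --location=us-central1

# Create a symmetric key that rotates every 90 days (GNU date syntax; adjust on macOS)
gcloud kms keys create phi-data-key --keyring=phi-keyring --location=us-central1 \
  --purpose=encryption --rotation-period=90d \
  --next-rotation-time=$(date -u -d '+90 days' --iso-8601=seconds)

# Let BigQuery's encryption service account use the key
gcloud kms keys add-iam-policy-binding phi-data-key --keyring=phi-keyring --location=us-central1 \
  --member=serviceAccount:bq-[PROJECT_NUMBER]@bigquery-encryption.iam.gserviceaccount.com \
  --role=roles/cloudkms.cryptoKeyEncrypterDecrypter

# Apply CMEK as the default key for new tables in a BigQuery dataset
bq update \
  --default_kms_key=projects/[PROJECT_ID]/locations/us-central1/keyRings/phi-keyring/cryptoKeys/phi-data-key \
  [PROJECT_ID]:phi_data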

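The HIPAA Security Rule does not prescribe a rotation interval; encryption of ePHI at rest is an addressable implementation specification (45 CFR 164.312(a)(2)(iv)). A 90-day rotation is a common industry practice that gives you a documented, enforceable key-management policy. Note that a dataset default key applies to tables created after the update; existing tables keep their current encryption until they are re-encrypted.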
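To keep key-management evidence consistent, the rotation schedule can be computed and checked in code. This sketch, with hypothetical helper names, builds the full key resource name used by --default_kms_key, produces an RFC 3339 UTC timestamp of the kind the date call in the key-creation command feeds to --next-rotation-time, and flags a key whose last rotation is older than the 90-day period.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ROTATION_PERIOD = timedelta(days=90)


def key_name(project: str, location: str, keyring: str, key: str) -> str:
    """Full Cloud KMS key resource name."""
    return f"projects/{project}/locations/{location}/keyRings/{keyring}/cryptoKeys/{key}"


def next_rotation_time(now: Optional[datetime] = None) -> str:
    """RFC 3339 UTC timestamp one rotation period after `now` (default: current time)."""
    now = now or datetime.now(timezone.utc)
    return (now.astimezone(timezone.utc) + ROTATION_PERIOD).strftime("%Y-%m-%dT%H:%M:%SZ")


def rotation_overdue(last_rotated: datetime, now: Optional[datetime] = None) -> bool:
    """True if more than one rotation period has passed; both datetimes must be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    return now - last_rotated > ROTATION_PERIOD


print(key_name("my-project", "us-central1", "phi-keyring", "phi-data-key"))
print(next_rotation_time(datetime(2026, 4, 21, 12, 0, tzinfo=timezone.utc)))  # 2026-07-20T12:00:00Z
```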


Step 4: IAM and Column-Level Security in BigQuery

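Implement least-privilege access with BigQuery column-level access control, which is driven by policy tags. Attach the tag by editing the table's JSON schema (the documented bq workflow):

# Prerequisite: a taxonomy with access control enforced, containing a PHI policy tag
# (created in Data Catalog first)

# 1. Export the table's current schema
bq show --schema --format=prettyjson [PROJECT]:phi.member_demographics > schema.json

# 2. In schema.json, add this to the ssn column's entry:
#    "policyTags": {"names": ["projects/[PROJECT]/locations/us-central1/taxonomies/[TAXONOMY_ID]/policyTags/[TAG_ID]"]}

# 3. Apply the complete, edited schema
bq update [PROJECT]:phi.member_demographics schema.json

# Grant the PHI policy tag only to the privileged role via Data Catalog IAM
# (roles/datacatalog.categoryFineGrainedReader on the tag), not BigQuery IAM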

This ensures that even if a user has BigQuery read access to the table, they cannot read the SSN column unless they have been explicitly granted access to the PHI policy tag.
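Policy tags live in each column's entry of the table's JSON schema as a policyTags object with a names list; `bq show --schema --format=prettyjson` exports that schema and `bq update` applies an edited copy. The sketch below is a hypothetical helper, `tag_columns`, that attaches a tag to named top-level columns while returning the full schema. The column names and tag path are placeholders, and nested RECORD fields are not handled.

```python
import json

# Placeholder; substitute your taxonomy's policy tag resource name.
PHI_TAG = "projects/[PROJECT]/locations/us-central1/taxonomies/[TAXONOMY_ID]/policyTags/[TAG_ID]"


def tag_columns(schema: list[dict], tag: str, columns: set[str]) -> list[dict]:
    """Return a copy of a BigQuery JSON schema with `tag` attached to the named
    top-level columns. Nested RECORD fields are not handled in this sketch."""
    missing = columns - {col["name"] for col in schema}
    if missing:
        raise KeyError(f"columns not in schema: {sorted(missing)}")
    tagged = []
    for col in schema:
        updated = dict(col)  # shallow copy so the caller's schema is left unchanged
        if updated["name"] in columns:
            updated["policyTags"] = {"names": [tag]}
        tagged.append(updated)
    return tagged


# Hypothetical schema, shaped like `bq show --schema --format=prettyjson` output.
schema = [
    {"name": "member_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "ssn", "type": "STRING", "mode": "NULLABLE"},
]
print(json.dumps(tag_columns(schema, PHI_TAG, {"ssn"}), indent=2))
```

Returning every column, not only the tagged ones, matters because the schema file passed to bq update is the whole table schema.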


Step 5: Audit Logging

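Enable Data Access audit logs for all PHI-handling services. Admin Activity audit logs are always on, but Data Access logs are disabled by default for most services. Because set-iam-policy writes the whole policy from the file, export the current policy, add the auditConfigs section, and write it back:

# Export the current policy (keeps existing bindings and the etag)
gcloud projects get-iam-policy [PROJECT_ID] --format=yaml > policy.yaml

# Add this section to policy.yaml (merge with any existing auditConfigs)
auditConfigs:
- auditLogConfigs:
  - logType: DATA_READ
  - logType: DATA_WRITE
  - logType: ADMIN_READ
  service: bigquery.googleapis.com
- auditLogConfigs:
  - logType: DATA_READ
  - logType: DATA_WRITE
  service: healthcare.googleapis.com
- auditLogConfigs:
  - logType: DATA_READ
  - logType: DATA_WRITE
  service: storage.googleapis.com

# Apply the updated policy
gcloud projects set-iam-policy [PROJECT_ID] policy.yaml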
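Since `gcloud projects set-iam-policy` sets the project's policy from the file you pass, audit settings are safer to merge into the policy returned by `gcloud projects get-iam-policy [PROJECT_ID] --format=json` than to submit on their own. The hypothetical helper below performs that merge in Python: bindings and the etag are preserved, and log types are unioned per service. Writing the result back is left to you.

```python
import copy
import json


def merge_audit_configs(policy: dict, additions: list[dict]) -> dict:
    """Return a copy of an IAM policy with audit log types merged in per service."""
    merged = copy.deepcopy(policy)  # never mutate the caller's policy
    by_service = {c["service"]: c for c in merged.setdefault("auditConfigs", [])}
    for add in additions:
        target = by_service.get(add["service"])
        if target is None:
            target = {"service": add["service"], "auditLogConfigs": []}
            merged["auditConfigs"].append(target)
            by_service[add["service"]] = target
        have = {c["logType"] for c in target.setdefault("auditLogConfigs", [])}
        for log_config in add.get("auditLogConfigs", []):
            if log_config["logType"] not in have:  # skip duplicates
                target["auditLogConfigs"].append(dict(log_config))
                have.add(log_config["logType"])
    return merged


current = {
    "bindings": [{"role": "roles/viewer", "members": ["group:analysts@example.com"]}],
    "etag": "etag-from-get-iam-policy",  # placeholder
}
wanted = [{"service": "healthcare.googleapis.com",
           "auditLogConfigs": [{"logType": "DATA_READ"}, {"logType": "DATA_WRITE"}]}]
print(json.dumps(merge_audit_configs(current, wanted), indent=2))
```

Keeping the etag from get-iam-policy lets the API reject the write if someone else changed the policy in the meantime.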

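Route audit logs to BigQuery for long-term retention and queryable audit evidence. Create the audit_logs dataset first; after creating the sink, grant the writer identity that the command prints the BigQuery Data Editor role on that dataset:

gcloud logging sinks create phi-audit-sink \
  bigquery.googleapis.com/projects/[PROJECT_ID]/datasets/audit_logs \
  --use-partitioned-tables \
  --log-filter='logName="projects/[PROJECT_ID]/logs/cloudaudit.googleapis.com%2Fdata_access"'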
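If sinks are created from code or CI, the filter string is easy to get subtly wrong, because the "/" inside the audit log ID must be written as %2F. The sketch below, with hypothetical helper names, builds the same filter and destination used in the sink command, optionally covering additional audit log streams (activity, system_event, policy) joined with OR.

```python
from urllib.parse import quote

AUDIT_LOG_KINDS = ("activity", "data_access", "system_event", "policy")


def audit_log_filter(project_id: str, kinds: tuple[str, ...] = ("data_access",)) -> str:
    """Logging filter matching the given Cloud Audit Logs streams for one project."""
    clauses = []
    for kind in kinds:
        if kind not in AUDIT_LOG_KINDS:
            raise ValueError(f"unknown audit log kind: {kind!r}")
        # quote(..., safe="") encodes "/" as %2F, as logName requires
        log_id = quote(f"cloudaudit.googleapis.com/{kind}", safe="")
        clauses.append(f'logName="projects/{project_id}/logs/{log_id}"')
    return " OR ".join(clauses)


def bigquery_sink_destination(project_id: str, dataset: str) -> str:
    """Destination string for a log sink that writes to a BigQuery dataset."""
    return f"bigquery.googleapis.com/projects/{project_id}/datasets/{dataset}"


print(bigquery_sink_destination("my-project", "audit_logs"))
print(audit_log_filter("my-project"))
```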

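Retain audit logs for at least six years. HIPAA requires covered entities to retain Security Rule documentation for six years (45 CFR 164.316(b)(2)(i)), and many organizations apply the same minimum to audit logs. BigQuery bills tables and partitions that go 90 consecutive days without modification at its lower long-term storage rate, which keeps six-year retention economical; just make sure no table, partition, or dataset default expiration on audit_logs is shorter than six years.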
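A retention requirement is easy to break accidentally with a short table or partition expiration. The sketch below, with hypothetical helper names, computes the six-year window from a given date and checks an expiration setting in days (as in BigQuery's partition_expiration_days option, with None meaning data never expires) against it.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = 6  # six-year minimum discussed above


def retention_end(start: date, years: int = RETENTION_YEARS) -> date:
    """Calendar date `years` after `start`; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # start is Feb 29 and the target year has no Feb 29
        return start.replace(year=start.year + years, day=28)


def expiration_ok(expiration_days: Optional[int], as_of: date) -> bool:
    """True if an expiration setting (None = never expires) keeps data written
    on `as_of` for at least RETENTION_YEARS."""
    if expiration_days is None:
        return True
    return expiration_days >= (retention_end(as_of) - as_of).days


today = date(2026, 4, 21)
print((retention_end(today) - today).days)  # 2192
print(expiration_ok(365, today))            # False
```

The required day count varies slightly with leap years, which is why the check computes it from a date instead of hard-coding 6 × 365.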


Key Takeaways

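  • VPC Service Controls are non-negotiable for PHI on GCP: they prevent data exfiltration that IAM alone cannot stop.
  • Cloud Healthcare API's BigQuery streaming sync eliminates the ETL tier for FHIR analytics. Configure it when you create the FHIR store: streaming only captures changes made after it is enabled, so resources written earlier need a one-time export to BigQuery.
  • CMEK with scheduled key rotation (90 days is a common choice) gives you documented, auditable control over the keys protecting ePHI; HIPAA itself does not mandate a rotation period.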
  • BigQuery column-level security with policy tags provides fine-grained PHI access control without view proliferation.
  • Before any SQL query runs against PHI tables, validate it with the [SQL Linter](/tools/sql-linter) to catch unbounded queries and anti-patterns that could expose more PHI than intended.

mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.
