Introduction
Google Cloud is a legitimate [HIPAA](/terms/HIPAA)-eligible platform. GCP signs Business Associate Agreements, and a specific set of services fall within scope. But HIPAA compliance on GCP is not automatic — it requires deliberate architectural choices around encryption, network isolation, audit logging, and access controls. This guide walks through a production-grade healthcare data architecture on GCP, from raw clinical data ingestion through analytics, with the security controls that make it defensible under HIPAA.
GCP Services in Scope for HIPAA
GCP's BAA covers a specific list of services. The most relevant for a healthcare data architecture are:
- Cloud Healthcare API — [FHIR](/terms/FHIR) R4, HL7 v2, and DICOM storage
- BigQuery — Data warehouse and analytics
- Cloud Storage (GCS) — Object storage for raw files (837, 835, flat files)
- Pub/Sub — Real-time event streaming
- Dataflow — Managed Apache Beam for batch and streaming pipelines
- Cloud Composer — Managed Airflow for orchestration
- Secret Manager — Credential and secret storage
- Cloud KMS — Encryption key management
- VPC Service Controls — Network security perimeter
- Cloud Audit Logs — Access and activity logging
Services not covered by GCP's BAA should never touch PHI. When in doubt, check GCP's current BAA addendum before using a new service.
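One way to make this check routine is to compare a project's enabled APIs against an allowlist. The sketch below is illustrative only: the allowlist simply mirrors the services named in this guide, the helper name `services_outside_baa` is hypothetical, and the authoritative list is always your current BAA.

```python
# Allowlist built from the services named in this guide (an assumption, not
# Google's official list). Verify every entry against your current BAA.
BAA_COVERED_SERVICES = frozenset({
    "healthcare.googleapis.com",
    "bigquery.googleapis.com",
    "storage.googleapis.com",
    "pubsub.googleapis.com",
    "dataflow.googleapis.com",
    "composer.googleapis.com",
    "secretmanager.googleapis.com",
    "cloudkms.googleapis.com",
    "accesscontextmanager.googleapis.com",  # API used to manage VPC Service Controls
    "logging.googleapis.com",
})


def services_outside_baa(enabled_services):
    """Return enabled services that are not on the allowlist, sorted by name."""
    return sorted(set(enabled_services) - BAA_COVERED_SERVICES)


if __name__ == "__main__":
    # Hypothetical input, e.g. names taken from `gcloud services list --enabled`.
    enabled = ["bigquery.googleapis.com", "example-unvetted.googleapis.com"]
    print(services_outside_baa(enabled))
```

Anything the function returns is a candidate for review before PHI is allowed near it.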
Architecture Overview
The reference architecture has five layers:
[Source Systems]
Epic (FHIR R4) → Cloud Healthcare API → BigQuery (streaming sync)
Clearinghouse (837/835 EDI) → GCS (raw zone) → Dataflow → BigQuery
Lab vendor (HL7 v2) → Cloud Healthcare API HL7 Store → Pub/Sub → Dataflow
[Ingestion & Processing]
Dataflow pipelines for transformation and PHI standardization
[Storage]
BigQuery: bronze (raw), silver (standardized), gold (aggregated)
GCS: raw file archive (encrypted, lifecycle-managed)
[Access & Governance]
VPC Service Controls security perimeter
IAM + Column-level security in BigQuery
Cloud Audit Logs → BigQuery logging sink
[Analytics & Serving]
Looker / Looker Studio for operational reporting
Vertex AI for risk models
Step 1: Establish the Security Perimeter with VPC Service Controls
VPC Service Controls create a security perimeter around your PHI-handling GCP project. Resources inside the perimeter can communicate with each other; data egress to outside the perimeter is blocked by default.
Create a service perimeter that includes all PHI-handling services:
gcloud access-context-manager perimeters create phi-perimeter \
  --title="PHI Data Perimeter" \
  --resources=projects/[YOUR_PROJECT_NUMBER] \
  --restricted-services=bigquery.googleapis.com,healthcare.googleapis.com,storage.googleapis.com,pubsub.googleapis.com,dataflow.googleapis.com \
  --policy=[YOUR_ACCESS_POLICY_ID]
This is your first line of defense against data exfiltration. A service account or user with BigQuery access inside the perimeter cannot export PHI to an external GCS bucket or BigQuery dataset outside the perimeter.
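The egress rule can be pictured with a toy model. This is a deliberately simplified sketch, not how VPC Service Controls is implemented: it ignores ingress rules, access levels, and bridges, and the `Perimeter` class and `copy_allowed` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perimeter:
    """Toy stand-in for a service perimeter: member projects plus restricted APIs."""
    projects: frozenset
    restricted_services: frozenset


def copy_allowed(perimeter, service, source_project, dest_project):
    """Approximate the default egress rule for a data copy.

    A restricted service may not move data from a project inside the
    perimeter to a project outside it.
    """
    if service not in perimeter.restricted_services:
        return True  # this perimeter does not protect the service
    if source_project not in perimeter.projects:
        return True  # the source is not protected by this perimeter
    return dest_project in perimeter.projects


if __name__ == "__main__":
    phi = Perimeter(
        projects=frozenset({"projects/111"}),
        restricted_services=frozenset({"bigquery.googleapis.com", "storage.googleapis.com"}),
    )
    # Export from the PHI project to an outside project is refused in this model.
    print(copy_allowed(phi, "storage.googleapis.com", "projects/111", "projects/999"))
```

The point of the model is that the decision depends on where the data is going, not on whether the caller's IAM role permits the read.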
Step 2: Configure Cloud Healthcare API
Create an R4 FHIR store with update-as-create enabled and Pub/Sub notifications for resource changes (BigQuery streaming and audit logging are configured separately):
gcloud healthcare fhir-stores create ehr-fhir-store \
  --dataset=clinical-dataset \
  --location=us-central1 \
  --version=R4 \
  --enable-update-create \
  --pubsub-topic=projects/[PROJECT_ID]/topics/fhir-mutations
Enable BigQuery streaming sync for analytics access:
{
  "streamConfigs": [
    {
      "resourceTypes": ["Patient", "Encounter", "Condition", "Observation", "MedicationRequest"],
      "bigqueryDestination": {
        "datasetUri": "bq://[PROJECT_ID].clinical_fhir_bronze",
        "schemaConfig": {
          "schemaType": "ANALYTICS_V2",
          "recursiveStructureDepth": 5
        },
        "writeDisposition": "WRITE_APPEND"
      }
    }
  ]
}
The ANALYTICS_V2 schema flattens FHIR JSON into BigQuery-native columns, making FHIR resources directly queryable without JSON parsing.
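If the streaming configuration is generated per environment rather than written by hand, a small builder keeps it consistent. The sketch below only assembles the same JSON body shown above; the function name `build_stream_config` and the project ID `my-project` are hypothetical, and how the body is submitted to the FHIR store is left out.

```python
import json


def build_stream_config(project_id, bq_dataset, resource_types, depth=5):
    """Build the streamConfigs body for one BigQuery destination."""
    return {
        "streamConfigs": [
            {
                "resourceTypes": list(resource_types),
                "bigqueryDestination": {
                    "datasetUri": f"bq://{project_id}.{bq_dataset}",
                    "schemaConfig": {
                        "schemaType": "ANALYTICS_V2",
                        "recursiveStructureDepth": depth,
                    },
                    "writeDisposition": "WRITE_APPEND",
                },
            }
        ]
    }


if __name__ == "__main__":
    body = build_stream_config(
        "my-project",  # hypothetical project ID
        "clinical_fhir_bronze",
        ["Patient", "Encounter", "Condition", "Observation", "MedicationRequest"],
    )
    print(json.dumps(body, indent=2))
```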
Step 3: Encryption with Customer-Managed Keys
By default, GCP encrypts all data at rest using Google-managed keys. For PHI, use Customer-Managed Encryption Keys (CMEK) via Cloud KMS — this gives your organization control over the encryption lifecycle.
# Create a key ring for PHI data
gcloud kms keyrings create phi-keyring --location=us-central1

# Create a symmetric encryption key
gcloud kms keys create phi-data-key \
  --keyring=phi-keyring \
  --location=us-central1 \
  --purpose=encryption \
  --rotation-period=90d \
  --next-rotation-time=$(date -d '+90 days' --iso-8601)

# Apply CMEK to a BigQuery dataset
bq update \
  --default_kms_key=projects/[PROJECT_ID]/locations/us-central1/keyRings/phi-keyring/cryptoKeys/phi-data-key \
  [PROJECT_ID]:phi_data
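The HIPAA Security Rule does not prescribe a rotation interval for encryption keys. Rotating keys that protect ePHI every 90 days is a common policy choice that limits how much data any single key version protects.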
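The `date -d '+90 days'` expression in the command above is GNU `date` syntax and may not work on other platforms. A portable alternative is to compute the timestamp in a script and pass the result to `--next-rotation-time`. The helper below is a minimal sketch; the name `next_rotation_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_rotation_time(now, rotation_days=90):
    """Return the next rotation time as a UTC ISO 8601 timestamp string."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    rotated = now.astimezone(timezone.utc) + timedelta(days=rotation_days)
    return rotated.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(next_rotation_time(datetime.now(timezone.utc)))
```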
Step 4: IAM and Column-Level Security in BigQuery
Implement least-privilege access using BigQuery column-level security with policy tags:
-- Tag PHI columns with a policy tag
-- (policy tag must be created in Data Catalog first)
ALTER TABLE phi.member_demographics
  ALTER COLUMN ssn
  SET OPTIONS (policy_tags = '["projects/[PROJECT]/locations/us-central1/taxonomies/[TAXONOMY_ID]/policyTags/[TAG_ID]"]');

-- Grant access to the PHI policy tag for privileged role only
-- Done via Data Catalog IAM, not BigQuery IAM
This ensures that even if a user has BigQuery read access to the table, they cannot read the SSN column unless they have been explicitly granted access to the PHI policy tag.
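The access rule described here can be expressed as a small check. This is a conceptual sketch of the decision only, not BigQuery's implementation: the function name `check_column_access`, the schema dictionary, and the tag name `phi_tag` are all hypothetical.

```python
def check_column_access(schema, requested_columns, granted_tags):
    """Allow a query only if the caller holds every policy tag it touches.

    schema maps column name -> policy tag name, or None for untagged columns.
    granted_tags is the set of policy tags the caller may read.
    """
    denied = [
        col for col in requested_columns
        if schema[col] is not None and schema[col] not in granted_tags
    ]
    if denied:
        raise PermissionError("access denied on columns: " + ", ".join(denied))
    return list(requested_columns)


if __name__ == "__main__":
    schema = {"member_id": None, "plan_code": None, "ssn": "phi_tag"}
    # Table-level read access alone: untagged columns are fine.
    print(check_column_access(schema, ["member_id", "plan_code"], set()))
    # Selecting ssn without the tag grant raises PermissionError.
    try:
        check_column_access(schema, ["member_id", "ssn"], set())
    except PermissionError as exc:
        print(exc)
```

In this model a query that selects a tagged column fails outright rather than silently dropping the column, so analysts must name only the columns they are entitled to read.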
Step 5: Audit Logging
Enable Data Access audit logs for all PHI-handling services:
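# set-iam-policy replaces the whole policy, so start from the current one
gcloud projects get-iam-policy [PROJECT_ID] --format=yaml > policy.yaml

# Add this section to policy.yaml, keeping the existing bindings and etag
auditConfigs:
- service: bigquery.googleapis.com
  auditLogConfigs:
  - logType: DATA_READ
  - logType: DATA_WRITE
  - logType: ADMIN_READ
- service: healthcare.googleapis.com
  auditLogConfigs:
  - logType: DATA_READ
  - logType: DATA_WRITE
- service: storage.googleapis.com
  auditLogConfigs:
  - logType: DATA_READ
  - logType: DATA_WRITE

# Apply the updated policy
gcloud projects set-iam-policy [PROJECT_ID] policy.yaml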
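Because `gcloud projects set-iam-policy` writes a complete policy, audit settings are safest when merged into a copy of the project's current policy so that existing role bindings survive. The sketch below shows that merge on plain dictionaries; the function name `merge_audit_configs` and the sample policy are hypothetical, and reading or writing the real policy is left to your own tooling.

```python
import copy


def merge_audit_configs(policy, log_types_by_service):
    """Return a copy of policy with auditConfigs set for the given services.

    Existing bindings, etag, and audit configs for other services are kept.
    """
    merged = copy.deepcopy(policy)
    by_service = {cfg["service"]: cfg for cfg in merged.get("auditConfigs", [])}
    for service, log_types in log_types_by_service.items():
        by_service[service] = {
            "service": service,
            "auditLogConfigs": [{"logType": t} for t in log_types],
        }
    merged["auditConfigs"] = [by_service[name] for name in sorted(by_service)]
    return merged


if __name__ == "__main__":
    current = {  # hypothetical existing policy
        "etag": "BwXyz",
        "bindings": [{"role": "roles/viewer", "members": ["user:a@example.com"]}],
    }
    updated = merge_audit_configs(current, {
        "bigquery.googleapis.com": ["DATA_READ", "DATA_WRITE", "ADMIN_READ"],
        "healthcare.googleapis.com": ["DATA_READ", "DATA_WRITE"],
        "storage.googleapis.com": ["DATA_READ", "DATA_WRITE"],
    })
    print([cfg["service"] for cfg in updated["auditConfigs"]])
```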
Route audit logs to BigQuery for long-term retention and queryable audit evidence:
gcloud logging sinks create phi-audit-sink bigquery.googleapis.com/projects/[PROJECT_ID]/datasets/audit_logs --log-filter='logName="projects/[PROJECT_ID]/logs/cloudaudit.googleapis.com%2Fdata_access"'
Retain audit logs for a minimum of 6 years (HIPAA requirement). BigQuery's partitioned table storage makes 6-year retention economically practical.
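When a cleanup job eventually prunes old audit partitions, it needs a cutoff date that respects the six-year window. A minimal sketch follows, assuming retention is measured in calendar years from the log date; the function name `retention_cutoff` is hypothetical.

```python
from datetime import date


def retention_cutoff(today, years=6):
    """Return the date before which audit records fall outside the retention window."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year:
        # use Feb 28, which keeps records one day longer rather than shorter.
        return today.replace(year=today.year - years, day=28)


if __name__ == "__main__":
    print(retention_cutoff(date.today()))
```

Only records dated strictly before the cutoff would be candidates for deletion.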
Key Takeaways
- VPC Service Controls are non-negotiable for PHI on GCP: they prevent data exfiltration that IAM alone cannot stop.
- Cloud Healthcare API's BigQuery streaming sync eliminates the ETL tier for FHIR analytics. Enable it at FHIR store creation time.
- CMEK with a defined rotation schedule (90 days in this guide) supports the HIPAA Security Rule's encryption safeguards, although the rule itself does not mandate a specific rotation interval.
- BigQuery column-level security with policy tags provides fine-grained PHI access control without view proliferation.
- Before any SQL query runs against PHI tables, validate it with the [SQL Linter](/tools/sql-linter) to catch unbounded queries and anti-patterns that could expose more PHI than intended.
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.