© 2026 mdatool. All rights reserved.

Healthcare Data Modeling

Healthcare Data Models in Snowflake: Architecture, Performance, and Cost Optimization

Your Snowflake storage costs are climbing 40% per month. Queries are timing out. Your claims report just cost $847 to run. Here's how to build healthcare data models optimized for Snowflake's architecture—with clustering, materialized views, and cost controls.

mdatool Team · March 23, 2026 · 10 min read
snowflake · healthcare · data warehouse · optimization · clustering · performance

Your healthcare data warehouse migrated to Snowflake six months ago. Storage costs are climbing 40% per month. Queries that should take seconds are timing out. Your claims reconciliation report—the one finance runs every morning—just cost $847 to execute.

Snowflake's promise was scalability and simplicity. What happened?

You lifted and shifted your on-prem [data model](/tools/modeling) without adapting it for Snowflake's architecture. Row-by-row processing patterns that worked in Oracle are killing performance in a columnar database. Clustering keys are missing. Time-travel is inflating storage. Your warehouse is running 24/7 on a 4X-Large when it should auto-suspend.

Here's how to build healthcare data models optimized for Snowflake's architecture—models that leverage clustering, partitioning, materialized views, and cost controls while maintaining the temporal accuracy and audit trails healthcare requires.


The Snowflake Difference: Why Your Oracle Model Doesn't Work

Traditional healthcare data warehouses (Oracle, SQL Server, Teradata) optimize for:

  • Row-based storage
  • Indexes everywhere
  • Partitioning by date ranges
  • Manual scaling

Snowflake is fundamentally different:

  • Columnar storage (micro-partitions)
  • No indexes (automatic clustering)
  • Metadata-based pruning
  • Auto-scaling compute (virtual warehouses)
  • Time-travel and zero-copy cloning

The mistake: Treating Snowflake like Oracle with auto-scaling. It's not. You need to redesign for columnar storage, leverage clustering keys, and optimize for Snowflake's pricing model.


Snowflake Architecture Principles for Healthcare

Principle 1: Separate Storage from Compute

In traditional databases, storage and compute are coupled. In Snowflake, they're independent.

Healthcare implication:

  • Store ALL historical claims (10 years+) without compute costs
  • Spin up warehouses only when running reports
  • Auto-suspend after 60 seconds of inactivity
  • Use different warehouse sizes for different workloads
-- ETL warehouse (loads, transformations)
CREATE WAREHOUSE ETL_WH
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Reporting warehouse (morning finance reports)
CREATE WAREHOUSE REPORTING_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Analytics warehouse (ad-hoc queries)
CREATE WAREHOUSE ANALYTICS_WH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

Cost impact: A Medium warehouse that's always on burns 4 credits every hour, around the clock. One that auto-suspends consumes credits only while running.


Principle 2: Leverage Micro-Partitions, Not Traditional Partitioning

Snowflake automatically divides tables into micro-partitions (50-500MB compressed). You don't create partitions manually.

But you DO define clustering keys to organize micro-partitions for query performance.

Bad Healthcare Model (No Clustering)

CREATE TABLE claim_header (
  clm_id                 VARCHAR(50),
  mmbr_id                VARCHAR(50),
  service_from_dt        DATE,
  paid_dt                DATE,
  current_paid_amt       NUMBER(10,2),
  -- 50 more columns
);

Problem: Queries filtering by service_from_dt scan ALL micro-partitions because data isn't organized by date.

Good Healthcare Model (Clustered)

CREATE TABLE claim_header (
  clm_id                 VARCHAR(50),
  mmbr_id                VARCHAR(50),
  service_from_dt        DATE,
  paid_dt                DATE,
  current_paid_amt       NUMBER(10,2),
  -- 50 more columns
)
CLUSTER BY (service_from_dt, mmbr_id);

Why this works:

  • Most healthcare queries filter by date range (WHERE service_from_dt BETWEEN ...)
  • Secondary filter is often member (AND mmbr_id = '...')
  • Clustering organizes micro-partitions to prune non-matching partitions

Query performance:

  • Without clustering: Scans 10,000 micro-partitions
  • With clustering: Scans 23 micro-partitions (99.77% pruned)
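Pruning like this is measurable, not hypothetical. Snowflake exposes clustering metadata you can check directly (the table and key names below match the example above):

```sql
-- Average clustering depth near 1 means micro-partitions overlap little
-- on the cluster key, so date-range filters prune effectively
SELECT SYSTEM$CLUSTERING_DEPTH('claim_header', '(service_from_dt, mmbr_id)');
```

You can also confirm pruning per query in the query profile, which reports partitions scanned versus partitions total.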

Principle 3: Use Materialized Views for Expensive Aggregations

Healthcare reports often aggregate millions of claims. Don't recompute every time.

Expensive Query (Runs Every Morning)

-- Finance: Total paid by month (runs on 100M claims)
SELECT 
  DATE_TRUNC('month', paid_dt) as paid_month,
  COUNT(*) as claim_count,
  SUM(current_paid_amt) as total_paid
FROM claim_header
WHERE paid_dt >= '2020-01-01'
GROUP BY DATE_TRUNC('month', paid_dt);

Cost: $12 per run (scans 100M rows)

Optimized: Materialized View

CREATE MATERIALIZED VIEW claim_monthly_summary AS
SELECT 
  DATE_TRUNC('month', paid_dt) as paid_month,
  COUNT(*) as claim_count,
  SUM(current_paid_amt) as total_paid
FROM claim_header
WHERE paid_dt >= '2020-01-01'
GROUP BY DATE_TRUNC('month', paid_dt);

-- Query the materialized view instead
SELECT * FROM claim_monthly_summary
WHERE paid_month = '2024-03-01';

Cost: $0.02 per run (queries pre-aggregated data)
Maintenance: Auto-refreshed by Snowflake when base table changes
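A caveat worth stating: materialized views (an Enterprise edition feature) shift cost from query time to maintenance time rather than eliminating it, so track refresh credits using Snowflake's ACCOUNT_USAGE views:

```sql
-- Credits spent keeping materialized views fresh over the last 30 days
SELECT 
  table_name,
  SUM(credits_used) AS refresh_credits
FROM snowflake.account_usage.materialized_view_refresh_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP)
GROUP BY table_name
ORDER BY refresh_credits DESC;
```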


Healthcare-Specific Snowflake Patterns

Pattern 1: Temporal Tables with Clustering

Healthcare needs point-in-time queries. Effective dating handles business time, clustering keeps those queries fast, and time-travel adds table-level history on top.

CREATE TABLE member_enrollment (
  enrollment_id          VARCHAR(50),
  mmbr_id                VARCHAR(50),
  plan_id                VARCHAR(50),
  coverage_start_dt      DATE,
  coverage_end_dt        DATE,
  
  effective_dt           DATE,
  end_dt                 DATE,
  is_current_flag        BOOLEAN,
  
  load_dttm              TIMESTAMP_NTZ,
  
  CONSTRAINT pk_enrollment PRIMARY KEY (enrollment_id)
)
CLUSTER BY (mmbr_id, effective_dt);

-- Query: Who was enrolled on March 15, 2024?
SELECT mmbr_id, plan_id
FROM member_enrollment
WHERE '2024-03-15' BETWEEN coverage_start_dt 
  AND COALESCE(coverage_end_dt, '9999-12-31')
  AND is_current_flag = TRUE;

Clustering on (mmbr_id, effective_dt) ensures:

  • Member-specific queries are fast
  • Date range queries prune effectively
  • Point-in-time queries scan minimal partitions
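Effective dating answers "what was true on date X"; time-travel answers "what did the table contain at time Y." The two compose, for example when auditors ask you to reproduce an enrollment snapshot exactly as it looked earlier. A sketch using Snowflake's AT clause on the table above:

```sql
-- Who did we *believe* was enrolled on 2024-03-15, as of 24 hours ago?
SELECT mmbr_id, plan_id
FROM member_enrollment AT(OFFSET => -60*60*24)  -- seconds in the past
WHERE '2024-03-15' BETWEEN coverage_start_dt 
  AND COALESCE(coverage_end_dt, '9999-12-31')
  AND is_current_flag = TRUE;
```

This only works within the table's time-travel retention window (see the cost strategies below).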

Pattern 2: Claim Header/Line with Variant for Semi-Structured Data

[EDI 837](/terms/EDI%20837) claims have 50+ optional fields. Don't create 100 nullable columns. Use VARIANT.

CREATE TABLE claim_header (
  clm_id                 VARCHAR(50),
  mmbr_id                VARCHAR(50),
  service_from_dt        DATE,
  
  -- Core financial fields (structured)
  current_billed_amt     NUMBER(10,2),
  current_paid_amt       NUMBER(10,2),
  
  -- Optional/variable fields (semi-structured)
  additional_data        VARIANT,
  
  load_dttm              TIMESTAMP_NTZ
)
CLUSTER BY (service_from_dt);

-- Insert with variant (PARSE_JSON is not allowed in a plain VALUES
-- clause, so use INSERT ... SELECT)
INSERT INTO claim_header
SELECT
  'CLM_001',
  'M123',
  '2024-03-15',
  500.00,
  450.00,
  PARSE_JSON('{
    "admission_type": "ELECTIVE",
    "discharge_status": "01",
    "drg_code": "470",
    "attending_physician_npi": "1234567890"
  }'),
  CURRENT_TIMESTAMP;

-- Query variant fields
SELECT 
  clm_id,
  current_paid_amt,
  additional_data:drg_code::STRING as drg_code,
  additional_data:attending_physician_npi::STRING as attending_npi
FROM claim_header
WHERE additional_data:admission_type::STRING = 'ELECTIVE';

Benefits:

  • Schema flexibility (add fields without ALTER TABLE)
  • Storage efficiency (only stores non-null values)
  • Fast JSON parsing (Snowflake optimizes variant queries)
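VARIANT also handles repeating groups cleanly. If diagnosis codes were stored as a JSON array under a hypothetical diagnosis_codes key in additional_data, LATERAL FLATTEN unnests them into rows:

```sql
-- One output row per diagnosis code on the claim
-- (diagnosis_codes is an illustrative key, not from the example above)
SELECT
  c.clm_id,
  dx.value::STRING AS diagnosis_code
FROM claim_header c,
  LATERAL FLATTEN(input => c.additional_data:diagnosis_codes) dx;
```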

Pattern 3: Incremental Models with Streams and Tasks

Don't reprocess all claims every night. Use Snowflake Streams to track changes.

-- Create stream to track new/changed claims
CREATE STREAM claim_changes_stream 
ON TABLE claim_header;

-- Create task to update materialized summary
CREATE TASK update_claim_summary
  WAREHOUSE = ETL_WH
  SCHEDULE = '60 MINUTE'
WHEN
  SYSTEM$STREAM_HAS_DATA('claim_changes_stream')
AS
  -- Note: the MERGE target must be a regular summary table;
  -- Snowflake does not allow DML against a materialized view
  MERGE INTO claim_monthly_summary t
  USING (
    SELECT 
      DATE_TRUNC('month', paid_dt) as paid_month,
      COUNT(*) as claim_count,
      SUM(current_paid_amt) as total_paid
    FROM claim_changes_stream
    WHERE METADATA$ACTION = 'INSERT'
    GROUP BY DATE_TRUNC('month', paid_dt)
  ) s
  ON t.paid_month = s.paid_month
  WHEN MATCHED THEN 
    UPDATE SET 
      claim_count = t.claim_count + s.claim_count,
      total_paid = t.total_paid + s.total_paid
  WHEN NOT MATCHED THEN
    INSERT (paid_month, claim_count, total_paid)
    VALUES (s.paid_month, s.claim_count, s.total_paid);

Cost savings: Process only changed data, not entire table.
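One operational detail: tasks are created in a suspended state, so the schedule above does nothing until you explicitly resume it. A quick sketch:

```sql
-- Start the task (created suspended by default)
ALTER TASK update_claim_summary RESUME;

-- Verify it's firing and check for errors
SELECT name, state, scheduled_time, error_message
FROM TABLE(information_schema.task_history())
ORDER BY scheduled_time DESC
LIMIT 10;
```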


Cost Optimization Strategies

Strategy 1: Right-Size Warehouses by Workload

-- Heavy ETL loads (nightly)
CREATE WAREHOUSE ETL_WH 
  WAREHOUSE_SIZE = 'XLARGE'
  AUTO_SUSPEND = 60;

-- Morning reports (predictable)
CREATE WAREHOUSE REPORTING_WH 
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60;

-- Ad-hoc analytics (variable)
CREATE WAREHOUSE ANALYTICS_WH 
  WAREHOUSE_SIZE = 'XSMALL'
  MAX_CLUSTER_COUNT = 3  -- Multi-cluster auto-scale (Enterprise edition)
  AUTO_SUSPEND = 60;
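Right-sizing caps the rate of spend; resource monitors cap the total. A sketch with an illustrative monitor name and quota:

```sql
-- Hard monthly ceiling on ad-hoc analytics spend
CREATE RESOURCE MONITOR analytics_monthly_quota
  WITH CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY     -- warn at 80% of quota
    ON 100 PERCENT DO SUSPEND;  -- stop the warehouse at 100%

ALTER WAREHOUSE ANALYTICS_WH
  SET RESOURCE_MONITOR = analytics_monthly_quota;
```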

Strategy 2: Time-Travel Retention

Default time-travel retention is 1 day; Enterprise edition supports up to 90 days. For most tables, 1 day is enough.

-- Transactional tables: 1 day is sufficient
CREATE TABLE claim_header (
  -- columns
) DATA_RETENTION_TIME_IN_DAYS = 1;

-- Critical compliance tables: Extended retention
CREATE TABLE member_hcc (
  -- columns  
) DATA_RETENTION_TIME_IN_DAYS = 90;

Storage cost: Time-travel retains every changed micro-partition for the full retention window, so 90-day retention on a frequently updated table can cost many times more than 1-day retention.

Strategy 3: Optimize Clustering Maintenance

Clustering isn't free. Monitor and adjust.

-- Check clustering quality against the table's actual cluster key
SELECT 
  SYSTEM$CLUSTERING_INFORMATION('claim_header', '(service_from_dt, mmbr_id)');

-- Automatic reclustering consumes credits. To control spend:
ALTER TABLE claim_header 
  SUSPEND RECLUSTER;  -- Pause automatic reclustering

-- Re-enable automatic reclustering during off-hours or after bulk loads
ALTER TABLE claim_header 
  RESUME RECLUSTER;

Healthcare Data Model Architecture in Snowflake

Layer 1: Raw/Staging (Minimal Processing)

CREATE SCHEMA raw;

CREATE TABLE raw.edi_837_claims (
  file_id       VARCHAR(100),
  claim_json    VARIANT,
  received_dt   DATE,
  load_dttm     TIMESTAMP_NTZ
)
CLUSTER BY (received_dt);

Purpose: Land data as-is. Minimal transformation.
Warehouse: Small (cheap storage, fast loads)


Layer 2: Conformed (Business Logic Applied)

CREATE SCHEMA conformed;

CREATE TABLE conformed.claim_header (
  clm_id                 VARCHAR(50),
  mmbr_id                VARCHAR(50),
  service_from_dt        DATE,
  current_paid_amt       NUMBER(10,2),
  -- conformed, validated fields
)
CLUSTER BY (service_from_dt, mmbr_id);

Purpose: Apply business rules, deduplicate, validate.
Warehouse: Medium-Large (transformation-heavy)


Layer 3: Presentation (Optimized for Queries)

CREATE SCHEMA presentation;

CREATE MATERIALIZED VIEW presentation.claim_monthly_summary AS
SELECT 
  DATE_TRUNC('month', service_from_dt) as service_month,
  mmbr_id,
  COUNT(*) as claim_count,
  SUM(current_paid_amt) as total_paid
FROM conformed.claim_header
GROUP BY DATE_TRUNC('month', service_from_dt), mmbr_id;

Purpose: Pre-aggregated for fast reporting.
Warehouse: Small (queries pre-computed data)
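The layered design also pays off for dev and test work. Zero-copy cloning (mentioned at the start) gives engineers a full-size copy of the conformed layer without duplicating storage; schema names here are illustrative:

```sql
-- Instant clone: shares micro-partitions with the source until data
-- diverges, so it costs nothing extra in storage up front
CREATE SCHEMA conformed_dev CLONE conformed;
```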


Real-World Example: Monthly Claims Report

Before Snowflake Optimization:

-- Scans 100M rows, runs for 12 minutes, costs $43
SELECT 
  DATE_TRUNC('month', ch.service_from_dt) as service_month,
  COUNT(DISTINCT ch.clm_id) as claim_count,
  COUNT(DISTINCT ch.mmbr_id) as member_count,
  SUM(ch.current_paid_amt) as total_paid,
  AVG(ch.current_paid_amt) as avg_paid
FROM claim_header ch
WHERE ch.service_from_dt >= '2023-01-01'
GROUP BY DATE_TRUNC('month', ch.service_from_dt)
ORDER BY service_month;

After Snowflake Optimization:

-- Uses materialized view, runs in 2 seconds, costs $0.03
SELECT * 
FROM presentation.claim_monthly_summary
WHERE service_month >= '2023-01-01'
ORDER BY service_month;

Performance: 360x faster
Cost: 1,433x cheaper


Monitoring and Governance

Query Cost Tracking

-- Find heavy queries (bytes scanned and elapsed time are the best
-- per-query cost proxies; credits_used_cloud_services covers only
-- cloud-services overhead, not warehouse compute)
SELECT 
  query_id,
  user_name,
  warehouse_name,
  execution_time,
  total_elapsed_time,
  bytes_scanned,
  credits_used_cloud_services
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP)
ORDER BY bytes_scanned DESC
LIMIT 100;
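Per-query credit attribution is approximate; for hard numbers on compute spend, aggregate warehouse metering instead:

```sql
-- Actual compute credits consumed per warehouse, last 7 days
SELECT 
  warehouse_name,
  SUM(credits_used) AS credits_7d
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP)
GROUP BY warehouse_name
ORDER BY credits_7d DESC;
```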

Storage Cost Monitoring

-- See table storage costs
SELECT 
  table_schema,
  table_name,
  active_bytes / 1024 / 1024 / 1024 as active_gb,
  time_travel_bytes / 1024 / 1024 / 1024 as time_travel_gb,
  failsafe_bytes / 1024 / 1024 / 1024 as failsafe_gb
FROM snowflake.account_usage.table_storage_metrics
WHERE table_schema = 'CONFORMED'
ORDER BY active_bytes DESC;

The Healthcare Snowflake Data Model Checklist

✅ Architecture

  • Separate warehouses by workload (ETL, Reporting, Analytics)
  • Auto-suspend after 60 seconds
  • Right-size for actual usage

✅ Clustering

  • Cluster by date fields (service_from_dt, paid_dt)
  • Secondary cluster by high-cardinality keys (mmbr_id, prvdr_id)
  • Monitor clustering quality

✅ Storage

  • Use VARIANT for semi-structured EDI data
  • Set time-travel retention based on compliance needs
  • Avoid over-clustering (costs credits)

✅ Performance

  • Materialized views for expensive aggregations
  • Streams + Tasks for incremental processing
  • Layer data: Raw → Conformed → Presentation

✅ Cost Control

  • Monitor query costs weekly
  • Identify expensive queries, optimize
  • Review warehouse sizes monthly

Conclusion

Snowflake isn't just "cloud Oracle." It's a fundamentally different architecture that rewards columnar thinking, separation of storage/compute, and materialized aggregations.

Build for Snowflake by:

  • Clustering on date + high-cardinality keys
  • Using materialized views for reports
  • Auto-suspending warehouses
  • Monitoring costs and optimizing continuously

Your healthcare data model should leverage Snowflake's strengths—not fight against them.

Optimize the architecture. Control the costs.

