How to Write a DDL Script for Healthcare Data Warehouses

Introduction

A poorly written DDL script is a slow-motion data quality problem. The VARCHAR(255) on an NPI column allows a 50-character typo to load without error. The missing NOT NULL on service_date lets null dates propagate into HEDIS measure calculations. The absent unique constraint on member_id lets a deduplication bug create duplicate member records that survive for months before anyone notices.

🏥NPI Lookup

Look up any NPI number and validate provider data against the NPPES registry.

Try it free

In healthcare data warehouses, DDL is a correctness control. The schema you define determines what the database will and will not accept — which means schema decisions made in 20 minutes at the start of a project enforce (or fail to enforce) data quality for the life of the warehouse.

This guide covers practical DDL patterns for healthcare data warehouses: data type selection, constraint design, indexing strategy, PHI field handling, and the structural conventions that make a schema maintainable at scale.

Data Type Selection for Healthcare Fields

Choosing the right data type for each column is the highest-leverage decision in your DDL. Wrong data types cause silent truncation, failed joins, and performance problems that are expensive to fix after data is loaded.

Dates and Times

Healthcare data has three distinct temporal concepts that require different types:

-- Service dates: DATE is almost always correct
-- Never use VARCHAR for a date column
service_date            DATE         NOT NULL,
discharge_date          DATE,

-- Timestamps for audit fields and real-time events
created_datetime        TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_datetime        TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,

-- Fiscal and reporting periods: store as DATE, not VARCHAR('2026Q1')
reporting_period_date   DATE         NOT NULL,

Avoid storing dates as VARCHAR or INTEGER (YYYYMMDD format). Dates stored as strings do not sort correctly, do not support date arithmetic, and accept invalid values like '2026-02-30'.

Monetary Amounts

-- Use DECIMAL/NUMERIC for monetary amounts — never FLOAT
-- FLOAT has rounding errors that accumulate in aggregate queries
paid_amount             DECIMAL(12, 2)   NOT NULL DEFAULT 0.00,
billed_amount           DECIMAL(12, 2)   NOT NULL DEFAULT 0.00,
allowed_amount          DECIMAL(12, 2),
copay_amount            DECIMAL(10, 2)   NOT NULL DEFAULT 0.00,

DECIMAL(12, 2) gives you up to $9.99 billion with two decimal places — sufficient for claim-level amounts. For capitation and actuarial calculations that require more precision, use DECIMAL(18, 6).

Code Fields

Healthcare code fields have well-defined formats. Encode the format in the data type:

-- NPI: exactly 10 digits, stored as CHAR not VARCHAR
npi_id                  CHAR(10)         NOT NULL,

-- [ICD-10](/terms/icd-10) codes: up to 7 characters (e.g., 'A15.0', 'Z23')
diagnosis_code          VARCHAR(10)      NOT NULL,

-- [CPT](/terms/cpt) codes: 5 characters (numeric or alphanumeric)
procedure_code          VARCHAR(5),

-- [NDC](/terms/ndc) codes: 11 digits
ndc_code                CHAR(11),

-- Plan IDs and member IDs vary by payer — use VARCHAR with a reasonable cap
member_id               VARCHAR(50)      NOT NULL,
plan_id                 VARCHAR(20)      NOT NULL,

Use CHAR for fixed-length codes (NPI, NDC). It communicates the expected length and in some platforms improves storage efficiency for fixed-width values.

Constraint Design

Constraints are the most underused feature in healthcare warehouse DDL. Many teams write CREATE TABLE with no constraints other than a PRIMARY KEY. This is a mistake.

🔄DDL Converter

Convert DDL instantly between Snowflake, BigQuery, Databricks, and PostgreSQL.

Try it free

Primary Keys

-- Surrogate key pattern — preferred for warehouse tables
CREATE TABLE fct_claims (
    claim_key           BIGINT           NOT NULL GENERATED ALWAYS AS IDENTITY,
    claim_id            VARCHAR(50)      NOT NULL,
    claim_line_number   SMALLINT         NOT NULL,

    CONSTRAINT pk_fct_claims PRIMARY KEY (claim_key),
    CONSTRAINT uq_fct_claims_natural_key UNIQUE (claim_id, claim_line_number)
);

Name every constraint. A named constraint appears in error messages and migration scripts by name — making debugging and schema diffs readable.

NOT NULL Constraints

Apply NOT NULL to every column that should not be null. In healthcare data, that is most columns:

CREATE TABLE dim_member (
    member_key          BIGINT           NOT NULL GENERATED ALWAYS AS IDENTITY,
    member_id           VARCHAR(50)      NOT NULL,
    member_first_name   VARCHAR(100)     NOT NULL,
    member_last_name    VARCHAR(100)     NOT NULL,
    member_dob_date     DATE             NOT NULL,
    member_gender_code  CHAR(1)          NOT NULL,
    plan_id             VARCHAR(20)      NOT NULL,
    effective_date      DATE             NOT NULL,
    termination_date    DATE,                         -- NULL means currently active
    is_deleted_flag     BOOLEAN          NOT NULL DEFAULT FALSE,
    created_datetime    TIMESTAMP        NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_datetime    TIMESTAMP        NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT pk_dim_member PRIMARY KEY (member_key),
    CONSTRAINT uq_dim_member_id UNIQUE (member_id)
);

Check Constraints

Check constraints enforce business rules at the database level:

ALTER TABLE fct_claims
    ADD CONSTRAINT chk_claims_paid_amount
        CHECK (paid_amount >= 0),
    ADD CONSTRAINT chk_claims_billed_ge_paid
        CHECK (billed_amount >= paid_amount OR billed_amount IS NULL),
    ADD CONSTRAINT chk_claims_service_date_range
        CHECK (service_date >= '2000-01-01'),
    ADD CONSTRAINT chk_claims_gender
        CHECK (member_gender_code IN ('M', 'F', 'U'));

Foreign Keys

Foreign keys are often omitted in analytical warehouses for performance reasons — but they should be present in your DDL as documentation and enforced in staging layers:

ALTER TABLE fct_claims
    ADD CONSTRAINT fk_claims_member
        FOREIGN KEY (member_key) REFERENCES dim_member (member_key),
    ADD CONSTRAINT fk_claims_provider
        FOREIGN KEY (rendering_provider_key) REFERENCES dim_provider (provider_key);

If your warehouse does not enforce foreign keys at runtime (Snowflake and Redshift support declarative FK definitions without enforcement), define them anyway. ERD tools, lineage platforms, and documentation generators read them.

Indexing Strategy for Healthcare Queries

Healthcare analytical queries have predictable access patterns. Your index strategy should reflect them.

Claims Tables

Claims tables are the most frequently queried objects in a payer warehouse. Index for the three most common access patterns:

-- Primary filter: member + date range
CREATE INDEX idx_claims_member_service_date
    ON fct_claims (member_key, service_date);

-- Secondary filter: provider + date range
CREATE INDEX idx_claims_provider_service_date
    ON fct_claims (rendering_provider_key, service_date);

-- Lookups by claim ID
CREATE INDEX idx_claims_claim_id
    ON fct_claims (claim_id);

-- Diagnosis code lookups for clinical analytics and [HCC](/terms/hcc)
CREATE INDEX idx_claims_diagnosis_code
    ON fct_claims (primary_diagnosis_code);

Eligibility Tables

Eligibility queries almost always filter on a date range to find active coverage:

-- Point-in-time eligibility lookup
CREATE INDEX idx_eligibility_member_dates
    ON fct_member_eligibility (member_id, effective_date, termination_date);

Code Reference Tables

Reference tables (ICD-10, CPT, NDC) are small but queried on every claim load. Index the code column:

CREATE UNIQUE INDEX uq_ref_icd10_code ON ref_icd10 (icd10_code);
CREATE UNIQUE INDEX uq_ref_cpt_code ON ref_cpt (cpt_code);

Handling PHI Fields in DDL

PHI columns require explicit handling at the DDL level — not just in application code.

Column-Level Comments

Document PHI fields in your DDL with column comments that your governance tooling can read:

-- Snowflake column comment syntax
ALTER TABLE dim_member MODIFY COLUMN member_ssn_phi
    COMMENT 'PHI: Social Security Number. Masked for all roles except compliance_admin.';

ALTER TABLE dim_member MODIFY COLUMN member_dob_date
    COMMENT 'PHI: Date of Birth. Generalized to birth year for roles below analyst_phi tier.';

Data Classification Tags

In platforms that support it, tag PHI columns at the DDL stage:

-- Snowflake object tagging
ALTER TABLE dim_member MODIFY COLUMN member_ssn_phi
    SET TAG governance.phi_classification = 'SSN';

ALTER TABLE dim_member MODIFY COLUMN member_dob_date
    SET TAG governance.phi_classification = 'DOB';

This approach means masking policies can target the tag rather than hardcoding column names — surviving column renames.

DDL for Slowly Changing Dimensions in Healthcare

Member enrollment, provider rosters, and plan configurations change over time. Use Type 2 SCD patterns:

CREATE TABLE dim_member_scd2 (
    member_key              BIGINT       NOT NULL GENERATED ALWAYS AS IDENTITY,
    member_id               VARCHAR(50)  NOT NULL,
    member_first_name       VARCHAR(100) NOT NULL,
    member_last_name        VARCHAR(100) NOT NULL,
    member_dob_date         DATE         NOT NULL,
    plan_id                 VARCHAR(20)  NOT NULL,

    -- SCD2 control columns
    effective_date          DATE         NOT NULL,
    expiration_date         DATE         NOT NULL DEFAULT '9999-12-31',
    is_current_flag         BOOLEAN      NOT NULL DEFAULT TRUE,
    source_hash             CHAR(64),

    created_datetime        TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT pk_dim_member_scd2 PRIMARY KEY (member_key),
    CONSTRAINT uq_dim_member_scd2_natural UNIQUE (member_id, effective_date)
);

-- Index for point-in-time lookups
CREATE INDEX idx_member_scd2_current
    ON dim_member_scd2 (member_id, is_current_flag);

Versioning DDL Scripts

DDL scripts need version control the same way application code does. Structure your migration files so they are replayable in order:

migrations/
  V001__create_dim_member.sql
  V002__create_fct_claims.sql
  V003__add_phi_tags_dim_member.sql
  V004__add_index_claims_provider.sql
  V005__alter_claims_add_drg_code.sql

Each migration file should be idempotent where possible — using CREATE TABLE IF NOT EXISTS, CREATE INDEX IF NOT EXISTS, and ALTER TABLE ... ADD COLUMN IF NOT EXISTS.

Frequently Asked Questions

Should warehouse tables use surrogate keys or natural keys as primary keys?

Use surrogate keys (auto-incrementing integers or GUIDs) as primary keys in analytical layer tables. Natural keys belong in a separate unique constraint. This pattern keeps join performance predictable and handles natural key changes (like NPI deactivations) without cascading updates.

How long should VARCHAR columns be?

Define VARCHAR lengths based on the domain, not a default of 255. Unconstrained VARCHAR(255) accepts invalid data that your application would reject. For code fields, use CHAR(n) for fixed-length codes and VARCHAR with a realistic maximum for variable-length fields.

When should I use BOOLEAN vs CHAR(1) for flag columns?

Use BOOLEAN where your platform supports it (Snowflake, PostgreSQL, Databricks). Use CHAR(1) with a check constraint on platforms where BOOLEAN has limited support or where your ETL source uses character flags.

Operationalizing DDL Quality with mdatool

For healthcare data engineering teams writing and reviewing DDL at scale, mdatool provides tooling at every stage. The mdatool DDL Converter translates DDL across warehouse dialects — converting Snowflake CREATE TABLE scripts to BigQuery, PostgreSQL, or Redshift syntax while preserving data types, constraints, and index definitions. The mdatool Naming Auditor validates every column name in a DDL script against your team's representation term standards before the schema reaches production. The mdatool SQL Linter catches DDL patterns that create downstream query problems — missing NOT NULL constraints, implicit data type coercions, and SELECT star in view definitions. The mdatool Schema Diff generates a precise diff between two DDL versions, making schema review and migration planning tractable at scale.

How to Write a DDL Script for Healthcare Data Warehouses: Best Practices and Examples