mdatool
Healthcare Data Dictionary for the Modern Data Stack
LibraryBlogPricing
mdatool
mdatool

The healthcare data dictionary for dbt, Snowflake, Databricks, and BigQuery. 100,000+ ISO-11179 standard terms, free SQL tools, and AI data modeling.

HIPAA-AlignedEnterprise Ready

Tools

  • SQL Linter
  • DDL Converter
  • Bulk Sanitizer
  • Naming Auditor
  • Name Generator
  • AI Data Modeling
  • HCC Calculator
  • Data Model Canvas

Library

  • Glossary
  • Guides
  • Blog

Company

  • About
  • Contact
  • Pricing

Account

  • Sign Up Free
  • Sign In
  • Upgrade to Pro
  • Dashboard

Legal

  • Privacy Policy
  • Terms of Service

© 2026 mdatool. All rights reserved.

Built for healthcare data teams.

HomeBlogHealthcare Data ModelingHealthcare Claims Data Model: Complete SQL Schema for Payers
Healthcare Data Modeling

Healthcare Claims Data Model: Complete SQL Schema for Payers

The complete SQL schema for a payer-side healthcare claims data warehouse — including claim header, claim line, adjudication, remittance, and provider tables with DDL for Snowflake and BigQuery.

mdatool Team·April 29, 2026·13 min read
claims data model837adjudicationhealthcare data warehouseSQL schemaSnowflakeBigQuery

Why Claims Data Modeling Is Harder Than It Looks

Healthcare claims are the financial record of care delivery, and designing a data model to hold them correctly is one of the highest-stakes schema decisions a healthcare data engineer makes. A poorly normalized claims model produces incorrect utilization reports, miscalculated provider reimbursement, and failed CMS submissions. A model that conflates professional and institutional claims, or collapses claim header and line into a single flat table, will produce errors that compound across every downstream use case — revenue cycle, quality measurement, risk adjustment, and network analytics.

🧮HCC Calculator

Calculate RAF scores and estimate risk adjustment payments for Medicare Advantage members.

Try it free

This guide covers the core tables, relationships, and SQL DDL for a production-grade payer-side claims data warehouse. The schema covers professional claims (837P), institutional claims (837I), remittance (835), and the member and provider dimensions they reference.

While ERwin requires a complex setup for schema generation, you can generate clean DDL in seconds using our free converter.

Convert your first 5 DDLs — No Credit Card Required

Claims Data Fundamentals

Before touching a schema, every data engineer working with claims should understand three structural facts.

Claim header vs. claim line. Every claim has a header record (one per claim) and one or more line records (one per service). The header holds the member, billing provider, dates of service, and claim-level financial totals. Each line holds the specific procedure code (CPT or revenue code), units, billed amount, and line-level adjudication. Never flatten header and line into a single table — the many-to-one relationship is fundamental.

Professional vs. institutional. Professional claims (837P) are submitted by physicians and other non-facility providers using CPT codes on the CMS-1500 form. Institutional claims (837I) are submitted by hospitals and facilities using revenue codes on the UB-04. They share a header structure but diverge significantly at the line level. A clean schema separates them or uses a discriminator column — mixing them undifferentiated is a major source of downstream errors.

Adjudicated vs. paid. The adjudication date is when the payer makes a coverage decision. The paid date is when funds transfer. Many claims are adjudicated and denied; others are adjudicated, paid, then reversed. Your schema must accommodate the full lifecycle.


Core Schema: Claim Header

CREATE TABLE claims.clm_header (
  clm_id              VARCHAR(50)    NOT NULL,
  clm_type_cd         VARCHAR(10)    NOT NULL,   -- PROF (837P) or INST (837I)
  mbr_id              VARCHAR(50)    NOT NULL,
  billing_npi         VARCHAR(10)    NOT NULL,
  rendering_npi       VARCHAR(10),
  referring_npi       VARCHAR(10),
  billing_tin         VARCHAR(9),
  plan_id             VARCHAR(20)    NOT NULL,
  svc_from_dt         DATE           NOT NULL,
  svc_thru_dt         DATE           NOT NULL,
  clm_recv_dt         DATE           NOT NULL,
  adjudication_dt     DATE,
  paid_dt             DATE,
  clm_sts_cd          VARCHAR(20)    NOT NULL,   -- PAID, DENIED, PENDED, REVERSED
  billed_amt          NUMBER(12,2)   NOT NULL,
  allowed_amt         NUMBER(12,2),
  paid_amt            NUMBER(12,2),
  denial_rsn_cd       VARCHAR(20),
  place_of_svc_cd     VARCHAR(2),                -- CMS POS codes (837P only)
  bill_type_cd        VARCHAR(3),                -- UB-04 bill type (837I only)
  drg_cd              VARCHAR(10),               -- [DRG](/terms/drg) (inpatient 837I only)
  PRIMARY KEY (clm_id)
);

Claim Line Table

CREATE TABLE claims.clm_line (
  clm_id              VARCHAR(50)    NOT NULL,
  clm_line_nbr        SMALLINT       NOT NULL,
  proc_cd             VARCHAR(5),                -- CPT code (837P)
  rev_cd              VARCHAR(4),                -- Revenue code (837I)
  proc_mod_cd1        VARCHAR(2),
  proc_mod_cd2        VARCHAR(2),
  icd_diag_cd         VARCHAR(7)     NOT NULL,   -- principal diagnosis, no decimal
  units               NUMBER(8,2)    NOT NULL,
  billed_amt          NUMBER(12,2)   NOT NULL,
  allowed_amt         NUMBER(12,2),
  paid_amt            NUMBER(12,2),
  denial_rsn_cd       VARCHAR(20),
  svc_from_dt         DATE           NOT NULL,
  svc_thru_dt         DATE           NOT NULL,
  rendering_npi       VARCHAR(10),
  PRIMARY KEY (clm_id, clm_line_nbr),
  FOREIGN KEY (clm_id) REFERENCES claims.clm_header(clm_id)
);

Diagnosis Codes (Normalized)

Professional claims carry up to 12 diagnosis codes; institutional claims up to 25. Store them normalized, not as 25 columns on the header.

CREATE TABLE claims.clm_diagnosis (
  clm_id              VARCHAR(50)    NOT NULL,
  dx_cd_seq           SMALLINT       NOT NULL,   -- 1 = principal
  icd_diag_cd         VARCHAR(7)     NOT NULL,
  poa_ind             CHAR(1),                   -- Present on Admission (institutional)
  PRIMARY KEY (clm_id, dx_cd_seq)
);

Remittance (835) Table

The 835 Electronic Remittance Advice closes the loop between adjudication and payment. Link it to claim header via claim ID.

CREATE TABLE claims.remittance (
  remit_id            VARCHAR(50)    NOT NULL,
  clm_id              VARCHAR(50)    NOT NULL,
  clm_line_nbr        SMALLINT,                  -- null = header-level adjustment
  check_nbr           VARCHAR(30),
  paid_dt             DATE           NOT NULL,
  paid_amt            NUMBER(12,2)   NOT NULL,
  adj_rsn_cd          VARCHAR(5),                -- CAS segment reason code
  adj_amt             NUMBER(12,2),
  remit_type_cd       VARCHAR(10)    NOT NULL,   -- PAYMENT, REVERSAL, ADJUSTMENT
  payer_id            VARCHAR(30),
  PRIMARY KEY (remit_id)
);

Member and Provider Dimensions

CREATE TABLE claims.mbr_dim (
  mbr_id              VARCHAR(50)    NOT NULL,
  last_nm             VARCHAR(100)   NOT NULL,
  first_nm            VARCHAR(100)   NOT NULL,
  dob                 DATE           NOT NULL,
  sex_cd              CHAR(1)        NOT NULL,
  mbr_state_cd        CHAR(2),
  zip_cd              VARCHAR(10),
  plan_id             VARCHAR(20)    NOT NULL,
  eff_dt              DATE           NOT NULL,
  term_dt             DATE,
  PRIMARY KEY (mbr_id, eff_dt)
);

CREATE TABLE claims.prvdr_dim (
  npi                 VARCHAR(10)    NOT NULL,
  prvdr_last_nm       VARCHAR(100),
  prvdr_first_nm      VARCHAR(100),
  prvdr_org_nm        VARCHAR(255),
  taxonomy_cd         VARCHAR(15),
  prvdr_type_cd       VARCHAR(20),
  prvdr_state_cd      CHAR(2),
  zip_cd              VARCHAR(10),
  nppes_refresh_dt    DATE,
  PRIMARY KEY (npi)
);

Key Query Patterns

Paid claims per member per month

SELECT
    DATE_TRUNC('month', h.paid_dt)   AS paid_month,
    h.mbr_id,
    COUNT(DISTINCT h.clm_id)         AS clm_count,
    SUM(h.paid_amt)                  AS total_paid
FROM claims.clm_header h
WHERE h.clm_sts_cd = 'PAID'
  AND h.paid_dt BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY 1, 2
ORDER BY 1, total_paid DESC;

Denial rate by billing provider

SELECT
    h.billing_npi,
    p.prvdr_org_nm,
    COUNT(*)                                                     AS total_claims,
    COUNT(*) FILTER (WHERE h.clm_sts_cd = 'DENIED')             AS denied_claims,
    ROUND(
      COUNT(*) FILTER (WHERE h.clm_sts_cd = 'DENIED') * 100.0
      / NULLIF(COUNT(*), 0), 1
    )                                                            AS denial_rate_pct
FROM claims.clm_header h
LEFT JOIN claims.prvdr_dim p ON h.billing_npi = p.npi
GROUP BY 1, 2
HAVING COUNT(*) > 50
ORDER BY denial_rate_pct DESC;

Common Data Quality Problems

Duplicate claims are endemic in raw 837 feeds. Deduplicate on a composite key of (clm_id, svc_from_dt, billed_amt, billing_npi) before loading. Treating each raw EDI transaction as unique will inflate utilization metrics significantly.

ICD-10 format inconsistency. Some source systems include the decimal point in diagnosis codes and others strip it. Normalize to no-decimal format at ingestion — CMS reference files use the no-decimal format and your crosswalks depend on exact matching.

🔎ICD-10 Search

Search all 70,000+ ICD-10-CM diagnosis codes instantly by description or code prefix.

Try it free

Adjudication date sequencing. Enforce a pipeline rule that adjudication_dt >= svc_thru_dt and paid_dt >= adjudication_dt. Violations are real and common due to batch processing lag, and they make revenue cycle reports impossible to reconcile.


Frequently Asked Questions

What is the difference between 837P and 837I claims?

837P (professional) claims are submitted by individual providers — physicians, nurse practitioners, therapists — for services rendered in an office or non-facility setting. They use CPT procedure codes. 837I (institutional) claims are submitted by hospitals and facilities using revenue codes and the UB-04 form. The key data model difference is at the claim line level: 837P lines carry CPT codes; 837I lines carry revenue codes. Mixing them in a single undifferentiated table causes systematic errors in procedure-based analytics.

What is a claim status code and what values should I expect?

The claim status tracks where a claim is in the adjudication lifecycle. Common values are PAID, DENIED (with a denial reason code explaining why), PENDED (under review), REVERSED (previously paid, now clawed back), and ADJUSTMENT (a corrected claim). Your data model must handle reversals correctly — a naive SUM of paid amounts will be wrong if reversed claims are not offset.

How should I handle claim reversals in a data warehouse?

Store reversals as separate records linked to the original claim. Do not delete or overwrite the original. In reporting, calculate net paid as the sum of paid amounts plus the sum of reversal amounts (negative). Claims that have been reversed and reprocessed appear as three records: original paid, reversal, and replacement paid.

Why is the adjudication date different from the paid date?

Adjudication is the coverage decision — covered or not, at what allowed amount, under what plan rules. Payment is the disbursement of funds. The gap is typically 14–30 days. For revenue cycle analysis, use adjudication date to measure processing speed; use paid date to reconcile with bank statements and capitation reports.

M

mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.

Related Guides

Claims Adjudication

Medical claims processing, auto-adjudication, EOB generation, and denial management.

Read Guide

EDI Transactions

X12 EDI 837, 835, 270/271, and healthcare electronic data interchange.

Read Guide

Key Terms in This Article

Adjudication Date

More in Healthcare Data Modeling

AI-Powered Healthcare Data Modeling: Generate Production-Ready Schemas in 30 Seconds

Healthcare data architects spend days designing schemas from scratch — Medicare Advantage claims warehouses alone require 20+ tables, hundreds of columns, and platform-specific syntax. The mdatool AI Data Modeling tool generates a production-ready Star Schema for Snowflake, BigQuery, or Databricks in 30 seconds, with ISO-11179 standard column names built in.

Read more

Enforcing ISO-11179 Healthcare Naming Standards in dbt Projects

Every healthcare data warehouse eventually develops naming drift — DOB in one model, birth_dt in another, member_birth_date in a third. The dbt-healthcare-standards package brings ISO-11179 column naming directly into your dbt project as installable macros and schema tests.

Read more

AI-Powered Healthcare Data Modeling: Generate Production-Ready Schemas in Seconds

Healthcare data models take weeks to design manually — HIPAA constraints, CMS reporting requirements, and ISO-11179 naming standards all have to be applied correctly from the start. AI data modeling changes that. Here is how to generate production-ready schemas for Snowflake, BigQuery, and Databricks in seconds.

Read more

Free Tools

Free SQL Linter

Catch SQL bugs, performance issues, and naming violations before production.

Try it free

Free DDL Converter

Translate SQL schemas between Snowflake, BigQuery, Oracle, and SQL Server.

Try it free

Ready to improve your data architecture?

Free tools for DDL conversion, SQL analysis, naming standards, and more.

Get Started Free

Get weekly healthcare data engineering tips

Practical guides on data modeling, SQL standards, and healthcare domain conventions — straight to your inbox.

No spam. Unsubscribe any time.

On this page

  • Why Claims Data Modeling Is Harder Than It Looks
  • Claims Data Fundamentals
  • Core Schema: Claim Header
  • Claim Line Table
  • Diagnosis Codes (Normalized)
  • Remittance (835) Table
  • Member and Provider Dimensions
  • Key Query Patterns
  • Paid claims per member per month
  • Denial rate by billing provider
  • Common Data Quality Problems
  • Frequently Asked Questions
  • What is the difference between 837P and 837I claims?
  • What is a claim status code and what values should I expect?
  • How should I handle claim reversals in a data warehouse?
  • Why is the adjudication date different from the paid date?

Share

Share on XShare on LinkedIn

Engineering Tools

Convert DDL, lint SQL, and audit naming conventions — free.

Explore Tools