Why Claims Data Modeling Is Harder Than It Looks
Healthcare claims are the financial record of care delivery, and designing a data model to hold them correctly is one of the highest-stakes schema decisions a healthcare data engineer makes. A poorly normalized claims model produces incorrect utilization reports, miscalculated provider reimbursement, and failed CMS submissions. A model that conflates professional and institutional claims, or collapses claim header and line into a single flat table, will produce errors that compound across every downstream use case — revenue cycle, quality measurement, risk adjustment, and network analytics.
This guide covers the core tables, relationships, and SQL DDL for a production-grade payer-side claims data warehouse. The schema covers professional claims (837P), institutional claims (837I), remittance (835), and the member and provider dimensions they reference.
While ERwin requires a complex setup for schema generation, you can generate clean DDL in seconds using our free converter.
Convert your first 5 DDLs — No Credit Card RequiredClaims Data Fundamentals
Before touching a schema, every data engineer working with claims should understand three structural facts.
Claim header vs. claim line. Every claim has a header record (one per claim) and one or more line records (one per service). The header holds the member, billing provider, dates of service, and claim-level financial totals. Each line holds the specific procedure code (CPT or revenue code), units, billed amount, and line-level adjudication. Never flatten header and line into a single table — the many-to-one relationship is fundamental.
Professional vs. institutional. Professional claims (837P) are submitted by physicians and other non-facility providers using CPT codes on the CMS-1500 form. Institutional claims (837I) are submitted by hospitals and facilities using revenue codes on the UB-04. They share a header structure but diverge significantly at the line level. A clean schema separates them or uses a discriminator column — mixing them undifferentiated is a major source of downstream errors.
Adjudicated vs. paid. The adjudication date is when the payer makes a coverage decision. The paid date is when funds transfer. Many claims are adjudicated and denied; others are adjudicated, paid, then reversed. Your schema must accommodate the full lifecycle.
Core Schema: Claim Header
CREATE TABLE claims.clm_header (
clm_id VARCHAR(50) NOT NULL,
clm_type_cd VARCHAR(10) NOT NULL, -- PROF (837P) or INST (837I)
mbr_id VARCHAR(50) NOT NULL,
billing_npi VARCHAR(10) NOT NULL,
rendering_npi VARCHAR(10),
referring_npi VARCHAR(10),
billing_tin VARCHAR(9),
plan_id VARCHAR(20) NOT NULL,
svc_from_dt DATE NOT NULL,
svc_thru_dt DATE NOT NULL,
clm_recv_dt DATE NOT NULL,
adjudication_dt DATE,
paid_dt DATE,
clm_sts_cd VARCHAR(20) NOT NULL, -- PAID, DENIED, PENDED, REVERSED
billed_amt NUMBER(12,2) NOT NULL,
allowed_amt NUMBER(12,2),
paid_amt NUMBER(12,2),
denial_rsn_cd VARCHAR(20),
place_of_svc_cd VARCHAR(2), -- CMS POS codes (837P only)
bill_type_cd VARCHAR(3), -- UB-04 bill type (837I only)
drg_cd VARCHAR(10), -- [DRG](/terms/drg) (inpatient 837I only)
PRIMARY KEY (clm_id)
);
Claim Line Table
CREATE TABLE claims.clm_line (
clm_id VARCHAR(50) NOT NULL,
clm_line_nbr SMALLINT NOT NULL,
proc_cd VARCHAR(5), -- CPT code (837P)
rev_cd VARCHAR(4), -- Revenue code (837I)
proc_mod_cd1 VARCHAR(2),
proc_mod_cd2 VARCHAR(2),
icd_diag_cd VARCHAR(7) NOT NULL, -- principal diagnosis, no decimal
units NUMBER(8,2) NOT NULL,
billed_amt NUMBER(12,2) NOT NULL,
allowed_amt NUMBER(12,2),
paid_amt NUMBER(12,2),
denial_rsn_cd VARCHAR(20),
svc_from_dt DATE NOT NULL,
svc_thru_dt DATE NOT NULL,
rendering_npi VARCHAR(10),
PRIMARY KEY (clm_id, clm_line_nbr),
FOREIGN KEY (clm_id) REFERENCES claims.clm_header(clm_id)
);
Diagnosis Codes (Normalized)
Professional claims carry up to 12 diagnosis codes; institutional claims up to 25. Store them normalized, not as 25 columns on the header.
CREATE TABLE claims.clm_diagnosis (
clm_id VARCHAR(50) NOT NULL,
dx_cd_seq SMALLINT NOT NULL, -- 1 = principal
icd_diag_cd VARCHAR(7) NOT NULL,
poa_ind CHAR(1), -- Present on Admission (institutional)
PRIMARY KEY (clm_id, dx_cd_seq)
);
Remittance (835) Table
The 835 Electronic Remittance Advice closes the loop between adjudication and payment. Link it to claim header via claim ID.
CREATE TABLE claims.remittance (
remit_id VARCHAR(50) NOT NULL,
clm_id VARCHAR(50) NOT NULL,
clm_line_nbr SMALLINT, -- null = header-level adjustment
check_nbr VARCHAR(30),
paid_dt DATE NOT NULL,
paid_amt NUMBER(12,2) NOT NULL,
adj_rsn_cd VARCHAR(5), -- CAS segment reason code
adj_amt NUMBER(12,2),
remit_type_cd VARCHAR(10) NOT NULL, -- PAYMENT, REVERSAL, ADJUSTMENT
payer_id VARCHAR(30),
PRIMARY KEY (remit_id)
);
Member and Provider Dimensions
CREATE TABLE claims.mbr_dim (
mbr_id VARCHAR(50) NOT NULL,
last_nm VARCHAR(100) NOT NULL,
first_nm VARCHAR(100) NOT NULL,
dob DATE NOT NULL,
sex_cd CHAR(1) NOT NULL,
mbr_state_cd CHAR(2),
zip_cd VARCHAR(10),
plan_id VARCHAR(20) NOT NULL,
eff_dt DATE NOT NULL,
term_dt DATE,
PRIMARY KEY (mbr_id, eff_dt)
);
CREATE TABLE claims.prvdr_dim (
npi VARCHAR(10) NOT NULL,
prvdr_last_nm VARCHAR(100),
prvdr_first_nm VARCHAR(100),
prvdr_org_nm VARCHAR(255),
taxonomy_cd VARCHAR(15),
prvdr_type_cd VARCHAR(20),
prvdr_state_cd CHAR(2),
zip_cd VARCHAR(10),
nppes_refresh_dt DATE,
PRIMARY KEY (npi)
);
Key Query Patterns
Paid claims per member per month
SELECT
DATE_TRUNC('month', h.paid_dt) AS paid_month,
h.mbr_id,
COUNT(DISTINCT h.clm_id) AS clm_count,
SUM(h.paid_amt) AS total_paid
FROM claims.clm_header h
WHERE h.clm_sts_cd = 'PAID'
AND h.paid_dt BETWEEN '2025-01-01' AND '2025-12-31'
GROUP BY 1, 2
ORDER BY 1, total_paid DESC;
Denial rate by billing provider
SELECT
h.billing_npi,
p.prvdr_org_nm,
COUNT(*) AS total_claims,
COUNT(*) FILTER (WHERE h.clm_sts_cd = 'DENIED') AS denied_claims,
ROUND(
COUNT(*) FILTER (WHERE h.clm_sts_cd = 'DENIED') * 100.0
/ NULLIF(COUNT(*), 0), 1
) AS denial_rate_pct
FROM claims.clm_header h
LEFT JOIN claims.prvdr_dim p ON h.billing_npi = p.npi
GROUP BY 1, 2
HAVING COUNT(*) > 50
ORDER BY denial_rate_pct DESC;
Common Data Quality Problems
Duplicate claims are endemic in raw 837 feeds. Deduplicate on a composite key of (clm_id, svc_from_dt, billed_amt, billing_npi) before loading. Treating each raw EDI transaction as unique will inflate utilization metrics significantly.
ICD-10 format inconsistency. Some source systems include the decimal point in diagnosis codes and others strip it. Normalize to no-decimal format at ingestion — CMS reference files use the no-decimal format and your crosswalks depend on exact matching.
Adjudication date sequencing. Enforce a pipeline rule that adjudication_dt >= svc_thru_dt and paid_dt >= adjudication_dt. Violations are real and common due to batch processing lag, and they make revenue cycle reports impossible to reconcile.
Frequently Asked Questions
What is the difference between 837P and 837I claims?
837P (professional) claims are submitted by individual providers — physicians, nurse practitioners, therapists — for services rendered in an office or non-facility setting. They use CPT procedure codes. 837I (institutional) claims are submitted by hospitals and facilities using revenue codes and the UB-04 form. The key data model difference is at the claim line level: 837P lines carry CPT codes; 837I lines carry revenue codes. Mixing them in a single undifferentiated table causes systematic errors in procedure-based analytics.
What is a claim status code and what values should I expect?
The claim status tracks where a claim is in the adjudication lifecycle. Common values are PAID, DENIED (with a denial reason code explaining why), PENDED (under review), REVERSED (previously paid, now clawed back), and ADJUSTMENT (a corrected claim). Your data model must handle reversals correctly — a naive SUM of paid amounts will be wrong if reversed claims are not offset.
How should I handle claim reversals in a data warehouse?
Store reversals as separate records linked to the original claim. Do not delete or overwrite the original. In reporting, calculate net paid as the sum of paid amounts plus the sum of reversal amounts (negative). Claims that have been reversed and reprocessed appear as three records: original paid, reversal, and replacement paid.
Why is the adjudication date different from the paid date?
Adjudication is the coverage decision — covered or not, at what allowed amount, under what plan rules. Payment is the disbursement of funds. The gap is typically 14–30 days. For revenue cycle analysis, use adjudication date to measure processing speed; use paid date to reconcile with bank statements and capitation reports.
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.
Related Guides
Key Terms in This Article
More in Healthcare Data Modeling
AI-Powered Healthcare Data Modeling: Generate Production-Ready Schemas in 30 Seconds
Healthcare data architects spend days designing schemas from scratch — Medicare Advantage claims warehouses alone require 20+ tables, hundreds of columns, and platform-specific syntax. The mdatool AI Data Modeling tool generates a production-ready Star Schema for Snowflake, BigQuery, or Databricks in 30 seconds, with ISO-11179 standard column names built in.
Read moreEnforcing ISO-11179 Healthcare Naming Standards in dbt Projects
Every healthcare data warehouse eventually develops naming drift — DOB in one model, birth_dt in another, member_birth_date in a third. The dbt-healthcare-standards package brings ISO-11179 column naming directly into your dbt project as installable macros and schema tests.
Read moreAI-Powered Healthcare Data Modeling: Generate Production-Ready Schemas in Seconds
Healthcare data models take weeks to design manually — HIPAA constraints, CMS reporting requirements, and ISO-11179 naming standards all have to be applied correctly from the start. AI data modeling changes that. Here is how to generate production-ready schemas for Snowflake, BigQuery, and Databricks in seconds.
Read moreFree Tools
Ready to improve your data architecture?
Free tools for DDL conversion, SQL analysis, naming standards, and more.
Get weekly healthcare data engineering tips
Practical guides on data modeling, SQL standards, and healthcare domain conventions — straight to your inbox.
No spam. Unsubscribe any time.