mdatool
Healthcare Data Dictionary for the Modern Data Stack
LibraryBlogPricing
mdatool
mdatool

The healthcare data dictionary for dbt, Snowflake, Databricks, and BigQuery. 100,000+ ISO-11179 standard terms, free SQL tools, and AI data modeling.

HIPAA-AlignedEnterprise Ready

Tools

  • SQL Linter
  • DDL Converter
  • Bulk Sanitizer
  • Naming Auditor
  • Name Generator
  • AI Data Modeling
  • HCC Calculator
  • Data Model Canvas

Library

  • Glossary
  • Guides
  • Blog

Company

  • About
  • Contact
  • Pricing

Account

  • Sign Up Free
  • Sign In
  • Upgrade to Pro
  • Dashboard

Legal

  • Privacy Policy
  • Terms of Service

© 2026 mdatool. All rights reserved.

Built for healthcare data teams.

HomeBlogClaims & AdjudicationHealthcare Claims Data: Complete Guide for Data Architects and Engineers
Claims & Adjudication

Healthcare Claims Data: Complete Guide for Data Architects and Engineers

Everything healthcare data architects and engineers need to know about claims data — from EDI 837 transaction standards and ICD-10 coding to production-ready Star Schema design for Snowflake, Databricks, and BigQuery. Includes DDL examples, common analytics use cases, and data quality best practices.

mdatool Team·June 13, 2026·15 min read
claims dataEDI 837ICD-10healthcare data warehousestar schemaSnowflakeDatabricksBigQueryHCCHEDISdata modelingclaims analytics

Healthcare claims data is the backbone of payer analytics. Every time a patient receives care — a physician visit, a hospital stay, a prescription fill — a claim is generated and submitted to the insurance payer for reimbursement. For health plans, PBMs, and provider organizations, claims data is the primary source of truth for financial reporting, quality measurement, risk adjustment, and population health analytics.

🧮HCC Calculator

Calculate RAF scores and estimate risk adjustment payments for Medicare Advantage members.

Try it free

This guide covers everything a healthcare data architect, data modeler, or data engineer needs to know about claims data — from the core data elements and EDI transaction standards to production-ready schema design for Snowflake, Databricks, and BigQuery.

What Is Healthcare Claims Data?

A healthcare claim is a formal request for reimbursement submitted by a healthcare provider to an insurance payer after delivering covered services to a member. Claims data is the structured, coded representation of that request — capturing who provided the service, who received it, what was done, when and where it happened, and how much was charged.

Claims data is generated across three claim types:

Professional claims (CMS-1500 / EDI 837P) are submitted by physicians, nurse practitioners, physician assistants, and other professional providers for outpatient and office-based services. Professional claims identify the rendering provider by NPI, describe services using CPT and HCPCS procedure codes, report diagnoses using ICD-10-CM codes, and specify the place of service.

🏥NPI Lookup

Look up any NPI number and validate provider data against the NPPES registry.

Try it free

Institutional claims (UB-04 / EDI 837I) are submitted by hospitals, skilled nursing facilities, ambulatory surgery centers, and other facility providers. Institutional claims use revenue codes to identify service categories, report DRG assignments for inpatient stays, and include detailed service line information for each billable item.

Pharmacy claims (NCPDP / EDI 837D) are submitted by retail pharmacies, mail-order pharmacies, and specialty pharmacies for prescription drug dispensing. Pharmacy claims identify drugs by National Drug Code, report days supply and quantity dispensed, and capture formulary tier and prior authorization information.

Core Claims Data Elements

Understanding the key data elements in claims data is essential before designing any claims data warehouse. Here are the most important fields every healthcare data team works with:

The claim identifier (clm_id) is the unique key assigned to each claim transaction. It links all service lines within a claim and serves as the primary join key across claims processing tables.

The claim type code (clm_typ_cd) distinguishes professional, institutional, and pharmacy claims and drives which adjudication rules and fee schedules apply.

Diagnosis codes (diag_cd) captured as ICD-10-CM codes identify the medical conditions that justify the services billed. Claims may carry up to 25 diagnosis codes, with the principal diagnosis in the first position driving DRG assignment for inpatient claims.

Procedure codes (proc_cd) identify the specific services performed using CPT codes for professional services and HCPCS Level II codes for supplies, drugs, and non-physician services.

The claim paid amount (clm_pd_amt) is the total dollar amount the payer reimbursed for the claim after applying contractual adjustments, member cost-sharing, and any other reductions from billed charges.

The claim billed amount (clm_bill_amt) is the provider standard charge before any adjustments, representing the gross charge submitted to the payer.

The claim allowed amount (clm_alwd_amt) is the maximum the payer will pay based on the contracted fee schedule, serving as the basis for member cost-sharing calculations.

The claim status code (clm_sts_cd) indicates whether the claim was paid, denied, pending, or adjusted and is the primary field for tracking adjudication outcomes.

The service from date (svc_from_dt) and service to date (svc_to_dt) identify when care was delivered and anchor all date-based analytics including eligibility verification and quality measure assignment.

The rendering provider NPI (prvdr_rndrg_npi) identifies the individual clinician who delivered the service, enabling provider-level performance measurement.

The member identifier (mbr_id) links the claim to the member enrollment record, enabling longitudinal analytics across all care settings for a given member.

Claims Data Standards and EDI Transactions

Healthcare claims data flows through standardized electronic data interchange formats mandated under HIPAA. Every healthcare data engineer working with claims data needs to understand these transaction standards:

EDI 837 is the HIPAA standard electronic claims transaction. The 837P is used for professional claims, 837I for institutional claims, and 837D for dental claims. Understanding 837 loop and segment structure is essential for parsing raw claims feeds from clearinghouses.

EDI 835 is the electronic remittance advice transaction that communicates how a claim was paid or denied. The 835 contains Claim Adjustment Reason Codes and Remittance Advice Remark Codes that explain payment decisions. Healthcare data teams process 835 transactions to populate denial reason fields and calculate contractual adjustment amounts.

ICD-10-CM is the diagnosis code standard used on all claim types since October 2015, replacing ICD-9. With over 70,000 codes organized in a hierarchical structure, ICD-10-CM provides the clinical specificity needed for risk adjustment, quality measurement, and population health analytics. Search our ICD-10 code library to find diagnosis codes for any condition.

CPT codes maintained by the American Medical Association identify physician procedures and evaluation and management services. CPT codes drive professional claim reimbursement under the Medicare Physician Fee Schedule and commercial fee schedules.

Revenue codes are four-digit codes used on institutional claims to classify service categories including room and board, operating room, laboratory, radiology, and pharmacy services.

DRG codes are the Medicare inpatient payment classification system. Each inpatient admission is assigned to one of over 750 MS-DRGs based on principal diagnosis, secondary diagnoses, procedures, age, and discharge status, with a fixed payment weight determining reimbursement.

Claims Data Model Design

Designing an effective claims data warehouse requires careful consideration of grain, dimensionality, and the analytical use cases the schema needs to support. Here is a production-ready Star Schema for healthcare claims analytics.

Fact Table: FACT_CLAIM_HEADER

The claim header fact table captures one row per claim with summary financial amounts and key dimension foreign keys:

-- Snowflake DDL — generated with mdatool AI Data Modeling
CREATE TABLE FACT_CLAIM_HEADER (
  CLM_ID              VARCHAR(50)     NOT NULL,
  MBR_KEY             INTEGER         NOT NULL,
  PRVDR_KEY           INTEGER         NOT NULL,
  PLAN_KEY            INTEGER         NOT NULL,
  SVC_FROM_DT_KEY     INTEGER         NOT NULL,
  SVC_TO_DT_KEY       INTEGER         NOT NULL,
  CLM_TYP_CD          VARCHAR(10),
  CLM_STAT_CD         VARCHAR(10),
  TOT_CHRG_AMT        DECIMAL(18,2),
  TOT_ALWD_AMT        DECIMAL(18,2),
  TOT_PD_AMT          DECIMAL(18,2),
  TOT_MBR_LIAB_AMT    DECIMAL(18,2),
  TOT_COINS_AMT       DECIMAL(18,2),
  TOT_COPAY_AMT       DECIMAL(18,2),
  TOT_DED_AMT         DECIMAL(18,2),
  DIAG_CD_1           VARCHAR(10),
  DIAG_CD_2           VARCHAR(10),
  DIAG_CD_3           VARCHAR(10),
  DRG_CD              VARCHAR(10),
  ADMIT_DT            DATE,
  DSCH_DT             DATE,
  DSCH_DISP_CD        VARCHAR(5),
  LOAD_DT             TIMESTAMP_NTZ   NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT PK_FACT_CLAIM_HEADER PRIMARY KEY (CLM_ID)
);

Fact Table: FACT_CLAIM_LINE

The claim line fact table captures one row per service line with procedure-level detail:

CREATE TABLE FACT_CLAIM_LINE (
  CLM_ID              VARCHAR(50)     NOT NULL,
  CLM_LINE_NBR        SMALLINT        NOT NULL,
  PRVDR_KEY           INTEGER         NOT NULL,
  SVC_DT_KEY          INTEGER         NOT NULL,
  PROC_CD             VARCHAR(10),
  PROC_MOD_CD_1       VARCHAR(2),
  PROC_MOD_CD_2       VARCHAR(2),
  REV_CD              VARCHAR(4),
  POS_CD              VARCHAR(2),
  DIAG_CD             VARCHAR(10),
  UNITS_QTY           DECIMAL(10,3),
  CHRG_AMT            DECIMAL(18,2),
  ALWD_AMT            DECIMAL(18,2),
  PD_AMT              DECIMAL(18,2),
  DENIAL_RSN_CD       VARCHAR(10),
  LOAD_DT             TIMESTAMP_NTZ   NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT PK_FACT_CLAIM_LINE
    PRIMARY KEY (CLM_ID, CLM_LINE_NBR),
  CONSTRAINT FK_CLAIM_LINE_HEADER
    FOREIGN KEY (CLM_ID) REFERENCES FACT_CLAIM_HEADER(CLM_ID)
);

Dimension Tables

CREATE TABLE DIM_MEMBER (
  MBR_KEY             INTEGER         NOT NULL,
  MBR_ID              VARCHAR(50)     NOT NULL,
  MBR_FIRST_NM        VARCHAR(100),
  MBR_LAST_NM         VARCHAR(100),
  BIRTH_DT            DATE,
  SEX_CD              CHAR(1),
  STATE_CD            CHAR(2),
  ZIP_CD              VARCHAR(10),
  DUAL_ELIG_IND       CHAR(1),
  EFF_START_DT        DATE            NOT NULL,
  EFF_END_DT          DATE,
  CURR_ROW_IND        BOOLEAN         NOT NULL DEFAULT TRUE,
  CONSTRAINT PK_DIM_MEMBER PRIMARY KEY (MBR_KEY)
);

CREATE TABLE DIM_PROVIDER (
  PRVDR_KEY           INTEGER         NOT NULL,
  PRVDR_NPI           VARCHAR(10),
  PRVDR_FIRST_NM      VARCHAR(100),
  PRVDR_LAST_NM       VARCHAR(100),
  PRVDR_ORG_NM        VARCHAR(255),
  PRVDR_TYP_CD        VARCHAR(20),
  PRVDR_SPCLTY_CD     VARCHAR(10),
  PRVDR_STATE_CD      CHAR(2),
  PRVDR_ZIP_CD        VARCHAR(10),
  EFF_START_DT        DATE            NOT NULL,
  EFF_END_DT          DATE,
  CURR_ROW_IND        BOOLEAN         NOT NULL DEFAULT TRUE,
  CONSTRAINT PK_DIM_PROVIDER PRIMARY KEY (PRVDR_KEY)
);

CREATE TABLE DIM_DATE (
  DT_KEY              INTEGER         NOT NULL,
  FULL_DT             DATE            NOT NULL,
  YR_NBR              SMALLINT,
  QTR_NBR             SMALLINT,
  MO_NBR              SMALLINT,
  MO_NM               VARCHAR(20),
  WK_NBR              SMALLINT,
  DAY_OF_WK_NBR       SMALLINT,
  DAY_OF_WK_NM        VARCHAR(20),
  HLDY_IND            BOOLEAN,
  CONSTRAINT PK_DIM_DATE PRIMARY KEY (DT_KEY)
);

Generate this entire schema automatically using the mdatool AI Data Modeling tool — select Claims and Adjudication as your domain, Star Schema as your architecture, and your target platform to get production-ready DDL in 30 seconds.

Common Claims Analytics Use Cases

Healthcare data teams use claims data for a wide range of analytical workflows:

Member Total Cost of Care aggregates all claim payments across medical, pharmacy, and behavioral health to calculate the total cost of care for attributed member populations, used in value-based care program performance measurement and actuarial modeling.

HEDIS Quality Measure Calculation uses claims data to identify eligible members and measure care gap events across dozens of preventive care and chronic disease management measures. Claims-based HEDIS measures include colorectal cancer screening, diabetes care, and follow-up after hospitalization.

HCC Risk Adjustment processes diagnosis codes from claims to identify Hierarchical Condition Category assignments that drive RAF score calculations for Medicare Advantage capitation payments. Use our HCC Calculator to estimate RAF scores from ICD-10 diagnosis codes.

Provider Performance Measurement aggregates claims data at the provider level to calculate cost efficiency, quality measure rates, and utilization patterns used in value-based contract performance evaluation and network management.

Fraud, Waste, and Abuse Detection applies statistical analysis and predictive modeling to claims data to identify billing anomalies including upcoding, unbundling, duplicate billing, and services inconsistent with the member documented clinical condition.

Pharmacy Utilization Management analyzes pharmacy claims by drug category, formulary tier, and therapeutic class to measure medication adherence, identify high-cost specialty drug utilization, and evaluate pharmacy benefit design effectiveness.

Claims Data Quality Considerations

Claims data quality directly affects every downstream analytics use case. Healthcare data teams must address these common data quality challenges:

Duplicate claims occur when the same service is billed multiple times across different claim versions or resubmissions. Deduplication logic must identify the definitive paid version of each claim by selecting the most recent paid or final adjudicated claim per service line.

Claim versions are created each time a claim is adjusted or reprocessed. Analytics pipelines must track claim version history and select the appropriate version for reporting — typically the final paid version for financial analytics and all versions for denial management analytics.

Provider identity resolution is challenging when the same provider bills under multiple NPIs, tax IDs, or group affiliations across different claim sources. Provider master data management processes must resolve these identities to enable accurate provider-level analytics.

Diagnosis code completeness varies significantly across providers and claim types. Claims-based risk adjustment analytics must account for providers who systematically under-document secondary diagnoses, producing artificially low HCC risk scores.

Date field inconsistencies between paid date, processed date, service date, and incurred date require careful attention when building time-series analytics. Always anchor financial analytics to paid date for cash basis reporting and service date for incurred basis reporting.

Claims Data Tools

mdatool provides several free tools specifically designed for healthcare claims data work:

  • SQL Linter — Validate your claims SQL against healthcare naming standards and catch common query errors before they reach production
  • DDL Converter — Convert claims schema DDL between Snowflake, BigQuery, Databricks, and other platforms instantly
  • AI Data Modeling — Generate a complete claims data warehouse schema for any platform in 30 seconds
  • ICD-10 Code Search — Search 70,000 plus ICD-10-CM diagnosis codes used in claims data
  • HCC Calculator — Calculate Medicare Advantage RAF scores from ICD-10 diagnosis codes
  • Data Model Canvas — Visualize your claims schema as an interactive ER diagram

Frequently Asked Questions

What is the difference between a claim header and a claim line? The claim header captures summary information for the entire claim including total amounts, member and provider identifiers, and principal diagnosis. The claim line captures service-level detail for each individual procedure or revenue code billed within the claim. A single claim header may have one to hundreds of associated claim lines depending on the claim type and services rendered.

What is the difference between billed, allowed, and paid amounts on a claim? The billed amount is the provider standard charge before any adjustments. The allowed amount is the maximum the payer will pay based on the contracted fee schedule. The paid amount is what the payer actually paid after applying member cost-sharing including deductibles, copays, and coinsurance. The difference between billed and allowed is the contractual adjustment, and the difference between allowed and paid is the member liability.

How are claim denials tracked in a claims data warehouse? Claim denials are identified by claim status codes indicating non-payment and denial reason codes from the EDI 835 remittance advice. The CARC code explains why the claim was not paid and the RARC code provides additional detail. Claims data warehouses typically maintain a denial fact table or denial flags on the claim line fact table to support denial analytics and denial management workflows.

What is the grain of a claims fact table? Most claims data warehouses maintain two fact tables at different grains — a claim header fact at the one-row-per-claim grain for summary financial analytics and a claim line fact at the one-row-per-service-line grain for procedure-level analytics. Some organizations also maintain a claim diagnosis fact table at the one-row-per-diagnosis grain to support HCC risk adjustment pipelines.

How do you handle retroactive claims adjustments in a claims data warehouse? Retroactive adjustments require tracking claim versions with effective dates and selecting the correct version for reporting. Common approaches include maintaining all claim versions with a version number and adjustment indicator, using a slowly changing dimension approach on the claim fact table, or maintaining a separate adjustment fact table that records changes from the original claim.

What EDI transaction sets are used for healthcare claims? The primary claims EDI transactions are 837P for professional claims, 837I for institutional claims, and 837D for dental claims. The 835 electronic remittance advice returns payment information. The 270/271 transaction set is used for eligibility inquiry and response. The 276/277 transaction set supports claim status inquiry and response.

How is claims data used in HCC risk adjustment? Diagnosis codes from claims are mapped to Hierarchical Condition Category codes using CMS mapping files. Each HCC has a coefficient representing its contribution to the RAF score. The sum of all HCC coefficients plus demographic adjustments for age, gender, and enrollment status produces the member RAF score used to calculate Medicare Advantage capitation payments. Use our HCC Calculator to explore RAF score calculations from ICD-10 codes.

M

mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.

Related Guides

Claims Adjudication

Medical claims processing, auto-adjudication, EOB generation, and denial management.

Read Guide

EDI Transactions

X12 EDI 837, 835, 270/271, and healthcare electronic data interchange.

Read Guide

Key Terms in This Article

ICD-10 Diagnosis CodeStar Schema

Free Tools

Free HCC RAF Score Calculator

Calculate CMS-HCC V28 RAF scores from ICD-10 codes. No account required.

Try it free

Free ICD-10 Code Search

Search 70,000+ ICD-10-CM diagnosis codes by description or code prefix.

Try it free

Free SQL Linter

Catch SQL bugs, performance issues, and naming violations before production.

Try it free

Ready to improve your data architecture?

Free tools for DDL conversion, SQL analysis, naming standards, and more.

Get Started Free

Get weekly healthcare data engineering tips

Practical guides on data modeling, SQL standards, and healthcare domain conventions — straight to your inbox.

No spam. Unsubscribe any time.

On this page

  • What Is Healthcare Claims Data?
  • Core Claims Data Elements
  • Claims Data Standards and EDI Transactions
  • Claims Data Model Design
  • Fact Table: FACT_CLAIM_HEADER
  • Fact Table: FACT_CLAIM_LINE
  • Dimension Tables
  • Common Claims Analytics Use Cases
  • Claims Data Quality Considerations
  • Claims Data Tools
  • Frequently Asked Questions

Share

Share on XShare on LinkedIn

Engineering Tools

Convert DDL, lint SQL, and audit naming conventions — free.

Explore Tools