mdatool
Healthcare Data Dictionary for the Modern Data Stack
LibraryBlogPricing
mdatool
mdatool

The healthcare data dictionary for dbt, Snowflake, Databricks, and BigQuery. 100,000+ ISO-11179 standard terms, free SQL tools, and AI data modeling.

HIPAA-AlignedEnterprise Ready

Tools

  • SQL Linter
  • DDL Converter
  • Bulk Sanitizer
  • Naming Auditor
  • Name Generator
  • AI Data Modeling
  • HCC Calculator
  • Data Model Canvas

Library

  • Glossary
  • Guides
  • Blog

Company

  • About
  • Contact
  • Pricing

Account

  • Sign Up Free
  • Sign In
  • Upgrade to Pro
  • Dashboard

Legal

  • Privacy Policy
  • Terms of Service

© 2026 mdatool. All rights reserved.

Built for healthcare data teams.

HomeBlogHealthcare Data ModelingMaster Data Management in Healthcare: Patient, Provider, and Payer MDM Explained
Healthcare Data Modeling

Master Data Management in Healthcare: Patient, Provider, and Payer MDM Explained

Duplicate patient records, fragmented provider directories, and inconsistent plan data are MDM failures. Here is how to design MDM architecture for the three core healthcare domains.

mdatool Team·April 21, 2026·9 min read
master data managementMDMpatient identityMPIprovider registryhealthcare data modeling

Introduction

A health plan with 2 million members has, on average, between 1.5 and 3 million member records — because the same person enrolled, churned, re-enrolled, changed names, moved, and got entered differently in three different source systems. That is not a data quality problem. It is a master data management failure. MDM in healthcare is not a nice-to-have data engineering exercise; it is the foundation that makes population health analytics, risk adjustment, and care management operationally possible.

🧮HCC Calculator

Calculate RAF scores and estimate risk adjustment payments for Medicare Advantage members.

Try it free

This guide covers the architecture and data modeling patterns for the three domains that matter most: patient identity, provider registry, and payer/plan master data.


Domain 1: Patient Identity — The Master Patient Index (MPI)

The Master Patient Index is the authoritative, deduplicated record of every patient or member across all source systems. Every downstream use case — HEDIS measures, HCC risk scores, care gap analysis — depends on an accurate MPI.

The Deduplication Problem

Patient records duplicate for predictable reasons:

  • Name variations (Maria vs. María, Jr. vs. omitted suffix)
  • Address changes that create new enrollment records
  • Multiple coverage periods under different member IDs
  • Data entry errors at point of care

A naive approach (match on SSN) fails because SSN is missing, incorrect, or deliberately obscured in roughly 10–15% of records in most health plan systems.

MPI Data Model

CREATE TABLE mdm.patient_master (
  enterprise_patient_id   VARCHAR(36)  NOT NULL,  -- golden record ID
  first_name              VARCHAR(100),
  last_name               VARCHAR(100),
  date_of_birth           DATE,
  gender_code             CHAR(1),
  ssn_hash                VARCHAR(64),             -- SHA-256, not plaintext
  primary_address_id      BIGINT,
  current_coverage_flag   BOOLEAN,
  created_at              TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at              TIMESTAMP    NOT NULL,
  match_confidence_score  DECIMAL(5,4),            -- 0.0000–1.0000
  source_count            INT,                     -- number of source records matched
  PRIMARY KEY (enterprise_patient_id)
);

CREATE TABLE mdm.patient_crosswalk (
  crosswalk_id            BIGINT GENERATED ALWAYS AS IDENTITY,
  enterprise_patient_id   VARCHAR(36)  NOT NULL,
  source_system           VARCHAR(50)  NOT NULL,   -- EPIC, FACETS, CLAIMS, etc.
  source_patient_id       VARCHAR(100) NOT NULL,
  match_type              VARCHAR(20),             -- EXACT, PROBABILISTIC, MANUAL
  match_score             DECIMAL(5,4),
  linked_at               TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (crosswalk_id),
  UNIQUE (source_system, source_patient_id)
);

Matching Strategy

Use a probabilistic matching algorithm (Fellegi-Sunter is the standard; commercial tools like IBM InfoSphere QualityStage implement it). Weight matching fields:

FieldWeightNotes
SSN0.30High weight but often missing/incorrect
DOB0.25Reliable for enrollment; less so for EHR
Last name + First name0.20Account for phonetic variants (Soundex/Metaphone)
Address0.15Use USPS standardized form
Phone0.10High churn; low weight

Match pairs above a confidence threshold (typically 0.85+) are auto-linked. Pairs in a review band (0.70–0.85) queue for manual stewardship review.


Domain 2: Provider Registry

A provider registry is the authoritative directory of every clinician, facility, and organization that delivers care within your network or claims ecosystem. Broken provider data causes misdirected payments, inaccurate directories, and failed credentialing processes.

Provider Registry Data Model

CREATE TABLE mdm.provider_master (
  enterprise_provider_id  VARCHAR(36)  NOT NULL,
  npi                     CHAR(10)     NOT NULL,   -- National Provider Identifier
  provider_type           VARCHAR(20)  NOT NULL,   -- INDIVIDUAL, ORGANIZATION
  first_name              VARCHAR(100),            -- null for org
  last_name               VARCHAR(100),            -- null for org
  organization_name       VARCHAR(200),            -- null for individual
  primary_specialty_code  VARCHAR(10),             -- NUCC taxonomy
  primary_taxonomy_desc   VARCHAR(200),
  license_state           CHAR(2),
  license_number          VARCHAR(50),
  credential_status       VARCHAR(20),             -- ACTIVE, EXPIRED, SUSPENDED
  network_status          VARCHAR(20),             -- IN_NETWORK, OUT_OF_NETWORK
  effective_date          DATE,
  termination_date        DATE,
  PRIMARY KEY (enterprise_provider_id),
  UNIQUE (npi)
);

The NPI is the universal provider identifier — use it as the deduplication key across source systems. The NPI Lookup tool can validate NPI numbers and retrieve taxonomy classifications from the NPPES registry directly.

Common MDM Failures in Provider Data

  • Group NPI vs. Individual NPI confusion: Claims paid to group NPI 1234567890 but attributed to rendering NPI 9876543210. MDM must maintain both and link them.
  • Taxonomy code drift: A provider changes specialty but the taxonomy update doesn't propagate from NPPES to the internal registry. Creates incorrect network status.
  • Credentialing lag: Provider onboarded to claims before credentialing completes. MDM should gate network activation on credentialing status.

Domain 3: Payer and Plan Master Data

Plan master data governs the products your organization offers or interacts with: benefit plans, formularies, network configurations, and contract terms. In a health plan, inaccurate plan data is a root cause of incorrect member cost-sharing, failed prior authorization checks, and misapplied formulary rules.

Plan Master Data Model

CREATE TABLE mdm.plan_master (
  plan_id                 VARCHAR(36)  NOT NULL,
  plan_name               VARCHAR(200) NOT NULL,
  plan_type               VARCHAR(20)  NOT NULL,   -- HMO, PPO, EPO, HDHP, MA, PDP
  lob                     VARCHAR(30)  NOT NULL,   -- COMMERCIAL, MEDICARE, MEDICAID
  cms_contract_number     VARCHAR(10),             -- for Medicare Advantage
  cms_plan_id             VARCHAR(10),             -- H, R, or S number
  effective_date          DATE         NOT NULL,
  termination_date        DATE,
  formulary_id            VARCHAR(36),
  network_id              VARCHAR(36),
  service_area_state      CHAR(2),
  metal_tier              VARCHAR(10),             -- BRONZE, SILVER, GOLD, PLATINUM
  is_active               BOOLEAN      NOT NULL DEFAULT TRUE,
  PRIMARY KEY (plan_id)
);

MDM Architecture Patterns

Registry style (virtual MDM): A crosswalk table links source IDs to a master ID without physically consolidating records. Fast to implement; requires all consumers to join through the crosswalk.

Consolidation style (physical golden record): A golden record is created by merging source attributes. Best match attribute wins per field. More reliable for analytics but requires ongoing merge maintenance.

Coexistence style: Both source records and the golden record exist and are synchronized in both directions. Most common in healthcare because source systems (Epic, Facets) remain the system of record for clinical and operational data.


Key Takeaways

  • Patient MDM is the foundation of population health analytics. Start with probabilistic matching, not exact match on SSN.
  • The crosswalk table pattern is the most portable MDM architecture — it works regardless of whether you consolidate records or not.
  • NPI is the universal deduplication key for providers. Use the NPI Lookup to validate and enrich provider records from NPPES.
  • Plan master data failures cause downstream errors in cost-sharing, prior auth, and formulary logic — not just data quality dashboards.
  • MDM is a process, not a one-time project. Merge confidence scores must be reviewed continuously as new records flow in.
M

mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.

Related Guides

EHR Systems

Electronic Health Record systems, data models, and interoperability standards.

Read Guide

Healthcare Analytics

Population health analytics, data warehousing, and clinical intelligence.

Read Guide

Key Terms in This Article

Master Data Management

More in Healthcare Data Modeling

AI-Powered Healthcare Data Modeling: Generate Production-Ready Schemas in 30 Seconds

Healthcare data architects spend days designing schemas from scratch — Medicare Advantage claims warehouses alone require 20+ tables, hundreds of columns, and platform-specific syntax. The mdatool AI Data Modeling tool generates a production-ready Star Schema for Snowflake, BigQuery, or Databricks in 30 seconds, with ISO-11179 standard column names built in.

Read more

Enforcing ISO-11179 Healthcare Naming Standards in dbt Projects

Every healthcare data warehouse eventually develops naming drift — DOB in one model, birth_dt in another, member_birth_date in a third. The dbt-healthcare-standards package brings ISO-11179 column naming directly into your dbt project as installable macros and schema tests.

Read more

AI-Powered Healthcare Data Modeling: Generate Production-Ready Schemas in Seconds

Healthcare data models take weeks to design manually — HIPAA constraints, CMS reporting requirements, and ISO-11179 naming standards all have to be applied correctly from the start. AI data modeling changes that. Here is how to generate production-ready schemas for Snowflake, BigQuery, and Databricks in seconds.

Read more

Free Tools

Free NPI Lookup

Search any provider by NPI number, name, or organization instantly.

Try it free

Ready to improve your data architecture?

Free tools for DDL conversion, SQL analysis, naming standards, and more.

Get Started Free

Get weekly healthcare data engineering tips

Practical guides on data modeling, SQL standards, and healthcare domain conventions — straight to your inbox.

No spam. Unsubscribe any time.

On this page

  • Introduction
  • Domain 1: Patient Identity — The Master Patient Index (MPI)
  • The Deduplication Problem
  • MPI Data Model
  • Matching Strategy
  • Domain 2: Provider Registry
  • Provider Registry Data Model
  • Common MDM Failures in Provider Data
  • Domain 3: Payer and Plan Master Data
  • Plan Master Data Model
  • MDM Architecture Patterns
  • Key Takeaways

Share

Share on XShare on LinkedIn

Engineering Tools

Convert DDL, lint SQL, and audit naming conventions — free.

Explore Tools