Introduction

A health plan with 2 million members has, on average, between 1.5 and 3 million member records — because the same person enrolled, churned, re-enrolled, changed names, moved, and got entered differently in three different source systems. That is not a data quality problem. It is a master data management failure. MDM in healthcare is not a nice-to-have data engineering exercise; it is the foundation that makes population health analytics, risk adjustment, and care management operationally possible.

🧮HCC Calculator

Calculate RAF scores and estimate risk adjustment payments for Medicare Advantage members.

Try it free

This guide covers the architecture and data modeling patterns for the three domains that matter most: patient identity, provider registry, and payer/plan master data.

Domain 1: Patient Identity — The Master Patient Index (MPI)

The Master Patient Index is the authoritative, deduplicated record of every patient or member across all source systems. Every downstream use case — HEDIS measures, HCC risk scores, care gap analysis — depends on an accurate MPI.

The Deduplication Problem

Patient records duplicate for predictable reasons:

Name variations (Maria vs. María, Jr. vs. omitted suffix)
Address changes that create new enrollment records
Multiple coverage periods under different member IDs
Data entry errors at point of care

A naive approach (match on SSN) fails because SSN is missing, incorrect, or deliberately obscured in roughly 10–15% of records in most health plan systems.

MPI Data Model

CREATE TABLE mdm.patient_master (
  enterprise_patient_id   VARCHAR(36)  NOT NULL,  -- golden record ID
  first_name              VARCHAR(100),
  last_name               VARCHAR(100),
  date_of_birth           DATE,
  gender_code             CHAR(1),
  ssn_hash                VARCHAR(64),             -- SHA-256, not plaintext
  primary_address_id      BIGINT,
  current_coverage_flag   BOOLEAN,
  created_at              TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at              TIMESTAMP    NOT NULL,
  match_confidence_score  DECIMAL(5,4),            -- 0.0000–1.0000
  source_count            INT,                     -- number of source records matched
  PRIMARY KEY (enterprise_patient_id)
);

CREATE TABLE mdm.patient_crosswalk (
  crosswalk_id            BIGINT GENERATED ALWAYS AS IDENTITY,
  enterprise_patient_id   VARCHAR(36)  NOT NULL,
  source_system           VARCHAR(50)  NOT NULL,   -- EPIC, FACETS, CLAIMS, etc.
  source_patient_id       VARCHAR(100) NOT NULL,
  match_type              VARCHAR(20),             -- EXACT, PROBABILISTIC, MANUAL
  match_score             DECIMAL(5,4),
  linked_at               TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (crosswalk_id),
  UNIQUE (source_system, source_patient_id)
);

Matching Strategy

Use a probabilistic matching algorithm (Fellegi-Sunter is the standard; commercial tools like IBM InfoSphere QualityStage implement it). Weight matching fields:

Field	Weight	Notes
SSN	0.30	High weight but often missing/incorrect
DOB	0.25	Reliable for enrollment; less so for EHR
Last name + First name	0.20	Account for phonetic variants (Soundex/Metaphone)
Address	0.15	Use USPS standardized form
Phone	0.10	High churn; low weight

Match pairs above a confidence threshold (typically 0.85+) are auto-linked. Pairs in a review band (0.70–0.85) queue for manual stewardship review.

Domain 2: Provider Registry

A provider registry is the authoritative directory of every clinician, facility, and organization that delivers care within your network or claims ecosystem. Broken provider data causes misdirected payments, inaccurate directories, and failed credentialing processes.

Provider Registry Data Model

CREATE TABLE mdm.provider_master (
  enterprise_provider_id  VARCHAR(36)  NOT NULL,
  npi                     CHAR(10)     NOT NULL,   -- National Provider Identifier
  provider_type           VARCHAR(20)  NOT NULL,   -- INDIVIDUAL, ORGANIZATION
  first_name              VARCHAR(100),            -- null for org
  last_name               VARCHAR(100),            -- null for org
  organization_name       VARCHAR(200),            -- null for individual
  primary_specialty_code  VARCHAR(10),             -- NUCC taxonomy
  primary_taxonomy_desc   VARCHAR(200),
  license_state           CHAR(2),
  license_number          VARCHAR(50),
  credential_status       VARCHAR(20),             -- ACTIVE, EXPIRED, SUSPENDED
  network_status          VARCHAR(20),             -- IN_NETWORK, OUT_OF_NETWORK
  effective_date          DATE,
  termination_date        DATE,
  PRIMARY KEY (enterprise_provider_id),
  UNIQUE (npi)
);

The NPI is the universal provider identifier — use it as the deduplication key across source systems. The NPI Lookup tool can validate NPI numbers and retrieve taxonomy classifications from the NPPES registry directly.

Common MDM Failures in Provider Data

Group NPI vs. Individual NPI confusion: Claims paid to group NPI 1234567890 but attributed to rendering NPI 9876543210. MDM must maintain both and link them.
Taxonomy code drift: A provider changes specialty but the taxonomy update doesn't propagate from NPPES to the internal registry. Creates incorrect network status.
Credentialing lag: Provider onboarded to claims before credentialing completes. MDM should gate network activation on credentialing status.

Domain 3: Payer and Plan Master Data

Plan master data governs the products your organization offers or interacts with: benefit plans, formularies, network configurations, and contract terms. In a health plan, inaccurate plan data is a root cause of incorrect member cost-sharing, failed prior authorization checks, and misapplied formulary rules.

Plan Master Data Model

CREATE TABLE mdm.plan_master (
  plan_id                 VARCHAR(36)  NOT NULL,
  plan_name               VARCHAR(200) NOT NULL,
  plan_type               VARCHAR(20)  NOT NULL,   -- HMO, PPO, EPO, HDHP, MA, PDP
  lob                     VARCHAR(30)  NOT NULL,   -- COMMERCIAL, MEDICARE, MEDICAID
  cms_contract_number     VARCHAR(10),             -- for Medicare Advantage
  cms_plan_id             VARCHAR(10),             -- H, R, or S number
  effective_date          DATE         NOT NULL,
  termination_date        DATE,
  formulary_id            VARCHAR(36),
  network_id              VARCHAR(36),
  service_area_state      CHAR(2),
  metal_tier              VARCHAR(10),             -- BRONZE, SILVER, GOLD, PLATINUM
  is_active               BOOLEAN      NOT NULL DEFAULT TRUE,
  PRIMARY KEY (plan_id)
);

MDM Architecture Patterns

Registry style (virtual MDM): A crosswalk table links source IDs to a master ID without physically consolidating records. Fast to implement; requires all consumers to join through the crosswalk.

Consolidation style (physical golden record): A golden record is created by merging source attributes. Best match attribute wins per field. More reliable for analytics but requires ongoing merge maintenance.

Coexistence style: Both source records and the golden record exist and are synchronized in both directions. Most common in healthcare because source systems (Epic, Facets) remain the system of record for clinical and operational data.

Key Takeaways

Patient MDM is the foundation of population health analytics. Start with probabilistic matching, not exact match on SSN.
The crosswalk table pattern is the most portable MDM architecture — it works regardless of whether you consolidate records or not.
NPI is the universal deduplication key for providers. Use the NPI Lookup to validate and enrich provider records from NPPES.
Plan master data failures cause downstream errors in cost-sharing, prior auth, and formulary logic — not just data quality dashboards.
MDM is a process, not a one-time project. Merge confidence scores must be reviewed continuously as new records flow in.

Master Data Management in Healthcare: Patient, Provider, and Payer MDM Explained