What Is an NPI Number?
The National Provider Identifier (NPI) is a unique 10-digit identification number assigned to healthcare providers by CMS through the National Plan and Provider Enumeration System (NPPES). Every provider who transmits health information electronically in HIPAA-covered transactions — which covers virtually every provider in the US — is required to have an NPI.
There are two types:
- Type 1 NPI: Assigned to individual providers — physicians, nurse practitioners, physical therapists, dentists. Each individual gets exactly one Type 1 NPI regardless of how many organizations they work with.
- Type 2 NPI: Assigned to organizations — hospitals, group practices, laboratories, pharmacies. A single hospital system may have multiple Type 2 NPIs for different locations or legal entities.
For data engineers, the NPI is the universal join key across almost every healthcare dataset: claims, credentialing, directory, eligibility, and prior authorization data all reference it. Data quality problems with NPI propagate everywhere downstream.
NPI Format and the Luhn Check Digit
Every valid NPI passes the Luhn algorithm check-digit test. The algorithm is deterministic and cheap enough to run as a database function, so it works well as a mandatory gate in any provider data ingestion pipeline.
An NPI consists of 10 digits. The last digit is the check digit. To validate:
- Prefix the 10-digit NPI with 80840 (the ISO standard prefix for healthcare provider NPIs).
- Moving left from the rightmost digit, double every second digit. The check digit itself is not doubled; the digit to its left is the first one doubled.
- If doubling produces a number greater than 9, subtract 9.
- Sum all digits.
- If the total modulo 10 equals 0, the NPI is valid.
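As a worked instance of the steps above, the sketch below traces the calculation for 1234567893, a sample value often used to illustrate the NPI check digit. The function name `luhn_trace` is hypothetical and exists only for this illustration.

```python
def luhn_trace(npi: str) -> list[tuple[str, int]]:
    """Return (digit, contribution) pairs for '80840' + npi, rightmost digit first."""
    full = "80840" + npi
    steps = []
    for i, ch in enumerate(reversed(full)):
        d = int(ch)
        if i % 2 == 1:      # every second digit, counting from the right
            d *= 2
            if d > 9:       # a two-digit product is reduced by 9
                d -= 9
        steps.append((ch, d))
    return steps


steps = luhn_trace("1234567893")
total = sum(contribution for _, contribution in steps)
prefix_part = sum(contribution for _, contribution in steps[10:])
print(total, prefix_part, total % 10 == 0)  # 70 24 True
```

The five prefix digits always contribute 24 to the sum, which is why some implementations add the constant 24 to the Luhn sum of the 10 NPI digits instead of concatenating the prefix.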
SQL Luhn Validation (Snowflake)
CREATE OR REPLACE FUNCTION util.validate_npi(npi VARCHAR)
RETURNS BOOLEAN
LANGUAGE JAVASCRIPT
AS
$$
  // Snowflake exposes UDF arguments to JavaScript in uppercase, hence NPI.
  if (!/^\d{10}$/.test(NPI)) return false;
  const full = '80840' + NPI;
  let sum = 0;
  for (let i = 0; i < full.length; i++) {
    let d = parseInt(full[full.length - 1 - i], 10);
    // Double every second digit from the right; subtract 9 from two-digit results.
    if (i % 2 === 1) { d *= 2; if (d > 9) d -= 9; }
    sum += d;
  }
  return sum % 10 === 0;
$$;
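-- Flag well-formed 10-digit NPIs that fail the Luhn check before loading to provider master.
-- Values with the wrong length or non-digit characters are excluded here and need their own check.
SELECT npi, COUNT(*) AS occurrences
FROM staging.provider_feed
WHERE NOT util.validate_npi(npi)
  AND LENGTH(npi) = 10
  AND npi REGEXP '^[0-9]{10}$'
GROUP BY npi
ORDER BY occurrences DESC;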
Python Luhn Validation
def validate_npi(npi: str) -> bool:
    if not npi.isdigit() or len(npi) != 10:
        return False
    full = "80840" + npi
    total = 0
    for i, ch in enumerate(reversed(full)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Row-wise check on a pandas DataFrame (apply calls the function once per value)
import pandas as pd
# df is assumed to be an existing DataFrame whose "npi" column was read as strings
df["npi_valid"] = df["npi"].apply(validate_npi)
invalid = df[~df["npi_valid"]]
print(f"{len(invalid)} invalid NPIs out of {len(df)} records")
CMS data shows approximately 1–2% of NPIs in raw provider feeds fail Luhn validation. These are almost always transposed digits or truncated values. Reject them at ingestion rather than propagating bad identifiers into your provider master.
Provider Master Data Model
A provider master is the system of record that links NPI to provider attributes across all downstream datasets. It must handle:
- Providers with multiple practice locations (one NPI, many addresses)
- Group practices with multiple individual providers (many Type 1 NPIs, one Type 2 NPI)
- Address and demographic data that changes frequently — NPPES estimates 15–20% of records change annually
- Taxonomy codes that evolve as CMS updates the NUCC taxonomy file
CREATE TABLE provider.prvdr_master (
    npi                 VARCHAR(10)  NOT NULL,
    npi_type_cd         CHAR(1)      NOT NULL,  -- 1 = individual, 2 = organization
    prvdr_last_nm       VARCHAR(100),
    prvdr_first_nm      VARCHAR(100),
    prvdr_org_nm        VARCHAR(255),
    prvdr_gender_cd     CHAR(1),
    sole_proprietor_ind BOOLEAN      DEFAULT FALSE,
    enum_dt             DATE,
    deactivation_dt     DATE,
    nppes_refresh_dt    DATE         NOT NULL,
    PRIMARY KEY (npi)
);
CREATE TABLE provider.prvdr_taxonomy (
    npi           VARCHAR(10)  NOT NULL,
    taxonomy_cd   VARCHAR(15)  NOT NULL,
    taxonomy_desc VARCHAR(255),
    primary_ind   BOOLEAN      NOT NULL,
    nucc_version  VARCHAR(10),
    eff_dt        DATE         NOT NULL,
    end_dt        DATE,
    PRIMARY KEY (npi, taxonomy_cd, eff_dt)
);
CREATE TABLE provider.prvdr_address (
    npi          VARCHAR(10)  NOT NULL,
    addr_type_cd VARCHAR(20)  NOT NULL,  -- MAILING or PRACTICE
    addr_line1   VARCHAR(255),
    city_nm      VARCHAR(100),
    state_cd     CHAR(2),
    zip_cd       VARCHAR(10),
    phone_nbr    VARCHAR(20),
    eff_dt       DATE         NOT NULL,
    end_dt       DATE,
    PRIMARY KEY (npi, addr_type_cd, eff_dt)
);
CREATE TABLE provider.prvdr_group_affiliation (
    individual_npi     VARCHAR(10) NOT NULL,  -- Type 1
    group_npi          VARCHAR(10) NOT NULL,  -- Type 2
    affiliation_eff_dt DATE        NOT NULL,
    affiliation_end_dt DATE,
    PRIMARY KEY (individual_npi, group_npi, affiliation_eff_dt)
);
NPPES Registry Reconciliation
CMS publishes the full NPPES data dissemination file monthly. This is the authoritative source for provider data — more reliable than any payer's internal credentialing system. Build a monthly reconciliation job that flags discrepancies in name, address, taxonomy, and deactivation status.
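-- Providers in your master whose name differs from NPPES.
-- IS DISTINCT FROM treats NULL as a comparable value, so a name missing on one side is still flagged.
SELECT
  m.npi,
  m.prvdr_last_nm AS master_last_nm,
  n.provider_last_name_legal_name AS nppes_last_nm,
  m.prvdr_first_nm AS master_first_nm,
  n.provider_first_name AS nppes_first_nm,
  m.nppes_refresh_dt AS last_refreshed
FROM provider.prvdr_master m
JOIN staging.nppes_dissemination n ON m.npi = n.npi
WHERE m.prvdr_last_nm IS DISTINCT FROM n.provider_last_name_legal_name
   OR m.prvdr_first_nm IS DISTINCT FROM n.provider_first_name;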
-- Deactivated NPIs still marked active in your master
SELECT
  m.npi,
  m.prvdr_last_nm,
  m.nppes_refresh_dt
FROM provider.prvdr_master m
JOIN staging.nppes_dissemination n ON m.npi = n.npi
WHERE n.npi_deactivation_date IS NOT NULL
  AND m.deactivation_dt IS NULL;
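The same reconciliation can be prototyped outside the warehouse. The sketch below compares one master row with one NPPES row and reports the fields that differ. The function names, the dict-based rows, and the trim and upper-case normalization are assumptions made for illustration; the column names mirror the queries above.

```python
from typing import Optional

# Hypothetical field pairs: (master column, NPPES staging column)
COMPARED_FIELDS = [
    ("prvdr_last_nm", "provider_last_name_legal_name"),
    ("prvdr_first_nm", "provider_first_name"),
    ("deactivation_dt", "npi_deactivation_date"),
]


def normalize(value: Optional[str]) -> Optional[str]:
    """Trim and upper-case so cosmetic differences are not reported; None stays None."""
    if value is None:
        return None
    cleaned = value.strip().upper()
    return cleaned or None


def diff_provider(master: dict, nppes: dict) -> dict:
    """Return {master_field: (master_value, nppes_value)} for fields that differ."""
    changes = {}
    for master_col, nppes_col in COMPARED_FIELDS:
        old, new = master.get(master_col), nppes.get(nppes_col)
        # In Python, None != "JANE" is True, so a value missing on one side is reported.
        if normalize(old) != normalize(new):
            changes[master_col] = (old, new)
    return changes


master_row = {"npi": "1234567893", "prvdr_last_nm": "Smith",
              "prvdr_first_nm": None, "deactivation_dt": None}
nppes_row = {"npi": "1234567893", "provider_last_name_legal_name": "SMITH",
             "provider_first_name": "JANE", "npi_deactivation_date": "2024-03-01"}

print(diff_provider(master_row, nppes_row))
```

Here the last name is treated as unchanged because it differs only in case, while the missing first name and the new deactivation date are both reported.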
Network Directory Accuracy
Provider NPI accuracy is a regulatory requirement, not just a data quality goal. CMS and state regulators conduct directory accuracy surveys and levy fines for plans whose directories contain materially incorrect information. The most common failures are: providers listed as accepting new patients who are no longer in-network, address data that is 12+ months stale, and providers listed under the wrong specialty taxonomy.
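Implement a refresh cycle, at least quarterly, that re-validates every active in-network provider against the NPPES dissemination file. Flag providers whose NPPES record has been deactivated and remove them from the directory promptly: the federal provider directory requirements in the No Surprises Act call for verification at least every 90 days and for directory updates within 2 business days of the plan receiving changed information.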
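A refresh job of this kind reduces to a few per-entry checks. The following is a minimal sketch, assuming each directory entry is a dict; the field names (`nppes_deactivation_dt`, `addr_last_verified_dt`, `in_network`, `accepting_new_patients`) and the flag labels are hypothetical, and "12+ months" is approximated as 365 days.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=365)   # "12+ months" approximated as 365 days


def directory_flags(entry: dict, today: date) -> list[str]:
    """Return the reasons a directory entry needs attention; an empty list means none."""
    flags = []
    if entry.get("nppes_deactivation_dt") is not None:
        flags.append("NPPES_DEACTIVATED")
    verified = entry.get("addr_last_verified_dt")
    if verified is None or today - verified > STALE_AFTER:
        flags.append("ADDRESS_STALE")
    if not entry.get("in_network", False) and entry.get("accepting_new_patients", False):
        flags.append("ACCEPTING_BUT_OUT_OF_NETWORK")
    return flags


entry = {
    "npi": "1234567893",
    "nppes_deactivation_dt": date(2024, 1, 15),
    "addr_last_verified_dt": date(2022, 6, 1),
    "in_network": True,
    "accepting_new_patients": True,
}
print(directory_flags(entry, today=date(2024, 6, 1)))
# ['NPPES_DEACTIVATED', 'ADDRESS_STALE']
```

Passing `today` explicitly keeps the check deterministic, which makes it easier to test and to re-run for a past reporting date.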
Frequently Asked Questions
What is the difference between Type 1 and Type 2 NPI?
A Type 1 NPI is assigned to an individual healthcare provider regardless of where they practice. A single person has exactly one Type 1 NPI for their entire career. A Type 2 NPI is assigned to an organization. Organizations can have multiple Type 2 NPIs for different legal entities or locations. In claims data, the billing NPI is typically Type 2 (the organization) while the rendering NPI is Type 1 (the individual who provided the service).
How do I validate an NPI without calling an external API?
The Luhn check-digit algorithm validates NPI format deterministically without any external API call. Implement it as a SQL UDF or Python function (see examples above) and apply it to every NPI at ingestion. Luhn catches every single-digit error and almost all transpositions of adjacent digits, but it does not verify that an NPI has been issued by CMS. For that, reconcile against the NPPES dissemination file.
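What Luhn does and does not catch can be checked directly. The sketch below derives the check digit for a 9-digit base, then shows a swapped pair of digits being rejected and the one adjacent swap Luhn cannot see, 09 versus 90. The helper names `luhn_sum` and `check_digit` are hypothetical, and the NPIs are synthetic test values, not known real providers.

```python
def luhn_sum(digits: str) -> int:
    """Luhn sum of a digit string, doubling every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total


def validate_npi(npi: str) -> bool:
    well_formed = len(npi) == 10 and npi.isascii() and npi.isdigit()
    return well_formed and luhn_sum("80840" + npi) % 10 == 0


def check_digit(base9: str) -> str:
    """Digit that makes '80840' + base9 + digit pass the Luhn test."""
    # A trailing 0 holds the check-digit position so the base digits keep their parity.
    return str((10 - luhn_sum("80840" + base9 + "0") % 10) % 10)


good = "123456789" + check_digit("123456789")
print(good, validate_npi(good))          # 1234567893 True
print(validate_npi("2134567893"))        # first two digits swapped: False

# 0 and 9 contribute the same amounts whether or not they are doubled,
# so swapping an adjacent "09" to "90" leaves the sum unchanged.
a = "109000000" + check_digit("109000000")
b = "190000000" + check_digit("190000000")
print(a, b, validate_npi(a), validate_npi(b))
```

Both synthetic values in the last line pass the check, which is one more reason to follow Luhn with a lookup against NPPES.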
How often does NPPES data change and how should I handle updates?
CMS publishes a full NPPES dissemination file monthly and incremental weekly update files. Approximately 15–20% of provider records change annually. Run a full replacement load monthly and apply weekly incremental updates. Design your provider master with SCD Type 2 tracking so historical claims can always be joined to the provider record that was active at the time of service.
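The point-in-time join that SCD Type 2 enables can be sketched in a few lines. The function below picks the version of a provider record that was active on a service date. It assumes each version carries `eff_dt` and `end_dt`, that `end_dt` is exclusive, and that `None` marks the current version; the function name and sample rows are hypothetical.

```python
from datetime import date
from typing import Optional


def record_as_of(versions: list[dict], service_dt: date) -> Optional[dict]:
    """Return the version whose [eff_dt, end_dt) window contains service_dt."""
    for v in versions:
        end_dt = v["end_dt"]              # None marks the current, open-ended version
        if v["eff_dt"] <= service_dt and (end_dt is None or service_dt < end_dt):
            return v
    return None


history = [
    {"npi": "1234567893", "city_nm": "Austin",
     "eff_dt": date(2020, 1, 1), "end_dt": date(2023, 7, 1)},
    {"npi": "1234567893", "city_nm": "Dallas",
     "eff_dt": date(2023, 7, 1), "end_dt": None},
]

print(record_as_of(history, date(2022, 5, 10))["city_nm"])  # Austin
print(record_as_of(history, date(2023, 7, 1))["city_nm"])   # Dallas
print(record_as_of(history, date(2019, 1, 1)))              # None
```

Treating `end_dt` as exclusive means a version that ends on the day the next one starts never overlaps with it, so each service date matches at most one version.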
What causes NPI validation failures in production feeds?
The most common causes are: truncation to fewer than 10 digits by legacy systems that stored the NPI in a numeric or fixed-width field, manual data entry transpositions, and placeholder values like 9999999999 used by systems that lack an NPI for a provider. Issued NPIs do not begin with 0 (they start with 1 or 2), so a short value has lost real digits rather than leading zeros, and zero-padding will not repair it. Truncated values have to be re-sourced from the originating system or looked up in NPPES; transpositions require manual research against NPPES; placeholders need a provider lookup and replacement.
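Sorting failures into these causes before anyone researches them keeps the manual work small. The following is a minimal triage sketch; the function names, the reason labels, and the placeholder list are assumptions for illustration and should be adapted to the values your source systems actually emit.

```python
PLACEHOLDERS = {"0000000000", "9999999999"}   # illustrative; extend per source system


def luhn_ok(npi: str) -> bool:
    """Luhn test over '80840' + npi for a 10-digit string."""
    total = 0
    for i, ch in enumerate(reversed("80840" + npi)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def classify_npi(raw) -> str:
    """Return VALID or a coarse failure reason for triage."""
    text = "" if raw is None else str(raw).strip()
    if not text:
        return "MISSING"
    if not (text.isascii() and text.isdigit()):
        return "NON_NUMERIC"
    if text in PLACEHOLDERS:
        return "PLACEHOLDER"
    if len(text) < 10:
        return "TRUNCATED"
    if len(text) > 10:
        return "TOO_LONG"
    return "VALID" if luhn_ok(text) else "LUHN_FAILURE"


for value in ["1234567893", 123456789, "9999999999", "2134567893", None, "12345-6789"]:
    print(repr(value), classify_npi(value))
```

A `LUHN_FAILURE` on a 10-digit value is the case most likely to be a keying transposition, so those rows are the ones worth routing to manual research against NPPES.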
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.