NPI Validation in Python: Luhn Algorithm, NPPES Lookup, and Snowflake UDF Guide

Every healthcare data warehouse has a provider dimension table with NPI numbers. Bad NPIs cause claims rejections, failed eligibility checks, and broken provider analytics. This guide shows you how to validate NPIs programmatically — from format checking to Luhn algorithm to bulk Snowflake UDF deployment.

mdatool Team · May 31, 2026 · 10 min read
Tags: Python, NPI, data quality, Snowflake, healthcare data engineering

Introduction

Every healthcare data warehouse has a provider dimension table. And every provider dimension table has NPI numbers that were copied from spreadsheets, scraped from enrollment systems, or keyed in by hand — which means some of them are wrong.

A bad NPI does not fail silently. It causes claims submissions to reject at the clearinghouse, breaks eligibility verification when a payer cannot match the rendering provider, and corrupts provider analytics when the same physician appears under two different identifiers. In a Medicare Advantage plan, a single miscoded NPI on an encounter can trigger a CMS audit finding.

The good news: NPI validation is deterministic. CMS defines an exact algorithm for checking whether an NPI is valid — the same Luhn check digit algorithm used by credit card numbers, applied with a healthcare-specific prefix. You can validate millions of NPIs in seconds without a network call.

This guide covers the full validation stack: the Luhn algorithm in Python, NPPES registry lookup for live provider data, bulk validation against a DataFrame, Snowflake and BigQuery UDF deployment, and dbt schema tests.


What Makes an NPI Valid?

The National Provider Identifier is a 10-digit numeric identifier assigned by CMS to every healthcare provider in the United States. Before applying the check digit algorithm, three structural rules must pass:

1. Exactly 10 digits. No dashes, spaces, or letters, and no value that lost digits when it was stored as a number. 123456789 (9 digits) and NPI-1234567890 (contains non-numeric characters) both fail before the Luhn check even runs.

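2. Cannot be all zeros. 0000000000 is structurally a 10-digit string. It also fails the CMS check digit test described in the next section, because the constant prefix alone contributes 24 to the Luhn sum, but an explicit rule produces a clearer error message than a generic check-digit failure.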

3. Must pass the CMS Luhn check digit. The tenth digit is a check digit computed from the first nine. An NPI where someone transposed two digits will fail this test even if the format looks correct.

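Beyond validity, each NPI has an entity type. The validator in this guide infers it from the first digit:

  • Type 1 (Individual): First digit is 1 or 2. Assigned to individual healthcare providers such as physicians, nurses, therapists, and any other licensed practitioner.
  • Type 2 (Organization): First digit is 3 through 9. Assigned to group practices, hospitals, clinics, labs, and other organizational providers.

Treat this mapping as a heuristic only. CMS defines the NPI as an intelligence-free identifier, and NPIs issued to date begin with 1 or 2 regardless of entity type, so the authoritative type is the one NPPES records for the provider (NPI-1 for individuals, NPI-2 for organizations).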

See the full provider domain glossary for the complete taxonomy of provider identifier types, and the NPI Guide for database schema design patterns.
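The registry also records the entity type directly. The following is a minimal sketch, assuming a result record from the NPPES API (covered later in this guide) that carries an enumeration_type field with the values NPI-1 and NPI-2. The function name entity_type_from_result and the sample record are illustrative, not part of the original guide.

```python
from typing import Optional

# Assumed mapping from the registry's enumeration_type values to readable labels.
ENTITY_TYPES = {
    "NPI-1": "Type 1 (Individual)",
    "NPI-2": "Type 2 (Organization)",
}


def entity_type_from_result(result: dict) -> Optional[str]:
    """Map one NPPES result record to a readable entity type, or None if absent."""
    return ENTITY_TYPES.get(result.get("enumeration_type", ""))


# Illustrative record shape only, not real registry data.
sample = {"number": "1234567893", "enumeration_type": "NPI-2", "basic": {}}
print(entity_type_from_result(sample))  # Type 2 (Organization)
```

Reading the type from the registry keeps classification independent of the digits, at the cost of needing a lookup or a local copy of the registry data.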


The Luhn Algorithm for NPI Validation

CMS uses a modified Luhn check to generate and validate the tenth digit of every NPI. The modification is a constant prefix: CMS prepends 80840 to the 10-digit NPI before running the standard algorithm. The prefix comes from the ISO 7812 card-issuer numbering scheme, where 80 designates health applications and 840 is the country code for the United States.

How it works, step by step:

Given NPI 1234567893:

  1. Prepend 80840: 80840 + 1234567893 → 808401234567893
  2. Reverse the digits: 3, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 4, 8, 0, 8
  3. Starting from index 0, double every digit at an odd index (1, 3, 5, ...): 9×2=18, 7×2=14, 5×2=10, 3×2=6, 1×2=2, 4×2=8, 0×2=0
  4. If any doubled value exceeds 9, subtract 9: 18→9, 14→5, 10→1
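  5. Sum all digits. Here the undoubled digits total 39 and the adjusted doubled digits total 31, giving 70. A valid NPI yields a sum divisible by 10, so 1234567893 passes.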
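The steps above can be traced in code. This is a small sketch for inspection rather than production use; the helper name luhn_trace is hypothetical and simply exposes the intermediate lists so each step can be compared with the worked example.

```python
def luhn_trace(npi: str, prefix: str = "80840") -> dict:
    """Return the intermediate values of the prefixed Luhn check for one NPI."""
    reversed_digits = [int(ch) for ch in reversed(prefix + npi)]
    adjusted = []
    for index, digit in enumerate(reversed_digits):
        if index % 2 == 1:          # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9          # same as adding the two digits of the product
        adjusted.append(digit)
    total = sum(adjusted)
    return {
        "reversed": reversed_digits,
        "adjusted": adjusted,
        "total": total,
        "passes": total % 10 == 0,
    }


trace = luhn_trace("1234567893")
print(trace["adjusted"])  # [3, 9, 8, 5, 6, 1, 4, 6, 2, 2, 0, 8, 8, 0, 8]
print(trace["total"])     # 70
```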

Here is the complete Python implementation:

import re

def _luhn_check(npi: str) -> bool:
    """CMS Luhn check: prepend 80840 to the 10-digit NPI, then apply standard Luhn."""
    full_number = "80840" + npi
    digits = [int(d) for d in reversed(full_number)]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:      # double every second digit (odd index after reversal)
            d *= 2
            if d > 9:
                d -= 9      # equivalent to summing the two digits of a 2-digit result
        total += d
    return total % 10 == 0


def validate_npi_format(npi: str) -> dict:
    """
    Validate an NPI against CMS rules.
    Returns a dict with keys: is_valid, npi, npi_type, errors.
    """
    errors = []
    npi = str(npi).strip()

    # Rule 1: exactly 10 digits
    if not re.match(r'^\d{10}$', npi):
        errors.append("NPI must be exactly 10 digits (numeric only)")
        return {"is_valid": False, "npi": npi, "npi_type": None, "errors": errors}

    # Rule 2: not all zeros
    if npi == "0000000000":
        errors.append("NPI cannot be all zeros")

    # Rule 3: Luhn check digit
    if not _luhn_check(npi):
        errors.append("Failed CMS Luhn check digit validation")

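    # Type detection: a first-digit heuristic only. The authoritative entity
    # type is the one recorded in NPPES, not anything derived from the digits.
    first = int(npi[0])
    npi_type = "Type 1 (Individual)" if first in (1, 2) else "Type 2 (Organization)"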

    return {
        "is_valid": len(errors) == 0,
        "npi": npi,
        "npi_type": npi_type if not errors else None,
        "errors": errors,
    }

Usage:

>>> validate_npi_format("1234567893")
{'is_valid': True, 'npi': '1234567893', 'npi_type': 'Type 1 (Individual)', 'errors': []}

>>> validate_npi_format("1234567890")
{'is_valid': False, 'npi': '1234567890', 'npi_type': None, 'errors': ['Failed CMS Luhn check digit validation']}

>>> validate_npi_format("123456789")
{'is_valid': False, 'npi': '123456789', 'npi_type': None, 'errors': ['NPI must be exactly 10 digits (numeric only)']}
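The same arithmetic can run in the other direction to compute the check digit for a 9-digit base, which is handy for building test fixtures. This is a sketch under the prefix rule described above; the function name npi_check_digit is hypothetical.

```python
def npi_check_digit(base9: str, prefix: str = "80840") -> int:
    """Compute the check digit for a 9-digit NPI base using the prefixed Luhn scheme."""
    if len(base9) != 9 or not (base9.isascii() and base9.isdigit()):
        raise ValueError("base must be exactly 9 digits")
    total = 0
    # The check digit will occupy position 0 from the right, so the last digit
    # of the base sits at position 1 and is the first one to be doubled.
    for position, ch in enumerate(reversed(prefix + base9), start=1):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    # The check digit is whatever brings the total up to a multiple of 10.
    return (10 - total % 10) % 10


print(npi_check_digit("123456789"))  # 3
```

Appending the result to the base gives 1234567893, the valid example used throughout this guide.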

Looking Up NPIs from the NPPES Registry

Format validation confirms an NPI is structurally sound — it does not confirm the provider exists or is still active. For that you need the CMS NPPES API, which exposes the full public registry of 7+ million providers.

import httpx

def lookup_npi(npi: str, timeout: float = 10.0) -> dict:
    """
    Look up a provider in the CMS NPPES registry.
    Returns name, credential, status, taxonomy, and enumeration date.
    """
    url = "https://npiregistry.cms.hhs.gov/api/"
    params = {"number": npi, "version": "2.1"}

    try:
        response = httpx.get(url, params=params, timeout=timeout)
        response.raise_for_status()
        data = response.json()

        if not data.get("results"):
            return {"success": False, "error": "NPI not found in NPPES registry"}

        result = data["results"][0]
        basic = result.get("basic", {})
        taxonomies = result.get("taxonomies", [])
        primary = next((t for t in taxonomies if t.get("primary")), taxonomies[0] if taxonomies else {})

        name = (
            f"{basic.get('first_name', '')} {basic.get('last_name', '')}".strip()
            or basic.get("organization_name", "")
        )

        return {
            "success": True,
            "npi": npi,
            "name": name,
            "credential": basic.get("credential", ""),
            "status": basic.get("status", ""),
            "taxonomy_code": primary.get("code", ""),
            "taxonomy_desc": primary.get("desc", ""),
            "enumeration_date": basic.get("enumeration_date", ""),
        }

    except httpx.TimeoutException:
        return {"success": False, "error": f"NPPES API timed out after {timeout}s"}
    except Exception as e:
        return {"success": False, "error": str(e)}

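Install httpx with pip install httpx. For high-volume lookups, use httpx.AsyncClient with asyncio.gather to parallelize requests. The API's limit parameter caps a response at 200 results, but that cap applies to search queries; this guide assumes the number parameter takes a single 10-digit NPI per request, so batch by issuing single-NPI calls concurrently.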
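One way to parallelize lookups while capping concurrency is sketched below. The names lookup_many and nppes_fetcher are hypothetical, the default of 5 concurrent requests is an arbitrary assumption rather than a documented rate limit, and the caller is expected to pass in an httpx.AsyncClient.

```python
import asyncio
from typing import Awaitable, Callable

NPPES_URL = "https://npiregistry.cms.hhs.gov/api/"


async def lookup_many(
    npis: list[str],
    fetch_one: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 5,
) -> list[dict]:
    """Run fetch_one for every NPI with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(npi: str) -> dict:
        async with semaphore:
            try:
                return await fetch_one(npi)
            except Exception as exc:  # keep one failure from cancelling the batch
                return {"success": False, "npi": npi, "error": str(exc)}

    # gather preserves input order, so results line up with npis
    return await asyncio.gather(*(guarded(npi) for npi in npis))


def nppes_fetcher(client) -> Callable[[str], Awaitable[dict]]:
    """Build a fetch_one coroutine around an httpx.AsyncClient supplied by the caller."""

    async def fetch_one(npi: str) -> dict:
        response = await client.get(NPPES_URL, params={"number": npi, "version": "2.1"})
        response.raise_for_status()
        results = response.json().get("results") or []
        return {"success": bool(results), "npi": npi, "result": results[0] if results else None}

    return fetch_one
```

Inside an async function, usage would look like: async with httpx.AsyncClient(timeout=10.0) as client: results = await lookup_many(npis, nppes_fetcher(client)). Passing the fetch function in as a parameter also makes the concurrency logic testable without network access.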



Bulk NPI Validation for Data Warehouses

Production provider dimension tables often have tens of thousands of rows. Run validation across the whole column before it reaches the warehouse:

def validate_npi_column(npi_list: list[str]) -> list[dict]:
    """Validate a list of NPIs. Returns one result dict per input."""
    return [validate_npi_format(str(npi)) for npi in npi_list]


def get_invalid_npis(npi_list: list[str]) -> list[dict]:
    """Filter to only the invalid NPIs — useful for reporting and alerting."""
    return [r for r in validate_npi_column(npi_list) if not r["is_valid"]]

Pandas integration — validate a provider dimension CSV before loading:

import pandas as pd

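# Read the NPI column as text so pandas does not coerce it to int or float
df = pd.read_csv("provider_dim.csv", dtype={"prvdr_npi": str})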

validation = df["prvdr_npi"].apply(lambda x: validate_npi_format(str(x)))
df["npi_valid"]  = validation.apply(lambda x: x["is_valid"])
df["npi_errors"] = validation.apply(lambda x: ", ".join(x["errors"]))
df["npi_type"]   = validation.apply(lambda x: x["npi_type"] or "")

invalid = df[~df["npi_valid"]][["prvdr_id", "prvdr_npi", "npi_type", "npi_errors"]]
print(f"Invalid NPIs: {len(invalid)} of {len(df)} ({len(invalid)/len(df)*100:.1f}%)")
print(invalid.head(20).to_string(index=False))

Where to use this in a healthcare data pipeline:

  • Ingestion validation — reject or quarantine invalid NPIs at the bronze/raw layer before they propagate downstream
  • Pre-submission claims validation — every 837 transaction requires a valid rendering and billing NPI; catch failures before the clearinghouse does
  • Provider master data cleanup — audit the provider dimension on a scheduled basis and route failures to a data steward queue
  • Reference data joins — validate NPIs before joining to NPPES reference tables so bad keys do not silently produce null rows
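The ingestion case in the list above can be sketched as a split into clean and quarantined records. This is a dependency-free illustration with a compact validator inlined so the block runs on its own; partition_by_npi and is_valid_npi are hypothetical names, and prvdr_npi is the column name used in the pandas example.

```python
import re


def is_valid_npi(npi: str) -> bool:
    """Compact format plus prefixed Luhn check, mirroring the rules in this guide."""
    if not re.fullmatch(r"[0-9]{10}", npi) or npi == "0000000000":
        return False
    total = 0
    for index, ch in enumerate(reversed("80840" + npi)):
        digit = int(ch)
        if index % 2 == 1:
            # double, then subtract 9 when the product has two digits
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def partition_by_npi(
    records: list[dict], npi_key: str = "prvdr_npi"
) -> tuple[list[dict], list[dict]]:
    """Split records into (clean, quarantined) based on the NPI stored under npi_key."""
    clean, quarantined = [], []
    for record in records:
        npi = str(record.get(npi_key) or "").strip()
        (clean if is_valid_npi(npi) else quarantined).append(record)
    return clean, quarantined


rows = [
    {"prvdr_id": 1, "prvdr_npi": "1234567893"},
    {"prvdr_id": 2, "prvdr_npi": "1234567890"},
    {"prvdr_id": 3, "prvdr_npi": None},
]
clean, quarantined = partition_by_npi(rows)
print(len(clean), len(quarantined))  # 1 2
```

Quarantined rows keep their original payload, so a data steward can correct the NPI and replay them rather than losing the record.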

Deploying as a Snowflake Python UDF

Once the validation logic is tested locally, deploy it as a Snowflake UDF so any SQL query or dbt model can call it inline — no Python environment required.

CREATE OR REPLACE FUNCTION validate_npi(npi VARCHAR)
  RETURNS BOOLEAN
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.11'
  HANDLER = 'validate_npi'
AS $$
import re

def validate_npi(npi: str) -> bool:
    if not npi or not re.match(r'^\d{10}$', str(npi).strip()):
        return False
    if npi.strip() == '0000000000':
        return False
    full_number = '80840' + npi.strip()
    digits = [int(d) for d in reversed(full_number)]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
$$;
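Before running the DDL above, the handler body can be exercised locally. The sketch below copies the handler as written and checks it against a few inputs; the case table is illustrative and would normally be extended with values from your own provider data.

```python
import re


def validate_npi(npi):
    # Same body as the UDF handler, copied so it can run outside Snowflake
    if not npi or not re.match(r'^\d{10}$', str(npi).strip()):
        return False
    if npi.strip() == '0000000000':
        return False
    full_number = '80840' + npi.strip()
    digits = [int(d) for d in reversed(full_number)]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Input value -> expected result
CASES = {
    "1234567893": True,      # valid example from this guide
    "1234567890": False,     # wrong check digit
    "0000000000": False,     # all zeros
    "123456789": False,      # too short
    " 1234567893 ": True,    # surrounding whitespace is stripped
    None: False,             # SQL NULL arrives as None
    "": False,
}

failures = {value: want for value, want in CASES.items() if validate_npi(value) is not want}
print(failures or "all cases pass")  # all cases pass
```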

Find invalid NPIs in the provider dimension:

SELECT
    prvdr_id,
    prvdr_npi,
    prvdr_nm,
    validate_npi(prvdr_npi) AS npi_valid
FROM dim_provider
WHERE validate_npi(prvdr_npi) = FALSE
ORDER BY prvdr_id;

Data quality summary — run this as a monitoring query:

SELECT
    COUNT(*)                                                          AS total_providers,
    SUM(CASE WHEN validate_npi(prvdr_npi) THEN 1 ELSE 0 END)         AS valid_npis,
    SUM(CASE WHEN NOT validate_npi(prvdr_npi) THEN 1 ELSE 0 END)     AS invalid_npis,
    ROUND(
        SUM(CASE WHEN NOT validate_npi(prvdr_npi) THEN 1 ELSE 0 END)
        * 100.0 / NULLIF(COUNT(*), 0), 2
    )                                                                 AS pct_invalid
FROM dim_provider;

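BigQuery equivalent, using a JavaScript UDF, which avoids the need for a Python runtime. The body is written as a raw string (r''') so the \d in the regular expression is not interpreted as a string escape sequence:

CREATE OR REPLACE FUNCTION healthcare.validate_npi(npi STRING)
RETURNS BOOL
LANGUAGE js AS r'''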
  if (!npi || !/^\d{10}$/.test(npi)) return false;
  if (npi === '0000000000') return false;
  const digits = ('80840' + npi).split('').reverse().map(Number);
  let total = 0;
  digits.forEach((d, i) => {
    if (i % 2 === 1) { d *= 2; if (d > 9) d -= 9; }
    total += d;
  });
  return total % 10 === 0;
''';

-- Find invalid NPIs in BigQuery
SELECT npi_id, provider_npi, provider_nm
FROM `project.healthcare.dim_provider`
WHERE NOT healthcare.validate_npi(provider_npi);


Adding NPI Validation to Your dbt Project

Enforce NPI format at the dbt model layer so violations surface during dbt test. Run the project with dbt build if a failing test should also stop downstream models from being built on top of bad data.

# models/staging/schema.yml
version: 2

models:
  - name: stg_providers
    description: "Staged provider records from NPPES and internal credentialing."
    # Full expressions go at model level. When expression_is_true is attached to a
    # column, dbt_utils prefixes the expression with the column name.
    tests:
      - dbt_utils.expression_is_true:
          expression: "regexp_like(provider_npi_id, '^[0-9]{10}$')"
          config:
            error_if: ">0"
            warn_if: ">0"
      - dbt_utils.expression_is_true:
          expression: "regexp_like(rendering_provider_npi_id, '^[0-9]{10}$')"
    columns:

      - name: provider_npi_id
        description: "10-digit NPI. Must pass format and Luhn check."
        tests:
          - not_null
          - unique

      - name: rendering_provider_npi_id
        description: "Rendering provider NPI on the claim."
        tests:
          - not_null

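For Snowflake, you can call the validate_npi UDF directly in a dbt test expression. Declare it under the model's tests: key rather than under a column, because dbt_utils.expression_is_true prefixes column-level expressions with the column name:

    tests:
      - dbt_utils.expression_is_true:
          expression: "validate_npi(provider_npi_id) = TRUE"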

For broader column naming enforcement across your provider models — ensuring provider_npi_id follows ISO-11179 standards rather than providerNPI or npi_number — use the dbt-healthcare-standards package alongside these tests:

# Add naming standard enforcement at the model level
    tests:
      - dbt_healthcare_standards.assert_healthcare_naming:
          exclude: ['_fivetran_synced', '_loaded_at']

The package is available at github.com/smudvar/dbt-healthcare-standards and links to mdatool.com/glossary in its validation error messages.


Common NPI Data Quality Issues

These are the patterns that appear most often when auditing provider dimension tables for the first time:

| Issue | Example | Fix |
|---|---|---|
| Wrong length | 123456789 (9 digits) | Reject and re-check the source system; padding with a leading zero does not help because NPIs issued to date begin with 1 or 2 |
| All zeros | 0000000000 | Remove or flag for manual review |
| Failed Luhn | 1234567890 | Reject; likely a transcription error, so look up the correct NPI in NPPES |
| Entity type mismatch | Organization (Type 2) NPI recorded on an individual practitioner | Compare against the entity type in NPPES; reclassify or audit the source credentialing system |
| Null or empty | NULL / "" | Apply default handling: either a placeholder NPI or exclude from claims |
| Non-numeric prefix | NPI-1234567890 | Strip non-numeric characters, then re-validate |
| Transposed digits | 1243567893 vs 1234567893 | Fails Luhn; requires lookup against NPPES to identify the correct NPI |
| Deactivated NPI | Passes Luhn, not found in NPPES | Valid format but provider no longer active; exclude from active network |
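The "strip non-numeric characters, then re-validate" fix can be sketched as a small normalizer that runs before validation. The function name normalize_npi is hypothetical, and the float handling assumes the value passed through a numeric column at some point.

```python
import re
from typing import Optional


def normalize_npi(raw) -> Optional[str]:
    """Return the digits of a raw NPI value as a string, or None if nothing usable remains."""
    if raw is None:
        return None
    if isinstance(raw, float):
        if raw != raw or not raw.is_integer():   # NaN, infinity, or a fractional value
            return None
        raw = int(raw)
    text = str(raw).strip()
    if re.fullmatch(r"[0-9]+\.0+", text):        # "1234567893.0" left by a float round-trip
        text = text.split(".", 1)[0]
    digits = re.sub(r"[^0-9]", "", text)         # drop prefixes, dashes, and spaces
    return digits or None


print(normalize_npi("NPI-1234567893"))  # 1234567893
print(normalize_npi(1234567893.0))      # 1234567893
print(normalize_npi("n/a"))             # None
```

Normalization only cleans the representation. The result still has to go through the format and Luhn checks, since a cleaned value can be the wrong length or carry a wrong check digit.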

The mdatool NPI Lookup tool handles the NPPES lookup step — paste any NPI and get the provider name, taxonomy, credential, and active status without writing a line of code.

For a complete provider data schema with all NPI-related columns following ISO-11179 naming, see Healthcare Claims Data Model: Complete SQL Schema.


Frequently Asked Questions

How do I validate an NPI number in Python?

Use the CMS Luhn check digit algorithm: first confirm the NPI is exactly 10 numeric digits and not all zeros, then prepend the constant 80840 to the NPI and apply the standard Luhn algorithm to the resulting 15-digit string. A valid NPI produces a Luhn sum divisible by 10. The validate_npi_format function in this guide implements all three checks and returns a dict with is_valid, npi_type, and a list of errors. For lookups against the live registry, call the CMS NPPES API at npiregistry.cms.hhs.gov/api/ with the number parameter, or use the free mdatool NPI Lookup tool.

What is the NPI Luhn algorithm?

CMS uses a modified version of the ISO/IEC 7812 Luhn algorithm to validate NPIs. The modification is a constant 5-digit prefix: 80840 is prepended to the 10-digit NPI before running the standard Luhn check, making the input 15 digits long. The algorithm then reverses the digits, doubles every digit at an odd index (counting from 0 after reversal), subtracts 9 from any doubled value that exceeds 9, and sums all digits. If the total modulo 10 equals zero, the NPI is valid. The tenth digit of the NPI is the check digit — the value that makes the sum come out to a multiple of 10. CMS uses this to detect transcription errors in provider identifiers submitted on claims.

How do I validate NPIs in Snowflake?

Create a Python UDF using CREATE OR REPLACE FUNCTION validate_npi(npi VARCHAR) RETURNS BOOLEAN LANGUAGE PYTHON RUNTIME_VERSION = '3.11' with the Luhn check logic embedded in the handler. Once deployed, call validate_npi(prvdr_npi) directly in SQL — in WHERE clauses to filter invalid NPIs, in SELECT lists to add a validation flag, or in dbt test expressions to enforce data quality at build time. The data quality summary query in this guide shows how to generate a count of valid, invalid, and percentage invalid across the full provider dimension table.

What is the difference between Type 1 and Type 2 NPI?

Type 1 NPIs are assigned to individual healthcare providers: physicians, nurses, therapists, and any licensed practitioner who bills independently. Type 2 NPIs are assigned to organizations: hospitals, group practices, clinics, labs, and any other healthcare entity that bills as a legal entity rather than as an individual. The number itself is intelligence-free, so the type cannot be read reliably from the digits; NPIs issued to date begin with 1 or 2 for both types, and the authoritative value is the entity type that NPPES records for the provider. The distinction matters for claims processing: an 837 professional claim carries a rendering NPI (normally Type 1) and a billing NPI (Type 1 or Type 2, depending on whether the provider bills independently or through a group practice).

How do I look up NPI provider details?

The CMS NPPES registry exposes a free public API at npiregistry.cms.hhs.gov/api/ that returns provider name, credential, taxonomy code, status, and enumeration date for any valid NPI. The lookup_npi function in this guide wraps the API with httpx and handles timeouts and errors gracefully. For high-volume lookups, issue single-NPI requests concurrently and process the results asynchronously; the 200-result cap on the limit parameter applies to search queries, and this guide assumes the number parameter takes one NPI per request. For one-off lookups without writing code, use the free mdatool NPI Lookup tool: search by NPI number, provider name, or organization and get full NPPES details instantly.
