Provider data is one of the most operationally complex domains in healthcare analytics. Providers move between organizations, merge practices, update credentials, join and leave networks, and maintain relationships with dozens of facilities and payers simultaneously. A single physician has one individual NPI but may also bill under several group NPIs, practice under several group affiliations, hold privileges at multiple hospitals, and participate in dozens of insurance networks, all while their demographic and credentialing information changes continuously.
For health plans, provider networks are the product. For hospitals, provider data drives clinical operations. For payers and regulators, provider data quality is a compliance requirement. This guide covers everything a healthcare data architect, data modeler, or data engineer needs to know about provider data — from NPI numbers and credentialing to production-ready master data model design for Snowflake, Databricks, and BigQuery.
What Is Healthcare Provider Data?
A healthcare provider is any individual or organization licensed to deliver clinical services and eligible to bill insurance payers for those services. Provider data encompasses the complete set of administrative, credentialing, network, and performance information maintained about providers by health plans, hospitals, credentialing organizations, and government agencies.
Provider data falls into four primary categories:
Identity and demographics capture who the provider is — their name, NPI number, tax identification number, practice addresses, phone numbers, and organizational affiliations. The National Provider Identifier is the universal provider identifier mandated by HIPAA for all healthcare administrative transactions.
Credentials and licensure document the provider's qualifications — medical school, residency training, board certifications, state licenses, DEA registration, and malpractice history. Provider credentialing is the formal verification process that health plans and hospitals use to confirm these credentials before granting network participation or clinical privileges.
Network participation defines which insurance plans a provider has contracted with, what services are covered under each contract, what reimbursement rates apply, and whether the provider is currently accepting new patients. Provider network status is one of the most queried fields in healthcare operations — driving claims adjudication, member-facing provider directories, and network adequacy filings.
Performance and quality captures how the provider performs on clinical quality metrics, patient experience measures, cost efficiency scores, and utilization patterns. Provider quality scores and provider star ratings are increasingly used in tiered network designs and value-based payment programs.
Core Provider Data Elements
Every healthcare data team working with provider data needs to understand these fundamental fields:
The National Provider Identifier (npi) is the 10-digit unique identification number assigned to every healthcare provider in the United States by CMS under HIPAA. Type 1 NPIs are assigned to individual practitioners and Type 2 NPIs to organizational providers. The NPI is required on all HIPAA standard transactions including claims, eligibility, and remittance.
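The tenth digit of an NPI is a check digit, so a structural check can reject many mistyped NPIs before any registry lookup. The sketch below assumes the published CMS rule: prepend the constant prefix 80840 to the NPI and apply the Luhn formula. The function name `is_valid_npi` is illustrative, and a passing result only means the number is well formed, not that it is enumerated in NPPES.

```python
NPI_PREFIX = "80840"  # constant prefix used when computing the NPI check digit


def is_valid_npi(npi: str) -> bool:
    """Return True if npi is 10 ASCII digits and passes the Luhn check."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately to the left of the check digit.
    for position, char in enumerate(reversed(NPI_PREFIX + npi)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(is_valid_npi("1234567893"))  # True
print(is_valid_npi("1234567890"))  # False
```

A check like this is cheap enough to run on every inbound claim line, leaving registry lookups for NPIs that pass it.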
The provider taxonomy code (prvdr_tax) is the NUCC 10-character alphanumeric code identifying the provider type, classification, and specialization in a standardized hierarchy. Taxonomy codes drive fee schedule assignment, claims editing, and network adequacy measurement.
The provider specialty code (prvdr_spclty_cd) identifies the provider's clinical specialty and drives network adequacy analysis: health plans must demonstrate sufficient specialist availability within defined access standards for each member ZIP code.
The provider tax ID (prvdr_tax_id) is the federal tax identification number used for claims payment and 1099 tax reporting. For individual providers this may be a Social Security Number and for organizations an Employer Identification Number.
The provider network status (prvdr_ntwk_sts) indicates whether the provider is actively contracted and participating in a specific health plan network, which drives claims adjudication, benefit tier application, and member directory accuracy.
The provider credentialing status (prvdr_credntl_sts) tracks where a provider is in the credentialing workflow — from application received through committee approval to active network participation.
The provider panel status (prvdr_pnl_sts) indicates whether the provider is accepting new patients — a critical network adequacy data element that CMS requires health plans to maintain accurately and update within 30 days of any change.
The provider exclusion indicator (prvdr_excl_ind) flags providers listed on the OIG List of Excluded Individuals and Entities — health plans must screen against this list monthly to avoid contracting with excluded providers.
NPI Registry and NPPES Data
The National Plan and Provider Enumeration System (NPPES) is the CMS database that assigns and maintains NPI numbers for all healthcare providers. NPPES data is publicly available and serves as the authoritative source for provider identity information. Use our NPI Lookup tool to search any provider by NPI number or name.
NPPES contains these key data elements for every registered provider:
- NPI number and NPI type (Type 1 individual / Type 2 organization)
- Provider name (legal name and credential suffix)
- Practice location addresses (up to 50 locations per provider)
- Mailing address for correspondence
- Taxonomy codes (primary and secondary specializations)
- Enumeration date and last update date
- Deactivation status for providers who have surrendered their NPI
Healthcare data teams download the monthly NPPES full replacement file to maintain current provider identity data, validate provider NPIs submitted on claims, and enrich internal provider records with standardized demographic and taxonomy information from the authoritative CMS source.
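As a rough illustration of that load step, the sketch below reads an NPPES-style CSV and keeps only NPIs that are not currently deactivated. The column headers are assumed to match the NPPES dissemination file and should be confirmed against the header row of the file you actually download. The sample rows, including the NPIs and taxonomy codes, are made up for the example.

```python
import csv
import io

# Assumed NPPES column headers; verify against the downloaded file.
COL_NPI = "NPI"
COL_ENTITY_TYPE = "Entity Type Code"  # "1" = individual, "2" = organization
COL_TAXONOMY = "Healthcare Provider Taxonomy Code_1"
COL_DEACTIVATED = "NPI Deactivation Date"
COL_REACTIVATED = "NPI Reactivation Date"


def load_active_providers(csv_file):
    """Return {npi: {"entity_type": ..., "taxonomy": ...}} for active NPIs."""
    active = {}
    for row in csv.DictReader(csv_file):
        deactivated = (row.get(COL_DEACTIVATED) or "").strip()
        reactivated = (row.get(COL_REACTIVATED) or "").strip()
        if deactivated and not reactivated:
            continue  # deactivated and never reactivated: skip
        entity = "individual" if row[COL_ENTITY_TYPE] == "1" else "organization"
        active[row[COL_NPI]] = {"entity_type": entity, "taxonomy": row[COL_TAXONOMY]}
    return active


# Hypothetical sample data in the assumed layout.
SAMPLE_CSV = (
    "NPI,Entity Type Code,Healthcare Provider Taxonomy Code_1,"
    "NPI Deactivation Date,NPI Reactivation Date\n"
    "1234567893,1,207Q00000X,,\n"
    "1111111112,2,261QP2300X,,\n"
    "2222222222,,,01/15/2020,\n"
)

providers = load_active_providers(io.StringIO(SAMPLE_CSV))
print(sorted(providers))
```

The real file is several gigabytes, so a production load would more likely stream it into a staging table than hold it in a dictionary.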
Provider Credentialing Process
Provider credentialing is the formal verification of a provider's qualifications before granting network participation or clinical privileges. NCQA standards require health plans to complete initial credentialing within 180 days of application and recredential providers every three years.
The credentialing process involves primary source verification of:
- Medical school graduation from an accredited institution
- Residency and fellowship completion at accredited programs
- Board certification status from recognized specialty boards
- State licensure status in all states of practice
- DEA registration for providers with prescribing authority
- Malpractice insurance coverage and claims history
- Hospital privileges and facility affiliations
- OIG exclusion and SAM.gov debarment screening
- National Practitioner Data Bank query results
Healthcare data teams build credentialing workflow systems that track provider credentialing application status through each verification step, calculate provider recredentialing due dates three years from initial approval, generate advance notice workflows at 180, 90, and 30 days before expiration, and maintain audit trails required for NCQA credentialing accreditation surveys.
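The date arithmetic behind that workflow is simple but easy to get subtly wrong around leap years. The sketch below computes a recredentialing due date three years after approval and the 180, 90, and 30 day notice dates. The function names and the choice to roll a February 29 approval back to February 28 are assumptions for illustration.

```python
from datetime import date, timedelta

RECRED_CYCLE_YEARS = 3
NOTICE_DAYS = (180, 90, 30)


def add_years(d: date, years: int) -> date:
    """Add whole years, falling back to Feb 28 for a Feb 29 start date."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def recredentialing_schedule(approval_dt: date) -> dict:
    """Return the due date and the advance-notice dates for one provider."""
    due_dt = add_years(approval_dt, RECRED_CYCLE_YEARS)
    return {
        "due_dt": due_dt,
        "notice_dts": [due_dt - timedelta(days=n) for n in NOTICE_DAYS],
    }


schedule = recredentialing_schedule(date(2024, 3, 15))
print(schedule["due_dt"])      # 2027-03-15
print(schedule["notice_dts"])  # dates 180, 90, and 30 days before the due date
```

In the model shown later in this guide, the due date would populate RECREDNTL_DUE_DT and the approval date would come from CMTE_APPR_DT.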
Provider Network Design
A health plan provider network is the set of contracted providers who have agreed to deliver covered services to members at negotiated reimbursement rates. Network design decisions — which specialties to include, how many providers per service area, what reimbursement rates to offer — directly determine member access to care and plan financial performance.
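Network adequacy is the regulatory requirement that health plans maintain sufficient provider availability within defined distance and time standards for each member ZIP code in the service area. CMS requires Medicare Advantage plans to demonstrate that members can reach primary care and specialists within maximum time and distance standards, which vary by specialty and county type. Thresholds on the order of 15 miles and 30 minutes for primary care and 30 miles and 60 minutes for specialists should be read as illustrative, not as the standard for every county.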
Tiered networks create multiple provider tiers within a single network, with members paying lower cost sharing for Tier 1 preferred providers who have demonstrated superior quality and cost efficiency. Provider star ratings and provider cost efficiency scores determine tier placement.
Narrow networks include a smaller subset of providers with favorable quality and cost profiles, enabling lower premiums in exchange for more limited provider choice. Narrow network design requires careful adequacy analysis to ensure the reduced provider set still meets regulatory access standards.
Healthcare data teams build network management systems that track provider network effective dates and termination dates, calculate network adequacy metrics by specialty and service area ZIP code, identify access gaps requiring additional contracting, and maintain provider accepting patients status for directory accuracy.
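A minimal sketch of the distance half of an adequacy metric follows. It assumes members and providers have already been geocoded to latitude and longitude (for example, ZIP centroids), uses straight-line haversine distance rather than drive time, and takes the mileage threshold as a parameter. The function names and the 15 mile threshold in the usage lines are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def pct_members_with_access(members, providers, max_miles):
    """Percent of members with at least one provider within max_miles.

    members and providers are lists of (lat, lon) tuples.
    """
    if not members:
        return 0.0
    covered = sum(
        1
        for m in members
        if any(haversine_miles(m[0], m[1], p[0], p[1]) <= max_miles for p in providers)
    )
    return 100.0 * covered / len(members)


members = [(40.0, -75.0), (40.1, -75.0), (41.0, -75.0)]
providers = [(40.0, -75.0)]
print(round(pct_members_with_access(members, providers, max_miles=15), 1))
```

This pairwise loop is fine for a sketch, but a full membership file against a full provider file would normally use a spatial index or a warehouse geospatial function, and regulatory filings also need the travel time dimension, which this does not compute.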
Provider Master Data Model Design
Provider master data management requires a hub-and-spoke architecture that maintains the provider identity record separately from the multiple relationships and attributes that can change independently — network participation, facility affiliations, credentials, and performance metrics.
Below is a production-ready provider master data model generated by the mdatool AI Data Modeling tool:
Core Provider Identity Table
-- Snowflake DDL, generated with mdatool AI Data Modeling
CREATE TABLE DIM_PROVIDER (
    PRVDR_KEY           INTEGER NOT NULL,       -- surrogate key
    PRVDR_NPI           VARCHAR(10),            -- NPI (Type 1 or 2)
    PRVDR_TIN           VARCHAR(10),            -- tax identification number
    PRVDR_FIRST_NM      VARCHAR(100),           -- first name
    PRVDR_LAST_NM       VARCHAR(100),           -- last name
    PRVDR_ORG_NM        VARCHAR(255),           -- organization name
    PRVDR_DBA_NM        VARCHAR(255),           -- doing business as name
    PRVDR_TYP_CD        VARCHAR(20),            -- provider type code
    PRVDR_TAX           VARCHAR(10),            -- primary taxonomy code
    PRVDR_SPCLTY_CD     VARCHAR(10),            -- specialty code
    PRVDR_DEG_CD        VARCHAR(20),            -- degree code (MD/DO/NP)
    PRVDR_LIC_TYP_CD    VARCHAR(20),            -- license type code
    PRVDR_STATE_CD      CHAR(2),                -- primary state
    PRVDR_ZIP_CD        VARCHAR(10),            -- primary ZIP code
    PRVDR_CNTY          VARCHAR(50),            -- primary county
    PRVDR_GRP_NPI       VARCHAR(10),            -- group NPI (Type 2)
    PRVDR_BRD_CERT_IND  CHAR(1),                -- board certified indicator
    EFF_START_DT        DATE NOT NULL,          -- SCD2 effective start
    EFF_END_DT          DATE,                   -- SCD2 effective end
    CURR_ROW_IND        BOOLEAN NOT NULL DEFAULT TRUE,
    LOAD_DT             TIMESTAMP_NTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT PK_DIM_PROVIDER PRIMARY KEY (PRVDR_KEY)
);
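The EFF_START_DT, EFF_END_DT, and CURR_ROW_IND columns imply Type 2 slowly changing dimension handling: a change expires the current row and adds a new one. The sketch below shows that logic on in-memory rows rather than in the warehouse. The function name, the use of NPI as the business key, and the convention that the old row ends the day before the change are all assumptions for illustration.

```python
from datetime import date, timedelta


def apply_scd2_change(rows, npi, changes, change_dt):
    """Expire the current row for npi and append a new current row.

    rows is a list of dicts keyed by DIM_PROVIDER column names and is
    modified in place. Returns True when a new version was written.
    """
    current = next(
        (r for r in rows if r["PRVDR_NPI"] == npi and r["CURR_ROW_IND"]), None
    )
    if current is None:
        raise ValueError(f"no current row for NPI {npi}")
    if all(current.get(col) == val for col, val in changes.items()):
        return False  # attributes unchanged, keep the existing version

    new_row = {
        **current,
        **changes,
        "PRVDR_KEY": max(r["PRVDR_KEY"] for r in rows) + 1,  # stand-in for a sequence
        "EFF_START_DT": change_dt,
        "EFF_END_DT": None,
        "CURR_ROW_IND": True,
    }
    # Assumed convention: the old version ends the day before the change.
    current["EFF_END_DT"] = change_dt - timedelta(days=1)
    current["CURR_ROW_IND"] = False
    rows.append(new_row)
    return True


dim_provider = [{
    "PRVDR_KEY": 1, "PRVDR_NPI": "1234567893", "PRVDR_ZIP_CD": "19103",
    "EFF_START_DT": date(2022, 1, 1), "EFF_END_DT": None, "CURR_ROW_IND": True,
}]
apply_scd2_change(dim_provider, "1234567893", {"PRVDR_ZIP_CD": "19382"}, date(2024, 7, 1))
for row in dim_provider:
    print(row["PRVDR_KEY"], row["PRVDR_ZIP_CD"], row["EFF_START_DT"], row["EFF_END_DT"])
```

In Snowflake the same pattern is usually written as a MERGE or an UPDATE followed by an INSERT, with a sequence or identity column supplying the surrogate key.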
Provider Network Participation Table
-- One row per provider per network per contract period
CREATE TABLE FACT_PROVIDER_NETWORK (
    PRVDR_NTWK_KEY       INTEGER NOT NULL,      -- surrogate key
    PRVDR_KEY            INTEGER NOT NULL,      -- FK to DIM_PROVIDER
    PLAN_KEY             INTEGER NOT NULL,      -- FK to DIM_PLAN
    NTWK_EFF_DT          DATE NOT NULL,         -- network effective date
    NTWK_TERM_DT         DATE,                  -- network termination date
    PRVDR_NTWK_STS       VARCHAR(10),           -- active/pending/terminated
    PRVDR_IN_NTWK_IND    CHAR(1),               -- in network indicator
    PRVDR_ACCPT_PT_IND   CHAR(1),               -- accepting patients
    PRVDR_PNL_STS        VARCHAR(10),           -- panel open/closed
    PRVDR_CNTRCT_TYP_CD  VARCHAR(10),           -- contract type code
    PRVDR_REIMB_RT       DECIMAL(10,4),         -- reimbursement rate
    LOAD_DT              TIMESTAMP_NTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT PK_FACT_PROVIDER_NETWORK
        PRIMARY KEY (PRVDR_NTWK_KEY)
);
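Because this table holds one row per contract period, claims adjudication needs an as-of lookup: was the provider in network for this plan on the date of service? The sketch below expresses that rule on in-memory rows. It assumes the termination date is inclusive, that a null termination date means the contract is open-ended, and that rows with a pending status never count. The function name is hypothetical.

```python
from datetime import date


def is_in_network(network_rows, prvdr_key, plan_key, service_dt):
    """Return True if any contract period covers service_dt."""
    for row in network_rows:
        if row["PRVDR_KEY"] != prvdr_key or row["PLAN_KEY"] != plan_key:
            continue
        if row["PRVDR_NTWK_STS"] == "pending":
            continue  # assumed: pending contracts do not confer network status
        starts_ok = row["NTWK_EFF_DT"] <= service_dt
        ends_ok = row["NTWK_TERM_DT"] is None or service_dt <= row["NTWK_TERM_DT"]
        if starts_ok and ends_ok:
            return True
    return False


network_rows = [
    {"PRVDR_KEY": 101, "PLAN_KEY": 7, "PRVDR_NTWK_STS": "terminated",
     "NTWK_EFF_DT": date(2021, 1, 1), "NTWK_TERM_DT": date(2022, 12, 31)},
    {"PRVDR_KEY": 101, "PLAN_KEY": 7, "PRVDR_NTWK_STS": "active",
     "NTWK_EFF_DT": date(2024, 1, 1), "NTWK_TERM_DT": None},
]
print(is_in_network(network_rows, 101, 7, date(2022, 6, 15)))  # True
print(is_in_network(network_rows, 101, 7, date(2023, 6, 15)))  # False
```

The lookup relies on the date range rather than the status column for historical claims, since a row marked terminated today was still in force for dates inside its contract period.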
Provider Credentialing Table
-- Tracks credentialing lifecycle for each provider
CREATE TABLE FACT_PROVIDER_CREDENTIAL (
    PRVDR_CRED_KEY    INTEGER NOT NULL,
    PRVDR_KEY         INTEGER NOT NULL,         -- FK to DIM_PROVIDER
    CREDNTL_TYP_CD    VARCHAR(20),              -- credential type
    CREDNTL_STS       VARCHAR(20),              -- credentialing status
    APPL_RCVD_DT      DATE,                     -- application received date
    CMTE_APPR_DT      DATE,                     -- committee approval date
    CREDNTL_EFF_DT    DATE,                     -- credential effective date
    RECREDNTL_DUE_DT  DATE,                     -- recredentialing due date
    SANCT_IND         CHAR(1),                  -- sanction indicator
    EXCL_IND          CHAR(1),                  -- exclusion indicator
    DEBAR_IND         CHAR(1),                  -- debarment indicator
    LOAD_DT           TIMESTAMP_NTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT PK_FACT_PROVIDER_CREDENTIAL
        PRIMARY KEY (PRVDR_CRED_KEY)
);
Provider Performance Table
-- Annual provider quality and efficiency scores
CREATE TABLE FACT_PROVIDER_PERFORMANCE (
    PRVDR_PERF_KEY      INTEGER NOT NULL,
    PRVDR_KEY           INTEGER NOT NULL,       -- FK to DIM_PROVIDER
    PLAN_KEY            INTEGER NOT NULL,       -- FK to DIM_PLAN
    MEAS_YR             SMALLINT NOT NULL,      -- measurement year
    PRVDR_QLTY_SCR      DECIMAL(5,2),           -- quality score
    PRVDR_STAR_RTG      DECIMAL(3,1),           -- star rating (1-5)
    PRVDR_CST_EFF_SCR   DECIMAL(5,2),           -- cost efficiency score
    PRVDR_HEDIS_SCR     DECIMAL(5,2),           -- HEDIS composite score
    PRVDR_ATTR_MBR_CNT  INTEGER,                -- attributed member count
    LOAD_DT             TIMESTAMP_NTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT PK_FACT_PROVIDER_PERFORMANCE
        PRIMARY KEY (PRVDR_PERF_KEY)
);
Generate this complete schema instantly using the mdatool AI Data Modeling tool — select Provider Network as your domain, Star Schema as your architecture, and your target platform. Production-ready DDL with ISO-11179 standard column names in 30 seconds.
Common Provider Data Analytics Use Cases
Healthcare data teams use provider data across a wide range of analytical programs:
Network Adequacy Analysis measures whether health plan members can access required provider types within defined distance and time standards. Analytics calculate member-to-provider distances by ZIP code using provider practice location data, identify geographic access gaps, and produce CMS-required network adequacy filings.
Provider Directory Maintenance ensures member-facing provider directories accurately reflect current network participation, accepting patient status, practice locations, and specialty information. CMS requires Medicare Advantage plans to update directory information within 30 days of any change and attest to accuracy quarterly.
Credentialing Workflow Management tracks provider applications through each verification step, calculates recredentialing due dates, generates advance notice communications, and produces NCQA accreditation audit documentation.
Provider Performance Reporting aggregates claims data to calculate provider-level quality rates, cost efficiency scores, and utilization patterns for value-based contract performance evaluation and tiered network placement decisions.
Fraud Detection and Program Integrity screens provider rosters against OIG exclusion lists, identifies billing patterns inconsistent with provider specialty or practice location, and detects providers with anomalous claim volumes or code distributions.
Value-Based Care Attribution assigns members to primary care providers based on plurality of qualifying primary care visits, supporting shared savings calculations, quality measure attribution, and care gap notification workflows.
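Plurality attribution can be sketched in a few lines. The example below counts qualifying primary care visits per member and provider and assigns each member to the provider with the most visits. The tie-break rules (most recent visit, then lowest NPI for determinism) are assumptions, since programs define their own, and the member IDs and NPIs are placeholders.

```python
from collections import Counter
from datetime import date


def attribute_members(visits):
    """Assign each member to the PCP with the plurality of visits.

    visits is an iterable of (member_id, pcp_npi, visit_dt) tuples that
    have already been filtered to qualifying primary care visits.
    """
    counts = {}  # member_id -> Counter of visits per NPI
    latest = {}  # (member_id, npi) -> most recent visit date
    for member_id, npi, visit_dt in visits:
        counts.setdefault(member_id, Counter())[npi] += 1
        key = (member_id, npi)
        if key not in latest or visit_dt > latest[key]:
            latest[key] = visit_dt

    attribution = {}
    for member_id, counter in counts.items():
        # Rank by visit count, then by most recent visit; among full ties
        # prefer the lowest NPI so the result is deterministic.
        best = min(
            counter,
            key=lambda npi: (-counter[npi], -latest[(member_id, npi)].toordinal(), npi),
        )
        attribution[member_id] = best
    return attribution


visits = [
    ("M1", "NPI_A", date(2024, 1, 5)),
    ("M1", "NPI_A", date(2024, 4, 9)),
    ("M1", "NPI_B", date(2024, 6, 1)),
    ("M2", "NPI_A", date(2024, 1, 10)),
    ("M2", "NPI_B", date(2024, 5, 2)),
]
print(attribute_members(visits))  # {'M1': 'NPI_A', 'M2': 'NPI_B'}
```

The per-provider totals from a run like this are what would feed PRVDR_ATTR_MBR_CNT in the performance table.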
Provider Data Quality Challenges
Provider data quality is notoriously difficult to maintain. Healthcare data teams must address these common challenges:
Provider identity resolution is the most complex provider data quality problem. The same physician may appear under more than one NPI in source data (for example, through a mistyped or deactivated NPI), bill under both individual and group NPIs, appear under name variations across source systems, and have different addresses in different payer databases. Master provider indexes using probabilistic matching on NPI, name, date of birth, and license number resolve these identities for accurate analytics.
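To make the matching idea concrete, the sketch below scores a pair of provider records on NPI, name, date of birth, and license number. It is a simplified weighted-similarity stand-in, not a true probabilistic model: production matchers typically estimate agreement and disagreement weights from data rather than hard-coding them. The weights, field names, and sample records are invented for illustration.

```python
from difflib import SequenceMatcher

# Illustrative weights only; they sum to 1.0 so scores fall in [0, 1].
WEIGHTS = {"npi": 0.40, "name": 0.30, "dob": 0.15, "license": 0.15}


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted agreement score for two provider records."""
    score = WEIGHTS["name"] * name_similarity(rec_a["name"], rec_b["name"])
    for field in ("npi", "dob", "license"):
        value = rec_a.get(field)
        # Missing values earn no credit rather than counting as agreement.
        if value and value == rec_b.get(field):
            score += WEIGHTS[field]
    return score


claims_rec = {"npi": "1234567893", "name": "Jonathan Smith",
              "dob": "1975-02-11", "license": "MD123456"}
roster_rec = {"npi": "1234567893", "name": "Jon Smith",
              "dob": "1975-02-11", "license": "MD123456"}
other_rec = {"npi": "1111111112", "name": "Maria Garcia",
             "dob": "1982-09-30", "license": "MD654321"}

print(match_score(claims_rec, roster_rec) > 0.85)  # True: likely the same provider
print(match_score(claims_rec, other_rec) < 0.50)   # True: likely different providers
```

Pairs that land between the two thresholds are the ones usually routed to manual review.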
Network participation lag occurs when providers join or leave networks but directory systems are not updated in a timely manner. Inaccurate network status data results in members receiving incorrect benefit tier information at point of service, leading to member complaints and regulatory findings.
Credentialing data gaps arise when credential expiration dates pass without renewal, license sanctions are not identified through ongoing monitoring, or primary source verification is not completed within required timeframes. Healthcare data teams implement automated monitoring workflows that flag expiring credentials and active sanctions for immediate review.
Taxonomy code accuracy affects fee schedule assignment and network adequacy measurement. Providers with incorrect primary taxonomy codes may be paid at wrong rates, included in incorrect specialty counts for adequacy analysis, or excluded from member directories for their actual specialty.
Provider Data Tools
mdatool provides several free tools specifically designed for provider data work:
- NPI Lookup — Search any provider by NPI number, name, organization, or specialty instantly
- AI Data Modeling — Generate a complete provider master data model for Snowflake, BigQuery, or Databricks in 30 seconds
- Data Model Canvas — Visualize your provider schema as an interactive ER diagram
- Naming Auditor — Audit provider data column names against ISO-11179 healthcare naming standards
- SQL Linter — Validate provider analytics SQL before it reaches production
- DDL Converter — Convert provider schema DDL between Snowflake, BigQuery, Databricks, and other platforms
Frequently Asked Questions
What is the difference between a Type 1 and Type 2 NPI? A Type 1 NPI is assigned to individual healthcare practitioners — physicians, nurse practitioners, physician assistants, and other licensed individual providers. A Type 2 NPI is assigned to organizational providers — hospitals, medical groups, clinics, skilled nursing facilities, and other entities that deliver care. Individual providers bill under their Type 1 NPI when working independently and may also bill under a Type 2 group NPI when working within a group practice.
What is provider credentialing and how long does it take? Provider credentialing is the formal process of verifying a provider's qualifications, training, licensure, and professional standing before granting network participation or clinical privileges. The process involves primary source verification of medical education, training, board certifications, licenses, malpractice history, and exclusion status. Initial credentialing typically takes 60 to 90 days from completed application to committee approval. NCQA standards require health plans to complete credentialing within 180 days and recredential providers every three years.
What is the OIG exclusion list and why does it matter? The OIG List of Excluded Individuals and Entities (LEIE) is maintained by the Office of Inspector General and identifies providers and organizations excluded from participation in Medicare, Medicaid, and other federal healthcare programs due to fraud, abuse, patient harm, or other disqualifying conduct. Health plans and healthcare organizations are legally required to screen against the OIG exclusion list and may not employ or contract with excluded providers for services billed to federal programs. Monthly screening against the LEIE is a standard compliance requirement.
How do you maintain provider directory accuracy? Provider directory accuracy requires systematic processes including provider attestation portals where providers confirm their information regularly, automated outreach for status updates when information has not been verified recently, comparison against NPPES for demographic accuracy, and audit programs that verify directory information against actual appointment availability. CMS requires Medicare Advantage plans to update directories within 30 days of any change and to conduct quarterly attestation of directory accuracy.
What is network adequacy and how is it measured? Network adequacy is the requirement that a health plan maintain sufficient provider availability to ensure members can access covered services within defined distance and time standards. CMS measures Medicare Advantage network adequacy by calculating the percentage of members who live within required distance and time thresholds from contracted providers for each specialty type. Health plans must demonstrate adequacy for primary care, specialists, hospitals, and other provider types using member ZIP code data and provider practice location coordinates.
What is provider attribution and why does it matter? Provider attribution is the assignment of health plan members to a specific primary care provider for quality measurement, shared savings calculations, and care management coordination. CMS uses a plurality of primary care services to assign Medicare fee-for-service beneficiaries to accountable care organizations in the Medicare Shared Savings Program, and many health plans apply similar plurality logic to their own members in value-based programs. Accurate attribution is essential for fair performance measurement: providers should only be accountable for outcomes of members they actually managed.
How are provider taxonomy codes used in claims adjudication? Provider taxonomy codes identify the provider specialty and service category and are used in claims adjudication to validate that the procedure codes billed are appropriate for the provider type, apply the correct fee schedule for the specialty and service setting, and route claims to the appropriate clinical review process for medical necessity determination. Incorrect taxonomy codes on claims can result in payment at wrong rates, inappropriate claim edits, or incorrect medical necessity determinations.
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.