BlogHealthcare Data ModelingHCC Risk Adjustment Data Model: Building Accurate Risk Score Pipelines in SQL
Healthcare Data Modeling

HCC Risk Adjustment Data Model: Building Accurate Risk Score Pipelines in SQL

HCC risk adjustment determines how much revenue a health plan receives for each Medicare Advantage member. The data model behind it — from diagnosis code ingestion to RAF score calculation — is one of the most consequential schemas a payer data engineer will build.

mdatool Team·April 29, 2026·12 min read
HCCrisk adjustmentRAF scoreMedicare AdvantageCMS-HCChealthcare data modeling

Introduction

For a Medicare Advantage plan with 100,000 members, a 0.01 difference in average RAF score across the book translates to roughly $4 million in annual revenue. The RAF score is computed from the HCC risk adjustment data model — specifically from the diagnosis codes your data pipeline ingests, validates, maps to Hierarchical Condition Categories, and submits to CMS.

A modeling error that drops 3% of valid diagnosis codes does not produce an error. The pipeline runs clean, the submission goes out, and the plan receives significantly less than it should. You find out twelve months later during reconciliation — if at all.

This guide covers the CMS-HCC data model, the pipeline architecture that produces accurate RAF scores, the SQL patterns for each stage, and the common modeling mistakes that silently undercount risk.


How CMS-HCC Risk Adjustment Works

CMS risk-adjusts Medicare Advantage capitation payments to account for member health status. The process:

  1. CMS assigns a base rate (benchmark) for each county
  2. Each member's RAF (Risk Adjustment Factor) score is calculated from demographics and diagnosis history
  3. The plan's capitation payment = benchmark × average RAF score across members
  4. A RAF score of 1.0 means average expected cost. 1.5 means 50% above average.

The RAF score has two components:

Demographic score: based on age, sex, dual eligibility status, and Medicaid eligibility. Computed from enrollment data.

Disease score: based on HCC codes mapped from submitted diagnosis codes. Computed from encounter and claims data.

The CMS-HCC model (Version 28 as of 2024) maps roughly 10,000 ICD-10-CM diagnosis codes into 115 HCC categories. Each HCC has a coefficient that contributes to the RAF score. HCCs interact — comorbidity adjustments apply when certain HCC pairs occur together.


The Core Data Model

The HCC risk adjustment pipeline requires these key tables:

-- Diagnosis encounter staging
CREATE TABLE stg_risk_adjustment_encounters (
    encounter_key           BIGINT          NOT NULL GENERATED ALWAYS AS IDENTITY,
    member_id               VARCHAR(50)     NOT NULL,
    rendering_npi_id        CHAR(10)        NOT NULL,
    service_date            DATE            NOT NULL,
    face_to_face_flag       BOOLEAN         NOT NULL DEFAULT FALSE,
    diagnosis_code          VARCHAR(10)     NOT NULL,
    diagnosis_code_type     CHAR(5)         NOT NULL DEFAULT 'ICD10',
    data_source             VARCHAR(50)     NOT NULL,  -- 'CLAIMS','ENCOUNTER','CHART'
    submission_year         SMALLINT        NOT NULL,
    loaded_datetime         TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT pk_stg_ra_encounters PRIMARY KEY (encounter_key)
);

CREATE INDEX idx_stg_ra_member_year ON stg_risk_adjustment_encounters (member_id, submission_year);
CREATE INDEX idx_stg_ra_diagnosis   ON stg_risk_adjustment_encounters (diagnosis_code);
-- CMS-HCC mapping reference (loaded from CMS software release)
CREATE TABLE ref_icd10_to_hcc (
    icd10_code      VARCHAR(10)  NOT NULL,
    hcc_number      SMALLINT     NOT NULL,
    hcc_label       VARCHAR(200) NOT NULL,
    model_version   VARCHAR(10)  NOT NULL,  -- 'V28', 'V24'
    effective_year  SMALLINT     NOT NULL,

    CONSTRAINT pk_icd10_hcc PRIMARY KEY (icd10_code, hcc_number, model_version, effective_year)
);

-- HCC coefficient table (CMS model coefficients by segment)
CREATE TABLE ref_hcc_coefficients (
    model_version   VARCHAR(10)     NOT NULL,
    payment_year    SMALLINT        NOT NULL,
    hcc_number      SMALLINT        NOT NULL,
    segment         VARCHAR(50)     NOT NULL,  -- 'CNA','ESRD','NE','SNPNE'
    coefficient     DECIMAL(10, 6)  NOT NULL,

    CONSTRAINT pk_hcc_coefficients PRIMARY KEY (model_version, payment_year, hcc_number, segment)
);
-- Member HCC summary — one row per member per HCC per year
CREATE TABLE fct_member_hcc (
    member_hcc_key      BIGINT    NOT NULL GENERATED ALWAYS AS IDENTITY,
    member_id           VARCHAR(50) NOT NULL,
    hcc_number          SMALLINT    NOT NULL,
    submission_year     SMALLINT    NOT NULL,
    encounter_count     SMALLINT    NOT NULL DEFAULT 1,
    first_service_date  DATE        NOT NULL,
    last_service_date   DATE        NOT NULL,

    CONSTRAINT pk_member_hcc PRIMARY KEY (member_hcc_key),
    CONSTRAINT uq_member_hcc_year UNIQUE (member_id, hcc_number, submission_year)
);

-- Member RAF score — final output
CREATE TABLE fct_member_raf_score (
    member_id           VARCHAR(50)     NOT NULL,
    payment_year        SMALLINT        NOT NULL,
    segment             VARCHAR(50)     NOT NULL,
    demographic_score   DECIMAL(10, 6)  NOT NULL,
    disease_score       DECIMAL(10, 6)  NOT NULL,
    interaction_score   DECIMAL(10, 6)  NOT NULL DEFAULT 0,
    total_raf_score     DECIMAL(10, 6)  NOT NULL,
    hcc_count           SMALLINT        NOT NULL,
    calculated_datetime TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT pk_member_raf PRIMARY KEY (member_id, payment_year, segment)
);

Stage 1: Diagnosis Code Ingestion and Validation

The first pipeline stage ingests diagnosis codes from multiple sources — claims (837P/837I), encounter records, and chart reviews — and applies validation rules.

-- Load validated encounters only
-- Face-to-face flag must be TRUE for risk adjustment eligibility
INSERT INTO stg_risk_adjustment_encounters (
    member_id, rendering_npi_id, service_date, face_to_face_flag,
    diagnosis_code, data_source, submission_year
)
SELECT
    s.member_id,
    s.rendering_npi_id,
    s.service_date,
    s.face_to_face_flag,
    s.diagnosis_code,
    s.data_source,
    EXTRACT(YEAR FROM s.service_date)::SMALLINT
FROM stg_raw_encounters s
INNER JOIN ref_icd10_codes r
    ON s.diagnosis_code = r.icd10_code
   AND r.effective_year = EXTRACT(YEAR FROM s.service_date)::SMALLINT
WHERE s.face_to_face_flag = TRUE
  AND s.service_date >= DATE_TRUNC('year', CURRENT_DATE - INTERVAL '2 years');

-- Audit: how many codes were rejected?
SELECT
    s.diagnosis_code,
    COUNT(*) AS rejected_count
FROM stg_raw_encounters s
LEFT JOIN ref_icd10_codes r
    ON s.diagnosis_code = r.icd10_code
   AND r.effective_year = EXTRACT(YEAR FROM s.service_date)::SMALLINT
WHERE r.icd10_code IS NULL
GROUP BY s.diagnosis_code
ORDER BY rejected_count DESC;

Stage 2: ICD-10 to HCC Mapping

Map valid diagnosis codes to HCC categories using the CMS crosswalk:

-- Deduplicate to one row per member per HCC per year
-- A member gets credit for an HCC once regardless of how many times the code appears
INSERT INTO fct_member_hcc (
    member_id, hcc_number, submission_year,
    encounter_count, first_service_date, last_service_date
)
SELECT
    e.member_id,
    h.hcc_number,
    e.submission_year,
    COUNT(DISTINCT e.service_date) AS encounter_count,
    MIN(e.service_date)            AS first_service_date,
    MAX(e.service_date)            AS last_service_date
FROM stg_risk_adjustment_encounters e
JOIN ref_icd10_to_hcc h
    ON e.diagnosis_code = h.icd10_code
   AND h.model_version = 'V28'
   AND h.effective_year = e.submission_year
GROUP BY e.member_id, h.hcc_number, e.submission_year
ON CONFLICT (member_id, hcc_number, submission_year)
DO UPDATE SET
    encounter_count   = EXCLUDED.encounter_count,
    last_service_date = EXCLUDED.last_service_date;

Stage 3: HCC Hierarchy Application

The CMS-HCC model uses hierarchies — when a more severe HCC in a category is present, less severe HCCs in the same category are suppressed. For example, HCC 19 (Diabetes without Complication) is suppressed if HCC 17 or HCC 18 is present.

-- HCC hierarchy suppression table
CREATE TABLE ref_hcc_hierarchy (
    model_version   VARCHAR(10) NOT NULL,
    higher_hcc      SMALLINT    NOT NULL,  -- if present...
    lower_hcc       SMALLINT    NOT NULL,  -- ...suppress this one
    CONSTRAINT pk_hcc_hierarchy PRIMARY KEY (model_version, higher_hcc, lower_hcc)
);

-- Identify suppressed HCCs for each member
CREATE OR REPLACE VIEW vw_member_hcc_after_hierarchy AS
SELECT
    m.*,
    EXISTS (
        SELECT 1
        FROM ref_hcc_hierarchy h
        JOIN fct_member_hcc m2
            ON h.higher_hcc = m2.hcc_number
           AND m2.member_id = m.member_id
           AND m2.submission_year = m.submission_year
        WHERE h.lower_hcc = m.hcc_number
          AND h.model_version = 'V28'
    ) AS is_suppressed_flag
FROM fct_member_hcc m;

Stage 4: RAF Score Calculation

-- Sum coefficients for non-suppressed HCCs to get disease score
WITH disease_scores AS (
    SELECT
        m.member_id,
        m.submission_year,
        SUM(c.coefficient) AS disease_score,
        COUNT(*)           AS hcc_count
    FROM vw_member_hcc_after_hierarchy m
    JOIN ref_hcc_coefficients c
        ON m.hcc_number = c.hcc_number
       AND c.model_version = 'V28'
       AND c.payment_year = m.submission_year + 1
       AND c.segment = 'CNA'
    WHERE m.is_suppressed_flag = FALSE
    GROUP BY m.member_id, m.submission_year
)
INSERT INTO fct_member_raf_score (
    member_id, payment_year, segment,
    demographic_score, disease_score, interaction_score, total_raf_score, hcc_count
)
SELECT
    d.member_id,
    d.submission_year + 1       AS payment_year,
    'CNA'                       AS segment,
    dem.total_demographic_score AS demographic_score,
    d.disease_score,
    0                           AS interaction_score,
    dem.total_demographic_score + d.disease_score AS total_raf_score,
    d.hcc_count
FROM disease_scores d
JOIN fct_member_demographic_score dem
    ON d.member_id = dem.member_id
   AND dem.payment_year = d.submission_year + 1
ON CONFLICT (member_id, payment_year, segment)
DO UPDATE SET
    disease_score       = EXCLUDED.disease_score,
    total_raf_score     = EXCLUDED.total_raf_score,
    hcc_count           = EXCLUDED.hcc_count,
    calculated_datetime = CURRENT_TIMESTAMP;

Common Modeling Mistakes That Undercount Risk

Not filtering for face-to-face encounters. CMS requires diagnosis codes to come from face-to-face encounters with eligible provider types. Lab-only visits, ancillary services, and telehealth from non-eligible providers do not count. A pipeline that ingests all encounters without this filter loads diagnosis codes that CMS will reject at submission.

Using claim received dates instead of service dates. Late claims can arrive in the following year. A service date of December 2025 on a claim loaded in February 2026 should count toward the 2025 submission year. Always use service_date, never claim_received_date.

Missing the hierarchy application step. If your pipeline sums coefficients for all HCCs without applying hierarchy suppression, you will overcount RAF scores in your internal projections — and your numbers will not match CMS's calculation.

Not deduplicating across data sources. The same encounter may appear in claims, an encounter feed, and a chart review extract. Without deduplication at the member-HCC-year level, you may count the same HCC multiple times internally while CMS counts it once.

Dropping diagnosis codes with invalid NPI. If your validation pipeline rejects encounters with unvalidated NPIs before the diagnosis code is mapped to an HCC, you lose risk score contributions from those encounters. Validate NPIs separately — do not drop the diagnosis code because the provider record has a formatting issue.


Frequently Asked Questions

How far back does CMS look at diagnosis codes?

For the standard prospective model, CMS uses diagnoses from the prior calendar year. Diagnosis codes from 2025 determine risk scores for payment year 2026. The RAPS and EDPS submission windows open each year — check the CMS RADV schedule for current deadlines.

What is the difference between RAPS and EDPS submissions?

RAPS (legacy) accepts diagnosis codes extracted from claims. EDPS (current standard) accepts full encounter records — demographics, procedure codes, diagnosis codes, and provider information. Most plans submit via both pathways. Your data model should support both submission formats.

How do we handle mid-year member additions to the RAF model?

CMS applies a partial-year factor for members who join a plan mid-year. The RAF score for those members is prorated. Your fct_member_raf_score table should store both the full RAF score and the enrollment months count used by CMS for the prorated calculation.


Operationalizing HCC Risk Adjustment with mdatool

For payer data teams building and maintaining HCC risk adjustment pipelines, mdatool provides tooling at every stage. The mdatool HCC Calculator validates individual member risk scores — enter demographics and HCC codes to verify your RAF calculation against the CMS-HCC model coefficients without a full pipeline run. The mdatool ICD-10 Search lets your coding and engineering teams look up ICD-10 codes, confirm their HCC mapping, and verify active status before diagnosis codes enter your submission pipeline. The mdatool Naming Auditor enforces the column naming conventions — submission_year, hcc_number, total_raf_score — that make risk adjustment schemas consistent across your data team. The mdatool Healthcare Data Dictionary contains HCC, RAF, CMS-HCC, RAPS, and EDPS terminology so your engineering, actuarial, and compliance teams share a common vocabulary.

M

mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.

Ready to improve your data architecture?

Free tools for DDL conversion, SQL analysis, naming standards, and more.

Get Started Free