Introduction
For a Medicare Advantage plan with 100,000 members, a 0.01 difference in average RAF score across the book translates to roughly $4 million in annual revenue. The RAF score is computed from the HCC risk adjustment data model — specifically from the diagnosis codes your data pipeline ingests, validates, maps to Hierarchical Condition Categories, and submits to CMS.
A modeling error that drops 3% of valid diagnosis codes does not produce an error message. The pipeline runs clean, the submission goes out, and the plan receives significantly less than it should. You find out twelve months later during reconciliation — if at all.
This guide covers the CMS-HCC data model, the pipeline architecture that produces accurate RAF scores, the SQL patterns for each stage, and the common modeling mistakes that silently undercount risk.
How CMS-HCC Risk Adjustment Works
CMS risk-adjusts Medicare Advantage capitation payments to account for member health status. The process:
- CMS assigns a base rate (benchmark) for each county
- Each member's RAF (Risk Adjustment Factor) score is calculated from demographics and diagnosis history
- The plan's capitation payment = benchmark × average RAF score across members
- A RAF score of 1.0 means average expected cost. 1.5 means 50% above average.
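The payment formula above can be made concrete with a small illustration. The benchmark figure below is hypothetical, not an actual CMS county rate:

```sql
-- Hypothetical: a county benchmark of $900/month and a member with
-- RAF 1.50 (50% above average expected cost).
SELECT 900 * 1.50 AS monthly_capitation_payment;  -- $1,350/month
```

Scaled across a book of business, the plan's revenue moves linearly with the average RAF score, which is why small per-member scoring errors compound into material dollar amounts.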
The RAF score has two components:
Demographic score: based on age, sex, dual eligibility status, and Medicaid eligibility. Computed from enrollment data.
Disease score: based on HCC codes mapped from submitted diagnosis codes. Computed from encounter and claims data.
The CMS-HCC model (Version 28 as of 2024) maps roughly 7,800 ICD-10-CM diagnosis codes into 115 payment HCC categories (V24 mapped roughly 9,700 codes into 86 HCCs). Each HCC has a coefficient that contributes to the RAF score. HCCs interact — comorbidity adjustments apply when certain HCC pairs occur together.
The Core Data Model
The HCC risk adjustment pipeline requires these key tables:
-- Diagnosis encounter staging
CREATE TABLE stg_risk_adjustment_encounters (
encounter_key BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
member_id VARCHAR(50) NOT NULL,
rendering_npi_id CHAR(10) NOT NULL,
service_date DATE NOT NULL,
face_to_face_flag BOOLEAN NOT NULL DEFAULT FALSE,
diagnosis_code VARCHAR(10) NOT NULL,
diagnosis_code_type CHAR(5) NOT NULL DEFAULT 'ICD10',
data_source VARCHAR(50) NOT NULL, -- 'CLAIMS','ENCOUNTER','CHART'
submission_year SMALLINT NOT NULL,
loaded_datetime TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT pk_stg_ra_encounters PRIMARY KEY (encounter_key)
);
CREATE INDEX idx_stg_ra_member_year ON stg_risk_adjustment_encounters (member_id, submission_year);
CREATE INDEX idx_stg_ra_diagnosis ON stg_risk_adjustment_encounters (diagnosis_code);
-- CMS-HCC mapping reference (loaded from CMS software release)
CREATE TABLE ref_icd10_to_hcc (
icd10_code VARCHAR(10) NOT NULL,
hcc_number SMALLINT NOT NULL,
hcc_label VARCHAR(200) NOT NULL,
model_version VARCHAR(10) NOT NULL, -- 'V28', 'V24'
effective_year SMALLINT NOT NULL,
CONSTRAINT pk_icd10_hcc PRIMARY KEY (icd10_code, hcc_number, model_version, effective_year)
);
-- HCC coefficient table (CMS model coefficients by segment)
CREATE TABLE ref_hcc_coefficients (
model_version VARCHAR(10) NOT NULL,
payment_year SMALLINT NOT NULL,
hcc_number SMALLINT NOT NULL,
segment VARCHAR(50) NOT NULL, -- 'CNA','ESRD','NE','SNPNE'
coefficient DECIMAL(10, 6) NOT NULL,
CONSTRAINT pk_hcc_coefficients PRIMARY KEY (model_version, payment_year, hcc_number, segment)
);
-- Member HCC summary — one row per member per HCC per year
CREATE TABLE fct_member_hcc (
member_hcc_key BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
member_id VARCHAR(50) NOT NULL,
hcc_number SMALLINT NOT NULL,
submission_year SMALLINT NOT NULL,
encounter_count SMALLINT NOT NULL DEFAULT 1,
first_service_date DATE NOT NULL,
last_service_date DATE NOT NULL,
CONSTRAINT pk_member_hcc PRIMARY KEY (member_hcc_key),
CONSTRAINT uq_member_hcc_year UNIQUE (member_id, hcc_number, submission_year)
);
-- Member RAF score — final output
CREATE TABLE fct_member_raf_score (
member_id VARCHAR(50) NOT NULL,
payment_year SMALLINT NOT NULL,
segment VARCHAR(50) NOT NULL,
demographic_score DECIMAL(10, 6) NOT NULL,
disease_score DECIMAL(10, 6) NOT NULL,
interaction_score DECIMAL(10, 6) NOT NULL DEFAULT 0,
total_raf_score DECIMAL(10, 6) NOT NULL,
hcc_count SMALLINT NOT NULL,
calculated_datetime TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT pk_member_raf PRIMARY KEY (member_id, payment_year, segment)
);
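The comorbidity interactions mentioned earlier need one more reference table. The table and column names below are a sketch for illustration — the actual interaction terms are defined per model version in the CMS software release, not by this schema:

```sql
-- Sketch: interaction coefficients keyed by the pair of HCCs that must
-- both be present for the adjustment to apply. Label values are
-- illustrative, not CMS identifiers.
CREATE TABLE ref_hcc_interactions (
    model_version VARCHAR(10) NOT NULL,
    payment_year SMALLINT NOT NULL,
    interaction_label VARCHAR(100) NOT NULL, -- e.g. 'DIABETES_HF'
    hcc_number_a SMALLINT NOT NULL,
    hcc_number_b SMALLINT NOT NULL,
    coefficient DECIMAL(10, 6) NOT NULL,
    CONSTRAINT pk_hcc_interactions
        PRIMARY KEY (model_version, payment_year, interaction_label)
);
```

With this in place, the interaction_score column in fct_member_raf_score has a concrete source instead of defaulting to zero.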
Stage 1: Diagnosis Code Ingestion and Validation
The first pipeline stage ingests diagnosis codes from multiple sources — claims (837P/837I), encounter records, and chart reviews — and applies validation rules.
-- Load validated encounters only
-- Face-to-face flag must be TRUE for risk adjustment eligibility
INSERT INTO stg_risk_adjustment_encounters (
member_id, rendering_npi_id, service_date, face_to_face_flag,
diagnosis_code, data_source, submission_year
)
SELECT
s.member_id,
s.rendering_npi_id,
s.service_date,
s.face_to_face_flag,
s.diagnosis_code,
s.data_source,
EXTRACT(YEAR FROM s.service_date)::SMALLINT
FROM stg_raw_encounters s
INNER JOIN ref_icd10_codes r
ON s.diagnosis_code = r.icd10_code
AND r.effective_year = EXTRACT(YEAR FROM s.service_date)::SMALLINT
WHERE s.face_to_face_flag = TRUE
AND s.service_date >= DATE_TRUNC('year', CURRENT_DATE - INTERVAL '2 years');
-- Audit: how many codes were rejected?
SELECT
s.diagnosis_code,
COUNT(*) AS rejected_count
FROM stg_raw_encounters s
LEFT JOIN ref_icd10_codes r
ON s.diagnosis_code = r.icd10_code
AND r.effective_year = EXTRACT(YEAR FROM s.service_date)::SMALLINT
WHERE r.icd10_code IS NULL
GROUP BY s.diagnosis_code
ORDER BY rejected_count DESC;
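The audit query is more useful with a threshold alert attached. The 1% figure below is an arbitrary example to tune against your own baseline, not a CMS rule:

```sql
-- Flag loads where more than 1% of raw diagnosis rows failed ICD-10
-- validation (threshold is illustrative).
SELECT
    COUNT(*) AS total_rows,
    COUNT(*) FILTER (WHERE r.icd10_code IS NULL) AS rejected_rows,
    ROUND(100.0 * COUNT(*) FILTER (WHERE r.icd10_code IS NULL)
          / NULLIF(COUNT(*), 0), 2) AS rejected_pct,
    COUNT(*) FILTER (WHERE r.icd10_code IS NULL)
        > COUNT(*) * 0.01 AS alert_flag
FROM stg_raw_encounters s
LEFT JOIN ref_icd10_codes r
    ON s.diagnosis_code = r.icd10_code
    AND r.effective_year = EXTRACT(YEAR FROM s.service_date)::SMALLINT;
```

Wiring alert_flag into your orchestrator turns a silent 3% drop into a failed pipeline run, which is the failure mode you want.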
Stage 2: ICD-10 to HCC Mapping
Map valid diagnosis codes to HCC categories using the CMS crosswalk:
-- Deduplicate to one row per member per HCC per year
-- A member gets credit for an HCC once regardless of how many times the code appears
INSERT INTO fct_member_hcc (
member_id, hcc_number, submission_year,
encounter_count, first_service_date, last_service_date
)
SELECT
e.member_id,
h.hcc_number,
e.submission_year,
COUNT(DISTINCT e.service_date) AS encounter_count,
MIN(e.service_date) AS first_service_date,
MAX(e.service_date) AS last_service_date
FROM stg_risk_adjustment_encounters e
JOIN ref_icd10_to_hcc h
ON e.diagnosis_code = h.icd10_code
AND h.model_version = 'V28'
AND h.effective_year = e.submission_year
GROUP BY e.member_id, h.hcc_number, e.submission_year
ON CONFLICT (member_id, hcc_number, submission_year)
DO UPDATE SET
encounter_count = EXCLUDED.encounter_count,
first_service_date = LEAST(fct_member_hcc.first_service_date, EXCLUDED.first_service_date),
last_service_date = GREATEST(fct_member_hcc.last_service_date, EXCLUDED.last_service_date);
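A useful companion check after the mapping load: how many validated diagnosis rows map to no HCC at all. Most ICD-10 codes legitimately map to nothing, but a sudden jump in the unmapped share can signal a stale crosswalk for the new effective year:

```sql
-- Share of validated encounter rows whose diagnosis code carries no
-- HCC mapping under the active model version.
SELECT
    e.submission_year,
    COUNT(*) AS total_rows,
    COUNT(*) FILTER (WHERE h.icd10_code IS NULL) AS unmapped_rows
FROM stg_risk_adjustment_encounters e
LEFT JOIN ref_icd10_to_hcc h
    ON e.diagnosis_code = h.icd10_code
    AND h.model_version = 'V28'
    AND h.effective_year = e.submission_year
GROUP BY e.submission_year
ORDER BY e.submission_year;
```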
Stage 3: HCC Hierarchy Application
The CMS-HCC model uses hierarchies — when a more severe HCC in a disease group is present, less severe HCCs in the same group are suppressed. In the V24 model, for example, HCC 19 (Diabetes without Complication) is suppressed if HCC 17 (Diabetes with Acute Complications) or HCC 18 (Diabetes with Chronic Complications) is present; V28 renumbers the diabetes HCCs but applies the same suppression logic.
-- HCC hierarchy suppression table
CREATE TABLE ref_hcc_hierarchy (
model_version VARCHAR(10) NOT NULL,
higher_hcc SMALLINT NOT NULL, -- if present...
lower_hcc SMALLINT NOT NULL, -- ...suppress this one
CONSTRAINT pk_hcc_hierarchy PRIMARY KEY (model_version, higher_hcc, lower_hcc)
);
-- Identify suppressed HCCs for each member
CREATE OR REPLACE VIEW vw_member_hcc_after_hierarchy AS
SELECT
m.*,
EXISTS (
SELECT 1
FROM ref_hcc_hierarchy h
JOIN fct_member_hcc m2
ON h.higher_hcc = m2.hcc_number
AND m2.member_id = m.member_id
AND m2.submission_year = m.submission_year
WHERE h.lower_hcc = m.hcc_number
AND h.model_version = 'V28'
) AS is_suppressed_flag
FROM fct_member_hcc m;
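A quick sanity check on the view: if no HCCs are ever suppressed, ref_hcc_hierarchy is probably empty or mis-loaded for the active model version.

```sql
-- Suppression rate by year. A realistic book of business should show a
-- nonzero suppressed share once the hierarchy table is populated.
SELECT
    submission_year,
    COUNT(*) AS total_member_hccs,
    COUNT(*) FILTER (WHERE is_suppressed_flag) AS suppressed_member_hccs
FROM vw_member_hcc_after_hierarchy
GROUP BY submission_year
ORDER BY submission_year;
```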
Stage 4: RAF Score Calculation
-- Sum coefficients for non-suppressed HCCs to get disease score
WITH disease_scores AS (
SELECT
m.member_id,
m.submission_year,
SUM(c.coefficient) AS disease_score,
COUNT(*) AS hcc_count
FROM vw_member_hcc_after_hierarchy m
JOIN ref_hcc_coefficients c
ON m.hcc_number = c.hcc_number
AND c.model_version = 'V28'
AND c.payment_year = m.submission_year + 1
AND c.segment = 'CNA'
WHERE m.is_suppressed_flag = FALSE
GROUP BY m.member_id, m.submission_year
)
INSERT INTO fct_member_raf_score (
member_id, payment_year, segment,
demographic_score, disease_score, interaction_score, total_raf_score, hcc_count
)
SELECT
d.member_id,
d.submission_year + 1 AS payment_year,
'CNA' AS segment,
dem.total_demographic_score AS demographic_score,
d.disease_score,
0 AS interaction_score,
dem.total_demographic_score + d.disease_score AS total_raf_score,
d.hcc_count
FROM disease_scores d
JOIN fct_member_demographic_score dem
ON d.member_id = dem.member_id
AND dem.payment_year = d.submission_year + 1
ON CONFLICT (member_id, payment_year, segment)
DO UPDATE SET
disease_score = EXCLUDED.disease_score,
total_raf_score = EXCLUDED.total_raf_score,
hcc_count = EXCLUDED.hcc_count,
calculated_datetime = CURRENT_TIMESTAMP;
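Once fct_member_raf_score is loaded, the book-level average that drives capitation revenue (per the introduction) is a one-line aggregate. Comparing it run-over-run is the cheapest early-warning signal for silent undercounting:

```sql
-- Plan-average RAF by payment year for the community segment.
SELECT
    payment_year,
    COUNT(*) AS member_count,
    AVG(total_raf_score) AS avg_raf_score,
    AVG(hcc_count) AS avg_hcc_count
FROM fct_member_raf_score
WHERE segment = 'CNA'
GROUP BY payment_year
ORDER BY payment_year;
```

An unexplained drop in avg_raf_score or avg_hcc_count between pipeline runs is worth investigating before submission, not at reconciliation.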
Common Modeling Mistakes That Undercount Risk
Not filtering for face-to-face encounters. CMS requires diagnosis codes to come from face-to-face encounters with eligible provider types. Lab-only visits, ancillary services, and telehealth from non-eligible providers do not count. A pipeline that ingests all encounters without this filter loads diagnosis codes that CMS will reject at submission.
Using claim received dates instead of service dates. Late claims can arrive in the following year. A service date of December 2025 on a claim loaded in February 2026 should count toward the 2025 submission year. Always use service_date, never claim_received_date.
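One way to quantify exposure to this mistake — the claim_received_date column below is hypothetical, standing in for whatever arrival timestamp your raw feed carries:

```sql
-- Rows where the claim arrived in a later calendar year than the
-- service date. Attributing these by received date would push them
-- into the wrong submission year.
SELECT
    EXTRACT(YEAR FROM s.service_date)::SMALLINT AS service_year,
    COUNT(*) AS late_arriving_rows
FROM stg_raw_encounters s
WHERE EXTRACT(YEAR FROM s.claim_received_date)
    > EXTRACT(YEAR FROM s.service_date)
GROUP BY 1
ORDER BY 1;
```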
Missing the hierarchy application step. If your pipeline sums coefficients for all HCCs without applying hierarchy suppression, you will overcount RAF scores in your internal projections — and your numbers will not match CMS's calculation.
Not deduplicating across data sources. The same encounter may appear in claims, an encounter feed, and a chart review extract. Without deduplication at the member-HCC-year level, you may count the same HCC multiple times internally while CMS counts it once.
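The Stage 2 GROUP BY already collapses duplicates to one member-HCC-year row; this audit shows how much duplication it is absorbing across sources, which is useful for valuing chart review programs:

```sql
-- Member-HCC-year combinations arriving from more than one data source
-- (claims, encounter feed, chart review).
SELECT
    e.member_id,
    h.hcc_number,
    e.submission_year,
    COUNT(DISTINCT e.data_source) AS source_count
FROM stg_risk_adjustment_encounters e
JOIN ref_icd10_to_hcc h
    ON e.diagnosis_code = h.icd10_code
    AND h.model_version = 'V28'
    AND h.effective_year = e.submission_year
GROUP BY e.member_id, h.hcc_number, e.submission_year
HAVING COUNT(DISTINCT e.data_source) > 1;
```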
Dropping diagnosis codes with invalid NPI. If your validation pipeline rejects encounters with unvalidated NPIs before the diagnosis code is mapped to an HCC, you lose risk score contributions from those encounters. Validate NPIs separately — do not drop the diagnosis code because the provider record has a formatting issue.
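A sketch of the flag-don't-drop pattern, assuming an is_npi_valid column added to the staging table (hypothetical — not in the DDL above). The regex checks only the 10-digit NPI format, not the check digit or NPPES registration:

```sql
-- Flag rows with a malformed NPI instead of excluding them: the
-- diagnosis still flows to HCC mapping while the provider record is
-- queued for remediation.
UPDATE stg_risk_adjustment_encounters
SET is_npi_valid = (rendering_npi_id ~ '^[12][0-9]{9}$')
WHERE is_npi_valid IS NULL;
```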
Frequently Asked Questions
How far back does CMS look at diagnosis codes?
For the standard prospective model, CMS uses diagnoses from the prior calendar year. Diagnosis codes from 2025 determine risk scores for payment year 2026. The RAPS and EDPS submission windows open each year — check the current CMS risk adjustment submission deadline schedule for the interim and final sweep dates.
What is the difference between RAPS and EDPS submissions?
RAPS (legacy) accepts diagnosis codes extracted from claims. EDPS (the current standard) accepts full encounter records — demographics, procedure codes, diagnosis codes, and provider information. CMS has phased RAPS out of payment calculation in favor of encounter data, but many plans still maintain both formats for reconciliation, so your data model should support both.
How do we handle mid-year member additions to the RAF model?
CMS applies a partial-year factor for members who join a plan mid-year. The RAF score for those members is prorated. Your fct_member_raf_score table should store both the full RAF score and the enrollment months count used by CMS for the prorated calculation.
Operationalizing HCC Risk Adjustment with mdatool
For payer data teams building and maintaining HCC risk adjustment pipelines, mdatool provides tooling at every stage. The mdatool HCC Calculator validates individual member risk scores — enter demographics and HCC codes to verify your RAF calculation against the CMS-HCC model coefficients without a full pipeline run. The mdatool ICD-10 Search lets your coding and engineering teams look up ICD-10 codes, confirm their HCC mapping, and verify active status before diagnosis codes enter your submission pipeline. The mdatool Naming Auditor enforces the column naming conventions — submission_year, hcc_number, total_raf_score — that make risk adjustment schemas consistent across your data team. The mdatool Healthcare Data Dictionary contains HCC, RAF, CMS-HCC, RAPS, and EDPS terminology so your engineering, actuarial, and compliance teams share a common vocabulary.
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.
Related Guides
More in Healthcare Data Modeling
NPI Number Validation: How to Clean and Enrich Provider Data in Your Database
Invalid NPI numbers in your provider table silently break claims routing, credentialing workflows, and CMS submissions. This guide covers validation approaches, NPPES enrichment patterns, and SQL queries for catching NPI data quality problems before they reach production.
HEDIS Measure Data Architecture: Building a Compliant Quality Reporting Pipeline
HEDIS measure calculation is deceptively complex. NCQA technical specifications are hundreds of pages long, source data comes from four different systems, and one logic error can affect thousands of members. Here is how to build a pipeline that holds up to NCQA audit.
Prior Authorization Data Modeling: End-to-End Architecture Guide
Prior authorization is one of the most operationally complex workflows in healthcare — and one of the most data-intensive. Here is the end-to-end data model, from PA request through appeal.