BlogData ModelingICD-10 vs ICD-11: What Changes for Your Data Model
Data Modeling

ICD-10 vs ICD-11: What Changes for Your Data Model

ICD-11 is not a minor revision — it restructures the entire classification hierarchy, expands code length, and introduces new data types. Here is what every healthcare data engineer needs to know before their warehouse is forced to migrate.

mdatool Team·April 14, 2026·9 min read
ICD-10ICD-11Data ModelingHealthcare ClaimsSchema Migration

Why ICD-11 Matters for Your Data Model

The World Health Organization officially released ICD-11 in 2019, and countries are actively migrating — with the United States expected to follow after ICD-10-CM is fully stabilized. For healthcare data engineers, this is not a business problem — it is a schema problem. ICD-11 changes code length, introduces new structural concepts, and forces a rethinking of how diagnosis data is stored, indexed, and queried.

This article breaks down exactly what changes, why it matters for your data model, and how to prepare now.

What Is ICD-10 and What Are Its Limits?

ICD-10-CM (the US clinical modification) uses alphanumeric codes up to 7 characters in the format A00.0XXA. Your current schema almost certainly stores these as VARCHAR(7) or CHAR(7).

Limitations that ICD-11 directly addresses:

  • Fixed hierarchy: ICD-10 has a rigid parent-child tree that cannot express modern disease understanding (e.g., genomic conditions, post-COVID sequelae)
  • Limited granularity: Many conditions share a single code with no way to encode severity, etiology, or manifestation separately
  • No extension mechanism: Clinicians have to memorize workarounds for conditions the hierarchy cannot represent

How ICD-11 Changes the Code Structure

Code Format

ICD-11 uses a 4-character alphanumeric stem code followed by optional extension codes. The full expression can reach 16 characters or more when extension codes are concatenated.

PropertyICD-10-CMICD-11
Max stem code length7 chars4 chars
Extension codesNoneYes (unlimited chaining)
Full code max length7 chars16+ chars
Character setAlpha + numericAlpha + numeric (no I, O)
Hierarchy depthFixedFlexible (multiple parents)
Post-coordinationNoYes

Impact on your schema: Any column storing a diagnosis code as VARCHAR(7) or CHAR(7) will truncate ICD-11 codes. You need to plan for at least VARCHAR(20) on any column that will hold ICD-11 post-coordinated expressions.

Post-Coordination

This is the biggest structural change. ICD-11 allows post-coordination — combining a stem code with extension codes to express a fully specified clinical meaning.

Example: A traumatic brain injury in a specific anatomical location, with a specific severity, caused by a specific mechanism, would previously require multiple separate codes. In ICD-11, this becomes one post-coordinated expression:

NA07.0 / XA5Q57 / XS0G / XE1XR

Storing this in a single VARCHAR(7) column is impossible. You need a redesigned schema.

Schema Changes You Will Need to Make

Option 1: Widen the Column (Minimum Change)

The simplest migration — safe for storing codes, not for querying sub-components:

ALTER TABLE encounters
  ALTER COLUMN diagnosis_code TYPE VARCHAR(20);

ALTER TABLE claims
  ALTER COLUMN primary_diagnosis TYPE VARCHAR(20),
  ALTER COLUMN secondary_diagnosis_1 TYPE VARCHAR(20),
  ALTER COLUMN secondary_diagnosis_2 TYPE VARCHAR(20);

Use the mdatool DDL Converter to quickly rewrite your CREATE TABLE statements across dialects (Snowflake, Redshift, BigQuery, SQL Server) when applying this change.

For teams doing clinical analysis, store the stem and extensions separately:

CREATE TABLE encounter_diagnoses (
  encounter_id       VARCHAR(36)  NOT NULL,
  diagnosis_seq      SMALLINT     NOT NULL,       -- 1=primary, 2+=secondary
  code_version       VARCHAR(10)  NOT NULL,       -- 'ICD-10-CM' | 'ICD-11'
  stem_code          VARCHAR(10)  NOT NULL,
  extension_codes    VARCHAR(200),                -- pipe-delimited or JSON
  full_expression    VARCHAR(500),                -- full post-coordinated string
  code_description   VARCHAR(500),
  present_on_admit   CHAR(1),
  PRIMARY KEY (encounter_id, diagnosis_seq)
);

Option 3: JSON Column for Extension Codes

If your warehouse supports semi-structured data natively (Snowflake VARIANT, BigQuery JSON, Redshift SUPER):

CREATE TABLE encounter_diagnoses (
  encounter_id   VARCHAR(36)   NOT NULL,
  diagnosis_seq  SMALLINT      NOT NULL,
  code_version   VARCHAR(10)   NOT NULL,
  stem_code      VARCHAR(10)   NOT NULL,
  extensions     VARIANT,                   -- Snowflake; use JSON in BigQuery
  PRIMARY KEY (encounter_id, diagnosis_seq)
);

ICD-11 and Your Existing Lookups

Your current codebase likely has an icd10_codes reference table or joins to an external ICD-10 lookup. That reference table needs to be version-aware for a dual-code-version world.

CREATE TABLE diagnosis_reference (
  code_version     VARCHAR(10)   NOT NULL,   -- 'ICD-10-CM' | 'ICD-11'
  code             VARCHAR(20)   NOT NULL,
  description      VARCHAR(500)  NOT NULL,
  chapter          VARCHAR(200),
  block            VARCHAR(200),
  is_billable      BOOLEAN       DEFAULT TRUE,
  effective_date   DATE,
  end_date         DATE,
  PRIMARY KEY (code_version, code)
);

Use the mdatool ICD-10 Search tool to look up current ICD-10-CM codes while you plan your migration — and cross-reference which ICD-10 codes map to which ICD-11 stems using the WHO's official crosswalk tables.

ICD-11 and HCC Risk Adjustment

ICD codes feed directly into HCC (Hierarchical Condition Category) risk scores. If you use HCC-based risk adjustment models, you will need to update your HCC mapping logic alongside your schema migration — ICD-11 stem codes do not map 1:1 to current HCC categories.

Use the mdatool HCC Calculator to validate current ICD-10-based HCC assignments while you document which HCC mappings will need to change under ICD-11.

Naming Your New Columns Consistently

When you add code_version, stem_code, and extension_codes columns, use consistent naming across all tables. The mdatool Naming Auditor can scan your DDL and flag columns that deviate from your naming convention before they reach production.

Transition Planning: Living With Both Versions

The US healthcare system will operate with ICD-10-CM and ICD-11 in parallel for years. Your pipeline needs to handle both. Key principles:

  • Always store code_version alongside every diagnosis code — never assume the version
  • Index on (code_version, stem_code) for efficient lookups
  • Never hardcode version-specific logic in application code — move it to reference tables
  • Validate at ingestion, not at query time — bad codes are cheaper to reject early

Summary

ActionPriority
Widen all diagnosis code columns to VARCHAR(20)High — do now
Add code_version column to all diagnosis tablesHigh — do now
Redesign lookup tables to be version-awareMedium
Plan stem + extension separation for clinical use casesMedium
Update HCC mapping logic for ICD-11 stemsLow — monitor WHO timeline

ICD-11 migration is not imminent for US payers, but the schema changes needed are significant enough that starting to plan now — before the regulatory mandate — is the right call.

Explore mdatool's Data Dictionary for definitions of HCC, ICD, DRG, and other healthcare coding terms your team will need during this migration.

M

mdatool Team

The mdatool team builds free tools for healthcare data engineers — DDL converters, SQL linters, naming auditors, and data modeling guides.

Ready to improve your data architecture?

Free tools for DDL conversion, SQL analysis, naming standards, and more.

Get Started Free