Why ICD-11 Matters for Your Data Model

The World Health Organization officially released ICD-11 in 2019, and countries are actively migrating — with the United States expected to follow after ICD-10-CM is fully stabilized. For healthcare data engineers, this is not a business problem — it is a schema problem. ICD-11 changes code length, introduces new structural concepts, and forces a rethinking of how diagnosis data is stored, indexed, and queried.

This article breaks down exactly what changes, why it matters for your data model, and how to prepare now.

What Is ICD-10 and What Are Its Limits?

ICD-10-CM (the US clinical modification) uses alphanumeric codes up to 7 characters in the format A00.0XXA. Your current schema almost certainly stores these as VARCHAR(7) or CHAR(7).

Limitations that ICD-11 directly addresses:

Fixed hierarchy: ICD-10 has a rigid parent-child tree that cannot express modern disease understanding (e.g., genomic conditions, post-COVID sequelae)
Limited granularity: Many conditions share a single code with no way to encode severity, etiology, or manifestation separately
No extension mechanism: Clinicians have to memorize workarounds for conditions the hierarchy cannot represent

How ICD-11 Changes the Code Structure

Code Format

ICD-11 uses a 4-character alphanumeric stem code followed by optional extension codes. The full expression can reach 16 characters or more when extension codes are concatenated.

Property	ICD-10-CM	ICD-11
Max stem code length	7 chars	4 chars
Extension codes	None	Yes (unlimited chaining)
Full code max length	7 chars	16+ chars
Character set	Alpha + numeric	Alpha + numeric (no I, O)
Hierarchy depth	Fixed	Flexible (multiple parents)
Post-coordination	No	Yes

Impact on your schema: Any column storing a diagnosis code as VARCHAR(7) or CHAR(7) will truncate ICD-11 codes. You need to plan for at least VARCHAR(20) on any column that will hold ICD-11 post-coordinated expressions.

Post-Coordination

This is the biggest structural change. ICD-11 allows post-coordination — combining a stem code with extension codes to express a fully specified clinical meaning.

Example: A traumatic brain injury in a specific anatomical location, with a specific severity, caused by a specific mechanism, would previously require multiple separate codes. In ICD-11, this becomes one post-coordinated expression:

NA07.0 / XA5Q57 / XS0G / XE1XR

Storing this in a single VARCHAR(7) column is impossible. You need a redesigned schema.

Schema Changes You Will Need to Make

Option 1: Widen the Column (Minimum Change)

The simplest migration — safe for storing codes, not for querying sub-components:

ALTER TABLE encounters
  ALTER COLUMN diagnosis_code TYPE VARCHAR(20);

ALTER TABLE claims
  ALTER COLUMN primary_diagnosis TYPE VARCHAR(20),
  ALTER COLUMN secondary_diagnosis_1 TYPE VARCHAR(20),
  ALTER COLUMN secondary_diagnosis_2 TYPE VARCHAR(20);

Use the mdatool DDL Converter to quickly rewrite your CREATE TABLE statements across dialects (Snowflake, Redshift, BigQuery, SQL Server) when applying this change.

🔄DDL Converter

Convert DDL instantly between Snowflake, BigQuery, Databricks, and PostgreSQL.

Try it free

Option 2: Separate Stem and Extension (Recommended)

For teams doing clinical analysis, store the stem and extensions separately:

CREATE TABLE encounter_diagnoses (
  encounter_id       VARCHAR(36)  NOT NULL,
  diagnosis_seq      SMALLINT     NOT NULL,       -- 1=primary, 2+=secondary
  code_version       VARCHAR(10)  NOT NULL,       -- 'ICD-10-CM' | 'ICD-11'
  stem_code          VARCHAR(10)  NOT NULL,
  extension_codes    VARCHAR(200),                -- pipe-delimited or JSON
  full_expression    VARCHAR(500),                -- full post-coordinated string
  code_desc   VARCHAR(500),
  present_on_admit   CHAR(1),
  PRIMARY KEY (encounter_id, diagnosis_seq)
);

Option 3: JSON Column for Extension Codes

If your warehouse supports semi-structured data natively (Snowflake VARIANT, BigQuery JSON, Redshift SUPER):

CREATE TABLE encounter_diagnoses (
  encounter_id   VARCHAR(36)   NOT NULL,
  diagnosis_seq  SMALLINT      NOT NULL,
  code_version   VARCHAR(10)   NOT NULL,
  stem_code      VARCHAR(10)   NOT NULL,
  extensions     VARIANT,                   -- Snowflake; use JSON in BigQuery
  PRIMARY KEY (encounter_id, diagnosis_seq)
);

ICD-11 and Your Existing Lookups

Your current codebase likely has an icd10_codes reference table or joins to an external ICD-10 lookup. That reference table needs to be version-aware for a dual-code-version world.

CREATE TABLE diagnosis_reference (
  code_version     VARCHAR(10)   NOT NULL,   -- 'ICD-10-CM' | 'ICD-11'
  code             VARCHAR(20)   NOT NULL,
  description      VARCHAR(500)  NOT NULL,
  chapter          VARCHAR(200),
  block            VARCHAR(200),
  is_billable      BOOLEAN       DEFAULT TRUE,
  effective_dt   DATE,
  end_dt         DATE,
  PRIMARY KEY (code_version, code)
);

Use the mdatool ICD-10 Search tool to look up current ICD-10-CM codes while you plan your migration — and cross-reference which ICD-10 codes map to which ICD-11 stems using the WHO's official crosswalk tables.

Tired of legacy complexity and high pricing?

mdatool offers instant DDL conversion, HL7 support, and AI-driven data modeling for a fraction of the cost of ER/Studio or ERwin.

Try mdatool for Free

ICD-11 and HCC Risk Adjustment

🧮HCC Calculator

Calculate RAF scores and estimate risk adjustment payments for Medicare Advantage members.

Try it free

ICD codes feed directly into HCC (Hierarchical Condition Category) risk scores. If you use HCC-based risk adjustment models, you will need to update your HCC mapping logic alongside your schema migration — ICD-11 stem codes do not map 1:1 to current HCC categories.

Use the mdatool HCC Calculator to validate current ICD-10-based HCC assignments while you document which HCC mappings will need to change under ICD-11.

While ERwin requires a complex setup for schema generation, you can generate clean DDL in seconds using our free converter.

Convert your first 5 DDLs — No Credit Card Required

Naming Your New Columns Consistently

When you add code_version, stem_code, and extension_codes columns, use consistent naming across all tables. The mdatool Naming Auditor can scan your DDL and flag columns that deviate from your naming convention before they reach production.

Transition Planning: Living With Both Versions

The US healthcare system will operate with ICD-10-CM and ICD-11 in parallel for years. Your pipeline needs to handle both. Key principles:

Always store code_version alongside every diagnosis code — never assume the version
Index on (code_version, stem_code) for efficient lookups
Never hardcode version-specific logic in application code — move it to reference tables
Validate at ingestion, not at query time — bad codes are cheaper to reject early

Summary

Action	Priority
Widen all diagnosis code columns to VARCHAR(20)	High — do now
Add code_version column to all diagnosis tables	High — do now
Redesign lookup tables to be version-aware	Medium
Plan stem + extension separation for clinical use cases	Medium
Update HCC mapping logic for ICD-11 stems	Low — monitor WHO timeline

ICD-11 migration is not imminent for US payers, but the schema changes needed are significant enough that starting to plan now — before the regulatory mandate — is the right call.

Explore mdatool's Data Dictionary for definitions of HCC, ICD, DRG, and other healthcare coding terms your team will need during this migration.

ICD-10 vs ICD-11: What Changes for Your Data Model