Why ICD-11 Matters for Your Data Model
The World Health Organization officially released ICD-11 in 2019, and countries are actively migrating — with the United States expected to follow after ICD-10-CM is fully stabilized. For healthcare data engineers, this is not a business problem — it is a schema problem. ICD-11 changes code length, introduces new structural concepts, and forces a rethinking of how diagnosis data is stored, indexed, and queried.
This article breaks down exactly what changes, why it matters for your data model, and how to prepare now.
What Is ICD-10 and What Are Its Limits?
ICD-10-CM (the US clinical modification) uses alphanumeric codes up to 7 characters in the format A00.0XXA. Your current schema almost certainly stores these as VARCHAR(7) or CHAR(7).
Limitations that ICD-11 directly addresses:
- Fixed hierarchy: ICD-10 has a rigid parent-child tree that cannot express modern disease understanding (e.g., genomic conditions, post-COVID sequelae)
- Limited granularity: Many conditions share a single code with no way to encode severity, etiology, or manifestation separately
- No extension mechanism: Clinicians have to memorize workarounds for conditions the hierarchy cannot represent
How ICD-11 Changes the Code Structure
Code Format
ICD-11 uses a 4-character alphanumeric stem code followed by optional extension codes. The full expression can reach 16 characters or more when extension codes are concatenated.
| Property | ICD-10-CM | ICD-11 |
|---|---|---|
| Max stem code length | 7 chars | 4 chars |
| Extension codes | None | Yes (unlimited chaining) |
| Full code max length | 7 chars | 16+ chars |
| Character set | Alpha + numeric | Alpha + numeric (no I, O) |
| Hierarchy depth | Fixed | Flexible (multiple parents) |
| Post-coordination | No | Yes |
Impact on your schema: Any column storing a diagnosis code as VARCHAR(7) or CHAR(7) will truncate ICD-11 codes. You need to plan for at least VARCHAR(20) on any column that will hold ICD-11 post-coordinated expressions.
Post-Coordination
This is the biggest structural change. ICD-11 allows post-coordination — combining a stem code with extension codes to express a fully specified clinical meaning.
Example: A traumatic brain injury in a specific anatomical location, with a specific severity, caused by a specific mechanism, would previously require multiple separate codes. In ICD-11, this becomes one post-coordinated expression:
NA07.0 / XA5Q57 / XS0G / XE1XR
Storing this in a single VARCHAR(7) column is impossible. You need a redesigned schema.
Schema Changes You Will Need to Make
Option 1: Widen the Column (Minimum Change)
The simplest migration — safe for storing codes, not for querying sub-components:
ALTER TABLE encounters ALTER COLUMN diagnosis_code TYPE VARCHAR(20); ALTER TABLE claims ALTER COLUMN primary_diagnosis TYPE VARCHAR(20), ALTER COLUMN secondary_diagnosis_1 TYPE VARCHAR(20), ALTER COLUMN secondary_diagnosis_2 TYPE VARCHAR(20);
Use the mdatool DDL Converter to quickly rewrite your CREATE TABLE statements across dialects (Snowflake, Redshift, BigQuery, SQL Server) when applying this change.
Option 2: Separate Stem and Extension (Recommended)
For teams doing clinical analysis, store the stem and extensions separately:
CREATE TABLE encounter_diagnoses ( encounter_id VARCHAR(36) NOT NULL, diagnosis_seq SMALLINT NOT NULL, -- 1=primary, 2+=secondary code_version VARCHAR(10) NOT NULL, -- 'ICD-10-CM' | 'ICD-11' stem_code VARCHAR(10) NOT NULL, extension_codes VARCHAR(200), -- pipe-delimited or JSON full_expression VARCHAR(500), -- full post-coordinated string code_description VARCHAR(500), present_on_admit CHAR(1), PRIMARY KEY (encounter_id, diagnosis_seq) );
Option 3: JSON Column for Extension Codes
If your warehouse supports semi-structured data natively (Snowflake VARIANT, BigQuery JSON, Redshift SUPER):
CREATE TABLE encounter_diagnoses ( encounter_id VARCHAR(36) NOT NULL, diagnosis_seq SMALLINT NOT NULL, code_version VARCHAR(10) NOT NULL, stem_code VARCHAR(10) NOT NULL, extensions VARIANT, -- Snowflake; use JSON in BigQuery PRIMARY KEY (encounter_id, diagnosis_seq) );
ICD-11 and Your Existing Lookups
Your current codebase likely has an icd10_codes reference table or joins to an external ICD-10 lookup. That reference table needs to be version-aware for a dual-code-version world.
CREATE TABLE diagnosis_reference ( code_version VARCHAR(10) NOT NULL, -- 'ICD-10-CM' | 'ICD-11' code VARCHAR(20) NOT NULL, description VARCHAR(500) NOT NULL, chapter VARCHAR(200), block VARCHAR(200), is_billable BOOLEAN DEFAULT TRUE, effective_date DATE, end_date DATE, PRIMARY KEY (code_version, code) );
Use the mdatool ICD-10 Search tool to look up current ICD-10-CM codes while you plan your migration — and cross-reference which ICD-10 codes map to which ICD-11 stems using the WHO's official crosswalk tables.
ICD-11 and HCC Risk Adjustment
ICD codes feed directly into HCC (Hierarchical Condition Category) risk scores. If you use HCC-based risk adjustment models, you will need to update your HCC mapping logic alongside your schema migration — ICD-11 stem codes do not map 1:1 to current HCC categories.
Use the mdatool HCC Calculator to validate current ICD-10-based HCC assignments while you document which HCC mappings will need to change under ICD-11.
Naming Your New Columns Consistently
When you add code_version, stem_code, and extension_codes columns, use consistent naming across all tables. The mdatool Naming Auditor can scan your DDL and flag columns that deviate from your naming convention before they reach production.
Transition Planning: Living With Both Versions
The US healthcare system will operate with ICD-10-CM and ICD-11 in parallel for years. Your pipeline needs to handle both. Key principles:
- Always store
code_versionalongside every diagnosis code — never assume the version - Index on
(code_version, stem_code)for efficient lookups - Never hardcode version-specific logic in application code — move it to reference tables
- Validate at ingestion, not at query time — bad codes are cheaper to reject early
Summary
| Action | Priority |
|---|---|
| Widen all diagnosis code columns to VARCHAR(20) | High — do now |
| Add code_version column to all diagnosis tables | High — do now |
| Redesign lookup tables to be version-aware | Medium |
| Plan stem + extension separation for clinical use cases | Medium |
| Update HCC mapping logic for ICD-11 stems | Low — monitor WHO timeline |
ICD-11 migration is not imminent for US payers, but the schema changes needed are significant enough that starting to plan now — before the regulatory mandate — is the right call.
Explore mdatool's Data Dictionary for definitions of HCC, ICD, DRG, and other healthcare coding terms your team will need during this migration.
mdatool Team
The mdatool team builds free tools for healthcare data engineers — DDL converters, SQL linters, naming auditors, and data modeling guides.
More in Data Modeling
Data Models Don’t Break — Assumptions Do
Why data model fails in production environment? Vulnerable assumptions - historic data is static, data arrives in order and changes are forward only thinking.
Read moreLogical Data Models Explained: The Backbone of Enterprise Systems
Logical data models define how an enterprise understands its data. Learn why logical modeling is the foundation of scalable systems, reliable analytics, and long-term architectural success across industries.
Read moreLogical vs Physical Data Models: Why Enterprises Need Both?
Logical vs Physical Data Model, uses and benefits. Why physical data model should not exist without logical data model?
Read moreReady to improve your data architecture?
Free tools for DDL conversion, SQL analysis, naming standards, and more.