You spent months designing a logical [[[data model](/tools/modeling)](/tools/modeling)](/tools/modeling). Your ERD is pristine. Every entity has clear relationships. Your business stakeholders signed off. Your DBA implemented it flawlessly in Snowflake. The data warehouse is live.
Then the first quarterly metrics report lands on the CFO's desk. Member attribution is off by 15%. Claim counts don't match finance. Provider network metrics contradict what operations sees in the [[[EHR](/terms/EHR)](/terms/EHR)](https://www.mdatool.com/glossary#EHR). Your "perfect" model just produced garbage metrics.
What happened?
The uncomfortable truth: logical models don't fail because of bad design. They fail because they ignore the brutal realities of healthcare data operations. Here's why even the best logical models produce bad metrics—and how to fix it.
The Core Problem: Logical Models Ignore Data Lineage
Healthcare data doesn't arrive clean. It arrives as:
- [[EDI 837](/terms/EDI%20837)](/terms/EDI%20837) claims with 50+ segments, loops, and conditional fields
- HL7 ADT feeds with patient demographic updates buried in PID segments
- Eligibility files from carriers with retroactive effective dates
- Provider roster updates that contradict yesterday's [NPI](https://www.mdatool.com/glossary#NPI) registry snapshot
Your logical model says: Member → Claim → Provider. Clean. Simple. Wrong.
Reality says: Member records get updated 6 times during a claim lifecycle. Claims get reversed, reprocessed, and resubmitted. Provider addresses change mid-contract but old addresses still need to route payments.
The gap? Your logical model doesn't account for temporal validity, source system precedence, or reconciliation logic.
Example: The Member Attribution Disaster
The Logical Model (Looks Perfect)
CREATE TABLE member (
mmbr_id VARCHAR(50) PRIMARY KEY,
first_nm VARCHAR(100),
last_nm VARCHAR(100),
date_of_birth DATE,
gender_cd CHAR(1),
primary_care_prvdr_id VARCHAR(50)
);
The Reality (Data Hell)
Your member data comes from 4 sources:
- Eligibility file (daily snapshot from carrier)
- Claims data (member name as billed by provider)
- Provider portal (patient self-registration)
- EHR feed (demographic updates from clinic EMR)
All four disagree on:
- Name spelling ("Mary Smith" vs "Marie Smythe")
- Date of birth (off by one day due to timezone differences)
- PCP assignment (eligibility shows Dr. Jones, claims show Dr. Smith)
Your metrics report:
- Member count: 47,315 (eligibility says 48,200)
- Members with PCP: 39,104 (provider portal says 41,892)
- CFO asks: "Which number is right?"
Your logical model has no answer because it never defined:
- Source system hierarchy (which source wins when there's conflict?)
- Effective dating (when did the PCP assignment change?)
- Match/merge rules (how do we know Mary Smith = Marie Smythe?)
Fix #1: Add Source System Lineage to Your Logical Model
Don't model entities. Model SOURCE RECORDS.
Updated Logical Model
CREATE TABLE member_source (
mmbr_src_id VARCHAR(50) PRIMARY KEY,
mmbr_id VARCHAR(50),
source_system_cd VARCHAR(20),
source_record_id VARCHAR(100),
first_nm VARCHAR(100),
last_nm VARCHAR(100),
date_of_birth DATE,
gender_cd CHAR(1),
primary_care_prvdr_id VARCHAR(50),
effective_dt DATE,
end_dt DATE,
is_current_flag CHAR(1),
load_dttm TIMESTAMP,
source_priority INT
);
CREATE TABLE member_master (
mmbr_id VARCHAR(50) PRIMARY KEY,
golden_first_nm VARCHAR(100),
golden_last_nm VARCHAR(100),
golden_dob DATE,
golden_gender_cd CHAR(1),
golden_pcp_id VARCHAR(50),
last_updated_dttm TIMESTAMP
);
Fix #2: Model Temporal Validity, Not Static Snapshots
Healthcare data changes constantly. Your logical model needs bi-temporal tables.
CREATE TABLE claim_history (
clm_history_id VARCHAR(50) PRIMARY KEY,
clm_id VARCHAR(50),
version_nbr INT,
mmbr_id VARCHAR(50),
total_billed_amt DECIMAL(10,2),
total_paid_amt DECIMAL(10,2),
claim_status_cd VARCHAR(20),
effective_dt DATE,
end_dt DATE,
valid_from_dttm TIMESTAMP,
valid_to_dttm TIMESTAMP,
is_current_flag CHAR(1)
);
Fix #3: Model Data Quality Rules as First-Class Entities
CREATE TABLE data_quality_rule (
rule_id VARCHAR(50) PRIMARY KEY,
rule_nm VARCHAR(200),
rule_category_cd VARCHAR(50),
entity_nm VARCHAR(100),
severity_cd VARCHAR(20)
);
CREATE TABLE data_quality_violation (
violation_id VARCHAR(50) PRIMARY KEY,
rule_id VARCHAR(50),
entity_id VARCHAR(100),
violation_dttm TIMESTAMP,
resolved_flag CHAR(1)
);
Fix #4: Model Reconciliation Logic
CREATE TABLE reconciliation_control (
recon_id VARCHAR(50) PRIMARY KEY,
recon_nm VARCHAR(200),
source_system_cd VARCHAR(50),
target_entity_nm VARCHAR(100),
tolerance_pct DECIMAL(5,2)
);
The Checklist
✅ Source System Lineage
- Every entity tracks which source system it came from
- Conflicting sources have documented precedence rules
✅ Temporal Validity
- Entities with changing values use effective dating
- Historical versions are preserved
✅ Data Quality
- Validation rules are documented
- Violations are tracked and reportable
✅ Reconciliation
- External reconciliation points are identified
- Variance thresholds are defined
Conclusion
The best logical models in healthcare don't just show entities and relationships. They show:
- Where data comes from (source lineage)
- When data changed (temporal validity)
- What makes data valid (quality rules)
- How metrics reconcile (external validation)
Your "perfect" logical model produces bad metrics because it models the business as you wish it worked—not as it actually operates.
Fix the model. Fix the metrics.
Related Guides
Key Terms in This Article
More in Healthcare Data Modeling
AI-Powered Healthcare Data Modeling: Generate Production-Ready Schemas in 30 Seconds
Healthcare data architects spend days designing schemas from scratch — Medicare Advantage claims warehouses alone require 20+ tables, hundreds of columns, and platform-specific syntax. The mdatool AI Data Modeling tool generates a production-ready Star Schema for Snowflake, BigQuery, or Databricks in 30 seconds, with ISO-11179 standard column names built in.
Read moreEnforcing ISO-11179 Healthcare Naming Standards in dbt Projects
Every healthcare data warehouse eventually develops naming drift — DOB in one model, birth_dt in another, member_birth_date in a third. The dbt-healthcare-standards package brings ISO-11179 column naming directly into your dbt project as installable macros and schema tests.
Read moreAI-Powered Healthcare Data Modeling: Generate Production-Ready Schemas in Seconds
Healthcare data models take weeks to design manually — HIPAA constraints, CMS reporting requirements, and ISO-11179 naming standards all have to be applied correctly from the start. AI data modeling changes that. Here is how to generate production-ready schemas for Snowflake, BigQuery, and Databricks in seconds.
Read moreReady to improve your data architecture?
Free tools for DDL conversion, SQL analysis, naming standards, and more.
Get weekly healthcare data engineering tips
Practical guides on data modeling, SQL standards, and healthcare domain conventions — straight to your inbox.
No spam. Unsubscribe any time.