Healthcare Analytics Guide
Complete guide to healthcare data warehousing, clinical analytics, population health metrics, and business intelligence for data-driven healthcare organizations.
Healthcare Analytics Overview
Healthcare analytics transforms clinical, operational, and financial data into actionable insights. Modern healthcare organizations use analytics for population health management, quality improvement, cost reduction, and value-based care initiatives.
A healthcare data warehouse integrates data from EHRs, claims systems, labs, pharmacies, and external sources to create a single source of truth for reporting and analytics.
Healthcare Data Warehouse Architecture
-- Dimensional Model (Star Schema) -- Dimension: Patient CREATE TABLE dim_patient ( patient_key SERIAL PRIMARY KEY, patient_id UUID UNIQUE NOT NULL, mrn VARCHAR(50), age INTEGER, age_group VARCHAR(20), -- 0-17, 18-44, 45-64, 65+ gender VARCHAR(20), race VARCHAR(100), ethnicity VARCHAR(100), zip_code VARCHAR(10), county VARCHAR(100), state VARCHAR(2), -- SCD Type 2 fields effective_dt DATE NOT NULL, expiration_dt DATE, is_current BOOLEAN DEFAULT TRUE, loaded_at TIMESTAMP DEFAULT NOW() ); -- Dimension: Provider CREATE TABLE dim_provider ( prov_key SERIAL PRIMARY KEY, prov_id UUID UNIQUE NOT NULL, npi VARCHAR(10), prov_nm VARCHAR(255), specialty VARCHAR(100), specialty_group VARCHAR(100), organization VARCHAR(255), effective_dt DATE NOT NULL, expiration_dt DATE, is_current BOOLEAN DEFAULT TRUE, loaded_at TIMESTAMP DEFAULT NOW() ); -- Dimension: Diagnosis CREATE TABLE dim_diagnosis ( diagnosis_key SERIAL PRIMARY KEY, icd10_code VARCHAR(10) NOT NULL, icd10_desc VARCHAR(500), icd10_category VARCHAR(100), chronic_flag BOOLEAN, loaded_at TIMESTAMP DEFAULT NOW() ); -- Dimension: Date CREATE TABLE dim_dt ( date_key INTEGER PRIMARY KEY, -- YYYYMMDD format full_dt DATE UNIQUE NOT NULL, year INTEGER, quarter INTEGER, month INTEGER, month_nm VARCHAR(20), day INTEGER, day_of_week INTEGER, day_nm VARCHAR(20), week_of_year INTEGER, is_weekend BOOLEAN, is_holiday BOOLEAN, fiscal_year INTEGER, fiscal_quarter INTEGER ); -- Fact Table: Encounters CREATE TABLE fact_encounter ( encounter_key SERIAL PRIMARY KEY, -- Foreign Keys (Dimensions) patient_key INTEGER REFERENCES dim_patient(patient_key), prov_key INTEGER REFERENCES dim_provider(prov_key), admission_dt_key INTEGER REFERENCES dim_dt(date_key), discharge_dt_key INTEGER REFERENCES dim_dt(date_key), -- Degenerate Dimensions encounter_id UUID NOT NULL, encounter_type VARCHAR(100), visit_type VARCHAR(100), -- Measures length_of_stay INTEGER, total_charges DECIMAL(10,2), readmission_30day BOOLEAN, er_visit BOOLEAN, loaded_at TIMESTAMP DEFAULT NOW() ); CREATE INDEX idx_fact_enc_patient ON fact_encounter(patient_key); CREATE INDEX idx_fact_enc_provider ON fact_encounter(prov_key); CREATE INDEX idx_fact_enc_admit_dt ON fact_encounter(admission_dt_key); -- Fact Table: Quality Measures CREATE TABLE fact_quality_measure ( measure_key SERIAL PRIMARY KEY, patient_key INTEGER REFERENCES dim_patient(patient_key), prov_key INTEGER REFERENCES dim_provider(prov_key), measurement_dt_key INTEGER REFERENCES dim_dt(date_key), -- Measure Details measure_nm VARCHAR(100), -- HEDIS, CMS Star, etc. measure_category VARCHAR(100), -- Results numerator_flag BOOLEAN, -- Met measure? denominator_flag BOOLEAN, -- In measure population? exclusion_flag BOOLEAN, loaded_at TIMESTAMP DEFAULT NOW() );
Key Healthcare Metrics & KPIs
Clinical Quality Metrics
- • HEDIS Measures: Diabetes HbA1c control, breast cancer screening
- • Hospital-Acquired Conditions: C. diff, CAUTI, CLABSI, falls
- • Readmission Rates: 30-day all-cause, condition-specific
- • Mortality Rates: Risk-adjusted mortality by condition
Population Health Metrics
- • Risk Stratification: HCC risk scores, care gaps
- • Chronic Disease Management: Diabetes, CHF, COPD prevalence & control
- • Preventive Care: Screening rates, immunization coverage
- • Social Determinants: SDOH factors, community health metrics
Operational Metrics
- • Utilization: Bed occupancy, OR utilization, ER wait times
- • Patient Flow: Length of stay, discharge delays, throughput
- • Provider Productivity: RVUs, encounters per day, panel size
- • Patient Experience: HCAHPS scores, patient satisfaction
Analytics SQL Examples
30-Day Readmission Rate
WITH readmissions AS (
SELECT
e1.patient_key,
e1.discharge_dt_key,
CASE WHEN EXISTS (
SELECT 1 FROM fact_encounter e2
WHERE e2.patient_key = e1.patient_key
AND e2.admission_dt_key BETWEEN e1.discharge_dt_key AND e1.discharge_dt_key + 30
AND e2.encounter_key != e1.encounter_key
) THEN 1 ELSE 0 END as readmitted_30day
FROM fact_encounter e1
WHERE e1.encounter_type = 'inpatient'
AND e1.discharge_dt_key >= 20240101
)
SELECT
COUNT(*) as total_discharges,
SUM(readmitted_30day) as readmissions,
ROUND(100.0 * SUM(readmitted_30day) / COUNT(*), 2) as readmission_rate
FROM readmissions;Diabetes HbA1c Control Rate (HEDIS)
SELECT
dp.specialty,
COUNT(DISTINCT fq.patient_key) as diabetic_patients,
SUM(CASE WHEN fq.numerator_flag THEN 1 ELSE 0 END) as controlled_patients,
ROUND(100.0 * SUM(CASE WHEN fq.numerator_flag THEN 1 ELSE 0 END) /
COUNT(DISTINCT fq.patient_key), 2) as control_rate
FROM fact_quality_measure fq
JOIN dim_provider dp ON fq.prov_key = dp.prov_key
WHERE fq.measure_nm = 'Diabetes HbA1c Control'
AND fq.denominator_flag = TRUE
AND dp.is_current = TRUE
GROUP BY dp.specialty
ORDER BY control_rate DESC;High-Risk Patient Cohort Analysis
SELECT p.age_group, COUNT(DISTINCT e.patient_key) as patient_count, AVG(e.length_of_stay) as avg_los, SUM(e.total_charges) as total_charges, AVG(e.total_charges) as avg_charges, SUM(CASE WHEN e.readmission_30day THEN 1 ELSE 0 END) as readmissions FROM fact_encounter e JOIN dim_patient p ON e.patient_key = p.patient_key WHERE p.is_current = TRUE AND e.encounter_type = 'inpatient' AND e.admission_dt_key >= 20240101 GROUP BY p.age_group ORDER BY avg_charges DESC;
ETL Best Practices
- •Incremental Loading: Load only changed data using CDC or timestamps
- •Data Quality Checks: Validate completeness, accuracy, consistency
- •SCD Type 2: Track historical changes in dimensions
- •Surrogate Keys: Use integer keys for better join performance
- •Audit Tables: Log all ETL runs with row counts and timestamps
Related Healthcare Guides
From the Blog
Databricks vs Snowflake for Healthcare Data: Complete 2026 Comparison
10 min readRead Healthcare Data ModelingHow to Model Social Determinants of Health (SDOH) Data in Your Warehouse
8 min readRead Data ArchitectureSchema Drift: The Silent Killer of Analytics Trust
5 min readReadBrowse all healthcare data guides