mdatool
Healthcare Data Dictionary for the Modern Data Stack
Data Architecture

Event-Driven Architecture for Healthcare Data Pipelines (Kafka + FHIR)

Real-time clinical data demands event-driven architecture. ADT feeds, lab results, and prior auth events cannot wait for a nightly batch. Here is how to design Kafka-based pipelines for FHIR-native healthcare data.

mdatool Team·April 21, 2026·9 min read
Tags: event-driven architecture, Kafka, FHIR, healthcare data pipeline, real-time, HL7

Introduction

Nightly batch is the wrong architecture for real-time clinical data. An ADT (Admit-Discharge-Transfer) event needs to trigger a care management alert within minutes — not appear in tomorrow morning's data warehouse load. A prior authorization approval needs to update the claims adjudication system in real time — not through a file drop that runs at 2 AM. The clinical data that drives care coordination, risk management, and real-time quality monitoring is fundamentally event-driven. Your data pipeline architecture should be too.

This guide covers the design of Kafka-based event-driven pipelines for healthcare, with specific patterns for FHIR R4 resources, HL7 v2 messages, and the schema governance that keeps clinical event streams reliable over time.


Why Event-Driven Matters for Healthcare

Healthcare generates events — not records. A patient encounter is not a row that gets inserted once. It is a stream of events: admit, transfer, discharge, lab ordered, result received, diagnosis assigned, medication prescribed, prior auth requested, prior auth approved, claim submitted, claim adjudicated. Each event has a timestamp, a source, and a downstream consumer.

Batch processing collapses this event stream into a point-in-time snapshot. Useful for analytics. Insufficient for:

  • Care management alerts: A care manager needs to know when a high-risk member is admitted to the ED — not 12 hours later.
  • Real-time eligibility: A provider portal checking eligibility at point of care needs sub-second response, not a nightly sync.
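  • Prior authorization: The CMS Interoperability and Prior Authorization Final Rule (CMS-0057-F) requires impacted payers to return decisions within 72 hours for expedited requests and 7 calendar days for standard requests. Batch-only PA processing leaves little margin against those deadlines.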
  • Fraud detection: Claims fraud patterns are detectable in near-real-time if the data pipeline supports it.

Kafka Topic Design for Healthcare

Kafka topics in healthcare should be designed around FHIR resource types or HL7 message types — not around source systems. A topic per source system (epic-adt, cerner-labs, athena-encounters) creates a consumer nightmare when the same consumer needs all encounters regardless of source.

Recommended Topic Structure

# FHIR resource-based topics (source-agnostic)
healthcare.fhir.patient
healthcare.fhir.encounter
healthcare.fhir.condition
healthcare.fhir.observation              # lab results, vitals
healthcare.fhir.medicationrequest
healthcare.fhir.claim
healthcare.fhir.explanationofbenefit
healthcare.fhir.coverageeligibilityrequest
healthcare.fhir.task                     # prior auth, referrals

# HL7 v2 topics (by message type)
healthcare.hl7v2.adt                     # ADT^A01 (admit), ADT^A03 (discharge), ADT^A08 (update)
healthcare.hl7v2.oru                     # ORU^R01 lab results
healthcare.hl7v2.orm                     # ORM^O01 order messages

# Domain event topics (business-level)
healthcare.events.member-enrolled
healthcare.events.claim-adjudicated
healthcare.events.prior-auth-decision
healthcare.events.risk-gap-identified
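The naming convention in this list can be captured in a small routing helper so that every producer derives topic names the same way. The sketch below is illustrative only: `fhir_topic` and `hl7v2_topic` are hypothetical names, not part of any Kafka client library.

```python
# Hypothetical helpers that encode the topic naming convention listed above.

def fhir_topic(resource_type: str) -> str:
    """Return the source-agnostic topic for a FHIR resource type."""
    return f"healthcare.fhir.{resource_type.lower()}"


def hl7v2_topic(message_type: str) -> str:
    """Return the topic for an HL7 v2 message type such as 'ADT^A01'.

    Only the message code (the part before the first '^') selects the topic,
    so ADT^A01, ADT^A03 and ADT^A08 all share healthcare.hl7v2.adt.
    """
    code = message_type.split("^", 1)[0].strip().lower()
    if not code:
        raise ValueError(f"cannot derive topic from message type {message_type!r}")
    return f"healthcare.hl7v2.{code}"
```

Keeping the trigger event (A01, A03, A08) out of the topic name means a consumer subscribes once per message type and filters on the trigger itself.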

Use partitioning by patient ID (or member ID) to ensure that all events for a given patient land on the same partition. This is critical for stateful consumers that need to reconstruct patient-level event sequences.

# Kafka producer with patient-based partitioning
import json
from datetime import datetime, timezone

from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'kafka-broker:9092'})

def publish_fhir_event(resource_type: str, fhir_resource: dict):
    topic = f"healthcare.fhir.{resource_type.lower()}"
    patient_id = extract_patient_id(fhir_resource)  # implementation-specific

    producer.produce(
        topic=topic,
        key=patient_id.encode('utf-8'),     # partition key = patient ID
        value=json.dumps(fhir_resource).encode('utf-8'),
        headers={
            'content-type': 'application/fhir+json',
            'fhir-version': 'R4',
            'source-system': 'EPIC',        # example value; set per source
            # timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
            'event-time': datetime.now(timezone.utc).isoformat()
        }
    )
    # Flushing per message keeps the example simple but blocks until delivery;
    # high-throughput producers usually call poll(0) here and flush() at shutdown.
    producer.flush()
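The producer leaves `extract_patient_id` as implementation-specific. One possible version is sketched below, assuming the patient is referenced through one of the `subject`, `patient`, `beneficiary`, or `for` elements; the lookup order is an assumption, and real pipelines may need per-resource rules.

```python
from typing import Optional


def extract_patient_id(fhir_resource: dict) -> Optional[str]:
    """Return the patient ID a resource belongs to, or None if not found."""
    if fhir_resource.get("resourceType") == "Patient":
        return fhir_resource.get("id")

    # Assumed lookup order; different resource types use different elements.
    for element in ("subject", "patient", "beneficiary", "for"):
        reference = (fhir_resource.get(element) or {}).get("reference", "")
        # Accept relative ("Patient/123") and absolute (".../Patient/123") references.
        if "Patient/" in reference:
            return reference.rsplit("Patient/", 1)[1].split("/", 1)[0]
    return None
```

A caller should decide what to do when this returns `None`, for example routing the event to a dead-letter topic rather than publishing it without a partition key.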

Schema Registry for HL7/FHIR

Schema management is the silent failure mode of healthcare event streams. Without a schema registry, a producer changes the FHIR Patient resource structure (adds a field, removes an extension) and downstream consumers break silently.

Use Confluent Schema Registry (or AWS Glue Schema Registry, or Apicurio) with Avro or JSON Schema formats.

FHIR Resource Schema in Avro

{
  "type": "record",
  "name": "FHIRPatient",
  "namespace": "com.mdatool.healthcare.fhir",
  "fields": [
    {"name": "resourceType", "type": "string"},
    {"name": "id", "type": "string"},
    {"name": "meta", "type": {
      "type": "record",
      "name": "Meta",
      "fields": [
        {"name": "versionId", "type": ["null", "string"], "default": null},
        {"name": "lastUpdated", "type": "string"}
      ]
    }},
    {"name": "identifier", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "Identifier",
        "fields": [
          {"name": "system", "type": "string"},
          {"name": "value", "type": "string"}
        ]
      }
    }},
    {"name": "name", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "HumanName",
        "fields": [
          {"name": "family", "type": "string"},
          {"name": "given", "type": {"type": "array", "items": "string"}}
        ]
      }
    }},
    {"name": "birthDate", "type": ["null", "string"], "default": null},
    {"name": "gender", "type": ["null", "string"], "default": null}
  ]
}

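Registering this schema under a subject with BACKWARD compatibility (the Confluent Schema Registry default) makes the registry check every new schema version against the previous one: adding a field with a default is allowed, while adding a field without a default is rejected. BACKWARD mode also permits removing fields; set the subject to FULL if removing a field that has no default should be rejected too. Treat a rename as a removal plus an addition and plan it as a deliberate migration.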
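The rule that a newly added field needs a default can be illustrated with a deliberately simplified check. This is not the algorithm a schema registry runs: `backward_incompatibilities` is a hypothetical helper that compares only top-level field names and ignores type changes, aliases, and nested records.

```python
def backward_incompatibilities(old_schema: dict, new_schema: dict) -> list:
    """List reasons why new_schema could not read data written with old_schema.

    Simplified: only top-level record fields are compared, and type changes,
    aliases and nested records are ignored.
    """
    old_names = {field["name"] for field in old_schema["fields"]}
    problems = []
    for field in new_schema["fields"]:
        # A field missing from old data is only readable if it has a default.
        if field["name"] not in old_names and "default" not in field:
            problems.append(f"new field '{field['name']}' has no default")
    return problems
```

Running a check like this in CI before registering a schema gives producers early feedback, while the registry remains the authority on what is actually accepted.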


HL7 v2 to FHIR Transformation Pattern

Most existing healthcare systems still produce HL7 v2 messages, not FHIR. The event pipeline must bridge both.

HL7 v2 Source (MLLP) → Kafka Connect MLLP Source → healthcare.hl7v2.adt topic
        ↓
Kafka Streams transformer (HL7v2 → FHIR R4 mapping)
        ↓
healthcare.fhir.encounter topic
        ↓
[Consumer: Care Management Alerting]
[Consumer: Data Warehouse (Snowflake sink)]
[Consumer: FHIR Store (GCP Healthcare API)]

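Two Java libraries from the HAPI project cover the building blocks: HAPI HL7v2 parses v2 messages and HAPI FHIR provides the R4 resource model to map them into. For Python-based pipelines, hl7apy parses HL7 v2 messages; the segment-to-resource mapping itself still has to be written or delegated to a dedicated converter.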
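To make the mapping step concrete, here is a minimal sketch in plain Python that turns a pipe-delimited ADT message into a bare-bones FHIR R4 Encounter. The function name `adt_to_encounter` and the two lookup tables are assumptions for illustration; a production mapping handles repetitions, escape sequences, and many more fields, and would normally rely on a parsing library.

```python
# Simplified, assumed mappings; a production mapping covers far more fields.
ENCOUNTER_STATUS = {"A01": "in-progress", "A03": "finished"}
ENCOUNTER_CLASS = {"I": "IMP", "E": "EMER", "O": "AMB"}


def adt_to_encounter(raw_message: str) -> dict:
    """Map a pipe-delimited HL7 v2 ADT message to a minimal FHIR R4 Encounter."""
    segments = {}
    for line in raw_message.replace("\n", "\r").split("\r"):
        if line.strip():
            fields = line.strip().split("|")
            segments.setdefault(fields[0], fields)  # keep first segment of each type

    msh, pid, pv1 = segments["MSH"], segments["PID"], segments["PV1"]
    # MSH-1 is the field separator itself, so MSH-9 sits at index 8 after split.
    trigger = msh[8].split("^")[1]
    patient_id = pid[3].split("^")[0]   # PID-3, first component
    patient_class = pv1[2]              # PV1-2

    return {
        "resourceType": "Encounter",
        "status": ENCOUNTER_STATUS.get(trigger, "unknown"),
        "class": {
            "system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
            # Unmapped patient classes raise KeyError so they fail loudly.
            "code": ENCOUNTER_CLASS[patient_class],
        },
        "subject": {"reference": f"Patient/{patient_id}"},
    }
```

Failing on an unmapped patient class is a design choice: a rejected message can be routed to a dead-letter topic, whereas a silently defaulted class would propagate to every downstream consumer.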

Use the HL7 Parser to inspect raw HL7 v2 messages and validate their structure before they enter the Kafka pipeline — catching malformed ADT or ORU messages before they propagate into downstream FHIR stores.


Consumer Patterns

Care Management Alerting Consumer

Stateless consumer on healthcare.fhir.encounter and healthcare.hl7v2.adt. Filters for admit events (ADT^A01) for members flagged as high-risk. Publishes alerts to a separate healthcare.events.care-alert topic consumed by the care management workflow system.
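The filtering logic of such a consumer can be sketched as a pure function, which keeps it easy to test outside of Kafka. Everything here is an assumption for illustration: the function name `build_care_alert`, the alert payload fields, the use of an `in-progress` Encounter status as the admit signal, and the idea that the high-risk member list is available as an in-memory set.

```python
from typing import Optional


def build_care_alert(encounter: dict, high_risk_member_ids: set) -> Optional[dict]:
    """Return an alert payload for a high-risk admission, else None."""
    reference = encounter.get("subject", {}).get("reference", "")
    member_id = reference.rsplit("/", 1)[-1]
    is_admit = encounter.get("status") == "in-progress"  # assumed admit signal
    if not (is_admit and member_id in high_risk_member_ids):
        return None
    return {
        "alertType": "HIGH_RISK_ADMIT",
        "memberId": member_id,
        "encounterId": encounter.get("id"),
        "encounterClass": encounter.get("class", {}).get("code"),
    }
```

In a running consumer, each message polled from the encounter topic would be passed through this function, and any non-`None` result would be produced to the care-alert topic.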

Data Warehouse Sink Consumer

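Uses Kafka Connect Snowflake Sink Connector (or BigQuery Sink Connector). Writes FHIR resource events to bronze tables. Schema-on-write using the registered Avro schema. Idempotent writes using the resource type plus FHIR resource ID as the deduplication key, with meta.versionId or meta.lastUpdated deciding which version wins, so that replayed events do not create duplicate rows.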
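The idempotency rule can be shown with an in-memory stand-in for the target table. This is a sketch of the logic only, under the assumption that every event carries `meta.lastUpdated` in a uniform UTC format; in a warehouse the same rule would typically be expressed as a MERGE from the landing table. The name `upsert_resource` is hypothetical.

```python
def upsert_resource(table: dict, resource: dict) -> bool:
    """Apply a resource event to an in-memory table keyed by (type, id).

    Returns True if the row changed and False if the event was a replay or
    older than the stored version.
    """
    key = (resource["resourceType"], resource["id"])
    # meta.lastUpdated is an ISO 8601 instant; assuming a uniform UTC format,
    # string comparison matches chronological order.
    incoming = resource.get("meta", {}).get("lastUpdated", "")
    current = table.get(key)
    if current is not None and current.get("meta", {}).get("lastUpdated", "") >= incoming:
        return False
    table[key] = resource
    return True
```

Because replays and out-of-order events are ignored, the consumer can safely reprocess a partition after a restart without corrupting the table.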

FHIR Store Sync Consumer

Consumes the healthcare.fhir.* topics and writes resources to the GCP Cloud Healthcare API or Azure FHIR Service. This keeps the managed FHIR store synchronized with the event stream — providing both event-driven consumers and REST-based FHIR API access from the same source of truth.
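At the REST level, syncing one event amounts to a FHIR update interaction: `PUT [base]/[type]/[id]` with the resource as the body. The sketch below builds such a request with the standard library without sending it. The function name `build_fhir_put`, the base URL, and the bearer-token handling are assumptions; each managed service has its own base URL format and authentication flow, and whether a PUT may create a resource with a client-assigned ID depends on the store's configuration.

```python
import json
import urllib.request


def build_fhir_put(base_url: str, resource: dict, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an update request for one FHIR resource."""
    url = f"{base_url.rstrip('/')}/{resource['resourceType']}/{resource['id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(resource).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/fhir+json",
            "Authorization": f"Bearer {access_token}",
        },
    )
```

Using PUT with the resource's own ID makes the sync idempotent: replaying the same event overwrites the resource with identical content instead of creating a second copy.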


Key Takeaways

  • Design Kafka topics by FHIR resource type or HL7 message type, not by source system. Consumers care about data type, not data origin.
  • Partition by patient ID to ensure per-patient event ordering for stateful consumers.
  • A Schema Registry is non-negotiable for production healthcare event pipelines. FHIR structure changes without schema governance break consumers silently.
  • HL7 v2 to FHIR transformation should happen in the Kafka Streams tier, not in every consumer. One transformation, many consumers.
  • Validate HL7 message structure before publishing to Kafka using the HL7 Parser to catch malformed messages at the ingestion boundary.
mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.
