Introduction
Nightly batch is the wrong architecture for real-time clinical data. An ADT (Admit-Discharge-Transfer) event needs to trigger a care management alert within minutes — not appear in tomorrow morning's data warehouse load. A prior authorization approval needs to update the claims adjudication system in real time — not through a file drop that runs at 2 AM. The clinical data that drives care coordination, risk management, and real-time quality monitoring is fundamentally event-driven. Your data pipeline architecture should be too.
📋Free Tool
Parse this HL7 message →
This guide covers the design of Kafka-based event-driven pipelines for healthcare, with specific patterns for FHIR R4 resources, HL7 v2 messages, and the schema governance that keeps clinical event streams reliable over time.
Why Event-Driven Matters for Healthcare
Healthcare generates events — not records. A patient encounter is not a row that gets inserted once. It is a stream of events: admit, transfer, discharge, lab ordered, result received, diagnosis assigned, medication prescribed, prior auth requested, prior auth approved, claim submitted, claim adjudicated. Each event has a timestamp, a source, and a downstream consumer.
Batch processing collapses this event stream into a point-in-time snapshot. Useful for analytics. Insufficient for:
- Care management alerts: A care manager needs to know when a high-risk member is admitted to the ED — not 12 hours later.
- Real-time eligibility: A provider portal checking eligibility at point of care needs sub-second response, not a nightly sync.
- Prior authorization: CMS's Interoperability Rule requires prior auth decisions within 72 hours (sometimes 24). Real-time PA event processing is no longer optional.
- Fraud detection: Claims fraud patterns are detectable in near-real-time if the data pipeline supports it.
Kafka Topic Design for Healthcare
Kafka topics in healthcare should be designed around FHIR resource types or HL7 message types — not around source systems. A topic per source system (epic-adt, cerner-labs, athena-encounters) creates a consumer nightmare when the same consumer needs all encounters regardless of source.
Recommended Topic Structure
# FHIR resource-based topics (source-agnostic)
healthcare.fhir.patient
healthcare.fhir.encounter
healthcare.fhir.condition
healthcare.fhir.observation # lab results, vitals
healthcare.fhir.medicationrequest
healthcare.fhir.claim
healthcare.fhir.explanationofbenefit
healthcare.fhir.coverageeligibilityrequest
healthcare.fhir.task # prior auth, referrals
# HL7 v2 topics (by message type)
healthcare.hl7v2.adt # ADT^A01 (admit), ADT^A03 (discharge), ADT^A08 (update)
healthcare.hl7v2.oru # ORU^R01 lab results
healthcare.hl7v2.orm # ORM^O01 order messages
# Domain event topics (business-level)
healthcare.events.member-enrolled
healthcare.events.claim-adjudicated
healthcare.events.prior-auth-decision
healthcare.events.risk-gap-identified
Use partitioning by patient ID (or member ID) to ensure that all events for a given patient land on the same partition. This is critical for stateful consumers that need to reconstruct patient-level event sequences.
# Kafka producer with patient-based partitioning
from confluent_kafka import Producer
producer = Producer({'bootstrap.servers': 'kafka-broker:9092'})
def publish_fhir_event(resource_type: str, fhir_resource: dict):
topic = f"healthcare.fhir.{resource_type.lower()}"
patient_id = extract_patient_id(fhir_resource) # implementation-specific
producer.produce(
topic=topic,
key=patient_id.encode('utf-8'), # partition key = patient ID
value=json.dumps(fhir_resource).encode('utf-8'),
headers={
'content-type': 'application/fhir+json',
'fhir-version': 'R4',
'source-system': 'EPIC',
'event-time': datetime.utcnow().isoformat()
}
)
producer.flush()
Schema Registry for HL7/FHIR
Schema management is the silent failure mode of healthcare event streams. Without a schema registry, a producer changes the FHIR Patient resource structure (adds a field, removes an extension) and downstream consumers break silently.
Use Confluent Schema Registry (or AWS Glue Schema Registry, or Apicurio) with Avro or JSON Schema formats.
FHIR Resource Schema in Avro
{
"type": "record",
"name": "FHIRPatient",
"namespace": "com.mdatool.healthcare.fhir",
"fields": [
{"name": "resourceType", "type": "string"},
{"name": "id", "type": "string"},
{"name": "meta", "type": {
"type": "record",
"name": "Meta",
"fields": [
{"name": "versionId", "type": ["null", "string"], "default": null},
{"name": "lastUpdated", "type": "string"}
]
}},
{"name": "identifier", "type": {
"type": "array",
"items": {
"type": "record",
"name": "Identifier",
"fields": [
{"name": "system", "type": "string"},
{"name": "value", "type": "string"}
]
}
}},
{"name": "name", "type": {
"type": "array",
"items": {
"type": "record",
"name": "HumanName",
"fields": [
{"name": "family", "type": "string"},
{"name": "given", "type": {"type": "array", "items": "string"}}
]
}
}},
{"name": "birthDate", "type": ["null", "string"], "default": null},
{"name": "gender", "type": ["null", "string"], "default": null}
]
}
Registering this schema enforces backwards compatibility — adding optional fields is allowed, removing or renaming fields requires a schema version bump.
HL7 v2 to FHIR Transformation Pattern
Most existing healthcare systems still produce HL7 v2 messages, not FHIR. The event pipeline must bridge both.
HL7 v2 Source (MLLP) → Kafka Connect MLLP Source → healthcare.hl7v2.adt topic
↓
Kafka Streams transformer
(HL7v2 → FHIR R4 mapping)
↓
healthcare.fhir.encounter topic
↓
[Consumer: Care Management Alerting]
[Consumer: Data Warehouse (Snowflake sink)]
[Consumer: FHIR Store (GCP Healthcare API)]
The HAPI FHIR library (Java) provides HL7 v2 → FHIR R4 conversion utilities. For Python-based pipelines, use the HL7apy or HL7-FHIR-R4-Core libraries.
Use the HL7 Parser to inspect raw HL7 v2 messages and validate their structure before they enter the Kafka pipeline — catching malformed ADT or ORU messages before they propagate into downstream FHIR stores.
Consumer Patterns
Care Management Alerting Consumer
Stateless consumer on healthcare.fhir.encounter and healthcare.hl7v2.adt. Filters for admit events (ADT^A01) for members flagged as high-risk. Publishes alerts to a separate healthcare.events.care-alert topic consumed by the care management workflow system.
Data Warehouse Sink Consumer
Uses Kafka Connect Snowflake Sink Connector (or BigQuery Sink Connector). Writes FHIR resource events to bronze tables. Schema-on-write using the registered Avro schema. Idempotent writes using the FHIR resource ID as the deduplication key.
FHIR Store Sync Consumer
Consumes the healthcare.fhir.* topics and writes resources to the GCP Cloud Healthcare API or Azure FHIR Service. This keeps the managed FHIR store synchronized with the event stream — providing both event-driven consumers and REST-based FHIR API access from the same source of truth.
Key Takeaways
- Design Kafka topics by FHIR resource type or HL7 message type, not by source system. Consumers care about data type, not data origin.
- Partition by patient ID to ensure per-patient event ordering for stateful consumers.
- A Schema Registry is non-negotiable for production healthcare event pipelines. FHIR structure changes without schema governance break consumers silently.
- HL7 v2 to FHIR transformation should happen in the Kafka Streams tier, not in every consumer. One transformation, many consumers.
- Validate HL7 message structure before publishing to Kafka using the HL7 Parser to catch malformed messages at the ingestion boundary.
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.
Related Guides
More in Data Architecture
Azure Synapse vs Snowflake for Healthcare Data Architecture: Which Platform Fits Your Team?
Azure Synapse Analytics and Snowflake both promise a unified cloud data platform — but they make different architectural bets that matter enormously in healthcare. This guide compares them across HIPAA compliance, FHIR integration, PHI governance, cost model, and team fit, with concrete SQL examples and a decision framework built for healthcare data engineers.
Read moreOracle vs Databricks for Healthcare Data Architecture: Which Platform Should You Choose?
Oracle brings four decades of enterprise database maturity, deep EHR integration, and a proven HIPAA compliance story. Databricks brings a unified lakehouse, native AI/ML pipelines, and the ability to handle FHIR, HL7, and unstructured clinical data at scale. This guide breaks down which platform wins in each healthcare scenario — and when you need both.
Read moreTelehealth Data Architecture: Complete Guide for Data Engineers (2026)
A complete guide to building a telehealth data architecture — core schema design, HL7 and FHIR integration, HIPAA compliance, HCC risk adjustment, and the common mistakes that cause claim denials.
Read moreFree Tools
Ready to improve your data architecture?
Free tools for DDL conversion, SQL analysis, naming standards, and more.
Get weekly healthcare data engineering tips
Practical guides on data modeling, SQL standards, and healthcare domain conventions — straight to your inbox.
No spam. Unsubscribe any time.