Introduction
Nightly batch is the wrong architecture for real-time clinical data. An ADT (Admit-Discharge-Transfer) event needs to trigger a care management alert within minutes — not appear in tomorrow morning's data warehouse load. A prior authorization approval needs to update the claims adjudication system in real time — not through a file drop that runs at 2 AM. The clinical data that drives care coordination, risk management, and real-time quality monitoring is fundamentally event-driven. Your data pipeline architecture should be too.
This guide covers the design of Kafka-based event-driven pipelines for healthcare, with specific patterns for FHIR R4 resources, HL7 v2 messages, and the schema governance that keeps clinical event streams reliable over time.
Why Event-Driven Matters for Healthcare
Healthcare generates events — not records. A patient encounter is not a row that gets inserted once. It is a stream of events: admit, transfer, discharge, lab ordered, result received, diagnosis assigned, medication prescribed, prior auth requested, prior auth approved, claim submitted, claim adjudicated. Each event has a timestamp, a source, and a downstream consumer.
Batch processing collapses this event stream into a point-in-time snapshot. Useful for analytics. Insufficient for:
- Care management alerts: A care manager needs to know when a high-risk member is admitted to the ED — not 12 hours later.
- Real-time eligibility: A provider portal checking eligibility at point of care needs sub-second response, not a nightly sync.
- Prior authorization: CMS's Interoperability and Prior Authorization final rule requires prior auth decisions within 72 hours for expedited requests (seven calendar days for standard ones). Real-time PA event processing is no longer optional.
- Fraud detection: Claims fraud patterns are detectable in near-real-time if the data pipeline supports it.
Kafka Topic Design for Healthcare
Kafka topics in healthcare should be designed around FHIR resource types or HL7 message types — not around source systems. A topic per source system (epic-adt, cerner-labs, athena-encounters) creates a consumer nightmare when the same consumer needs all encounters regardless of source.
Recommended Topic Structure
# FHIR resource-based topics (source-agnostic)
healthcare.fhir.patient
healthcare.fhir.encounter
healthcare.fhir.condition
healthcare.fhir.observation # lab results, vitals
healthcare.fhir.medicationrequest
healthcare.fhir.claim
healthcare.fhir.explanationofbenefit
healthcare.fhir.coverageeligibilityrequest
healthcare.fhir.task # prior auth, referrals
# HL7 v2 topics (by message type)
healthcare.hl7v2.adt # ADT^A01 (admit), ADT^A03 (discharge), ADT^A08 (update)
healthcare.hl7v2.oru # ORU^R01 lab results
healthcare.hl7v2.orm # ORM^O01 order messages
# Domain event topics (business-level)
healthcare.events.member-enrolled
healthcare.events.claim-adjudicated
healthcare.events.prior-auth-decision
healthcare.events.risk-gap-identified
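As a sketch of how this layout might be provisioned, the topic definitions can be kept as plain data and pushed through confluent_kafka's AdminClient. The partition counts and replication factor below are illustrative assumptions, not recommendations; size them for your own volumes.

```python
# Sketch: provision the FHIR/HL7 topic layout with confluent_kafka's AdminClient.
# Partition count (12) and replication factor (3) are illustrative assumptions.

def topic_specs() -> list:
    """Declarative topic definitions: pure data, easy to review and test."""
    fhir = ['patient', 'encounter', 'condition', 'observation',
            'medicationrequest', 'claim', 'explanationofbenefit',
            'coverageeligibilityrequest', 'task']
    hl7 = ['adt', 'oru', 'orm']
    specs = [{'name': f'healthcare.fhir.{r}', 'partitions': 12} for r in fhir]
    specs += [{'name': f'healthcare.hl7v2.{m}', 'partitions': 12} for m in hl7]
    return specs

def create_topics(bootstrap: str) -> None:
    from confluent_kafka.admin import AdminClient, NewTopic  # deferred: needs a broker
    admin = AdminClient({'bootstrap.servers': bootstrap})
    new_topics = [
        NewTopic(s['name'], num_partitions=s['partitions'], replication_factor=3)
        for s in topic_specs()
    ]
    admin.create_topics(new_topics)
```

Keeping the spec as data (rather than inline AdminClient calls) makes the topic inventory diffable in code review.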
Use partitioning by patient ID (or member ID) to ensure that all events for a given patient land on the same partition. This is critical for stateful consumers that need to reconstruct patient-level event sequences.
# Kafka producer with patient-based partitioning
import json
from datetime import datetime, timezone

from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'kafka-broker:9092'})

def publish_fhir_event(resource_type: str, fhir_resource: dict):
    topic = f"healthcare.fhir.{resource_type.lower()}"
    patient_id = extract_patient_id(fhir_resource)  # implementation-specific
    producer.produce(
        topic=topic,
        key=patient_id.encode('utf-8'),  # partition key = patient ID
        value=json.dumps(fhir_resource).encode('utf-8'),
        headers={
            'content-type': 'application/fhir+json',
            'fhir-version': 'R4',
            'source-system': 'EPIC',
            'event-time': datetime.now(timezone.utc).isoformat()
        }
    )
    producer.flush()
Schema Registry for HL7/FHIR
Schema management is the silent failure mode of healthcare event streams. Without a schema registry, a producer changes the FHIR Patient resource structure (adds a field, removes an extension) and downstream consumers break silently.
Use Confluent Schema Registry (or AWS Glue Schema Registry, or Apicurio) with Avro or JSON Schema formats.
FHIR Resource Schema in Avro
{
  "type": "record",
  "name": "FHIRPatient",
  "namespace": "com.mdatool.healthcare.fhir",
  "fields": [
    {"name": "resourceType", "type": "string"},
    {"name": "id", "type": "string"},
    {"name": "meta", "type": {
      "type": "record",
      "name": "Meta",
      "fields": [
        {"name": "versionId", "type": ["null", "string"], "default": null},
        {"name": "lastUpdated", "type": "string"}
      ]
    }},
    {"name": "identifier", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "Identifier",
        "fields": [
          {"name": "system", "type": "string"},
          {"name": "value", "type": "string"}
        ]
      }
    }},
    {"name": "name", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "HumanName",
        "fields": [
          {"name": "family", "type": "string"},
          {"name": "given", "type": {"type": "array", "items": "string"}}
        ]
      }
    }},
    {"name": "birthDate", "type": ["null", "string"], "default": null},
    {"name": "gender", "type": ["null", "string"], "default": null}
  ]
}
Registering this schema with backward compatibility enforced means a consumer on the new schema can still read events written with the old one: optional fields with defaults can be added freely, while incompatible changes (such as adding a required field without a default, or renaming a field) are rejected at registration time instead of breaking consumers in production.
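To make the rule concrete, here is a minimal, registry-independent sketch of the core backward-compatibility check. Real registries implement the full Avro schema-resolution rules; this covers only the added-field case that trips up FHIR pipelines most often.

```python
# Sketch: the essence of an Avro backward-compatibility check.
# Backward compatible = a reader on the NEW schema can still read data
# written with the OLD schema, so any field added must carry a default.

def is_backward_compatible(old_schema: dict, new_schema: dict):
    old_fields = {f['name'] for f in old_schema['fields']}
    problems = []
    for field in new_schema['fields']:
        if field['name'] not in old_fields and 'default' not in field:
            problems.append(f"added field '{field['name']}' has no default")
    return (not problems, problems)

old = {'fields': [{'name': 'id', 'type': 'string'}]}

# Optional field with a default: allowed.
ok, _ = is_backward_compatible(old, {'fields': [
    {'name': 'id', 'type': 'string'},
    {'name': 'birthDate', 'type': ['null', 'string'], 'default': None},
]})
assert ok

# Required field without a default: rejected.
ok, problems = is_backward_compatible(old, {'fields': [
    {'name': 'id', 'type': 'string'},
    {'name': 'mrn', 'type': 'string'},
]})
assert not ok
```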
HL7 v2 to FHIR Transformation Pattern
Most existing healthcare systems still produce HL7 v2 messages, not FHIR. The event pipeline must bridge both.
HL7 v2 Source (MLLP) → Kafka Connect MLLP Source → healthcare.hl7v2.adt topic
↓
Kafka Streams transformer
(HL7v2 → FHIR R4 mapping)
↓
healthcare.fhir.encounter topic
↓
[Consumer: Care Management Alerting]
[Consumer: Data Warehouse (Snowflake sink)]
[Consumer: FHIR Store (GCP Healthcare API)]
On the JVM, the HAPI libraries provide HL7 v2 parsing and FHIR R4 models for building this conversion. For Python-based pipelines, HL7apy handles HL7 v2 parsing and the fhir.resources package provides FHIR R4 resource models.
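As a library-free sketch of what the transformer tier does, the snippet below maps a minimal ADT^A01 message to a FHIR R4 Encounter. Segment and field positions follow HL7 v2 conventions (PID-3 identifier, PV1-2 patient class), but this covers only the happy path; a production pipeline should use the tested parsers above.

```python
# Sketch: minimal HL7 v2 ADT^A01 -> FHIR R4 Encounter mapping, no libraries.
# Handles one repetition per field and no escape sequences: illustrative only.

PATIENT_CLASS_MAP = {'I': 'IMP', 'O': 'AMB', 'E': 'EMER'}  # PV1-2 -> FHIR v3 ActCode

def hl7_to_encounter(raw: str) -> dict:
    # Index segments by name; field N of a segment lands at list index N.
    segments = {line.split('|')[0]: line.split('|')
                for line in raw.strip().split('\n')}
    pid, pv1 = segments['PID'], segments['PV1']
    mrn = pid[3].split('^')[0]        # PID-3: patient identifier (first component)
    patient_class = pv1[2]            # PV1-2: I (inpatient) / O / E
    return {
        'resourceType': 'Encounter',
        'status': 'in-progress',      # ADT^A01 = active admission
        'class': {
            'system': 'http://terminology.hl7.org/CodeSystem/v3-ActCode',
            'code': PATIENT_CLASS_MAP.get(patient_class, 'IMP'),
        },
        'subject': {'reference': f'Patient/{mrn}'},
    }

msg = ("MSH|^~\\&|EPIC|HOSP|||20240101120000||ADT^A01|MSG001|P|2.5\n"
       "PID|1||MRN12345^^^HOSP^MR||DOE^JANE||19800101|F\n"
       "PV1|1|I|ICU^101^A")
```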
Use the HL7 Parser to inspect raw HL7 v2 messages and validate their structure before they enter the Kafka pipeline — catching malformed ADT or ORU messages before they propagate into downstream FHIR stores.
Consumer Patterns
Care Management Alerting Consumer
Stateless consumer on healthcare.fhir.encounter and healthcare.hl7v2.adt. Filters for admit events (ADT^A01) for members flagged as high-risk. Publishes alerts to a separate healthcare.events.care-alert topic consumed by the care management workflow system.
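A sketch of this consumer, with the filtering decision kept as a pure function so it can be unit-tested without a broker. The event field names and the high-risk registry (a plain set of member IDs) are assumptions; a production system would query a risk-stratification service.

```python
# Sketch: care-alert consumer. should_alert() is pure and testable;
# run_alert_consumer() wires it to Kafka (deferred import: needs a broker).

ADMIT_EVENTS = {'ADT^A01'}  # inpatient/ED admits

def should_alert(event: dict, high_risk_ids: set) -> bool:
    return (event.get('message_type') in ADMIT_EVENTS
            and event.get('patient_id') in high_risk_ids)

def run_alert_consumer(bootstrap: str, high_risk_ids: set) -> None:
    import json
    from confluent_kafka import Consumer, Producer
    consumer = Consumer({'bootstrap.servers': bootstrap,
                         'group.id': 'care-alerting',
                         'auto.offset.reset': 'earliest'})
    producer = Producer({'bootstrap.servers': bootstrap})
    consumer.subscribe(['healthcare.hl7v2.adt'])
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        if should_alert(event, high_risk_ids):
            # Re-key on the same patient ID to preserve per-patient ordering.
            producer.produce('healthcare.events.care-alert',
                             key=msg.key(), value=msg.value())
```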
Data Warehouse Sink Consumer
Uses Kafka Connect Snowflake Sink Connector (or BigQuery Sink Connector). Writes FHIR resource events to bronze tables. Schema-on-write using the registered Avro schema. Idempotent writes using the FHIR resource ID as the deduplication key.
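The deduplication logic the sink relies on can be sketched as follows. One design choice worth noting: keying on resource ID alone collapses every update into one row, so this sketch adds meta.versionId to the key, keeping each resource version once while still discarding redelivered duplicates.

```python
# Sketch: dedup key for idempotent warehouse writes.
# (resourceType, id, versionId) keeps one row per resource version and
# drops Kafka redeliveries of the same version.

def dedup_key(resource: dict) -> tuple:
    return (resource['resourceType'],
            resource['id'],
            resource.get('meta', {}).get('versionId', '1'))

def deduplicate(events: list) -> list:
    seen, out = set(), []
    for resource in events:
        key = dedup_key(resource)
        if key not in seen:
            seen.add(key)
            out.append(resource)
    return out
```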
FHIR Store Sync Consumer
Consumes the healthcare.fhir.* topics and writes resources to the GCP Cloud Healthcare API or Azure FHIR Service. This keeps the managed FHIR store synchronized with the event stream — providing both event-driven consumers and REST-based FHIR API access from the same source of truth.
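The write side of this consumer can be sketched as a FHIR REST upsert: PUT to `{base}/{type}/{id}` updates the resource, and servers with update-as-create enabled will create it if absent. The base URL below is a placeholder, and the managed services also require OAuth bearer tokens, which are passed in but not acquired here.

```python
# Sketch: sync one FHIR resource into a REST FHIR store via PUT (upsert).
# base_url and token handling are placeholders for the managed service setup.

def fhir_put_url(base_url: str, resource: dict) -> str:
    # FHIR REST update: PUT {base}/{resourceType}/{id}
    return f"{base_url.rstrip('/')}/{resource['resourceType']}/{resource['id']}"

def sync_resource(base_url: str, resource: dict, token: str) -> None:
    import json
    import urllib.request  # stdlib; network call deferred to call time
    req = urllib.request.Request(
        fhir_put_url(base_url, resource),
        data=json.dumps(resource).encode('utf-8'),
        method='PUT',
        headers={'Content-Type': 'application/fhir+json',
                 'Authorization': f'Bearer {token}'})
    urllib.request.urlopen(req)
```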
Key Takeaways
- Design Kafka topics by FHIR resource type or HL7 message type, not by source system. Consumers care about data type, not data origin.
- Partition by patient ID to ensure per-patient event ordering for stateful consumers.
- A Schema Registry is non-negotiable for production healthcare event pipelines. FHIR structure changes without schema governance break consumers silently.
- HL7 v2 to FHIR transformation should happen in the Kafka Streams tier, not in every consumer. One transformation, many consumers.
- Validate HL7 message structure before publishing to Kafka using the HL7 Parser to catch malformed messages at the ingestion boundary.
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.