Data Architecture

Real-Time vs Batch Processing for Healthcare Claims: Architecture Decision Guide

Not every healthcare claims use case requires real-time processing — and treating them all the same wastes resources and adds complexity. Here is the decision framework for choosing the right architecture.

mdatool Team · April 21, 2026 · 8 min read

real-time processing · batch processing · healthcare claims · Kafka · Spark · Airflow · dbt

Introduction

The wrong question for healthcare claims data is "should we do real-time or batch?" The right question is "what is the latency requirement for this specific use case?" Claims adjudication requires sub-second decisioning. [HEDIS](/terms/HEDIS) measure calculation tolerates a 24-hour lag. Fraud detection needs near-real-time. Financial close reporting can wait for a nightly load. Conflating these into a single architecture produces either an over-engineered streaming system for use cases that do not need it or an under-powered batch system that cannot support the ones that do.

This guide provides a decision framework for matching claims processing architecture to actual latency requirements.


Claims Use Cases by Latency Requirement

| Use Case | Latency Requirement | Why |
| --- | --- | --- |
| Real-time adjudication | < 500 ms | Provider needs immediate eligibility/benefit determination at point of care |
| Fraud detection | 1–5 minutes | Detect fraud patterns before claim approval |
| Prior auth status | < 60 seconds | Provider portal checking PA status in real time |
| ED alert / care management | 5–15 minutes | Trigger outreach within hours of a high-risk admit |
| Denial management | 2–4 hours | Same-day rework queue for operational teams |
| Financial reconciliation | 24 hours | Daily close, not real-time |
| HEDIS measure calculation | 24–48 hours | Measure logic requires complete adjudication + lab + pharmacy data |
| Risk adjustment scoring | 48–72 hours | HCC mapping requires validated diagnosis codes |
| Regulatory reporting | Days–weeks | RADV and EDGE server submissions are periodic |

The pattern is clear: operational and clinical use cases require low latency; analytics and reporting use cases tolerate higher latency. Build your architecture to match this distribution, not to minimize latency across the board.
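These latency budgets can also be expressed as machine-checkable SLAs, so pipeline monitoring can alert when observed end-to-end latency exceeds the budget for a given use case. A minimal sketch — the thresholds are transcribed from the table above, but the dictionary keys and the `within_sla` helper are illustrative names, not a standard API:

```python
# Latency budgets from the table, in seconds. Keys are illustrative labels.
LATENCY_SLA_SECONDS = {
    "real_time_adjudication": 0.5,
    "prior_auth_status": 60,
    "fraud_detection": 5 * 60,
    "ed_alert_care_management": 15 * 60,
    "denial_management": 4 * 3600,
    "financial_reconciliation": 24 * 3600,
    "hedis_calculation": 48 * 3600,
    "risk_adjustment_scoring": 72 * 3600,
}

def within_sla(use_case: str, observed_latency_seconds: float) -> bool:
    """Check an observed pipeline latency against the budget for this use case."""
    return observed_latency_seconds <= LATENCY_SLA_SECONDS[use_case]
```

A monitoring job can evaluate `within_sla` per pipeline run and page only when an operational (minutes-level) budget is blown, while analytics pipelines get a softer daily check.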


When Real-Time Streaming is the Right Choice

Use real-time streaming (Kafka, Kinesis, Pub/Sub) for claims when the business outcome depends on acting within minutes of an event:

Fraud detection: Real-time streaming enables pattern matching against incoming claims before adjudication. Detecting a provider billing the same CPT code 50 times in one hour requires stream processing — a batch system will approve the claims before the fraud pattern is visible.

Care management alerting: When a high-risk member's claim signals an ED visit, the care management team needs a near-real-time alert. Kafka consumers on the claims event stream can trigger care alerts within minutes of claim receipt.

Real-time eligibility and benefits: A provider eligibility check at the point of service queries the claims system in real time. The response must come back before the patient leaves the desk.

Streaming Technology Options

# Kafka consumer fraud-detection pattern: flag providers with abnormal hourly claim volume
import json
from confluent_kafka import Consumer

FRAUD_THRESHOLD = 50  # claims per provider per 1-hour window

def hour_bucket(iso_timestamp):
    # Truncate an ISO-8601 timestamp to the hour, e.g. '2026-04-21T14'
    return iso_timestamp[:13]

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'fraud-detection-consumer',
    'auto.offset.reset': 'latest'
})
consumer.subscribe(['healthcare.claims.submitted'])

# Count claims per provider per 1-hour tumbling window
provider_claim_counts = {}

while True:
    message = consumer.poll(timeout=1.0)
    if message is None or message.error():
        continue

    claim = json.loads(message.value())
    provider_npi = claim['billing_provider_npi']
    window_key = f"{provider_npi}:{hour_bucket(claim['received_timestamp'])}"

    provider_claim_counts[window_key] = provider_claim_counts.get(window_key, 0) + 1

    if provider_claim_counts[window_key] > FRAUD_THRESHOLD:
        flag_for_investigation(claim, 'HIGH_VOLUME_BILLING')  # downstream alerting hook

When Batch Processing is the Right Choice

Use batch processing (Airflow + dbt, Spark batch, Glue) for claims when the analysis requires complete, consistent data that only exists after a window closes:

HEDIS measures: HEDIS denominator and numerator logic joins claims, lab results, pharmacy dispensing, and encounter data. This data arrives through multiple pipelines with different latencies. A streaming HEDIS calculation on incomplete data produces wrong answers. Run HEDIS after the claims month close, when all data is present.

HCC risk adjustment: HCC coding requires validated ICD-10 diagnoses from adjudicated claims. Preliminary (not-yet-adjudicated) claims can carry incorrect codes that affect RAF score calculations. Batch processing after adjudication, with medical record validation workflows, produces defensible scores.

Financial close: Monthly and quarterly financial statements require fully adjudicated, reconciled claim data. Streaming does not add value here — the finance team has a defined close process that operates on batch data.

dbt Pipeline for Claims Analytics

-- dbt model: claims silver layer (standardized adjudicated claims)
-- models/silver/claims_adjudicated.sql
{{ config(
    materialized='incremental',
    unique_key='claim_id',
    partition_by={'field': 'service_year_month', 'data_type': 'date'}
) }}

SELECT
  c.claim_id,
  c.enterprise_member_id,
  c.billing_provider_npi,
  c.rendering_provider_npi,
  c.service_from_date,
  c.service_to_date,
  DATE_TRUNC('month', c.service_from_date) AS service_year_month,
  c.icd10_primary_dx,
  c.icd10_secondary_dx_1,
  c.icd10_secondary_dx_2,
  c.primary_cpt_code,
  c.claim_type,               -- PROFESSIONAL, INSTITUTIONAL, DENTAL
  c.place_of_service_code,
  c.adjudication_status,      -- PAID, DENIED, PENDED
  c.paid_amount,
  c.allowed_amount,
  c.member_liability,
  c.denial_reason_code,
  c.adjudicated_at,
  CURRENT_TIMESTAMP           AS dbt_updated_at
FROM {{ source('claims_bronze', 'raw_claim_header') }} c
WHERE c.adjudication_status = 'PAID'
  AND c.service_from_date >= '2020-01-01'

{% if is_incremental() %}
  -- Compare source event time to source event time; filtering on the
  -- warehouse load timestamp (dbt_updated_at) can silently miss late-arriving rows
  AND c.adjudicated_at > (SELECT MAX(adjudicated_at) FROM {{ this }})
{% endif %}

The Decision Framework

Ask these three questions:

1. What is the business action triggered by this data? If the action must happen within minutes (care alert, fraud block, eligibility check), you need streaming. If the action can wait until tomorrow (HEDIS report, financial dashboard, risk score update), batch is appropriate.

2. Does the analysis require data completeness? HEDIS, HCC scoring, and financial reconciliation all require complete data windows. Incomplete streaming data produces wrong answers for these use cases. Use batch for analyses that require completeness guarantees.

3. What is the operational cost of streaming? Real-time streaming requires always-on infrastructure, more complex failure handling, and ongoing monitoring. If the business value does not justify the operational overhead, batch wins on cost and simplicity.
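The three questions can be condensed into a routing sketch. The parameter names (`action_latency_minutes`, `needs_complete_window`, `streaming_value_justified`) are hypothetical labels for the questions above, not a standard decision API:

```python
# Sketch of the three-question framework as a routing helper.
def choose_architecture(action_latency_minutes: float,
                        needs_complete_window: bool,
                        streaming_value_justified: bool = True) -> str:
    """Return 'streaming' or 'batch' for a claims use case."""
    if needs_complete_window:
        return "batch"       # Q2: completeness guarantees require a closed window
    if action_latency_minutes <= 15 and streaming_value_justified:
        return "streaming"   # Q1 + Q3: minutes-level action, worth the ops cost
    return "batch"           # default: batch wins on cost and simplicity

# Examples mapped from the latency table:
choose_architecture(5, needs_complete_window=False)    # fraud detection → streaming
choose_architecture(1440, needs_complete_window=True)  # HEDIS → batch
```

Note the order: the completeness check comes first, because an analysis that needs a closed data window belongs in batch regardless of how fast stakeholders would like the answer.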


Hybrid Architecture: Streaming Ingest, Batch Transformation

The most common production pattern in healthcare:

Claim submitted → Kafka (real-time) → Fraud detection consumer → Flag/approve
                                    → Operational DB consumer → Real-time eligibility
                                    → S3/GCS landing zone (raw storage)
                                             ↓ (batch, hourly or daily)
                                    Airflow → Spark/dbt → Data warehouse (silver/gold)
                                             → HEDIS calculation
                                             → HCC risk scoring
                                             → Financial reconciliation

Stream everything into a durable landing zone. Apply real-time processing only to the consumers that need it. Run batch transformations against the complete dataset for analytics and reporting.
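The landing-zone leg of this pattern is simple to sketch: every raw claim event is persisted by receipt date before any transformation touches it, so batch jobs always have the complete record. The path layout and field names below are assumptions for illustration (in production this would write to S3/GCS, typically via a Kafka sink connector, rather than a local filesystem):

```python
# Sketch of the durable landing-zone write path: append each raw claim
# event to a date-partitioned JSON-lines file, untransformed.
import json
from pathlib import Path

def land_raw_event(claim: dict, root: Path) -> Path:
    """Append a raw claim to a date-partitioned landing file and return its path."""
    day = claim["received_timestamp"][:10]            # e.g. '2026-04-21'
    path = root / f"received_date={day}" / "claims.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(json.dumps(claim) + "\n")
    return path
```

Partitioning by receipt date (rather than service date) keeps the landing zone append-only and makes the hourly or daily batch window trivial to enumerate.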


Key Takeaways

  • Match latency architecture to the business action, not to technology trend. Most healthcare claims analytics tolerates 24-hour batch.
  • Fraud detection, care management alerting, and real-time eligibility require streaming. HEDIS, HCC scoring, and financial reporting require batch.
  • The hybrid architecture (streaming ingest, batch transformation) is the most common and most practical production pattern.
  • dbt handles batch transformation elegantly for claims analytics — incremental models with partition-based processing scale well for multi-year claims data.
  • Before deploying claims SQL to your warehouse, validate query logic and catch anti-patterns with the SQL Linter.

mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.
