Data Architecture

Real-Time vs Batch Processing for Healthcare Claims: Architecture Decision Guide

Not every healthcare claims use case requires real-time processing — and treating them all the same wastes resources and adds complexity. Here is the decision framework for choosing the right architecture.

mdatool Team · April 21, 2026 · 8 min read
Tags: real-time processing, batch processing, healthcare claims, Kafka, Spark, Airflow, dbt

Introduction

The wrong question for healthcare claims data is "should we do real-time or batch?" The right question is "what is the latency requirement for this specific use case?" Claims adjudication requires sub-second decisioning. HEDIS measure calculation tolerates a 24-hour lag. Fraud detection needs near-real-time processing. Financial close reporting can wait for a nightly load. Forcing all of these into a single architecture produces either an over-engineered streaming system for use cases that do not need it or an under-powered batch system that cannot support the ones that do.

This guide provides a decision framework for matching claims processing architecture to actual latency requirements.


Claims Use Cases by Latency Requirement

| Use Case | Latency Requirement | Why |
|---|---|---|
| Real-time adjudication | < 500ms | Provider needs immediate eligibility/benefit determination at point of care |
| Fraud detection | 1–5 minutes | Detecting fraud patterns before claim approval |
| Prior auth status | < 60 seconds | Provider portal checking PA status in real time |
| ED alert / care management | 5–15 minutes | Trigger outreach within hours of high-risk admit |
| Denial management | 2–4 hours | Same-day rework queue for operational teams |
| Financial reconciliation | 24 hours | Daily close, not real-time |
| HEDIS measure calculation | 24–48 hours | Measure logic requires complete adjudication + lab + pharmacy |
| Risk adjustment scoring | 48–72 hours | HCC mapping requires validated diagnosis codes |
| Regulatory reporting | Days–weeks | RADV, EDGE server submissions are periodic |

The pattern is clear: operational and clinical use cases require low latency; analytics and reporting use cases tolerate higher latency. Build your architecture to match this distribution, not to minimize latency across the board.
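One way to make this distribution concrete is to bucket each use case by the upper bound of its latency requirement. The sketch below is illustrative only: the tier names and the two cut-offs (15 minutes and 4 hours) are assumptions chosen to fit the table above, not thresholds from any standard.

```python
from enum import Enum


class Tier(Enum):
    STREAMING = "streaming"
    HOURLY_BATCH = "hourly batch"
    DAILY_BATCH = "daily batch"


# Hypothetical cut-offs, chosen only for illustration.
STREAMING_MAX_SECONDS = 15 * 60       # the action must happen within minutes
HOURLY_BATCH_MAX_SECONDS = 4 * 3600   # same-day operational queues


def tier_for(latency_seconds: float) -> Tier:
    """Map a latency requirement (in seconds) to a processing tier."""
    if latency_seconds <= STREAMING_MAX_SECONDS:
        return Tier.STREAMING
    if latency_seconds <= HOURLY_BATCH_MAX_SECONDS:
        return Tier.HOURLY_BATCH
    return Tier.DAILY_BATCH


# Upper bound of each latency requirement from the table, in seconds.
USE_CASES = {
    "real_time_adjudication": 0.5,
    "prior_auth_status": 60,
    "fraud_detection": 5 * 60,
    "ed_alert": 15 * 60,
    "denial_management": 4 * 3600,
    "financial_reconciliation": 24 * 3600,
    "hedis_measures": 48 * 3600,
    "risk_adjustment": 72 * 3600,
}

tiers = {name: tier_for(seconds) for name, seconds in USE_CASES.items()}

for name, tier in tiers.items():
    print(f"{name}: {tier.value}")
```

With these assumed cut-offs, half of the listed use cases land in the daily batch tier, which is the point of the paragraph above: only part of the workload needs always-on infrastructure.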


When Real-Time Streaming is the Right Choice

Use real-time streaming (Kafka, Kinesis, Pub/Sub) for claims when the business outcome depends on acting within minutes of an event:

Fraud detection: Real-time streaming enables pattern matching against incoming claims before adjudication. Detecting a provider billing the same CPT code 50 times in one hour requires stream processing — a batch system will approve the claims before the fraud pattern is visible.

Care management alerting: When a high-risk member's claim signals an ED visit, the care management team needs a near-real-time alert. Kafka consumers on the claims event stream can trigger care alerts within minutes of claim receipt.

Real-time eligibility and benefits: A provider eligibility check at the point of service queries the claims system in real time. The response must come back before the patient leaves the desk.
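The care management alerting case above can be sketched as a small filter that a stream consumer would call for each claim event. This is a minimal illustration under stated assumptions: it uses CMS place-of-service code 23 (Emergency Room - Hospital) as the ED signal, which applies to professional claims only, and the `high_risk_members` set and `CareAlert` type are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional, Set

# CMS place-of-service code 23 is "Emergency Room - Hospital".
# Institutional claims signal ED visits differently; that is out of scope here.
ED_PLACE_OF_SERVICE = "23"


@dataclass(frozen=True)
class CareAlert:
    member_id: str
    claim_id: str
    reason: str


def care_alert_for(claim: dict, high_risk_members: Set[str]) -> Optional[CareAlert]:
    """Return an alert when a high-risk member has an ED claim, else None."""
    if claim.get("place_of_service_code") != ED_PLACE_OF_SERVICE:
        return None
    member_id = claim.get("enterprise_member_id")
    if member_id not in high_risk_members:
        return None
    return CareAlert(member_id, claim["claim_id"], "HIGH_RISK_ED_VISIT")


# Example: only the first claim is both an ED visit and a high-risk member.
high_risk = {"M1"}
claims = [
    {"claim_id": "C1", "enterprise_member_id": "M1", "place_of_service_code": "23"},
    {"claim_id": "C2", "enterprise_member_id": "M1", "place_of_service_code": "11"},
    {"claim_id": "C3", "enterprise_member_id": "M2", "place_of_service_code": "23"},
]
alerts = [a for a in (care_alert_for(c, high_risk) for c in claims) if a is not None]
```

Keeping the decision in a pure function makes it easy to unit test without a running broker; the consumer loop only has to deserialize the message and forward any alert it gets back.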

Streaming Technology Options

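# Fraud detection with a plain Kafka consumer (confluent-kafka Python client).
# Sketch only: the threshold and the two helper functions are placeholders.
import json
from datetime import datetime

from confluent_kafka import Consumer

FRAUD_THRESHOLD = 50  # claims per provider per hour; illustrative value

def hour_bucket(timestamp):
    # Truncate an ISO-8601 timestamp to its hour, e.g. '2026-04-21T14'
    return datetime.fromisoformat(timestamp).strftime('%Y-%m-%dT%H')

def flag_for_investigation(claim, reason):
    # Placeholder: send the claim to your investigation queue
    print(f"FLAG {reason}: claim {claim.get('claim_id')}")

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'fraud-detection-consumer',
    'auto.offset.reset': 'latest'
})
consumer.subscribe(['healthcare.claims.submitted'])

# Count claims per provider per 1-hour tumbling window.
# This dict lives in memory: it is lost on restart and never evicts old windows.
provider_claim_counts = {}

try:
    while True:
        message = consumer.poll(1.0)  # wait up to 1 second for a message
        if message is None or message.error():
            continue  # skip empty polls and transport errors
        claim = json.loads(message.value())
        provider_npi = claim['billing_provider_npi']
        window_key = f"{provider_npi}:{hour_bucket(claim['received_timestamp'])}"

        provider_claim_counts[window_key] = provider_claim_counts.get(window_key, 0) + 1

        if provider_claim_counts[window_key] > FRAUD_THRESHOLD:
            flag_for_investigation(claim, 'HIGH_VOLUME_BILLING')
finally:
    consumer.close()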
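The counting logic in the consumer above keeps one dictionary entry per provider per hour for as long as the process runs. A possible refinement, sketched here without any Kafka dependency so it can be tested in isolation, is a counter that drops closed windows and reports a provider only once per window. The class name and the "flag once" behaviour are assumptions for illustration, and the sketch assumes events arrive roughly in timestamp order.

```python
from collections import defaultdict
from datetime import datetime, timezone


class TumblingWindowCounter:
    """Counts events per key in fixed one-hour windows and evicts closed windows."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self._counts = defaultdict(int)  # (key, window_start) -> count
        self._current_window = None

    @staticmethod
    def _window_start(ts: datetime) -> datetime:
        # Truncate the timestamp to the start of its hour.
        return ts.replace(minute=0, second=0, microsecond=0)

    def add(self, key: str, ts: datetime) -> bool:
        """Record one event; return True only when the key first exceeds the threshold."""
        window = self._window_start(ts)
        if self._current_window is None or window > self._current_window:
            # A newer window opened: keep only counts that belong to it or later.
            kept = {k: v for k, v in self._counts.items() if k[1] >= window}
            self._counts = defaultdict(int, kept)
            self._current_window = window
        self._counts[(key, window)] += 1
        return self._counts[(key, window)] == self.threshold + 1


# Example with a low threshold so the behaviour is easy to follow.
counter = TumblingWindowCounter(threshold=2)
two_pm = datetime(2026, 4, 21, 14, 5, tzinfo=timezone.utc)
results = [counter.add("1234567890", two_pm) for _ in range(4)]
```

In a multi-instance deployment this state would still need to live in a shared or partition-local store; the sketch only addresses memory growth within one process.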

When Batch Processing is the Right Choice

Use batch processing (Airflow + dbt, Spark batch, Glue) for claims when the analysis requires complete, consistent data that only exists after a window closes:

HEDIS measures: HEDIS denominator and numerator logic joins claims, lab results, pharmacy dispensing, and encounter data. This data arrives through multiple pipelines with different latencies. A streaming HEDIS calculation on incomplete data produces wrong answers. Run HEDIS after the claims month close, when all data is present.

HCC risk adjustment: HCC coding requires validated ICD-10 diagnoses from adjudicated claims. Preliminary (not-yet-adjudicated) claims can carry incorrect codes that affect RAF score calculations. Batch processing after adjudication, with medical record validation workflows, produces defensible scores.

Financial close: Monthly and quarterly financial statements require fully adjudicated, reconciled claim data. Streaming does not add value here — the finance team has a defined close process that operates on batch data.
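The completeness requirement described above for HEDIS can be enforced with a simple gate that runs before the batch job starts. The sketch below assumes each feed reports the latest date it has loaded through; the feed names mirror the data sources named in the HEDIS paragraph, and `loaded_through` is a hypothetical structure that in practice might come from pipeline metadata or an orchestration tool.

```python
from datetime import date
from typing import Dict, List

# Feeds that HEDIS logic joins, per the description above.
REQUIRED_FEEDS = ("claims", "lab_results", "pharmacy", "encounters")


def missing_feeds(loaded_through: Dict[str, date], period_end: date) -> List[str]:
    """Return the required feeds whose data does not yet cover period_end."""
    return [
        feed
        for feed in REQUIRED_FEEDS
        if feed not in loaded_through or loaded_through[feed] < period_end
    ]


def ready_for_hedis(loaded_through: Dict[str, date], period_end: date) -> bool:
    """True only when every required feed has landed through period_end."""
    return not missing_feeds(loaded_through, period_end)


# Example: pharmacy is three days behind and encounters has not reported at all.
period_end = date(2026, 3, 31)
status = {
    "claims": date(2026, 3, 31),
    "lab_results": date(2026, 3, 31),
    "pharmacy": date(2026, 3, 28),
}
blocked_by = missing_feeds(status, period_end)
```

A gate like this turns "run after the month closes" from a calendar convention into a checked precondition, so a late pharmacy feed delays the measure run instead of silently skewing it.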

dbt Pipeline for Claims Analytics

-- dbt model: claims silver layer (standardized adjudicated claims)
-- models/silver/claims_adjudicated.sql
-- Written for BigQuery: partition_by and DATE_TRUNC(date, MONTH) are BigQuery syntax.
{{ config(
    materialized='incremental',
    unique_key='claim_id',
    partition_by={'field': 'service_year_month', 'data_type': 'date', 'granularity': 'month'}
) }}

SELECT
  c.claim_id,
  c.enterprise_member_id,
  c.billing_provider_npi,
  c.rendering_provider_npi,
  c.service_from_date,
  c.service_to_date,
  DATE_TRUNC(c.service_from_date, MONTH) AS service_year_month,
  c.icd10_primary_dx,
  c.icd10_secondary_dx_1,
  c.icd10_secondary_dx_2,
  c.primary_cpt_code,
  c.claim_type,               -- PROFESSIONAL, INSTITUTIONAL, DENTAL
  c.place_of_service_code,
  c.adjudication_status,      -- source values: PAID, DENIED, PENDED (filtered to PAID below)
  c.paid_amount,
  c.allowed_amount,
  c.member_liability,
  c.denial_reason_code,
  c.adjudicated_at,           -- high-water mark for incremental runs
  CURRENT_TIMESTAMP()         AS dbt_updated_at
FROM {{ source('claims_bronze', 'raw_claim_header') }} c
WHERE c.adjudication_status = 'PAID'
  AND c.service_from_date >= DATE '2020-01-01'

{% if is_incremental() %}
  -- Compare source timestamps to source timestamps, not to the dbt run time
  AND c.adjudicated_at > (SELECT MAX(adjudicated_at) FROM {{ this }})
{% endif %}
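For readers less familiar with incremental models, the behaviour of the `is_incremental()` filter combined with `unique_key` can be approximated in a few lines of Python. This is a conceptual sketch, not how dbt is implemented: it assumes the high-water mark is the latest `adjudicated_at` already present in the target, and the function name is hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def incremental_merge(target: Dict[str, dict], source_rows: List[dict]) -> int:
    """Upsert source rows newer than the target's high-water mark.

    `target` maps claim_id to the stored row. Returns the number of rows merged.
    """
    # High-water mark: latest adjudication timestamp already in the target.
    watermark: Optional[datetime] = max(
        (row["adjudicated_at"] for row in target.values()), default=None
    )
    merged = 0
    for row in source_rows:
        if watermark is None or row["adjudicated_at"] > watermark:
            # unique_key semantics: a matching claim_id is replaced, not duplicated.
            target[row["claim_id"]] = row
            merged += 1
    return merged


# First run: the target is empty, so every source row is loaded.
target: Dict[str, dict] = {}
day_one = [{"claim_id": "A", "adjudicated_at": datetime(2026, 4, 1), "paid_amount": 10}]
first_run = incremental_merge(target, day_one)

# Second run: claim A was re-adjudicated and claim B is new.
day_two = day_one + [
    {"claim_id": "A", "adjudicated_at": datetime(2026, 4, 2), "paid_amount": 12},
    {"claim_id": "B", "adjudicated_at": datetime(2026, 4, 2), "paid_amount": 5},
]
second_run = incremental_merge(target, day_two)
```

The second run skips the unchanged row, replaces claim A with its re-adjudicated version, and adds claim B, which is the same outcome an incremental run with a unique key aims for on the warehouse side.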

The Decision Framework

Ask these three questions:

1. What is the business action triggered by this data? If the action must happen within minutes (care alert, fraud block, eligibility check), you need streaming. If the action can wait until tomorrow (HEDIS report, financial dashboard, risk score update), batch is appropriate.

2. Does the analysis require data completeness? HEDIS, HCC scoring, and financial reconciliation all require complete data windows. Incomplete streaming data produces wrong answers for these use cases. Use batch for analyses that require completeness guarantees.

3. What is the operational cost of streaming? Real-time streaming requires always-on infrastructure, more complex failure handling, and ongoing monitoring. If the business value does not justify the operational overhead, batch wins on cost and simplicity.
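The three questions above can be written down as a small decision function, which is sometimes useful for documenting why a given pipeline ended up where it did. This is a sketch under assumptions: the `UseCase` fields, the 15-minute cut-off for "within minutes", and the rule that completeness overrides latency are one reasonable reading of the framework, not a definitive encoding of it.

```python
from dataclasses import dataclass

STREAMING_DEADLINE_MINUTES = 15  # hypothetical cut-off for "within minutes"


@dataclass(frozen=True)
class UseCase:
    name: str
    action_deadline_minutes: float   # Q1: how soon must the business action happen?
    needs_complete_window: bool      # Q2: is the analysis wrong on partial data?
    value_justifies_always_on: bool  # Q3: does the value cover streaming overhead?


def choose_architecture(use_case: UseCase) -> str:
    """Return 'streaming' or 'batch' for a single use case."""
    # Q2 first: completeness requirements rule out streaming regardless of urgency.
    if use_case.needs_complete_window:
        return "batch"
    # Q1 and Q3: stream only if the action is urgent and the overhead is justified.
    urgent = use_case.action_deadline_minutes <= STREAMING_DEADLINE_MINUTES
    if urgent and use_case.value_justifies_always_on:
        return "streaming"
    return "batch"


fraud = UseCase("fraud_detection", 5, False, True)
hedis = UseCase("hedis_measures", 48 * 60, True, True)
dashboard = UseCase("ops_dashboard", 10, False, False)

decisions = {uc.name: choose_architecture(uc) for uc in (fraud, hedis, dashboard)}
```

The third example shows the cost question doing real work: a dashboard that would like ten-minute freshness still lands on batch when nobody can justify the always-on infrastructure.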


Hybrid Architecture: Streaming Ingest, Batch Transformation

The most common production pattern in healthcare:

Claim submitted
  → Kafka (real-time)
      → Fraud detection consumer → Flag/approve
      → Operational DB consumer → Real-time eligibility
      → S3/GCS landing zone (raw storage)
          ↓ (batch, hourly or daily)
        Airflow → Spark/dbt → Data warehouse (silver/gold)
          → HEDIS calculation
          → HCC risk scoring
          → Financial reconciliation

Stream everything into a durable landing zone. Apply real-time processing only to the consumers that need it. Run batch transformations against the complete dataset for analytics and reporting.
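The shape of this hybrid pattern can be shown with an in-memory stand-in: every event is written to the landing zone, real-time consumers see events as they arrive, and a batch function later reads the complete landing zone. This is a toy sketch, not a deployment recipe. In production the fan-out would be done by the broker and the landing zone would be object storage; `ClaimEventRouter` and `nightly_claim_counts` are hypothetical names.

```python
from typing import Callable, Dict, List


class ClaimEventRouter:
    """Stores every event durably and fans out only to registered real-time consumers."""

    def __init__(self) -> None:
        self.landing_zone: List[dict] = []  # stands in for S3/GCS raw storage
        self.realtime_consumers: Dict[str, Callable[[dict], None]] = {}

    def subscribe(self, name: str, consumer: Callable[[dict], None]) -> None:
        self.realtime_consumers[name] = consumer

    def publish(self, event: dict) -> None:
        # Every event lands in raw storage, whether or not anyone consumes it live.
        self.landing_zone.append(event)
        for consumer in self.realtime_consumers.values():
            consumer(event)


def nightly_claim_counts(landing_zone: List[dict]) -> Dict[str, int]:
    """Batch step: count claims per billing provider over the complete dataset."""
    counts: Dict[str, int] = {}
    for event in landing_zone:
        npi = event["billing_provider_npi"]
        counts[npi] = counts.get(npi, 0) + 1
    return counts


router = ClaimEventRouter()
seen_by_fraud: List[str] = []
router.subscribe("fraud_detection", lambda event: seen_by_fraud.append(event["claim_id"]))

router.publish({"claim_id": "C1", "billing_provider_npi": "1234567890"})
router.publish({"claim_id": "C2", "billing_provider_npi": "1234567890"})

batch_counts = nightly_claim_counts(router.landing_zone)
```

The design choice worth noticing is that the batch step never depends on the real-time consumers: it reads raw storage, so adding or removing a streaming consumer does not change what analytics sees.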


Key Takeaways

  • Match the architecture's latency to the business action, not to technology trends. Most healthcare claims analytics tolerates a 24-hour batch cycle.
  • Fraud detection, care management alerting, and real-time eligibility require streaming. HEDIS, HCC scoring, and financial reporting require batch.
  • The hybrid architecture (streaming ingest, batch transformation) is the most common and most practical production pattern.
  • dbt handles batch transformation elegantly for claims analytics — incremental models with partition-based processing scale well for multi-year claims data.
  • Before deploying claims SQL to your warehouse, validate query logic and catch anti-patterns with the SQL Linter.