Introduction
The wrong question for healthcare claims data is "should we do real-time or batch?" The right question is "what is the latency requirement for this specific use case?" Claims adjudication requires sub-second decisioning. [HEDIS](/terms/HEDIS) measure calculation tolerates a 24-hour lag. Fraud detection needs near-real-time. Financial close reporting can wait for a nightly load. Conflating these into a single architecture produces either an over-engineered streaming system for use cases that do not need it or an under-powered batch system that cannot support the ones that do.
This guide provides a decision framework for matching claims processing architecture to actual latency requirements.
Claims Use Cases by Latency Requirement
| Use Case | Latency Requirement | Why |
|---|---|---|
| Real-time adjudication | < 500ms | Provider needs immediate eligibility/benefit determination at point of care |
| Fraud detection | 1–5 minutes | Detecting fraud patterns before claim approval |
| Prior auth status | < 60 seconds | Provider portal checking PA status in real time |
| ED alert / care management | 5–15 minutes | Trigger outreach within hours of high-risk admit |
| Denial management | 2–4 hours | Same-day rework queue for operational teams |
| Financial reconciliation | 24 hours | Daily close, not real-time |
| HEDIS measure calculation | 24–48 hours | Measure logic requires complete adjudication + lab + pharmacy |
| Risk adjustment scoring | 48–72 hours | HCC mapping requires validated diagnosis codes |
| Regulatory reporting | Days–weeks | RADV, EDGE server submissions are periodic |
The pattern is clear: operational and clinical use cases require low latency; analytics and reporting use cases tolerate higher latency. Build your architecture to match this distribution, not to minimize latency across the board.
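As an illustrative sketch only, the table above can be condensed into a simple routing rule. The tier names and thresholds here are our own assumptions, not an industry standard:

```python
# Illustrative only: route a use case to a processing tier by its
# maximum tolerable latency in seconds. Tier names and cutoffs are
# assumptions drawn from the table above, not a standard taxonomy.
def processing_tier(max_latency_seconds: float) -> str:
    if max_latency_seconds < 1:
        return "synchronous"   # real-time adjudication, point-of-care eligibility
    if max_latency_seconds <= 15 * 60:
        return "streaming"     # fraud detection, PA status, ED alerts
    if max_latency_seconds <= 4 * 3600:
        return "micro-batch"   # denial management rework queues
    return "batch"             # HEDIS, risk adjustment, financial close

print(processing_tier(0.5))    # "synchronous"
print(processing_tier(300))    # "streaming"
print(processing_tier(86400))  # "batch"
```

The point of the sketch is the shape of the distribution: only a narrow band of use cases actually lands in the synchronous and streaming tiers.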
When Real-Time Streaming is the Right Choice
Use real-time streaming (Kafka, Kinesis, Pub/Sub) for claims when the business outcome depends on acting within minutes of an event:
Fraud detection: Real-time streaming enables pattern matching against incoming claims before adjudication. Detecting a provider billing the same CPT code 50 times in one hour requires stream processing — a batch system will approve the claims before the fraud pattern is visible.
Care management alerting: When a high-risk member's claim signals an ED visit, the care management team needs a near-real-time alert. Kafka consumers on the claims event stream can trigger care alerts within minutes of claim receipt.
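The filtering logic behind such an alert consumer is simple. The sketch below shows only the predicate; field names and the high-risk registry are assumptions about the claim schema, and in production this would run inside a Kafka consumer loop like the fraud-detection example further down:

```python
# Sketch of the care-management alert predicate. The claim field names
# and the in-memory high-risk registry are assumptions, not a standard
# schema; production code would run this inside a streaming consumer.
ED_PLACE_OF_SERVICE = '23'  # CMS place-of-service code: emergency room, hospital

def is_ed_alert(claim: dict, high_risk_members: set) -> bool:
    """True when an incoming claim should trigger care-management outreach."""
    return (claim.get('place_of_service_code') == ED_PLACE_OF_SERVICE
            and claim.get('enterprise_member_id') in high_risk_members)

high_risk = {'M-1001', 'M-2002'}
ed_claim = {'enterprise_member_id': 'M-1001', 'place_of_service_code': '23'}
office_claim = {'enterprise_member_id': 'M-1001', 'place_of_service_code': '11'}

print(is_ed_alert(ed_claim, high_risk))      # True
print(is_ed_alert(office_claim, high_risk))  # False
```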
Real-time eligibility and benefits: A provider eligibility check at the point of service queries the claims system in real time. The response must come back before the patient leaves the front desk.
Streaming Technology Options
```python
# Kafka fraud detection pattern: count claims per provider per
# 1-hour tumbling window and flag high-volume billing.
import json

from confluent_kafka import Consumer

FRAUD_THRESHOLD = 50  # claims per provider per hour

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'fraud-detection-consumer',
    'auto.offset.reset': 'latest',
})
consumer.subscribe(['healthcare.claims.submitted'])

def hour_bucket(ts: str) -> str:
    # Truncate an ISO-8601 timestamp to its hour, e.g. '2024-05-01T14'
    return ts[:13]

provider_claim_counts = {}
while True:
    message = consumer.poll(timeout=1.0)
    if message is None or message.error():
        continue
    claim = json.loads(message.value())
    provider_npi = claim['billing_provider_npi']
    window_key = f"{provider_npi}:{hour_bucket(claim['received_timestamp'])}"
    provider_claim_counts[window_key] = provider_claim_counts.get(window_key, 0) + 1
    if provider_claim_counts[window_key] > FRAUD_THRESHOLD:
        # flag_for_investigation is left to the implementer: write to an
        # investigation queue, pend the claim, page the SIU team, etc.
        flag_for_investigation(claim, 'HIGH_VOLUME_BILLING')
```
When Batch Processing is the Right Choice
Use batch processing (Airflow + dbt, Spark batch, Glue) for claims when the analysis requires complete, consistent data that only exists after a window closes:
HEDIS measures: HEDIS denominator and numerator logic joins claims, lab results, pharmacy dispensing, and encounter data. This data arrives through multiple pipelines with different latencies. A streaming HEDIS calculation on incomplete data produces wrong answers. Run HEDIS after the claims month close, when all data is present.
HCC risk adjustment: HCC coding requires validated ICD-10 diagnoses from adjudicated claims. Preliminary (not-yet-adjudicated) claims can carry incorrect codes that affect RAF score calculations. Batch processing after adjudication, with medical record validation workflows, produces defensible scores.
Financial close: Monthly and quarterly financial statements require fully adjudicated, reconciled claim data. Streaming does not add value here — the finance team has a defined close process that operates on batch data.
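The completeness requirement that runs through all three cases can be expressed as a gate in front of the batch job: do not start the run until every upstream feed has closed for the measurement window. A toy sketch, with illustrative feed names:

```python
# Toy completeness gate before a HEDIS (or close) batch run: only kick
# off the calculation once every required upstream feed has landed for
# the window. Feed names here are illustrative, not a standard list.
REQUIRED_FEEDS = {'claims', 'lab_results', 'pharmacy', 'encounters'}

def ready_for_batch_run(loaded_feeds: set) -> bool:
    """True when all required feeds for the window are present."""
    return REQUIRED_FEEDS <= loaded_feeds

print(ready_for_batch_run({'claims', 'lab_results'}))  # False
print(ready_for_batch_run({'claims', 'lab_results', 'pharmacy', 'encounters'}))  # True
```

In an orchestrator such as Airflow, this is the role a sensor or external-task dependency plays ahead of the transformation DAG.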
dbt Pipeline for Claims Analytics
```sql
-- dbt model: claims silver layer (standardized adjudicated claims)
-- models/silver/claims_adjudicated.sql
{{ config(
    materialized='incremental',
    unique_key='claim_id',
    partition_by={'field': 'service_year_month', 'data_type': 'date'}
) }}

SELECT
    c.claim_id,
    c.enterprise_member_id,
    c.billing_provider_npi,
    c.rendering_provider_npi,
    c.service_from_date,
    c.service_to_date,
    DATE_TRUNC('month', c.service_from_date) AS service_year_month,
    c.icd10_primary_dx,
    c.icd10_secondary_dx_1,
    c.icd10_secondary_dx_2,
    c.primary_cpt_code,
    c.claim_type,              -- PROFESSIONAL, INSTITUTIONAL, DENTAL
    c.place_of_service_code,
    c.adjudication_status,     -- PAID, DENIED, PENDED
    c.paid_amount,
    c.allowed_amount,
    c.member_liability,
    c.denial_reason_code,
    CURRENT_TIMESTAMP AS dbt_updated_at
FROM {{ source('claims_bronze', 'raw_claim_header') }} c
WHERE c.adjudication_status = 'PAID'
  AND c.service_from_date >= '2020-01-01'
{% if is_incremental() %}
  AND c.adjudicated_at > (SELECT MAX(dbt_updated_at) FROM {{ this }})
{% endif %}
```
The Decision Framework
Ask these three questions:
1. What is the business action triggered by this data? If the action must happen within minutes (care alert, fraud block, eligibility check), you need streaming. If the action can wait until tomorrow (HEDIS report, financial dashboard, risk score update), batch is appropriate.
2. Does the analysis require data completeness? HEDIS, HCC scoring, and financial reconciliation all require complete data windows. Incomplete streaming data produces wrong answers for these use cases. Use batch for analyses that require completeness guarantees.
3. What is the operational cost of streaming? Real-time streaming requires always-on infrastructure, more complex failure handling, and ongoing monitoring. If the business value does not justify the operational overhead, batch wins on cost and simplicity.
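The three questions above can be condensed into a toy decision rule. This is purely illustrative: real architecture decisions weigh more factors than three booleans, and the precedence shown is our own assumption:

```python
# The three framework questions as a toy decision rule. Illustrative
# only: the precedence (completeness first, then latency, then cost)
# is an assumption, not a prescription.
def choose_architecture(action_within_minutes: bool,
                        needs_complete_window: bool,
                        streaming_value_exceeds_cost: bool) -> str:
    if needs_complete_window:
        return "batch"       # completeness guarantees trump latency
    if action_within_minutes and streaming_value_exceeds_cost:
        return "streaming"
    return "batch"           # default to the simpler, cheaper option

# Fraud blocking: minutes matter, no completeness requirement, high value
print(choose_architecture(True, False, True))   # "streaming"
# HEDIS: requires a complete claims month, so latency is moot
print(choose_architecture(False, True, True))   # "batch"
```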
Hybrid Architecture: Streaming Ingest, Batch Transformation
The most common production pattern in healthcare:
```
Claim submitted → Kafka (real-time) → Fraud detection consumer → Flag/approve
                                    → Operational DB consumer  → Real-time eligibility
                                    → S3/GCS landing zone (raw storage)
                                          ↓ (batch, hourly or daily)
                                        Airflow → Spark/dbt → Data warehouse (silver/gold)
                                                            → HEDIS calculation
                                                            → HCC risk scoring
                                                            → Financial reconciliation
```
Stream everything into a durable landing zone. Apply real-time processing only to the consumers that need it. Run batch transformations against the complete dataset for analytics and reporting.
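The landing-zone half of this pattern comes down to a deterministic object key: every raw event is archived to object storage, partitioned by receipt date and hour, so batch jobs can later replay a complete window. A sketch, assuming an ISO-8601 `received_timestamp` field and a path layout of our own invention:

```python
# Sketch of the landing-zone write path in the hybrid pattern. The
# bucket name, path layout, and claim fields are assumptions; the idea
# is that date/hour partitioning lets hourly or daily batch jobs pick
# up exactly the window they need.
from datetime import datetime

def landing_zone_key(claim: dict, bucket: str = "claims-landing") -> str:
    """Build an S3/GCS object key for a raw claim event."""
    ts = datetime.fromisoformat(claim['received_timestamp'])
    return (f"{bucket}/raw/claims/"
            f"dt={ts:%Y-%m-%d}/hour={ts:%H}/{claim['claim_id']}.json")

claim = {'claim_id': 'CLM-0001',
         'received_timestamp': '2024-05-01T14:22:09'}
print(landing_zone_key(claim))
# claims-landing/raw/claims/dt=2024-05-01/hour=14/CLM-0001.json
```

A Kafka consumer (or a connector such as a sink to S3/GCS) writes each event to this key; the downstream Airflow DAG then processes `dt=.../hour=...` partitions as complete units.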
Key Takeaways
- Match latency architecture to the business action, not to technology trend. Most healthcare claims analytics tolerates 24-hour batch.
- Fraud detection, care management alerting, and real-time eligibility require streaming. HEDIS, HCC scoring, and financial reporting require batch.
- The hybrid architecture (streaming ingest, batch transformation) is the most common and most practical production pattern.
- dbt handles batch transformation elegantly for claims analytics — incremental models with partition-based processing scale well for multi-year claims data.
- Before deploying claims SQL to your warehouse, validate query logic and catch anti-patterns with the SQL Linter.
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.