Introduction

Healthcare data teams face a critical architectural decision when building pipelines: write raw SQL directly against source systems, or adopt dbt (data build tool) as a transformation layer. Both approaches work — but they have very different implications for maintainability, auditability, and compliance in regulated environments.

This guide breaks down both options so you can make the right call for your team.

What Is dbt?

dbt is an open-source transformation framework that sits between your raw source data and your analytics-ready tables. You write SQL SELECT statements, dbt wraps them into materialized models (tables or views), and it handles dependency ordering, incremental loading, and documentation automatically.

The key shift: in dbt, your SQL is version-controlled, tested, and documented by design. In raw SQL pipelines, those properties have to be bolted on manually.

The Case for Raw SQL in Healthcare Pipelines

Raw SQL still has a legitimate place in healthcare data work — especially for:

One-off analyses where building a dbt model adds overhead with no reuse benefit
Stored procedures that handle complex transactional logic (claims adjudication, eligibility checks)
Legacy environments where dbt cannot be installed or approved by IT/security
Near-real-time pipelines where dbt's batch-oriented model creates latency issues
Small teams where the overhead of a dbt project outweighs the benefit

The honest truth: raw SQL is faster to write the first time. It requires no tooling setup, no project structure, and no learning curve. For a one-person team building ad hoc reports, dbt may be overkill.

The Case for dbt in Healthcare Pipelines

Healthcare data has properties that make dbt especially valuable.

Auditability and Lineage

HIPAA, HITRUST, and CMS reporting requirements all demand that you can explain where a number came from. dbt's lineage graph gives you a visual, automated map of every table dependency — from raw claims to a final KPI — without having to document it separately.

Raw SQL pipelines built over years typically have no equivalent. When a metric changes unexpectedly, hunting through dozens of stored procedures and ETL jobs is slow and error-prone.

Built-in Data Testing

dbt's schema.yml tests let you assert that:

Every member_id in your encounters table exists in your members table
diagnosis_code values conform to ICD-10 format
clm_paid_amt is never negative
Date fields like admission_dt are never null for inpatient records

models:
  - name: fct_claims
    columns:
      - name: clm_id
        tests:
          - not_null
          - unique
      - name: member_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_members')
              field: member_id
      - name: diagnosis_code
        tests:
          - not_null

Raw SQL pipelines can test too — but tests have to be written, scheduled, and maintained separately. dbt makes them first-class citizens.

For the raw SQL and stored procedures that sit outside your dbt project, use the mdatool SQL Linter to catch missing aliases, ambiguous column references, and healthcare-specific anti-patterns before they reach production.

Modular, Reusable Transformations

Healthcare data models tend to share a lot of logic: member eligibility windows, provider NPI lookups, ICD-10 grouper mappings, plan benefit structures. In raw SQL, this logic gets duplicated across scripts. One change requires hunting down every script that copies it.

When building NPI lookup models, validate provider NPI numbers against the NPPES registry using the mdatool NPI Lookup tool before they enter your staging layer — bad NPIs in dim_providers silently break every downstream join that depends on them.

In dbt, shared logic lives in ref() models or macros:

-- models/mart/fct_inpatient_claims.sql
SELECT
    c.clm_id,
    c.member_id,
    m.member_nm,
    c.admission_dt,
    c.discharge_dt,
    c.drg_code,
    c.billed_amt,
    c.paid_amt
FROM {{ ref('stg_claims') }} c
LEFT JOIN {{ ref('dim_members') }} m
    ON c.member_id = m.member_id
WHERE c.clm_type = 'IP'

Change stg_claims once, and the update propagates through the entire downstream dependency tree automatically.

Version Control and Peer Review

Raw SQL pipelines often live in shared database schemas, local files, or BI tool editors. Changes are invisible until something breaks. dbt projects live in Git — every transformation change is a commit, reviewed in a pull request, tied to a ticket.

For compliance-heavy environments, this audit trail is invaluable.

Side-by-Side Comparison

Dimension	Raw SQL	dbt
Time to first result	Fast	Slower (setup required)
Lineage documentation	Manual	Automatic
Data testing	Manual / external	Built-in (schema.yml)
Code reuse	Copy-paste	ref() and macros
Version control	Optional	Enforced by design
Incremental loads	Manual	Native support
Team collaboration	Hard to scale	Git-based workflow
Learning curve	Low	Moderate
HIPAA auditability	Possible but manual	Structured by default
Best for	Ad hoc, legacy, real-time	Batch analytics, reporting

For definitions of HIPAA, claims adjudication, DRG, HCC, and other healthcare terms that appear throughout your dbt models, the mdatool Healthcare Data Dictionary is a free reference built specifically for data engineers and analysts.

🧮HCC Calculator

Calculate RAF scores and estimate risk adjustment payments for Medicare Advantage members.

Try it free

Tired of legacy complexity and high pricing?

mdatool offers instant DDL conversion, HL7 support, and AI-driven data modeling for a fraction of the cost of ER/Studio or ERwin.

Try mdatool for Free

Where Healthcare Teams Get It Wrong

Mistake 1: Choosing one approach for everything

The most mature healthcare data teams use both. Raw SQL (or stored procedures) handles transactional logic and real-time feeds. dbt handles the analytics layer — member cohort tables, claims summaries, quality measure calculations, cost reports.

Mistake 2: Starting dbt too late

Teams that delay adopting dbt often have 3–5 years of raw SQL spaghetti to untangle before they can migrate. The earlier you standardize on dbt for analytics-layer transformations, the cheaper the migration becomes.

Mistake 3: Skipping dbt tests for "known good" data

Healthcare source data is never "known good." Claims systems have duplicate IDs. Member eligibility files have overlapping date ranges. Pharmacy data has missing NDC codes. dbt tests are cheap insurance against assumptions that fail in production.

While ERwin requires a complex setup for schema generation, you can generate clean DDL in seconds using our free converter.

Convert your first 5 DDLs — No Credit Card Required

Recommended Architecture for Healthcare Data Pipelines

Raw Source (EHR, Claims, Lab, Pharmacy)
        ↓
  Ingestion Layer (Fivetran, Airbyte, Custom ETL)
        ↓
  Staging Layer     — dbt stg_ models (light cleaning only)
        ↓
  Intermediate Layer — dbt int_ models (business logic)
        ↓
  Mart Layer        — dbt fct_ / dim_ models (analytics-ready)
        ↓
  BI / Reporting (Tableau, Looker, Power BI)

Raw SQL lives in the ingestion layer and in stored procedures that handle transactional operations. dbt owns everything from staging to mart.

When drafting the DDL for your mart layer tables, use the mdatool DDL Converter to generate warehouse-specific CREATE TABLE statements for Snowflake, Redshift, or BigQuery. Before deploying, run your DDL through the mdatool Naming Auditor to catch inconsistent column naming across your fct_ and dim_ models — especially useful when multiple engineers are contributing models.

🔄DDL Converter

Convert DDL instantly between Snowflake, BigQuery, Databricks, and PostgreSQL.

Try it free

Conclusion

For most healthcare data teams building analytics pipelines, dbt wins on maintainability, auditability, and team scalability. But raw SQL isn't going away — it remains the right tool for transactional logic, real-time feeds, and legacy environments where dbt can't be introduced.

The best teams don't debate which one to use. They define the boundary clearly, apply each where it fits, and use dbt's testing and lineage features to ensure the analytics layer can be trusted in a regulated environment.

As you build out your pipeline, the mdatool SQL Linter helps validate raw SQL and stored procedures outside your dbt project, the mdatool Naming Auditor enforces consistent column naming across all your models, and the mdatool Healthcare Data Dictionary gives your full team — engineers, analysts, and PMs — a shared vocabulary for the healthcare concepts your pipelines process.

dbt vs Raw SQL for Healthcare Data Pipelines

Introduction

What Is dbt?

The Case for Raw SQL in Healthcare Pipelines

The Case for dbt in Healthcare Pipelines

Auditability and Lineage

Built-in Data Testing

Modular, Reusable Transformations

Version Control and Peer Review

Side-by-Side Comparison

Tired of legacy complexity and high pricing?

Where Healthcare Teams Get It Wrong

Mistake 1: Choosing one approach for everything

Mistake 2: Starting dbt too late

Mistake 3: Skipping dbt tests for "known good" data

Recommended Architecture for Healthcare Data Pipelines

Conclusion

More in Data Engineering

Ready to improve your data architecture?