Introduction
Healthcare data teams face a critical architectural decision when building pipelines: write raw SQL directly against source systems, or adopt dbt (data build tool) as a transformation layer. Both approaches work — but they have very different implications for maintainability, auditability, and compliance in regulated environments.
This guide breaks down both options so you can make the right call for your team.
What Is dbt?
dbt is an open-source transformation framework that sits between your raw source data and your analytics-ready tables. You write SQL SELECT statements, dbt wraps them into materialized models (tables or views), and it handles dependency ordering, incremental loading, and documentation automatically.
The key shift: in dbt, your SQL is version-controlled, tested, and documented by design. In raw SQL pipelines, those properties have to be bolted on manually.
The Case for Raw SQL in Healthcare Pipelines
Raw SQL still has a legitimate place in healthcare data work — especially for:
- One-off analyses where building a dbt model adds overhead with no reuse benefit
- Stored procedures that handle complex transactional logic (claims adjudication, eligibility checks)
- Legacy environments where dbt cannot be installed or approved by IT/security
- Near-real-time pipelines where dbt's batch-oriented model creates latency issues
- Small teams where the overhead of a dbt project outweighs the benefit
The honest truth: raw SQL is faster to write the first time. It requires no tooling setup, no project structure, and no learning curve. For a one-person team building ad hoc reports, dbt may be overkill.
The Case for dbt in Healthcare Pipelines
Healthcare data has properties that make dbt especially valuable.
Auditability and Lineage
HIPAA, HITRUST, and CMS reporting requirements all demand that you can explain where a number came from. dbt's lineage graph gives you a visual, automated map of every table dependency — from raw claims to a final KPI — without having to document it separately.
Raw SQL pipelines built over years typically have no equivalent. When a metric changes unexpectedly, hunting through dozens of stored procedures and ETL jobs is slow and error-prone.
Built-in Data Testing
dbt's schema.yml tests let you assert that:
- Every
member_idin your encounters table exists in your members table diagnosis_codevalues conform to ICD-10 formatclaim_paid_amountis never negative- Date fields like
admission_dateare never null for inpatient records
models: - name: fct_claims columns: - name: claim_id tests: - not_null - unique - name: member_id tests: - not_null - relationships: to: ref('dim_members') field: member_id - name: diagnosis_code tests: - not_null
Raw SQL pipelines can test too — but tests have to be written, scheduled, and maintained separately. dbt makes them first-class citizens.
For the raw SQL and stored procedures that sit outside your dbt project, use the mdatool SQL Linter to catch missing aliases, ambiguous column references, and healthcare-specific anti-patterns before they reach production.
Modular, Reusable Transformations
Healthcare data models tend to share a lot of logic: member eligibility windows, provider NPI lookups, ICD-10 grouper mappings, plan benefit structures. In raw SQL, this logic gets duplicated across scripts. One change requires hunting down every script that copies it.
When building NPI lookup models, validate provider NPI numbers against the NPPES registry using the mdatool NPI Lookup tool before they enter your staging layer — bad NPIs in dim_providers silently break every downstream join that depends on them.
In dbt, shared logic lives in ref() models or macros:
-- models/mart/fct_inpatient_claims.sql SELECT c.claim_id, c.member_id, m.member_name, c.admission_date, c.discharge_date, c.drg_code, c.billed_amount, c.paid_amount FROM {{ ref('stg_claims') }} c LEFT JOIN {{ ref('dim_members') }} m ON c.member_id = m.member_id WHERE c.claim_type = 'IP'
Change stg_claims once, and the update propagates through the entire downstream dependency tree automatically.
Version Control and Peer Review
Raw SQL pipelines often live in shared database schemas, local files, or BI tool editors. Changes are invisible until something breaks. dbt projects live in Git — every transformation change is a commit, reviewed in a pull request, tied to a ticket.
For compliance-heavy environments, this audit trail is invaluable.
Side-by-Side Comparison
| Dimension | Raw SQL | dbt |
|---|---|---|
| Time to first result | Fast | Slower (setup required) |
| Lineage documentation | Manual | Automatic |
| Data testing | Manual / external | Built-in (schema.yml) |
| Code reuse | Copy-paste | ref() and macros |
| Version control | Optional | Enforced by design |
| Incremental loads | Manual | Native support |
| Team collaboration | Hard to scale | Git-based workflow |
| Learning curve | Low | Moderate |
| HIPAA auditability | Possible but manual | Structured by default |
| Best for | Ad hoc, legacy, real-time | Batch analytics, reporting |
For definitions of HIPAA, claims adjudication, DRG, HCC, and other healthcare terms that appear throughout your dbt models, the mdatool Healthcare Data Dictionary is a free reference built specifically for data engineers and analysts.
Where Healthcare Teams Get It Wrong
Mistake 1: Choosing one approach for everything
The most mature healthcare data teams use both. Raw SQL (or stored procedures) handles transactional logic and real-time feeds. dbt handles the analytics layer — member cohort tables, claims summaries, quality measure calculations, cost reports.
Mistake 2: Starting dbt too late
Teams that delay adopting dbt often have 3–5 years of raw SQL spaghetti to untangle before they can migrate. The earlier you standardize on dbt for analytics-layer transformations, the cheaper the migration becomes.
Mistake 3: Skipping dbt tests for "known good" data
Healthcare source data is never "known good." Claims systems have duplicate IDs. Member eligibility files have overlapping date ranges. Pharmacy data has missing NDC codes. dbt tests are cheap insurance against assumptions that fail in production.
Recommended Architecture for Healthcare Data Pipelines
Raw Source (EHR, Claims, Lab, Pharmacy)
↓
Ingestion Layer (Fivetran, Airbyte, Custom ETL)
↓
Staging Layer — dbt stg_ models (light cleaning only)
↓
Intermediate Layer — dbt int_ models (business logic)
↓
Mart Layer — dbt fct_ / dim_ models (analytics-ready)
↓
BI / Reporting (Tableau, Looker, Power BI)
Raw SQL lives in the ingestion layer and in stored procedures that handle transactional operations. dbt owns everything from staging to mart.
When drafting the DDL for your mart layer tables, use the mdatool DDL Converter to generate warehouse-specific CREATE TABLE statements for Snowflake, Redshift, or BigQuery. Before deploying, run your DDL through the mdatool Naming Auditor to catch inconsistent column naming across your fct_ and dim_ models — especially useful when multiple engineers are contributing models.
Conclusion
For most healthcare data teams building analytics pipelines, dbt wins on maintainability, auditability, and team scalability. But raw SQL isn't going away — it remains the right tool for transactional logic, real-time feeds, and legacy environments where dbt can't be introduced.
The best teams don't debate which one to use. They define the boundary clearly, apply each where it fits, and use dbt's testing and lineage features to ensure the analytics layer can be trusted in a regulated environment.
As you build out your pipeline, the mdatool SQL Linter helps validate raw SQL and stored procedures outside your dbt project, the mdatool Naming Auditor enforces consistent column naming across all your models, and the mdatool Healthcare Data Dictionary gives your full team — engineers, analysts, and PMs — a shared vocabulary for the healthcare concepts your pipelines process.
mdatool Team
The mdatool team builds free tools for healthcare data engineers — DDL converters, SQL linters, naming auditors, and data modeling guides.
Ready to improve your data architecture?
Free tools for DDL conversion, SQL analysis, naming standards, and more.