
Healthcare Metadata Generator: Generate Data Dictionary, Business Glossary & PHI Governance Report from Any SQL Schema

Paste any healthcare SQL schema and instantly generate a complete data dictionary, business glossary, HIPAA PHI classification, dbt YAML, and data governance report — powered by 100,000+ healthcare term definitions.

mdatool Team · June 23, 2026 · 12 min read
Tags: metadata, PHI, dbt, governance, healthcare data


If you've ever inherited a healthcare data warehouse with zero documentation, you know the pain. Hundreds of columns like mbr_elig_ind, clm_adj_rsn_cd, and prvdr_credntl_sts — and not a single description in sight.

Today we're launching the Healthcare Metadata Generator — a free tool that turns any SQL schema into a complete data dictionary in under one second.


What It Does

Paste any healthcare schema (a CREATE TABLE statement, a plain column list, BigQuery DDL, or even a MongoDB JSON Schema) and instantly get:

1. Business Metadata: Every column gets a plain-English business description drawn from our 100,000+ ISO-11179 standard healthcare term definitions. These are not generic AI guesses but real healthcare definitions written by data engineers for data engineers.

2. HIPAA PHI Classification: Every column is automatically checked against all 18 HIPAA Safe Harbor identifiers and assigned a severity:

  • 🔴 CRITICAL — SSN, MBI, MRN (encrypt immediately)
  • 🟠 HIGH — Names, DOB, phone, email (mask in non-production)
  • 🟡 MEDIUM — Member IDs, dates, ZIP codes (review required)
  • ✅ NONE — Codes, amounts, indicators (standard governance)

3. dbt YAML: Ready-to-paste dbt documentation, including column descriptions, PHI meta tags, and auto-generated tests (unique, not_null, accepted_values).

4. Governance Report: A full Markdown governance report with a PHI summary, Snowflake dynamic data masking DDL, and per-column HIPAA compliance notes.

5. Data Dictionary Export: Download a CSV or Markdown data dictionary with every field a data governance team needs: table name, entity name, column name, ISO standard name, attribute name, business definition, data type, PHI category, masking requirement, and confidence score.

6. ISO-11179 Rename SQL: If your schema uses non-standard names such as memberEligibilityIndicator or DATE_OF_BIRTH, the tool generates the ALTER TABLE rename SQL to bring them in line with ISO-11179 healthcare data naming standards.
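As a rough illustration of item 6, the Python sketch below turns a mapping from old names to standard names into ALTER TABLE statements. The mapping, its target names, and the `rename_statements` helper are hypothetical; in the tool the target names come from its term database, not from a hand-written dict.

```python
# Hypothetical sketch: emit RENAME COLUMN statements from an old -> new name map.
def rename_statements(table: str, renames: dict[str, str]) -> list[str]:
    """Return one ALTER TABLE statement per column whose name actually changes."""
    stmts = []
    for old, new in renames.items():
        if old != new:  # columns that already conform need no statement
            stmts.append(f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};")
    return stmts


# Illustrative mapping only; the target names are assumptions.
renames = {
    "memberEligibilityIndicator": "mbr_elig_ind",
    "DATE_OF_BIRTH": "mbr_dob",
    "mbr_id": "mbr_id",
}
for stmt in rename_statements("dim_member", renames):
    print(stmt)
```

The `ALTER TABLE ... RENAME COLUMN ... TO ...` form is accepted by Snowflake, PostgreSQL, Oracle, and BigQuery; SQL Server uses `sp_rename` instead, so a multi-platform generator would branch on the detected platform.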


Who Is This For

  • Healthcare data engineers documenting new tables before they hit production
  • dbt developers who need schema.yml descriptions fast
  • Data governance teams auditing PHI exposure across a data warehouse
  • New team members trying to understand an undocumented schema
  • Healthcare analytics consultants onboarding to a client's data environment

How It Works

Step 1 — Paste Your DDL

The tool accepts several input formats. The most common is a standard CREATE TABLE statement:

CREATE TABLE dim_member (
  mbr_key        INTEGER        NOT NULL,
  mbr_id         VARCHAR(50)    NOT NULL,
  mbr_first_nm   VARCHAR(100),
  mbr_dob        DATE,
  mbr_ssn        VARCHAR(9),
  mbr_elig_ind   BOOLEAN,
  mbr_raf_scr    DECIMAL(10,3),
  load_dt        TIMESTAMP
);

It also handles Snowflake DDL with COMMENT fields, BigQuery DDL with backtick table names, Oracle VARCHAR2 and NUMBER types, MongoDB JSON Schema, plain column lists, and CSV headers.

Step 2 — Resolution Pipeline

For each column the tool runs a 9-step resolution pipeline:

  1. Composite term lookup — checks 111 known healthcare composites (dos, raf, mbi, hcc, etc.)
  2. Database lookup — queries 100,000+ mdatool term definitions by column abbreviation
  3. Entity prefix extraction — identifies mbr, prvdr, clm, rx, fac, and 150+ other healthcare entity prefixes
  4. Suffix extraction — maps _ind, _cd, _amt, _dt, _scr and 85+ other data type suffixes
  5. Attribute resolution — resolves elig, enrl, raf, dob and 280+ attribute abbreviations
  6. PHI detection — applies HIPAA 18-identifier rules with severity classification
  7. Confidence scoring — calculates 0-99% confidence using all resolution signals
  8. Description generation — builds business descriptions from DB lookup or pattern matching
  9. AI fallback — for columns scoring below 40%, Claude generates contextual descriptions using full table context
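Steps 3 to 5 amount to splitting a name on underscores and looking each token up in a dictionary. Here is a minimal Python sketch, assuming tiny hand-written lookup tables in place of the tool's 150+ prefixes, 280+ attributes, and 85+ suffixes; `resolve` and the table contents are illustrative, not mdatool's code.

```python
# Illustrative stand-ins for the tool's much larger lookup tables (hypothetical)
ENTITY_PREFIXES = {"mbr": "Member", "prvdr": "Provider", "clm": "Claim",
                   "rx": "Prescription", "fac": "Facility"}
ATTRIBUTES = {"elig": "Eligibility", "enrl": "Enrollment",
              "raf": "Risk Adjustment Factor", "dob": "Date of Birth",
              "adj": "Adjustment", "rsn": "Reason", "first": "First"}
SUFFIXES = {"ind": "Indicator", "cd": "Code", "amt": "Amount", "dt": "Date",
            "scr": "Score", "nm": "Name", "id": "Identifier"}


def resolve(column: str) -> dict:
    """Split a snake_case name into entity prefix, attribute tokens, and type suffix."""
    tokens = column.lower().split("_")
    # Step 3: a leading entity prefix such as mbr or clm
    entity = ENTITY_PREFIXES.get(tokens[0])
    if entity:
        tokens = tokens[1:]
    # Step 4: a trailing data-type suffix such as _ind or _cd
    suffix = SUFFIXES.get(tokens[-1]) if tokens else None
    if suffix:
        tokens = tokens[:-1]
    # Step 5: whatever remains should be attribute abbreviations
    attributes = [ATTRIBUTES.get(t) for t in tokens]
    unresolved = [t for t, a in zip(tokens, attributes) if a is None]
    words = [w for w in (entity, *attributes, suffix) if w]
    return {"entity": entity, "suffix": suffix,
            "label": " ".join(words), "unresolved": unresolved}


print(resolve("clm_adj_rsn_cd")["label"])  # Claim Adjustment Reason Code
print(resolve("mbr_raf_scr")["label"])     # Member Risk Adjustment Factor Score
```

Tokens that no table recognizes (such as `load` in `load_dt`) end up in `unresolved`. In a pipeline shaped like the one above, those are the columns that would score low and fall through to the AI step.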

Step 3 — Get Your Output

Results appear in under 100ms. Average confidence across a standard healthcare schema is 90-99% for ISO-11179 compliant column names.


Example Output

For mbr_ssn VARCHAR(9):

Business Description: The nine-digit Social Security Number assigned to a health plan member by the Social Security Administration.

PHI Category: 🔴 CRITICAL
HIPAA Identifier: Social Security Numbers (PHI #7)
Masking Required: YES
Confidence: 99% (DB Lookup)

Snowflake masking expression (salted SHA-256 hash):

SHA2(mbr_ssn || 'SALT_KEY', 256) AS mbr_ssn

dbt YAML:

- name: mbr_ssn
  description: "The nine-digit Social Security Number..."
  meta:
    phi: true
    phi_category: "CRITICAL"
    masking_required: true
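A column entry in that shape can be rendered with nothing but the standard library. This sketch uses a hypothetical `dbt_column_yaml` helper and assumes masking is required for the CRITICAL and HIGH tiers only; that matches the tiers described earlier in this post but is not a documented rule of the tool.

```python
import json


def dbt_column_yaml(name: str, description: str, phi_category: str) -> str:
    """Render one schema.yml column entry with PHI meta tags (hypothetical helper)."""
    is_phi = phi_category != "NONE"
    # Assumption: masking is required for the CRITICAL and HIGH tiers only
    masking = phi_category in ("CRITICAL", "HIGH")
    return "\n".join([
        f"- name: {name}",
        # json.dumps emits a double-quoted string, which YAML also accepts
        f"  description: {json.dumps(description)}",
        "  meta:",
        f"    phi: {'true' if is_phi else 'false'}",
        f"    phi_category: {json.dumps(phi_category)}",
        f"    masking_required: {'true' if masking else 'false'}",
    ])


print(dbt_column_yaml(
    "mbr_ssn",
    "The nine-digit Social Security Number assigned to a health plan member "
    "by the Social Security Administration.",
    "CRITICAL",
))
```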

For mbr_raf_scr DECIMAL(10,3):

Business Description: The Risk Adjustment Factor score calculated for an individual health plan member, representing their predicted healthcare cost relative to the average Medicare beneficiary.

PHI Category: ✅ NONE
Confidence: 99% (DB Lookup)
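The severity tiers can be approximated with keyword rules over name tokens. The Python sketch below is a deliberately naive, hypothetical classifier (`PHI_RULES` and `phi_category` are illustrative names), and it checks tiers from most to least severe so that a column like `mbr_mbi_id` lands in CRITICAL rather than MEDIUM.

```python
# Hypothetical keyword rules for the four tiers, ordered most to least severe
PHI_RULES = [
    ("CRITICAL", {"ssn", "mbi", "mrn"}),
    ("HIGH", {"nm", "name", "dob", "phone", "email"}),
    ("MEDIUM", {"id", "dt", "zip"}),
]


def phi_category(column: str) -> str:
    """Return the first (most severe) tier whose keywords appear among the name's tokens."""
    tokens = set(column.lower().split("_"))
    for category, keywords in PHI_RULES:
        if tokens & keywords:
            return category
    return "NONE"


for col in ["mbr_key", "mbr_id", "mbr_first_nm", "mbr_dob",
            "mbr_ssn", "mbr_elig_ind", "mbr_raf_scr", "load_dt"]:
    print(col, phi_category(col))
```

Rules this simple misfire in both directions: `load_dt` comes back MEDIUM even though a load timestamp is not tied to a patient. That is the kind of case a confidence score and a Needs Review flag exist to catch.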


Supported Platforms

The tool auto-detects your database platform from DDL syntax and normalizes accordingly:

| Platform | Detection signals | Type normalization |
|---|---|---|
| Snowflake | TIMESTAMP_NTZ, VARIANT, CREATE OR REPLACE | STRING → VARCHAR |
| BigQuery | Backtick table names, INT64, FLOAT64, STRUCT | INT64 → BIGINT |
| Oracle | VARCHAR2, NUMBER, CLOB | NUMBER → DECIMAL |
| SQL Server | NVARCHAR, DATETIME2, UNIQUEIDENTIFIER | NVARCHAR → VARCHAR |
| PostgreSQL | SERIAL, JSONB, BYTEA | SERIAL → INTEGER |
| MongoDB | JSON Schema with type/properties keys | JSON types → SQL types |
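Detection of this kind can be approximated by counting which platform's signals appear in the DDL. In this hypothetical Python sketch the regexes come from the table above, except that `CREATE OR REPLACE` and Oracle's `NUMBER` are left out because other platforms use them too (Snowflake also has a `NUMBER` type, for example). Backticks are not unique to BigQuery either, so a production detector would need to weigh signals more carefully.

```python
import re
from typing import Optional

# Hypothetical signal regexes taken from the platform table; ambiguous signals omitted
PLATFORM_SIGNALS = {
    "Snowflake": [r"\bTIMESTAMP_NTZ\b", r"\bVARIANT\b"],
    "BigQuery": [r"`[^`]+`", r"\bINT64\b", r"\bFLOAT64\b", r"\bSTRUCT\s*<"],
    "Oracle": [r"\bVARCHAR2\b", r"\bCLOB\b"],
    "SQL Server": [r"\bNVARCHAR\b", r"\bDATETIME2\b", r"\bUNIQUEIDENTIFIER\b"],
    "PostgreSQL": [r"\bSERIAL\b", r"\bJSONB\b", r"\bBYTEA\b"],
}

# One example mapping per platform, from the Type normalization column
TYPE_NORMALIZATION = {
    "Snowflake": {"STRING": "VARCHAR"},
    "BigQuery": {"INT64": "BIGINT"},
    "Oracle": {"NUMBER": "DECIMAL"},
    "SQL Server": {"NVARCHAR": "VARCHAR"},
    "PostgreSQL": {"SERIAL": "INTEGER"},
}


def detect_platform(ddl: str) -> Optional[str]:
    """Return the platform with the most matching signals, or None if none match."""
    scores = {
        platform: sum(1 for p in patterns if re.search(p, ddl, re.IGNORECASE))
        for platform, patterns in PLATFORM_SIGNALS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def normalize_type(platform: str, sql_type: str) -> str:
    """Map a platform type to its generic equivalent, keeping any (precision, scale)."""
    name, paren, params = sql_type.upper().partition("(")
    name = name.strip()
    return TYPE_NORMALIZATION.get(platform, {}).get(name, name) + paren + params


ddl = "CREATE TABLE `proj.ds.dim_member` (mbr_key INT64, mbr_raf_scr FLOAT64)"
platform = detect_platform(ddl)
print(platform, normalize_type(platform, "INT64"))
```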

What Makes This Different From ChatGPT

You can ask ChatGPT to describe healthcare columns. It will give you something reasonable. But it won't:

  • Pull from 100,000+ curated healthcare-specific definitions
  • Apply HIPAA 18-identifier PHI rules with masking DDL
  • Generate Snowflake column-level security policies
  • Produce ISO-11179 compliant rename scripts
  • Give you a confidence score so you know what to review
  • Export a complete data dictionary CSV with 18 fields per column
  • Flag mbr_mbi_id as CRITICAL PHI and explain exactly why

The Healthcare Metadata Generator is purpose-built for healthcare data engineering — not a general-purpose AI that happens to know some medical terms.


Supported Schema Types

Beyond SQL DDL, the tool handles:

camelCase (FHIR / MongoDB / Java)

memberId → member_id → "Member identifier"
dateOfBirth → date_of_birth → "Date of birth — PHI"
eligibilityIndicator → elig_ind → "Eligibility indicator"

UPPER_SNAKE (Oracle legacy)

MBR_ELIG_IND → mbr_elig_ind → "Member eligibility indicator"
PRVDR_NPI → prvdr_npi → "National Provider Identifier"
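The first hop in each of these examples, converting camelCase or UPPER_SNAKE to lower snake_case, is mechanical. Below is a minimal Python sketch using two regular expressions (`to_snake_case` is a hypothetical helper); the abbreviation hop, such as `eligibility_indicator` to `elig_ind`, would need a dictionary lookup and is not shown.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert camelCase or UPPER_SNAKE identifiers to lower snake_case."""
    # Put an underscore between a lowercase letter or digit and an uppercase letter
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Split an acronym run from a following capitalized word: "NPIValue" -> "NPI_Value"
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    return s.lower()


for name in ["memberId", "dateOfBirth", "eligibilityIndicator", "MBR_ELIG_IND", "PRVDR_NPI"]:
    print(name, "->", to_snake_case(name))
```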

Plain column list (no DDL)

mbr_id
mbr_first_nm
dos
raf
dual

CSV headers

member_id,first_name,date_of_birth,ssn,elig_ind,zip_code
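Splitting these last two inputs into column names is straightforward. This sketch (the `parse_columns` helper is hypothetical) treats a single comma-separated line as a CSV header row and anything else as one name per line.

```python
import csv
import io


def parse_columns(text: str) -> list[str]:
    """Return column names from a one-per-line list or a single CSV header row."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) == 1 and "," in lines[0]:
        # A single comma-separated line is treated as a CSV header row
        return [c.strip() for c in next(csv.reader(io.StringIO(lines[0])))]
    return lines


print(parse_columns("member_id,first_name,date_of_birth,ssn,elig_ind,zip_code"))
print(parse_columns("mbr_id\nmbr_first_nm\ndos\nraf\ndual\n"))
```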

The Data Dictionary Export

The CSV export includes every field your data governance team needs:

| Field | Example |
|---|---|
| Table Name | dim_member |
| Entity Name | Member |
| Column Name | mbr_ssn |
| ISO Standard Name | mbr_ssn |
| Attribute Name | Social Security Number |
| Business Definition | The nine-digit SSN assigned... |
| Data Type | VARCHAR |
| Size / Format | Variable |
| PHI / PII | YES |
| PHI Category | CRITICAL |
| PHI Explanation | Contains SSN — highest risk PHI... |
| Masking Required | YES |
| Governance Tag | PHI:CRITICAL |
| Confidence Score | 99% |
| Resolution Method | DB Lookup (mdatool) |
| Needs Review | NO |

This CSV opens directly in Excel and can be imported into data catalog tools like Alation, Collibra, or Atlan.
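For teams that want to produce the same layout from their own scripts, Python's `csv.DictWriter` handles quoting for values that contain commas. The headers below are copied from the export table; `write_dictionary` is a hypothetical helper, and any field a row does not supply is written as an empty cell.

```python
import csv
import io

# Header names copied from the export table above
FIELDS = [
    "Table Name", "Entity Name", "Column Name", "ISO Standard Name",
    "Attribute Name", "Business Definition", "Data Type", "Size / Format",
    "PHI / PII", "PHI Category", "PHI Explanation", "Masking Required",
    "Governance Tag", "Confidence Score", "Resolution Method", "Needs Review",
]


def write_dictionary(rows: list[dict]) -> str:
    """Serialize data dictionary rows to CSV text; missing fields become empty cells."""
    buf = io.StringIO()
    # restval="" fills any FIELDS key a row does not supply
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(write_dictionary([{
    "Table Name": "dim_member",
    "Column Name": "mbr_ssn",
    "PHI Category": "CRITICAL",
    "Masking Required": "YES",
}]))
```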


What's Coming Next

  • Auto-learning: Columns that get AI-generated descriptions are tracked. The most common unknown columns are automatically added to the term database so future users get 99% confidence instead of AI fallback.
  • Batch processing: Upload multiple DDL files at once for warehouse-wide documentation
  • Atlan / Collibra export: Direct push to your data catalog via API
  • dbt package: Run the metadata generator directly from your dbt project

Try the Healthcare Metadata Generator

The Healthcare Metadata Generator is completely free — no login required. Paste your schema and get results in under one second.




mdatool Team

The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.
