Healthcare Metadata Generator: Generate Data Dictionary, Business Glossary & PHI Governance Report from Any SQL Schema
If you've ever inherited a healthcare data warehouse with zero documentation, you know the pain. Hundreds of columns like mbr_elig_ind, clm_adj_rsn_cd, and prvdr_credntl_sts — and not a single description in sight.
Today we're launching the Healthcare Metadata Generator — a free tool that turns any SQL schema into a complete data dictionary in under one second.
What It Does
Paste any healthcare SQL schema — a CREATE TABLE statement, a plain column list, a BigQuery DDL, or even a MongoDB JSON schema — and instantly get:
1. Business Metadata
Every column gets a plain-English business description drawn from our 100,000+ ISO-11179 standard healthcare term definitions. These are not generic AI guesses, but real healthcare definitions written by data engineers for data engineers.
2. HIPAA PHI Classification
Every column is automatically classified against all 18 HIPAA Safe Harbor identifiers:
- 🔴 CRITICAL — SSN, MBI, MRN (encrypt immediately)
- 🟠 HIGH — Names, DOB, phone, email (mask in non-production)
- 🟡 MEDIUM — Member IDs, dates, ZIP codes (review required)
- ✅ NONE — Codes, amounts, indicators (standard governance)
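As a rough illustration of how the four tiers above could be assigned, here is a minimal sketch that matches name tokens against keyword sets. The `PHI_RULES` table and the `classify_phi` function are hypothetical and only mirror the examples in the list; the tool's actual rule set is not shown in this post.

```python
import re

# Hypothetical keyword table: tokens and tiers mirror the list above,
# not the tool's real HIPAA rule set.
PHI_RULES = [
    ("CRITICAL", {"ssn", "mbi", "mrn"}),
    ("HIGH", {"nm", "name", "dob", "phone", "email"}),
    ("MEDIUM", {"id", "dt", "date", "zip"}),
]

def classify_phi(column_name: str) -> str:
    """Return the first (most severe) tier whose keywords appear in the name."""
    tokens = set(re.split(r"[_\W]+", column_name.lower()))
    for severity, keywords in PHI_RULES:
        if tokens & keywords:
            return severity
    return "NONE"

for col in ["mbr_ssn", "mbr_first_nm", "mbr_id", "mbr_elig_ind"]:
    print(col, classify_phi(col))
```

Because the rules are checked from most to least severe, a column such as `mbr_mbi_id` lands in CRITICAL even though it also contains the MEDIUM token `id`.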
3. dbt YAML
Ready-to-paste dbt documentation including column descriptions, PHI meta tags, and auto-generated tests (unique, not_null, accepted_values).
4. Governance Report
A full Markdown governance report with PHI summary, Snowflake dynamic data masking DDL, and HIPAA compliance notes per column.
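To give a sense of what Snowflake dynamic data masking DDL looks like, the sketch below builds a `CREATE MASKING POLICY` statement and attaches it to a column. The policy naming scheme, the `PHI_READER` role, and the `masking_policy_ddl` helper are assumptions made for illustration; the report's actual output may differ.

```python
def masking_policy_ddl(table: str, column: str, allowed_role: str = "PHI_READER") -> str:
    """Build Snowflake DDL that hashes a string column for all but one role.

    The policy name and the default role are hypothetical placeholders.
    """
    policy = f"{column}_mask"
    return (
        f"CREATE OR REPLACE MASKING POLICY {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ('{allowed_role}') THEN val\n"
        f"       ELSE SHA2(val, 256) END;\n"
        f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy};"
    )

print(masking_policy_ddl("dim_member", "mbr_ssn"))
```

A policy object keeps the masking logic in one place, so the same rule can be attached to every column that shares a PHI category.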
5. Data Dictionary Export
Download a CSV or Markdown data dictionary with every field a data governance team needs: table name, entity name, column name, ISO standard name, attribute name, business definition, data type, PHI category, masking requirement, and confidence score.
6. ISO-11179 Rename SQL
If your schema uses non-standard naming like memberEligibilityIndicator or DATE_OF_BIRTH, the tool generates the ALTER TABLE rename SQL to bring it in line with ISO-11179 healthcare data naming standards.
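The rename script itself is simple to picture. The sketch below turns a mapping of current names to target names into `ALTER TABLE ... RENAME COLUMN` statements. The `rename_statements` helper and the target names in the example are illustrative assumptions, and identifier quoting is left out; SQL Server, for instance, uses `sp_rename` instead of this syntax.

```python
def rename_statements(table, renames):
    """Emit one RENAME COLUMN statement per column whose name changes.

    `renames` maps current column names to target names (hypothetical input).
    """
    return [
        f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
        for old, new in renames.items()
        if old != new  # already-compliant columns need no statement
    ]

# Illustrative target names only.
example = {
    "memberEligibilityIndicator": "mbr_elig_ind",
    "DATE_OF_BIRTH": "date_of_birth",
    "mbr_id": "mbr_id",
}
for stmt in rename_statements("dim_member", example):
    print(stmt)
```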
Who Is This For
- Healthcare data engineers documenting new tables before they hit production
- dbt developers who need schema.yml descriptions fast
- Data governance teams auditing PHI exposure across a data warehouse
- New team members trying to understand an undocumented schema
- Healthcare analytics consultants onboarding to a client's data environment
How It Works
Step 1 — Paste Your DDL
The tool accepts any format:
CREATE TABLE dim_member (
    mbr_key INTEGER NOT NULL,
    mbr_id VARCHAR(50) NOT NULL,
    mbr_first_nm VARCHAR(100),
    mbr_dob DATE,
    mbr_ssn VARCHAR(9),
    mbr_elig_ind BOOLEAN,
    mbr_raf_scr DECIMAL(10,3),
    load_dt TIMESTAMP
);
It also handles Snowflake DDL with COMMENT fields, BigQuery DDL with backtick table names, Oracle VARCHAR2 and NUMBER types, MongoDB JSON Schema, plain column lists, and CSV headers.
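For readers curious about the first step, here is a minimal sketch of pulling column names and types out of a simple `CREATE TABLE` statement with regular expressions. The `parse_columns` function is hypothetical and deliberately naive: it does not handle table-level constraints, comments, or nested types, which a production parser would need to.

```python
import re

DDL = """
CREATE TABLE dim_member (
    mbr_key INTEGER NOT NULL,
    mbr_id VARCHAR(50) NOT NULL,
    mbr_raf_scr DECIMAL(10,3),
    load_dt TIMESTAMP
);
"""

def parse_columns(ddl: str):
    """Return (table, [(column, type), ...]) for one simple CREATE TABLE."""
    match = re.search(r'CREATE\s+TABLE\s+[`"]?([\w.]+)[`"]?\s*\((.*)\)', ddl, re.S | re.I)
    if not match:
        raise ValueError("no CREATE TABLE statement found")
    table, body = match.group(1), match.group(2)
    columns = []
    # Split on commas outside parentheses so DECIMAL(10,3) stays in one piece.
    for part in re.split(r",(?![^()]*\))", body):
        tokens = part.split()
        if len(tokens) >= 2:
            columns.append((tokens[0], tokens[1]))
    return table, columns

print(parse_columns(DDL))
```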
Step 2 — Resolution Pipeline
For each column the tool runs a 9-step resolution pipeline:
- Composite term lookup — checks 111 known healthcare composites (dos, raf, mbi, hcc, etc.)
- Database lookup — queries 100,000+ mdatool term definitions by column abbreviation
- Entity prefix extraction — identifies mbr, prvdr, clm, rx, fac, and 150+ other healthcare entity prefixes
- Suffix extraction — maps _ind, _cd, _amt, _dt, _scr and 85+ other data type suffixes
- Attribute resolution — resolves elig, enrl, raf, dob and 280+ attribute abbreviations
- PHI detection — applies HIPAA 18-identifier rules with severity classification
- Confidence scoring — calculates 0-99% confidence using all resolution signals
- Description generation — builds business descriptions from DB lookup or pattern matching
- AI fallback — for columns scoring below 40%, Claude generates contextual descriptions using full table context
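The prefix, suffix, and attribute steps above can be sketched in a few lines. Everything in this example is an assumption made for illustration: the three small dictionaries stand in for the much larger lists the pipeline uses, and the scoring formula (share of resolved tokens, capped at 99) is invented, since the post does not describe how confidence is actually calculated. Only the 40% fallback threshold comes from the list above.

```python
# Hypothetical mini-dictionaries standing in for the tool's real term lists.
ENTITY_PREFIXES = {"mbr": "Member", "prvdr": "Provider", "clm": "Claim"}
SUFFIXES = {"ind": "indicator", "cd": "code", "amt": "amount", "dt": "date", "scr": "score"}
ATTRIBUTES = {"elig": "eligibility", "enrl": "enrollment", "raf": "risk adjustment factor"}

def resolve(column: str):
    """Expand a snake_case column name and score how much of it was recognized."""
    parts = column.lower().split("_")
    words, hits = [], 0
    for i, part in enumerate(parts):
        if i == 0 and part in ENTITY_PREFIXES:
            words.append(ENTITY_PREFIXES[part])
            hits += 1
        elif i == len(parts) - 1 and part in SUFFIXES:
            words.append(SUFFIXES[part])
            hits += 1
        elif part in ATTRIBUTES:
            words.append(ATTRIBUTES[part])
            hits += 1
        else:
            words.append(part)  # unresolved token is kept as-is
    # Invented scoring rule: percentage of resolved tokens, capped at 99.
    confidence = min(99, round(100 * hits / len(parts)))
    return " ".join(words), confidence

def needs_ai_fallback(confidence: int) -> bool:
    """Columns scoring below 40% go to the AI fallback step."""
    return confidence < 40

print(resolve("mbr_elig_ind"))
print(resolve("mbr_xyz_cd"))
```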
Step 3 — Get Your Output
Results appear in under 100ms. Average confidence across a standard healthcare schema is 90-99% for ISO-11179 compliant column names.
Example Output
For mbr_ssn VARCHAR(9):
Business Description: The nine-digit Social Security Number assigned to a health plan member by the Social Security Administration.
PHI Category: 🔴 CRITICAL
HIPAA Identifier: Social Security Numbers (PHI #7)
Masking Required: YES
Confidence: 99% (DB Lookup)
Snowflake Masking DDL:
SHA2(mbr_ssn || 'SALT_KEY', 256) AS mbr_ssn
dbt YAML:
- name: mbr_ssn
  description: "The nine-digit Social Security Number..."
  meta:
    phi: true
    phi_category: "CRITICAL"
    masking_required: true
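A column entry like the one above is straightforward to render from resolved metadata. The sketch below is one possible way to do it using only the standard library; the `dbt_column_yaml` function and its parameters are hypothetical. It uses the `tests:` key for column tests, which newer dbt versions also accept under the name `data_tests:`.

```python
import json

def dbt_column_yaml(name, description, phi_category, masking_required, tests=()):
    """Render one dbt column entry as YAML text, without a YAML library."""
    def flag(value):
        return "true" if value else "false"

    lines = [
        f"- name: {name}",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"  description: {json.dumps(description)}",
        "  meta:",
        f"    phi: {flag(phi_category != 'NONE')}",
        f"    phi_category: {json.dumps(phi_category)}",
        f"    masking_required: {flag(masking_required)}",
    ]
    if tests:
        lines.append("  tests:")
        lines.extend(f"    - {test}" for test in tests)
    return "\n".join(lines)

print(dbt_column_yaml(
    "mbr_ssn",
    "The nine-digit Social Security Number...",
    "CRITICAL",
    True,
    tests=("not_null",),
))
```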
For mbr_raf_scr DECIMAL(10,3):
Business Description: The Risk Adjustment Factor score calculated for an individual health plan member representing their predicted healthcare cost relative to the average Medicare beneficiary.
PHI Category: ✅ NONE
Confidence: 99% (DB Lookup)
Supported Platforms
The tool auto-detects your database platform from DDL syntax and normalizes accordingly:
| Platform | Detection Signal | Type Normalization |
|---|---|---|
| Snowflake | TIMESTAMP_NTZ, VARIANT, CREATE OR REPLACE | STRING → VARCHAR |
| BigQuery | Backtick tables, INT64, FLOAT64, STRUCT | INT64 → BIGINT |
| Oracle | VARCHAR2, NUMBER, CLOB | NUMBER → DECIMAL |
| SQL Server | NVARCHAR, DATETIME2, UNIQUEIDENTIFIER | NVARCHAR → VARCHAR |
| PostgreSQL | SERIAL, JSONB, BYTEA | SERIAL → INTEGER |
| MongoDB | JSON Schema with type/properties keys | JSON types → SQL types |
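Detection of this kind can be approximated by counting which platform's signals appear in the DDL. The sketch below encodes the SQL rows of the table as regular expressions; the `SIGNALS` dictionary and the `detect_platform` function are hypothetical, and the heuristic is far from complete (backticks, for example, also appear in MySQL DDL).

```python
import re

# Signals taken from the table above; MongoDB is omitted because it is
# detected from JSON structure rather than SQL keywords.
SIGNALS = {
    "Snowflake": [r"\bTIMESTAMP_NTZ\b", r"\bVARIANT\b", r"\bCREATE\s+OR\s+REPLACE\b"],
    "BigQuery": [r"`", r"\bINT64\b", r"\bFLOAT64\b", r"\bSTRUCT\b"],
    "Oracle": [r"\bVARCHAR2\b", r"\bNUMBER\b", r"\bCLOB\b"],
    "SQL Server": [r"\bNVARCHAR\b", r"\bDATETIME2\b", r"\bUNIQUEIDENTIFIER\b"],
    "PostgreSQL": [r"\bSERIAL\b", r"\bJSONB\b", r"\bBYTEA\b"],
}

def detect_platform(ddl: str) -> str:
    """Return the platform with the most matching signals, or 'Unknown'."""
    scores = {
        platform: sum(bool(re.search(pattern, ddl, re.I)) for pattern in patterns)
        for platform, patterns in SIGNALS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"

print(detect_platform("CREATE TABLE t (id NUMBER, nm VARCHAR2(10))"))
```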
What Makes This Different From ChatGPT
You can ask ChatGPT to describe healthcare columns. It will give you something reasonable. But it won't:
- Pull from 100,000+ curated healthcare-specific definitions
- Apply HIPAA 18-identifier PHI rules with masking DDL
- Generate Snowflake column-level security policies
- Produce ISO-11179 compliant rename scripts
- Give you a confidence score so you know what to review
- Export a complete data dictionary CSV with 18 fields per column
- Flag mbr_mbi_id as CRITICAL PHI and explain exactly why
The Healthcare Metadata Generator is purpose-built for healthcare data engineering — not a general-purpose AI that happens to know some medical terms.
Supported Schema Types
Beyond SQL DDL, the tool handles:
camelCase (FHIR / MongoDB / Java)
memberId → member_id → "Member identifier"
dateOfBirth → date_of_birth → "Date of birth — PHI"
eligibilityIndicator → elig_ind → "Eligibility indicator"
UPPER_SNAKE (Oracle legacy)
MBR_ELIG_IND → mbr_elig_ind → "Member eligibility indicator"
PRVDR_NPI → prvdr_npi → "National Provider Identifier"
Plain column list (no DDL)
mbr_id
mbr_first_nm
dos
raf
dual
CSV headers
member_id,first_name,date_of_birth,ssn,elig_ind,zip_code
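The first arrow in the camelCase and UPPER_SNAKE examples above is a plain case normalization, which can be sketched as follows. The `to_snake_case` and `column_names` helpers are hypothetical, and the sketch stops at snake_case: mapping full words to abbreviations such as `elig_ind` would need a term dictionary that is not shown here.

```python
import re

def to_snake_case(name: str) -> str:
    """Convert camelCase or UPPER_SNAKE to lower snake_case."""
    # Insert an underscore wherever a lowercase letter or digit meets a capital.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

def column_names(text: str):
    """Read names from a plain column list or a CSV header line."""
    return [to_snake_case(t.strip()) for t in re.split(r"[,\n]", text) if t.strip()]

print(to_snake_case("dateOfBirth"))
print(to_snake_case("MBR_ELIG_IND"))
print(column_names("member_id,first_name,date_of_birth,ssn,elig_ind,zip_code"))
```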
The Data Dictionary Export
The CSV export includes every field your data governance team needs:
| Field | Example |
|---|---|
| Table Name | dim_member |
| Entity Name | Member |
| Column Name | mbr_ssn |
| ISO Standard Name | mbr_ssn |
| Attribute Name | Social Security Number |
| Business Definition | The nine-digit SSN assigned... |
| Data Type | VARCHAR |
| Size / Format | Variable |
| PHI / PII | YES |
| PHI Category | CRITICAL |
| PHI Explanation | Contains SSN — highest risk PHI... |
| Masking Required | YES |
| Governance Tag | PHI:CRITICAL |
| Confidence Score | 99% |
| Resolution Method | DB Lookup (mdatool) |
| Needs Review | NO |
This CSV opens directly in Excel and can be imported into data catalog tools like Alation, Collibra, or Atlan.
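If you want to produce a file in the same shape yourself, Python's standard `csv` module is enough. The sketch below writes a subset of the fields from the table above; the `FIELDS` selection and the `to_csv` helper are assumptions for illustration, not the tool's export code.

```python
import csv
import io

# A subset of the export fields listed in the table above.
FIELDS = ["Table Name", "Column Name", "Data Type", "PHI Category",
          "Masking Required", "Confidence Score"]

def to_csv(rows) -> str:
    """Serialize a list of dicts keyed by FIELDS into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv([{
    "Table Name": "dim_member",
    "Column Name": "mbr_ssn",
    "Data Type": "VARCHAR",
    "PHI Category": "CRITICAL",
    "Masking Required": "YES",
    "Confidence Score": "99%",
}]))
```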
What's Coming Next
- Auto-learning: Columns that get AI-generated descriptions are tracked. The most common unknown columns are automatically added to the term database so future users get 99% confidence instead of AI fallback.
- Batch processing: Upload multiple DDL files at once for warehouse-wide documentation
- Atlan / Collibra export: Direct push to your data catalog via API
- dbt package: Run the metadata generator directly from your dbt project
Try the Healthcare Metadata Generator
The Healthcare Metadata Generator is completely free — no login required. Paste your schema and get results in under one second.
mdatool Team
The mdatool team builds free engineering tools for healthcare data architects, analysts, and engineers working across payer, provider, and life sciences data.