How to Build a HIPAA-Compliant Data Governance Framework from Scratch

Most healthcare organizations discover their governance gaps during an audit. This step-by-step guide walks you through building a HIPAA data governance framework before that happens.

mdatool Team · April 21, 2026 · 10 min read
Tags: HIPAA, data governance framework, PHI, compliance, data stewardship

Introduction

Most healthcare data governance programs are built backwards. A breach happens, OCR sends a letter, and suddenly the organization is scrambling to produce a data inventory it never built, access logs it never retained, and policies it never enforced. The HIPAA data governance framework described here is built forward — starting with inventory and ending with a program that can survive an audit.

This is not a compliance checklist disguised as a guide. It is a practical architecture for governing healthcare data the way a senior engineer would design it: systematic, auditable, and maintainable without a dedicated governance team of ten.


Step 1: Build Your Data Inventory

You cannot govern what you cannot find. Before classifying PHI or assigning stewards, you need a complete inventory of every data store in your environment.

What to capture per asset

For each data store (table, file, API endpoint, data feed), capture:

  • Asset name and location (database, schema, table; S3 bucket and prefix; API endpoint)
  • Data type (structured, semi-structured, unstructured)
  • Source system (Epic, Facets, 835 clearinghouse, lab vendor, etc.)
  • Owner (system team, business unit)
  • PHI flag (does this asset contain any of HIPAA's 18 PHI identifiers?)
  • Access controls (who has access today, through what mechanism)
  • Retention requirement (HIPAA requires compliance documentation to be kept for 6 years; medical record retention is set by state law and can be longer)
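
One way to make these fields concrete is a typed record per asset. The sketch below is illustrative only: the class name `DataAsset`, its field names, the `gaps` helper, and the sample values are all assumptions, not the schema of any particular catalog tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DataType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass
class DataAsset:
    """One inventory row; all field names here are hypothetical."""
    name: str                 # e.g. "phi.member_demographics"
    location: str             # database/schema, bucket/prefix, or endpoint
    data_type: DataType
    source_system: str        # e.g. "Epic", "Facets"
    owner: str                # system team or business unit
    contains_phi: bool
    access_groups: List[str] = field(default_factory=list)
    retention_years: int = 6  # assumed default; adjust for state rules

    def gaps(self) -> List[str]:
        """Return the names of required fields that are still empty."""
        missing = []
        if not self.owner.strip():
            missing.append("owner")
        if not self.source_system.strip():
            missing.append("source_system")
        # PHI assets must document who can reach them.
        if self.contains_phi and not self.access_groups:
            missing.append("access_groups")
        return missing


asset = DataAsset(
    name="phi.member_demographics",
    location="warehouse/phi",
    data_type=DataType.STRUCTURED,
    source_system="Facets",
    owner="",
    contains_phi=True,
)
print(asset.gaps())
```

Running a check like `gaps()` over every row gives a quick list of inventory entries that still need an owner or documented access before classification starts.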

How to build it

Start with your cloud infrastructure. Run automated discovery (BigID, Microsoft Purview Scan, or open-source tools like Apache Atlas + custom scanners) across your warehouses, lakes, and databases. Do not rely on manual inventory — it will be incomplete within 90 days.

Then layer in your HL7 and [FHIR](/terms/FHIR) feeds, your EDI transaction stores (837P, 837I, 835), and your operational databases. These are frequently missed in catalog scans because they sit behind integration middleware.
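
Because inventories drift, it can help to compare each discovery run against what the inventory already holds. The following is a minimal sketch of that comparison; the function name `find_uninventoried` and the sample asset names are hypothetical, and a real scanner would supply the discovered list.

```python
def find_uninventoried(discovered, inventoried):
    """Return discovered asset names that have no inventory entry.

    Names are compared case-insensitively, ignoring surrounding whitespace.
    """
    known = {name.strip().lower() for name in inventoried}
    missing = {name.strip().lower() for name in discovered} - known
    return sorted(missing)


# Hypothetical scan output and current inventory contents.
scan_results = ["PHI.MEMBER_DEMOGRAPHICS", "edi.claims_837p", "hl7.adt_feed"]
inventory = ["phi.member_demographics", "edi.claims_837p"]

print(find_uninventoried(scan_results, inventory))
```

Here the HL7 feed is the only asset reported, which mirrors the point above: integration feeds are the ones most likely to be missing.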


Step 2: Classify PHI and Sensitivity Levels

Once inventoried, every asset needs a classification. Define at minimum four tiers:

| Classification | Definition | Example |
| --- | --- | --- |
| PHI - Restricted | Contains one or more HIPAA identifiers | Member SSN, DOB + diagnosis combination |
| PHI - Sensitive | De-identified but re-identification risk exists | Zip code + age + rare condition code |
| Internal | No PHI, but not public | Actuarial models, provider contract rates |
| Public | Safe for external access | Aggregated quality metrics |

HIPAA's Safe Harbor de-identification method lists 18 identifier types that make health data individually identifiable. Build a classification policy that checks for these explicitly; do not rely on field names alone. A column named member_key can be PHI if it is linkable to a person.
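
To illustrate checking content rather than column names, here is a rough sketch that samples values and tests them against patterns. It is deliberately incomplete: the regexes cover only three identifier types in simplified US formats, and the names `PATTERNS`, `detect_identifiers`, `classify_column`, and the labels `phi_restricted` and `needs_review` are assumptions made for this example.

```python
import re

# Simplified patterns for three identifier types; a real scanner needs far more.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def detect_identifiers(sample_values):
    """Return the sorted identifier kinds whose pattern matches any sample value."""
    found = set()
    for value in sample_values:
        text = str(value)
        for kind, pattern in PATTERNS.items():
            if pattern.search(text):
                found.add(kind)
    return sorted(found)


def classify_column(sample_values):
    """Label a column from its sampled values.

    A pattern hit marks the column as restricted PHI. No hit only means
    "needs_review": a surrogate key can still be linkable to a person.
    """
    return "phi_restricted" if detect_identifiers(sample_values) else "needs_review"


# A free-text column with an innocent name can still carry identifiers.
print(classify_column(["call back at 555-123-4567", "n/a"]))
```

Note that a miss is never treated as proof that a column is safe, which keeps linkable keys such as member_key in the review queue.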


Step 3: Define Access Controls

Classification drives access. For each PHI tier, define:

  • Who can access it: Role-based access control (RBAC) groups, not individuals
  • How they access it: Query interface, direct DB access, API, file export
  • Under what conditions: Minimum necessary standard — no wildcard SELECT * on PHI tables

In Snowflake, this looks like:

-- Grant access to de-identified view only, not base table
GRANT SELECT ON VIEW analytics.member_deidentified TO ROLE analyst_role;
REVOKE SELECT ON TABLE phi.member_demographics FROM ROLE analyst_role;
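
The same who, how, and under-what-conditions rules can also be expressed as a deny-by-default check, for example in an access request workflow or a query review step. This is only a sketch: the `ACCESS_POLICY` mapping, the tier labels, and the method names are hypothetical, and the wildcard check is a rough regex rather than a SQL parser.

```python
import re

# Hypothetical policy: for each PHI tier, which RBAC groups may use which access paths.
ACCESS_POLICY = {
    "phi_restricted": {"phi_admin_role": {"query", "api"}},
    "phi_sensitive": {
        "phi_admin_role": {"query", "api", "export"},
        "analyst_role": {"query"},
    },
}


def is_allowed(role, tier, method):
    """Deny by default: allow only if the tier, role, and method are all listed."""
    return method in ACCESS_POLICY.get(tier, {}).get(role, set())


def uses_wildcard(sql):
    """Rough check for a bare SELECT *; it does not catch forms like t.*."""
    return re.search(r"\bselect\s+\*", sql, flags=re.IGNORECASE) is not None


print(is_allowed("analyst_role", "phi_sensitive", "query"))    # listed, so allowed
print(is_allowed("analyst_role", "phi_restricted", "query"))   # not listed, so denied
print(uses_wildcard("SELECT * FROM phi.member_demographics"))
```

Keying the policy on roles rather than user names keeps it aligned with the RBAC groups granted in the warehouse.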

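Implement column-level masking on PHI fields for roles that need partial access (e.g., a fraud analyst needs member state but not full address). The policy below applies the same idea to SSNs, showing non-privileged roles only the last four digits: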

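-- Show the full SSN only to the privileged role; all other roles see the last four digits.
CREATE MASKING POLICY phi_ssn_mask AS (val STRING)
  RETURNS STRING ->
    CASE
      -- CURRENT_ROLE() returns unquoted role names in uppercase
      WHEN CURRENT_ROLE() IN ('PHI_ADMIN_ROLE') THEN val
      ELSE '***-**-' || RIGHT(val, 4)
    END;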

ALTER TABLE phi.member_demographics
  MODIFY COLUMN ssn SET MASKING POLICY phi_ssn_mask;

Step 4: Implement Audit Logging

HIPAA's Security Rule requires audit controls — mechanisms that record and examine activity in systems containing ePHI. Audit logs must answer:

  • Who accessed PHI?
  • When did they access it?
  • What did they access?
  • What actions did they take (read, modify, export)?

What to log

  • All queries against PHI tables (Snowflake Query History, BigQuery Data Access Logs)
  • All data exports or file downloads
  • All access control changes (role grants, policy modifications)
  • All failed access attempts

Retain logs for a minimum of 6 years, matching HIPAA's documentation retention period. Store them in a separate, tamper-evident location, not in the same system they are logging.
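
One common way to make a log tamper-evident is a hash chain, where each entry's hash covers both the event and the previous entry's hash. The sketch below shows the idea under simplifying assumptions: entries live in an in-memory list, and the names `append_entry`, `verify_chain`, and `GENESIS` are hypothetical. It is not a substitute for write-once storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    """SHA-256 over the canonical JSON of the event plus the previous hash."""
    payload = json.dumps(event, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an event whose hash covers the event and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev_hash,
                  "hash": _digest(prev_hash, event)})


def verify_chain(chain):
    """Return True only if every entry links to and matches its predecessor."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"user": "jdoe", "action": "SELECT", "asset": "phi.member_demographics"})
append_entry(log, {"user": "asmith", "action": "EXPORT", "asset": "phi.claims"})
print(verify_chain(log))                   # True: untouched chain

log[0]["event"]["action"] = "LOGIN_FAIL"   # simulate tampering with an old entry
print(verify_chain(log))                   # False: stored hash no longer matches
```

A chain like this only detects edits if the most recent hash is also recorded somewhere the log writer cannot change, which is one reason to keep logs outside the system being logged.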

A minimal audit log schema

-- Identity column syntax varies by platform (e.g., AUTOINCREMENT in Snowflake).
CREATE TABLE governance.phi_access_log (
  log_id         BIGINT GENERATED ALWAYS AS IDENTITY,
  event_ts       TIMESTAMP NOT NULL,      -- store in UTC
  user_id        VARCHAR(100) NOT NULL,
  user_role      VARCHAR(100),
  asset_name     VARCHAR(500) NOT NULL,   -- schema.table or file path
  action         VARCHAR(50) NOT NULL,    -- SELECT, UPDATE, EXPORT, LOGIN_FAIL
  rows_accessed  BIGINT,                  -- BIGINT so large exports do not overflow
  client_ip      VARCHAR(45),             -- long enough for IPv6
  query_hash     VARCHAR(64),             -- SHA-256 of the query text
  session_id     VARCHAR(200),
  PRIMARY KEY (log_id)
);
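
As a sketch of how a row for this table might be assembled in application code, the helper below fills the columns and computes `query_hash` as the SHA-256 hex digest of the query text. The function name `build_log_row` and its parameters are hypothetical; `log_id` is left out because the database generates it.

```python
import hashlib
from datetime import datetime, timezone


def build_log_row(user_id, user_role, asset_name, action, query_text,
                  rows_accessed=None, client_ip=None, session_id=None):
    """Build a dict matching the columns of governance.phi_access_log (minus log_id)."""
    return {
        "event_ts": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "user_id": user_id,
        "user_role": user_role,
        "asset_name": asset_name,
        "action": action,
        "rows_accessed": rows_accessed,
        "client_ip": client_ip,
        # 64 hex characters, which fits the VARCHAR(64) column
        "query_hash": hashlib.sha256(query_text.encode("utf-8")).hexdigest(),
        "session_id": session_id,
    }


row = build_log_row(
    user_id="jdoe",
    user_role="analyst_role",
    asset_name="analytics.member_deidentified",
    action="SELECT",
    query_text="SELECT state, COUNT(*) FROM analytics.member_deidentified GROUP BY state",
    rows_accessed=50,
)
print(len(row["query_hash"]))
```

One plausible benefit of storing a hash rather than the raw query is that literals in a WHERE clause, which may themselves be PHI, stay out of the log while identical queries can still be grouped.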

Step 5: Define Policies and Enforce Them

A policy that lives in a PDF is not a governance control. Every policy must have a technical enforcement mechanism.

| Policy | Technical Control |
| --- | --- |
| No PHI in non-production environments | Data masking in ETL pipelines; automated scan on dev/staging |
| Minimum necessary access | Column-level masking; view-based access over base tables |
| PHI retention schedule | Automated delete jobs; lifecycle policies on S3 |
| Schema change approval | DDL review in CI/CD; PR gates on PHI tables |
| Naming standards for PHI columns | Pre-deployment naming audit |
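
As one example of turning a policy into a control, a PR gate for schema changes could scan migration files for DDL that touches PHI tables and require extra approval when any is found. The sketch below assumes PHI tables live in schemas listed in `PHI_SCHEMAS` and that tables are written as two-part schema.table names; both are assumptions, as are the function and variable names. A regex is a rough stand-in for real SQL parsing.

```python
import re

PHI_SCHEMAS = {"phi"}  # assumption: PHI tables live in the schemas listed here

# Matches ALTER/DROP/CREATE [OR REPLACE] TABLE [IF [NOT] EXISTS] schema.table
DDL_PATTERN = re.compile(
    r"\b(?:alter|drop|create(?:\s+or\s+replace)?)\s+table\s+"
    r"(?:if\s+(?:not\s+)?exists\s+)?"
    r"([A-Za-z_]\w*)\.([A-Za-z_]\w*)",
    flags=re.IGNORECASE,
)


def phi_ddl_targets(migration_sql):
    """Return the sorted schema.table names in PHI schemas touched by DDL."""
    targets = set()
    for schema, table in DDL_PATTERN.findall(migration_sql):
        if schema.lower() in PHI_SCHEMAS:
            targets.add(f"{schema.lower()}.{table.lower()}")
    return sorted(targets)


migration = """
ALTER TABLE phi.member_demographics ADD COLUMN email STRING;
CREATE TABLE analytics.tmp_summary (id INT);
DROP TABLE IF EXISTS phi.old_members;
"""
print(phi_ddl_targets(migration))
```

In CI, a non-empty result could fail the build or request review from the data steward, so the approval policy is enforced by the pipeline rather than by convention.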

Step 6: Assign Data Stewardship Roles

Governance fails when nobody owns the data. Define three roles explicitly:

  • Data Owner: Business executive accountable for a data domain (e.g., VP of Claims Operations owns claims data). Approves access requests. Does not need to be technical.
  • Data Steward: Operational manager responsible for quality, definitions, and policy compliance within the domain. Maintains the business glossary. Reviews access anomalies.
  • Data Custodian: The technical team (data engineering, platform engineering) responsible for implementing the controls the owner and steward define.
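
Role assignments are easier to audit when they are kept as data. The following sketch keeps a registry keyed by data domain and reports any domain with a missing role; the `STEWARDSHIP` contents, the role titles other than the claims owner, and the function name are hypothetical.

```python
REQUIRED_ROLES = ("owner", "steward", "custodian")

# Hypothetical registry keyed by data domain rather than by table.
STEWARDSHIP = {
    "claims": {
        "owner": "VP of Claims Operations",
        "steward": "Claims Data Manager",
        "custodian": "data-engineering",
    },
    "membership": {
        "owner": "VP of Enrollment",
        "steward": "",
        "custodian": "platform-engineering",
    },
}


def unassigned_roles(registry):
    """Map each domain to the required roles that are missing or blank."""
    gaps = {}
    for domain, roles in registry.items():
        missing = [r for r in REQUIRED_ROLES if not roles.get(r, "").strip()]
        if missing:
            gaps[domain] = missing
    return gaps


print(unassigned_roles(STEWARDSHIP))
```

A check like this can run on a schedule so that a departed steward or a newly added domain shows up as a gap instead of going unnoticed.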

Governance Checklist

Before calling your framework production-ready, verify:

  • Data inventory completed and reviewed for every data store
  • PHI classification applied to all assets
  • RBAC defined and enforced; no individual user has direct PHI access
  • Column-level masking active on all 18 PHI identifier fields
  • Audit logging active on all PHI tables; retained for 6 years
  • Non-production environments masked or synthetic
  • Data owners and stewards assigned to every PHI domain
  • Schema change approval process defined (PR review, naming standards gate)
  • Retention delete jobs scheduled and tested
  • BAAs signed with every data vendor and tool vendor with PHI access

Key Takeaways

  • Build the data inventory first — you cannot classify, control, or audit what you have not found.
  • Every policy needs a technical enforcement mechanism. Documented policies without controls fail audits.
  • Audit logging must be tamper-evident and retained for 6 years minimum.
  • Stewardship roles must be assigned at the domain level, not the table level; table-level assignments quickly become unmanageable.
  • Schema governance is part of HIPAA compliance. Use the Naming Auditor to enforce PHI column naming standards before data reaches your governed environment.