Introduction

Most healthcare data governance programs are built backwards. A breach happens, the HHS Office for Civil Rights (OCR) sends a letter, and suddenly the organization is scrambling to produce a data inventory it never built, access logs it never retained, and policies it never enforced. The HIPAA data governance framework described here is built forward: it starts with inventory and ends with a program that can survive an audit.

This is not a compliance checklist disguised as a guide. It is a practical architecture for governing healthcare data the way a senior engineer would design it: systematic, auditable, and maintainable without a dedicated governance team of ten.

Step 1: Build Your Data Inventory

You cannot govern what you cannot find. Before classifying PHI or assigning stewards, you need a complete inventory of every data store in your environment.

What to capture per asset

For each data store (table, file, API endpoint, data feed), capture:

Asset name and location (database, schema, table; S3 bucket and prefix; API endpoint)
Data type (structured, semi-structured, unstructured)
Source system (Epic, Facets, 835 clearinghouse, lab vendor, etc.)
Owner (system team, business unit)
PHI flag (does this asset contain any of HIPAA's 18 PHI identifiers?)
Access controls (who has access today, through what mechanism)
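Retention requirement (HIPAA requires 6 years for compliance documentation such as policies and procedures; retention periods for the underlying records are usually set by state law or program rules and can be longer)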
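
As a rough illustration, the attributes above can be held in a single record type so that incomplete inventory entries are easy to spot. This is a minimal sketch: the class name `DataAsset`, its field names, and the `gaps` helper are assumptions made for this example, not part of any catalog product's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataAsset:
    """One inventory row. Field names are illustrative, not a standard."""
    name: str
    location: str                 # e.g. schema.table, bucket/prefix, endpoint
    data_type: str                # "structured", "semi-structured", "unstructured"
    source_system: str
    owner: str
    contains_phi: Optional[bool] = None          # None means "not yet assessed"
    access_groups: List[str] = field(default_factory=list)
    retention_years: Optional[int] = None

    def gaps(self) -> List[str]:
        """Return the attributes that still need to be filled in."""
        missing = []
        if not self.owner:
            missing.append("owner")
        if self.contains_phi is None:
            missing.append("contains_phi")
        if not self.access_groups:
            missing.append("access_groups")
        if self.retention_years is None:
            missing.append("retention_years")
        return missing


# Hypothetical asset that has been discovered but not yet documented.
asset = DataAsset(
    name="member_demographics",
    location="phi.member_demographics",
    data_type="structured",
    source_system="Facets",
    owner="",
)
print(asset.gaps())
```

Treating the PHI flag as three-valued (yes, no, not yet assessed) keeps an unreviewed asset from being silently counted as "no PHI".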

How to build it

Start with your cloud infrastructure. Run automated discovery (BigID, Microsoft Purview Scan, or open-source tools like Apache Atlas + custom scanners) across your warehouses, lakes, and databases. Do not rely on manual inventory — it will be incomplete within 90 days.

Then layer in your HL7 and FHIR feeds, your EDI transaction stores (837P, 837I, 835), and your operational databases. These are frequently missed in catalog scans because they sit behind integration middleware.
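
One way to keep the inventory from going stale is to diff each scan against what the catalog already holds. The sketch below assumes the discovery tool can export a list of fully qualified asset names; the function name and the sample asset names are hypothetical.

```python
from typing import Dict, List, Set


def inventory_drift(discovered: Set[str], catalogued: Set[str]) -> Dict[str, List[str]]:
    """Compare a fresh scan with the catalog.

    "uncatalogued" assets exist in the environment but not in the inventory;
    "stale" entries are in the inventory but were not found by the scan.
    """
    return {
        "uncatalogued": sorted(discovered - catalogued),
        "stale": sorted(catalogued - discovered),
    }


# Hypothetical scan output and catalog contents.
discovered = {"phi.member_demographics", "edi.claims_837p", "hl7.adt_feed"}
catalogued = {"phi.member_demographics", "edi.claims_835"}

drift = inventory_drift(discovered, catalogued)
print(drift)
```

Running a comparison like this on a schedule turns "the inventory is incomplete" from an audit finding into a routine ticket.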

Step 2: Classify PHI and Sensitivity Levels

Once inventoried, every asset needs a classification. Define at minimum four tiers:

Classification	Definition	Example
PHI — Restricted	Contains one or more HIPAA identifiers	Member SSN, DOB + diagnosis combination
PHI — Sensitive	De-identified but re-identification risk exists	Zip code + age + rare condition code
Internal	No PHI, but not public	Actuarial models, provider contract rates
Public	Safe for external access	Aggregated quality metrics

HIPAA defines 18 specific identifiers that make data PHI. Build a classification policy that checks for these explicitly — do not rely on field names alone. A column named member_key can be PHI if it is linkable to a person.
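
A content-based check can be sketched as sampling column values and testing them against identifier patterns. The example below is a simplified illustration: it covers only three of the 18 identifiers, and the patterns, the `detect_identifiers` name, and the 0.8 match threshold are assumptions chosen for the example rather than a vetted detection rule set.

```python
import re

# Illustrative patterns for a few identifier types; not exhaustive.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-?\d{2}-?\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}$"),
}


def detect_identifiers(sample_values, min_match_ratio=0.8):
    """Return the identifier types that most sampled values look like.

    Empty and None values are ignored so sparse columns are not under-counted.
    """
    values = [str(v).strip() for v in sample_values if v is not None and str(v).strip()]
    if not values:
        return set()
    found = set()
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= min_match_ratio:
            found.add(name)
    return found


# A column with an uninformative name is still caught by its contents.
print(detect_identifiers(["123-45-6789", "987-65-4321", None]))
# A surrogate key matches no pattern, so it needs a separate linkability review.
print(detect_identifiers(["M000123", "M000124"]))
```

Pattern matching only finds identifiers that have a recognizable shape. A surrogate key such as member_key matches nothing, so its linkability to a person still has to be assessed from lineage or join paths.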

Step 3: Define Access Controls

Classification drives access. For each PHI tier, define:

Who can access it: Role-based access control (RBAC) groups, not individuals
How they access it: Query interface, direct DB access, API, file export
Under what conditions: Minimum necessary standard — no wildcard SELECT * on PHI tables

In Snowflake, this looks like:

-- Grant access to de-identified view only, not base table
GRANT SELECT ON VIEW analytics.member_deidentified TO ROLE analyst_role;
REVOKE SELECT ON TABLE phi.member_demographics FROM ROLE analyst_role;

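Implement column-level masking on PHI fields for roles that need partial access (e.g., a fraud analyst needs member state but not full address). The policy below applies the same idea to SSNs, showing only the last four digits to every role except a PHI admin role: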

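-- CURRENT_ROLE() returns the role name in uppercase unless the role was
-- created with a quoted, case-sensitive identifier, so compare in uppercase.
CREATE MASKING POLICY phi_ssn_mask AS (val STRING)
  RETURNS STRING ->
    CASE
      WHEN CURRENT_ROLE() IN ('PHI_ADMIN_ROLE') THEN val
      ELSE '***-**-' || RIGHT(val, 4)
    END;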

ALTER TABLE phi.member_demographics
  MODIFY COLUMN ssn SET MASKING POLICY phi_ssn_mask;
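
The "no wildcard SELECT * on PHI tables" condition from the list above can also be checked mechanically, for example as a lint step on saved queries or dbt models. The following is a heuristic sketch, not a SQL parser: the `PHI_TABLES` registry and the function name are hypothetical, and a regex will miss cases such as wildcards hidden inside views.

```python
import re

# Hypothetical registry of PHI base tables, lowercase and fully qualified.
PHI_TABLES = {"phi.member_demographics"}

# Matches "SELECT *", "SELECT DISTINCT *" and "SELECT alias.*", but not COUNT(*).
_SELECT_STAR = re.compile(r"\bselect\s+(?:distinct\s+)?(?:\w+\.)?\*", re.IGNORECASE)


def violates_minimum_necessary(query: str) -> bool:
    """Flag queries that use a wildcard projection and reference a PHI table."""
    lowered = query.lower()
    touches_phi = any(table in lowered for table in PHI_TABLES)
    return touches_phi and bool(_SELECT_STAR.search(query))


print(violates_minimum_necessary("SELECT * FROM phi.member_demographics"))
print(violates_minimum_necessary("SELECT member_id, state FROM phi.member_demographics"))
```

A check like this complements the grants and masking policies rather than replacing them: it catches habits in query code, while the database controls stop access outright.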

Step 4: Implement Audit Logging

HIPAA's Security Rule requires audit controls — mechanisms that record and examine activity in systems containing ePHI. Audit logs must answer:

Who accessed PHI?
When did they access it?
What did they access?
What actions did they take (read, modify, export)?

What to log

All queries against PHI tables (Snowflake Query History, BigQuery Data Access Logs)
All data exports or file downloads
All access control changes (role grants, policy modifications)
All failed access attempts

Retain logs for a minimum of 6 years. Store them in a separate, tamper-evident location — not the same system they are logging.
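
Tamper evidence can be approximated with a hash chain, where each log entry stores a hash of its own content plus the previous entry's hash. The sketch below illustrates the idea only: the function names are made up for this example, and a real deployment would also need to anchor the latest hash somewhere the log writer cannot modify (or rely on write-once storage).

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> None:
    """Append an event, linking it to the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(log, {"user_id": "jdoe", "action": "SELECT", "asset_name": "phi.member_demographics"})
append_event(log, {"user_id": "jdoe", "action": "EXPORT", "asset_name": "phi.member_demographics"})
print(verify_chain(log))
```

Editing any earlier event changes its hash, which no longer matches the `prev_hash` stored in the next entry, so the modification is detectable even if the attacker can write to the log table.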

A minimal audit log schema

CREATE TABLE governance.phi_access_log (
  -- Standard SQL identity syntax (PostgreSQL, for example);
  -- on Snowflake, use BIGINT AUTOINCREMENT instead
  log_id         BIGINT GENERATED ALWAYS AS IDENTITY,
  event_ts       TIMESTAMP NOT NULL,      -- store in UTC
  user_id        VARCHAR(100) NOT NULL,
  user_role      VARCHAR(100),
  asset_name     VARCHAR(500) NOT NULL,   -- schema.table or file path
  action         VARCHAR(50) NOT NULL,    -- SELECT, UPDATE, EXPORT, LOGIN_FAIL
  rows_accessed  BIGINT,                  -- large exports can exceed a 32-bit INT
  client_ip      VARCHAR(45),             -- long enough for IPv6
  query_hash     VARCHAR(64),             -- SHA-256 of the query text
  session_id     VARCHAR(200),
  PRIMARY KEY (log_id)
);
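
To show how a row for this schema might be assembled, the sketch below builds one record in application code, including the SHA-256 `query_hash`. The function name `build_log_row` is an assumption for this example, and the allowed actions are taken from the comment in the schema; the insert into the table itself is left out.

```python
import hashlib
from datetime import datetime, timezone

# Action values taken from the comment on the "action" column.
ALLOWED_ACTIONS = {"SELECT", "UPDATE", "EXPORT", "LOGIN_FAIL"}


def build_log_row(user_id, user_role, asset_name, action, query_text,
                  rows_accessed=None, client_ip=None, session_id=None):
    """Build one audit row as a dict keyed by the column names in phi_access_log."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return {
        "event_ts": datetime.now(timezone.utc),   # timezone-aware UTC timestamp
        "user_id": user_id,
        "user_role": user_role,
        "asset_name": asset_name,
        "action": action,
        "rows_accessed": rows_accessed,
        "client_ip": client_ip,
        # 64 hex characters, matching VARCHAR(64)
        "query_hash": hashlib.sha256(query_text.encode("utf-8")).hexdigest(),
        "session_id": session_id,
    }


row = build_log_row(
    user_id="jdoe",
    user_role="analyst_role",
    asset_name="phi.member_demographics",
    action="SELECT",
    query_text="SELECT state FROM phi.member_demographics WHERE member_id = 42",
    rows_accessed=1,
)
print(row["query_hash"])
```

One plausible reason to store a hash rather than the query text is that query literals (a WHERE clause on an SSN, for instance) can themselves contain PHI, and the audit log should not become another PHI store.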

Step 5: Define Policies and Enforce Them

A policy that lives in a PDF is not a governance control. Every policy must have a technical enforcement mechanism.

Policy	Technical Control
No PHI in non-production environments	Data masking in ETL pipelines; automated scan on dev/staging
Minimum necessary access	Column-level masking; view-based access over base tables
PHI retention schedule	Automated delete jobs; lifecycle policies on S3
Schema change approval	DDL review in CI/CD; PR gates on PHI tables
Naming standards for PHI columns	Pre-deployment naming audit
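
As an example of turning one row of this table into a control, the "PR gates on PHI tables" entry could be sketched as a CI step that scans a migration file for DDL touching PHI tables and fails unless the change has been approved. This assumes PHI base tables live in a schema named `phi`, as in the earlier examples; the function names and the `approved` flag are hypothetical, and the regex is a heuristic rather than a full SQL parser.

```python
import re
import sys

# Assumption: PHI base tables live in a schema named "phi".
_PHI_DDL = re.compile(
    r"\b(create|alter|drop|truncate)\s+(?:or\s+replace\s+)?table\s+"
    r"(?:if\s+(?:not\s+)?exists\s+)?(phi\.\w+)",
    re.IGNORECASE,
)


def phi_ddl_statements(migration_sql: str):
    """Return (verb, table) pairs for DDL statements that touch the phi schema."""
    return [(m.group(1).upper(), m.group(2).lower()) for m in _PHI_DDL.finditer(migration_sql)]


def gate(migration_sql: str, approved: bool) -> int:
    """Return a CI exit code: 1 blocks the merge, 0 lets it through."""
    hits = phi_ddl_statements(migration_sql)
    if hits and not approved:
        for verb, table in hits:
            print(f"PHI schema change needs approval: {verb} {table}", file=sys.stderr)
        return 1
    return 0


migration = """
ALTER TABLE phi.member_demographics ADD COLUMN email STRING;
ALTER TABLE analytics.claims_summary ADD COLUMN paid_amount NUMBER;
"""
print(phi_ddl_statements(migration))
```

In a pipeline, the `approved` value would typically come from something the owner or steward controls, such as a required reviewer on the pull request, so the policy cannot be bypassed by the author of the change.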

Step 6: Assign Data Stewardship Roles

Governance fails when nobody owns the data. Define three roles explicitly:

Data Owner: Business executive accountable for a data domain (e.g., VP of Claims Operations owns claims data). Approves access requests. Does not need to be technical.
Data Steward: Operational manager responsible for quality, definitions, and policy compliance within the domain. Maintains the business glossary. Reviews access anomalies.
Data Custodian: The technical team (data engineering, platform engineering) responsible for implementing the controls the owner and steward define.
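
The three roles can be tracked in a small registry keyed by data domain, with a check that flags any domain missing an assignment. This is a minimal sketch: the registry layout, the function name, and every name in the sample data other than the VP of Claims Operations example are assumptions for illustration.

```python
REQUIRED_ROLES = ("owner", "steward", "custodian")


def unassigned_roles(registry):
    """Return {domain: [missing roles]} for domains without full coverage."""
    gaps = {}
    for domain, roles in registry.items():
        missing = [role for role in REQUIRED_ROLES if not roles.get(role)]
        if missing:
            gaps[domain] = missing
    return gaps


# Hypothetical registry; "membership" has an owner but nothing else yet.
registry = {
    "claims": {
        "owner": "VP of Claims Operations",
        "steward": "Claims Operations Manager",
        "custodian": "data-engineering",
    },
    "membership": {
        "owner": "VP of Enrollment",
        "steward": "",
    },
}
print(unassigned_roles(registry))
```

Keeping the registry at the domain level, rather than per table, keeps it short enough to review by hand.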

Governance Checklist

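Before calling your framework production-ready, verify:

Every data store is in the inventory, including HL7, FHIR, and EDI feeds that sit behind integration middleware
Every asset has a classification tier, assigned by checking for the 18 HIPAA identifiers rather than by field name
Access to PHI is granted to RBAC groups, not individuals, with masking or de-identified views for partial access
Queries, exports, access control changes, and failed access attempts against PHI are logged to a separate, tamper-evident location and retained for at least 6 years
Every policy has a technical enforcement mechanism
Every data domain has a named owner, steward, and custodian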

Key Takeaways

Build the data inventory first — you cannot classify, control, or audit what you have not found.
Every policy needs a technical enforcement mechanism. Documented policies without controls fail audits.
Audit logging must be tamper-evident and retained for 6 years minimum.
Stewardship roles must be assigned at the domain level, not the table level, because table-level assignments quickly become unmanageable.
Schema governance is part of HIPAA compliance. Use the Naming Auditor to enforce PHI column naming standards before data reaches your governed environment.

How to Build a HIPAA-Compliant Data Governance Framework from Scratch