About This Talk
At Data Fest Online 2020, I introduced MetaDWH: the idea of building a data warehouse on the metadata of the warehouse itself. The result is a “warehouse about the warehouse” that provides operational intelligence, governance capabilities, and data-driven platform management.
The Problem
DWH teams manage hundreds of tables, thousands of columns, and complex dependency graphs — but often lack visibility into their own platform. Which tables are unused? Which ETL jobs are slowing down? Which datasets are most accessed? Without this information, platform decisions are based on tribal knowledge rather than data.
Key Ideas
The Warehouse About the Warehouse — MetaDWH collects and analyzes metadata from multiple sources: table statistics from system catalogs, query logs from the database engine, pipeline execution history from Airflow, schema change history, data freshness metrics, and access patterns from BI tool logs.
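To make the idea concrete, here is a minimal Python sketch of how events from different sources might be normalized into a single table and then queried together. The `MetaEvent` record, the row shapes accepted by `from_query_log` and `from_airflow_run`, and all field names are hypothetical illustrations, not the schema used in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetaEvent:
    """One normalized row of a hypothetical MetaDWH event table."""
    source: str          # e.g. "catalog", "query_log", "airflow", "bi_log"
    table: str           # schema-qualified table name
    kind: str            # e.g. "read", "write", "size_bytes"
    ts: datetime         # when the event happened
    value: float = 1.0   # metric value (bytes, seconds, or 1 for a plain event)


def from_query_log(row: dict) -> MetaEvent:
    # Hypothetical query-log row: {"table": ..., "started": ISO-8601 string}
    return MetaEvent("query_log", row["table"], "read",
                     datetime.fromisoformat(row["started"]))


def from_airflow_run(row: dict) -> MetaEvent:
    # Hypothetical pipeline-run row: target table, end time, duration in seconds
    return MetaEvent("airflow", row["target_table"], "write",
                     datetime.fromisoformat(row["end"]), float(row["duration_s"]))


def last_activity(events):
    """Latest read and latest write timestamp per table, across all sources."""
    out = defaultdict(dict)
    for e in events:
        if e.kind in ("read", "write"):
            prev = out[e.table].get(e.kind)
            if prev is None or e.ts > prev:
                out[e.table][e.kind] = e.ts
    return dict(out)
```

Keeping every source in one long, narrow event table means that adding a new source only requires a new normalizer; the analysis functions stay unchanged.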
Operational Intelligence — With MetaDWH, you can identify unused tables consuming storage, detect ETL jobs that are gradually slowing down, find bottleneck datasets that block downstream processing, and understand which data products deliver the most value based on actual usage.
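Two of these checks reduce to simple calculations over the collected metadata. The sketch below assumes a per-table last-read timestamp and a list of historical run durations per job. The 90-day window and the 20% growth threshold are arbitrary example values, and fitting a least-squares line is only one possible way to detect a gradual slowdown, not necessarily the method from the talk.

```python
from datetime import datetime, timedelta


def unused_tables(all_tables, last_read, now, days=90):
    """Tables with no recorded read in the last `days` days, or never read."""
    cutoff = now - timedelta(days=days)
    return sorted(t for t in all_tables
                  if last_read.get(t) is None or last_read[t] < cutoff)


def duration_trend(durations):
    """Least-squares slope of run duration vs. run index (duration units per run)."""
    n = len(durations)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(durations) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(durations))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def is_slowing_down(durations, max_growth=0.2):
    """Flag a job whose fitted duration grew by more than `max_growth`
    (0.2 = 20%) between the first and the last observed run."""
    n = len(durations)
    if n < 2:
        return False
    slope = duration_trend(durations)
    mean_y = sum(durations) / n
    start = mean_y - slope * (n - 1) / 2  # fitted value at the first run
    end = mean_y + slope * (n - 1) / 2    # fitted value at the last run
    return start > 0 and (end - start) / start > max_growth
```

Using a fitted trend rather than comparing the last run to the first makes the check less sensitive to a single unusually slow or fast run.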
Data Governance Foundation — MetaDWH provides the building blocks for governance: data lineage (where does this data come from?), ownership tracking (who is responsible for this table?), freshness monitoring (when was this data last updated?), and usage analytics (who uses this data and how?).
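Lineage and freshness can be sketched in the same spirit. This assumes lineage is stored as (source, target) table pairs and that each table has a freshness SLA expressed as a maximum age; both the edge format and the SLA mapping are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


def upstream(table, edges):
    """All transitive sources of `table`, given (source, target) lineage edges."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen, queue = set(), deque([table])
    while queue:  # breadth-first walk toward the sources
        for p in parents.get(queue.popleft(), ()):
            if p not in seen:  # also guards against cycles in the graph
                seen.add(p)
                queue.append(p)
    return seen


def stale_tables(last_updated, sla, now):
    """Tables whose last update is older than their freshness SLA."""
    return sorted(t for t, ts in last_updated.items()
                  if t in sla and now - ts > sla[t])
```

The same edge list, walked in the opposite direction, answers the impact question: which downstream tables and dashboards are affected when a source breaks.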
Practical Implementation — The talk covered a concrete implementation: extracting metadata from Greenplum system catalogs, collecting Airflow execution logs, parsing query history, and building Tableau dashboards for platform monitoring. The entire MetaDWH can be built with tools a DWH team already has: the warehouse itself, the orchestrator, and the BI tool.
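For the query-history step, one rough approach is to pull schema-qualified table names out of the logged SQL text and count how many queries touch each table. The regular expression below is a deliberately naive, hypothetical heuristic: it ignores unqualified names, quoted identifiers, and CTE aliases, so a production version would need a proper SQL parser.

```python
import re
from collections import Counter

# Naive heuristic: a schema-qualified name right after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_]\w*\.[a-z_]\w*)", re.IGNORECASE)


def table_access_counts(queries):
    """Count how many logged queries reference each schema-qualified table."""
    counts = Counter()
    for sql in queries:
        # A set counts a table once per query, even if it is referenced twice.
        counts.update({name.lower() for name in TABLE_REF.findall(sql)})
    return counts
```

These counts can be loaded back into the warehouse as another event source and feed the same usage dashboards as the BI logs.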
Why It Matters
The best-run data platforms treat their own infrastructure as a data problem. MetaDWH turns platform management from a manual, intuition-based activity into a data-driven discipline — enabling better decisions, faster problem detection, and stronger governance.