MetaDWH — A Data Warehouse About Your Data Warehouse

  • Author

    Evgeny Ermakov

  • Category

    Conference Talk

  • Last updated

    October 24, 2020

About This Talk

At Data Fest Online 2020, I introduced MetaDWH: the concept of building a data warehouse on the metadata of the warehouse itself. This “warehouse about the warehouse” provides operational intelligence, governance capabilities, and data-driven platform management.

The Problem

Data warehouse (DWH) teams manage hundreds of tables, thousands of columns, and complex dependency graphs, yet often lack visibility into their own platform. Which tables are unused? Which ETL jobs are slowing down? Which datasets are most accessed? Without this information, platform decisions rest on tribal knowledge rather than data.

Key Ideas

The Warehouse About the Warehouse — MetaDWH collects and analyzes metadata from multiple sources: table statistics from system catalogs, query logs from the database engine, pipeline execution history from Airflow, schema change history, data freshness metrics, and access patterns from BI tool logs.
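
To make the idea of combining sources concrete, here is a minimal sketch that lands a few of these metadata feeds in one place and joins them by table name. It uses an in-memory SQLite database purely for illustration; the table and column names (`table_stats`, `query_log`, `pipeline_runs`) are assumptions made for this example, not the schema presented in the talk.

```python
import sqlite3

# Hypothetical landing tables, one per metadata source.
# All table and column names here are illustrative assumptions.
SCHEMA = """
CREATE TABLE table_stats (
    table_name TEXT PRIMARY KEY,
    row_count  INTEGER,
    size_bytes INTEGER
);
CREATE TABLE query_log (
    table_name  TEXT,
    user_name   TEXT,
    executed_at TEXT
);
CREATE TABLE pipeline_runs (
    job_name     TEXT,
    target_table TEXT,
    started_at   TEXT,
    finished_at  TEXT
);
"""


def create_metadwh(conn):
    """Create the illustrative metadata tables on an open connection."""
    conn.executescript(SCHEMA)


def table_overview(conn):
    """Return (table_name, size_bytes, query_count) for every known table.

    The LEFT JOIN keeps tables that never appear in the query log,
    so they show up with a query_count of 0.
    """
    sql = """
        SELECT s.table_name,
               s.size_bytes,
               COUNT(q.table_name) AS query_count
        FROM table_stats AS s
        LEFT JOIN query_log AS q ON q.table_name = s.table_name
        GROUP BY s.table_name, s.size_bytes
        ORDER BY s.table_name
    """
    return conn.execute(sql).fetchall()
```

Once the sources share a key such as the table name, questions that span catalogs, logs, and pipelines become ordinary joins.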

Operational Intelligence — With MetaDWH, you can identify unused tables consuming storage, detect ETL jobs that are gradually slowing down, find bottleneck datasets that block downstream processing, and understand which data products deliver the most value based on actual usage.
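
Two of these checks can be sketched in a few lines of plain Python. The example below assumes the metadata has already been collected into simple dictionaries; the function names, the 90-day idle window, and the slope threshold are hypothetical choices for illustration. A least-squares slope over run durations is one simple way to flag a gradual slowdown, not necessarily the method used in the talk.

```python
from datetime import datetime, timedelta


def unused_tables(all_tables, last_access, now, max_idle_days=90):
    """Return tables with no recorded access in the last max_idle_days.

    last_access maps table name -> datetime of the most recent query;
    tables missing from it are treated as never accessed.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        t for t in all_tables
        if last_access.get(t) is None or last_access[t] < cutoff
    )


def duration_slope(durations):
    """Least-squares slope of duration against run index (seconds per run)."""
    n = len(durations)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(durations) / n
    cov = sum((i - mean_x) * (d - mean_y) for i, d in enumerate(durations))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def slowing_jobs(history, min_slope=1.0):
    """Return jobs whose duration grows by at least min_slope seconds per run.

    history maps job name -> list of run durations in chronological order.
    """
    return sorted(
        job for job, durations in history.items()
        if duration_slope(durations) >= min_slope
    )
```

A positive slope over a long enough window separates steady degradation from a single slow run, which a simple "last run vs. average" comparison would not.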

Data Governance Foundation — MetaDWH provides the building blocks for governance: data lineage (where does this data come from?), ownership tracking (who is responsible for this table?), freshness monitoring (when was this data last updated?), and usage analytics (who uses this data and how?).
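
As a rough illustration of how these building blocks fit together, the sketch below answers the lineage question with a graph traversal and combines freshness with ownership in one report. The data structures (a dictionary of direct sources, a dictionary of last load times, a dictionary of owners) and the function names are assumptions for this example.

```python
from collections import deque
from datetime import datetime, timedelta


def upstream(lineage, table):
    """Return every direct and indirect source of a table.

    lineage maps table name -> list of its direct source tables.
    """
    seen = set()
    queue = deque(lineage.get(table, []))
    while queue:
        source = queue.popleft()
        if source in seen:
            continue  # already visited; also guards against cycles
        seen.add(source)
        queue.extend(lineage.get(source, []))
    return seen


def stale_report(last_loaded, owners, now, max_age):
    """Return (table, owner) pairs for tables not loaded within max_age.

    Tables without a registered owner are reported as "unowned".
    """
    return sorted(
        (table, owners.get(table, "unowned"))
        for table, loaded_at in last_loaded.items()
        if now - loaded_at > max_age
    )
```

Pairing each stale table with its owner turns a monitoring signal into something actionable: the report says who should be contacted as well as what is late.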

Practical Implementation — The talk covered a concrete implementation: extracting metadata from Greenplum system catalogs, collecting Airflow execution logs, parsing query history, and building Tableau dashboards for platform monitoring. The entire MetaDWH can be built with the tools you already have.
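
The Airflow part of such a pipeline might look like the sketch below. It assumes read access to the Airflow metadata database through any DB-API connection, and it relies on the `dag_run` table exposing `dag_id`, `state`, `start_date`, and `end_date` columns; check these names against your Airflow version before using it. The function names are hypothetical, and this is not the code shown in the talk.

```python
from datetime import datetime

# Assumed column names from Airflow's dag_run table; verify for your version.
DAG_RUN_SQL = """
    SELECT dag_id, start_date, end_date
    FROM dag_run
    WHERE state = 'success'
      AND start_date IS NOT NULL
      AND end_date IS NOT NULL
    ORDER BY dag_id, start_date
"""


def _as_datetime(value):
    """Accept datetimes as returned by most drivers, or ISO-format strings."""
    return value if isinstance(value, datetime) else datetime.fromisoformat(value)


def fetch_run_durations(conn):
    """Return {dag_id: [duration_seconds, ...]} for successful runs.

    conn is any DB-API connection to the Airflow metadata database.
    Durations are listed in chronological order per DAG.
    """
    cursor = conn.cursor()
    cursor.execute(DAG_RUN_SQL)
    durations = {}
    for dag_id, start, end in cursor.fetchall():
        seconds = (_as_datetime(end) - _as_datetime(start)).total_seconds()
        durations.setdefault(dag_id, []).append(seconds)
    return durations
```

The resulting per-DAG duration series can be loaded into the warehouse as a fact table and charted directly in a monitoring dashboard.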

Why It Matters

The best-run data platforms treat their own infrastructure as a data problem. MetaDWH turns platform management from a manual, intuition-based activity into a data-driven discipline — enabling better decisions, faster problem detection, and stronger governance.

Watch

The full talk is available on YouTube.
