About This Talk
At Highload++ 2021 — Russia’s largest conference for professionals working with high-load systems — I argued that data warehousing should be treated as a product discipline, with measurable outcomes, user feedback, and continuous improvement driven by metrics rather than intuition.
The Problem
Most DWH teams operate without metrics about their own platform. They know the data inside the warehouse intimately but can’t answer basic questions about the warehouse itself: How fresh is the data? How reliable are the pipelines? How fast are queries? How many people actually use what we build? Without these answers, investment decisions are based on gut feeling.
Key Ideas
DWH as a Product — The warehouse has users (analysts, data scientists, business stakeholders), features (tables, views, reports), a roadmap (new data sources, model improvements), and measurable outcomes (adoption, satisfaction, business impact). Treating it as a product means applying product management principles.
What to Measure — Four categories of DWH metrics: data freshness (how quickly new data appears), pipeline reliability (how often ETL jobs succeed), query performance (how fast users get answers), and adoption (how many people use the data and how often).
How to Implement — Practical implementation approach: extracting metrics from Airflow execution logs, warehouse system catalogs, and query history; building a dedicated metrics data mart; creating operational dashboards that the team reviews daily.
The Business Benefits — With metrics in place, you can enforce SLAs with stakeholders, justify infrastructure investments with data, identify optimization opportunities systematically, and demonstrate the value of the data platform to leadership.
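To make the four metric categories concrete, here is a minimal sketch of how they could be computed once the raw records are extracted. Everything in it is an assumption for illustration and not the implementation from the talk: the `JobRun` and `QueryRecord` shapes, the sample values, and the SLA thresholds are all hypothetical. In a real setup the inputs would come from Airflow execution logs and the warehouse's query history, not from in-memory lists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRun:
    """One ETL job execution (hypothetical shape, e.g. parsed from scheduler logs)."""
    job: str
    finished_at: datetime
    succeeded: bool


@dataclass
class QueryRecord:
    """One user query (hypothetical shape, e.g. parsed from query history)."""
    user: str
    duration_s: float


def percentile_95(values):
    """Nearest-rank 95th percentile; integer math avoids float rounding surprises."""
    ordered = sorted(values)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n)
    return ordered[rank - 1]


def platform_metrics(runs, queries, now):
    """Compute one value per metric category for a single reporting window."""
    successful = [r for r in runs if r.succeeded]
    return {
        # Data freshness: time since the last successful load finished.
        "freshness": now - max(r.finished_at for r in successful),
        # Pipeline reliability: share of job runs that succeeded.
        "success_rate": len(successful) / len(runs),
        # Query performance: 95th percentile of query duration in seconds.
        "query_p95_s": percentile_95([q.duration_s for q in queries]),
        # Adoption: number of distinct users who ran at least one query.
        "active_users": len({q.user for q in queries}),
    }


# Hypothetical sample data for one morning.
NOW = datetime(2021, 5, 17, 9, 0)
RUNS = [
    JobRun("orders", datetime(2021, 5, 17, 5, 0), True),
    JobRun("orders", datetime(2021, 5, 17, 6, 30), True),
    JobRun("payments", datetime(2021, 5, 17, 6, 0), False),
    JobRun("users", datetime(2021, 5, 17, 4, 0), True),
]
QUERIES = [
    QueryRecord("ana", 1.0),
    QueryRecord("ana", 2.0),
    QueryRecord("bo", 3.0),
    QueryRecord("cy", 4.0),
    QueryRecord("bo", 50.0),
]

metrics = platform_metrics(RUNS, QUERIES, NOW)

# Hypothetical SLA thresholds agreed with stakeholders.
sla_status = {
    "freshness_ok": metrics["freshness"] <= timedelta(hours=3),
    "reliability_ok": metrics["success_rate"] >= 0.95,
}

print(metrics)
print(sla_status)
```

Writing one such row per day into a metrics data mart is enough to drive a daily dashboard and to show stakeholders whether an SLA was met. The 95th percentile is used here instead of the mean so that a single slow query stays visible; that choice is mine for the example.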
Why It Matters
Data teams that measure their own performance improve faster, make better investment decisions, and build more trust with stakeholders. The “data as a product” mindset is the foundation of modern data platform engineering — and metrics are where it starts.