Meetup

Anchor Modeling and Greenplum — How Dreams Are Ruined by Reality

About This Talk At the Greenplum Community Meetup in 2021, I shared a candid story about implementing Anchor Modeling on Greenplum — and the unexpected challenges that emerged when an elegant theoretical methodology met the realities of an MPP database engine. The talk covered what worked, what didn’t, and the practical workarounds we developed.

Author
Evgeny Ermakov
Category
Meetup
Read Time
2 min read
Last updated
August 25, 2021

About This Talk

At the Greenplum Community Meetup in 2021, I shared a candid story about implementing Anchor Modeling on Greenplum — and the unexpected challenges that emerged when an elegant theoretical methodology met the realities of an MPP database engine. The talk covered what worked, what didn’t, and the practical workarounds we developed.

The Problem

Anchor Modeling is beautiful on paper: 6th normal form, zero-impact schema evolution, full historization by default. But Greenplum is optimized for wide, denormalized tables with predictable JOIN patterns — the exact opposite of what Anchor Modeling produces. The clash between methodology and technology created performance challenges that weren’t obvious until production.

Key Ideas

The Promise — Why we chose Anchor Modeling: the business required extreme schema flexibility, new data sources arrived weekly, and the cost of ALTER TABLE operations on large Greenplum tables was prohibitive. Anchor Modeling’s table-per-attribute approach seemed like the perfect solution.

The Reality — What happened in production: the extreme number of JOINs required by Anchor Modeling overwhelmed Greenplum’s query optimizer. Queries that were simple in dimensional models became 20-way JOINs. Query planning time sometimes exceeded query execution time.

The Workarounds — How we adapted: materialized views for common query patterns, hybrid approaches that kept Anchor Modeling for loading but pre-joined data for consumption, query rewriting strategies, and Greenplum-specific optimizations (distribution key selection for Anchor tables, partition strategies for temporal attributes).

The Lessons — Methodology and technology aren’t independent choices. The best modeling approach depends not just on business requirements but also on the capabilities and limitations of your database engine. Always prototype at scale before committing.

Why It Matters

This talk is a cautionary tale for anyone evaluating DWH methodologies. The theoretical elegance of an approach doesn’t guarantee practical success. Understanding the interaction between your modeling methodology and your database engine is essential.