About This Talk
At DE or DIE 2020, I took the audience behind the scenes of the data warehouse powering Yandex.Taxi (Yandex Go), one of the largest ride-hailing services in Eastern Europe. The talk covered the technical architecture, the organizational structure, and the challenges specific to building a data platform at this scale.
Key Ideas
Scale of the DWH: Millions of daily trips with dozens of events per trip, alongside real-time pricing, surge detection, driver-rider matching, and route optimization. Every trip generates a rich stream of events that feeds the analytical warehouse, so the platform has to handle both high data volume and high velocity.
Organizational Structure: The data platform team includes data engineers, analytics engineers, analysts, and domain specialists. The talk covered how ownership is distributed, how teams interact, and how the organization evolved from a centralized model toward domain-oriented data ownership.
Technology Stack: ClickHouse for real-time analytics with sub-second query response times; Greenplum for the analytical warehouse, handling complex transformations and historical analysis; and custom ETL frameworks optimized for the data patterns of a ride-hailing platform.
Roles and Responsibilities: Each role contributes differently to the data platform. Data engineers build and maintain the infrastructure, analytics engineers design the semantic layer, analysts create insights, and the data partner role (covered in my Data Fest 2021 talk) bridges the gaps between them.
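To make the "dozens of events per trip" idea concrete, here is a minimal Python sketch of how a stream of trip events might be collapsed into one summary row per trip before loading into a warehouse. It is an illustration only: the `TripEvent` fields, the event names, and the `summarize_trips` helper are hypothetical and do not reflect the actual Yandex.Taxi schema or ETL frameworks.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TripEvent:
    """One event in a trip's lifecycle (hypothetical schema)."""
    trip_id: str
    event_type: str       # hypothetical names, e.g. "order_created"
    occurred_at: datetime


def summarize_trips(events):
    """Collapse a stream of events into one summary row per trip."""
    by_trip = defaultdict(list)
    for event in events:
        by_trip[event.trip_id].append(event)

    summaries = {}
    for trip_id, trip_events in by_trip.items():
        # Events may arrive out of order, so sort by event time first.
        ordered = sorted(trip_events, key=lambda e: e.occurred_at)
        summaries[trip_id] = {
            "event_count": len(ordered),
            "first_event": ordered[0].event_type,
            "last_event": ordered[-1].event_type,
            "duration": ordered[-1].occurred_at - ordered[0].occurred_at,
        }
    return summaries


# Invented sample data, for illustration only.
start = datetime(2020, 1, 1, 12, 0)
sample_events = [
    TripEvent("trip-1", "order_created", start),
    TripEvent("trip-1", "ride_finished", start + timedelta(minutes=20)),
    TripEvent("trip-1", "driver_assigned", start + timedelta(minutes=1)),
    TripEvent("trip-2", "order_created", start + timedelta(minutes=5)),
]
trip_summaries = summarize_trips(sample_events)
```

A real pipeline at this scale would do this aggregation incrementally and inside the storage engines rather than in memory; the sketch only shows the shape of the event-to-trip transformation.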
Why It Matters
Large-scale data platforms are built by teams, not tools. Understanding how an organization like Yandex.Taxi structures its data team, distributes ownership, and chooses technology provides a blueprint that others can adapt to their own scale and context.