A 5-month intensive Data Engineering course for Junior+ and Middle-level professionals. Covers DWH design, relational & MPP databases, ETL automation with Airflow, Big Data, cloud infrastructure, data visualization, and data governance.
About the Course
The Data Engineer course by Karpov.Courses is a comprehensive 5-month programme designed for professionals who already have foundational experience with data and want to level up to designing end-to-end data systems. It is aimed at Junior+ and Middle-level practitioners (data analysts, backend developers, BI engineers, and aspiring data engineers) who want to move from solving isolated tasks to architecting scalable data platforms.
The course was created by a team of industry practitioners from leading tech companies including Yandex Go, VK, Toloka AI, SberMarket, and others. Every module is built around real-world scenarios, not toy examples.
What You Will Learn
The programme covers the full spectrum of modern data engineering:
- DWH Design: logical architecture of a data warehouse, dimensional modelling, Data Vault, Anchor Modeling, and choosing the right approach for your use case
- Relational & MPP Databases: PostgreSQL, Greenplum, understanding distributed systems and when MPP databases outperform traditional ones
- ETL Automation: principles of ETL/ELT pipeline design, a deep dive into Apache Airflow, building and orchestrating automated data pipelines
- Big Data: distributed storage with Hadoop, data processing with Apache Spark and Hive, stream processing with Kafka, monitoring and profiling Spark jobs
- Cloud Infrastructure: building a DWH and Data Lake in the cloud, Kubernetes for data workloads, JupyterHub and Spark on Kubernetes
- Data Visualisation: building interactive dashboards for DWH platform monitoring with Tableau, Superset, and DataLens
- Big ML: distributed machine learning with Spark ML, training and deploying models at scale
- Model Management: MLflow, dataset versioning, model tracking, and ML pipeline orchestration
- Data Governance: data quality, lineage tracking, and managing complex data ecosystems
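The ETL Automation module centres on task-based pipelines. As a minimal sketch of that pattern in plain Python (all function names and sample data are illustrative, not from the course materials), each function below corresponds to what would become a separate task in an Airflow DAG, chained as extract >> transform >> load:

```python
# Minimal ETL sketch. Each function mirrors what would become a separate
# task in an Airflow DAG (extract >> transform >> load). All names and
# sample data are illustrative, not taken from the course materials.

def extract():
    # In a real pipeline: pull rows from a source database or API.
    return [
        {"user_id": 1, "amount": "19.99"},
        {"user_id": 2, "amount": "5.00"},
        {"user_id": 1, "amount": "7.50"},
    ]

def transform(rows):
    # Cast string amounts to floats and aggregate revenue per user.
    totals = {}
    for row in rows:
        totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + float(row["amount"])
    return totals

def load(totals, warehouse):
    # In a real pipeline: insert into a warehouse table (e.g. Greenplum).
    warehouse.update(totals)
    return len(totals)

warehouse = {}
rows_loaded = load(transform(extract()), warehouse)
```

Keeping each step as an isolated, idempotent unit is what lets an orchestrator like Airflow retry, schedule, and monitor them independently.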
Course Structure
The course runs for 5 months with 3 sessions per week. Each lesson includes video lectures, detailed notes, and hands-on assignments with a two-week soft deadline. Students work on a dedicated remote server with the full data engineering stack pre-installed.
A mid-course project recreates the ETL processes of a large two-tier data platform, giving students a realistic experience of working with Airflow, Spark, S3, and Greenplum in a production-like environment.
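As a rough illustration of the two-tier idea behind such platforms, here is a hedged Python sketch of a staging layer that lands source records verbatim, feeding a core layer that applies typing and deduplication; all layer, table, and field names are hypothetical, not the project's actual schema:

```python
# Illustrative two-tier layout: a staging layer keeps source records
# verbatim, and a core layer applies typing and deduplication before
# the data is served to consumers. All names are hypothetical.

from datetime import date

def to_staging(raw_lines):
    # Staging: land every source line as-is, split into named fields.
    fields = ("order_id", "order_date", "total")
    return [dict(zip(fields, line.split(","))) for line in raw_lines]

def to_core(staging_rows):
    # Core: cast types and drop duplicate order_ids (first record wins).
    seen, core = set(), []
    for row in staging_rows:
        oid = int(row["order_id"])
        if oid in seen:
            continue
        seen.add(oid)
        core.append({
            "order_id": oid,
            "order_date": date.fromisoformat(row["order_date"]),
            "total": float(row["total"]),
        })
    return core

raw = ["1,2024-01-05,19.99", "2,2024-01-06,5.00", "1,2024-01-05,19.99"]
core = to_core(to_staging(raw))
```

Separating the layers this way means a faulty transform can be re-run from staging without re-extracting from the source systems.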
Students typically invest 10-15 hours per week, and the format is designed to be compatible with full-time employment.
Tools & Technologies
PostgreSQL, Greenplum, Hadoop, Spark, Hive, Kafka, Airflow, S3, Python, SQL, Kubernetes, Tableau, Superset, DataLens, Spark ML, MLflow
My Role
I am the course author and lecturer for the DWH Design module, where I share over 10 years of experience building and architecting data warehouses at VK and Yandex.Taxi. The module covers the full journey from understanding business requirements to selecting the right modelling approach and designing a maintainable, scalable DWH.