A 5-month intensive Data Engineering course for Junior+ and Middle-level professionals. Covers DWH design, relational & MPP databases, ETL automation with Airflow, Big Data, cloud infrastructure, data visualization, and data governance.
About the Course
The Data Engineer course by Karpov.Courses is a comprehensive 5-month programme designed for professionals who already have foundational experience with data and want to level up to designing end-to-end data systems. It is aimed at Junior+ and Middle-level practitioners (data analysts, backend developers, BI engineers, and aspiring data engineers) who want to move from solving isolated tasks to architecting scalable data platforms.
The course was created by a team of industry practitioners from leading tech companies including Yandex Go, VK, Toloka AI, SberMarket, and others. Every module is built around real-world scenarios, not toy examples.
What You Will Learn
The programme covers the full spectrum of modern data engineering:
- DWH Design: logical architecture of a data warehouse, dimensional modelling, Data Vault, Anchor Modeling, and choosing the right approach for your use case
- Relational & MPP Databases: PostgreSQL, Greenplum, understanding distributed systems and when MPP databases outperform traditional ones
- ETL Automation: principles of ETL/ELT pipeline design, a deep dive into Apache Airflow, building and orchestrating automated data pipelines
- Big Data: distributed storage with Hadoop, data processing with Apache Spark and Hive, stream processing with Kafka, monitoring and profiling Spark jobs
- Cloud Infrastructure: building a DWH and Data Lake in the cloud, Kubernetes for data workloads, JupyterHub and Spark on Kubernetes
- Data Visualisation: building interactive dashboards for DWH platform monitoring with Tableau, Superset, and DataLens
- Big ML: distributed machine learning with Spark ML, training and deploying models at scale
- Model Management: MLflow, dataset versioning, model tracking, and ML pipeline orchestration
- Data Governance: data quality, lineage tracking, and managing complex data ecosystems
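The ETL Automation module centres on task-based pipelines. As a minimal sketch of that pattern in plain Python (all function names and sample data are illustrative, not from the course materials), each function below corresponds to what would become a separate task in an Airflow DAG, chained as extract >> transform >> load:

```python
# Minimal ETL sketch. Each function mirrors what would become a separate
# task in an Airflow DAG (extract >> transform >> load). All names and
# sample data are illustrative, not taken from the course materials.

def extract():
    # In a real pipeline: pull rows from a source database or API.
    return [
        {"user_id": 1, "amount": "19.99"},
        {"user_id": 2, "amount": "5.00"},
        {"user_id": 1, "amount": "7.50"},
    ]

def transform(rows):
    # Cast string amounts to floats and aggregate revenue per user.
    totals = {}
    for row in rows:
        totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + float(row["amount"])
    return totals

def load(totals, warehouse):
    # In a real pipeline: insert into a warehouse table (e.g. Greenplum).
    warehouse.update(totals)
    return len(totals)

warehouse = {}
rows_loaded = load(transform(extract()), warehouse)
```

Keeping each step as an isolated, idempotent unit is what lets an orchestrator like Airflow retry, schedule, and monitor them independently.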
Course Structure
The course runs for 5 months with 3 sessions per week. Each lesson includes video lectures, detailed notes, and hands-on assignments with a two-week soft deadline. Students work on a dedicated remote server with the full data engineering stack pre-installed.
A mid-course project recreates the ETL processes of a large two-tier data platform, giving students a realistic experience of working with Airflow, Spark, S3, and Greenplum in a production-like environment.
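As a rough illustration of the two-tier idea behind such platforms, here is a hedged Python sketch of a staging layer that lands source records verbatim, feeding a core layer that applies typing and deduplication; all layer, table, and field names are hypothetical, not the project's actual schema:

```python
# Illustrative two-tier layout: a staging layer keeps source records
# verbatim, and a core layer applies typing and deduplication before
# the data is served to consumers. All names are hypothetical.

from datetime import date

def to_staging(raw_lines):
    # Staging: land every source line as-is, split into named fields.
    fields = ("order_id", "order_date", "total")
    return [dict(zip(fields, line.split(","))) for line in raw_lines]

def to_core(staging_rows):
    # Core: cast types and drop duplicate order_ids (first record wins).
    seen, core = set(), []
    for row in staging_rows:
        oid = int(row["order_id"])
        if oid in seen:
            continue
        seen.add(oid)
        core.append({
            "order_id": oid,
            "order_date": date.fromisoformat(row["order_date"]),
            "total": float(row["total"]),
        })
    return core

raw = ["1,2024-01-05,19.99", "2,2024-01-06,5.00", "1,2024-01-05,19.99"]
core = to_core(to_staging(raw))
```

Separating the layers this way means a faulty transform can be re-run from staging without re-extracting from the source systems.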
Students typically invest 10-15 hours per week, and the format is designed to be compatible with full-time employment.
Tools & Technologies
PostgreSQL, Greenplum, Hadoop, Spark, Hive, Kafka, Airflow, S3, Python, SQL, Kubernetes, Tableau, Superset, DataLens, Spark ML, MLflow
My Role
I am the course author and lecturer for the DWH Design module, where I share over 10 years of experience building and architecting data warehouses at VK and Yandex.Taxi. The module covers the full journey from understanding business requirements to selecting the right modelling approach and designing a maintainable, scalable DWH.