flowchart LR
ADLS2[("Azure ADLS2")]
ADLS2 --> DB["Databricks<br>PySpark + Delta Lake<br>Bronze → Silver → Gold"]
ADLS2 --> SF["Snowflake<br>SQL + Snowpark<br>Bronze → Silver → Gold"]
ADLS2 --> DBT["dbt<br>SQL models<br>Bronze → Silver → Gold"]
DB --> PBI["Power BI"]
SF --> PBI
DBT --> PBI
DB --> ML["ML<br>(Optional)"]
SF --> ML
KAFKA[("Kafka<br>(Optional)")] --> DB
KAFKA --> SF
style ADLS2 fill:#0057b8,color:#fff,stroke:#003d82
style PBI fill:#107c10,color:#fff,stroke:#0a5c0a
style ML fill:#6d28d9,color:#fff,stroke:#5b21b6
style KAFKA fill:#d97706,color:#fff,stroke:#b45309
style DB fill:#01065c,color:#fff,stroke:#000940
style SF fill:#01065c,color:#fff,stroke:#000940
style DBT fill:#01065c,color:#fff,stroke:#000940
MHP Data Engineer Masterclass 2026
YellowLine NYC · Databricks, Snowflake & dbt
One dataset, three pipelines, one decision.
Tech stack: Azure ADLS2 · Databricks · Snowflake · dbt · Power BI
Three complete Bronze → Silver → Gold medallion pipelines, each using a different tool:
- Databricks: PySpark notebooks + Unity Catalog + Delta Lake
- Snowflake: SQL + Snowpark Python + External Stages
- dbt: SQL models running on Snowflake (with optional Databricks target)
Plus two optional advanced modules (after the main day or as standalone sessions):
- Module 8 (Streaming): Live user-activity events via Aiven Kafka → Databricks Structured Streaming / Snowflake Dynamic Tables / dbt dynamic_table
- Module 9 (Machine Learning): Tip prediction with Databricks sklearn + MLflow, Snowflake Cortex ML & Snowpark ML, dbt feature engineering
What’s New in 2026
Compared to 2025, this year’s masterclass has been significantly updated:
From half-day to full-day
The training has been expanded from a half-day session to a full-day format, allowing deeper coverage of each tool, more hands-on practice time, and dedicated modules for production patterns and AI features that were previously only touched on briefly.
Microsoft Fabric — now a separate training
Fabric is no longer included in this masterclass. It has been split into its own dedicated training, so this year’s content focuses exclusively on Databricks, Snowflake, and dbt — the three core data engineering platforms.
New modules added
Several new modules have been introduced, including Production Patterns (scheduling, monitoring, CI/CD), AI Features (Cortex ML, Snowpark ML, Databricks Genie), and a structured Comparison & Wrap-up with open discussion and architecture decision exercises.
Updated terminology & concepts
All professional terms, platform features, and operational procedures have been updated to reflect the 2026 platform releases, including Unity Catalog, Snowflake Dynamic Tables, dbt 1.8+, Snowpark Python, Delta Lake improvements, and the latest AI/ML integrations in Databricks and Snowflake.
The training day covers 9 modules (7 core + 2 optional advanced):
| # | Module | Focus |
|---|---|---|
| — | Story: Use Case & Characters | YellowLine NYC narrative & Three Constraints |
| 1 | DE Fundamentals | Medallion architecture, Spark & Delta Lake theory |
| 2 | Databricks Pipeline | PySpark notebooks + Unity Catalog |
| 3 | Snowflake Pipeline | SQL + Snowpark Python + External Stages |
| 4 | dbt Pipeline | SQL models, tests & lineage |
| 5 | Production Patterns | Scheduling, monitoring, CI/CD |
| 6 | AI Features | Cortex ML, Snowpark ML, Databricks ML |
| 7 | Comparison & Wrap-up | Tool evaluation, Three Constraints discussion |
| 8 | Streaming (Optional) | Aiven Kafka → Structured Streaming / Dynamic Tables |
| 9 | Machine Learning (Optional) | Tip prediction, MLflow, feature engineering |
Architecture Overview
All three pipelines process the same NYC Taxi data from a shared Azure ADLS2 source and feed Power BI, as the flowchart at the top of this page shows.
See the Architecture Reference for the full data flow detail.
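To make the Bronze → Silver → Gold flow concrete, here is a minimal sketch in plain Python (standard library only). It is not the masterclass code: the field names, sample rows, validity rules, and the `to_silver` / `to_gold` helpers are all assumptions chosen for illustration. The actual pipelines implement the same layering with PySpark, Snowflake SQL, or dbt models.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw taxi records as they might land in the Bronze layer:
# everything is still a string, and bad rows have not been filtered out.
# Field names are illustrative, not the masterclass schema.
BRONZE = [
    {"pickup": "2026-01-05 08:15:00", "fare": "12.50", "tip": "2.50"},
    {"pickup": "2026-01-05 09:40:00", "fare": "30.00", "tip": "6.00"},
    {"pickup": "2026-01-06 18:05:00", "fare": "-4.00", "tip": "0.00"},   # invalid fare
    {"pickup": "2026-01-06 19:30:00", "fare": "20.00", "tip": "not_a_number"},
    {"pickup": "2026-01-06 21:00:00", "fare": "8.00", "tip": "0.00"},
]


def to_silver(bronze_rows):
    """Silver: cast types and drop rows that fail basic validity checks."""
    silver = []
    for row in bronze_rows:
        try:
            pickup = datetime.strptime(row["pickup"], "%Y-%m-%d %H:%M:%S")
            fare = float(row["fare"])
            tip = float(row["tip"])
        except (KeyError, ValueError):
            continue  # unparseable or incomplete row
        if fare <= 0 or tip < 0:
            continue  # assumed business rule: fares are positive, tips non-negative
        silver.append({"pickup_date": pickup.date(), "fare": fare, "tip": tip})
    return silver


def to_gold(silver_rows):
    """Gold: aggregate cleaned trips into one KPI row per pickup date."""
    totals = defaultdict(lambda: {"trips": 0, "fare": 0.0, "tip": 0.0})
    for row in silver_rows:
        day = totals[row["pickup_date"]]
        day["trips"] += 1
        day["fare"] += row["fare"]
        day["tip"] += row["tip"]
    return [
        {
            "pickup_date": date.isoformat(),
            "trips": t["trips"],
            "total_fare": round(t["fare"], 2),
            "tip_rate": round(t["tip"] / t["fare"], 4),
        }
        for date, t in sorted(totals.items())
    ]


silver = to_silver(BRONZE)
gold = to_gold(silver)
for row in gold:
    print(row)
```

With the sample rows above, three of the five Bronze rows survive into Silver, and Gold holds one KPI row per pickup date. Each layer reads only from the layer before it, which is the property the three tool-specific pipelines share.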
Quick Links
- Briefing — environment check, architecture overview, and everything you need before we start
- Prerequisites & Setup
- Architecture Reference
- Data Model & KPIs
- Glossary
About
This training is developed and delivered by MHP — a Porsche subsidiary and management & IT consultancy specializing in digital transformation.