MHP Data Engineer Masterclass 2026

YellowLine NYC · Databricks, Snowflake & dbt

One dataset, three pipelines, one decision.

Tech stack

Azure ADLS2 · Databricks · Snowflake · dbt · Power BI

What You’ll Build

Three complete Bronze → Silver → Gold medallion pipelines, each using a different tool:

  • Databricks: PySpark notebooks + Unity Catalog + Delta Lake
  • Snowflake: SQL + Snowpark Python + External Stages
  • dbt: SQL models running on Snowflake (with optional Databricks target)

Plus two optional advanced modules (after the main day or as standalone sessions):

  • Module 8 — Streaming: Live user-activity events via Aiven Kafka → Databricks Structured Streaming / Snowflake Dynamic Tables / dbt dynamic_table
  • Module 9 — Machine Learning: Tip prediction with Databricks sklearn + MLflow, Snowflake Cortex ML & Snowpark ML, dbt feature engineering

What’s New in 2026

Compared to 2025, this year’s masterclass has been significantly updated:

From half-day to full-day

The training has been expanded from a half-day session to a full-day format, allowing deeper coverage of each tool, more hands-on practice time, and dedicated modules for production patterns and AI features that were previously only touched on briefly.

Microsoft Fabric — now a separate training

Fabric is no longer included in this masterclass. It has been split into its own dedicated training, so this year’s content focuses exclusively on Databricks, Snowflake, and dbt — the three core data engineering platforms.

New modules added

Several new modules have been introduced, including Production Patterns (scheduling, monitoring, CI/CD), AI Features (Cortex ML, Snowpark ML, Databricks Genie), and a structured Comparison & Wrap-up with open discussion and architecture decision exercises.

Updated terminology & concepts

Terminology, platform features, and operational procedures have been updated to reflect the 2026 platform releases, including Unity Catalog, Snowflake Dynamic Tables, dbt 1.8+, Snowpark Python, Delta Lake improvements, and the latest AI/ML integrations in Databricks and Snowflake.

Module overview

The training covers 9 modules (7 core + 2 optional advanced):

| # | Module | Focus |
|---|--------|-------|
|   | Story: Use Case & Characters | YellowLine NYC narrative & Three Constraints |
| 1 | DE Fundamentals | Medallion architecture, Spark & Delta Lake theory |
| 2 | Databricks Pipeline | PySpark notebooks + Unity Catalog |
| 3 | Snowflake Pipeline | SQL + Snowpark Python + External Stages |
| 4 | dbt Pipeline | SQL models, tests & lineage |
| 5 | Production Patterns | Scheduling, monitoring, CI/CD |
| 6 | AI Features | Cortex ML, Snowpark ML, Databricks ML |
| 7 | Comparison & Wrap-up | Tool evaluation, Three Constraints discussion |
| 8 | Streaming (Optional) | Aiven Kafka → Structured Streaming / Dynamic Tables |
| 9 | Machine Learning (Optional) | Tip prediction, MLflow, feature engineering |

Architecture Overview

All three pipelines process the same NYC Taxi data from a shared Azure ADLS2 source:

flowchart LR
    ADLS2[("Azure ADLS2")]

    ADLS2 --> DB["Databricks<br>PySpark + Delta Lake<br>Bronze → Silver → Gold"]
    ADLS2 --> SF["Snowflake<br>SQL + Snowpark<br>Bronze → Silver → Gold"]
    ADLS2 --> DBT["dbt<br>SQL models<br>Bronze → Silver → Gold"]

    DB  --> PBI["Power BI"]
    SF  --> PBI
    DBT --> PBI

    DB  --> ML["ML<br>(Optional)"]
    SF  --> ML

    KAFKA[("Kafka<br>(Optional)")] --> DB
    KAFKA --> SF

    style ADLS2 fill:#0057b8,color:#fff,stroke:#003d82
    style PBI   fill:#107c10,color:#fff,stroke:#0a5c0a
    style ML    fill:#6d28d9,color:#fff,stroke:#5b21b6
    style KAFKA fill:#d97706,color:#fff,stroke:#b45309
    style DB    fill:#01065c,color:#fff,stroke:#000940
    style SF    fill:#01065c,color:#fff,stroke:#000940
    style DBT   fill:#01065c,color:#fff,stroke:#000940

See the Architecture Reference for the full data flow detail.
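
To make the Bronze → Silver → Gold flow in the diagram concrete, here is a minimal, tool-agnostic sketch in plain Python. It is not taken from the course notebooks: the column names (`pickup_zone`, `trip_distance`, `fare_amount`, `tip_amount`), the sample rows, the `_source` label, and the quality rules are all assumptions chosen for illustration. In the masterclass the same three steps would be expressed in PySpark, Snowflake SQL, or dbt models rather than in Python lists.

```python
from collections import defaultdict

# Hypothetical raw rows; every field arrives as a string, as in a CSV export.
RAW_TRIPS = [
    {"pickup_zone": "Midtown", "trip_distance": "2.5", "fare_amount": "12.0", "tip_amount": "2.4"},
    {"pickup_zone": "Midtown", "trip_distance": "1.0", "fare_amount": "8.0", "tip_amount": "0.0"},
    {"pickup_zone": "JFK", "trip_distance": "17.2", "fare_amount": "70.0", "tip_amount": "14.0"},
    {"pickup_zone": "JFK", "trip_distance": "-3.0", "fare_amount": "55.0", "tip_amount": "5.0"},
    {"pickup_zone": "", "trip_distance": "4.0", "fare_amount": "15.0", "tip_amount": "3.0"},
]


def to_bronze(raw_rows):
    """Bronze: keep rows exactly as ingested, adding only lineage metadata."""
    # "_source" is an assumed metadata column, not a course convention.
    return [dict(row, _source="adls2/raw") for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: cast types and drop rows that fail basic quality rules."""
    silver = []
    for row in bronze_rows:
        try:
            distance = float(row["trip_distance"])
            fare = float(row["fare_amount"])
            tip = float(row["tip_amount"])
        except (KeyError, ValueError):
            continue  # missing or unparseable field: skip the row
        # Assumed rules: a zone must be present, distance and fare must be positive.
        if not row.get("pickup_zone") or distance <= 0 or fare <= 0:
            continue
        silver.append({
            "pickup_zone": row["pickup_zone"],
            "trip_distance": distance,
            "fare_amount": fare,
            "tip_amount": tip,
        })
    return silver


def to_gold(silver_rows):
    """Gold: aggregate cleaned trips into per-zone reporting metrics."""
    totals = defaultdict(lambda: {"trips": 0, "fare": 0.0, "tip": 0.0})
    for row in silver_rows:
        zone = totals[row["pickup_zone"]]
        zone["trips"] += 1
        zone["fare"] += row["fare_amount"]
        zone["tip"] += row["tip_amount"]
    return {
        name: {
            "trips": t["trips"],
            "avg_tip_pct": round(100 * t["tip"] / t["fare"], 1),
        }
        for name, t in totals.items()
    }


bronze = to_bronze(RAW_TRIPS)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer reads only from the layer before it, so the raw data stays untouched in Bronze and any cleaning rule can be changed and replayed without re-ingesting from the source.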

About

This training is developed and delivered by MHP — a Porsche subsidiary and management & IT consultancy specializing in digital transformation.