Facilitator Guide — Day-of Runbook

Facilitator-only — not shown to trainees during labs

TRAINER ONLY

YellowLine NYC masterclass · MHP Data Engineer Masterclass 2026

Audience: Lead trainer + co-trainer
Related: pre-class-checklist.qmd · module-delivery-pattern.qmd · whiteboard-prompts.qmd · reflection-prompts.qmd


Pre-class infrastructure setup (1–2 weeks before)

These tasks are done once by the trainer before the workshop day. They cannot be done on the morning of the class — plan ahead.

ADLS2 Storage

Set this up first — Databricks 00_setup.py requires the Storage Account Key to read Parquet data from ADLS2. The key must be available before any Databricks notebook can run.

  1. Storage account: mhpdeworkshopsa (shared across all attendees)
  2. Generate a fresh Storage Account Key — Azure Portal → Storage account → Access keys → regenerate key1 before each workshop
  3. Generate a SAS Token for Snowflake External Stage:
    • Azure Portal → Storage account → Shared access signature
    • Permissions: Read + List (minimum)
    • Allowed resource types: Container + Object
    • Expiry: set to workshop date + 2 days buffer
    • Copy the SAS token string (starts with ?sv=...)
  4. Verify Parquet files are accessible in the nyc-taxi-data/ container
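The SAS requirements in step 3 can be checked before the token goes on a handout. The following is a minimal sketch, not part of the workshop repo: the function name, the sample token, and the workshop date are hypothetical, and it assumes an account SAS whose query string carries the standard sp (permissions), srt (resource types), and se (expiry) fields.

```python
from datetime import date, datetime, timedelta
from urllib.parse import parse_qs


def check_sas_token(token: str, workshop_day: date, buffer_days: int = 2) -> list[str]:
    """Return the problems found in an account SAS token (empty list = looks fine)."""
    problems = []
    # Stray whitespace from copy/paste breaks the token when it is pasted into SQL.
    if token != token.strip():
        problems.append("token has leading or trailing whitespace")
    fields = parse_qs(token.strip().lstrip("?"))

    def field(name: str) -> str:
        return fields.get(name, [""])[0]

    # sp = signed permissions; Read (r) and List (l) are the stated minimum.
    for perm in "rl":
        if perm not in field("sp"):
            problems.append(f"sp is missing permission '{perm}'")
    # srt = signed resource types; Container (c) and Object (o).
    for rtype in "co":
        if rtype not in field("srt"):
            problems.append(f"srt is missing resource type '{rtype}'")
    # se = signed expiry (UTC); must cover the workshop day plus the buffer.
    required = workshop_day + timedelta(days=buffer_days)
    expiry_raw = field("se")
    if not expiry_raw:
        problems.append("se (expiry) is missing")
    else:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry.date() < required:
            problems.append(f"expires {expiry.date()}, needs {required} or later")
    return problems


# Hypothetical token and date, for illustration only.
sample = "?sv=2022-11-02&ss=b&srt=co&sp=rl&se=2026-06-12T23:59:59Z&sig=PLACEHOLDER"
print(check_sas_token(sample, workshop_day=date(2026, 6, 10)))
```

An empty list means the token meets the three stated requirements; it does not prove the signature is valid, so the LIST check against the external stage is still needed.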

Databricks Workspace

  1. Create the workspace (if not already provisioned) in Azure — region West Europe, Premium tier recommended for Unity Catalog.

  2. Invite attendees: Admin Console → Users → Add user for each attendee email. Assign standard access level. Each attendee receives an activation email; remind them to accept before class.

  3. Create clusters — one per attendee:

    Setting | Value
    Cluster name | de-workshop-{ATTENDEE_ID} (e.g., de-workshop-01_alice)
    Policy | (none — or create a Workshop policy to limit cost)
    Runtime | 15.4 LTS ML
    Node type | Standard_DS3_v2 (4 cores, 14 GB)
    Workers | 1
    Auto-terminate | 30 minutes
    Photon | Disabled (not needed for workshop volume)

    Create clusters in Terminated state — attendees start them during Module 2.

    Cluster Policy (recommended): Create a policy named Workshop that limits node types to Standard_DS3_v2 and max workers to 1. Assign this policy to all attendee clusters to prevent accidental cost overruns.

  4. Unity Catalog permissions — run these GRANT statements in SQL Editor (or a setup notebook) for each attendee after their schemas are created by 00_setup.py:

    -- Grant catalog access (run once per attendee)
    GRANT USE CATALOG ON CATALOG mhpdeworkshop_databricks_2026 TO `attendee@example.com`;
    GRANT CREATE SCHEMA ON CATALOG mhpdeworkshop_databricks_2026 TO `attendee@example.com`;
    GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_bronze TO `attendee@example.com`;
    GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_silver TO `attendee@example.com`;
    GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_gold TO `attendee@example.com`;

    Alternatively, if all attendees share a group (workshop-2026), grant to the group:

    GRANT USE CATALOG ON CATALOG mhpdeworkshop_databricks_2026 TO `workshop-2026`;
    GRANT CREATE SCHEMA ON CATALOG mhpdeworkshop_databricks_2026 TO `workshop-2026`;
  5. Share notebooks — place the 4 pipeline notebooks in a shared folder:

    • Workspace → Shared → de-masterclass-2026 → notebooks/
    • 01_bronze_ingestion.py, 02_silver_cleaning.py, 03_gold_kpis.py, 04_ai_features.py
    • Set folder permission to CAN RUN for all attendees (read-only + run, no edit)
  6. Databricks SQL Warehouse (only needed if dbt uses Databricks target):

    • SQL Editor → SQL Warehouses → Create SQL Warehouse
    • Name: de-workshop-wh, Size: 2X-Small, Auto-suspend: 5 minutes
    • If dbt only targets Snowflake, this step can be skipped
  7. Databricks CLI authentication (needed for Module 8 Aiven Secrets):

    The trainer needs to create the workshop-scope secrets scope before class or guide attendees through it during Module 8. Two options:

    Option A — Trainer creates scope via CLI (recommended):

    # Authenticate CLI to workspace
    databricks configure --token
    # Prompt: Databricks Host: https://<workspace>.azuredatabricks.net
    # Prompt: Token: <generate PAT from Settings → User Settings → Access Tokens>
    
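    # Create scope (once)
    databricks secrets create-scope --scope workshop-scope
    # A new scope is manageable only by its creator; grant all workspace users READ
    databricks secrets put-acl --scope workshop-scope --principal users --permission READ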
    
    # Add secrets (values from Aiven Console)
    databricks secrets put --scope workshop-scope --key aiven-bootstrap-servers
    databricks secrets put --scope workshop-scope --key aiven-ca-cert
    databricks secrets put --scope workshop-scope --key aiven-client-cert
    databricks secrets put --scope workshop-scope --key aiven-client-key
    databricks secrets put --scope workshop-scope --key aiven-topic

    Option B — Attendees create their own scope during Module 8 (requires each attendee to have a PAT and Databricks CLI installed).
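The per-attendee GRANT statements in step 4 are repetitive, so they can be generated from the roster rather than edited by hand. This is a minimal sketch that reproduces the statement pattern shown above. The roster dictionary and function name are hypothetical; the catalog and schema naming come from this guide.

```python
CATALOG = "mhpdeworkshop_databricks_2026"
LAYERS = ("bronze", "silver", "gold")


def grant_statements(attendee_id: str, email: str, catalog: str = CATALOG) -> list[str]:
    """Build the five GRANT statements for one attendee."""
    principal = f"`{email}`"  # principals are backtick-quoted, as in the statements above
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal};",
        f"GRANT CREATE SCHEMA ON CATALOG {catalog} TO {principal};",
    ]
    # One USE SCHEMA grant per medallion layer.
    statements += [
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{attendee_id}_{layer} TO {principal};"
        for layer in LAYERS
    ]
    return statements


# Hypothetical roster: ATTENDEE_ID -> login email.
roster = {"01_alice": "alice@example.com", "02_bob": "bob@example.com"}

for attendee_id, email in roster.items():
    print(f"-- {attendee_id}")
    print("\n".join(grant_statements(attendee_id, email)))
```

Paste the printed output into SQL Editor after 00_setup.py has created the schemas; the schema-level grants fail if the schemas do not exist yet.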

Aiven Kafka (Module 8 only)

  1. Create Aiven Kafka cluster — see aiven-streaming-setup.qmd
  2. Start User Activity generator (4 hours max on free tier — start on the morning of Module 8, not before)
  3. Download SSL certificates: ca.pem, service.cert, service.key
  4. Note the Service URI: kafka-xxxxx.aivencloud.com:12345
  5. Start the relay consumer (streaming/snowflake/00_relay_consumer.py) — requires ADLS2 key
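Before loading the Aiven values into the secret scope, a quick local pre-flight can catch a mistyped Service URI or a missing certificate download. This is a sketch under assumptions: the function names are hypothetical, and the certificates are assumed to sit together in one local folder under the file names listed in step 3.

```python
from pathlib import Path

CERT_FILES = ("ca.pem", "service.cert", "service.key")


def parse_service_uri(uri: str) -> tuple[str, int]:
    """Split a host:port Service URI and validate both parts."""
    host, sep, port = uri.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {uri!r}")
    return host, int(port)


def missing_cert_files(cert_dir: Path) -> list[str]:
    """List the certificate files that are not present in cert_dir."""
    return [name for name in CERT_FILES if not (cert_dir / name).is_file()]


# Placeholder URI from step 4; replace with the value shown in the Aiven Console.
host, port = parse_service_uri("kafka-xxxxx.aivencloud.com:12345")
print(host, port)
print(missing_cert_files(Path("certs")))  # "certs" is a hypothetical folder name
```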

Power BI (Module 7 demo)

  1. Open powerbi/YellowLine_NYC_Dashboard.pbix and verify it connects to Snowflake Gold tables
  2. Test DirectQuery refresh
  3. Have .pbix file ready on the demo machine

Before the room opens (T-30 min)

Task | Owner | Done
Test projector / second screen for animations | Lead | [ ]
Open story site: quarto preview workshop-2026-v2/ (port 4201) | Co-trainer | [ ]
Verify Databricks workspace + catalog — clusters Terminated, notebooks in Shared folder (see Pre-class setup) | Co-trainer | [ ]
Verify Snowflake on trainer’s own trial account (students create theirs during class) | Co-trainer | [ ]
Prepare credentials to distribute: SAS token, ADLS2 storage key, attendee IDs | Co-trainer | [ ]
Print architecture decision matrix (1 per trainee) | Co-trainer | [ ]
Open Google Form URLs — QR / short links ready per module | Co-trainer | [ ]
Photo / save blank whiteboard space for Story sketch | Lead | [ ]
Power BI .pbix open (Module 7 demo) | Co-trainer | [ ]
.env / Codespaces tested on one machine | Co-trainer | [ ]

Credentials & Materials to Distribute

Each attendee needs the following credentials and materials during the workshop. Prepare these before class and distribute at the appropriate module.

Item | When to distribute | Format | Notes
ATTENDEE_ID | Start of day (Module 1) | Printed card or slide | e.g., 01_alice, 02_bob — used in every schema/table name
Databricks workspace URL | Module 2 | Invite link via email | Trainer sends workspace invite to each attendee’s email before class
ADLS2 Storage Account Key | Module 2 | Verbal or printed | Used by Databricks 00_setup.py to read Parquet from ADLS2. Never commit to Git.
SAS Token | Module 3 | Printed card or slide | Used by Snowflake 02_external_stage.sql to create External Stage. Has an expiry date — generate fresh before each workshop.
Databricks Personal Access Token | Only if using dbt with Databricks target | Self-service | Attendee generates their own via Settings → Access Tokens. Not needed if dbt only targets Snowflake.
Snowflake account | Self-service (before or during Module 3) | Attendee creates own trial at signup.snowflake.com | Attendee is ACCOUNTADMIN on their own account; 00_account_setup.sql creates DE_WORKSHOP_ROLE
Important: Snowflake is self-service

Unlike Databricks (trainer-managed workspace), each attendee creates their own Snowflake trial account. This means:

  • Attendees are ACCOUNTADMIN on their own accounts
  • They run 00_account_setup.sql themselves during Module 3 — this creates the database (DE_MASTERCLASS), warehouse (DE_WORKSHOP_WH), role (DE_WORKSHOP_ROLE), and personal schemas
  • The trainer cannot pre-verify attendee Snowflake accounts — only the trainer’s own account can be verified beforehand
  • dbt connects to Snowflake using the attendee’s own credentials (username/password + DE_WORKSHOP_ROLE)

Databricks Workspace

  1. Login — Open the workspace URL (e.g., https://<workspace>.azuredatabricks.net). Confirm you land on the landing page with the left sidebar visible (Workspace, Catalog, Compute, SQL Editor, AI/ML, Genie, Jobs & Pipelines icons).
  2. Compute — Click Compute in the left sidebar. Verify:
    • At least one cluster per attendee exists (named de-workshop-{attendee_id} — see Pre-class cluster creation)
    • Clusters are in Terminated state (ready for attendees to start) — if any show Error, investigate before class
    • Click Create compute to verify the form loads (Runtime dropdown shows 15.4 LTS ML, Node type available)
  3. Workspace — Click Workspace icon → navigate to Shared → de-masterclass-2026 → notebooks. Confirm all 4 pipeline notebooks exist: 01_bronze_ingestion.py, 02_silver_cleaning.py, 03_gold_kpis.py, 04_ai_features.py.
  4. Unity Catalog — Click Catalog icon → verify the main catalog mhpdeworkshop_databricks_2026 and attendee schemas are accessible. Confirm GRANT statements from pre-class setup were applied.
  5. SQL Editor — Click SQL Editor icon → click + New query → run a test query: SELECT 1 AS test → confirm results appear.
  6. Secrets (if Module 8) — Verify the workshop-scope scope exists: run databricks secrets list --scope workshop-scope from an authenticated Databricks CLI session and confirm the five aiven-* keys are listed.
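With more than a handful of attendees, the Compute check in step 2 is easier to run against an exported cluster list than by eye. This is a minimal sketch with a hypothetical function name. It assumes each cluster is a dict with cluster_name and state keys, the shape used in the Databricks Clusters API list response; how that list is exported is left to the trainer.

```python
def cluster_report(attendee_ids: list[str], clusters: list[dict]) -> dict[str, str]:
    """Map each attendee to 'ok', 'missing', or the unexpected cluster state."""
    state_by_name = {c["cluster_name"]: c["state"] for c in clusters}
    report = {}
    for attendee_id in attendee_ids:
        state = state_by_name.get(f"de-workshop-{attendee_id}")
        if state is None:
            report[attendee_id] = "missing"   # no cluster created for this attendee
        elif state == "TERMINATED":
            report[attendee_id] = "ok"        # ready for the attendee to start
        else:
            report[attendee_id] = state       # e.g. ERROR or RUNNING: investigate
    return report


# Hypothetical export, for illustration only.
clusters = [
    {"cluster_name": "de-workshop-01_alice", "state": "TERMINATED"},
    {"cluster_name": "de-workshop-02_bob", "state": "ERROR"},
]
print(cluster_report(["01_alice", "02_bob", "03_carol"], clusters))
```

Anything other than "ok" needs attention before the room opens: create the missing cluster, or open the Compute page for the reported state.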

Snowflake Snowsight (trainer’s own account)

These checks run on the trainer’s own Snowflake trial account to verify the UI paths work correctly. Attendees create their own accounts during Module 3.

  1. Login — Open your Snowsight URL. Confirm the left sidebar shows: Projects, Data, Compute, Admin sections.
  2. Warehouse — At the top-right, verify DE_WORKSHOP_WH warehouse exists and is selectable. If suspended, click to resume and confirm it shows Started within ~10 seconds.
  3. Role — Verify DE_WORKSHOP_ROLE is available in the role selector dropdown (created by 00_account_setup.sql).
  4. Worksheets — Navigate to Projects → Worksheets. Create a test worksheet → run SELECT CURRENT_VERSION() → confirm the result appears.
  5. Databases — Navigate to Data → Databases → DE_MASTERCLASS. Confirm your own schemas (00_trainer_BRONZE, _SILVER, _GOLD) exist.
  6. External Stage — Run LIST @00_trainer_BRONZE.nyc_taxi_trips_stage to verify the SAS token works and Parquet files are listed.
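If the LIST check in step 6 fails and the SAS token looks fine, the stage URL is the next suspect. The sketch below checks a URL copied from the stage definition (for example from DESC STAGE output). The function name is hypothetical. It assumes the stage uses Snowflake's azure:// URL form and points at the nyc-taxi-data container named in the pre-class setup.

```python
from urllib.parse import urlparse

EXPECTED_HOST = "mhpdeworkshopsa.blob.core.windows.net"
EXPECTED_CONTAINER = "nyc-taxi-data"  # assumption: the stage points at this container


def stage_url_problems(url: str) -> list[str]:
    """Return the mismatches between a stage URL and the workshop storage account."""
    problems = []
    parsed = urlparse(url.strip())
    if parsed.scheme != "azure":
        problems.append(f"scheme is '{parsed.scheme}', expected 'azure'")
    if parsed.netloc.lower() != EXPECTED_HOST:
        problems.append(f"host is '{parsed.netloc}', expected '{EXPECTED_HOST}'")
    # The first path segment of a blob URL is the container name.
    container = parsed.path.strip("/").split("/")[0]
    if container != EXPECTED_CONTAINER:
        problems.append(f"container is '{container}', expected '{EXPECTED_CONTAINER}'")
    return problems


print(stage_url_problems("azure://mhpdeworkshopsa.blob.core.windows.net/nyc-taxi-data/"))
```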

dbt (Docker / Codespaces)

  1. Codespaces — Open a Codespace from the fork → run dbt --version in the terminal → confirm Core 1.8.x with adapters snowflake and databricks.
  2. Docker — Pull the workshop image: docker pull ghcr.io/mhp-data-engineer/workshop-dbt:2026 → run docker run --rm ghcr.io/mhp-data-engineer/workshop-dbt:2026 dbt --version.
  3. Connection test — Inside the environment: cd dbt_project && dbt debug --target snowflake → confirm All checks passed!.
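When dbt debug fails, most causes are a wrong database, role, or warehouse name, or an empty credential in profiles.yml. This is a minimal sketch of that check with a hypothetical function name. It takes the Snowflake target as an already-parsed dict (loading the YAML is left out to keep the example dependency-free), and the expected values are the ones this guide says 00_account_setup.sql creates.

```python
EXPECTED = {
    "type": "snowflake",
    "database": "DE_MASTERCLASS",
    "warehouse": "DE_WORKSHOP_WH",
    "role": "DE_WORKSHOP_ROLE",
}
REQUIRED = ("account", "user", "password")


def profile_problems(target: dict) -> list[str]:
    """Return the problems found in one dbt Snowflake target (empty list = looks fine)."""
    problems = []
    for key, expected in EXPECTED.items():
        actual = str(target.get(key, ""))
        # Compare case-insensitively, since the names may be written in either case.
        if actual.upper() != expected.upper():
            problems.append(f"{key}: expected {expected}, found {actual or 'nothing'}")
    for key in REQUIRED:
        if not target.get(key):
            problems.append(f"{key} is empty")
    return problems


# Hypothetical target, for illustration only.
target = {
    "type": "snowflake",
    "account": "abc12345",
    "user": "ALICE",
    "password": "not-a-real-password",
    "role": "ACCOUNTADMIN",
    "database": "DE_MASTERCLASS",
    "warehouse": "DE_WORKSHOP_WH",
}
print(profile_problems(target))
```

The sample target reports one problem, the ACCOUNTADMIN role, which is easy to leave behind on a fresh trial account.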

Fallback if MP4 missing: Read animation beat from voiceover scripts while showing module story callout on screen.


Trainer roles

Role | Responsibility
Lead | Story narration, reflection facilitation, theory, Module 7 discussion
Co-trainer | Lab roaming, environment issues, timing nudges, Power BI demo
Both | Never leave a stuck pair >5 min without a hint or checkpoint offer

Main-day schedule

Time | Module | Focus | Watch for
09:00 | Story Welcome | Design worksheet | Save whiteboard photo
09:30 | 1 Fundamentals | Medallion + KPIs | Keep to 35 min
10:00 | 2 Databricks | Core lab | Do not steal lab time for discussion
11:30 | 3 Snowflake | Core lab | Same KPIs narrative
12:45 | Lunch | 45 min |
13:30 | 4 dbt | Core lab | dbt ≠ warehouse
15:00 | 5 Production | Scheduling / CI | LSDP naming
15:45 | 6 AI | Cortex LLM only | Not Module 9 ML
16:30 | 7 Wrap-up | Discussion + optional PBI | Matrix handout at silent write
17:00 | End | |

Hard stops: Start Module 2 by 10:00 · Start Module 3 by 11:30 · Start Module 4 by 13:30.


Per-module checklist (repeat every module)

Module-specific notes:

Module | Trainer note
Story | Capture design whiteboard — revisit at 16:30
2 | Sofia voice: prototype before SQL simplification
3 | “Same architecture. Different implementation philosophy.”
4 | Elena: dbt on Snowflake
6 | Do not demo ML.FORECAST
7 | Theory ≤5 min · open discussion guide

Per-module UI checkpoints (co-trainer verifies before each module)

Module | Attendee UI should show | Co-trainer check
2 Databricks | Cluster Running (green dot) in Compute page; notebooks visible in Workspace | Confirm all attendee clusters started; notebooks accessible in Shared folder
3 Snowflake | Snowsight open on attendee’s own trial account; 00_account_setup.sql completed; warehouse Started | Walk around — confirm each attendee has DE_MASTERCLASS database and DE_WORKSHOP_ROLE created; SAS token distributed and working
4 dbt | Terminal open in Codespaces or Docker with dbt_project/ directory; dbt debug --target snowflake passing | Walk around — check terminals for green All checks passed!; confirm profiles.yml uses DE_WORKSHOP_ROLE and DE_MASTERCLASS
5 Production | Jobs & Pipelines page accessible in Databricks (formerly Workflows → Delta Live Tables; renamed to Lakeflow Declarative Pipelines); Snowflake worksheets with Task SQL ready | Pre-create one Lakeflow pipeline as demo; verify TASK_HISTORY() returns data
6 AI Features | Genie icon visible in Databricks sidebar (under SQL section); Snowflake worksheets ready for Cortex SQL | Confirm AI_COMPLETE returns results (run test query); Genie page loads
7 Wrap-up | No portal needed — whiteboard and discussion only | Print architecture decision matrix handouts
8 Streaming | Databricks cluster with Kafka Maven libs installed; Snowflake warehouse running | Verify Kafka libs on cluster (Compute → Libraries tab); Aiven topic has events flowing
9 ML | Databricks AI/ML → Experiments page accessible; Snowflake worksheets ready for ML.FORECAST | Confirm USE AI FUNCTIONS privilege + CORTEX_USER role granted; ML Runtime cluster available

Module 7 runbook (30 min block)

Min | Activity | Doc
0–3 | Animation mod-07-wrapup.mp4 | voiceovers
3–5 | Silent write + decision matrix | matrix
5–10 | Short theory: Objectives, PBI demo notes, When to Use What | Module 7
10–28 | Open discussion (Rounds 1–4) | discussion guide
28–30 | Close: three constraints + “Technology is a decision…” |
+10 | Optional Power BI live demo | powerbi/README.md

If running PBI demo before discussion, cut Round 3 synthesis to 5 min.


Common classroom fixes

Situation | Response
Pair stuck on Bronze ingest | Point to checkpoint data / co-trainer pairs in
“dbt replaces Snowflake” | Draw platform box; dbt inside as transform layer
Reflection runs long | 5-min timer; capture 3 bullets max on whiteboard
Running late in Module 2–4 | Cut discussion to 5 min — never cut lab
Vendor debate in Module 7 | “We’re advising Marcus, not picking a winner for MHP.”
Attendee cannot find notebooks in Databricks | Guide to Workspace icon → Shared → de-masterclass-2026 → notebooks; or use search bar at top of sidebar
Attendee’s Databricks cluster won’t start | Check Compute page for error message; try Restart; if stuck >5 min, assign a buddy cluster
Snowflake trial signup fails | Suggest using a different email; check spam folder for verification email; trial creation can take 5–10 min
Snowflake 00_account_setup.sql fails | Check attendee is using ACCOUNTADMIN role (default for trial); confirm SET attendee_id = '...' was run first
Snowflake DE_WORKSHOP_ROLE not found | The role is created by 00_account_setup.sql — re-run the script; or manually: CREATE ROLE DE_WORKSHOP_ROLE;
Snowflake External Stage cannot list files | SAS token may be expired or have extra spaces; re-copy from trainer handout; verify stage URL matches mhpdeworkshopsa.blob.core.windows.net
Snowflake warehouse shows “Suspended” | Click warehouse name at top-right → click Resume; wait ~10 seconds for Started status
Snowflake worksheet shows “No results” | Check session variable: SELECT $attendee_id; — if null, re-run the SET statement at the top
dbt debug fails with connection error | Check profiles.yml — confirm database: DE_MASTERCLASS, role: DE_WORKSHOP_ROLE, and correct Snowflake account/user/password
Databricks Experiments page is empty | The experiment appears after the first mlflow.start_run() call — run the training notebook first
Databricks “Cannot see catalog” error | Unity Catalog GRANT was not applied — re-run the GRANT statements from Pre-class setup; or grant from Catalog → Permissions UI
Databricks CLI 403 Forbidden | PAT may be expired or lack admin scope — generate a new token: Settings → User Settings → Access Tokens; ensure workspace has admin consent for CLI apps
Module 8: workshop-scope secrets scope missing | Trainer must create scope before class (see Pre-class setup); or guide attendee: databricks secrets create-scope --scope workshop-scope
ML.FORECAST returns error in Snowflake | Verify GOLD_TRIPS_BY_HOUR table exists and has PICKUP_HOUR_TS timestamp column; check Cortex role
Power BI cannot connect to Snowflake | Verify warehouse is Started; check server URL matches <account>.snowflakecomputing.com; use DirectQuery mode

End-of-day close (2 min script)

One dataset. Three implementations. Priya’s dashboard didn’t care which engine built Gold.

Three constraints — cost, performance, compliance. Three decisions — platform, transform, consumption.

Look at your Story sketch. You weren’t wrong to guess. Now you’ve proved it in code.

Technology is a decision. Architecture is responsibility.

Optional: 1–5 finger poll — “I could defend my tool choice to a client.”


After class

Task | Owner
Save Story + Module 7 whiteboard photos | Lead
Note timing overruns for next delivery | Both
Log environment issues (catalog, warehouse, dbt target) | Co-trainer
Share tool comparison deep dive link for self-study | Lead

Full dry-run checklist: docs/dry-run-checklist.md · pre-class-checklist.qmd


Document history

Date | Change
2026-06-06 | Reordered pre-class setup: moved ADLS2 Storage before Databricks Workspace (Databricks depends on ADLS2 Storage Account Key)
2026-06-05 | Updated per-module checklist for five-step rhythm; fixed aiven-kafka → workshop-scope secret scope
2026-06-04 | Added pre-class infrastructure setup section (Databricks workspace, ADLS2, Aiven, Power BI); added Unity Catalog GRANT statements, cluster creation guide, Databricks CLI auth, SQL Warehouse setup
2026-05-24 | Initial day-of facilitator runbook