Facilitator Guide — Day-of Runbook
Facilitator-only — not shown to trainees during labs
YellowLine NYC masterclass · MHP Data Engineer Masterclass 2026
Audience: Lead trainer + co-trainer
Related: pre-class-checklist.qmd · module-delivery-pattern.qmd · whiteboard-prompts.qmd · reflection-prompts.qmd
Pre-class infrastructure setup (1–2 weeks before)
These tasks are done once by the trainer before the workshop day. They cannot be done on the morning of the class — plan ahead.
ADLS2 Storage
Set this up first: Databricks `00_setup.py` requires the Storage Account Key to read Parquet data from ADLS2, so the key must be available before any Databricks notebook can run.
- Storage account: `mhpdeworkshopsa` (shared across all attendees)
- Generate a fresh Storage Account Key: Azure Portal → Storage account → Access keys → regenerate key1 before each workshop
- Generate a SAS Token for the Snowflake External Stage:
  - Azure Portal → Storage account → Shared access signature
  - Permissions: Read + List (minimum)
  - Allowed resource types: Container + Object
  - Expiry: set to workshop date + 2 days buffer
  - Copy the SAS token string (starts with `?sv=...`)
- Verify the Parquet files are accessible in the `nyc-taxi-data/` container
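The portal steps above are easy to get slightly wrong: a missing List permission, an expiry that ends on the workshop day itself, or a trailing space picked up during copy-paste. A minimal sketch, assuming an account-level SAS (the kind this portal blade produces, carrying `sp`, `srt`, and `se` query parameters); the helper name `check_sas_token` is hypothetical:

```python
"""Sketch: sanity-check a pasted Azure account SAS token before it goes on the handout."""
from datetime import date, datetime, timedelta, timezone
from urllib.parse import parse_qs


def check_sas_token(raw_token: str, workshop_day: date, buffer_days: int = 2) -> list:
    """Return human-readable problems; an empty list means the token looks usable."""
    token = raw_token.strip()  # stray spaces/newlines are a common copy-paste error
    params = parse_qs(token.lstrip("?"))
    problems = []

    # sp = signed permissions; the External Stage needs at least Read (r) and List (l)
    perms = params.get("sp", [""])[0]
    for flag, label in (("r", "Read"), ("l", "List")):
        if flag not in perms:
            problems.append(f"missing {label} permission (sp={perms!r})")

    # srt = signed resource types on an account SAS: s=service, c=container, o=object
    srt = params.get("srt", [""])[0]
    for flag, label in (("c", "Container"), ("o", "Object")):
        if flag not in srt:
            problems.append(f"missing {label} resource type (srt={srt!r})")

    # se = signed expiry, an ISO 8601 UTC value such as 2026-06-10T18:00:00Z
    expiry_raw = params.get("se", [""])[0]
    if not expiry_raw:
        problems.append("no expiry (se) parameter found")
        return problems
    expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
    if expiry.tzinfo is None:  # date-only values parse as naive; treat them as UTC
        expiry = expiry.replace(tzinfo=timezone.utc)
    required = workshop_day + timedelta(days=buffer_days)
    if expiry.astimezone(timezone.utc).date() < required:
        problems.append(
            f"token expires {expiry:%Y-%m-%d}, but should last until at least {required}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical token with a redacted signature, pasted with stray whitespace
    pasted = " ?sv=2022-11-02&ss=b&srt=co&sp=rl&se=2026-06-10T18:00:00Z&spr=https&sig=REDACTED \n"
    print(check_sas_token(pasted, workshop_day=date(2026, 6, 8)) or "token looks OK")
```

The check only inspects the token's own claims; it cannot tell whether the signature is valid, so the `LIST @...` check on the trainer's Snowflake account remains the real end-to-end test.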
Databricks Workspace
Create the workspace (if not already provisioned) in Azure: region West Europe, Premium tier (required for Unity Catalog).

Invite attendees: Admin Console → Users → Add user for each attendee email. Assign standard access level. Each attendee receives an activation email; remind them to accept it before class.
Create clusters, one per attendee:

| Setting | Value |
|---|---|
| Cluster name | de-workshop-{ATTENDEE_ID} (e.g., de-workshop-01_alice) |
| Policy | (none, or create a Workshop policy to limit cost) |
| Runtime | 15.4 LTS ML |
| Node type | Standard_DS3_v2 (4 cores, 14 GB) |
| Workers | 1 |
| Auto-terminate | 30 minutes |
| Photon | Disabled (not needed for workshop volume) |

Leave the clusters in Terminated state so attendees start them during Module 2. Creating a cluster starts it, so terminate each one after it has been created.
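Rather than filling in the Create compute form once per attendee, the settings table can be expressed as JSON payloads for the Databricks Clusters API create endpoint, together with a definition for the Workshop cluster policy recommended in the Cluster Policy paragraph. A minimal sketch: the roster, the helper names, and especially the `spark_version` key for 15.4 LTS ML are assumptions to confirm against the workspace before use.

```python
"""Sketch: build per-attendee cluster payloads and the recommended Workshop policy."""
import json
from typing import Optional

# Hypothetical roster; replace with the real ATTENDEE_IDs.
ATTENDEE_IDS = ["01_alice", "02_bob"]

# Cluster policy definition for the "Workshop" policy:
# pin the node type and cap the worker count at 1.
WORKSHOP_POLICY_DEFINITION = {
    "node_type_id": {"type": "fixed", "value": "Standard_DS3_v2"},
    "num_workers": {"type": "range", "maxValue": 1},
}


def cluster_spec(attendee_id: str, policy_id: Optional[str] = None) -> dict:
    """Mirror the settings table as a Clusters API create payload."""
    spec = {
        "cluster_name": f"de-workshop-{attendee_id}",
        # Assumed runtime key for 15.4 LTS ML; check the workspace's list of
        # available Spark versions before relying on it.
        "spark_version": "15.4.x-cpu-ml-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
        "autotermination_minutes": 30,
        "runtime_engine": "STANDARD",  # Photon disabled
    }
    if policy_id is not None:
        spec["policy_id"] = policy_id  # ID returned when the Workshop policy is created
    return spec


if __name__ == "__main__":
    print(json.dumps(WORKSHOP_POLICY_DEFINITION, indent=2))
    for attendee_id in ATTENDEE_IDS:
        print(json.dumps(cluster_spec(attendee_id), indent=2))
```

Creating a cluster through the API starts it just as the UI does, so each cluster still needs to be terminated afterwards to reach the Terminated state.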
Cluster Policy (recommended): Create a policy named Workshop that limits node types to Standard_DS3_v2 and max workers to 1. Assign this policy to all attendee clusters to prevent accidental cost overruns.

Unity Catalog permissions: run these GRANT statements in SQL Editor (or a setup notebook) for each attendee after their schemas are created by
`00_setup.py`:

```sql
-- Catalog-level grants (run once per attendee)
GRANT USE CATALOG ON CATALOG mhpdeworkshop_databricks_2026 TO `attendee@example.com`;
GRANT CREATE SCHEMA ON CATALOG mhpdeworkshop_databricks_2026 TO `attendee@example.com`;
-- Schema-level grants (the schemas must already exist)
GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_bronze TO `attendee@example.com`;
GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_silver TO `attendee@example.com`;
GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_gold TO `attendee@example.com`;
```

If attendees run `00_setup.py` themselves, apply the two catalog-level grants first, since creating their schemas requires USE CATALOG and CREATE SCHEMA on the catalog.

Alternatively, if all attendees share a group (`workshop-2026`), grant to the group:

```sql
GRANT USE CATALOG ON CATALOG mhpdeworkshop_databricks_2026 TO `workshop-2026`;
GRANT CREATE SCHEMA ON CATALOG mhpdeworkshop_databricks_2026 TO `workshop-2026`;
```

Share notebooks: place the 4 pipeline notebooks in a shared folder:
- Workspace → Shared → de-masterclass-2026 → notebooks/: `01_bronze_ingestion.py`, `02_silver_cleaning.py`, `03_gold_kpis.py`, `04_ai_features.py`
- Set folder permission to CAN RUN for all attendees (read-only + run, no edit)
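The per-attendee Unity Catalog GRANT statements in this setup differ only in the attendee ID and email, so they are easy to generate instead of hand-editing. A minimal sketch with a hypothetical roster and helper names; it splits catalog-level and schema-level grants because the USE SCHEMA grants can only succeed once the schemas exist:

```python
"""Sketch: generate the per-attendee Unity Catalog GRANT statements."""
CATALOG = "mhpdeworkshop_databricks_2026"
LAYERS = ("bronze", "silver", "gold")

# Hypothetical roster: ATTENDEE_ID -> workspace login email.
ROSTER = {
    "01_alice": "alice@example.com",
    "02_bob": "bob@example.com",
}


def catalog_grants(principal: str) -> list:
    """Catalog-level grants; these do not depend on any schema existing."""
    who = f"`{principal}`"  # backticks quote emails and group names in Databricks SQL
    return [
        f"GRANT USE CATALOG ON CATALOG {CATALOG} TO {who};",
        f"GRANT CREATE SCHEMA ON CATALOG {CATALOG} TO {who};",
    ]


def schema_grants(attendee_id: str, principal: str) -> list:
    """Schema-level grants; the {attendee_id}_{layer} schemas must already exist."""
    who = f"`{principal}`"
    return [
        f"GRANT USE SCHEMA ON SCHEMA {CATALOG}.{attendee_id}_{layer} TO {who};"
        for layer in LAYERS
    ]


if __name__ == "__main__":
    for attendee_id, email in ROSTER.items():
        print("\n".join(catalog_grants(email) + schema_grants(attendee_id, email)))
        print()
```

The printed statements can be pasted into SQL Editor as they are.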
Databricks SQL Warehouse (only needed if dbt uses Databricks target):
- SQL Editor → SQL Warehouses → Create SQL Warehouse
- Name: `de-workshop-wh`, Size: 2X-Small, Auto-suspend: 5 minutes
- If dbt only targets Snowflake, this step can be skipped
Databricks CLI authentication (needed for Module 8 Aiven Secrets):
The trainer needs to create the `workshop-scope` secrets scope before class, or guide attendees through it during Module 8. Two options:

Option A: Trainer creates the scope via CLI (recommended):

```bash
# Authenticate CLI to workspace
databricks configure --token
# Prompt: Databricks Host: https://<workspace>.azuredatabricks.net
# Prompt: Token: <generate PAT from Settings → User Settings → Access Tokens>

# Create scope (once). By default only the creator (and workspace admins) can use it,
# so also grant the attendee group READ access.
databricks secrets create-scope --scope workshop-scope
databricks secrets put-acl --scope workshop-scope --principal workshop-2026 --permission READ

# Add secrets (values from Aiven Console)
databricks secrets put --scope workshop-scope --key aiven-bootstrap-servers
databricks secrets put --scope workshop-scope --key aiven-ca-cert
databricks secrets put --scope workshop-scope --key aiven-client-cert
databricks secrets put --scope workshop-scope --key aiven-client-key
databricks secrets put --scope workshop-scope --key aiven-topic
```

These commands use the legacy databricks-cli syntax; newer Databricks CLI versions take positional arguments (e.g., `databricks secrets create-scope workshop-scope`).

Option B: Attendees create their own scope during Module 8 (requires each attendee to have a PAT and the Databricks CLI installed).
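Because `databricks secrets put` is run once per key, a missing key is easy to overlook until the Module 8 streaming notebook fails. A minimal sketch that compares the key names actually present in `workshop-scope` against the five the CLI block adds; the helper name is hypothetical, and obtaining the key list via `dbutils.secrets.list` only works inside a Databricks notebook:

```python
"""Sketch: confirm workshop-scope holds every Aiven secret Module 8 expects."""
from typing import Iterable

# The five keys added by the CLI commands in Option A.
REQUIRED_KEYS = frozenset({
    "aiven-bootstrap-servers",
    "aiven-ca-cert",
    "aiven-client-cert",
    "aiven-client-key",
    "aiven-topic",
})


def missing_secret_keys(present_keys: Iterable[str]) -> list:
    """Return the required keys that are absent, sorted for stable output."""
    return sorted(REQUIRED_KEYS - set(present_keys))


# Inside a Databricks notebook the key names could come from dbutils, e.g.:
#   present = [s.key for s in dbutils.secrets.list("workshop-scope")]
#   print(missing_secret_keys(present) or "all Aiven secrets present")
```

Running the check as an attendee (not as the trainer) also confirms that the scope's ACL lets attendees read it.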
Aiven Kafka (Module 8 only)
- Create Aiven Kafka cluster — see aiven-streaming-setup.qmd
- Start User Activity generator (4 hours max on free tier — start on the morning of Module 8, not before)
- Download SSL certificates: `ca.pem`, `service.cert`, `service.key`
- Note the Service URI: `kafka-xxxxx.aivencloud.com:12345`
- Start the relay consumer (`streaming/snowflake/00_relay_consumer.py`); it requires the ADLS2 key
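The three certificate files and the Service URI are exactly what a Kafka client needs for Aiven's client-certificate (mutual TLS) authentication. A minimal sketch that turns them into connection settings, assuming the kafka-python client, whose consumer accepts `bootstrap_servers`, `security_protocol`, `ssl_cafile`, `ssl_certfile`, and `ssl_keyfile`; the relay consumer in this repo may use a different client, and the helper name and `certs` directory are hypothetical:

```python
"""Sketch: turn the Aiven Service URI and downloaded certs into client SSL settings."""
from pathlib import Path


def kafka_ssl_config(service_uri: str, cert_dir: str = ".") -> dict:
    """Build kafka-python style SSL settings from an Aiven host:port Service URI."""
    host, _, port = service_uri.strip().rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {service_uri!r}")
    certs = Path(cert_dir)
    return {
        "bootstrap_servers": f"{host}:{port}",
        "security_protocol": "SSL",  # client certificate + CA file = mutual TLS
        "ssl_cafile": str(certs / "ca.pem"),
        "ssl_certfile": str(certs / "service.cert"),
        "ssl_keyfile": str(certs / "service.key"),
    }


if __name__ == "__main__":
    config = kafka_ssl_config("kafka-xxxxx.aivencloud.com:12345", cert_dir="certs")
    for key, value in config.items():
        print(f"{key} = {value}")
    # With kafka-python installed (assumption), the dict could be passed as:
    #   from kafka import KafkaConsumer
    #   consumer = KafkaConsumer("<topic>", **config)
```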
Power BI (Module 7 demo)
- Open `powerbi/YellowLine_NYC_Dashboard.pbix` and verify it connects to the Snowflake Gold tables
- Test DirectQuery refresh
- Have the `.pbix` file ready on the demo machine
Before the room opens (T-30 min)
| Task | Owner | Done |
|---|---|---|
| Test projector / second screen for animations | Lead | [ ] |
| Open story site: quarto preview workshop-2026-v2/ --port 4201 | Co-trainer | [ ] |
| Verify Databricks workspace + catalog — clusters Terminated, notebooks in Shared folder (see Pre-class setup) | Co-trainer | [ ] |
| Verify Snowflake on trainer’s own trial account (students create theirs during class) | Co-trainer | [ ] |
| Prepare credentials to distribute: SAS token, ADLS2 storage key, attendee IDs | Co-trainer | [ ] |
| Print architecture decision matrix (1 per trainee) | Co-trainer | [ ] |
| Open Google Form URLs — QR / short links ready per module | Co-trainer | [ ] |
| Photo / save blank whiteboard space for Story sketch | Lead | [ ] |
| Power BI .pbix open (Module 7 demo) | Co-trainer | [ ] |
| .env / Codespaces tested on one machine | Co-trainer | [ ] |
Credentials & Materials to Distribute
Each attendee needs the following credentials and materials during the workshop. Prepare these before class and distribute at the appropriate module.
| Item | When to distribute | Format | Notes |
|---|---|---|---|
| ATTENDEE_ID | Start of day (Module 1) | Printed card or slide | e.g., 01_alice, 02_bob — used in every schema/table name |
| Databricks workspace URL | Module 2 | Invite link via email | Trainer sends workspace invite to each attendee’s email before class |
| ADLS2 Storage Account Key | Module 2 | Verbal or printed | Used by Databricks 00_setup.py to read Parquet from ADLS2. Never commit to Git. |
| SAS Token | Module 3 | Printed card or slide | Used by Snowflake 02_external_stage.sql to create External Stage. Has an expiry date — generate fresh before each workshop. |
| Databricks Personal Access Token | Only if using dbt with Databricks target | Self-service | Attendee generates their own via Settings → Access Tokens. Not needed if dbt only targets Snowflake. |
| Snowflake account | Self-service (before or during Module 3) | Attendee creates own trial at signup.snowflake.com | Attendee is ACCOUNTADMIN on their own account; 00_account_setup.sql creates DE_WORKSHOP_ROLE |
Unlike Databricks (trainer-managed workspace), each attendee creates their own Snowflake trial account. This means:

- Attendees are ACCOUNTADMIN on their own accounts
- They run `00_account_setup.sql` themselves during Module 3; it creates the database (`DE_MASTERCLASS`), warehouse (`DE_WORKSHOP_WH`), role (`DE_WORKSHOP_ROLE`), and personal schemas
- The trainer cannot pre-verify attendee Snowflake accounts; only the trainer's own account can be verified beforehand
- dbt connects to Snowflake using the attendee's own credentials (username/password + `DE_WORKSHOP_ROLE`)
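Because ATTENDEE_ID ends up in every schema and cluster name on both platforms, a typo on a printed card tends to surface much later as a confusing "not found" error. A minimal sketch that validates IDs before the cards are printed and lists the names each ID produces; the ID pattern and the Snowflake schema casing are assumptions inferred from the examples in this runbook (`01_alice`, `02_bob`, `00_trainer_BRONZE`), and the helper name is hypothetical:

```python
"""Sketch: validate ATTENDEE_IDs and show the object names they produce."""
import re

# Assumed format, inferred from 01_alice / 02_bob / 00_trainer:
# two digits, an underscore, then a lowercase name.
ATTENDEE_ID_RE = re.compile(r"\d{2}_[a-z]+")
DATABRICKS_CATALOG = "mhpdeworkshop_databricks_2026"
LAYERS = ("bronze", "silver", "gold")


def object_names(attendee_id: str) -> dict:
    """Return the Databricks and Snowflake names derived from one ATTENDEE_ID."""
    if not ATTENDEE_ID_RE.fullmatch(attendee_id):
        raise ValueError(f"unexpected ATTENDEE_ID format: {attendee_id!r}")
    return {
        "databricks_cluster": [f"de-workshop-{attendee_id}"],
        "databricks_schemas": [f"{DATABRICKS_CATALOG}.{attendee_id}_{layer}" for layer in LAYERS],
        # Suffix casing follows the trainer example 00_trainer_BRONZE (assumption).
        "snowflake_schemas": [f"DE_MASTERCLASS.{attendee_id}_{layer.upper()}" for layer in LAYERS],
    }


if __name__ == "__main__":
    for attendee_id in ["01_alice", "02_bob"]:  # hypothetical roster
        for kind, names in object_names(attendee_id).items():
            print(f"{attendee_id:10} {kind:20} {', '.join(names)}")
```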
Databricks Workspace
- Login: Open the workspace URL (e.g., `https://<workspace>.azuredatabricks.net`). Confirm you land on the landing page with the left sidebar visible (Workspace, Catalog, Compute, SQL Editor, AI/ML, Genie, Jobs & Pipelines icons).
- Compute: Click Compute in the left sidebar. Verify:
  - At least one cluster per attendee exists (named `de-workshop-{attendee_id}`; see Pre-class cluster creation)
  - Clusters are in Terminated state (ready for attendees to start); if any show Error, investigate before class
  - Click Create compute to verify the form loads (Runtime dropdown shows 15.4 LTS ML, Node type available)
- Workspace: Click the Workspace icon → navigate to Shared → de-masterclass-2026 → notebooks. Confirm all 4 pipeline notebooks exist: `01_bronze_ingestion.py`, `02_silver_cleaning.py`, `03_gold_kpis.py`, `04_ai_features.py`.
- Unity Catalog: Click the Catalog icon → verify the main catalog `mhpdeworkshop_databricks_2026` and the attendee schemas are accessible. Confirm the GRANT statements from pre-class setup were applied.
- SQL Editor: Click the SQL Editor icon → click + New query → run a test query, `SELECT 1 AS test`, and confirm results appear.
- Secrets (if Module 8): Verify the `workshop-scope` scope exists: run `databricks secrets list --scope workshop-scope` from the trainer's CLI and confirm the five `aiven-*` keys are listed.
Snowflake Snowsight (trainer’s own account)
These checks run on the trainer’s own Snowflake trial account to verify the UI paths work correctly. Attendees create their own accounts during Module 3.
- Login: Open your Snowsight URL. Confirm the left sidebar shows the Projects, Data, Compute, and Admin sections.
- Warehouse: At the top-right, verify the `DE_WORKSHOP_WH` warehouse exists and is selectable. If it is suspended, click to resume and confirm it shows Started within ~10 seconds.
- Role: Verify `DE_WORKSHOP_ROLE` is available in the role selector dropdown (created by `00_account_setup.sql`).
- Worksheets: Navigate to Projects → Worksheets. Create a test worksheet → run `SELECT CURRENT_VERSION()` → confirm the result appears.
- Databases: Navigate to Data → Databases → `DE_MASTERCLASS`. Confirm your own schemas (`00_trainer_BRONZE`, `00_trainer_SILVER`, `00_trainer_GOLD`) exist.
- External Stage: Run `LIST @00_trainer_BRONZE.nyc_taxi_trips_stage` to verify the SAS token works and the Parquet files are listed.
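One detail worth checking in the External Stage step: Snowflake only accepts an unquoted identifier that starts with a letter or underscore, so a schema name beginning with digits, such as `00_trainer_BRONZE`, generally has to be written in double quotes, and quoted identifiers are case-sensitive. Whether this bites depends on how `00_account_setup.sql` actually created the schemas, which this runbook does not show. A minimal sketch with hypothetical helper names that builds the `LIST` command, quoting only where the unquoted form would be invalid:

```python
"""Sketch: quote Snowflake identifiers that are not valid unquoted."""
import re

# Unquoted Snowflake identifiers must start with a letter or underscore and
# contain only letters, digits, underscores, or dollar signs.
UNQUOTED_OK = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def sf_ident(name: str) -> str:
    """Return name unchanged if it is valid unquoted, else double-quote it.

    Reserved keywords are not handled here. Quoted identifiers are case-sensitive,
    so the quoted form must match the exact case used when the object was created.
    """
    if UNQUOTED_OK.fullmatch(name):
        return name
    return '"' + name.replace('"', '""') + '"'  # embedded quotes are doubled


def list_stage_sql(schema: str, stage: str = "nyc_taxi_trips_stage") -> str:
    """Build the LIST command used to verify the External Stage."""
    return f"LIST @{sf_ident(schema)}.{sf_ident(stage)};"


if __name__ == "__main__":
    print(list_stage_sql("00_trainer_BRONZE"))
```

Running it prints `LIST @"00_trainer_BRONZE".nyc_taxi_trips_stage;`, which only resolves if the schema was stored with exactly that mixed case.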
dbt (Docker / Codespaces)
- Codespaces: Open a Codespace from the fork → run `dbt --version` in the terminal → confirm Core 1.8.x with the `snowflake` and `databricks` adapters.
- Docker: Pull the workshop image with `docker pull ghcr.io/mhp-data-engineer/workshop-dbt:2026`, then run `docker run --rm ghcr.io/mhp-data-engineer/workshop-dbt:2026 dbt --version`.
- Connection test: Inside the environment, run `cd dbt_project && dbt debug --target snowflake` and confirm `All checks passed!`.
Fallback if an MP4 is missing: read the animation beat from the voiceover scripts while showing the module's story callout on screen.
Trainer roles
| Role | Responsibility |
|---|---|
| Lead | Story narration, reflection facilitation, theory, Module 7 discussion |
| Co-trainer | Lab roaming, environment issues, timing nudges, Power BI demo |
| Both | Never leave a stuck pair >5 min without a hint or checkpoint offer |
Main-day schedule
| Time | Module | Focus | Watch for |
|---|---|---|---|
| 09:00 | Story Welcome | Design worksheet | Save whiteboard photo |
| 09:30 | 1 Fundamentals | Medallion + KPIs | Keep to 35 min |
| 10:00 | 2 Databricks | Core lab | Do not steal lab time for discussion |
| 11:30 | 3 Snowflake | Core lab | Same KPIs narrative |
| 12:45 | Lunch | 45 min | |
| 13:30 | 4 dbt | Core lab | dbt ≠ warehouse |
| 15:00 | 5 Production | Scheduling / CI | LSDP naming |
| 15:45 | 6 AI | Cortex LLM only | Not Module 9 ML |
| 16:30 | 7 Wrap-up | Discussion + optional PBI | Matrix handout at silent write |
| 17:00 | End | | |
Hard stops: Start Module 2 by 10:00 · Start Module 3 by 11:30 · Start Module 4 by 13:30.
Per-module checklist (repeat every module)
Module-specific notes:
| Module | Trainer note |
|---|---|
| Story | Capture design whiteboard — revisit at 16:30 |
| 2 | Sofia voice: prototype before SQL simplification |
| 3 | “Same architecture. Different implementation philosophy.” |
| 4 | Elena: dbt on Snowflake |
| 6 | Do not demo ML.FORECAST |
| 7 | Theory ≤5 min · open discussion guide |
Per-module UI checkpoints (co-trainer verifies before each module)
| Module | Attendee UI should show | Co-trainer check |
|---|---|---|
| 2 Databricks | Cluster Running (green dot) in Compute page; notebooks visible in Workspace | Confirm all attendee clusters started; notebooks accessible in Shared folder |
| 3 Snowflake | Snowsight open on attendee's own trial account; 00_account_setup.sql completed; warehouse Started | Walk around: confirm each attendee has the DE_MASTERCLASS database and DE_WORKSHOP_ROLE created; SAS token distributed and working |
| 4 dbt | Terminal open in Codespaces or Docker with dbt_project/ directory; dbt debug --target snowflake passing | Walk around: check terminals for green All checks passed!; confirm profiles.yml uses DE_WORKSHOP_ROLE and DE_MASTERCLASS |
| 5 Production | Jobs & Pipelines page accessible in Databricks (formerly Workflows → Delta Live Tables; renamed to Lakeflow Declarative Pipelines); Snowflake worksheets with Task SQL ready | Pre-create one Lakeflow pipeline as demo; verify TASK_HISTORY() returns data |
| 6 AI Features | Genie icon visible in Databricks sidebar (under SQL section); Snowflake worksheets ready for Cortex SQL | Confirm AI_COMPLETE returns results (run test query); Genie page loads |
| 7 Wrap-up | No portal needed — whiteboard and discussion only | Print architecture decision matrix handouts |
| 8 Streaming | Databricks cluster with Kafka Maven libs installed; Snowflake warehouse running | Verify Kafka libs on cluster (Compute → Libraries tab); Aiven topic has events flowing |
| 9 ML | Databricks AI/ML → Experiments page accessible; Snowflake worksheets ready for ML.FORECAST | Confirm USE AI FUNCTIONS privilege + CORTEX_USER role granted; ML Runtime cluster available |
Module 7 runbook (30 min block)
| Min | Activity | Doc |
|---|---|---|
| 0–3 | Animation mod-07-wrapup.mp4 | voiceovers |
| 3–5 | Silent write + decision matrix | matrix |
| 5–10 | Short theory: Objectives, PBI demo notes, When to Use What | Module 7 |
| 10–28 | Open discussion (Rounds 1–4) | discussion guide |
| 28–30 | Close: three constraints + “Technology is a decision…” | |
| +10 | Optional Power BI live demo | powerbi/README.md |
If running PBI demo before discussion, cut Round 3 synthesis to 5 min.
Common classroom fixes
| Situation | Response |
|---|---|
| Pair stuck on Bronze ingest | Point to checkpoint data / co-trainer pairs in |
| “dbt replaces Snowflake” | Draw platform box; dbt inside as transform layer |
| Reflection runs long | 5-min timer; capture 3 bullets max on whiteboard |
| Running late in Module 2–4 | Cut discussion to 5 min — never cut lab |
| Vendor debate in Module 7 | “We’re advising Marcus, not picking a winner for MHP.” |
| Attendee cannot find notebooks in Databricks | Guide to Workspace icon → Shared → de-masterclass-2026 → notebooks; or use search bar at top of sidebar |
| Attendee’s Databricks cluster won’t start | Check Compute page for error message; try Restart; if stuck >5 min, assign a buddy cluster |
| Snowflake trial signup fails | Suggest using a different email; check spam folder for verification email; trial creation can take 5–10 min |
| Snowflake 00_account_setup.sql fails | Check attendee is using ACCOUNTADMIN role (default for trial); confirm SET attendee_id = '...' was run first |
| Snowflake DE_WORKSHOP_ROLE not found | The role is created by 00_account_setup.sql; re-run the script, or manually: CREATE ROLE DE_WORKSHOP_ROLE; |
| Snowflake External Stage cannot list files | SAS token may be expired or have extra spaces; re-copy from trainer handout; verify stage URL matches mhpdeworkshopsa.blob.core.windows.net |
| Snowflake warehouse shows “Suspended” | Click warehouse name at top-right → click Resume; wait ~10 seconds for Started status |
| Snowflake worksheet shows “No results” | Check session variable: SELECT $attendee_id; — if null, re-run the SET statement at the top |
| dbt debug fails with connection error | Check profiles.yml: confirm database: DE_MASTERCLASS, role: DE_WORKSHOP_ROLE, and correct Snowflake account/user/password |
| Databricks Experiments page is empty | The experiment appears after the first mlflow.start_run() call — run the training notebook first |
| Databricks “Cannot see catalog” error | Unity Catalog GRANT was not applied — re-run the GRANT statements from Pre-class setup; or grant from Catalog → Permissions UI |
| Databricks CLI 403 Forbidden | PAT may be expired or lack admin scope; generate a new token (Settings → User Settings → Access Tokens) and ensure the workspace has admin consent for CLI apps |
| Module 8: workshop-scope secrets scope missing | Trainer must create the scope before class (see Pre-class setup), or guide the attendee: databricks secrets create-scope --scope workshop-scope |
| ML.FORECAST returns error in Snowflake | Verify the GOLD_TRIPS_BY_HOUR table exists and has a PICKUP_HOUR_TS timestamp column; check Cortex role |
| Power BI cannot connect to Snowflake | Verify warehouse is Started; check server URL matches <account>.snowflakecomputing.com; use DirectQuery mode |
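Several rows above trace back to profiles.yml (wrong database, wrong role, empty credentials). A minimal sketch that checks an already-parsed profile for the values this runbook expects, so a co-trainer can spot the mismatch before reading a dbt stack trace. Assumptions: the file has been loaded into a dict (for example with PyYAML's `yaml.safe_load`), the target is named `snowflake` as in `dbt debug --target snowflake`, and the profile name `workshop` and the helper name are hypothetical:

```python
"""Sketch: check a parsed profiles.yml for the Snowflake settings dbt debug needs."""

EXPECTED = {"type": "snowflake", "database": "DE_MASTERCLASS", "role": "DE_WORKSHOP_ROLE"}
REQUIRED_NON_EMPTY = ("account", "user", "password")


def check_profile(profiles: dict, profile_name: str, target: str = "snowflake") -> list:
    """Return problems found in profiles[profile_name]['outputs'][target]."""
    try:
        output = profiles[profile_name]["outputs"][target]
    except KeyError:
        return [f"no outputs.{target} found under profile {profile_name!r}"]
    problems = []
    for key, want in EXPECTED.items():
        got = str(output.get(key, ""))
        # Snowflake folds unquoted identifiers to upper case, so compare case-insensitively
        if got.upper() != want.upper():
            problems.append(f"{key} is {got!r}, expected {want!r}")
    for key in REQUIRED_NON_EMPTY:
        # An env_var() template string counts as non-empty; only literal blanks are caught
        if not output.get(key):
            problems.append(f"{key} is missing or empty")
    return problems


if __name__ == "__main__":
    sample = {  # hypothetical parsed profiles.yml
        "workshop": {
            "target": "snowflake",
            "outputs": {
                "snowflake": {
                    "type": "snowflake",
                    "account": "<account>",
                    "user": "alice",
                    "password": "",
                    "role": "SYSADMIN",
                    "database": "DE_MASTERCLASS",
                }
            },
        }
    }
    for problem in check_profile(sample, "workshop"):
        print(problem)
```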
End-of-day close (2 min script)
One dataset. Three implementations. Priya’s dashboard didn’t care which engine built Gold.
Three constraints — cost, performance, compliance. Three decisions — platform, transform, consumption.
Look at your Story sketch. You weren’t wrong to guess. Now you’ve proved it in code.
Technology is a decision. Architecture is responsibility.
Optional: 1–5 finger poll — “I could defend my tool choice to a client.”
After class
| Task | Owner |
|---|---|
| Save Story + Module 7 whiteboard photos | Lead |
| Note timing overruns for next delivery | Both |
| Log environment issues (catalog, warehouse, dbt target) | Co-trainer |
| Share tool comparison deep dive link for self-study | Lead |
Full dry-run checklist: docs/dry-run-checklist.md · pre-class-checklist.qmd
Document history
| Date | Change |
|---|---|
| 2026-06-06 | Reordered pre-class setup: moved ADLS2 Storage before Databricks Workspace (Databricks depends on ADLS2 Storage Account Key) |
| 2026-06-05 | Updated per-module checklist for five-step rhythm; fixed aiven-kafka → workshop-scope secret scope |
| 2026-06-04 | Added pre-class infrastructure setup section (Databricks workspace, ADLS2, Aiven, Power BI); added Unity Catalog GRANT statements, cluster creation guide, Databricks CLI auth, SQL Warehouse setup |
| 2026-05-24 | Initial day-of facilitator runbook |