Facilitator Guide — Day-of Runbook

Facilitator-only — not shown to trainees during labs

YellowLine NYC masterclass · MHP Data Engineer Masterclass 2026

Audience: Lead trainer + co-trainer
Related: pre-class-checklist.qmd · module-delivery-pattern.qmd · whiteboard-prompts.qmd · reflection-prompts.qmd

Pre-class infrastructure setup (1–2 weeks before)

These tasks are done once by the trainer before the workshop day. They cannot be done on the morning of the class — plan ahead.

Shared Azure resource group

ADLS2 (mhpdeworkshopsa) and Databricks (mhpdeworkshop_databricks) are both pre-provisioned under the same resource group:

1000_data_engineering_workshop · subscription MHP Resort Consulting Services · tenant mhpdev.onmicrosoft.com

Databricks Workspace

Pre-provisioned by MHP — do not create a new workspace

The workshop Databricks workspace already exists in resource group 1000_data_engineering_workshop. Your job is to configure users, Git folders, Unity Catalog grants, and clusters — not to provision a new workspace.

Item	Value
Workspace name	`mhpdeworkshop_databricks`
Workspace ID	`3359135813781456`
Resource group	`1000_data_engineering_workshop`
Subscription	MHP Resort Consulting Services (`ba826c91-8e52-4e07-ac7c-538858bbc813`)
Azure tenant	`mhpdev.onmicrosoft.com`
Your role	Workspace admin (provisioned by MHP IT)
Unity Catalog	`mhpdeworkshop_databricks_2026` — confirm in Catalog explorer (name must match `CATALOG_NAME` in `databricks/notebooks/00_setup.py`)
Source repo	github.com/jinjuewei/MHPDataEngineerWorkshop
Notebook path in repo	`databricks/notebooks/` (`00_setup.py` … `04_ai_features.py`)

Find it in Azure Portal

Open resource group 1000_data_engineering_workshop.
Under Resources, click Azure Databricks / mhpdeworkshop_databricks.
Click Launch workspace (or copy the workspace URL from Overview).
Workspace URL looks like https://adb-<id>.<random>.azuredatabricks.net — also under Settings → Workspace settings in the Databricks UI.

Official references: Add users (Azure Databricks), Create Git folders, Unity Catalog get started.

This workshop has two trainers (lead + co-trainer). Each trainer creates their own Git folder in their own Home, runs 00_setup.py with a trainer-specific ATTENDEE_ID, then shares that folder with all trainees (Can Run). Trainees open the shared folder to run notebooks; they still use their own {attendee_id} in 00_setup.py (clone that notebook to Home first; see Databricks setup).

Trainer and trainee ATTENDEE_ID naming

Role	Pattern	Examples
Trainers	`00_{firstname}` (lowercase)	2026 delivery: `00_juewei`, `00_alisa` · other cohorts: `00_sam`, `00_taylor`
Trainees	`01_{name}`, `02_{name}`, … (lowercase)	`01_alice`, `02_bob`

Databricks / Unity Catalog — schemas are lowercase: 00_juewei_bronze, 00_juewei_silver, 00_juewei_gold
Snowflake — same ID stem, schemas uppercase: 00_JUEWEI_BRONZE, 00_JUEWEI_SILVER, 00_JUEWEI_GOLD (set in each trainer’s 00_account_setup.sql)
Agree on both trainer IDs before class; use the same IDs for Databricks demos, Snowflake dry-runs, and Power BI trainer Gold
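The naming convention above can be sketched as two small helpers. This is only an illustration: `attendee_id` and `schema_names` are hypothetical names, not functions from the workshop repo, and the sequence number for each person is assumed to be assigned by the trainers.

```python
def attendee_id(seq: int, first_name: str) -> str:
    """Build an ID such as 00_juewei (trainer, seq 0) or 01_alice (trainee)."""
    return f"{seq:02d}_{first_name.strip().lower()}"


def schema_names(attendee: str, platform: str = "databricks") -> list[str]:
    """Return the bronze/silver/gold schema names for one attendee ID."""
    names = [f"{attendee}_{layer}" for layer in ("bronze", "silver", "gold")]
    # Unity Catalog schemas are lowercase; Snowflake uses the same stem in uppercase.
    if platform == "snowflake":
        return [name.upper() for name in names]
    return [name.lower() for name in names]


print(schema_names(attendee_id(0, "Juewei")))
print(schema_names("01_alice", platform="snowflake"))
```

Generating the names from one ID keeps the Databricks and Snowflake schemas aligned when IDs are handed out on cards.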

Step 1 — Confirm workspace admin access

Open the workspace URL and sign in with your MHP account.
Click your username (top bar) → Settings.
Confirm you can open Admin settings (workspace admin) or Identity and access without errors.
Open Catalog → verify catalog mhpdeworkshop_databricks_2026 exists and you can browse it.

2026 catalog vs 2025

The workspace name stays mhpdeworkshop_databricks (shared Azure resource). Last year’s Unity Catalog is mhpdeworkshop_databricks_2025. For this cohort, create a new catalog mhpdeworkshop_databricks_2026 if it does not exist yet (Catalog → Create catalog), then apply the GRANTs in Step 5. Notebooks use CATALOG_NAME = "mhpdeworkshop_databricks_2026" in 00_setup.py — do not point trainees at the 2025 catalog.

Step 2 — Link Git credentials (trainer + attendees)

Required before creating a Git folder. See Set up Git credentials.

Username → Settings → Linked accounts → Git integration → Add credential.
Provider: GitHub.
Use a Personal Access Token (classic) with at least repo scope, or GitHub fine-grained access to the fork.
Tell attendees to complete the same step before Module 2 (documented in Databricks setup guide).

Step 3 — Each trainer creates a Git folder (lead + co-trainer)

Both trainers repeat this in their own Databricks Home (same steps, different ATTENDEE_ID). Use a Git folder.

Trainer	`ATTENDEE_ID` in `00_setup.py`	Schemas created (lowercase)
Lead trainer	`00_{firstname}` — 2026: `00_juewei`	e.g. `00_juewei_bronze`, `_silver`, `_gold`
Co-trainer	`00_{firstname}` — 2026: `00_alisa`	e.g. `00_alisa_bronze`, `_silver`, `_gold`

Replace {firstname} with each trainer’s lowercase first name (naming convention above). Other deliveries may pick any two 00_* IDs; keep them unique and agree on them before class.

Workspace → Home (/Users/<your-email>/).
⋮ → Create → Git folder.

Fill in:

Field	Value
Git repository URL	`https://github.com/jinjuewei/MHPDataEngineerWorkshop.git`
Git provider	GitHub
Git folder name	`MHP-DE-Workshop-2026`
Sparse checkout mode	✅ Enable
Cone patterns	`databricks/notebooks`

Optional before Modules 8–9: add cone patterns streaming/databricks and ml/databricks.

Create Git folder → wait for clone.
Open databricks/notebooks/00_setup.py → set ATTENDEE_ID to your trainer ID (e.g. 00_juewei or 00_alisa) → attach/start cluster → Run all.
Catalog → confirm your three schemas exist under mhpdeworkshop_databricks_2026.
Confirm 01_bronze_ingestion.py … 04_ai_features.py are listed under the Git folder.

Sparse checkout must be enabled at creation; you cannot disable sparse mode afterward. Cone patterns can be edited later: Git folder → Settings → Advanced → Cone patterns (Configure sparse checkout).

Step 3b — Sync Git folder with GitHub (Pull — not automatic)

A Git folder is a workspace checkout of the remote repo. Changes on GitHub do not appear in the workspace until someone Pulls.

flowchart LR
    GH["GitHub\njinjuewei/MHPDataEngineerWorkshop"]
    T1["Lead trainer Git folder"]
    T2["Co-trainer Git folder"]
    ST["Trainees\nshared Can Run"]
    GH -->|"git push (CI / developer)"| GH
    T1 -->|"Git → Pull (manual)"| GH
    T2 -->|"Git → Pull (manual)"| GH
    T1 --> ST
    T2 --> ST
    GH -.->|"No auto sync"| T1

Question	Answer
Does GitHub auto-update the workspace?	No — click Pull in the Git dialog, or automate via Repos API / CI/CD (Pull changes)
Who Pulls for the shared-folder model?	Both trainers only — trainees with Can Run cannot run Git operations (permissions)
When to Pull?	After any push to `main`; morning of class; after notebook/doc fixes land in GitHub
Trainee with own Git fork?	Trainee creates their own Git folder and Pulls there (collaborate in Git folders)
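If you prefer to automate the Pull instead of clicking through the Git dialog, the Repos API route mentioned in the table can be sketched roughly as follows. The host, token, and `repo_id` below are placeholders (assumptions for illustration); the numeric ID of your Git folder has to be looked up in your own workspace. The helper only prepares the request and does not send it.

```python
import json
import urllib.request


def build_pull_request(host: str, token: str, repo_id: int, branch: str = "main") -> urllib.request.Request:
    """Prepare (but do not send) a Repos API call that moves the Git folder to the latest commit of `branch`."""
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/repos/{repo_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only.
req = build_pull_request("https://adb-example.azuredatabricks.net", "<personal-access-token>", 123456)
print(req.get_method(), req.full_url)
# To actually pull, a trainer would run: urllib.request.urlopen(req)
```

Keeping the build step separate from the send step makes it easy to review the target URL and branch before anything changes in the workspace.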

Pull procedure (each trainer) — UI steps per Access the Git dialog and Pull changes:

Option A — from Workspace (recommended before class)

Left sidebar → Workspace.
Expand Users → your email → MHP-DE-Workshop-2026.
Beside the folder name, click Git (Git icon / Git link). A full-screen Git operations dialog opens.
At the top, confirm the branch dropdown shows main (or your workshop branch). If not, select main before pulling.
Click Pull (in the dialog toolbar — sync/download from remote).
Wait for the dialog to finish. Files under databricks/notebooks/ update to match GitHub.
Close the Git dialog (click outside or X).
Verify: open databricks/notebooks/00_setup.py → check PEP 723 header / recent edits match GitHub main.

Option B — from an open notebook

Open any notebook in the Git folder (e.g. 01_bronze_ingestion.py).
At the top of the notebook, next to the notebook title, click the branch name button (shows current branch, e.g. main).
The same Git operations dialog opens → click Pull → confirm branch main.
Close the dialog and re-open the notebook if cells look stale (Pull can clear notebook session state).

UI element	Where to find it
Git button	Workspace tree: beside `MHP-DE-Workshop-2026` folder name
Branch button	Top bar inside a notebook opened from the Git folder
Pull	Git operations dialog toolbar (downloads from remote — no commit message needed)
Commit & Push	Same dialog — only use if you intentionally changed files in the Git folder

If Pull fails or is disabled

Symptom	Action
Merge conflict after Pull	Git dialog offers Keep all current / Take all incoming or manual edit — see Resolve merge conflicts. For workshop notebooks, prefer incoming unless you have local edits to keep.
Pull grayed out / Git ops disabled	Workspace may need serverless compute for Git UI (Git CLI folders) — Git CLI compute requirements. Fallback: Repos API or ask workspace admin.
Uncommitted local changes in folder	Commit or discard before Pull; sparse-checkout folders block pattern changes while files have uncommitted edits.

Official notes from Databricks:

Pull is manual — “click Pull in the Git operations dialog” (source).
Pull clears notebook state — warn attendees if they have unsaved notebook session state before you Pull mid-class.
One Git operator per folder — Databricks recommends only one user performs Git ops per folder; trainees use Can Run on the shared copy (collaborate).
Git UI + serverless — if Git → Pull is disabled, the workspace may need serverless compute (required for Git CLI-enabled folders) — see Git CLI compute requirements.

Never Push secrets from the workspace — STORAGE_ACCOUNT_KEY belongs only in each attendee’s Home clone of 00_setup.py, not in the shared Git folder commit. Source .py notebook outputs are not committed by default (commit and push).

Step 4 — Create workshop groups and invite attendees

Create two groups for the 2026 cohort:

Group	Members	Workspace access level
`workshop_trainer_2026`	Lead + co-trainer	User (trainers are also workspace admins — see Step 1)
`workshop_trainees_2026`	All attendees	User (not Admin)

Settings → Admin settings → Identity and access → Groups → Add group — create both groups.
Add trainer emails to workshop_trainer_2026; add each attendee to workshop_trainees_2026 (or bulk-import).
Users → Add user for anyone not yet in the workspace. Each invitee receives an activation email — confirm all accepted before class.

Entitlements — open each group → Entitlements tab:

Group	Enable
`workshop_trainees_2026`	Workspace access · Databricks SQL (Module 6 Genie + SQL Editor)
`workshop_trainer_2026`	Workspace access · Databricks SQL (trainer dry-runs)

Trainees do not need Allow unrestricted cluster creation — the Workshop cluster policy (Step 6) grants Can use, which is enough to self-create clusters within policy limits.

See Manage users · Manage entitlements.

Step 5 — Unity Catalog permissions (trainees)

Run in SQL Editor (or from 00_setup.py as admin). Grants must exist before attendees run 00_setup.py (CREATE SCHEMA).

Option A — grant to trainee group (recommended):

-- Catalog-level (once, before class) — trainees only
GRANT USE CATALOG ON CATALOG mhpdeworkshop_databricks_2026 TO `workshop_trainees_2026`;
GRANT CREATE SCHEMA ON CATALOG mhpdeworkshop_databricks_2026 TO `workshop_trainees_2026`;

Option B — grant per attendee (if no group):

GRANT USE CATALOG ON CATALOG mhpdeworkshop_databricks_2026 TO `attendee@example.com`;
GRANT CREATE SCHEMA ON CATALOG mhpdeworkshop_databricks_2026 TO `attendee@example.com`;

After each attendee runs 00_setup.py, grant schema access (schemas are lowercase):

GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_bronze TO `workshop_trainees_2026`;
GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_silver TO `workshop_trainees_2026`;
GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_gold TO `workshop_trainees_2026`;
GRANT SELECT ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_gold TO `workshop_trainees_2026`;

Replace {attendee_id} with e.g. 01_alice. The SELECT on _gold is required before Module 6 (ai_query(), Genie). Alternatively use Catalog → catalog → Permissions UI.
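With a full class, typing four GRANT statements per attendee is error-prone. A minimal sketch that prints the statements above for a roster; the function name `schema_grants` and the roster list are illustrative assumptions, and the output is meant to be pasted into the SQL Editor.

```python
CATALOG = "mhpdeworkshop_databricks_2026"
TRAINEE_GROUP = "workshop_trainees_2026"


def schema_grants(attendee_id: str, catalog: str = CATALOG, group: str = TRAINEE_GROUP) -> list[str]:
    """Return the four schema-level GRANT statements for one attendee."""
    stem = attendee_id.lower()  # Unity Catalog schemas are lowercase
    statements = [
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{stem}_{layer} TO `{group}`;"
        for layer in ("bronze", "silver", "gold")
    ]
    # SELECT on Gold is what Module 6 (ai_query(), Genie) depends on.
    statements.append(f"GRANT SELECT ON SCHEMA {catalog}.{stem}_gold TO `{group}`;")
    return statements


# Hypothetical roster; replace with the IDs handed out in class.
for attendee in ["01_alice", "02_bob"]:
    print("\n".join(schema_grants(attendee)))
```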

Trainers (workshop_trainer_2026) use workspace-admin privileges for demos — no separate UC group grants needed unless you prefer explicit grants.

Step 6 — Cluster policy (recommended)

Limit cost and standardise specs. See Cluster policies.

Compute → Policies → Create policy.
Name: Workshop.
Restrict: max workers 1, allowed node types include Standard_DS3_v2, auto-termination 30 min.
Assign policy Can use to workshop_trainees_2026 and workshop_trainer_2026.
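For reference, the restrictions above might translate into a policy definition along these lines. This is a sketch using the standard cluster policy attribute types (`range`, `allowlist`, `fixed`); the exact JSON you paste into the policy editor may need more attributes, so treat it as a starting point rather than the workshop's actual policy.

```python
import json

# Assumed mapping of the runbook's limits onto policy attributes.
workshop_policy = {
    "num_workers": {"type": "range", "maxValue": 1},
    "node_type_id": {
        "type": "allowlist",
        "values": ["Standard_DS3_v2"],
        "defaultValue": "Standard_DS3_v2",
    },
    "autotermination_minutes": {"type": "fixed", "value": 30},
}

policy_json = json.dumps(workshop_policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the policy's definition field and adjusted there.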

Step 7 — Compute for attendees

Option A — attendees create their own cluster (less prep, more day-of support):

Setting	Value
Policy	`Workshop`
Runtime	15.4 LTS (Modules 2–6) · 15.4 LTS ML for optional Module 9
Node type	`Standard_DS3_v2`
Workers	1
Auto-terminate	30 minutes

Trainee self-create cluster (Option A — brief for class)

Regular users in workshop_trainees_2026 can create a cluster when the Workshop policy has Can use (Step 6). Trainees do not need workspace admin.

Compute → Create compute → Cluster.
Policy: select Workshop (required — limits size and cost).
Databricks runtime: 15.4 LTS (or latest LTS).
Node type: Standard_DS3_v2 · Workers: 1 · name e.g. de-workshop-01_alice.
Create → wait until Running (green).
Open a notebook → compute dropdown (top bar) → attach this cluster → Run all on your Home copy of 00_setup.py first.

Pre-class verify: sign in as a test trainee (or co-trainer in workshop_trainees_2026) and confirm Create compute shows policy Workshop and the cluster starts.

Option B — pre-create one cluster per attendee (faster Module 2 start):

Setting	Value
Cluster name	`de-workshop-{ATTENDEE_ID}` (e.g. `de-workshop-01_alice`)
Policy	`Workshop`
Runtime	15.4 LTS (ML for Module 9 optional labs)
Node type	`Standard_DS3_v2`
Workers	1
Auto-terminate	30 minutes

Create each cluster, then terminate it so it stays in Terminated state until class. Share each cluster: Compute → cluster → Permissions → add attendee (or workshop_trainees_2026) with Can restart.
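If you pre-create clusters with a script rather than through the UI, the settings in the Option B table could be expressed as one spec per attendee. A rough sketch: `cluster_spec` is a hypothetical helper, the `policy_id` is a placeholder you would copy from the Workshop policy, and the runtime version strings are assumptions to verify against the runtimes listed in your workspace.

```python
def cluster_spec(attendee_id: str, policy_id: str, ml: bool = False) -> dict:
    """Return a cluster definition matching the Option B table for one attendee."""
    return {
        "cluster_name": f"de-workshop-{attendee_id}",
        "policy_id": policy_id,
        # Assumed version keys for 15.4 LTS and 15.4 LTS ML; confirm before use.
        "spark_version": "15.4.x-cpu-ml-scala2.12" if ml else "15.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
        "autotermination_minutes": 30,
    }


# Hypothetical roster and policy ID for illustration.
for attendee in ["01_alice", "02_bob"]:
    print(cluster_spec(attendee, policy_id="<workshop-policy-id>"))
```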

Step 8 — Share Git folder with trainees (required — both trainers)

After each trainer finishes Step 3 (Git folder created + 00_setup.py dry-run), share the folder before class:

Workspace → Home → MHP-DE-Workshop-2026.
Click Share (or ⋮ → Share).
Add principals:
- Group workshop_trainees_2026 → Can Run (recommended — all trainees)
- Or add individual trainee emails → Can Run
Click Save.
Verify: co-trainer or a test user opens Workspace → Users → <your-email> → MHP-DE-Workshop-2026 → opens 01_bronze_ingestion.py without error.

Permission	Trainee can	Trainee cannot
Can Run	Open and run notebooks	Edit or delete files in your folder

Trainee path to notebooks:

Workspace → Users → <trainer-email> → MHP-DE-Workshop-2026 → databricks/notebooks

Both trainers share their folders — trainees may use either trainer’s copy (same repo content). Tell the class which trainer to contact for lab support (optional split by row/seat).

00_setup.py and trainee IDs: shared folder is read-only for trainees. They must clone 00_setup.py to their own Home to set their {attendee_id} (e.g. 01_alice), then run it once before 01–04. Steps in Databricks setup.

Fallback: trainee Git folder from their fork (Option A) or manual import (Option B) if sharing is blocked.

Step 9 — What trainees do in Module 2

Brief attendees on:

Task	Where
Accept workspace invite	Email link
Open trainer notebooks	Workspace → Users → <trainer-email> → `MHP-DE-Workshop-2026`
Clone `00_setup.py` to Home	⋮ → Clone to [your Home] — then set your `ATTENDEE_ID` (e.g. `01_alice`)
Run `00_setup.py`	Your Home copy — creates your `{id}_bronze/silver/gold` schemas
Run `01`–`04`	Trainer shared folder (or your clone)
Paste ADLS2 key (verbal from trainer)	Your `00_setup.py` only — never commit
Create/start cluster	Compute → Create compute → policy `Workshop` → Start (see Step 7)

Step 10 — Databricks SQL Warehouse (Module 6 AI + optional dbt target)

Required for Genie and ai_query() in Module 6 — even when dbt targets Snowflake only.

SQL Editor → SQL Warehouses → Create SQL Warehouse
Type: Pro or Serverless (not Classic — ai_query() is unsupported on Classic)
Name: de-workshop-wh, Size: 2X-Small, Auto-suspend: 5 minutes
Permissions → add workshop_trainees_2026 and workshop_trainer_2026 → Can use
Trainer dry-run: SQL Editor → attach de-workshop-wh → SELECT 1

Official reference: Create a SQL warehouse

Step 10b — Databricks AI features (Module 6 — account + workspace admin)

Module 6 uses Genie Code (notebook/SQL AI assistant — formerly Databricks Assistant), ai_query() (SQL), and Genie Spaces (natural language over Gold). Configure once before class.

Account admin (Account console → Settings → Feature enablement):

Setting	Value	Why
Enable partner-powered AI features	On	Powers Genie Code, Genie Spaces, and related assistive features (Azure OpenAI / Anthropic on Databricks)
Enforce data processing within workspace Geography	Review if AI features fail to enable	Workspaces outside US/EU (e.g. Germany West Central) may need cross-geo processing disabled — see Partner-powered AI features

Workspace admin (username → Settings → Workspace admin → Advanced):

Setting	Value
Partner-powered AI features	On (unless account enforces Off)

No separate “AI-powered assistive features” toggle? That is normal in current Azure Databricks UI. Microsoft consolidated admin control under Partner-powered AI features; Genie Code and other assistive features are enabled when partner-powered AI is On (or use Databricks-hosted models when it is Off in supported regions). Do not block Module 6 prep looking for a second toggle — run the functional checks below instead.

Verify AI is activated (workspace admin — 5 min)

Check	Pass criteria
Workspace Advanced	Partner-powered AI features = On
Genie Code	Open a notebook on a cluster → `Ctrl+I` / `Cmd+I` (or Genie Code icon) → prompt returns a suggestion
Genie Spaces	Sidebar Genie → New → add a Gold table → question returns SQL or an answer
`ai_query()`	SQL Editor on `de-workshop-wh` → one-row test query succeeds (verify model name)

User entitlements — configured in Step 4 (workshop_trainees_2026: Workspace access + Databricks SQL).

Unity Catalog — GRANT SELECT ON SCHEMA …_gold is in Step 5 (run after each 00_setup.py). Required before Module 6 ai_query() and Genie.

Trainer pre-class dry-run (after Gold tables exist from 03_gold_kpis.py):

Genie Code — open 04_ai_features.py on a cluster → Ctrl+I / Cmd+I → prompt “Show top 5 pickup zones by revenue”
ai_query() — SQL Editor → warehouse de-workshop-wh → run a one-row test (verify model name in workspace first):
```
SELECT ai_query(
  'databricks-meta-llama-3-3-70b-instruct',
  'Reply with exactly: OK'
) AS test;
```
Genie — sidebar Genie → New → add your trainer Gold tables (e.g. 00_juewei_gold.kpi_*) → set default warehouse de-workshop-wh → ask “What hour has the most taxi trips?”

Model IDs drift — list available foundation models in the workspace before class. Do not hardcode names from last year’s delivery.

Official references: AI assistive features · Genie setup · Genie Code · ai_query

Step 11 — Databricks CLI authentication (Module 8 Aiven Secrets)

The trainer must either create the workshop-scope secret scope before class or guide attendees through creating it during Module 8. Two options:

Option A — Trainer creates scope via CLI (recommended):

```bash
# Authenticate CLI to workspace
databricks configure --token
# Prompt: Databricks Host: https://.azuredatabricks.net
# Prompt: Token: <generate PAT from Settings → User Settings → Access Tokens>

# Create scope (once)
databricks secrets create-scope --scope workshop-scope

# Add secrets (values from Aiven Console)
databricks secrets put --scope workshop-scope --key aiven-bootstrap-servers
databricks secrets put --scope workshop-scope --key aiven-ca-cert
databricks secrets put --scope workshop-scope --key aiven-client-cert
databricks secrets put --scope workshop-scope --key aiven-client-key
databricks secrets put --scope workshop-scope --key aiven-topic

# Allow trainees to read secrets in notebooks (Module 8)
databricks secrets put-acl --scope workshop-scope --principal workshop_trainees_2026 --permission READ
databricks secrets put-acl --scope workshop-scope --principal workshop_trainer_2026 --permission READ
```

Option B — Attendees create their own scope during Module 8 (requires each attendee to have a PAT and Databricks CLI installed).

Verify: as a test trainee, databricks secrets list --scope workshop-scope lists five keys (trainees need READ ACL — not scope admin).
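The five-key check can also be done mechanically once you have the key names from the list command. A small sketch, assuming you have copied the listed key names into a Python list by hand; `missing_secret_keys` is an illustrative helper, not part of the workshop repo.

```python
EXPECTED_KEYS = {
    "aiven-bootstrap-servers",
    "aiven-ca-cert",
    "aiven-client-cert",
    "aiven-client-key",
    "aiven-topic",
}


def missing_secret_keys(listed_keys: list[str]) -> list[str]:
    """Return the expected workshop-scope keys that are absent from the listing."""
    return sorted(EXPECTED_KEYS - set(listed_keys))


# Example: a listing where the topic key was forgotten.
print(missing_secret_keys([
    "aiven-bootstrap-servers",
    "aiven-ca-cert",
    "aiven-client-cert",
    "aiven-client-key",
]))
```

An empty result means all five keys are present for that principal.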

ADLS2 Storage (`mhpdeworkshopsa`)

Pre-provisioned by MHP — do not create a new account

The workshop storage account already exists. Your job is to upload TLC data, rotate keys, and create SAS tokens — not to provision Azure storage.

Item	Value
Storage account	`mhpdeworkshopsa`
Resource group	`1000_data_engineering_workshop`
Location	Germany West Central (`germanywestcentral`)
Subscription	MHP Resort Consulting Services
Subscription ID	`ba826c91-8e52-4e07-ac7c-538858bbc813`
Azure tenant	`mhpdev.onmicrosoft.com`
Container	`nyc-taxi-data` (should already exist)

Find it in Azure Portal

Sign in to Azure Portal with your MHP account (mhpdev.onmicrosoft.com).
Open the resource group directly: 1000_data_engineering_workshop.
Under Resources, click storage account mhpdeworkshopsa.
Confirm Location shows Germany West Central on the Overview blade.

Shortcut: search mhpdeworkshopsa in the portal top bar if you are already in the correct subscription.

Same resource group also contains Databricks workspace mhpdeworkshop_databricks — see Databricks Workspace above.

Shared storage for all attendees. Two credentials — do not mix them up:

Credential	Used by	Module	Never commit to Git
Storage account key (key1)	Databricks `00_setup.py`	2	Distribute verbally
SAS token (query string)	Snowflake `02_external_stage.sql`	3	Printed card / slide

Data layout (container nyc-taxi-data):

Path	Content
`raw/trips/`	Parquet trip files (`yellow_tripdata_YYYY-MM.parquet`)
`raw/lookup/taxi_zone_lookup.csv`	Zone lookup CSV (265 zones)
`streaming/user-activity/`	Module 8 relay output (optional)

Official reference: Grant limited access with SAS.

0 — Download from TLC and upload to ADLS2

Workshop pipelines read raw files from ADLS2 — they do not download from the internet at runtime. The trainer (or MHP ops) must download once from NYC TLC and upload to mhpdeworkshopsa before rotating keys or distributing SAS tokens.

Source: NYC TLC Trip Record Data — Yellow Taxi Trip Records, Parquet format.

File	TLC download (direct)	ADLS2 destination
Trip data	`https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-10.parquet` (~61 MB)	`nyc-taxi-data/raw/trips/yellow_tripdata_2024-10.parquet`
Zone lookup	`https://d37ci6vzurychx.cloudfront.net/misc/taxi_zone_lookup.csv` (~12 KB)	`nyc-taxi-data/raw/lookup/taxi_zone_lookup.csv`

Workshop default month: October 2024 (dbt_project/dbt_project.yml sets data_year: 2024, data_month: 10). One month (~3M trips) is enough for all labs. Optional: add yellow_tripdata_2024-09.parquet and yellow_tripdata_2024-11.parquet for richer time-series KPIs — keep original TLC filenames.
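Because the TLC filenames follow a fixed pattern, the download URL and ADLS2 destination for any month can be derived from the year and month. A minimal sketch based on the URLs in the table above; `trip_file` is a hypothetical helper name.

```python
TLC_BASE = "https://d37ci6vzurychx.cloudfront.net"
CONTAINER = "nyc-taxi-data"

LOOKUP_URL = f"{TLC_BASE}/misc/taxi_zone_lookup.csv"
LOOKUP_DEST = f"{CONTAINER}/raw/lookup/taxi_zone_lookup.csv"


def trip_file(year: int, month: int) -> tuple[str, str]:
    """Return (TLC download URL, ADLS2 destination) for one month of Yellow Taxi data."""
    name = f"yellow_tripdata_{year:04d}-{month:02d}.parquet"  # keep the original TLC filename
    return f"{TLC_BASE}/trip-data/{name}", f"{CONTAINER}/raw/trips/{name}"


# Workshop default month plus the two optional neighbours.
for month in (9, 10, 11):
    url, dest = trip_file(2024, month)
    print(url, "->", dest)
```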

Step 1 — Download locally

Open the TLC trip record page.
Under Yellow Taxi Trip Records, choose Parquet (not CSV) for the trip file.
Download October 2024 Parquet (link above or TLC table row yellow_tripdata_2024-10.parquet).
Download Taxi Zone Lookup Table CSV (taxi_zone_lookup.csv — link above or TLC Auxiliary data section).
Confirm locally: trip file is .parquet; lookup is .csv with header LocationID,Borough,Zone,service_zone.
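The local confirmation in the last step can be scripted without extra packages. This sketch relies on the fact that Parquet files begin and end with the 4-byte marker `PAR1`, and on the lookup header stated above; the function names and local paths are illustrative assumptions.

```python
import csv
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"
EXPECTED_HEADER = ["LocationID", "Borough", "Zone", "service_zone"]


def looks_like_parquet(path: Path) -> bool:
    """Check the PAR1 marker at both ends of the file (a cheap format check, not full validation)."""
    if path.stat().st_size < 8:
        return False
    with path.open("rb") as fh:
        head = fh.read(4)
        fh.seek(-4, os.SEEK_END)
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def lookup_header_ok(path: Path) -> bool:
    """Check that the first CSV row matches the expected zone lookup header."""
    with path.open(newline="", encoding="utf-8-sig") as fh:
        header = next(csv.reader(fh), [])
    return header == EXPECTED_HEADER


# Assumed download locations; adjust to wherever the files were saved.
trips = Path("yellow_tripdata_2024-10.parquet")
lookup = Path("taxi_zone_lookup.csv")
if trips.exists():
    print("trip file looks like Parquet:", looks_like_parquet(trips))
if lookup.exists():
    print("lookup header OK:", lookup_header_ok(lookup))
```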

Step 2 — Upload to ADLS2

Use any method below. Create folders raw/trips/ and raw/lookup/ if they do not exist.

Option A — Azure Portal (no extra tools)

Azure Portal → Storage accounts → mhpdeworkshopsa.
Data storage → Containers → nyc-taxi-data.
Open or create raw/trips/ → Upload → select yellow_tripdata_2024-10.parquet.
Open or create raw/lookup/ → Upload → select taxi_zone_lookup.csv (exact filename — Snowflake stage and Databricks LOOKUP_DATA_PATH expect this name).

Option B — Azure Storage Explorer

Install Azure Storage Explorer.
Connect with your MHP Azure account → mhpdeworkshopsa → nyc-taxi-data.
Drag Parquet into raw/trips/ and CSV into raw/lookup/.

Option C — Azure CLI (trainer workstation with az logged in)

# Set variables — use key1 from Portal → Access keys (trainer only; never commit)
ACCOUNT=mhpdeworkshopsa
KEY="<storage-account-key1>"
CONTAINER=nyc-taxi-data

az storage blob upload \
  --account-name "$ACCOUNT" --account-key "$KEY" \
  --container-name "$CONTAINER" \
  --file ./yellow_tripdata_2024-10.parquet \
  --name raw/trips/yellow_tripdata_2024-10.parquet \
  --overwrite

az storage blob upload \
  --account-name "$ACCOUNT" --account-key "$KEY" \
  --container-name "$CONTAINER" \
  --file ./taxi_zone_lookup.csv \
  --name raw/lookup/taxi_zone_lookup.csv \
  --overwrite

Step 3 — Sanity-check before key/SAS distribution

Check	Expected
`raw/trips/`	At least one `yellow_tripdata_*.parquet` visible
`raw/lookup/taxi_zone_lookup.csv`	File present; ~265 data rows (+ header)
Bronze ingest (after key)	`spark.read.parquet(TRIPS_DATA_PATH).count()` ≈ 3M for Oct 2024
Lookup ingest	Zone count 265 in `01_bronze_ingestion`

Do this upload before sections A–C below (key rotation and SAS creation).

A — Regenerate storage account key (Databricks)

Use a fresh key before each workshop cohort.

Sign in to Azure Portal.
Search Storage accounts → open mhpdeworkshopsa.
Left menu → Security + networking → Access keys.
Under key1, click Rotate key (or Regenerate — regenerates key1 and invalidates the old one).
Click Show next to key1 → copy the key value.
Store in your password manager — distribute to class verbally during Module 2 only.

Databricks notebooks use STORAGE_ACCOUNT_KEY with the abfss:// path to nyc-taxi-data.
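For orientation, an abfss:// path and the matching Spark configuration key for account-key access are typically formed as below. This is a sketch of the general ADLS Gen2 pattern, not a copy of 00_setup.py, which may build these values differently; the helper names are hypothetical.

```python
ACCOUNT = "mhpdeworkshopsa"
CONTAINER = "nyc-taxi-data"


def abfss_path(relative: str, account: str = ACCOUNT, container: str = CONTAINER) -> str:
    """Build the abfss:// URI for a path inside the workshop container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative.lstrip('/')}"


def account_key_conf(account: str = ACCOUNT) -> str:
    """Return the Spark config key that holds the storage account key."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


print(abfss_path("raw/trips/"))
print(account_key_conf())
# In a notebook (not runnable outside Databricks):
# spark.conf.set(account_key_conf(), STORAGE_ACCOUNT_KEY)
```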

B — Create SAS token in Azure Portal (Snowflake)

Snowflake external stages need Read + List on blobs in nyc-taxi-data. Microsoft documents SAS creation in the portal here: Create SAS tokens (Azure portal).

Recommended: container-scoped SAS (least privilege — only nyc-taxi-data):

Azure Portal → storage account mhpdeworkshopsa.
Left menu → Data storage → Containers → click nyc-taxi-data.
Top menu → Generate SAS (or ⋯ → Generate SAS).

Set fields:

Field	Workshop value
Signing method	Account key (Snowflake `AZURE_SAS_TOKEN` expects key-signed SAS on trial accounts)
Permissions	✅ Read, ✅ List only — leave Write / Delete / Add unchecked
Start	Today (or workshop morning)
Expiry	Workshop date + 2 days buffer
Allowed IP addresses	(leave empty for classroom)
Allowed protocols	HTTPS only
Signing key	key1

Click Generate SAS token and URL.
Copy only the SAS token field (query string like sp=rl&st=...&se=...&spr=https&sv=2024-11-04&sr=c&sig=...).
- For Snowflake CREDENTIALS = (AZURE_SAS_TOKEN = '...'), paste the token without a leading ?.
- The portal shows the token once — save it immediately; you cannot retrieve it later.
Optional: copy Blob SAS URL to test in a browser or Azure Storage Explorer.
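Before printing the token on cards, it can help to check it against the settings above: no leading `?`, no stray whitespace, Read + List only, HTTPS only, and an expiry after the workshop. A minimal sketch using only the standard library; `check_sas_token` is an illustrative helper, the sample token is made up, and the expiry is assumed to be in the `YYYY-MM-DDTHH:MM:SSZ` form.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def check_sas_token(token: str, now: datetime) -> list[str]:
    """Return a list of problems found in a SAS token; an empty list means it looks usable."""
    problems = []
    if token != token.strip():
        problems.append("token has leading or trailing whitespace")
    cleaned = token.strip()
    if cleaned.startswith("?"):
        problems.append("remove the leading '?' before pasting into Snowflake")
        cleaned = cleaned[1:]

    fields = parse_qs(cleaned)
    permissions = fields.get("sp", [""])[0]
    if set(permissions) != {"r", "l"}:
        problems.append(f"permissions are '{permissions}', expected Read + List only")
    if fields.get("spr", [""])[0] != "https":
        problems.append("allowed protocols should be HTTPS only")

    expiry_text = fields.get("se", [""])[0]
    try:
        expiry = datetime.strptime(expiry_text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    except ValueError:
        problems.append("expiry (se) is missing or not in YYYY-MM-DDTHH:MM:SSZ form")
    else:
        if expiry <= now:
            problems.append(f"token expired at {expiry_text}")
    return problems


# Made-up token for illustration; the signature is not real.
sample = "sp=rl&st=2026-01-01T08:00:00Z&se=2026-01-03T08:00:00Z&spr=https&sv=2024-11-04&sr=c&sig=not-a-real-signature"
print(check_sas_token(sample, now=datetime(2026, 1, 2, tzinfo=timezone.utc)))
```

Passing the workshop date as `now` shows whether the token will still be valid on the day, not just today.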

Alternative: account-level SAS (broader scope — use only if container SAS is unavailable):

Storage account mhpdeworkshopsa → Security + networking → Shared access signature.

Configure:

Field	Workshop value
Allowed services	Blob only
Allowed resource types	Container + Object
Allowed permissions	Read + List
Start / Expiry	Workshop date → +2 days
Allowed protocols	HTTPS only
Signing key	key1

Generate SAS and connection string → copy the SAS token query string (same rules as above).

C — Verify before class

Azure Portal / Storage Explorer

Containers → nyc-taxi-data → raw/trips/ — Parquet files visible.
Confirm raw/lookup/taxi_zone_lookup.csv exists.

Snowflake (trainer account)

Run after pasting SAS into snowflake/setup/02_external_stage.sql:

-- Replace 00_JUEWEI with your trainer ID (uppercase in Snowflake)
LIST @00_JUEWEI_BRONZE.nyc_taxi_trips_stage;
LIST @00_JUEWEI_BRONZE.nyc_taxi_lookup_stage;

Both commands must return file names (not an Access denied error or an empty result).

Databricks (trainer dry-run)

After 00_setup.py with ADLS2 key:

# In a notebook cell — should print a row count, not auth error
spark.read.parquet(TRIPS_DATA_PATH).count()

D — Distribute to attendees

Item	When	Format
Storage account key	Module 2	Verbal only
SAS token	Module 3	Printed card — warn about expiry date

Common failures: expired SAS, extra spaces when copy-pasting token, using account key in Snowflake stage SQL, or regenerating key1 after Databricks setup without telling the class.

Aiven Kafka (Module 8 only)

Create Aiven Kafka cluster — see aiven-streaming-setup.qmd
Start User Activity generator (4 hours max on free tier — start on the morning of Module 8, not before)
Download SSL certificates: ca.pem, service.cert, service.key
Note the Service URI: kafka-xxxxx.aivencloud.com:12345
Start the relay consumer (streaming/snowflake/00_relay_consumer.py) — requires ADLS2 key

Power BI (Module 7 demo + optional trainee self-paced)

Trainer — build the dashboard (Desktop, once before class)

Use a Windows machine with Power BI Desktop (free to install)
Follow Exercise: Power BI or powerbi/README.md — connect to trainer Gold: Snowflake {trainer_id}_GOLD or Databricks {trainer_id}_gold
Connect via Snowflake or Azure Databricks — load all 12 kpi_* tables · choose Import for the main workshop demo
Build all five pages: Overview, Map, Time Analysis, Revenue, Efficiency — see Module 7 §3.1
File → Save As → YellowLine-NYC-KPIs.pbix (keep a local copy for offline demo fallback)

Trainer — publish to cloud workspace (if you have Power BI Pro / Fabric)

Reports are authored in Desktop; your cloud workspace hosts, refreshes, and shares them. You cannot realistically build this five-page Snowflake/Databricks dashboard from scratch in the browser alone.

Step	Where	Action
1	Desktop	Sign in with your work Microsoft account (same tenant as the workspace)
2	Desktop	Home → Publish → select your workshop workspace (not My workspace unless you have no shared workspace)
3	Service	Open Power BI → workspace → confirm report + semantic model appear
4	Service	Refresh now on the dataset — Snowflake warehouse `DE_WORKSHOP_WH` must be Started
5	Service	Open each of the five report pages — especially Map (Azure Maps geocoding needs network)
6	Service (optional)	Settings → Scheduled refresh — daily refresh before class if using Import

Sharing and licensing

Your setup	What trainees need to view your published report
Workspace on Premium / Fabric capacity	Often no Pro — share workspace or report link (viewer)
Workspace without Premium capacity	Viewers typically need Power BI Pro (or you screen-share only)
No org workspace	Demo from Desktop screen share — still works; no attendee license needed

Co-trainer access: add them as Member or Contributor on the workspace so they can open the report before Module 7.

Module 7 demo (optional, ~10 min) — pick one path:

Path	When to use
A — Service (browser)	You published to a cloud workspace; maps and refresh tested in `app.powerbi.com`
B — Desktop (local)	Fallback if Service refresh fails, or you have no shared workspace
C — Skip live demo	Point trainees to Exercise: Power BI; animation already showed the dashboard

Trainees — self-paced after Module 4 (optional)

Not part of main-day timing — no classroom block required
Prerequisites: Gold KPI tables from Modules 2–4; Windows + free Desktop only
Point trainees to Power BI setup and Exercise: Power BI after dbt lab
macOS/Linux attendees: Power BI Desktop is Windows-only, so follow the exercise read-only or defer it to a Windows machine after the workshop

Say once after Module 4:

“Priya’s dashboard is optional self-paced work — if you have Windows, install free Power BI Desktop and connect to the same Gold tables you just built. Full steps are in Setup → Power BI.”

Before the room opens (T-30 min)

Task	Owner	Done
Test projector / second screen for animations	Lead	[ ]
Open story site: https://mhp-data-engineer-2026.pages.dev/ (mirror: Vercel; fallback `quarto preview` port 4201)	Co-trainer	[ ]
Verify Databricks workspace `mhpdeworkshop_databricks` — Git folder `MHP-DE-Workshop-2026`, catalog grants, clusters Terminated (see Pre-class setup)	Co-trainer	[ ]
Verify Snowflake on trainer’s own trial account (students create theirs during class)	Co-trainer	[ ]
Prepare credentials to distribute: SAS token, ADLS2 storage key, attendee IDs	Co-trainer	[ ]
Print architecture decision matrix (1 per trainee)	Co-trainer	[ ]
Open Google Form URLs — QR / short links ready per module	Co-trainer	[ ]
Reserve blank whiteboard space for the Story sketch (photograph it once drawn)	Lead	[ ]
Power BI: published report opens in cloud workspace or local `.pbix` on Desktop (Module 7 demo)	Co-trainer	[ ]
`.env` / Codespaces tested on one machine	Co-trainer	[ ]

Credentials & Materials to Distribute

Each attendee needs the following credentials and materials during the workshop. Prepare these before class and distribute at the appropriate module.

Item	When to distribute	Format	Notes
ATTENDEE_ID	Start of day (Module 1)	Printed card or slide	e.g., `01_alice`, `02_bob` — used in every schema/table name
Databricks workspace URL	Module 2	Invite link via email	Trainer sends workspace invite to each attendee’s email before class
Trainer notebook paths	Module 2	Slide or printed	Workspace → Users → <lead or co-trainer email> → `MHP-DE-Workshop-2026`; both trainers share their folder with Can Run
ADLS2 Storage Account Key	Module 2	Verbal or printed	Used by Databricks `00_setup.py` to read Parquet from ADLS2. Never commit to Git.
SAS Token	Module 3	Printed card or slide	Used by Snowflake `02_external_stage.sql` to create External Stage. Has an expiry date — generate fresh before each workshop.
Databricks Personal Access Token	Only if using dbt with Databricks target	Self-service	Attendee generates their own via Settings → Access Tokens. Not needed if dbt only targets Snowflake.
Snowflake account	Self-service (before or during Module 3)	Attendee creates own trial at signup.snowflake.com	Attendee is ACCOUNTADMIN on their own account; `00_account_setup.sql` creates `DE_WORKSHOP_ROLE`
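
Because the SAS token carries its own expiry date, it can be checked before it is printed on handouts. The following is a minimal sketch, assuming the token is a standard Azure SAS query string whose `se` field holds the expiry as an ISO 8601 UTC timestamp. The function names and the example token are hypothetical and are not part of the workshop repo.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def sas_expiry(sas_token: str) -> datetime:
    """Return the expiry time stored in the token's `se` (signed expiry) field."""
    # Tolerate stray whitespace and a leading "?" left over from copy-paste.
    query = sas_token.strip().lstrip("?")
    fields = parse_qs(query)
    if "se" not in fields:
        raise ValueError("no 'se' (expiry) field found in SAS token")
    expiry = datetime.fromisoformat(fields["se"][0].replace("Z", "+00:00"))
    # A timestamp without an offset is treated as UTC.
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def covers_workshop(sas_token: str, workshop_end: datetime) -> bool:
    """True if the token is still valid at the end of the workshop day."""
    return sas_expiry(sas_token) > workshop_end


# Made-up token for illustration only; the signature is a placeholder.
example = "sv=2022-11-02&ss=b&srt=co&sp=rl&se=2026-06-20T18:00:00Z&sig=PLACEHOLDER"
workshop_end = datetime(2026, 6, 19, 17, 0, tzinfo=timezone.utc)
print(sas_expiry(example).isoformat())
print(covers_workshop(example, workshop_end))
```

Run it against the real token on the trainer machine only, and keep the token itself out of Git, as with the storage key.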

Snowflake is self-service

Unlike Databricks (trainer-managed workspace), each attendee creates their own Snowflake trial account. This means:
Attendees are ACCOUNTADMIN on their own accounts.
They run 00_account_setup.sql themselves during Module 3; the script creates the database (DE_MASTERCLASS), warehouse (DE_WORKSHOP_WH), role (DE_WORKSHOP_ROLE), and personal schemas.
The trainer cannot pre-verify attendee Snowflake accounts; only the trainer’s own account can be verified beforehand.
dbt connects to Snowflake using the attendee’s own credentials (username/password + DE_WORKSHOP_ROLE).

Databricks Workspace

Login — Open workspace mhpdeworkshop_databricks (ID 3359135813781456). Confirm sidebar shows Workspace, Catalog, Compute, SQL Editor.
Git folders (both trainers) — each Home → MHP-DE-Workshop-2026, both trainer schemas exist (e.g. 00_juewei_*, 00_alisa_*), Share shows workshop_trainees_2026 Can Run.
Compute — Compute page: clusters Terminated (not Error); Workshop policy visible; attendees can start or attach to pre-created de-workshop-{id} clusters.
Unity Catalog — Catalog → mhpdeworkshop_databricks_2026. Confirm workshop_trainees_2026 has USE CATALOG + CREATE SCHEMA; test schemas exist after dry-run of 00_setup.py.
SQL Editor — Run SELECT 1 AS test on a SQL warehouse or cluster.
Secrets (Module 8) — databricks secrets list --scope workshop-scope returns five keys.

Snowflake Snowsight (trainer’s own account)

These checks run on the trainer’s own Snowflake trial account to verify the UI paths work correctly. Attendees create their own accounts during Module 3.

Login — Open your Snowsight URL. Confirm the left sidebar shows: Projects, Data, Compute, Admin sections.
Warehouse — At the top-right, verify DE_WORKSHOP_WH warehouse exists and is selectable. If suspended, click to resume and confirm it shows Started within ~10 seconds.
Role — Verify DE_WORKSHOP_ROLE is available in the role selector dropdown (created by 00_account_setup.sql).
Worksheets — Navigate to Projects → Worksheets. Create a test worksheet → run SELECT CURRENT_VERSION() → confirm the result appears.
Databases — Navigate to Data → Databases → DE_MASTERCLASS. Confirm your own schemas exist (e.g. 00_JUEWEI_BRONZE, _SILVER, _GOLD — uppercase stem from your ATTENDEE_ID).
External Stage — Run LIST @00_JUEWEI_BRONZE.nyc_taxi_trips_stage (replace with your trainer ID) to verify the SAS token works and Parquet files are listed.
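
The schema names checked above follow from the ATTENDEE_ID. As a rough sketch of that convention: the Snowflake form (uppercase stem plus `_BRONZE`, `_SILVER`, `_GOLD`) is taken from the check above. The Databricks form is assumed to keep the ID as written and to use the same three layers. The helper name `schema_names` is hypothetical.

```python
LAYERS = ("bronze", "silver", "gold")


def schema_names(attendee_id: str, platform: str) -> list[str]:
    """Derive the per-attendee medallion schema names for one platform."""
    stem = attendee_id.strip()
    if platform == "snowflake":
        # Snowflake schemas use the uppercase stem, e.g. 00_JUEWEI_BRONZE.
        return [f"{stem}_{layer}".upper() for layer in LAYERS]
    if platform == "databricks":
        # Assumption: Unity Catalog schemas keep the ID as written, e.g. 00_juewei_gold.
        return [f"{stem}_{layer}" for layer in LAYERS]
    raise ValueError(f"unknown platform: {platform}")


print(schema_names("00_juewei", "snowflake"))
print(schema_names("01_alice", "databricks"))
```

A co-trainer could compare this list against what Data → Databases → DE_MASTERCLASS actually shows when an attendee reports a missing schema.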

dbt (Docker / Codespaces)

Codespaces — Open a Codespace from the fork → run dbt --version in the terminal → confirm Core 1.8.x with adapters snowflake and databricks.
Docker — Pull the workshop image: docker pull ghcr.io/mhp-data-engineer/workshop-dbt:2026 → run docker run --rm ghcr.io/mhp-data-engineer/workshop-dbt:2026 dbt --version.
Connection test — Inside the environment: cd dbt_project && dbt debug --target snowflake → confirm All checks passed!.
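
The connection test can also be scripted for a quick pre-class pass. This is a minimal sketch, assuming `dbt` is on the PATH and the project lives in `dbt_project/` as described above. The function names are hypothetical. `run_dbt_debug` is defined but not called here, so the block runs even where dbt is not installed.

```python
import subprocess

PASS_MARKER = "All checks passed!"


def debug_passed(output: str) -> bool:
    """True if dbt debug output contains the success marker."""
    return PASS_MARKER in output


def run_dbt_debug(target: str = "snowflake", project_dir: str = "dbt_project") -> bool:
    """Run `dbt debug --target <target>` inside the project and report success."""
    result = subprocess.run(
        ["dbt", "debug", "--target", target],
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=False,  # inspect the exit code ourselves instead of raising
    )
    return result.returncode == 0 and debug_passed(result.stdout)


# Canned output for illustration; real dbt output has more lines than this.
sample_ok = "Connection test: OK\nAll checks passed!"
sample_bad = "Connection test: ERROR\n1 check failed"
print(debug_passed(sample_ok))
print(debug_passed(sample_bad))
```

In Codespaces or the Docker container, calling `run_dbt_debug()` from the repo root would perform the real check against the `snowflake` target.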

Fallback if an animation MP4 is missing: read the animation beat from the voiceover scripts while showing the module story callout on screen.

Trainer roles

Role	Responsibility
Lead	Story narration, reflection facilitation, theory, Module 7 discussion
Co-trainer	Lab roaming, environment issues, timing nudges, Power BI demo
Both	Never leave a stuck pair >5 min without a hint or checkpoint offer

Main-day schedule

Time	Module	Focus	Watch for
09:00	Story Welcome	Design worksheet	Save whiteboard photo
09:30	1 Fundamentals	Medallion + KPIs	Keep to 30 min
10:00	2 Databricks	Core lab	Do not steal lab time for discussion
11:30	3 Snowflake	Core lab	Same KPIs narrative
12:45	Lunch	45 min
13:30	4 dbt	Core lab	dbt ≠ warehouse
15:00	5 Production	Scheduling / CI	LSDP naming
15:45	6 AI	Cortex LLM only	Not Module 9 ML
16:30	7 Wrap-up	Discussion + optional PBI	Matrix handout at silent write
17:00	End

Hard stops: Start Module 2 by 10:00 · Start Module 3 by 11:30 · Start Module 4 by 13:30.
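
The block lengths implied by the schedule, and how far a module start has slipped past its hard stop, can be worked out with simple clock arithmetic. The sketch below copies the start times from the schedule table. The helper names are hypothetical.

```python
SCHEDULE = [
    ("09:00", "Story Welcome"),
    ("09:30", "1 Fundamentals"),
    ("10:00", "2 Databricks"),
    ("11:30", "3 Snowflake"),
    ("12:45", "Lunch"),
    ("13:30", "4 dbt"),
    ("15:00", "5 Production"),
    ("15:45", "6 AI"),
    ("16:30", "7 Wrap-up"),
    ("17:00", "End"),
]

HARD_STOPS = {"2 Databricks": "10:00", "3 Snowflake": "11:30", "4 dbt": "13:30"}


def minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def block_lengths(schedule: list[tuple[str, str]]) -> dict[str, int]:
    """Length of each block in minutes, taken from consecutive start times."""
    return {
        name: minutes(next_start) - minutes(start)
        for (start, name), (next_start, _) in zip(schedule, schedule[1:])
    }


def minutes_late(module: str, actual_start: str) -> int:
    """Minutes past the hard stop for a module; 0 if it started on time."""
    return max(0, minutes(actual_start) - minutes(HARD_STOPS[module]))


for name, length in block_lengths(SCHEDULE).items():
    print(f"{name}: {length} min")
print(minutes_late("2 Databricks", "10:07"))
```

The three lab blocks (Modules 2 to 4) come out at 90, 75, and 90 minutes, which is the time the hard stops are protecting.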

Per-module checklist (repeat every module)

Play animation (or voiceover fallback)
Think & Discuss — oral discussion using reflection prompts (trainer answer keys in each module section); whiteboard 3–5 bullets
Bridge to theory with whiteboard callback
Theory — stay within module time budget
Share Google Form quiz (~2–3 min, auto-scored)
Release to practice / exercise
Note Priya dashboard beat (if applicable)

Module-specific notes:

Module	Trainer note
Story	Capture design whiteboard — revisit at 16:30
2	Sofia voice: prototype before SQL simplification
3	“Same architecture. Different implementation philosophy.”
4	Elena: dbt on Snowflake
6	Do not demo `ML.FORECAST`
7	Theory ≤5 min · open discussion guide

Per-module UI checkpoints (co-trainer verifies before each module)

Module	Attendee UI should show	Co-trainer check
2 Databricks	Cluster Running (green dot) in Compute page; notebooks visible in Workspace	Confirm all attendee clusters started; notebooks accessible in Shared folder
3 Snowflake	Snowsight open on attendee’s own trial account; `00_account_setup.sql` completed; warehouse Started	Walk around — confirm each attendee has `DE_MASTERCLASS` database and `DE_WORKSHOP_ROLE` created; SAS token distributed and working
4 dbt	Terminal open in Codespaces or Docker with `dbt_project/` directory; `dbt debug --target snowflake` passing	Walk around — check terminals for green `All checks passed!`; confirm `profiles.yml` uses `DE_WORKSHOP_ROLE` and `DE_MASTERCLASS`
5 Production	Jobs & Pipelines page accessible in Databricks (formerly Workflows; Delta Live Tables is now Lakeflow Declarative Pipelines); Snowflake worksheets with Task SQL ready	Pre-create one Lakeflow pipeline as demo; verify `TASK_HISTORY()` returns data
6 AI Features	Genie icon visible in Databricks sidebar (under SQL section); Snowflake worksheets ready for Cortex SQL	Confirm `AI_COMPLETE` returns results (run test query); Genie page loads
7 Wrap-up	No portal needed — whiteboard and discussion only	Print architecture decision matrix handouts
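
For the Module 4 check, the `profiles.yml` values can be compared against the expected database and role once the file has been parsed. This is a sketch under stated assumptions: the profile has already been loaded into a dict (for example with a YAML parser, which is not shown), it follows the standard dbt layout of profile name, then `outputs`, then target, and the profile name `de_workshop` is hypothetical.

```python
EXPECTED = {"database": "DE_MASTERCLASS", "role": "DE_WORKSHOP_ROLE"}


def profile_problems(profiles: dict, profile_name: str, target: str = "snowflake") -> list[str]:
    """List mismatches between a parsed dbt profile and the workshop settings."""
    try:
        output = profiles[profile_name]["outputs"][target]
    except KeyError:
        return [f"no '{target}' output under profile '{profile_name}'"]
    problems = []
    for key, expected in EXPECTED.items():
        actual = str(output.get(key, ""))
        # Compare case-insensitively, since unquoted Snowflake names are not case-sensitive.
        if actual.upper() != expected:
            problems.append(f"{key}: expected {expected}, found {actual or 'nothing'}")
    return problems


# Hypothetical parsed profile; account, user, and password are placeholders.
parsed = {
    "de_workshop": {
        "target": "snowflake",
        "outputs": {
            "snowflake": {
                "type": "snowflake",
                "account": "PLACEHOLDER",
                "user": "PLACEHOLDER",
                "password": "PLACEHOLDER",
                "role": "ACCOUNTADMIN",
                "database": "DE_MASTERCLASS",
            }
        },
    }
}
print(profile_problems(parsed, "de_workshop"))
```

Here the sketch flags the role, which is the kind of slip an attendee on a fresh trial account can make by leaving the default ACCOUNTADMIN in place.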

Module 6 — Databricks AI prerequisites

Complete Step 10 and Step 10b during pre-class setup. Co-trainer verifies the table below before Module 6 (after attendees have Gold tables from Modules 2–4).

Check	Pass criteria
Partner-powered AI	On at account + workspace (Step 10b)
SQL warehouse	`de-workshop-wh` (Pro or Serverless) Started; trainees have Can use
Databricks SQL entitlement	Enabled for `workshop_trainees_2026` (Step 4)
UC data access	`SELECT` on `{attendee_id}_gold` / `kpi_*` tables
Gold tables exist	`03_gold_kpis.py` completed — Module 6 builds on Gold, not Bronze
Assistant	`Ctrl+I` in a notebook returns a code suggestion
`ai_query()`	Test query on `de-workshop-wh` returns a response (model name verified)
Genie	Sidebar icon loads; trainer test space answers a question on Gold KPIs
Snowflake Cortex	`AI_COMPLETE` test query returns on trainer Snowflake account

Azure-specific note: ai_query() on Pro SQL warehouses requires Azure Private Link enabled for the workspace. If the test query fails with a Private Link error, use a Serverless SQL warehouse instead or work with MHP IT to enable Private Link.

Throughput (classroom scale): Genie defaults to ~20 questions/min per workspace, which is sufficient for ~20–30 trainees. Do not run a Genie stress test during the module.

Per-module UI checkpoints (Modules 8–9):

Module	Attendee UI should show	Co-trainer check
8 Streaming	Databricks cluster with Kafka Maven libs installed; Snowflake warehouse running	Verify Kafka libs on cluster (Compute → Libraries tab); Aiven topic has events flowing
9 ML	Databricks AI/ML → Experiments page accessible; Snowflake worksheets ready for `ML.FORECAST`	Confirm `USE AI FUNCTIONS` privilege + `CORTEX_USER` role granted; ML Runtime cluster available

Module 7 runbook (30 min block)

Min	Activity	Doc
0–3	Animation `mod-07-wrapup.mp4`	voiceovers
3–5	Silent write + decision matrix	matrix
5–10	Short theory: Objectives, PBI demo notes, When to Use What	Module 7
10–28	Open discussion (Rounds 1–4)	discussion guide
28–30	Close: three constraints + “Technology is a decision…”
+10	Optional Power BI live demo	§ Power BI — Service or Desktop

If running PBI demo before discussion, cut Round 3 synthesis to 5 min.

Common classroom fixes

Situation	Response
Pair stuck on Bronze ingest	Point to checkpoint data / co-trainer pairs in
“dbt replaces Snowflake”	Draw platform box; dbt inside as transform layer
Reflection runs long	5-min timer; capture 3 bullets max on whiteboard
Running late in Modules 2–4	Cut discussion to 5 min; never cut lab
Vendor debate in Module 7	“We’re advising Marcus, not picking a winner for MHP.”
Attendee cannot find notebooks in Databricks	Confirm you Shared `MHP-DE-Workshop-2026` → `workshop_trainees_2026` Can Run; guide to Workspace → Users → <trainer-email> → `MHP-DE-Workshop-2026`
Trainee cannot Create compute / no `Workshop` policy	Grant `Workshop` policy Can use to `workshop_trainees_2026` (Step 6)
Module 8: `PermissionDenied` on `dbutils.secrets.get`	Run `secrets put-acl` READ for `workshop_trainees_2026` on `workshop-scope` (Step 11)
Attendee cannot change `ATTENDEE_ID`	Shared folder is read-only — Clone `00_setup.py` to Home
Trainees see old notebook content after GitHub update	Git folders do not auto-sync — both trainers must Git → Pull on `MHP-DE-Workshop-2026`; trainees on shared folder inherit trainer’s checkout
“Environment configurations are not saved” on Git `.py` notebook	Expected until PEP 723 metadata is in the file — repo notebooks include it after pull. Tell attendees to attach a cluster (not Serverless); do not add PySpark in the Environment panel
Attendee’s Databricks cluster won’t start	Check Compute page for error message; try Restart; if stuck >5 min, assign a buddy cluster
Snowflake trial signup fails	Suggest using a different email; check spam folder for verification email; trial creation can take 5–10 min
Snowflake `00_account_setup.sql` fails	Check attendee is using ACCOUNTADMIN role (default for trial); confirm `SET attendee_id = '...'` was run first
Snowflake `DE_WORKSHOP_ROLE` not found	The role is created by `00_account_setup.sql` — re-run the script; or manually: `CREATE ROLE DE_WORKSHOP_ROLE;`
Snowflake External Stage cannot list files	SAS token may be expired or have extra spaces; re-copy from trainer handout; verify stage URL matches `mhpdeworkshopsa.blob.core.windows.net`
Snowflake warehouse shows “Suspended”	Click warehouse name at top-right → click Resume; wait ~10 seconds for Started status
Snowflake worksheet shows “No results”	Check session variable: `SELECT $attendee_id;` — if null, re-run the `SET` statement at the top
dbt `debug` fails with connection error	Check `profiles.yml` — confirm `database: DE_MASTERCLASS`, `role: DE_WORKSHOP_ROLE`, and correct Snowflake account/user/password
Databricks Experiments page is empty	The experiment appears after the first `mlflow.start_run()` call — run the training notebook first
Databricks “Cannot see catalog” error	Re-run GRANT on `mhpdeworkshop_databricks_2026` from Step 5; or Catalog → Permissions UI
Git folder clone fails / repo too large	Enable sparse checkout with cone pattern `databricks/notebooks` only; see Git folders
Git folder commit rejected	New folder outside cone pattern — add pattern under Git folder Settings → Advanced → Cone patterns
Databricks CLI `403 Forbidden`	PAT may be expired or lack admin scope — generate a new token: Settings → User Settings → Access Tokens; ensure workspace has admin consent for CLI apps
Module 8: `workshop-scope` secrets scope missing	Trainer must create scope before class (see Pre-class setup); or guide attendee: `databricks secrets create-scope --scope workshop-scope`
`ML.FORECAST` returns error in Snowflake	Verify `GOLD_TRIPS_BY_HOUR` table exists and has `PICKUP_HOUR_TS` timestamp column; check Cortex role
Genie icon visible but spaces won’t open	Enable Partner-powered AI features at account level — see Step 10b
`ai_query()` fails or “not supported”	Attach Pro or Serverless warehouse (not Classic); on Azure Pro, check Private Link or switch to Serverless
`ai_query()` model not found	List foundation models in workspace; update endpoint name in lab — see Module 6 model drift callout
Genie returns empty / permission error	Grant `SELECT` on Gold `kpi_*` tables in UC; confirm Genie space default warehouse is `de-workshop-wh`
Genie Code / Assistant missing in notebooks	Confirm Partner-powered AI features = On (workspace + account); try `Ctrl+I` in a notebook — no separate assistive toggle in current UI
Power BI cannot connect to Snowflake	Verify warehouse is Started; check server URL matches `<account>.snowflakecomputing.com`; use DirectQuery mode

End-of-day close (2 min script)

One dataset. Three implementations. Priya’s dashboard didn’t care which engine built Gold.

Three constraints — cost, performance, compliance. Three decisions — platform, transform, consumption.

Look at your Story sketch. You weren’t wrong to guess. Now you’ve proved it in code.

Technology is a decision. Architecture is responsibility.

Optional: 1–5 finger poll — “I could defend my tool choice to a client.”

After class

Task	Owner
Save Story + Module 7 whiteboard photos	Lead
Note timing overruns for next delivery	Both
Log environment issues (catalog, warehouse, dbt target)	Co-trainer
Share tool comparison deep dive link for self-study	Lead

Full dry-run checklist: docs/dry-run-checklist.md · pre-class-checklist.qmd

Document history

Date	Change
2026-06-05	Git folder sync: manual Pull required (Step 3b); cluster vs Serverless; environment banner
2026-06-05	Added Power BI cloud workspace publish workflow for trainers (Desktop build → Service demo); aligned with five-page / 12-KPI exercise
2026-06-05	Updated per-module checklist for five-step rhythm; fixed aiven-kafka → workshop-scope secret scope
2026-06-05	Databricks + ADLS2 both documented under RG `1000_data_engineering_workshop`
2026-06-05	ADLS2: direct Azure Portal link to RG `1000_data_engineering_workshop` (`mhpdev.onmicrosoft.com`)
2026-06-05	ADLS2: document pre-provisioned `mhpdeworkshopsa` (RG `1000_data_engineering_workshop`, Germany West Central)
2026-06-05	ADLS2: TLC download + upload to `raw/trips/` and `raw/lookup/` before key/SAS distribution
2026-06-05	ADLS2: Azure Portal steps for storage key + container SAS; two-trainer Git folder model
2026-06-05	Groups `workshop_trainees_2026` / `workshop_trainer_2026`; Step 7 trainee self-create cluster; SELECT + secrets READ ACL; entitlements in Step 4
2026-06-05	Added Module 6 Databricks AI prerequisites (Steps 10/10b, pre-class checks, classroom fixes)
2026-06-05	Step 10b: Partner-powered AI is sole workspace toggle; Genie Code verify checklist (no separate assistive toggle)
2026-06-05	Trainer IDs `00_{firstname}` (2026: `00_juewei`/`00_alisa`); reusable naming convention for other cohorts
2026-06-05	Two-trainer model, mandatory Git folder share to trainees; Cloudflare URL
2026-06-05	Expanded Databricks workspace admin guide (`mhpdeworkshop_databricks`, Git folders, UC grants); Cloudflare production URL
2026-06-04	Added pre-class infrastructure setup section (Databricks workspace, ADLS2, Aiven, Power BI); added Unity Catalog GRANT statements, cluster creation guide, Databricks CLI auth, SQL Warehouse setup
2026-05-24	Initial day-of facilitator runbook


ADLS2 Storage (`mhpdeworkshopsa`)