flowchart LR
GH["GitHub\njinjuewei/MHPDataEngineerWorkshop"]
T1["Lead trainer Git folder"]
T2["Co-trainer Git folder"]
ST["Trainees\nshared Can Run"]
GH -->|"git push (CI / developer)"| GH
T1 -->|"Git → Pull (manual)"| GH
T2 -->|"Git → Pull (manual)"| GH
T1 --> ST
T2 --> ST
GH -.->|"No auto sync"| T1
Facilitator Guide — Day-of Runbook
Facilitator-only — not shown to trainees during labs
YellowLine NYC masterclass · MHP Data Engineer Masterclass 2026
Audience: Lead trainer + co-trainer
Related: pre-class-checklist.qmd · module-delivery-pattern.qmd · whiteboard-prompts.qmd · reflection-prompts.qmd
Pre-class infrastructure setup (1–2 weeks before)
These tasks are done once by the trainer before the workshop day. They cannot be done on the morning of the class — plan ahead.
ADLS2 (mhpdeworkshopsa) and Databricks (mhpdeworkshop_databricks) are both pre-provisioned under the same resource group:
1000_data_engineering_workshop · subscription MHP Resort Consulting Services · tenant mhpdev.onmicrosoft.com
Databricks Workspace
The workshop Databricks workspace already exists in resource group 1000_data_engineering_workshop. Your job is to configure users, Git folders, Unity Catalog grants, and clusters — not to provision a new workspace.
| Item | Value |
|---|---|
| Workspace name | mhpdeworkshop_databricks |
| Workspace ID | 3359135813781456 |
| Resource group | 1000_data_engineering_workshop |
| Subscription | MHP Resort Consulting Services (ba826c91-8e52-4e07-ac7c-538858bbc813) |
| Azure tenant | mhpdev.onmicrosoft.com |
| Your role | Workspace admin (provisioned by MHP IT) |
| Unity Catalog | mhpdeworkshop_databricks_2026 — confirm in Catalog explorer (name must match CATALOG_NAME in databricks/notebooks/00_setup.py) |
| Source repo | github.com/jinjuewei/MHPDataEngineerWorkshop |
| Notebook path in repo | databricks/notebooks/ (00_setup.py … 04_ai_features.py) |
Find it in Azure Portal
- Open resource group 1000_data_engineering_workshop.
- Under Resources, click Azure Databricks / mhpdeworkshop_databricks.
- Click Launch workspace (or copy the workspace URL from Overview).
- The workspace URL looks like https://adb-<id>.<random>.azuredatabricks.net and is also shown under Settings → Workspace settings in the Databricks UI.
Official references: Add users (Azure Databricks), Create Git folders, Unity Catalog get started.
This workshop has two trainers (lead + co-trainer). Each trainer creates their own Git folder in their own Home, runs 00_setup.py with a trainer-specific ATTENDEE_ID, then shares that folder to all trainees (Can Run). Trainees open the shared folder to run notebooks; they still use their own {attendee_id} in 00_setup.py (clone that notebook to Home first — see Databricks setup).
ATTENDEE_ID naming
| Role | Pattern | Examples |
|---|---|---|
| Trainers | 00_{firstname} (lowercase) | 2026 delivery: 00_juewei, 00_alisa · other cohorts: 00_sam, 00_taylor |
| Trainees | 01_{name}, 02_{name}, … (lowercase) | 01_alice, 02_bob |
- Databricks / Unity Catalog: schemas are lowercase (00_juewei_bronze, 00_juewei_silver, 00_juewei_gold).
- Snowflake: same ID stem, schemas uppercase (00_JUEWEI_BRONZE, 00_JUEWEI_SILVER, 00_JUEWEI_GOLD), set in each trainer’s 00_account_setup.sql.
- Agree on both trainer IDs before class; use the same IDs for Databricks demos, Snowflake dry-runs, and Power BI trainer Gold.
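The naming convention above can be sketched as a small helper. This is only an illustration: the function name `schema_names` is hypothetical and does not come from the workshop repo.

```python
def schema_names(attendee_id: str) -> dict:
    """Return the bronze/silver/gold schema names for both platforms.

    Hypothetical helper: Databricks schemas are lowercase,
    Snowflake schemas use the same stem in uppercase.
    """
    stem = attendee_id.strip().lower()
    layers = ("bronze", "silver", "gold")
    return {
        "databricks": [f"{stem}_{layer}" for layer in layers],
        "snowflake": [f"{stem}_{layer}".upper() for layer in layers],
    }


# Example with the 2026 lead trainer ID from the table above
print(schema_names("00_juewei"))
```

Running it for both trainer IDs before class is a quick way to agree on the exact schema names used in demos.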
Step 1 — Confirm workspace admin access
- Open the workspace URL and sign in with your MHP account.
- Click your username (top bar) → Settings.
- Confirm you can open Admin settings (workspace admin) or Identity and access without errors.
- Open Catalog → verify that catalog mhpdeworkshop_databricks_2026 exists and you can browse it.
The workspace name stays mhpdeworkshop_databricks (shared Azure resource). Last year’s Unity Catalog is mhpdeworkshop_databricks_2025. For this cohort, create a new catalog mhpdeworkshop_databricks_2026 if it does not exist yet (Catalog → Create catalog), then apply the GRANTs in Step 5. Notebooks use CATALOG_NAME = "mhpdeworkshop_databricks_2026" in 00_setup.py — do not point trainees at the 2025 catalog.
Step 2 — Link Git credentials (trainer + attendees)
Required before creating a Git folder. See Set up Git credentials.
- Username → Settings → Linked accounts → Git integration → Add credential.
- Provider: GitHub.
- Use a Personal Access Token (classic) with at least the repo scope, or a GitHub fine-grained token with access to the fork.
- Tell attendees to complete the same step before Module 2 (documented in the Databricks setup guide).
Step 3 — Each trainer creates a Git folder (lead + co-trainer)
Both trainers repeat this in their own Databricks Home (same steps, different ATTENDEE_ID). Use a Git folder.
| Trainer | ATTENDEE_ID in 00_setup.py | Schemas created (lowercase) |
|---|---|---|
| Lead trainer | 00_{firstname} (2026: 00_juewei) | e.g. 00_juewei_bronze, _silver, _gold |
| Co-trainer | 00_{firstname} (2026: 00_alisa) | e.g. 00_alisa_bronze, _silver, _gold |

Replace {firstname} with each trainer’s assigned ID (naming convention above). Other deliveries can pick any two 00_* IDs; keep them unique and agreed before class.
- Workspace → Home (/Users/<your-email>/).
- ⋮ → Create → Git folder.
- Fill in:

  | Field | Value |
  |---|---|
  | Git repository URL | https://github.com/jinjuewei/MHPDataEngineerWorkshop.git |
  | Git provider | GitHub |
  | Git folder name | MHP-DE-Workshop-2026 |
  | Sparse checkout mode | ✅ Enable |
  | Cone patterns | databricks/notebooks |

  Optional before Modules 8–9: add cone patterns streaming/databricks and ml/databricks.
- Create Git folder → wait for the clone to finish.
- Open databricks/notebooks/00_setup.py → set ATTENDEE_ID to your trainer ID (e.g. 00_juewei or 00_alisa) → attach/start cluster → Run all.
- Catalog → confirm your three schemas exist under mhpdeworkshop_databricks_2026.
- Confirm 01_bronze_ingestion.py … 04_ai_features.py are listed under the Git folder.
Sparse checkout must be enabled at creation; you cannot disable sparse mode afterward. Cone patterns can be edited later: Git folder → Settings → Advanced → Cone patterns (Configure sparse checkout).
Step 3b — Sync Git folder with GitHub (Pull — not automatic)
A Git folder is a workspace checkout of the remote repo. Changes on GitHub do not appear in the workspace until someone Pulls.
| Question | Answer |
|---|---|
| Does GitHub auto-update the workspace? | No — click Pull in the Git dialog, or automate via Repos API / CI/CD (Pull changes) |
| Who Pulls for the shared-folder model? | Both trainers only — trainees with Can Run cannot run Git operations (permissions) |
| When to Pull? | After any push to main; morning of class; after notebook/doc fixes land in GitHub |
| Trainee with own Git fork? | Trainee creates their own Git folder and Pulls there (collaborate in Git folders) |
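If a trainer prefers to script the Pull instead of clicking through the dialog, the Repos API route mentioned in the table can be sketched as below. This is a minimal sketch under assumptions: the function name, host, token and repo ID are placeholders, and it assumes the `PATCH /api/2.0/repos/{repo_id}` endpoint, which checks out the given branch and pulls its latest commit. The request is only built here, not sent.

```python
import json
import urllib.request


def build_pull_request(host: str, token: str, repo_id: int, branch: str = "main"):
    """Build (but do not send) a Repos API request that updates a Git folder.

    host, token and repo_id are placeholders supplied by the trainer.
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Example with placeholder values; sending it would be
# urllib.request.urlopen(request) from a trusted trainer machine.
request = build_pull_request("https://adb-<id>.<random>.azuredatabricks.net", "<pat>", 123)
print(request.get_method(), request.full_url)
```

The token should be a trainer PAT kept outside Git, in line with the secrets guidance in this runbook.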
Pull procedure (each trainer) — UI steps per Access the Git dialog and Pull changes:
Option A — from Workspace (recommended before class)
- Left sidebar → Workspace.
- Expand Users → your email → MHP-DE-Workshop-2026.
- Beside the folder name, click Git (Git icon / Git link). A full-screen Git operations dialog opens.
- At the top, confirm the branch dropdown shows main (or your workshop branch). If not, select main before pulling.
- Click Pull (in the dialog toolbar; it syncs/downloads from the remote).
- Wait for the dialog to finish. Files under databricks/notebooks/ update to match GitHub.
- Close the Git dialog (click outside or X).
- Verify: open databricks/notebooks/00_setup.py → check that the PEP 723 header and recent edits match GitHub main.
Option B — from an open notebook
- Open any notebook in the Git folder (e.g. 01_bronze_ingestion.py).
- At the top of the notebook, next to the notebook title, click the branch name button (shows the current branch, e.g. main).
- The same Git operations dialog opens → confirm branch main → click Pull.
- Close the dialog and re-open the notebook if cells look stale (Pull can clear notebook session state).
| UI element | Where to find it |
|---|---|
| Git button | Workspace tree: beside MHP-DE-Workshop-2026 folder name |
| Branch button | Top bar inside a notebook opened from the Git folder |
| Pull | Git operations dialog toolbar (downloads from remote — no commit message needed) |
| Commit & Push | Same dialog — only use if you intentionally changed files in the Git folder |
If Pull fails or is disabled
| Symptom | Action |
|---|---|
| Merge conflict after Pull | Git dialog offers Keep all current / Take all incoming or manual edit — see Resolve merge conflicts. For workshop notebooks, prefer incoming unless you have local edits to keep. |
| Pull grayed out / Git ops disabled | Workspace may need serverless compute for Git UI (Git CLI folders) — Git CLI compute requirements. Fallback: Repos API or ask workspace admin. |
| Uncommitted local changes in folder | Commit or discard before Pull; sparse-checkout folders block pattern changes while files have uncommitted edits. |
Official notes from Databricks:
- Pull is manual — “click Pull in the Git operations dialog” (source).
- Pull clears notebook state — warn attendees if they have unsaved notebook session state before you Pull mid-class.
- One Git operator per folder — Databricks recommends only one user performs Git ops per folder; trainees use Can Run on the shared copy (collaborate).
- Git UI + serverless — if Git → Pull is disabled, the workspace may need serverless compute (required for Git CLI-enabled folders) — see Git CLI compute requirements.
Never Push secrets from the workspace: STORAGE_ACCOUNT_KEY belongs only in each attendee’s Home clone of 00_setup.py, not in a commit from the shared Git folder. Source .py notebook outputs are not committed by default (commit and push).
Step 4 — Create workshop groups and invite attendees
Create two groups for the 2026 cohort:
| Group | Members | Workspace access level |
|---|---|---|
| workshop_trainer_2026 | Lead + co-trainer | User (trainers are also workspace admins; see Step 1) |
| workshop_trainees_2026 | All attendees | User (not Admin) |
- Settings → Admin settings → Identity and access → Groups → Add group — create both groups.
- Add trainer emails to workshop_trainer_2026; add each attendee to workshop_trainees_2026 (or bulk-import).
- Users → Add user for anyone not yet in the workspace. Each invitee receives an activation email; confirm all have accepted before class.
Entitlements — open each group → Entitlements tab:
| Group | Enable |
|---|---|
| workshop_trainees_2026 | Workspace access · Databricks SQL (Module 6 Genie + SQL Editor) |
| workshop_trainer_2026 | Workspace access · Databricks SQL (trainer dry-runs) |

Trainees do not need Allow unrestricted cluster creation: the Workshop cluster policy (Step 6) grants Can use, which is enough to self-create clusters within policy limits.
See Manage users · Manage entitlements.
Step 5 — Unity Catalog permissions (trainees)
Run in SQL Editor (or from 00_setup.py as admin). Grants must exist before attendees run 00_setup.py (CREATE SCHEMA).
Option A — grant to trainee group (recommended):
-- Catalog-level (once, before class) — trainees only
GRANT USE CATALOG ON CATALOG mhpdeworkshop_databricks_2026 TO `workshop_trainees_2026`;
GRANT CREATE SCHEMA ON CATALOG mhpdeworkshop_databricks_2026 TO `workshop_trainees_2026`;

Option B — grant per attendee (if no group):

GRANT USE CATALOG ON CATALOG mhpdeworkshop_databricks_2026 TO `attendee@example.com`;
GRANT CREATE SCHEMA ON CATALOG mhpdeworkshop_databricks_2026 TO `attendee@example.com`;

After each attendee runs 00_setup.py, grant schema access (schemas are lowercase):

GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_bronze TO `workshop_trainees_2026`;
GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_silver TO `workshop_trainees_2026`;
GRANT USE SCHEMA ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_gold TO `workshop_trainees_2026`;
GRANT SELECT ON SCHEMA mhpdeworkshop_databricks_2026.{attendee_id}_gold TO `workshop_trainees_2026`;

Replace {attendee_id} with e.g. 01_alice. The SELECT on _gold is required before Module 6 (ai_query(), Genie). Alternatively, use the Catalog → catalog → Permissions UI.
Trainers (workshop_trainer_2026) use workspace-admin privileges for demos; no separate UC group grants are needed unless you prefer explicit grants.
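With a full class, typing the per-attendee schema grants by hand is error-prone. The sketch below generates the four statements for one attendee so they can be pasted into SQL Editor. The function name is hypothetical, and wrapping the schema name in backticks is my own precaution for names that start with a digit, not something the workshop scripts are known to require.

```python
CATALOG = "mhpdeworkshop_databricks_2026"
TRAINEE_GROUP = "workshop_trainees_2026"


def schema_grants(attendee_id: str, catalog: str = CATALOG,
                  principal: str = TRAINEE_GROUP) -> list:
    """Return the USE SCHEMA / SELECT grants for one attendee (hypothetical helper)."""
    stem = attendee_id.strip().lower()
    statements = []
    for layer in ("bronze", "silver", "gold"):
        # Backticks are an assumption: a safe way to quote digit-leading names.
        schema = f"{catalog}.`{stem}_{layer}`"
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {schema} TO `{principal}`;")
    statements.append(
        f"GRANT SELECT ON SCHEMA {catalog}.`{stem}_gold` TO `{principal}`;"
    )
    return statements


for line in schema_grants("01_alice"):
    print(line)
```

Looping over the attendee list gives one script to run after everyone has finished 00_setup.py.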
Step 6 — Cluster policy (recommended)
Limit cost and standardise specs. See Cluster policies.
- Compute → Policies → Create policy.
- Name: Workshop.
- Restrict: max workers 1, allowed node types include Standard_DS3_v2, auto-termination 30 min.
- Assign policy permission Can use to workshop_trainees_2026 and workshop_trainer_2026.
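The restrictions listed above could be expressed as a policy definition along these lines. Treat it as a starting point rather than the workshop's actual policy: it assumes the standard cluster policy attribute types (`range`, `allowlist`, `fixed`) and covers only the three restrictions named in this step.

```python
import json


def workshop_policy_definition() -> dict:
    """Sketch of a policy definition for the three restrictions in Step 6."""
    return {
        # At most one worker per cluster
        "num_workers": {"type": "range", "maxValue": 1},
        # Only the workshop node type is selectable
        "node_type_id": {"type": "allowlist", "values": ["Standard_DS3_v2"]},
        # Auto-termination pinned to 30 minutes
        "autotermination_minutes": {"type": "fixed", "value": 30},
    }


# JSON text that can be pasted into the policy editor
print(json.dumps(workshop_policy_definition(), indent=2))
```

The sketch does not restrict autoscaling or runtime version; add those attributes if the cohort needs tighter limits.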
Step 7 — Compute for attendees
Option A — attendees create their own cluster (less prep, more day-of support):
| Setting | Value |
|---|---|
| Policy | Workshop |
| Runtime | 15.4 LTS (Modules 2–6) · 15.4 LTS ML for optional Module 9 |
| Node type | Standard_DS3_v2 |
| Workers | 1 |
| Auto-terminate | 30 minutes |
Trainee self-create cluster (Option A — brief for class)
Regular users in workshop_trainees_2026 can create a cluster when the Workshop policy has Can use (Step 6). Trainees do not need workspace admin.
- Compute → Create compute → Cluster.
- Policy: select Workshop (required; limits size and cost).
- Databricks runtime: 15.4 LTS (or latest LTS).
- Node type: Standard_DS3_v2 · Workers: 1 · name e.g. de-workshop-01_alice.
- Create → wait until Running (green).
- Open a notebook → compute dropdown (top bar) → attach this cluster → Run all on your Home copy of 00_setup.py first.
Pre-class verify: sign in as a test trainee (or co-trainer in workshop_trainees_2026) and confirm Create compute shows policy Workshop and the cluster starts.
Option B — pre-create one cluster per attendee (faster Module 2 start):
| Setting | Value |
|---|---|
| Cluster name | de-workshop-{ATTENDEE_ID} (e.g. de-workshop-01_alice) |
| Policy | Workshop |
| Runtime | 15.4 LTS (ML for Module 9 optional labs) |
| Node type | Standard_DS3_v2 |
| Workers | 1 |
| Auto-terminate | 30 minutes |
Create in Terminated state. Share each cluster: Compute → cluster → Permissions → add attendee (or workshop_trainees_2026) with Can restart.
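For Option B, the per-attendee cluster settings can be generated rather than typed. The sketch below builds one definition per attendee from the table above. The function name and the `policy_id` placeholder are hypothetical, and the `spark_version` default is my assumption for the 15.4 LTS runtime key, so confirm it in the workspace before use.

```python
def cluster_spec(attendee_id: str, policy_id: str,
                 spark_version: str = "15.4.x-scala2.12") -> dict:
    """Build one cluster definition following the Option B table (hypothetical helper)."""
    return {
        "cluster_name": f"de-workshop-{attendee_id}",
        "policy_id": policy_id,  # ID of the Workshop policy, not its name
        "spark_version": spark_version,  # assumed key for 15.4 LTS; verify first
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
        "autotermination_minutes": 30,
    }


attendees = ["01_alice", "02_bob"]  # example IDs from the naming table
specs = [cluster_spec(a, policy_id="<workshop-policy-id>") for a in attendees]
for spec in specs:
    print(spec["cluster_name"])
```

Each dictionary mirrors the fields a cluster creation request expects, so it can be fed to whichever tool the trainer uses to create clusters.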
Step 9 — What trainees do in Module 2
Brief attendees on:
| Task | Where |
|---|---|
| Accept workspace invite | Email link |
| Open trainer notebooks | Workspace → Users → <trainer-email> → MHP-DE-Workshop-2026 |
| Clone 00_setup.py to Home | ⋮ → Clone to [your Home], then set your ATTENDEE_ID (e.g. 01_alice) |
| Run 00_setup.py | Your Home copy; creates your {id}_bronze/silver/gold schemas |
| Run 01–04 | Trainer shared folder (or your clone) |
| Paste ADLS2 key (verbal from trainer) | Your 00_setup.py only; never commit |
| Create/start cluster | Compute → Create compute → policy Workshop → Start (see Step 7) |
Step 10 — Databricks SQL Warehouse (Module 6 AI + optional dbt target)
Required for Genie and ai_query() in Module 6 — even when dbt targets Snowflake only.
- SQL Editor → SQL Warehouses → Create SQL Warehouse
- Type: Pro or Serverless (not Classic; ai_query() is unsupported on Classic)
- Name: de-workshop-wh, Size: 2X-Small, Auto-suspend: 5 minutes
- Permissions → add workshop_trainees_2026 and workshop_trainer_2026 → Can use
- Trainer dry-run: SQL Editor → attach de-workshop-wh → SELECT 1
Official reference: Create a SQL warehouse
Step 10b — Databricks AI features (Module 6 — account + workspace admin)
Module 6 uses Genie Code (notebook/SQL AI assistant — formerly Databricks Assistant), ai_query() (SQL), and Genie Spaces (natural language over Gold). Configure once before class.
Account admin (Account console → Settings → Feature enablement):
| Setting | Value | Why |
|---|---|---|
| Enable partner-powered AI features | On | Powers Genie Code, Genie Spaces, and related assistive features (Azure OpenAI / Anthropic on Databricks) |
| Enforce data processing within workspace Geography | Review if AI features fail to enable | Workspaces outside US/EU (e.g. Germany West Central) may need cross-geo processing disabled — see Partner-powered AI features |
Workspace admin (username → Settings → Workspace admin → Advanced):
| Setting | Value |
|---|---|
| Partner-powered AI features | On (unless account enforces Off) |
No separate “AI-powered assistive features” toggle? That is normal in current Azure Databricks UI. Microsoft consolidated admin control under Partner-powered AI features; Genie Code and other assistive features are enabled when partner-powered AI is On (or use Databricks-hosted models when it is Off in supported regions). Do not block Module 6 prep looking for a second toggle — run the functional checks below instead.
Verify AI is activated (workspace admin, 5 min)
| Check | Pass criteria |
|---|---|
| Workspace Advanced | Partner-powered AI features = On |
| Genie Code | Open a notebook on a cluster → Ctrl+I / Cmd+I (or Genie Code icon) → prompt returns a suggestion |
| Genie Spaces | Sidebar Genie → New → add a Gold table → question returns SQL or an answer |
| ai_query() | SQL Editor on de-workshop-wh → one-row test query succeeds (verify model name) |
User entitlements — configured in Step 4 (workshop_trainees_2026: Workspace access + Databricks SQL).
Unity Catalog — GRANT SELECT ON SCHEMA …_gold is in Step 5 (run after each 00_setup.py). Required before Module 6 ai_query() and Genie.
Trainer pre-class dry-run (after Gold tables exist from 03_gold_kpis.py):
- Genie Code: open 04_ai_features.py on a cluster → Ctrl+I / Cmd+I → prompt “Show top 5 pickup zones by revenue”.
- ai_query(): SQL Editor → warehouse de-workshop-wh → run a one-row test (verify the model name in the workspace first):

  SELECT ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    'Reply with exactly: OK'
  ) AS test;

- Genie: sidebar Genie → New → add your trainer Gold tables (e.g. 00_juewei_gold.kpi_*) → set default warehouse de-workshop-wh → ask “What hour has the most taxi trips?”
Model IDs drift — list available foundation models in the workspace before class. Do not hardcode names from last year’s delivery.
Official references: AI assistive features · Genie setup · Genie Code · ai_query
Step 11 — Databricks CLI authentication (Module 8 Aiven Secrets)
The trainer needs to create the workshop-scope secrets scope before class or guide attendees through it during Module 8. Two options:
Option A — Trainer creates scope via CLI (recommended):

```bash
# Flag-style syntax below matches the legacy Databricks CLI;
# check `databricks secrets --help` if your CLI version differs.

# Authenticate CLI to workspace
databricks configure --token
# Prompt: Databricks Host: https://

# Create scope (once)
databricks secrets create-scope --scope workshop-scope

# Add secrets (values from Aiven Console)
databricks secrets put --scope workshop-scope --key aiven-bootstrap-servers
databricks secrets put --scope workshop-scope --key aiven-ca-cert
databricks secrets put --scope workshop-scope --key aiven-client-cert
databricks secrets put --scope workshop-scope --key aiven-client-key
databricks secrets put --scope workshop-scope --key aiven-topic

# Allow trainees to read secrets in notebooks (Module 8)
databricks secrets put-acl --scope workshop-scope --principal workshop_trainees_2026 --permission READ
databricks secrets put-acl --scope workshop-scope --principal workshop_trainer_2026 --permission READ
```
Option B — Attendees create their own scope during Module 8 (requires each attendee to have a PAT and Databricks CLI installed).
Verify: as a test trainee, databricks secrets list --scope workshop-scope lists five keys (trainees need READ ACL — not scope admin).
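The five-key check can also be done mechanically once the key names have been listed. The helper below is a hypothetical sketch: it takes whatever key names the listing returned and reports which of the five expected keys are missing.

```python
EXPECTED_KEYS = (
    "aiven-bootstrap-servers",
    "aiven-ca-cert",
    "aiven-client-cert",
    "aiven-client-key",
    "aiven-topic",
)


def missing_secret_keys(listed_keys) -> list:
    """Return the expected workshop-scope keys that are absent from a listing."""
    present = {key.strip() for key in listed_keys}
    return [key for key in EXPECTED_KEYS if key not in present]


# Example: a listing where the topic secret was forgotten
print(missing_secret_keys([
    "aiven-bootstrap-servers", "aiven-ca-cert",
    "aiven-client-cert", "aiven-client-key",
]))
```

An empty list means all five secrets are in place for Module 8.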
ADLS2 Storage (mhpdeworkshopsa)
The workshop storage account already exists. Your job is to upload TLC data, rotate keys, and create SAS tokens — not to provision Azure storage.
| Item | Value |
|---|---|
| Storage account | mhpdeworkshopsa |
| Resource group | 1000_data_engineering_workshop |
| Location | Germany West Central (germanywestcentral) |
| Subscription | MHP Resort Consulting Services |
| Subscription ID | ba826c91-8e52-4e07-ac7c-538858bbc813 |
| Azure tenant | mhpdev.onmicrosoft.com |
| Container | nyc-taxi-data (should already exist) |
Find it in Azure Portal
- Sign in to Azure Portal with your MHP account (mhpdev.onmicrosoft.com).
- Open the resource group directly: 1000_data_engineering_workshop.
- Under Resources, click storage account mhpdeworkshopsa.
- Confirm Location shows Germany West Central on the Overview blade.

Shortcut: search mhpdeworkshopsa in the portal top bar if you are already in the correct subscription.

The same resource group also contains the Databricks workspace mhpdeworkshop_databricks; see Databricks Workspace above.
Shared storage for all attendees. Two credentials — do not mix them up:
| Credential | Used by | Module | Distribution (never commit to Git) |
|---|---|---|---|
| Storage account key (key1) | Databricks 00_setup.py | 2 | Distribute verbally |
| SAS token (query string) | Snowflake 02_external_stage.sql | 3 | Printed card / slide |
Data layout (container nyc-taxi-data):
| Path | Content |
|---|---|
| raw/trips/ | Parquet trip files (yellow_tripdata_YYYY-MM.parquet) |
| raw/lookup/taxi_zone_lookup.csv | Zone lookup CSV (265 zones) |
| streaming/user-activity/ | Module 8 relay output (optional) |
Official reference: Grant limited access with SAS.
0 — Download from TLC and upload to ADLS2
Workshop pipelines read raw files from ADLS2 — they do not download from the internet at runtime. The trainer (or MHP ops) must download once from NYC TLC and upload to mhpdeworkshopsa before rotating keys or distributing SAS tokens.
Source: NYC TLC Trip Record Data — Yellow Taxi Trip Records, Parquet format.
| File | TLC download (direct) | ADLS2 destination |
|---|---|---|
| Trip data | https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-10.parquet (~61 MB) | nyc-taxi-data/raw/trips/yellow_tripdata_2024-10.parquet |
| Zone lookup | https://d37ci6vzurychx.cloudfront.net/misc/taxi_zone_lookup.csv (~12 KB) | nyc-taxi-data/raw/lookup/taxi_zone_lookup.csv |
Workshop default month: October 2024 (dbt_project/dbt_project.yml sets data_year: 2024, data_month: 10). One month (~3M trips) is enough for all labs. Optional: add yellow_tripdata_2024-09.parquet and yellow_tripdata_2024-11.parquet for richer time-series KPIs; keep the original TLC filenames.
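When adding the optional extra months, the download URL and the ADLS2 blob name follow the same pattern as the October file. A small sketch, assuming the URL layout shown in the table above stays stable (the helper name `trip_file` is hypothetical):

```python
TLC_BASE = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def trip_file(year: int, month: int) -> dict:
    """Return the TLC download URL and the ADLS2 blob name for one month."""
    name = f"yellow_tripdata_{year:04d}-{month:02d}.parquet"
    return {"url": f"{TLC_BASE}/{name}", "blob": f"raw/trips/{name}"}


# Default workshop month plus the two optional neighbours
for month in (9, 10, 11):
    info = trip_file(2024, month)
    print(info["url"], "->", info["blob"])
```

Keeping the blob name identical to the TLC filename matches the "keep the original TLC filenames" rule.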
Step 1 — Download locally
- Open the TLC trip record page.
- Under Yellow Taxi Trip Records, choose Parquet (not CSV) for the trip file.
- Download the October 2024 Parquet file (link above, or TLC table row yellow_tripdata_2024-10.parquet).
- Download the Taxi Zone Lookup Table CSV (taxi_zone_lookup.csv; link above or the TLC Auxiliary data section).
- Confirm locally: the trip file is .parquet; the lookup is .csv with header LocationID,Borough,Zone,service_zone.
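The local header check on the lookup file can be scripted with the standard library. This is a minimal sketch: the function name is hypothetical, and it uses the `csv` module so that quoted header fields are handled the same way as unquoted ones.

```python
import csv
import io

EXPECTED_HEADER = ["LocationID", "Borough", "Zone", "service_zone"]


def lookup_header_ok(csv_text: str) -> bool:
    """Return True if the first CSV row matches the expected lookup header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [column.strip() for column in header] == EXPECTED_HEADER


# Example with an inline sample; for the real file, read it first, e.g.
# open("taxi_zone_lookup.csv", newline="", encoding="utf-8").read()
sample = '"LocationID","Borough","Zone","service_zone"\n1,"EWR","Newark Airport","EWR"\n'
print(lookup_header_ok(sample))
```

A False result before upload usually means the wrong file was downloaded or it was re-saved by a spreadsheet tool.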
Step 2 — Upload to ADLS2
Use any method below. Create folders raw/trips/ and raw/lookup/ if they do not exist.
Option A — Azure Portal (no extra tools)
- Azure Portal → Storage accounts → mhpdeworkshopsa.
- Data storage → Containers → nyc-taxi-data.
- Open or create raw/trips/ → Upload → select yellow_tripdata_2024-10.parquet.
- Open or create raw/lookup/ → Upload → select taxi_zone_lookup.csv (exact filename; the Snowflake stage and Databricks LOOKUP_DATA_PATH expect this name).
Option B — Azure Storage Explorer
- Install Azure Storage Explorer.
- Connect with your MHP Azure account → mhpdeworkshopsa → nyc-taxi-data.
- Drag the Parquet file into raw/trips/ and the CSV into raw/lookup/.
Option C — Azure CLI (trainer workstation with az logged in)
# Set variables — use key1 from Portal → Access keys (trainer only; never commit)
ACCOUNT=mhpdeworkshopsa
KEY="<storage-account-key1>"
CONTAINER=nyc-taxi-data
az storage blob upload \
--account-name "$ACCOUNT" --account-key "$KEY" \
--container-name "$CONTAINER" \
--file ./yellow_tripdata_2024-10.parquet \
--name raw/trips/yellow_tripdata_2024-10.parquet \
--overwrite
az storage blob upload \
--account-name "$ACCOUNT" --account-key "$KEY" \
--container-name "$CONTAINER" \
--file ./taxi_zone_lookup.csv \
--name raw/lookup/taxi_zone_lookup.csv \
--overwrite

Step 3 — Sanity-check before key/SAS distribution
| Check | Expected |
|---|---|
| raw/trips/ | At least one yellow_tripdata_*.parquet visible |
| raw/lookup/taxi_zone_lookup.csv | File present; ~265 data rows (+ header) |
| Bronze ingest (after key) | spark.read.parquet(TRIPS_DATA_PATH).count() ≈ 3M for Oct 2024 |
| Lookup ingest | Zone count 265 in 01_bronze_ingestion |
Do this upload before sections A–C below (key rotation and SAS creation).
A — Regenerate storage account key (Databricks)
Use a fresh key before each workshop cohort.
- Sign in to Azure Portal.
- Search Storage accounts → open mhpdeworkshopsa.
- Left menu → Security + networking → Access keys.
- Under key1, click Rotate key (or Regenerate); this regenerates key1 and invalidates the old one.
- Click Show next to key1 → copy the key value.
- Store it in your password manager; distribute to the class verbally during Module 2 only.

Databricks notebooks use STORAGE_ACCOUNT_KEY with the abfss:// path to nyc-taxi-data.
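For orientation, this is roughly how an abfss:// path and the matching account-key setting are formed for this storage account. It is a sketch of the general ABFS convention, not a copy of 00_setup.py: the helper names are hypothetical, and the notebook may build its paths differently.

```python
STORAGE_ACCOUNT = "mhpdeworkshopsa"
CONTAINER = "nyc-taxi-data"


def abfss_path(relative_path: str) -> str:
    """Build an abfss:// URI for a path inside the workshop container."""
    host = f"{STORAGE_ACCOUNT}.dfs.core.windows.net"
    return f"abfss://{CONTAINER}@{host}/{relative_path.lstrip('/')}"


def account_key_conf_name() -> str:
    """Name of the Spark setting that carries the storage account key."""
    return f"fs.azure.account.key.{STORAGE_ACCOUNT}.dfs.core.windows.net"


TRIPS_DATA_PATH = abfss_path("raw/trips/")
LOOKUP_DATA_PATH = abfss_path("raw/lookup/taxi_zone_lookup.csv")

# In a Databricks notebook the key would then be applied with something like:
#   spark.conf.set(account_key_conf_name(), STORAGE_ACCOUNT_KEY)
print(TRIPS_DATA_PATH)
print(account_key_conf_name())
```

Because the setting name embeds the account name, a rotated key only requires attendees to paste the new value, not to change any paths.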
B — Create SAS token in Azure Portal (Snowflake)
Snowflake external stages need Read + List on blobs in nyc-taxi-data. Microsoft documents SAS creation in the portal here: Create SAS tokens (Azure portal).
Recommended: container-scoped SAS (least privilege — only nyc-taxi-data):
- Azure Portal → storage account mhpdeworkshopsa.
- Left menu → Data storage → Containers → click nyc-taxi-data.
- Top menu → Generate SAS (or ⋯ → Generate SAS).
- Set fields:

  | Field | Workshop value |
  |---|---|
  | Signing method | Account key (Snowflake AZURE_SAS_TOKEN expects key-signed SAS on trial accounts) |
  | Permissions | ✅ Read, ✅ List only; leave Write / Delete / Add unchecked |
  | Start | Today (or workshop morning) |
  | Expiry | Workshop date + 2 days buffer |
  | Allowed IP addresses | (leave empty for classroom) |
  | Allowed protocols | HTTPS only |
  | Signing key | key1 |

- Click Generate SAS token and URL.
- Copy only the SAS token field (query string like sv=2024-11-04&ss=b&srt=sco&sp=rl&se=...&sig=...).
  - For Snowflake CREDENTIALS = (AZURE_SAS_TOKEN = '...'), paste the token without a leading ?.
  - The portal shows the token once; save it immediately, because you cannot retrieve it later.
- Optional: copy the Blob SAS URL to test in a browser or Azure Storage Explorer.
Alternative: account-level SAS (broader scope — use only if container SAS is unavailable):
- Storage account mhpdeworkshopsa → Security + networking → Shared access signature.
- Configure:

  | Field | Workshop value |
  |---|---|
  | Allowed services | Blob only |
  | Allowed resource types | Container + Object |
  | Allowed permissions | Read + List |
  | Start / Expiry | Workshop date → +2 days |
  | Allowed protocols | HTTPS only |
  | Signing key | key1 |

- Generate SAS and connection string → copy the SAS token query string (same rules as above).
C — Verify before class
Azure Portal / Storage Explorer
- Containers → nyc-taxi-data → raw/trips/: Parquet files are visible.
- Confirm raw/lookup/taxi_zone_lookup.csv exists.
Snowflake (trainer account)
Run after pasting SAS into snowflake/setup/02_external_stage.sql:
-- Replace 00_JUEWEI with your trainer ID (uppercase in Snowflake).
-- Identifiers that start with a digit must be double-quoted in Snowflake.
LIST @"00_JUEWEI_BRONZE".nyc_taxi_trips_stage;
LIST @"00_JUEWEI_BRONZE".nyc_taxi_lookup_stage;

Both commands must return file names (not an Access denied error or an empty result).
Databricks (trainer dry-run)
After 00_setup.py with ADLS2 key:
# In a notebook cell: should print a row count, not an auth error
spark.read.parquet(TRIPS_DATA_PATH).count()

D — Distribute to attendees
| Item | When | Format |
|---|---|---|
| Storage account key | Module 2 | Verbal only |
| SAS token | Module 3 | Printed card — warn about expiry date |
Common failures: expired SAS, extra spaces when copy-pasting token, using account key in Snowflake stage SQL, or regenerating key1 after Databricks setup without telling the class.
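Several of these failures can be caught before the token is printed on cards. The checker below is a hypothetical sketch that inspects the SAS query string for stray whitespace, a leading `?`, permissions other than read + list, and an expiry in the past. It does not verify the signature or the token's scope.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def check_sas_token(token: str, now: datetime = None) -> list:
    """Return a list of problems found in a SAS token string (empty if none)."""
    problems = []
    if token != token.strip():
        problems.append("leading or trailing whitespace")
    cleaned = token.strip()
    if cleaned.startswith("?"):
        problems.append("leading '?' (Snowflake expects the token without it)")
        cleaned = cleaned[1:]
    fields = parse_qs(cleaned)
    if "sig" not in fields:
        problems.append("no sig field; this does not look like a SAS token")
    permissions = fields.get("sp", [""])[0]
    if set(permissions) != {"r", "l"}:
        problems.append(f"permissions are '{permissions}', expected read + list only")
    expiry_raw = fields.get("se", [""])[0]
    try:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
    except ValueError:
        problems.append("missing or unparseable expiry (se)")
    else:
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)  # treat bare values as UTC
        if expiry <= (now or datetime.now(timezone.utc)):
            problems.append("token has expired")
    return problems


# Example with a made-up token (not a real signature)
print(check_sas_token("?sv=2024-11-04&sp=rwl&se=2020-01-01T00:00:00Z&sig=abc "))
```

Running it on the morning of class, with the default current time, confirms the token is still valid for the day.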
Aiven Kafka (Module 8 only)
- Create the Aiven Kafka cluster; see aiven-streaming-setup.qmd
- Start the User Activity generator (4 hours max on free tier; start on the morning of Module 8, not before)
- Download SSL certificates: ca.pem, service.cert, service.key
- Note the Service URI: kafka-xxxxx.aivencloud.com:12345
- Start the relay consumer (streaming/snowflake/00_relay_consumer.py); requires the ADLS2 key
Power BI (Module 7 demo + optional trainee self-paced)
Trainer — build the dashboard (Desktop, once before class)
- Use a Windows machine with Power BI Desktop (free to install)
- Follow Exercise: Power BI or powerbi/README.md; connect to trainer Gold: Snowflake {trainer_id}_GOLD or Databricks {trainer_id}_gold
- Connect via Snowflake or Azure Databricks; load all 12 kpi_* tables · choose Import for the main workshop demo
- Build all five pages: Overview, Map, Time Analysis, Revenue, Efficiency (see Module 7 §3.1)
- File → Save As → YellowLine-NYC-KPIs.pbix (keep a local copy for offline demo fallback)
Trainer — publish to cloud workspace (if you have Power BI Pro / Fabric)
Reports are authored in Desktop; your cloud workspace hosts, refreshes, and shares them. You cannot realistically build this five-page Snowflake/Databricks dashboard from scratch in the browser alone.
| Step | Where | Action |
|---|---|---|
| 1 | Desktop | Sign in with your work Microsoft account (same tenant as the workspace) |
| 2 | Desktop | Home → Publish → select your workshop workspace (not My workspace unless you have no shared workspace) |
| 3 | Service | Open Power BI → workspace → confirm report + semantic model appear |
| 4 | Service | Refresh now on the dataset — Snowflake warehouse DE_WORKSHOP_WH must be Started |
| 5 | Service | Open each of the five report pages — especially Map (Azure Maps geocoding needs network) |
| 6 | Service (optional) | Settings → Scheduled refresh — daily refresh before class if using Import |
Sharing and licensing
| Your setup | What trainees need to view your published report |
|---|---|
| Workspace on Premium / Fabric capacity | Often no Pro — share workspace or report link (viewer) |
| Workspace without Premium capacity | Viewers typically need Power BI Pro (or you screen-share only) |
| No org workspace | Demo from Desktop screen share — still works; no attendee license needed |
Co-trainer access: add them as Member or Contributor on the workspace so they can open the report before Module 7.
Module 7 demo (optional, ~10 min) — pick one path:
| Path | When to use |
|---|---|
| A — Service (browser) | You published to a cloud workspace; maps and refresh tested in app.powerbi.com |
| B — Desktop (local) | Fallback if Service refresh fails, or you have no shared workspace |
| C — Skip live demo | Point trainees to Exercise: Power BI; animation already showed the dashboard |
Trainees — self-paced after Module 4 (optional)
- Not part of main-day timing — no classroom block required
- Prerequisites: Gold KPI tables from Modules 2–4; Windows + free Desktop only
- Point trainees to Power BI setup and Exercise: Power BI after dbt lab
- macOS/Linux attendees: read-only / defer to post-workshop Windows machine
Say once after Module 4:
“Priya’s dashboard is optional self-paced work — if you have Windows, install free Power BI Desktop and connect to the same Gold tables you just built. Full steps are in Setup → Power BI.”
Before the room opens (T-30 min)
| Task | Owner | Done |
|---|---|---|
| Test projector / second screen for animations | Lead | [ ] |
| Open story site: https://mhp-data-engineer-2026.pages.dev/ (mirror: Vercel; fallback quarto preview port 4201) | Co-trainer | [ ] |
| Verify Databricks workspace mhpdeworkshop_databricks: Git folder MHP-DE-Workshop-2026, catalog grants, clusters Terminated (see Pre-class setup) | Co-trainer | [ ] |
| Verify Snowflake on trainer’s own trial account (students create theirs during class) | Co-trainer | [ ] |
| Prepare credentials to distribute: SAS token, ADLS2 storage key, attendee IDs | Co-trainer | [ ] |
| Print architecture decision matrix (1 per trainee) | Co-trainer | [ ] |
| Open Google Form URLs (QR / short links ready per module) | Co-trainer | [ ] |
| Photo / save blank whiteboard space for Story sketch | Lead | [ ] |
| Power BI: published report opens in cloud workspace or local .pbix on Desktop (Module 7 demo) | Co-trainer | [ ] |
| .env / Codespaces tested on one machine | Co-trainer | [ ] |
Credentials & Materials to Distribute
Each attendee needs the following credentials and materials during the workshop. Prepare these before class and distribute at the appropriate module.
| Item | When to distribute | Format | Notes |
|---|---|---|---|
| ATTENDEE_ID | Start of day (Module 1) | Printed card or slide | e.g., 01_alice, 02_bob — used in every schema/table name |
| Databricks workspace URL | Module 2 | Invite link via email | Trainer sends workspace invite to each attendee’s email before class |
| Trainer notebook paths | Module 2 | Slide or printed | Workspace → Users → <lead or co-trainer email> → MHP-DE-Workshop-2026 — both trainers share Can Run |
| ADLS2 Storage Account Key | Module 2 | Verbal or printed | Used by Databricks 00_setup.py to read Parquet from ADLS2. Never commit to Git. |
| SAS Token | Module 3 | Printed card or slide | Used by Snowflake 02_external_stage.sql to create External Stage. Has an expiry date — generate fresh before each workshop. |
| Databricks Personal Access Token | Only if using dbt with Databricks target | Self-service | Attendee generates their own via Settings → Access Tokens. Not needed if dbt only targets Snowflake. |
| Snowflake account | Self-service (before or during Module 3) | Attendee creates own trial at signup.snowflake.com | Attendee is ACCOUNTADMIN on their own account; 00_account_setup.sql creates DE_WORKSHOP_ROLE |
Unlike Databricks (trainer-managed workspace), each attendee creates their own Snowflake trial account. This means:
- Attendees are ACCOUNTADMIN on their own accounts.
- They run 00_account_setup.sql themselves during Module 3; this creates the database (DE_MASTERCLASS), warehouse (DE_WORKSHOP_WH), role (DE_WORKSHOP_ROLE), and personal schemas.
- The trainer cannot pre-verify attendee Snowflake accounts; only the trainer’s own account can be verified beforehand.
- dbt connects to Snowflake using the attendee’s own credentials (username/password + DE_WORKSHOP_ROLE).
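The SAS token row above notes that the token has an expiry date, and an expired or whitespace-padded token is a common cause of External Stage failures. As a minimal sketch, the check below reads the standard `se` (signed expiry) field from a token and flags both problems. The token string and the workshop date are made-up placeholders, not real values.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

# Timestamp formats Azure accepts for the `se` (signed expiry) field.
_EXPIRY_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d")


def sas_expiry(token: str) -> datetime:
    """Return the expiry time (UTC) encoded in a SAS token's `se` field."""
    query = token.strip().lstrip("?")
    values = parse_qs(query).get("se")
    if not values:
        raise ValueError("SAS token has no 'se' (expiry) field")
    for fmt in _EXPIRY_FORMATS:
        try:
            return datetime.strptime(values[0], fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised expiry format: {values[0]!r}")


def sas_problems(token: str, now: datetime) -> list[str]:
    """List reasons the token would fail at time `now` (empty list = OK)."""
    problems = []
    cleaned = "".join(token.split())  # drop any spaces, tabs, or newlines
    if cleaned != token:
        problems.append("token contains whitespace")
    if sas_expiry(cleaned) <= now:
        problems.append("token is expired")
    return problems


# Placeholder token and date, for illustration only.
example = "sv=2022-11-02&sp=rl&se=2026-06-30T18%3A00%3A00Z&sig=NOT-A-REAL-SIGNATURE"
workshop_day = datetime(2026, 6, 10, 9, 0, tzinfo=timezone.utc)

print(sas_problems(example, workshop_day))        # []
print(sas_problems(example + " ", workshop_day))  # ['token contains whitespace']
```

Running this against the freshly generated token with `now` set to the end of the workshop day gives a quick check that the token outlasts the class.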
Databricks Workspace
- Login: Open workspace `mhpdeworkshop_databricks` (ID `3359135813781456`). Confirm sidebar shows Workspace, Catalog, Compute, SQL Editor.
- Git folders (both trainers): each Home → `MHP-DE-Workshop-2026`, both trainer schemas exist (e.g. `00_juewei_*`, `00_alisa_*`), Share shows `workshop_trainees_2026` Can Run.
- Compute: Compute page shows clusters Terminated (not Error); `Workshop` policy visible; attendees can start or attach to pre-created `de-workshop-{id}` clusters.
- Unity Catalog: Catalog → `mhpdeworkshop_databricks_2026`. Confirm `workshop_trainees_2026` has USE CATALOG + CREATE SCHEMA; test schemas exist after dry-run of `00_setup.py`.
- SQL Editor: Run `SELECT 1 AS test` on a SQL warehouse or cluster.
- Secrets (Module 8): `databricks secrets list --scope workshop-scope` returns five keys.
Snowflake Snowsight (trainer’s own account)
These checks run on the trainer’s own Snowflake trial account to verify the UI paths work correctly. Attendees create their own accounts during Module 3.
- Login: Open your Snowsight URL. Confirm the left sidebar shows: Projects, Data, Compute, Admin sections.
- Warehouse: At the top-right, verify `DE_WORKSHOP_WH` warehouse exists and is selectable. If suspended, click to resume and confirm it shows Started within ~10 seconds.
- Role: Verify `DE_WORKSHOP_ROLE` is available in the role selector dropdown (created by `00_account_setup.sql`).
- Worksheets: Navigate to Projects → Worksheets. Create a test worksheet → run `SELECT CURRENT_VERSION()` → confirm the result appears.
- Databases: Navigate to Data → Databases → `DE_MASTERCLASS`. Confirm your own schemas exist (e.g. `00_JUEWEI_BRONZE`, `_SILVER`, `_GOLD`; uppercase stem from your `ATTENDEE_ID`).
- External Stage: Run `LIST @00_JUEWEI_BRONZE.nyc_taxi_trips_stage` (replace with your trainer ID) to verify the SAS token works and Parquet files are listed.
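The Databases check relies on a naming rule: the uppercase stem of `ATTENDEE_ID` followed by a layer suffix. A minimal sketch of that rule is below, useful when spot-checking an attendee's schemas. The ID pattern (two digits, underscore, lowercase name) is an assumption inferred from the examples `01_alice` and `00_juewei`; adjust it if the workshop scripts accept other forms.

```python
import re

LAYERS = ("BRONZE", "SILVER", "GOLD")

# Assumed pattern, inferred from the examples 01_alice / 00_juewei:
# two digits, an underscore, then a lowercase name.
_ATTENDEE_ID = re.compile(r"\d{2}_[a-z]+")


def snowflake_schemas(attendee_id: str) -> list[str]:
    """Return the three schema names expected under DE_MASTERCLASS."""
    if not _ATTENDEE_ID.fullmatch(attendee_id):
        raise ValueError(f"Unexpected ATTENDEE_ID: {attendee_id!r}")
    stem = attendee_id.upper()
    return [f"{stem}_{layer}" for layer in LAYERS]


print(snowflake_schemas("00_juewei"))
# ['00_JUEWEI_BRONZE', '00_JUEWEI_SILVER', '00_JUEWEI_GOLD']
```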
dbt (Docker / Codespaces)
- Codespaces: Open a Codespace from the fork → run `dbt --version` in the terminal → confirm `Core 1.8.x` with adapters `snowflake` and `databricks`.
- Docker: Pull the workshop image with `docker pull ghcr.io/mhp-data-engineer/workshop-dbt:2026` → run `docker run --rm ghcr.io/mhp-data-engineer/workshop-dbt:2026 dbt --version`.
- Connection test: Inside the environment, run `cd dbt_project && dbt debug --target snowflake` → confirm `All checks passed!`.
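When `dbt debug --target snowflake` fails, the usual cause is a wrong value in the Snowflake target of `profiles.yml`. The sketch below shows one way to list mismatches against the workshop values `DE_MASTERCLASS` and `DE_WORKSHOP_ROLE`. It assumes the target has already been loaded into a plain dict (for example with a YAML parser); the account, user, and password shown are placeholders.

```python
# Keys that must be present and non-empty in the Snowflake target.
REQUIRED_KEYS = ("account", "user", "password")
# Values the workshop expects (compared case-insensitively).
EXPECTED = {"database": "DE_MASTERCLASS", "role": "DE_WORKSHOP_ROLE"}


def profile_problems(target: dict) -> list[str]:
    """Return human-readable mismatches for one dbt Snowflake target."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not target.get(key)]
    for key, expected in EXPECTED.items():
        actual = str(target.get(key, ""))
        if actual.upper() != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    return problems


# Placeholder target: every value here is illustrative only.
target = {
    "type": "snowflake",
    "account": "xy12345",
    "user": "alice",
    "password": "example-only",
    "role": "PUBLIC",
    "database": "DE_MASTERCLASS",
}
print(profile_problems(target))
# ["role is 'PUBLIC', expected 'DE_WORKSHOP_ROLE'"]
```

An empty list means the target matches the workshop values; a remaining `dbt debug` failure then points at the account identifier or credentials rather than the role or database.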
Fallback if MP4 missing: Read animation beat from voiceover scripts while showing module story callout on screen.
Trainer roles
| Role | Responsibility |
|---|---|
| Lead | Story narration, reflection facilitation, theory, Module 7 discussion |
| Co-trainer | Lab roaming, environment issues, timing nudges, Power BI demo |
| Both | Never leave a stuck pair >5 min without a hint or checkpoint offer |
Main-day schedule
| Time | Module | Focus | Watch for |
|---|---|---|---|
| 09:00 | Story Welcome | Design worksheet | Save whiteboard photo |
| 09:30 | 1 Fundamentals | Medallion + KPIs | Keep to 30 min |
| 10:00 | 2 Databricks | Core lab | Do not steal lab time for discussion |
| 11:30 | 3 Snowflake | Core lab | Same KPIs narrative |
| 12:45 | Lunch | 45 min | |
| 13:30 | 4 dbt | Core lab | dbt ≠ warehouse |
| 15:00 | 5 Production | Scheduling / CI | LSDP naming |
| 15:45 | 6 AI | Cortex LLM only | Not Module 9 ML |
| 16:30 | 7 Wrap-up | Discussion + optional PBI | Matrix handout at silent write |
| 17:00 | End | | |
Hard stops: Start Module 2 by 10:00 · Start Module 3 by 11:30 · Start Module 4 by 13:30.
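To see how much time each block gets, the start times in the schedule can be turned into durations. This is a small sketch that copies the start times from the table above and subtracts consecutive entries. It is a planning aid only, not part of the workshop repo.

```python
from datetime import datetime

# Start times copied from the main-day schedule table.
SCHEDULE = [
    ("09:00", "Story Welcome"),
    ("09:30", "1 Fundamentals"),
    ("10:00", "2 Databricks"),
    ("11:30", "3 Snowflake"),
    ("12:45", "Lunch"),
    ("13:30", "4 dbt"),
    ("15:00", "5 Production"),
    ("15:45", "6 AI"),
    ("16:30", "7 Wrap-up"),
    ("17:00", "End"),
]


def block_minutes(schedule):
    """Return (block, minutes) pairs computed from consecutive start times."""
    starts = [datetime.strptime(t, "%H:%M") for t, _ in schedule]
    return [
        (name, int((nxt - cur).total_seconds() // 60))
        for (_, name), cur, nxt in zip(schedule, starts, starts[1:])
    ]


for name, minutes in block_minutes(SCHEDULE):
    print(f"{name}: {minutes} min")
```

The three core labs come out at 90, 75, and 90 minutes, which is the time the hard stops are protecting.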
Per-module checklist (repeat every module)
Module-specific notes:
| Module | Trainer note |
|---|---|
| Story | Capture design whiteboard — revisit at 16:30 |
| 2 | Sofia voice: prototype before SQL simplification |
| 3 | “Same architecture. Different implementation philosophy.” |
| 4 | Elena: dbt on Snowflake |
| 6 | Do not demo ML.FORECAST |
| 7 | Theory ≤5 min · open discussion guide |
Per-module UI checkpoints (co-trainer verifies before each module)
| Module | Attendee UI should show | Co-trainer check |
|---|---|---|
| 2 Databricks | Cluster Running (green dot) in Compute page; notebooks visible in Workspace | Confirm all attendee clusters started; notebooks accessible in Shared folder |
| 3 Snowflake | Snowsight open on attendee’s own trial account; 00_account_setup.sql completed; warehouse Started | Walk around: confirm each attendee has DE_MASTERCLASS database and DE_WORKSHOP_ROLE created; SAS token distributed and working |
| 4 dbt | Terminal open in Codespaces or Docker with dbt_project/ directory; dbt debug --target snowflake passing | Walk around: check terminals for green All checks passed!; confirm profiles.yml uses DE_WORKSHOP_ROLE and DE_MASTERCLASS |
| 5 Production | Jobs & Pipelines page accessible in Databricks (formerly Workflows → Delta Live Tables; renamed to Lakeflow Declarative Pipelines); Snowflake worksheets with Task SQL ready | Pre-create one Lakeflow pipeline as demo; verify TASK_HISTORY() returns data |
| 6 AI Features | Genie icon visible in Databricks sidebar (under SQL section); Snowflake worksheets ready for Cortex SQL | Confirm AI_COMPLETE returns results (run test query); Genie page loads |
| 7 Wrap-up | No portal needed — whiteboard and discussion only | Print architecture decision matrix handouts |
Module 6 — Databricks AI prerequisites
Complete Step 10 and Step 10b during pre-class setup. Co-trainer verifies the table below before Module 6 (after attendees have Gold tables from Modules 2–4).
| Check | Pass criteria |
|---|---|
| Partner-powered AI | On at account + workspace (Step 10b) |
| SQL warehouse | de-workshop-wh (Pro or Serverless) Started; trainees have Can use |
| Databricks SQL entitlement | Enabled for workshop_trainees_2026 (Step 4) |
| UC data access | SELECT on {attendee_id}_gold / kpi_* tables |
| Gold tables exist | 03_gold_kpis.py completed — Module 6 builds on Gold, not Bronze |
| Assistant | Ctrl+I in a notebook returns a code suggestion |
| ai_query() | Test query on de-workshop-wh returns a response (model name verified) |
| Genie | Sidebar icon loads; trainer test space answers a question on Gold KPIs |
| Snowflake Cortex | AI_COMPLETE test query returns on trainer Snowflake account |
Azure-specific note: ai_query() on Pro SQL warehouses requires Azure Private Link enabled for the workspace. If the test query fails with a Private Link error, use a Serverless SQL warehouse instead or work with MHP IT to enable Private Link.
Throughput (classroom scale): Genie defaults to ~20 questions/min per workspace, sufficient for ~20–30 trainees. Do not run a Genie stress test during the module.
Per-module UI checkpoints for Modules 8 and 9:
| Module | Attendee UI should show | Co-trainer check |
|---|---|---|
| 8 Streaming | Databricks cluster with Kafka Maven libs installed; Snowflake warehouse running | Verify Kafka libs on cluster (Compute → Libraries tab); Aiven topic has events flowing |
| 9 ML | Databricks AI/ML → Experiments page accessible; Snowflake worksheets ready for ML.FORECAST | Confirm USE AI FUNCTIONS privilege + CORTEX_USER role granted; ML Runtime cluster available |
Module 7 runbook (30 min block)
| Min | Activity | Doc |
|---|---|---|
| 0–3 | Animation mod-07-wrapup.mp4 | voiceovers |
| 3–5 | Silent write + decision matrix | matrix |
| 5–10 | Short theory: Objectives, PBI demo notes, When to Use What | Module 7 |
| 10–28 | Open discussion (Rounds 1–4) | discussion guide |
| 28–30 | Close: three constraints + “Technology is a decision…” | |
| +10 | Optional Power BI live demo | § Power BI — Service or Desktop |
If running PBI demo before discussion, cut Round 3 synthesis to 5 min.
Common classroom fixes
| Situation | Response |
|---|---|
| Pair stuck on Bronze ingest | Point to checkpoint data / co-trainer pairs in |
| “dbt replaces Snowflake” | Draw platform box; dbt inside as transform layer |
| Reflection runs long | 5-min timer; capture 3 bullets max on whiteboard |
| Running late in Module 2–4 | Cut discussion to 5 min — never cut lab |
| Vendor debate in Module 7 | “We’re advising Marcus, not picking a winner for MHP.” |
| Attendee cannot find notebooks in Databricks | Confirm you Shared MHP-DE-Workshop-2026 → workshop_trainees_2026 Can Run; guide to Workspace → Users → <trainer-email> → MHP-DE-Workshop-2026 |
| Trainee cannot Create compute / no Workshop policy | Grant Workshop policy Can use to workshop_trainees_2026 (Step 6) |
| Module 8: PermissionDenied on dbutils.secrets.get | Run secrets put-acl READ for workshop_trainees_2026 on workshop-scope (Step 11) |
| Attendee cannot change ATTENDEE_ID | Shared folder is read-only; Clone 00_setup.py to Home |
| Trainees see old notebook content after GitHub update | Git folders do not auto-sync — both trainers must Git → Pull on MHP-DE-Workshop-2026; trainees on shared folder inherit trainer’s checkout |
| “Environment configurations are not saved” on Git .py notebook | Expected until PEP 723 metadata is in the file; repo notebooks include it after pull. Tell attendees to attach a cluster (not Serverless); do not add PySpark in the Environment panel |
| Attendee’s Databricks cluster won’t start | Check Compute page for error message; try Restart; if stuck >5 min, assign a buddy cluster |
| Snowflake trial signup fails | Suggest using a different email; check spam folder for verification email; trial creation can take 5–10 min |
| Snowflake 00_account_setup.sql fails | Check attendee is using ACCOUNTADMIN role (default for trial); confirm SET attendee_id = '...' was run first |
| Snowflake DE_WORKSHOP_ROLE not found | The role is created by 00_account_setup.sql; re-run the script, or create it manually: CREATE ROLE DE_WORKSHOP_ROLE; |
| Snowflake External Stage cannot list files | SAS token may be expired or have extra spaces; re-copy from trainer handout; verify stage URL matches mhpdeworkshopsa.blob.core.windows.net |
| Snowflake warehouse shows “Suspended” | Click warehouse name at top-right → click Resume; wait ~10 seconds for Started status |
| Snowflake worksheet shows “No results” | Check session variable: SELECT $attendee_id; — if null, re-run the SET statement at the top |
| dbt debug fails with connection error | Check profiles.yml: confirm database: DE_MASTERCLASS, role: DE_WORKSHOP_ROLE, and correct Snowflake account/user/password |
| Databricks Experiments page is empty | The experiment appears after the first mlflow.start_run() call — run the training notebook first |
| Databricks “Cannot see catalog” error | Re-run GRANT on mhpdeworkshop_databricks_2026 from Step 5; or Catalog → Permissions UI |
| Git folder clone fails / repo too large | Enable sparse checkout with cone pattern databricks/notebooks only; see Git folders |
| Git folder commit rejected | New folder outside cone pattern — add pattern under Git folder Settings → Advanced → Cone patterns |
| Databricks CLI 403 Forbidden | PAT may be expired or lack admin scope; generate a new token under Settings → User Settings → Access Tokens; ensure workspace has admin consent for CLI apps |
| Module 8: workshop-scope secrets scope missing | Trainer must create scope before class (see Pre-class setup); or guide attendee: databricks secrets create-scope --scope workshop-scope |
| ML.FORECAST returns error in Snowflake | Verify GOLD_TRIPS_BY_HOUR table exists and has PICKUP_HOUR_TS timestamp column; check Cortex role |
| Genie icon visible but spaces won’t open | Enable Partner-powered AI features at account level — see Step 10b |
| ai_query() fails or “not supported” | Attach Pro or Serverless warehouse (not Classic); on Azure Pro, check Private Link or switch to Serverless |
| ai_query() model not found | List foundation models in workspace; update endpoint name in lab (see Module 6 model drift callout) |
| Genie returns empty / permission error | Grant SELECT on Gold kpi_* tables in UC; confirm Genie space default warehouse is de-workshop-wh |
| Genie Code / Assistant missing in notebooks | Confirm Partner-powered AI features = On (workspace + account); try Ctrl+I in a notebook — no separate assistive toggle in current UI |
| Power BI cannot connect to Snowflake | Verify warehouse is Started; check server URL matches <account>.snowflakecomputing.com; use DirectQuery mode |
End-of-day close (2 min script)
One dataset. Three implementations. Priya’s dashboard didn’t care which engine built Gold.
Three constraints — cost, performance, compliance. Three decisions — platform, transform, consumption.
Look at your Story sketch. You weren’t wrong to guess. Now you’ve proved it in code.
Technology is a decision. Architecture is responsibility.
Optional: 1–5 finger poll — “I could defend my tool choice to a client.”
After class
| Task | Owner |
|---|---|
| Save Story + Module 7 whiteboard photos | Lead |
| Note timing overruns for next delivery | Both |
| Log environment issues (catalog, warehouse, dbt target) | Co-trainer |
| Share tool comparison deep dive link for self-study | Lead |
Full dry-run checklist: docs/dry-run-checklist.md · pre-class-checklist.qmd
Document history
| Date | Change |
|---|---|
| 2026-06-05 | Git folder sync: manual Pull required (Step 3b); cluster vs Serverless; environment banner |
| 2026-06-05 | Added Power BI cloud workspace publish workflow for trainers (Desktop build → Service demo); aligned with five-page / 12-KPI exercise |
| 2026-06-05 | Updated per-module checklist for five-step rhythm; fixed aiven-kafka → workshop-scope secret scope |
| 2026-06-05 | Databricks + ADLS2 both documented under RG 1000_data_engineering_workshop |
| 2026-06-05 | ADLS2: direct Azure Portal link to RG 1000_data_engineering_workshop (mhpdev.onmicrosoft.com) |
| 2026-06-05 | ADLS2: document pre-provisioned mhpdeworkshopsa (RG 1000_data_engineering_workshop, Germany West Central) |
| 2026-06-05 | ADLS2: TLC download + upload to raw/trips/ and raw/lookup/ before key/SAS distribution |
| 2026-06-05 | ADLS2: Azure Portal steps for storage key + container SAS; two-trainer Git folder model |
| 2026-06-05 | Groups workshop_trainees_2026 / workshop_trainer_2026; Step 7 trainee self-create cluster; SELECT + secrets READ ACL; entitlements in Step 4 |
| 2026-06-05 | Added Module 6 Databricks AI prerequisites (Steps 10/10b, pre-class checks, classroom fixes) |
| 2026-06-05 | Step 10b: Partner-powered AI is sole workspace toggle; Genie Code verify checklist (no separate assistive toggle) |
| 2026-06-05 | Trainer IDs 00_{firstname} (2026: 00_juewei/00_alisa); reusable naming convention for other cohorts |
| 2026-06-05 | Two-trainer model, mandatory Git folder share to trainees; Cloudflare URL |
| 2026-06-05 | Expanded Databricks workspace admin guide (mhpdeworkshop_databricks, Git folders, UC grants); Cloudflare production URL |
| 2026-06-04 | Added pre-class infrastructure setup section (Databricks workspace, ADLS2, Aiven, Power BI); added Unity Catalog GRANT statements, cluster creation guide, Databricks CLI auth, SQL Warehouse setup |
| 2026-05-24 | Initial day-of facilitator runbook |