Prerequisites
What you need before the training day
Before You Start — 3 Steps
Follow these steps before the workshop day. They take about 10–15 minutes.
Step 1: Fork the Repository
- Open github.com/jinjuewei/MHPDataEngineerWorkshop
- Click Fork (top-right) → keep the defaults → Create fork
- You now have your own copy at
github.com/<your-username>/MHPDataEngineerWorkshop
Why fork? Your fork is your personal working copy. You can commit changes, push code, and open Codespaces from it without affecting other trainees.
Step 2: Open the Training Website
The training site (module content, setup guides, exercises) is built into the repository. Open it from your fork using GitHub Codespaces — no installation needed.
- Go to your fork on GitHub → Code → Codespaces → Create codespace on main
- Wait ~2 minutes for the container to build
- In the Codespace terminal:
cd workshop-2026-v2/_site
python -m http.server 8000
- Codespaces auto-forwards port 8000. Click Open in Browser in the notification.
Trainer-provided URL (recommended): https://mhp-data-engineer-2026.pages.dev/ — open directly in your browser; enter the access token your trainer shares. Mirror: Vercel.
Local machine (no Quarto needed):
cd workshop-2026-v2/_site
python -m http.server 8000
# Open http://localhost:8000
GitHub Pages (public repos only):
1. Fork → Settings → Pages
2. Branch: main, Folder: /workshop-2026-v2/_site → Save
3. Live at https://<your-username>.github.io/MHPDataEngineerWorkshop
Step 3: Open Your Working Environment
Your Codespace from Step 2 is already your working environment — all tools are pre-installed at /workspace/. The only remaining step is configuring your credentials (see Configure Credentials below).
| Tool | Status |
|---|---|
| Python 3.13+ | ✅ pre-installed |
| dbt Core 1.8+ | ✅ pre-installed |
| Git | ✅ pre-installed |
| VS Code | ✅ browser-based |
Local machine (alternative to Codespaces):
git clone https://github.com/<your-username>/MHPDataEngineerWorkshop.git
cd MHPDataEngineerWorkshop
# Install dependencies
uv pip install --system .
# Verify dbt
dbt --version
Accounts & Access
Databricks
Snowflake
Workshop Credentials (Provided by Trainer)
dbt
- Codespaces: pre-installed ✅ — no action needed
- Local: Python 3.13+ and dbt Core 1.8+ required (check with dbt --version)
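For local setups, the two version requirements above can be checked with a short script. This is a minimal sketch: the helper names are hypothetical, and SAMPLE_OUTPUT only imitates the assumed shape of `dbt --version` output, so adjust the pattern if your output looks different.

```python
import re
import sys
from typing import Optional, Sequence, Tuple

MIN_PYTHON = (3, 13)
MIN_DBT_CORE = (1, 8)


def parse_dbt_core_version(version_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the first 'installed: X.Y' entry, or None."""
    match = re.search(r"installed:\s*(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(found: Optional[Sequence[int]], minimum: Tuple[int, int]) -> bool:
    """Compare the (major, minor) part of a version against a minimum."""
    return found is not None and tuple(found[:2]) >= minimum


# Illustrative text only: an assumed shape of `dbt --version` output.
SAMPLE_OUTPUT = """Core:
  - installed: 1.8.3
  - latest:    1.8.3 - Up to date!
"""

print("Python OK:", meets_minimum(sys.version_info, MIN_PYTHON))
print("dbt Core OK:", meets_minimum(parse_dbt_core_version(SAMPLE_OUTPUT), MIN_DBT_CORE))
```

To check a real installation, pass the text printed by `dbt --version` to parse_dbt_core_version instead of SAMPLE_OUTPUT.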
Power BI (optional — self-paced after Module 4)
Not required for the main workshop day. Priya’s dashboard exercise connects to your Gold kpi_* tables after the pipeline labs.
| Requirement | Notes |
|---|---|
| Windows PC | Power BI Desktop is Windows only — free download |
| Microsoft account | Optional — only if you publish to My Workspace (free personal workspace) |
| Gold KPI tables | Complete Modules 2–4 exercises first (Databricks / Snowflake + dbt) |
| Same warehouse credentials | Snowflake login or Databricks PAT — already in your .env |
Module 7: Trainer may demo a pre-built dashboard. Trainees without Windows can follow the story and read the Power BI dashboard guide on this site.
Configure Credentials
Where to run these commands?
- Codespaces users: run everything below in the Codespace terminal (the VS Code terminal inside your browser-based Codespace, not your local machine). Your Codespace is your working environment.
- Local machine users: run everything in your local terminal after cd MHPDataEngineerWorkshop.
1. Environment Variables (.env)
Many scripts and notebooks read credentials from environment variables. Set them up first:
# Run this in your Codespace terminal (or local terminal)
cp .env.template .env
This creates a personal .env file inside your Codespace (or local repo). It will not be uploaded to GitHub because .gitignore keeps it private.
Open .env in the Codespace editor and replace every placeholder with your real credentials:
Variables you must fill in
| Variable | Where to find it | What it’s used for |
|---|---|---|
| ATTENDEE_ID | Provided by trainer (e.g., 01_alice) | All modules: creates your personal schemas (01_alice_BRONZE, _SILVER, _GOLD) in Databricks, Snowflake, and dbt so each trainee works in isolation |
| DATABRICKS_HOST | Your Databricks workspace URL (without https://) | Modules 2, 4, 8: connects Python scripts, dbt, and the Databricks CLI to your workspace |
| DATABRICKS_TOKEN | Databricks PAT: User Settings → Access Tokens | Modules 2, 4, 8: authenticates API calls and dbt runs against your Databricks workspace |
| DATABRICKS_HTTP_PATH | SQL Warehouse HTTP path: SQL Warehouses → Connection details | Module 4: dbt connects to the SQL Warehouse to run models on Databricks |
| SNOWFLAKE_ACCOUNT | Your Snowflake account URL (e.g., abc12345.west-europe.azure) | Modules 3, 4, 9: connects Snowpark Python scripts and dbt to your Snowflake trial |
| SNOWFLAKE_USER | Your Snowflake login username | Modules 3, 4, 9: Snowflake authentication |
| SNOWFLAKE_PASSWORD | Your Snowflake login password | Modules 3, 4, 9: Snowflake authentication |
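Once the file is edited, a quick check can list the required variables that are still missing or unfilled. The sketch below assumes simple KEY=VALUE lines and assumes that unfilled placeholders are either empty or wrapped in angle brackets; the actual .env.template may use a different convention, and the function names are hypothetical.

```python
REQUIRED_VARS = [
    "ATTENDEE_ID",
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
    "DATABRICKS_HTTP_PATH",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def looks_unfilled(value: str) -> bool:
    """Assumed placeholder convention: empty, or wrapped in <angle brackets>."""
    return value == "" or (value.startswith("<") and value.endswith(">"))


def unfilled_vars(env: dict) -> list:
    """Return required variable names that are missing or still placeholders."""
    return [name for name in REQUIRED_VARS if looks_unfilled(env.get(name, ""))]


# Example content only; these are not real credentials.
example = """
# Workshop settings
ATTENDEE_ID=01_alice
DATABRICKS_HOST=adb-1234567890.1.azuredatabricks.net
DATABRICKS_TOKEN=<your-token>
SNOWFLAKE_ACCOUNT=abc12345.west-europe.azure
"""
print(unfilled_vars(parse_env(example)))
# → ['DATABRICKS_TOKEN', 'DATABRICKS_HTTP_PATH', 'SNOWFLAKE_USER', 'SNOWFLAKE_PASSWORD']
```

To check your own file, read it with `pathlib.Path(".env").read_text()` and pass the result to parse_env.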
DATABRICKS_HOST
1. Log in to your Databricks workspace in the browser
2. Copy the URL from the address bar. It looks like https://adb-1234567890.1.azuredatabricks.net
3. Remove the https:// prefix → paste adb-1234567890.1.azuredatabricks.net
DATABRICKS_TOKEN
1. In your Databricks workspace, click your username/email (top-right corner)
2. Select Settings
3. Go to the Developer tab
4. Next to Access tokens, click Manage
5. Click Generate new token
6. Give it a name (e.g., workshop), set lifetime to 90 days
7. Click Generate → copy the token immediately (it’s only shown once!)
DATABRICKS_HTTP_PATH
1. In the Databricks sidebar, click SQL Warehouses
2. Click the warehouse name (trainer will tell you which one, e.g., de-workshop-wh)
3. Click the Connection details tab
4. Copy the HTTP Path value. It looks like /sql/1.0/warehouses/abc123def456
SNOWFLAKE_ACCOUNT
1. Log in to Snowsight
2. Click your account name (bottom-left corner)
3. Select View account details
4. In the Account Details dialog, find Account URL
5. Copy just the domain part without https:// and without .snowflakecomputing.com
- Example full URL: https://abc12345.west-europe.azure.snowflakecomputing.com
- What to paste in .env: abc12345.west-europe.azure
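Tip: if you can’t find it, run this SQL in a Snowsight worksheet to see your account locator and region. Snowflake reports the region in a form such as AZURE_WESTEUROPE, so use the result to cross-check the value from the Account URL rather than pasting it directly: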
SELECT CURRENT_ACCOUNT() || '.' || CURRENT_REGION();
SNOWFLAKE_USER / SNOWFLAKE_PASSWORD - These are the credentials you created when you signed up for the Snowflake trial at signup.snowflake.com
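The clean-up rules above for DATABRICKS_HOST and SNOWFLAKE_ACCOUNT can be sketched as two small helpers. The function names are hypothetical. Dropping a trailing path from the copied URL is an extra assumption, not part of the steps above.

```python
def strip_scheme_and_path(url: str) -> str:
    """Remove a leading http(s):// and anything after the host name."""
    host = url.strip()
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    # Assumption: a URL copied from the address bar may carry a path or query.
    return host.split("/", 1)[0]


def normalize_databricks_host(url: str) -> str:
    """Value for DATABRICKS_HOST: the workspace host without https://."""
    return strip_scheme_and_path(url)


def normalize_snowflake_account(url: str) -> str:
    """Value for SNOWFLAKE_ACCOUNT: host without .snowflakecomputing.com."""
    account = strip_scheme_and_path(url)
    suffix = ".snowflakecomputing.com"
    if account.endswith(suffix):
        account = account[: -len(suffix)]
    return account


print(normalize_databricks_host("https://adb-1234567890.1.azuredatabricks.net"))
# → adb-1234567890.1.azuredatabricks.net
print(normalize_snowflake_account("https://abc12345.west-europe.azure.snowflakecomputing.com"))
# → abc12345.west-europe.azure
```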
Pre-filled defaults (no change needed)
These values are already correct — they match what snowflake/setup/00_account_setup.sql creates:
| Variable | Default value | What it’s used for |
|---|---|---|
| SNOWFLAKE_WAREHOUSE | DE_WORKSHOP_WH | The compute warehouse created by the setup script, used by all Snowflake queries |
| SNOWFLAKE_DATABASE | DE_MASTERCLASS | The main database that holds your Bronze/Silver/Gold schemas |
| SNOWFLAKE_SCHEMA | PUBLIC | Default schema (your actual work goes to {ATTENDEE_ID}_SILVER etc.) |
| SNOWFLAKE_ROLE | DE_WORKSHOP_ROLE | The role with permissions to read stages and write to your schemas |
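The {ATTENDEE_ID}_SILVER naming mentioned in the table follows a simple pattern. A minimal sketch, assuming the three layer suffixes named earlier on this page and a hypothetical helper name:

```python
LAYERS = ("BRONZE", "SILVER", "GOLD")


def personal_schemas(attendee_id: str) -> list:
    """Build the per-trainee schema names, one for each layer."""
    return [f"{attendee_id}_{layer}" for layer in LAYERS]


print(personal_schemas("01_alice"))
# → ['01_alice_BRONZE', '01_alice_SILVER', '01_alice_GOLD']
```

In a real script the ID would typically come from the environment, for example `os.environ["ATTENDEE_ID"]`, once .env has been loaded.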
Important: .env is listed in .gitignore, so it will never be committed to Git. Never share this file.
Codespaces users — the Codespace starts before you create .env, so the variables are not loaded yet. You have two options:
Option A (quick): Run the setup script now to load everything immediately:
bash .devcontainer/setup-environment.sh
This loads .env, generates ~/.dbt/profiles.yml, and shows ✅ for each variable.
Option B (persistent): Restart the Codespace (⌘/Ctrl+Shift+P → Codespaces: Restart). On restart, .env is loaded automatically into all terminal tabs — no manual step needed.
2. dbt Profiles
~/.dbt/profiles.yml is generated automatically when you either:
- Run bash .devcontainer/setup-environment.sh (Option A above), or
- Restart the Codespace (Option B above)
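To get a feel for what such a generated profile might contain, the sketch below assembles a profiles.yml-shaped structure from .env-style values. It is not the workshop's setup script. The profile name `workshop`, the `databricks` target name, and the choice of the _SILVER schema are assumptions for illustration; the field names follow the usual dbt Snowflake and Databricks connection settings.

```python
import json


def build_profiles(env: dict) -> dict:
    """Assemble a profiles.yml-shaped dict from .env-style values (illustrative)."""
    schema = f"{env['ATTENDEE_ID']}_SILVER"  # assumed default schema
    return {
        "workshop": {  # hypothetical profile name
            "target": "snowflake",
            "outputs": {
                "snowflake": {
                    "type": "snowflake",
                    "account": env["SNOWFLAKE_ACCOUNT"],
                    "user": env["SNOWFLAKE_USER"],
                    "password": env["SNOWFLAKE_PASSWORD"],
                    "role": env.get("SNOWFLAKE_ROLE", "DE_WORKSHOP_ROLE"),
                    "warehouse": env.get("SNOWFLAKE_WAREHOUSE", "DE_WORKSHOP_WH"),
                    "database": env.get("SNOWFLAKE_DATABASE", "DE_MASTERCLASS"),
                    "schema": schema,
                },
                "databricks": {  # assumed target name
                    "type": "databricks",
                    "host": env["DATABRICKS_HOST"],
                    "http_path": env["DATABRICKS_HTTP_PATH"],
                    "token": env["DATABRICKS_TOKEN"],
                    "schema": schema,
                },
            },
        }
    }


# Example values only; never print or commit real credentials.
example_env = {
    "ATTENDEE_ID": "01_alice",
    "SNOWFLAKE_ACCOUNT": "abc12345.west-europe.azure",
    "SNOWFLAKE_USER": "example-user",
    "SNOWFLAKE_PASSWORD": "example-only",
    "DATABRICKS_HOST": "adb-1234567890.1.azuredatabricks.net",
    "DATABRICKS_HTTP_PATH": "/sql/1.0/warehouses/abc123def456",
    "DATABRICKS_TOKEN": "example-only",
}
profiles = build_profiles(example_env)
print(json.dumps(profiles, indent=2))
```

Comparing this shape with your real ~/.dbt/profiles.yml can help spot a variable that was left empty when the file was generated.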
Verify it:
dbt debug --target snowflake
Local machine:
cd dbt_project/
cp profiles.yml.example profiles.yml
# Edit profiles.yml with your Databricks and Snowflake credentials
dbt debug # verify connection
Troubleshooting
If you have trouble, see the tool-specific setup guides:
- Databricks Setup
- Snowflake Setup
- dbt Setup
- Power BI Setup (optional, after Module 4)
Day-of Checklist
Before the training starts, verify: