Databricks Setup Guide
Workspace configuration and environment verification
You will run notebooks inside your Databricks workspace (browser-based). The code files live in your forked repository — access them via GitHub Codespaces (all tools pre-installed) or clone locally. Either way, notebooks need to be imported or synced into Databricks using one of the options below.
Access
- You will receive a workspace invite link from the trainer
- Click the link and log in with your credentials
- Navigate to Catalog → verify you can see `mhpdeworkshop_databricks_2026`
Notebooks
The trainer’s reference notebooks are in the trainer’s Home folder, shared with you with CAN RUN access.
To find them: click Workspace in the sidebar → Users → juewei.jin@mhp.com → MHP-DE-Workshop-2026. You can open and run notebooks directly from there — it is read-only for trainees (view and run only, no edit or delete).
To have your own editable copy in your home folder, choose one of the options below.
Option A — Git folder (recommended)
1. In the sidebar, click Home to open your personal folder (`/Users/<your-email>/`).

   Shortcut via Repos: If you see Repos in the sidebar, clicking it shows a prompt: “You can now create Git folders outside the Repos folder. Go to home folder and create Git folder.” Click Go to home folder, and Databricks opens your Home folder with the Git folder dialog automatically. Skip to step 3.
2. Click ⋮ → Create → Git folder.
3. Fill in the dialog:

   | Field | Value |
   |---|---|
   | Git repository URL | `https://github.com/<YOUR-USERNAME>/MHPDataEngineerWorkshop.git` (your fork) |
   | Git provider | GitHub |
   | Git folder name | `MHP-DE-Workshop-2026` |
   | Sparse checkout mode | ✅ Enable |
   | Cone patterns | `databricks/notebooks` |
4. Click Create Git folder. The notebooks appear under your home folder. Changes you make stay in your copy and don’t affect other trainees.
GitHub credentials: If prompted, link your GitHub account first: click your username → Settings → Linked accounts → Add Git credential.
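The dialog values in Option A can also be written down as a request body, which is handy for checking them in one place. The sketch below only builds and prints that body. It assumes the shape accepted by the Databricks Repos REST API (`POST /api/2.0/repos`), which this guide does not itself use, and `octocat` and `alice@example.com` are placeholder values.

```python
import json


def git_folder_payload(github_username: str, workspace_email: str) -> dict:
    """Build a request body mirroring the Git folder dialog values.

    Assumption: field names follow the Databricks Repos REST API
    (url, provider, path, sparse_checkout.patterns). Nothing is sent here.
    """
    return {
        "url": f"https://github.com/{github_username}/MHPDataEngineerWorkshop.git",
        "provider": "gitHub",
        "path": f"/Users/{workspace_email}/MHP-DE-Workshop-2026",
        # Cone pattern from the dialog: only this folder is checked out.
        "sparse_checkout": {"patterns": ["databricks/notebooks"]},
    }


# Placeholder username and email, for illustration only.
payload = git_folder_payload("octocat", "alice@example.com")
print(json.dumps(payload, indent=2))
```

The UI route described above remains the supported path for the workshop; treat this only as a compact summary of the same settings.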
Option B — Manual import (fallback)
Use this if GitHub access is unavailable.
- Download the `.py` notebook files from the repository: `databricks/notebooks/`
- In the sidebar, click Workspace → navigate to your home folder
- Click ⋮ → Import → File
- Upload each `.py` file (`00_setup.py` through `04_ai_features.py`)
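Before uploading, it can help to confirm that each downloaded file is in Databricks source format, since a `.py` file without the notebook header may be imported as a plain file rather than a notebook. The check below is a minimal sketch. It assumes the workshop files start with the standard `# Databricks notebook source` header line, which this guide does not state explicitly.

```python
NOTEBOOK_HEADER = "# Databricks notebook source"


def is_databricks_notebook_source(text: str) -> bool:
    """Return True if the first line of `text` is the notebook header."""
    lines = text.splitlines()
    return bool(lines) and lines[0].strip() == NOTEBOOK_HEADER


# Illustrative file contents, not the real workshop notebooks.
good = "# Databricks notebook source\nATTENDEE_ID = '01_alice'\n"
bad = "ATTENDEE_ID = '01_alice'\n"
print(is_databricks_notebook_source(good), is_databricks_notebook_source(bad))
```

To apply it to a downloaded file, pass in the file's contents, for example `Path("00_setup.py").read_text(encoding="utf-8")` with `pathlib.Path`.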
Running the Setup Notebook
1. Open `databricks/notebooks/00_setup.py`
2. Critical: Set your `ATTENDEE_ID` in the first cell:

       ATTENDEE_ID = "01_alice"  # Replace with your assigned ID
3. Run all cells
4. Verify in Catalog that three schemas were created:
   - `{attendee_id}_bronze`
   - `{attendee_id}_silver`
   - `{attendee_id}_gold`
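The schema check can also be done in code rather than by eye. The following is a minimal sketch that derives the three expected schema names from an attendee ID and reports any that are missing. It assumes the setup notebook uses the ID exactly as typed (no lowercasing or other normalisation), and the helper names are hypothetical.

```python
LAYERS = ("bronze", "silver", "gold")


def expected_schemas(attendee_id: str) -> list[str]:
    """Schema names the setup notebook is expected to create."""
    return [f"{attendee_id}_{layer}" for layer in LAYERS]


def missing_schemas(attendee_id: str, existing: list[str]) -> list[str]:
    """Expected schemas that do not appear in `existing`."""
    present = set(existing)
    return [name for name in expected_schemas(attendee_id) if name not in present]


# In a notebook, `existing` could be collected with something like:
#   [row[0] for row in
#    spark.sql("SHOW SCHEMAS IN mhpdeworkshop_databricks_2026").collect()]
existing = ["01_alice_bronze", "01_alice_silver"]
print(missing_schemas("01_alice", existing))  # → ['01_alice_gold']
```

An empty list means all three schemas are present.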
Cluster Configuration
The setup notebook includes Spark configuration. If you need to create a cluster manually:
- Runtime: 15.4 LTS or later
- Node type: Standard_DS3_v2 (or equivalent)
- Workers: 1 (sufficient for workshop data volume)
- Auto-terminate: 30 minutes
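For reference, the same settings can be expressed as a cluster specification. This is a sketch only: the field names follow the Databricks Clusters API, the cluster name is hypothetical, and the `spark_version` string is an assumed key for the 15.4 LTS runtime that should be confirmed against the runtime list in your workspace.

```python
import json

cluster_spec = {
    "cluster_name": "mhp-de-workshop",    # hypothetical name
    "spark_version": "15.4.x-scala2.12",  # assumed key for 15.4 LTS
    "node_type_id": "Standard_DS3_v2",    # or an equivalent node type
    "num_workers": 1,                     # sufficient for workshop data volume
    "autotermination_minutes": 30,
}

print(json.dumps(cluster_spec, indent=2))
```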
ADLS2 Access
The trainer will provide the storage account key during the session. The setup notebook configures Spark to access ADLS2:
    spark.conf.set(
        f"fs.azure.account.key.{STORAGE_ACCOUNT}.dfs.core.windows.net",
        STORAGE_ACCOUNT_KEY
    )

The storage account key is sensitive. Never commit it to Git. It will be provided verbally during the training and configured in the notebook.
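Because a mistyped or empty key is a likely cause of "access denied" errors, a small guard around the configuration call can fail early with a clearer message. The sketch below is one possible way to do that. The helper names are hypothetical, and a plain dictionary stands in for `spark.conf` so the example runs outside Databricks.

```python
def account_key_conf_name(storage_account: str) -> str:
    """Spark configuration key used for an ADLS2 account key."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


def configure_adls_key(set_conf, storage_account: str, storage_account_key: str) -> str:
    """Validate the key, then pass it to `set_conf(name, value)`.

    `set_conf` is any two-argument setter; in a notebook this would be
    `spark.conf.set`. Returns the configuration key that was set.
    """
    if not storage_account_key or not storage_account_key.strip():
        raise ValueError("STORAGE_ACCOUNT_KEY is empty; ask the trainer for the key")
    name = account_key_conf_name(storage_account)
    # Strip stray whitespace that can sneak in when the key is typed or pasted.
    set_conf(name, storage_account_key.strip())
    return name


# Stand-in for spark.conf, with placeholder account name and key.
fake_conf = {}
configure_adls_key(fake_conf.__setitem__, "examplestorage", "  not-a-real-key  ")
print(fake_conf)
```

In the notebook itself the call would look like `configure_adls_key(spark.conf.set, STORAGE_ACCOUNT, STORAGE_ACCOUNT_KEY)`, with the key typed into the cell for the session and cleared before any commit.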
Troubleshooting
| Issue | Solution |
|---|---|
| Cannot see catalog | Ask trainer to grant access to mhpdeworkshop_databricks_2026 |
| Cluster won’t start | Check workspace quotas; try a smaller node type |
| ADLS2 access denied | Verify STORAGE_ACCOUNT_KEY is set correctly |
| Git folder fails | Use Option B — download .py files from GitHub and import manually into your home folder |