Databricks Setup Guide

Workspace configuration and environment verification


Tip: Working Environment

You will run notebooks inside your Databricks workspace (browser-based). The code files live in your forked repository — access them via GitHub Codespaces (all tools pre-installed) or clone locally. Either way, notebooks need to be imported or synced into Databricks using one of the options below.

Access

  1. You will receive a workspace invite link from the trainer
  2. Click the link and log in with your credentials
  3. Navigate to Catalog → verify you can see mhpdeworkshop_databricks_2026

Notebooks

The trainer’s reference notebooks are in the trainer’s Home folder, shared with you with CAN RUN permission.

To find them: click Workspace in the sidebar → Users → juewei.jin@mhp.com → MHP-DE-Workshop-2026. You can open and run notebooks directly from there; the folder is read-only for trainees (view and run only, no edit or delete).

To have your own editable copy in your home folder, choose one of the options below.

Option B — Manual import (fallback)

Use this if GitHub access is unavailable.

  1. Download the .py notebook files from the repository folder databricks/notebooks/
  2. In the sidebar, click Workspace → navigate to your home folder
  3. Click Import → File
  4. Upload each .py file (00_setup.py through 04_ai_features.py)

Running the Setup Notebook

  1. Open databricks/notebooks/00_setup.py

  2. Critical: Set your ATTENDEE_ID in the first cell:

    ATTENDEE_ID = "01_alice"  # Replace with your assigned ID

  3. Run all cells

  4. Verify in Catalog that three schemas were created:

    • {attendee_id}_bronze
    • {attendee_id}_silver
    • {attendee_id}_gold
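The schema check in step 4 can also be done from a notebook cell. The following is a minimal sketch, not part of the setup notebook: the helper names `expected_schemas` and `missing_schemas` are hypothetical, and it assumes the three schemas are created inside the mhpdeworkshop_databricks_2026 catalog.

```python
ATTENDEE_ID = "01_alice"  # same value you set in the first cell of 00_setup.py


def expected_schemas(attendee_id):
    """Build the three schema names the setup notebook is expected to create."""
    return [f"{attendee_id}_{layer}" for layer in ("bronze", "silver", "gold")]


def missing_schemas(attendee_id, existing):
    """Return the expected schema names that are not in `existing` (case-insensitive)."""
    present = {name.lower() for name in existing}
    return [s for s in expected_schemas(attendee_id) if s.lower() not in present]


# In a Databricks notebook (assumption: schemas live in this catalog):
# rows = spark.sql("SHOW SCHEMAS IN mhpdeworkshop_databricks_2026").collect()
# print(missing_schemas(ATTENDEE_ID, [row[0] for row in rows]))
```

An empty list means all three schemas are visible; any names returned are the ones still missing.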

Cluster Configuration

The setup notebook includes Spark configuration. If you need to create a cluster manually:

  • Runtime: 15.4 LTS or later
  • Node type: Standard_DS3_v2 (or equivalent)
  • Workers: 1 (sufficient for workshop data volume)
  • Auto-terminate: 30 minutes
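For reference, the settings above could be written as a cluster specification like the one below. This is a sketch under assumptions: the field names follow the Databricks Clusters API as commonly documented, the cluster name is hypothetical, and the runtime version key should be confirmed against the runtime list in your workspace.

```python
import json

# Cluster settings from the list above, expressed as a Clusters API-style payload.
cluster_spec = {
    "cluster_name": "workshop-cluster",    # hypothetical name, choose your own
    "spark_version": "15.4.x-scala2.12",   # assumed version key for 15.4 LTS
    "node_type_id": "Standard_DS3_v2",     # or an equivalent node type
    "num_workers": 1,                      # sufficient for workshop data volume
    "autotermination_minutes": 30,
}

print(json.dumps(cluster_spec, indent=2))
```

The same values can be entered by hand in the cluster creation form; the payload is only useful if you create clusters through the API or CLI.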

ADLS2 Access

The trainer will provide the storage account key during the session. The setup notebook configures Spark to access ADLS2:

spark.conf.set(
    f"fs.azure.account.key.{STORAGE_ACCOUNT}.dfs.core.windows.net",
    STORAGE_ACCOUNT_KEY
)

Warning: Credential Security

The storage account key is sensitive. Never commit it to Git. It will be provided verbally during the training and configured in the notebook.
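One way to reduce the risk of committing the key is to scan the notebook files before each commit. The sketch below is illustrative only: the function names are hypothetical, and it relies on the assumption that an Azure storage account key is an 88-character base64 string ending in `==`. It may miss keys in other formats and may flag unrelated base64 strings.

```python
import re
from pathlib import Path

# Assumed shape of a storage account key: 86 base64 characters followed by "==".
KEY_PATTERN = re.compile(r"[A-Za-z0-9+/]{86}==")


def find_possible_keys(text):
    """Return 1-based line numbers in `text` that contain a key-like string."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if KEY_PATTERN.search(line)
    ]


def scan_folder(folder):
    """Map each .py file under `folder` to the line numbers that look suspicious."""
    findings = {}
    for path in sorted(Path(folder).glob("*.py")):
        lines = find_possible_keys(path.read_text(encoding="utf-8"))
        if lines:
            findings[str(path)] = lines
    return findings


# Example, run from the repository root before committing:
# print(scan_folder("databricks/notebooks"))
```

An empty result does not prove the files are clean, so still review the diff before pushing.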

Troubleshooting

Issue                  Solution
Cannot see catalog     Ask trainer to grant access to mhpdeworkshop_databricks_2026
Cluster won’t start    Check workspace quotas; try a smaller node type
ADLS2 access denied    Verify STORAGE_ACCOUNT_KEY is set correctly
Git folder fails       Use Option B: download .py files from GitHub and import manually into your home folder
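For the "ADLS2 access denied" case, a quick sanity check on the key value can rule out copy or typing mistakes before asking the trainer. This is a minimal sketch with a hypothetical helper name. It only checks that the value is non-empty, has no stray whitespace, and is valid base64; it cannot tell whether the key is the right one for the storage account.

```python
import base64
import binascii


def key_problems(key):
    """Return a list of likely problems with a storage account key string."""
    if not key:
        return ["key is empty"]
    problems = []
    if key != key.strip():
        problems.append("leading or trailing whitespace")
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(key.strip(), validate=True)
    except binascii.Error:
        problems.append("not valid base64 (possibly truncated or mistyped)")
    return problems


# In the notebook, after STORAGE_ACCOUNT_KEY has been set:
# print(key_problems(STORAGE_ACCOUNT_KEY))
```

An empty list means the value is at least well-formed; if access is still denied, the key itself or the storage account name is the more likely cause.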