Lab · Snowflake (Codebase, optional)

Module 3 stretch — Snowflake CLI + Snowpark from local terminal

title: “Lab · Snowflake (Codebase, optional)” subtitle: “Module 3 stretch — Snowflake CLI + Snowpark from local terminal” —

ImportantCodebase paths vs main lab

The required lab is Exercise: Snowflake: A.1 (Snowsight SQL) and B.1 (Workspaces Notebook) — both run entirely inside Snowsight.

This optional page covers the codebase alternatives that run from a local terminal (Codespaces or your machine). These paths are common in production CI/CD environments.

Module 3 does not require this page. Try it if you have extra time after completing the main lab.

Module: 3 — Snowflake Pipeline · Required lab: Exercise: Snowflake · Setup: Snowflake

When to use each path

A.1 / B.1 — Snowsight A.2 / B.2 — Codebase
Where code runs Inside Snowflake (browser) Your machine / Codespaces
How you connect Git workspace + Snowflake session .env credentials
Setup needed Git sync only .env + pip install (or snow CLI pre-installed)
Best for Interactive development, workshop labs CI/CD pipelines, automation, terminal-first workflows
When in practice Module 3 labs, debugging, ad-hoc queries Production deployments, GitHub Actions, scheduled jobs
NoteWhy two compute models?
  • A.1 (Snowsight SQL) runs on a warehouse — same SQL engine as A.2.
  • B.1 (Workspaces Notebook) runs on Container Runtime (SPCS) via a compute pool — the notebook kernel executes inside Snowflake.
  • A.2 (Snowflake CLI) sends SQL to a warehouse via the snow CLI from your terminal.
  • B.2 (Snowpark Local) uses the Snowpark client library, which translates DataFrame operations to SQL and sends them to a warehouse. No compute pool needed.

Key insight: Snowsight paths (A.1, B.1) are managed by Snowflake — no credentials on your machine. Codebase paths (A.2, B.2) require .env with your Snowflake credentials — more setup, but full control for automation.

Prerequisites

  • Completed the main Snowflake lab (Bronze, Silver, Gold tables exist)
  • Role DE_WORKSHOP_ROLE, warehouse DE_WORKSHOP_WH Started

Configure .env

Both codebase paths need a local .env. Snowsight paths (A.1, B.1) do not — credentials are managed by Snowflake.

  1. Create .env from the template:

    cp .env.template .env
    copy .env.template .env
    cp .env.template .env
  2. Edit .env — in Codespaces: click .env in the Explorer sidebar (left) to open it in the built-in editor — fill in these values:

    Variable Value Where to find it
    ATTENDEE_ID e.g. de_01_alice Assigned by trainer — see My Workshop
    SNOWFLAKE_ACCOUNT e.g. el30551.west-europe.azure Snowsight URL: …/west-europe.azure/el30551/… → copy the two segments (no https://, no .snowflakecomputing.com)
    SNOWFLAKE_USER your Snowflake username Trial signup email
    SNOWFLAKE_PASSWORD your Snowflake password Trial signup

    Leave defaults for: SNOWFLAKE_WAREHOUSE=DE_WORKSHOP_WH, SNOWFLAKE_DATABASE=DE_MASTERCLASS, SNOWFLAKE_ROLE=DE_WORKSHOP_ROLE.

  3. Load into your terminal:

    source .env
    Get-Content .env | ForEach-Object { if ($_ -match '^([^#][^=]+)=(.*)$') { [System.Environment]::SetEnvironmentVariable($matches[1], $matches[2]) } }

You can skip Databricks variables (DATABRICKS_HOST, DATABRICKS_TOKEN, DATABRICKS_HTTP_PATH) — they are only needed for Module 4 (dbt).

NoteHow .env powers each codebase path
  • A.2 (Snowflake CLI): setup-environment.sh reads .env and generates ~/.snowflake/config.toml — the CLI picks up connection details automatically.
  • B.2 (Snowpark): _snowpark_bootstrap.py reads .env directly to create a Snowpark session.

A.2 — Snowflake CLI

The devcontainer has the Snowflake CLI (snow) pre-installed. After configuring .env, run each .sql file from the terminal:

snow sql -f snowflake/sql/bronze/01_ingest_trips.sql -c default
snow sql -f snowflake/sql/silver/01_create_cleaned.sql -c default
snow sql -f snowflake/sql/silver/02_create_enriched.sql -c default
snow sql -f snowflake/sql/gold/01_create_kpis.sql -c default

Set attendee_id in the SQL file’s SET statement before running (replace de_XX_yourname with your actual ID), or pass it as a CLI variable:

snow sql -f snowflake/sql/bronze/01_ingest_trips.sql -c default -D "attendee_id=$ATTENDEE_ID"

B.2 — Snowpark (Local Terminal)

The Python process runs locally; DataFrame operations are pushed down to the Snowflake warehouse as SQL. Connection details come from snowflake/snowpark/_snowpark_bootstrap.py via .env environment variables (setup).

All packages are pre-installed from pyproject.toml. Skip to running the scripts:

source .env
python snowflake/snowpark/00_snowpark_setup.py   # optional — verify connectivity
python snowflake/snowpark/01_bronze_ingestion.py
python snowflake/snowpark/02_silver_cleaning.py
python snowflake/snowpark/03_gold_kpis.py

Install packages first, then run:

uv pip install --system .
# Load .env (see step 3 above)
python snowflake/snowpark/00_snowpark_setup.py   # optional — verify connectivity
python snowflake/snowpark/01_bronze_ingestion.py
python snowflake/snowpark/02_silver_cleaning.py
python snowflake/snowpark/03_gold_kpis.py
uv pip install --system .
source .env
python snowflake/snowpark/00_snowpark_setup.py   # optional — verify connectivity
python snowflake/snowpark/01_bronze_ingestion.py
python snowflake/snowpark/02_silver_cleaning.py
python snowflake/snowpark/03_gold_kpis.py
  1. Run 01_bronze_ingestion.py — look for [PASS] on all Bronze verification lines
  2. Run 02_silver_cleaning.py
  3. Run 03_gold_kpis.py — confirm kpi_top_pickup_zones shows 20 rows and all 12 kpi_* tables exist

Expected Results

Same as the main lab — Bronze ~3M rows, Silver ~2.5-2.8M, Gold KPIs identical.

Cleanup

Suspend your warehouse when finished:

ALTER WAREHOUSE DE_WORKSHOP_WH SUSPEND;

No compute pool cleanup needed — codebase paths (A.2, B.2) use the warehouse only, not SPCS.


Return to module