Exercise: dbt Pipeline
YellowLine NYC story · full hands-on lab
title: “Exercise: dbt Pipeline” subtitle: “YellowLine NYC story · full hands-on lab” —
Estimated time: 30–35 min (Base: 15 min · Stretch goals: 15–20 min)
YellowLine NYC context (Module 4)
The board wants lineage and tests. Add dbt on top of Snowflake.
Open your fork on GitHub → Code → Codespaces → Create codespace on main. dbt Core, Python, and all dependencies are pre-installed. Skip straight to Step 1.
Local alternative: Install dbt Core 1.8+ manually:
pip install dbt-core dbt-databricks dbt-snowflakePrerequisites
- Codespaces: dbt is pre-installed ✅
- Local: dbt Core 1.8+ installed (
dbt --version) profiles.ymlconfigured with at least one target (Databricks or Snowflake)- Bronze tables exist (from Databricks or Snowflake exercise)
Base Exercise
Step 1: Setup
cd dbt_project/
# Copy and configure profiles
cp profiles.yml.example profiles.yml
# Edit profiles.yml with your credentials
# Verify connection
dbt debug
# Install packages
dbt deps
# Load seed data
dbt seedStep 2: Run the Pipeline
# Run all models
dbt run --target snowflake # or --target databricks
# You should see:
# staging: 2 models (views)
# silver: 2 models (tables)
# gold: 12 models (tables)Step 3: Run Tests
dbt test
# Check results — all tests should pass
# Tests check: not_null, accepted_values, data quality constraintsStep 4: Generate Documentation
dbt docs generate
dbt docs serve
# Opens browser with interactive documentation
# Explore the lineage graph — click any model to see SQL and testsStep 5: Verify Gold KPIs
Connect to your target database and query the Gold tables:
SELECT * FROM kpi_trips_by_hour ORDER BY pickup_hour;
SELECT * FROM kpi_top_pickup_zones WHERE trip_rank <= 5;
SELECT * FROM kpi_data_quality_metrics;Stretch Goals
A: Switch Target
# Run against the OTHER backend
dbt run --target databricks # if you ran snowflake first
# Compare outputs — KPI values should match across backendsB: Write a Custom Test
Create dbt_project/tests/generic/test_no_empty_gold_tables.sql:
-- This test fails if any Gold KPI table has zero rows
{% test has_rows(model) %}
select count(*) as row_count
from {{ model }}
having count(*) = 0
{% endtest %}C: Add a New KPI Model
Create dbt_project/models/gold/kpi_avg_tip_by_borough.sql:
select
pickup_borough,
count(*) as total_trips,
round(avg(tip_amount), 2) as avg_tip,
round(avg(tip_percentage), 2) as avg_tip_pct
from {{ ref('silver_nyc_taxi_enriched') }}
where pickup_borough is not null
group by pickup_borough
order by avg_tip descThen run:
dbt run --select kpi_avg_tip_by_borough
dbt test --select kpi_avg_tip_by_boroughD: Explore Macros
- Open
macros/time_of_day.sql - Run
dbt compileto see how the macro expands in a model - Check
target/compiled/for the expanded SQL
E: Use dbt build
# Run + test in dependency order (most efficient for CI)
dbt build --target snowflakeHead to the Batch Pipeline Comparison page for a full side-by-side breakdown of Databricks, Snowflake, and dbt — verified against official documentation — including storage formats, compute models, cost implications, and key gotchas.
Cleanup
dbt models are stored as tables/views in your target database. To clean up:
# Drop all dbt-created models from the target
dbt run --target snowflake --full-refresh # resets incremental state
# Then manually drop schemas if needed:
# DROP SCHEMA IF EXISTS {attendee_id}_GOLD CASCADE;If using the Snowflake backend, also suspend the warehouse:
ALTER WAREHOUSE DE_WORKSHOP_WH SUSPEND;Return to module
Source: merged from frozen workshop-2026-v1/exercises/ex-dbt.qmd — do not edit workshop-2026-v1/.