Open Discussion Guide

Module 7 capstone — trainee-led tool comparison

Facilitator guide for the Module 7 capstone: trainees compare Databricks, Snowflake, and dbt and defend a production tool choice for YellowLine NYC.

Duration: ~30 minutes total within Module 7
Format: Animation → silent write → open discussion → trainer close
Goal: Trainees lead; trainer facilitates — no vendor lecture

Related docs:

architecture-decision-matrix.qmd — structured workshop matrix trainees fill during silent reflection
reflection-prompts.qmd — Module 7 silent reflection prompts
animation-fallback.qmd — mod-07-wrapup.mp4 narration if MP4 unavailable
exercises/ex-batch-comparison.qmd — observation table (optional Module 7 opener)
modules/07-comparison-wrapup.qmd — technical reference (use only if discussion stalls)

Module 7 Flow (Full Block)

Step	Activity	Duration	Doc
1	Play `mod-07-wrapup.mp4`	4 min	animation-fallback.qmd
2	Silent reflection	2 min	reflection-prompts.qmd § Module 7
3	Short theory recap (optional)	5 min	Three pipelines, one dataset — trainer only if needed
4	Open discussion	20–25 min	This guide
5	Trainer close	2 min	This guide § Close
6	Power BI demo (optional)	10 min	Pre-class checklist § Power BI

Timing note: If running the Power BI demo, shorten Round 3 synthesis to 5 minutes or move the demo before discussion so trainees discuss with the dashboard fresh in mind.

Facilitator Mindset

Do

Ask follow-up questions: “Why?” “What would you give up?” “Who maintains it?”
Capture trainee words on the whiteboard — not slide content.
Let disagreement happen; compare trade-offs, not personalities.
Remind: platform, transform layer, and consumption (Power BI) are separate decisions.
Revisit the Story whiteboard at the end.

Do not

Open with a comparison table slide — trainees should build it.
Declare a single “correct” stack for YellowLine NYC.
Let one vocal participant dominate — rotate speakers in Round 1.
Conflate dbt with a warehouse — correct gently if someone says “we’ll use dbt instead of Snowflake.”
Conflate Module 6 Cortex LLM with Module 9 Cortex ML — they are different APIs and sessions.

Cortex naming (say if confusion arises):

Module	Cortex usage	Purpose
6 AI Features	`AI_COMPLETE`, Copilot, Genie	LLM assistants — SQL, exploration
9 ML (optional)	`ML.FORECAST`, `ML.ANOMALY_DETECTION`, Snowpark ML	Predictive models

Elena’s framing (say once at start of discussion):

“Marcus doesn’t need the best tool in the industry. He needs the best fit for his SQL team, his auditors, and his budget. You built all three paths today — now recommend with evidence.”

Before Discussion — Silent Reflection (2 min)

Hand out the architecture decision matrix (one printed page per trainee) or use the prompts below verbally if no handout.

Display on screen or read aloud:

My recommended stack for YellowLine NYC is: Databricks / Snowflake / dbt / combination
One sentence why (must reference at least one row from the matrix):
One tool I would not choose as the primary platform and why:

No talking. No laptop research. Pens only.

Facilitator: Start a 2-minute timer. When it ends, go straight to Round 1 — do not ask for hands first.

Three-constraint reminder (project during silent fill)

Trainees should weigh every recommendation against three orthogonal dimensions Marcus named over the day:

Constraint	Marcus’s question	Where it appeared
Cost	Can we afford this in year 3?	Story brief; Round 2 Card B
Performance	Fast enough for live dispatch?	Module 8 framing; Round 2 Card C
Compliance	Audit-ready lineage by Q3?	Module 4 dbt motivation; Round 2 Card D

Round 1 — Share (8 minutes)

Optional opener (5 min): If trainees completed ex-batch-comparison, ask for one observation per column — “Where did the tools feel most different?” — then proceed to production-stack recommendations below.

Purpose: Hear diverse recommendations before debate.

Format:

Ask for 3–4 volunteers, 2 minutes each.
No interruptions during each share.
Facilitator notes one keyword per speaker on whiteboard.

Prompt:

“Who wants to go first? Tell Marcus what you’d run in production and one reason.”

Capture table (start empty):

Speaker	Recommended stack	Main reason
1
2
3
4

If no volunteers: Call on regions of the room or pair reps — “Table two — what did you write?”

Listen for (note for Round 2):

Snowflake + dbt combinations (common and valid)
Databricks-only recommendations (valid for scale/ML path)
Treating dbt as primary platform (needs correction in Round 2)

Round 2 — Challenge (8 minutes)

Purpose: Stress-test recommendations against changing constraints.

Format: Announce a constraint card → ask “Does your recommendation still hold?” → brief debate → next card. Spend ~2 minutes per card; use 2–3 cards, not all six.

Constraint cards

Pick cards that match what you heard in Round 1.

Card A — SQL-only team

“Marcus confirms: five SQL analysts, zero Python developers on staff. Nothing changes for two years.”

Follow-ups:

What happens to the Databricks notebook path?
Is Snowpark enough, or is pure SQL required?
Where does dbt fit for this team?

Card B — Budget cut 40%

“Finance cuts the data platform budget by forty percent. One primary platform license.”

Follow-ups:

Do you still run two platforms?
What do you drop — ingest tooling, transform layer, or duplicate pipelines?
Can dbt Core on an existing warehouse reduce cost?

Card C — Real-time in six months

“Marcus needs live zone demand signals in six months — Module 8 streaming.”

Follow-ups:

Which platform from today gives the clearest path to streaming?
Does your batch stack choice block or help streaming later?
Would you split batch and streaming across platforms?

Card D — Audit and lineage

“Regulators audit in Q3. Every dashboard tile must trace to source with tests.”

Follow-ups:

What did dbt add that Workspaces SQL files alone lacked?
Can Unity Catalog or Snowflake Horizon replace dbt docs for Marcus’s board?
Minimum viable governance stack?

Card E — ML and tipping model

“Marcus wants tip prediction and driver incentives — Module 9 ML — within a year.”

Follow-ups:

Does Databricks become non-negotiable?
Can Snowflake Cortex or Snowpark ML suffice for SQL-heavy teams?
Where do features live — dbt table, Silver, or notebook?
Would you batch-score to Gold for Priya’s Power BI page?

If Module 9 was delivered: Ask trainees to reference their RMSE comparison from ex-ml.

Card F — Speed to first dashboard

“Marcus needs something on his desk in two weeks, full platform decision in six months.”

Follow-ups:

Fastest path to Priya’s Overview page?
Build throwaway vs build for production?
Minimum Bronze/Silver/Gold for one KPI?

Facilitator tip: If the room converges too fast, play the opposite card (e.g. ML card after everyone picks Snowflake + dbt).

Trainer answers — constraint cards

Warning

Trainer only — Reasonable themes to steer debate if the room stalls. Not single correct stacks.

Card	Core answer themes
A — SQL-only	Snowflake + dbt primary; Databricks notebooks become consultant-only maintenance burden. Snowpark optional for Python-in-Snowflake; pure SQL Workspaces files + Tasks for nightly medallion.
B — Budget −40%	One primary platform — drop duplicate pipelines, not Gold contract. Keep dbt Core (free) on existing warehouse for lineage. Cut idle compute (auto-suspend, right-size warehouse).
C — Real-time in 6 mo	Databricks Structured Streaming or Snowflake Dynamic Tables / dbt `dynamic_table` — batch-only `COPY INTO` nightly is insufficient for dispatch. May accept higher ops cost for one streaming path.
D — Audit Q3	dbt tests + `dbt docs` lineage minimum; trace Power BI tile → Gold `kpi_*` → Silver `ref()` → Bronze source. Unity Catalog / Horizon help but do not replace versioned transform SQL.
E — ML tipping	Features in dbt (`ml_features_tip_prediction`); train in Databricks MLflow (flexibility) or Snowflake Cortex/Snowpark (SQL team). Batch-score to Gold for Priya — live inference only if ops needs sub-hour updates.
F — Dashboard in 2 weeks	Fastest: finish one pipeline path to Gold (`kpi_trips_by_hour` + Overview page). Throwaway acceptable for demo; production decision in 6 months. Minimum: Bronze ingest + Silver quality + one Gold mart.
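
Trainer aid (optional, for Card D): if the room asks what "trace a tile to source" means in practice, the chain Power BI tile → Gold → Silver → Bronze can be sketched as a walk over a dependency graph. This is a minimal illustration, not how dbt or any catalog implements lineage. Only `kpi_trips_by_hour` comes from this guide; every other node name and both helper functions are hypothetical.

```python
# Hypothetical lineage graph: each node maps to the nodes it reads from.
# Only kpi_trips_by_hour appears in the course material; the rest is illustrative.
LINEAGE = {
    "powerbi.overview.trips_by_hour_tile": ["gold.kpi_trips_by_hour"],
    "gold.kpi_trips_by_hour": ["silver.trips_clean"],
    "silver.trips_clean": ["bronze.trips_raw"],
    "bronze.trips_raw": [],  # source table, nothing upstream
}


def trace_to_source(node, graph):
    """Return every path from `node` back to a node with no upstream."""
    upstream = graph.get(node, [])
    if not upstream:
        return [[node]]
    paths = []
    for parent in upstream:
        for tail in trace_to_source(parent, graph):
            paths.append([node] + tail)
    return paths


def untraceable(nodes, graph, source_prefix="bronze."):
    """List nodes that have at least one path not ending in a Bronze source."""
    return [
        n
        for n in nodes
        if any(not p[-1].startswith(source_prefix) for p in trace_to_source(n, graph))
    ]


paths = trace_to_source("powerbi.overview.trips_by_hour_tile", LINEAGE)
print(" -> ".join(paths[0]))
```

A tile whose path stops anywhere other than a Bronze source is what an auditor would flag, which is a concrete way to frame the "minimum viable governance stack" follow-up.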

Round 3 — Synthesis (8 minutes)

Purpose: Build a comparison table from group consensus, not slides.

Format: Facilitate row by row. Ask for hands or short shouts; write exact phrases.

Whiteboard table (fill live)

Dimension	Databricks	Snowflake	dbt
Best for
Weak for
Fit for Marcus’s SQL team
Ingest strength
Transform / governance
Power BI consumption

Row prompts:

“One thing Databricks clearly won today — shout it out.”
“Where was Databricks overkill for YellowLine NYC?”
“What does dbt do that isn’t just SQL in a Workspaces SQL file?”
“Priya connected Power BI to Gold — does that favor any platform?”

Separate three decisions (draw on board)

┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│  PLATFORM       │   │  TRANSFORM      │   │  CONSUMPTION    │
│  (where data    │   │  (logic, tests, │   │  (Power BI —    │
│   runs)         │   │   lineage)      │   │   Priya)        │
│  Databricks /   │   │  dbt / native   │   │  Gold KPIs      │
│  Snowflake / …  │   │  SQL / notebooks│   │                 │
└─────────────────┘   └─────────────────┘   └─────────────────┘

Key teaching moment:

“dbt is not a third warehouse. It runs on Databricks or Snowflake. Marcus might choose Snowflake + dbt + Power BI — three layers, three roles.”
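
Trainer aid (optional): the three boxes above can also be written as a small data structure, which makes the "dbt is not a platform" point checkable. This is a sketch only. The option lists mirror the whiteboard boxes; the class name, field names, and messages are hypothetical and not part of the course material.

```python
from dataclasses import dataclass

# Option lists mirror the three whiteboard boxes in this guide.
PLATFORMS = {"Databricks", "Snowflake"}
TRANSFORMS = {"dbt", "native SQL", "notebooks"}
CONSUMPTION = {"Power BI"}


@dataclass(frozen=True)
class StackRecommendation:
    """One recommendation expressed as three separate decisions."""

    platform: str
    transform: str
    consumption: str = "Power BI"

    def problems(self):
        """Return human-readable issues; an empty list means well-formed."""
        issues = []
        if self.platform in TRANSFORMS:
            issues.append(
                f"{self.platform} is a transform layer, not a platform; "
                "it needs a warehouse underneath"
            )
        elif self.platform not in PLATFORMS:
            issues.append(f"unknown platform: {self.platform}")
        if self.transform not in TRANSFORMS:
            issues.append(f"unknown transform layer: {self.transform}")
        if self.consumption not in CONSUMPTION:
            issues.append(f"unknown consumption layer: {self.consumption}")
        return issues


good = StackRecommendation(platform="Snowflake", transform="dbt")
confused = StackRecommendation(platform="dbt", transform="dbt")
print(good.problems())
print(confused.problems())
```

The check validates only the shape of a recommendation. It says nothing about which stack is right for Marcus, which stays a discussion outcome.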

Trainer answers — synthesis table (if room stalls)

Use only if silence exceeds 30 seconds. Prefer trainee words when available.

Dimension	Databricks	Snowflake	dbt
Best for	Spark ingest, Delta, ML/streaming path	SQL ops, elastic DWH, sharing	Lineage, tests, transform-as-code
Weak for	SQL-only team maintenance	Heavy custom Spark ML	Ingest, storage, scheduling alone
Fit for Marcus’s SQL team	Low (notebooks)	High	High (SQL + YAML)
Ingest strength	Strong (Auto Loader, `spark.read`)	Strong (stages, Snowpipe, `COPY INTO`)	None — reads existing Bronze
Transform / governance	Notebooks + Unity Catalog	SQL + Horizon	`ref()`, tests, docs graph
Power BI consumption	Gold via Databricks connector	Gold via Snowflake connector	Gold via underlying warehouse

Round 4 — Architecture Revisit (5 minutes)

Purpose: Close the loop to the Story.

Bring back Story whiteboard (photo or still on board).

Questions:

“What would you change in your morning design now that you’ve built all three pipelines?”
“What did you get right on day one?”
“If you were Elena, what would you tell Marcus to run on Monday?”

Optional poll (hands or Mentimeter):

Primary platform: Databricks / Snowflake / Both
Transform layer: Native SQL / dbt / Notebooks
Confidence: “I could defend my choice to a client” — 1–5 fingers

Trainer answers — architecture revisit

#	Question	Answer
1	Change morning design?	Add dbt for lineage, explicit Silver quality rules, Gold as BI contract (12 `kpi_*`), separate ingest vs transform owners.
2	Got right day one?	Three layers, need for clean data before BI, Priya needs aggregates not raw trips — celebrate partial medallion thinking.
3	Elena tells Marcus Monday?	Start one path to production Gold (likely Snowflake + dbt if SQL team); schedule nightly job; Power BI on Gold; document lineage before Q3 audit. No big-bang three-platform run.

Trainer Close (2 minutes)

Use this script as a base. Adapt it rather than reading it verbatim, especially if the room is engaged:

“Today you saw one dataset flow through medallion architecture three ways. Priya’s Power BI dashboard didn’t care which engine built Gold — same schema, same KPIs.

There’s no universal winner. Real projects choose from skills, cost, governance, and what’s next — streaming, ML, audit. Three constraints stayed with us all day:

Cost — can YellowLine NYC afford this in year 3?

Performance — fast enough for live dispatch later?

Compliance — audit-ready lineage by Q3?

Remember three decisions: platform, transform layer, consumption. YellowLine NYC might combine tools. Your job as data engineers is to recommend with evidence — like you did in this room.

Look at your Story sketch. You weren’t wrong to guess. Now you’ve proved it in code.”

Closing line (deliver as the final sentence, project as title card if using slides):

“Technology is a decision. Architecture is responsibility.”

Optional Elena line:

“MHP often lands on Snowflake + dbt for SQL-heavy clients and keeps Databricks for heavy engineering — but that’s a pattern, not a rule. Marcus pays for your recommendation, not ours.”

Reference Material — Use Only If Discussion Stalls

Do not project this at the start. Use if silence exceeds 30 seconds or factual confusion arises.

dbt clarification

Misconception	Correction
“dbt replaces Snowflake”	dbt sends SQL to Snowflake (or Databricks)
“We only need dbt”	dbt does not ingest raw Parquet from ADLS2 by itself
“dbt is only for docs”	Tests and materializations are core value
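
Trainer aid (optional): if "dbt sends SQL to Snowflake" stays abstract, a toy version of what happens to `ref()` may help. The sketch below swaps each `{{ ref('...') }}` for a table name and records the dependency. It is a deliberately simplified stand-in, not dbt's compiler; the model name, relation name, column names, and `compile_model` function are all hypothetical.

```python
import re

# Hypothetical mapping from model name to its table in the warehouse.
RELATIONS = {
    "stg_trips": "analytics.silver.stg_trips",
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def compile_model(sql, relations):
    """Replace each ref() with its table name and collect the dependencies.

    The collected names are the raw material for a lineage graph.
    Raises KeyError if a referenced model is not in `relations`.
    """
    deps = []

    def substitute(match):
        name = match.group(1)
        deps.append(name)
        return relations[name]

    return REF_PATTERN.sub(substitute, sql), deps


# Illustrative Gold model; the column names are made up for this example.
model_sql = """
select pickup_hour, count(*) as trips
from {{ ref('stg_trips') }}
group by pickup_hour
"""

compiled, deps = compile_model(model_sql, RELATIONS)
print(compiled)
print("depends on:", deps)
```

The output is plain SQL that the warehouse executes, which is the point of the first correction in the table: the compute and storage belong to Snowflake or Databricks, while dbt contributes the templating, dependency order, and tests.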

Comparison dimensions (trainer crib sheet)

Dimension	Databricks	Snowflake	dbt
Primary user	Data engineer / ML engineer	SQL analyst / analytics engineer	Analytics engineer
Ingest	Spark, Auto Loader, DLT, direct ADLS2	External stages, Snowpipe, `COPY INTO`	Reads existing Bronze tables
Transform	PySpark, SQL, Delta Lake	SQL, Snowpark Python	SQL + Jinja, `ref()`, macros
Governance	Unity Catalog	Horizon, tags, masking	Tests, docs, lineage graph
Scheduling	Workflows, DLT pipelines	Tasks, Streams	CI/CD, `dbt build` in GitHub Actions
Power BI	Gold via Databricks connector	Gold via Snowflake connector	Gold via underlying warehouse
Learning curve for SQL team	Higher (notebooks)	Lower (Workspaces SQL files)	Low–medium (SQL + YAML)
Strong when	Scale, Spark, ML, streaming path	SQL ops, elastic DWH, sharing	Lineage, tests, transform-as-code

Example “reasonable” stacks (not answers to give — discussion seeds if stalled)

Stack	When it fits YellowLine NYC
Snowflake + dbt + Power BI	SQL team maintains; auditors need lineage
Databricks only + Power BI	Small eng team; ML/streaming on roadmap
Databricks ingest + Snowflake Gold + dbt	Rare split — discuss complexity cost
Snowflake only (no dbt)	Fast start; weaker lineage story for board

Optional Power BI Demo — Placement Options

Option	When to use
A — Before discussion	Visual payoff first; discussion references live dashboard
B — After discussion	Discussion stays abstract; demo as “Priya’s deliverable”
C — Skip live demo	Point to Exercise: Power BI; animation already showed full dashboard

Delivery path (see pre-class checklist § Power BI): screen-share from Power BI Service (published workspace) or Desktop .pbix fallback.

Demo talking points (2 min max if time-tight):

Built in Desktop, optionally published to the trainer's cloud workspace
Same report connects to Databricks or Snowflake Gold; only the data source is switched
All 12 kpi_* tables across five pages, with no relationships required between them
Priya’s five questions from the Story are answered on five pages

Handling Common Classroom Situations

Situation	Response
One person dominates	“Thank you — let’s hear someone who disagrees.”
Vendor debate gets heated	“We’re not picking a winner for MHP — we’re advising Marcus.”
Trainee says “just use everything”	“Marcus has budget for one primary platform. What do you cut?”
Confusion on dbt	Draw platform box with dbt inside as transform layer
Room is quiet after Round 1	Use constraint Card B or F — concrete scenarios unlock opinions
Running out of time	Skip Round 4 poll; keep synthesis table + close + Story revisit

Printable Facilitator Timing Card

MODULE 7 — OPEN DISCUSSION (~30 min)
────────────────────────────────────
[ ] Animation mod-07-wrapup.mp4     4 min
[ ] Silent write                     2 min
[ ] Round 1 — Share (3–4 speakers)   8 min
[ ] Round 2 — Challenge (2–3 cards)  8 min
[ ] Round 3 — Synthesis table        8 min
[ ] Round 4 — Story revisit          5 min  ← shorten if needed
[ ] Trainer close                    2 min
[ ] Power BI demo (optional)        10 min
────────────────────────────────────
OPEN:  "What would YOU choose for YellowLine NYC?"
CLOSE: Platform | Transform | Consumption — three decisions
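
Trainer aid (optional): when adapting the card, a quick sum shows how much room there is before trimming. The durations below are copied from the card; the function name and its flags are hypothetical. Whether the card's "~30 min" covers only the four rounds or the whole block is left open here, so the sketch reports both.

```python
# Minutes per item, copied from the timing card; True marks optional items.
CARD = [
    ("Animation mod-07-wrapup.mp4", 4, False),
    ("Silent write", 2, False),
    ("Round 1 Share", 8, False),
    ("Round 2 Challenge", 8, False),
    ("Round 3 Synthesis table", 8, False),
    ("Round 4 Story revisit", 5, False),
    ("Trainer close", 2, False),
    ("Power BI demo", 10, True),
]


def total_minutes(card, include_optional=False, only_rounds=False):
    """Sum durations, optionally restricted to the four discussion rounds."""
    return sum(
        minutes
        for name, minutes, optional in card
        if (include_optional or not optional)
        and (not only_rounds or name.startswith("Round"))
    )


rounds = total_minutes(CARD, only_rounds=True)
required = total_minutes(CARD)
with_demo = total_minutes(CARD, include_optional=True)
print(f"rounds only: {rounds} min")
print(f"all required items: {required} min")
print(f"including optional demo: {with_demo} min")
```

As printed, the four rounds come to 29 minutes and the required items to 37, so the shortenings this guide already names (Round 4, or Round 3 synthesis when the demo runs) are the places to recover time.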

Success Signals

Discussion succeeded if trainees:

Named at least one strength and one weakness per tool
Did not treat dbt as a warehouse replacement
Referenced Marcus’s SQL team or audit constraint unprompted
Mentioned Priya / Power BI as consumption layer
Changed or defended their Story design with new evidence

Document History

Date	Change
2026-05-23	Initial Module 7 open discussion facilitation guide
2026-05-23	Expanded Module 9 ML constraint card follow-ups
2026-05-23	ex-batch-comparison opener; Cortex Module 6 vs 9 distinction
2026-05-24	Decision-matrix handout link; three-constraint reminder (cost / performance / compliance); closing tagline
2026-06-18	Trainer answer keys for constraint cards, synthesis table, and Round 4 revisit