Open Discussion Guide

Module 7 capstone — trainee-led tool comparison

TRAINER ONLY

Facilitator guide for the Module 7 capstone: trainees compare Databricks, Snowflake, and dbt and defend a production tool choice for YellowLine NYC.

Duration: ~30 minutes total within Module 7
Format: Animation → silent write → open discussion → trainer close
Goal: Trainees lead; trainer facilitates — no vendor lecture

Related docs:


Module 7 Flow (Full Block)

Step Activity Duration Doc
1 Play mod-07-wrapup.mp4 4 min animation-production-scripts.md
2 Silent reflection 2 min reflection-prompts.qmd § Module 7
3 Short theory recap (optional) 5 min Three pipelines, one dataset — trainer only if needed
4 Open discussion 20–25 min This guide
5 Trainer close 2 min This guide § Close
6 Power BI demo (optional) 10 min Facilitator guide § Power BI

Timing note: If running the Power BI demo, shorten Round 3 synthesis to 5 minutes or move the demo before discussion so trainees discuss with the dashboard fresh in mind.


Facilitator Mindset

Do

  • Ask follow-up questions: “Why?” “What would you give up?” “Who maintains it?”
  • Capture trainee words on the whiteboard — not slide content.
  • Let disagreement happen; compare trade-offs, not personalities.
  • Remind: platform, transform layer, and consumption (Power BI) are separate decisions.
  • Revisit the Story whiteboard at the end.

Do not

  • Open with a comparison table slide — trainees should build it.
  • Declare a single “correct” stack for YellowLine NYC.
  • Let one vocal participant dominate — rotate speakers in Round 1.
  • Conflate dbt with a warehouse — correct gently if someone says “we’ll use dbt instead of Snowflake.”
  • Conflate Module 6 Cortex LLM with Module 9 Cortex ML — they are different APIs and sessions.

Cortex naming (say if confusion arises):

Module Cortex usage Purpose
6 AI Features AI_COMPLETE, Copilot, Genie LLM assistants — SQL, exploration
9 ML (optional) ML.FORECAST, ML.ANOMALY_DETECTION, Snowpark ML Predictive models

Elena’s framing (say once at start of discussion):

“Marcus doesn’t need the best tool in the industry. He needs the best fit for his SQL team, his auditors, and his budget. You built all three paths today — now recommend with evidence.”


Before Discussion — Silent Reflection (2 min)

Hand out the architecture decision matrix (one printed page per trainee) or use the prompts below verbally if no handout.

Display on screen or read aloud:

  1. My recommended stack for YellowLine NYC is: Databricks / Snowflake / dbt / combination
  2. One sentence why (must reference at least one row from the matrix):
  3. One tool I would not choose as the primary platform and why:

No talking. No laptop research. Pens only.

Facilitator: Start a 2-minute timer. When it ends, go straight to Round 1 — do not ask for hands first.

Three-constraint reminder (project during silent fill)

Trainees should weigh every recommendation against three orthogonal dimensions Marcus named over the day:

Constraint Marcus’s question Where it appeared
Cost Can we afford this in year 3? Story brief; Round 2 Card B
Performance Fast enough for live dispatch? Module 8 framing; Round 2 Card C
Compliance Audit-ready lineage by Q3? Module 4 dbt motivation; Round 2 Card D

Round 1 — Share (8 minutes)

Optional opener (5 min): If trainees completed ex-batch-comparison, ask for one observation per column — “Where did the tools feel most different?” — then proceed to production-stack recommendations below.

Purpose: Hear diverse recommendations before debate.

Format:

  • Ask for 3–4 volunteers, 2 minutes each.
  • No interruptions during each share.
  • Facilitator notes one keyword per speaker on whiteboard.

Prompt:

“Who wants to go first? Tell Marcus what you’d run in production and one reason.”

Capture table (start empty):

Speaker Recommended stack Main reason
1
2
3
4

If no volunteers: Call on regions of the room or pair reps — “Table two — what did you write?”

Listen for (note for Round 2):

  • Snowflake + dbt combinations (common and valid)
  • Databricks-only recommendations (valid for scale/ML path)
  • Treating dbt as primary platform (needs correction in Round 2)

Round 2 — Challenge (8 minutes)

Purpose: Stress-test recommendations against changing constraints.

Format: Announce a constraint card → ask “Does your recommendation still hold?” → brief debate → next card. Spend ~2 minutes per card; use 2–3 cards, not all six.

Constraint cards

Pick cards that match what you heard in Round 1.

Card A — SQL-only team

“Marcus confirms: five SQL analysts, zero Python developers on staff. Nothing changes for two years.”

Follow-ups:

  • What happens to the Databricks notebook path?
  • Is Snowpark enough, or is pure SQL required?
  • Where does dbt fit for this team?

Card B — Budget cut 40%

“Finance cuts the data platform budget by forty percent. One primary platform license.”

Follow-ups:

  • Do you still run two platforms?
  • What do you drop — ingest tooling, transform layer, or duplicate pipelines?
  • Can dbt Core on an existing warehouse reduce cost?

Card C — Real-time in six months

“Marcus needs live zone demand signals in six months — Module 8 streaming.”

Follow-ups:

  • Which platform from today gives the clearest path to streaming?
  • Does your batch stack choice block or help streaming later?
  • Would you split batch and streaming across platforms?

Card D — Audit and lineage

“Regulators audit in Q3. Every dashboard tile must trace to source with tests.”

Follow-ups:

  • What did dbt add that Snowflake worksheets alone lacked?
  • Can Unity Catalog or Snowflake Horizon replace dbt docs for Marcus’s board?
  • Minimum viable governance stack?

Card E — ML and tipping model

“Marcus wants tip prediction and driver incentives — Module 9 ML — within a year.”

Follow-ups:

  • Does Databricks become non-negotiable?
  • Can Snowflake Cortex or Snowpark ML suffice for SQL-heavy teams?
  • Where do features live — dbt table, Silver, or notebook?
  • Would you batch-score to Gold for Priya’s Power BI page?

If Module 9 was delivered: Ask trainees to reference their RMSE comparison from ex-ml.


Card F — Speed to first dashboard

“Marcus needs something on his desk in two weeks, full platform decision in six months.”

Follow-ups:

  • Fastest path to Priya’s Overview page?
  • Build throwaway vs build for production?
  • Minimum Bronze/Silver/Gold for one KPI?

Facilitator tip: If the room converges too fast, play the opposite card (e.g. ML card after everyone picks Snowflake + dbt).


Round 3 — Synthesis (8 minutes)

Purpose: Build a comparison table from group consensus, not slides.

Format: Facilitate row by row. Ask for hands or short shouts; write exact phrases.

Whiteboard table (fill live)

Dimension Databricks Snowflake dbt
Best for
Weak for
Fit for Marcus’s SQL team
Ingest strength
Transform / governance
Power BI consumption

Row prompts:

  1. “One thing Databricks clearly won today — shout it out.”
  2. “Where was Databricks overkill for YellowLine NYC?”
  3. “What does dbt do that isn’t just SQL in a worksheet?”
  4. “Priya connected Power BI to Gold — does that favor any platform?”

Separate three decisions (draw on board)

┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│  PLATFORM       │   │  TRANSFORM      │   │  CONSUMPTION    │
│  (where data    │   │  (logic, tests, │   │  (Power BI —    │
│   runs)         │   │   lineage)      │   │   Priya)        │
│  Databricks /   │   │  dbt / native   │   │  Gold KPIs      │
│  Snowflake / …  │   │  SQL / notebooks│   │                 │
└─────────────────┘   └─────────────────┘   └─────────────────┘

Key teaching moment:

“dbt is not a third warehouse. It runs on Databricks or Snowflake. Marcus might choose Snowflake + dbt + Power BI — three layers, three roles.”


Round 4 — Architecture Revisit (5 minutes)

Purpose: Close the loop to the Story.

Bring back Story whiteboard (photo or still on board).

Questions:

  1. “What would you change in your morning design now that you’ve built all three pipelines?”
  2. “What did you get right on day one?”
  3. “If you were Elena, what would you tell Marcus to run on Monday?”

Optional poll (hands or Mentimeter):

  • Primary platform: Databricks / Snowflake / Both
  • Transform layer: Native SQL / dbt / Notebooks
  • Confidence: “I could defend my choice to a client” — 1–5 fingers

Trainer Close (2 minutes)

Use this script — adapt, do not read verbatim if the room is engaged:

“Today you saw one dataset flow through medallion architecture three ways. Priya’s Power BI dashboard didn’t care which engine built Gold — same schema, same KPIs.

There’s no universal winner. Real projects choose from skills, cost, governance, and what’s next — streaming, ML, audit. Three constraints stayed with us all day:

  • Cost — can YellowLine NYC afford this in year 3?
  • Performance — fast enough for live dispatch later?
  • Compliance — audit-ready lineage by Q3?

Remember three decisions: platform, transform layer, consumption. YellowLine NYC might combine tools. Your job as data engineers is to recommend with evidence — like you did in this room.

Look at your Story sketch. You weren’t wrong to guess. Now you’ve proved it in code.”

Closing line (deliver as the final sentence, project as title card if using slides):

“Technology is a decision. Architecture is responsibility.”

Optional Elena line:

“MHP often lands on Snowflake + dbt for SQL-heavy clients and keeps Databricks for heavy engineering — but that’s a pattern, not a rule. Marcus pays for your recommendation, not ours.”


Reference Material — Use Only If Discussion Stalls

Do not project this at the start. Use if silence exceeds 30 seconds or factual confusion arises.

dbt clarification

Misconception Correction
“dbt replaces Snowflake” dbt sends SQL to Snowflake (or Databricks)
“We only need dbt” dbt does not ingest raw Parquet from ADLS2 by itself
“dbt is only for docs” Tests and materializations are core value

Comparison dimensions (trainer crib sheet)

Dimension Databricks Snowflake dbt
Primary user Data engineer / ML engineer SQL analyst / analytics engineer Analytics engineer
Ingest Spark, Auto Loader, DLT, direct ADLS2 External stages, Snowpipe, COPY INTO Reads existing Bronze tables
Transform PySpark, SQL, Delta Lake SQL, Snowpark Python SQL + Jinja, ref(), macros
Governance Unity Catalog Horizon, tags, masking Tests, docs, lineage graph
Scheduling Workflows, DLT pipelines Tasks, Streams CI/CD, dbt build in GitHub Actions
Power BI Gold via Databricks connector Gold via Snowflake connector Gold via underlying warehouse
Learning curve for SQL team Higher (notebooks) Lower (worksheets) Low–medium (SQL + YAML)
Strong when Scale, Spark, ML, streaming path SQL ops, elastic DWH, sharing Lineage, tests, transform-as-code

Example “reasonable” stacks (not answers to give — discussion seeds if stalled)

Stack When it fits YellowLine NYC
Snowflake + dbt + Power BI SQL team maintains; auditors need lineage
Databricks only + Power BI Small eng team; ML/streaming on roadmap
Databricks ingest + Snowflake Gold + dbt Rare split — discuss complexity cost
Snowflake only (no dbt) Fast start; weaker lineage story for board

Optional Power BI Demo — Placement Options

Option When to use
A — Before discussion Visual payoff first; discussion references live dashboard
B — After discussion Discussion stays abstract; demo as “Priya’s deliverable”
C — Skip live demo Point to Exercise: Power BI; animation already showed full dashboard

Delivery path (see facilitator guide): screen-share from Power BI Service (published workspace) or Desktop .pbix fallback.

Demo talking points (2 min max if time-tight):

  • Built in Desktop, optionally published to trainer cloud workspace — all 12 kpi_* tables, five pages
  • Same report connects to Databricks or Snowflake Gold — switch data source only
  • Twelve kpi_* tables — no relationships required
  • Priya’s five questions from the Story are answered on five pages

Handling Common Classroom Situations

Situation Response
One person dominates “Thank you — let’s hear someone who disagrees.”
Vendor debate gets heated “We’re not picking a winner for MHP — we’re advising Marcus.”
Trainee says “just use everything” “Marcus has budget for one primary platform. What do you cut?”
Confusion on dbt Draw platform box with dbt inside as transform layer
Room is quiet after Round 1 Use constraint Card B or F — concrete scenarios unlock opinions
Running out of time Skip Round 4 poll; keep synthesis table + close + Story revisit

Printable Facilitator Timing Card

MODULE 7 — OPEN DISCUSSION (~30 min)
────────────────────────────────────
[ ] Animation mod-07-wrapup.mp4     4 min
[ ] Silent write                     2 min
[ ] Round 1 — Share (3–4 speakers)   8 min
[ ] Round 2 — Challenge (2–3 cards)  8 min
[ ] Round 3 — Synthesis table        8 min
[ ] Round 4 — Story revisit       5 min  ← shorten if needed
[ ] Trainer close                    2 min
[ ] Power BI demo (optional)        10 min
────────────────────────────────────
OPEN:  "What would YOU choose for YellowLine NYC?"
CLOSE: Platform | Transform | Consumption — three decisions

Success Signals

Discussion succeeded if trainees:


Document History

Date Change
2026-05-23 Initial Module 7 open discussion facilitation guide
2026-05-23 Expanded Module 9 ML constraint card follow-ups
2026-05-23 ex-batch-comparison opener; Cortex Module 6 vs 9 distinction
2026-05-24 Decision-matrix handout link; three-constraint reminder (cost / performance / compliance); closing tagline