# Open Discussion Guide — Module 7 Tool Comparison

Facilitator guide for the **Module 7 capstone**: trainees compare Databricks, Snowflake, and dbt
and defend a production tool choice for YellowLine NYC.

**Duration**: ~30 minutes total within Module 7  
**Format**: Animation → silent write → open discussion → trainer close  
**Goal**: Trainees lead; trainer facilitates — **no vendor lecture**

**Related docs**:

- [architecture-decision-matrix.md](architecture-decision-matrix.md) — **structured workshop matrix** trainees fill during silent reflection
- [reflection-prompts.md](reflection-prompts.md) — Module 7 silent reflection prompts
- [animation-production-scripts.md](../docs/animation-production-scripts.md) — `mod-07-wrapup.mp4`
- [TRAINING_MATERIAL_MIGRATION_PLAN.md](../../TRAINING_MATERIAL_MIGRATION_PLAN.md) — storyline context
- [module-prerequisites-and-order.md](../docs/module-prerequisites-and-order.md) — editorial rules *(workshop-2026 only; do not edit workshop-2026-v1/)*
- [exercises/ex-batch-comparison.qmd](../exercises/ex-batch-comparison.qmd) — observation table *(optional Module 7 opener)*
- [modules/07-comparison-wrapup.qmd](../modules/07-comparison-wrapup.qmd) — technical reference *(use only if discussion stalls)*

---

## Module 7 Flow (Full Block)

| Step | Activity | Duration | Doc |
|------|----------|----------|-----|
| 1 | Play `mod-07-wrapup.mp4` | 4 min | [animation-production-scripts.md](../docs/animation-production-scripts.md) |
| 2 | Silent reflection | 2 min | [reflection-prompts.md](reflection-prompts.md) § Module 7 |
| 3 | Short theory recap *(optional)* | 5 min | Three pipelines, one dataset — trainer only if needed |
| 4 | **Open discussion** | 20–25 min | **This guide** |
| 5 | Trainer close | 2 min | This guide § Close |
| 6 | Power BI demo *(optional)* | 10 min | [`powerbi/README.md`](../../powerbi/README.md) |

**Timing note**: If running the Power BI demo, shorten Round 3 synthesis to 5 minutes or move the
demo **before** discussion so trainees discuss with the dashboard fresh in mind.

---

## Facilitator Mindset

### Do

- Ask follow-up questions: *"Why?"* *"What would you give up?"* *"Who maintains it?"*
- Capture **trainee words** on the whiteboard — not slide content.
- Let disagreement happen; compare trade-offs, not personalities.
- Remind: **platform**, **transform layer**, and **consumption** (Power BI) are separate decisions.
- Revisit the **Story whiteboard** at the end.

### Do not

- Open with a comparison table slide — trainees should build it.
- Declare a single "correct" stack for YellowLine NYC.
- Let one vocal participant dominate — rotate speakers in Round 1.
- Conflate dbt with a warehouse — correct gently if someone says *"we'll use dbt instead of Snowflake."*
- Conflate **Module 6 Cortex LLM** with **Module 9 Cortex ML** — they are different APIs and sessions.

**Cortex naming** (say if confusion arises):

| Module | Cortex usage | Purpose |
|--------|--------------|---------|
| 6 AI Features | `AI_COMPLETE`, Copilot, Genie *(Databricks counterpart)* | LLM assistants for SQL and exploration |
| 9 ML *(optional)* | `ML.FORECAST`, `ML.ANOMALY_DETECTION`, Snowpark ML | Predictive models |

**Elena's framing** (say once at start of discussion):

> *"Marcus doesn't need the best tool in the industry. He needs the best fit for his SQL team, his
> auditors, and his budget. You built all three paths today — now recommend with evidence."*

---

## Before Discussion — Silent Reflection (2 min)

**Hand out** the [architecture decision matrix](architecture-decision-matrix.md)
(one printed page per trainee) **or** read the prompts below aloud if no handout is available.

Display on screen or read aloud:

1. **My recommended stack for YellowLine NYC is**: Databricks / Snowflake / dbt / combination
2. **One sentence why** (must reference at least one row from the matrix):
3. **One tool I would not choose as the primary platform and why**:

No talking. No laptop research. Pens only.

**Facilitator**: Start a 2-minute timer. When it ends, go straight to Round 1 — do not ask for
hands first.

### Three-constraint reminder (project during silent fill)

Trainees should weigh every recommendation against the three independent constraints Marcus
raised over the course of the day:

| Constraint | Marcus's question | Where it appeared |
|------------|-------------------|-------------------|
| **Cost** | *Can we afford this in year 3?* | Story brief; Round 2 Card B |
| **Performance** | *Fast enough for live dispatch?* | Module 8 framing; Round 2 Card C |
| **Compliance** | *Audit-ready lineage by Q3?* | Module 4 dbt motivation; Round 2 Card D |

---

## Round 1 — Share (8 minutes)

**Optional opener** (5 min): If trainees completed [ex-batch-comparison](../exercises/ex-batch-comparison.qmd),
ask for one observation per column — *"Where did the tools feel most different?"* — then proceed to
production-stack recommendations below.

**Purpose**: Hear diverse recommendations before debate.

**Format**:

- Ask for **3–4 volunteers**, 2 minutes each.
- No interruptions during each share.
- Facilitator notes one keyword per speaker on whiteboard.

**Prompt**:

> *"Who wants to go first? Tell Marcus what you'd run in production and one reason."*

**Capture table** (start empty):

| Speaker | Recommended stack | Main reason |
|---------|-------------------|-------------|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |

**If no volunteers**: Call on a table, or on one representative per pair, for example *"Table two, what did you write?"*

**Listen for** (note for Round 2):

- Snowflake + dbt combinations (common and valid)
- Databricks-only recommendations (valid for scale/ML path)
- Treating dbt as the primary platform (needs correction in Round 2)

---

## Round 2 — Challenge (8 minutes)

**Purpose**: Stress-test recommendations against changing constraints.

**Format**: Announce a constraint card → ask *"Does your recommendation still hold?"* → brief debate
→ next card. Spend ~2 minutes per card; use **2–3 cards**, not all six.

### Constraint cards

Pick cards that match what you heard in Round 1.

#### Card A — SQL-only team

> *"Marcus confirms: five SQL analysts, zero Python developers on staff. Nothing changes for two
> years."*

**Follow-ups**:

- What happens to the Databricks notebook path?
- Is Snowpark enough, or is pure SQL required?
- Where does dbt fit for this team?

---

#### Card B — Budget cut 40%

> *"Finance cuts the data platform budget by forty percent. One primary platform license."*

**Follow-ups**:

- Do you still run two platforms?
- What do you drop — ingest tooling, transform layer, or duplicate pipelines?
- Can dbt Core on an existing warehouse reduce cost?

---

#### Card C — Real-time in six months

> *"Marcus needs live zone demand signals in six months — Module 8 streaming."*

**Follow-ups**:

- Which platform from today gives the clearest path to streaming?
- Does your batch stack choice block or help streaming later?
- Would you split batch and streaming across platforms?

---

#### Card D — Audit and lineage

> *"Regulators audit in Q3. Every dashboard tile must trace to source with tests."*

**Follow-ups**:

- What did dbt add that Snowflake worksheets alone lacked?
- Can Unity Catalog or Snowflake Horizon replace dbt docs for Marcus's board?
- Minimum viable governance stack?

---

#### Card E — ML and tipping model

> *"Marcus wants tip prediction and driver incentives — Module 9 ML — within a year."*

**Follow-ups**:

- Does Databricks become non-negotiable?
- Can Snowflake Cortex or Snowpark ML suffice for SQL-heavy teams?
- Where do features live — dbt table, Silver, or notebook?
- Would you batch-score to Gold for Priya's Power BI page?

**If Module 9 was delivered**: Ask trainees to reference their RMSE comparison from `ex-ml`.

---

#### Card F — Speed to first dashboard

> *"Marcus needs **something** on his desk in two weeks, and a full platform decision in six months."*

**Follow-ups**:

- Fastest path to Priya's Overview page?
- Build throwaway vs build for production?
- Minimum Bronze/Silver/Gold for one KPI?

---

**Facilitator tip**: If the room converges too fast, play the **opposite** card (e.g. ML card after
everyone picks Snowflake + dbt).

---

## Round 3 — Synthesis (8 minutes)

**Purpose**: Build a comparison table from **group consensus**, not slides.

**Format**: Facilitate row by row. Ask for hands or short shouts; write exact phrases.

### Whiteboard table (fill live)

| Dimension | Databricks | Snowflake | dbt |
|-----------|------------|-----------|-----|
| **Best for** | | | |
| **Weak for** | | | |
| **Fit for Marcus's SQL team** | | | |
| **Ingest strength** | | | |
| **Transform / governance** | | | |
| **Power BI consumption** | | | |

**Row prompts**:

1. *"One thing Databricks clearly won today — shout it out."*
2. *"Where was Databricks overkill for YellowLine NYC?"*
3. *"What does dbt do that isn't just SQL in a worksheet?"*
4. *"Priya connected Power BI to Gold — does that favor any platform?"*

### Separate three decisions (draw on board)

```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│  PLATFORM       │   │  TRANSFORM      │   │  CONSUMPTION    │
│  (where data    │   │  (logic, tests, │   │  (Power BI —    │
│   runs)         │   │   lineage)      │   │   Priya)        │
│  Databricks /   │   │  dbt / native   │   │  Gold KPIs      │
│  Snowflake / …  │   │  SQL / notebooks│   │                 │
└─────────────────┘   └─────────────────┘   └─────────────────┘
```

**Key teaching moment**:

> *"dbt is not a third warehouse. It runs on Databricks or Snowflake. Marcus might choose
> Snowflake + dbt + Power BI — three layers, three roles."*
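
If a trainee prefers code to boxes, the same separation can be sketched as a small data structure. This is only an illustration: the class name, field names, and option lists are hypothetical and copied loosely from the whiteboard boxes, not from any course repository.

```python
from dataclasses import dataclass

# Option lists copied from the three whiteboard boxes (hypothetical names);
# extend them with whatever the room suggests.
PLATFORMS = {"Databricks", "Snowflake"}
TRANSFORM_LAYERS = {"dbt", "native SQL", "notebooks"}
CONSUMPTION_TOOLS = {"Power BI"}


@dataclass(frozen=True)
class StackRecommendation:
    """One trainee recommendation, split into the three separate decisions."""

    platform: str
    transform: str
    consumption: str = "Power BI"

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means the stack is well-formed."""
        issues = []
        if self.platform in TRANSFORM_LAYERS:
            issues.append(
                f"{self.platform} is a transform layer, not a platform: "
                "name the warehouse it runs on"
            )
        elif self.platform not in PLATFORMS:
            issues.append(f"unknown platform: {self.platform}")
        if self.transform not in TRANSFORM_LAYERS:
            issues.append(f"unknown transform layer: {self.transform}")
        if self.consumption not in CONSUMPTION_TOOLS:
            issues.append(f"unknown consumption tool: {self.consumption}")
        return issues


snowflake_dbt = StackRecommendation(platform="Snowflake", transform="dbt")
dbt_only = StackRecommendation(platform="dbt", transform="dbt")

print(snowflake_dbt.problems())  # []
print(dbt_only.problems())       # one issue: dbt named as the platform
```

The second recommendation mirrors the "we'll use dbt instead of Snowflake" slip: it fills the platform slot with a transform tool, which is exactly what the whiteboard drawing is meant to expose.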

---

## Round 4 — Architecture Revisit (5 minutes)

**Purpose**: Close the loop to the Story.

**Bring back Story whiteboard** (photo or still on board).

**Questions**:

1. *"What would you change in your morning design now that you've built all three pipelines?"*
2. *"What did you get right on day one?"*
3. *"If you were Elena, what would you tell Marcus to run on Monday?"*

**Optional poll** (hands or Mentimeter):

- Primary platform: Databricks / Snowflake / Both
- Transform layer: Native SQL / dbt / Notebooks
- Confidence: *"I could defend my choice to a client"* — 1–5 fingers

---

## Trainer Close (2 minutes)

Use this script as a guide; if the room is engaged, adapt it rather than reading it verbatim:

> *"Today you saw one dataset flow through medallion architecture three ways. Priya's Power BI
> dashboard didn't care which engine built Gold — same schema, same KPIs.*
>
> *There's no universal winner. Real projects choose from **skills, cost, governance, and what's
> next** — streaming, ML, audit. Three constraints stayed with us all day:*
>
> - ***Cost** — can YellowLine NYC afford this in year 3?*
> - ***Performance** — fast enough for live dispatch later?*
> - ***Compliance** — audit-ready lineage by Q3?*
>
> *Remember three decisions: **platform**, **transform layer**, **consumption**. YellowLine NYC might
> combine tools. Your job as data engineers is to recommend with evidence — like you did in this
> room.*
>
> *Look at your Story sketch. You weren't wrong to guess. Now you've proved it in code."*

**Closing line** (deliver as the final sentence, project as title card if using slides):

> ***"Technology is a decision. Architecture is responsibility."***

**Optional Elena line**:

> *"MHP often lands on Snowflake + dbt for SQL-heavy clients and keeps Databricks for heavy
> engineering — but that's a pattern, not a rule. Marcus pays for your recommendation, not ours."*

---

## Reference Material — Use Only If Discussion Stalls

Do **not** project this at the start. Use if silence exceeds 30 seconds or factual confusion arises.

### dbt clarification

| Misconception | Correction |
|---------------|------------|
| "dbt replaces Snowflake" | dbt sends SQL **to** Snowflake (or Databricks) |
| "We only need dbt" | dbt does not ingest raw Parquet from ADLS Gen2 by itself |
| "dbt is only for docs" | Tests and materializations are core value |
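
If the room doubts that tests are "core value", a rough sketch of the idea may help. A dbt data test is a SQL query that selects failing rows and passes when it returns none. The snippet below imitates that pattern in plain Python, with an in-memory SQLite database standing in for Snowflake or Databricks. It is not dbt itself, and the table `gold_trips` and its columns are made up for illustration.

```python
import sqlite3

# SQLite stands in for the warehouse here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gold_trips (trip_id INTEGER, fare_amount REAL)")
conn.executemany(
    "INSERT INTO gold_trips VALUES (?, ?)",
    [(1, 12.5), (2, 8.0), (2, 8.0), (3, None)],  # one duplicate id, one NULL fare
)

# Each check is a query that returns the rows violating the rule.
# Zero rows returned means the check passes.
CHECKS = {
    "not_null_fare_amount": (
        "SELECT trip_id FROM gold_trips WHERE fare_amount IS NULL"
    ),
    "unique_trip_id": (
        "SELECT trip_id FROM gold_trips GROUP BY trip_id HAVING COUNT(*) > 1"
    ),
}


def run_checks(connection, checks):
    """Send each query to the database and collect the failing rows per check."""
    return {name: connection.execute(sql).fetchall() for name, sql in checks.items()}


results = run_checks(conn, CHECKS)
for name, failures in results.items():
    status = "PASS" if not failures else f"FAIL ({len(failures)} row(s))"
    print(f"{name}: {status}")
```

Note where the work happens: the Python side only sends SQL and reads the answer, which is the same relationship dbt has with the warehouse.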

### Comparison dimensions (trainer crib sheet)

| Dimension | Databricks | Snowflake | dbt |
|-----------|------------|-----------|-----|
| Primary user | Data engineer / ML engineer | SQL analyst / analytics engineer | Analytics engineer |
| Ingest | Spark, Auto Loader, DLT, direct ADLS Gen2 | External stages, Snowpipe, `COPY INTO` | Reads existing Bronze tables |
| Transform | PySpark, SQL, Delta Lake | SQL, Snowpark Python | SQL + Jinja, `ref()`, macros |
| Governance | Unity Catalog | Horizon, tags, masking | Tests, docs, lineage graph |
| Scheduling | Workflows, DLT pipelines | Tasks, Streams | CI/CD, `dbt build` in GitHub Actions |
| Power BI | Gold via Databricks connector | Gold via Snowflake connector | Gold via underlying warehouse |
| Learning curve for SQL team | Higher (notebooks) | Lower (worksheets) | Low–medium (SQL + YAML) |
| Strong when | Scale, Spark, ML, streaming path | SQL ops, elastic DWH, sharing | Lineage, tests, transform-as-code |
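
To make the "`ref()` gives you lineage" point concrete, here is a simplified sketch of how dependencies can be read out of model SQL. It is not how dbt is implemented (dbt renders Jinja properly rather than using a regular expression), and the three model names and their SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with single or double quotes (simplified).
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# Hypothetical models: name -> SQL text as it might appear in a .sql file.
MODELS = {
    "gold_kpi_revenue": (
        "select pickup_date, sum(fare_amount) as revenue "
        "from {{ ref('silver_trips') }} group by pickup_date"
    ),
    "gold_kpi_tips": (
        "select pickup_date, avg(tip_amount) as avg_tip "
        "from {{ ref('silver_trips') }} group by pickup_date"
    ),
    "silver_trips": "select * from bronze.trips where fare_amount >= 0",
}


def lineage(models):
    """Map each model to the set of models it references through ref()."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models):
    """Return a run order in which every model comes after its dependencies."""
    return list(TopologicalSorter(lineage(models)).static_order())


graph = lineage(MODELS)
order = build_order(MODELS)
print(graph)
print(order)
```

Because every dependency is declared in the SQL itself, the lineage graph and the build order come from the same source, which is the argument for dbt under the audit constraint.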

### Example "reasonable" stacks (not answers to give — discussion seeds if stalled)

| Stack | When it fits YellowLine NYC |
|-------|--------------------------|
| Snowflake + dbt + Power BI | SQL team maintains; auditors need lineage |
| Databricks only + Power BI | Small eng team; ML/streaming on roadmap |
| Databricks ingest + Snowflake Gold + dbt | Rare split — discuss complexity cost |
| Snowflake only (no dbt) | Fast start; weaker lineage story for board |

---

## Optional Power BI Demo — Placement Options

| Option | When to use |
|--------|-------------|
| **A — Before discussion** | Visual payoff first; discussion references live dashboard |
| **B — After discussion** | Discussion stays abstract; demo as "Priya's deliverable" |
| **C — Skip live demo** | Point to `powerbi/README.md`; animation already showed full dashboard |

**Demo talking points** (2 min max if time-tight):

- Same `.pbix` connects to Databricks **or** Snowflake Gold — switch data source only
- Twelve `kpi_*` tables — no relationships required
- Priya's five questions from the Story are answered on five pages

---

## Handling Common Classroom Situations

| Situation | Response |
|-----------|----------|
| One person dominates | *"Thank you — let's hear someone who disagrees."* |
| Vendor debate gets heated | *"We're not picking a winner for MHP — we're advising Marcus."* |
| Trainee says "just use everything" | *"Marcus has budget for one primary platform. What do you cut?"* |
| Confusion on dbt | Draw platform box with dbt inside as transform layer |
| Room is quiet after Round 1 | Use constraint Card B or F — concrete scenarios unlock opinions |
| Running out of time | Skip Round 4 poll; keep synthesis table + close + Story revisit |

---

## Printable Facilitator Timing Card

```
MODULE 7 — OPEN DISCUSSION (~30 min)
────────────────────────────────────
[ ] Animation mod-07-wrapup.mp4     4 min
[ ] Silent write                     2 min
[ ] Round 1 — Share (3–4 speakers)   8 min
[ ] Round 2 — Challenge (2–3 cards)  8 min
[ ] Round 3 — Synthesis table        8 min
[ ] Round 4 — Story revisit          5 min  ← shorten if needed
[ ] Trainer close                    2 min
[ ] Power BI demo (optional)        10 min
────────────────────────────────────
OPEN:  "What would YOU choose for YellowLine NYC?"
CLOSE: Platform | Transform | Consumption — three decisions
```
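
A quick way to sanity-check the card before a session is to add up the minutes. The sketch below copies the durations from the card; the tuple layout and helper name are illustrative choices, not part of the course material.

```python
# (label, minutes, optional) copied from the timing card above.
AGENDA = [
    ("Animation mod-07-wrapup.mp4", 4, False),
    ("Silent write", 2, False),
    ("Round 1 - Share", 8, False),
    ("Round 2 - Challenge", 8, False),
    ("Round 3 - Synthesis table", 8, False),
    ("Round 4 - Story revisit", 5, False),
    ("Trainer close", 2, False),
    ("Power BI demo", 10, True),
]


def total_minutes(items, include_optional=False):
    """Sum the minutes, skipping optional items unless asked to include them."""
    return sum(
        minutes
        for _label, minutes, optional in items
        if include_optional or not optional
    )


rounds = [item for item in AGENDA if item[0].startswith("Round")]

rounds_1_to_3 = total_minutes(rounds[:3])
all_rounds = total_minutes(rounds)
core_block = total_minutes(AGENDA)
with_demo = total_minutes(AGENDA, include_optional=True)

print(f"Rounds 1-3: {rounds_1_to_3} min")
print(f"Rounds 1-4: {all_rounds} min")
print(f"Core block: {core_block} min")
print(f"With Power BI demo: {with_demo} min")
```

Rounds 1 to 3 alone come to 24 minutes and Round 4 brings the discussion to 29, which is why Round 4 is the item to shorten when the block has to fit a tighter slot.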

---

## Success Signals

Discussion succeeded if trainees:

- [ ] Named at least one strength **and** one weakness per tool
- [ ] Did not treat dbt as a warehouse replacement
- [ ] Referenced Marcus's SQL team or audit constraint unprompted
- [ ] Mentioned Priya / Power BI as consumption layer
- [ ] Changed or defended their Story design with new evidence

---

## Document History

| Date | Change |
|------|--------|
| 2026-05-23 | Initial Module 7 open discussion facilitation guide |
| 2026-05-23 | Expanded Module 9 ML constraint card follow-ups |
| 2026-05-23 | ex-batch-comparison opener; Cortex Module 6 vs 9 distinction |
| 2026-05-24 | Decision-matrix handout link; three-constraint reminder (cost / performance / compliance); closing tagline |