Module 6: AI Features
Cortex LLM assistants — not predictive ML
Duration: 45 min — Animation (3) · Think & Discuss (7) · Theory (15) · Quiz (3) · Practice (17)
1. Animation
2. Think & Discuss
Situation: Marcus saw Priya’s dashboard and asked whether AI can speed up analyst workflows. MHP demos assistants — medallion remains the source of truth.
Prompts:
- What task is Marcus trying to speed up — pipelines, SQL, or dashboards?
- Where would you trust AI in this project? Where would you not?
- If AI writes SQL against Silver, what must still exist for trustworthy answers?
3. Theory
| Module | Product | Examples |
|---|---|---|
| 6 — AI Features | Cortex LLM | AI_COMPLETE, Genie, Copilot |
| 9 — ML (optional) | Snowflake ML Functions | ML.FORECAST, Snowpark ML training |
Do not preview ML.FORECAST in Module 6.
Do not hardcode LLM IDs (mistral-large2, databricks-meta-llama-3-3-70b-instruct, etc.). Verify available models in your workspace/account before class.
Beyond AI_COMPLETE() (used hands-on below), Snowflake offers two additional AI services worth knowing:
- Cortex Analyst — translates natural language questions into SQL, but unlike Genie, it runs against a governed semantic model (a YAML definition of tables, columns, and business metrics). This ensures the AI only queries approved, well-defined columns — making it suitable for self-service analytics where governance matters.
- Cortex Search — provides hybrid search (semantic + keyword) over both unstructured documents and structured data, enabling RAG (Retrieval-Augmented Generation) patterns within Snowflake.
Databricks RAG equivalent: On Databricks, Mosaic AI Vector Search provides managed vector indexes for RAG patterns — embedding documents and retrieving relevant chunks to ground LLM responses. Both platforms fully support RAG; the choice depends on which ecosystem your data already lives in.
Snowflake Intelligence (announced 2025) bundles these services into a unified AI layer — think of it as the umbrella brand for all Cortex AI capabilities. For this workshop, the hands-on focus remains on AI_COMPLETE(), Genie, and notebook assistants.
3.1
Snowflake Cortex AI (20 min — Hands-on)
AI_COMPLETE — LLM-powered Data Analysis
Snowflake’s AI_COMPLETE() function runs LLMs directly in SQL queries:
SELECT
pickup_zone,
dropoff_zone,
time_of_day,
AI_COMPLETE(
'mistral-large2',
'Classify this NYC taxi trip as business, leisure, or commute: '
|| 'From ' || pickup_zone || ' to ' || dropoff_zone
|| ' during ' || time_of_day
) AS trip_classification
FROM silver_nyc_taxi_enriched
LIMIT 10;Additional built-in Cortex AI functions include AI_SUMMARIZE_AGG() for document summaries and AI_CLASSIFY() for categorization — all without model training or external API calls.
Cortex Code — AI-Assisted SQL
- Available in Snowsight SQL editor
- Generates SQL from natural language descriptions
- Explains existing queries
- Suggests optimizations
Exercise
See AI Features Exercise for guided Cortex AI exercises.
3.2
Databricks Genie (15 min — Guided Demo + Hands-on)
Databricks Assistant
AI coding assistant integrated into notebooks:
- Type a prompt → get PySpark code
- Explain existing code
- Debug errors
- Available in every notebook cell
Try these prompts: > “Write a PySpark query to find the top 5 routes by revenue” > “Explain this data quality filter” > “Add a column that classifies tips as low/medium/high”
AI Functions — ai_query()
Run LLMs on your data via SQL:
SELECT
pickup_zone,
ai_query(
'databricks-meta-llama-3-3-70b-instruct',
CONCAT('Is this zone a tourist area? Zone: ', pickup_zone)
) AS is_tourist_area
FROM gold_kpi_top_pickup_zones;Genie (formerly AI/BI Spaces)
Create a natural-language interface over Gold tables:
- In the Databricks sidebar, click Genie (under the SQL section)
- Create a new Genie space connected to your Gold schema
- Ask questions in plain English: “What hour has the most taxi trips?”
Beyond the Genie and Assistant features demonstrated here, Databricks offers the Mosaic AI stack for production AI workloads: Model Serving (managed REST endpoints for ML and LLM models), Vector Search (managed vector indexes for RAG), and Agent Framework (orchestration for multi-step AI agents). These are production-grade tools that extend beyond the workshop scope but are worth knowing for teams building AI applications on Databricks.
3.3
dbt Copilot (10 min — Trainer Demo)
Features (dbt Cloud Starter+ plans)
- Auto-generate documentation: AI writes model descriptions from SQL
- Auto-generate tests: AI suggests data tests based on column patterns
- SQL generation: Natural language → dbt SQL model
dbt MCP Server (Free — Trainees Can Use)
The dbt MCP server integrates dbt with AI coding assistants in VS Code:
- Query dbt documentation from your IDE
- Get model recommendations
- Auto-complete
ref()andsource()references
3.4 Discussion: Where Does AI Add Most Value?
Consider these scenarios:
| Scenario | Best AI Tool |
|---|---|
| Classify trip data at scale | |
| Generate boilerplate SQL | |
| Auto-document 50 dbt models | |
| Explore data via natural language | |
| Debug a failing query | Any AI assistant |
| Write data tests | dbt Copilot |
Key insight: AI is most valuable for repetitive, pattern-based tasks — not for architectural decisions or complex business logic.
AI on governed data only
Always run AI functions against Silver or Gold tables — never against raw Bronze. LLMs produce more reliable, consistent results when the input data is clean, typed, and validated. If AI_COMPLETE() generates a classification for a zone name that contains null or garbled values, the output is garbage. Clean data in, useful AI out.
AI outputs are not deterministic
LLM-based functions (AI_COMPLETE, ai_query()) are probabilistic — the same input can produce different outputs across runs. This means: (a) never use LLM output as a primary key or join condition, (b) always validate AI-generated classifications against known ground truth before trusting them at scale, and (c) do not put AI-generated columns into financial reports without human review. AI accelerates exploration; humans validate production KPIs.
3.5 Key Takeaways
- This module ≠ Module 9 — Module 6 covers LLM assistants (text generation, classification, code generation); Module 9 covers predictive ML (regression, forecasting)
AI_COMPLETE()andai_query()run LLMs inside SQL — no separate API server or infrastructure needed- Genie and Cortex Analyst provide natural-language interfaces over Gold tables — ideal for self-service analytics
- AI is most valuable for repetitive, pattern-based tasks: boilerplate SQL, data classification, documentation generation
- AI outputs are probabilistic — always validate against governed Silver/Gold data and never use as primary keys
- Cortex Analyst uses semantic models for governed self-service; Cortex Search enables RAG patterns
4. Quiz
Quiz: Module 6 — AI Features Quiz
Before moving on, make sure you can answer:
- What is the difference between
AI_COMPLETE()(Module 6) andML.FORECAST(Module 9)? - Why should AI functions run against Silver or Gold tables rather than Bronze?
- Name two scenarios where AI assistants save significant time in data engineering workflows.
5. Practice
Hands-on lab
Priya / Power BI: AI helps Priya explore Gold faster — it does not replace her KPI definitions or dashboard.
Next module
Module 7: Comparison & Wrap-up — Priya presents the finished dashboard. Marcus asks what to run in production — you decide.