Aiven Streaming Setup Guide

Kafka cluster, SSL certificates, and secrets for Module 8 (Optional)


Note: Trainer-managed setup

Most of the Aiven configuration is done once by the trainer before the workshop. Trainees only need to add the SSL credentials to Databricks Secrets, prepare their Databricks cluster, and run the Snowflake SQL file. This guide covers both the trainer setup and the trainee steps.

Overview

Module 8 uses Aiven Free Kafka as the streaming broker.

Component | Who sets it up | What it does
Aiven Kafka cluster | Trainer | Apache Kafka broker (free tier)
User Activity generator | Trainer | Publishes simulated events automatically
SSL certificates | Trainer downloads, shares | Authenticates consumers to Kafka
Databricks Secrets | Trainee | Stores SSL certs for notebook access
Relay consumer (00_relay_consumer.py) | Trainer | Writes Kafka events to ADLS2 as NDJSON
Snowflake stage + Snowpipe | Trainee (SQL file) | Loads NDJSON files into the Bronze table

Trainer Setup Steps

1. Create the Aiven Kafka cluster

  1. Go to aiven.io → Log in → Create a new service
  2. Select Apache Kafka
  3. Choose Free plan (no credit card required)
  4. Region: azure-westeurope (same as workshop storage)
  5. Service name: mhp-de-workshop-kafka
  6. Click Create service — takes ~2 minutes to provision
Warning: Free tier limits
  • 2 partitions per topic (maximum)
  • 250 KiB/s throughput
  • 5 topics maximum
  • Idle shutdown after 24 hours of inactivity
  • One free Kafka service per organisation

2. Start the User Activity generator

  1. Open the Aiven Console → select your Kafka service
  2. Open the Sample data tab
  3. Select User Activity scenario
  4. Duration: 4 hours (maximum for free tier)
  5. Click Start generating — events start flowing immediately to a user-activity topic

The generator publishes events in Avro format using the built-in Karapace Schema Registry.

Event schema:

Field | Type | Example
timestamp | ISO 8601 string | 2026-04-04T10:23:45Z
user_id | string | user-abc123
action | string | view, click, scroll, search, purchase
page | string | /products/shoes
country | string | DE, US, GB, FR
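Karapace implements the Confluent Schema Registry API, so, assuming the generator frames messages in the standard Confluent wire format, each Kafka message value starts with a 5-byte header (magic byte 0 followed by a 4-byte big-endian schema ID) in front of the Avro body. Decoders that expect plain Avro, such as open-source Spark's from_avro or fastavro's schemaless_reader, need that header removed first. A minimal sketch using only the Python standard library; split_wire_format is a hypothetical helper name:

```python
import struct

# Assumed Confluent wire format: magic byte 0, then a 4-byte big-endian schema ID.
_HEADER = struct.Struct(">bI")
MAGIC_BYTE = 0


def split_wire_format(value: bytes) -> tuple[int, bytes]:
    """Return (schema_id, avro_body) for a schema-registry-framed Kafka value."""
    if len(value) < _HEADER.size:
        raise ValueError(f"value too short for a wire-format header: {len(value)} bytes")
    magic, schema_id = _HEADER.unpack_from(value)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}; value may not be registry-framed")
    return schema_id, value[_HEADER.size:]


# Example: schema ID 42 followed by a (fake) Avro body.
framed = b"\x00" + (42).to_bytes(4, "big") + b"avro-body"
schema_id, body = split_wire_format(framed)
print(schema_id, body)  # 42 b'avro-body'
```

The schema ID identifies the writer schema in Karapace; the body can then be passed to fastavro.schemaless_reader together with that parsed schema. In Spark, the equivalent step is expr("substring(value, 6)") on the binary value column before from_avro.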

3. Download SSL certificates

  1. In the Aiven Console → your Kafka service → Connection information
  2. Under Available protocols → Kafka — download all three:
    • CA Certificate → save as ca.pem
    • Access Certificate → save as service.cert
    • Access Key → save as service.key
  3. Note the Service URI — format: kafka-xxxxx.aivencloud.com:12345
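Before sharing the files, it can help to confirm that the three files and the Service URI belong together. The sketch below, using only Python's standard library, opens a mutual-TLS connection to the broker port: load_cert_chain fails if service.key does not match service.cert, and the handshake fails if ca.pem does not verify the broker's certificate. It does not speak the Kafka protocol, and both function names are hypothetical.

```python
import socket
import ssl


def parse_service_uri(uri: str) -> tuple[str, int]:
    """Split an Aiven Service URI of the form host:port into its parts."""
    host, sep, port = uri.strip().rpartition(":")
    if not sep or not host or "/" in host or not port.isdigit():
        raise ValueError(f"expected host:port, got {uri!r}")
    return host, int(port)


def check_tls_handshake(uri: str, ca_path: str, cert_path: str, key_path: str,
                        timeout: float = 10.0) -> str:
    """Complete a mutual-TLS handshake with the broker; return the TLS version.

    Raises ssl.SSLError (or a subclass) if the client key does not match the
    client certificate or if the CA file does not verify the server certificate.
    """
    host, port = parse_service_uri(uri)
    context = ssl.create_default_context(cafile=ca_path)  # trust only the Aiven CA
    context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, check_tls_handshake("kafka-xxxxx.aivencloud.com:12345", "ca.pem", "service.cert", "service.key") returns the negotiated protocol name (such as TLSv1.3) when the credentials are consistent.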
Tip: Share credentials securely

Share the Service URI and cert content with trainees via the workshop credential sheet. Never commit cert files to Git.

4. Start the relay consumer

The relay consumer (streaming/snowflake/00_relay_consumer.py) reads from Kafka and writes NDJSON files to ADLS2 so Snowpipe can ingest them.

pip install kafka-python fastavro azure-storage-blob

export AIVEN_BOOTSTRAP_SERVERS="kafka-xxxxx.aivencloud.com:12345"
export AIVEN_TOPIC="user-activity"
export AIVEN_CA_CERT_PATH="/path/to/ca.pem"
export AIVEN_CLIENT_CERT_PATH="/path/to/service.cert"
export AIVEN_CLIENT_KEY_PATH="/path/to/service.key"
export AZURE_STORAGE_ACCOUNT="mhpdeworkshopsa"
export AZURE_STORAGE_KEY="<storage-account-key>"

python streaming/snowflake/00_relay_consumer.py

Files land in: https://mhpdeworkshopsa.blob.core.windows.net/nyc-taxi-data/streaming/user-activity/
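The script itself is the reference, but the core of such a relay can be sketched as follows, assuming kafka-python for consuming and azure-storage-blob for writing. It reuses the environment variables listed above; the helper names, the date-partitioned file names, and the one-file-per-batch design are assumptions for illustration. Avro decoding is left out, so records are taken to be already-decoded dicts.

```python
import json
import os
from datetime import datetime, timezone
from typing import Iterable, Mapping

CONTAINER = "nyc-taxi-data"          # container from the landing URL above
PREFIX = "streaming/user-activity"   # folder the Snowflake stage reads from (assumed)


def kafka_ssl_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """KafkaConsumer keyword arguments built from the AIVEN_* variables."""
    return {
        "bootstrap_servers": env["AIVEN_BOOTSTRAP_SERVERS"],
        "security_protocol": "SSL",
        "ssl_cafile": env["AIVEN_CA_CERT_PATH"],
        "ssl_certfile": env["AIVEN_CLIENT_CERT_PATH"],
        "ssl_keyfile": env["AIVEN_CLIENT_KEY_PATH"],
    }


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize decoded events as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def blob_name(now: datetime) -> str:
    """Hypothetical date-partitioned file name under PREFIX."""
    return f"{PREFIX}/{now:%Y/%m/%d}/batch-{now:%H%M%S%f}.ndjson"


def upload_batch(records: list[dict]) -> str:
    """Write one NDJSON file to ADLS2 and return its blob name."""
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

    account = os.environ["AZURE_STORAGE_ACCOUNT"]
    service = BlobServiceClient(
        account_url=f"https://{account}.blob.core.windows.net",
        credential=os.environ["AZURE_STORAGE_KEY"],  # storage account key
    )
    name = blob_name(datetime.now(timezone.utc))
    service.get_container_client(CONTAINER).upload_blob(name=name, data=to_ndjson(records))
    return name
```

A consumer loop would create KafkaConsumer(os.environ["AIVEN_TOPIC"], **kafka_ssl_kwargs()) from the kafka package, decode each message value, collect a batch, and pass it to upload_batch. Writing small complete files, rather than appending to one blob, fits how Snowpipe loads each new file once.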


Trainee Setup Steps

1. Add Aiven credentials to Databricks Secrets

The Databricks streaming notebooks read SSL credentials from the Databricks Secrets store. The trainer will share the values during the session.

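Run the following from a terminal where you can install the Databricks CLI. The CLI needs your workspace URL and a personal access token, which databricks configure --token prompts for; running it from a notebook cell requires setting up the same credentials there.

# Install the legacy Databricks CLI if not present, then authenticate
pip install databricks-cli
databricks configure --token

# Create the secrets scope (skip if it already exists).
# On workspaces without the Premium plan, append: --initial-manage-principal users
databricks secrets create-scope --scope workshop-scope

# Add each secret (values provided by trainer). Without --string-value the CLI
# opens a text editor; "$(cat file)" passes the PEM content with its line breaks.
# Newer CLI (v0.205+): databricks secrets put-secret workshop-scope <key> --string-value ...
databricks secrets put --scope workshop-scope --key aiven-bootstrap-servers --string-value "kafka-xxxxx.aivencloud.com:12345"
databricks secrets put --scope workshop-scope --key aiven-ca-cert --string-value "$(cat ca.pem)"
databricks secrets put --scope workshop-scope --key aiven-client-cert --string-value "$(cat service.cert)"
databricks secrets put --scope workshop-scope --key aiven-client-key --string-value "$(cat service.key)"
databricks secrets put --scope workshop-scope --key aiven-topic --string-value "user-activity"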
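Tip: Databricks UI

The workspace UI generally cannot add secrets to a Databricks-backed scope such as workshop-scope, so use the CLI commands above (or the Secrets REST API). The only UI page for secrets, https://<databricks-instance>#secrets/createScope, creates Azure Key Vault-backed scopes, whose values are then managed in Azure Key Vault rather than in Databricks.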

Secret names used by the notebooks:

Scope/key | Value source
workshop-scope/aiven-bootstrap-servers | Service URI from Aiven Console
workshop-scope/aiven-ca-cert | Contents of ca.pem
workshop-scope/aiven-client-cert | Contents of service.cert
workshop-scope/aiven-client-key | Contents of service.key
workshop-scope/aiven-topic | user-activity
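As an alternative to the CLI, the same scope and secrets can be created from Python with the Databricks SDK (pip install databricks-sdk). This is a sketch under assumptions: the certificate files sit in one local folder under the names from the download step, and read_pem_files, build_secret_values, and push_secrets are hypothetical helpers, not part of the workshop repo. WorkspaceClient picks up credentials from DATABRICKS_HOST and DATABRICKS_TOKEN or a configured CLI profile.

```python
from pathlib import Path

SCOPE = "workshop-scope"

# Secret key -> file name from the Aiven download step.
PEM_FILES = {
    "aiven-ca-cert": "ca.pem",
    "aiven-client-cert": "service.cert",
    "aiven-client-key": "service.key",
}


def read_pem_files(cert_dir: str) -> dict[str, str]:
    """Read the three downloaded files, keyed by the secret they belong to."""
    return {key: Path(cert_dir, name).read_text() for key, name in PEM_FILES.items()}


def build_secret_values(pem_texts: dict[str, str], bootstrap_servers: str,
                        topic: str = "user-activity") -> dict[str, str]:
    """Collect the five values the notebooks expect, with stray whitespace removed."""
    values = {
        "aiven-bootstrap-servers": bootstrap_servers.strip(),
        "aiven-topic": topic.strip(),
    }
    for key in PEM_FILES:
        # Keep the PEM's internal line breaks; normalize to one trailing newline.
        values[key] = pem_texts[key].strip() + "\n"
    return values


def push_secrets(values: dict[str, str], scope: str = SCOPE) -> None:
    """Create the scope if needed, then upload every value (needs databricks-sdk)."""
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    if scope not in {s.name for s in w.secrets.list_scopes()}:
        w.secrets.create_scope(scope=scope)
    for key, value in values.items():
        w.secrets.put_secret(scope=scope, key=key, string_value=value)
```

For example: push_secrets(build_secret_values(read_pem_files("./aiven-certs"), "kafka-xxxxx.aivencloud.com:12345")), where ./aiven-certs is a placeholder folder. Stripping before upload keeps copy-paste whitespace out of the secrets.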

2. Install Maven libraries on your Databricks cluster

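The streaming notebooks use Spark's Kafka source and the Avro function from_avro. Recent Databricks Runtime versions generally ship both built in, so these packages are mainly a fallback for when the cluster raises ClassNotFoundException (see Troubleshooting). If you install them, keep the Scala suffix (_2.12) and Spark version (3.5.0) in line with your cluster's runtime.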

  1. Open your Databricks workspace → Compute → select your cluster
  2. Click Libraries tab → Install new
  3. Source: Maven. Install both packages:
Maven coordinates | Purpose
org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 | Kafka source/sink for Structured Streaming
org.apache.spark:spark-avro_2.12:3.5.0 | Avro decoding (from_avro)
  4. Restart the cluster after installing
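If your cluster runs a different Spark version, the coordinates must change with it. In a notebook, spark.version reports the Spark version, and the Databricks Runtime release notes list the Scala version. A small helper that builds matching coordinates is sketched below; it assumes the Scala 2.12 build unless told otherwise, and its name is hypothetical.

```python
def spark_maven_coordinates(spark_version: str, scala_binary: str = "2.12") -> list[str]:
    """Kafka and Avro connector coordinates for a given Spark and Scala version."""
    parts = spark_version.strip().split(".")
    if len(parts) < 3 or not all(p.isdigit() for p in parts[:3]):
        raise ValueError(f"unexpected Spark version string: {spark_version!r}")
    version = ".".join(parts[:3])
    return [
        f"org.apache.spark:spark-sql-kafka-0-10_{scala_binary}:{version}",
        f"org.apache.spark:spark-avro_{scala_binary}:{version}",
    ]


# Reproduces the two coordinates in the table for Spark 3.5.0 / Scala 2.12.
for coordinate in spark_maven_coordinates("3.5.0"):
    print(coordinate)
```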

3. Set up Snowflake for streaming (Bronze + Snowpipe)

  1. Open a Snowsight SQL Worksheet
  2. Open streaming/snowflake/01_setup_streaming.sql
  3. Replace {ATTENDEE_ID} with your assigned ID (uppercase)
  4. Replace <sas-token-from-trainer> with the SAS token provided during the session
  5. Run all statements
  6. Verify that the Snowpipe exists and is running (substitute your attendee ID in the pipe name as well):
SELECT SYSTEM$PIPE_STATUS(
    'DE_MASTERCLASS.{ATTENDEE_ID}_STREAMING.STREAMING_PIPE_USER_ACTIVITY'
);
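Steps 3 and 4 are plain text substitutions, so they can also be scripted if you prefer to prepare the file outside Snowsight. The sketch below assumes the placeholders appear exactly as {ATTENDEE_ID} and <sas-token-from-trainer>, and both function names are hypothetical. It also shows how to read the verification result: SYSTEM$PIPE_STATUS returns a JSON string in which executionState is RUNNING for an active pipe and pendingFileCount counts files waiting to load.

```python
import json


def render_setup_sql(template: str, attendee_id: str, sas_token: str) -> str:
    """Fill the two placeholders in 01_setup_streaming.sql."""
    attendee = attendee_id.strip().upper()  # the guide asks for the ID in uppercase
    if not attendee or not attendee.replace("_", "").isalnum():
        raise ValueError(f"attendee ID should contain only letters, digits, or underscores: {attendee_id!r}")
    return (template.replace("{ATTENDEE_ID}", attendee)
            .replace("<sas-token-from-trainer>", sas_token.strip()))


def summarize_pipe_status(pipe_status_json: str) -> tuple[bool, int]:
    """Return (is_running, pending_file_count) from SYSTEM$PIPE_STATUS output."""
    status = json.loads(pipe_status_json)
    return status.get("executionState") == "RUNNING", int(status.get("pendingFileCount", 0))
```

A pipe that is RUNNING with files still pending is usually just catching up; a state other than RUNNING points back to the setup SQL.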

Verify Everything is Working

Databricks — confirm Kafka connection (Bronze notebook cell 1):

# Run first cell of 01_streaming_bronze.py
# Should print: "Aiven SSL config ready"
# Should NOT raise: AuthenticationException or ssl.SSLError
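For reference, one way the Bronze notebook could turn the five secrets into Kafka options is to pass the PEM text directly, using the PEM keystore and truststore support that Kafka clients gained in version 2.7 (KIP-651). This is an assumption about the approach, not a copy of 01_streaming_bronze.py; it also assumes service.key is an unencrypted PKCS#8 key, the format Kafka's PEM loader accepts. The function name is hypothetical; the "kafka."-prefixed keys are how Spark forwards settings to the Kafka client.

```python
def aiven_kafka_options(bootstrap: str, topic: str, ca_pem: str,
                        client_cert_pem: str, client_key_pem: str) -> dict[str, str]:
    """Spark Kafka source options for Aiven, passing PEM strings directly.

    Assumes a Kafka client of version 2.7 or newer (PEM support) and an
    unencrypted PKCS#8 private key, so no ssl.key.password is set.
    """
    return {
        "kafka.bootstrap.servers": bootstrap.strip(),
        "kafka.security.protocol": "SSL",
        # Trust the Aiven CA for the broker's certificate.
        "kafka.ssl.truststore.type": "PEM",
        "kafka.ssl.truststore.certificates": ca_pem.strip(),
        # Client certificate and key for mutual TLS.
        "kafka.ssl.keystore.type": "PEM",
        "kafka.ssl.keystore.certificate.chain": client_cert_pem.strip(),
        "kafka.ssl.keystore.key": client_key_pem.strip(),
        "subscribe": topic.strip(),
        "startingOffsets": "latest",
    }
```

In a notebook, the arguments would come from dbutils.secrets.get(scope="workshop-scope", key="aiven-ca-cert") and the other four keys, and the result plugs into spark.readStream.format("kafka").options(**opts).load().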

Snowflake — confirm Bronze rows are landing:

-- Run this every 30 seconds after relay consumer starts
SELECT COUNT(*), MAX(ingest_ts) AS last_row
FROM DE_MASTERCLASS.{ATTENDEE_ID}_STREAMING.STREAMING_BRONZE_USER_ACTIVITY;

Expect first rows within 1–2 minutes of the relay consumer starting.
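Re-running the count by hand every 30 seconds works; the same check can also be scripted. This sketch polls a caller-supplied count_rows function and gives up after 3 minutes, the point at which the troubleshooting table suggests checking the relay consumer. count_rows could, for example, run the query above through snowflake-connector-python and return fetchone()[0]; that wiring and the function name are assumptions.

```python
import time
from typing import Callable, Optional


def wait_for_bronze_rows(count_rows: Callable[[], int], timeout_s: float = 180,
                         interval_s: float = 30,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> Optional[int]:
    """Poll count_rows() every interval_s seconds until it returns a positive count.

    Returns the first positive count, or None if the next poll would fall
    after timeout_s. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    while True:
        count = count_rows()
        if count > 0:
            return count
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```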


Troubleshooting

Issue | Solution
ssl.SSLError: certificate verify failed | Check that the aiven-ca-cert secret holds the full content of ca.pem, including the BEGIN/END lines, with no extra leading or trailing whitespace
AuthenticationException | Verify that aiven-bootstrap-servers uses the hostname:port format
Databricks: ClassNotFoundException: kafka | Maven library not installed, or cluster not restarted after installing
Databricks: ClassNotFoundException: from_avro | spark-avro library not installed
Bronze table still empty after 3 minutes | Confirm the relay consumer is running; check the ADLS2 container for NDJSON files
Snowpipe not loading | Run ALTER PIPE ... REFRESH manually, then check the pipe status with SYSTEM$PIPE_STATUS
Aiven service offline | Free-tier services shut down after 24 hours idle; restart the service from the Aiven Console
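For the first row of the table, pasted certificates can be checked programmatically. This sketch, with hypothetical function names, flags surrounding whitespace, Windows line endings, and truncated or mismatched BEGIN/END blocks. The expected labels assume CERTIFICATE for ca.pem and service.cert and PRIVATE KEY for service.key. In a notebook, pass it the value from dbutils.secrets.get and print only the returned list; Databricks redacts secret values in notebook output, so printing the secret itself shows [REDACTED].

```python
import re

# One complete PEM block: BEGIN label, base64 body, END with the same label.
_PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----\s+[A-Za-z0-9+/=\s]+?-----END \1-----")


def normalize_pem(text: str) -> str:
    """Convert CRLF to LF and trim surrounding whitespace, keeping one trailing newline."""
    return text.replace("\r\n", "\n").strip() + "\n"


def pem_problems(name: str, text: str, expected_label: str) -> list[str]:
    """List likely reasons a pasted PEM secret fails SSL verification."""
    problems = []
    stripped = text.strip()
    # A single trailing newline (or none) is fine; anything else is flagged.
    if "\r" in text or text not in (stripped, stripped + "\n"):
        problems.append(f"{name}: extra leading/trailing whitespace or CRLF line endings")
    labels = [m.group(1) for m in _PEM_BLOCK.finditer(text)]
    if not labels:
        problems.append(f"{name}: no complete BEGIN/END block (truncated paste?)")
    elif not any(expected_label in label for label in labels):
        problems.append(f"{name}: expected a {expected_label} block, found {labels}")
    return problems
```

An empty list means the value at least looks like a well-formed PEM of the right type; normalize_pem gives the cleaned text to store back with the CLI.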