Data Model Reference

Schemas, columns, and 12 core KPIs

title: “Data Model Reference” subtitle: “Schemas, columns, and 12 core KPIs” —

Bronze Schema (19 columns)

Raw data from NYC TLC, unchanged except for metadata additions.

Column Type Description
VendorID Integer TPEP provider (1=CMT, 2=VeriFone)
tpep_pickup_datetime Timestamp Meter start time
tpep_dropoff_datetime Timestamp Meter stop time
passenger_count Integer Number of passengers
trip_distance Double Trip distance in miles
RatecodeID Integer Rate code (1=Standard, 2=JFK, etc.)
store_and_fwd_flag String Store-and-forward flag
PULocationID Integer Pickup taxi zone ID
DOLocationID Integer Dropoff taxi zone ID
payment_type Integer Payment method (1=Credit, 2=Cash, etc.)
fare_amount Double Base fare
extra Double Extra charges
mta_tax Double MTA tax
tip_amount Double Tip (credit card only)
tolls_amount Double Bridge/tunnel tolls
improvement_surcharge Double Improvement surcharge
total_amount Double Total charged to passenger
congestion_surcharge Double NYC congestion surcharge
airport_fee Double Airport pickup fee

Metadata columns (added during Bronze ingestion): - source_file — original filename - bronze_processing_timestamp — when ingested - data_year, data_month — extracted from pickup datetime

Silver Schema (derived columns)

All Bronze columns plus these derived fields:

Column Type Derivation
trip_duration Double (dropoff - pickup) in minutes
fare_per_mile Double fare_amount / trip_distance
tip_percentage Double tip_amount / fare_amount * 100
avg_speed Double trip_distance / (duration_hours)
pickup_hour Integer HOUR(pickup_datetime)
pickup_day_of_week Integer DAYOFWEEK(pickup_datetime)
is_weekend Boolean Saturday or Sunday
is_peak_hour Boolean 7-9 AM or 4-7 PM
time_of_day String Morning Rush / Midday / Evening Rush / Night
distance_band String Short / Medium / Long / Very Long
payment_type_desc String Credit Card / Cash / No Charge / Dispute
rate_code_desc String Standard Rate / JFK / Newark / etc.
pickup_zone String Zone name from lookup
pickup_borough String Borough from lookup
dropoff_zone String Zone name from lookup
dropoff_borough String Borough from lookup
is_same_borough Boolean pickup_borough == dropoff_borough

Data Quality Rules

Applied during Silver transformation:

Rule Filter Condition
Valid timestamps pickup_datetime IS NOT NULL AND dropoff_datetime IS NOT NULL
Pickup before dropoff pickup_datetime <= dropoff_datetime
Reasonable duration trip_duration BETWEEN 1 AND 1440 minutes
Positive distance trip_distance > 0
Positive fare fare_amount > 0
Positive total total_amount > 0
Valid passengers passenger_count BETWEEN 1 AND 8
Negative tip correction CASE WHEN tip_amount < 0 THEN 0 ELSE tip_amount END
Deduplication Remove exact duplicate rows

Zone Lookup (4 columns)

Column Type Description
LocationID Integer Zone identifier (1-265)
Borough String NYC borough name
Zone String Neighborhood/zone name
service_zone String Service zone category

12 Core KPIs

KPI 1: Trips by Hour

Column Description
pickup_hour Hour of day (0-23)
total_trips Trip count
total_revenue Sum of total_amount
avg_distance Mean trip distance
peak_hour_trips Trips during peak hours
weekend_trips Trips on weekends

KPI 2: Trips by Day

Column Description
day_of_week Day name
total_trips Trip count
total_revenue Sum of total_amount
avg_fare Mean total_amount

KPI 3: Time of Day Analysis

Column Description
time_of_day Morning Rush / Midday / Evening Rush / Night
total_trips Trip count
total_revenue Sum of total_amount
avg_distance Mean trip distance

KPI 4: Top Pickup Zones

Column Description
pickup_zone Zone name
pickup_borough Borough
total_trips Trip count
trip_rank Rank by trip count (top 20)

KPI 5: Borough Analysis

Column Description
pickup_borough Origin borough
dropoff_borough Destination borough
total_trips Trip count
total_revenue Sum of total_amount

KPI 7: Distance Bands

Column Description
distance_band Short / Medium / Long / Very Long
total_trips Trip count
avg_fare_per_mile Mean fare per mile

KPI 8: Passenger Count Analysis

Column Description
passenger_count Number of passengers
total_trips Trip count
pct_of_total Percentage of all trips

KPI 9: Revenue by Hour

Column Description
pickup_hour Hour of day
total_revenue Sum of total_amount
total_tips Sum of tip_amount
avg_tip_pct Mean tip percentage

KPI 10: Payment Type Analysis

Column Description
payment_type_desc Credit Card / Cash / etc.
total_trips Trip count
tip_rate_pct Percentage of trips with tips

KPI 11: Trip Efficiency

Column Description
distance_band Distance category
time_of_day Time period
avg_speed_mph Mean speed
revenue_per_minute Revenue efficiency

KPI 12: Data Quality Metrics

Column Description
bronze_total Raw record count
silver_total Post-filter record count
filter_pct Percentage filtered out
avg_data_quality_score Composite quality score