title: “Data Model Reference” subtitle: “Schemas, columns, and 12 core KPIs” —
Bronze Schema (19 columns)
Raw data from NYC TLC, unchanged except for metadata additions.
| VendorID |
Integer |
TPEP provider (1=CMT, 2=VeriFone) |
| tpep_pickup_datetime |
Timestamp |
Meter start time |
| tpep_dropoff_datetime |
Timestamp |
Meter stop time |
| passenger_count |
Integer |
Number of passengers |
| trip_distance |
Double |
Trip distance in miles |
| RatecodeID |
Integer |
Rate code (1=Standard, 2=JFK, etc.) |
| store_and_fwd_flag |
String |
Store-and-forward flag |
| PULocationID |
Integer |
Pickup taxi zone ID |
| DOLocationID |
Integer |
Dropoff taxi zone ID |
| payment_type |
Integer |
Payment method (1=Credit, 2=Cash, etc.) |
| fare_amount |
Double |
Base fare |
| extra |
Double |
Extra charges |
| mta_tax |
Double |
MTA tax |
| tip_amount |
Double |
Tip (credit card only) |
| tolls_amount |
Double |
Bridge/tunnel tolls |
| improvement_surcharge |
Double |
Improvement surcharge |
| total_amount |
Double |
Total charged to passenger |
| congestion_surcharge |
Double |
NYC congestion surcharge |
| airport_fee |
Double |
Airport pickup fee |
Metadata columns (added during Bronze ingestion): - source_file — original filename - bronze_processing_timestamp — when ingested - data_year, data_month — extracted from pickup datetime
Silver Schema (derived columns)
All Bronze columns plus these derived fields:
| trip_duration |
Double |
(dropoff - pickup) in minutes |
| fare_per_mile |
Double |
fare_amount / trip_distance |
| tip_percentage |
Double |
tip_amount / fare_amount * 100 |
| avg_speed |
Double |
trip_distance / (duration_hours) |
| pickup_hour |
Integer |
HOUR(pickup_datetime) |
| pickup_day_of_week |
Integer |
DAYOFWEEK(pickup_datetime) |
| is_weekend |
Boolean |
Saturday or Sunday |
| is_peak_hour |
Boolean |
7-9 AM or 4-7 PM |
| time_of_day |
String |
Morning Rush / Midday / Evening Rush / Night |
| distance_band |
String |
Short / Medium / Long / Very Long |
| payment_type_desc |
String |
Credit Card / Cash / No Charge / Dispute |
| rate_code_desc |
String |
Standard Rate / JFK / Newark / etc. |
| pickup_zone |
String |
Zone name from lookup |
| pickup_borough |
String |
Borough from lookup |
| dropoff_zone |
String |
Zone name from lookup |
| dropoff_borough |
String |
Borough from lookup |
| is_same_borough |
Boolean |
pickup_borough == dropoff_borough |
Data Quality Rules
Applied during Silver transformation:
| Valid timestamps |
pickup_datetime IS NOT NULL AND dropoff_datetime IS NOT NULL |
| Pickup before dropoff |
pickup_datetime <= dropoff_datetime |
| Reasonable duration |
trip_duration BETWEEN 1 AND 1440 minutes |
| Positive distance |
trip_distance > 0 |
| Positive fare |
fare_amount > 0 |
| Positive total |
total_amount > 0 |
| Valid passengers |
passenger_count BETWEEN 1 AND 8 |
| Negative tip correction |
CASE WHEN tip_amount < 0 THEN 0 ELSE tip_amount END |
| Deduplication |
Remove exact duplicate rows |
Zone Lookup (4 columns)
| LocationID |
Integer |
Zone identifier (1-265) |
| Borough |
String |
NYC borough name |
| Zone |
String |
Neighborhood/zone name |
| service_zone |
String |
Service zone category |
12 Core KPIs
KPI 1: Trips by Hour
| pickup_hour |
Hour of day (0-23) |
| total_trips |
Trip count |
| total_revenue |
Sum of total_amount |
| avg_distance |
Mean trip distance |
| peak_hour_trips |
Trips during peak hours |
| weekend_trips |
Trips on weekends |
KPI 2: Trips by Day
| day_of_week |
Day name |
| total_trips |
Trip count |
| total_revenue |
Sum of total_amount |
| avg_fare |
Mean total_amount |
KPI 3: Time of Day Analysis
| time_of_day |
Morning Rush / Midday / Evening Rush / Night |
| total_trips |
Trip count |
| total_revenue |
Sum of total_amount |
| avg_distance |
Mean trip distance |
KPI 4: Top Pickup Zones
| pickup_zone |
Zone name |
| pickup_borough |
Borough |
| total_trips |
Trip count |
| trip_rank |
Rank by trip count (top 20) |
KPI 5: Borough Analysis
| pickup_borough |
Origin borough |
| dropoff_borough |
Destination borough |
| total_trips |
Trip count |
| total_revenue |
Sum of total_amount |
KPI 6: Popular Routes
| pickup_zone / dropoff_zone |
Route endpoints |
| total_trips |
Trip count |
| route_rank |
Rank by trip count (top 50) |
KPI 7: Distance Bands
| distance_band |
Short / Medium / Long / Very Long |
| total_trips |
Trip count |
| avg_fare_per_mile |
Mean fare per mile |
KPI 8: Passenger Count Analysis
| passenger_count |
Number of passengers |
| total_trips |
Trip count |
| pct_of_total |
Percentage of all trips |
KPI 9: Revenue by Hour
| pickup_hour |
Hour of day |
| total_revenue |
Sum of total_amount |
| total_tips |
Sum of tip_amount |
| avg_tip_pct |
Mean tip percentage |
KPI 10: Payment Type Analysis
| payment_type_desc |
Credit Card / Cash / etc. |
| total_trips |
Trip count |
| tip_rate_pct |
Percentage of trips with tips |
KPI 11: Trip Efficiency
| distance_band |
Distance category |
| time_of_day |
Time period |
| avg_speed_mph |
Mean speed |
| revenue_per_minute |
Revenue efficiency |
KPI 12: Data Quality Metrics
| bronze_total |
Raw record count |
| silver_total |
Post-filter record count |
| filter_pct |
Percentage filtered out |
| avg_data_quality_score |
Composite quality score |