Train delay prediction
Author: Mihai Nan
Easy
Problem Description
🚆 Train delay prediction ⏱️
🇷🇴 Context 🚃
The national railway company 🇷🇴 wants to predict the delay of a train (in minutes, integer) at its final station. For this purpose, you are provided with a dataset containing details of train trips over the past year.
For each trip, the following characteristics are known:
| Name | Type | Description |
|---|---|---|
| SampleID | int | Unique identifier for the sample |
| departure_time | string (HH:MM) | Train departure time |
| distance_km | float | Total distance of the route (km) |
| avg_speed_kmh | float | Actual average speed (km/h) |
| num_stops | int | Number of intermediate stops |
| weather | category | Weather conditions: sunny, rain, snow, fog |
| weekday | category | Day of the week |
| special_events | 0/1 | Exceptional events on the route |
| num_cars | int | Number of train cars |
| ticket_price | float | Ticket price |
| comfort_class | category | standard, intermediate, premium |
| delay_minutes | int | Target variable: train delay in minutes |
The delay_minutes information is only available in the training set (train.csv).
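A minimal sketch of turning these columns into numeric features, using only the Python standard library. The two sample rows are invented for illustration, as is the spelling of the weekday values, which the statement does not specify. The helper names `departure_minutes` and `encode_row` are hypothetical, and the cyclical encoding of the departure time is one possible choice rather than a required step.

```python
import csv
import io
import math

# Hypothetical two-row excerpt in the layout described by the table;
# the values (and the weekday spelling) are invented for illustration.
SAMPLE = """SampleID,departure_time,distance_km,avg_speed_kmh,num_stops,weather,weekday,special_events,num_cars,ticket_price,comfort_class,delay_minutes
0,07:30,120.5,80.0,4,rain,Monday,0,6,45.0,standard,12
1,18:05,310.0,95.5,7,sunny,Friday,1,8,120.0,premium,3
"""

# Weather categories as listed in the feature table.
WEATHER = ["sunny", "rain", "snow", "fog"]


def departure_minutes(hhmm: str) -> int:
    """Convert an HH:MM string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def encode_row(row: dict) -> list:
    """Turn one CSV row into a flat list of numeric features."""
    minutes = departure_minutes(row["departure_time"])
    angle = 2 * math.pi * minutes / (24 * 60)
    features = [
        math.sin(angle),  # cyclical encoding so 23:59 sits next to 00:00
        math.cos(angle),
        float(row["distance_km"]),
        float(row["avg_speed_kmh"]),
        int(row["num_stops"]),
        int(row["special_events"]),
    ]
    # One-hot encode the weather category.
    features += [1.0 if row["weather"] == w else 0.0 for w in WEATHER]
    return features


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
X = [encode_row(r) for r in rows]
y = [int(r["delay_minutes"]) for r in rows]
```

With the real data, `io.StringIO(SAMPLE)` would be replaced by an open handle on train.csv. The remaining columns (weekday, num_cars, ticket_price, comfort_class) can be encoded in the same way.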
🎯 Problem goal ⏱️
You need to train a model capable of predicting delay_minutes based on the other features.
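Before fitting a real model, it can help to know what a trivial predictor achieves. Among all constant predictions, the median of the training target minimises the sum of absolute errors, so it is a natural baseline for an MAE-scored task. The sketch below uses invented delay values, and `total_abs_error` is a hypothetical helper name.

```python
from statistics import median

# Invented training delays (minutes), for illustration only.
train_delays = [12, 3, 15, 0, 7, 40, 5]


def total_abs_error(delays, constant):
    """Sum of absolute errors when every trip is predicted as `constant`."""
    return sum(abs(d - constant) for d in delays)


median_pred = median(train_delays)
mean_pred = sum(train_delays) / len(train_delays)

# Under absolute error, the median never does worse than the mean.
median_err = total_abs_error(train_delays, median_pred)
mean_err = total_abs_error(train_delays, mean_pred)
```

Any trained model should beat the MAE of this constant prediction on held-out data; if it does not, the features or the training setup probably need another look.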
📝 Submission format
You must submit a CSV file (submission.csv) with the following format:
SampleID,delay_minutes
0,12
1,3
2,15
Where:
- SampleID must match the values in test.csv
- delay_minutes is your model's prediction, rounded to the nearest integer
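A minimal sketch of producing a file in this format with the standard `csv` module. The statement does not say how ties such as 2.5 should be rounded, so the round-half-up rule here is an assumption. The names `to_int_minutes` and `write_submission` are hypothetical, and the predictions are invented.

```python
import csv
import io
import math


def to_int_minutes(value: float) -> int:
    """Round half up (assumed rule); built-in round() rounds halves to even."""
    return math.floor(value + 0.5)


def write_submission(sample_ids, predictions, fh) -> None:
    """Write the header and one 'SampleID,delay_minutes' line per sample."""
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["SampleID", "delay_minutes"])
    for sample_id, pred in zip(sample_ids, predictions):
        writer.writerow([sample_id, to_int_minutes(pred)])


# Demonstration on an in-memory buffer with invented predictions.
buffer = io.StringIO()
write_submission([0, 1, 2], [11.6, 2.5, 15.2], buffer)
submission_text = buffer.getvalue()
```

To write the actual file, pass a handle opened with `open("submission.csv", "w", newline="")` in place of the buffer.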
📊 Evaluation
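The evaluation will be based on MAE (Mean Absolute Error):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

where $y_i$ is the actual delay, $\hat{y}_i$ is the predicted delay, and $n$ is the number of test samples.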
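For local validation, MAE can be computed in a few lines of plain Python. This is a sketch with invented values; the function name `mean_absolute_error` is chosen for illustration and is not taken from the platform's grader.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented example: absolute errors are 2, 0 and 4 minutes.
actual = [12, 3, 15]
predicted = [10, 3, 19]
error = mean_absolute_error(actual, predicted)  # (2 + 0 + 4) / 3 = 2.0
```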
The final score is calculated according to the following rules:
- MAE ≤ 5 → 100 points
- MAE ≥ 20 → 0 points
- An MAE between 5 and 20 receives a proportional score between 0 and 100.
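The scoring rules can be sketched as a small function. The statement only says the intermediate score is "proportional", so the linear interpolation between the two thresholds used here is an assumption, and `score_from_mae` is a hypothetical name.

```python
def score_from_mae(mae: float) -> float:
    """Map an MAE to points, assuming linear interpolation between 5 and 20."""
    if mae <= 5:
        return 100.0
    if mae >= 20:
        return 0.0
    # Assumed linear scale: 100 points at MAE 5, falling to 0 at MAE 20.
    return 100.0 * (20.0 - mae) / 15.0


halfway = score_from_mae(12.5)  # midpoint of the interval gives 50.0
```

Under this assumption, each additional minute of MAE above 5 costs about 6.67 points.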