Author: Mihai Nan
The national railway company 🇷🇴 wants to predict the delay of a train (in minutes, integer) at its final station. For this purpose, you are provided with a dataset containing details of train trips over the past year.
For each trip, the following characteristics are known:
| Name | Type | Description |
|---|---|---|
| SampleID | int | Unique identifier for the sample |
| departure_time | string (HH:MM) | Train departure time |
| distance_km | float | Total distance of the route, in km |
| avg_speed_kmh | float | Actual average speed, in km/h |
| num_stops | int | Number of intermediate stops |
| weather | category | Weather conditions: sunny, rain, snow, fog |
| weekday | category | Day of the week |
| special_events | 0/1 | Exceptional events on the route |
| num_cars | int | Number of train cars |
| ticket_price | float | Ticket price |
| comfort_class | category | standard, intermediate, premium |
| delay_minutes | int | Target variable – train delay in minutes |
The delay_minutes information is only available in the training set (train.csv).
You need to train a model capable of predicting delay_minutes based on the other features.
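Most models need the features above as numbers first. The sketch below shows one possible encoding, not a prescribed one: it converts departure_time to minutes since midnight and one-hot encodes weather and comfort_class using the category values listed in the table. The helper names (`parse_departure`, `one_hot`, `encode_row`) and the example record are assumptions made for illustration; weekday is left out because the table does not list its values.

```python
# Hypothetical helpers: turn one raw trip record (a dict of strings, as
# csv.DictReader would yield) into a flat list of floats.
WEATHER = ["sunny", "rain", "snow", "fog"]            # values from the table
COMFORT = ["standard", "intermediate", "premium"]     # values from the table


def parse_departure(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode_row(row):
    """Encode one trip as numeric features (SampleID and target excluded)."""
    features = [
        float(parse_departure(row["departure_time"])),
        float(row["distance_km"]),
        float(row["avg_speed_kmh"]),
        float(row["num_stops"]),
        float(row["special_events"]),
        float(row["num_cars"]),
        float(row["ticket_price"]),
    ]
    features += one_hot(row["weather"], WEATHER)
    features += one_hot(row["comfort_class"], COMFORT)
    return features


# Made-up record, for illustration only (not taken from train.csv).
example = {
    "SampleID": "0", "departure_time": "08:30", "distance_km": "120.5",
    "avg_speed_kmh": "80.0", "num_stops": "4", "weather": "rain",
    "weekday": "Monday", "special_events": "0", "num_cars": "6",
    "ticket_price": "45.0", "comfort_class": "premium",
}
encoded = encode_row(example)
```

The resulting vectors can be passed to any regression model. One-hot encoding is only one option, and tree-based models may work equally well with integer category codes.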
You must submit a CSV file (submission.csv) with the following format:
SampleID,delay_minutes
0,12
1,3
2,15
Where:
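- SampleID must match the values in test.csv
- delay_minutes is your model's prediction, rounded to the nearest integer

The evaluation will be based on MAE (Mean Absolute Error):

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

where $y_i$ is the true delay, $\hat{y}_i$ is the predicted delay for sample $i$, and $n$ is the number of samples in the test set.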
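To make the metric and the submission format concrete, here is a minimal sketch in plain Python. The function names `mean_absolute_error` and `write_submission` and the sample numbers are illustrative assumptions, not part of the task. The file is written to an in-memory buffer here; in practice the same writer would target `open("submission.csv", "w", newline="")`.

```python
import csv
import io


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def write_submission(sample_ids, predictions, out):
    """Write the SampleID,delay_minutes CSV with integer predictions."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["SampleID", "delay_minutes"])
    for sample_id, pred in zip(sample_ids, predictions):
        # Note: Python's round() sends exact .5 ties to the even integer;
        # the task does not say how ties should be handled.
        writer.writerow([sample_id, int(round(pred))])


# Illustrative values only.
raw_predictions = [11.6, 3.2, 14.9]
buffer = io.StringIO()
write_submission([0, 1, 2], raw_predictions, buffer)
submission_text = buffer.getvalue()

# MAE of rounded predictions [12, 3, 15] against made-up true delays.
mae = mean_absolute_error([10, 5, 15], [12, 3, 15])
```

Because the score is an absolute error, a constant prediction equal to the median of the training delays is a reasonable baseline to compare a model against.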
The final score is calculated according to the following rules: