
Train delay prediction

Author: Mihai Nan

Easy
Problem Description

🚆 Train delay prediction ⏱️

🇷🇴 Context 🚃

The national railway company 🇷🇴 wants to predict a train's delay at its final station, expressed as an integer number of minutes. For this purpose, you are provided with a dataset containing details of train trips over the past year.

For each trip, the following characteristics are known:

| Name | Type | Description |
|---|---|---|
| SampleID | int | Unique identifier for the sample |
| departure_time | string (HH:MM) | Train departure time |
| distance_km | float | Total distance of the route |
| avg_speed_kmh | float | Actual average speed |
| num_stops | int | Number of intermediate stops |
| weather | category | Weather conditions: sunny, rain, snow, fog |
| weekday | category | Day of the week |
| special_events | 0/1 | Exceptional events on the route |
| num_cars | int | Number of train cars |
| ticket_price | float | Ticket price |
| comfort_class | category | Comfort class: standard, intermediate, premium |
| delay_minutes | int | Target variable: train delay in minutes |

The delay_minutes information is only available in the training set (train.csv).
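
Most models need the string and category columns turned into numbers first. The sketch below is one possible encoding, not a prescribed one: it converts departure_time to minutes since midnight and one-hot encodes weather and comfort_class using the category values listed in the feature table. The names departure_minutes, one_hot and encode_row are hypothetical helpers, rows are assumed to be dictionaries of strings (as produced by csv.DictReader), and weekday is left out because the statement does not say how its values are written.

```python
# Category values taken from the problem statement.
WEATHER = ("sunny", "rain", "snow", "fog")
COMFORT = ("standard", "intermediate", "premium")


def departure_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.strip().split(":")
    return int(hours) * 60 + int(minutes)


def one_hot(value: str, categories: tuple) -> list:
    """Return a 0/1 indicator list with one slot per known category."""
    return [1 if value == category else 0 for category in categories]


def encode_row(row: dict) -> list:
    """Turn one CSV row (dict of strings) into a numeric feature list.

    Assumption: weekday is skipped here because its value format
    is not specified in the statement.
    """
    features = [
        float(departure_minutes(row["departure_time"])),
        float(row["distance_km"]),
        float(row["avg_speed_kmh"]),
        float(row["num_stops"]),
        float(row["special_events"]),
        float(row["num_cars"]),
        float(row["ticket_price"]),
    ]
    features += one_hot(row["weather"], WEATHER)
    features += one_hot(row["comfort_class"], COMFORT)
    return features
```

An unknown category value simply yields an all-zero indicator list, so unexpected values in test.csv do not raise an error.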


🎯 Problem goal ⏱️

You need to train a model capable of predicting delay_minutes based on the other features.
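
Before training a real model, it can help to have a trivial reference point. The sketch below is such a baseline, not a recommended solution: it ignores all features and always predicts the median of the training delays, which is the constant that minimizes mean absolute error on the training set. The function names fit_median_baseline and predict_constant are hypothetical.

```python
import statistics


def fit_median_baseline(train_delays: list) -> float:
    """Return the median of the training delay_minutes values."""
    return statistics.median(train_delays)


def predict_constant(median_delay: float, n_rows: int) -> list:
    """Predict the same rounded delay for every one of n_rows test rows."""
    return [round(median_delay)] * n_rows
```

Any model that uses the features should beat this baseline on a held-out split; if it does not, the feature pipeline is worth rechecking.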


📝 Submission format

You must submit a CSV file (submission.csv) with the following format:

SampleID,delay_minutes
0,12
1,3
2,15

Where:

  • SampleID must match the values in test.csv
  • delay_minutes is your model's prediction, rounded to the nearest integer
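
A minimal sketch of producing a file in this format with Python's standard csv module. The helpers format_submission and write_submission are hypothetical names; sample_ids is assumed to hold the SampleID values read from test.csv, in the same order as the predictions.

```python
import csv
import io


def format_submission(sample_ids: list, predictions: list) -> str:
    """Build the CSV text: a header row, then one 'SampleID,delay_minutes' row per sample."""
    if len(sample_ids) != len(predictions):
        raise ValueError("sample_ids and predictions must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["SampleID", "delay_minutes"])
    for sample_id, prediction in zip(sample_ids, predictions):
        # Round each prediction to the nearest integer, as the format requires.
        writer.writerow([sample_id, int(round(prediction))])
    return buffer.getvalue()


def write_submission(path: str, sample_ids: list, predictions: list) -> None:
    """Write the formatted submission to the given path, e.g. 'submission.csv'."""
    with open(path, "w", newline="") as handle:
        handle.write(format_submission(sample_ids, predictions))
```

For example, write_submission("submission.csv", ids, preds) writes one row per test sample after the header.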

📊 Evaluation

The evaluation will be based on MAE (Mean Absolute Error):

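$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$

where $y_i$ is the true delay of sample $i$, $\hat{y}_i$ is the predicted delay, and $n$ is the number of evaluated samples.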
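
MAE is the average of the absolute differences between the true and the predicted delays. A minimal sketch for checking it locally, for instance on a validation split held out from train.csv (the function name mean_absolute_error is chosen here for illustration):

```python
def mean_absolute_error(y_true: list, y_pred: list) -> float:
    """Average absolute difference between true and predicted delays."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = sum(abs(true - pred) for true, pred in zip(y_true, y_pred))
    return total / len(y_true)
```

With true delays [12, 3, 15] and predictions [10, 3, 20], the absolute errors are 2, 0 and 5, giving an MAE of 7/3.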

The final score is calculated according to the following rules:

  • MAE ≤ 5 → 100 points
  • MAE ≥ 20 → 0 points
  • Intermediate values receive a proportional score between 0 and 100.
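
The rules above can be sketched as a small function. This assumes that "proportional" means linear interpolation between the two thresholds, which the statement does not spell out; the name score_from_mae and its parameters are hypothetical.

```python
def score_from_mae(mae: float, full_marks_at: float = 5.0, zero_at: float = 20.0) -> float:
    """Map an MAE value to a 0-100 score.

    Assumption: scores between the two thresholds are interpolated linearly.
    """
    if mae <= full_marks_at:
        return 100.0
    if mae >= zero_at:
        return 0.0
    return 100.0 * (zero_at - mae) / (zero_at - full_marks_at)
```

Under this assumption, an MAE of 12.5, halfway between the thresholds, would earn 50 points.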