Flight delay prediction
Author: Mihai Nan
Medium
Problem Description
✈️ Flight Delay Prediction 🕒
🇺🇸 Context
The Federal Aviation Administration wants to develop a system capable of predicting flight arrival delays (in minutes) based on historical information about airlines, airports, and operational conditions.
For this purpose, you are provided with a dataset aggregated at the level of:
(year, month, airline, airport).
📦 Available Features
| Name | Description |
|---|---|
| sample_id | Unique sample ID (e.g., 0194048) |
| year | Reporting year |
| month | Reporting month |
| carrier | Airline code |
| carrier_name | Full airline name |
| airport | Airport code (IATA) |
| airport_name | Airport name |
| arr_flights | Total number of arrived flights |
| arr_del15 | Number of arrivals delayed by more than 15 minutes |
| carrier_ct | Delays attributed to the carrier |
| weather_ct | Delays caused by weather conditions |
| nas_ct | Delays caused by the National Aviation System (NAS) |
| security_ct | Security-related delays |
| late_aircraft_ct | Delays caused by late-arriving aircraft |
| arr_cancelled | Cancelled flights |
| arr_diverted | Diverted flights |
| delay | Target variable: total arrival delay (minutes; available only in train.csv) |
Note: In `test.csv`, the `delay` column is absent and needs to be predicted.
🎯 Problem Objective
Train a model to predict delay (in minutes) using the features listed above.
The final result should be submitted as a submission.csv file.
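Before trying heavier models, it can help to have a very simple baseline to compare against. The sketch below is one possible heuristic, not part of the task statement: it scales `arr_del15` by the median number of delay minutes per delayed flight seen in training. The helper names `fit_minutes_per_delayed_flight` and `predict_delay` are hypothetical, and the toy rows contain made-up values used only to show the mechanics.

```python
from statistics import median


def fit_minutes_per_delayed_flight(train_rows):
    """Median of delay / arr_del15 over training rows with at least one delayed flight."""
    ratios = [
        row["delay"] / row["arr_del15"]
        for row in train_rows
        if row["arr_del15"] > 0
    ]
    if not ratios:
        raise ValueError("no training rows with delayed flights")
    return median(ratios)


def predict_delay(row, minutes_per_delayed_flight):
    """Predict total delay as (delayed flights) x (typical minutes per delayed flight)."""
    return round(row["arr_del15"] * minutes_per_delayed_flight)


# Toy rows with made-up values (assumption: real rows come from train.csv).
toy_train = [
    {"arr_del15": 10, "delay": 500},
    {"arr_del15": 4, "delay": 240},
    {"arr_del15": 0, "delay": 0},
    {"arr_del15": 20, "delay": 1400},
]
rate = fit_minutes_per_delayed_flight(toy_train)
baseline_prediction = predict_delay({"arr_del15": 5}, rate)
print(rate, baseline_prediction)
```

The median is used here instead of the mean because it is less sensitive to a few extreme months. A learned model would be expected to improve on this by also using the remaining columns.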
📝 Submission Format
The file should be:
sample_id,delay
0194048,132
0194049,0
0194050,215
where:
- `sample_id` must match the values in `test.csv`
- `delay` is an integer, the model's predicted delay
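A minimal sketch of producing this format with Python's standard `csv` module is shown below. The helper name `build_submission` is hypothetical. Rounding to the nearest integer and clipping negative predictions to 0 are assumptions of this sketch; the statement only requires an integer. IDs are handled as strings so that leading zeros such as in `0194048` are preserved.

```python
import csv
import io


def build_submission(sample_ids, predictions):
    """Return the submission CSV text for the given IDs and predicted delays."""
    if len(sample_ids) != len(predictions):
        raise ValueError("sample_ids and predictions must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample_id", "delay"])
    for sample_id, pred in zip(sample_ids, predictions):
        # Keep IDs as strings so leading zeros survive.
        # Assumption: round to the nearest integer and clip at 0.
        writer.writerow([str(sample_id), max(0, round(pred))])
    return buffer.getvalue()


submission_text = build_submission(
    ["0194048", "0194049", "0194050"],
    [131.6, -3.2, 215.0],
)
print(submission_text)
```

To create the actual file, the returned text can be written to `submission.csv`. When reading `test.csv`, it is worth making sure `sample_id` is parsed as text rather than as a number, otherwise the leading zero would be lost.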
📊 Evaluation
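Evaluation is performed using MAE (Mean Absolute Error):
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $y_i$ is the true delay, $\hat{y}_i$ is the predicted delay, and $n$ is the number of samples in `test.csv`.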
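For local validation, MAE can be computed as the average of the absolute differences between true and predicted delays. The following is a minimal sketch in plain Python; the function name is a choice of this example, and the numbers are illustrative only.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values: absolute errors are 32, 20 and 0, so MAE = 52 / 3.
example_mae = mean_absolute_error([132, 0, 215], [100, 20, 215])
print(example_mae)
```

Computing this on a held-out part of `train.csv` gives an estimate of the score before submitting.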
🏆 Final Score
- MAE ≤ 400 → 100 points
- MAE ≥ 600 → 0 points
- 400 < MAE < 600 → the score decreases proportionally from 100 to 0 points.
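The scoring rule can be sketched as a small function. This assumes that "proportional" means linear interpolation between the two thresholds; whether the platform rounds the result is not stated, so no rounding is applied. The function and parameter names are hypothetical.

```python
def score_from_mae(mae, full_marks_mae=400.0, zero_marks_mae=600.0):
    """Map an MAE value to a score in [0, 100] (assumed linear between thresholds)."""
    if mae <= full_marks_mae:
        return 100.0
    if mae >= zero_marks_mae:
        return 0.0
    # Assumption: linear interpolation between the two thresholds.
    return 100.0 * (zero_marks_mae - mae) / (zero_marks_mae - full_marks_mae)


print(score_from_mae(450))  # halfway between 400 and 500 on the scale: 75.0
print(score_from_mae(500))  # midpoint of the interval: 50.0
```

Under this reading, every 2 minutes of MAE above 400 costs 1 point.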
🔎 Notes
- You are free to use any ML technique.
- Preprocessing, encoding, and feature engineering are allowed.
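One kind of feature engineering that fits this table is turning raw counts into rates, so that small and large airports become comparable. The sketch below is an illustration under assumptions: the derived feature names (`delayed_share`, `carrier_share`, and so on) are hypothetical, and the example row uses made-up values.

```python
CAUSE_COLUMNS = ("carrier_ct", "weather_ct", "nas_ct", "security_ct", "late_aircraft_ct")


def add_rate_features(row):
    """Return a copy of the row with ratio features derived from the count columns."""
    flights = row["arr_flights"]
    delayed = row["arr_del15"]
    features = dict(row)
    # Share of arriving flights that were delayed; 0.0 when there were no flights.
    features["delayed_share"] = delayed / flights if flights else 0.0
    # Share of delayed flights attributed to each cause; 0.0 when nothing was delayed.
    for cause in CAUSE_COLUMNS:
        share_name = cause.replace("_ct", "_share")
        features[share_name] = row[cause] / delayed if delayed else 0.0
    return features


# Made-up example row, only to show the computation.
example_row = {
    "arr_flights": 200, "arr_del15": 40,
    "carrier_ct": 10.0, "weather_ct": 2.0, "nas_ct": 8.0,
    "security_ct": 0.0, "late_aircraft_ct": 20.0,
}
engineered = add_rate_features(example_row)
print(engineered["delayed_share"], engineered["late_aircraft_share"])
```

Categorical columns such as `carrier` and `airport` would still need an encoding of your choice before being passed to most models.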
Good luck and smooth flying! ✨