Author: Mihai Nan
The Federal Aviation Administration wants to develop a system capable of predicting flight arrival delays (in minutes) based on historical information about airlines, airports, and operational conditions.
For this purpose, you are provided with a dataset aggregated at the level of:
(year, month, airline, airport).
| Name | Description |
|---|---|
| sample_id | Unique sample ID (e.g., 0194048) |
| year | Reporting year |
| month | Reporting month |
| carrier | Airline code |
| carrier_name | Full airline name |
| airport | Airport code (IATA) |
| airport_name | Airport name |
| arr_flights | Total number of arrived flights |
| arr_del15 | Number of arrivals delayed >15 minutes |
| carrier_ct | Delays attributed to the carrier |
| weather_ct | Delays caused by weather conditions |
| nas_ct | Delays caused by the national air system (NAS) |
| security_ct | Security-related delays |
| late_aircraft_ct | Delays caused by late-arriving aircraft |
| arr_cancelled | Cancelled flights |
| arr_diverted | Diverted flights |
| delay | Target variable: total arrival delay (minutes, available only in train.csv) |
Note: In test.csv, the delay column is absent and needs to be predicted.
Train a model to predict delay (in minutes) using the features listed above.
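As a starting point, here is a minimal baseline sketch rather than a recommended solution. It predicts the median training delay per carrier and falls back to the global median for carriers not seen in training. The median is chosen because it is the constant that minimizes absolute error. The function names, the carrier codes, and the delay values are invented for illustration. In practice the rows would come from `csv.DictReader` over train.csv and test.csv, and a stronger model would likely use the count features as well.

```python
import statistics
from collections import defaultdict


def fit_median_baseline(rows):
    """Return (per-carrier median delay, global median delay) from training rows."""
    by_carrier = defaultdict(list)
    for row in rows:
        by_carrier[row["carrier"]].append(float(row["delay"]))
    all_delays = [d for delays in by_carrier.values() for d in delays]
    carrier_median = {c: statistics.median(d) for c, d in by_carrier.items()}
    return carrier_median, statistics.median(all_delays)


def predict_median_baseline(model, rows):
    """Predict an integer delay per row, falling back to the global median."""
    carrier_median, global_median = model
    return [round(carrier_median.get(row["carrier"], global_median)) for row in rows]


# Made-up rows standing in for csv.DictReader output on train.csv / test.csv.
train_rows = [
    {"carrier": "AA", "delay": "100"},
    {"carrier": "AA", "delay": "300"},
    {"carrier": "AA", "delay": "200"},
    {"carrier": "DL", "delay": "50"},
]
test_rows = [{"carrier": "AA"}, {"carrier": "DL"}, {"carrier": "ZZ"}]

model = fit_median_baseline(train_rows)
predictions = predict_median_baseline(model, test_rows)
print(predictions)  # [200, 50, 150]
```

The unseen carrier "ZZ" receives the global median (150), which shows the fallback path.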
The final result should be submitted as a submission.csv file.
The file should have the following format:
sample_id,delay
0194048,132
0194049,0
0194050,215
where:
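- sample_id must match the values in test.csv
- delay is an integer, the model's predicted delay

Evaluation is performed using MAE (Mean Absolute Error):

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

Here $n$ is the number of test samples, $y_i$ is the true delay, and $\hat{y}_i$ is the predicted delay for sample $i$.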
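The submission format and the metric can be sketched together as follows. This is an illustrative sketch under assumptions, not the official scoring code: `write_submission` and `mean_absolute_error` are hypothetical helper names, and the "true" delays are invented. Sample IDs are kept as strings so that leading zeros (as in 0194048) are preserved, and predictions are rounded to integers before writing.

```python
import csv
import io


def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def write_submission(sample_ids, predictions, out):
    """Write sample_id,delay rows; IDs stay strings so leading zeros survive."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample_id", "delay"])
    for sample_id, delay in zip(sample_ids, predictions):
        writer.writerow([sample_id, int(round(delay))])


# Illustrative values only; the true delays here are invented for the example.
ids = ["0194048", "0194049", "0194050"]
preds = [132.4, 0.0, 214.6]
truth = [130, 10, 215]

# In practice, replace the buffer with open("submission.csv", "w", newline="").
buffer = io.StringIO()
write_submission(ids, preds, buffer)
submission_text = buffer.getvalue()

# Score the rounded predictions, since the submitted values are integers.
mae = mean_absolute_error(truth, [round(p) for p in preds])
print(submission_text)
print(mae)  # 4.0
```

Because the true delays in test.csv are not available, a local MAE estimate would have to come from a held-out portion of train.csv.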
Good luck and smooth flying! ✨