Daily Average Temperature Prediction
Author: Mihai Nan
Problem Description
The goal is to build a regression model that predicts the daily average temperature in a certain city, based on meteorological characteristics.
Each record is characterized by the following attributes:
- humidity – relative humidity (%)
- wind_speed – wind speed (km/h)
- pressure – atmospheric pressure (hPa)
- rainfall – precipitation amount (mm)
- cloud_cover – cloud coverage (%)
- solar_radiation – solar radiation (W/m²)
- day_of_year – day of the year (1–365)
The label (target column) is:
- temperature – daily average temperature (°C)
This is a regression problem: the target is a continuous value.
📘 Input File Structure
train.csv
Contains SampleID, all feature columns, and the target column temperature.
All feature and target values are numeric.
| SampleID | humidity | wind_speed | pressure | rainfall | cloud_cover | solar_radiation | day_of_year | temperature |
|---|---|---|---|---|---|---|---|---|
| 1 | 70.0 | 15.0 | 1015.0 | 2.0 | 50 | 500.0 | 1 | 22.5 |
| 2 | 65.0 | 10.0 | 1018.0 | 0.0 | 20 | 650.0 | 2 | 24.0 |
| 3 | 80.0 | 5.0 | 1012.0 | 5.0 | 80 | 200.0 | 3 | 19.0 |
test.csv
Contains SampleID and the same feature columns as train.csv, without the target column temperature.
| SampleID | humidity | wind_speed | pressure | rainfall | cloud_cover | solar_radiation | day_of_year |
|---|---|---|---|---|---|---|---|
| 101 | 75.0 | 12.0 | 1016.0 | 1.0 | 60 | 550.0 | 101 |
| 102 | 60.0 | 8.0 | 1019.0 | 0.0 | 10 | 700.0 | 102 |
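A minimal sketch of how these files could be loaded with Python's standard library only, assuming the CSV headers match the column names in the tables above. The helper name `load_rows` is hypothetical, and the inline sample reuses the example training rows shown earlier rather than the real file.

```python
import csv
import io

# Feature columns, in the order listed in the problem description.
FEATURES = [
    "humidity", "wind_speed", "pressure", "rainfall",
    "cloud_cover", "solar_radiation", "day_of_year",
]

def load_rows(file_obj, has_target):
    """Parse a CSV stream into (sample_ids, feature_rows, targets).

    targets stays an empty list when has_target is False (test.csv).
    """
    sample_ids, feature_rows, targets = [], [], []
    for row in csv.DictReader(file_obj):
        sample_ids.append(int(row["SampleID"]))
        feature_rows.append([float(row[name]) for name in FEATURES])
        if has_target:
            targets.append(float(row["temperature"]))
    return sample_ids, feature_rows, targets

# Inline sample built from the example rows; for the real files, pass
# open("train.csv", newline="") instead (assumed file location).
sample_train = io.StringIO(
    "SampleID,humidity,wind_speed,pressure,rainfall,cloud_cover,"
    "solar_radiation,day_of_year,temperature\n"
    "1,70.0,15.0,1015.0,2.0,50,500.0,1,22.5\n"
    "2,65.0,10.0,1018.0,0.0,20,650.0,2,24.0\n"
    "3,80.0,5.0,1012.0,5.0,80,200.0,3,19.0\n"
)
train_ids, train_X, train_y = load_rows(sample_train, has_target=True)
```

The same function can read test.csv with `has_target=False`; every column is converted to float, including the integer-looking cloud_cover and day_of_year.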
📤 Submission
The output file (submission.csv) must contain exactly two columns:
- SampleID
- temperature – the value predicted by the model (float, °C)
| SampleID | temperature |
|---|---|
| 101 | 23.5 |
| 102 | 25.1 |
| 103 | 21.8 |
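One possible way to produce a file in this format, sketched with the standard `csv` module. The function name `write_submission` is an assumption for illustration, and the values written are the example rows from the table above, not real model output.

```python
import csv
import io

def write_submission(file_obj, sample_ids, predictions):
    """Write the two required columns: SampleID and temperature."""
    if len(sample_ids) != len(predictions):
        raise ValueError("sample_ids and predictions must have equal length")
    writer = csv.writer(file_obj, lineterminator="\n")
    writer.writerow(["SampleID", "temperature"])
    for sample_id, value in zip(sample_ids, predictions):
        # Cast to float so the column always holds floating-point values.
        writer.writerow([sample_id, float(value)])

# Written to an in-memory buffer here; for a real submission, pass
# open("submission.csv", "w", newline="") instead.
buffer = io.StringIO()
write_submission(buffer, [101, 102, 103], [23.5, 25.1, 21.8])
submission_text = buffer.getvalue()
```

Checking that the ID and prediction lists have the same length guards against silently dropping rows, which `zip` would otherwise do.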
⚙️ Evaluation
Models are evaluated using the Root Mean Squared Error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where:
- N is the number of examples in the test set
- y_i is the actual temperature value
- ŷ_i is the value predicted by the model
RMSE is the square root of the mean squared deviation between predictions and actual values, so it is expressed in the same unit as the target (°C).
A lower RMSE score indicates a more accurate model.
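To check a model locally before submitting, the RMSE metric can be computed along these lines. This is a sketch of the standard definition, not the official scoring script; the function name `rmse` and the predicted values in the usage line are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Actual values are the three example training temperatures; the
# predictions are made up purely to illustrate the computation.
score = rmse([22.5, 24.0, 19.0], [23.5, 24.0, 17.0])
# Errors are -1, 0 and 2 °C, so score = sqrt(5 / 3), about 1.29 °C.
```

Because the errors are squared before averaging, the single 2 °C miss contributes four times as much as the 1 °C miss.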
📊 Source
The dataset is synthetic, inspired by real meteorological data.