Author: Mihai Nan
The goal is to build a regression model that predicts the daily average temperature in a certain city, based on meteorological characteristics.
Each record is characterized by the following attributes:

- humidity – relative humidity (%)
- wind_speed – wind speed (km/h)
- pressure – atmospheric pressure (hPa)
- rainfall – precipitation amount (mm)
- cloud_cover – cloud coverage (%)
- solar_radiation – solar radiation (W/m²)
- day_of_year – day of the year (1–365)

The label (target column) is:

- temperature – daily average temperature (°C)

This problem belongs to the continuous regression category.

train.csv: contains SampleID, all feature columns, and the target column temperature.
All feature and target values are numeric.
| SampleID | humidity | wind_speed | pressure | rainfall | cloud_cover | solar_radiation | day_of_year | temperature |
|---|---|---|---|---|---|---|---|---|
| 1 | 70.0 | 15.0 | 1015.0 | 2.0 | 50 | 500.0 | 1 | 22.5 |
| 2 | 65.0 | 10.0 | 1018.0 | 0.0 | 20 | 650.0 | 2 | 24.0 |
| 3 | 80.0 | 5.0 | 1012.0 | 5.0 | 80 | 200.0 | 3 | 19.0 |
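
As a rough illustration of the training data layout, the sketch below parses the three sample rows shown above and fits an ordinary least squares line on a single feature. It assumes the file is comma-separated with a header row matching the table's column names, which the statement does not confirm. The names `load_train` and `fit_one_feature` are hypothetical helpers, and a one-feature fit is only a baseline, not a recommended model.

```python
import csv
import io

# Sample rows copied from the table above. A real run would open train.csv
# instead; the comma-separated layout is an assumption.
TRAIN_CSV = """SampleID,humidity,wind_speed,pressure,rainfall,cloud_cover,solar_radiation,day_of_year,temperature
1,70.0,15.0,1015.0,2.0,50,500.0,1,22.5
2,65.0,10.0,1018.0,0.0,20,650.0,2,24.0
3,80.0,5.0,1012.0,5.0,80,200.0,3,19.0
"""

FEATURES = ["humidity", "wind_speed", "pressure", "rainfall",
            "cloud_cover", "solar_radiation", "day_of_year"]
TARGET = "temperature"


def load_train(fileobj):
    """Return (X, y): X is a list of feature rows, y the list of targets."""
    X, y = [], []
    for row in csv.DictReader(fileobj):
        X.append([float(row[name]) for name in FEATURES])
        y.append(float(row[TARGET]))
    return X, y


def fit_one_feature(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


X, y = load_train(io.StringIO(TRAIN_CSV))
solar = [row[FEATURES.index("solar_radiation")] for row in X]
intercept, slope = fit_one_feature(solar, y)
print(f"temperature ~ {intercept:.2f} + {slope:.4f} * solar_radiation")
```

SampleID is left out of the feature list on purpose, since it identifies a row rather than describing the weather.
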
test.csv: contains SampleID and the same feature columns, without temperature.
| SampleID | humidity | wind_speed | pressure | rainfall | cloud_cover | solar_radiation | day_of_year |
|---|---|---|---|---|---|---|---|
| 101 | 75.0 | 12.0 | 1016.0 | 1.0 | 60 | 550.0 | 101 |
| 102 | 60.0 | 8.0 | 1019.0 | 0.0 | 10 | 700.0 | 102 |
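
Since the test file must carry the same features minus the target, it can help to check the header before predicting. The following is a minimal sketch under the same assumption of a comma-separated file with a header row. `load_test` and `EXPECTED_COLUMNS` are hypothetical names introduced here for illustration.

```python
import csv
import io

# Sample rows copied from the table above. A real run would open test.csv.
TEST_CSV = """SampleID,humidity,wind_speed,pressure,rainfall,cloud_cover,solar_radiation,day_of_year
101,75.0,12.0,1016.0,1.0,60,550.0,101
102,60.0,8.0,1019.0,0.0,10,700.0,102
"""

EXPECTED_COLUMNS = ["SampleID", "humidity", "wind_speed", "pressure",
                    "rainfall", "cloud_cover", "solar_radiation", "day_of_year"]


def load_test(fileobj):
    """Return (ids, rows) after checking that the header has no target column."""
    reader = csv.DictReader(fileobj)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    ids, rows = [], []
    for row in reader:
        ids.append(int(row["SampleID"]))
        rows.append([float(row[name]) for name in EXPECTED_COLUMNS[1:]])
    return ids, rows


ids, rows = load_test(io.StringIO(TEST_CSV))
print(ids)
```

Keeping the SampleID values separate from the feature rows makes it easy to pair each prediction with its identifier later.
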
The output file (submission.csv) must contain exactly two columns:

- SampleID
- temperature – the value predicted by the model (float, °C)

| SampleID | temperature |
|---|---|
| 101 | 23.5 |
| 102 | 25.1 |
| 103 | 21.8 |
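
The two-column layout above could be produced along these lines. This is a sketch that assumes a comma-separated file with a header row. `write_submission` is a hypothetical helper, and the IDs and predictions are simply the example values from the table, not real model output.

```python
import csv
import io


def write_submission(fileobj, sample_ids, predictions):
    """Write exactly two columns: SampleID and temperature."""
    if len(sample_ids) != len(predictions):
        raise ValueError("need one prediction per SampleID")
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["SampleID", "temperature"])
    for sample_id, value in zip(sample_ids, predictions):
        writer.writerow([sample_id, float(value)])


# Written to an in-memory buffer here; for a real file, use
# open("submission.csv", "w", newline="") instead.
buffer = io.StringIO()
write_submission(buffer, [101, 102, 103], [23.5, 25.1, 21.8])
print(buffer.getvalue())
```
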
Model evaluation will be done using Root Mean Squared Error (RMSE):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}
$$

where:

- N is the number of examples in the test set
- y_i is the actual temperature value
- ŷ_i is the value predicted by the model

RMSE is the square root of the mean squared deviation between predictions and actual values, so it is expressed in the same unit as the target (°C).
A lower RMSE score indicates a more accurate model.
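
To make the metric concrete, here is a small sketch of RMSE in plain Python, following the standard definition. The function name `rmse` and the predicted values are made up for illustration; the actual values reuse the three temperatures from the training sample.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


actual = [22.5, 24.0, 19.0]
predicted = [23.5, 23.0, 20.0]  # hypothetical predictions, each off by 1 °C
score = rmse(actual, predicted)
print(score)  # 1.0
```

Every prediction here misses by exactly 1 °C, so the RMSE is 1.0 °C. Because the errors are squared before averaging, a single large miss raises the score more than several small ones.
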
The dataset is a synthetic set inspired by real meteorological data.