Author: Mihai Nan
The goal is to build a regression model that predicts the daily electrical energy production (kWh) of a solar panel, based on meteorological conditions and installation characteristics.
Each sample represents a production day and is characterized by several numerical attributes, such as light intensity, air temperature, wind speed, and others.
The target label (energy_output) represents the total energy generated on that day.
This is a univariate regression problem: the model predicts a single continuous target value.
- solar_irradiance – average solar radiation (W/m²)
- temperature – average air temperature (°C)
- humidity – relative humidity (%)
- wind_speed – average wind speed (m/s)
- cloud_cover – average cloud cover (%)
- panel_angle – panel tilt angle (°)
- panel_efficiency – panel efficiency (%)

train.csv: contains all feature columns plus the energy_output column, which represents the target value.
Example:
| SampleID | solar_irradiance | temperature | humidity | wind_speed | cloud_cover | panel_angle | panel_efficiency | energy_output |
|---|---|---|---|---|---|---|---|---|
| 1 | 750.5 | 25.2 | 40.0 | 3.5 | 10 | 30 | 18.5 | 42.3 |
| 2 | 610.0 | 22.1 | 55.0 | 2.0 | 50 | 25 | 17.0 | 28.7 |
test.csv: contains the same columns as train.csv, including SampleID, but without energy_output.
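The two file layouts can be read with the same routine. The sketch below is one possible way to do it using only the Python standard library. It assumes the files are comma-separated with a header row matching the column names listed here, and that SampleID is an integer; neither is stated explicitly in the task. The helper name `load_rows` and the in-memory sample are illustrative.

```python
import csv
import io

# Feature columns, in the order given in the task description.
FEATURES = [
    "solar_irradiance", "temperature", "humidity", "wind_speed",
    "cloud_cover", "panel_angle", "panel_efficiency",
]

def load_rows(fh):
    """Parse a CSV file object into (ids, features, targets).

    targets is None when the energy_output column is absent (as in test.csv).
    Assumes a comma-separated file with a header row (not confirmed by the task).
    """
    reader = csv.DictReader(fh)
    has_target = "energy_output" in (reader.fieldnames or [])
    ids, X, y = [], [], []
    for row in reader:
        ids.append(int(row["SampleID"]))
        X.append([float(row[name]) for name in FEATURES])
        if has_target:
            y.append(float(row["energy_output"]))
    return ids, X, (y if has_target else None)

# In-memory stand-in for train.csv, built from the two example rows.
sample = io.StringIO(
    "SampleID,solar_irradiance,temperature,humidity,wind_speed,"
    "cloud_cover,panel_angle,panel_efficiency,energy_output\n"
    "1,750.5,25.2,40.0,3.5,10,30,18.5,42.3\n"
    "2,610.0,22.1,55.0,2.0,50,25,17.0,28.7\n"
)
ids, X, y = load_rows(sample)
```

For the real files, the same function could be called as `load_rows(open("train.csv", newline=""))`.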
The output file (submission.csv) must contain exactly two columns:
- SampleID
- energy_output – the value predicted by the model (float, with 2 decimal places)

Example:
| SampleID | energy_output |
|---|---|
| 1 | 41.75 |
| 2 | 29.10 |
| 3 | 35.80 |
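A minimal sketch of producing a file in this layout, using only the standard library. It assumes a comma-separated file with a header row, which the task does not state explicitly; the helper name `write_submission` is illustrative. Predictions are formatted to two decimal places as required.

```python
import csv
import io

def write_submission(sample_ids, predictions, fh):
    """Write SampleID / energy_output rows to an open text file object."""
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["SampleID", "energy_output"])
    for sample_id, pred in zip(sample_ids, predictions):
        # Two decimal places, so 29.1 is written as 29.10.
        writer.writerow([sample_id, f"{pred:.2f}"])

# Demonstration on an in-memory buffer with the example values.
buffer = io.StringIO()
write_submission([1, 2, 3], [41.75, 29.1, 35.8], buffer)
submission_text = buffer.getvalue()
```

To write the actual file, pass `open("submission.csv", "w", newline="")` instead of the in-memory buffer.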
Model evaluation will be performed using Root Mean Squared Error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where $N$ is the number of examples in the test set, $y_i$ is the true value, and $\hat{y}_i$ is the value predicted by the model.
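For local validation, the metric can be computed directly from its standard definition (the square root of the mean squared difference between true and predicted values). The sketch below uses only the standard library; the function name `rmse` and the sample numbers are illustrative, not taken from the actual evaluation data.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only: two true outputs and two hypothetical predictions.
example = rmse([42.3, 28.7], [41.75, 29.10])
```

A held-out portion of train.csv can be scored this way before submitting, since the true values for test.csv are not available.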
The final score will be scaled between 0 and 100, so that a lower RMSE yields a higher score.
The data used for this problem is synthetically generated.