🏡 House Price Prediction
For this problem, you need to implement a regression model that predicts the price of a house (Price) from the available dataset.
The model's performance will be evaluated based on the mean absolute error (MAE).
📊 Dataset
The dataset is organized in a CSV file and contains the following columns:
- Square_Footage: House area
- Num_Bedrooms: Number of bedrooms
- Num_Bathrooms: Number of bathrooms
- Year_Built: Year of construction
- Lot_Size: Lot size
- Garage_Size: Garage size
- Neighborhood_Quality: Neighborhood quality
- Footage_to_Lot_Ratio: Ratio between house area and lot size
- Total_Rooms: Total number of rooms
- Age_of_House: Age of the house
- Garage_to_Footage_Ratio: Ratio between garage size and house area
- Avg_Room_Size: Average room size
- Price: Target variable, house price (numerical value)
- House_Orientation_Angle: House orientation angle
- Street_Alignment_Offset: Street alignment offset
- Solar_Exposure_Index: Solar exposure index
- Magnetic_Field_Strength: Magnetic field strength
- Vibration_Level: Vibration level
🧩 Tasks
🔹 Subtask 1 (10p)
For each house in the test set, determine the estimated total area as the sum of:
Square_Footage + Garage_Size + Lot_Size.
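As a minimal sketch of this sum, assuming the test set has been loaded into a pandas DataFrame: the helper name `total_area` and the sample values are hypothetical and only illustrate the row-wise addition.

```python
import pandas as pd

def total_area(df: pd.DataFrame) -> pd.Series:
    """Row-wise sum of the three area columns named in Subtask 1."""
    return df["Square_Footage"] + df["Garage_Size"] + df["Lot_Size"]

# Made-up rows standing in for the real test set.
test_df = pd.DataFrame({
    "Square_Footage": [1500.0, 2200.0],
    "Garage_Size": [200.0, 400.0],
    "Lot_Size": [5000.0, 7500.0],
})

total = total_area(test_df)
print(total.tolist())
```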
🔹 Subtask 2 (10p)
For each house in the test set, calculate the ratio:
Garage_Size / Total_Rooms
The result will be added as a new column: Garage_to_Room_Ratio.
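A possible way to add this column with pandas is sketched below. The function name `add_garage_to_room_ratio` and the sample rows are hypothetical. The task does not say how to treat a house with zero rooms, so mapping that case to NaN is an assumption made here only to avoid a division by zero.

```python
import pandas as pd

def add_garage_to_room_ratio(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the Garage_to_Room_Ratio column added."""
    out = df.copy()
    # Assumption: rows with Total_Rooms == 0 become NaN instead of inf.
    rooms = out["Total_Rooms"].where(out["Total_Rooms"] != 0)
    out["Garage_to_Room_Ratio"] = out["Garage_Size"] / rooms
    return out

# Made-up rows standing in for the real test set.
test_df = pd.DataFrame({
    "Garage_Size": [400.0, 300.0, 250.0],
    "Total_Rooms": [8, 6, 0],
})

result = add_garage_to_room_ratio(test_df)
print(result)
```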
🔹 Subtask 3 (10p)
Calculate the environmental stability index:

🔹 Subtask 4 (10p)
- Calculate the mean of the Square_Footage column on the training set.
- For each house in the test set, determine:
|Square_Footage - Mean(Square_Footage_train)|
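The two steps above could be sketched as follows, assuming both sets are available as pandas DataFrames. The helper name `abs_deviation_from_train_mean` and the sample values are hypothetical. The mean is taken from the training set only and then applied to the test set.

```python
import pandas as pd

def abs_deviation_from_train_mean(train_df: pd.DataFrame,
                                  test_df: pd.DataFrame) -> pd.Series:
    """|Square_Footage - mean of Square_Footage on the training set|."""
    train_mean = train_df["Square_Footage"].mean()
    return (test_df["Square_Footage"] - train_mean).abs()

# Made-up values standing in for the real training and test sets.
train_df = pd.DataFrame({"Square_Footage": [1000.0, 2000.0, 3000.0]})
test_df = pd.DataFrame({"Square_Footage": [1500.0, 2600.0]})

deviation = abs_deviation_from_train_mean(train_df, test_df)
print(deviation.tolist())
```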
🔹 Subtask 5 (60p)
Build a regression model for predicting the Price field, using the training set dataset_train.csv.
Determine the predictions on the evaluation set dataset_eval.csv (which does not contain the Price column).
🧠 Notes about the dataset
- Target field: Price
- Numerical variables can be used directly for regression.
- It is recommended to handle missing values and apply scaling/normalization to improve model performance.
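One way to combine these recommendations with the model from Subtask 5 is a scikit-learn pipeline. The sketch below is only an illustration under stated assumptions: median imputation, standard scaling, and plain linear regression are choices made here, not requirements of the task, and the synthetic data and the helper names `build_model` and `fit_and_predict` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_model():
    """Impute missing values, scale features, then fit a linear model."""
    return make_pipeline(
        SimpleImputer(strategy="median"),  # assumption: median imputation
        StandardScaler(),                  # zero mean, unit variance
        LinearRegression(),                # assumption: simple baseline model
    )

def fit_and_predict(train_df, eval_df, target="Price"):
    """Train on every non-target column and predict for the evaluation rows."""
    features = [c for c in train_df.columns if c != target]
    model = build_model()
    model.fit(train_df[features], train_df[target])
    return model.predict(eval_df[features])

# Synthetic data standing in for the real training and evaluation files.
rng = np.random.default_rng(0)
train_df = pd.DataFrame({
    "Square_Footage": rng.uniform(800, 4000, 50),
    "Num_Bedrooms": rng.integers(1, 6, 50).astype(float),
})
train_df["Price"] = 100 * train_df["Square_Footage"] + 5000 * train_df["Num_Bedrooms"]
train_df.loc[0, "Square_Footage"] = np.nan  # simulate a missing value
eval_df = train_df.drop(columns="Price").head(5)

predictions = fit_and_predict(train_df, eval_df)
print(predictions)
```

With the real data, `train_df` and `eval_df` would instead come from `pd.read_csv("dataset_train.csv")` and `pd.read_csv("dataset_eval.csv")`. Fitting the imputer and scaler inside the pipeline means their statistics are learned from the training set only.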
🧮 Evaluation criteria
- Performance: The model must achieve the lowest possible MAE.
Formula used for MAE calculation:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
where:
N = number of examples in the test set,
y_i = actual value,
ŷ_i = value predicted by the model.
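To check a model locally before submitting, the MAE can be computed as the average absolute difference between actual and predicted values. The sketch below uses plain Python. The function name `mean_absolute_error` and the sample numbers are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all examples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up actual and predicted prices.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

mae = mean_absolute_error(actual, predicted)
print(mae)
```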
💡 Note
If you submit sample_output, you will receive 5 points.
🗂️ Useful resources