Author: Mihai Nan
For this problem you need to implement a regression model capable of predicting the exam score (Exam_Score) using an available dataset. The dataset is organized in a CSV file, and the model's performance will be evaluated based on Mean Absolute Error (MAE).
The dataset contains the following columns:
Subtask 1: Calculate the mean of the Hours_Studied column on the training set. For each student in the test set, determine the absolute difference between that student's Hours_Studied and this mean.
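The Hours_Studied computation above can be sketched as follows. This is a minimal illustration, not a reference solution: the function name `abs_diff_from_train_mean` is hypothetical, the columns are assumed to have been read into plain Python lists already, and the numbers are toy values rather than rows from the real dataset.

```python
from statistics import fmean


def abs_diff_from_train_mean(train_hours, test_hours):
    """Return |x - mean(train_hours)| for every x in test_hours."""
    # The mean is computed on the training column only, never on the test set.
    train_mean = fmean(train_hours)
    return [abs(x - train_mean) for x in test_hours]


# Toy values (assumption, not taken from train_data.csv / test_data.csv).
train_hours = [10, 20, 30]
test_hours = [25, 12]
print(abs_diff_from_train_mean(train_hours, test_hours))  # [5.0, 8.0]
```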
Subtask 2: Determine for each student in the test set whether they sleep fewer than 7 hours. The answer is True or False.
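A possible sketch of the sleep check is shown below. The statement does not name the sleep column here, so the hypothetical helper `sleeps_little` simply takes the hours as a list. The comparison is strict, so a student with exactly 7 hours is not flagged.

```python
def sleeps_little(sleep_hours, threshold=7):
    """Return True for each value strictly below the threshold (in hours)."""
    return [hours < threshold for hours in sleep_hours]


# Toy values (assumption); note that exactly 7 hours yields False.
print(sleeps_little([6, 7, 8.5, 4]))  # [True, False, False, True]
```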
Subtask 3: For each student in the test set, count how many students in the training set have a previous score (Previous_Scores) greater than or equal to that student's previous score.
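One way to compute this count without comparing every pair is to sort the training scores once and use binary search. The sketch below assumes the intended reading is "training score >= the test student's score". The name `count_train_at_least` is hypothetical and the values are toy data.

```python
from bisect import bisect_left


def count_train_at_least(train_scores, test_scores):
    """For each test score s, count training scores that are >= s."""
    sorted_train = sorted(train_scores)
    n = len(sorted_train)
    # bisect_left returns how many training values are strictly below s,
    # so n minus that position is the number of values >= s.
    return [n - bisect_left(sorted_train, s) for s in test_scores]


# Toy values (assumption).
print(count_train_at_least([50, 70, 70, 90], [70, 95, 40]))  # [3, 0, 4]
```

Sorting once costs O(n log n) and each lookup is O(log n), which matters only for large files; a plain nested loop gives the same answers.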
Subtask 4: For each student in the test set, count the students in the training set with the same motivation level (Motivation_Level).
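The motivation-level count is a frequency lookup, which could be sketched with `collections.Counter`. The helper name `count_same_level` and the level labels below are illustrative assumptions; the actual category values come from the dataset.

```python
from collections import Counter


def count_same_level(train_levels, test_levels):
    """For each test level, count training rows with the same level."""
    counts = Counter(train_levels)
    # Counter returns 0 for a level that never appears in training.
    return [counts[level] for level in test_levels]


# Toy labels (assumption).
train_levels = ["Low", "High", "Low", "Medium"]
test_levels = ["Low", "High", "Unseen"]
print(count_same_level(train_levels, test_levels))  # [2, 1, 0]
```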
Subtask 5: Build a regression model to predict Exam_Score from the provided features. The model must generalize to new data and will be evaluated with MAE.
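Because the metric is MAE, i.e. the average of |y_i - ŷ_i| over all rows, it helps to have the metric and a trivial baseline in hand before fitting a real model. The sketch below is only a baseline, not the expected solution: it predicts the training median for every row, since the median is the constant that minimizes MAE. The function names and the toy scores are assumptions.

```python
from statistics import median


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def median_baseline(train_scores, n_rows):
    """Predict the training median of Exam_Score for every row."""
    return [float(median(train_scores))] * n_rows


# Toy scores (assumption): a training column and a small holdout split.
train_scores = [60, 65, 70, 75, 100]
holdout_scores = [62, 71, 91]
preds = median_baseline(train_scores, len(holdout_scores))
print(preds)                                       # [70.0, 70.0, 70.0]
print(mean_absolute_error(holdout_scores, preds))  # 10.0
```

Any feature-based regressor is worth keeping only if its MAE on a holdout split of the training data is lower than this baseline's.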
The model predicting Exam_Score is trained on train_data.csv and evaluated on test_data.csv.
The submission file must be a CSV with exactly three columns:
| Column | Type | Description |
|---|---|---|
| subtaskID | integer | Represents the subtask ID (from 1 to 5). |
| datapointID | integer/string | Represents the unique identifier of the row from the test set (ID). |
| answer | float / int / bool | Answer for the respective subtask. The value type depends on the subtask: Subtask 1: float; Subtask 2: boolean; Subtask 3: integer; Subtask 4: integer; Subtask 5: float (model predictions). |

Example submission rows for a single datapoint:

| subtaskID | datapointID | answer |
|---|---|---|
| 1 | 101 | 12.5 |
| 2 | 101 | True |
| 3 | 101 | 7 |
| 4 | 101 | 3 |
| 5 | 101 | 85.3 |
Important: Each row in the CSV represents the answer for a single subtask and a single datapoint. For each datapointID there must be one row for each subtask.
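The long format described above (one row per subtask and datapoint) could be assembled as in the sketch below. The helper names `build_submission_rows` and `write_submission` are hypothetical, and the answers are toy values. The example writes to an in-memory buffer so that it runs without any files.

```python
import csv
import io


def build_submission_rows(datapoint_ids, answers_by_subtask):
    """Return one dict per (subtask, datapoint) pair, ordered by subtask ID."""
    rows = []
    for subtask_id in sorted(answers_by_subtask):
        answers = answers_by_subtask[subtask_id]
        if len(answers) != len(datapoint_ids):
            raise ValueError(f"subtask {subtask_id}: wrong number of answers")
        for datapoint_id, answer in zip(datapoint_ids, answers):
            rows.append({"subtaskID": subtask_id,
                         "datapointID": datapoint_id,
                         "answer": answer})
    return rows


def write_submission(rows, fileobj):
    """Write the rows as CSV with exactly the three required columns."""
    writer = csv.DictWriter(
        fileobj, fieldnames=["subtaskID", "datapointID", "answer"])
    writer.writeheader()
    writer.writerows(rows)


# Toy answers for two datapoints (assumption).
ids = [101, 102]
answers = {1: [12.5, 3.0], 2: [True, False], 3: [7, 2],
           4: [3, 5], 5: [85.3, 70.1]}
rows = build_submission_rows(ids, answers)
buffer = io.StringIO()
write_submission(rows, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, for example `open("submission.csv", "w", newline="")`; the file name here is only a placeholder.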
Submitting a sample_output earns 6 points.