Author: Mihai Nan
The goal is to build a regression model to predict the final exam score (Exam_Score) based on students' academic, social, and personal factors.
The model receives a set of features and must estimate a continuous numeric value.
Each instance contains multiple variables, such as:
- StudyHours
- Attendance
- ParentalInvolvement
- HealthStatus
- ... (other columns present in the dataset)

train.csv: contains all features plus the target label.
Required columns:
- SampleID
- Exam_Score

Example:
| SampleID | StudyHours | Attendance | ParentalInvolvement | ... | Exam_Score |
|---|---|---|---|---|---|
| 1 | 3.5 | High | Medium | ... | 78 |
| 2 | 1.2 | Low | Low | ... | 55 |
| 3 | 4.0 | High | High | ... | 92 |
test.csv: has the same structure as train.csv, but without the Exam_Score column, since it must be predicted.
Example:
| SampleID | StudyHours | Attendance | ParentalInvolvement | ... |
|---|---|---|---|---|
| 101 | 3.0 | High | Medium | ... |
| 102 | 0.7 | Low | Low | ... |
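
Because test.csv is expected to mirror train.csv minus the target, a quick header comparison can catch column mismatches before any training is done. The sketch below uses only Python's standard `csv` module. The helper names `read_header` and `check_columns` and the in-memory sample rows are illustrative assumptions, not part of the task statement.

```python
import csv
import io

TARGET = "Exam_Score"

def read_header(f):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(f))

def check_columns(train_header, test_header, target=TARGET):
    """Return True if test has the same columns as train, minus the target."""
    if target not in train_header:
        raise ValueError(f"{target} is missing from the training data")
    if target in test_header:
        raise ValueError(f"{target} must not appear in the test data")
    expected = [c for c in train_header if c != target]
    return expected == test_header

# Tiny in-memory stand-ins for train.csv and test.csv (hypothetical rows).
train_file = io.StringIO(
    "SampleID,StudyHours,Attendance,Exam_Score\n1,3.5,High,78\n"
)
test_file = io.StringIO("SampleID,StudyHours,Attendance\n101,3.0,High\n")

columns_match = check_columns(read_header(train_file), read_header(test_file))
print(columns_match)
```

With real files, the `io.StringIO` objects would be replaced by `open("train.csv", newline="")` and `open("test.csv", newline="")`.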
The file submission.csv must contain exactly two columns:
- SampleID
- Exam_Score: model prediction

Example:
| SampleID | Exam_Score |
|---|---|
| 101 | 81.2 |
| 102 | 49.7 |
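
One possible way to produce a file in this two-column layout is sketched below with the standard `csv` module. The function name `write_submission` is hypothetical, and the predictions are simply the values from the example table rather than the output of a real model.

```python
import csv
import io

def write_submission(out, sample_ids, predictions):
    """Write SampleID / Exam_Score rows to an open text file-like object."""
    if len(sample_ids) != len(predictions):
        raise ValueError("need exactly one prediction per SampleID")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["SampleID", "Exam_Score"])
    for sample_id, score in zip(sample_ids, predictions):
        writer.writerow([sample_id, float(score)])

# An in-memory buffer keeps the example self-contained; in practice this
# would be open("submission.csv", "w", newline="").
buffer = io.StringIO()
write_submission(buffer, [101, 102], [81.2, 49.7])
submission_text = buffer.getvalue()
print(submission_text)
```

The length check guards against silently dropping rows, which `zip` would otherwise do if the two lists differed in size.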
Model evaluation uses two metrics:
Then RMSE is converted into a 0–100 score using linear interpolation:
The ideal model (RMSE = 0) achieves the maximum score of 100.
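
The exact formulas are not reproduced in this text, so the following is only a sketch under stated assumptions. RMSE is computed with its standard definition. For the conversion, the code assumes the score falls linearly from 100 at RMSE = 0 to 0 at an upper bound `rmse_max`, clamped to the 0 to 100 range. The parameter `rmse_max` and its value are hypothetical placeholders; the real bounds used by the evaluator are not stated here.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def rmse_to_score(value, rmse_max):
    """Map RMSE to a 0-100 score by linear interpolation.

    ASSUMPTION: RMSE = 0 maps to 100 and RMSE >= rmse_max maps to 0.
    rmse_max is a hypothetical bound, not taken from the task statement.
    """
    if rmse_max <= 0:
        raise ValueError("rmse_max must be positive")
    score = 100.0 * (1.0 - value / rmse_max)
    return max(0.0, min(100.0, score))

# Targets from the train.csv example rows, with made-up predictions.
y_true = [78, 55, 92]
y_pred = [80, 55, 90]
error = rmse(y_true, y_pred)
print(error, rmse_to_score(error, rmse_max=10.0))
```

Under this assumed mapping, a perfect model scores 100, consistent with the statement above, and any RMSE at or beyond `rmse_max` scores 0.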
The dataset is generated based on the public Kaggle dataset: Student Performance Factors Dataset