מחבר: Mihai Nan
For this problem, you need to implement a machine learning model to predict the Status field using a provided dataset containing patient information. The dataset is organized in a CSV file with various features, and the model will be evaluated based on precision for the Dead class.
The dataset contains the following fields:
Dead or Alive) – target field.For the first subtasks, you will need to load the dataset and perform statistical analyses on test.csv.
Classify kidney function for each patient in the test set based on GFR:
Normal if GFR >= 90Mildly Decreased if 60 <= GFR < 90Compute the quartiles (Q1, Q2, Q3) of the Serum Creatinine column in the training set and classify patients in the test set as follows:
Very Low if Serum Creatinine <= Q1Low if Q1 < Serum Creatinine <= Q2High if Q2 < Serum Creatinine <= Q3Very High if Serum Creatinine > Q3Determine BMI value:
Count the number of patients in train with the same T Stage as the patient in test.
Build an ML model to predict Status (Dead or Alive) based on features. Evaluation is based on precision for the Dead class.
The evaluation metric is precision for the Dead class:
where:
DeadDeadStatus.sample_output for local testing gives 5 points.The submission.csv file must contain results for all 5 subtasks for each row in the test set.
Each test row generates 5 lines in the file, one for each subtask.
File structure:
subtaskID, datapointID, answer
Column meanings:
Normal / Mildly Decreased / DecreasedVery Low / Low / High / Very High0 or 11 if model predicts Dead, 0 if model predicts AliveExample for a single patient with ID 3220:
subtaskID datapointID answer
1 3220 Normal
2 3220 Low
3 3220 0
4 3220 5
5 3220 1