Diabetes Diagnosis Based on Blood Tests
Author: Mihai Nan
Problem Description
The goal is to build a classification model that predicts whether a patient has diabetes based on blood tests and demographic data.
Each patient is characterized by 8 numerical attributes obtained from analyses and clinical measurements, and the label (target) indicates the presence of diabetes (1 for positive, 0 for negative).
This type of problem belongs to the binary classification category.
🔹 Features
- pregnancies – number of pregnancies
- glucose – blood glucose level
- blood_pressure – blood pressure
- skin_thickness – skin fold thickness
- insulin – insulin level
- bmi – body mass index
- diabetes_pedigree_function – genetic risk score
- age – patient age
📘 Input File Structure
train.csv
Contains all 8 feature columns plus the column:
- target – indicates the presence of diabetes (0 or 1)
Example:
| SampleID | pregnancies | glucose | blood_pressure | skin_thickness | insulin | bmi | diabetes_pedigree_function | age | target |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 2 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
test.csv
Contains the same columns without target, but includes SampleID.
Example:
| SampleID | pregnancies | glucose | blood_pressure | skin_thickness | insulin | bmi | diabetes_pedigree_function | age |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |
| 2 | 5 | 116 | 74 | 0 | 0 | 25.6 | 0.201 | 30 |
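The two file layouts can be parsed with Python's standard `csv` module. The sketch below is only illustrative: it assumes the files are comma-separated with a header row whose column names match the examples, and the helper name `load_rows` is hypothetical. The in-memory sample reuses the two train.csv example rows shown earlier.

```python
import csv
import io

# Feature columns, in the order listed in the Features section.
FEATURES = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "diabetes_pedigree_function", "age",
]


def load_rows(fileobj, has_target):
    """Parse rows into (sample_ids, feature_vectors, targets).

    `targets` stays empty when has_target is False (the test.csv layout).
    """
    ids, X, y = [], [], []
    for row in csv.DictReader(fileobj):
        ids.append(int(row["SampleID"]))
        X.append([float(row[name]) for name in FEATURES])
        if has_target:
            y.append(int(row["target"]))
    return ids, X, y


# Assumption: comma-separated values with a header row.
train_csv = io.StringIO(
    "SampleID,pregnancies,glucose,blood_pressure,skin_thickness,"
    "insulin,bmi,diabetes_pedigree_function,age,target\n"
    "1,6,148,72,35,0,33.6,0.627,50,1\n"
    "2,1,85,66,29,0,26.6,0.351,31,0\n"
)
ids, X, y = load_rows(train_csv, has_target=True)
print(ids, y, X[0])
```

For the real files, an open file handle such as `open("train.csv", newline="")` can be passed in place of the in-memory buffer, with `has_target=False` for test.csv.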
📤 Submission
The output file (submission.csv) must contain exactly two columns:
- SampleID
- label – the label predicted by the model (0 or 1)
Example:
| SampleID | label |
|---|---|
| 1 | 1 |
| 2 | 0 |
| 3 | 0 |
⚙️ Evaluation
Model evaluation will be performed using the following metric:
- Binary F1-score
This metric is suitable for binary classification because it balances precision and recall for the positive class (label 1) in a single score.
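General formula:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where:

- $\text{Precision} = \frac{TP}{TP + FP}$
- $\text{Recall} = \frac{TP}{TP + FN}$

Here $TP$, $FP$, and $FN$ are the numbers of true positives, false positives, and false negatives for the positive class (1).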
The final score is expressed as a percentage (0–100), rounded to two decimal places.
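To check a model locally before submitting, the binary F1-score can be computed by hand. The sketch below assumes that label 1 is the positive class and that the score is scaled to 0–100 and rounded to two decimals, as stated above. The function name `binary_f1_percent` is hypothetical, and returning 0 when there are no true positives is an assumption about how the grader treats that edge case.

```python
def binary_f1_percent(y_true, y_pred, positive=1):
    """Return the binary F1-score as a percentage rounded to two decimals."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumption: the score is 0 when no positive sample is predicted correctly.
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(100 * f1, 2)


# Toy labels for illustration only (not taken from the dataset).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = binary_f1_percent(y_true, y_pred)
print(score)  # 66.67 (TP=2, FP=1, FN=1)
```

Because only the positive class enters the counts, predicting 0 for every patient scores 0 under this metric no matter how many negatives are classified correctly.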
📊 Source
The dataset comes from the original collection:
Pima Indians Diabetes Database – Kaggle