Author: ML de bază
The goal is to build a classification model that determines the species of a flower based on its characteristics.
Each sample is described by 4 numerical attributes (sepal length, sepal width, petal length and petal width), and the label indicates the flower species:
- setosa
- versicolor
- virginica

This type of problem belongs to the multi-class classification category.
The dataset is derived from the classic Iris collection (UCI ML Repository).
The four feature columns are:
- sepal_length
- sepal_width
- petal_length
- petal_width

train.csv contains all 4 feature columns plus the column:
- target – represents the flower species (0=setosa, 1=versicolor, 2=virginica)

Example:
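|  | sepal_length_(cm) | sepal_width_(cm) | petal_length_(cm) | petal_width_(cm) | target | SampleID |
|---|---|---|---|---|---|---|
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | 0 | 9 |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 | 2 | 107 |
| 76 | 6.8 | 2.8 | 4.8 | 1.4 | 1 | 77 |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | 0 | 10 |
| 89 | 5.5 | 2.5 | 4.0 | 1.3 | 1 | 90 |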
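
As a rough illustration of how the training file could be parsed, the sketch below reads the five example rows with Python's standard `csv` module and separates features, targets and sample IDs. It assumes the file is comma-separated and leaves out the unnamed index column from the preview; the helper name `load_training_rows` is hypothetical.

```python
import csv
import io

# The five example rows as comma-separated text (the delimiter is an assumption).
# The unnamed leading index column from the preview is omitted here; if the real
# file contains it, DictReader simply exposes it as one more key to ignore.
SAMPLE_TRAIN_CSV = """\
sepal_length_(cm),sepal_width_(cm),petal_length_(cm),petal_width_(cm),target,SampleID
4.4,2.9,1.4,0.2,0,9
4.9,2.5,4.5,1.7,2,107
6.8,2.8,4.8,1.4,1,77
4.9,3.1,1.5,0.1,0,10
5.5,2.5,4.0,1.3,1,90
"""

FEATURE_COLUMNS = [
    "sepal_length_(cm)",
    "sepal_width_(cm)",
    "petal_length_(cm)",
    "petal_width_(cm)",
]


def load_training_rows(handle):
    """Return (features, targets, sample_ids) parsed from a train.csv-like file."""
    features, targets, sample_ids = [], [], []
    for row in csv.DictReader(handle):
        features.append([float(row[name]) for name in FEATURE_COLUMNS])
        targets.append(int(row["target"]))
        sample_ids.append(int(row["SampleID"]))
    return features, targets, sample_ids


# With the real data, pass open("train.csv", newline="") instead of StringIO.
X, y, ids = load_training_rows(io.StringIO(SAMPLE_TRAIN_CSV))
```

Keeping `SampleID` separate from the feature vectors avoids accidentally training on an identifier.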
test.csv contains the same columns without target, but includes SampleID.
Example:
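|  | sepal_length_(cm) | sepal_width_(cm) | petal_length_(cm) | petal_width_(cm) | SampleID |
|---|---|---|---|---|---|
| 38 | 4.4 | 3.0 | 1.3 | 0.2 | 39 |
| 127 | 6.1 | 3.0 | 4.9 | 1.8 | 128 |
| 57 | 4.9 | 2.4 | 3.3 | 1.0 | 58 |
| 93 | 5.0 | 2.3 | 3.3 | 1.0 | 94 |
| 42 | 4.4 | 3.2 | 1.3 | 0.2 | 43 |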
The output file (submission.csv) must contain exactly two columns:
- SampleID
- label – predicted species (0, 1 or 2)

Example:
| SampleID | label |
|---|---|
| 101 | 1 |
| 102 | 2 |
| 103 | 0 |
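
To make the expected output concrete, here is a minimal end-to-end sketch that fits a nearest-centroid baseline on the example training rows and writes predictions in the two-column format. The nearest-centroid classifier is an illustrative choice, not a method prescribed by the task, and the helper names (`fit_centroids`, `predict`, `write_submission`) are hypothetical. In practice a library such as scikit-learn would likely be used instead.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the train.csv and test.csv examples:
# (four features, target) and (four features, SampleID).
TRAIN_ROWS = [
    ([4.4, 2.9, 1.4, 0.2], 0),
    ([4.9, 2.5, 4.5, 1.7], 2),
    ([6.8, 2.8, 4.8, 1.4], 1),
    ([4.9, 3.1, 1.5, 0.1], 0),
    ([5.5, 2.5, 4.0, 1.3], 1),
]
TEST_ROWS = [
    ([4.4, 3.0, 1.3, 0.2], 39),
    ([6.1, 3.0, 4.9, 1.8], 128),
    ([4.9, 2.4, 3.3, 1.0], 58),
    ([5.0, 2.3, 3.3, 1.0], 94),
    ([4.4, 3.2, 1.3, 0.2], 43),
]


def fit_centroids(rows):
    """Return {label: mean feature vector} for the training rows."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Pick the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def write_submission(out, predictions):
    """Write (SampleID, label) pairs under the required two-column header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["SampleID", "label"])
    writer.writerows(predictions)


centroids = fit_centroids(TRAIN_ROWS)
predictions = [
    (sample_id, predict(centroids, features)) for features, sample_id in TEST_ROWS
]

# Swap the buffer for open("submission.csv", "w", newline="") to write the file.
buffer = io.StringIO()
write_submission(buffer, predictions)
submission_text = buffer.getvalue()
```

With only five training rows the predictions are not meaningful; the point is the shape of the pipeline and of the output file.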
Model evaluation will be performed using the following metric: the macro-averaged F1 score (Macro F1).
This metric is suitable for multi-class classification because it gives equal weight to each class, regardless of the number of examples in each.
General formula:

$$\text{Macro-F1} = \frac{1}{C} \sum_{i=1}^{C} F1_i$$

where $C$ is the number of classes, and $F1_i$ is the F1 score for class $i$.
The final score is expressed as a percentage (0–100), rounded to two decimal places.
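
The metric described above, an unweighted mean of per-class F1 scores, matches macro-averaged F1. The sketch below computes it in plain Python and applies the percentage scaling and two-decimal rounding. The function name `macro_f1_percent` and the handling of a class with no true or predicted samples (scored as 0) are assumptions; the platform's exact implementation is not specified.

```python
def macro_f1_percent(y_true, y_pred, labels=(0, 1, 2)):
    """Macro-averaged F1 over `labels`, as a percentage rounded to 2 decimals."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true and no predicted samples scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return round(100 * sum(f1_scores) / len(f1_scores), 2)


# Toy labels for illustration only (not taken from the dataset).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1_percent(y_true, y_pred)  # per-class F1: 1.0, 2/3, 0.8
```

Here the per-class scores average to about 0.8222, so `score` is 82.22. One versicolor sample misclassified as virginica lowers the F1 of both classes, which is why the macro average reacts to errors in any single class.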
The dataset comes from the original collection:
UCI Machine Learning Repository – Iris Data Set