Wine Type Classification
Author: Mihai Nan
🧪 Wine Type Classification from the Wine Dataset
Problem Description
The goal is to build a classification model that determines the type of wine based on its chemical characteristics.
Each sample is characterized by 13 numerical attributes that describe chemical and physical properties of the wine, and the label (target) indicates the wine variety from which it originates.
This type of problem belongs to the multi-class classification category.
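To make the multi-class setting concrete, here is a minimal sketch of one possible baseline: a nearest-centroid classifier written in plain Python. Each feature is standardized, one mean vector is computed per class, and a sample is assigned to the class whose centroid is closest. This technique is chosen only for illustration and is not the intended solution; the function names and the two-feature toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_nearest_centroid(X, y):
    """Return (means, stds, centroids) learned from rows X and labels y."""
    n_features = len(X[0])
    # Per-feature mean and standard deviation, used to put features on one scale.
    means = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((row[j] - means[j]) ** 2 for row in X) / len(X)
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant features

    # Group standardized rows by class, then average them into one centroid per class.
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append([(v - m) / s for v, m, s in zip(row, means, stds)])
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }
    return means, stds, centroids


def predict_nearest_centroid(model, X):
    """Assign each row to the class with the closest centroid (Euclidean distance)."""
    means, stds, centroids = model
    predictions = []
    for row in X:
        z = [(v - m) / s for v, m, s in zip(row, means, stds)]
        best = min(centroids, key=lambda label: math.dist(z, centroids[label]))
        predictions.append(best)
    return predictions


# Toy data with two features (alcohol, proline) and three classes; values are made up.
X_toy = [[13.3, 880.0], [13.7, 1285.0], [12.3, 450.0],
         [12.1, 500.0], [13.2, 650.0], [13.4, 630.0]]
y_toy = [1, 1, 2, 2, 3, 3]
model = fit_nearest_centroid(X_toy, y_toy)
print(predict_nearest_centroid(model, [[13.6, 1200.0], [12.2, 470.0]]))
```

Standardizing first matters here because proline values are in the hundreds while most other attributes are close to 1 to 15, so an unscaled distance would be dominated by a single feature.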
🔹 Features
- alcohol
- malic_acid
- ash
- alcalinity_of_ash
- magnesium
- total_phenols
- flavanoids
- nonflavanoid_phenols
- proanthocyanins
- color_intensity
- hue
- od280/od315_of_diluted_wines
- proline
The dataset comes from the original UCI Machine Learning Repository collection:
https://archive.ics.uci.edu/ml/datasets/Wine
📘 Input File Structure
train.csv
Contains all 13 feature columns plus the column:
- target: the wine class (variety).
Possible values are 1, 2 and 3.
Example:
| SampleID | alcohol | malic_acid | ... | od280/od315_of_diluted_wines | proline | target |
|---|---|---|---|---|---|---|
| 37 | 13.28 | 1.64 | ... | 2.78 | 880.0 | 0 |
| 31 | 13.73 | 1.50 | ... | 2.71 | 1285.0 | 0 |
| 27 | 13.39 | 1.77 | ... | 3.22 | 1195.0 | 0 |
| 13 | 13.75 | 1.73 | ... | 2.90 | 1320.0 | 0 |
| 149 | 13.32 | 3.24 | ... | 1.62 | 650.0 | 2 |
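A minimal sketch of how such a file could be read with Python's standard `csv` module, separating identifiers, features and targets. It assumes the file is comma-separated with a header row, which the statement does not confirm. The helper name `load_training_rows` is hypothetical, and the in-memory sample is a shortened stand-in for train.csv that keeps only four of the 13 feature columns.

```python
import csv
import io


def load_training_rows(file_obj, id_col="SampleID", target_col="target"):
    """Split CSV rows into sample ids, feature dicts, and integer targets."""
    ids, features, targets = [], [], []
    for row in csv.DictReader(file_obj):
        ids.append(int(row.pop(id_col)))
        targets.append(int(row.pop(target_col)))
        # Every remaining column is treated as a numeric feature.
        features.append({name: float(value) for name, value in row.items()})
    return ids, features, targets


# Shortened stand-in for train.csv (assumed comma-separated); two rows from the example.
sample_csv = io.StringIO(
    "SampleID,alcohol,malic_acid,od280/od315_of_diluted_wines,proline,target\n"
    "37,13.28,1.64,2.78,880.0,0\n"
    "149,13.32,3.24,1.62,650.0,2\n"
)
ids, features, targets = load_training_rows(sample_csv)
print(ids, targets, features[0]["alcohol"])
```

With the real file, the same function could be called as `load_training_rows(open("train.csv", newline=""))`. SampleID is kept apart from the features because it is an identifier, not a chemical measurement.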
test.csv
Contains the same columns without target, but includes SampleID.
Example:
| SampleID | alcohol | malic_acid | ... | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|
| 11 | 14.10 | 2.16 | ... | 1.25 | 3.17 | 1510.0 |
| 135 | 12.51 | 1.24 | ... | 0.75 | 1.51 | 650.0 |
| 29 | 13.87 | 1.90 | ... | 1.25 | 3.40 | 915.0 |
| 122 | 11.56 | 2.05 | ... | 0.93 | 3.69 | 465.0 |
| 63 | 13.67 | 1.25 | ... | 1.23 | 2.46 | 630.0 |
📤 Submission
The output file (submission.csv) must contain exactly two columns:
- SampleID
- label: the label predicted by the model (1, 2 or 3)
Example:
| SampleID | label |
|---|---|
| 1 | 2 |
| 2 | 1 |
| 3 | 3 |
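One way to produce a file in this shape is sketched below using the standard `csv` module. The helper name `write_submission` is hypothetical, and a comma-separated layout with a header row is assumed. The demo writes the three rows from the example table to an in-memory buffer instead of a real file.

```python
import csv
import io


def write_submission(file_obj, sample_ids, labels):
    """Write the header, then one (SampleID, label) row per prediction."""
    if len(sample_ids) != len(labels):
        raise ValueError("sample_ids and labels must have the same length")
    writer = csv.writer(file_obj, lineterminator="\n")
    writer.writerow(["SampleID", "label"])
    writer.writerows(zip(sample_ids, labels))


# Demo with the three rows from the example table, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(buffer, [1, 2, 3], [2, 1, 3])
print(buffer.getvalue())
```

For an actual submission, the buffer could be replaced with `open("submission.csv", "w", newline="")`, passing the SampleID values read from test.csv in their original order.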
⚙️ Evaluation
Model evaluation will be performed using the following metric:
- Macro F1-score
This metric is suitable for multi-class classification because it gives equal weight to each class, regardless of the number of examples in each.
General formula:
$$\text{Macro-F1} = \frac{1}{C} \sum_{i=1}^{C} \mathrm{F1}_i$$
where C is the number of classes, and F1_i is the F1 score for class i.
The final score is expressed as a percentage (0–100), rounded to two decimal places.
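The averaging described above can be sketched in plain Python as follows. This is an illustration, not the official scorer: the function name `macro_f1_percent` is hypothetical, and it assumes the set of classes is the union of labels seen in the true and predicted lists, which may differ from how the grader fixes the class set.

```python
def macro_f1_percent(y_true, y_pred):
    """Macro-averaged F1 over all observed classes, as a percentage (2 decimals)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Per-class F1 = 2*TP / (2*TP + FP + FN); the zero guard is purely defensive.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    # Unweighted mean over classes, so each class counts equally.
    return round(100 * sum(f1_scores) / len(f1_scores), 2)


# Made-up labels: per-class F1 is 0.5, 0.8 and 2/3 for classes 1, 2 and 3.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
print(macro_f1_percent(y_true, y_pred))
```

In the toy example the mean of 0.5, 0.8 and 2/3 is about 0.6556, so the reported score is 65.56.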
📊 Source
The dataset comes from the original collection:
UCI Machine Learning Repository – Wine Data Set