Author: Mihai Nan
The goal is to build a classification model that predicts the primary emotion associated with a text.
Each example is a short text, and its label (the label column) is the emotion the text expresses (e.g., joy, anger, sadness).
This problem belongs to the category of multi-class classification.
Input feature:
- text – the textual content of the message or article

Target label:
- label – the emotion associated with the text (string)

train.csv
Contains the following columns:
- SampleID
- text
- label

Example:
| SampleID | text | label |
|---|---|---|
| 1 | "I am so happy today!" | joy |
| 2 | "I feel really angry about this situation." | anger |
| 3 | "Feeling a bit sad after watching that movie." | sadness |
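
As a rough illustration of the layout above, the following sketch reads rows shaped like train.csv using only the Python standard library. It assumes the file is comma-separated, UTF-8 encoded, and has a header row, none of which the statement confirms. `SAMPLE_TRAIN` and `read_rows` are hypothetical names used only for this example.

```python
import csv
import io
from collections import Counter

# Stand-in for train.csv built from the three example rows shown above.
# Assumption: the real file is comma-separated with a header row.
SAMPLE_TRAIN = '''SampleID,text,label
1,"I am so happy today!",joy
2,"I feel really angry about this situation.",anger
3,"Feeling a bit sad after watching that movie.",sadness
'''


def read_rows(handle):
    """Return the rows of an open CSV file object as dicts keyed by column name."""
    return list(csv.DictReader(handle))


# For the real file, something like:
#   with open("train.csv", newline="", encoding="utf-8") as f:
#       rows = read_rows(f)
rows = read_rows(io.StringIO(SAMPLE_TRAIN))

# Counting labels is a quick way to see how balanced the classes are.
label_counts = Counter(row["label"] for row in rows)
print(label_counts)
```

Checking the label distribution early is useful here because the evaluation metric weights every class equally, so rare emotions matter as much as common ones.
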
test.csv
Contains the same columns as train.csv, without label but including SampleID.
Example:
| SampleID | text |
|---|---|
| 101 | "What a wonderful surprise!" |
| 102 | "I can't believe this happened." |
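
To make the train/predict flow concrete, here is a minimal baseline sketch: a small multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. This is not the method prescribed by the task, just one possible starting point. The function names (`tokenize`, `train_naive_bayes`, `predict`) are hypothetical, and in practice a library implementation would likely be the more common choice.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into tokens of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(texts, labels):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [tok for tok in tokenize(text) if tok in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):  # sorted for deterministic tie-breaking
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokens:
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data taken from the train.csv example rows.
train_texts = [
    "I am so happy today!",
    "I feel really angry about this situation.",
    "Feeling a bit sad after watching that movie.",
]
train_labels = ["joy", "anger", "sadness"]
model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "so happy"))
```

With only three training sentences the model is purely illustrative; on the real data the same functions would be fed the text and label columns of train.csv and applied to each text in test.csv.
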
The output file (submission.csv) must contain exactly two columns:
- SampleID
- label – the label predicted by the model

Example:
| SampleID | label |
|---|---|
| 101 | joy |
| 102 | surprise |
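
The two-column format above could be produced with a sketch like the following, again using only the standard library. It assumes a comma-separated file with a header row; `write_submission` is a hypothetical helper, not something provided by the competition.

```python
import csv
import io


def write_submission(handle, sample_ids, predicted_labels):
    """Write a two-column submission (SampleID, label) to an open text file object."""
    if len(sample_ids) != len(predicted_labels):
        raise ValueError("each SampleID needs exactly one predicted label")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["SampleID", "label"])
    writer.writerows(zip(sample_ids, predicted_labels))


# For the real file, something like:
#   with open("submission.csv", "w", newline="", encoding="utf-8") as f:
#       write_submission(f, ids, labels)
buffer = io.StringIO()
write_submission(buffer, [101, 102], ["joy", "surprise"])
print(buffer.getvalue())
```

The length check guards against the most likely formatting mistake, a prediction list that does not line up with the SampleID values from test.csv.
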
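Models will be evaluated using Macro F1-score:

$$F1_{\text{macro}} = \frac{1}{N} \sum_{i=1}^{N} F1_i, \qquad F1_i = \frac{2 \cdot P_i \cdot R_i}{P_i + R_i}$$

where:
- $N$ is the number of classes
- $P_i$ and $R_i$ are the precision and recall for class $i$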
The final score will be scaled to 0–100, so a higher F1 results in a higher score. To achieve the maximum score, F1 must be at least 0.9.
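
For local validation, Macro F1 can be computed by hand as in the sketch below. It uses the standard definition (the unweighted mean of per-class F1) and assumes the set of classes is every label that appears in either the true or the predicted labels; whether the official evaluator defines the class set the same way is not stated. The 0–100 scaling is left out because its exact formula is not given.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        return 0.0
    f1_scores = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2PR / (P + R), which simplifies to 2*tp / (2*tp + fp + fn).
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = ["joy", "anger", "sadness", "joy"]
y_pred = ["joy", "anger", "joy", "joy"]
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy case the per-class F1 values are 0.8 for joy, 1.0 for anger, and 0.0 for sadness, so a single class that is never predicted pulls the macro average down to 0.6 even though three of four predictions are correct.
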
The dataset comes from Kaggle: Kaggle Emotion Dataset