Author: Mihai Nan
The goal is to build a text classification model that assigns each news article to one of three thematic categories:
- WELLNESS
- ENTERTAINMENT
- POLITICS

The model must learn from training texts and predict the label for new texts.
This is a multi-class classification problem.
train.csv: contains the text and the corresponding category (label).
Each row represents a news article.
| SampleID | text | label |
|---|---|---|
| 139768 | Take a Presence Power Break (The New Coffee Break) As soon as you honor the present moment... | WELLNESS |
| 297 | Yolanda Hadid Returns To Social Media After 9-Month Break For Depression, Lyme Relapse... | ENTERTAINMENT |
| 2274 | Democrats Want Paid Sick Days, Breaks For Domestic Workers... | POLITICS |
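As a rough sketch of reading this file with Python's standard library: the snippet below assumes the file is comma-separated with a header row matching the column names in the table, which the task description does not state explicitly. The helper name `load_labeled_rows` is hypothetical, and the in-memory sample stands in for the real file, reusing shortened rows from the table. Counting labels is worth doing early, because the evaluation metric weights every class equally.

```python
import csv
import io
from collections import Counter

def load_labeled_rows(handle):
    """Read (SampleID, text, label) tuples from an open CSV file."""
    reader = csv.DictReader(handle)
    return [(row["SampleID"], row["text"], row["label"]) for row in reader]

# Stand-in for: open("train.csv", newline="", encoding="utf-8")
# Assumption: comma-separated, quoted text fields, header row as in the table.
sample = io.StringIO(
    "SampleID,text,label\n"
    '139768,"Take a Presence Power Break (The New Coffee Break)",WELLNESS\n'
    '297,"Yolanda Hadid Returns To Social Media After 9-Month Break",ENTERTAINMENT\n'
    '2274,"Democrats Want Paid Sick Days, Breaks For Domestic Workers",POLITICS\n'
)

rows = load_labeled_rows(sample)
label_counts = Counter(label for _, _, label in rows)
print(label_counts)
```

The `csv` module handles commas inside quoted text fields, which plain `str.split(",")` would not.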
test.csv: contains only the texts for which predictions must be made.
| SampleID | text |
|---|---|
| 106057 | Taylor Swift Calls Out Sexist Critics Again |
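The task does not prescribe a model. As one possible baseline, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The `NaiveBayes` class, the `tokenize` helper, and the toy training texts are all illustrative assumptions, not part of the dataset or the author's method.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_one(self, text):
        total_docs = sum(self.class_counts.values())
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]

# Toy data invented for illustration; real training uses train.csv.
train_texts = [
    "meditation and sleep tips for a healthy mind",
    "healthy breakfast habits to reduce stress",
    "pop singer announces new album and tour",
    "actor wins award for new movie",
    "senate votes on new healthcare bill",
    "president signs bill after senate debate",
]
train_labels = ["WELLNESS", "WELLNESS", "ENTERTAINMENT",
                "ENTERTAINMENT", "POLITICS", "POLITICS"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = model.predict([
    "singer releases movie soundtrack album",
    "senate debate on the bill",
    "tips for healthy sleep",
])
print(predictions)
```

A bag-of-words model like this ignores word order, which is often acceptable for topic-style labels; stronger feature weighting or a pretrained language model would be natural next steps.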
The output file must be named submission.csv and contain two columns:
- SampleID: unique identifier of the text
- label: category predicted by the model (WELLNESS, ENTERTAINMENT or POLITICS)

Example:
| SampleID | label |
|---|---|
| 106057 | ENTERTAINMENT |
| 297 | ENTERTAINMENT |
| 2274 | POLITICS |
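A file in this shape could be produced along the following lines. This is a sketch that assumes a comma-separated file with a header row, since the exact delimiter is not specified; `write_submission` and `ALLOWED_LABELS` are hypothetical names. It writes to an in-memory buffer so the result can be inspected, and validates the labels before writing.

```python
import csv
import io

ALLOWED_LABELS = {"WELLNESS", "ENTERTAINMENT", "POLITICS"}

def write_submission(handle, sample_ids, predicted_labels):
    """Write SampleID,label rows after validating the predictions."""
    if len(sample_ids) != len(predicted_labels):
        raise ValueError("need exactly one prediction per SampleID")
    unknown = set(predicted_labels) - ALLOWED_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["SampleID", "label"])
    writer.writerows(zip(sample_ids, predicted_labels))

# For the real file, pass:
#   open("submission.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
write_submission(
    buffer,
    [106057, 297, 2274],
    ["ENTERTAINMENT", "ENTERTAINMENT", "POLITICS"],
)
print(buffer.getvalue())
```

Rejecting unknown labels up front catches casing or spelling slips such as `Politics` before the file is submitted.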
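Model performance will be measured using the macro F1-score, a balanced metric for multi-class classification. For each class $c$:

$$F1_c = \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}$$

where:
- $\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}$ is the share of articles predicted as class $c$ that truly belong to it
- $\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}$ is the share of articles truly in class $c$ that the model predicted as $c$
- $TP_c$, $FP_c$ and $FN_c$ are the true positives, false positives and false negatives for class $c$
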
The final score is calculated as the arithmetic mean of F1-scores for all classes.
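The averaging described above can be sketched in plain Python as follows. The function names are illustrative, and treating a class with no true positives as F1 = 0 is an assumed convention, since the official scorer's handling of that edge case is not stated.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        # Assumed convention: F1 is 0 when the class is never predicted correctly.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels):
    """Unweighted arithmetic mean of the per-class F1 scores."""
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)

labels = ["WELLNESS", "ENTERTAINMENT", "POLITICS"]
y_true = ["WELLNESS", "WELLNESS", "ENTERTAINMENT", "ENTERTAINMENT", "POLITICS", "POLITICS"]
y_pred = ["WELLNESS", "ENTERTAINMENT", "ENTERTAINMENT", "ENTERTAINMENT", "POLITICS", "WELLNESS"]

score = macro_f1(y_true, y_pred, labels)
print(round(score, 4))
```

In this made-up example the per-class scores are 0.5, 0.8 and about 0.667, so the macro average is about 0.656. Because each class counts equally, weak performance on a rare class lowers the score as much as weak performance on a common one.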
The dataset is derived from a collection of news articles with various themes (wellness, entertainment, politics), sourced from public media platforms.