Text Classification into Thematic Categories
Author: Mihai Nan
📘 Problem Description
The goal is to build a text classification model that assigns each news article to one of three thematic categories:
- WELLNESS
- ENTERTAINMENT
- POLITICS
The model must learn from training texts and predict the label for new texts.
This is a multi-class classification problem.
🔹 Data Structure
train.csv
Contains the text and corresponding category (label).
Each row represents a news article.
| SampleID | text | label |
|---|---|---|
| 139768 | Take a Presence Power Break (The New Coffee Break) As soon as you honor the present moment... | WELLNESS |
| 297 | Yolanda Hadid Returns To Social Media After 9-Month Break For Depression, Lyme Relapse... | ENTERTAINMENT |
| 2274 | Democrats Want Paid Sick Days, Breaks For Domestic Workers... | POLITICS |
test.csv
Contains only the texts for which predictions must be made.
| SampleID | text |
|---|---|
| 106057 | Taylor Swift Calls Out Sexist Critics Again |
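To make the task concrete, here is a minimal baseline sketch, assuming a bag-of-words multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. This is an illustrative choice and not necessarily what the starter kit uses. The function names (`tokenize`, `train_naive_bayes`, `predict`) are hypothetical, and the three training texts are shortened versions of the sample rows shown for train.csv.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(texts, labels):
    """Count words per class and articles per class."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter(labels)        # label -> number of articles
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    tokens = [t for t in tokenize(text) if t in model["vocab"]]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in sorted(model["doc_counts"]):
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Log prior plus the sum of smoothed log likelihoods of each token.
        score = math.log(model["doc_counts"][label] / total_docs)
        for token in tokens:
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data: shortened sample rows from train.csv.
train_texts = [
    "Take a Presence Power Break (The New Coffee Break)",
    "Yolanda Hadid Returns To Social Media After 9-Month Break",
    "Democrats Want Paid Sick Days, Breaks For Domestic Workers",
]
train_labels = ["WELLNESS", "ENTERTAINMENT", "POLITICS"]

model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "Democrats push for paid sick days"))  # prints POLITICS
```

In practice the `text` and `label` columns would be read from train.csv (for example with `csv.DictReader`), and a stronger model such as TF-IDF features with a linear classifier would likely score better; the sketch only shows the train-then-predict flow.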
📤 Submission
The output file must be named submission.csv and contain two columns:
- SampleID: unique identifier of the text
- label: category predicted by the model (WELLNESS, ENTERTAINMENT or POLITICS)
Example:
| SampleID | label |
|---|---|
| 106057 | ENTERTAINMENT |
| 297 | ENTERTAINMENT |
| 2274 | POLITICS |
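A file in this format could be produced along the following lines. This is a sketch using Python's standard `csv` module; the helper name `write_submission` and the label check are assumptions added for illustration, not part of the official tooling.

```python
import csv
import io

ALLOWED_LABELS = {"WELLNESS", "ENTERTAINMENT", "POLITICS"}


def write_submission(sample_ids, predictions, out_file):
    """Write a header plus one SampleID,label row per prediction."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["SampleID", "label"])
    for sample_id, label in zip(sample_ids, predictions):
        if label not in ALLOWED_LABELS:
            # Fail early rather than submit a label the task does not define.
            raise ValueError(f"unexpected label: {label!r}")
        writer.writerow([sample_id, label])


# Demonstration with an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_submission([106057], ["ENTERTAINMENT"], buffer)
print(buffer.getvalue())
```

To create the actual file, the same function can be called with `open("submission.csv", "w", newline="")` as the output.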
⚙️ Evaluation
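Model performance will be measured using the macro F1-score, a balanced metric for multi-class classification. For each class $c$:

$$F1_c = \frac{2 \cdot \text{Precision}_c \cdot \text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}$$

where:

$$\text{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad \text{Recall}_c = \frac{TP_c}{TP_c + FN_c}$$

and $TP_c$, $FP_c$, $FN_c$ are the numbers of true positives, false positives and false negatives for class $c$.
The final score is calculated as the arithmetic mean of the F1-scores for all classes.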
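For local validation, the metric can be computed by hand. The sketch below assumes the standard per-class F1 definition, written in the equivalent form 2·TP / (2·TP + FP + FN), and assumes that a class with no true or predicted samples scores 0; the official scorer may treat that edge case differently. The function name `macro_f1` is hypothetical.

```python
def macro_f1(y_true, y_pred, labels=("WELLNESS", "ENTERTAINMENT", "POLITICS")):
    """Arithmetic mean of the per-class F1-scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: F1 is 0 when the class never appears in y_true or y_pred.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = ["WELLNESS", "WELLNESS", "ENTERTAINMENT", "POLITICS"]
y_pred = ["WELLNESS", "ENTERTAINMENT", "ENTERTAINMENT", "POLITICS"]
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.7778
```

Here WELLNESS and ENTERTAINMENT each score 2/3 and POLITICS scores 1, so the mean is 7/9. Because every class has equal weight, a model that neglects one category is penalized even if that category is rare.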
📊 Source
The dataset is derived from a collection of news articles with various themes (wellness, entertainment, politics), sourced from public media platforms.
🗂️ Useful Resources
- Complete Starter Kit – contains a skeleton from which you can start solving the problem