Author: Mihai Nan
A dataset of Romanian poems is provided, each poem written by one of 7 authors. The poems were then split into groups of 4 verses.
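The statement does not say exactly how the poems were cut into groups. Below is a minimal sketch that assumes one verse per non-empty line, consecutive non-overlapping groups, and that a trailing group with fewer than 4 verses is dropped. All three are assumptions, and the function name `split_into_groups` is hypothetical.

```python
def split_into_groups(poem, size=4):
    """Split a poem into consecutive, non-overlapping groups of `size` verses.

    Assumptions (not stated in the problem): one verse per non-empty line,
    and an incomplete trailing group is discarded.
    """
    verses = [line.strip() for line in poem.splitlines() if line.strip()]
    return [
        "\n".join(verses[start:start + size])
        for start in range(0, len(verses) - size + 1, size)
    ]
```

Under these assumptions, a 9-verse poem yields two groups (verses 1-4 and 5-8), and the ninth verse is dropped.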
The 7 authors are:
A group of 4 verses can represent:
The goal of the problem is to build a model that can predict the author of a group of 4 verses.
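The statement does not prescribe a model. As one possible starting point, here is a minimal baseline using multinomial Naive Bayes over lowercase word counts with add-one smoothing, written with the Python standard library only. This technique is our choice, not part of the problem, and the names `tokenize` and `NaiveBayesAuthor` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase word tokens; in Python 3, \w is Unicode-aware, so ă, â, î, ș, ț are kept.
    return re.findall(r"\w+", text.lower())


class NaiveBayesAuthor:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, authors):
        self.doc_counts = Counter(authors)  # groups of verses per author
        self.n_docs = len(authors)
        self.word_counts = defaultdict(Counter)  # author -> word frequencies
        for text, author in zip(texts, authors):
            self.word_counts[author].update(tokenize(text))
        self.vocab_size = len({w for c in self.word_counts.values() for w in c})
        self.total_words = {a: sum(c.values()) for a, c in self.word_counts.items()}
        return self

    def predict_one(self, text):
        tokens = tokenize(text)
        best_author, best_score = None, -math.inf
        for author, n in self.doc_counts.items():
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(n / self.n_docs)
            counts = self.word_counts[author]
            denom = self.total_words[author] + self.vocab_size
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_author, best_score = author, score
        return best_author

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]
```

A typical use would be to fit on the Versuri and Autor columns of the training set and then call `predict` on the Versuri column of the test set. Working in log space avoids numeric underflow when many small probabilities are multiplied.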
The training dataset contains the following columns:
- Id – unique identifier of the group of verses
- Versuri – content of the 4 verses
- Autor – author of the verses

Example:
| Id | Versuri | Autor |
|---|---|---|
| 0001 | Sus, pe dealuri, Toamna pune... | Mihai Eminescu |
| 0002 | Te sărut și eu și Luna... | George Toparceanu |
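A minimal sketch for reading the data with Python's `csv` module follows. It assumes a UTF-8, comma-separated file with a header row named exactly as in the table. The helper name `load_rows` is hypothetical. Ids are kept as strings so that leading zeros such as `0001` are preserved.

```python
import csv
import io


def load_rows(source, label_column="Autor"):
    """Read Id, Versuri and (if present) Autor from a CSV file object."""
    ids, texts, labels = [], [], []
    for row in csv.DictReader(source):
        ids.append(row["Id"])  # kept as a string, e.g. "0001"
        texts.append(row["Versuri"])
        if label_column in row:  # the test file has no Autor column
            labels.append(row[label_column])
    return ids, texts, labels


# In-memory sample shaped like the example table above.
SAMPLE = (
    "Id,Versuri,Autor\n"
    '0001,"Sus, pe dealuri, Toamna pune...",Mihai Eminescu\n'
    '0002,"Te sărut și eu și Luna...",George Toparceanu\n'
)
ids, texts, labels = load_rows(io.StringIO(SAMPLE))
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `load_rows`. The file path is an assumption. Passing `newline=""` is what the `csv` documentation recommends so that line breaks inside quoted Versuri fields are read correctly.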
The test dataset has the same structure but does not contain the Autor column:
- Id – unique identifier of the group of verses
- Versuri – content of the 4 verses

Example:
| Id | Versuri |
|---|---|
| 120001 | Freamătul pădurii se așterne ușor... |
| 120002 | În zori de zi, zorii răsar peste sat... |
The submission file must be a CSV with the following columns:
- Id – identifier of the group of verses
- Autor – predicted author

Example:
| Id | Autor |
|---|---|
| 120001 | Mihai Eminescu |
| 120002 | George Toparceanu |
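One way to produce a file in this format is with `csv.writer`. The following is a minimal sketch in which the helper name `write_submission` is hypothetical. The `"\n"` line terminator is a choice made here; the `csv` default `"\r\n"` would also yield a valid CSV.

```python
import csv
import io


def write_submission(ids, predictions, out):
    """Write the Id,Autor header and one row per prediction to a text file object."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Autor"])
    writer.writerows(zip(ids, predictions))


buffer = io.StringIO()
write_submission(["120001", "120002"], ["Mihai Eminescu", "George Toparceanu"], buffer)
print(buffer.getvalue(), end="")
# Id,Autor
# 120001,Mihai Eminescu
# 120002,George Toparceanu
```

To write to disk, pass a file opened with `open(path, "w", encoding="utf-8", newline="")` instead of the in-memory buffer. The output file name is not given in the statement.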
Predictions are compared with the true labels, and accuracy is computed as:
accuracy = (number_of_correct_predictions / total_number_of_predictions)
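The formula can be checked locally on a held-out part of the training data. Below is a minimal sketch that matches predictions to labels by Id. The function name is hypothetical, and treating an Id missing from the predictions as incorrect is an assumption; the statement does not say how missing rows are scored.

```python
def accuracy(truth, predicted):
    """Fraction of Ids whose predicted author equals the true author.

    Both arguments map Id -> author. Assumption: an Id absent from
    `predicted` counts as an incorrect prediction.
    """
    if not truth:
        raise ValueError("truth must not be empty")
    correct = sum(1 for row_id, author in truth.items() if predicted.get(row_id) == author)
    return correct / len(truth)


# Made-up labels for illustration only.
truth = {"120001": "Mihai Eminescu", "120002": "George Toparceanu"}
predicted = {"120001": "Mihai Eminescu", "120002": "Mihai Eminescu"}
print(accuracy(truth, predicted))  # 0.5
```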
The final score is computed from the obtained accuracy according to the following rules: