Author: Mihai Nan
Every year, thousands of medical graduates prepare for the most difficult moment of their careers: the Residency Exam. For months, future doctors memorize, review, and solve hundreds of multiple-choice questions.
But this year, the Central Committee decided to introduce a major innovation: an automated platform that checks and evaluates answers using machine learning.
Unfortunately, the system prototype started producing errors, and the committee needs your help to fix it.
You have been given two files:
Your goal is to rebuild the automatic correction mechanism.
Each row represents a multiple-choice exam question:
Build a machine-learning model capable of predicting, for each question in test.csv, which of the four options (0–3) is the correct answer.
Your model will be evaluated using accuracy:
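Accuracy is the fraction of questions for which the predicted option equals the correct one. A minimal sketch of that computation follows; the function name `accuracy` and the toy lists are illustrative assumptions, not part of the official scorer.

```python
def accuracy(predicted, correct):
    """Fraction of questions whose predicted option equals the correct one."""
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct must have the same length")
    if not correct:
        raise ValueError("cannot score an empty set of questions")
    # Count positions where the predicted option matches the correct option.
    matches = sum(p == c for p, c in zip(predicted, correct))
    return matches / len(correct)


# Toy example (made-up values): 3 of the 4 predictions match.
example_score = accuracy([0, 2, 1, 3], [0, 2, 1, 1])
print(example_score)  # 0.75
```

With four options per question, uniform random guessing is expected to score about 0.25, which gives a useful reference point when reading validation results.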
Any method is allowed: classic ML algorithms, embeddings, language models, medical BERT, etc.
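Before investing in embeddings or a language model, it can help to establish a floor that any real model should beat. The sketch below is a majority-option baseline, not a recommended solution: it assumes the training data provides the correct option (0–3) for each question, and the names `fit_majority_baseline`, `predict_majority`, and `toy_train_answers` are hypothetical.

```python
from collections import Counter


def fit_majority_baseline(train_answers):
    """Return the option index that occurs most often among training answers."""
    counts = Counter(train_answers)
    # most_common(1) returns a one-element list of (option, count).
    return counts.most_common(1)[0][0]


def predict_majority(majority_option, n_questions):
    """Predict the same option for every question."""
    return [majority_option] * n_questions


# Toy labels (made up); real labels would come from the training file.
toy_train_answers = [2, 0, 2, 3, 1, 2]
majority = fit_majority_baseline(toy_train_answers)
toy_predictions = predict_majority(majority, 3)
print(majority, toy_predictions)  # 2 [2, 2, 2]
```

If a trained model does not clearly exceed this baseline on held-out training questions, the problem is more likely in the pipeline (for example, misaligned labels) than in the choice of model.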
The submission.csv file must contain one row for each question in the test set.
The first line should be:
DatapointID, PredictedAnswer
where:
DatapointID is the SampleID from the test set
PredictedAnswer is the predicted option (0–3)
Example (for SampleID 84f328d3-fca4-422d-8fb2-19d55eb31503):
84f328d3-fca4-422d-8fb2-19d55eb31503, 2
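The submission format can be produced with Python's standard `csv` module. This is a sketch under assumptions: the helper name `write_submission` is hypothetical, and it emits standard comma-separated rows without the space after the comma that appears in the statement; whether the grader requires that space is not specified.

```python
import csv
import io


def write_submission(rows, out):
    """Write (SampleID, predicted option) pairs in the submission format."""
    writer = csv.writer(out, lineterminator="\n")
    # Header names as given in the task statement.
    writer.writerow(["DatapointID", "PredictedAnswer"])
    for sample_id, option in rows:
        # The task defines exactly four options, indexed 0 to 3.
        if option not in (0, 1, 2, 3):
            raise ValueError(f"option must be 0-3, got {option!r}")
        writer.writerow([sample_id, option])


# Demonstration with an in-memory buffer and the ID from the statement's example.
buffer = io.StringIO()
write_submission([("84f328d3-fca4-422d-8fb2-19d55eb31503", 2)], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write the real file, pass a file handle instead of the buffer, for example `open("submission.csv", "w", newline="")`, and supply one pair per question in the test set.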