Residency Exam
Author: Mihai Nan
🩺 Residency Exam 🤖📘
Every year, thousands of medical graduates prepare for the most difficult moment of their careers: the Residency Exam. For months, future doctors memorize, review, and solve hundreds of multiple-choice questions.
But this year, the Central Committee decided to introduce a major innovation: an automated platform that checks and evaluates answers using machine learning.
Unfortunately, the system prototype started producing errors, and the committee needs your help to fix it.
You have been given two files:
- train.csv — official questions with the correct answer
- test.csv — new questions for which you must predict the correct option (some of these will be selected for the actual residency exam 😅)
Your goal is to rebuild the automatic correction mechanism.
📊 Dataset
Each row represents a multiple-choice exam question:
- SampleID – unique identifier of the question
- Question – the question text
- Option0, Option1, Option2, Option3 – the four answer choices
- Answer – index of the correct option (0–3); present only in train.csv
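As a minimal sketch of loading this layout, the snippet below reads rows with Python's csv module and checks for the columns listed above. It assumes the CSV header uses exactly these column names. The helper name read_questions and the one-row sample are illustrative and not taken from the dataset.

```python
import csv
import io

# Column names as listed in the statement; assumed to match the CSV header exactly.
QUESTION_COLUMNS = ["SampleID", "Question", "Option0", "Option1", "Option2", "Option3"]


def read_questions(handle, has_answer):
    """Read question rows from an open CSV file (hypothetical helper)."""
    reader = csv.DictReader(handle)
    required = QUESTION_COLUMNS + (["Answer"] if has_answer else [])
    missing = [name for name in required if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    for row in reader:
        item = {
            "id": row["SampleID"],
            "question": row["Question"],
            "options": [row[f"Option{i}"] for i in range(4)],
        }
        if has_answer:
            answer = int(row["Answer"])
            if answer not in range(4):
                raise ValueError(f"answer out of range for {row['SampleID']}: {answer}")
            item["answer"] = answer
        rows.append(item)
    return rows


# Invented one-row sample, only to show the expected shape.
sample_csv = io.StringIO(
    "SampleID,Question,Option0,Option1,Option2,Option3,Answer\n"
    "q-001,Which vitamin deficiency causes scurvy?,Vitamin A,Vitamin C,Vitamin D,Vitamin K,1\n"
)
sample_rows = read_questions(sample_csv, has_answer=True)
```

With the real files, the same helper could be called on open("train.csv", newline="", encoding="utf-8") with has_answer=True, and on test.csv with has_answer=False. The UTF-8 encoding is an assumption.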
📝 Task (100 points)
Build a machine-learning model capable of predicting, for each question in test.csv, which of the four options (0–3) is the correct answer.
Your model will be evaluated using accuracy:
- Accuracy ≥ 70% → 100 points
- Accuracy ≤ 25% → 0 points
- Intermediate values are scored proportionally.
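To estimate a score locally, the rule above can be sketched as follows. The statement does not spell out the formula, so this assumes "proportionally" means linear interpolation between the 25% and 70% thresholds. The function names are illustrative.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted option equals the correct one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def score_points(acc, low=0.25, high=0.70):
    """Map accuracy in [0, 1] to points in [0, 100].

    Assumption: scores between the thresholds are linearly interpolated.
    """
    if acc <= low:
        return 0.0
    if acc >= high:
        return 100.0
    return 100.0 * (acc - low) / (high - low)
```

Under this assumption, an accuracy of 47.5% (halfway between the thresholds) would earn about 50 points.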
Any method is allowed: classic ML algorithms, embeddings, language models, medical BERT, etc.
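Whatever model is chosen, one common way to frame the problem is to score each (question, option) pair and pick the highest-scoring option. The sketch below shows that framing with a deliberately weak stand-in scorer based on word overlap. It is not a recommended solution and is unlikely to reach the 70% threshold on its own; the point is that overlap_score can be swapped for an embedding similarity or a language-model score. All names here are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question, option):
    """Jaccard overlap between question and option tokens (placeholder scorer)."""
    q_tokens, o_tokens = tokenize(question), tokenize(option)
    if not q_tokens or not o_tokens:
        return 0.0
    return len(q_tokens & o_tokens) / len(q_tokens | o_tokens)


def predict_option(question, options, score_fn=overlap_score):
    """Return the index of the option with the highest score.

    Ties are resolved in favor of the lowest index.
    """
    scores = [score_fn(question, option) for option in options]
    return max(range(len(options)), key=scores.__getitem__)
```

Because score_fn is a parameter, the same predict_option loop works unchanged when the scorer is replaced by a stronger model.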
📄 Submission Format
The submission.csv file must contain one row for each question in the test set.
The first line should be:
DatapointID, PredictedAnswer
where:
- DatapointID – the SampleID from the test set
- PredictedAnswer – a number between 0 and 3 (the predicted correct option)
Example (SampleID = 84f328d3-fca4-422d-8fb2-19d55eb31503):
84f328d3-fca4-422d-8fb2-19d55eb31503, 2
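A minimal sketch of producing a file in this format with Python's csv module is shown below. The statement's header and example show a space after the comma; this sketch writes plain comma-separated values without the space, which is an assumption about what the grader accepts. The helper name write_submission is illustrative.

```python
import csv
import io


def write_submission(predictions, handle):
    """Write (SampleID, predicted option) pairs as submission rows.

    Assumption: plain comma-separated output with no space after the comma.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["DatapointID", "PredictedAnswer"])
    for sample_id, answer in predictions:
        if answer not in (0, 1, 2, 3):
            raise ValueError(f"prediction for {sample_id} must be 0-3, got {answer!r}")
        writer.writerow([sample_id, answer])


# In-memory demo using the example ID from the statement.
buffer = io.StringIO()
write_submission([("84f328d3-fca4-422d-8fb2-19d55eb31503", 2)], buffer)
submission_text = buffer.getvalue()
```

To write the real file, the same helper could be called on open("submission.csv", "w", newline="", encoding="utf-8").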