Automated Loan Evaluation System
Author: Mihai Nan
💼 Automated Loan Evaluation System 💰
Modern financial institutions process thousands of loan applications daily.
To quickly and accurately decide who is eligible, banks use automated systems
based on risk models.
You are tasked with building such a system. You have access to a realistic dataset
of loan applications along with their approval history.
Your objective is to develop a model that can evaluate new applications submitted
by clients.
You have been given two files:
- train.csv - historical applications with the final decision (loan_status)
- test.csv - new applications, for which we do not have a decision
Main goal: for each application, predict the probability that it is approved
(a value between 0 and 1, where 0 means definitely not approved and 1 means definitely approved).
📊 Dataset
Each record represents a loan application with demographic, financial, behavioral,
and loan-type information.
Key attributes:
- customer_id - unique identifier
- age, occupation_status, years_employed
- annual_income, credit_score, credit_history_years
- savings_assets, current_debt
- defaults_on_file, delinquencies_last_2yrs, derogatory_marks
- product_type, loan_intent, loan_amount, interest_rate
- debt_to_income_ratio, loan_to_income_ratio, payment_to_income_ratio
- loan_status - only in train.csv, label to predict
The final goal is to predict loan_status for rows in test.csv.
📝 Tasks
The first three subtasks check understanding of the dataset structure.
The last subtask evaluates the classification model.
Subtask 1 (10 points)
Classify each applicant in the test set by age:
- Young if age < 30
- Adult if 30 ≤ age < 60
- Senior if age ≥ 60
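The age thresholds above can be sketched as a small function. The name classify_age is illustrative and not part of the task statement; applying it to the age column of test.csv is left to the reader.

```python
def classify_age(age: float) -> str:
    """Map an applicant's age to the Subtask 1 category."""
    if age < 30:
        return "Young"
    if age < 60:
        return "Adult"
    return "Senior"


# Boundary values: 30 is already Adult, 60 is already Senior.
print([classify_age(a) for a in (29, 30, 59, 60)])
# ['Young', 'Adult', 'Adult', 'Senior']
```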
Subtask 2 (15 points)
Determine the risk level based on debt_to_income_ratio:
- LowRisk if DTI < 20
- MediumRisk if 20 ≤ DTI < 40
- HighRisk if DTI ≥ 40
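A minimal sketch of the risk bucketing follows. It assumes that debt_to_income_ratio is stored on the same scale as the thresholds (for example 35 rather than 0.35); the statement does not say so explicitly, so check the data before relying on it. The name classify_risk is illustrative.

```python
def classify_risk(dti: float) -> str:
    """Map debt_to_income_ratio to the Subtask 2 risk level.

    ASSUMPTION: dti uses the same scale as the thresholds 20 and 40.
    """
    if dti < 20:
        return "LowRisk"
    if dti < 40:
        return "MediumRisk"
    return "HighRisk"


print([classify_risk(d) for d in (19.9, 20, 39.9, 40)])
# ['LowRisk', 'MediumRisk', 'MediumRisk', 'HighRisk']
```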
Subtask 3 (15 points)
For each row in test, calculate:
total_obligations = current_debt + derogatory_marks + delinquencies_last_2yrs
Return an integer value.
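The formula can be sketched as below. The statement does not say how a fractional sum should be converted to an integer; this sketch truncates with int(), which is an assumption, and round() may be the intended behavior instead. The function name is illustrative.

```python
def total_obligations(current_debt, derogatory_marks, delinquencies_last_2yrs) -> int:
    """Sum the three columns named in Subtask 3 and return an integer."""
    total = current_debt + derogatory_marks + delinquencies_last_2yrs
    # ASSUMPTION: truncate toward zero; the statement only says "integer".
    return int(total)


print(total_obligations(10, 1, 1))
# 12
```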
Subtask 4 (60 points)
Build a classification model that predicts loan_status (a probability p in [0,1]) for each test row.
Evaluation is performed using AUC (Area Under the ROC Curve).
🧮 Evaluation
- AUC ≥ 0.95 → 60 points
- AUC ≤ 0.80 → 0 points
- Intermediate range: proportional scoring
Subtasks 1–3 are evaluated exactly (via comparison).
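To make the Subtask 4 scale concrete, here is a sketch that reads "proportional scoring" as linear interpolation between the two thresholds. That reading is an assumption; the statement does not give the exact formula, and subtask4_points is a hypothetical name.

```python
def subtask4_points(auc: float, low: float = 0.80, high: float = 0.95,
                    max_points: float = 60.0) -> float:
    """Points for Subtask 4, ASSUMING linear scaling between low and high."""
    if auc >= high:
        return max_points
    if auc <= low:
        return 0.0
    return max_points * (auc - low) / (high - low)


# Under this assumption, an AUC halfway between the thresholds
# earns about half of the points.
print(round(subtask4_points(0.875), 2))
```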
📄 Submission Format
The submission.csv file must contain 4 rows for each test row,
corresponding to the 4 subtasks.
Structure:
subtaskID datapointID answer
Where:
- subtaskID - 1, 2, 3, or 4
- datapointID - the customer_id
- answer - possible values depending on the subtask:
  - Subtask 1: Young/Adult/Senior
  - Subtask 2: LowRisk/MediumRisk/HighRisk
  - Subtask 3: integer
  - Subtask 4: probability that loan_status = 1 (real number 0–1)
Example for customer_id = 9071:
subtaskID datapointID answer
1 9071 Adult
2 9071 MediumRisk
3 9071 12
4 9071 0.742
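The four rows per customer could be produced as in the sketch below. It assumes that submission.csv is comma-separated with the header shown above; the statement does not spell out the delimiter. The helper names submission_rows and write_submission are illustrative.

```python
import csv
import io


def submission_rows(customer_id, age_group, risk_level, obligations, probability):
    """Return the four (subtaskID, datapointID, answer) rows for one customer."""
    return [
        (1, customer_id, age_group),
        (2, customer_id, risk_level),
        (3, customer_id, obligations),
        (4, customer_id, probability),
    ]


def write_submission(rows, handle):
    """Write a header plus all rows; ASSUMPTION: comma-separated values."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["subtaskID", "datapointID", "answer"])
    writer.writerows(rows)


# Reproduce the example for customer_id = 9071 in memory.
buffer = io.StringIO()
write_submission(submission_rows(9071, "Adult", "MediumRisk", 12, 0.742), buffer)
print(buffer.getvalue())
```

To create the actual file, pass a handle opened with open("submission.csv", "w", newline="") instead of the in-memory buffer.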
Good luck developing the automated loan evaluation system!
📊 Evaluation Metric: ROC AUC 📈
For Subtask 4, evaluation uses ROC AUC (Area Under the ROC Curve).
This is a single measure summarizing classifier performance across all possible decision thresholds.
How to compute ROC AUC
- Plot the ROC curve, which shows, for every decision threshold:
  - TPR (True Positive Rate) - proportion of approved applications correctly identified as approved
  - FPR (False Positive Rate) - proportion of rejected applications incorrectly classified as approved
- Compute the area under the curve (AUC) with the trapezoidal rule:
  - Split the area under the curve into trapezoids bounded by vertical lines at consecutive FPR values, with the corresponding TPR values as their heights
  - Sum the areas of the trapezoids to obtain the final AUC
- Score interpretation:
  - ROC AUC = 1 🏆 → perfect classifier: every approved application receives a higher score than every rejected one
  - ROC AUC = 0.5 🎲 → random classifier, no predictive power
  - 0.5 < ROC AUC < 1 📈 → partial separation of the classes; the closer to 1, the better
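The trapezoidal computation described above can be sketched in plain Python for local validation. This is an illustrative implementation, not the grader's code; the function name roc_auc is an assumption, and tied scores are handled by treating each distinct score as one threshold.

```python
def roc_auc(labels, scores):
    """ROC AUC via the trapezoidal rule; labels are 0/1, scores are probabilities."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    # Visit samples from the highest score to the lowest (lowering the threshold).
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = false_pos = 0
    prev_fpr = prev_tpr = 0.0
    area = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample sharing this score so ties form one ROC point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        fpr = false_pos / negatives
        tpr = true_pos / positives
        # Trapezoid between the previous ROC point and the current one.
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# 0.75
```

Running this on a held-out part of train.csv gives an estimate of the Subtask 4 score before submitting.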