Author: Mihai Nan
Modern financial institutions process thousands of loan applications daily.
To quickly and accurately decide who is eligible, banks use automated systems
based on risk models.
You are tasked with building such a system. You have access to a realistic dataset
of loan applications along with their approval history.
Your objective is to develop a model that can evaluate new applications submitted
by clients.
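You have been given two files:
- train.csv - past loan applications together with their outcome (includes loan_status)
- test.csv - new applications to evaluate (without loan_status)

Main goal: for each application, predict the probability that it is approved
(a value between 0 and 1, where 0 means definitely not approved and 1 means definitely approved).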
Each record represents a loan application with demographic, financial, behavioral,
and loan-type information.
Key attributes:
- customer_id - unique identifier
- age, occupation_status, years_employed
- annual_income, credit_score, credit_history_years
- savings_assets, current_debt
- defaults_on_file, delinquencies_last_2yrs, derogatory_marks
- product_type, loan_intent, loan_amount, interest_rate
- debt_to_income_ratio, loan_to_income_ratio, payment_to_income_ratio
- loan_status - only in train.csv; the label to predict

The final goal is to predict loan_status for the rows in test.csv.
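A minimal sketch for reading either file with Python's standard csv module. It assumes the files are comma-separated with a header row matching the attribute names above, and that occupation_status, product_type and loan_intent hold text (customer_id is also kept as text so it can be written back unchanged). The helper names parse_rows and load_rows are hypothetical.

```python
import csv

# Assumption: these columns hold text; every other column is parsed as a number.
TEXT_COLUMNS = {"customer_id", "occupation_status", "product_type", "loan_intent"}


def parse_rows(lines):
    """Parse CSV lines (header first) into dicts; numeric fields become floats, blanks become None."""
    rows = []
    for record in csv.DictReader(lines):
        row = {}
        for key, value in record.items():
            if key in TEXT_COLUMNS:
                row[key] = value
            else:
                row[key] = float(value) if value != "" else None
        rows.append(row)
    return rows


def load_rows(path):
    """Read one of the task files (e.g. train.csv or test.csv) from disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(f)
```

For train.csv, the label can then be split off with `labels = [int(r.pop("loan_status")) for r in train]`.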
The first three subtasks check understanding of the dataset structure.
The last subtask evaluates the classification model.
Subtask 1 - age group
Classify each applicant in the test set by age:
- Young if age < 30
- Adult if 30 ≤ age < 60
- Senior if age ≥ 60

Subtask 2 - risk level
Determine the risk level based on debt_to_income_ratio (DTI):
- LowRisk if DTI < 20
- MediumRisk if 20 ≤ DTI < 40
- HighRisk if DTI ≥ 40

Subtask 3 - total obligations
For each row in test, calculate:
total_obligations = current_debt + derogatory_marks + delinquencies_last_2yrs
Return an integer value.
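The three rule-based subtasks translate directly into small functions. This is a sketch under two assumptions: debt_to_income_ratio is stored on the same scale as the thresholds (e.g. 35 rather than 0.35), and "return an integer" means rounding to the nearest integer (use `int(total)` instead if truncation is expected). The function names are hypothetical.

```python
def age_group(age):
    """Subtask 1: Young if age < 30, Adult if 30 <= age < 60, Senior otherwise."""
    if age < 30:
        return "Young"
    if age < 60:
        return "Adult"
    return "Senior"


def risk_level(dti):
    """Subtask 2: LowRisk if DTI < 20, MediumRisk if 20 <= DTI < 40, HighRisk otherwise."""
    if dti < 20:
        return "LowRisk"
    if dti < 40:
        return "MediumRisk"
    return "HighRisk"


def total_obligations(row):
    """Subtask 3: current_debt + derogatory_marks + delinquencies_last_2yrs as an int.

    Assumption: the sum is rounded to the nearest integer.
    """
    total = row["current_debt"] + row["derogatory_marks"] + row["delinquencies_last_2yrs"]
    return int(round(total))
```

Because the intervals are half-open, each boundary value (30, 60, 20, 40) falls into exactly one category, matching the ≤ / < conditions in the statement.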
Subtask 4 - approval probability
Build a classification model that predicts, for each test row, the probability p in [0, 1] that loan_status = 1.
Evaluation is performed using AUC (Area Under the ROC Curve).
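The task leaves the choice of model open. As a dependency-free baseline (a choice made here, not a method prescribed by the task), the sketch below fits a logistic regression by batch gradient descent on standardized features. It assumes every row has already been converted to a list of floats with no missing values; categorical columns such as occupation_status, product_type and loan_intent would first need an encoding (e.g. one-hot), which is not shown. All names and the default hyperparameters are hypothetical.

```python
import math


def standardize(train_X, test_X):
    """Scale each feature to zero mean and unit variance using training statistics only."""
    n_features = len(train_X[0])
    means, stds = [], []
    for j in range(n_features):
        column = [row[j] for row in train_X]
        mean = sum(column) / len(column)
        var = sum((v - mean) ** 2 for v in column) / len(column)
        means.append(mean)
        stds.append(math.sqrt(var) or 1.0)  # constant column: avoid division by zero

    def scale(X):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in X]

    return scale(train_X), scale(test_X)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Minimize the mean log-loss by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Estimated probability that loan_status = 1 for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
```

Since AUC depends only on how the predictions rank the applicants, the raw probabilities can be submitted directly as the Subtask 4 answer. Gradient-boosted trees are a common, usually stronger, alternative for tabular data like this.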
Subtasks 1–3 are evaluated by exact comparison with the expected answers.
The submission.csv file must contain 4 rows for each test row,
corresponding to the 4 subtasks.
Structure:
subtaskID datapointID answer
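Where:
- subtaskID - the subtask number (1–4)
- datapointID - the customer_id of the test row
- answer - Young / Adult / Senior for subtask 1; LowRisk / MediumRisk / HighRisk for subtask 2; the integer total_obligations for subtask 3; the probability that loan_status = 1 (a real number between 0 and 1) for subtask 4

Example for customer_id = 9071:
subtaskID datapointID answer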
1 9071 Adult
2 9071 MediumRisk
3 9071 12
4 9071 0.742
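A minimal sketch for producing submission.csv. It assumes a comma delimiter and a header row named subtaskID,datapointID,answer; the example above may have lost its separators during extraction, so check the exact format against the platform's sample file. The function names are hypothetical, and the answers are expected to be computed beforehand.

```python
import csv


def submission_rows(customer_id, group, risk, obligations, probability):
    """Return the four (subtaskID, datapointID, answer) rows for one test applicant."""
    return [
        (1, customer_id, group),
        (2, customer_id, risk),
        (3, customer_id, obligations),
        (4, customer_id, probability),  # not rounded: rounding can create ties and lower AUC
    ]


def write_submission(path, rows):
    """Write all rows, preceded by a header line, as a comma-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["subtaskID", "datapointID", "answer"])
        writer.writerows(rows)
```

For the example applicant, `submission_rows("9071", "Adult", "MediumRisk", 12, 0.742)` yields the four rows shown above.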
Good luck developing the automated loan evaluation system!
For Subtask 4, evaluation uses ROC AUC (Area Under the ROC Curve).
This is a single measure summarizing classifier performance across all possible decision thresholds.
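ROC AUC equals the probability that a randomly chosen approved application receives a higher score than a randomly chosen rejected one, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it from ranks, which is handy for checking a model on a held-out part of train.csv; the function name roc_auc is hypothetical and labels are assumed to be 0/1 integers.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney) statistic; tied scores share an average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block pairs[i:j] of equal scores.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A score of 0.5 corresponds to a random ranking and 1.0 to perfect separation. Only the order of the predictions matters, so any monotone transformation of the probabilities leaves the AUC unchanged.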

