Automated Loan Evaluation System

Author: Mihai Nan

Easy

Problem Description

Modern financial institutions process thousands of loan applications daily.
To quickly and accurately decide who is eligible, banks use automated systems
based on risk models.

You are tasked with building such a system. You have access to a realistic dataset
of loan applications along with their approval history.
Your objective is to develop a model that can evaluate new applications submitted
by clients.

You have been given two files:

train.csv - historical applications with the final decision (loan_status)
test.csv - new applications, for which we do not have a decision

Main goal: for each application, predict the probability that it is approved
(a value between 0 and 1, where 0 means definitely not approved and 1 means definitely approved).

📊 Dataset

Each record represents a loan application with demographic, financial, behavioral,
and loan-type information.

Key attributes:

customer_id - unique identifier
age, occupation_status, years_employed
annual_income, credit_score, credit_history_years
savings_assets, current_debt
defaults_on_file, delinquencies_last_2yrs, derogatory_marks
product_type, loan_intent, loan_amount, interest_rate
debt_to_income_ratio, loan_to_income_ratio, payment_to_income_ratio
loan_status - only in train.csv, label to predict

The final goal is to predict loan_status for rows in test.csv.

📝 Tasks

The first three subtasks check understanding of the dataset structure.
The last subtask evaluates the classification model.

Subtask 1 (10 points)

Classify each applicant in the test set by age:

Young if age < 30
Adult if 30 ≤ age < 60
Senior if age ≥ 60
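
The thresholds above can be sketched as a small helper. This is only an illustration: the function name `age_group` is hypothetical and the sample ages are made up to exercise the boundaries.

```python
def age_group(age: float) -> str:
    """Map an applicant's age to the Subtask 1 label."""
    if age < 30:
        return "Young"
    if age < 60:
        return "Adult"
    return "Senior"


# Made-up ages chosen to hit both boundaries (30 and 60).
ages = [29, 30, 59, 60]
labels = [age_group(a) for a in ages]
print(labels)
```

Checking the lower bound first keeps each later test to a single comparison, so the boundary values 30 and 60 fall into Adult and Senior respectively.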

Subtask 2 (15 points)

Determine the risk level of each applicant in the test set based on debt_to_income_ratio (DTI):

LowRisk if DTI < 20
MediumRisk if 20 ≤ DTI < 40
HighRisk if DTI ≥ 40
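
A minimal sketch of the same rule in code. The name `risk_level` is hypothetical, and the sketch assumes the debt_to_income_ratio column is expressed on the same scale as the thresholds 20 and 40; the statement does not say whether the column stores percentages or fractions, so that is worth checking in train.csv first.

```python
def risk_level(dti: float) -> str:
    """Map a debt_to_income_ratio value to the Subtask 2 label.

    Assumption: dti uses the same scale as the thresholds (20 and 40).
    """
    if dti < 20:
        return "LowRisk"
    if dti < 40:
        return "MediumRisk"
    return "HighRisk"


# Made-up values chosen to hit both boundaries.
sample_dti = [19.9, 20, 39.9, 40]
risk_labels = [risk_level(v) for v in sample_dti]
print(risk_labels)
```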

Subtask 3 (15 points)

For each row in test, calculate:

total_obligations = current_debt + derogatory_marks + delinquencies_last_2yrs

Return an integer value.
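
The formula above might be implemented as follows. The statement does not say how a fractional sum should become an integer; this sketch truncates with `int()`, which is an assumption, and rounding could be the intended behavior instead. The function name simply mirrors the formula.

```python
def total_obligations(current_debt: float,
                      derogatory_marks: int,
                      delinquencies_last_2yrs: int) -> int:
    """Sum the three columns from the Subtask 3 formula."""
    total = current_debt + derogatory_marks + delinquencies_last_2yrs
    # Assumption: truncate toward zero; the statement only says "integer".
    return int(total)


# Made-up inputs, not taken from the dataset.
example_total = total_obligations(10, 1, 1)
print(example_total)
```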

Subtask 4 (60 points)

Build a classification model that predicts, for each test row, the probability p in [0, 1] that loan_status = 1.

Evaluation is performed using AUC (Area Under the ROC Curve).

🧮 Evaluation

AUC ≥ 0.95 → 60 points
AUC ≤ 0.80 → 0 points
0.80 < AUC < 0.95 → proportional scoring between 0 and 60 points
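
One plausible reading of the proportional scoring is linear interpolation between the two thresholds. The sketch below assumes exactly that; the function name `subtask4_points` and its parameters are hypothetical, and the platform's actual formula may differ.

```python
def subtask4_points(auc: float,
                    low: float = 0.80,
                    high: float = 0.95,
                    max_points: float = 60.0) -> float:
    """Points for Subtask 4, assuming linear scaling between low and high."""
    if auc >= high:
        return max_points
    if auc <= low:
        return 0.0
    # Assumption: points grow linearly with AUC inside (low, high).
    return max_points * (auc - low) / (high - low)


halfway = subtask4_points(0.875)
print(round(halfway, 2))
```

Under this assumption, an AUC halfway between 0.80 and 0.95 would earn about half of the 60 points.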

Subtasks 1–3 are evaluated by exact match against the expected answers.

📄 Submission Format

The submission.csv file must contain four rows for each row in test.csv,
one per subtask.

Structure:

subtaskID datapointID answer

Where:

subtaskID - 1, 2, 3, or 4
datapointID - the customer_id
answer - possible values depending on the subtask:
- Subtask 1: Young / Adult / Senior
- Subtask 2: LowRisk / MediumRisk / HighRisk
- Subtask 3: integer
- Subtask 4: probability that loan_status = 1 (real number in [0, 1])

Example for `customer_id = 9071`:

subtaskID datapointID answer
1 9071 Adult
2 9071 MediumRisk
3 9071 12
4 9071 0.742
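
The long format shown above, with one row per subtask and customer, could be produced as in the sketch below. It assumes a comma-delimited file, since the file is named submission.csv but the example does not show the delimiter. The helper names `build_rows` and `write_submission` are hypothetical, and the answers are the ones from the example, not real predictions.

```python
import csv
import io


def build_rows(customer_id, age_label, risk_label, obligations, probability):
    """Return the four submission rows (one per subtask) for one customer."""
    return [
        (1, customer_id, age_label),
        (2, customer_id, risk_label),
        (3, customer_id, obligations),
        (4, customer_id, probability),
    ]


def write_submission(handle, per_customer_rows):
    """Write the header and all rows to an open text handle."""
    # Assumption: comma-separated values with a header line.
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["subtaskID", "datapointID", "answer"])
    for rows in per_customer_rows:
        writer.writerows(rows)


# An in-memory buffer keeps the example self-contained; for a real file use
# open("submission.csv", "w", newline="") instead.
buffer = io.StringIO()
write_submission(buffer, [build_rows(9071, "Adult", "MediumRisk", 12, 0.742)])
submission_text = buffer.getvalue()
print(submission_text)
```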

Good luck developing the automated loan evaluation system!

📊 Evaluation Metric: ROC AUC 📈

For Subtask 4, evaluation uses ROC AUC (Area Under the ROC Curve).
This is a single measure summarizing classifier performance across all possible decision thresholds.

How to compute ROC AUC

Plot the ROC curve, which shows TPR against FPR as the decision threshold varies:
- TPR (True Positive Rate) - proportion of approved applications correctly predicted as approved
- FPR (False Positive Rate) - proportion of rejected applications incorrectly predicted as approved

The area under the curve (AUC) is computed using the trapezoidal rule:
- Each pair of consecutive points on the curve defines a trapezoid whose width is the change in FPR and whose parallel sides are the two TPR values
- Sum the areas of the trapezoids to obtain the final AUC
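
The trapezoidal computation can be sketched in plain Python as follows. This is an illustrative implementation, not the platform's grader: the function name `roc_auc` is hypothetical and the labels and scores in the usage lines are made up.

```python
def roc_auc(y_true, scores):
    """ROC AUC via the trapezoidal rule; labels are 0/1, higher score = more likely 1."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")

    # Visit samples from highest to lowest score, i.e. lower the threshold step by step.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    area = 0.0
    i = 0
    while i < len(order):
        # Samples with the same score cross the threshold together.
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr, tpr = fp / neg, tp / pos
        # Trapezoid between the previous ROC point and the new one.
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area


# Made-up labels and scores, for illustration only.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)
```

Grouping tied scores into a single step matters: a model that outputs the same probability for every row then gets a single diagonal segment and an AUC of 0.5 rather than a value that depends on row order.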

Score interpretation:
- ROC AUC = 1 🏆 → perfect classifier, every approved application receives a higher score than every rejected one
- ROC AUC = 0.5 🎲 → random classifier, no predictive power
- 0.5 < ROC AUC < 1 📈 → the closer to 1, the better the classifier separates the classes
