Royal Diamond Store
Author: Mihai Nan
In the heart of the Kingdom lies the Royal Diamond Store, where precious stones
of all shapes and sizes are carefully kept.
The store's guardian wants to discover how valuable the diamonds are,
based on their physical and aesthetic characteristics, but the price scrolls
have been lost over time.
To help, the royal council created a dataset:
- train.csv – diamonds already evaluated, with all features and their price
- test.csv – new diamonds, with features filled in, but no price
Your task is to uncover the secrets of diamond value using data analysis and predictive models.
📊 Dataset
Each row in train.csv and test.csv represents a diamond described by the following features:
- SampleID – unique diamond identifier
- carat – weight of the diamond
- cut – cut quality (Fair, Good, Very Good, Premium, Ideal), in increasing order of quality
- color – diamond color (D…J), D being the best
- clarity – clarity grade (from best to worst: FL, IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1, I2, I3)
- depth – depth in percentage (height from culet to table / average girdle diameter)
- table – table width as a percentage of average diameter
- x, y, z – dimensions in mm (length, width, height)
- price – only in train.csv, diamond value
📝 Tasks
Subtask 1 (10 points)
Classify diamonds in test.csv based on weight (carat):
- Light if carat < 0.5
- Medium if 0.5 ≤ carat < 1.5
- Heavy if carat ≥ 1.5
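The thresholds above could be applied as in the following minimal sketch. The helper name `weight_class` and the toy values are illustrative assumptions; in practice the `carat` column would be read from test.csv.

```python
import pandas as pd

def weight_class(carat: float) -> str:
    """Map a carat value to the Subtask 1 label."""
    if carat < 0.5:
        return "Light"
    if carat < 1.5:
        return "Medium"
    return "Heavy"

# Toy values chosen to sit on the boundaries (illustration only).
toy = pd.DataFrame({"SampleID": [1, 2, 3, 4], "carat": [0.49, 0.5, 1.49, 1.5]})
toy["weight_class"] = toy["carat"].map(weight_class)
print(toy)
```

Note that both boundaries are inclusive on the upper class: a 0.5 carat diamond is Medium and a 1.5 carat diamond is Heavy.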
Subtask 2 (15 points)
Calculate the proportion of depth to table for each diamond in test.csv:
proportion = depth / table
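A minimal sketch of this computation with pandas follows. The function name `add_proportion` and the toy rows are assumptions made for illustration; the column names `depth` and `table` come from the dataset description.

```python
import pandas as pd

def add_proportion(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with proportion = depth / table added."""
    out = df.copy()
    out["proportion"] = out["depth"] / out["table"]
    return out

# Toy rows (made-up numbers, illustration only).
toy = pd.DataFrame({"depth": [61.5, 62.4], "table": [55.0, 58.0]})
result = add_proportion(toy)
print(result)
```

Since answers are compared to 2 decimal places, keeping full floating-point precision in the output should be sufficient.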
Subtask 3 (15 points)
Determine the approximate volume of each diamond using:
volume = x * y * z
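The volume formula could be applied column-wise as sketched below. The function name `add_volume` and the toy dimensions are illustrative assumptions; `x`, `y` and `z` are the dimension columns described in the dataset section.

```python
import pandas as pd

def add_volume(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with volume = x * y * z added."""
    out = df.copy()
    out["volume"] = out["x"] * out["y"] * out["z"]
    return out

# Toy dimensions in mm (made-up numbers, illustration only).
toy = pd.DataFrame({"x": [4.0, 5.0], "y": [4.0, 5.0], "z": [2.5, 3.0]})
result = add_volume(toy)
print(result)
```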
Subtask 4 (60 points)
Build a method capable of estimating diamond value (price)
for each diamond in test.csv.
Evaluation is performed using MAE (Mean Absolute Error):
- MAE: the lower the error, the higher the score
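One possible way to approach this subtask is to hold out part of train.csv and measure MAE locally before submitting. The sketch below assumes scikit-learn is available and uses a random forest purely as an example; the statement does not prescribe any model. The function name `fit_and_validate` and the synthetic data are hypothetical stand-ins for the real training file.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

def fit_and_validate(train: pd.DataFrame, features: list, seed: int = 0):
    """Fit a regressor on 80% of the rows and report MAE on the other 20%."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        train[features], train["price"], test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_tr, y_tr)
    mae = mean_absolute_error(y_val, model.predict(X_val))
    return model, mae

# Synthetic stand-in for train.csv (made-up numbers, illustration only).
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 2.5, size=300)
toy_train = pd.DataFrame({
    "carat": carat,
    "depth": rng.normal(61.7, 1.4, size=300),
    "table": rng.normal(57.5, 2.2, size=300),
    "price": 4000 * carat**2 + rng.normal(0, 100, size=300),
})
features = ["carat", "depth", "table"]
model, val_mae = fit_and_validate(toy_train, features)
print(f"validation MAE on toy data: {val_mae:.1f}")
```

The MAE printed here says nothing about the real dataset; it only shows the validation loop. With the real files, the fitted model would be applied to the same feature columns of test.csv.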
🧮 Evaluation
Score for the last task:
- MAE ≤ 260 → 60 points
- MAE ≥ 300 → 0 points
- Intermediate values → proportional scoring
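The scoring rule above can be sketched as a small function. It assumes that "proportional scoring" means linear interpolation between the two thresholds, which the statement does not spell out; the function name `subtask4_points` is hypothetical.

```python
def subtask4_points(mae: float, full: float = 260.0, zero: float = 300.0,
                    max_points: float = 60.0) -> float:
    """Estimate Subtask 4 points, assuming linear scoring between thresholds."""
    if mae <= full:
        return max_points
    if mae >= zero:
        return 0.0
    # Fraction of the way from the zero-point threshold to the full-point one.
    return max_points * (zero - mae) / (zero - full)

for mae in (250, 270, 280, 300):
    print(mae, subtask4_points(mae))
```

Under this assumption, an MAE of 280 sits halfway between the thresholds and would earn 30 points.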
Subtask 1 answers are evaluated exactly.
For subtasks 2 and 3, answers are evaluated up to 2 decimal places.
📌 Notes
- Categorical variables (cut, color, clarity) can be converted to numeric using any method (e.g., LabelEncoder).
- Any analysis and prediction technique is allowed.
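As one alternative to LabelEncoder, which assigns codes by sorted label order, the quality orderings listed in the dataset description could be encoded directly. The sketch below is an illustration, not a required approach; the function name `encode_ordinal` is hypothetical, and the color list assumes that D…J denotes the consecutive letters D through J.

```python
import pandas as pd

# Orders taken from the dataset description, written worst -> best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = list("JIHGFED")  # assumption: D…J means consecutive letters
CLARITY_ORDER = ["I3", "I2", "I1", "SI2", "SI1", "VS2", "VS1",
                 "VVS2", "VVS1", "IF", "FL"]

def encode_ordinal(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with cut, color and clarity replaced by ranks."""
    out = df.copy()
    orders = {"cut": CUT_ORDER, "color": COLOR_ORDER, "clarity": CLARITY_ORDER}
    for col, order in orders.items():
        rank = {label: i for i, label in enumerate(order)}
        out[col] = out[col].map(rank)  # unknown labels become NaN
    return out

# Toy rows (illustration only): the worst and best grade of each feature.
toy = pd.DataFrame({
    "cut": ["Fair", "Ideal"],
    "color": ["J", "D"],
    "clarity": ["I3", "FL"],
})
encoded = encode_ordinal(toy)
print(encoded)
```

With this encoding a higher number always means a better grade, which can be easier for some models to exploit than arbitrary codes.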
📄 Submission Format
The submission.csv file must contain one line per test row
and per subtask:
subtaskID datapointID answer
- subtaskID – 1, 2, 3, or 4
- datapointID – SampleID from test.csv
- answer – calculated or predicted result
Example for a diamond with SampleID = 1023:
subtaskID datapointID answer
1 1023 Medium
2 1023 0.619
3 1023 0.34
4 1023 4578
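The long format above, with one row per diamond and per subtask, could be assembled as in this sketch. It assumes the file is comma-separated with the three column names as a header row, which the statement does not state explicitly; the function name `build_submission` and all answer values are hypothetical.

```python
import pandas as pd

def build_submission(sample_ids, answers_by_subtask) -> pd.DataFrame:
    """Stack per-subtask answers into one long table."""
    parts = []
    for subtask_id, answers in sorted(answers_by_subtask.items()):
        parts.append(pd.DataFrame({
            "subtaskID": subtask_id,
            "datapointID": list(sample_ids),
            "answer": list(answers),
        }))
    return pd.concat(parts, ignore_index=True)

# Toy answers for two diamonds (made-up values, illustration only).
sample_ids = [1023, 1024]
submission = build_submission(sample_ids, {
    1: ["Medium", "Light"],
    2: [1.08, 1.05],
    3: [120.5, 48.2],
    4: [4578, 910],
})
csv_text = submission.to_csv(index=False)  # pass a path to write a file instead
print(csv_text)
```

With the real data, `submission.to_csv("submission.csv", index=False)` would write the file to disk.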