Song popularity prediction
Author: Mihai Nan
🎧 Song Popularity Prediction 🎹
The digital music industry is evolving rapidly, and streaming platforms compete to estimate how well a song will perform among listeners. Popularity is more than just a number—it influences recommendations, automated playlists, and even artists' contracts.
To train a new prediction module, you have received a dataset extracted from Spotify containing audio features and metadata of songs.
Your goal is to build a model that predicts a song’s popularity based on its numerical characteristics and several auxiliary features.
📁 Dataset
You have two files:
- train.csv: songs with all columns, including popularity
- test.csv: songs without the popularity column (these must be predicted)
Each song has a unique identifier and a set of features:
- track_id: unique song ID
- artists
- album_name
- track_name
- popularity (train only)
- duration_ms
- explicit
- danceability
- energy
- key
- loudness
- mode
- speechiness
- acousticness
- instrumentalness
- liveness
- valence
- tempo
- time_signature
- track_genre
The main goal is to predict the popularity column for the test set.
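As a starting point, the columns listed above can be read and turned into numeric feature vectors. The sketch below uses only the Python standard library. The helper names (`load_rows`, `to_feature_vector`, `NUMERIC_COLUMNS`) are illustrative, and the encoding of `explicit` as a true/false-like string is an assumption that should be checked against the actual files.

```python
import csv

# Numeric columns taken from the feature list in the task statement.
NUMERIC_COLUMNS = [
    "duration_ms", "danceability", "energy", "key", "loudness", "mode",
    "speechiness", "acousticness", "instrumentalness", "liveness",
    "valence", "tempo", "time_signature",
]


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def to_feature_vector(row):
    """Convert one row into a list of floats: numeric columns plus the explicit flag."""
    values = [float(row[name]) for name in NUMERIC_COLUMNS]
    # Assumption: `explicit` is stored as a true/false-like string such as "True" or "1".
    is_explicit = str(row["explicit"]).strip().lower() in {"true", "1"}
    values.append(1.0 if is_explicit else 0.0)
    return values
```

Text columns such as artists, album_name, track_name, and track_genre are left out here; whether and how to encode them is a modeling decision.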
🧠 Task (100 points)
Build a machine learning model that predicts the popularity of each song in test.csv.
Your predictions must be saved in a submission.csv file with the format:
track_id,popularity
101,42.7
102,55.1
103,38.0
where:
- track_id: the song ID from the test set
- popularity: a real-valued (float) predicted popularity score
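To make the submission format concrete, here is a minimal baseline sketch, not a recommended final model. It predicts the median training popularity of each song's track_genre and falls back to the global median for unseen genres. The median is used because it is the constant that minimizes absolute error. The function names are hypothetical, and rows are assumed to be dicts keyed by column name (for example, as produced by `csv.DictReader`).

```python
import csv
from collections import defaultdict
from statistics import median


def fit_genre_medians(train_rows):
    """Return (per-genre median popularity, global median popularity)."""
    by_genre = defaultdict(list)
    for row in train_rows:
        by_genre[row["track_genre"]].append(float(row["popularity"]))
    all_values = [v for values in by_genre.values() for v in values]
    genre_medians = {genre: median(values) for genre, values in by_genre.items()}
    return genre_medians, median(all_values)


def predict(test_rows, genre_medians, global_median):
    """Predict the genre median, falling back to the global median."""
    return [
        (row["track_id"], float(genre_medians.get(row["track_genre"], global_median)))
        for row in test_rows
    ]


def write_submission(predictions, path="submission.csv"):
    """Write (track_id, popularity) pairs in the required two-column format."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["track_id", "popularity"])
        for track_id, popularity in predictions:
            writer.writerow([track_id, popularity])
```

A typical call sequence would be to read train.csv and test.csv into lists of dicts, call `fit_genre_medians` on the training rows, pass the result to `predict`, and hand the predictions to `write_submission`. A model that also uses the audio features would replace the first two steps.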
📊 Evaluation
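Evaluation will be performed using MAE (Mean Absolute Error):
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $n$ is the number of songs in the test set, $y_i$ is the true popularity of song $i$, and $\hat{y}_i$ is its predicted popularity.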
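To estimate this metric locally, for instance on a held-out portion of train.csv, MAE can be computed with a few lines of plain Python. The function name below is illustrative, and the official evaluator may differ in details such as input validation.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one value is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, true values `[40, 50, 60]` and predictions `[42, 47, 60]` give absolute errors of 2, 3, and 0, so the MAE is 5/3.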
The final score is based on the obtained MAE using the following rules:
- MAE ≤ 5 → 100 points
- MAE ≥ 20 → 0 points
- Intermediate values receive proportional scores between 0 and 100.
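The scoring rules above can be sketched as a small function. The statement only says that intermediate values are "proportional", so the linear interpolation between the two thresholds used here is an assumption, and `score_from_mae` is a hypothetical name rather than the official grader.

```python
def score_from_mae(mae, full_marks_mae=5.0, zero_marks_mae=20.0):
    """Map an MAE value to a score in [0, 100] using the stated thresholds."""
    if mae <= full_marks_mae:
        return 100.0
    if mae >= zero_marks_mae:
        return 0.0
    # Assumption: scores fall linearly from 100 to 0 between the thresholds.
    return 100.0 * (zero_marks_mae - mae) / (zero_marks_mae - full_marks_mae)
```

Under this assumption, an MAE of 12.5 (halfway between 5 and 20) would receive 50 points.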