Song popularity prediction

Author: Mihai Nan

Difficulty: Medium

Problem Description

🎧 Song Popularity Prediction 🎹

The digital music industry is evolving rapidly, and streaming platforms compete to estimate how well a song will perform among listeners. Popularity is more than just a number: it influences recommendations, automated playlists, and even artists' contracts.

To train a new prediction module, you have received a dataset extracted from Spotify containing audio features and metadata of songs.

Your goal is to build a model that predicts a song’s popularity based on its numerical characteristics and several auxiliary features.

📁 Dataset

You have two files:

train.csv — songs with all columns including popularity
test.csv — songs without the popularity column (these must be predicted)

Each song has a unique identifier and a set of features:

track_id — unique song ID
artists
album_name
track_name
popularity (train only)
duration_ms
explicit
danceability
energy
key
loudness
mode
speechiness
acousticness
instrumentalness
liveness
valence
tempo
time_signature
track_genre

The main goal is to predict the popularity column for the test set.

🧠 Task (100 points)

Build a machine learning model that predicts the popularity of each song in test.csv.
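
As a starting point before training a real model, a simple baseline can predict the median popularity of each genre. The median is the constant that minimizes absolute error, so it is a natural reference for an MAE-scored task. The sketch below is only an illustration, not the intended solution: it assumes that popularity parses as a number and that track_genre is filled in for every row. The function names fit_genre_medians, predict, and read_rows are hypothetical helpers.

```python
import csv
from collections import defaultdict
from statistics import median


def read_rows(path):
    """Load a CSV file as a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def fit_genre_medians(rows):
    """Return (per-genre median popularity, global median) from training rows."""
    by_genre = defaultdict(list)
    for row in rows:
        # Assumption: popularity is stored as a numeric string in train.csv.
        by_genre[row["track_genre"]].append(float(row["popularity"]))
    all_values = [v for values in by_genre.values() for v in values]
    genre_medians = {genre: median(values) for genre, values in by_genre.items()}
    return genre_medians, median(all_values)


def predict(rows, genre_medians, global_median):
    """Predict the genre median; fall back to the global median for unseen genres."""
    return [genre_medians.get(row["track_genre"], global_median) for row in rows]


# Possible usage, assuming both files sit in the working directory:
#   train = read_rows("train.csv")
#   test = read_rows("test.csv")
#   genre_medians, global_median = fit_genre_medians(train)
#   predictions = predict(test, genre_medians, global_median)
```

Any model that uses the audio features should be compared against this baseline on a held-out part of train.csv; if it does not beat the per-genre median, the extra features are not yet helping.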

Your predictions must be saved in a submission.csv file with the format:

track_id,popularity
101,42.7
102,55.1
103,38.0

where:

track_id — the song ID from the test set
popularity — a real-valued (float) predicted popularity score
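
A minimal sketch of writing the file in this format with Python's standard csv module. The helper name write_submission is hypothetical, and the sketch assumes the predictions are already in the same order as the track IDs.

```python
import csv


def write_submission(path, track_ids, predictions):
    """Write a header plus one track_id,popularity row per test song."""
    if len(track_ids) != len(predictions):
        raise ValueError("track_ids and predictions must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["track_id", "popularity"])
        for track_id, value in zip(track_ids, predictions):
            # Cast to float so every score is written as a real value (38 -> 38.0).
            writer.writerow([track_id, float(value)])


# Possible usage:
#   write_submission("submission.csv", [101, 102, 103], [42.7, 55.1, 38])
```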

📊 Evaluation

Evaluation will be performed using MAE (Mean Absolute Error):

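$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

where $n$ is the number of songs in the test set, $y_i$ is the true popularity of song $i$, and $\hat{y}_i$ is its predicted popularity.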
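
MAE is the average absolute difference between the true and the predicted popularity. Since the test labels are hidden, it can be estimated locally on a held-out part of train.csv. A minimal sketch in plain Python follows; the function name is illustrative, and this is not the grader's actual code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all songs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one value")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Example: errors of 2, 0 and 3 give an MAE of 5/3.
example = mean_absolute_error([40, 50, 60], [42, 50, 57])
```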

The final score is based on the obtained MAE using the following rules:

MAE ≤ 5 → 100 points
MAE ≥ 20 → 0 points
MAE values between 5 and 20 receive a proportional score between 0 and 100.
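
The rules above can be sketched as a small function. The statement does not give the exact formula for intermediate values, so this sketch assumes "proportional" means linear interpolation between the two thresholds; the function and parameter names are hypothetical.

```python
def score_from_mae(mae, full_marks_at=5.0, zero_at=20.0):
    """Map an MAE to a 0-100 score, assuming linear scaling between the thresholds."""
    if mae <= full_marks_at:
        return 100.0
    if mae >= zero_at:
        return 0.0
    # Assumption: the score falls linearly from 100 (at MAE 5) to 0 (at MAE 20).
    return 100.0 * (zero_at - mae) / (zero_at - full_marks_at)


score_from_mae(12.5)  # 50.0 under the linear assumption
```

Under this assumption, each point of MAE between 5 and 20 is worth about 6.67 points of score.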
