Author: Mihai Nan
The digital music industry is evolving rapidly, and streaming platforms compete to estimate how well a song will perform among listeners. Popularity is more than just a number—it influences recommendations, automated playlists, and even artists' contracts.
To train a new prediction module, you have received a dataset extracted from Spotify containing audio features and metadata of songs.
Your goal is to build a model that predicts a song’s popularity based on its numerical characteristics and several auxiliary features.
You have two files:
a training file, which includes the popularity column
test.csv, which does not include the popularity column (these values must be predicted)
Each song has a unique identifier and a set of features:
track_id: unique song ID
artists
album_name
track_name
popularity (train only)
duration_ms
explicit
danceability
energy
key
loudness
mode
speechiness
acousticness
instrumentalness
liveness
valence
tempo
time_signature
track_genre
The main goal is to predict the popularity column for the test set.
Build a machine learning model that predicts the popularity of each song in test.csv.
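As a starting point, here is a minimal baseline sketch, not a recommended final model. It predicts the median popularity of each track_genre and falls back to the global median for genres absent from the training data. The median is chosen because it is the constant that minimizes absolute error, which matches the MAE metric used for evaluation. The function names and the tiny inline sample are assumptions made for illustration; with the real data you would read the training file and test.csv instead.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def fit_genre_medians(train_rows):
    """Return (per-genre median popularity, global median popularity)."""
    by_genre = defaultdict(list)
    for row in train_rows:
        by_genre[row["track_genre"]].append(float(row["popularity"]))
    all_values = [v for values in by_genre.values() for v in values]
    genre_medians = {genre: median(values) for genre, values in by_genre.items()}
    return genre_medians, median(all_values)


def predict(test_rows, genre_medians, global_median):
    """Predict the genre median, falling back to the global median."""
    return [
        (row["track_id"], genre_medians.get(row["track_genre"], global_median))
        for row in test_rows
    ]


# Tiny made-up sample standing in for the real files (values are illustrative).
TRAIN_CSV = """track_id,track_genre,popularity
1,rock,40
2,rock,60
3,rock,50
4,jazz,20
5,jazz,30
"""
TEST_CSV = """track_id,track_genre
101,rock
102,jazz
103,opera
"""

train_rows = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
test_rows = list(csv.DictReader(io.StringIO(TEST_CSV)))

genre_medians, global_median = fit_genre_medians(train_rows)
predictions = predict(test_rows, genre_medians, global_median)
print(predictions)  # [('101', 50.0), ('102', 25.0), ('103', 40.0)]
```

A baseline like this gives a reference MAE to beat. A model that also uses the numerical audio features (danceability, energy, loudness, and so on) should improve on it.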
Your predictions must be saved in a submission.csv file with the format:
track_id,popularity
101,42.7
102,55.1
103,38.0
where:
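track_id: the song ID from the test set
popularity: a real-valued (float) predicted popularity score
Evaluation will be performed using MAE (Mean Absolute Error), the average absolute difference between the predicted and the true popularity:
MAE = (1/n) * sum over i of |y_i - ŷ_i|, where y_i is the true popularity of song i, ŷ_i is the predicted value, and n is the number of songs in the test set.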
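The submission format and the metric can be sketched in a few lines of standard-library Python. The helper names `write_submission` and `mean_absolute_error` are hypothetical, and rounding to one decimal place is an assumption taken from the example rows, not a stated requirement. The MAE helper is useful for scoring predictions on a held-out part of the training data before submitting.

```python
import csv
import io


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all songs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def write_submission(predictions, out_file):
    """Write (track_id, popularity) pairs in the required CSV format."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["track_id", "popularity"])
    for track_id, popularity in predictions:
        # One decimal place mirrors the example rows (an assumption).
        writer.writerow([track_id, f"{popularity:.1f}"])


# Write to an in-memory buffer so the example has no side effects.
buffer = io.StringIO()
write_submission([(101, 42.7), (102, 55.1), (103, 38.0)], buffer)
submission_text = buffer.getvalue()
print(submission_text)

# Absolute errors are 2, 5, 2, 1, so the mean is 10 / 4.
mae = mean_absolute_error([40, 50, 60, 30], [42, 55, 58, 31])
print(mae)  # 2.5
```

To produce the actual file, open it with `open("submission.csv", "w", newline="")` and pass the file object to `write_submission`.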
The final score is based on the obtained MAE using the following rules: