Automatic playlist continuation

Inspired by Spotify Million Playlist Dataset Challenge

Introduction

People love playlists. Spotify reported in 2008 that its users had generated over 4 billion playlists [2]. Various industry studies indicate that playlists account for a third of all music playtime [1], and over half of users say that playlists are replacing albums in their music listening habits [2].

Playlists benefit consumers by providing personalised music discovery and recommendations for various occasions, moods and themes. Playlists are also of paramount importance to the music industry: they improve consumer engagement, increase playtime, support better music search, and help lesser-known artists get discovered through automatically generated playlists.

In this project I explored Content Based Filtering (CBF) and Collaborative Filtering (CF) with Python to solve the task of automatic playlist continuation, based on either the first n tracks of a playlist or n randomly selected tracks from it.

Dataset

The dataset comes from the original Spotify Million Playlist Dataset Challenge [1].

Models

As mentioned above, the project uses Collaborative Filtering (CF) and Content Based Filtering (CBF) as its two main approaches.

Collaborative filtering

Collaborative Filtering: This method makes automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on a set of items, A is more likely to have B's opinion for a given item than that of a randomly chosen person.
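As a rough illustration of the idea above applied to playlists, here is a minimal memory-based sketch in plain Python. It treats each playlist as a "user" and each track as an "item", and scores candidate tracks by their cosine similarity to the seed tracks. The toy data, the function names and the choice of cosine similarity are assumptions made for illustration; they are not taken from the project's notebooks.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: playlist id -> set of track ids.
playlists = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b", "d"},
    "p3": {"b", "c", "d"},
    "p4": {"e", "f"},
}


def track_similarities(playlists):
    """Cosine similarity between tracks, where each track is a binary
    vector over playlists (1 if the playlist contains the track)."""
    counts = defaultdict(int)     # track -> number of playlists containing it
    co_counts = defaultdict(int)  # (track_i, track_j) -> playlists containing both
    for tracks in playlists.values():
        for t in tracks:
            counts[t] += 1
        for i in tracks:
            for j in tracks:
                if i != j:
                    co_counts[(i, j)] += 1
    return {
        (i, j): shared / sqrt(counts[i] * counts[j])
        for (i, j), shared in co_counts.items()
    }


def recommend(seed_tracks, similarities, k=3):
    """Rank unseen tracks by their summed similarity to the seed tracks."""
    seeds = set(seed_tracks)
    scores = defaultdict(float)
    for (i, j), sim in similarities.items():
        if i in seeds and j not in seeds:
            scores[j] += sim
    # Sort by score (descending), then by track id so ties are deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


sims = track_similarities(playlists)
recommendations = recommend(["a", "b"], sims, k=2)
print(recommendations)
```

Tracks "e" and "f" never co-occur with the seeds, so they receive no score and are never recommended. This is the usual cold-start limitation of this kind of approach.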

Notebooks:

  • Memory based: modeling-notebooks/CF00_Memory_based_scaled_down.ipynb
  • Model based:
    • Surprise
      • modeling-notebooks/CF01_Model_Surprise_50pct_sample.ipynb
      • modeling-notebooks/CF01_Model_Surprise_scaled_down.ipynb
    • Alternating Least Squares with Implicit
      • modeling-notebooks/F02_Model_ALS_Implicit_binary.ipynb - contains demo
      • modeling-notebooks/CF02_Model_ALS_Implicit_pos.ipynb - contains demo
    • SVD
      • modeling-notebooks/CF03_Model_SVD_sparse_matrix_binary_ratings.ipynb
      • modeling-notebooks/CF03_Model_SVD_sparse_matrix_pos_ratings.ipynb

Content based filtering

Content-Based Filtering: This method uses only the descriptions and attributes of the items a user has previously consumed to model that user's preferences. In other words, these algorithms try to recommend items that are similar to those that a user liked in the past (or is examining in the present). In particular, candidate items are compared with items previously rated by the user, and the best-matching items are recommended.
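A minimal sketch of this comparison step, assuming each track is described by a small vector of audio features already scaled to [0, 1]. The feature values, the track ids and the use of a mean "playlist profile" with cosine similarity are illustrative assumptions, not the method used in the notebooks.

```python
from math import sqrt

# Hypothetical audio features per track: (danceability, energy, valence).
track_features = {
    "t1": (0.9, 0.8, 0.7),
    "t2": (0.8, 0.9, 0.6),
    "t3": (0.1, 0.2, 0.3),
    "t4": (0.85, 0.85, 0.65),
}


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def playlist_profile(seed_tracks, features):
    """Mean feature vector of the seed tracks."""
    vectors = [features[t] for t in seed_tracks]
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def recommend_similar(seed_tracks, features, k=2):
    """Rank tracks outside the seed set by similarity to the playlist profile."""
    seeds = set(seed_tracks)
    profile = playlist_profile(seed_tracks, features)
    candidates = [t for t in features if t not in seeds]
    ranked = sorted(
        candidates,
        key=lambda t: cosine(profile, features[t]),
        reverse=True,
    )
    return ranked[:k]


content_recs = recommend_similar(["t1", "t2"], track_features)
print(content_recs)
```

Cosine similarity ignores vector magnitude, so a quiet track and a loud track with the same feature proportions look identical. A distance-based measure may suit some feature sets better.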

Notebooks:

  • modeling-notebooks/CBF00_audio_features.ipynb - contains demo
  • WIP: modeling-notebooks/CBF01_Audio_features_genres_data_preparation.ipynb
  • WIP: modeling-notebooks/CBF01_Audio_features_genres_model.ipynb

Evaluation

Two information retrieval evaluation metrics were used, partially following the setup of the original challenge (see the evaluation notebook and the original challenge definitions). To complete the evaluation, the dataset is split into pseudo train and test sets. Each playlist in the test set is split into two subsets: seed tracks and hold-out, or ground truth, tracks. The train playlists, together with the test playlists reduced to their seed tracks, are used to train the models. The seed-only test playlists are then used to obtain recommendations from the models. The R-precision and NDCG of the obtained recommendations are calculated against the ground truth.

  • R-precision: the number of retrieved relevant tracks divided by the number of known relevant tracks (i.e., the number of withheld tracks).
  • Normalised Discounted Cumulative Gain (NDCG): Discounted Cumulative Gain (DCG) measures the ranking quality of the recommended tracks, increasing when relevant tracks are placed higher in the list. Normalised DCG (NDCG) is determined by calculating the DCG and dividing it by the ideal DCG, in which the recommended tracks are perfectly ranked.
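The split and the two metrics described above can be sketched in plain Python as follows. This is a simplified illustration under several assumptions: relevance is binary, the recommendation list has no duplicates, R-precision looks only at the top |hold-out| recommendations (the standard information retrieval definition), and DCG uses the common log2(rank + 1) discount, which may differ from the exact formula in the original challenge. All function names are hypothetical.

```python
import random
from math import log2


def split_playlist(tracks, n_seed, randomize=False, rng=None):
    """Split one test playlist into seed tracks and hold-out tracks.
    Seeds are the first n_seed tracks, or n_seed randomly chosen tracks."""
    if randomize:
        rng = rng or random.Random(0)
        seed_idx = set(rng.sample(range(len(tracks)), n_seed))
    else:
        seed_idx = set(range(n_seed))
    seeds = [t for i, t in enumerate(tracks) if i in seed_idx]
    holdout = [t for i, t in enumerate(tracks) if i not in seed_idx]
    return seeds, holdout


def r_precision(recommended, holdout):
    """Relevant tracks among the top-|holdout| recommendations,
    divided by the number of hold-out tracks."""
    relevant = set(holdout)
    if not relevant:
        return 0.0
    top = recommended[: len(relevant)]
    return len(relevant.intersection(top)) / len(relevant)


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (assumed form)."""
    return sum(g / log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(recommended, holdout):
    """DCG of the recommendations divided by the DCG of the same list
    reordered so that all relevant tracks come first."""
    relevant = set(holdout)
    gains = [1.0 if t in relevant else 0.0 for t in recommended]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


playlist = ["a", "b", "c", "d", "e"]
seeds, holdout = split_playlist(playlist, n_seed=2)
recommended = ["c", "x", "d", "y"]  # hypothetical model output for the seeds

rp = r_precision(recommended, holdout)
score = ndcg(recommended, holdout)
print(seeds, holdout, round(rp, 3), round(score, 3))
```

In practice both metrics would be averaged over all test playlists to give a single score per model.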

Environment

  • Python version: 3.8.3
  • dependencies: requirements.txt

Project Organization

├── data-processing-notebooks/     <- Notebooks with data extractions and processing
├── evaluation/                    <- Evaluation results in csv
├── modeling-notebooks/            <- Notebooks with models and evaluation
├── README.md                      <- High level readme file
├── requirements.txt               <- requirements.txt
└── src/                           <- Scripts provided with the dataset to obtain basic descriptive statistics

Resources

[1] C.W. Chen, P. Lamere, M. Schedl, and H. Zamani. Recsys Challenge 2018: Automatic Music Playlist Continuation. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys ’18), 2018.

[2] Spotify Million Playlist Dataset Challenge

[3] Spotify Web API

[4] Recommender system

[5] Introduction to recommender systems

[6] Playlists: Good Or Bad For Musicians?

[7] Spotify Sentiment Analysis

[8] Recommender Systems in Python 101

[9] Evaluate your Recommendation Engine using NDCG

[10] How to Build a Memory-Based Recommendation System using Python Surprise

[11] How to Build a Model-Based Recommendation System using Python Surprise

[12] ALS Implicit Collaborative Filtering

[13] Beginner Tutorial: Recommender Systems in Python

[14] Stop One-Hot Encoding Your Categorical Variables

[15] Brief presentation
