Git Product home page Git Product logo

vl6_recsys2018's Introduction

ACM RecSys Challenge 2018 Team vl6

Team members: Maksims Volkovs, Himanshu Rai, Zhaoyue Cheng, Yichao Lu (University of Toronto), Ga Wu (University of Toronto, Vector Institute), Scott Sanner (University of Toronto, Vector Institute)

Contact: [email protected]

This repository contains the Java implementation of our entries for both main and creative tracks. Our approach consists of a two-stage model where in the first stage a blend of collaborative filtering methods is used to quickly retrieve a set of candidate songs for each playlist with high recall. Then in the second stage a pairwise playlist-song gradient boosting model is used to re-rank the retrieved candidates and maximize precision at the top of the recommended list.

The model is implemented in Java and tested on the following environment:

  • Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
  • 256GB RAM
  • Nvidia Titan V
  • Java Oracle 1.8.0_171
  • Python, Numpy 1.14.3, Sklearn 0.19.1, Scipy 1.1.0
  • Apache Maven 3.3.9
  • CUDA 8.0 and CUDNN 8.0
  • Intel MKL 2018.1.038
  • XGBoost and XGBoost4j 0.7

All models are executed from src/main/java/main/Executor.java, the main function has examples on how to do main and creative track model training, evaluation and submission. To run the model:

  • Set all paths:
//OAuth token for spotify creative api, if doing creative track submission
String authToken = "";

// path to song audio feature file, if doing creative track submission
String creativeTrackFile = "/home/recsys2018/data/song_audio_features.txt";

// path to MPD directory with the JSON files
String trainPath = "/home/recsys2018/data/train/";

// path to challenge set JSON file
String testFile = "/home/recsys2018/data/test/challenge_set.json";

// path to python SVD script included in the repo, default location: script/svd_py.py
String pythonScriptPath = "/home/recsys2018/script/svd_py.py";

//path to cache folder for temp storage, at least 20GB should be available in this folder
String cachePath = "/home/recsys2018/cache/";
  • Compile and execute with maven:
export MAVEN_OPTS="-Xms150g -Xmx150g"
mvn clean compile
mvn exec:java -Dexec.mainClass="main.Executor" 

Note that by default the code is executing model for the main track, to run the creative track model set xgbParams.doCreative = true. For the creative track we extracted extra song features from the Spotify Audio API. We were able to match most songs from the challenge Million Playlist Dataset, and used the following fields for further feature extraction: [acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time_signature, valence]. In order to download the data for this track, you need to get the OAuth Token from Spotify API page and assign it to the authToken variable in the Executor.main function.

We prioritized speed over memory for this project so you'll need at least 100GB of RAM to run model training and inference. The full end-to-end runtime takes approximately 1.5 days.

vl6_recsys2018's People

Contributors

mvolkovs avatar himsr avatar zhaoyuecheng avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.