

DGL_LFM1b

This repository provides a custom DGL dataset created from the LFM-1b database. The script downloads and processes the full database to create a single DGL heterogeneous graph.

The LFM-1b dataset contains more than one billion listening events and is intended for various music retrieval and recommendation tasks. The paper by Schedl, M. was published in 2016 at ICMR and is directly available through the website.

In case you make use of the LFM-1b dataset in your own research, please cite the following paper:

The LFM-1b Dataset for Music Retrieval and Recommendation
Schedl, M.
Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR 2016), New York, USA, April 2016.

Additionally, the paper by Schedl, M. and Ferwerda, B. discussing the LFM-1b User Genre Profile (UGP) dataset was published in 2017 at ISM. It uses Last.fm artist tags indexed with two dictionaries of genre and style descriptors (from Allmusic and Freebase) to create, for each user in LFM-1b, a preference profile as a vector over genres.

In case you make use of the LFM-1b UGP dataset in your own research, please cite the following paper:

Large-scale Analysis of Group-specific Music Genre Taste From Collaborative Tags
Schedl, M. and Ferwerda, B.
Proceedings of the 19th IEEE International Symposium on Multimedia (ISM 2017), Taichung, Taiwan, December 2017.

Requirements

This repository was built with Python 3.8.10. Before installing the requirements.txt file, ensure you have the correct libraries pre-installed:

Follow these manual installs with:

pip install -r requirements.txt

The Data

The node types of the graph:

  • User (120K)
  • Artist (3M)
  • Album (15M)
  • Track (32M)
  • Genre (20)

The edge types of the graph:

  • User -> Artist (61411336)
  • Artist -> User (61411336)
  • User -> Album (na)
  • Album -> User (na)
  • User -> Track (na)
  • Track -> User (na)
  • Artist -> Genre (414379)
  • Genre -> Artist (414379)
  • Album -> Artist (14184326)
  • Artist -> Album (14184326)
  • Track -> Artist (27258365)
  • Artist -> Track (27258365)
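Every relation in the list above appears twice, once per direction, with the same edge count. A minimal sketch of how such mirrored edge types could be built from forward edges is shown here. The relation names, the `rev_` prefix, and the helper `add_reverse_edges` are hypothetical and are not taken from LFM1b.py.

```python
from typing import Dict, List, Tuple

EdgeKey = Tuple[str, str, str]          # (source type, relation name, destination type)
EdgeList = Tuple[List[int], List[int]]  # (source ids, destination ids)


def add_reverse_edges(forward: Dict[EdgeKey, EdgeList]) -> Dict[EdgeKey, EdgeList]:
    """Return a new dict holding every forward relation plus its mirrored relation."""
    both: Dict[EdgeKey, EdgeList] = {}
    for (src_type, relation, dst_type), (src_ids, dst_ids) in forward.items():
        both[(src_type, relation, dst_type)] = (list(src_ids), list(dst_ids))
        # The reverse relation swaps the endpoint types and the id lists,
        # so it always has the same number of edges as the forward one.
        both[(dst_type, "rev_" + relation, src_type)] = (list(dst_ids), list(src_ids))
    return both


# Toy data: relation names are hypothetical, not taken from LFM1b.py.
forward_edges = {
    ("user", "listened_to_artist", "artist"): ([0, 0, 1], [0, 2, 1]),
    ("artist", "in_genre", "genre"): ([0, 1, 2], [0, 0, 1]),
}
graph_data = add_reverse_edges(forward_edges)
for key, (src, dst) in graph_data.items():
    print(key, len(src))
```

A dictionary keyed by (source type, relation, destination type) is the shape that `dgl.heterograph` takes as input, so, assuming DGL is installed, `graph_data` could be passed to it to build a small heterogeneous graph.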

Additionally, all the user edges:

  • User -> Artist
  • Artist -> User
  • User -> Album
  • Album -> User
  • User -> Track
  • Track -> User

carry 'norm_connections' edge data indicating the normalized relative interaction count that a src node had with a given dst artist, album, or track node. For all other edges, 'norm_connections' is set to 1.
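The exact normalization behind 'norm_connections' is not spelled out here. One plausible reading, sketched below as an assumption, is that each user's play count for an item is divided by that user's total play count over items of the same type. The function `normalize_per_user` and the toy ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_per_user(events: List[Tuple[int, int, int]]) -> Dict[Tuple[int, int], float]:
    """Map (user_id, item_id) to that item's share of the user's total play count.

    `events` holds one (user_id, item_id, play_count) triple per user-item pair,
    all for a single item type, with positive play counts.
    """
    totals: Dict[int, int] = defaultdict(int)
    for user_id, _item_id, count in events:
        totals[user_id] += count
    return {
        (user_id, item_id): count / totals[user_id]
        for user_id, item_id, count in events
    }


# Toy data: user 0 played artist 10 three times and artist 11 once.
events = [(0, 10, 3), (0, 11, 1), (1, 10, 5)]
norm_connections = normalize_per_user(events)
print(norm_connections)
```

Under this assumption the values for one user and one item type sum to 1, which keeps heavy and light listeners on a comparable scale.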

Compile the dataset

To compile the LFM1b database, simply 'cd' into the root of the repository and run:

python LFM1b.py

Precursor warning:

I, the author of the repository, am using a Linux machine with 30GB of RAM and 12GB of GPU memory. Running the above script takes this machine ~2hrs, and I am unable to store the full knowledge graph in memory.
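For a rough sense of scale, the edge counts listed in this README give a lower bound on the memory needed for edge ids alone. The sketch below assumes 64-bit integer ids and one (source, destination) pair per edge. It leaves out the User/Album and User/Track edges, whose counts are listed as 'na', as well as all edge data and any internal indexes, so the real footprint is larger.

```python
# Edge counts as listed in this README (one direction each; 'na' types are omitted).
forward_edge_counts = {
    "user-artist": 61_411_336,
    "artist-genre": 414_379,
    "album-artist": 14_184_326,
    "track-artist": 27_258_365,
}

BYTES_PER_ID = 8   # assumption: 64-bit integer node ids
IDS_PER_EDGE = 2   # one source id and one destination id

# Every relation is listed in both directions, hence the factor of two.
total_edges = 2 * sum(forward_edge_counts.values())
total_bytes = total_edges * IDS_PER_EDGE * BYTES_PER_ID

print(f"{total_edges:,} edges, about {total_bytes / 2**30:.2f} GiB of edge ids")
```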

Compile a subset

To create a subset of the LFM1b database, simply 'cd' into the root of the repository and run:

python LFM1b.py --n_users 50

This provides a subset of 50 users with their corresponding listening events and the artists/albums/tracks associated with their listening habits.
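The idea behind a user subset can be sketched as follows: pick a set of users, keep only their listening events, and collect the artists, albums, and tracks those events touch. This is an illustration under stated assumptions, not the logic of LFM1b.py. The event layout, the function `subset_by_users`, and the choice to take users in order of first appearance are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed layout of one listening event: (user_id, artist_id, album_id, track_id)
Event = Tuple[int, int, int, int]


def subset_by_users(events: List[Event], n_users: int) -> Tuple[List[Event], Dict[str, Set[int]]]:
    """Keep the events of the first `n_users` distinct users and collect the ids they touch."""
    selected: Set[int] = set()
    for event in events:
        if len(selected) == n_users:
            break
        selected.add(event[0])

    kept = [event for event in events if event[0] in selected]
    nodes = {
        "user": {event[0] for event in kept},
        "artist": {event[1] for event in kept},
        "album": {event[2] for event in kept},
        "track": {event[3] for event in kept},
    }
    return kept, nodes


# Toy data: three users, four listening events.
all_events = [(0, 1, 1, 1), (1, 2, 2, 2), (0, 1, 1, 3), (2, 3, 3, 4)]
kept_events, kept_nodes = subset_by_users(all_events, n_users=2)
print(len(kept_events), kept_nodes)
```

Only nodes reachable from the selected users survive, so the subset graph shrinks in every node type, not just in the number of users.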

The DGL Framework

The Deep Graph Library (DGL) framework provides the DGLDataset object, which can be used to build a customizable dataset for downstream node/link/graph tasks.

Once the dataset is compiled, you may import the class into any file and load the precompiled graph for DGL-based analysis.

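from LFM1b import LFM1b

# Instantiate the dataset and load the precompiled graph list
dataset = LFM1b()
glist, glabels = dataset.load()

# The dataset holds a single heterogeneous graph
hg = glist[0]
print(hg)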
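Once `hg` is loaded, a quick way to check it against the node and edge counts listed earlier is to tabulate the counts per type. The helper below is a hypothetical convenience function. It relies only on the `ntypes`, `canonical_etypes`, `num_nodes`, and `num_edges` members of a DGL graph.

```python
from typing import Any, Dict


def summarize_heterograph(hg: Any) -> Dict[str, Dict[Any, int]]:
    """Collect node and edge counts per type from a DGL heterogeneous graph."""
    return {
        # One entry per node type name, e.g. a user or artist type
        "nodes": {ntype: hg.num_nodes(ntype) for ntype in hg.ntypes},
        # One entry per (source type, relation, destination type) triple
        "edges": {etype: hg.num_edges(etype) for etype in hg.canonical_etypes},
    }
```

Calling `summarize_heterograph(hg)` on the loaded graph returns a nested dictionary that can be printed or compared with the figures in the data section.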
