Git Product home page Git Product logo

citeulike-a's Introduction

Dataset citeulike-a for CTRSR

This dataset, citeulike-a, was used in the IJCAI paper 'Collaborative Topic Regression with Social Regularization' [Wang, Chen and Li]. It was collected from CiteULike and Google Scholar. CiteULike allows users to create their own collections of articles. There are abstracts, titles, and tags for each article. Other information like authors, groups, posting time, and keywords is not used in this paper 'Collaborative Topic Regression with Social Regularization' [Wang, Chen and Li]. The details can be found at http://www.citeulike.org/faq/data.adp.

It is partly from [Wang and Blei]. Note that the original dataset in [Wang and Blei] does not contain relations between items. We collect the tag information from CiteULike and citations from Google Scholar.

The text information (item content) of citeulike-a is preprocessed by following the same procedure as that in [Wang and Blei].

Some statistics are listed as follows:

Entity Total Number
users 5551
items 16980
tags 46391
citations 44709
user-item pairs 204987

DATA FILES citations.dat citations between articles item-tag.dat tags corresponding to articles, one line corresponds to tags of one article (note that this is the version prior to preprocess thus would have more tags than used in the paper) mult.dat bag of words for each article raw-data.csv raw data tags.dat tags, sorted by tag-id's users.dat rating matrix (user-item matrix) vocabulary.dat corresponding words for file mult.dat

Data Format

In citation.dat, each line corresponds to the edges linked to one node. For example, Line 1: 3 2 485 3284 means there are 3 edges linked to node 0 (0-based indexing), their (0-based) ID's are 2, 485, and 3284. In other words article 'The metabolic world of escherichia coli is not small' is linked to 'Exploring complex networks', 'Community structure in social and biological networks', and 'Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms'.

The file tags.dat lists all the tags.

The file item-tag.dat tells us what tags each node (article) has. For example, the first line: '17 4276 32443 37837 3378 7650 44590 42810 28819 43806 3805 25497 23779 42140 12234 37386 30698 43503' means that node 0 has 17 tags, and their ID's are 4276, ..., 43503.

raw-data.csv: Note that different from other dat files that use 0-based indexing (ID counts from 0), raw-data.csv uses 1-based indexing (ID counts from 1).

See Also

Another similar dataset from CiteULike, citeulike-t.

Reference:

Collaborative Topic Regression with Social Regularization

@inproceedings{DBLP:conf/ijcai/WangCL13,
  author    = {Hao Wang and
               Binyi Chen and
               Wu-Jun Li},
  title     = {Collaborative Topic Regression with Social Regularization
               for Tag Recommendation},
  booktitle = {IJCAI},
  year      = {2013}
}

citeulike-a's People

Contributors

js05212 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.