
LESSR

A PyTorch implementation of LESSR (Lossless Edge-order preserving aggregation and Shortcut graph attention for Session-based Recommendation) from the paper:
Handling Information Loss of Graph Neural Networks for Session-based Recommendation, Tianwen Chen and Raymond Chi-Wing Wong, KDD '20

Requirements

  • PyTorch 1.6.0
  • NumPy 1.19.1
  • Pandas 1.1.3
  • DGL 0.5.2

Usage

  1. Install the requirements.
    If you use Anaconda, you can create a conda environment with the required packages using the following command.

    conda env create -f packages.yml

    Activate the created conda environment.

    conda activate lessr
    
  2. Download and extract the datasets.

  3. Preprocess the datasets using preprocess.py.
    For example, to preprocess the Diginetica dataset, extract the file train-item-views.csv to the folder datasets/ and run the following command:

    python preprocess.py -d diginetica -f datasets/train-item-views.csv

    The preprocessed dataset is stored in the folder datasets/diginetica.
    You can see the detailed usage of preprocess.py by running the following command:

    python preprocess.py -h
  4. Train the model using main.py.
    If no arguments are passed to main.py, it will train a model using a sample dataset with default hyperparameters.

    python main.py

    The commands to train LESSR with suggested hyperparameters on different datasets are as follows:

    python main.py --dataset-dir datasets/diginetica --embedding-dim 32 --num-layers 4
    python main.py --dataset-dir datasets/gowalla --embedding-dim 64 --num-layers 4
    python main.py --dataset-dir datasets/lastfm --embedding-dim 128 --num-layers 4

    You can see the detailed usage of main.py by running the following command:

    python main.py -h
  5. Use your own dataset.

    1. Create a subfolder in the datasets/ folder.
    2. The subfolder should contain the following 3 files.
      • num_items.txt: This file contains a single integer which is the number of items in the dataset.
      • train.txt: This file contains all the training sessions.
      • test.txt: This file contains all the test sessions.
    3. Each line of train.txt and test.txt represents a session, which is a list of item IDs separated by commas. Note that the item IDs must be in the range [0, num_items). A sketch of this format is given after this list.
    4. See the folder datasets/sample for an example of a dataset.
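
    For illustration, here is a minimal sketch that writes a toy dataset in this format. The folder name datasets/toy and all item IDs below are made up for the example; only the file names and the comma-separated layout come from the description above.

      import os

      # Hypothetical toy dataset; item IDs must lie in [0, num_items).
      num_items = 5
      train_sessions = [[0, 1, 2], [3, 4, 3, 1]]
      test_sessions = [[1, 2, 4]]

      folder = os.path.join('datasets', 'toy')
      os.makedirs(folder, exist_ok=True)

      # num_items.txt holds a single integer: the number of items.
      with open(os.path.join(folder, 'num_items.txt'), 'w') as f:
          f.write(f'{num_items}\n')

      # train.txt and test.txt hold one comma-separated session per line.
      for name, sessions in (('train.txt', train_sessions), ('test.txt', test_sessions)):
          with open(os.path.join(folder, name), 'w') as f:
              for session in sessions:
                  f.write(','.join(map(str, session)) + '\n')

    The model can then be trained on this dataset with python main.py --dataset-dir datasets/toy.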

Citation

If you use our code in your research, please cite our paper:

@inproceedings{chen2020lessr,
    title="Handling Information Loss of Graph Neural Networks for Session-based Recommendation",
    author="Tianwen {Chen} and Raymond Chi-Wing {Wong}",
    booktitle="Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20)",
    pages="1172--1180",
    year="2020"
}

lessr's Issues

Baseline code

Hi. Could you share the baseline code (for example, SR-GNN) that you ran on your datasets? I cannot get the baseline code to run on your datasets. 😭

Gowalla and Last.fm

Hi, could you please upload the two preprocessed datasets? I don't know how to preprocess the original datasets. Thanks.

Data preprocessing for Yoochoose

Could you provide the preprocessed Yoochoose dataset? I cannot reproduce the results on this dataset, so something may be wrong in your paper. If I preprocess Yoochoose with the SR-GNN code (https://github.com/CRIPAC-DIG/SR-GNN), the statistics do not match those in your paper.
Also, did you discard sessions longer than 20, as you did for Diginetica?
Thank you.

Question about EOPA

Hi, may I ask how line 19 of lessr.py guarantees the order of m, the neighbors' messages, which I would expect to be in random order?

_, hn = self.gru(m)

RuntimeError when training

Hi, has anyone had the following problem? My package versions match the required ones, but I don't know where the problem comes from. Could anyone help solve it?
Traceback (most recent call last):
  File "E:/WorkSpace2--pycharm/lessr-master/main.py", line 85, in <module>
    runner.train(args.epochs, args.log_interval)
  File "E:\WorkSpace2--pycharm\lessr-master\utils\train.py", line 105, in train
    logits = self.model(*inputs)
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\WorkSpace2--pycharm\lessr-master\lessr.py", line 171, in forward
    feat = self.embedding(iid)
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
    return F.embedding(
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\functional.py", line 1813, in embedding
    _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\functional.py", line 1733, in _no_grad_embedding_renorm_
    torch.embedding_renorm_(weight, input, max_norm, norm_type)
RuntimeError: Expected tensor for argument #2 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding_renorm_)
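
A likely workaround (my assumption, not a fix confirmed by the maintainer): the traceback suggests the item-ID tensor reaching the embedding layer is int32, while the embedding lookup in PyTorch 1.6 requires int64 (Long) indices. Casting the indices before the lookup avoids the error, as in this minimal sketch:

import torch
import torch.nn as nn

# Minimal illustration of the cast; this is not the repository's actual code.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, max_norm=1.0)
iid = torch.tensor([1, 2, 3], dtype=torch.int32)  # int32 indices trigger the error on PyTorch 1.6
feat = embedding(iid.long())                      # casting to int64 (Long) avoids it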

This code chooses the best model on the test set instead of a validation set.

In utils/train.py, the training procedure returns the best MRR and the best HIT measured on the TEST SET, respectively:

def train(self, epochs, log_interval=100):
    # ...
    for epoch in range(epochs):
        self.model.train()
        for batch in self.train_loader:
            # ...
        mrr, hit = evaluate(self.model, self.test_loader, self.device)  # mrr and hit on test set
        # ...
        max_mrr = max(max_mrr, mrr)  # best mrr on test set
        max_hit = max(max_hit, hit)  # best hit on test set
        self.epoch += 1
    return max_mrr, max_hit  # <- It's NOT OK to return best mrr and hit on TEST SET, RESPECTIVELY

I believe this code is not the final version; please update it to the code used in your paper.
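
For reference, a minimal sketch of validation-based model selection (self.valid_loader and the single final test evaluation are my assumptions; they are not part of the repository):

def train(self, epochs, log_interval=100):
    best_mrr, best_state = float('-inf'), None
    for epoch in range(epochs):
        self.model.train()
        for batch in self.train_loader:
            ...  # training step omitted
        # choose the checkpoint by MRR on a hypothetical held-out validation split
        mrr, hit = evaluate(self.model, self.valid_loader, self.device)
        if mrr > best_mrr:
            best_mrr = mrr
            best_state = {k: v.detach().clone() for k, v in self.model.state_dict().items()}
        self.epoch += 1
    # report test metrics once, using the validation-selected model
    self.model.load_state_dict(best_state)
    return evaluate(self.model, self.test_loader, self.device)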

Question about data preprocessing

Even though the same preprocessing method is adopted, why is the number of items in Diginetica 42596 instead of 43097?
Most studies that use the same preprocessing method (such as SR-GNN) report 43097 items for Diginetica.
Thank you for your attention.

Another question about data preprocessing

Hi, I have a question about data preprocessing. Why do you use a different procedure for different datasets? For example, for Diginetica you did not remove immediate repeats, while for Gowalla and Last.fm short sessions were not removed. Thanks in advance.
