
LESSR

A PyTorch implementation of LESSR (Lossless Edge-order preserving aggregation and Shortcut graph attention for Session-based Recommendation) from the paper:
Handling Information Loss of Graph Neural Networks for Session-based Recommendation, Tianwen Chen and Raymond Chi-Wing Wong, KDD '20

Requirements

  • PyTorch 1.6.0
  • NumPy 1.19.1
  • Pandas 1.1.3
  • DGL 0.5.2

Usage

  1. Install the requirements.
    If you use Anaconda, you can create a conda environment with the required packages using the following command.

    conda env create -f packages.yml

    Activate the created conda environment.

    conda activate lessr
    
  2. Download and extract the datasets.

  3. Preprocess the datasets using preprocess.py.
    For example, to preprocess the Diginetica dataset, extract the file train-item-views.csv to the folder datasets/ and run the following command:

    python preprocess.py -d diginetica -f datasets/train-item-views.csv

    The preprocessed dataset is stored in the folder datasets/diginetica.
    You can see the detailed usage of preprocess.py by running the following command:

    python preprocess.py -h
  4. Train the model using main.py.
    If no arguments are passed to main.py, it will train a model using a sample dataset with default hyperparameters.

    python main.py

    The commands to train LESSR with suggested hyperparameters on different datasets are as follows:

    python main.py --dataset-dir datasets/diginetica --embedding-dim 32 --num-layers 4
    python main.py --dataset-dir datasets/gowalla --embedding-dim 64 --num-layers 4
    python main.py --dataset-dir datasets/lastfm --embedding-dim 128 --num-layers 4

    You can see the detailed usage of main.py by running the following command:

    python main.py -h
  5. Use your own dataset.

    1. Create a subfolder in the datasets/ folder.
    2. The subfolder should contain the following 3 files.
      • num_items.txt: This file contains a single integer which is the number of items in the dataset.
      • train.txt: This file contains all the training sessions.
      • test.txt: This file contains all the test sessions.
    3. Each line of train.txt and test.txt represents a session, which is a list of item IDs separated by commas. Note that the item IDs must be in the range [0, num_items). A sketch of this format is given after this list.
    4. See the folder datasets/sample for an example of a dataset.
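
    For illustration, here is a minimal sketch that writes a toy dataset in this format. The folder name datasets/toy and all item IDs below are made up for the example; only the file names and the comma-separated layout come from the description above.

      import os

      # Hypothetical toy dataset; item IDs must lie in [0, num_items).
      num_items = 5
      train_sessions = [[0, 1, 2], [3, 4, 3, 1]]
      test_sessions = [[1, 2, 4]]

      folder = os.path.join('datasets', 'toy')
      os.makedirs(folder, exist_ok=True)

      # num_items.txt holds a single integer: the number of items.
      with open(os.path.join(folder, 'num_items.txt'), 'w') as f:
          f.write(f'{num_items}\n')

      # train.txt and test.txt hold one comma-separated session per line.
      for name, sessions in (('train.txt', train_sessions), ('test.txt', test_sessions)):
          with open(os.path.join(folder, name), 'w') as f:
              for session in sessions:
                  f.write(','.join(map(str, session)) + '\n')

    The model can then be trained on this dataset with python main.py --dataset-dir datasets/toy.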

Citation

If you use our code in your research, please cite our paper:

@inproceedings{chen2020lessr,
    title="Handling Information Loss of Graph Neural Networks for Session-based Recommendation",
    author="Tianwen {Chen} and Raymond Chi-Wing {Wong}",
    booktitle="Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20)",
    pages="1172--1180",
    year="2020"
}

lessr's Issues

Baseline code

Hi. Could you share the baseline code (for example, SR-GNN) that you ran on your datasets? I cannot get the baseline code to run on your datasets. 😭

Gowalla and Last.fm

Hi, could you please upload the two preprocessed datasets? I don't know how to preprocess the original datasets. Thanks.

Data preprocessing for Yoochoose

Could you provide the preprocessed Yoochoose dataset? I cannot reproduce the results on this dataset, so something may be wrong in your paper. If I preprocess Yoochoose with the SR-GNN code (https://github.com/CRIPAC-DIG/SR-GNN), the statistics do not match those in your paper.
Also, did you discard sessions longer than 20, as you did for Diginetica?
Thank you.

Question about EOPA

Hi, may I ask how line 19 of lessr.py guarantees the order of m, the neighbors' messages, which I would expect to be in random order?

_, hn = self.gru(m)

RuntimeError when training

Hi, has anyone had the following problem? My package versions match the required ones, but I don't know where the problem comes from. Could anyone help solve it?
Traceback (most recent call last):
  File "E:/WorkSpace2--pycharm/lessr-master/main.py", line 85, in <module>
    runner.train(args.epochs, args.log_interval)
  File "E:\WorkSpace2--pycharm\lessr-master\utils\train.py", line 105, in train
    logits = self.model(*inputs)
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\WorkSpace2--pycharm\lessr-master\lessr.py", line 171, in forward
    feat = self.embedding(iid)
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
    return F.embedding(
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\functional.py", line 1813, in embedding
    _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
  File "E:\requireSoft\anaconda38\lib\site-packages\torch\nn\functional.py", line 1733, in _no_grad_embedding_renorm_
    torch.embedding_renorm_(weight, input, max_norm, norm_type)
RuntimeError: Expected tensor for argument #2 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding_renorm_)
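
A likely workaround (my assumption, not a fix confirmed by the maintainer): the traceback suggests the item-ID tensor reaching the embedding layer is int32, while the embedding lookup in PyTorch 1.6 requires int64 (Long) indices. Casting the indices before the lookup avoids the error, as in this minimal sketch:

import torch
import torch.nn as nn

# Minimal illustration of the cast; this is not the repository's actual code.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, max_norm=1.0)
iid = torch.tensor([1, 2, 3], dtype=torch.int32)  # int32 indices trigger the error on PyTorch 1.6
feat = embedding(iid.long())                      # casting to int64 (Long) avoids it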

This code chooses the best model on the test set instead of a validation set.

In utils/train.py, the training procedure returns the best MRR and the best HIT measured on the TEST SET, respectively:

def train(self, epochs, log_interval=100):
    # ...
    for epoch in range(epochs):
        self.model.train()
        for batch in self.train_loader:
            # ...
        mrr, hit = evaluate(self.model, self.test_loader, self.device)  # mrr and hit on test set
        # ...
        max_mrr = max(max_mrr, mrr)  # best mrr on test set
        max_hit = max(max_hit, hit)  # best hit on test set
        self.epoch += 1
    return max_mrr, max_hit  # <- It's NOT OK to return best mrr and hit on TEST SET, RESPECTIVELY

I believe this code is not the final version; please update it to the code used in your paper.
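
For reference, a minimal sketch of validation-based model selection (self.valid_loader and the single final test evaluation are my assumptions; they are not part of the repository):

def train(self, epochs, log_interval=100):
    best_mrr, best_state = float('-inf'), None
    for epoch in range(epochs):
        self.model.train()
        for batch in self.train_loader:
            ...  # training step omitted
        # choose the checkpoint by MRR on a hypothetical held-out validation split
        mrr, hit = evaluate(self.model, self.valid_loader, self.device)
        if mrr > best_mrr:
            best_mrr = mrr
            best_state = {k: v.detach().clone() for k, v in self.model.state_dict().items()}
        self.epoch += 1
    # report test metrics once, using the validation-selected model
    self.model.load_state_dict(best_state)
    return evaluate(self.model, self.test_loader, self.device)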

Question about data preprocessing

Even though the same preprocessing method is adopted, why is the number of items in Diginetica 42596 instead of 43097?
Most studies that use the same preprocessing method (such as SR-GNN) report 43097 items for Diginetica.
Thank you for your attention.

Another question about data preprocessing

Hi, I have a question about data preprocessing. Why do you use a different procedure for different datasets? For example, for Diginetica you did not remove immediate repeats, while for Gowalla and Last.fm short sessions were not removed. Thanks in advance.
