hkuds / SSLRec


[WSDM'2024 Oral] "SSLRec: A Self-Supervised Learning Framework for Recommendation"

Home Page: https://arxiv.org/abs/2308.05697

License: Apache License 2.0

Python 100.00%
collaborative-filtering contrastive-learning graph-neural-networks recommender-systems self-supervised-learning deep-learning recommendation-algorithms user-behavior

sslrec's Issues

NaN problem of dcrec_seq model

First of all, I am grateful for your valuable contribution to applying self-supervised learning to recommender systems.

Problem Description

I discovered that the kl_loss of the DCRec model becomes NaN during training of the dcrec_seq model.

Steps to reproduce

  1. Run python main.py --model dcrec_seq
  2. Check the loss log; at the last epoch it reads [Epoch 49 / 50] loss: 5.5758 cl_loss: 0.0003 kl_loss: nan

Problem Cause

Digging into the code, I found that the problem is caused by the following lines:

expected_weights_distribution = torch.normal(
    self.weight_mean, 0.1, size=mainstream_weights.size()).sort()[0].to(self.device)
kl_loss = self.kl_weight * F.kl_div(F.log_softmax(
    mainstream_weights, dim=0).sort()[0], expected_weights_distribution, reduction="batchmean")

Here, expected_weights_distribution is sampled from a normal distribution with mean self.weight_mean and standard deviation 0.1. However, values drawn by torch.normal can be non-positive, and since F.kl_div treats its target as a probability distribution (taking log(target) internally), non-positive entries produce NaN in the KL-divergence calculation.

Problem Solution

Given this cause, I modified the code as follows:

expected_weights_distribution = torch.normal(
    self.weight_mean, 0.1, size=mainstream_weights.size()).sort()[0].to(self.device)
kl_loss = self.kl_weight * F.kl_div(F.log_softmax(
    mainstream_weights, dim=0).sort()[0], F.softmax(expected_weights_distribution, dim=-1), reduction="batchmean")

I simply apply softmax to ensure the values of expected_weights_distribution are positive. However, I am not sure whether the resulting values still follow a normal distribution.
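For illustration, here is a minimal standalone sketch (my own, not the repo's code) showing why a raw normal sample used as the F.kl_div target can produce NaN, and how softmax-normalizing the target avoids it:

# Minimal sketch (not the repo's code): F.kl_div computes target * (log(target) - input),
# so any non-positive entry in the target makes the result NaN.
import torch
import torch.nn.functional as F

mainstream_weights = torch.randn(4)
log_probs = F.log_softmax(mainstream_weights, dim=0).sort()[0]

# A target with one non-positive entry, as torch.normal can produce when
# self.weight_mean is small relative to the std of 0.1.
target = torch.tensor([-0.02, 0.01, 0.05, 0.12])

print(F.kl_div(log_probs, target, reduction="batchmean"))                     # tensor(nan)
print(F.kl_div(log_probs, F.softmax(target, dim=-1), reduction="batchmean"))  # finite value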

I would appreciate your reply.

"Cuda Kernel Error" when running SSLRec

  • Problem description: I ran the LightGCN code in SSLRec directly, and when stepping into EdgeDrop a "Cuda Kernel Error" is raised. Do the authors know the cause? (The error screenshot and GPU usage screenshot were attached to the original issue.)
  • Test environment:
    • torch==1.11.0
    • scipy==1.11.4
    • numpy==1.26.4

Thank you very much for the code your team has shared. I have one question I would like to ask your team.

I specified GPU 3 in the configurator file, but the error says GPU 0 ran out of memory. How should I handle this situation? Thanks.

configurator.py:

import argparse
import os

def parse_configure():
    parser = argparse.ArgumentParser(description='SSLRec')
    parser.add_argument('--model', type=str, default='SGL', help='model name')
    parser.add_argument('--dataset', type=str, default='iFashion', help='dataset name (yelp, gowalla, '
                                                                        'iFashion, beauty)')
    parser.add_argument('--device', type=str, default='cuda', help='cpu or cuda')
    parser.add_argument('--cuda', type=str, default='3', help='Device number')
    args = parser.parse_args()

    if args.device == 'cuda':
        os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda
        print('Used GPU ID:', args.cuda)

The output is as follows:

Used GPU ID: 3
SGL(
(edge_dropper): EdgeDrop()
(node_dropper): NodeDrop()
)
{'optimizer': {'name': 'adam', 'lr': 0.001, 'weight_decay': 0}, 'train': {'epoch': 300, 'batch_size': 4096, 'save_model': False, 'loss': 'pairwise', 'log_loss': False, 'test_step': 1, 'patience': 20, 'reproducible': True, 'seed': 2023, 'early_stop': True}, 'test': {'metrics': ['recall', 'ndcg'], 'k': [10, 20], 'batch_size': 2048}, 'data': {'type': 'general_cf', 'name': 'iFashion', 'user_num': 300000, 'item_num': 81613}, 'model': {'name': 'sgl', 'keep_rate': 0.5, 'layer_num': 2, 'reg_weight': 1e-05, 'cl_weight': 1.0, 'temperature': 0.2, 'embedding_size': 64, 'augmentation': 'edge_drop'}, 'tune': {'enable': False, 'hyperparameters': ['layer_num', 'reg_weight', 'cl_weight', 'temperature'], 'layer_num': [2, 3, 4], 'reg_weight': [0.0001, 1e-05, 1e-06, 1e-07, 1e-08], 'cl_weight': [0.01, 0.05, 0.1, 0.5, 1.0], 'temperature': [0.1, 0.2, 0.5, 1.0]}, 'device': 'cuda'}
Training Recommender: 0%| | 1/315 [00:05<27:43, 5.30s/it]

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.58 GiB (GPU 0; 23.69 GiB total capacity; 5.42 GiB already allocated; 3.26 GiB free; 10.04 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Process finished with exit code 1

My GPU setup is as follows:
import torch

print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.version.cuda)
print(torch.cuda.current_device())
print(torch.cuda.get_device_name())

Output:
True
4
11.7
0
NVIDIA GeForce RTX 3090
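For context, a small hypothetical probe (not SSLRec code) of how CUDA_VISIBLE_DEVICES interacts with PyTorch's device numbering, assuming it is set before CUDA is first initialized; with only GPU 3 visible, PyTorch re-indexes it as device 0, so an error that mentions "GPU 0" may still refer to the physical GPU 3:

# Hypothetical probe: CUDA_VISIBLE_DEVICES must be set before the first CUDA call.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '3'  # expose only the physical GPU 3

import torch
print(torch.cuda.device_count())      # expected: 1 (only GPU 3 is visible)
print(torch.cuda.current_device())    # expected: 0 (GPU 3 is re-indexed as device 0)
print(torch.cuda.get_device_name(0))  # name of the physical GPU 3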

CUDA version question

Hello,
which CUDA version is required? I installed CUDA 10.0 and it says it cannot be used. Also, does this code require a GPU, and what is the minimum GPU (memory) needed to run it?

Support for large-scale data

Hello, thank you for the open-source library. I would like to ask whether it supports very large datasets. My task is sequential recommendation, and I have interaction histories for several hundred million users. Do you support training and inference at this scale, and if so, what should I pay attention to?

About other datasets

For the models under the general_cf folder, if I want to compare their performance on the Yelp dataset, could you provide the best hyperparameter settings for each model?

The gap between KGIN in the SSLRec repository and in the KGRec repository is too large in my tests

The gap between KGIN and KGRec reported in the SSLRec paper is not this large. (A screenshot of the reported numbers was attached to the original issue.)

On the LastFM dataset, the two already differ substantially in my first epoch:

KGIN recall@10: 0.0076 recall@20: 0.0098 recall@40: 0.0124
KGRec recall@10: 0.0418 recall@20: 0.0562 recall@40: 0.0745 

On my own dataset the gap is also roughly 10x.
However, if I run your team's KGRec repository, the results are close to those reported in the paper.
But the KGIN in the SSLRec repository is far off. The numbers below are again on the LastFM dataset:

KGIN recall@20 0.05926699 (KGRec)
KGIN recall@20: 0.0098 (SSLRec)

SSLRec does not seem to be able to effectively reproduce some of the baseline results

I tried the classic Yelp2018 dataset used in the LightGCN and SimGCL papers. Since data_handler_general_cf.py does not expose an interface for adding new datasets, I modified it to directly read the train.txt and test.txt files provided by the official LightGCN code.
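For reference, a rough sketch (not my exact modification, and the file layout is assumed) of how such files can be read: each line of LightGCN's train.txt / test.txt is a user id followed by its item ids, which can be loaded into a scipy sparse interaction matrix:

# Rough sketch (assumed layout): each line is "<user_id> <item_id_1> <item_id_2> ...".
import numpy as np
from scipy.sparse import coo_matrix

def read_lightgcn_txt(path, user_num, item_num):
    rows, cols = [], []
    with open(path) as f:
        for line in f:
            ids = line.strip().split()
            if len(ids) < 2:
                continue  # user with no interactions on this split
            user, items = int(ids[0]), [int(i) for i in ids[1:]]
            rows.extend([user] * len(items))
            cols.extend(items)
    vals = np.ones(len(rows), dtype=np.float32)
    return coo_matrix((vals, (rows, cols)), shape=(user_num, item_num))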

After completing this step, I ran LightGCN and SimGCL with the best parameters reported in the original papers, set as follows:
LightGCN: batch_size = 2048, layer_num=3, reg_weight=1.0e-4, embedding_size=64
SimGCL: batch_size = 2048, layer_num=3, reg_weight=1.0e-4, cl_weight=0.5, temperature=0.2, eps=0.05, embedding_size=64

Unfortunately, LightGCN and SimGCL do not seem to train properly at all: the loss of LightGCN stays fixed at 0.6931, while the Recall@20 of SimGCL is only 0.04. I subsequently tuned SimGCL's parameters further, but the best performance is only 0.066 (the best result in the original paper is 0.072).
I am confused by this and would like to ask whether SSLRec's sampling, training, or evaluation strategy differs from mainstream implementations in a way that could explain the above.

About the number of LightGCN layers

Hello! Thank you for your excellent work!
If I want to compare different models on a given dataset, do I need to keep their numbers of LightGCN layers the same, or can each model simply use its own optimal setting?

SimGCL: InfoNCE Loss Calculation Batch vs. Entire Embeddings

Hello, I found that the current implementation of the InfoNCE loss in SimGCL seems to use the entire embedding tables (user_embeds2 and item_embeds2), whereas the original paper suggests using only the in-batch embeddings.
Could you explain this difference?
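To make the distinction concrete, here is an illustrative sketch (assumed names and shapes, not SSLRec's actual implementation) of InfoNCE where the denominator runs either over the full embedding table or only over the in-batch embeddings:

# Illustrative sketch (not SSLRec's code). `candidates` is what the denominator runs over:
#   full-table variant: candidates = the entire second-view user (or item) embedding table
#   in-batch variant:   candidates = the second-view embeddings of the sampled batch only
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, candidates, temperature=0.2):
    anchor = F.normalize(anchor, dim=-1)          # [B, d]
    positive = F.normalize(positive, dim=-1)      # [B, d]
    candidates = F.normalize(candidates, dim=-1)  # [N, d]
    pos_logit = (anchor * positive).sum(-1) / temperature  # [B]
    all_logits = anchor @ candidates.t() / temperature     # [B, N]
    return (torch.logsumexp(all_logits, dim=-1) - pos_logit).mean()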

Result discrepancies

Thank you, your work is excellent! But I noticed that for some models, the results obtained with this framework differ considerably from those of the original papers' official code. What could be the reason?

About the magnitude of the evaluation metrics

Hello! I found that on the same dataset (the Retail Rocket dataset), the NDCG@10 values reported by SSLRec and by the original paper (CML) differ by an order of magnitude. Why is that?
(Screenshots of the metrics from the original CML paper and from SSLRec were attached to the original issue.)

The SGL results in SSLRec seem surprisingly strong

Thank you very much for your outstanding work.

When I use SGL to run the Tiktok and Amazon_sports datasets, the final results are very good and surpass most of the baselines. Here are my parameters and training results; is there something I have set up incorrectly?
(The datasets come from "MMSSL: Multi-Modal Self-Supervised Learning for Recommendation" by your team.)

Amazon_sports
{'optimizer': {'name': 'adam', 'lr': 0.001, 'weight_decay': 0}, 'train': {'epoch': 300, 'batch_size': 4096, 'save_model': False, 'loss': 'pairwise', 'log_loss': False, 'test_step': 10, 'patience': 5, 'reproducible': True, 'seed': 2023, 'early_stop': True}, 'test': {'metrics': ['recall', 'precision', 'ndcg'], 'k': [20], 'batch_size': 1024}, 'data': {'type': 'general_cf', 'name': 'sports', 'user_num': 35598, 'item_num': 18357}, 'model': {'name': 'sgl', 'keep_rate': 0.5, 'layer_num': 2, 'reg_weight': 1e-05, 'cl_weight': 0.01, 'temperature': 0.1, 'embedding_size': 64, 'augmentation': 'node_drop'}

Validation set [recall@20: 0.0980 ] Validation set [precision@20: 0.0055 ] Validation set [ndcg@20: 0.0444 ]
Test set [recall@20: 0.0963 ] Test set [precision@20: 0.0051 ] Test set [ndcg@20: 0.0437 ]

Tiktok
{'optimizer': {'name': 'adam', 'lr': 0.001, 'weight_decay': 0}, 'train': {'epoch': 600, 'batch_size': 4096, 'save_model': False, 'loss': 'pairwise', 'log_loss': False, 'test_step': 10, 'patience': 5, 'reproducible': True, 'seed': 2023, 'early_stop': True}, 'test': {'metrics': ['recall', 'ndcg'], 'k': [10, 20], 'batch_size': 1024}, 'data': {'type': 'general_cf', 'name': 'tiktok', 'user_num': 9308, 'item_num': 6710}, 'model': {'name': 'sgl', 'keep_rate': 0.5, 'layer_num': 3, 'reg_weight': 1e-05, 'cl_weight': 1.0, 'temperature': 0.8, 'embedding_size': 64, 'augmentation': 'node_drop'}

Validation set [recall@20: 0.0993 ] Validation set [ndcg@20: 0.0429 ]
Test set [recall@20: 0.0905 ] Test set [ndcg@20: 0.0379 ]

Results of SGL in MMSSL.
Amazon_sports recall@20: 0.0779 ndcg@20: 0.0361
Tiktok recall@20: 0.0603 ndcg@20: 0.0238

Sincerely.

Model performance

I would like to know the model performance reproduced by SSLRec for general collaborative filtering on the Yelp and Amazon datasets. If experimental results are available, could you share them? Thank you very much.

Adding social rec models

Hello, excellent work. Is your group considering adding more social rec models in the future? I noticed that DGNN has not yet been added to this framework.

About the evaluation metrics

Hello, thank you very much for the code your team has shared; I have successfully reproduced the results. However, during reproduction I found that the metrics of the multi-behavior models (e.g., CML, KMCLR) are far below those in the original papers. For example, on the IJCAI_15 dataset, the code provided by the CML authors reaches NDCG@10 of 0.283, while my reproduction with your code only reaches 0.0224. As a beginner, I cannot understand why the same model, on the same dataset and with the same metric, can differ by a factor of ten. Could you clear up my confusion? Many thanks.

About KG Data

Hello!
While reading the KG models, I noticed what seems to be a bug in eval: KGTestDataset only returns u, but eval consumes <u, i>.
Sorry, I haven't read the evaluation code closely yet; I would appreciate it if you could take a moment to answer. Best 🙏

# dataset_kg.py
import numpy as np
from torch.utils import data

class KGTestDataset(data.Dataset):
    def __init__(self, test_user_dict) -> None:
        self.user_pos_lists = test_user_dict
        self.test_users = np.array(list(test_user_dict.keys()))

    def __getitem__(self, idx):
        # only the user id is returned here
        return self.test_users[idx]

# metrics.py
class Metric(object):
    def eval_at_one_forward(self, model, test_dataloader):
        for _, tem in enumerate(test_dataloader):
            # but both a user and an item are unpacked from the batch
            batch_u, batch_i = batch_data[:2]

Also, a question I would like to ask: I want to build a sequential model that incorporates a KG. Within this framework, do you have any suggestions on the dataset format? (And do you happen to know of any relevant KG + SRS datasets?)

About the datasets for multi-behavior recommendation

Hello, thank you very much for the code your team has shared.
I have some questions about the multi-behavior datasets in SSLRec.
Taking the dataset/tmall dataset as an example:

  1. What is meta_multi_single_beh_user_index_shuffle used for?
  2. What do files named like train_mat_pv_buy.pkl and train_mat_pv_fav_buy.pkl represent?
  3. How does train_mat.pkl relate to the other files?

Looking forward to your reply!

In addition, could you consider adding some brief data documentation 📝 to improve the readability of the framework code?

About building one's own KG dataset

Thank you very much for the open-source code; I learned a lot from reading it!

We would like to run experiments on more datasets, but there is no documentation on how the dataset files are generated, e.g., the detailed process for producing each dataset under datasets/kg, so we cannot generate the corresponding files for other datasets. Looking forward to your team's reply ❤ Many thanks!

Train/test split

Hello, thank you for contributing the code.
For the multi-behavior Tmall dataset, how are the train set and the test set split?

Saving model parameters and optimizer parameters

I would like to ask: when saving the model, would it be better to also save the optimizer state?

Model config files: there is only a save_model option (True/False), nothing about the optimizer.
train.py: I see that the save_model function in train.py does not save the optimizer.

Or is the optimizer-saving code written somewhere else and I simply missed it?
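For reference, a hypothetical sketch (not SSLRec's actual save_model) of what I mean by checkpointing both the model and the optimizer:

# Hypothetical sketch: saving the optimizer state as well lets training resume
# with the Adam moments intact.
import torch

def save_checkpoint(model, optimizer, epoch, path='checkpoint.pt'):
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path='checkpoint.pt'):
    ckpt = torch.load(path, map_location='cpu')
    model.load_state_dict(ckpt['model_state_dict'])
    optimizer.load_state_dict(ckpt['optimizer_state_dict'])
    return ckpt['epoch']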

Hoping for an answer 😃

pickle raises ModuleNotFoundError: No module named 'scipy.sparse._csr' when loading files (scipy version 1.7.3); hoping for the author's help, many thanks

Hello,
while reproducing the multi-behavior model KMCLR, loading the dataset files with pickle.load raised the following error:

Traceback (most recent call last):
File "C:\Users\zwl\Downloads\SSLRec-main\data_utils\data_handler_multi_behavior.py", line 49, in _load_data
data = pickle.load(fs)
ModuleNotFoundError: No module named 'scipy.sparse._csr'

My scipy version is also 1.7.3. Do you know why this happens? Looking forward to your reply.

Model leaderboard

Hello!
Does SSLRec have experimental results per model + dataset (ideally compared against the results of the original papers)? Without them, after running experiments ourselves we have nothing to compare our metrics against.

Best 🙏

How to build one's own dataset

Great work, and thank you very much for the open-source code.

We would like to run experiments on more datasets. However, we could not find documentation about the dataset files, e.g., all the files under datasets/social/epinions/, so we do not know how to generate the data files the framework needs from our own data.

Looking forward to your team's reply~

The version of the DGL package

Hello, thank you very much for developing this framework. Which version of the DGL library does the framework use? Looking forward to your reply.

Small question about DuoRec

As shown in the attached screenshot, why is this computed twice? Wouldn't that slow things down?

About the KG dataset

Hello! Could you tell us how to obtain the KG data, for example the mind data referenced below?

class DataHandlerKG:
    def __init__(self) -> None:
        if configs['data']['name'] == 'mind':
            predir = './datasets/kg/mind_kg/'
        elif configs['data']['name'] == 'amazon-book':
            predir = './datasets/kg/amazon-book_kg/'
        elif configs['data']['name'] == 'last-fm':
            predir = './datasets/kg/last-fm_kg/'

About dataset filtering

Hello! Thank you very much for your work!
I noticed that the paper "Debiased Contrastive Learning for Sequential Recommendation", i.e., the DCRec model in this framework, does not give details on how the datasets were filtered. For example, the ML-20M dataset contains 20 million data points, but only a small portion of it is used in the experiments. I would therefore like to ask for your help.
Could you share the filtered raw datasets used in the experiments in CSV format, or describe the data-splitting method, for academic exchange only? If you can provide the data, I will be very grateful and will cite your work in my research. If you need more information or have any questions, please feel free to contact me.

About DGL's CUDA version compatibility

When running DCRec, I was prompted that the CUDA build of DGL is required. I picked a CUDA build from the DGL website (2.2.1+cu121), but on my machine it is incompatible with torch and some other libraries.
My system is Windows; a friend running Ubuntu did not hit a similar problem. How should I resolve this? Thanks a lot!

[Warning Report] "is" with a literal

Hello, thank you for your valuable work.

Problem description:
I found the following warning when running the project with python main.py --model LightGCN --device 'cpu':
/SSLRec/models/loss_utils.py:96: SyntaxWarning: "is" with a literal. Did you mean "=="? if reduce is 'mean':

Problem cause:
Since Python 3.8, using the "is" or "is not" operator to compare against a literal raises a SyntaxWarning, because identity comparison with literals is unreliable; equality comparison should be used instead.

Problem solution:
Replace "is" and "is not" with "=="" and "!="" in the corresponding statements.

By the way, there is also another RuntimeWarning in /SSLRec/data_utils/data_handler_general_cf.py:47:
RuntimeWarning: divide by zero encountered in power
  d_inv_sqrt = np.reshape(np.power(degree, -0.5), [-1])

It seems that some node in the data has degree zero.
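A common guard (just a sketch, not necessarily how the handler should be changed) is to zero out the infinities produced by zero-degree nodes:

# Sketch: a node with degree 0 gives 0 ** -0.5 = inf; replacing those entries with 0
# keeps the normalized adjacency finite.
import numpy as np

degree = np.array([4.0, 0.0, 2.0])
with np.errstate(divide='ignore'):
    d_inv_sqrt = np.reshape(np.power(degree, -0.5), [-1])
d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0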
