hkuds / sslrec
[WSDM'2024 Oral] "SSLRec: A Self-Supervised Learning Framework for Recommendation"
Home Page: https://arxiv.org/abs/2308.05697
License: Apache License 2.0
First of all, thank you for your valuable contribution to applying self-supervised learning to recommender systems.
I discovered that the kl_loss of the DCRec model becomes nan when training the DCRec_seq model:
python main.py --model dcrec_seq
[Epoch 49 / 50] loss: 5.5758 cl_loss: 0.0003 kl_loss: nan
The log line above is from the last epoch. After digging into the code, I found that the problem comes from the following code:
expected_weights_distribution = torch.normal(
    self.weight_mean, 0.1, size=mainstream_weights.size()).sort()[0].to(self.device)
kl_loss = self.kl_weight * F.kl_div(F.log_softmax(
    mainstream_weights, dim=0).sort()[0], expected_weights_distribution, reduction="batchmean")
Here, expected_weights_distribution is sampled from a normal distribution with mean self.weight_mean and standard deviation 0.1. However, the values returned by torch.normal may be non-positive, and a non-positive target causes the nan when the KL divergence is computed.
To work around this, I modified the code as follows:
expected_weights_distribution = torch.normal(
    self.weight_mean, 0.1, size=mainstream_weights.size()).sort()[0].to(self.device)
kl_loss = self.kl_weight * F.kl_div(F.log_softmax(
    mainstream_weights, dim=0).sort()[0], F.softmax(expected_weights_distribution, dim=-1), reduction="batchmean")
I simply apply the softmax function so that every value of expected_weights_distribution is positive. However, I am not sure whether the transformed values still follow a normal distribution.
I would appreciate your reply.
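The report above can be illustrated with a small standalone sketch. It is not the DCRec code: the tensor values and variable names are made up for illustration, and only F.kl_div, F.log_softmax and F.softmax are real PyTorch calls. F.kl_div expects log-probabilities as input and probabilities as target, so a raw sample with a negative entry is not a valid target. Depending on the PyTorch version and device, that negative entry may surface as nan.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for mainstream_weights (hypothetical values, one weight per item).
mainstream_weights = torch.randn(6)
log_pred = F.log_softmax(mainstream_weights, dim=0).sort()[0]

# A sorted target with one negative entry, as torch.normal can produce.
raw_target = torch.tensor([-0.05, 0.10, 0.30, 0.50, 0.70, 0.90])

# The pointwise term is target * (log(target) - input); log of a negative
# target is undefined, so this loss may be nan (version and device dependent).
raw_loss = F.kl_div(log_pred, raw_target, reduction="batchmean")

# Softmax maps the target to strictly positive values that sum to 1,
# which makes it a valid probability vector for kl_div.
safe_target = F.softmax(raw_target, dim=-1)
safe_loss = F.kl_div(log_pred, safe_target, reduction="batchmean")

print("raw target loss: ", raw_loss.item())
print("safe target loss:", safe_loss.item())
```

Note that softmax changes the shape of the target distribution, so whether it still matches the intended prior is a separate modelling question that this sketch does not answer.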
I specified GPU 3 in the configurator file, but the error reports that GPU 0 is out of memory. How should I handle this? Thanks.
The configurator.py file:
def parse_configure():
    parser = argparse.ArgumentParser(description='SSLRec')
    parser.add_argument('--model', type=str, default='SGL', help='model name')
    parser.add_argument('--dataset', type=str, default='iFashion', help='dataset name (yelp, gowalla, '
                        'iFashion, beauty)')
    parser.add_argument('--device', type=str, default='cuda', help='cpu or cuda')
    parser.add_argument('--cuda', type=str, default='3', help='Device number')
    args = parser.parse_args()
    if args.device == 'cuda':
        os.environ['CUDA_VISIBLE_DEVICES'] = args.cuda
        print('Used GPU ID:',args.cuda)
The output is as follows:
Used GPU ID: 3
SGL(
(edge_dropper): EdgeDrop()
(node_dropper): NodeDrop()
)
{'optimizer': {'name': 'adam', 'lr': 0.001, 'weight_decay': 0}, 'train': {'epoch': 300, 'batch_size': 4096, 'save_model': False, 'loss': 'pairwise', 'log_loss': False, 'test_step': 1, 'patience': 20, 'reproducible': True, 'seed': 2023, 'early_stop': True}, 'test': {'metrics': ['recall', 'ndcg'], 'k': [10, 20], 'batch_size': 2048}, 'data': {'type': 'general_cf', 'name': 'iFashion', 'user_num': 300000, 'item_num': 81613}, 'model': {'name': 'sgl', 'keep_rate': 0.5, 'layer_num': 2, 'reg_weight': 1e-05, 'cl_weight': 1.0, 'temperature': 0.2, 'embedding_size': 64, 'augmentation': 'edge_drop'}, 'tune': {'enable': False, 'hyperparameters': ['layer_num', 'reg_weight', 'cl_weight', 'temperature'], 'layer_num': [2, 3, 4], 'reg_weight': [0.0001, 1e-05, 1e-06, 1e-07, 1e-08], 'cl_weight': [0.01, 0.05, 0.1, 0.5, 1.0], 'temperature': [0.1, 0.2, 0.5, 1.0]}, 'device': 'cuda'}
Training Recommender: 0%| | 1/315 [00:05<27:43, 5.30s/it]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.58 GiB (GPU 0; 23.69 GiB total capacity; 5.42 GiB already allocated; 3.26 GiB free; 10.04 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Process finished with exit code 1
My GPU setup is as follows:
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.version.cuda)
print(torch.cuda.current_device())
print(torch.cuda.get_device_name())
Output:
True
4
11.7
0
NVIDIA GeForce RTX 3090
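One possible reading of the report above, sketched under assumptions: CUDA_VISIBLE_DEVICES is read when CUDA is initialized, and the devices it exposes are renumbered from 0. If the variable took effect, "GPU 0" in the error message would then be the single visible card (physical GPU 3), not physical GPU 0. If CUDA was already initialized before the variable was set, the setting would be ignored. The helper name select_physical_gpu below is hypothetical and not part of SSLRec.

```python
import os


def select_physical_gpu(physical_id: str) -> str:
    """Expose one physical GPU to this process and return its logical name."""
    # Must run before the first CUDA call; the safest place is before
    # `import torch` anywhere in the program.
    os.environ["CUDA_VISIBLE_DEVICES"] = physical_id
    # With a single visible device, CUDA renumbers it as device 0.
    return "cuda:0"


device = select_physical_gpu("3")
print(device, os.environ["CUDA_VISIBLE_DEVICES"])

# Import torch only after the variable is set, for example:
#   import torch
#   torch.cuda.get_device_name(0)  # would describe the selected physical card
```

Running nvidia-smi while training is a simple way to check which physical card actually holds the memory.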
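Hello authors,
Which CUDA version is required? I installed CUDA 10.0 and was told it cannot be used. Also, does this code require a GPU, and what is the minimum GPU memory needed to run it?
Hello, thank you for the open-source library. Does it support large-scale data? My task is sequential recommendation, and I have historical data for several hundred million users. Do you support training and inference at this scale? If so, what should I watch out for?
For the models under the general_cf folder, if I want to compare performance on the yelp dataset, could you provide the best hyperparameter settings for each model?
Hello, thank you very much for your work!
In the multi-behavior datasets, I found that test.txt in the retail_rocket dataset is empty. Could you update it with the complete content? Thank you very much!
How can I use my own dataset?
The gap between KGIN and KGRec reported in the SSLRec paper is actually not this large.
On the LastFM dataset, the two differ considerably after my first epoch.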
KGIN recall@10: 0.0076 recall@20: 0.0098 recall@40: 0.0124
KGRec recall@10: 0.0418 recall@20: 0.0562 recall@40: 0.0745
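On my own dataset the gap is also about 10x.
However, if I run your team's KGRec code, the gap is small and the results are close to those in the paper.
With the KGIN in the SSLRec repository, the gap is large. The following results are again on the LastFM dataset.
KGIN recall@20: 0.05926699 (KGRec)
KGIN recall@20: 0.0098 (SSLRec)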
I tried to use the classic Yelp2018 dataset from the LightGCN and SimGCL papers. I noticed that data_handler_general_cf.py does not provide an interface for adding new datasets, so I modified the file so that it can directly read the train.txt and test.txt files shipped with the official LightGCN code.
After completing this step, I ran LightGCN and SimGCL with the best parameters from the original papers, set as follows:
LightGCN: batch_size = 2048, layer_num=3, reg_weight=1.0e-4, embedding_size=64
SimGCL: batch_size = 2048, layer_num=3, reg_weight=1.0e-4, cl_weight=0.5, temperature=0.2, eps=0.05, embedding_size=64
Unfortunately, LightGCN and SimGCL do not seem to train properly at all. The loss of LightGCN stays fixed at 0.6931, while the Recall@20 of SimGCL is only 0.04. I then tuned the parameters of SimGCL further, but the highest Recall@20 I reached is only 0.066 (the best result in the original paper is 0.072).
I am confused by this, and I would like to ask whether SSLRec's sampling, training, or evaluation strategy differs from mainstream implementations in a way that could lead to the results above.
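A short worked example may help interpret the stuck loss value. It assumes the pairwise loss is the usual BPR form, -log(sigmoid(pos_score - neg_score)), which is an assumption about the configuration rather than something confirmed in this thread. Under that form, a loss of 0.6931 equals ln 2, which is what you get when positive and negative scores are identical, for example if the embeddings carry no signal.

```python
import math


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """BPR loss for one (user, positive item, negative item) triple."""
    diff = pos_score - neg_score
    # -log(sigmoid(diff)) rewritten as log(1 + exp(-diff)).
    return math.log1p(math.exp(-diff))


# Identical scores for the positive and the negative item.
stuck = bpr_loss(0.37, 0.37)
# A model that ranks the positive item clearly above the negative one.
learning = bpr_loss(2.0, -1.0)

print(round(stuck, 4))  # 0.6931
print(learning < stuck)  # True
```

If the loss never moves from this value, checking whether the loaded interaction matrix is non-empty and whether user and item indices are aligned would be a reasonable first step.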
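Hello! Thank you for your excellent work!
If I want to compare different models on a dataset, do I need to keep their number of LightGCN layers the same, or can I simply use the best setting for each model?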
Hello, I found out that the current implementation of the InfoNCE loss in SimGCL seems to use the entire embeddings (user_embeds2 and item_embeds2), but the original paper suggests using batch embeddings.
Could you explain this difference?
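The two variants discussed above can be contrasted with a small sketch. It is not taken from SSLRec or from the SimGCL authors' code; the function info_nce, its candidates parameter, and the random embeddings are hypothetical. The positive pair is always the same user or item in both views; the only difference is whether negatives come from the batch rows or from the whole embedding table.

```python
import torch
import torch.nn.functional as F


def info_nce(view1, view2, temperature, candidates=None):
    """InfoNCE where row i of view1 and row i of view2 form the positive pair.

    candidates=None contrasts against the batch rows of view2 only;
    passing a larger matrix contrasts against every row of that matrix.
    """
    view1 = F.normalize(view1, dim=1)
    view2 = F.normalize(view2, dim=1)
    candidates = view2 if candidates is None else F.normalize(candidates, dim=1)
    pos_score = (view1 * view2).sum(dim=1) / temperature
    all_scores = view1 @ candidates.T / temperature
    return (torch.logsumexp(all_scores, dim=1) - pos_score).mean()


torch.manual_seed(0)
all_embeds1 = torch.randn(100, 16)  # hypothetical embedding table, view 1
all_embeds2 = torch.randn(100, 16)  # hypothetical embedding table, view 2
batch_ids = torch.tensor([3, 17, 42, 58])

# Negatives restricted to the batch.
batch_loss = info_nce(all_embeds1[batch_ids], all_embeds2[batch_ids], 0.2)
# Negatives drawn from every row of the second view.
full_loss = info_nce(all_embeds1[batch_ids], all_embeds2[batch_ids], 0.2,
                     candidates=all_embeds2)
print(batch_loss.item(), full_loss.item())
```

Using the full table adds more negatives to the denominator, so the loss value is never smaller than the in-batch version for the same anchors, and the memory cost grows with the table size.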
Thank you, your work is excellent! However, I noticed that for some models the results produced in this framework differ considerably from those of the original papers' source code. What is the reason?
Thank you very much for your outstanding work.
When I run SGL on the Tiktok and Amazon_sports datasets, the final results are very good and surpass most of the baselines. Here are my parameters and training results. Is there something I have set up incorrectly?
(The source of the dataset is "MMSSL: Multi-Modal Self-Supervised Learning for Recommendation" from your team).
Amazon_sports
{'optimizer': {'name': 'adam', 'lr': 0.001, 'weight_decay': 0}, 'train': {'epoch': 300, 'batch_size': 4096, 'save_model': False, 'loss': 'pairwise', 'log_loss': False, 'test_step': 10, 'patience': 5, 'reproducible': True, 'seed': 2023, 'early_stop': True}, 'test': {'metrics': ['recall', 'precision', 'ndcg'], 'k': [20], 'batch_size': 1024}, 'data': {'type': 'general_cf', 'name': 'sports', 'user_num': 35598, 'item_num': 18357}, 'model': {'name': 'sgl', 'keep_rate': 0.5, 'layer_num': 2, 'reg_weight': 1e-05, 'cl_weight': 0.01, 'temperature': 0.1, 'embedding_size': 64, 'augmentation': 'node_drop'}
Validation set [recall@20: 0.0980 ] Validation set [precision@20: 0.0055 ] Validation set [ndcg@20: 0.0444 ]
Test set [recall@20: 0.0963 ] Test set [precision@20: 0.0051 ] Test set [ndcg@20: 0.0437 ]
Tiktok
{'optimizer': {'name': 'adam', 'lr': 0.001, 'weight_decay': 0}, 'train': {'epoch': 600, 'batch_size': 4096, 'save_model': False, 'loss': 'pairwise', 'log_loss': False, 'test_step': 10, 'patience': 5, 'reproducible': True, 'seed': 2023, 'early_stop': True}, 'test': {'metrics': ['recall', 'ndcg'], 'k': [10, 20], 'batch_size': 1024}, 'data': {'type': 'general_cf', 'name': 'tiktok', 'user_num': 9308, 'item_num': 6710}, 'model': {'name': 'sgl', 'keep_rate': 0.5, 'layer_num': 3, 'reg_weight': 1e-05, 'cl_weight': 1.0, 'temperature': 0.8, 'embedding_size': 64, 'augmentation': 'node_drop'}
Validation set [recall@20: 0.0993 ] Validation set [ndcg@20: 0.0429 ]
Test set [recall@20: 0.0905 ] Test set [ndcg@20: 0.0379 ]
Results of SGL reported in MMSSL:
Amazon_sports recall@20: 0.0779 ndcg@20: 0.0361
Tiktok recall@20: 0.0603 ndcg@20: 0.0238
Sincerely.
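Hello, the sim function used on lines 93 and 94 of your DCRec model implementation is not defined.
I would like to know the model performance reproduced by SSLRec for General Collaborative Filtering on the Yelp and Amazon datasets. If you have experimental results, could you share them? Thanks a lot.
Hello, excellent work. Does your group plan to add more social recommendation models in the future? I noticed that DGNN has not yet been added to the framework.
Hello, thank you very much for sharing the code; I have run it successfully. However, I found that the metrics of the multi-behavior models (e.g. CML, KMCLR) are far lower than those in the original papers. For example, on the IJCAI_15 dataset, the code released by the original CML authors reaches an NDCG@10 of 0.283, whereas with your team's code I only get an NDCG@10 of 0.0224. As a beginner, I cannot understand why the same model, dataset, and metric would differ by a factor of ten. Could you help clear up my confusion? Many thanks.
Hello!
While looking at the KG models, I think I found a bug in eval: KGTestDataset returns only u, but eval unpacks <u, i>.
Sorry, I have not yet read the evaluation code closely. I hope you can find time to answer. Best wishes 🙏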
# dataset_kg.py
class KGTestDataset(data.Dataset):
    def __init__(self, test_user_dict) -> None:
        self.user_pos_lists = test_user_dict
        self.test_users = np.array(list(test_user_dict.keys()))
    def __getitem__(self, idx):
        return self.test_users[idx]
# metrics.py
class Metric(object):
    def eval_at_one_forward(self, model, test_dataloader):
        for _, tem in enumerate(test_dataloader):
            batch_u, batch_i = batch_data[:2]
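I also have another question: I would like to build a sequential model that incorporates a KG. Do you have any suggestions on the dataset format within this framework? (And do you know of any KG+SRS datasets?)
Hello, I do not quite understand the meaning of the columns in the pkl files under your datasets. The generated kg.txt also has three columns; what does each one represent? Could you give a brief explanation of the datasets? I look forward to your reply. Thank you very much.
Thank you very much for open-sourcing the code! I learned a lot from reading it!
We would like to run experiments on more datasets, but there is no documentation on how the dataset files are generated, such as the detailed generation process for each dataset under datasets/kg, so I cannot generate the files for other datasets. Looking forward to your reply ❤ Thank you very much!
Hello, thank you for contributing the code.
For the tmall dataset in multi behavior, how are the train set and test set split?
I have a question: when saving the model, would it be better to also save the optimizer state?
Model config file: there is only the save_model option (True or False), and nothing about the optimizer.
train.py: I looked at the save_model function in train.py, and it does not save the optimizer.
Or is the optimizer state saved somewhere else, and I just missed it?
I hope you can clarify 😃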
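As a point of reference for the question above, here is a minimal sketch of a checkpoint that stores both model and optimizer state. It is not SSLRec's save_model function; save_checkpoint, load_checkpoint and the toy model are hypothetical, and only torch.save, torch.load and the state_dict methods are real PyTorch calls. Saving the optimizer mainly matters when training is resumed, since optimizers such as Adam keep per-parameter running statistics.

```python
import io

import torch
from torch import nn


def save_checkpoint(model, optimizer, epoch, target):
    """Write model weights, optimizer state and epoch to a path or file object."""
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, target)


def load_checkpoint(model, optimizer, source):
    """Restore both states in place and return the stored epoch."""
    checkpoint = torch.load(source, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["epoch"]


# Toy model and one optimizer step so that Adam has state to save.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

buffer = io.BytesIO()  # an in-memory file; a filesystem path works the same way
save_checkpoint(model, optimizer, epoch=3, target=buffer)
buffer.seek(0)

new_model = nn.Linear(4, 2)
new_optimizer = torch.optim.Adam(new_model.parameters(), lr=1e-3)
resumed_epoch = load_checkpoint(new_model, new_optimizer, buffer)
print(resumed_epoch)
```

If checkpoints are only used for final evaluation, the model weights alone are sufficient and the file stays smaller.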
Can the evaluation metrics in SSLRec support AUC-ROC?
Hello authors,
When reproducing the multi-behavior model kmclr, loading the dataset file with pickle.load raises the following error:
Traceback (most recent call last):
File "C:\Users\zwl\Downloads\SSLRec-main\data_utils\data_handler_multi_behavior.py", line 49, in _load_data
data = pickle.load(fs)
ModuleNotFoundError: No module named 'scipy.sparse._csr'
My scipy version is 1.7.3. Do you know why this happens? Looking forward to your reply.
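A likely explanation, offered as an assumption rather than a confirmed diagnosis: the pickle was written with a newer SciPy in which the sparse classes live in underscore-prefixed modules such as scipy.sparse._csr, while SciPy 1.7.3 still names that module scipy.sparse.csr. Upgrading SciPy is the simplest fix. If that is not possible, the sketch below shows a generic way to remap module names while unpickling. RenamingUnpickler, SCIPY_MODULE_MAP and load_with_renamed_modules are hypothetical names, the mapping is an assumption, and remapping may still fail if the class internals differ between versions.

```python
import io
import pickle


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects selected module names before importing them."""

    def __init__(self, file, module_map):
        super().__init__(file)
        self.module_map = module_map

    def find_class(self, module, name):
        # Fall back to the original module name when no mapping exists.
        return super().find_class(self.module_map.get(module, module), name)


# Assumed mapping from newer private module names to the SciPy 1.7 names.
SCIPY_MODULE_MAP = {"scipy.sparse._csr": "scipy.sparse.csr"}


def load_with_renamed_modules(path, module_map=SCIPY_MODULE_MAP):
    """Load a pickle file while applying the module name mapping."""
    with open(path, "rb") as fs:
        return RenamingUnpickler(fs, module_map).load()


# Self-contained check of the remapping using a standard-library class.
demo = RenamingUnpickler(io.BytesIO(b""), {"legacy_name": "decimal"})
resolved = demo.find_class("legacy_name", "Decimal")
print(resolved)
```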
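Hello, I tried to run the social recommendation models, but the file social_yelp/trust_mat.pkl cannot be found.
Hello!
Does SSLRec have experimental results for its models and datasets (ideally compared with the results in the original papers)? Without them, we have nothing to compare our own metrics against after running the models.
Best wishes 🙏
Great work, and thank you very much for open-sourcing the code.
We would like to run experiments on more datasets. However, we could not find any description of the dataset files, for example all the files under datasets/social/epinions/, so we do not know how to generate the data files the framework requires from our own dataset.
Looking forward to your reply.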
Hello, thank you very much for developing this framework. Which version of the DGL library does this framework use? Looking forward to your reply.
Multi-behavior Self-supervised Learning for Recommendation ( https://arxiv.org/pdf/2305.18238.pdf ) from SIGIR'23
code link: https://github.com/Scofield666/MBSSL
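Will this work be added to the framework?
MBSSL integrates SSL into multi-behavior recommendation for optimization and is very representative.
🚀 🚀
Hello, when I use the SMIN model in SSLRec, why is the loss always nan? It has been like this since the first epoch.
Hello! How can I download the KG data, for example the mind data below? Could you share the source for it?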
class DataHandlerKG:
    def __init__(self) -> None:
        if configs['data']['name'] == 'mind':
            predir = './datasets/kg/mind_kg/'
        elif configs['data']['name'] == 'amazon-book':
            predir = './datasets/kg/amazon-book_kg/'
        elif configs['data']['name'] == 'last-fm':
            predir = './datasets/kg/last-fm_kg/'
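Hello, when I run the epinions dataset with your framework, I am told that the file './datasets/social/epinions/category.pkl' does not exist.
Hello! Thank you very much for your work!
I noticed that the paper "Debiased Contrastive Learning for Sequential Recommendation", i.e. the DCRec model in the framework, does not give details on how the data was selected from each dataset. For example, the ML-20M dataset contains 20 million data points, but the experiments use only a small part of it. I would therefore like to ask for your help.
Could you share the filtered raw dataset used in the experiments in CSV format, or describe how the data was split, for academic use only? If you can provide the dataset, I would be very grateful and will cite your work in my research. If you need more information or have any questions, please feel free to contact me.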
Hello, thank you for your valuable work.
Problem description:
I found the following warning when running the project with python main.py --model LightGCN --device 'cpu':
/SSLRec/models/loss_utils.py:96: SyntaxWarning: "is" with a literal. Did you mean "=="?
if reduce is 'mean':
Problem cause:
Starting from Python 3.8, using the "is" and "is not" operators with a literal raises a SyntaxWarning.
Problem solution:
Replace "is" and "is not" with "==" and "!=" in the corresponding statements.
By the way, there is also a RuntimeWarning in /SSLRec/data_utils/data_handler_general_cf.py:47:
RuntimeWarning: divide by zero encountered in power
d_inv_sqrt = np.reshape(np.power(degree, -0.5), [-1])
It seems that the degree array contains a zero.
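Both warnings can be sketched in a few lines. This is not the SSLRec code: reduce_loss and inv_sqrt_degree are hypothetical helpers, and setting the inverse square root of a zero degree to 0 is a common convention in graph normalization code, assumed here rather than confirmed by the maintainers.

```python
import numpy as np


def reduce_loss(values, reduce="mean"):
    """Compare strings by value with "==" rather than by identity with "is"."""
    if reduce == "mean":
        return float(np.mean(values))
    if reduce == "sum":
        return float(np.sum(values))
    raise ValueError(f"unknown reduce mode: {reduce!r}")


def inv_sqrt_degree(degree):
    """Return degree ** -0.5, leaving entries with zero degree at 0."""
    degree = np.asarray(degree, dtype=np.float64)
    d_inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    # Only positive degrees are raised to the power, so no division by zero.
    d_inv_sqrt[nonzero] = np.power(degree[nonzero], -0.5)
    return d_inv_sqrt


print(reduce_loss([1.0, 3.0]))
print(inv_sqrt_degree([4, 0, 1]))
```

A zero degree usually means a user or item with no training interactions, so it may also be worth checking why such rows exist in the data.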