ustcml / recstudio

A highly-modularized and recommendation-efficient recommendation library based on PyTorch.

License: MIT License

Python 92.80% Jupyter Notebook 7.20%

collaborative-filtering ctr-prediction deep-learning factorization-machines graph-neural-networks knowledge-graph matrix-factorization pytorch recommender-system sequential-recommendation

recstudio's Issues

Wrong entry names in some dataset config files

Interactions in inter_feat whose ratings are lower than low_rating_thres are filtered out by the following code.

RecStudio/recstudio/data/dataset.py

Lines 486 to 487 in 2bd40a8

    def _filter(self, min_user_inter, min_item_inter):
        self._filter_ratings(self.config.get('low_rating_thres', None))

However, the corresponding entries are misspelled as low_rating_threshold in the following dataset config files:

amazon-beauty
amazon-books
amazon-electronics
gowalla
ml-10m
ml-20m
tmall
yelp
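
The effect of the mismatch can be sketched with plain dictionaries standing in for a dataset config. This is only an illustration: the threshold value 3.0 and the helper name lookup_threshold are hypothetical, and the real config files are not reproduced here. Because the lookup uses the key low_rating_thres, a config that only sets low_rating_threshold falls back to None instead of the configured value.

```python
# Hypothetical config contents (assumption: the value 3.0 is made up for illustration).
misspelled_config = {"low_rating_threshold": 3.0}
correct_config = {"low_rating_thres": 3.0}


def lookup_threshold(config):
    # Same lookup as in the quoted _filter code: a missing key yields None.
    return config.get("low_rating_thres", None)


print(lookup_threshold(misspelled_config))  # None: the misspelled key is never read
print(lookup_threshold(correct_config))     # 3.0
```

Renaming the entry in the listed config files to low_rating_thres would make the lookup return the intended value.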

Why are user_hist and user_count of val_data not contained in val_data itself?

The user history (uh) and user count (uc) of trn_data are added to trn_data, val_data, and tst_data.
The uh and uc of val_data are added to tst_data, but why are they not added to val_data itself?

The code is at the end of _build().

InfoNCE loss

The temperature hyperparameter seems to be missing from the InfoNCE loss function in RecStudio.
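
For reference, the standard InfoNCE formulation divides every score by a temperature before the softmax. The following is a minimal pure-Python sketch of that formulation for a single positive and a list of negatives. It is not RecStudio's implementation, and the function name info_nce and its parameters are assumptions made for illustration.

```python
import math


def info_nce(pos_score, neg_scores, temperature=1.0):
    """InfoNCE loss for one positive score and a list of negative scores.

    Computes -log(exp(pos / t) / (exp(pos / t) + sum(exp(neg / t)))).
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# With the same scores, a lower temperature sharpens the softmax.
loss_t1 = info_nce(1.0, [0.0, -1.0], temperature=1.0)
loss_t01 = info_nce(1.0, [0.0, -1.0], temperature=0.1)
```

With temperature=1.0 the function reduces to the untempered softmax cross-entropy, which is the behavior the issue describes.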

Where to find LSH Sampling

In your paper (see "Table 5: Samplers in RecStudio"), you mention that LSH-based samplers have been implemented, but I cannot find them in your code.

Failed to open http://recstudio.org.cn/

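The official website cannot be opened, so I cannot access the official documentation.
I am using this library and some parts are unclear to me. I would appreciate some help, or access to the documentation.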

AE models output invalid results on part of datasets

I found that MultiVAE and MultiDAE both output nan recall on ml-1m, while they work correctly on ml-100k and gowalla. BPR (an MF model) and LightGCN (a graph model) behave normally on all three datasets, so I suspect the problem lies in the AE models.

Feature names in different tables are not allowed to be the same.

When the same feature name appears in two different tables (e.g. once in the user information table and once in the item information table), it causes hidden problems.

For example, suppose both user.csv and item.csv have a column named category. When I try to get the values of both columns, one value of category overwrites the other.
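
The overwrite can be reproduced with two plain dictionaries standing in for a user row and an item row. This sketch is not RecStudio's data-loading code: the sample values and the helper merge_with_prefix are hypothetical, and prefixing fields with a table name is only one possible workaround.

```python
# Hypothetical rows from user.csv and item.csv that share the field "category".
user_row = {"user_id": 1, "category": "premium"}
item_row = {"item_id": 42, "category": "books"}

# Naive merge: the value from the later table replaces the earlier one.
merged = {**user_row, **item_row}
print(merged["category"])  # books (the user's category is lost)


def merge_with_prefix(tables):
    """Merge rows, prefixing every field with its table name to avoid clashes."""
    out = {}
    for table_name, row in tables.items():
        for field, value in row.items():
            out[f"{table_name}.{field}"] = value
    return out


prefixed = merge_with_prefix({"user": user_row, "item": item_row})
print(prefixed["user.category"], prefixed["item.category"])  # premium books
```

Until the library handles such collisions, renaming one of the columns in the source files avoids the problem.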

Out-Of-Memory Error in Validation Phase

Why is there a sudden increase in CUDA memory usage in the validation phase?
The validation batch size set in the config file is smaller than the training batch size, yet CUDA memory usage rises suddenly in the validation phase and causes an OOM error.
Specifically, when the model runs the code in run.py, model.evaluate uses more CUDA memory than model.fit. Could you please help me solve this problem? Thanks for your attention.
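
One possible cause, which this report does not confirm, is that evaluation scores every item for each user in the batch while training scores only the positive item and a few sampled negatives. Under that assumption, the back-of-the-envelope sketch below shows how a smaller batch can still need a far larger score matrix. All numbers and the helper score_matrix_bytes are hypothetical.

```python
def score_matrix_bytes(batch_size, num_scored_items, bytes_per_score=4):
    """Size of a dense float32 score matrix of shape (batch_size, num_scored_items)."""
    return batch_size * num_scored_items * bytes_per_score


# Assumption: training scores 1 positive + 20 sampled negatives per user.
train_bytes = score_matrix_bytes(batch_size=2048, num_scored_items=1 + 20)

# Assumption: evaluation ranks all 1,000,000 items for every user in the batch.
eval_bytes = score_matrix_bytes(batch_size=256, num_scored_items=1_000_000)

print(train_bytes)  # 172032 bytes, about 0.17 MB
print(eval_bytes)   # 1024000000 bytes, about 1 GB
```

If this assumption holds for the model in question, reducing the evaluation batch size further would lower the peak memory proportionally.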
