preferredai / cornac

A Comparative Framework for Multimodal Recommender Systems

Home Page: https://cornac.preferred.ai

License: Apache License 2.0

Languages: Python 71.26%, Cython 14.29%, C++ 11.97%, Jupyter Notebook 2.14%, C 0.29%, Dockerfile 0.05%
Topics: recommender-system, recommendation-algorithms, recommendation-engine, matrix-factorization, collaborative-filtering, multimodal-learning, recommendation-system, multimodality

cornac's Introduction

Cornac

Cornac is a comparative framework for multimodal recommender systems. It focuses on making it convenient to work with models leveraging auxiliary data (e.g., item descriptive text and images, social networks). Cornac enables fast experiments and straightforward implementations of new models. It is highly compatible with existing machine learning libraries (e.g., TensorFlow, PyTorch).

Cornac is one of the frameworks recommended by ACM RecSys 2023 for the evaluation and reproducibility of recommendation algorithms.

Quick Links

Website | Documentation | Tutorials | Examples | Models | Datasets | Paper | Preferred.AI


Installation

Currently, we support Python 3. There are several ways to install Cornac:

  • From PyPI (recommended):

    pip3 install cornac
  • From Anaconda:

    conda install cornac -c conda-forge
  • From the GitHub source (for latest updates):

    pip3 install Cython numpy scipy
    pip3 install git+https://github.com/PreferredAI/cornac.git

Note:

Additional dependencies required by models are listed here.

Some algorithm implementations use OpenMP to support multi-threading. For macOS users, in order to run those algorithms efficiently, you might need to install gcc from Homebrew to have an OpenMP compiler:

brew install gcc && brew link gcc

Getting started: your first Cornac experiment

(Figure: flow of an experiment in Cornac)

import cornac
from cornac.eval_methods import RatioSplit
from cornac.models import MF, PMF, BPR
from cornac.metrics import MAE, RMSE, Precision, Recall, NDCG, AUC, MAP

# load the built-in MovieLens 100K and split the data based on ratio
ml_100k = cornac.datasets.movielens.load_feedback()
rs = RatioSplit(data=ml_100k, test_size=0.2, rating_threshold=4.0, seed=123)

# initialize models, here we are comparing: Biased MF, PMF, and BPR
mf = MF(k=10, max_iter=25, learning_rate=0.01, lambda_reg=0.02, use_bias=True, seed=123)
pmf = PMF(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001, seed=123)
bpr = BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123)
models = [mf, pmf, bpr]

# define metrics to evaluate the models
metrics = [MAE(), RMSE(), Precision(k=10), Recall(k=10), NDCG(k=10), AUC(), MAP()]

# put it together in an experiment, voilà!
cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True).run()

Output:

    |    MAE |   RMSE |    AUC |    MAP | NDCG@10 | Precision@10 | Recall@10 | Train (s) | Test (s)
--- + ------ + ------ + ------ + ------ + ------- + ------------ + --------- + --------- + --------
MF  | 0.7430 | 0.8998 | 0.7445 | 0.0548 |  0.0761 |       0.0675 |    0.0463 |      0.13 |     1.57
PMF | 0.7534 | 0.9138 | 0.7744 | 0.0671 |  0.0969 |       0.0813 |    0.0639 |      2.18 |     1.64
BPR |    N/A |    N/A | 0.8695 | 0.1042 |  0.1500 |       0.1110 |    0.1195 |      3.74 |     1.49

Model serving

Here, we provide a simple way to serve a Cornac model by launching a standalone web service with Flask. It is very handy for testing or creating a demo application. First, we install the dependency:

$ pip3 install Flask

Suppose we want to serve the trained BPR model from the previous example; we first need to save it:

bpr.save("save_dir", save_trainset=True)

After that, the model can be deployed easily by running the Cornac serving app as follows:

$ FLASK_APP='cornac.serving.app' \
  MODEL_PATH='save_dir/BPR' \
  MODEL_CLASS='cornac.models.BPR' \
  flask run --host localhost --port 8080

# Running on http://localhost:8080

Here we go, our model service is now ready. Let's get top-5 item recommendations for the user "63":

$ curl -X GET "http://localhost:8080/recommend?uid=63&k=5&remove_seen=false"

# Response: {"recommendations": ["50", "181", "100", "258", "286"], "query": {"uid": "63", "k": 5, "remove_seen": false}}

If we want to remove items already seen during training, we need to provide the TRAIN_SET (saved together with the model earlier) when starting the serving app, as sketched below. We can also leverage a WSGI server for model deployment in production. Please refer to this guide for more details.
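A hedged sketch of what that invocation might look like; the TRAIN_SET path is an assumption about where save(save_trainset=True) writes the training set, not a documented file name:

$ FLASK_APP='cornac.serving.app' \
  MODEL_PATH='save_dir/BPR' \
  MODEL_CLASS='cornac.models.BPR' \
  TRAIN_SET='save_dir/BPR/train_set.pkl' \
  flask run --host localhost --port 8080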

Efficient retrieval with ANN search

One important aspect of deploying a recommender model is efficient retrieval via Approximate Nearest Neighbor (ANN) search in vector space. Cornac integrates several vector similarity search frameworks for ease of deployment. This example demonstrates how ANN search works seamlessly with any recommender model supporting it (e.g., matrix factorization); see also the sketch after the table below.

Supported Framework | Cornac Wrapper | Example
------------------- + -------------- + --------------------------------
spotify/annoy       | AnnoyANN       | quick-start, deep-dive
meta/faiss          | FaissANN       | quick-start, deep-dive
nmslib/hnswlib      | HNSWLibANN     | quick-start, hnsw-lib, deep-dive
google/scann        | ScaNNANN       | quick-start, deep-dive
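The following is a hedged sketch of the intended flow, based on the linked quick-start examples; the parameter values and the exact index-building call are assumptions, so consult the examples for authoritative usage:

from cornac.models import HNSWLibANN

# Wrap an already-trained factorization model (e.g., the BPR model from the
# quick-start above); the ANN index is built from the model's learned factors.
ann = HNSWLibANN(model=bpr, M=16, ef_construction=100, ef=50, seed=123)
ann.fit(rs.train_set)  # assumption: fitting the wrapper builds the index

# Query it like any other recommender.
recs = ann.recommend(user_id="63", k=5)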

Models

The table below lists the recommendation models/algorithms featured in Cornac. Examples are provided either as quick-start, showcasing an easy-to-run script, or as deep-dive, explaining the math and intuition behind each model. Why don't you join us to lengthen the list?

Year Model and Paper Type Environment Example
2024 Hypergraphs with Attention on Reviews (HypAR), docs, paper Hybrid / Sentiment / Explainable requirements, CPU / GPU quick-start
2022 Disentangled Multimodal Representation Learning for Recommendation (DMRL), docs, paper Content-Based / Text & Image requirements, CPU / GPU quick-start
2021 Bilateral Variational Autoencoder for Collaborative Filtering (BiVAECF), docs, paper Collaborative Filtering / Content-Based requirements, CPU / GPU quick-start, deep-dive
Causal Inference for Visual Debiasing in Visually-Aware Recommendation (CausalRec), docs, paper Content-Based / Image requirements, CPU / GPU quick-start
Explainable Recommendation with Comparative Constraints on Product Aspects (ComparER), docs, paper Explainable CPU quick-start
2020 Adversarial Multimedia Recommendation (AMR), docs, paper Content-Based / Image requirements, CPU / GPU quick-start
Hybrid Deep Representation Learning of Ratings and Reviews (HRDR), docs, paper Content-Based / Text requirements, CPU / GPU quick-start
LightGCN: Simplifying and Powering Graph Convolution Network, docs, paper Collaborative Filtering requirements, CPU / GPU quick-start
Predicting Temporal Sets with Deep Neural Networks (DNNTSP), docs, paper Next-Basket requirements, CPU / GPU quick-start
Recency Aware Collaborative Filtering (UPCF), docs, paper Next-Basket requirements, CPU quick-start
Temporal-Item-Frequency-based User-KNN (TIFUKNN), docs, paper Next-Basket CPU quick-start
Variational Autoencoder for Top-N Recommendations (RecVAE), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start
2019 Correlation-Sensitive Next-Basket Recommendation (Beacon), docs, paper Next-Basket requirements, CPU / GPU quick-start
Embarrassingly Shallow Autoencoders for Sparse Data (EASEᴿ), docs, paper Collaborative Filtering CPU quick-start
Neural Graph Collaborative Filtering (NGCF), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start
2018 Collaborative Context Poisson Factorization (C2PF), docs, paper Content-Based / Graph CPU quick-start
Graph Convolutional Matrix Completion (GCMC), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start
Multi-Task Explainable Recommendation (MTER), docs, paper Explainable CPU quick-start, deep-dive
Neural Attention Rating Regression with Review-level Explanations (NARRE), docs, paper Explainable / Content-Based requirements, CPU / GPU quick-start
Probabilistic Collaborative Representation Learning (PCRL), docs, paper Content-Based / Graph requirements, CPU / GPU quick-start
Variational Autoencoder for Collaborative Filtering (VAECF), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start, param-search, deep-dive
2017 Collaborative Variational Autoencoder (CVAE), docs, paper Content-Based / Text requirements, CPU / GPU quick-start
Conditional Variational Autoencoder for Collaborative Filtering (CVAECF), docs, paper Content-Based / Text requirements, CPU / GPU quick-start
Generalized Matrix Factorization (GMF), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start, deep-dive
Indexable Bayesian Personalized Ranking (IBPR), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start, deep-dive
Matrix Co-Factorization (MCF), docs, paper Content-Based / Graph CPU quick-start, cross-modality
Multi-Layer Perceptron (MLP), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start, deep-dive
Neural Matrix Factorization (NeuMF) / Neural Collaborative Filtering (NCF), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start, deep-dive
Online Indexable Bayesian Personalized Ranking (Online IBPR), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start, deep-dive
Visual Matrix Factorization (VMF), docs, paper Content-Based / Image requirements, CPU / GPU quick-start
2016 Collaborative Deep Ranking (CDR), docs, paper Content-Based / Text requirements, CPU / GPU quick-start
Collaborative Ordinal Embedding (COE), docs, paper Collaborative Filtering requirements, CPU / GPU
Convolutional Matrix Factorization (ConvMF), docs, paper Content-Based / Text requirements, CPU / GPU quick-start, deep-dive
Learning to Rank Features for Recommendation over Multiple Categories (LRPPM), docs, paper Explainable CPU quick-start
Session-based Recommendations With Recurrent Neural Networks (GRU4Rec), docs, paper Next-Item requirements, CPU / GPU quick-start
Spherical K-means (SKM), docs, paper Collaborative Filtering CPU quick-start
Visual Bayesian Personalized Ranking (VBPR), docs, paper Content-Based / Image requirements, CPU / GPU quick-start, cross-modality, deep-dive
2015 Collaborative Deep Learning (CDL), docs, paper Content-Based / Text requirements, CPU / GPU quick-start, deep-dive
Hierarchical Poisson Factorization (HPF), docs, paper Collaborative Filtering CPU quick-start
TriRank: Review-aware Explainable Recommendation by Modeling Aspects, docs, paper Explainable CPU quick-start
2014 Explicit Factor Model (EFM), docs, paper Explainable CPU quick-start, deep-dive
Social Bayesian Personalized Ranking (SBPR), docs, paper Content-Based / Social CPU quick-start
2013 Hidden Factors and Hidden Topics (HFT), docs, paper Content-Based / Text CPU quick-start
2012 Weighted Bayesian Personalized Ranking (WBPR), docs, paper Collaborative Filtering CPU quick-start
2011 Collaborative Topic Regression (CTR), docs, paper Content-Based / Text CPU quick-start, deep-dive
Earlier Baseline Only, docs, paper Baseline CPU quick-start
Bayesian Personalized Ranking (BPR), docs, paper Collaborative Filtering CPU quick-start, deep-dive
Factorization Machines (FM), docs, paper Collaborative Filtering / Content-Based Linux, CPU quick-start, deep-dive
Global Average (GlobalAvg), docs, paper Baseline CPU quick-start
Global Personalized Top Frequent (GPTop), paper Next-Basket CPU quick-start
Item K-Nearest-Neighbors (ItemKNN), docs, paper Neighborhood-Based CPU quick-start, deep-dive
Matrix Factorization (MF), docs, paper Collaborative Filtering CPU / GPU quick-start, pre-split-data, deep-dive
Maximum Margin Matrix Factorization (MMMF), docs, paper Collaborative Filtering CPU quick-start
Most Popular (MostPop), docs, paper Baseline CPU quick-start
Non-negative Matrix Factorization (NMF), docs, paper Collaborative Filtering CPU quick-start, deep-dive
Probabilistic Matrix Factorization (PMF), docs, paper Collaborative Filtering CPU quick-start
Session Popular (SPop), docs, paper Next-Item / Baseline CPU quick-start
Singular Value Decomposition (SVD), docs, paper Collaborative Filtering CPU quick-start, deep-dive
Social Recommendation using PMF (SoRec), docs, paper Content-Based / Social CPU quick-start, deep-dive
User K-Nearest-Neighbors (UserKNN), docs, paper Neighborhood-Based CPU quick-start, deep-dive
Weighted Matrix Factorization (WMF), docs, paper Collaborative Filtering requirements, CPU / GPU quick-start, deep-dive

Resources

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Citation

If you use Cornac in a scientific publication, we would appreciate citations to the following papers:

Cornac: A Comparative Framework for Multimodal Recommender Systems, Salah et al., Journal of Machine Learning Research, 21(95):1–5, 2020.
@article{salah2020cornac,
  title={Cornac: A Comparative Framework for Multimodal Recommender Systems},
  author={Salah, Aghiles and Truong, Quoc-Tuan and Lauw, Hady W},
  journal={Journal of Machine Learning Research},
  volume={21},
  number={95},
  pages={1--5},
  year={2020}
}
Exploring Cross-Modality Utilization in Recommender Systems, Truong et al., IEEE Internet Computing, 25(4):50–57, 2021.
@article{truong2021exploring,
  title={Exploring Cross-Modality Utilization in Recommender Systems},
  author={Truong, Quoc-Tuan and Salah, Aghiles and Tran, Thanh-Binh and Guo, Jingyao and Lauw, Hady W},
  journal={IEEE Internet Computing},
  year={2021},
  publisher={IEEE}
}
Multi-Modal Recommender Systems: Hands-On Exploration, Truong et al., ACM Conference on Recommender Systems, 2021.
@inproceedings{truong2021multi,
  title={Multi-modal recommender systems: Hands-on exploration},
  author={Truong, Quoc-Tuan and Salah, Aghiles and Lauw, Hady},
  booktitle={Fifteenth ACM Conference on Recommender Systems},
  pages={834--837},
  year={2021}
}

License

Apache License 2.0

cornac's People

Contributors

amirj, andrew-dungle, binhtrantt, darrylong, dependabot[bot], flywithu, guojingyao, hieuddo, jinygao, lgabs, llkute, lthoang, mabeckers, quentinhaenn, redhat6, ruihongqiu, saghiles, theisjendal, tqtg, yilmazerhakan


cornac's Issues

[How to change the optimization of BPR from a max number of iterations to a stopping condition]

I would like to change the SGD optimization procedure in BPR so that, instead of running for a max number of iterations, training stops based on a condition. For the condition, I need to check the norms of p_u and q_i and compare their values. Here is the part of the code in recom_bpr.pyx I have identified:

      with trange(self.max_iter, disable=not self.verbose) as progress:
            for epoch in progress:
                correct, skipped = self._fit_sgd(rng_pos, rng_neg, num_threads,
                                                user_ids, X.indices, neg_item_ids, X.indptr,
                                                self.u_factors, self.i_factors, self.i_biases)
                print(self.i_factors)
                progress.set_postfix({
                    "correct": "%.2f%%" % (100.0 * correct / (len(user_ids) - skipped)),
                    "skipped": "%.2f%%" % (100.0 * skipped / len(user_ids))
                })


I put a print(self.i_factors) statement in to check the value of i_factors; however, nothing gets printed.
Could you give me some guidelines on how I can reach my goal based on the BPR code in Cornac?
Thank you very much.
YAS

Other Comments
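One possible direction (a sketch only, not tested against Cornac's Cython internals): keep the epoch loop quoted above, but track the factor norms between epochs and break once they stop changing.

import numpy as np

# Hypothetical norm-based stopping rule for the loop in recom_bpr.pyx:
tol = 1e-4
prev_norm = None
with trange(self.max_iter, disable=not self.verbose) as progress:
    for epoch in progress:
        correct, skipped = self._fit_sgd(rng_pos, rng_neg, num_threads,
                                         user_ids, X.indices, neg_item_ids, X.indptr,
                                         self.u_factors, self.i_factors, self.i_biases)
        cur_norm = np.linalg.norm(self.u_factors) + np.linalg.norm(self.i_factors)
        if prev_norm is not None and abs(cur_norm - prev_norm) < tol:
            break  # factor norms have stabilized; stop training early
        prev_norm = cur_norm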

[ASK] Image-based Dataset Interactions

Description

Hi,

Thank you for this great repo.

I have a few questions about the image-based dataset (Amazon Clothing and Tradesy) preprocessing.

(1) What's the chosen backbone to extract the image feature for these two datasets?

(2) Why are the numbers of users, items and interactions so small for Amazon Clothing (5,377 users | 3,393 items | 13,689 interactions), which has 278,677 ratings for the rich-features subset according to the source website?

(3) How could I change the explicit feedback of Amazon Clothing to implicit feedback?

Thank you in advance for your reply!

Best regards,

Other Comments

[ASK] Sampling truly negative samples in implicit methods

Hi,

Thanks for this great package. I was wondering: in the case where I have negative feedback (dislikes), how would I tweak the sampling to make sure the truly negative samples are selected?

Any of your suggestions would be greatly appreciated!

Error when fitting MCF

I'm getting an error when trying to use this model:
https://cornac.readthedocs.io/en/latest/models.html#module-cornac.models.mcf.recom_mcf

import pandas as pd
from scipy.sparse import coo_matrix
from cornac.datasets.movielens import load_feedback
from cornac.data.dataset import Dataset
from cornac.models.mcf.recom_mcf import MCF

def coo_to_cornac(X):
    return Dataset(
        X.shape[0], X.shape[1],
        {i:i for i in range(X.shape[0])},
        {i:i for i in range(X.shape[1])},
        (X.row, X.col, X.data),
        seed=1)

df = pd.DataFrame(load_feedback())
df.columns = ["UserId", "ItemId", "Rating"]
X = coo_matrix((df.Rating, (df.UserId.astype(int), df.ItemId.astype(int))))
dt = coo_to_cornac(X)
model = MCF(k=15, max_iter=15, lamda=0.05,
            verbose=False, seed=1)
model.fit(dt)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-cf5cfe50c5ba> in <module>
     19 model = MCF(k=15, max_iter=15, lamda=0.05,
     20             verbose=False, seed=1)
---> 21 model.fit(dt)

~/anaconda3/envs/py3/lib/python3.7/site-packages/cornac/models/mcf/recom_mcf.py in fit(self, train_set, val_set)
    130             # item-item affinity network
    131             map_iid = train_set.item_indices
--> 132             (net_iid, net_jid, net_val) = train_set.item_graph.get_train_triplet(
    133                 map_iid, map_iid
    134             )

AttributeError: 'Dataset' object has no attribute 'item_graph'

[ASK] ValueError: could not convert string to float: 'e'

Description

Hello,

I am trying to use Cornac with the BookCrossing dataset. I am facing an issue which I did not encounter with other datasets.

hyperParam = 15
ds_name = 'BKcross'

path_to_samples = '/home/yas/PycharmProjects/cornac/datasets/BookCrossing/samples/*.csv'
load_ds_addr = '/home/yas/PycharmProjects/cornac/datasets/BookCrossing/BX-Book-Ratings.csv'
save_res_addr = '/home/yas/PycharmProjects/cornac/results/resultsUserLevel_'
save_feat_addr = '/home/yas/PycharmProjects/cornac/results/features_'
simple_headers = ["UserId","ItemId","Rating"]

userData_df = pd.read_csv(load_ds_addr, sep=';', skiprows=0)
userData_df.columns = simple_headers

print(userData_df.head())
print(userData_df.shape)

ratio_split = RatioSplit(data=userData_df, test_size=0.2, rating_threshold=4.0, exclude_unknowns=False, seed=123)
mf = cornac.models.MF(k=hyperParam, max_iter=25, learning_rate=0.01, lambda_reg=0.02, use_bias=True, seed=123)
ndcg_100 = cornac.metrics.NDCG(k=100)

The error I get is the following:


   UserId      ItemId  Rating
0  276725  034545104X       0
1  276726  0155061224       5
2  276727  0446520802       0
3  276729  052165615X       3
4  276729  0521795028       6
(1149780, 3)
Traceback (most recent call last):
  File "/home/yas/PycharmProjects/cornac/yd/main_BookCrossing.py", line 52, in <module>
    ratio_split = RatioSplit(data=userData_df, test_size=0.2, rating_threshold=4.0, exclude_unknowns=False, seed=123)
  File "/home/yas/PycharmProjects/cornac/cornac/eval_methods/ratio_split.py", line 61, in __init__
    self._split()
  File "/home/yas/PycharmProjects/cornac/cornac/eval_methods/ratio_split.py", line 105, in _split
    self.build(train_data=train_data, test_data=test_data, val_data=val_data)
  File "/home/yas/PycharmProjects/cornac/cornac/eval_methods/base_method.py", line 584, in build
    self._build_datasets(train_data, test_data, val_data)
  File "/home/yas/PycharmProjects/cornac/cornac/eval_methods/base_method.py", line 462, in _build_datasets
    self.train_set = Dataset.build(
  File "/home/yas/PycharmProjects/cornac/cornac/data/dataset.py", line 357, in build
    r_values.append(float(rating))
ValueError: could not convert string to float: 'e'

It seems the error indicates that the Rating column contains a non-numeric value. However, I could not verify that the Rating column has any non-numeric value. Any idea why the problem occurs and how it could be resolved? Thank you.
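A likely fix, judging from the traceback (my reading, not an official answer): Cornac's evaluation methods expect an iterable of (user, item, rating) tuples, and iterating over a pandas DataFrame yields its column names as strings, whose characters then get unpacked as uid/iid/rating, hence the stray 'e'. Converting the frame to tuples first, reusing the names from the snippet above, should avoid this:

# RatioSplit expects (user, item, rating) tuples, not a DataFrame.
uir_tuples = list(userData_df.itertuples(index=False, name=None))
ratio_split = RatioSplit(data=uir_tuples, test_size=0.2, rating_threshold=4.0,
                         exclude_unknowns=False, seed=123)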

[BUG] Unable to import in Anaconda 3 on macOS

Description

I was trying to run import cornac in a jupyter notebook under an anaconda environment, but an error was returned.

In which platform does it happen?

I am running on macOS Catalina 10.15.5. My jupyter notebook was running on Anaconda 3. The environment I was running on has the following package:

backcall                  0.2.0                    pypi_0    pypi
ca-certificates           2020.7.22                     0  
certifi                   2020.6.20                py37_0  
cornac                    1.7.1                    pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
ipykernel                 5.3.4                    pypi_0    pypi
ipython                   7.18.1                   pypi_0    pypi
ipython-genutils          0.2.0                    pypi_0    pypi
jedi                      0.17.2                   pypi_0    pypi
jupyter-client            6.1.7                    pypi_0    pypi
jupyter-core              4.6.3                    pypi_0    pypi
libcxx                    10.0.0                        1  
libedit                   3.1.20191231         h1de35cc_1  
libffi                    3.3                  hb1e8313_2  
ncurses                   6.2                  h0a44026_1  
numpy                     1.19.2                   pypi_0    pypi
openssl                   1.1.1h               haf1e3a3_0  
parso                     0.7.1                    pypi_0    pypi
pexpect                   4.8.0                    pypi_0    pypi
pickleshare               0.7.5                    pypi_0    pypi
pip                       20.2.2                   py37_0  
prompt-toolkit            3.0.7                    pypi_0    pypi
ptyprocess                0.6.0                    pypi_0    pypi
pygments                  2.7.1                    pypi_0    pypi
python                    3.7.9                h26836e1_0  
python-dateutil           2.8.1                    pypi_0    pypi
pyzmq                     19.0.2                   pypi_0    pypi
readline                  8.0                  h1de35cc_0  
scipy                     1.5.2                    pypi_0    pypi
setuptools                49.6.0                   py37_1  
six                       1.15.0                   pypi_0    pypi
sqlite                    3.33.0               hffcf06c_0  
tk                        8.6.10               hb0a8c7a_0  
tornado                   6.0.4                    pypi_0    pypi
tqdm                      4.50.0                   pypi_0    pypi
traitlets                 5.0.4                    pypi_0    pypi
wcwidth                   0.2.5                    pypi_0    pypi
wheel                     0.35.1                     py_0  
xz                        5.2.5                h1de35cc_0  
zlib                      1.2.11               h1de35cc_3  

How do we replicate the issue?

Run:

conda create --name new_env python=3.7 
conda activate new_env
pip install cornac

Under the new_env environment, open a python shell and run
import cornac
An error message is returned:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/new_env/lib/python3.7/site-packages/cornac/__init__.py", line 16, in <module>
    from . import data
  File "/opt/anaconda3/envs/new_env/lib/python3.7/site-packages/cornac/data/__init__.py", line 18, in <module>
    from .text import TextModality, ReviewModality
  File "/opt/anaconda3/envs/new_env/lib/python3.7/site-packages/cornac/data/text.py", line 27, in <module>
    from ..utils import normalize
  File "/opt/anaconda3/envs/new_env/lib/python3.7/site-packages/cornac/utils/__init__.py", line 20, in <module>
    from .fast_dot import fast_dot
ImportError: dlopen(/opt/anaconda3/envs/new_env/lib/python3.7/site-packages/cornac/utils/fast_dot.cpython-37m-darwin.so, 2): Library not loaded: /usr/local/opt/gcc/lib/gcc/9/libgomp.1.dylib
  Referenced from: /opt/anaconda3/envs/new_env/lib/python3.7/site-packages/cornac/utils/fast_dot.cpython-37m-darwin.so
  Reason: image not found

Expected behavior (i.e. solution)

The import statement should run without error

Other Comments

[BUG] Unable to install Cornac on local M1 Mac, running miniforge 3

Description

Unable to install Cornac via all 3 methods (conda-forge install, pip install, and Cython source install) on an M1 Mac using miniforge 3.
Please refer to the 3 attached text files for the respective logs
Log for cython install.txt
Log for pip install cornac.txt
Log for conda-forge install.txt

In which platform does it happen?

M1 Mac running miniforge 3, Conda version 4.10.3

How do we replicate the issue?

Run: pip install Cornac
Run: conda install
Run:
pip3 install Cython
git clone https://github.com/PreferredAI/cornac.git
cd cornac
python3 setup.py install

Expected behavior (i.e. solution)

Other Comments

[ASK] What's the difference between MF implementation in Cornac with Spotlight?

Description

Thanks for this amazing tool!
There are different MF models in Cornac including MF, NMF, PMF, WMF.
Which one is the most similar to Spotlight Explicit model?
With the same dataset, the results obtained from the Spotlight Explicit model are not comparable with Cornac's MF models: based on Spotlight's implementation, the MF model performs similarly to the popularity baseline, but based on Cornac's implementation, the MF model's performance is much lower than the popularity baseline.
Would you please help me to better understand what's the difference?

Other Comments

No module named 'cornac.utils.fast_sparse_funcs' when trying to run first_example.py under PyCharm

Description

Hello,
I use Ubuntu 20.04, and installed Cornac using the pip3 option (under venv) and conda (under a conda venv). In both cases, I receive this error when I try to run Cornac's first_example.py:

Traceback (most recent call last):
  File "/home/yas/PycharmProjects/cornac/examples/first_example.py", line 17, in <module>
    import cornac
  File "/home/yas/PycharmProjects/cornac/cornac/__init__.py", line 16, in <module>
    from . import data
  File "/home/yas/PycharmProjects/cornac/cornac/data/__init__.py", line 18, in <module>
    from .text import TextModality, ReviewModality
  File "/home/yas/PycharmProjects/cornac/cornac/data/text.py", line 27, in <module>
    from ..utils import normalize
  File "/home/yas/PycharmProjects/cornac/cornac/utils/__init__.py", line 16, in <module>
    from .common import validate_format
  File "/home/yas/PycharmProjects/cornac/cornac/utils/common.py", line 21, in <module>
    from .fast_sparse_funcs import (
ModuleNotFoundError: No module named 'cornac.utils.fast_sparse_funcs'

Let me add that I use PyCharm. Any idea why the issue emerged and how it could be handled?
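A likely cause, judging from the traceback paths (an assumption, not a confirmed diagnosis): the script runs from inside the cloned repository, so Python imports the local, un-compiled cornac source tree instead of the installed package. Building the Cython extensions in place should help:

pip3 install Cython numpy scipy
python3 setup.py build_ext --inplace

Alternatively, run the example from a directory that does not contain the cornac/ source folder, so the installed package is picked up instead.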

[ASK] Help on providing user/item features in BPR

Description

Requesting help on how to pass user/item features to BPR. I have several numeric and categorical features regarding the user and item that I would like the model to consider.

Other Comments

[ASK] About Image-based datasets

Description

Hello, first of all I would like to thank you the important work in developing this framework.
I am researching on Image-based recommendation, and my first step with your tool was to load amazon_clothing.
I am noticing that when I call load_image(), the returned data is not raw image data. Instead, it seems to be latent factors (4096 columns per product) extracted from some deep model. The same happens for the Tradesy dataset.

I would like to confirm that I am right regarding this concern.

[ASK] Discovered a strange behavior on ranking metrics

Description

I tried to evaluate some MF models (and UserKNN) with ranking metrics on the MovieLens dataset (100k). The results do not look as expected (first table), so I implemented the evaluation step with sklearn.metrics and the output looks much more realistic (second table). The implementation can be found here: https://gist.github.com/tpoerschke/1d823e1b9dbc0f290c763854e9fa2a52.

The implementation of my evaluation should be similar to that in Cornac. The metrics are evaluated per user and then averaged over all users.

Am I missing something here? Or is this a bug?

TEST:
...
        |  F1@-1 | Precision@-1 | Recall@-1 | Train (s) | Test (s)
------- + ------ + ------------ + --------- + --------- + --------
PMF     | 0.0143 |       0.0073 |    1.0000 |    6.4883 |   0.3277
NMF     | 0.0143 |       0.0073 |    1.0000 |    1.4425 |   0.4025
SVD     | 0.0143 |       0.0073 |    1.0000 |    0.4447 |   0.4037
UserKNN | 0.0143 |       0.0073 |    1.0000 |    0.1430 |   5.7097


CUSTOM EVALUATION
Model      F1 score    Precision    Recall    Train (s)
-------  ----------  -----------  --------  -----------
PMF          0.3498       0.3226    0.4506       6.8282
NMF          0.3243       0.3334    0.3707       1.4856
SVD          0.3423       0.3158    0.4398       0.3843
UserKNN      0.2628       0.2688    0.3319       0.0957

System

  • OS: macOS Catalina (10.15.7)
  • Python: 3.8.3 [Clang 10.0.0 ] :: Anaconda, Inc. on darwin

Other comments

[ASK] Question about number of users & items

Description

I am trying to figure out why the numbers of items/users/ratings for the Amazon datasets are so few.

For example, the number of users/items in the Office subset, without processing, is: 3,404,914 users and 306,800 items with 5,581,313 interactions. After keeping only items that have been rated at least 5 times, and users that have rated at least 5 items, I get 125,812 users, 79,508 items, and 999,982 interactions. Even using the 5-core data, the numbers of users, items, and events are far higher than the ones you report in the tables.

This is much more than what you report in your tables in the documentation. I'm curious to know how you do the pre-processing for the datasets. It would also be great if you could provide these scripts, of course. Thanks in advance!

[ASK] The NDCG values seem a bit strange

Description

Hi,

  • Why are the NDCG values calculated by cornac.metrics.NDCG mostly in [0.01, 0.1], while the results reported in many recommender system papers are mostly in [0.1, 1]?
  • Are the two methods of calculating NDCG different?
  • What should I do if I want to reproduce the results of the paper?

BTW, cornac is awesome!

Other Comments

[ASK] Per-item metrics

Hello, thanks for this framework and for your work.
Is there a way to get / calculate per-item metrics (same as metric_user_results but for items)?

[Using my own dataset?]

Description

Hi, thanks for developing this outstanding tool. Could I use my own dataset instead of the default datasets? If so, how do I replace the dataset? (A sketch follows below.)

Other Comments
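A minimal sketch of one way to do this, using the Reader class to load (user, item, rating) triplets from a file; the file name and separator are placeholders:

from cornac.data import Reader
from cornac.eval_methods import RatioSplit

# Each line of my_ratings.csv is assumed to be: user,item,rating
data = Reader().read("my_ratings.csv", fmt="UIR", sep=",")
rs = RatioSplit(data=data, test_size=0.2, seed=123)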

[ASK] How to interpret the rating in the UIR triplets

Description

A fairly high-level question.

So, I have data of user interactions with items. These are either click-tuples (user, item, has_clicked) or sale-tuples (user, item, has_bought). To combine these in the dataset, I want to use the "rating" in the Cornac UIR triplet, but I am not sure how to interpret it and which values to choose. So here are a couple of questions with regard to that:

  • If I give a rating of 1 to clicks and a rating of 2 to sales, will this achieve the goal of combining them both as positive signals (one stronger than the other), or will the clicks be treated as negative (just like 1 star out of 5 in some of the sample datasets)?
  • Is it possible to give negative ratings, e.g., a rating of -1, or is that still treated as a relative rating (less positive than e.g. 1, but positive nonetheless)?
  • Could you give a high level guidance how to choose my rating values?

Currently I am mainly working with BPR and VBPR, but I will probably try more models later.

[BUG] AttributeError: module 'cornac.datasets' has no attribute 'filmtrust'

Description

Hi again, really sorry to open up another ticket so soon after, but Cornac doesn't recognize Filmtrust as an existing dataset.

When executing:
ds = cornac.datasets.filmtrust.load_feedback()

I get the following error:
AttributeError: module 'cornac.datasets' has no attribute 'filmtrust'

Filmtrust seems to be absent from the datasets' init file (on the other hand, for instance, netflix and movielens are present).

Could you please fix this? Thank you very much in advance. :)

In which platform does it happen?

OS:
Ubuntu 20.04

Packages:
Python 3.6.4
Tensorflow 1.15
Cython 0.29.21
PyTorch 1.4

How do we replicate the issue?

Run first_example.py but replace movielens by filmtrust.

Expected behavior (i.e. solution)

The experiment should run successfully and output the results.

Other Comments

[ASK] Customize the output format

Description

Hi, I would like to ask a minor question regarding the experiment output. I need results to 6 decimal places, while the default precision is 4 places; is there any way to change the output precision?

Other Comments

[ASK] How to rank given items?

Description

Hi, hope that you are doing well.

I want to ask about ranking. Suppose there are 100 items in the training set and we want to rank only items 0, 12, 23, and 79. How can I do that? (A sketch follows below.)

Thank you.

Other Comments
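A minimal sketch, assuming a trained Cornac model in model and that it exposes the standard Recommender.rank() API with an item_indices argument (check your version's signature):

import numpy as np

# Rank only the given candidate items (internal item indices) for user 0.
candidate_items = np.array([0, 12, 23, 79])
ranked_items, item_scores = model.rank(user_idx=0, item_indices=candidate_items)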

[ASK] Cornac on paper abstracts

Hi,

Hope you are all well!

I am looking for a recsys and just wanted to ask whether your approach could be applied to recommend arXiv papers.

More precisely, I have a dataset of 300k abstracts classified in tasks, subjects.
(available here https://paper2code.com/public/suggest_dump.txt.tar.gz)

How can I train a model based on this dataset? Do you have a Flask server for adding new entries and making queries?

P.S. Also, do you have a Dockerized version of Cornac?

Cheers,
X

'NoneType' object has no attribute 'get_train_triplet'

Description

Hi, I've been trying to use Cornac to output benchmarks with all the available models.
When I run your "first experiment", it runs fine as is.

But, as soon as I add another model (here for example I added SBPR), I get errors.

The most common one is "'NoneType' object has no attribute 'get_train_triplet'".
Full output is as follows:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in
51
52 # put it together in an experiment, voilà!
---> 53 cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True).run()

~/anaconda3/envs/recsysenv/lib/python3.6/site-packages/cornac/experiment/experiment.py in run(self)
128 metrics=self.metrics,
129 user_based=self.user_based,
--> 130 show_validation=self.show_validation,
131 )
132

~/anaconda3/envs/recsysenv/lib/python3.6/site-packages/cornac/eval_methods/base_method.py in evaluate(self, model, metrics, user_based, show_validation)
656
657 start = time.time()
--> 658 model.fit(self.train_set, self.val_set)
659 train_time = time.time() - start
660

cornac/models/sbpr/recom_sbpr.pyx in cornac.models.sbpr.recom_sbpr.SBPR.fit()

cornac/models/sbpr/recom_sbpr.pyx in cornac.models.sbpr.recom_sbpr.SBPR._prepare_social_data()

AttributeError: 'NoneType' object has no attribute 'get_train_triplet'

I have already tried providing a val_size as well. In that case I wrote "test_size=0.1, val_size=0.1". The error is the same.

I feel like this is probably a versioning issue.

Thank you very much in advance for your assistance.

In which platform does it happen?

OS:
Ubuntu 20.04

Packages:
Python 3.6.4
Tensorflow 1.15
Cython 0.29.21
PyTorch 1.4

How do we replicate the issue?

In first_example.py, add the SBPR model in the corresponding list, then run it.

Expected behavior (i.e. solution)

Experiment should indeed run successfully and output the results.

Other Comments

My goal here is to manage to test as many models as possible at the same time to compare their performances, and to be able to do that on several datasets (ideally movielens, netflix, and filmtrust).

[SUGGESTION] Model loaded from a save file path doesn't have train_set

Description

Hi there,
First, thanks for the Cornac source code.

When I load a model, the absence of the model's train_set makes model.score() and model.rank() not work for some algorithms like 'MF', 'BPR', etc.

This situation arises because of an additional index check:

def score(self, user_idx, item_idx=None):
    ...
    unk_user = self.train_set.is_unk_user(user_idx)

But when using a loaded model, the original training dataset often no longer exists. This makes the model save/load functionality useless, or forces the user to keep the large original dataset.

Expected behavior with the suggested feature

The model's u_factors and i_factors already cover the known UIDs and IIDs. If the unk_user = self.train_set.is_unk_user(user_idx) check were revised to rely on the factor matrices rather than on train_set, a loaded model could be used without loading the original dataset.

Other Comments

[ASK] Using early stopping

Description

I am comparing the performance of different models and I am wondering how to implement early stopping for all of them.

Can you give an example of how to use it, in particular with the Experiment class?

I have seen that early stopping is implemented in the MF model but not in other models.

Thank you for this framework.

Implicit dataset with sensitive attributes

Hello,

I have a few feature question related to cornac:

As for the evaluation metrics, how could we use cornac to:

  • Find negative items, i.e., for a given user-item rating prediction, find ratings that fall below a threshold, say below 3 (on a 5-point Likert scale). Could you suggest a piece of code that helps, e.g., to find the number of negative items in a user profile? (A sketch follows below.)
  • It seems Cornac is not equipped with beyond-accuracy metrics such as novelty, diversity, and catalog coverage. Any idea how they could be implemented?

Thank you for your feedback.
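For the first point, a minimal sketch using plain data filtering (not a built-in Cornac API); data is assumed to be a list of (user, item, rating) triplets:

from collections import Counter

# Count items rated below the threshold ("negative" items) per user.
threshold = 3
neg_counts = Counter(uid for uid, iid, rating in data if rating < threshold)
print(neg_counts["some_user_id"])  # number of negative items in that user's profile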

[ASK] Question about Cornac with an implicit dataset

Description

Hi,

When I used Cornac with an implicit dataset, some evaluations puzzled me (the MAE and RMSE were always 0; is this right or a bug?). Note that the implicit dataset in my experiments only has positive feedback (all ratings are 1), while the negative feedback has been removed.

When epochs=20, the results were as follows: (screenshot)

When epochs=40, the results were as follows: (screenshot)

When epochs=60, the results were as follows: (screenshot)

The code used is shown here: (screenshot)

Thanks for reading, and for your excellent work on this tool!

In which platform does it happen?

How do we replicate the issue?

Expected behavior (i.e. solution)

Other Comments

[H] How to see per-user evaluation results

Description

Thank you for putting up Cornac; it is a very useful resource for pushing research.

I just wanted to ask if there is a way to see the per-user evaluation scores for RS models. Let's assume we do not want to average over the per-user NDCGs but instead want to compute a distribution over the results. How can we get user-level evaluation scores in order to aggregate the results in a specific manner (beyond a simple average)? Thank you for your input. (A sketch follows below.)
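A hedged sketch, assuming the Experiment result object exposes metric_user_results (a per-user mapping keyed by metric name, as referenced elsewhere on this page); rs and mf are placeholders from the quick-start example:

exp = cornac.Experiment(eval_method=rs, models=[mf],
                        metrics=[cornac.metrics.NDCG(k=10)])
exp.run()

# Per-user NDCG@10 scores: {user_index: score}; aggregate however you like.
per_user_ndcg = exp.result[0].metric_user_results["NDCG@10"]
scores = list(per_user_ndcg.values())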

[BUG] Saving/loading models broken: 'BiVAECF' object has no attribute 'train_set'

Description

Saving BiVAECF and then loading same model leads to the following error when running inference: 'BiVAECF' object has no attribute 'train_set'

In which platform does it happen?

  • Ubuntu 18.04.5 LTS
  • Python 3.6.9
  • Cornac 1.10.0

How do we replicate the issue?

from random import random

from cornac.models import BiVAECF

from cornac.data import Dataset
import os

fake_users = list('abcdefghijklmnop')
fake_items = list('ABC')

fake_data = []
for user in fake_users:
    for item in fake_items:
        score = int(random() > 0.75)
        if score >= 1:
            fake_data.append((user, item, 1))


model1 = BiVAECF(n_epochs=100, verbose=True)

data = Dataset.build(fake_data)

trained_model = model1.fit(train_set=data)
print(trained_model.score(0))
trained_model.save('./testdir')
filename = list(os.listdir('./testdir/BiVAECF'))[0]
loaded_model = BiVAECF.load('./testdir/BiVAECF/'+filename)
print(loaded_model.score(0))
print(loaded_model.score(0) == trained_model.score(0))

yields:

 line 42, in <module>
    print(loaded_model.score(0))
  File "virtualenvs/recommender/lib/python3.6/site-packages/cornac/models/bivaecf/recom_bivaecf.py", line 213, in score
    if self.train_set.is_unk_user(user_idx):
AttributeError: 'BiVAECF' object has no attribute 'train_set'

Expected behavior (i.e. solution)

  1. loading and saving should work for all models (not just BiVAECF)
  2. train_set should be accessible in all models (not just BiVAECF). If I load another model, I need the train set to convert item and user IDs to internal indices, and this will be hard to keep consistent if I have to save it myself separately.

Other Comments

[ASK] Multithreading in BiVAE

Description

I have recently started to experiment with the BiVAE model. I noticed that the training of this model appears to be multithreaded when it is trained on CPU only. Could you please advise if there is a way to control or know the number of threads used by the model? Is there a default number of threads or a parameter that controls this? Thank you.
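A hedged note: BiVAECF is implemented in PyTorch, which on CPU parallelizes operations across a default thread pool. If that is indeed the source of the multithreading, the count can be capped with a PyTorch knob (not a Cornac parameter):

import torch

torch.set_num_threads(4)        # cap intra-op parallelism
print(torch.get_num_threads())  # inspect the current setting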

[BUG] "ValueError: could not convert string to float: 'e' " on train_data devoid of any alphabetical characters

Description

Good day, it is me again.

I have been attempting to build a BaseMethod() object using "train_data" and "test_data" I split myself, so that I may then include it in an experiment.
Both "train_data" and "test_data" dataframes are in the "UIR" format, and contain exclusively numerical values in all the columns.

To build said BaseMethod() object, I am using the class method from_splits() and providing my train/test dataframes.

Despite my best attempts, I keep getting the same error: "ValueError: could not convert string to float: 'e' "

I have tried:

  • Looking for rows containing 'e' (there are none, dataset is clean - I use the base Filmtrust dataset provided in Cornac and only perform the split myself),
  • Converting ratings column to float beforehand, even though I couldn't locate any faulty cell,
  • Switching from class method from_splits() to build() in an attempt to emulate the end of the RatioSplit() class, but the same error appears.

Thank you very much in advance for any help or advice you could provide.

Mae

In which platform does it happen?

Ubuntu 20.04.1 LTS

How do we replicate the issue?

  1. Load ~/.cornac/filmtrust/ratings.txt with pandas' read_csv.

  2. Split train & test data using this function:

def improved_split_train_test(df, ratio=0.8, timestamp=None, user_id=None):
    # If no timestamp or no user_id, do a classic random split
    if timestamp is None or user_id is None:
        df['_is_train'] = np.random.uniform(0, 1, len(df)) <= ratio
    else:
        # Sort timestamps in ascending order then group by user_id
        df.sort_values([timestamp], ascending=True, inplace=True)
        grouped_df = df.groupby([user_id])
        # Compute rank over timestamp within user_id groups
        df_rank = grouped_df[timestamp].rank()
        # Compute size of user_id groups
        df_size = grouped_df.size().to_frame('_size')
        # Add _rank and _size columns to the original dataframe
        df['_rank'] = df_rank
        df = df.join(df_size, on=user_id, how='left')
        # Select train rows where rank <= ratio * size
        df['_is_train'] = df['_rank'] <= ratio * df['_size']
        df.drop(['_rank', '_size'], axis=1, inplace=True)

    # Create train and test depending on the _is_train flag
    train = df[df['_is_train'] == True].drop(['_is_train'], axis=1).reset_index(drop=True)
    test = df[df['_is_train'] == False].drop(['_is_train'], axis=1).reset_index(drop=True)
    return train, test
  3. Build the split object through BaseMethod() (the error pops on the last line):

    split = eval_methods.BaseMethod()
    split.from_splits(train_data=train_data, test_data=test_data)

Expected behavior (i.e. solution)

The from_splits() class method should return a BaseMethod() object, ready for inputting in Experiment().


Edit 1: I investigated the problem a bit more, and found that when the "rating" column has a value of 'e', it comes from cornac/data/dataset.py, line 357.

When printing the values at that stage of execution:

idx = 0
uid = 'u'
iid = 's'

So we have: 0, 'u', 's', 'e' ....
It's probably the header: "0 user_id item_id rating" that wasn't properly skipped, and is now treated as actual data.

I don't know where I can indicate to skip that first line though.


Edit2: I checked what the values were further down the loop:

idx = 1
uid = 'i'
iid = 't'
rating = 'e'

then:

idx = 2
uid = 'r'
iid = 'a'
rating = 't'

Finally, when idx == 3 it shows up empty.

It's as if the dataframe has been pivoted to the side. Really strange!

[ASK] Save and load a Cornac model

Description

I'd like some help on how to save a trained Cornac model and load it for subsequent ratings/rankings. The base recommender class does not appear to have methods to save and load models. How is this achieved in the Cornac framework?

Also if the underlying implementations use Tensorflow it would be useful to save the summaries so we can load and analyze in Tensorboard.

Other Comments
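A minimal sketch of the save/load flow shown elsewhere on this page (e.g., in the Model serving section); save_dir is a placeholder path, and loading the latest file from a directory is an assumption about recent Cornac versions:

bpr.save("save_dir", save_trainset=True)

from cornac.models import BPR

loaded = BPR.load("save_dir/BPR")   # assumption: loads the latest saved model
scores = loaded.score(user_idx=0)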

[FEATURE] [ASK] Manual input for training-testing and maintenance in local directory

Hi there,
First of all, let me applaud your work. It is beautiful in all its forms.
I just wanted to post (or consult) on the feasibility of a proposal.

Description

Training with a user-designed partition, testing on 'another' test database, and being able to receive and save the results for the analyzed test set.

Expected behavior with the suggested feature

I imagine something like this:

As the datasets are downloadable in Cornac format, the complete flow should look something like this:
from cornac.eval_methods import RatioSplit

train_set = cornac.data.Dataset(num_users = int, num_items = int, uid_map = OrderDict_train, iid_map = OrderDict_train, uir_tuple = tuple, ...)
val_set = cornac.data.Dataset(num_users = int, num_items = int, uid_map = OrderDict_val, iid_map = OrderDict_val, uir_tuple = tuple, ...)
test_set = cornac.data.Dataset(num_users = int, num_items = int, uid_map = OrderDict_test, iid_map = OrderDict_test, uir_tuple = tuple, ...)

rs = RatioSplit(..., data_train = train_set, data_validation = val_set, data_test = test_set,... , verbose=True)

...

cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True, get_test_set = True, dir_to_save_test = <dir_local>).run()

So that the final output looks something like this
| MAE | RMSE | AUC | MAP | NDCG@10 | Precision@10 | Recall@10 | Train (s) | Test (s)
VAECF | 2.5927 | 2.7825 | 0.9133 | 0.1054 | 0.1258 | 0.0913 | 0.1264 | 5.9393 | 7.8235
test set saved on <dir_local>

Other Comments

The advantage of this would be that the user, in addition to being able to keep and view the test set, would also get the additional results given by the evaluation preset by the cornac.Experiment command.

Is this possible by any chance?

Thanks again for your work. Very nice.
Have a nice day.
Best regards, Mirko

[BUG] The user graph is not updated correctly when using cross-validation

Description

I am trying to add a recommender model that uses a social network. I need to access the 'trustees' for each 'trustor' in the training set, so I wrote the following:

      # for each user in train_set
      for i in train_set.user_indices:
          S_i = train_set.user_graph.batch(i).data  # IndexError: row index (1484) out of range

Then an error occurred in the place shown in the comment:

Traceback (most recent call last):
File "D:/gitRepo/OLADSR(refactored)/test.py", line 97, in
eval_method=cv_split, models=[rs_dsr, rs_sorec], metrics=[rmse, ndcgs, pre, rec]
File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\experiment\experiment.py", line 130, in run
show_validation=self.show_validation,
File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\eval_methods\cross_validation.py", line 136, in evaluate
self, new_model, metrics, user_based, show_validation=False
File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\eval_methods\base_method.py", line 584, in evaluate
model.fit(self.train_set, self.val_set)
File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\hyperopt.py", line 146, in fit
model = self.model.clone(params).fit(train_set, val_set)
File "D:\gitRepo\OLADSR(refactored)\models\dsr\recom_dsr.py", line 84, in fit
self.maxItr2)
File "D:\gitRepo\OLADSR(refactored)\models\dsr\recom_dsr.py", line 141, in dcd_b
S_i = train_set.user_graph.batch(i).data
File "D:\Anaconda3\envs\DSR\lib\site-packages\cornac\data\graph.py", line 148, in batch
return self.matrix[batch_ids]
File "D:\Anaconda3\envs\DSR\lib\site-packages\scipy\sparse_index.py", line 35, in getitem
row, col = self._validate_indices(key)
File "D:\Anaconda3\envs\DSR\lib\site-packages\scipy\sparse_index.py", line 135, in _validate_indices
raise IndexError('row index (%d) out of range' % row)
IndexError: row index (1484) out of range

I set a breakpoint at the location where the error occurred and found that the size of the train_set.user_graph.matrix is smaller than train_set.num_users when the error occurred.

I don't know if my code is wrong or if this is a bug in Cornac, but the error only appears in the last few folds of cross-validation; the first fold usually works correctly. So I think user_graph may not be updated correctly in the later folds.

In which platform does it happen?

OS: Windows 10
python:3.6
Cornac: 1.4.1

How do we replicate the issue?

Run the script below:

from cornac.exception import ScoreException
from cornac.models import Recommender
import numpy as np
import cornac.hyperopt


class Test(Recommender):

  def monitor_value(self):
      pass

  def __init__(
          self,
          name="Test",
          trainable=True,
          verbose=False,
          seed=None):
      Recommender.__init__(self, name=name, trainable=trainable, verbose=verbose)

      self.seed = seed
      self.U, self.V = None, None

  def fit(self, train_set: cornac.data.dataset.Dataset, val_set=None):
      Recommender.fit(self, train_set, val_set)
      num_users = train_set.num_users
      num_items = train_set.num_items
      self.U = np.random.rand(8, num_users)
      self.V = np.random.rand(8, num_items)

      # for each user
      for i in train_set.user_indices:
          S_i = train_set.user_graph.batch(i).data  # index out of range here

      return self

  def score(self, user_id, item_id=None):
      if item_id is None:
          if self.train_set.is_unk_user(user_id):
              raise ScoreException(
                  "Can't make score prediction for (user_id=%d)" % user_id
              )

          known_item_scores = self.V.T.dot(self.U[:, user_id])
          return known_item_scores
      else:
          if self.train_set.is_unk_user(user_id) or self.train_set.is_unk_item(
                  item_id
          ):
              raise ScoreException(
                  "Can't make score prediction for (user_id=%d, item_id=%d)"
                  % (user_id, item_id)
              )

          user_pred = self.V[:, item_id].dot(self.U[:, user_id])
          return user_pred


if __name__ == "__main__":
  from cornac.data import GraphModality
  from cornac.eval_methods import CrossValidation
  from cornac.experiment import Experiment
  from cornac import metrics
  from cornac.datasets import filmtrust

  ratings = filmtrust.load_feedback()
  trust = filmtrust.load_trust()

  user_graph_modality = GraphModality(data=trust)

  cv_split = CrossValidation(
      data=ratings,
      n_folds=5,
      exclude_unknowns=True,
      rating_threshold=0.0,
      user_graph=user_graph_modality,
      verbose=True,
  )

  # Instantiate the Test model
  test = Test()

  # Evaluation metrics
  ndcg = metrics.NDCG(k=-1)
  rmse = metrics.RMSE()
  rec = metrics.Recall(k=20)
  pre = metrics.Precision(k=20)

  # Put everything together into an experiment and run it
  Experiment(
      eval_method=cv_split, models=[test], metrics=[rmse, ndcg, pre, rec]
  ).run()

Expected behavior (i.e. solution)

If this is a bug, please fix it; if not, please tell me how to modify my code to achieve this purpose.

Other Comments

ValueError: could not convert string to float: 'e'

Hello,

When importing a default dataset using Cornac, I faced this issue:

citeu = cornac.datasets.citeulike.load_feedback()
simple_headers = ["UserId", "ItemId", "Rating"]
ds_df = pd.DataFrame(citeu, columns=simple_headers)
print(ds_df.head())
rs = RatioSplit(data=ds_df, test_size=0.2, rating_threshold=1.0, seed=123)
Traceback (most recent call last):
  File "/home/yas/PycharmProjects/cornac/yd/main_split_DS.py", line 87, in <module>
    rs = RatioSplit(data=ds_df, test_size=0.2, rating_threshold=1.0, seed=123)
  File "/home/yas/PycharmProjects/cornac/cornac/eval_methods/ratio_split.py", line 61, in __init__
    self._split()
  File "/home/yas/PycharmProjects/cornac/cornac/eval_methods/ratio_split.py", line 105, in _split
    self.build(train_data=train_data, test_data=test_data, val_data=val_data)
  File "/home/yas/PycharmProjects/cornac/cornac/eval_methods/base_method.py", line 584, in build
    self._build_datasets(train_data, test_data, val_data)
  File "/home/yas/PycharmProjects/cornac/cornac/eval_methods/base_method.py", line 462, in _build_datasets
    self.train_set = Dataset.build(
  File "/home/yas/PycharmProjects/cornac/cornac/data/dataset.py", line 357, in build
    r_values.append(float(rating))
ValueError: could not convert string to float: 'e'
Other Comments

I converted the data to a pandas DataFrame; if possible, keeping that format would be preferred. Thanks for any suggestion on how to handle this issue smartly.

Another related question: it seems most of the default datasets in Cornac are rating-based. Could you point to a few implicit-feedback datasets we could make use of? Thanks.

[ASK] I'm Mac user, and I cannot install?

Description

Hello, I'm using an Intel Mac (Big Sur) and I get an error when installing Cornac:

(screenshot)

What I did:

brew install llvm

pip install cornac
(since I use Python 3, pip maps to pip3 by default)

Other Comments

Any idea?

Fit a model using modalities

Fit using modalities

Hello,

First of all, I can't thank you enough for your efforts with Cornac, it is a really powerful tool.
I'm trying to implement PCRL using Cornac. The experiments work as shown in the examples, but I can't fit the model directly. There seems to be a lack of documentation on this; I can't find how to do it on any page. Here's the issue:

(screenshot)

I hope you can help me.

my very best,

Miguel Ângelo Rebelo
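A hedged sketch of one way to fit a modality-based model like PCRL outside of an Experiment, mirroring the pattern of the graph-based examples; contexts (item-item triplets), ratings, and the hyperparameter values are placeholders:

from cornac.data import GraphModality
from cornac.eval_methods import RatioSplit
from cornac.models import PCRL

# Attach the auxiliary data as a modality on the eval method, then fit the
# model on the resulting train_set.
item_graph = GraphModality(data=contexts)
rs = RatioSplit(data=ratings, test_size=0.2, item_graph=item_graph, seed=123)

pcrl = PCRL(k=100, z_dims=[300], max_iter=300)
pcrl.fit(rs.train_set)
scores = pcrl.score(user_idx=0)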

[ASK] get a average or a specific value of results

Description

Hi, when I use Cornac to obtain results for different models as in the example you've provided, how can I get a specific result such as the MAE of MF? In other words, if I run an experiment 10 times, how can I get the average values for the different models? Thanks for your help.

(screenshot)

Other Comments
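A hedged sketch, assuming the result object exposes metric_avg_results (metric name to average value); rs and mf are placeholders from the quick-start example:

import numpy as np

mf_maes = []
for _ in range(10):
    exp = cornac.Experiment(eval_method=rs, models=[mf],
                            metrics=[cornac.metrics.MAE()])
    exp.run()
    mf_maes.append(exp.result[0].metric_avg_results["MAE"])

print("Average MAE of MF over 10 runs:", np.mean(mf_maes))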
