irecsys / deepcarskit

A Deep Learning Based Context-Aware Recommendation Library

License: MIT License


collaborative-filtering context-aware context-aware-recommender-system deep-learning neural-collaborative-filtering neural-network pytorch recommender-system deep-recommender-system

deepcarskit's Issues

Error when running run.py script

I cloned the repository to test the code and followed the instructions in the README file, but running run.py fails with the error shown below. (I cloned the repo one day before posting this issue, so you can check out that exact version to reproduce the error.)

Steps to reproduce the error

python3 -m venv cars
git clone https://github.com/irecsys/DeepCARSKit.git
cd DeepCARSKit
pip3 install -r requirements.txt
python3 run.py
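The steps above create a virtual environment named cars but do not show it being activated, so pip3 and python3 may have used a different interpreter than intended. The listing may simply have omitted the activation step; this is an assumption, not something the report confirms. A minimal check, runnable with the same python3 that launches run.py:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints False, the packages from requirements.txt were installed outside the cars environment.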

The error appears at the end of the output below.

Additional information about the hardware and software
Software

OS = Rocky Linux 8.5 (Green Obsidian)
Python = 3.9.9

Hardware

CUDA = 11.6
GPU = NVIDIA A2

Error

GPU availability:  True
Num of GPU:  1
NVIDIA A2
Current GPU index:  0

18 Feb 12:52    INFO  
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 2022
state = INFO
reproducibility = True
data_path = dataset/tripadvisor
checkpoint_dir = saved
show_progress = False
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 50
train_batch_size = 500
learner = adam
learning_rate = 0.01
train_neg_sample_args = {'distribution': 'none', 'sample_num': 'none', 'alpha': 'none', 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'CV': 5}, 'group_by': 'user', 'mode': 'labeled', 'order': 'RO'}
repeatable = False
metrics = ['MAE', 'RMSE', 'AUC']
topk = [10, 20, 30]
valid_metric = MAE
valid_metric_bigger = False
eval_batch_size = 409600
metric_decimal_place = 4

Dataset Hyper Parameters:
field_separator = ,
seq_separator =  
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = {'rating': 0}
NEG_PREFIX = neg_
load_col = None
unload_col = None
unused_col = None
additional_feat_suffix = None
rm_dup_inter = None
val_interval = None
filter_inter_by_user_or_item = True
user_inter_num_interval = [0,inf)
item_inter_num_interval = [0,inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = None

Other Hyper Parameters: 
worker = 0
wandb_project = recbole
shuffle = True
require_pow = False
enable_amp = False
enable_scaler = False
transform = None
numerical_features = []
discretization = None
kg_reverse_r = False
entity_kg_num_interval = [0,inf)
relation_kg_num_interval = [0,inf)
MODEL_TYPE = ModelType.CONTEXT
CONTEXT_SITUATION_FIELD = contexts
USER_CONTEXT_FIELD = uc_id
neg_sampling = None
mf_embedding_size = 64
mlp_embedding_size = 64
mlp_hidden_size = [128, 64, 32]
dropout_prob = 0.1
mf_train = True
mlp_train = True
embedding_size = 64
ranking = False
sigmoid = False
ranking_valid_metric = Recall@10
ranking_metrics = ['Precision', 'Recall', 'NDCG', 'MRR', 'MAP']
err_valid_metric = MAE
err_metrics = ['MAE', 'RMSE', 'AUC']
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.VALUE
single_spec = True
local_rank = 0
device = cuda
eval_neg_sample_args = {'distribution': 'none', 'sample_num': 'none'}


18 Feb 12:52    INFO  tripadvisor
The number of users: 2372
Average actions of users: 5.978490088570224
The number of items: 2270
Average actions of items: 6.24724548259145
The number of inters: 14175
The sparsity of the dataset: 99.73674142529214%
Remain Fields: ['user_id', 'item_id', 'rating', 'trip', 'contexts', 'uc_id']
Context dimension - trip: 6 values: : ['BUSINESS' 'COUPLES' 'FAMILY' 'FRIENDS' 'SOLO' '[PAD]']
Traceback (most recent call last):
  File "/scratch/apeddi/DeepCARSKit/run.py", line 32, in <module>
    run(config_file_list=config_list)
  File "/scratch/apeddi/DeepCARSKit/deepcarskit/quick_start/quick_start.py", line 96, in run
    train_data, valid_data = data_preparation(config, dataset)
  File "/scratch/apeddi/DeepCARSKit/deepcarskit/data/utils.py", line 132, in data_preparation
    train_sampler, valid_sampler = create_samplers(config, dataset, built_datasets[fold])
  File "/scratch/apeddi/DeepCARSKit/deepcarskit/data/utils.py", line 301, in create_samplers
    if train_neg_sample_args['strategy'] != 'none':
KeyError: 'strategy'

@irecsys Could you please help me in resolving this error?
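The log and the traceback disagree about the shape of train_neg_sample_args: the printed dictionary has no 'strategy' key, while utils.py line 301 indexes it directly. The sketch below reproduces that lookup with the dictionary copied from the log and shows one defensive alternative. The helper name negative_sampling_enabled and the fallback to 'distribution' are assumptions made for illustration; they are not taken from the DeepCARSKit or RecBole source.

```python
# Dictionary copied from the "Training Hyper Parameters" section of the log.
train_neg_sample_args = {
    "distribution": "none",
    "sample_num": "none",
    "alpha": "none",
    "dynamic": False,
    "candidate_num": 0,
}


def negative_sampling_enabled(args: dict) -> bool:
    """Hypothetical helper that tolerates both dictionary layouts."""
    # Layout that contains 'strategy' (what utils.py line 301 expects).
    if "strategy" in args:
        return args["strategy"] != "none"
    # Layout without it (what the log shows): assume 'distribution'
    # carries the same "none" sentinel.
    return args.get("distribution", "none") != "none"


# Direct indexing fails exactly as in the traceback.
try:
    train_neg_sample_args["strategy"]
except KeyError as exc:
    print("reproduced:", repr(exc))

print("negative sampling enabled:", negative_sampling_enabled(train_neg_sample_args))
```

A missing key of this kind often means the installed dependency builds the dictionary differently from what the calling code was written against, but the thread does not confirm the cause.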

Getting an error while running run.py

KeyError: 'strategy'
Traceback (most recent call last):
  File "/home/user/DeepCARSKit/run.py", line 32, in <module>
    run(config_file_list=config_list)
  File "/home/user/DeepCARSKit/deepcarskit/quick_start/quick_start.py", line 96, in run
    train_data, valid_data = data_preparation(config, dataset)
  File "/home/user/DeepCARSKit/deepcarskit/data/utils.py", line 132, in data_preparation
    train_sampler, valid_sampler = create_samplers(config, dataset, built_datasets[fold])
  File "/home/user/DeepCARSKit/deepcarskit/data/utils.py", line 301, in create_samplers
    if train_neg_sample_args['strategy'] != 'none':
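Both reports fail at the same line, so it may help to compare the installed package versions against whatever the repository's requirements.txt or README specifies. That a version mismatch is the cause is an assumption; neither report confirms it. A small sketch using only the standard library (the package names queried are the distribution names as published on PyPI):

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions most relevant to the traceback.
for name in ("recbole", "torch"):
    print(name, installed_version(name))
```

Including this output in a bug report makes it easier for maintainers to reproduce the environment.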

Issue with ranking-based measures

Hello. I need ranking-based evaluation metrics, but when I set ranking: True in the config file, the run fails with errors. My config file is below; please help.
field_separator: ","
seq_separator: " "

gpu_id: 0
use_gpu: True
show_progress: False
save_dataset: False
save_dataloaders: False

############### data setting ###############
seed: 2022
dataset: depaulmovie

# define data_path as the parent directory of your data folder
data_path: d:\dataset\

USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating
CONTEXT_SITUATION_FIELD: contexts
USER_CONTEXT_FIELD: uc_id

# note: you can use either load or unload, cannot use them both
# load_col is used to load specific columns; unload_col is used to ignore selected columns
# set "load_col: ~", if you want to load all cols
# load_col: {'inter': ['user_id','item_id','rating','contexts','uc_id']}
# unload_col: {'inter': ['contexts']}

# by default, we load all cols, unless there are some special requirements
load_col: ~
#load_col: {'inter': ['user_id','item_id','rating','contexts','uc_id']} # Add 'time' if it's needed

# used for topN ranking only
LABEL_FIELD: label
threshold:
  rating: 0

# the current library does not support negative sampling
neg_sampling: ~

############### model setting ###############
model: NeuCMFii

# General model
epochs: 50
train_batch_size: 5000
eval_batch_size: 409600
learner: adam
# learner: adam, RMSprop
stopping_step: 10
clip_grad_norm: ~
# clip_grad_norm: {'max_norm': 5, 'norm_type': 2}
weight_decay: 0.0

# NeuCF models
mf_embedding_size: 64
mlp_embedding_size: 64
mlp_hidden_size: [128,64,32]
learning_rate: 0.01
dropout_prob: 0.1

#tf_train: True
mf_train: True
mlp_train: True

# FM models
embedding_size: 64
#mlp_hidden_size: [128,64,32]
#learning_rate: 0.01
#dropout_prob: 0.3

############### Evaluation setting ###############
eval_args:
  # split: {'RS': [0.8, 0.2]} # hold-out evaluation
  split: {'CV': 5} # N-fold cross validation
  group_by: user
  mode: labeled # do not change it, DeepCARSKit only support this mode
  order: RO

# indicate the task is ranking or rating prediction
# evaluation metrics automatically selected based on True/False setting here
ranking: True

# indicate activation function for ranking task
# LeakyReLu is the default activation function for both ranking or rating prediction
Sigmoid : True

# define metrics for ranking and rating prediction tasks
ranking_valid_metric: Recall
ranking_metrics: ['Precision','Recall','NDCG','MRR','MAP']
topk: [5,10,20]

err_valid_metric: MAE
err_metrics: ['MAE','RMSE','AUC']

############### Output setting ###############
loss_decimal_place: 4
metric_decimal_place: 4

############### Negative Sampling setting ###############
train_neg_sample_args:
  strategy: 'full' # Choose a strategy (e.g., 'none', 'by', 'full')
  distribution: 'uniform' # Negative sampling distribution (optional)
eval_neg_sample_args:
  strategy: 'full' # Choose a strategy (e.g., 'none', 'by', 'full')
  distribution: 'uniform'
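One detail in this config that may be worth checking: the key is written as Sigmoid, while the run log in the first issue prints the parameter as sigmoid. YAML keys and Python dictionary lookups are case-sensitive, so the capitalised key would be stored as a separate entry. Whether this relates to the reported errors is unknown, since the error message was not posted. The sketch below flags keys that differ from an expected name only by case; the key lists are copied by hand from the log and the config, and the helper name case_mismatches is hypothetical.

```python
# Parameter names as printed in the run log of the first issue.
known_keys = {"ranking", "sigmoid", "ranking_valid_metric", "ranking_metrics", "topk"}

# The corresponding keys as they appear in the config file above.
config_keys = ["ranking", "Sigmoid", "ranking_valid_metric", "ranking_metrics", "topk"]


def case_mismatches(keys, known):
    """Return (given, expected) pairs for keys that differ only by letter case."""
    lowered = {k.lower(): k for k in known}
    return [
        (k, lowered[k.lower()])
        for k in keys
        if k not in known and k.lower() in lowered
    ]


print(case_mismatches(config_keys, known_keys))
```

The config also sets train_neg_sample_args and eval_neg_sample_args even though its own comment states that the library does not support negative sampling, so those two blocks are another place to look.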

[ASK] Do you have any Jupyter Notebook tutorial?
