srendle / libfm
Library for factorization machines
License: GNU General Public License v3.0
I read the code; maybe I can make good use of it.
Can it be used from Python?
While trying to save a model trained with SGD, the following error is raised:
../bin/libFM -task c -train sampleRatingData.dat.libfm -test sampleRatingData.dat.libfm -dim '1,1,8' -out sample.res -rlog sample.log -method sgd -learn_rate 0.001 -regular '0,0,0.001' -save_model model.fm
----------------------------------------------------------------------------
libFM
Version: 1.4.2
Author: Steffen Rendle, [email protected]
WWW: http://www.libfm.org/
This program comes with ABSOLUTELY NO WARRANTY; for details see license.txt.
This is free software, and you are welcome to redistribute it under certain
conditions; for details see license.txt.
----------------------------------------------------------------------------
ERROR: the parameter save_model does not exist
Some people claim an SGD-trained model can be saved with the -save_model
flag. How can this be fixed?
Hello! I've recently been reading the libFM source code, and I don't quite understand part of the adaptive-regularization FM code; it doesn't seem to match the theory. In the sgd_theta_step function, the weights for iteration t are updated, so why are the weights updated again when computing the p value for iteration t+1 in predict_scaled? Please advise!
Hi,
I'm using the kitty terminal emulator and get this warning when opening pcmanfm:
** (pcmanfm:746350): WARNING **: 17:58:59.842: terminal kitty isn't known, consider report it to LibFM developers
So I just wanted to report this to you
I can only find rating.dat, movies.dat, and ratings.dat (or movies.csv and rating.csv) on MovieLens.
Where can I find the train and test data?
This:
** (pcmanfm:14792): WARNING **: 06:47:51.518: terminal alacritty isn't known, consider report it to LibFM developers
So I did...
In _evaluate_class, the log loss is computed with base 10; it should use base e (the natural log).
I.e.:
_loglikelihood -= m * log10(pll) + (1 - m) * log10(1 - pll);
should be:
_loglikelihood -= m * log(pll) + (1 - m) * log(1 - pll);
Hello, I'm new to libFM; it's a great tool.
I used MCMC to train a CTR model and ran into two problems.
I hope to receive a reply.
Thanks!
According to the docs:
For binary classification, cases with y > 0 are regarded as the positive class and with y ≤ 0 as the negative class.
But if a target of 0 enters the loss function for negative cases, then you don't learn anything, because the gradient is always 0:
} else if (task == 1) {
grad_loss = target * ( (1.0/(1.0+exp(-target*p))) - 1.0);
It seems like target needs to be normalized in the classification case, but I don't see anywhere in the code where that'd be happening. (Note that I haven't actually run the code to prove it's misbehaving, but am just reading it and this didn't make sense to me. Did I miss something?)
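For context, the gradient line above is the standard derivative of the logistic loss -log(sigmoid(y*p)) for targets y in {-1, +1}. A small Python sketch (illustrative only; whether libFM remaps 0/1 targets to ±1 internally is exactly the open question here):

```python
import math

def grad_logistic(y, p):
    """Mirrors grad_loss = target * ((1/(1+exp(-target*p))) - 1) above."""
    return y * (1.0 / (1.0 + math.exp(-y * p)) - 1.0)

# With a raw target of 0 the gradient vanishes, as the report observes:
assert grad_logistic(0, 1.5) == 0.0
# Remapping y <= 0 to -1 (the convention this loss formula expects)
# gives a usable, non-zero gradient for negative examples:
assert grad_logistic(-1, 1.5) != 0.0
```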
fm_learn_sgd_element.h and fm_learn_sgd_element_adapt_reg.h have code like if (task == 0) {
fm_learn_sgd.h does this in a much cleaner way: if (task == TASK_REGRESSION) {
These enum constants should be used throughout.
I have data with about a million rows and 3 columns, each of a different datatype: NumberOfFollowers is numerical, UserName is categorical, and Embeddings is of categorical-set type.
df:
Index NumberOfFollowers UserName Embeddings Target Variable
0 15 name1 [0.5 0.3 0.2] 0
1 4 name2 [0.4 0.2 0.4] 1
2 8 name3 [0.5 0.5 0.0] 0
3 10 name1 [0.1 0.0 0.9] 0
... ... .... ... ..
I would like to convert this data into the LibSVM input format.
Desired Output:
0 0:15 4:1 1:0.5 2:0.3 3:0.2
1 0:4 5:1 1:0.4 2:0.2 3:0.4
0 0:8 6:1 1:0.5 2:0.5 3:0.0
0 0:10 4:1 1:0.1 2:0.0 3:0.9
...
The Perl script https://github.com/srendle/libfm/blob/master/scripts/triple_format_to_libfm.pl handles categorical values. But how does one handle a mixture of data types, as also described in this paper: https://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle_et_al2011-Context_Aware.pdf
Can this problem be solved using libFM, or do I have to use external tools? If so, are you aware of any external tools that perform this operation on very-large-scale data (as I have many columns of mixed data types)?
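A minimal Python sketch of the conversion, assuming the index layout from the desired output above (index 0 = NumberOfFollowers, 1-3 = the embedding, one-hot user indices from 4 upward); the function name and the sorted-index convention are my own choices, not part of libFM:

```python
def row_to_libsvm(target, followers, embedding, user, user_index):
    """Encode one row: numeric feature at index 0, embedding at 1..len,
    and a one-hot user indicator after the embedding block."""
    feats = [(0, followers)]
    feats += [(1 + i, v) for i, v in enumerate(embedding)]
    feats.append((1 + len(embedding) + user_index[user], 1))
    feats.sort()  # libSVM-style parsers usually expect ascending indices
    return str(target) + " " + " ".join(f"{i}:{v:g}" for i, v in feats)

user_index = {"name1": 0, "name2": 1, "name3": 2}  # built from the full dataset
print(row_to_libsvm(0, 15, [0.5, 0.3, 0.2], "name1", user_index))
# -> 0 0:15 1:0.5 2:0.3 3:0.2 4:1
```

Note that the indices come out in ascending order (the one-hot user feature 4:1 after the embedding columns), which most libSVM-format readers require.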
Thanks for a great package! Would it be possible to add custom loss functions (e.g. WARP: https://lyst.github.io/lightfm/docs/lightfm.html)?
I was testing libFM, and one of my tests involved running libFM with the same train and test dataset:
libFM -task c -train train.libfm -test train.libfm
This seems to work, but the intermediate performance values are different for the train and test set, while the data comes from the same file:
...
#Iter= 97 Train=0.530437 Test=0.530998 Test(ll)=0.299652
#Iter= 98 Train=0.528048 Test=0.530657 Test(ll)=0.299651
#Iter= 99 Train=0.52756 Test=0.530803 Test(ll)=0.299649
I would expect that the train and test performance are exactly the same. Is this an indication of a bug? Or do I misunderstand what is being logged here?
Hi,
how can I look up the weights of the FM variables after training?
Hi,
Wouldn't it be a good idea to make this a multithreaded solution?
The rating prediction for the train/test set significantly slows down the training process when the number of ratings grows large. This modification has little to no implementation complexity and would improve performance significantly.
That's just a suggestion.
Thank you,
André
PS: a parameter would be a good idea, so that multithreading can be switched on or off.
PS2: My pipeline currently does everything multithreaded; when I use libFM, it slows down and uses 1 core instead of 32. Unfortunately, I'm not familiar enough with the libFM source code to develop the modification myself.
Please enlighten me; I tried the simplest possible example:
File train.libfm
is set to
1 0:1 1:1
and ran it using
libFM -task r -method mcmc -train train.libfm -test train.libfm -iter 10 -dim '0,0,1' -out output.libfm -save_model model.libfm
Hence only the pairwise interactions should be used, and their dimension is 1. The regression shows a perfect fit (as expected). However, looking at model.libfm
gives me
#pairwise interactions Vj,f
0.0139959
0.711416
My expectation is that the first number times the second number (the pairwise interaction of the two features) should be 1 (the target of the regression), but it is always clearly something else. I tried the same trivial example with fastFM, and it behaved as expected.
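For reference, the full FM model equation y(x) = w0 + Σ_i w_i x_i + Σ_{i<j} <v_i, v_j> x_i x_j does reduce to the product of the two one-dimensional factors for this row; a Python sketch (illustrative, not libFM code) of what the saved numbers imply:

```python
def fm_predict(w0, w, V, x):
    """y(x) = w0 + sum_i w_i*x_i + sum_{i<j} dot(v_i, v_j)*x_i*x_j."""
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y

# With dim '0,0,1' (no bias, no linear weights, k = 1) and x = (1, 1),
# the prediction from the saved factors is exactly v0 * v1 (~0.00996), not 1:
pred = fm_predict(0.0, [0.0, 0.0], [[0.0139959], [0.711416]], [1, 1])
assert abs(pred - 0.0139959 * 0.711416) < 1e-12
```

One possible (unverified) explanation for the mismatch: MCMC averages its predictions over many posterior samples during the run, while the saved file holds only a single parameter draw, so the saved factors need not reproduce the averaged fit.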
All links pointing to http://www.inf.uni-konstanz.de/~rendle/ no longer work. Maybe it makes sense to move the PDFs to the project's page?
I implemented the MCMC algorithm from the paper "Factorization Machines with libFM" in Python, but it doesn't work. Are there any details I should pay attention to?
Add support for interaction levels higher than 2.
I am using calibre (calibre 3.3) and started it in a shell.
Once in a while I receive the following error:
** (pcmanfm:26810): WARNING **: terminal bash isn't known, consider report it to LibFM developers
/usr/bin/xdg-open: line 709: : command not found
This is the ls output so the file is there ...
% ls -l /usr/bin/xdg-open
-rwxr-xr-x 1 root root 22746 Jan 20 2017 /usr/bin/xdg-open
and bash is installed:
% which bash
/bin/bash
% ls -l /bin/bash
-rwxr-xr-x 1 root root 725872 Dec 8 2016 /bin/bash
But the default shell is not bash but csh:
% echo $SHELL
/bin/csh
Is there anything else you need to know?
Thank you for providing this open source implementation. When I ran libFM, it uses only one thread (100% of a CPU). Is this the intended behavior or is there way to utilize multiple threads?
Hi, I encountered a problem when compiling the source code on Windows 10 and can't figure it out. I can use version 1.4.0 if I drop the -save_model argument, but could you please upload the newest version of libFM compiled for Windows, so that I can use the full functionality? Thanks.
Hi,
I have prepared Train.x and Train.y files, after which I am trying to transpose the input matrix to obtain Train.xt. During this transpose operation, I encounter the following error:
Assertion failed: out_cache_col_num > 0, file tools\transpose.cpp, line 125
Any idea what this error means? Could you suggest what can be done?
Thanks,
Phani
Hi!
Is it somehow possible to adapt the algorithm to the case [User, User Features, Movie, Movie Features, Watched=1], where Y (Watched) is always 1 and we have neither another class nor other "marks" (as on a classic 1...5 scale)? Watched could be views, clicks, purchases, etc.
If it's not possible, or possible but requiring additional work (e.g. code modification), it would be nice to include this information in the documentation. If I remember correctly, one of Rendle's articles describes a tag-recommendation competition where such a code modification was applied.
Thanks, Artem.
The README of this project: https://github.com/jfloff/pywFM states:
Make sure you are compiling source from libfm repository and at this specific commit, since pywFM needs the save_model. Beware that the installers and source code in libfm.org are both dated before this commit. I know this is extremely hacky, but since a fix was deployed it only allows the save_model option for SGD or ALS. I don't know why exactly, because it was working well before.
It seems weird to me that the author hasn't approached you to find a better solution than this hack, and I'm not familiar enough with the code to suggest a PR that would solve the problem cleanly. Besides, there's currently no explicit explanation of why loading/saving is forbidden for methods other than SGD or ALS, so I don't know where I could address that.
Therefore, I'm making this issue to see if we could find a better solution than this! :-)
@srendle Could you explain what the problem with other models are, and why this check is in place?
@jfloff Could you tell us why pywFM needs to be able to save/load different kinds of models than SGD and ALS?
Thanks!
It would be great to have an option to save the predictive model after training. This way a trained model could be applied to a number of test sets without having to retrain.
I typed "./demo.sh" to run the demo, but it stopped at
iter tr_rmse va_rmse obj
0 2.4766 1.3686 3.2943e+04
1 1.1560 1.0859 9.0002e+03
2 0.9011 1.0493 6.3830e+03
3 0.8107 1.0281 5.5837e+03
4 0.7588 1.0191 5.1730e+03
5 0.7173 1.0100 4.8799e+03
6 0.6774 1.0121 4.6092e+03
7 0.6422 1.0095 4.3969e+03
8 0.6030 1.0082 4.1780e+03
9 0.5626 1.0114 3.9763e+03
Yesterday, the real-valued matrix factorization ran fine, but it stopped at binary_matrix.
Can you give me some advice?
The docs (http://www.libfm.org/libfm-1.42.manual.pdf) say at paragraph 4.3:
BS is only supported by MCMC and ALS/CD.
However, in the source, it seems it works just fine: https://github.com/srendle/libfm/blob/master/src/libfm/libfm.cpp#L189
I tracked this line down to commit b290ad8 "Commit of libFM 1.4.2", so it is really strange.
Dear all,
Hoping to get some insight into feature design here, and to check that my understanding is correct, as I am new to FMs.
In the original Factorization Machines paper in 2010, the "Other Movies Rated" feature contains normalised values for all the other movies the user has ever rated.
Let's use the user Alice in the example, and assume the example covers the training set. We see she's rated 3 movies: NH, TI, and SW. Since there are 3 movies, the "Other Movies Rated" columns have values of (0.3, 0.3, 0.3, 0...).
Say in my test set, Alice has rated ST (Star Trek) with a target of 1. In my "Other Movies Rated" columns in the test set, should I use (0.25, 0.25, 0.25, 0.25 ...), with the fourth value updated for Alice's rating of ST? Or should I use (0.3, 0.3, 0.3, 0...), similar to the training set?
Thanks in advance! Apologies if this question has been asked elsewhere, I haven't been able to find a conclusive answer.
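For concreteness, the training-set convention from the 2010 paper (each of the m other rated movies gets weight 1/m; the paper's figure shows 1/3 rounded to 0.3) can be sketched as follows. The function name and both variants are mine; which variant is correct at test time is exactly the question above.

```python
def other_movies_rated(rated, all_movies):
    """Equal-weight implicit-feedback vector: each rated movie gets 1/m, rest 0."""
    rated_set, m = set(rated), len(rated)
    return [1.0 / m if movie in rated_set else 0.0 for movie in all_movies]

movies = ["NH", "TI", "SW", "ST"]
# Training-set view of Alice, who has rated 3 movies:
assert other_movies_rated(["NH", "TI", "SW"], movies) == [1/3, 1/3, 1/3, 0.0]
# Test-time variant that also counts the new rating of ST:
assert other_movies_rated(["NH", "TI", "SW", "ST"], movies) == [0.25] * 4
```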
I suggest adopting a higher-level build system in place of the current small makefiles, so that powerful checks for software features become easier.
I have a very basic query: is the factorization machine designed to work only with binary fields? Do we need to one-hot encode all features? How are real-valued features handled?
Thank you!
If the method given is als, the code changes the param_method value to mcmc, since ALS is MCMC without sampling and hyperparameter inference (file: libfm.cpp, line 123).
When saving the model, it checks that the method is either 'sgd' or 'als'; but since param_method has been changed to 'mcmc', the model file is never saved.