srendle / libfm
Library for factorization machines
License: GNU General Public License v3.0
I read the code; maybe I can make good use of it.
Can it be used from Python?
While trying to save a model trained with SGD, the following error is raised:
../bin/libFM -task c -train sampleRatingData.dat.libfm -test sampleRatingData.dat.libfm -dim '1,1,8' -out sample.res -rlog sample.log -method sgd -learn_rate 0.001 -regular '0,0,0.001' -save_model model.fm
----------------------------------------------------------------------------
libFM
Version: 1.4.2
Author: Steffen Rendle, [email protected]
WWW: http://www.libfm.org/
This program comes with ABSOLUTELY NO WARRANTY; for details see license.txt.
This is free software, and you are welcome to redistribute it under certain
conditions; for details see license.txt.
----------------------------------------------------------------------------
ERROR: the parameter save_model does not exist
Some people claim an SGD-trained model can be saved with the -save_model
flag. How can this be fixed?
Hello! I've recently been reading the libFM source code, and I don't quite understand part of the adaptive-regularization FM code; it doesn't seem to match the theory. In the sgd_theta_step function, the weights for iteration t are updated, so why are the weights updated again when computing the p value for iteration t+1 in predict_scaled? Please advise!
Hi,
I'm using the kitty terminal emulator and get this warning when opening pcmanfm:
** (pcmanfm:746350): WARNING **: 17:58:59.842: terminal kitty isn't known, consider report it to LibFM developers
So I just wanted to report this to you
I can only find rating.dat, movies.dat, and ratings.dat (or movies.csv and rating.csv) on MovieLens.
Where can I find the train and test data?
This:
** (pcmanfm:14792): WARNING **: 06:47:51.518: terminal alacritty isn't known, consider report it to LibFM developers
So I did...
In _evaluate_class, the log loss is computed with base 10; it should use base e (the natural log).
I.e.:
_loglikelihood -= m * log10(pll) + (1 - m) * log10(1 - pll);
should be:
_loglikelihood -= m * log(pll) + (1 - m) * log(1 - pll);
Hello, I'm new to libFM; it's a great tool.
I used MCMC to train a CTR model and ran into two problems.
I hope to receive a reply.
Thanks!
According to the docs:
For binary classification, cases with y > 0 are regarded as the positive class and with y ≤ 0 as the negative class.
But if a target of 0 enters the loss function for negative cases, then you don't learn anything, because the gradient is always 0:
} else if (task == 1) {
grad_loss = target * ( (1.0/(1.0+exp(-target*p))) - 1.0);
It seems like target needs to be normalized in the classification case, but I don't see anywhere in the code where that'd be happening. (Note that I haven't actually run the code to prove it's misbehaving, but am just reading it and this didn't make sense to me. Did I miss something?)
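For context, the gradient line above is the standard derivative of the logistic loss -log(sigmoid(y*p)) for targets y in {-1, +1}. A small Python sketch (illustrative only; whether libFM remaps 0/1 targets to ±1 internally is exactly the open question here):

```python
import math

def grad_logistic(y, p):
    """Mirrors grad_loss = target * ((1/(1+exp(-target*p))) - 1) above."""
    return y * (1.0 / (1.0 + math.exp(-y * p)) - 1.0)

# With a raw target of 0 the gradient vanishes, as the report observes:
assert grad_logistic(0, 1.5) == 0.0
# Remapping y <= 0 to -1 (the convention this loss formula expects)
# gives a usable, non-zero gradient for negative examples:
assert grad_logistic(-1, 1.5) != 0.0
```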
fm_learn_sgd_element.h and fm_learn_sgd_element_adapt_reg.h have code like if (task == 0) {
fm_learn_sgd.h does this in a much cleaner way: if (task == TASK_REGRESSION) {
These enum constants should be used throughout.
I have data with about a million rows and 3 columns, each of a different datatype: NumberOfFollowers is numerical, UserName is categorical, and Embeddings is of categorical-set type.
df:
Index NumberOfFollowers UserName Embeddings Target Variable
0 15 name1 [0.5 0.3 0.2] 0
1 4 name2 [0.4 0.2 0.4] 1
2 8 name3 [0.5 0.5 0.0] 0
3 10 name1 [0.1 0.0 0.9] 0
... ... .... ... ..
I would like to convert this data into the LibSVM input format.
Desired Output:
0 0:15 4:1 1:0.5 2:0.3 3:0.2
1 0:4 5:1 1:0.4 2:0.2 3:0.4
0 0:8 6:1 1:0.5 2:0.5 3:0.0
0 0:10 4:1 1:0.1 2:0.0 3:0.9
...
The Perl script https://github.com/srendle/libfm/blob/master/scripts/triple_format_to_libfm.pl handles categorical values. But how does one handle a mixture of data types, as also described in this paper: https://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle_et_al2011-Context_Aware.pdf
Can this problem be solved using libFM, or do I have to use external tools? If so, are you aware of any external tools that perform this operation on very-large-scale data (as I have many columns of mixed data types)?
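A minimal Python sketch of the conversion, assuming the index layout from the desired output above (index 0 = NumberOfFollowers, 1-3 = the embedding, one-hot user indices from 4 upward); the function name and the sorted-index convention are my own choices, not part of libFM:

```python
def row_to_libsvm(target, followers, embedding, user, user_index):
    """Encode one row: numeric feature at index 0, embedding at 1..len,
    and a one-hot user indicator after the embedding block."""
    feats = [(0, followers)]
    feats += [(1 + i, v) for i, v in enumerate(embedding)]
    feats.append((1 + len(embedding) + user_index[user], 1))
    feats.sort()  # libSVM-style parsers usually expect ascending indices
    return str(target) + " " + " ".join(f"{i}:{v:g}" for i, v in feats)

user_index = {"name1": 0, "name2": 1, "name3": 2}  # built from the full dataset
print(row_to_libsvm(0, 15, [0.5, 0.3, 0.2], "name1", user_index))
# -> 0 0:15 1:0.5 2:0.3 3:0.2 4:1
```

Note that the indices come out in ascending order (the one-hot user feature 4:1 after the embedding columns), which most libSVM-format readers require.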
Thanks for a great package! Would it be possible to add custom loss functions (e.g. WARP: https://lyst.github.io/lightfm/docs/lightfm.html)?
I was testing libFM, and one of my tests involved running libFM with the same train and test dataset:
libFM -task c -train train.libfm -test train.libfm
This seems to work, but the intermediate performance values are different for the train and test set, while the data comes from the same file:
...
#Iter= 97 Train=0.530437 Test=0.530998 Test(ll)=0.299652
#Iter= 98 Train=0.528048 Test=0.530657 Test(ll)=0.299651
#Iter= 99 Train=0.52756 Test=0.530803 Test(ll)=0.299649
I would expect that the train and test performance are exactly the same. Is this an indication of a bug? Or do I misunderstand what is being logged here?
Hi,
how can I look up the weights of the FM variables after training?
Hi,
Wouldn't it be a good idea to make this a multithreaded solution?
The rating prediction for the train/test set significantly slows down the training process when the number of ratings grows large. This modification has little to no implementation complexity and would improve performance significantly.
That's just a suggestion.
Thank you,
André
PS: a parameter would be a good idea, so that multithreading can be switched on or off.
PS2: My pipeline currently does everything multithreaded; when I use libFM, it slows down and uses 1 core instead of 32. Unfortunately, I'm not familiar enough with the libFM source code to develop the modification myself.
Please enlighten me; I tried the simplest possible example:
File train.libfm
is set to
1 0:1 1:1
and ran it using
libFM -task r -method mcmc -train train.libfm -test train.libfm -iter 10 -dim '0,0,1' -out output.libfm -save_model model.libfm
Hence only the pairwise interactions should be used, and their dimension is 1. The regression shows a perfect fit (as expected). However, looking at model.libfm
gives me
#pairwise interactions Vj,f
0.0139959
0.711416
My expectation is that the first number times the second number (the pairwise interaction of the two features) should be 1 (the target of the regression), but it is always clearly something else. I tried the same trivial example with fastFM, and it behaved as expected.
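For reference, the full FM model equation y(x) = w0 + Σ_i w_i x_i + Σ_{i<j} <v_i, v_j> x_i x_j does reduce to the product of the two one-dimensional factors for this row; a Python sketch (illustrative, not libFM code) of what the saved numbers imply:

```python
def fm_predict(w0, w, V, x):
    """y(x) = w0 + sum_i w_i*x_i + sum_{i<j} dot(v_i, v_j)*x_i*x_j."""
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y

# With dim '0,0,1' (no bias, no linear weights, k = 1) and x = (1, 1),
# the prediction from the saved factors is exactly v0 * v1 (~0.00996), not 1:
pred = fm_predict(0.0, [0.0, 0.0], [[0.0139959], [0.711416]], [1, 1])
assert abs(pred - 0.0139959 * 0.711416) < 1e-12
```

One possible (unverified) explanation for the mismatch: MCMC averages its predictions over many posterior samples during the run, while the saved file holds only a single parameter draw, so the saved factors need not reproduce the averaged fit.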
All links pointing to http://www.inf.uni-konstanz.de/~rendle/ no longer work. Maybe it makes sense to move the PDFs to the project's page?
I implemented the MCMC algorithm from the paper "Factorization Machines with libFM" in Python, but it doesn't work. Are there any details I should pay attention to?
Add support for interaction levels higher than 2.
I am using calibre (calibre 3.3) and started it in a shell.
Once in a while I receive the following error:
** (pcmanfm:26810): WARNING **: terminal bash isn't known, consider report it to LibFM developers
/usr/bin/xdg-open: line 709: : command not found
This is the ls output so the file is there ...
% ls -l /usr/bin/xdg-open
-rwxr-xr-x 1 root root 22746 Jan 20 2017 /usr/bin/xdg-open
and bash is installed:
% which bash
/bin/bash
% ls -l /bin/bash
-rwxr-xr-x 1 root root 725872 Dec 8 2016 /bin/bash
But the default shell is not bash but csh:
% echo $SHELL
/bin/csh
Is there anything else you need to know?
Thank you for providing this open source implementation. When I ran libFM, it uses only one thread (100% of a CPU). Is this the intended behavior or is there way to utilize multiple threads?
Hi, I encountered a problem when compiling the source code on Windows 10 and can't figure it out. I can use version 1.4.0 if I drop the -save_model argument, but could you please upload the newest version of libFM compiled for Windows, so that I can use the full functionality? Thanks.
Hi,
I have prepared Train.x and Train.y files, after which I am trying to transpose the input matrix to obtain Train.xt. During this transpose operation, I encounter the following error:
Assertion failed: out_cache_col_num > 0, file tools\transpose.cpp, line 125
Any idea what this error means? Could you suggest what can be done?
Thanks,
Phani
Hi!
Is it somehow possible to adapt the algorithm to the case [User, User Features, Movie, Movie Features, Watched=1], where Y (Watched) is always 1 and we have neither another class nor other "marks" (as on a classic 1...5 scale)? Watched could be views, clicks, purchases, etc.
If it's not possible, or possible but requiring additional work (e.g. code modification), it would be nice to include this information in the documentation. If I remember correctly, one of Rendle's articles describes a tag-recommendation competition where such a code modification was applied.
Thanks, Artem.
The README of this project: https://github.com/jfloff/pywFM states:
Make sure you are compiling source from libfm repository and at this specific commit, since pywFM needs the save_model. Beware that the installers and source code in libfm.org are both dated before this commit. I know this is extremely hacky, but since a fix was deployed it only allows the save_model option for SGD or ALS. I don't know why exactly, because it was working well before.
It seems weird to me that the author hasn't approached you to find a better solution than this hack, and I'm not familiar enough with the code to suggest a PR that would solve the problem cleanly. Besides, there's currently no explicit explanation of why loading/saving is forbidden for methods other than SGD or ALS, so I don't know where I could address that.
Therefore, I'm making this issue to see if we could find a better solution than this! :-)
@srendle Could you explain what the problem with other models are, and why this check is in place?
@jfloff Could you tell us why pywFM needs to be able to save/load different kinds of models than SGD and ALS?
Thanks!
It would be great to have an option to save the predictive model after training. This way a trained model could be applied to a number of test sets without having to retrain.
I typed "./demo.sh" to run the demo, but it stopped at
iter tr_rmse va_rmse obj
0 2.4766 1.3686 3.2943e+04
1 1.1560 1.0859 9.0002e+03
2 0.9011 1.0493 6.3830e+03
3 0.8107 1.0281 5.5837e+03
4 0.7588 1.0191 5.1730e+03
5 0.7173 1.0100 4.8799e+03
6 0.6774 1.0121 4.6092e+03
7 0.6422 1.0095 4.3969e+03
8 0.6030 1.0082 4.1780e+03
9 0.5626 1.0114 3.9763e+03
Yesterday, the real-valued matrix factorization ran fine, but it stopped at binary_matrix.
Can you give me some advice?
The docs (http://www.libfm.org/libfm-1.42.manual.pdf) say at paragraph 4.3:
BS is only supported by MCMC and ALS/CD.
However, in the source, it seems it works just fine: https://github.com/srendle/libfm/blob/master/src/libfm/libfm.cpp#L189
I tracked this line down to commit b290ad8 "Commit of libFM 1.4.2", so it is really strange.
Dear all,
Hoping to get some insight into feature design here, and to check that my understanding is correct, as I am new to FMs.
In the original Factorization Machines paper in 2010, the "Other Movies Rated" feature contains normalised values for all the other movies the user has ever rated.
Let's use the user Alice in the example, and assume the example covers the training set. We see she's rated 3 movies: NH, TI, and SW. Since there are 3 movies, the "Other Movies Rated" columns have values of (0.3, 0.3, 0.3, 0...).
Say in my test set, Alice has rated ST (Star Trek) with a target of 1. In my "Other Movies Rated" columns in the test set, should I use (0.25, 0.25, 0.25, 0.25 ...), with the fourth value updated for Alice's rating of ST? Or should I use (0.3, 0.3, 0.3, 0...), similar to the training set?
Thanks in advance! Apologies if this question has been asked elsewhere, I haven't been able to find a conclusive answer.
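For concreteness, the training-set convention from the 2010 paper (each of the m other rated movies gets weight 1/m; the paper's figure shows 1/3 rounded to 0.3) can be sketched as follows. The function name and both variants are mine; which variant is correct at test time is exactly the question above.

```python
def other_movies_rated(rated, all_movies):
    """Equal-weight implicit-feedback vector: each rated movie gets 1/m, rest 0."""
    rated_set, m = set(rated), len(rated)
    return [1.0 / m if movie in rated_set else 0.0 for movie in all_movies]

movies = ["NH", "TI", "SW", "ST"]
# Training-set view of Alice, who has rated 3 movies:
assert other_movies_rated(["NH", "TI", "SW"], movies) == [1/3, 1/3, 1/3, 0.0]
# Test-time variant that also counts the new rating of ST:
assert other_movies_rated(["NH", "TI", "SW", "ST"], movies) == [0.25] * 4
```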
I suggest adopting a higher-level build system in place of the current small makefiles, so that powerful checks for software features become easier.
I have a very basic query: is the factorization machine designed to work only with binary fields? Do we need to one-hot encode all features? How are real-valued features handled?
Thank you!
If the method given is als, the code changes the param_method value to mcmc, since ALS is MCMC without sampling and hyperparameter inference (file: libfm.cpp, line 123).
When saving the model, it checks that the method is either 'sgd' or 'als'; but since param_method has been changed to 'mcmc', the model file is never saved.