alexeygrigorev / libffm-python

This project forked from ycjuan/libffm

120.0 8.0 29.0 60 KB

A Python wrapper for LibFFM

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.86% C++ 78.69% Python 19.45%

libffm-python's Introduction

Python wrapper for libffm

This is a Python wrapper for the LibFFM library, which is written in C++.

Installing it:

python setup.py install  # or python setup.py develop

Using it:

import ffm
from sklearn.metrics import roc_auc_score

# prepare the data
# (field, index, value) format

X = [[(1, 2, 1), (2, 3, 1), (3, 5, 1)],
     [(1, 0, 1), (2, 3, 1), (3, 7, 1)],
     [(1, 1, 1), (2, 3, 1), (3, 7, 1), (3, 9, 1)],]

y = [1, 1, 0]

ffm_data = ffm.FFMData(X, y)


# train the model for 10 iterations

n_iter = 10

model = ffm.FFM(eta=0.1, lam=0.0001, k=4)
model.init_model(ffm_data)

for i in range(n_iter):
    print('iteration %d, ' % i, end='')
    model.iteration(ffm_data)

    y_pred = model.predict(ffm_data)
    auc = roc_auc_score(y, y_pred)
    print('train auc %.4f' % auc)


# save the model 
model.save_model('ololo.bin')

# load it to reuse the model
model = ffm.read_model('ololo.bin')


libffm-python's Issues

returning parameters in W as numpy array

Is it somehow possible to get the parameters in W as a numpy array?

I tried

m = model._model.m
n = model._model.n
k = model._model.k
W = np.ctypeslib.as_array(model._model.W,(n,m,k))

but this clearly didn't work, and I am sadly not an expert in Python-C interfacing.
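One way to approach the question above is sketched below. It assumes that model._model.W is exposed as a ctypes POINTER(c_float); the source does not confirm this. The sizes and the stand-in buffer are hypothetical, so the snippet runs without the wrapper. The (n, m, k) shape is also an assumption: upstream LibFFM may pad k and keep extra per-weight state in the same buffer, so the real buffer may be larger, and ffm.cpp should be checked before relying on it.

```python
import ctypes

import numpy as np

# Hypothetical sizes; in the wrapper they would come from model._model.n/m/k.
n, m, k = 3, 2, 4

# Stand-in for a C-owned float buffer (assumed equivalent of model._model.W).
buf = (ctypes.c_float * (n * m * k))(*range(n * m * k))
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

# A bare pointer carries no length, so as_array needs an explicit shape.
# The result is a view onto the C memory, not a copy.
W_view = np.ctypeslib.as_array(ptr, shape=(n, m, k))

# Copy if the array must outlive the underlying C buffer.
W = W_view.copy()
print(W.shape, W.dtype)
```

If the pointer has a different element type, or the struct field is not a ctypes pointer at all, the call fails or returns garbage, which may be what happened in the attempt above.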

NAN prediction

When I use it for classification, every prediction is NaN.

model = ffm.FFM(eta=0.1, lam=0.0001, k=4)
model.init_model(ffm_train_data)
for i in range(n_iter):
    print('iteration %d, ' % i, end='')
    model.iteration(ffm_train_data)
    train_y_pred = model.predict(ffm_train_data)
    print(train_y_pred.shape)
    print(train_y_pred)
    train_auc = roc_auc_score(np.array(train_y_class), train_y_pred)
    test_y_pred = model.predict(ffm_test_data)
    test_auc = roc_auc_score(np.array(test_y_class), test_y_pred)
    print('train auc %.4f' % train_auc,'test auc %.4f' % test_auc)

[nan nan nan ... nan nan nan]

I have no idea why this happens. The only things I can change are the eta, lam and k parameters of ffm.FFM(). Any ideas or suggestions?

Train returns NaN with large field values

If I use large values for a feature (here field 40, index 40), the model predicts NaN, although smaller values work fine.

Smaller values:

X = [[(0, 0, 1), (1, 18, 1), (40, 40, 32)],
     [(0, 0, 1), (1, 18, 1), (40, 40, 300)],
     [(0, 0, 1), (1, 4, 1), (40, 40, 100)]]

y = [1, 1, 0]

ffm_data = ffm.FFMData(X, y)

model = ffm.FFM(eta=0.205, lam=0.0001, k=4)
model.init_model(ffm_data)
model.iteration(ffm_data)
y_pred = model.predict(ffm_data)
print(y_pred)

[ 0.99700552  1.          0.99695754]

However, changing the last value from 100 to 200 gives NaN:

X = [[(0, 0, 1), (1, 18, 1), (40, 40, 32)],
     [(0, 0, 1), (1, 18, 1), (40, 40, 300)],
     [(0, 0, 1), (1, 4, 1), (40, 40, 200)]]

[ nan  nan  nan]
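A common precaution with this kind of input is to rescale feature values before building FFMData, since very large values can plausibly overflow the exponential in a logistic loss. The sketch below divides each value by the largest absolute value seen for its (field, index) pair. The helper name scale_values is hypothetical, and whether rescaling removes the NaN in this wrapper is an assumption, not something the report confirms.

```python
from collections import defaultdict


def scale_values(X):
    """Divide each value by the largest absolute value of its (field, index) pair."""
    max_abs = defaultdict(float)
    for row in X:
        for field, index, value in row:
            key = (field, index)
            max_abs[key] = max(max_abs[key], abs(value))

    scaled = []
    for row in X:
        new_row = []
        for field, index, value in row:
            denom = max_abs[(field, index)]
            # A feature that is always zero stays zero.
            new_row.append((field, index, (value / denom) if denom else 0.0))
        scaled.append(new_row)
    return scaled


X = [[(0, 0, 1), (1, 18, 1), (40, 40, 32)],
     [(0, 0, 1), (1, 18, 1), (40, 40, 300)],
     [(0, 0, 1), (1, 4, 1), (40, 40, 200)]]

X_scaled = scale_values(X)
```

After scaling, every value lies in [-1, 1], and X_scaled can be passed to ffm.FFMData in place of X.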

Big dataset: Streaming data ?

Does your implementation support passing X and y as generators?

If not, could we add two methods:

X, y -> write FFM-formated data in file
file -> training of the model
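The first of the two steps can be sketched in plain Python. Upstream LibFFM reads text files with one instance per line, in the form "label field:index:value field:index:value ...". The helper name write_ffm_file is hypothetical. It consumes X and y lazily, so both can be generators. Training from such a file through this wrapper is not shown in the README, so that second step is left open here; the upstream ffm-train command-line tool accepts this format.

```python
import os
import tempfile


def write_ffm_file(X, y, path):
    """Write instances as '<label> <field>:<index>:<value> ...', one per line."""
    with open(path, "w") as out:
        # zip pulls one row and one label at a time, so generators work.
        for label, row in zip(y, X):
            feats = " ".join("%d:%d:%s" % (f, i, v) for f, i, v in row)
            out.write("%s %s\n" % (label, feats))


X = [[(1, 2, 1), (2, 3, 1), (3, 5, 1)],
     [(1, 0, 1), (2, 3, 1), (3, 7, 1)]]
y = [1, 1]

path = os.path.join(tempfile.mkdtemp(), "train.ffm")
write_ffm_file(iter(X), iter(y), path)

with open(path) as f:
    lines = f.read().splitlines()
```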

Only nan results

When I run this code:

import ffm

test_data = [[
     (1, 1, 1.0), (2, 12, 1.0), (3, 17, 0.9890329837799072), (4, 18, 0.7214174270629883),
     (5, 19, 0.3654496669769287), (6, 21, 1.0), (7, 25, 1.0), (8, 28, 1.0), (9, 29, 1.0), (10, 138, 1.0),
     (10, 70, 1.0), (10, 93, 1.0), (10, 147, 1.0), (10, 48, 1.0), (10, 113, 1.0), (10, 115, 1.0), (10, 77, 1.0),
     (10, 81, 1.0), (10, 46, 1.0), (10, 95, 1.0), (10, 57, 1.0), (10, 34, 1.0), (10, 66, 1.0), (10, 126, 1.0),
     (10, 87, 1.0), (10, 69, 1.0), (10, 64, 1.0), (10, 123, 1.0), (10, 41, 1.0), (10, 116, 1.0), (10, 129, 1.0),
     (11, 154, 1.0), (12, 167, 1.0), (12, 158, 1.0), (12, 168, 1.0), (12, 167, 1.0), (13, 174, 1.0), (13, 172, 1.0),
     (13, 172, 1.0), (13, 174, 1.0), (13, 172, 1.0), (13, 176, 1.0), (13, 170, 1.0), (14, 180, 1.0), (15, 185, 1.0),
     (15, 182, 1.0), (15, 185, 1.0), (15, 184, 1.0), (16, 187, 1.0), (17, 188, 1.0), (18, 192, 1.0), (19, 195, 1.0),
     (20, 196, 1.0), (21, 197, 1.0), (22, 200, 1.0), (23, 204, 1.0), (24, 205, 1.0), (25, 207, 1.0), (26, 208, 1.0),
     (27, 212, 1.0), (28, 213, 1.0), (29, 216, 1.0), (30, 217, 1.0), (31, 218, 1.0), (32, 219, 1.0)
], [
     (1, 1, 1.0), (2, 15, 1.0), (3, 17, 0.303568840026855), (4, 18, 0.4841330647468567),
     (5, 19, 0.3654496669769287), (6, 21, 1.0), (7, 25, 1.0), (8, 28, 1.0), (9, 29, 1.0), (10, 145, 1.0),
     (10, 35, 1.0), (10, 99, 1.0), (10, 141, 1.0), (10, 83, 1.0), (10, 51, 1.0), (10, 101, 1.0), (10, 142, 1.0),
     (10, 145, 1.0), (10, 37, 1.0), (10, 141, 1.0), (10, 102, 1.0), (10, 81, 1.0), (10, 108, 1.0), (10, 50, 1.0),
     (10, 112, 1.0), (10, 60, 1.0), (10, 148, 1.0), (10, 142, 1.0), (10, 112, 1.0), (10, 85, 1.0), (10, 36, 1.0),
     (10, 73, 1.0), (10, 49, 1.0), (10, 50, 1.0), (10, 142, 1.0), (10, 82, 1.0), (10, 128, 1.0), (10, 55, 1.0),
     (10, 71, 1.0), (10, 137, 1.0), (10, 52, 1.0), (10, 148, 1.0), (10, 46, 1.0), (10, 98, 1.0), (10, 107, 1.0),
     (11, 154, 1.0), (12, 167, 1.0), (12, 167, 1.0), (12, 165, 1.0), (12, 165, 1.0), (13, 176, 1.0), (13, 176, 1.0),
     (13, 178, 1.0), (14, 179, 1.0), (15, 184, 1.0), (15, 184, 1.0), (17, 188, 1.0), (18, 192, 1.0), (19, 194, 1.0),
     (21, 198, 1.0), (22, 201, 1.0), (23, 204, 1.0), (24, 205, 1.0), (25, 206, 1.0), (26, 208, 1.0), (27, 212, 1.0),
     (29, 215, 1.0)]
]
y = [0, 1]
ffm_data = ffm.FFMData(test_data, y)
model = ffm.FFM(eta=0.1, lam=0.0001, k=4)
model.init_model(ffm_data)
for i in range(10):
    print('iteration %d ' % i, end='')
    model.iteration(ffm_data)

    y_pred = model.predict(ffm_data)
    print(y_pred)

I get nan results:

iteration 0 [ nan  nan]
iteration 1 [ nan  nan]
iteration 2 [ nan  nan]
iteration 3 [ nan  nan]
iteration 4 [ nan  nan]
iteration 5 [ nan  nan]
iteration 6 [ nan  nan]
iteration 7 [ nan  nan]
iteration 8 [ nan  nan]
iteration 9 [ nan  nan]

Why is that? What is wrong?

When I change the labels to y = [0.0000001, 1], I get:

iteration 0 [ 1.  1.]
iteration 1 [ 1.  1.]
iteration 2 [ 1.  1.]
iteration 3 [ 1.  1.]
iteration 4 [ 1.  1.]
iteration 5 [ 1.  1.]
iteration 6 [ 1.  1.]
iteration 7 [ 1.  1.]
iteration 8 [ 1.  1.]
iteration 9 [ 1.  1.]

What is going on?
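A first debugging step for a report like this is to check the input itself. The rows above repeat some (field, index) pairs within one instance, for example (12, 167), and the second experiment uses a label that is neither 0 nor 1, unlike the README example. The sketch below only flags such things. The helper name check_rows is hypothetical, and none of these conditions is confirmed as the cause of the NaN output.

```python
import math
from collections import Counter


def check_rows(X, y):
    """Return human-readable warnings about suspicious rows or labels."""
    warnings = []
    for r, row in enumerate(X):
        # The same (field, index) pair appearing twice in one instance.
        counts = Counter((f, i) for f, i, _ in row)
        dups = sorted(pair for pair, c in counts.items() if c > 1)
        if dups:
            warnings.append("row %d repeats (field, index) pairs: %s" % (r, dups))
        # NaN or infinite feature values.
        bad = [(f, i, v) for f, i, v in row if not math.isfinite(v)]
        if bad:
            warnings.append("row %d has non-finite values: %s" % (r, bad))
    # The README example uses 0/1 labels only.
    extra = sorted(set(y) - {0, 1})
    if extra:
        warnings.append("labels outside {0, 1}: %s" % extra)
    return warnings


rows = [[(12, 167, 1.0), (12, 158, 1.0), (12, 167, 1.0)],
        [(1, 1, 1.0), (2, 15, float("nan"))]]
issues = check_rows(rows, [0.0000001, 1])
for issue in issues:
    print(issue)
```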

Making sure I understand the data format

Thanks for updating this wrapper and getting it working.

I want to make sure I'm understanding how to format the input data for training. LibFFM repo gives the example:

Click  Advertiser  Publisher
=====  ==========  =========
    0        Nike        CNN
    1        ESPN        BBC

Here, we have 

    * 2 fields: Advertiser and Publisher

    * 4 features: Advertiser-Nike, Advertiser-ESPN, Publisher-CNN, Publisher-BBC

To format this as [[(field, index, value), ...], ...], would this be correct:

# Fields: 0 = Advertiser, 1 = Publisher
# Indexes: (0,0) = Advertiser-Nike, (0,1) = Advertiser-ESPN, (1,0) = Publisher-CNN, (1,1) = Publisher-BBC
# Values: 0 = absent, 1 = present (i.e. one-hot encoding).

X = [[(0, 0, 1), (1, 0, 1)],
     [(0, 1, 1), (1, 1, 1)]]

Thanks again
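For comparison, the upstream LibFFM README that the table comes from numbers features across all fields rather than per field: one dictionary for fields and one for features. Its example arrives at the lines "0 0:0:1 1:1:1" and "1 0:2:1 1:3:1". The sketch below builds the same encoding. The helper name encode is hypothetical, and it is an assumption that this wrapper passes the index through to LibFFM unchanged. If it does, the per-field numbering in the question would give Advertiser-Nike and Publisher-CNN the same feature index 0.

```python
def encode(records, field_names):
    """Turn categorical records into (field, index, value) rows with global feature ids."""
    field_ids = {name: i for i, name in enumerate(field_names)}
    feature_ids = {}
    X = []
    for record in records:
        row = []
        for name in field_names:
            key = "%s-%s" % (name, record[name])
            # Assign the next free id the first time a feature is seen.
            index = feature_ids.setdefault(key, len(feature_ids))
            row.append((field_ids[name], index, 1))
        X.append(row)
    return X, field_ids, feature_ids


records = [{"Advertiser": "Nike", "Publisher": "CNN"},
           {"Advertiser": "ESPN", "Publisher": "BBC"}]
y = [0, 1]

X, field_ids, feature_ids = encode(records, ["Advertiser", "Publisher"])
# X == [[(0, 0, 1), (1, 1, 1)], [(0, 2, 1), (1, 3, 1)]]
```

Only the features that are present need to be listed; absent one-hot features are simply left out of the row.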

Memory leak

by @4yu:

Sorry, I can't find the issue link, so I am writing here. When I try this in my online service:

ffm_data = ffm.FFMData(arr_feature, y)
predictions = model.predict(ffm_data)

the server's memory usage increases continuously. The problem perhaps comes from the ffm_convert_data function in ffm.cpp.

Please have a look. Thank you.

Doesn't install under windows

It looks like the Windows makefile hasn't been updated for the python wrapper.

'module' object has no attribute 'FFMData'

When I use it, I get this error. What is wrong?
