libffm-python's Introduction

Python wrapper for libffm

This is a Python wrapper for the LibFFM library, which is written in C++.

Installing it:

python setup.py install  # or python setup.py develop

Using it:

import ffm
from sklearn.metrics import roc_auc_score

# prepare the data
# (field, index, value) format

X = [[(1, 2, 1), (2, 3, 1), (3, 5, 1)],
     [(1, 0, 1), (2, 3, 1), (3, 7, 1)],
     [(1, 1, 1), (2, 3, 1), (3, 7, 1), (3, 9, 1)],]

y = [1, 1, 0]

ffm_data = ffm.FFMData(X, y)


# train the model for 10 iterations

n_iter = 10

model = ffm.FFM(eta=0.1, lam=0.0001, k=4)
model.init_model(ffm_data)

for i in range(n_iter):
    print('iteration %d, ' % i, end='')
    model.iteration(ffm_data)

    y_pred = model.predict(ffm_data)
    auc = roc_auc_score(y, y_pred)
    print('train auc %.4f' % auc)


# save the model 
model.save_model('ololo.bin')

# load it to reuse the model
model = ffm.read_model('ololo.bin')

libffm-python's People

Contributors

alexeygrigorev, alexklibisz, c-bata, chiahuaho, guestwalk-crto, timnon, vitobellini, ycjuan

libffm-python's Issues

returning parameters in W as numpy array

Is it somehow possible to get the parameters in W as a numpy array?

I tried

m = model._model.m
n = model._model.n
k = model._model.k
W = np.ctypeslib.as_array(model._model.W,(n,m,k))

but this clearly didn't work, and I am sadly not an expert in Python-C interfacing.
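For reference, here is a minimal sketch of how np.ctypeslib.as_array behaves when it is given a ctypes pointer and an explicit shape. It uses a stand-in buffer rather than the real model: whether model._model.W is exposed as a ctypes float pointer, and whether its memory really is laid out as (n, m, k), are assumptions that the thread does not confirm. LibFFM may pad k or store extra values per weight, in which case the shape would need adjusting.

```python
import ctypes

import numpy as np

# Stand-in dimensions and buffer (hypothetical; not read from a real model).
n, m, k = 2, 3, 4
buf = (ctypes.c_float * (n * m * k))(*range(n * m * k))

# A raw float pointer, similar to what a C struct field would expose.
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

# as_array needs an explicit shape when it is given a pointer.
W = np.ctypeslib.as_array(ptr, shape=(n, m, k))

# The result is a view on the C memory; copy it if the owner may be freed.
W_copy = W.copy()
```

If the shape passed in is larger than the memory actually allocated, reading the array is undefined behaviour, so it is worth checking the element count against the C++ source first.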

NAN prediction

When I use it for classification, all of my predictions are nan.

model = ffm.FFM(eta=0.1, lam=0.0001, k=4)
model.init_model(ffm_train_data)
for i in range(n_iter):
    print('iteration %d, ' % i, end='')
    model.iteration(ffm_train_data)
    train_y_pred = model.predict(ffm_train_data)
    print(train_y_pred.shape)
    print(train_y_pred)
    train_auc = roc_auc_score(np.array(train_y_class), train_y_pred)
    test_y_pred = model.predict(ffm_test_data)
    test_auc = roc_auc_score(np.array(test_y_class), test_y_pred)
    print('train auc %.4f' % train_auc,'test auc %.4f' % test_auc)

Output:

[nan nan nan ... nan nan nan]

I have no idea why this happens. The only things I can change are the eta, lam and k parameters of ffm.FFM(). Any ideas or suggestions?

Train returns NaN with large field values

If I use large values for the field value (field/feature 40), the model predicts nan, although smaller values work fine.

Smaller values:

X = [[(0, 0, 1), (1, 18, 1), (40, 40, 32)],
     [(0, 0, 1), (1, 18, 1), (40, 40, 300)],
     [(0, 0, 1), (1, 4, 1), (40, 40, 100)]]

y = [1, 1, 0]

ffm_data = ffm.FFMData(X, y)

model = ffm.FFM(eta=0.205, lam=0.0001, k=4)
model.init_model(ffm_data)
model.iteration(ffm_data)
y_pred = model.predict(ffm_data)
print(y_pred)

[ 0.99700552  1.          0.99695754]

However, changing the last value from 100 to 200 gives nan:

X = [[(0, 0, 1), (1, 18, 1), (40, 40, 32)],
     [(0, 0, 1), (1, 18, 1), (40, 40, 300)],
     [(0, 0, 1), (1, 4, 1), (40, 40, 200)]]

[ nan  nan  nan]
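One thing that could be tried, assuming the nan values come from numeric overflow on the large raw values (the thread does not confirm the cause), is to rescale each feature to [0, 1] before building the FFMData object. The helper below is a hypothetical sketch and is not part of the wrapper.

```python
def minmax_scale_rows(rows):
    """Rescale each (field, index) feature to [0, 1] across all rows.

    rows is a list of rows, each a list of (field, index, value) tuples.
    Features whose value never changes are left as they are.
    """
    lo, hi = {}, {}
    for row in rows:
        for field, index, value in row:
            key = (field, index)
            lo[key] = min(lo.get(key, value), value)
            hi[key] = max(hi.get(key, value), value)

    scaled = []
    for row in rows:
        new_row = []
        for field, index, value in row:
            key = (field, index)
            span = hi[key] - lo[key]
            if span > 0:
                value = (value - lo[key]) / span
            new_row.append((field, index, value))
        scaled.append(new_row)
    return scaled


X = [[(0, 0, 1), (1, 18, 1), (40, 40, 32)],
     [(0, 0, 1), (1, 18, 1), (40, 40, 300)],
     [(0, 0, 1), (1, 4, 1), (40, 40, 200)]]

X_scaled = minmax_scale_rows(X)
```

The scaled rows can then be passed to ffm.FFMData(X_scaled, y) in place of X. The same minimum and maximum would have to be reused for any data seen at prediction time.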

Big dataset: streaming data?

Does your implementation support passing X and y as a generator?

If not, could we add two methods:

  • X, y -> write FFM-formatted data to a file
  • file -> train the model
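The first of these two steps can be sketched without touching the wrapper. The function below is hypothetical and not part of libffm-python: it consumes rows and labels lazily, so both can be generators, and writes one line per example in the LibFFM text format, "label field:index:value ...". Whether the wrapper can then train directly from such a file is not stated in this thread.

```python
import io


def write_ffm(rows, labels, out):
    """Write examples to a text stream in LibFFM format and return the count.

    rows yields lists of (field, index, value) tuples and labels yields the
    matching targets. Both are consumed one item at a time.
    """
    count = 0
    for row, label in zip(rows, labels):
        features = " ".join(f"{f}:{i}:{v}" for f, i, v in row)
        out.write(f"{label} {features}\n")
        count += 1
    return count


# Usage with generators and an in-memory stream standing in for a file.
rows = (r for r in [[(1, 2, 1), (2, 3, 1)], [(1, 0, 1), (2, 3, 0.5)]])
labels = (t for t in [1, 0])
buf = io.StringIO()
n_written = write_ffm(rows, labels, buf)
```

For a real dataset, the stream would be a file opened with open(path, "w") instead of io.StringIO().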

Only nan results

When I run this code:

import ffm

test_data = [[
     (1, 1, 1.0), (2, 12, 1.0), (3, 17, 0.9890329837799072), (4, 18, 0.7214174270629883),
     (5, 19, 0.3654496669769287), (6, 21, 1.0), (7, 25, 1.0), (8, 28, 1.0), (9, 29, 1.0), (10, 138, 1.0),
     (10, 70, 1.0), (10, 93, 1.0), (10, 147, 1.0), (10, 48, 1.0), (10, 113, 1.0), (10, 115, 1.0), (10, 77, 1.0),
     (10, 81, 1.0), (10, 46, 1.0), (10, 95, 1.0), (10, 57, 1.0), (10, 34, 1.0), (10, 66, 1.0), (10, 126, 1.0),
     (10, 87, 1.0), (10, 69, 1.0), (10, 64, 1.0), (10, 123, 1.0), (10, 41, 1.0), (10, 116, 1.0), (10, 129, 1.0),
     (11, 154, 1.0), (12, 167, 1.0), (12, 158, 1.0), (12, 168, 1.0), (12, 167, 1.0), (13, 174, 1.0), (13, 172, 1.0),
     (13, 172, 1.0), (13, 174, 1.0), (13, 172, 1.0), (13, 176, 1.0), (13, 170, 1.0), (14, 180, 1.0), (15, 185, 1.0),
     (15, 182, 1.0), (15, 185, 1.0), (15, 184, 1.0), (16, 187, 1.0), (17, 188, 1.0), (18, 192, 1.0), (19, 195, 1.0),
     (20, 196, 1.0), (21, 197, 1.0), (22, 200, 1.0), (23, 204, 1.0), (24, 205, 1.0), (25, 207, 1.0), (26, 208, 1.0),
     (27, 212, 1.0), (28, 213, 1.0), (29, 216, 1.0), (30, 217, 1.0), (31, 218, 1.0), (32, 219, 1.0)
], [
     (1, 1, 1.0), (2, 15, 1.0), (3, 17, 0.303568840026855), (4, 18, 0.4841330647468567),
     (5, 19, 0.3654496669769287), (6, 21, 1.0), (7, 25, 1.0), (8, 28, 1.0), (9, 29, 1.0), (10, 145, 1.0),
     (10, 35, 1.0), (10, 99, 1.0), (10, 141, 1.0), (10, 83, 1.0), (10, 51, 1.0), (10, 101, 1.0), (10, 142, 1.0),
     (10, 145, 1.0), (10, 37, 1.0), (10, 141, 1.0), (10, 102, 1.0), (10, 81, 1.0), (10, 108, 1.0), (10, 50, 1.0),
     (10, 112, 1.0), (10, 60, 1.0), (10, 148, 1.0), (10, 142, 1.0), (10, 112, 1.0), (10, 85, 1.0), (10, 36, 1.0),
     (10, 73, 1.0), (10, 49, 1.0), (10, 50, 1.0), (10, 142, 1.0), (10, 82, 1.0), (10, 128, 1.0), (10, 55, 1.0),
     (10, 71, 1.0), (10, 137, 1.0), (10, 52, 1.0), (10, 148, 1.0), (10, 46, 1.0), (10, 98, 1.0), (10, 107, 1.0),
     (11, 154, 1.0), (12, 167, 1.0), (12, 167, 1.0), (12, 165, 1.0), (12, 165, 1.0), (13, 176, 1.0), (13, 176, 1.0),
     (13, 178, 1.0), (14, 179, 1.0), (15, 184, 1.0), (15, 184, 1.0), (17, 188, 1.0), (18, 192, 1.0), (19, 194, 1.0),
     (21, 198, 1.0), (22, 201, 1.0), (23, 204, 1.0), (24, 205, 1.0), (25, 206, 1.0), (26, 208, 1.0), (27, 212, 1.0),
     (29, 215, 1.0)]
]
y = [0, 1]
ffm_data = ffm.FFMData(test_data, y)
model = ffm.FFM(eta=0.1, lam=0.0001, k=4)
model.init_model(ffm_data)
for i in range(10):
    print('iteration %d ' % i, end='')
    model.iteration(ffm_data)

    y_pred = model.predict(ffm_data)
    print(y_pred)

I get nan results:

iteration 0 [ nan  nan]
iteration 1 [ nan  nan]
iteration 2 [ nan  nan]
iteration 3 [ nan  nan]
iteration 4 [ nan  nan]
iteration 5 [ nan  nan]
iteration 6 [ nan  nan]
iteration 7 [ nan  nan]
iteration 8 [ nan  nan]
iteration 9 [ nan  nan]

Why is that? What is wrong?

When I change y to [0.0000001, 1], I get:

iteration 0 [ 1.  1.]
iteration 1 [ 1.  1.]
iteration 2 [ 1.  1.]
iteration 3 [ 1.  1.]
iteration 4 [ 1.  1.]
iteration 5 [ 1.  1.]
iteration 6 [ 1.  1.]
iteration 7 [ 1.  1.]
iteration 8 [ 1.  1.]
iteration 9 [ 1.  1.]

What is going on?

Making sure I understand the data format

Thanks for updating this wrapper and getting it working.

I want to make sure I'm understanding how to format the input data for training. LibFFM repo gives the example:

Click  Advertiser  Publisher
=====  ==========  =========
    0        Nike        CNN
    1        ESPN        BBC

Here, we have 

    * 2 fields: Advertiser and Publisher

    * 4 features: Advertiser-Nike, Advertiser-ESPN, Publisher-CNN, Publisher-BBC

To format this as [[(field, index, value), ...], ...], would the following be correct?

# Fields: 0 = Advertiser, 1 = Publisher
# Indexes: (0,0) = Advertiser-Nike, (0,1) = Advertiser-ESPN, (1,0) = Publisher-CNN, (1,1) = Publisher-BBC
# Values: 0 = absent, 1 = present (i.e. one-hot encoding).

X = [[(0, 0, 1), (1, 0, 1)],
     [(0, 1, 1), (1, 1, 1)]]

Thanks again
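A sketch of one way to build this encoding from categorical records is shown below. It assumes that feature indexes should be unique across all fields, as they appear to be in the README example, rather than restarting at 0 inside each field as in the question. That assumption is not confirmed in this thread, and encode_rows is a hypothetical helper, not part of the wrapper.

```python
def encode_rows(records, field_names):
    """One-hot encode records as lists of (field, index, value) tuples.

    Each distinct (field name, category) pair gets its own global index,
    assigned in order of first appearance.
    """
    field_ids = {name: i for i, name in enumerate(field_names)}
    feature_ids = {}
    rows = []
    for record in records:
        row = []
        for name in field_names:
            key = (name, record[name])
            index = feature_ids.setdefault(key, len(feature_ids))
            row.append((field_ids[name], index, 1))
        rows.append(row)
    return rows, feature_ids


records = [
    {"Advertiser": "Nike", "Publisher": "CNN"},
    {"Advertiser": "ESPN", "Publisher": "BBC"},
]
X, feature_ids = encode_rows(records, ["Advertiser", "Publisher"])
```

The returned feature_ids mapping has to be kept and reused when encoding new data, so that the same category always maps to the same index.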

Memory leak

by @4yu:

Sorry, I can't find the issue link, so I am writing here. When I run this in my online service:

ffm_data = ffm.FFMData(arr_feature, y)
predictions = model.predict(ffm_data)

the server's memory increases continuously. The problem perhaps comes from the ffm_convert_data function in ffm.cpp.

Please have a look. Thank you.
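To confirm the growth outside the service, something like the following sketch could be used. It is a hypothetical helper that assumes a Unix system, since it relies on the standard resource module. It samples the peak resident set size of the process while calling a function repeatedly; the unit of ru_maxrss is platform dependent (kilobytes on Linux, bytes on macOS).

```python
import resource


def peak_rss_after_calls(fn, n_calls, sample_every=100):
    """Call fn n_calls times and sample the process peak RSS periodically."""
    samples = []
    for i in range(1, n_calls + 1):
        fn()
        if i % sample_every == 0:
            usage = resource.getrusage(resource.RUSAGE_SELF)
            samples.append(usage.ru_maxrss)
    return samples


# Stand-in workload; replace the lambda with the real call under test.
samples = peak_rss_after_calls(lambda: None, 300, sample_every=100)
```

Passing lambda: model.predict(ffm.FFMData(arr_feature, y)) as fn would exercise the two calls from the report. Samples that keep rising over many thousands of calls would support the suspicion of a leak, while a flat series would point elsewhere.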
