dmitryikh / leaves

Pure Go implementation of the prediction part of GBRT (Gradient Boosting Regression Trees) models from popular frameworks

License: MIT License

Go 83.29% Shell 0.63% Python 16.08%

boosting decision-trees go golang lightgbm machine-learning xgboost

leaves's Issues

add human readable text output of parsed pickle data

We have the internal/pickle package to read the pickle format.

It's difficult to debug what was read into memory. We need a function that prints the parsed pickle data for debugging purposes.
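A debug dump could look roughly like the sketch below. It assumes, hypothetically, that parsed pickle data is held in plain Go values (maps, slices, scalars and nil); the package's real types will differ, and `Dump` is an illustrative name rather than an existing function.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dumpValue writes an indented, human-readable view of a decoded value.
func dumpValue(sb *strings.Builder, v interface{}, indent int) {
	pad := strings.Repeat("  ", indent)
	switch x := v.(type) {
	case map[string]interface{}:
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic output regardless of map order
		sb.WriteString("dict{\n")
		for _, k := range keys {
			fmt.Fprintf(sb, "%s  %q: ", pad, k)
			dumpValue(sb, x[k], indent+1)
		}
		sb.WriteString(pad + "}\n")
	case []interface{}:
		sb.WriteString("list[\n")
		for i, e := range x {
			fmt.Fprintf(sb, "%s  [%d] ", pad, i)
			dumpValue(sb, e, indent+1)
		}
		sb.WriteString(pad + "]\n")
	case nil:
		sb.WriteString("None\n")
	default:
		// Scalars: show the Go type next to the value.
		fmt.Fprintf(sb, "%T(%v)\n", x, x)
	}
}

// Dump returns the formatted representation of v.
func Dump(v interface{}) string {
	var sb strings.Builder
	dumpValue(&sb, v, 0)
	return sb.String()
}

func main() {
	data := map[string]interface{}{
		"n_estimators": 3,
		"classes":      []interface{}{0, 1},
		"init":         nil,
	}
	fmt.Print(Dump(data))
}
```

Sorting map keys keeps the output stable between runs, which makes dumps easy to diff.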

Can leaves load an XGBoost model trained with the Java API?

When I load an XGBoost model trained with xgboost4j, an error is returned.

support sklearn.ensemble.RandomForestRegressor

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

Support the use of sklearn pipelines with prediction model

I found this super handy. It would be great if we could not only predict with a trained model but also use an sklearn pipeline that includes the transformation steps before the actual prediction.

Get performance gain on multicore systems

Let's use multiprocessing for batch predictions
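A minimal sketch of parallel batch prediction with goroutines is shown below. It assumes a hypothetical per-row prediction function `predictRow` and a row-major feature matrix; each goroutine scores a contiguous block of rows and writes to its own slice range, so no locking is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// predictRow stands in for a single-row model prediction (hypothetical).
type predictRow func(features []float64) float64

// PredictBatch scores a row-major matrix (nRows x nCols) using up to
// nThreads goroutines. Each goroutine writes to a disjoint range of out.
func PredictBatch(vals []float64, nRows, nCols, nThreads int, predict predictRow) []float64 {
	out := make([]float64, nRows)
	if nThreads < 1 {
		nThreads = 1
	}
	chunk := (nRows + nThreads - 1) / nThreads // rows per goroutine, rounded up
	var wg sync.WaitGroup
	for start := 0; start < nRows; start += chunk {
		end := start + chunk
		if end > nRows {
			end = nRows
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				out[i] = predict(vals[i*nCols : (i+1)*nCols])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	// Toy "model": the sum of the features.
	sum := func(f []float64) float64 {
		s := 0.0
		for _, v := range f {
			s += v
		}
		return s
	}
	vals := []float64{1, 2, 3, 4, 5, 6} // 3 rows x 2 columns
	fmt.Println(PredictBatch(vals, 3, 2, 2, sum)) // [3 7 11]
}
```

Contiguous chunks keep each goroutine's memory access sequential; for very uneven per-row costs, a work queue would balance load better.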

obtain the leaf index of GBDT trees

My online prediction service wants to use the GBDT + LR algorithm combination (Practical Lessons from Predicting Clicks on Ads at Facebook), which uses the leaf index of each tree. But leaves doesn't support it.
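The sketch below shows what "leaf index" means for GBDT + LR: walk each tree to the leaf the row falls into, then turn the per-tree leaf indices into column positions of a concatenated one-hot vector for the LR stage. The `node` layout is hypothetical (not leaves' internal representation); negative children encode leaves as `-k-1`, similar to the LightGBM text format.

```go
package main

import "fmt"

// node is a hypothetical flattened tree node. Child indices >= 0 point to
// another node; a negative child -k-1 means leaf k.
type node struct {
	feature   int
	threshold float64
	left      int
	right     int
}

// leafIndex walks one tree (root at index 0) and returns the leaf that x
// falls into, going left when x[feature] <= threshold.
// An empty node list is treated as a single-leaf tree.
func leafIndex(tree []node, x []float64) int {
	if len(tree) == 0 {
		return 0
	}
	i := 0
	for {
		n := tree[i]
		next := n.right
		if x[n.feature] <= n.threshold {
			next = n.left
		}
		if next < 0 {
			return -next - 1
		}
		i = next
	}
}

// oneHotLeaves returns, for each tree, the column of its active leaf in a
// concatenated one-hot vector: tree t owns columns [offset_t, offset_t+nLeaves[t]).
func oneHotLeaves(forest [][]node, nLeaves []int, x []float64) []int {
	cols := make([]int, len(forest))
	offset := 0
	for t, tree := range forest {
		cols[t] = offset + leafIndex(tree, x)
		offset += nLeaves[t]
	}
	return cols
}

func main() {
	tree0 := []node{{feature: 0, threshold: 0.5, left: -1, right: -2}} // 2 leaves
	tree1 := []node{ // 3 leaves
		{feature: 1, threshold: 1.0, left: 1, right: -3},
		{feature: 0, threshold: 0.2, left: -1, right: -2},
	}
	forest := [][]node{tree0, tree1}
	fmt.Println(oneHotLeaves(forest, []int{2, 3}, []float64{0.3, 0.5})) // [0 3]
}
```

The returned indices are the non-zero positions of a sparse binary feature vector that a logistic regression model can consume directly.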

add test & benchmark for categorical features with LightGBM

Support xgboost models

The prediction is wrong when using XGEnsembleFromFile to load a model

I'm using xgbModel.nativeBooster.saveModel on Spark to save the native model, then loading it with XGEnsembleFromFile to predict on the validation dataset, but the results do not match the predictions made on Spark. Here are the results predicted with leaves:

label: 1, pred: 0.836042
label: 1, pred: 0.836042
label: 1, pred: 0.797784
label: 1, pred: 0.934794
label: 1, pred: 0.793824
label: 1, pred: 0.797579
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.704733
label: 1, pred: 0.787566
label: 1, pred: 0.941911
label: 1, pred: 0.934794
label: 1, pred: 0.749724
label: 1, pred: 0.929430
label: 1, pred: 0.931993
label: 1, pred: 0.797579
label: 1, pred: 0.839686
label: 1, pred: 0.759537
label: 1, pred: 0.813373
label: 1, pred: 0.760041
label: 1, pred: 0.793824
label: 1, pred: 0.934794
label: 1, pred: 0.759537
label: 1, pred: 0.929430
label: 1, pred: 0.945538
label: 1, pred: 0.785153
label: 1, pred: 0.959390
label: 1, pred: 0.793824
label: 1, pred: 0.779831
label: 1, pred: 0.959390
label: 1, pred: 0.749724
label: 1, pred: 0.941911
label: 1, pred: 0.798052
label: 1, pred: 0.749724
label: 1, pred: 0.931993
label: 1, pred: 0.749724
label: 1, pred: 0.929430
label: 1, pred: 0.839686
label: 1, pred: 0.839686
label: 1, pred: 0.806166
label: 1, pred: 0.934794
label: 1, pred: 0.839686
label: 1, pred: 0.785153
label: 1, pred: 0.806166
label: 1, pred: 0.945538
label: 1, pred: 0.803833
label: 1, pred: 0.759537
label: 1, pred: 0.806166
label: 1, pred: 0.768660
label: 1, pred: 0.797784
label: 1, pred: 0.931993
label: 1, pred: 0.749724
label: 1, pred: 0.824530
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.806893
label: 1, pred: 0.929430
label: 1, pred: 0.803833
label: 1, pred: 0.797148
label: 1, pred: 0.931993
label: 1, pred: 0.797579
label: 1, pred: 0.787042
label: 1, pred: 0.803833
label: 1, pred: 0.959390
label: 1, pred: 0.931993
label: 1, pred: 0.806166
label: 1, pred: 0.836042
label: 1, pred: 0.934794
label: 1, pred: 0.934794
label: 1, pred: 0.803833
label: 1, pred: 0.749724
label: 1, pred: 0.931993
label: 1, pred: 0.759537
label: 1, pred: 0.779831
label: 1, pred: 0.787042
label: 1, pred: 0.785153
label: 1, pred: 0.749724
label: 1, pred: 0.749724
label: 1, pred: 0.934794
label: 1, pred: 0.929430
label: 1, pred: 0.797579
label: 1, pred: 0.945538
label: 1, pred: 0.934794
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.787042
label: 1, pred: 0.787042
label: 1, pred: 0.931993
label: 1, pred: 0.759537
label: 1, pred: 0.941911
label: 1, pred: 0.749724
label: 1, pred: 0.850764
label: 1, pred: 0.945538
label: 1, pred: 0.803833
label: 1, pred: 0.749724
label: 1, pred: 0.797579
label: 1, pred: 0.785153
label: 1, pred: 0.941911
label: 1, pred: 0.806166
label: 0, pred: 0.767173
label: 0, pred: 0.807510
label: 0, pred: 0.797784
label: 0, pred: 0.824530
label: 0, pred: 0.839686
label: 0, pred: 0.767173
label: 0, pred: 0.839686
label: 0, pred: 0.767176
label: 0, pred: 0.797579
label: 0, pred: 0.793824
label: 0, pred: 0.772110
label: 0, pred: 0.768660
label: 0, pred: 0.759537
label: 0, pred: 0.839686
label: 0, pred: 0.759537
label: 0, pred: 0.929430
label: 0, pred: 0.941911
label: 0, pred: 0.822525
label: 0, pred: 0.839686
label: 0, pred: 0.945538
label: 0, pred: 0.749724
label: 0, pred: 0.929430
label: 0, pred: 0.787042
label: 0, pred: 0.797579
label: 0, pred: 0.797784
label: 0, pred: 0.797784
label: 0, pred: 0.945538
label: 0, pred: 0.785153
label: 0, pred: 0.797784
label: 0, pred: 0.836042
label: 0, pred: 0.931993
label: 0, pred: 0.836042
label: 0, pred: 0.779831
label: 0, pred: 0.945538
label: 0, pred: 0.812733
label: 0, pred: 0.945538
label: 0, pred: 0.745542
label: 0, pred: 0.779849
label: 0, pred: 0.903047
label: 0, pred: 0.816076
label: 0, pred: 0.807510
label: 0, pred: 0.749971
label: 0, pred: 0.945538
label: 0, pred: 0.804371
label: 0, pred: 0.767173
label: 0, pred: 0.934794
label: 0, pred: 0.785153
label: 0, pred: 0.767173
label: 0, pred: 0.797784
label: 0, pred: 0.785153
label: 0, pred: 0.807510
label: 0, pred: 0.768660
label: 0, pred: 0.804371
label: 0, pred: 0.787042
label: 0, pred: 0.704733
label: 0, pred: 0.813373
label: 0, pred: 0.749724
label: 0, pred: 0.836042
label: 0, pred: 0.772110
label: 0, pred: 0.855798
label: 0, pred: 0.836042
label: 0, pred: 0.784896
label: 0, pred: 0.804371
label: 0, pred: 0.813373
label: 0, pred: 0.749724
label: 0, pred: 0.903047
label: 0, pred: 0.787042
label: 0, pred: 0.839686
label: 0, pred: 0.759537
label: 0, pred: 0.797579
label: 0, pred: 0.803833
label: 0, pred: 0.793824
label: 0, pred: 0.749724
label: 0, pred: 0.806166
label: 0, pred: 0.793824
label: 0, pred: 0.793824
label: 0, pred: 0.787042
label: 0, pred: 0.806166
label: 0, pred: 0.903047
label: 0, pred: 0.839686
label: 0, pred: 0.768660
label: 0, pred: 0.787042
label: 0, pred: 0.745542
label: 0, pred: 0.787042
label: 0, pred: 0.802708
label: 0, pred: 0.797784
label: 0, pred: 0.839686
label: 0, pred: 0.929430
label: 0, pred: 0.803833
label: 0, pred: 0.704733
label: 0, pred: 0.704733
label: 0, pred: 0.793824
label: 0, pred: 0.793824
label: 0, pred: 0.813373
label: 0, pred: 0.836042
label: 0, pred: 0.767173
label: 0, pred: 0.803833
label: 0, pred: 0.793824
label: 0, pred: 0.818067
label: 0, pred: 0.787566

Error when loading an XGBoost model trained with the sklearn API

Does leaves not support sklearn XGBoost models?
I used the Python code below to train an XGBoost model, and I get an error when I load it in Go with:
leaves.XGEnsembleFromFile("xg_iris.model", false)

The error:
mark@mark:~/golang $ go run predict_iris.go
Name: xgboost.gbtree
NFeatures: 4
NOutputGroups: 3
NEstimators: 100
panic: different sizes: len(a) = 30, len(b) = 90

goroutine 1 [running]:
main.main()
/home/mark/golang/predict_iris.go:44 +0x686
exit status 2

Below is the Python code that trains the model with XGBoost's sklearn API:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost as xgb
from xgboost.sklearn import XGBClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

xg_train = xgb.DMatrix(X_train, label=y_train)
xg_test = xgb.DMatrix(X_test, label=y_test)
params = {
'objective': 'multi:softmax',
'num_class': 3,
}
n_estimators = 5
#clf = xgb.train(params, xg_train, n_estimators)
clf = XGBClassifier(**params)
clf = clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_test)[:,1]
clf.save_model('xg_iris.model')
np.savetxt('xg_iris_true_predictions.txt', y_pred, delimiter='\t')
datasets.dump_svmlight_file(X_test, y_test, 'iris_test.libsvm')
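The two lengths in the panic differ by exactly the number of classes (30 rows x 3 output groups = 90), while the Python script saves only `predict_proba(X_test)[:,1]`, one value per row. A minimal sketch of pulling one class column out of a flat multiclass prediction slice follows; it assumes the slice is laid out row by row with `NOutputGroups` values per row, and `classColumn` is a hypothetical helper.

```go
package main

import "fmt"

// classColumn extracts the scores for one class from a flat prediction slice
// laid out row-major: row i occupies preds[i*nGroups : (i+1)*nGroups].
func classColumn(preds []float64, nGroups, class int) ([]float64, error) {
	if nGroups <= 0 || len(preds)%nGroups != 0 {
		return nil, fmt.Errorf("len(preds)=%d is not a multiple of nGroups=%d", len(preds), nGroups)
	}
	if class < 0 || class >= nGroups {
		return nil, fmt.Errorf("class %d out of range [0, %d)", class, nGroups)
	}
	nRows := len(preds) / nGroups
	col := make([]float64, nRows)
	for i := range col {
		col[i] = preds[i*nGroups+class]
	}
	return col, nil
}

func main() {
	// Two rows, three classes.
	preds := []float64{0.1, 0.7, 0.2, 0.6, 0.3, 0.1}
	col, err := classColumn(preds, 3, 1)
	fmt.Println(col, err) // [0.7 0.3] <nil>
}
```

Comparing this column (with the model's transformation applied) against the saved class-1 probabilities keeps both sides at 30 values.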

support pickle protocol 2

Could you please add support for lambdarank (LambdaMART)?

It is a cutting-edge tool for ranking applications, and it is implemented in LightGBM.
Really hope it can be supported! Thanks!

support multiclass predictions for LightGBM

support sklearn.ensemble.RandomForestClassifier

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

Understanding the output of Predict

Hi,

I'm not sure I fully understand the output of the Predict() methods.

I have a fully trained model with 9 classes and 100 estimators. I then run:

predictions := make([]float64, 9)
err = model.Predict(values, 100, predictions)
util.SigmoidFloat64SliceInplace(predictions)
log.Infof("Prediction for %v:\n %v", values, predictions)

That yields:

Prediction for [110 0 12 0 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]: 
[0.2276 0.1822 0.2664 0.0594 0.0682 0.9859 0.1283 0.6349 0.0706]

I understand those are the probabilities of EACH of the 9 classes being the right one. However, how do I get the actual class value? In Python, y_pred = model.predict(values) correctly shows the expected class values; e.g. my class values look like 1242, 1152, 1552, 6662, etc. How can I map the prediction output above to the class values? I didn't give the model any specific ordering of them.
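Because the sigmoid applied in the snippet is monotonic, the largest of the 9 outputs still identifies the predicted class (a softmax would give probabilities that sum to 1, but it would not change which class wins). Mapping that index back to a label needs the class order used in training. The sketch assumes labels were encoded in ascending order, as scikit-learn's LabelEncoder and the `classes_` attribute of sklearn-style classifiers do, so output i corresponds to the i-th smallest distinct label. The 4-class label list is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// argmax returns the index of the largest value (the first one on ties).
func argmax(xs []float64) int {
	best := 0
	for i, v := range xs {
		if v > xs[best] {
			best = i
		}
	}
	return best
}

// uniqueSorted returns the distinct labels in ascending order.
func uniqueSorted(labels []int) []int {
	seen := make(map[int]bool)
	var out []int
	for _, l := range labels {
		if !seen[l] {
			seen[l] = true
			out = append(out, l)
		}
	}
	sort.Ints(out)
	return out
}

// predictedClass maps the winning output index back to an original label,
// assuming output i corresponds to the i-th smallest distinct training label.
func predictedClass(scores []float64, trainLabels []int) int {
	return uniqueSorted(trainLabels)[argmax(scores)]
}

func main() {
	// The 9 outputs printed in the question.
	scores := []float64{0.2276, 0.1822, 0.2664, 0.0594, 0.0682, 0.9859, 0.1283, 0.6349, 0.0706}
	fmt.Println(argmax(scores)) // 5

	// Hypothetical 4-class example using label values from the question.
	labels := []int{1242, 1152, 1552, 6662, 1242}
	fmt.Println(predictedClass([]float64{0.1, 0.2, 0.9, 0.3}, labels)) // 1552
}
```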

Question: support for objective:quantile

I have a model trained with quantile regression in LightGBM. When I use my model, I get an error that this is not a valid objective. Is there a workaround to get it working?

internal/xbin I/O

This internal package makes extensive use of calls to 'binary.Read'.

This is very slow because binary.Read relies heavily on reflection.
One should use the binary.ByteOrder.PutUintXX and binary.ByteOrder.UintXX methods instead.

Benchmarking C API vs pure Go approaches

xgEnsemble prediction results are different from xgboost in python

I train and test models with XGBoost in Python, then use leaves in the production environment.
More details:

In Python
For XGBoost testing, the pd.DataFrame I set up has this structure:
[0:value1 1:v2, 2:v3, ... , n:v(n+1)]
where value1 is an int and v2, ..., v(n+1) are float64. Column 0 holds the value to predict.
With this structure, the results match the test results.

With this structure instead:
[feature1:v2, f2:v3, ... , f(n):v(n+1)]
the results do NOT match the test results.

In Golang
I use leaves XGEnsembleFromFile -> model.PredictCSR(), and the results also do NOT match the test results.

I have spent over 5 hours trying to solve this, for example by adding {0:0} to the first feature group, but with my limited English and math I couldn't find the cause.
What's wrong with my test data?

Unexpected objective field: 'lambdarank'

leaves.LGEnsembleFromFile() fails when loading an objective=lambdarank model (LightGBM).

Error message: unexpected objective field: 'lambdarank', model:

tree
version=v2
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=24
objective=lambdarank
feature_names=t quality freshness navboost pctr video_type lctr_1_3 lctr_4_7 lctr_8_30 sctr_1_3 sctr_4_7 sctr_8_30 ctr_1_3 ctr_4_7 ctr_8_30 loglclick_1_3 logclick_1_3 logsclick_1_3 lctr_ins ctr_ins sctr_ins loglclick_ins logclick_ins logsclick_ins instant_navboost
feature_infos=[0:1.3200000524520874] [3.3299998904112726e-05:1] [0.36787900328636169:1] [0.36790001392364502:0.9999966025352478] [0:0.98189848661422729] [1:200] [0:10.87989330291748] [0:9.3969650268554688] [0:11.486390113830566] [0:4.2822332382202148] [0:3.7750816345214844] [0:2.8636219501495361] [0:8.7641057968139648] [0:7.1885638236999512] [0:10.401005744934082] [0:6.4371075630187988] [0:6.4220900535583496] [0:5.9216046333312988] [0:5.5910482406616211] [0:3.3260509967803955] [0:1.3753839731216431] [0:5.3813371658325195] [0:5.3396997451782227] [0:4.9291071891784668] [0.36790001392364502:0.9999929666519165]
tree_sizes=1308 911 993 1073 1235 1154 992 1316 1234 1151 997 1163 1234 1077 1090 1244 1237 1152 1400 1228 1246 1310 1240 1072 1327 1068 1242 1081 1312 1082 1162 1000 1330 1310 1408 1253 1165 1328 1082 1004 1172 1328 1161 1081 1151 1323 1325 1321 1410 1166 1073 1403 996 1242 991 1336 1232 1250 995 1309

LG: num_leaves=1 support

A LightGBM tree can look like this:

Tree=2236
num_leaves=1
num_cat=0
split_feature=
split_gain=
threshold=
decision_type=
left_child=
right_child=
leaf_value=0
leaf_count=
internal_value=
internal_count=
shrinkage=1.03754e-322

But leaves treats num_leaves < 2 as an input error.

TODO:

support a tree with only one leaf
add test for this case
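Supporting such a tree mostly means short-circuiting before the split traversal. A minimal sketch follows, using a hypothetical `tree` struct whose fields mirror the model-file arrays shown above; with num_leaves=1 the split arrays are empty and the only leaf value is the prediction. Leaf children are encoded as `-k-1`, so `^node` recovers leaf k.

```go
package main

import "fmt"

// tree is a hypothetical minimal representation of one LightGBM tree.
type tree struct {
	splitFeature []int
	threshold    []float64
	leftChild    []int // negative value -k-1 refers to leaf k
	rightChild   []int
	leafValues   []float64
}

// predict returns the leaf value reached by x. A tree without splits
// (num_leaves=1) is handled up front instead of being rejected.
func (t tree) predict(x []float64) float64 {
	if len(t.splitFeature) == 0 {
		return t.leafValues[0]
	}
	node := 0
	for node >= 0 {
		if x[t.splitFeature[node]] <= t.threshold[node] {
			node = t.leftChild[node]
		} else {
			node = t.rightChild[node]
		}
	}
	return t.leafValues[^node] // ^(-k-1) == k
}

func main() {
	// The single-leaf tree from the issue: no splits, leaf_value=0.
	single := tree{leafValues: []float64{0}}
	fmt.Println(single.predict([]float64{1.5})) // 0

	// One split on feature 0 at 0.5 with leaves 0 (left) and 1 (right).
	stump := tree{
		splitFeature: []int{0},
		threshold:    []float64{0.5},
		leftChild:    []int{-1},
		rightChild:   []int{-2},
		leafValues:   []float64{-1.25, 2.5},
	}
	fmt.Println(stump.predict([]float64{0.3}), stump.predict([]float64{0.9})) // -1.25 2.5
}
```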

support DART from XGBoost

What exactly do some parameters mean?

For example, loadTransformation and nEstimators: can anyone tell me more about them?
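For intuition about nEstimators, here is a sketch of how a tree ensemble's raw score can be limited to the first n trees. It assumes, as the calls elsewhere on this page suggest, that nEstimators caps the number of trees used and that 0 means "use all trees"; `rawScore` and the per-tree output list are hypothetical, and any bias or base-score term is left out.

```go
package main

import "fmt"

// rawScore sums the outputs of the first nEstimators trees for one row.
// nEstimators <= 0, or a value >= the ensemble size, uses every tree.
func rawScore(treeOutputs []float64, nEstimators int) float64 {
	n := len(treeOutputs)
	if nEstimators > 0 && nEstimators < n {
		n = nEstimators
	}
	s := 0.0
	for _, v := range treeOutputs[:n] {
		s += v
	}
	return s
}

func main() {
	outs := []float64{0.5, 0.25, -0.75} // per-tree outputs for one row
	fmt.Println(rawScore(outs, 2))       // 0.75: first two trees only
	fmt.Println(rawScore(outs, 0))       // 0: all three trees
}
```

Using fewer trees trades accuracy for speed, which is why a model trained with many trees can give noticeably different predictions when only part of the ensemble is evaluated.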

support sklearn.ensemble.GradientBoostingRegressor

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

error loading xgboost:gbtree model

I got an error when I tried to load a binary xgboost:gbtree model. The error message is as follows:

panic: unexpected EOF

goroutine 1 [running]:
main.main()
/Users/zhangxiatian/tuotuo/workspace/go/predictor/main.go:13 +0x1ba

Process finished with exit code 2

The code is as follows:

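package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model (loadTransformation = false: return raw scores)
	model, err := leaves.XGEnsembleFromFile("/Users/zhangxiatian/tuotuo/recsys/engin/model/model", false)
	if err != nil {
		panic(err)
	}
	// Use model and fmt so the program compiles.
	fmt.Println("NEstimators:", model.NEstimators())
}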

Is this library compatible with CatBoost?

I hope you will implement support for Yandex's CatBoost library.

Does leaves support Go 1.11?

After changing the Go version in go.mod to 1.11, all unit tests passed.
My Go version: go1.11.2 darwin/amd64

common model interface{}

Specifically for testing purposes, it would be useful to have a common interface for LGEnsemble & XGEnsemble.
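A sketch of what such an interface could enable follows. `Predictor` and its method set are an illustrative guess, not leaves' API (NEstimators and NOutputGroups appear elsewhere on this page; PredictSingle is assumed), and the two stub types stand in for the LightGBM and XGBoost ensembles so that one test helper serves both.

```go
package main

import (
	"fmt"
	"math"
)

// Predictor is a hypothetical common interface for both ensemble kinds.
type Predictor interface {
	NEstimators() int
	NOutputGroups() int
	PredictSingle(fvals []float64, nEstimators int) float64
}

// lgModel and xgModel are toy stand-ins for the two ensemble types.
type lgModel struct{ leafSum float64 }
type xgModel struct{ leafSum, baseScore float64 }

func (m lgModel) NEstimators() int   { return 1 }
func (m lgModel) NOutputGroups() int { return 1 }
func (m lgModel) PredictSingle(fvals []float64, nEstimators int) float64 {
	return m.leafSum
}

func (m xgModel) NEstimators() int   { return 1 }
func (m xgModel) NOutputGroups() int { return 1 }
func (m xgModel) PredictSingle(fvals []float64, nEstimators int) float64 {
	return m.baseScore + m.leafSum
}

// checkPrediction is written once against the interface and reused for
// both model kinds, e.g. in table-driven tests.
func checkPrediction(p Predictor, fvals []float64, want, tol float64) error {
	if p.NOutputGroups() != 1 {
		return fmt.Errorf("expected a single-output model, got %d groups", p.NOutputGroups())
	}
	got := p.PredictSingle(fvals, 0)
	if math.Abs(got-want) > tol {
		return fmt.Errorf("got %v, want %v", got, want)
	}
	return nil
}

func main() {
	models := []Predictor{lgModel{leafSum: 0.75}, xgModel{leafSum: 0.25, baseScore: 0.5}}
	for _, m := range models {
		fmt.Println(checkPrediction(m, []float64{1, 2}, 0.75, 1e-9)) // <nil>
	}
}
```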

support v3 model encoding

LightGBM changed its model encoding to v3 in v2.3.0 to support weights in the model. I would like to see leaves support this new format.

only version=v2 is supported

Code:

package main

import (
    "fmt"
    "github.com/dmitryikh/leaves"
)

func main() {
    // 1. Read model
    useTransformation := true
    model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
    if err != nil {
        fmt.Println(err)
        panic(err)
    }
    _ = model // the model is not used further in this reproduction
}

I built it with go build lgbuse.go, and running it reports an error:

only version=v2 is supported
panic: only version=v2 is supported
goroutine 1 [running]:
main.main()
lgbuse.go:14 +0x246
exit status 2

The line of code that causes the problem:
model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)

How can I solve this problem?

Gonum/mat

Currently, leaves uses a standalone matrix implementation that's almost completely binary compatible with gonum/mat.Dense implementation.

It would be very beneficial to just use the one from gonum/mat. (Or, at least, implement the mat.Matrix interface.)
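Implementing the interface needs only three methods. The sketch below mirrors the method set of gonum's mat.Matrix (Dims, At, T) in a local interface so it compiles without the gonum dependency; with gonum imported, the compile-time check would use mat.Matrix instead. `DenseMat` and its field names are hypothetical.

```go
package main

import "fmt"

// Matrix mirrors the method set of gonum's mat.Matrix interface.
type Matrix interface {
	Dims() (r, c int)
	At(i, j int) float64
	T() Matrix
}

// DenseMat is a hypothetical row-major dense matrix.
type DenseMat struct {
	Values []float64
	Rows   int
	Cols   int
}

func (m *DenseMat) Dims() (int, int) { return m.Rows, m.Cols }

func (m *DenseMat) At(i, j int) float64 {
	if i < 0 || i >= m.Rows || j < 0 || j >= m.Cols {
		panic("DenseMat.At: index out of range")
	}
	return m.Values[i*m.Cols+j]
}

func (m *DenseMat) T() Matrix { return transpose{m} }

// transpose is a zero-copy view that swaps row and column indices.
type transpose struct{ m Matrix }

func (t transpose) Dims() (int, int) {
	r, c := t.m.Dims()
	return c, r
}

func (t transpose) At(i, j int) float64 { return t.m.At(j, i) }

func (t transpose) T() Matrix { return t.m }

var _ Matrix = (*DenseMat)(nil) // compile-time interface check

func main() {
	m := &DenseMat{Values: []float64{1, 2, 3, 4, 5, 6}, Rows: 2, Cols: 3}
	t := m.T()
	r, c := t.Dims()
	fmt.Println(r, c, t.At(2, 1)) // 3 2 6
}
```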

TestXGHiggs mismatch

There is a discrepancy between the original predictions and leaves predictions for the XGBoost ensemble on the Higgs problem (45th row of the test data).

One should investigate: is it a bug, or is it due to floating-point tolerances on decision thresholds?
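One way a float tolerance can flip a decision is sketched below. It assumes thresholds are stored as float32 (as in XGBoost's model format) with the row going left when the feature value is less than the threshold; comparing in float64 versus float32 then disagrees for a value that sits exactly on the threshold. Both helper names are hypothetical.

```go
package main

import "fmt"

// goesLeft64 compares in float64: the float32 threshold is widened first.
func goesLeft64(x float64, thr float32) bool { return x < float64(thr) }

// goesLeft32 compares in float32: the feature value is rounded first.
func goesLeft32(x float64, thr float32) bool { return float32(x) < thr }

func main() {
	thr := float32(0.1) // threshold as stored in a float32 field
	x := 0.1            // the same value held as a float64 input
	// float64(thr) is 0.10000000149..., slightly above 0.1, but float32(x) == thr.
	fmt.Println(goesLeft64(x, thr), goesLeft32(x, thr)) // true false
}
```

If the mismatching row has a feature that lands on such a boundary, the discrepancy comes from precision rather than a traversal bug.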

Prediction result is always 0.000

I use leaves to load my LightGBM model and predict instances, but the results are always 0.00, while predicting in Python gives non-zero results. Has anyone met this problem?
The features include both numerical and categorical types.

Support for returning feature_name in Ensemble struct

I want to get the list of feature names in the model. Does the author have time to support this feature?

support XGBoost GBLinear (generalized linear models)

Add documentation for `doctest.py`

testscripts/doctest.py is a utility that automates checking the examples.
It should contain a short description and examples inside of it.

read XGBoost model from JSON

Totally incorrect predictions: Python XGBoost-trained model loaded and predicted with leaves

We use Spark to generate a libsvm file, then use Python sklearn to load it and XGBoost to train and save the model, and finally use leaves to load the model and predict.
The predictions from the Python demo and from Go are totally different.
We just want to ask whether leaves doesn't support this or we are using leaves wrong.
The Python code looks like this:

from sklearn.datasets import load_svmlight_file
from xgboost import XGBClassifier

my_workpath = 'D:\\project\\py\\train_demo\\'
X_train, y_train = load_svmlight_file(my_workpath + 'train')
X_test, y_test = load_svmlight_file(my_workpath + 'validation')
bst = XGBClassifier()
bst.fit(X_train, y_train)
bst.save_model(my_workpath + "train_model")
train_preds = [x[1] for x in bst.predict_proba(X_train)]
test_preds = [x[1] for x in bst.predict_proba(X_test)]

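The Go code looks like this:

import (
	"fmt"
	"log"

	"github.com/dmitryikh/leaves"
	"github.com/dmitryikh/leaves/mat"
	"github.com/dmitryikh/leaves/transformation"
)

func predictValidation(modelPath, validatePath string) {
	// Load the model together with its output transformation.
	model, err := leaves.XGEnsembleFromFile(modelPath, true)
	if err != nil {
		log.Fatal(err) // stop here: model is nil on error
	}
	if model.Transformation().Type() != transformation.Logistic {
		log.Fatalf("expected TransformType = Logistic (got %s)", model.Transformation().Name())
	}
	csr, err := mat.CSRMatFromLibsvmFile(validatePath, 0, true)
	if err != nil {
		log.Fatal(err)
	}
	predictions := make([]float64, csr.Rows()*model.NOutputGroups())
	// nEstimators = 50, nThreads = 5
	if err := model.PredictCSR(csr.RowHeaders, csr.ColIndexes, csr.Values, predictions, 50, 5); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Prediction for %v\n", predictions)
}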

support RandomForest from LightGBM

It seems that the only thing needed is to support the average_output parameter from the model file.
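The difference average_output makes is small but important: boosting sums tree outputs, while a random forest averages them. A minimal sketch, assuming the flag simply switches the combination rule; `ensembleOutput` is a hypothetical helper.

```go
package main

import "fmt"

// ensembleOutput combines per-tree outputs for one row: averaged when
// averageOutput is set (random forest mode), summed otherwise (boosting).
func ensembleOutput(treeOutputs []float64, averageOutput bool) float64 {
	sum := 0.0
	for _, v := range treeOutputs {
		sum += v
	}
	if averageOutput && len(treeOutputs) > 0 {
		return sum / float64(len(treeOutputs))
	}
	return sum
}

func main() {
	outs := []float64{1, 2, 6}
	fmt.Println(ensembleOutput(outs, false)) // 9
	fmt.Println(ensembleOutput(outs, true))  // 3
}
```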

support pickle protocol 4

xgboost consistency failed

I built an XGBoost model in Python and ran it on the test dataset.
But when I use leaves to load the model and predict, the results are inconsistent with the Python results.

I tested a LightGBM model on the same dataset, and those results are consistent.

Is there the plan for xgboost Ranker with rank:pairwise?

Hi, as the title says, could leaves support this type of model? Thanks for your team's coding, haha

Does leaves support missing values for LightGBM?

Does leaves support missing values for LightGBM models, and where can I find examples of it (reading txt files with missing values)?

support transformation functions

Currently leaves outputs predictions as raw scores. Client code has to transform them into probabilities (logistic), lambdarank scores, and so on.

Let's introduce this ability to leaves.

Short list for XGBoost:

"binary:logistic"
"binary:logitraw"
"multi:softmax"
"multi:softprob"
"reg:linear"

Short list for LightGBM:
todo

support DART from LightGBM?

Judging from the source code of the DART class in LightGBM, it seems there is nothing we need to do in leaves, and this class of models is already supported.

At least we should add a test for this case.

Support for newer versions of XGBoost

Something has changed in XGBoost's binary model format. The highest version I've managed to get leaves working with is 1.0. Starting from 1.1, I keep getting "panic: unexpected EOF". Is support for newer versions planned?
Moreover, they've started saving models in JSON format, and it looks like they're going to deprecate the binary format altogether.

NOutputGroups is always 0

After following the example in the docs, I get a strange error.

How complicated would it be to provide the model training part as well?

Just a general question:
How complicated would it be to provide the model training part as well?
Are there any plans for it?
