dmitryikh / leaves
Pure Go implementation of the prediction part for GBRT (Gradient Boosting Regression Trees) models from popular frameworks
License: MIT License
We have the internal.pickle package to read the pickle format.
It is difficult to debug what was read into memory. We need a function that prints parsed pickle data for debugging purposes.
Loading an XGBoost model returns errors. The model was trained with xgboost4j.
Let's use multiprocessing for batch predictions
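One way to parallelize batch predictions in Go is to split the rows into contiguous chunks and give each chunk to its own goroutine. The following is a minimal sketch under that assumption; `predictRow` is a hypothetical stand-in for evaluating a model on one row and is not part of leaves.

```go
package main

import (
	"fmt"
	"sync"
)

// predictRow is a hypothetical stand-in for evaluating a model on one row.
// Here it simply sums the features.
func predictRow(row []float64) float64 {
	s := 0.0
	for _, v := range row {
		s += v
	}
	return s
}

// predictBatch evaluates all rows using up to nWorkers goroutines.
// Each worker writes to a disjoint range of out, so no locking is needed.
func predictBatch(rows [][]float64, nWorkers int) []float64 {
	out := make([]float64, len(rows))
	if nWorkers < 1 {
		nWorkers = 1
	}
	chunk := (len(rows) + nWorkers - 1) / nWorkers
	var wg sync.WaitGroup
	for start := 0; start < len(rows); start += chunk {
		end := start + chunk
		if end > len(rows) {
			end = len(rows)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = predictRow(rows[i])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	rows := [][]float64{{1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}}
	fmt.Println(predictBatch(rows, 2)) // [3 7 11 15 19]
}
```

Chunking by contiguous row ranges keeps each goroutine on its own part of the output slice, which avoids both locks and a per-row channel send.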
My online prediction service wants to use the GBDT + LR algorithm combination (Practical Lessons from Predicting Clicks on Ads at Facebook), which uses the leaf index of each tree. But leaves doesn't support it.
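To illustrate why leaf indices are needed, here is a minimal sketch of the GBDT + LR idea: the leaf reached in each tree becomes one active position in a long one-hot vector, and a logistic regression is applied on top. The function names and the example ensemble are hypothetical; obtaining the leaf indices themselves is exactly the feature requested above.

```go
package main

import (
	"fmt"
	"math"
)

// leafFeatures maps per-tree leaf indices to positions in one long one-hot
// vector. leafIdx[t] is the leaf reached in tree t; nLeaves[t] is the number
// of leaves in tree t.
func leafFeatures(leafIdx, nLeaves []int) []int {
	active := make([]int, len(leafIdx))
	offset := 0
	for t, leaf := range leafIdx {
		active[t] = offset + leaf
		offset += nLeaves[t]
	}
	return active
}

// lrScore applies logistic regression over the active one-hot positions.
func lrScore(active []int, weights []float64, bias float64) float64 {
	z := bias
	for _, i := range active {
		z += weights[i]
	}
	return 1.0 / (1.0 + math.Exp(-z))
}

func main() {
	nLeaves := []int{3, 2, 4} // hypothetical ensemble of three trees
	leafIdx := []int{2, 0, 1} // hypothetical leaf reached in each tree
	active := leafFeatures(leafIdx, nLeaves)
	fmt.Println(active) // [2 3 6]

	weights := make([]float64, 9) // all-zero weights, for illustration only
	fmt.Println(lrScore(active, weights, 0)) // 0.5
}
```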
I'm using xgbModel.nativeBooster.saveModel on Spark to save the native model, then loading the model with XGEnsembleFromFile to predict on the validation dataset, but the results do not match the predictions made on Spark. Here are the results predicted with leaves:
label: 1, pred: 0.836042
label: 1, pred: 0.836042
label: 1, pred: 0.797784
label: 1, pred: 0.934794
label: 1, pred: 0.793824
label: 1, pred: 0.797579
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.704733
label: 1, pred: 0.787566
label: 1, pred: 0.941911
label: 1, pred: 0.934794
label: 1, pred: 0.749724
label: 1, pred: 0.929430
label: 1, pred: 0.931993
label: 1, pred: 0.797579
label: 1, pred: 0.839686
label: 1, pred: 0.759537
label: 1, pred: 0.813373
label: 1, pred: 0.760041
label: 1, pred: 0.793824
label: 1, pred: 0.934794
label: 1, pred: 0.759537
label: 1, pred: 0.929430
label: 1, pred: 0.945538
label: 1, pred: 0.785153
label: 1, pred: 0.959390
label: 1, pred: 0.793824
label: 1, pred: 0.779831
label: 1, pred: 0.959390
label: 1, pred: 0.749724
label: 1, pred: 0.941911
label: 1, pred: 0.798052
label: 1, pred: 0.749724
label: 1, pred: 0.931993
label: 1, pred: 0.749724
label: 1, pred: 0.929430
label: 1, pred: 0.839686
label: 1, pred: 0.839686
label: 1, pred: 0.806166
label: 1, pred: 0.934794
label: 1, pred: 0.839686
label: 1, pred: 0.785153
label: 1, pred: 0.806166
label: 1, pred: 0.945538
label: 1, pred: 0.803833
label: 1, pred: 0.759537
label: 1, pred: 0.806166
label: 1, pred: 0.768660
label: 1, pred: 0.797784
label: 1, pred: 0.931993
label: 1, pred: 0.749724
label: 1, pred: 0.824530
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.806893
label: 1, pred: 0.929430
label: 1, pred: 0.803833
label: 1, pred: 0.797148
label: 1, pred: 0.931993
label: 1, pred: 0.797579
label: 1, pred: 0.787042
label: 1, pred: 0.803833
label: 1, pred: 0.959390
label: 1, pred: 0.931993
label: 1, pred: 0.806166
label: 1, pred: 0.836042
label: 1, pred: 0.934794
label: 1, pred: 0.934794
label: 1, pred: 0.803833
label: 1, pred: 0.749724
label: 1, pred: 0.931993
label: 1, pred: 0.759537
label: 1, pred: 0.779831
label: 1, pred: 0.787042
label: 1, pred: 0.785153
label: 1, pred: 0.749724
label: 1, pred: 0.749724
label: 1, pred: 0.934794
label: 1, pred: 0.929430
label: 1, pred: 0.797579
label: 1, pred: 0.945538
label: 1, pred: 0.934794
label: 1, pred: 0.959390
label: 1, pred: 0.959390
label: 1, pred: 0.787042
label: 1, pred: 0.787042
label: 1, pred: 0.931993
label: 1, pred: 0.759537
label: 1, pred: 0.941911
label: 1, pred: 0.749724
label: 1, pred: 0.850764
label: 1, pred: 0.945538
label: 1, pred: 0.803833
label: 1, pred: 0.749724
label: 1, pred: 0.797579
label: 1, pred: 0.785153
label: 1, pred: 0.941911
label: 1, pred: 0.806166
label: 0, pred: 0.767173
label: 0, pred: 0.807510
label: 0, pred: 0.797784
label: 0, pred: 0.824530
label: 0, pred: 0.839686
label: 0, pred: 0.767173
label: 0, pred: 0.839686
label: 0, pred: 0.767176
label: 0, pred: 0.797579
label: 0, pred: 0.793824
label: 0, pred: 0.772110
label: 0, pred: 0.768660
label: 0, pred: 0.759537
label: 0, pred: 0.839686
label: 0, pred: 0.759537
label: 0, pred: 0.929430
label: 0, pred: 0.941911
label: 0, pred: 0.822525
label: 0, pred: 0.839686
label: 0, pred: 0.945538
label: 0, pred: 0.749724
label: 0, pred: 0.929430
label: 0, pred: 0.787042
label: 0, pred: 0.797579
label: 0, pred: 0.797784
label: 0, pred: 0.797784
label: 0, pred: 0.945538
label: 0, pred: 0.785153
label: 0, pred: 0.797784
label: 0, pred: 0.836042
label: 0, pred: 0.931993
label: 0, pred: 0.836042
label: 0, pred: 0.779831
label: 0, pred: 0.945538
label: 0, pred: 0.812733
label: 0, pred: 0.945538
label: 0, pred: 0.745542
label: 0, pred: 0.779849
label: 0, pred: 0.903047
label: 0, pred: 0.816076
label: 0, pred: 0.807510
label: 0, pred: 0.749971
label: 0, pred: 0.945538
label: 0, pred: 0.804371
label: 0, pred: 0.767173
label: 0, pred: 0.934794
label: 0, pred: 0.785153
label: 0, pred: 0.767173
label: 0, pred: 0.797784
label: 0, pred: 0.785153
label: 0, pred: 0.807510
label: 0, pred: 0.768660
label: 0, pred: 0.804371
label: 0, pred: 0.787042
label: 0, pred: 0.704733
label: 0, pred: 0.813373
label: 0, pred: 0.749724
label: 0, pred: 0.836042
label: 0, pred: 0.772110
label: 0, pred: 0.855798
label: 0, pred: 0.836042
label: 0, pred: 0.784896
label: 0, pred: 0.804371
label: 0, pred: 0.813373
label: 0, pred: 0.749724
label: 0, pred: 0.903047
label: 0, pred: 0.787042
label: 0, pred: 0.839686
label: 0, pred: 0.759537
label: 0, pred: 0.797579
label: 0, pred: 0.803833
label: 0, pred: 0.793824
label: 0, pred: 0.749724
label: 0, pred: 0.806166
label: 0, pred: 0.793824
label: 0, pred: 0.793824
label: 0, pred: 0.787042
label: 0, pred: 0.806166
label: 0, pred: 0.903047
label: 0, pred: 0.839686
label: 0, pred: 0.768660
label: 0, pred: 0.787042
label: 0, pred: 0.745542
label: 0, pred: 0.787042
label: 0, pred: 0.802708
label: 0, pred: 0.797784
label: 0, pred: 0.839686
label: 0, pred: 0.929430
label: 0, pred: 0.803833
label: 0, pred: 0.704733
label: 0, pred: 0.704733
label: 0, pred: 0.793824
label: 0, pred: 0.793824
label: 0, pred: 0.813373
label: 0, pred: 0.836042
label: 0, pred: 0.767173
label: 0, pred: 0.803833
label: 0, pred: 0.793824
label: 0, pred: 0.818067
label: 0, pred: 0.787566
Does leaves not support the sklearn XGBoost model?
I used the Python code below to train an XGBoost model, and I get an error when I load this model in Go code with the following API:
leaves.XGEnsembleFromFile("xg_iris.model", false)
The error:
mark@mark:~/golang $ go run predict_iris.go
Name: xgboost.gbtree
NFeatures: 4
NOutputGroups: 3
NEstimators: 100
panic: different sizes: len(a) = 30, len(b) = 90
goroutine 1 [running]:
main.main()
/home/mark/golang/predict_iris.go:44 +0x686
exit status 2
Below is the Python code that trains the XGBoost model using the XGBoost API in sklearn:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
xg_train = xgb.DMatrix(X_train, label=y_train)
xg_test = xgb.DMatrix(X_test, label=y_test)
params = {
    'objective': 'multi:softmax',
    'num_class': 3,
}
n_estimators = 5
#clf = xgb.train(params, xg_train, n_estimators)
clf = XGBClassifier(**params)
clf = clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_test)[:,1]
clf.save_model('xg_iris.model')
np.savetxt('xg_iris_true_predictions.txt', y_pred, delimiter='\t')
datasets.dump_svmlight_file(X_test, y_test, 'iris_test.libsvm')
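The sizes in the panic are consistent with comparing 30 reference values (one column of predict_proba for 30 test rows) against 30 × 3 = 90 outputs, one per row and class. This is one possible reading, not a confirmed diagnosis. A minimal sketch of extracting a single class column on the Go side, assuming the predictions are stored row by row in a flat slice (the layout is an assumption, and `classColumn` is a hypothetical helper):

```go
package main

import "fmt"

// classColumn returns the values of one class from a flat, row-major slice
// that holds nClasses values per row.
func classColumn(flat []float64, nClasses, class int) ([]float64, error) {
	if nClasses <= 0 || class < 0 || class >= nClasses || len(flat)%nClasses != 0 {
		return nil, fmt.Errorf("bad shape: len=%d nClasses=%d class=%d", len(flat), nClasses, class)
	}
	col := make([]float64, 0, len(flat)/nClasses)
	for i := class; i < len(flat); i += nClasses {
		col = append(col, flat[i])
	}
	return col, nil
}

func main() {
	flat := []float64{0.1, 0.7, 0.2, 0.3, 0.3, 0.4} // 2 rows, 3 classes
	col, err := classColumn(flat, 3, 1)
	fmt.Println(col, err) // [0.7 0.3] <nil>
}
```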
This is a cutting-edge tool for ranking applications, which is implemented in LightGBM.
I really hope it can be supported. Thanks!
Hi,
I'm not sure I fully understand the output of the Predict() methods.
I have a fully trained model with 9 classes and 100 estimators. I then run:
predictions := make([]float64, 9)
err = model.Predict(values, 100, predictions)
util.SigmoidFloat64SliceInplace(predictions)
log.Infof("Prediction for %v:\n %v", values, predictions)
That yields:
Prediction for [110 0 12 0 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]:
[0.2276 0.1822 0.2664 0.0594 0.0682 0.9859 0.1283 0.6349 0.0706]
I understand those are the probabilities for EACH of the 9 classes being the right one. However, how can I get the actual class value? In Python, if I do y_pred = model.predict(values), it correctly shows me the expected class values. For example, my class values look like this: 1242, 1152, 1552, 6662, etc. How can I map the prediction output above to the class values? I haven't provided any specific order of them to the model.
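One possible mapping, sketched below, takes the index of the largest score and looks it up in the list of class values. It assumes that the training pipeline encoded the labels as 0..n-1 in ascending order, which is how scikit-learn orders its classes_ attribute; this should be verified against model.classes_ in Python. The label set and scores here are hypothetical. Since argmax is unchanged by any increasing transformation, the result is the same for raw scores and for transformed ones.

```go
package main

import (
	"fmt"
	"sort"
)

// argmax returns the index of the largest value, or -1 for an empty slice.
func argmax(xs []float64) int {
	best := -1
	for i, x := range xs {
		if best < 0 || x > xs[best] {
			best = i
		}
	}
	return best
}

func main() {
	// Hypothetical label set, assumed to be encoded as 0..n-1 in sorted order.
	classes := []int{6662, 1242, 1552, 1152}
	sort.Ints(classes) // [1152 1242 1552 6662]

	scores := []float64{0.22, 0.18, 0.98, 0.06} // one score per class
	fmt.Println(classes[argmax(scores)])        // 1552
}
```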
I have a model trained with quantile regression in LightGBM. When I use my model, I get an error saying that this is not a valid option for the objective. Is there a workaround to get it working?
This internal package makes quite extensive use of calls to 'binary.Read'.
This is very slow, as it makes heavy use of reflection.
One should use the binary.ByteOrder.PutUintXX and binary.ByteOrder.UintXX methods instead.
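As an illustration of the suggested change, the sketch below decodes the same bytes twice: once with binary.Read into a struct, and once with the ByteOrder methods directly. The `header` record is hypothetical and is not the actual layout used in leaves.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// header is a hypothetical fixed-size record, not the actual leaves layout.
type header struct {
	NumTrees uint32
	Bias     float32
}

// readReflect decodes with binary.Read, which uses reflection for structs.
func readReflect(buf []byte) (header, error) {
	var h header
	err := binary.Read(bytes.NewReader(buf), binary.LittleEndian, &h)
	return h, err
}

// readDirect decodes the same bytes with ByteOrder methods, without reflection.
func readDirect(buf []byte) (header, error) {
	if len(buf) < 8 {
		return header{}, fmt.Errorf("short buffer: %d bytes", len(buf))
	}
	return header{
		NumTrees: binary.LittleEndian.Uint32(buf[0:4]),
		Bias:     math.Float32frombits(binary.LittleEndian.Uint32(buf[4:8])),
	}, nil
}

func main() {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint32(buf[0:4], 100)
	binary.LittleEndian.PutUint32(buf[4:8], math.Float32bits(0.5))

	a, _ := readReflect(buf)
	b, _ := readDirect(buf)
	fmt.Println(a == b, b.NumTrees, b.Bias) // true 100 0.5
}
```

The direct version needs explicit offsets and a length check, which is the price of avoiding reflection.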
I train and test data with XGBoost in Python, then use leaves in the production environment.
More information follows.
In Python
For XGBoost testing, the data structure that I set up with pd.DataFrame is
[0:value1, 1:v2, 2:v3, ... , n:v(n+1)]
where value1 is any value of int type and v2, ... , v(n+1) are of float64 type. The 0 column is the prediction value.
With this structure, the result matches the testing result.
With this structure:
[feature1:v2, f2:v3, ... , f(n):v(n+1)]
the result does NOT match the testing result.
In Golang
I use leaves XGEnsembleFromFile -> model.PredictCSR(), and the result also does NOT match the testing result.
I have spent over 5 hours trying to solve this, for example by adding {0:0} to the first feature group, but with my limited English and math skills I cannot find the cause.
What is wrong with my testing data?
leaves.LGEnsembleFromFile() fails when loading an objective=lambdarank model (LightGBM).
Error message: unexpected objective field: 'lambdarank'
Model:
tree
version=v2
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=24
objective=lambdarank
feature_names=t quality freshness navboost pctr video_type lctr_1_3 lctr_4_7 lctr_8_30 sctr_1_3 sctr_4_7 sctr_8_30 ctr_1_3 ctr_4_7 ctr_8_30 loglclick_1_3 logclick_1_3 logsclick_1_3 lctr_ins ctr_ins sctr_ins loglclick_ins logclick_ins logsclick_ins instant_navboost
feature_infos=[0:1.3200000524520874] [3.3299998904112726e-05:1] [0.36787900328636169:1] [0.36790001392364502:0.9999966025352478] [0:0.98189848661422729] [1:200] [0:10.87989330291748] [0:9.3969650268554688] [0:11.486390113830566] [0:4.2822332382202148] [0:3.7750816345214844] [0:2.8636219501495361] [0:8.7641057968139648] [0:7.1885638236999512] [0:10.401005744934082] [0:6.4371075630187988] [0:6.4220900535583496] [0:5.9216046333312988] [0:5.5910482406616211] [0:3.3260509967803955] [0:1.3753839731216431] [0:5.3813371658325195] [0:5.3396997451782227] [0:4.9291071891784668] [0.36790001392364502:0.9999929666519165]
tree_sizes=1308 911 993 1073 1235 1154 992 1316 1234 1151 997 1163 1234 1077 1090 1244 1237 1152 1400 1228 1246 1310 1240 1072 1327 1068 1242 1081 1312 1082 1162 1000 1330 1310 1408 1253 1165 1328 1082 1004 1172 1328 1161 1081 1151 1323 1325 1321 1410 1166 1073 1403 996 1242 991 1336 1232 1250 995 1309
A LightGBM tree can look like this:
Tree=2236
num_leaves=1
num_cat=0
split_feature=
split_gain=
threshold=
decision_type=
left_child=
right_child=
leaf_value=0
leaf_count=
internal_value=
internal_count=
shrinkage=1.03754e-322
But leaves treats num_leaves < 2 as an input error.
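A tree with a single leaf can be read as a constant: every input gets the same leaf_value. The sketch below shows one way to handle that case in a simplified, hypothetical tree type (it is not the leaves data structure). Negative child values encode leaves as the bitwise complement of the leaf index, which is an assumption about the encoding made for this example.

```go
package main

import "fmt"

// tree is a hypothetical, simplified tree. Internal nodes are stored in
// parallel slices; a negative child value c refers to leaf number ^c.
type tree struct {
	splitFeature []int
	threshold    []float64
	leftChild    []int
	rightChild   []int
	leafValue    []float64
}

// predict walks the tree. A tree without internal nodes is a constant.
func (t *tree) predict(x []float64) float64 {
	if len(t.splitFeature) == 0 {
		if len(t.leafValue) == 0 {
			return 0
		}
		return t.leafValue[0]
	}
	node := 0
	for node >= 0 {
		if x[t.splitFeature[node]] <= t.threshold[node] {
			node = t.leftChild[node]
		} else {
			node = t.rightChild[node]
		}
	}
	return t.leafValue[^node]
}

func main() {
	stump := &tree{leafValue: []float64{0}}     // num_leaves=1, leaf_value=0
	fmt.Println(stump.predict([]float64{1, 2})) // 0

	split := &tree{
		splitFeature: []int{0},
		threshold:    []float64{0.5},
		leftChild:    []int{-1}, // leaf 0
		rightChild:   []int{-2}, // leaf 1
		leafValue:    []float64{-1.5, 2.5},
	}
	fmt.Println(split.predict([]float64{0.9})) // 2.5
}
```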
TODO:
For parameters such as loadTransformation and nEstimators, can anyone tell me more details?
I got an error when I tried to load a binary xgboost:gbtree model. The error message is as follows:
panic: unexpected EOF
goroutine 1 [running]:
main.main()
/Users/zhangxiatian/tuotuo/workspace/go/predictor/main.go:13 +0x1ba
Process finished with exit code 2
The code is as follows:
package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	model, err := leaves.XGEnsembleFromFile("/Users/zhangxiatian/tuotuo/recsys/engin/model/model")
	if err != nil {
		panic(err)
	}
	// 2. Use the model (otherwise fmt and model are unused)
	fmt.Println("estimators:", model.NEstimators())
}
I hope you will implement support for the CatBoost library from Yandex.
After changing the Go version limit in "go.mod" to 1.11, all unit tests passed.
Go version on my machine: go1.11.2 darwin/amd64
Specifically for testing purposes, it would be useful to have a common interface for LGEnsemble & XGEnsemble.
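A minimal sketch of what such a common interface could look like. The interface name, its methods, and the two stub models are hypothetical; they only stand in for LGEnsemble and XGEnsemble to show that one piece of test code can serve both.

```go
package main

import "fmt"

// Predictor is a hypothetical common interface; the method set is illustrative.
type Predictor interface {
	NFeatures() int
	PredictSingle(fvals []float64) float64
}

// sumModel and constModel are stubs standing in for the two ensemble types.
type sumModel struct{ n int }

func (m sumModel) NFeatures() int { return m.n }
func (m sumModel) PredictSingle(fvals []float64) float64 {
	s := 0.0
	for _, v := range fvals {
		s += v
	}
	return s
}

type constModel struct {
	n     int
	value float64
}

func (m constModel) NFeatures() int                    { return m.n }
func (m constModel) PredictSingle(_ []float64) float64 { return m.value }

// checkModel is test-style code that works for any Predictor.
func checkModel(p Predictor, fvals []float64, want float64) error {
	if len(fvals) != p.NFeatures() {
		return fmt.Errorf("got %d features, model expects %d", len(fvals), p.NFeatures())
	}
	if got := p.PredictSingle(fvals); got != want {
		return fmt.Errorf("prediction %v, want %v", got, want)
	}
	return nil
}

func main() {
	fmt.Println(checkModel(sumModel{n: 2}, []float64{1, 2}, 3))         // <nil>
	fmt.Println(checkModel(constModel{n: 1, value: 7}, []float64{0}, 7)) // <nil>
}
```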
LightGBM changed its model encoding to v3 in v2.3.0 to support weights in the model. I would like to see leaves support this new format.
code:
package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	useTransformation := true
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
	if err != nil {
		fmt.Println(err)
		panic(err)
	}
	// 2. Use the model (otherwise model is unused)
	fmt.Println("estimators:", model.NEstimators())
}
I ran go build lgbuse.go, and it then reported an error:
only version=v2 is supported
panic: only version=v2 is supported
goroutine 1 [running]:
main.main()
lgbuse.go:14 +0x246
exit status 2
The line of code that causes the problem:
model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
How can I solve this problem?
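The error says that the loader only accepts version=v2. A small check like the sketch below can report which version a LightGBM text model declares before it is handed to the loader. It only detects the format and does not make a newer file loadable. `modelVersion` is a hypothetical helper, and it assumes the header contains a line of the form version=... before the first Tree= line, as in the model dump shown earlier on this page.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// modelVersion scans the header of a LightGBM text model for "version=...".
// It stops at the first "Tree=" line, where the header is assumed to end.
func modelVersion(model string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(model))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "Tree=") {
			break
		}
		if strings.HasPrefix(line, "version=") {
			return strings.TrimPrefix(line, "version="), nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no version field found")
}

func main() {
	header := "tree\nversion=v3\nnum_class=1\n"
	v, err := modelVersion(header)
	fmt.Println(v, err) // v3 <nil>
}
```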
Currently, leaves uses a standalone matrix implementation that's almost completely binary compatible with gonum/mat.Dense implementation.
It would be very beneficial to just use the one from gonum/mat. (Or, at least, implement the mat.Matrix interface.)
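To show how small the second option is, the sketch below implements the three methods of gonum's mat.Matrix interface (Dims, At, T) on a row-major dense matrix. The interface is redeclared locally so that the example has no external dependency, and `denseMat` is a hypothetical type, not the one in leaves.

```go
package main

import "fmt"

// Matrix mirrors the three methods of gonum's mat.Matrix interface,
// redeclared locally so the sketch has no external dependency.
type Matrix interface {
	Dims() (r, c int)
	At(i, j int) float64
	T() Matrix
}

// denseMat is a hypothetical row-major dense matrix.
type denseMat struct {
	values     []float64
	rows, cols int
}

func (m *denseMat) Dims() (int, int)    { return m.rows, m.cols }
func (m *denseMat) At(i, j int) float64 { return m.values[i*m.cols+j] }
func (m *denseMat) T() Matrix           { return transposed{m} }

// transposed is a view that swaps indices without copying data.
type transposed struct{ m *denseMat }

func (t transposed) Dims() (int, int)    { return t.m.cols, t.m.rows }
func (t transposed) At(i, j int) float64 { return t.m.At(j, i) }
func (t transposed) T() Matrix           { return t.m }

func main() {
	m := &denseMat{values: []float64{1, 2, 3, 4, 5, 6}, rows: 2, cols: 3}
	var a Matrix = m
	fmt.Println(a.At(1, 0), a.T().At(0, 1)) // 4 4
	r, c := a.T().Dims()
	fmt.Println(r, c) // 3 2
}
```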
There is a discrepancy between the original predictions and the leaves predictions for the XGBoost ensemble on the Higgs problem (45th row of the test data).
One should investigate: is it a bug, or is it caused by float tolerances on decision thresholds?
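The following sketch shows how a float tolerance effect on a decision threshold can arise in principle. It does not establish the cause of the discrepancy reported here. It assumes a threshold stored as float32 and a feature value held as float64; the two helper functions are hypothetical.

```go
package main

import "fmt"

// goLeft64 compares in float64 after widening the stored float32 threshold.
func goLeft64(x float64, threshold float32) bool {
	return x < float64(threshold)
}

// goLeft32 first narrows the feature to float32, then compares.
func goLeft32(x float64, threshold float32) bool {
	return float32(x) < threshold
}

func main() {
	threshold := float32(0.1) // the nearest float32 is slightly above 0.1
	x := 0.1                  // float64 feature value
	fmt.Println(goLeft64(x, threshold), goLeft32(x, threshold)) // true false
}
```

The same feature value goes left in one comparison and right in the other, so a single row that sits exactly on a threshold can end up in a different leaf.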
I use leaves to load my LightGBM model and predict instances. The results are always 0.00, while predicting with Python gives different results. Has anyone met this problem?
The feature types include numerical and categorical.
testscripts/doctest.py is a utility that automates the checking of examples.
It should contain short documentation and examples inside of it.
We use Spark to generate a libsvm file, then use Python sklearn to load it and XGBoost to train and save the model, and finally use leaves to load the model and predict.
The prediction results are completely different between the Python demo and Go.
We just want to ask whether leaves does not support this or whether we are using leaves incorrectly.
The Python code looks like this:
from sklearn.datasets import load_svmlight_file
from xgboost import XGBClassifier

my_workpath = 'D:\\project\\py\\train_demo\\'
X_train, y_train = load_svmlight_file(my_workpath + 'train')
X_test, y_test = load_svmlight_file(my_workpath + 'validation')
bst = XGBClassifier()
bst.fit(X_train, y_train)
bst.save_model(my_workpath + "train_model")
train_preds = [x[1] for x in bst.predict_proba(X_train)]
test_preds = [x[1] for x in bst.predict_proba(X_test)]
The Go code looks like this:
model, e := leaves.XGEnsembleFromFile(model_path, true)
if e != nil {
	println(e)
}
if model.Transformation().Type() != transformation.Logistic {
	log.Fatalf("expected TransforType = Logistic (got %s)", model.Transformation().Name())
}
csr, err := mat.CSRMatFromLibsvmFile(validate_path, 0, true)
if err != nil {
	println(err)
}
predictions := make([]float64, csr.Rows()*model.NOutputGroups())
e = model.PredictCSR(csr.RowHeaders, csr.ColIndexes, csr.Values, predictions, 50, 5)
if e != nil {
	println(e)
}
fmt.Printf("Prediction for %v\n", predictions)
It seems that the only thing one should do is to support the average_output parameter from the model file.
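A minimal sketch of what supporting such a flag could amount to, assuming that average_output means the raw score is the mean of the tree outputs rather than their sum. That reading is an assumption and should be checked against LightGBM itself; `ensembleOutput` is a hypothetical helper.

```go
package main

import "fmt"

// ensembleOutput combines per-tree outputs. With averageOutput set, the sum
// is divided by the number of trees (assumed meaning of average_output).
func ensembleOutput(treeOutputs []float64, averageOutput bool) float64 {
	if len(treeOutputs) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range treeOutputs {
		sum += v
	}
	if averageOutput {
		return sum / float64(len(treeOutputs))
	}
	return sum
}

func main() {
	outputs := []float64{1, 2, 3}
	fmt.Println(ensembleOutput(outputs, false)) // 6
	fmt.Println(ensembleOutput(outputs, true))  // 2
}
```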
I built an XGBoost model in Python and then ran it on the test dataset.
But when I use leaves to load the model and predict, the results are inconsistent with the Python results.
I also tested a LightGBM model with the same dataset, and those results are consistent.
Hi, as the title says, could leaves support this type of model? Thanks for your team's work.
Does leaves support missing values for LightGBM, and where can I see examples of it (reading txt files with missing values)?
Currently leaves outputs predictions as raw scores. Client code has to transform them itself (logistic for probabilities, lambdarank, and so on).
Let's introduce this ability to leaves.
Short list for XGBoost:
Short list for LightGBM:
todo
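For reference, the two most common transformations from raw scores to probabilities can be sketched as follows. This is a generic illustration of the logistic and softmax functions, not the implementation planned for leaves.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid maps a raw binary score to a probability.
func sigmoid(x float64) float64 {
	return 1 / (1 + math.Exp(-x))
}

// softmax maps raw multiclass scores to probabilities that sum to 1.
// The largest score is subtracted first for numerical stability.
func softmax(raw []float64) []float64 {
	out := make([]float64, len(raw))
	if len(raw) == 0 {
		return out
	}
	m := raw[0]
	for _, v := range raw[1:] {
		if v > m {
			m = v
		}
	}
	sum := 0.0
	for i, v := range raw {
		out[i] = math.Exp(v - m)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	fmt.Println(sigmoid(0))               // 0.5
	fmt.Println(softmax([]float64{1, 1})) // [0.5 0.5]
}
```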
The source code of the DART class in LightGBM suggests that nothing needs to be done in leaves and that this class of models is already supported.
At least we should add a test for this case.
Something has changed in the XGBoost model's binary format. The highest version I've managed to make leaves work with is 1.0. Starting from 1.1+ I keep getting "panic: unexpected EOF". Is support for newer versions planned?
Moreover, they've started to save models in JSON format and it looks like they're going to deprecate binaries altogether.
Just a general question:
How complicated would it be to provide the model training part as well?
Are there any plans for it?