mgbdt's Introduction

Multi-Layered Gradient Boosting Decision Trees

This is the official clone for the implementation of mGBDT.

Package Official Website: http://lamda.nju.edu.cn/code_mGBDT.ashx

This package is provided "AS IS" and is free for academic usage. Use it at your own risk. For other purposes, please contact Prof. Zhi-Hua Zhou ([email protected]).

Description: A Python implementation of mGBDT as proposed in [1], including the mGBDT library and demo client scripts that demonstrate how to use the code. The implementation is flexible enough to modify the model or fit your own datasets.

Reference: [1] J. Feng, Y. Yu, and Z.-H. Zhou. Multi-Layered Gradient Boosting Decision Trees. In: Advances in Neural Information Processing Systems 31 (NIPS'18), Montreal, Canada, 2018.

ATTN: This package was developed and is maintained by Mr. Ji Feng (http://lamda.nju.edu.cn/fengj/). For any problem concerning the code, please feel free to contact Mr. Feng ([email protected]) or open an issue here.

Environments

  • The code was developed under Python 3.5, so first create a Python 3.5 environment using Anaconda:
conda create -n mgbdt python=3.5 anaconda
  • Install the dependent packages:
source activate mgbdt
conda install pytorch=0.1.12 cuda80 -c pytorch
pip install -r requirements.txt
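
As a quick sanity check (this snippet is not part of the repository), you can verify that the pinned dependencies import correctly from inside the activated environment:

import torch
import xgboost

# the demo below assumes the pinned versions; expect torch 0.1.12 here
print("torch:", torch.__version__)
print("xgboost:", xgboost.__version__)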

Demo Code

from sklearn import datasets
from sklearn.model_selection import train_test_split

# To use the mgbdt library, you have to add the library directory to your Python path.
# If you are in this repository's root directory, you can do it with the following lines
import sys
sys.path.insert(0, "lib")

from mgbdt import MGBDT, MultiXGBModel

# make a synthetic circle dataset using sklearn
n_samples = 15000
x_all, y_all = datasets.make_circles(n_samples=n_samples, factor=.5, noise=.04, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x_all, y_all, test_size=0.3, random_state=0, stratify=y_all)

# Create a multi-layered GBDT
net = MGBDT(loss="CrossEntropyLoss", target_lr=1.0, epsilon=0.1)

# add several target-propagation layers
# F and G represent the forward mapping and the inverse mapping (in the paper, gradient boosted decision trees are used for both)
net.add_layer("tp_layer",
    F=MultiXGBModel(input_size=2, output_size=5, learning_rate=0.1, max_depth=5, num_boost_round=5),
    G=None)
net.add_layer("tp_layer",
    F=MultiXGBModel(input_size=5, output_size=3, learning_rate=0.1, max_depth=5, num_boost_round=5),
    G=MultiXGBModel(input_size=3, output_size=5, learning_rate=0.1, max_depth=5, num_boost_round=5))
net.add_layer("tp_layer",
    F=MultiXGBModel(input_size=3, output_size=2, learning_rate=0.1, max_depth=5, num_boost_round=5),
    G=MultiXGBModel(input_size=2, output_size=3, learning_rate=0.1, max_depth=5, num_boost_round=5))

# init the forward mapping
net.init(x_train, n_rounds=5)

# fit the dataset
net.fit(x_train, y_train, n_epochs=50, eval_sets=[(x_test, y_test)], eval_metric="accuracy")

# prediction
y_pred = net.forward(x_test)

# get the hidden outputs
# hiddens[0] represents the input data
# hiddens[1] represents the output of the first layer
# hiddens[2] represents the output of the second layer
# hiddens[3] represents the output of the final layer (the same as y_pred)
hiddens = net.get_hiddens(x_test)
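
As a quick check of the demo (this evaluation is not part of the original script; it assumes net.forward returns one score per class, matching output_size=2 and the CrossEntropyLoss above), the test accuracy can be computed with scikit-learn:

import numpy as np
from sklearn.metrics import accuracy_score

# y_pred holds the final layer's output; with one column per class,
# the predicted label is the argmax over columns (an assumption)
y_pred_label = np.argmax(y_pred, axis=1)
print("test accuracy:", accuracy_score(y_test, y_pred_label))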

Experiments

circle dataset

By running the following script:

  • It will train a multi-layered GBDT with structure (input - 5 - 3 - output) on the synthetic circle dataset
  • The visualization of the input (which is 2D) will be saved in outputs/circle/input.jpg (as shown below)
  • The visualization of the second layer's output (which is 3D) will be saved in outputs/circle/pred2.jpg (as shown below)
python exp/circle.py
(Figures: Input / Transformed)

scurve dataset

By running the following script:

  • It will train an autoencoder using a multi-layered GBDT with structure (input - 5 - output) on the synthetic scurve dataset
  • The visualization of the input (which is 3D) will be saved in outputs/scurve/input.jpg (as shown below)
  • The visualization of the reconstructed result (which is 3D) will be saved in outputs/scurve/pred2.jpg (as shown below)
python exp/scurve.py
(Figures: Input / Reconstructed)
  • The visualization of the encodings will also be saved; since the 5D encodings are impossible to visualize directly, we visualize every pair of the 5 dimensions in 2D space (a plotting sketch follows the figures below)
  • The visualization of the $i'th and $j'th dimensions will be saved in outputs/scurve/pred1.$i_$j.jpg (as shown below)
(Figures: dimensions 1 and 2, dimensions 1 and 5)
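
For reference, these pairwise plots could be produced with a short matplotlib sketch along the following lines (this is not the repository's plotting code; it assumes a fitted autoencoder net and test inputs x_test as in exp/scurve.py, and that hiddens[1] holds the 5D encoding):

import itertools
import matplotlib.pyplot as plt

encoding = net.get_hiddens(x_test)[1]  # assumed: the 5D encoding layer
for i, j in itertools.combinations(range(encoding.shape[1]), 2):
    plt.figure()
    plt.scatter(encoding[:, i], encoding[:, j], s=2)
    plt.xlabel("dimension %d" % (i + 1))
    plt.ylabel("dimension %d" % (j + 1))
    plt.savefig("outputs/scurve/pred1.%d_%d.jpg" % (i + 1, j + 1))
    plt.close()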

uci_adult dataset

At first, you need to download the dataset by running the following commands:

cd dataset/uci_adult
sh get_data.sh

Then, by running the following script:

  • It will train a multi-layered GBDT with structure (input - 128 - 128 - output)
  • The accuracy will be logged for each epoch
python exp/uci_adult.py

uci_yeast dataset

By running the following script:

  • It will train a multi-layered GBDT with structure (input - 16 - 16 - output)
  • 10-fold cross-validation is used
  • The accuracy will be logged for each epoch and each fold
python exp/uci_yeast.py

Happy hacking.


mgbdt's Issues

Performance of your model on regression tasks

Description

@kingfengji Thanks for making the code available. I believe that multi-layered gradient boosting decision trees are a very elegant and powerful approach! I was applying your model to the Boston housing dataset but wasn't able to outperform a baseline xgboost model.

Details

To compare your approach to several alternatives, I ran a small benchmark study using the following approaches, where all models share the same hyper-parameters:

  • baseline xgboost model (xgboost)
  • mGBDT with xgboost for hidden and output layer (mGBDT_XGBoost)
  • mGBDT with xgboost for hidden but with linear model for output layer (mGBDT_Linear)
  • linear model as implemented here (Linear)

I am using PyTorch's L1Loss for model training and the MAE for evaluation; all models are trained in serial mode. The results are as follows:

(Figure: benchmark results)

In particular, I observe the following:

  • irrespective of the hyper-parameters and the number of epochs, a baseline xgboost model tends to outperform your approach
  • with an increasing number of epochs, the runtime per epoch increases considerably. Any idea as to why this happens?
  • using mGBDT_Linear,
    • I wasn't able to use PyTorch's MSELoss since the loss exploded after some iterations, even after normalizing X. Should we, similar to neural networks, also scale y to avoid exploding gradients? (see the sketch after this list)
    • the training loss starts at exceptionally high values, then decreases before it starts to increase again
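
For concreteness, the target scaling asked about above could look like the following sketch (it uses scikit-learn's StandardScaler and the net.forward API from the README demo; none of this is code from the mGBDT repository):

from sklearn.preprocessing import StandardScaler

# standardize y to zero mean / unit variance before training,
# mirroring the common practice for neural networks (an assumption)
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
# ... train on y_train_scaled, then map the predictions back:
y_pred = y_scaler.inverse_transform(net.forward(x_test).reshape(-1, 1)).ravel()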

Additional Questions

  • Given that you have mostly been using your approach for classification tasks, is there anything we need to change before using it for regression tasks, apart from the PyTorch loss?
  • Besides the loss of F, can we also track how well the target propagation is working by evaluating the reconstruction loss of G?
  • When using mGBDT with a linear output layer, would we expect to generally see better results compared to using xgboost for the output layer?
  • What is the benefit of using a linear output layer compared to an xgboost layer?
  • For training F and G, you are currently using the MSELoss for the xgboost models. Do you have some experience with modifying this loss?
  • What is the effect of the number of iterations for initializing the model before training?
  • What is the relationship between the number of boosting iterations (for xgboost training) and the number of epochs (for MGBDT training)?
  • In Section 4 of your paper you state: "The experiments for this section is mainly designed to empirically examine if it is feasible to jointly train the multi-layered structure proposed by this work. That is, we make no claims that the current structure can outperform CNNs in computer vision tasks." So, as a question: would that mean that your intention is not to outperform existing deep learning based models, say CNNs, or existing GBM models, like XGBoost, but rather to show that a decision tree based model can also be used for learning meaningful representations that can then be used for downstream tasks?
  • Connected to the previous question: gradient boosting models are already very strong learners that obtain very good results in many applications. So what would be your motivation for using multiple layers of such a model? Might it even happen that, given the implicit error-correction mechanism of GBMs, training several of them leads to a drop in accuracy?

Code

To reproduce the results, you can use the attached notebook.

ModelComparison.zip

@kingfengji I would highly appreciate your feedback. Many thanks.

Problem with Pop

/opt/conda/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257
    258     def __len__(self):

/kaggle/working/mGBDT/lib/mgbdt/model/online_xgb.py in fit_increment(self, X, y, num_boost_round, params)
     13         for k, v in extra_params.items():
     14             params[k] = v
---> 15         params.pop("n_estimators")
     16
     17         if callable(self.objective):

KeyError: 'n_estimators'
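
A likely workaround (a sketch, not an official patch) is to make the pop call tolerant of a missing key:

# in lib/mgbdt/model/online_xgb.py, fit_increment():
# drop "n_estimators" only if it is present, so a missing key no longer raises
params.pop("n_estimators", None)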

Environment

I feel that code written in Python 3.5 would likely be compatible with other Python 3 versions; are you sure that a build under Python 3.5 is necessary?

Can not find the uci dataset

Hi,
I want to run the uci_yeast and uci_adult demos, but I can't find the get_data.sh files that the README mentions. Would you please upload them, or tell me the data format so I can handle it myself?
I also find that the code uses a features file, but it is not in the repository either.
