TreeGrad

TreeGrad implements a naive approach to converting a gradient boosted tree model into an online trainable model. It does this by creating differentiable tree models that can be trained with automatic differentiation frameworks. TreeGrad is in essence an implementation of Kontschieder, Peter, et al. "Deep neural decision forests." with extensions.

To install from source:

python setup.py install

or alternatively from PyPI:

pip install treegrad

Run tests:

python -m nose2

@inproceedings{siu2019transferring,
  title={Transferring Tree Ensembles to Neural Networks},
  author={Siu, Chapman},
  booktitle={International Conference on Neural Information Processing},
  pages={471--480},
  year={2019},
  organization={Springer}
}

Link: https://arxiv.org/abs/1904.11132

Usage

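import treegrad as tgd

# X, y: training features and class labels, e.g. loaded with scikit-learn
# (assumed to be defined before this point).
mod = tgd.TGDClassifier(num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, autograd_config={'refit_splits': False})
mod.fit(X, y)          # initial fit
mod.partial_fit(X, y)  # incremental (online) update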

Requirements

The requirements for this package are:

  • lightgbm
  • scikit-learn
  • autograd

Future plans:

  • Add an implementation of neural architecture search for decision boundary splits (requires some clean-up, TBA)
    • This can be implemented quite easily using the objects in tree_utils.py; the challenge is getting it to work sanely with the scikit-learn interface.
  • GPU-enabled automatic differentiation framework; see notebooks/ for progress on a TensorFlow 2.0 port (run on Colab)
  • Support additional xgboost/lightgbm features such as monotone constraints
  • Support RegressorMixin

Results

When decision splits are reset and subsequently re-learned, TreeGrad can be competitive in accuracy with popular implementations, albeit an order of magnitude slower. The table below shows test-set accuracy on UCI benchmark datasets for boosted ensemble models (100 trees).

Dataset    TreeGrad   LightGBM   Scikit-Learn (Gradient Boosting Classifier)
adult      0.860      0.873      0.874
covtype    0.832      0.835      0.826
dna        0.950      0.949      0.946
glass      0.766      0.813      0.719
madelon    0.882      0.881      0.866
soybean    0.936      0.936      0.917
yeast      0.591      0.573      0.542

Implementation

To understand the implementation of TreeGrad, we interpret a decision tree as a three-layer neural network, with the following layers:

  1. Node layer, which determines the decision boundaries
  2. Routing layer, which determines which nodes are used to route to the final leaf nodes
  3. Leaf layer, the layer which determines the final predictions

In the node layer, the decision boundaries are axis-parallel, so each one can be read as a typical linear classifier; taken together they form a fully connected dense layer.

The routing layer uses a binary routing matrix that marks which node decisions lie on the path to each leaf; the decisions along each path are then combined by a product.

The leaf layer is a typical fully connected dense layer.

This approach is the same as the one taken by Kontschieder, Peter, et al. "Deep neural decision forests."
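
The three layers described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not TreeGrad's actual code: the function name `soft_tree_forward`, the sigmoid relaxation of each split, the `sharpness` constant, and the small depth-2 tree are all hypothetical and chosen only to show how a node layer, a binary routing matrix, and a leaf layer fit together.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_forward(X, W, b, R, leaf_values):
    """Forward pass of one soft decision tree written as three layers."""
    # 1. Node layer: a dense layer; d[:, n] is the probability of going
    #    right at internal node n.
    d = sigmoid(X @ W + b)
    # 2. Routing layer: stack "go right" and "go left" probabilities, then
    #    multiply the entries that the binary matrix R marks as lying on
    #    the path to each leaf.
    decisions = np.concatenate([d, 1.0 - d], axis=1)
    on_path = np.where(R[None, :, :] == 1, decisions[:, :, None], 1.0)
    mu = on_path.prod(axis=1)  # shape (n_samples, n_leaves)
    # 3. Leaf layer: a dense layer that mixes leaf values by routing probability.
    return mu @ leaf_values

# Hypothetical depth-2 tree: node 0 splits feature 0 at 0.5, node 1 (left
# child) splits feature 1 at 0.3, node 2 (right child) splits feature 1 at 0.7.
sharpness = 50.0  # assumed constant; larger values approach hard splits
thresholds = np.array([0.5, 0.3, 0.7])
W = np.zeros((2, 3))  # axis-parallel: one non-zero weight per node
W[0, 0] = W[1, 1] = W[1, 2] = sharpness
b = -sharpness * thresholds

# Rows: right@0, right@1, right@2, left@0, left@1, left@2. Columns: leaves 0..3.
R = np.array([
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
])
leaf_values = np.array([0.0, 1.0, 2.0, 3.0])

X = np.array([[0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9]])
preds = soft_tree_forward(X, W, b, R, leaf_values)
```

With a sharp sigmoid, each sample is routed almost entirely to a single leaf, so `preds` stays close to the hard tree's leaf values. Because every step is differentiable, `W`, `b`, and `leaf_values` could in principle be updated by gradient descent in an automatic differentiation framework.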

Contributors

  • 8bit-pixies

treegrad's Issues

Monotonicity

Awesome package, really looking forward to this function as it will allow me to use it in my packages.

lightgbm additional features such as monotone constraints

Enable GPU acceleration

Candidate approaches:

We might just wait for jax to be pip-installable and to support Windows first, or add a wrapper that imports jax where available and otherwise falls back to autograd.
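
One possible shape for such a wrapper is sketched below. The helper name `load_grad_backend` is hypothetical and not part of TreeGrad; the sketch assumes only that both `jax` and `autograd` expose a top-level `grad` function, and it tries the candidates in order.

```python
import importlib

def load_grad_backend(candidates=("jax", "autograd")):
    """Return (name, grad_function) for the first backend that can be imported."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # backend not installed; try the next candidate
        return name, module.grad
    raise ImportError("no autodiff backend found among: " + ", ".join(candidates))
```

A caller would use it as `name, grad = load_grad_backend()`. The array module would need the same treatment (`jax.numpy` versus `autograd.numpy`), which this sketch leaves out.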
