
gentun: genetic algorithm for hyperparameter tuning

The purpose of this project is to provide a simple framework for hyperparameter tuning of machine learning models, such as neural networks and gradient boosted trees, using a genetic algorithm. Measuring the fitness of an individual in a given population means training the machine learning model with the particular set of hyperparameters that define the individual's genes. Since this is time consuming, a master-workers approach lets several clients (workers) perform the model fitting and cross-validation of individuals passed to them by a server (master). Offspring generation by reproduction and mutation is handled by the server.

"Parameter tuning is a dark art in machine learning, the optimal parameters of a model can depend on many scenarios." ~ XGBoost's Notes on Parameter Tuning
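The loop described above — evaluate fitness, select parents, produce offspring by crossover and mutation — can be sketched in plain Python. This is an illustrative toy, not gentun's implementation: the fitness function here is a cheap stand-in for the cross-validated model score the library actually computes, and the gene names are only examples.

```python
import random

random.seed(0)

ETAS = [0.01, 0.05, 0.1, 0.3]

def fitness(genes):
    """Toy stand-in for a cross-validated score (higher is better)."""
    return -abs(genes['max_depth'] - 9) - 10 * abs(genes['eta'] - 0.1)

def random_individual():
    return {'max_depth': random.randint(1, 15), 'eta': random.choice(ETAS)}

def crossover(mother, father):
    # Each gene is inherited from one parent at random
    return {k: random.choice([mother[k], father[k]]) for k in mother}

def mutate(genes, rate=0.1):
    # Occasionally re-draw a gene to keep the population diverse
    return {k: random_individual()[k] if random.random() < rate else v
            for k, v in genes.items()}

population = [random_individual() for _ in range(20)]
history = []
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))       # best score so far
    parents = population[:10]                    # elitist selection
    children = [mutate(crossover(*random.sample(parents, 2)))
                for _ in range(10)]
    population = parents + children              # next generation

best = max(population, key=fitness)
```

Because the top half of each generation survives unchanged, the best score never decreases across generations; mutation keeps the search from stalling in a local optimum.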

Supported models (work in progress)

  • XGBoost regressor
  • XGBoost classifier
  • Scikit-learn Multilayer Perceptron Regressor
  • Scikit-learn Multilayer Perceptron Classifier
  • Keras

Sample usage

Single machine

The genetic algorithm can be run on a single box, as shown in the following example:

import pandas as pd
from gentun import GeneticAlgorithm, Population, XgboostIndividual
# Load features and response variable from train set
data = pd.read_csv('../tests/wine-quality/winequality-white.csv', delimiter=';')
y_train = data['quality']
x_train = data.drop(['quality'], axis=1)
# Generate a random population
pop = Population(XgboostIndividual, x_train, y_train, size=100, additional_parameters={'nfold': 3})
# Run the algorithm for ten generations
ga = GeneticAlgorithm(pop)
ga.run(10)

You can also add custom individuals to the population before running the genetic algorithm if you already have an intuition of which hyperparameters work well with your model. Moreover, a whole set of individuals taken from a grid search could be used as the initial population. For example, to add a customized individual:

# Best known parameters so far
custom_genes = {
    'eta': 0.1, 'min_child_weight': 1, 'max_depth': 9,
    'gamma': 0.0, 'max_delta_step': 0, 'subsample': 1.0,
    'colsample_bytree': 0.9, 'colsample_bylevel': 1.0,
    'lambda': 1.0, 'alpha': 0.0, 'scale_pos_weight': 1.0
}
# Generate a random population and add a custom individual
pop = Population(XgboostIndividual, x_train, y_train, size=99, additional_parameters={'nfold': 3})
pop.add_individual(XgboostIndividual(x_train, y_train, genes=custom_genes, nfold=3))
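To seed the population from a grid search instead, the cross product of a small parameter grid can be enumerated with itertools.product and each combination turned into a genes dictionary. The grid values below are illustrative; in practice each dictionary would need every hyperparameter gentun expects (as in custom_genes above), and the commented seeding loop assumes the pop, x_train, and y_train objects from the earlier snippet.

```python
from itertools import product

# Illustrative grid over two hyperparameters; remaining genes stay fixed
grid = {'max_depth': [3, 6, 9], 'eta': [0.05, 0.1, 0.3]}
fixed_genes = {'min_child_weight': 1, 'subsample': 1.0}

# One genes dict per grid point (3 x 3 = 9 combinations)
grid_genes = [dict(fixed_genes, **dict(zip(grid, combo)))
              for combo in product(*grid.values())]

# Each dict could then seed an individual:
# for genes in grid_genes:
#     pop.add_individual(XgboostIndividual(x_train, y_train, genes=genes, nfold=3))
```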

Multiple boxes

You can speed up the algorithm by using several machines. One of them acts as the master, generating a population and running the genetic algorithm. Each time the master needs to evaluate an individual, it sends a request to a pool of workers; each worker receives the individual's hyperparameters and fits the model using n-fold cross-validation. The more workers you use, the faster the algorithm runs.

First, you need to set up a RabbitMQ message broker server. It handles communication between the master and all the workers via a queueing system.

$ sudo apt-get install rabbitmq-server

Start the message server and add a user with privileges so the master and worker nodes can communicate. The default guest user can only access RabbitMQ locally, so the first time you start the server, add a new user and set its privileges as shown below:

$ sudo service rabbitmq-server start
$ sudo rabbitmqctl add_user <username> <password>
$ sudo rabbitmqctl set_user_tags <username> administrator
$ sudo rabbitmqctl set_permissions -p / <username> ".*" ".*" ".*"

Next, start the worker nodes. Each worker needs access to the training data. You can use as many nodes as you want, as long as they have network access to the message broker server.

from gentun import GentunWorker, XgboostModel
import pandas as pd

data = pd.read_csv('../tests/wine-quality/winequality-white.csv', delimiter=';')
y = data['quality']
x = data.drop(['quality'], axis=1)

gw = GentunWorker(
    XgboostModel, x, y, host='<rabbitmq_server_ip>',
    user='<username>', password='<password>'
)
gw.work()

Finally, run the genetic algorithm, this time with a DistributedPopulation, which acts as the master node and sends a job request to the workers each time an individual needs to be evaluated.

from gentun import GeneticAlgorithm, DistributedPopulation, XgboostIndividual

population = DistributedPopulation(
    XgboostIndividual, size=100, additional_parameters={'nfold': 3},
    host='<rabbitmq_server_ip>', user='<username>', password='<password>'
)
# Run the algorithm for ten generations using worker nodes to evaluate individuals
ga = GeneticAlgorithm(population)
ga.run(10)

References

  • Genetic algorithms
  • XGBoost parameter tuning
  • Master-Workers model and RabbitMQ
