pc_smac

SMAC for automatic algorithm selection and hyperparameter optimization of scikit-learn machine learning pipelines. Inspired by and based on Auto-sklearn. This version incorporates pipeline caching to speed up optimization.

Installation

Use Python 3.6.

We recommend using an Anaconda environment, because some commands on this page only work inside one. First download and install Anaconda or Miniconda, then create an environment called pcsmac with the following command:

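conda create -n pcsmac python=3.6

Activate the environment before running the remaining commands (on older conda versions, use source activate pcsmac instead):

conda activate pcsmac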
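Since the installation assumes Python 3.6, it can be worth confirming the interpreter version inside the pcsmac environment before going further. This is a minimal sketch; the helper name is_supported_python is hypothetical and not part of pc_smac.

```python
import sys

# Python version this README asks for, as (major, minor).
REQUIRED = (3, 6)


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) part of version_info equals REQUIRED.

    Hypothetical helper for illustration; not part of pc_smac.
    """
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    found = "%d.%d" % tuple(sys.version_info[:2])
    if is_supported_python():
        print("OK: running Python " + found)
    else:
        print("Warning: pc_smac expects Python 3.6, found " + found)
```

Running the script with the environment's python should print the OK line if the environment was created as above.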

Unix

Install SWIG, which is needed for the C++ random forest, and then install SMAC3 with pipeline caching integration from the command line:

conda install swig
git clone https://github.com/jtuyls/SMAC3_4_PC.git
cd SMAC3_4_PC
cat requirements.txt | xargs -n 1 -L 1 pip install
python setup.py install

Finally, clone the pc_smac repository and install its requirements from inside the cloned directory:

pip install -r requirements.txt

Mac OS X

On Mac OS X, installing SMAC3 as described above runs into a problem: SMAC depends on pyrfr, which is compiled with gcc for C++, while Mac OS X uses the clang compiler. Therefore we install a gcc compiler within Anaconda. As on Unix, SWIG is also needed for the C++ random forest.

To install SMAC3 with pipeline caching integration from the command line in the Anaconda environment, first install pyrfr separately with the Anaconda gcc compiler (replace [username] with your user name):

conda install swig
conda install gcc
CC=/Users/[username]/anaconda/bin/gcc pip install pyrfr --no-cache-dir
git clone https://github.com/jtuyls/SMAC3_4_PC.git
cd SMAC3_4_PC
cat requirements.txt | xargs -n 1 -L 1 pip install
python setup.py install

Finally, clone the pc_smac repository and install its requirements from inside the cloned directory:

pip install -r requirements.txt
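On either platform, a quick import check can confirm that the compiled dependencies ended up in the active environment. This is a minimal sketch under stated assumptions: it assumes the import names are smac (SMAC3_4_PC being a fork of SMAC3) and pyrfr, which this README does not state, and the helper missing_modules is hypothetical.

```python
import importlib.util

# ASSUMPTION: import names provided by the installation steps above.
# "smac" for the SMAC3_4_PC fork, "pyrfr" for the random forest package.
EXPECTED_MODULES = ("smac", "pyrfr")


def missing_modules(names=EXPECTED_MODULES):
    """Return the top-level module names that cannot be found on sys.path.

    Uses importlib.util.find_spec, which locates a module without importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable: " + ", ".join(missing))
    else:
        print("All expected modules found.")
```

If a module is reported missing, rerunning the corresponding installation step inside the activated pcsmac environment is the first thing to try.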

Optimization of machine learning pipelines

First change to the parent directory of pc_smac. The optimization can then be started with the following generic command:

python -m pc_smac.run -a=[acquisition function: STRING: e.g. pc-m-pceips]
                      -di=[double intensification enabled: INT: 1 (yes) or 0 (no)]
                      -w=[wallclock time in seconds: INT: e.g. 1800]
                      -l=[location of the dataset: STRING: DEFAULT: data/46_bac is used as an example if not provided]
                      -m=[OPTIONAL: memory limit in MB: INT: DEFAULT=6000]
                      -r=[OPTIONAL: maximum number of runs: INT: DEFAULT=10000]
                      -c=[OPTIONAL: cutoff for each evaluation in seconds: INT: DEFAULT=1800]
                      -s=[OPTIONAL: stamp used for identifying the directory and names of the results: STRING: DEFAULT=stamp]
                      -o=[OPTIONAL: output directory: STRING: DEFAULT=pc_smac/output]
                      -cd=[OPTIONAL: cache directory: STRING: DEFAULT=pc_smac/cache]
                      -ps=[OPTIONAL: pipeline space used: STRING: DEFAULT=None; example of a small space: nystroem_sampler-sgd]
                      -ifs=[OPTIONAL: intensification fold size: INT: DEFAULT=10]
                      -sn=[OPTIONAL: splitting number, only for the sigmoid and mrs random search versions: INT: DEFAULT=5]
                      -rs=[OPTIONAL: random splitting enabled, only for the mrs random search version: INT: DEFAULT=0 (no)]
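When launching several runs from a script, it can help to assemble this command programmatically. The sketch below is a hypothetical helper (build_pc_smac_command is not part of pc_smac) that maps descriptive keyword arguments onto the flags listed above, using the same -flag=value form, and refuses to build a command when -a, -di or -w (the flags with no documented default) are missing.

```python
import shlex
import sys

# (option name, flag) pairs in the order the README lists them.
# The option names are hypothetical; only the flags come from the README.
FLAGS = [
    ("acquisition", "-a"),
    ("double_intensification", "-di"),
    ("wallclock", "-w"),
    ("location", "-l"),
    ("memory_limit", "-m"),
    ("runs", "-r"),
    ("cutoff", "-c"),
    ("stamp", "-s"),
    ("output_dir", "-o"),
    ("cache_dir", "-cd"),
    ("pipeline_space", "-ps"),
    ("intensification_fold_size", "-ifs"),
    ("splitting_number", "-sn"),
    ("random_splitting", "-rs"),
]
KNOWN = {name for name, _ in FLAGS}

# Flags for which the README documents no default value.
REQUIRED = ("acquisition", "double_intensification", "wallclock")


def build_pc_smac_command(python=sys.executable, **options):
    """Return an argv list for `python -m pc_smac.run` (hypothetical helper)."""
    missing = [name for name in REQUIRED if name not in options]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    unknown = sorted(set(options) - KNOWN)
    if unknown:
        raise ValueError("unknown options: " + ", ".join(unknown))
    argv = [python, "-m", "pc_smac.run"]
    # Emit flags in the README's order, in the README's -flag=value form.
    for name, flag in FLAGS:
        if name in options:
            argv.append("%s=%s" % (flag, options[name]))
    return argv


if __name__ == "__main__":
    cmd = build_pc_smac_command(
        acquisition="pc-m-pceips", double_intensification=0, wallclock=1800,
        runs=100, cutoff=100, memory_limit=4000, stamp="pc-m-pceips",
    )
    # Only prints the command; it does not start an optimization.
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually start a run, the list can be passed to subprocess.run(cmd, check=True) from the parent directory of pc_smac. The helper emits flags in the order of the list above rather than the order of the example command below; this should not matter if pc_smac.run parses its options with a standard argument parser, which is an assumption.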

Example

Execute the following command to test optimization with pipeline caching (and marginalization) on OpenML dataset 46 for up to 1800 seconds or 100 evaluations, with a cutoff of 100 seconds per evaluation and a memory limit of 4000 MB:

python -m pc_smac.run -a=pc-m-pceips -di=0 -w=1800 -r=100 -c=100 -m=4000 -s=pc-m-pceips

Don't forget to clean the caches afterwards (by default in pc_smac/cache), because they can take up a lot of disk space!
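A minimal cleanup sketch, assuming the cache lives in the default pc_smac/cache directory (a relative path, so run it from the parent directory of pc_smac) or in whatever directory was passed with -cd. It reports how much space the cache occupies and deletes it only when dry_run is False; the function names are hypothetical and not part of pc_smac.

```python
import os
import shutil


def directory_size(path):
    """Return the total size in bytes of the regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def clean_cache(cache_dir="pc_smac/cache", dry_run=True):
    """Report and, unless dry_run is True, delete the cache directory.

    Returns the number of bytes the cache occupied (0 if it does not exist).
    """
    if not os.path.isdir(cache_dir):
        return 0
    size = directory_size(cache_dir)
    print("%s: %.1f MB" % (cache_dir, size / 1e6))
    if not dry_run:
        shutil.rmtree(cache_dir)
    return size


if __name__ == "__main__":
    # Dry run by default: only reports the size of the default cache.
    clean_cache()
```

Defaulting to a dry run keeps an accidental call from deleting a cache that a still-running optimization may be using.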

Contributors

jtuyls