auto-sklearn's Introduction

auto-sklearn

auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

Find the documentation here

Automated Machine Learning in four lines of code

import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)

Relevant publications

Efficient and Robust Automated Machine Learning
Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum and Frank Hutter
Advances in Neural Information Processing Systems 28 (2015)
http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf

Auto-Sklearn 2.0: The Next Generation
Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer and Frank Hutter
arXiv:2007.04074 [cs.LG], 2020
https://arxiv.org/abs/2007.04074

auto-sklearn's People

Contributors

aaronkl, ahn1340, anatolfernandez, axsapronov, borda, caoyi0905, charlesfu4, engelen, felixleungsc, franchuterivera, g329, gui-miotto, herilalaina, hmendozap, iver56, jaidevd, kakawhq, keggensperger, lgro, mabryj2, mblum, mfeurer, mlindauer, motorrat, rabsr, rcalsaverini, stokasto, timothyjlaurent, tmielika, vicentealencar

auto-sklearn's Issues

Accuracy of autosklearn can be improved with Intel® Extension for Scikit-learn

What is this?

Intel® Extension for Scikit-learn provides drop-in patching functionality that offers a seamless way to speed up scikit-learn applications.

Our results

I used automlbenchmark on large datasets to compare the accuracy of autosklearn with and without patching.

datasetName  library                   acc       auc       balacc    logloss
Airlines     autosklearn w patching    0.667087  0.720931  0.654484  0.664864
Albert       autosklearn w patching    0.677265  0.738089  0.677265  0.642248
Covertype    autosklearn w patching    0.918092  -         0.835118  0.214061
Airlines     autosklearn w/o patching  0.654552  0.696719  0.663029  0.686289
Albert       autosklearn w/o patching  0.652432  0.706782  0.652432  0.691148
Covertype    autosklearn w/o patching  0.908678  -         0.829109  0.252917

The table below shows the difference between autosklearn with patching and without patching (with patching minus without patching).

datasetName  diff accuracy  diff auc  diff balacc  diff logloss
Airlines     0.012535       0.024212  -0.008545    -0.02132
Albert       0.024833       0.031307  0.024833     -0.0489
Covertype    0.009414       0         0.006009     -0.03886
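
The differences can be recomputed from the results table as a sanity check. The sketch below is a minimal illustration, not part of the benchmark: the dictionary names and the `metric_diffs` helper are hypothetical, and it assumes the three Covertype values are acc, balacc and logloss (no auc), which is consistent with the Covertype row of the difference table. Because the inputs are rounded to six decimals, the last digits may not match the table exactly.

```python
# Metrics copied from the results table (assumption: Covertype has no auc value).
with_patching = {
    "Airlines": {"acc": 0.667087, "auc": 0.720931, "balacc": 0.654484, "logloss": 0.664864},
    "Albert": {"acc": 0.677265, "auc": 0.738089, "balacc": 0.677265, "logloss": 0.642248},
    "Covertype": {"acc": 0.918092, "balacc": 0.835118, "logloss": 0.214061},
}
without_patching = {
    "Airlines": {"acc": 0.654552, "auc": 0.696719, "balacc": 0.663029, "logloss": 0.686289},
    "Albert": {"acc": 0.652432, "auc": 0.706782, "balacc": 0.652432, "logloss": 0.691148},
    "Covertype": {"acc": 0.908678, "balacc": 0.829109, "logloss": 0.252917},
}


def metric_diffs(patched, baseline):
    """Return (patched - baseline) for every metric present in both runs."""
    return {
        name: {
            metric: round(patched[name][metric] - baseline[name][metric], 6)
            for metric in patched[name]
            if metric in baseline[name]
        }
        for name in patched
    }


diffs = metric_diffs(with_patching, without_patching)
for name, row in diffs.items():
    print(name, row)
```

A positive difference in acc, auc or balacc and a negative difference in logloss both mean that the patched run scored better on that metric.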

Accuracy improved because the number of trained models increased. The full list of algorithms that can be accelerated with Intel® Extension for Scikit-learn can be found here.

datasetName                          Airlines  Albert  Covertype
total number of models w patching    154       180     118
total number of models w/o patching  130       142     110
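
To put the model counts in relative terms, the following sketch computes the absolute and percentage increase per dataset. The variable and function names are illustrative only; the counts are copied from the table above.

```python
# Model counts copied from the table above.
models_with = {"Airlines": 154, "Albert": 180, "Covertype": 118}
models_without = {"Airlines": 130, "Albert": 142, "Covertype": 110}


def relative_increase(with_patching, without_patching):
    """Return the percentage increase in trained models per dataset."""
    return {
        name: 100.0 * (with_patching[name] - without_patching[name]) / without_patching[name]
        for name in with_patching
    }


increase = relative_increase(models_with, models_without)
for name, pct in increase.items():
    extra = models_with[name] - models_without[name]
    print(f"{name}: {extra} more models ({pct:.1f}% increase)")
```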

How to reproduce our results

To add Intel® Extension for Scikit-learn to the benchmark, you just need to add two lines at the beginning of the autosklearn exec file:

from sklearnex import patch_sklearn
patch_sklearn()

and add scikit-learn-intelex to the requirements.
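
A slightly more defensive variant of these two lines is sketched below. It assumes that patching should happen before any scikit-learn estimators are imported, and it falls back to stock scikit-learn when `sklearnex` is not installed. The helper name `enable_sklearnex_if_available` is hypothetical and not part of either library.

```python
import importlib.util


def enable_sklearnex_if_available() -> bool:
    """Patch scikit-learn if sklearnex is installed; return True when patched."""
    if importlib.util.find_spec("sklearnex") is None:
        # Extension not installed: keep using stock scikit-learn.
        return False
    from sklearnex import patch_sklearn

    patch_sklearn()
    return True


patched = enable_sklearnex_if_available()
print("scikit-learn patched:", patched)

# Import autosklearn / scikit-learn estimators only after this point,
# so that they pick up the patched implementations.
```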

I also changed the constraints for a fairer comparison:

test:
  folds: 2
  max_runtime_seconds: 1800
  cores: 72

I also removed the following environment settings from the autosklearn exec file:

os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

All measurements were done on an AWS c5.18xlarge instance (Intel Xeon Platinum with 36 cores).

Some benefits of Intel® Extension for Scikit-learn

  • The library uses all capabilities of the hardware, which can give a significant performance boost for classic machine learning algorithms. Check their patching section and Medium articles for more details.

  • All optimizations can be easily integrated into a scikit-learn application by changing one line of code. Check their get started section for more details.

I also think that Intel® Extension for Scikit-learn can help solve these problems: automl#445, automl#923, automl#1153

What do you think?
