pivovara / auto-sklearn

This project forked from automl/auto-sklearn

Automated Machine Learning with scikit-learn

Home Page: https://automl.github.io/auto-sklearn

License: BSD 3-Clause "New" or "Revised" License

Python 99.60% Makefile 0.03% Shell 0.30% Dockerfile 0.07%

auto-sklearn's Introduction

auto-sklearn

auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

Find the documentation at https://automl.github.io/auto-sklearn

Automated Machine Learning in four lines of code

import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)

Relevant publications

Efficient and Robust Automated Machine Learning
Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum and Frank Hutter
Advances in Neural Information Processing Systems 28 (2015)
http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf

Auto-Sklearn 2.0: The Next Generation
Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer and Frank Hutter
arXiv:2007.04074 [cs.LG], 2020 https://arxiv.org/abs/2007.04074

auto-sklearn's Issues

Accuracy of autosklearn can be improved with Intel® Extension for Scikit-learn

What is this?

Intel® Extension for Scikit-learn provides drop-in patching that speeds up existing scikit-learn applications without requiring other code changes.

Our results

I used automlbenchmark on large datasets to compare the accuracy of autosklearn with and without patching.

datasetName	library	acc	auc	balacc	logloss
Airlines	autosklearn w patching	0.667087	0.720931	0.654484	0.664864
Albert	autosklearn w patching	0.677265	0.738089	0.677265	0.642248
Covertype	autosklearn w patching	0.918092		0.835118	0.214061
Airlines	autosklearn w/o patching	0.654552	0.696719	0.663029	0.686289
Albert	autosklearn w/o patching	0.652432	0.706782	0.652432	0.691148
Covertype	autosklearn w/o patching	0.908678		0.829109	0.252917

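The table below shows the difference between autosklearn with patching and without patching (with minus without). For logloss, lower is better, so a negative difference is an improvement.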

datasetName	diff accuracy	diff auc	diff balacc	diff logloss
Airlines	0.012535	0.024212	-0.008545	-0.021425
Albert	0.024833	0.031307	0.024833	-0.0489
Covertype	0.009414		0.006009	-0.03886
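
The differences can be recomputed directly from the results table. The following is a minimal sketch in plain Python, with the values copied from that table; the dictionary layout and the helper name `metric_diffs` are illustrative and not part of automlbenchmark. AUC is left out for Covertype because the results table does not report it.

```python
# Metric values copied from the results table above.
WITH_PATCHING = {
    "Airlines": {"acc": 0.667087, "auc": 0.720931, "balacc": 0.654484, "logloss": 0.664864},
    "Albert": {"acc": 0.677265, "auc": 0.738089, "balacc": 0.677265, "logloss": 0.642248},
    "Covertype": {"acc": 0.918092, "balacc": 0.835118, "logloss": 0.214061},
}
WITHOUT_PATCHING = {
    "Airlines": {"acc": 0.654552, "auc": 0.696719, "balacc": 0.663029, "logloss": 0.686289},
    "Albert": {"acc": 0.652432, "auc": 0.706782, "balacc": 0.652432, "logloss": 0.691148},
    "Covertype": {"acc": 0.908678, "balacc": 0.829109, "logloss": 0.252917},
}


def metric_diffs(patched, baseline):
    """Return patched minus baseline for every metric present in both runs."""
    return {
        dataset: {
            metric: round(value - baseline[dataset][metric], 6)
            for metric, value in metrics.items()
            if metric in baseline[dataset]
        }
        for dataset, metrics in patched.items()
    }


diffs = metric_diffs(WITH_PATCHING, WITHOUT_PATCHING)
for dataset, row in diffs.items():
    print(dataset, row)
```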

Accuracy improved because more models were trained within the same time limit. The full list of algorithms that can be accelerated with Intel® Extension for Scikit-learn can be found in the extension's documentation.

datasetName	Airlines	Albert	Covertype
total number of models w patching	154	180	118
total number of models w/o patching	130	142	110
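
To put these counts in perspective, the relative increase in trained models can be computed from the table. This is a small sketch in plain Python; the variable names are illustrative.

```python
# Model counts copied from the table above.
models_with = {"Airlines": 154, "Albert": 180, "Covertype": 118}
models_without = {"Airlines": 130, "Albert": 142, "Covertype": 110}

# Relative increase in the number of trained models, in percent.
increase_pct = {
    name: 100.0 * (models_with[name] - models_without[name]) / models_without[name]
    for name in models_with
}

for name, pct in increase_pct.items():
    print(f"{name}: +{pct:.1f}% models")
```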

How to reproduce our results

To add Intel® Extension for Scikit-learn to the benchmark, add two lines at the beginning of the autosklearn exec file:

from sklearnex import patch_sklearn
patch_sklearn()

and add scikit-learn-intelex to the requirements.
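
A sketch of how the patch could be applied defensively, so that the exec file still runs when the extension is not installed. The `patch_sklearn` call is the one shown above; the wrapper function `try_patch_sklearn` and its fallback behaviour are assumptions for illustration and are not part of the benchmark.

```python
def try_patch_sklearn():
    """Apply the Intel patch if scikit-learn-intelex is installed.

    Hypothetical helper: returns True when patching was applied and
    False when the sklearnex package is not available.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Extension not installed: keep using stock scikit-learn.
        return False
    patch_sklearn()
    return True


# Call this before importing any scikit-learn estimators (directly or via
# autosklearn), so that the patched implementations are the ones imported.
patched = try_patch_sklearn()
print("Intel patching enabled:", patched)
```

For a benchmark comparison it is worth logging the returned value, so that a run with a missing extension is not mistaken for a patched run.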

I also changed the constraints for a fairer comparison:

test:
  folds: 2
  max_runtime_seconds: 1800
  cores: 72

I also removed the following environment settings from the autosklearn exec file:

os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
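
If editing the exec file is not an option, an alternative is to make sure these variables are unset before the numerical libraries are loaded. The sketch below is an assumption about an equivalent approach, not what was done for the measurements reported here, and the helper name `clear_thread_limits` is hypothetical.

```python
import os

# The two variables that the exec file sets to '1'.
THREAD_LIMIT_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def clear_thread_limits(environ=os.environ):
    """Remove the thread-limit variables and return the names that were set."""
    removed = []
    for name in THREAD_LIMIT_VARS:
        if environ.pop(name, None) is not None:
            removed.append(name)
    return removed


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {"OPENBLAS_NUM_THREADS": "1", "MKL_NUM_THREADS": "1", "PATH": "/usr/bin"}
removed = clear_thread_limits(demo_env)
print(removed)
```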

All measurements were done on an AWS c5.18xlarge instance (Intel Xeon Platinum, 36 physical cores, 72 vCPUs).

Some benefits of Intel® Extension for Scikit-learn

The library uses all capabilities of the hardware, which gives a significant performance boost for classic machine learning algorithms. See the patching section of its documentation and the related Medium articles for more details.

All optimizations can be integrated into an existing scikit-learn application by adding two lines of code. See the get started section of its documentation for more details.

I also think that Intel® Extension for Scikit-learn can help to solve these problems: automl#445, automl#923, automl#1153

What do you think?
