
autosklearn-zeroconf is a fully automated binary classifier. It is based on the AutoML challenge winner auto-sklearn. Give it a dataset with known outcomes (labels) and it returns a list of predicted outcomes for your new data. It even estimates the precision for you! The engine tunes a massively parallel ensemble of machine learning pipelines for the best precision/recall.

License: BSD 3-Clause "New" or "Revised" License

machine-learning auto-sklearn binary-classification

autosklearn-zeroconf's Introduction

What is autosklearn-zeroconf

The autosklearn-zeroconf script takes a dataframe of any size and trains an auto-sklearn binary classifier ensemble. As the name suggests, no configuration is needed. Auto-sklearn is the recent AutoML Challenge winner (more @microsoft.com).

With autosklearn-zeroconf, running auto-sklearn becomes a "fire and forget" operation. This greatly increases its utility and decreases the turnaround time for experiments.

The main value proposition is that a data analyst or a data-savvy business user can quickly iterate on the data side (actual sources and feature design), while nothing has to be changed on the ML side. This makes it a great tool for people who don't do hardcore data science full time; currently, up to 90% of (marketing) data analysts may fall into this target group.

How Does It Work

To keep the training time reasonable, autosklearn-zeroconf samples the data and tests all the models from the auto-sklearn library on it once. The duration of this test run is used to calculate the per_run_time_limit, time_left_for_this_task and number-of-seeds parameters for auto-sklearn (a rough sketch of the idea follows). The code also converts the pandas dataframe into a form that auto-sklearn can handle (categorical and float datatypes).
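A minimal sketch of that idea, assuming scikit-learn only; the multipliers and the fallback cap are illustrative and are not the exact formula used by zeroconf.py:

import time
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Stand-in for the sampled training data
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Time a few representative base estimators once on the sample
durations = {}
for name, clf in [("random_forest", RandomForestClassifier(n_estimators=100)),
                  ("adaboost", AdaBoostClassifier()),
                  ("gaussian_nb", GaussianNB())]:
    start = time.time()
    clf.fit(X, y)
    durations[name] = time.time() - start

# Derive per-run and overall budgets from the slowest test fit,
# capped by a maximum (1200 s matches the max_classifier_time_budget
# default visible in the example log below).
max_classifier_time_budget = 1200
per_run_time_limit = min(max_classifier_time_budget, int(10 * max(durations.values())) + 10)
time_left_for_this_task = 10 * per_run_time_limit
print(durations, per_run_time_limit, time_left_for_this_task)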

Algorithms included

bernoulli_nb, extra_trees, gaussian_nb, adaboost, gradient_boosting, k_nearest_neighbors, lda, liblinear_svc, multinomial_nb, passive_aggressive, random_forest, sgd

plus samplers, scalers, and imputers (14 feature preprocessing methods and 3 data preprocessing methods, giving rise to a structured hypothesis space with 100+ hyperparameters). A hedged sketch of how this search space looks in auto-sklearn itself follows.
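For reference only: autosklearn-zeroconf configures auto-sklearn for you, so you never write this yourself. The parameter names below follow the auto-sklearn 0.x API that was current when this project was written, and the budgets and estimator lists are placeholders, not the script's defaults.

import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,   # overall search budget in seconds (placeholder)
    per_run_time_limit=60,         # budget per pipeline evaluation in seconds (placeholder)
    include_estimators=["random_forest", "extra_trees", "gradient_boosting"],
    include_preprocessors=["select_percentile_classification", "no_preprocessing"],
)
# automl.fit(X_train, y_train) would then search only within this subspace.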

Running autosklearn-zeroconf

To run autosklearn-zeroconf, start

python bin/zeroconf.py -d your_dataframe.h5
from the command line. The script was tested on Ubuntu and RedHat. It won't work on Windows because auto-sklearn doesn't support Windows.

Data Format

The code uses a pandas dataframe to manage the data. The dataframe is stored in an HDF5 (.h5) file for convenience (via the Python module "tables"). A sketch of building such a file is shown below.
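A hedged sketch of preparing an input file with pandas (requires the "tables" package). The id column "cust_id" and the label column "category" mirror the defaults visible in the example log below; the HDF5 key name is an assumption, so check the script's parameter file for the value it actually expects.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cust_id":   [1, 2, 3, 4, 5, 6],
    "age":       [25, 38, 52, 41, 29, 60],
    "workclass": ["Private", "Private", "Self-emp", "Gov", "Private", "Private"],
    "category":  [0, 1, 1, 0, np.nan, np.nan],  # NaN label = row to be predicted
})
df.to_hdf("your_dataframe.h5", key="data", mode="w")  # key "data" is hypothetical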

Example

As an example, you can run autosklearn-zeroconf on the "Census Income" dataset from https://archive.ics.uci.edu/ml/datasets/Adult:

python ./bin/zeroconf.py -d ./data/Adult.h5

And then, to evaluate the prediction stored in zeroconf-result.csv against the test dataset file adult.test.withid:

python ./bin/evaluate-dataset-Adult.py
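Conceptually, the evaluation joins the predictions back to the labeled test rows and reports standard metrics. A hedged sketch with scikit-learn; the real evaluate-dataset-Adult.py may use different column names and file layouts, so the columns assumed here are illustrative.

import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

pred = pd.read_csv("./data/zeroconf-result.csv")   # assumed columns: cust_id, category
truth = pd.read_csv("./data/adult.test.withid")    # assumed columns: cust_id, category
merged = truth.merge(pred, on="cust_id", suffixes=("_true", "_pred"))

y_true, y_pred = merged["category_true"], merged["category_pred"]
print("Accuracy:  {:.0%}".format(accuracy_score(y_true, y_pred)))
print("Precision: {:.0%}".format(precision_score(y_true, y_pred)))
print("Recall:    {:.0%}".format(recall_score(y_true, y_pred)))
print("F1:        {:.2f}".format(f1_score(y_true, y_pred)))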

Installation

The script itself needs no installation; just copy it, along with the rest of the files, into your working directory. Alternatively, you can use git clone:

sudo apt-get update && sudo apt-get install git && git clone https://github.com/paypal/autosklearn-zeroconf.git

Happy-path installation on Ubuntu 18.04 LTS

sudo apt-get update && sudo apt-get install git gcc build-essential swig python-pip virtualenv python3-dev
git clone https://github.com/paypal/autosklearn-zeroconf.git
pip install virtualenv
virtualenv zeroconf -p /usr/bin/python3.6
source zeroconf/bin/activate
curl https://raw.githubusercontent.com/paypal/autosklearn-zeroconf/master/requirements.txt | xargs -n 1 -L 1 pip install
cd autosklearn-zeroconf/ && python ./bin/zeroconf.py -d ./data/Adult.h5 2>/dev/null

License

autosklearn-zeroconf is licensed under the BSD 3-Clause License (Revised)

Example of the output

python zeroconf.py -d ./data/Adult.h5 2>/dev/null | grep -F '[ZEROCONF]'

2017-10-11 10:52:15,893 - [ZEROCONF] - zeroconf.py - INFO - Program Call Parameter (Arguments and Parameter File Values):
2017-10-11 10:52:15,893 - [ZEROCONF] - zeroconf.py - INFO -    basedir: /home/ulrich/PycharmProjects/autosklearn-zeroconf
2017-10-11 10:52:15,893 - [ZEROCONF] - zeroconf.py - INFO -    data_file: /home/ulrich/PycharmProjects/autosklearn-zeroconf/data/Adult.h5
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    id_field: cust_id
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    max_classifier_time_budget: 1200
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    max_sample_size: 100000
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    memory_limit: 15000
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    parameter_file: /home/ulrich/PycharmProjects/autosklearn-zeroconf/parameter/default.yml
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    proc: zeroconf.py
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    resultfile: /home/ulrich/PycharmProjects/autosklearn-zeroconf/data/zeroconf-result.csv
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    runid: 20171011105215
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    runtype: Fresh Run Start
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    target_field: category
2017-10-11 10:52:15,894 - [ZEROCONF] - zeroconf.py - INFO -    workdir: /home/ulrich/PycharmProjects/autosklearn-zeroconf/work/20171011105215
2017-10-11 10:52:15,944 - [ZEROCONF] - zeroconf.py - INFO - Read dataset from the store
2017-10-11 10:52:15,945 - [ZEROCONF] - zeroconf.py - INFO - Values of y [  0.   1.  nan]
2017-10-11 10:52:15,945 - [ZEROCONF] - zeroconf.py - INFO - We need to protect NAs in y from the prediction dataset so we convert them to -1
2017-10-11 10:52:15,946 - [ZEROCONF] - zeroconf.py - INFO - New values of y [ 0.  1. -1.]
2017-10-11 10:52:15,946 - [ZEROCONF] - zeroconf.py - INFO - Filling missing values in X with the most frequent values
2017-10-11 10:52:16,043 - [ZEROCONF] - zeroconf.py - INFO - Factorizing the X
2017-10-11 10:52:16,176 - [ZEROCONF] - x_y_dataframe_split - INFO - Dataframe split into X and y
2017-10-11 10:52:16,178 - [ZEROCONF] - zeroconf.py - INFO - Preparing a sample to measure approx classifier run time and select features
2017-10-11 10:52:16,191 - [ZEROCONF] - zeroconf.py - INFO - train size:21815
2017-10-11 10:52:16,191 - [ZEROCONF] - zeroconf.py - INFO - test size:10746
2017-10-11 10:52:16,192 - [ZEROCONF] - zeroconf.py - INFO - Reserved 33% of the training dataset for validation (upto 33k rows)
2017-10-11 10:52:16,209 - [ZEROCONF] - max_estimators_fit_duration - INFO - Constructing preprocessor pipeline and transforming sample data
2017-10-11 10:52:18,712 - [ZEROCONF] - max_estimators_fit_duration - INFO - Running estimators on the sample
2017-10-11 10:52:18,729 - [ZEROCONF] - zeroconf.py - INFO - adaboost starting
2017-10-11 10:52:18,734 - [ZEROCONF] - zeroconf.py - INFO - bernoulli_nb starting
2017-10-11 10:52:18,761 - [ZEROCONF] - zeroconf.py - INFO - extra_trees starting
2017-10-11 10:52:18,769 - [ZEROCONF] - zeroconf.py - INFO - decision_tree starting
2017-10-11 10:52:18,780 - [ZEROCONF] - zeroconf.py - INFO - gaussian_nb starting
2017-10-11 10:52:18,800 - [ZEROCONF] - zeroconf.py - INFO - bernoulli_nb training time: 0.06455278396606445
2017-10-11 10:52:18,802 - [ZEROCONF] - zeroconf.py - INFO - gradient_boosting starting
2017-10-11 10:52:18,808 - [ZEROCONF] - zeroconf.py - INFO - k_nearest_neighbors starting
2017-10-11 10:52:18,809 - [ZEROCONF] - zeroconf.py - INFO - decision_tree training time: 0.03273773193359375
2017-10-11 10:52:18,826 - [ZEROCONF] - zeroconf.py - INFO - lda starting
2017-10-11 10:52:18,845 - [ZEROCONF] - zeroconf.py - INFO - liblinear_svc starting
2017-10-11 10:52:18,867 - [ZEROCONF] - zeroconf.py - INFO - gaussian_nb training time: 0.08569979667663574
2017-10-11 10:52:18,882 - [ZEROCONF] - zeroconf.py - INFO - multinomial_nb starting
2017-10-11 10:52:18,905 - [ZEROCONF] - zeroconf.py - INFO - passive_aggressive starting
2017-10-11 10:52:18,943 - [ZEROCONF] - zeroconf.py - INFO - random_forest starting
2017-10-11 10:52:18,971 - [ZEROCONF] - zeroconf.py - INFO - sgd starting
2017-10-11 10:52:19,012 - [ZEROCONF] - zeroconf.py - INFO - lda training time: 0.17656564712524414
2017-10-11 10:52:19,023 - [ZEROCONF] - zeroconf.py - INFO - multinomial_nb training time: 0.13777780532836914
2017-10-11 10:52:19,124 - [ZEROCONF] - zeroconf.py - INFO - liblinear_svc training time: 0.27405595779418945
2017-10-11 10:52:19,416 - [ZEROCONF] - zeroconf.py - INFO - passive_aggressive training time: 0.508676290512085
2017-10-11 10:52:19,473 - [ZEROCONF] - zeroconf.py - INFO - sgd training time: 0.49777913093566895
2017-10-11 10:52:20,471 - [ZEROCONF] - zeroconf.py - INFO - adaboost training time: 1.7392246723175049
2017-10-11 10:52:20,625 - [ZEROCONF] - zeroconf.py - INFO - k_nearest_neighbors training time: 1.8141863346099854
2017-10-11 10:52:22,258 - [ZEROCONF] - zeroconf.py - INFO - extra_trees training time: 3.4934401512145996
2017-10-11 10:52:22,696 - [ZEROCONF] - zeroconf.py - INFO - random_forest training time: 3.7496204376220703
2017-10-11 10:52:24,215 - [ZEROCONF] - zeroconf.py - INFO - gradient_boosting training time: 5.41023063659668
2017-10-11 10:52:24,230 - [ZEROCONF] - max_estimators_fit_duration - INFO - Test classifier fit completed
2017-10-11 10:52:24,239 - [ZEROCONF] - zeroconf.py - INFO - per_run_time_limit=5
2017-10-11 10:52:24,239 - [ZEROCONF] - zeroconf.py - INFO - Process pool size=2
2017-10-11 10:52:24,240 - [ZEROCONF] - zeroconf.py - INFO - Starting autosklearn classifiers fiting on a 67% sample up to 67k rows
2017-10-11 10:52:24,252 - [ZEROCONF] - train_multicore - INFO - Max time allowance for a model 1 minute(s)
2017-10-11 10:52:24,252 - [ZEROCONF] - train_multicore - INFO - Overal run time is about 10 minute(s)
2017-10-11 10:52:24,255 - [ZEROCONF] - train_multicore - INFO - Multicore process 2 started
2017-10-11 10:52:24,258 - [ZEROCONF] - train_multicore - INFO - Multicore process 3 started
2017-10-11 10:52:24,276 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Start AutoSklearnClassifier seed=2
2017-10-11 10:52:24,278 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Start AutoSklearnClassifier seed=3
2017-10-11 10:52:24,295 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Done AutoSklearnClassifier seed=3
2017-10-11 10:52:24,297 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Done AutoSklearnClassifier seed=2
2017-10-11 10:52:26,299 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Starting seed=2
2017-10-11 10:52:27,298 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - Starting seed=3
2017-10-11 10:56:30,949 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - ####### Finished seed=2
2017-10-11 10:56:31,600 - [ZEROCONF] - spawn_autosklearn_classifier - INFO - ####### Finished seed=3
2017-10-11 10:56:31,614 - [ZEROCONF] - train_multicore - INFO - Multicore fit completed
2017-10-11 10:56:31,626 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - Building ensemble
2017-10-11 10:56:31,626 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - Done AutoSklearnClassifier - seed:1
2017-10-11 10:56:54,017 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - Ensemble built - seed:1
2017-10-11 10:56:54,017 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - Show models - seed:1
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - [(0.400000, SimpleClassificationPipeline({'classifier:__choice__': 'adaboost', 'one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:select_percentile_classification:percentile': 85.5410729966473, 'classifier:adaboost:n_estimators': 88, 'one_hot_encoding:minimum_fraction': 0.01805038589303469, 'rescaling:__choice__': 'minmax', 'balancing:strategy': 'weighting', 'preprocessor:__choice__': 'select_percentile_classification', 'classifier:adaboost:max_depth': 1, 'classifier:adaboost:learning_rate': 0.10898092508755285, 'preprocessor:select_percentile_classification:score_func': 'chi2', 'imputation:strategy': 'most_frequent', 'classifier:adaboost:algorithm': 'SAMME.R'},
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - dataset_properties={
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'task': 1,
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'signed': False,
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'sparse': False,
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multiclass': False,
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'target_type': 'classification',
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multilabel': False})),
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - (0.300000, SimpleClassificationPipeline({'classifier:__choice__': 'random_forest', 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'one_hot_encoding:use_minimum_fraction': 'True', 'classifier:random_forest:criterion': 'gini', 'classifier:random_forest:min_samples_leaf': 4, 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:min_samples_split': 16, 'classifier:random_forest:bootstrap': 'False', 'one_hot_encoding:minimum_fraction': 0.1453954841364777, 'rescaling:__choice__': 'none', 'balancing:strategy': 'none', 'preprocessor:__choice__': 'select_percentile_classification', 'preprocessor:select_percentile_classification:percentile': 96.35414862145892, 'preprocessor:select_percentile_classification:score_func': 'chi2', 'imputation:strategy': 'mean', 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:max_features': 3.342759426984195, 'classifier:random_forest:n_estimators': 100},
2017-10-11 10:56:54,596 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - dataset_properties={
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'task': 1,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'signed': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'sparse': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multiclass': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'target_type': 'classification',
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multilabel': False})),
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - (0.200000, SimpleClassificationPipeline({'classifier:extra_trees:min_weight_fraction_leaf': 0.0, 'classifier:__choice__': 'extra_trees', 'classifier:extra_trees:n_estimators': 100, 'classifier:extra_trees:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 5, 'classifier:extra_trees:min_samples_leaf': 10, 'rescaling:__choice__': 'minmax', 'classifier:extra_trees:max_depth': 'None', 'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'classifier:extra_trees:max_features': 4.413198608615693, 'classifier:extra_trees:criterion': 'gini', 'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100, 'classifier:extra_trees:min_samples_split': 16, 'one_hot_encoding:use_minimum_fraction': 'False', 'balancing:strategy': 'weighting', 'preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 1, 'preprocessor:extra_trees_preproc_for_classification:max_features': 1.4824479003506632, 'imputation:strategy': 'median', 'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None'},
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - dataset_properties={
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'task': 1,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'signed': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'sparse': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multiclass': False,
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'target_type': 'classification',
2017-10-11 10:56:54,597 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multilabel': False})),
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - (0.100000, SimpleClassificationPipeline({'classifier:extra_trees:min_weight_fraction_leaf': 0.0, 'classifier:__choice__': 'extra_trees', 'classifier:extra_trees:n_estimators': 100, 'classifier:extra_trees:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 16, 'classifier:extra_trees:min_samples_leaf': 10, 'rescaling:__choice__': 'minmax', 'classifier:extra_trees:max_depth': 'None', 'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True', 'preprocessor:extra_trees_preproc_for_classification:criterion': 'gini', 'classifier:extra_trees:max_features': 4.16852017424403, 'classifier:extra_trees:criterion': 'gini', 'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100, 'classifier:extra_trees:min_samples_split': 16, 'one_hot_encoding:use_minimum_fraction': 'False', 'balancing:strategy': 'weighting', 'preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 1, 'preprocessor:extra_trees_preproc_for_classification:max_features': 1.5781770540350555, 'imputation:strategy': 'median', 'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None'},
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - dataset_properties={
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'task': 1,
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'signed': False,
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'sparse': False,
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multiclass': False,
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'target_type': 'classification',
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO -   'multilabel': False})),
2017-10-11 10:56:54,598 - [ZEROCONF] - zeroconf_fit_ensemble - INFO - ]
2017-10-11 10:56:54,613 - [ZEROCONF] - zeroconf.py - INFO - Validating
2017-10-11 10:56:54,613 - [ZEROCONF] - zeroconf.py - INFO - Predicting on validation set
2017-10-11 10:56:57,373 - [ZEROCONF] - zeroconf.py - INFO - ########################################################################
2017-10-11 10:56:57,374 - [ZEROCONF] - zeroconf.py - INFO - Accuracy score 84%
2017-10-11 10:56:57,374 - [ZEROCONF] - zeroconf.py - INFO - The below scores are calculated for predicting '1' category value
2017-10-11 10:56:57,379 - [ZEROCONF] - zeroconf.py - INFO - Precision: 64%, Recall: 77%, F1: 0.70
2017-10-11 10:56:57,379 - [ZEROCONF] - zeroconf.py - INFO - Confusion Matrix: https://en.wikipedia.org/wiki/Precision_and_recall
2017-10-11 10:56:57,386 - [ZEROCONF] - zeroconf.py - INFO - [7058 1100]
2017-10-11 10:56:57,386 - [ZEROCONF] - zeroconf.py - INFO - [ 603 1985]
2017-10-11 10:56:57,392 - [ZEROCONF] - zeroconf.py - INFO - Baseline 2588 positives from 10746 overall = 24.1%
2017-10-11 10:56:57,392 - [ZEROCONF] - zeroconf.py - INFO - ########################################################################
2017-10-11 10:56:57,404 - [ZEROCONF] - x_y_dataframe_split - INFO - Dataframe split into X and y
2017-10-11 10:56:57,405 - [ZEROCONF] - zeroconf.py - INFO - Re-fitting the model ensemble on full known dataset to prepare for prediciton. This can take a long time.
2017-10-11 10:58:39,836 - [ZEROCONF] - zeroconf.py - INFO - Predicting. This can take a long time for a large prediction set.
2017-10-11 10:58:45,221 - [ZEROCONF] - zeroconf.py - INFO - Prediction done
2017-10-11 10:58:45,223 - [ZEROCONF] - zeroconf.py - INFO - Exporting the data
2017-10-11 10:58:45,267 - [ZEROCONF] - zeroconf.py - INFO - ##### Zeroconf Script Completed! #####
2017-10-11 10:58:45,268 - [ZEROCONF] - zeroconf.py - INFO - Clean up / Delete work directory: /home/ulrich/PycharmProjects/autosklearn-zeroconf/work/20171011105215

Process finished with exit code 0
python evaluate-dataset-Adult.py 
[ZEROCONF]  # 00:37:43 #
[ZEROCONF] ######################################################################## # 00:37:43 #
[ZEROCONF] Accuracy score 85% # 00:37:43 #
[ZEROCONF] The below scores are calculated for predicting '1' category value # 00:37:43 #
[ZEROCONF] Precision: 65%, Recall: 78%, F1: 0.71 # 00:37:43 #
[ZEROCONF] Confusion Matrix: https://en.wikipedia.org/wiki/Precision_and_recall # 00:37:43 #
[ZEROCONF] [[10835  1600] # 00:37:43 #
[ZEROCONF]  [  860  2986]] # 00:37:43 #
[ZEROCONF] Baseline 3846 positives from 16281 overall = 23.6% # 00:37:43 #
[ZEROCONF] ######################################################################## # 00:37:43 #
[ZEROCONF]  # 00:37:43 #

Workarounds

These issues are not related to autosklearn-zeroconf or auto-sklearn themselves, but are general problems that depend on your Python and OS installation.

xgboost issues

If xgboost complains about the ELF header:

pip uninstall xgboost; pip install --no-cache-dir -v xgboost==0.4a30

If it cannot find libraries:

conda install libgcc # for xgboost

Alternatively, search for them with

sudo find / -name libgomp.so.1
/usr/lib/x86_64-linux-gnu/libgomp.so.1

and explicitly add them to the library path:

export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libstdc++.so.6":"/usr/lib/x86_64-linux-gnu/libgomp.so.1"; python zeroconf.py -d Titanic.h5 2>/dev/null|grep ZEROCONF

Also see automl/auto-sklearn#247

Install auto-sklearn

# A compiler (gcc) is needed to compile a few things from the auto-sklearn requirements.txt
# Choose just the line for your Linux flavor below

# On Ubuntu
sudo apt-get install gcc build-essential swig

# On CentOS 7-1611 http://www.osboxes.org/centos/ https://drive.google.com/file/d/0B_HAFnYs6Ur-bl8wUWZfcHVpMm8/view?usp=sharing
sudo yum -y update 
sudo reboot
sudo yum install epel-release python34 python34-devel python34-setuptools
sudo yum -y groupinstall 'Development Tools'

# auto-sklearn requires swig 3.0 
wget downloads.sourceforge.net/project/swig/swig/swig-3.0.12/swig-3.0.12.tar.gz -O swig-3.0.12.tar.gz
tar xf swig-3.0.12.tar.gz 
cd swig-3.0.12 
./configure --without-pcre
make
sudo make install
cd ..

sudo easy_install-3.4 pip
# if you want to use virtual environments
sudo pip3 install virtualenv
virtualenv zeroconf -p /usr/bin/python3.4
source zeroconf/bin/activate

curl https://raw.githubusercontent.com/paypal/autosklearn-zeroconf/master/requirements.txt | xargs -n 1 -L 1 pip install

Contributors

Egor Kobylkin, Ulrich Arndt


autosklearn-zeroconf's Issues

Graph in the README should be a scatter plot.

Really nitpicky remark, but it doesn't make sense to interpolate evaluation results between two different models, so a line plot is not appropriate (the wild drops/increases in loss also make it hard to read). A scatter plot would convey the idea better, I think!

Add csv as a source data format

CSV is a bit tricky because it does not carry data-type metadata. So, in order to take it in, we would have to guesstimate the types or get this metadata some other way. The default way would be to ask the user to specify this information, but then it wouldn't be "zeroconf" anymore.
One way of approaching this would be https://github.com/bashalex/datapot; a rough dtype-inference sketch follows.
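A hedged sketch of one way CSV input could be supported: let pandas infer dtypes, cast low-cardinality text columns to categoricals, then write the same HDF5 format the script already reads. The 5% threshold and the HDF5 key name are illustrative assumptions, not anything implemented in this repository.

import pandas as pd

df = pd.read_csv("input.csv")
for col in df.select_dtypes(include=["object"]).columns:
    if df[col].nunique() < 0.05 * len(df):   # few distinct values -> treat as categorical
        df[col] = df[col].astype("category")
df.to_hdf("input.h5", key="data", mode="w", format="table")  # key "data" is hypothetical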

SyntaxError: invalid syntax in d2khelper.py

Freshly installed CentOS 7:

 python3 ./bin/zeroconf.py -d ./data/Adult.h5

Traceback (most recent call last):
  File "./bin/zeroconf.py", line 22, in <module>
    import d2khelper as d2k
  File "/home/username/autosklearn-zeroconf/bin/d2khelper.py", line 48
    return {**parameter, **param}
             ^
SyntaxError: invalid syntax

@d2k do you think you could look into this?
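The {**parameter, **param} dict merge uses PEP 448 syntax, which requires Python 3.5+, so on the python34 packages from the CentOS instructions above it is a SyntaxError. A 3.4-compatible equivalent, shown as an illustrative sketch rather than a committed patch:

# Works on Python 3.4: build a new dict and update it instead of using {**a, **b}
def merge(parameter, param):
    merged = dict(parameter)
    merged.update(param)
    return merged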

DeprecationWarning sklearn.cross_validation.train_test_split

from sklearn.cross_validation import train_test_split

lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
