
chocolate's Introduction

Chocolate

Chocolate is a completely asynchronous optimisation framework relying solely on a database to share information between workers. Chocolate uses no master process for distributing tasks. Every task is completely independent and only gets its information from a database. Chocolate is thus ideal in controlled computing environments where it is hard to maintain a master process for the duration of the optimisation.

Chocolate has been designed and optimized for hyperparameter optimization, where each function evaluation takes a very long time to complete and is difficult to parallelize. Chocolate allows optimization over conditional search spaces, either by using conditional kernels in a Bayesian optimizer or by treating the problem as a multi-armed bandit solved with Thompson sampling. Chocolate also handles multi-objective optimisation, where multiple loss functions are optimized simultaneously.

Chocolate provides the following sampling/searching algorithms:

  • Grid
  • Random
  • QuasiRandom
  • CMAES
  • MOCMAES
  • Bayesian

and three useful backends:

  • SQLite
  • MongoDB
  • Pandas Data Frame

Chocolate is licensed under the 3-Clause BSD License.

Documentation

The full documentation is available at http://chocolate.readthedocs.io.

Installation

Chocolate is installed using pip; unfortunately, there is no PyPI package yet. Install it directly from the repository:

pip install git+https://github.com/AIworx-Labs/chocolate@master

Dependencies

Chocolate has various dependencies. The optimizers depend on NumPy, SciPy and Scikit-Learn; the SQLite database connection depends on dataset and filelock; and the MongoDB database connection depends on PyMongo. Some utilities depend on pandas. All but PyMongo will be installed with Chocolate.
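
If you plan to use the MongoDB backend, install PyMongo yourself, for example

pip install pymongo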

Simple example

The following very simple example shows how to optimize a conditional search space with Chocolate. Note that the script samples and evaluates a single point. Since the database connections are safe to use in parallel, you can run this script in concurrent processes and achieve maximum parallelism (a sketch of launching several workers follows the example).

import chocolate as choco

def objective_function(condition, x=None, y=None):
    """An objective function returning ``1 - x`` when *condition* is 1 and 
    ``y - 6`` when *condition* is 2.
    
    Raises:
        ValueError: If condition is different than 1 or 2.
    """
    if condition == 1:
        return 1 - x
    elif condition == 2:
        return y - 6
    raise ValueError("condition must be 1 or 2, got {}.".format(condition))

# Define the conditional search space 
space = [
            {"condition": 1, "x": choco.uniform(low=1, high=10)},
            {"condition": 2, "y": choco.log(low=-2, high=2, base=10)}
        ]

# Establish a connection to a SQLite local database
conn = choco.SQLiteConnection("sqlite:///my_db.db")

# Construct the optimizer
sampler = choco.Bayes(conn, space)

# Sample the next point
token, params = sampler.next()

# Calculate the loss for the sampled point (minimized)
loss = objective_function(**params)

# Add the loss to the database
sampler.update(token, loss)
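
As a rough sketch of running workers in parallel (assuming the example above is saved as optimize.py, a hypothetical file name), each process samples, evaluates and updates the shared database entirely on its own:

import subprocess

# Start four independent workers; each one connects to the same SQLite database.
workers = [subprocess.Popen(["python", "optimize.py"]) for _ in range(4)]
for worker in workers:
    worker.wait()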

Have a look at the documentation tutorials for more examples.

chocolate's People

Contributors

andyphil, fmder, hbredin, jackd, leconteur


chocolate's Issues

Expected Improvement fails

When I call Bayes with utility_function='ei', I get:

File "/home/khs/.local/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
return function(*(wrapper_args + args))
File "/home/khs/.local/lib/python2.7/site-packages/chocolate/search/bayes.py", line 102, in
res = minimize(lambda x: -self.utility(x.reshape(1, -1), gp=gp, y_max=y_max, kappa=self.kappa),
AttributeError: 'Bayes' object has no attribute 'kappa'

resume feature

Hello,

Is it possible to resume the hyperparameter training for Bayes from where it was interrupted?
Regards
Viorel
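
A rough sketch of what resuming could look like (an assumption based on the README's statement that all state lives in the database, not a documented feature): reconnect to the same database, rebuild the sampler with the same search space, and it should continue from the stored results.

import chocolate as choco

# Reconnect to the database used by the interrupted run and rebuild the same
# sampler with the same search space; previously stored results are picked up.
space = {"x": choco.uniform(low=1, high=10)}         # illustrative space
conn = choco.SQLiteConnection("sqlite:///my_db.db")  # same database file as before
sampler = choco.Bayes(conn, space)

token, params = sampler.next()  # continues from what is already in the database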

Bayes search takes several minutes to give new sample when existing results are present

I have a database with only 10 results in it so far (MongoDB if that matters). However, running chocolate.Bayes.next now takes several minutes. This does not happen if I have fewer complete results (i.e. including the loss), even if mixed with incomplete results. Is this a known issue?

Perhaps related, perhaps not: pip install from the git repo does not build the extension chocolate.mo.hv. There's no error, the build just doesn't happen. As a result I get the warning "Using Python version of hypervolume module. Expect this to be slower."

Install fails on Mac

When trying to install on a Mac using the given pip install command, "pip install git+https://github.com/AIworx-Labs/chocolate@master", the install fails with this output:

Collecting git+https://github.com/AIworx-Labs/chocolate@master
Cloning https://github.com/AIworx-Labs/chocolate (to revision master) to /private/var/folders/1c/ndcpjn5n45l9zk0wg6kq_9jw0000gp/T/pip-req-build-35jovqz5
Requirement already satisfied: numpy>=1.11 in /anaconda3/lib/python3.7/site-packages (from chocolate==0.6) (1.16.2)
Requirement already satisfied: scipy>=0.18 in /anaconda3/lib/python3.7/site-packages (from chocolate==0.6) (1.1.0)
Requirement already satisfied: scikit-learn>=0.18 in /anaconda3/lib/python3.7/site-packages (from chocolate==0.6) (0.20.1)
Requirement already satisfied: pandas>=0.19 in /anaconda3/lib/python3.7/site-packages (from chocolate==0.6) (0.24.1)
Requirement already satisfied: dataset>=0.8 in /anaconda3/lib/python3.7/site-packages (from chocolate==0.6) (1.1.2)
Requirement already satisfied: filelock>=2.0 in /anaconda3/lib/python3.7/site-packages (from chocolate==0.6) (3.0.10)
Requirement already satisfied: pytz>=2011k in /anaconda3/lib/python3.7/site-packages (from pandas>=0.19->chocolate==0.6) (2018.7)
Requirement already satisfied: python-dateutil>=2.5.0 in /anaconda3/lib/python3.7/site-packages (from pandas>=0.19->chocolate==0.6) (2.7.5)
Requirement already satisfied: six>=1.11.0 in /anaconda3/lib/python3.7/site-packages (from dataset>=0.8->chocolate==0.6) (1.12.0)
Requirement already satisfied: sqlalchemy>=1.1.2 in /anaconda3/lib/python3.7/site-packages (from dataset>=0.8->chocolate==0.6) (1.2.15)
Requirement already satisfied: alembic>=0.6.2 in /anaconda3/lib/python3.7/site-packages (from dataset>=0.8->chocolate==0.6) (1.0.7)
Requirement already satisfied: Mako in /anaconda3/lib/python3.7/site-packages (from alembic>=0.6.2->dataset>=0.8->chocolate==0.6) (1.0.7)
Requirement already satisfied: python-editor>=0.3 in /anaconda3/lib/python3.7/site-packages (from alembic>=0.6.2->dataset>=0.8->chocolate==0.6) (1.0.4)
Requirement already satisfied: MarkupSafe>=0.9.2 in /anaconda3/lib/python3.7/site-packages (from Mako->alembic>=0.6.2->dataset>=0.8->chocolate==0.6) (1.1.0)
Building wheels for collected packages: chocolate
Running setup.py bdist_wheel for chocolate ... error
Complete output from command /anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/1c/ndcpjn5n45l9zk0wg6kq_9jw0000gp/T/pip-req-build-35jovqz5/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/1c/ndcpjn5n45l9zk0wg6kq_9jw0000gp/T/pip-wheel-xqs95re1 --python-tag cp37:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.7
creating build/lib.macosx-10.7-x86_64-3.7/chocolate
copying chocolate/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate
copying chocolate/space.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate
copying chocolate/base.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/crossvalidation
copying chocolate/crossvalidation/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/crossvalidation
copying chocolate/crossvalidation/repeat.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/crossvalidation
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/mo
copying chocolate/mo/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/mo
copying chocolate/mo/pyhv.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/mo
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/conditional
copying chocolate/conditional/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/conditional
copying chocolate/conditional/thompson.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/conditional
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/pandas.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/splitter.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/mongodb.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/sqlite.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/search
copying chocolate/search/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/search
copying chocolate/search/kernels.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/search
copying chocolate/search/bayes.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/search
copying chocolate/search/cmaes.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/search
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
copying chocolate/sample/grid.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
copying chocolate/sample/quasirandom.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
copying chocolate/sample/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
copying chocolate/sample/random.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
running build_ext
building 'chocolate.mo.hv' extension
creating build/temp.macosx-10.7-x86_64-3.7
creating build/temp.macosx-10.7-x86_64-3.7/chocolate
creating build/temp.macosx-10.7-x86_64-3.7/chocolate/mo
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c chocolate/mo/_hv.c -o build/temp.macosx-10.7-x86_64-3.7/chocolate/mo/_hv.o
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c chocolate/mo/hv.cpp -o build/temp.macosx-10.7-x86_64-3.7/chocolate/mo/hv.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
chocolate/mo/hv.cpp:24:10: fatal error: 'cstdlib' file not found
#include <cstdlib>
         ^~~~~~~~~
1 warning and 1 error generated.
error: command 'gcc' failed with exit status 1


Failed building wheel for chocolate
Running setup.py clean for chocolate
Failed to build chocolate
Installing collected packages: chocolate
Running setup.py install for chocolate ... error
Complete output from command /anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/1c/ndcpjn5n45l9zk0wg6kq_9jw0000gp/T/pip-req-build-35jovqz5/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/1c/ndcpjn5n45l9zk0wg6kq_9jw0000gp/T/pip-record-_571vfdr/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.7
creating build/lib.macosx-10.7-x86_64-3.7/chocolate
copying chocolate/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate
copying chocolate/space.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate
copying chocolate/base.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/crossvalidation
copying chocolate/crossvalidation/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/crossvalidation
copying chocolate/crossvalidation/repeat.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/crossvalidation
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/mo
copying chocolate/mo/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/mo
copying chocolate/mo/pyhv.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/mo
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/conditional
copying chocolate/conditional/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/conditional
copying chocolate/conditional/thompson.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/conditional
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/pandas.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/splitter.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/mongodb.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
copying chocolate/connection/sqlite.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/connection
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/search
copying chocolate/search/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/search
copying chocolate/search/kernels.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/search
copying chocolate/search/bayes.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/search
copying chocolate/search/cmaes.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/search
creating build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
copying chocolate/sample/grid.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
copying chocolate/sample/quasirandom.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
copying chocolate/sample/__init__.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
copying chocolate/sample/random.py -> build/lib.macosx-10.7-x86_64-3.7/chocolate/sample
running build_ext
building 'chocolate.mo.hv' extension
creating build/temp.macosx-10.7-x86_64-3.7
creating build/temp.macosx-10.7-x86_64-3.7/chocolate
creating build/temp.macosx-10.7-x86_64-3.7/chocolate/mo
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c chocolate/mo/_hv.c -o build/temp.macosx-10.7-x86_64-3.7/chocolate/mo/_hv.o
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/anaconda3/include/python3.7m -c chocolate/mo/hv.cpp -o build/temp.macosx-10.7-x86_64-3.7/chocolate/mo/hv.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
chocolate/mo/hv.cpp:24:10: fatal error: 'cstdlib' file not found
#include <cstdlib>
         ^~~~~~~~~
1 warning and 1 error generated.
error: command 'gcc' failed with exit status 1

----------------------------------------

Command "/anaconda3/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/1c/ndcpjn5n45l9zk0wg6kq_9jw0000gp/T/pip-req-build-35jovqz5/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /private/var/folders/1c/ndcpjn5n45l9zk0wg6kq_9jw0000gp/T/pip-record-_571vfdr/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/1c/ndcpjn5n45l9zk0wg6kq_9jw0000gp/T/pip-req-build-35jovqz5/

Bayes Update Not working with Sampled Space

Hi there!

I'm running a conditional neural architecture search: an evolving neural network topology (MLP) with a "num_hidden_layers" parameter and, conditional on it, the size of each layer.

Once I get to the Gaussian process step after some bootstrap samples, I get this error:

  File "hyperparameter_job.py", line 206, in hyperparameter_job
    token,params=sampler.next()
  File "python3.6/site-packages/chocolate/base.py", line 159, in next
    return self._next()
  File "python3.6/site-packages/chocolate/search/bayes.py", line 60, in _next
    gp, y = self._fit_gp(X, Xpending, y)
  File "python3.6/site-packages/chocolate/search/bayes.py", line 72, in _fit_gp
    X = numpy.array([[elem[k] for k in self.space.names()] for elem in X])
  File "python3.6/site-packages/chocolate/search/bayes.py", line 72, in <listcomp>
    X = numpy.array([[elem[k] for k in self.space.names()] for elem in X])
  File "python3.6/site-packages/chocolate/search/bayes.py", line 72, in <listcomp>
    X = numpy.array([[elem[k] for k in self.space.names()] for elem in X])
KeyError: 'num_hidden_layers_num_hidden_layers_0_layer_0_size'

It cannot find this key, so it can't populate the search space to regress on, and the problem seems related to the conditional parameters. Can you help me debug what is happening? I have a project deadline that would benefit from a fix. Thanks!

Bayes parallel samples not different.

I run Bayes with kappa=1.2.

The problem is that I get the same suggestion twice: experiments 6 and 8 are identical (I asked for 8 before 6 was run), and 11 and 12 are also identical.

id  NClusters  NParents  R_cut  _loss
 0          6         4    9.7  -0.50
 1         12         2    6.1  -0.04
 2          3         1    6.2  -0.01
 3          6         1    7.4  -0.28
 4          9         1   11.9  -0.11
 5          5         2   11.9  -0.59
 6          3         3   11.9  -0.01
 7          5         3   11.9  -0.65
 8          3         3   11.9  -0.01
 9         12         4    2.0   0.00
10         12         4    2.8   0.00
11          5         4   11.9    NaN
12          5         4   11.9    NaN

quantized parameters not perfectly quantized

When I use quantized_uniform(0.0, 1.0, 0.01) I still get parameters like:

0.27000000000000002
0.17999999999999999
0.85999999999999999
instead of
0.27
0.18
0.86

Line 155 in space.py should be changed to:
float(int(numpy.floor((x * (self.high - self.low)) / self.step)) * self.step + self.low)
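
Until such a fix lands, a minimal user-side workaround sketch (assuming a step of 0.01, hence two decimal places) is to round the sampled values before using them:

token, params = sampler.next()
# Round floating-point parameters to the quantization step (0.01 -> 2 decimals).
params = {k: (round(v, 2) if isinstance(v, float) else v) for k, v in params.items()}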

chocolate CMAES raises an exception with some search spaces

My space is:

from chocolate import choice, uniform, log, quantized_uniform

space = {
    'Imputation@skipped': choice([True, False]),
    'Imputation@missing_values': choice(['NaN', 0]),
    'Imputation@strategy': choice(['mean', 'median', 'most_frequent']),
    'PCA@skipped': choice([True, False]),
    'PCA@whiten': choice([True, False]),
    'PCA@svd_solver': choice(['auto', 'full', 'arpack', 'randomized']),
    'JATA': {
        'RigdeC': {
            'RigdeC@alpha': uniform(low=0.0001, high=2),
            'RigdeC@fit_intercept': choice([True, False]),
            'RigdeC@normalize': choice([True, False]),
            'RigdeC@tol': log(low=-5, high=-1, base=10)
        },
        'XGBC': {
            'XGBC@min_child_weight': uniform(low=0, high=20),
            'XGBC@n_estimators': quantized_uniform(low=25, high=525, step=20),
            'XGBC@max_depth': quantized_uniform(low=1, high=20, step=1),
            'XGBC@subsample': uniform(low=0.7, high=1.0),
            'XGBC@learning_rate': uniform(low=0.001, high=1.0),
            'XGBC@colsample_bytree': uniform(low=0.1, high=1.0),
            'XGBC@colsample_bylevel': uniform(low=0.1, high=1.0),
            'XGBC@reg_alpha': log(low=-10, high=-1, base=10),
            'XGBC@reg_lambda': log(low=-10, high=-1, base=10),
            'XGBC@booster': {
                'gbtree': None,
                'XGBC@gblinear': {
                    'XGBC@updater': choice(['shotgun', 'coord_descent']),
                    'XGBC@feature_selector': choice(['cyclic', 'shuffle'])
                },
                'XGBC@dart': {
                    'XGBC@sample_type': choice(['uniform', 'weighted']),
                    'XGBC@normalize_type': choice(['tree', 'forest']),
                    'XGBC@rate_drop': uniform(low=0.0, high=1.0),
                    'XGBC@skip_drop': uniform(low=0.0, high=1.0)
                }
            }
        }
    }
}

and my test code is:

from chocolate import SQLiteConnection, CMAES

conn = SQLiteConnection("sqlite:///my_db.db")
sampler = CMAES(conn, space, clear_db=True)
token, params = sampler.next()
print(params)

it sometimes raises:

Traceback (most recent call last):
  File "/Users/yang/workspace/baidu/bdg/jarvis-automl/automl/test.py", line 87, in <module>
    token, params = sampler.next()
  File "/anaconda3/lib/python3.6/site-packages/chocolate-0.6-py3.6-macosx-10.9-x86_64.egg/chocolate/base.py", line 159, in next
    return self._next()
  File "/anaconda3/lib/python3.6/site-packages/chocolate-0.6-py3.6-macosx-10.9-x86_64.egg/chocolate/search/cmaes.py", line 86, in _next
    ancestors, ancestors_ids = self._load_ancestors(results)
  File "/anaconda3/lib/python3.6/site-packages/chocolate-0.6-py3.6-macosx-10.9-x86_64.egg/chocolate/search/cmaes.py", line 194, in _load_ancestors
    candidate["step"] = numpy.array([c[str(k)] for k in self.space.names()])
  File "/anaconda3/lib/python3.6/site-packages/chocolate-0.6-py3.6-macosx-10.9-x86_64.egg/chocolate/search/cmaes.py", line 194, in <listcomp>
    candidate["step"] = numpy.array([c[str(k)] for k in self.space.names()])
KeyError: 'JATA_JATA_XGBC_XGBC@booster_XGBC@booster_XGBC@dart_XGBC@normalize_type'

Missing examples.

I think the manual is missing important examples.

When you start using chocolate you might already have data you want to load into the database.
There should be an example that shows how to do that.

If your experiment is not computer-based, you might want to split space creation, the next operation and the value update into three scripts, so that you can get a new set of parameters, go to your lab, run the experiment, and then update the database with the result.
That is also an example I would like to see.
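
A rough sketch of such a split workflow (my assumption, not taken from the manual): because all state lives in the database, the suggestion step and the update step can run in separate scripts, with the physical experiment happening in between.

import chocolate as choco

space = {"temperature": choco.uniform(low=20, high=100)}  # illustrative space
conn = choco.SQLiteConnection("sqlite:///lab_db.db")      # hypothetical database file

def suggest():
    """Run this first; note the token and parameters, then do the experiment."""
    token, params = choco.Bayes(conn, space).next()
    print(token, params)
    return token, params

def report(token, loss):
    """Run this later, possibly from another script, with the measured loss."""
    choco.Bayes(conn, space).update(token, loss)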

KeyError: '_subspace' when using results_as_dataframe and ThompsonSampling+CMAES

Calling results_as_dataframe works fine if I use a MongoDBConnection and QuasiRandom sampler. However, changing the sampler to ThompsonSampling causes results_as_dataframe to throw an exception KeyError: '_subspace'. Here is some example code that demonstrates the problem. Uncommenting the line that uses ThompsonSampling and commenting out the line that uses QuasiRandom results in the error.

from chocolate import Space, ThompsonSampling, CMAES, SQLiteConnection, QuasiRandom, log, quantized_uniform

s = Space([
    {
        "algo": "svm",
        "C": log(low=-3, high=5, base=10),
        "kernel": {
            "linear": None,
            "rbf": {
                "gamma": log(low=-2, high=3, base=10)
            }
        }
    },
    {
        "algo": "knn",
        "n_neighbors": quantized_uniform(low=1, high=20, step=1)
    }])

conn = SQLiteConnection(url="sqlite:///db.db")
sampler = QuasiRandom(conn, s)
# sampler = ThompsonSampling(CMAES, conn, s)
token, params = sampler.next()
print(f'Token: {token}')
print(f'Parameters: {params}')

results = conn.results_as_dataframe()
print(results)

The output, exception and stack trace when using ThompsonSampling are:

Token: {'_chocolate_id': 0, '_arm_id': 1}
Parameters: {'C': 80716.84865011052, 'gamma': 3.7193589528638826, 'kernel': 'rbf', 'algo': 'svm'}
Traceback (most recent call last):
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\williams\.vscode\extensions\ms-python.python-2020.9.114305\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "g:\Research\WS\DeepRL\experimental\test_thompsonsampling_bug.py", line 26, in <module>
    results = conn.results_as_dataframe()
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\site-packages\chocolate\base.py", line 65, in results_as_dataframe
    result = s([r[k] for k in s.names()])
  File "C:\Users\williams\Anaconda3\envs\deeplearning\lib\site-packages\chocolate\base.py", line 65, in <listcomp>
    result = s([r[k] for k in s.names()])
KeyError: '_subspace'

Looking at the database that is generated, I can see that the results table is lacking a _subspace column. Note that sampling new parameters works fine, as does storing losses, but I can't extract the results.

When everything works, I expect the output to look something like the following:

Token: {'_chocolate_id': 0}
Parameters: {'n_neighbors': 7, 'algo': 'knn'}
    n_neighbors algo
id
0             7  knn

MemoryError using Random Sampler

import chocolate as choco

def get_op_space(i):
    return  {
                  "{}_posterize".format(i)        :  {"{}_l_posterize".format(i):          choco.quantized_uniform(0, 11, 1),  "{}_p_posterize".format(i):          choco.quantized_uniform(0, 1, 0.1)},
                  "{}_solarize".format(i)         :  {"{}_l_solarize".format(i):           choco.quantized_uniform(0, 11, 1),  "{}_p_solarize".format(i):           choco.quantized_uniform(0, 1, 0.1)},
                  "{}_adjust_brightness".format(i):  {"{}_l_adjust_brightness".format(i):  choco.quantized_uniform(0, 11, 1),  "{}_p_adjust_brightness".format(i):  choco.quantized_uniform(0, 1, 0.1)}
             }


space = {
    "sp1_op1": get_op_space("sp1_op1"),
    "sp1_op2": get_op_space("sp1_op2"),
    "sp2_op1": get_op_space("sp2_op1"),
    "sp2_op2": get_op_space("sp2_op2")
}

conn = choco.DataFrameConnection(from_file=None)

sampler = choco.Random(conn, space, random_state=None)

token, params = sampler.next()
Traceback (most recent call last):
  File "/src/search/test.py", line 22, in <module>
    token, params = sampler.next()
  File "/venv/lib/python3.5/site-packages/chocolate/base.py", line 159, in next
    return self._next()
  File "venv/lib/python3.5/site-packages/chocolate/sample/random.py", line 78, in _next
    choices = sorted(set(range(l)) - set(drawn))
MemoryError

I am trying to search within an augmentation cascade search space like in Google's AutoAugment. Each subpolicy (sp) consists of two operations, where each operation has a level (l) and a probability (p). When I try to sample from this space I encounter a MemoryError.

Need possibility for specifying dependent parameters.

Hi.

We are considering using chocolate together with genetic algorithms.
We have different mutation operators, and our hyperparameters are the mutation probabilities (p1, p2, p3, p4, p5), each between 0.0 and 1.0 but with the constraint that they sum to 1.0.
Is it possible to specify such a space?
If yes, it would be nice to have such an example in the manual.
If no, it would be nice to have such a feature implemented.

So long and thanks for chocolate.
Knud
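
One workaround sketch (not a built-in chocolate feature, just a common trick; the objective below returns a dummy loss as a placeholder for the actual GA run): sample the five probabilities independently and normalize them inside the objective so they sum to 1.

import chocolate as choco

# Five independent probabilities; the sum-to-one constraint is enforced by normalization.
space = {"p{}".format(i): choco.uniform(low=0.0, high=1.0) for i in range(1, 6)}

def objective(**params):
    total = sum(params.values()) or 1.0  # guard against an all-zero draw
    probs = {name: value / total for name, value in params.items()}
    # ... run the genetic algorithm with these normalized mutation probabilities
    # and return its loss; a dummy value stands in here.
    return 0.0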

PyPI release?

I really like the "distributed + SQLite" feature of chocolate (which I couldn't find in any other hyper-parameter optimization Python tool) and therefore plan to use it in one of my projects.

However, I don't want to link to the (moving) master branch of this repository.
Are there any plans to make proper PyPI releases of chocolate?

Duplicate samples for chocolate.Bayes

Reproducible Example:

import chocolate as choco

def objective_function(alpha, l1_ratio):
    return alpha + l1_ratio

space = {
    "alpha": choco.quantized_uniform(0.1, 0.3, 0.1),
    "l1_ratio": choco.quantized_uniform(0.5, 1.0, 0.1)
}

conn = choco.DataFrameConnection()

sampler = choco.Bayes(conn, space)

samples = []
for i in range(20):
    token, params = sampler.next()
    samples.append((token, params))
    loss = objective_function(**params)
    sampler.update(token, loss)

for sample in samples:
    print(sample)

Output:

({'_chocolate_id': 0}, {'alpha': 0.1, 'l1_ratio': 0.8})
({'_chocolate_id': 1}, {'alpha': 0.2, 'l1_ratio': 0.9})
({'_chocolate_id': 2}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 3}, {'alpha': 0.2, 'l1_ratio': 0.5})
({'_chocolate_id': 4}, {'alpha': 0.1, 'l1_ratio': 0.8})
({'_chocolate_id': 5}, {'alpha': 0.2, 'l1_ratio': 0.9})
({'_chocolate_id': 6}, {'alpha': 0.2, 'l1_ratio': 0.6})
({'_chocolate_id': 7}, {'alpha': 0.1, 'l1_ratio': 0.8})
({'_chocolate_id': 8}, {'alpha': 0.1, 'l1_ratio': 0.7})
({'_chocolate_id': 9}, {'alpha': 0.1, 'l1_ratio': 0.6})
({'_chocolate_id': 10}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 11}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 12}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 13}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 14}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 15}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 16}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 17}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 18}, {'alpha': 0.1, 'l1_ratio': 0.5})
({'_chocolate_id': 19}, {'alpha': 0.1, 'l1_ratio': 0.5})

Note that ids 4 and 5 repeat ids 0 and 1, respectively.

Comments:
I took a peek at the implementation and found that:

  1. During the bootstrapping phase (n=10 by default), there is no duplicate protection for randomly drawn samples.

  2. During the Gaussian process phase, there doesn't seem to be duplicate protection either. This is probably OK, as it would indicate convergence, but I thought I would bring it up anyway.

I can't think of a scenario where this duplication would be desirable behavior, so I am reporting this as an issue.

Algorithms suggesting NaN for continuous variables.

I have been testing both the CMA-ES and Bayesian optimisers, so I only have information on these two. I have run each algorithm for thousands of iterations and have noticed that some iterations, albeit rarely, suggest NaN for the values of all variables. All of the variables are bounded by choco.uniform(0, X), with X an integer, so I was wondering whether this is intentional.

Project status / contributions

Hi @fmder,

we found chocolate very useful and were wondering whether contributions are welcome at this point, considering the project has been mostly inactive for the past few years.

One immediate contribution I am thinking of is to apply semantic versioning, improve docs, and package on conda-forge.

If, on the other hand, you don't want to spend any further time, we'd probably go down the road of forking the repo.

Thanks & cheerio, Harry.

ValueError for Bayesian .next()

Hi,

I'm currently trying to use Bayesian optimisation through Chocolate, but I get an inconsistent error on the first iteration. When first calling .next() on the chocolate.Bayes object, I get the following error:

  File "c:\Users\tests.py", line 252, in StatsBayes
    token, nextParams = solver.next()
  File "C:\Users\Anaconda3\lib\site-packages\chocolate\base.py", line 159, in next
    return self._next()
  File "C:\Users\Anaconda3\lib\site-packages\chocolate\search\bayes.py", line 58, in _next
    gp, y = self._fit_gp(X, Xpending, y)
  File "C:\Users\Anaconda3\lib\site-packages\chocolate\search\bayes.py", line 73, in _fit_gp
    gp.fit(X, y)
  File "C:\Users\Anaconda3\lib\site-packages\sklearn\gaussian_process\gpr.py", line 196, in fit
    X, y = check_X_y(X, y, multi_output=True, y_numeric=True)
  File "C:\Users\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 756, in check_X_y
    estimator=estimator)
  File "C:\Users\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 552, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Should I be instantiating the model with an initial guess? The examples don't seem to suggest this.

Error in "Optimizing Over Multiple Models" example

The score_svm example on the Optimizing Over Multiple Models page has the following function definition:

def score_svm(X, y, algo, **params):

When the score_svm function is called, it is called without the algo parameter (see below). What is actually supposed to be sent for the algo parameter?
loss = score_svm(X, y, **params)
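
A guess based on the parameter dictionaries shown elsewhere on this page (an assumption, not verified against the docs): in a conditional space the sampled params dict already contains an "algo" key, so **params supplies it. A tiny sketch with made-up values:

# Hypothetical sampled parameters; note that "algo" is one of the keys.
params = {"algo": "svm", "C": 1.0, "gamma": 0.1, "kernel": "rbf"}

def score_svm(X, y, algo, **params):
    # "algo" is picked out of the keyword arguments; the rest stay in params.
    print(algo, params)

score_svm([[0.0]], [0], **params)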

How to describe a search space like XGBoost's in space.py

How do I describe the XGBoost parameters, including the booster subparameters, using space.py?
My code:

{
           "min_child_weight": choco.uniform(0, 20),
           "n_estimators": choco.quantized_uniform(25, 525, 20),
           "max_depth": choco.quantized_uniform(1, 20, 1),
           "subsample": choco.uniform(0.7, 1.0),
           "learning_rate": choco.uniform(0.001, 1.0),
           "colsample_bytree": choco.uniform(0.1, 1.0),
           "colsample_bylevel": choco.uniform(0.1, 1.0),
           "reg_alpha": choco.log(-10, -1, 10),
           "reg_lambda": choco.log(-10, -1, 10),
           "booster": {
               "gbtree": None,
               "gblinear": {
                   "updater": {
                       "shotgun": None, "coord_descent": None
                   },
                   "feature_selector": {
                       "cyclic": None, "shuffle": None,
                       "random": None, "greedy": None,
                       "thrifty": None
                   }
               },
               "dart": {
                   "sample_type": {
                       "uniform": None,
                       "weighted": None
                   },
                   "normalize_type": {
                       "tree": None,
                       "forest": None
                   },
                   "rate_drop": choco.uniform(0.0, 1.0),
                   "skip_drop": choco.uniform(0.0, 1.0),
               }
           },
       }

only generates

{'booster': 'dart', 'colsample_bylevel': 0.18988467350254096, 'colsample_bytree': 0.14279195511563256, 'learning_rate': 0.5499628359285743, 'max_depth': 12, 'min_child_weight': 11.517182050898144, 'n_estimators': 65, 'reg_alpha': 0.0008107887387917439, 'reg_lambda': 1.995310903080172e-10, 'subsample': 0.8042911086851237}

sample_type and normalize_type are not generated.
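
A sketch of one possible rewrite (an assumption on my part, mirroring the space posted in the CMAES issue above rather than the official docs): express sample_type and normalize_type as choice() parameters instead of nested dictionaries of bare names.

import chocolate as choco

# Only the booster branch is shown; the remaining parameters stay unchanged.
booster_space = {
    "gbtree": None,
    "dart": {
        # choice() parameters instead of nested {"uniform": None, ...} branches
        "sample_type": choco.choice(["uniform", "weighted"]),
        "normalize_type": choco.choice(["tree", "forest"]),
        "rate_drop": choco.uniform(0.0, 1.0),
        "skip_drop": choco.uniform(0.0, 1.0),
    },
}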

Disappearance of Search Space

Hi there,

I'm optimizing using Bayes. For some reason, after 50 iterations, my search space disappeared (no job has successfully run since) and degraded from:

43|0.530283581420485|0.594507535272973|0.0956133794822555|0.183174425952106|0.324464855577522|0.18321976289463|0.447247428760473|0.494862238036925|0.594291236564883|0.13365381153193|0.621074537333421|0.985043314092192|0.673105557564046|0.596371641856695|0.239961433973444|0.875564411418133|0.68697429593277|0.816370003321507|0.522079915259265|0.299775777059482|0.317955724207088|0.254591595114779|42|0.287920729412387
44|0.0171569182510461|0.523320898649622|0.69997076052281|0.727165793748551|0.936893202909011|0.832087118258521|0.817707066616694|0.043945601495308|0.172307370813185|0.604412420534715|0.698271229512247|0.714185614077469|0.268339034524112|0.864922001768117|0.11006805138535|0.157309044576634|0.234912451561038|0.603408986782483|0.167489595204554|0.924554238220891|0.584603069868852|0.257061923179376|43|
45|0.380566059855455|0.933064088981559|0.933714630646887|0.0744530249769997|0.388265600889486|0.944433675981516|0.0569716535292384|0.622625741906597|0.824726859177882|0.286916684031226|0.0461006760876181|0.367887978131571|0.143214333369612|0.608785397554655|0.355639695538943|0.443461587061983|0.0395272464760753|0.432397938687212|0.916889202326262|0.788378929495989|0.966677421598842|0.203127582215792|44|
46|0.253411368785518|0.0777907741815148|0.876500079587662|0.321163552224698|0.261745976080779|0.567233304826923|0.614005484432338|0.561732707323025|0.303126185751605|0.108269578069825|0.742713447522607|0.668533849457911|0.0944285609757195|0.0105585222315285|0.0661138279019602|0.408501621538934|0.0442106972122589|0.555647870999506|0.964949584291862|0.92392717661155|0.971449433224484|0.487447863638309|45|0.310609024949372
47|0.0298706617302658|0.240152555751103|0.090250446753992|0.579542641511725|0.950262983355336|0.511885716113274|0.644236640943123|0.581846488931566|0.713375881433834|0.667854374379905|0.460655785871363|0.565685697090619|0.297101361852681|0.333863237438923|0.914182987219078|0.781396972753748|0.789599736757716|0.329955374513446|0.19818814113627|0.502965587267886|0.4976784227606|0.0491609756261113|46|
48|0.00885569460517077|0.118405123983955|0.557886102430734|0.0952691739938708|0.08662104381455|0.536737574589037|0.400385295217095|0.627107524687692|0.810342690307004|0.663091784708001|0.38270787596168|0.0886371654368058|0.406951667476664|0.349650275208202|0.786681378478588|0.908111690347794|0.843372779162317|0.151473114416399|0.100487967891708|0.817909681798678|0.895492049701731|0.184908076406235|47|0.254542940994725
49|0.0283817259954816|0.34723398483473|0.0874992440489671|0.264315861727955|0.444942900253486|0.21283369509211|0.00847858108027655|0.707910239646895|0.563197022676976|0.200059736335963|0.984806237389409|0.253012615556054|0.192273064061049|0.93755194497991|0.0270019825026756|0.642695827742226|0.154103034542351|0.910466695738626|0.44607513890935|0.983591745982839|0.269964783615805|0.363716220251394|48|

into:

52|0.999999|0.999999|0.0|0.999999|0.0|0.666662114854188|0.0|0.999999|0.999999|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|51|
53|0.0|0.999999|0.0|0.0|0.237397505392564|0.333353925188991|0.999999|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.99480679509398|52|
54|0.999999|0.999999|0.999999|0.0|0.0|0.330118936455318|0.0|0.0|0.999999|0.0|0.0|0.0|0.666662174539248|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.999999|53|
55|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.999999|0.999999|0.999999|0.999999|0.0|0.0|0.0|0.0|0.999999|0.0|0.999999|0.999999|0.999999|54|
56|0.999999|0.0|0.999999|0.999999|0.0|0.152723969423227|0.999999|0.0|0.0|0.0|0.0|0.0|0.333335688983986|0.0|0.0|0.999999|0.0|0.0|0.0|0.0|0.999999|0.0|55|
57|0.999999|0.0|0.999999|0.0|0.999999|0.38312686338175|0.0|0.0526761582384528|0.0|0.0|0.0|0.0|0.333323312054163|0.0|0.900502413181104|0.0|0.0|0.0|0.0|0.967615307563784|0.0|0.0|56|
58|0.999999|0.0|0.999999|0.999999|0.0|0.666661372626746|0.0|0.0|0.0|0.999999|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|57|
59|0.999999|0.0|0.0|0.999999|0.999999|0.666688133500132|0.0|0.0|0.0|0.0|0.0|0.0|0.719368432365075|0.0|0.999999|0.999999|0.0|0.0|0.0|0.0|0.999999|0.999999|58|
60|0.999999|0.0|0.971209972397585|0.999999|0.999999|0.666666492771594|0.999999|0.0|0.0|0.0|0.0|0.0|0.941648128350468|0.0|0.0|0.0|0.0|0.0|0.999999|0.999999|0.999999|0.999999|59|
61|0.0|0.999999|0.0|0.999999|0.999999|0.333278684727739|0.0|0.0|0.0|0.0|0.0|0.0|0.241231958085866|0.0|0.999999|0.999999|0.0|0.0|0.0|0.0|0.999999|0.999999|60|
62|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.999999|0.0|0.0|0.0|0.33341907011072|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.999999|0.0|61|
63|0.0|0.0|0.999999|0.999999|0.0|0.872050888368733|0.0|0.0|0.0|0.0|0.0|0.0|0.666671709977091|0.0|0.0|0.0|0.0|0.999999|0.0|0.999999|0.0|0.999999|62|
64|0.999999|0.0|0.999999|0.999999|0.0|0.666666435941093|0.0|0.0|0.0|0.999999|0.0|0.0|0.999999|0.0|0.0|0.0|0.999999|0.999999|0.999999|0.0|0.0|0.0|63|
65|0.0|0.999999|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|-2.77555756156289e-17|0.999999|64|
66|0.0|0.999999|0.999999|0.999999|0.999999|0.354959525945384|0.999999|0.0|0.0|0.0|0.0|0.0|0.333398139070798|0.999999|0.0|0.0|0.0|0.0|0.0|0.0|0.999999|0.999999|65|
67|0.0|0.999999|0.930855008554571|0.0|0.0414986078171875|0.666666692338916|0.0|0.999999|0.999999|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.999999|0.58272672066777|0.0|0.0|0.938542037908141|0.84414554050701|66|
68|0.0|0.999999|0.999999|0.0|0.999999|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.999999|0.0|0.0|0.0|0.999999|0.0|0.999999|67|
69|0.0|0.0|0.999999|0.0|0.999999|0.333322400996303|0.0|0.999999|0.999999|0.0|0.0|0.0|0.391332714852607|0.999999|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.999999|68|
70|0.999999|0.651042976081047|0.0|0.999999|0.999999|0.70726506425955|0.0|0.0|0.0|0.74419997053051|0.999999|0.999999|0.666692486118609|0.0|0.0|0.0|0.999999|0.999999|0.0|0.0|0.0|0.999999|69|
71|0.999999|0.0|0.999999|-1.38777878078145e-17|0.999999|0.999999|0.0|0.0|0.0|0.0|0.999999|0.0|0.999999|0.0|0.0|0.0|0.999999|0.999999|0.0|0.999999|0.999999|0.0|70|
72|0.999999|0.999999|0.999999|0.0|0.0|0.333417588213429|0.0|0.0|0.999999|0.0|0.0|0.0|0.676891573223725|0.0|0.0|0.0|0.0|0.916211729683123|0.999999|0.0|0.999999|0.999999|71|
73|0.0|0.999999|0.0|0.0|0.999999|0.33341080482224|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.0|72|
74|0.999999|0.0|0.999999|0.999999|0.0|0.999999|0.0|0.0|0.0|0.999999|0.0|0.0|0.999999|0.0|0.0|0.0|0.999999|0.999999|0.999999|0.0|0.0|0.0|73|
75|0.999999|0.999999|0.0|0.0|0.999999|1.38777878078145e-17|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.0|0.999999|0.999999|74|
76|0.0|0.0|0.0|0.999999|0.0|0.35731087627765|0.0|0.999999|0.0|0.0|0.0|0.0|0.667162676980283|0.763568451269825|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|75|
77|0.0|0.999999|0.0|0.999999|0.0|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.665914393042038|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.999999|76|
78|0.999999|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.999999|0.999999|0.0|77|
79|0.0|0.999999|0.999999|0.999999|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.999999|0.0|0.0|0.0|0.999999|0.999999|0.999999|0.999999|0.0|0.999999|78|

Note that a lot of losses are not returned; this is due to the nature of the algorithm, which can sometimes fail under memory constraints. But these final failures seem to be due to a bad search space that causes failures for reasons unrelated to memory. Help much appreciated!

TypeError in database Space table.

I've had no issue accessing a database that I generated on my personal computer. However, I have attempted to run the same Python script on a distributed Linux network, and the Space table entry seems to be a bunch of text rather than the usual BLOB binary I've seen on my local computer. When I try to transfer the database file onto my personal computer, trying to access the data to obtain the 'translated' variables returns the following error:


Traceback (most recent call last):
  File "c:\Users\USER\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\ptvsd_launcher.py", line 45, in <module>
    main(ptvsdArgs)
  File "c:\Users\USER\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\__main__.py", line 265, in main
    wait=args.wait)
  File "c:\Users\USER\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\__main__.py", line 256, in handle_args
    run_main(addr, name, kind, *extra, **kwargs)
  File "c:\Users\USER\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_local.py", line 52, in run_main
    runner(addr, name, kind == 'module', *extra, **kwargs)
  File "c:\Users\USER\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\runner.py", line 32, in run
    set_trace=False)
  File "c:\Users\USER\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1283, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "c:\Users\USER\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1290, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "c:\Users\USER\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\_pydev_imps\_pydev_execfile.py", line 25, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "c:\Users\USER\Desktop\Work Related Items\database_reader.py", line 44, in <module>
    ConfirmF(choco.SQLiteConnection("sqlite:///1.db"))
  File "c:\Users\USER\Desktop\Work Related Items\database_reader.py", line 32, in ConfirmF
    bestDict = GetBest5Results(databaseConnection, False)
  File "c:\Users\USER\Desktop\Work Related Items\database_reader.py", line 19, in GetBest5Results
    resultsTable = databaseConnection.results_as_dataframe()
  File "C:\Users\USER\Anaconda3\lib\site-packages\chocolate\base.py", line 60, in results_as_dataframe
    s = self.get_space()
  File "C:\Users\USER\Anaconda3\lib\site-packages\chocolate\connection\sqlite.py", line 189, in get_space
    return pickle.loads(db[self.space_table_name].find_one()["space"])
TypeError: a bytes-like object is required, not 'str'

I think I've managed to alleviate the issue by exporting the Space table from the database created by the script running on my personal computer and importing it into the other database, but the variables returned this way don't give the same loss value as the database suggests, which is concerning.

Is there any way to prevent the Space table being generated as plain text instead of a BLOB binary as it's meant to be? Additionally, if two problems have the same bounds for the same variables, are their Space tables interchangeable?

Additional experiment data recording into the db

Would it be possible to allow additional data from the experiments to be recorded into the database, similarly to how hyperopt allows arbitrary attachments? This would be useful for recording additional metrics that are not used in the optimisation but are good to have otherwise (e.g. finding the best hyperparameters based on validation scores while also saving test scores for later reporting, or optimising accuracy while also saving confusion matrices).

Unable to use the MongoDB connection

Hello,
I am using chocolate to search over my parameter space. Currently I am unable to import the MongoDBConnection class.

Error below:

Traceback (most recent call last):
  File "svm_opt.py", line 40, in <module>
    conn = choco.MongoDBConnection(url="mongodb://127.0.0.1:27017")
AttributeError: module 'chocolate' has no attribute 'MongoDBConnection'

Another question: why is it mandatory to connect to a database? (Maybe I am missing something? It would have been easier to get started without the overhead of installing MongoDB or SQLite.)

Thanks
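
Two hedged notes based on the Dependencies section above: PyMongo is not installed automatically, so pip install pymongo may be what is missing for MongoDBConnection; and if you want to avoid any external database entirely, the in-memory pandas backend shown elsewhere on this page could be a lighter-weight starting point, for example:

import chocolate as choco

space = {"x": choco.uniform(low=0, high=1)}  # illustrative space
conn = choco.DataFrameConnection()           # no MongoDB server or SQLite file required
sampler = choco.Random(conn, space)
token, params = sampler.next()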
