msmbuilder / osprey

🦅Hyperparameter optimization for machine learning pipelines 🦅

Home Page: http://msmbuilder.org/osprey

License: Apache License 2.0

Python 95.90% Shell 2.63% Batchfile 0.03% TeX 1.45%

python scikit-learn hyperparameter-optimization models cross-validation machine-learning optimization pretty-logo

osprey's Introduction

Osprey

Osprey is an easy-to-use tool for hyperparameter optimization of machine learning algorithms in Python using scikit-learn (or using scikit-learn compatible APIs).

Each Osprey experiment combines a dataset, an estimator, a search space (and engine), cross validation and asynchronous serialization for distributed parallel optimization of model hyperparameters.

Documentation

For full documentation, please visit the Osprey homepage.

Installation

If you have an Anaconda Python distribution, installation is as easy as:

$ conda install -c omnia osprey

You can also install Osprey with pip:

$ pip install osprey

Alternatively, you can install directly from this GitHub repo:

$ git clone https://github.com/msmbuilder/osprey.git
$ cd osprey && git checkout 1.1.0
$ python setup.py install

Example using MSMBuilder

Below is an example of an osprey config file to cross validate Markov state models based on varying the number of clusters and dihedral angles used in a model:

estimator:
  eval_scope: msmbuilder
  eval: |
    Pipeline([
        ('featurizer', DihedralFeaturizer(types=['phi', 'psi'])),
        ('cluster', MiniBatchKMeans()),
        ('msm', MarkovStateModel(n_timescales=5, verbose=False)),
    ])

search_space:
  cluster__n_clusters:
    min: 10
    max: 100
    type: int
  featurizer__types:
    choices:
      - ['phi', 'psi']
      - ['phi', 'psi', 'chi1']
    type: enum

cv: 5

dataset_loader:
  name: mdtraj
  params:
    trajectories: ~/local/msmbuilder/Tutorial/XTC/*/*.xtc
    topology: ~/local/msmbuilder/Tutorial/native.pdb
    stride: 1

trials:
    uri: sqlite:///osprey-trials.db

Then run osprey worker. You can run multiple parallel instances of osprey worker simultaneously on a cluster too.

$ osprey worker config.yaml

...

----------------------------------------------------------------------
Beginning iteration                                              1 / 1
----------------------------------------------------------------------
History contains: 0 trials
Choosing next hyperparameters with random...
  {'cluster__n_clusters': 20, 'featurizer__types': ['phi', 'psi']}

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    1.8s finished
---------------------------------
Success! Model score = 4.080646
(best score so far   = 4.080646)
---------------------------------

1/1 models fit successfully.
time:         October 27, 2014 10:44 PM
elapsed:      4 seconds.
osprey worker exiting.
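
The `random` strategy shown in the log above can be pictured as drawing each variable independently from its bounds or choices. The following is only a minimal sketch under that assumption: `SEARCH_SPACE` is a plain dict mirroring the config above and `sample_random` is a hypothetical helper, neither is Osprey's actual API.

```python
import random

# Plain-dict mirror of the search_space section above (not Osprey objects).
SEARCH_SPACE = {
    "cluster__n_clusters": {"type": "int", "min": 10, "max": 100},
    "featurizer__types": {
        "type": "enum",
        "choices": [["phi", "psi"], ["phi", "psi", "chi1"]],
    },
}


def sample_random(space, rng):
    """Hypothetical helper: draw one value per variable, uniformly."""
    params = {}
    for name, spec in space.items():
        if spec["type"] == "int":
            # randint includes both endpoints
            params[name] = rng.randint(spec["min"], spec["max"])
        elif spec["type"] == "float":
            params[name] = rng.uniform(spec["min"], spec["max"])
        elif spec["type"] == "enum":
            params[name] = rng.choice(spec["choices"])
        else:
            raise ValueError("unknown variable type: %r" % spec["type"])
    return params


params = sample_random(SEARCH_SPACE, random.Random(0))
print(params)
```

Each call yields one candidate dictionary of the same shape as the one printed after "Choosing next hyperparameters with random...".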

You can dump the database to JSON or CSV with osprey dump.

Dependencies

python>=2.7.11
six>=1.10.0
pyyaml>=3.11
numpy>=1.10.4
scipy>=0.17.0
scikit-learn>=0.17.0
sqlalchemy>=1.0.10
bokeh>=0.12.0
matplotlib>=1.5.0
pandas>=0.18.0
GPy (optional, required for gp strategy)
hyperopt (optional, required for hyperopt_tpe strategy)
nose (optional, for testing)

Contributing

In case you encounter any issues with this package, please consider submitting a ticket to the GitHub Issue Tracker. We also welcome any feature requests and highly encourage users to submit pull requests for bug fixes and improvements.

For more detailed information, please refer to our documentation.

Citing

If you use Osprey in your research, please cite:

@misc{osprey,
  author       = {Robert T. McGibbon and
                  Carlos X. Hernández and
                  Matthew P. Harrigan and
                  Steven Kearnes and
                  Mohammad M. Sultan and
                  Stanislaw Jastrzebski and
                  Brooke E. Husic and
                  Vijay S. Pande},
  title        = {Osprey: Hyperparameter Optimization for Machine Learning},
  month        = sep,
  year         = 2016,
  doi          = {10.21105/joss.00034},
  url          = {http://dx.doi.org/10.21105/joss.00034}
}

osprey's Issues

PBS array job example?

Would it make more sense to control jobs using PBS array jobs? For example, wouldn't the following be the easiest way to control a number of "rounds" of optimization with finite resources?

cat submit.sh
[...]
#PBS -l nodes=1:ppn=8
[...]
osprey worker tica.yaml -n 1 > osprey.$PBS_JOBID.log

qsub submit.sh -q gpu -t 0-25%3

I can file a PR if people think this is a clean way to do things...

Merge Osprey with MSMBuilder

This was brought up as a possibility while discussing #100. Consolidating the two repos would be nice because they're fairly complementary, and directing people to another set of documentation seems somewhat cumbersome from a UX perspective.

On the other hand, osprey isn't limited to working with MSMs....

Remove trailing / on moe url

The HTTP request fails if the URL contains a double slash (//), which happens when the configured MOE URL ends with a trailing /.

It would be nice to handle this gracefully.
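
One way to tolerate a trailing slash is to normalize both pieces before joining them. This is a minimal sketch, not the code in search_engines.py; `join_url`, the host, and the endpoint path are all illustrative assumptions.

```python
def join_url(base, path):
    """Join base and path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical host and endpoint, with and without the trailing slash.
print(join_url("http://localhost:6543/", "/gp/next_points/epi"))
print(join_url("http://localhost:6543", "gp/next_points/epi"))
```

Both calls produce the same URL, so the user-supplied value no longer has to match a particular convention.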

Repository moved to pandegroup/osprey

The repository is now located at https://github.com/pandegroup/osprey. Please update your remote URLs.

DEAP cross-validation

rsteca/sklearn-deap seemed interesting

Tune MOE runtime

With 0.3-6-g1188d75, in a six-dimensional search space, MOE often takes longer to suggest the next point than a model takes to fit: each model takes 60-90 seconds to fit, while many of the timings below exceed that. I would be fine with relaxing the convergence criteria or reducing the number of stochastic restarts during the GPEI computation.

$ grep 'moe tooks'  osprey-worker.1993610.biox3-frontend-1.stanford.edu.*.log
osprey-worker.1993610.biox3-frontend-1.stanford.edu.1.log:(moe tooks 1.128 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.1.log:(moe tooks 113.229 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.1.log:(moe tooks 190.670 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.1.log:(moe tooks 88.083 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.1.log:(moe tooks 98.537 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.2.log:(moe tooks 0.016 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.2.log:(moe tooks 39.760 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.2.log:(moe tooks 43.069 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.2.log:(moe tooks 169.744 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.2.log:(moe tooks 93.119 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.3.log:(moe tooks 0.016 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.3.log:(moe tooks 0.030 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.3.log:(moe tooks 108.028 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.3.log:(moe tooks 48.432 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.3.log:(moe tooks 158.361 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.3.log:(moe tooks 209.495 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.4.log:(moe tooks 0.016 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.4.log:(moe tooks 15.662 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.4.log:(moe tooks 40.181 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.4.log:(moe tooks 166.436 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.4.log:(moe tooks 152.925 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.4.log:(moe tooks 0.340 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.4.log:(moe tooks 89.860 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.5.log:(moe tooks 0.008 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.5.log:(moe tooks 56.567 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.5.log:(moe tooks 32.840 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.5.log:(moe tooks 81.684 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.5.log:(moe tooks 99.516 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.5.log:(moe tooks 177.335 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.6.log:(moe tooks 0.016 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.6.log:(moe tooks 26.384 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.6.log:(moe tooks 102.025 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.6.log:(moe tooks 120.774 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.6.log:(moe tooks 140.556 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.6.log:(moe tooks 191.513 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.7.log:(moe tooks 0.016 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.7.log:(moe tooks 62.712 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.7.log:(moe tooks 82.180 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.7.log:(moe tooks 0.166 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.7.log:(moe tooks 94.804 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.7.log:(moe tooks 238.581 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.8.log:(moe tooks 2.645 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.8.log:(moe tooks 58.871 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.8.log:(moe tooks 61.745 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.8.log:(moe tooks 180.421 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.8.log:(moe tooks 108.552 s)
osprey-worker.1993610.biox3-frontend-1.stanford.edu.8.log:(moe tooks 130.263 s)

MOE not being detected.

I am having trouble getting osprey to detect moe even though it was working before. I already ran conda install moe, and ipython picks it up just fine. Any ideas on how to go about debugging this? I have already tried rebuilding osprey and reinstalling moe.

$ ipython 
Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 15 2014, 10:33:51) 
Type "copyright", "credits" or "license" for more information.

IPython 2.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import moe


In [2]: moe.__version__ 
Out[2]: '0.2.1'

In [3]:

Fewer connections

Currently, the osprey worker process opens a DB connection when it boots up and holds it for the duration of its lifetime. Most of the time, that connection is unused: for each model that gets fit (>1 using --n-iters), the worker makes 2 requests to the DB, one to get the history and one to store the result, and each iteration takes minutes or hours because it involves fitting the model. So if you run 500 simultaneous workers, the DB has to maintain 500 simultaneous connections, even though, since the runtime of each model fit is somewhat stochastic, probably only 1 or 2 connections are in use at any given time.

cc @mpharrigan, who was seeing MySQL complain about too many active connections.

The way to fix this is to only open the session for a short time inside the loop, and then close it, and not to hold a persistent session handle.
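
The proposed fix can be sketched with the standard-library `sqlite3` module standing in for the real SQLAlchemy session. The table layout and the `init_db`, `load_history`, `store_result`, and `run_worker` helpers are hypothetical; the point is only that a connection exists for the two short DB requests and not during the fit.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def init_db(db_path):
    """Create a toy trials table (hypothetical schema)."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS trials (params TEXT, score REAL)")


def load_history(db_path):
    """Open a connection, read the history, and close it again."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("SELECT params, score FROM trials").fetchall()


def store_result(db_path, params, score):
    """Open a connection, write one result, and close it again."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            conn.execute(
                "INSERT INTO trials (params, score) VALUES (?, ?)",
                (params, score))


def run_worker(db_path, n_iters, fit):
    for _ in range(n_iters):
        history = load_history(db_path)       # short-lived connection
        params, score = fit(history)          # long fit, no connection held
        store_result(db_path, params, score)  # short-lived connection


def fake_fit(history):
    """Stand-in for a model fit that would take minutes or hours."""
    n_clusters = 10 + len(history)
    return "n_clusters=%d" % n_clusters, float(n_clusters)


db_path = os.path.join(tempfile.mkdtemp(), "trials.db")
init_db(db_path)
run_worker(db_path, 3, fake_fit)
print(len(load_history(db_path)))  # 3
```

With SQLAlchemy, the same shape would mean creating and closing the session inside the loop body rather than once at start-up.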

Push docs to msmbuilder.org

The last successful build was 6 months ago. @rmcgibbo: 1) How can I become an admin for this project on ReadTheDocs and 2) any ideas?

Seems to be failing to install numpy:

Building wheels for collected packages: numpy, scipy, scikit-learn, sqlalchemy
  Running setup.py bdist_wheel for numpy
  Stored in directory: /home/docs/checkouts/readthedocs.org/user_builds/osprey/.cache/pip/wheels/66/f5/d7/f6ddd78b61037fcb51a3e32c9cd276e292343cdd62d5384efd
  Running setup.py bdist_wheel for scipy
  Complete output from command /home/docs/checkouts/readthedocs.org/user_builds/osprey/envs/master/bin/python -c "import setuptools;__file__='/tmp/pip-build-bhTBoJ/scipy/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /tmp/tmpFpfE9apip-wheel-:
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-build-bhTBoJ/scipy/setup.py", line 265, in <module>
      setup_package()
    File "/tmp/pip-build-bhTBoJ/scipy/setup.py", line 253, in setup_package
      from numpy.distutils.core import setup
  ImportError: No module named numpy.distutils.core

Sporadic input / output error

Python 3.4 using MySQL database and oursql plugin for sqlalchemy

20 / 358 runs failed with the following

Traceback (most recent call last):
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/execute_worker.py", line 95, in run_single_trial
    grid.fit(X, y)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/grid_search.py", line 599, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/grid_search.py", line 381, in _fit
    for parameters in parameter_iterable
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 644, in __call__
    self.dispatch(function, args, kwargs)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 391, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 129, in __init__
    self.results = func(*args, **kwargs)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/cross_validation.py", line 1231, in _fit_and_score
    estimator.fit(X_train, **fit_params)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/pipeline.py", line 129, in fit
    Xt, fit_params = self._pre_transform(X, y, **fit_params)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/pipeline.py", line 119, in _pre_transform
    Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/mixtape-0.2-py3.4-linux-x86_64.egg/mixtape/tica.py", line 365, in fit_transform
    self.fit(sequences)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/mixtape-0.2-py3.4-linux-x86_64.egg/mixtape/tica.py", line 271, in fit
    self._fit(X)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/mixtape-0.2-py3.4-linux-x86_64.egg/mixtape/tica.py", line 379, in _fit
    self._sum_0_to_TminusTau += X[:-self.lag_time].sum(axis=0)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/numpy/core/_methods.py", line 25, in _sum
    out=out, keepdims=keepdims)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/execute_worker.py", line 50, in signal_hander
    print_footer(statuses, start_time, signum)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/execute_worker.py", line 145, in print_footer
    print()
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/utils.py", line 47, in write
    self.stream.write(data)
OSError: [Errno 5] Input/output error

Dealing with REST API failures

For L207 in search_engines.py, you could try using this nifty echo function.

Integers from MOE search_space are passed as Floats into grid search params

Here are my parameters from config.yaml:

search_space:
  # the search space is specified by listing the variables you want to
  # optimize over and their bounds for float and int typed variables,
  # or the possible choices for enumeration-typed variables.
  tica__lag_time:
    min: 750
    max: 2000
    type: int 

  tica__n_components: 
    min: 2
    max: 12 
    type: int  

  cluster__n_clusters:
    min: 12
    max: 500
    type: int       # from 12 to 500 (with inclusive endpoints)

  tica__gamma: 
    min: 1e-10
    max: 1e-1
    type: float
    warp: log       # optimize using the log of the parameter

Here's the traceback:

------------------------------------------------------------------------------
Exception encountered while fitting model
------------------------------------------------------------------------------
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/osprey/execute_worker.py", line 116, in run_single_trial
    grid.fit(X, y)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/grid_search.py", line 596, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/grid_search.py", line 378, in _fit
    for parameters in parameter_iterable
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 653, in __call__
    self.dispatch(function, args, kwargs)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 400, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 138, in __init__
    self.results = func(*args, **kwargs)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1237, in _fit_and_score
    estimator.fit(X_train, **fit_params)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/pipeline.py", line 129, in fit
    Xt, fit_params = self._pre_transform(X, y, **fit_params)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/pipeline.py", line 119, in _pre_transform
    Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
  File "/home/cxh/anaconda/lib/python2.7/site-packages/mixtape/cluster/__init__.py", line 149, in fit_transform
    return self.fit_predict(sequences, y)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/mixtape/cluster/__init__.py", line 134, in fit_predict
    labels = s.fit_predict(sequences)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py", line 740, in fit_predict
    return self.fit(X).labels_
  File "/home/cxh/anaconda/lib/python2.7/site-packages/mixtape/cluster/__init__.py", line 67, in fit
    s.fit(self._concat(sequences))
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py", line 1201, in fit
    init_size=init_size)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py", line 555, in _init_centroids
    x_squared_norms=x_squared_norms)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sklearn/cluster/k_means_.py", line 101, in _k_init
    for c in range(1, n_clusters):
TypeError: range() integer end argument expected, got float.
------------------------------------------------------------------------------
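
A possible workaround is to cast each suggested value back to the type declared in the search space before handing it to the estimator. This is a sketch under the assumption that suggestions arrive as a flat dict of floats; `cast_to_declared_types` is a hypothetical helper, and the suggested values are made up for illustration.

```python
def cast_to_declared_types(params, search_space):
    """Cast values of variables declared as `type: int` back to int."""
    fixed = {}
    for name, value in params.items():
        declared = search_space.get(name, {}).get("type")
        if declared == "int":
            fixed[name] = int(round(value))
        else:
            fixed[name] = value
    return fixed


# Plain-dict mirror of the config above.
search_space = {
    "tica__lag_time": {"type": "int", "min": 750, "max": 2000},
    "tica__n_components": {"type": "int", "min": 2, "max": 12},
    "cluster__n_clusters": {"type": "int", "min": 12, "max": 500},
    "tica__gamma": {"type": "float", "min": 1e-10, "max": 1e-1},
}

# Made-up suggestion in which every value comes back as a float.
suggested = {
    "tica__lag_time": 1375.0,
    "tica__n_components": 7.0,
    "cluster__n_clusters": 256.0,
    "tica__gamma": 1e-5,
}

fixed = cast_to_declared_types(suggested, search_space)
print(fixed)
```

After the cast, `range(1, fixed["cluster__n_clusters"])` no longer raises the TypeError shown in the traceback.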

Release v1.0

We should cut a v1.0 release and upload it to pypi at some point

Some TODOs:

GMRQ score greater than N+1 Implied Timescales

Haven't seen this one before... I'm using 5 implied timescales, and one of the CVs got a score of 21.6???

[FR] Print MSMBuilder / MDTraj version

When osprey worker runs using MSMBuilder models, it should print the version of MSMBuilder being used. If other types of models are being used, like just scikit-learn stuff, then obviously it should not.
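
One way to decide whether to print a version is to look at the top-level package of each object in the pipeline. The sketch below is an assumption about how this could work, not existing Osprey behavior; `package_versions` is a hypothetical helper, and the demo uses the standard-library `json` package as a stand-in for msmbuilder so that it runs anywhere.

```python
import importlib
import json


def package_versions(objects, packages=("msmbuilder",)):
    """Return {package: version} for listed packages that define an object."""
    versions = {}
    for obj in objects:
        top_level = type(obj).__module__.split(".")[0]
        if top_level in packages and top_level not in versions:
            module = importlib.import_module(top_level)
            versions[top_level] = getattr(module, "__version__", "unknown")
    return versions


# Stand-in demo: a json object is reported, a plain object is not.
print(package_versions([json.JSONDecoder(), object()], packages=("json",)))
```

For a scikit-learn Pipeline, the list of objects would be the steps themselves, for example `[step for _, step in pipeline.steps]`.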

Sporadic "no iteritems" error

4 / 358 runs failed with:

Traceback (most recent call last):
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/execute_worker.py", line 95, in run_single_trial
    grid.fit(X, y)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/grid_search.py", line 599, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/grid_search.py", line 381, in _fit
    for parameters in parameter_iterable
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 644, in __call__
    self.dispatch(function, args, kwargs)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 391, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/parallel.py", line 129, in __init__
    self.results = func(*args, **kwargs)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/cross_validation.py", line 1231, in _fit_and_score
    estimator.fit(X_train, **fit_params)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/pipeline.py", line 129, in fit
    Xt, fit_params = self._pre_transform(X, y, **fit_params)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/pipeline.py", line 119, in _pre_transform
    Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/base.py", line 426, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/biox3/home/harrigan/compute/wetmsm/kchan/osprey/custom_loader.py", line 32, in transform
    for ftype in do
  File "/biox3/home/harrigan/compute/wetmsm/kchan/osprey/custom_loader.py", line 33, in <listcomp>
    if do[ftype]]
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 428, in load
    obj = unpickler.load()
  File "/home/harrigan/opt/python/lib/python3.4/pickle.py", line 1036, in load
    dispatch[key[0]](self)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 293, in load_build
    array = nd_array_wrapper.read(self)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 112, in read
    mmap_mode=unpickler.mmap_mode)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/numpy/lib/npyio.py", line 394, in load
    return format.read_array(fid)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/numpy/lib/format.py", line 450, in read_array
    array = numpy.fromfile(fp, dtype=dtype, count=count)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/execute_worker.py", line 50, in signal_hander
    print_footer(statuses, start_time, signum)
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/execute_worker.py", line 148, in print_footer
    sigmap = dict((k, v) for v, k in signal.__dict__.iteritems()
AttributeError: 'dict' object has no attribute 'iteritems'
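
`dict.iteritems()` exists only on Python 2, while `items()` works on both Python 2 and 3. A minimal sketch of a portable signal-number-to-name map follows; the `SIG` name filter is an assumption, since the traceback cuts off the rest of the original expression.

```python
import signal


def build_signal_map():
    """Map signal numbers to names such as 'SIGINT'."""
    return dict(
        (number, name)
        for name, number in vars(signal).items()
        # Keep SIGINT, SIGTERM, ...; skip SIG_DFL, SIG_IGN, and helpers.
        if name.startswith("SIG") and not name.startswith("SIG_")
    )


sigmap = build_signal_map()
print(sigmap[signal.SIGINT])  # SIGINT
```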

Database disk image is malformed

An unexpected error has occurred with osprey (version 0.2_15_gf7419e1-py3.4.egg), please
consider sending the following traceback to the osprey GitHub issue tracker at:

        https://github.com/rmcgibbo/osprey/issues                              

Traceback (most recent call last):                                             
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 525, in _commit_impl
    self.engine.dialect.do_commit(self.connection)                             
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 409, in do_commit
    dbapi_connection.commit()                                                  
sqlite3.OperationalError: disk I/O error                                       

The above exception was the direct cause of the following exception:           

Traceback (most recent call last):                                             
  File "/home/harrigan/opt/python/bin/osprey", line 9, in <module>             
    load_entry_point('osprey==0.2-15-gf7419e1', 'console_scripts', 'osprey')() 
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/cli/main.py", line 33, in main
    args_func(args, p)                                                         
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/cli/main.py", line 38, in args_func
    args.func(args, p)                                                         
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/cli/parser_worker.py", line 8, in func
    execute(args, parser)                                                      
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/execute_worker.py", line 71, in execute
    params=params, cv=cv, config_sha1=config_sha1, session=session)            
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.2_15_gf7419e1-py3.4.egg/osprey/execute_worker.py", line 89, in run_single_trial
    session.commit()                                                           
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/orm/session.py", line 776, in commit
    self.transaction.commit()                                                  
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/orm/session.py", line 381, in commit
    t[1].commit()                                                              
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1333, in commit
    self._do_commit()                                                          
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1364, in _do_commit
    self.connection._commit_impl()                                             
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 527, in _commit_impl
    self._handle_dbapi_exception(e, None, None, None, None)                    
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1159, in _handle_dbapi_exception
    exc_info                                                                   
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 188, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=exc_value)            
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 181, in reraise
    raise value.with_traceback(tb)                                             
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 525, in _commit_impl
    self.engine.dialect.do_commit(self.connection)                             
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 409, in do_commit
    dbapi_connection.commit()                                                  
sqlalchemy.exc.OperationalError: (OperationalError) disk I/O error None None

Problem with doc deployment

Somehow, the top-level index.html that is supposed to redirect to development/ or 0.4/ got overwritten.

[FR] Support testing of a variable number of features from a datatable

e.g. if I'm loading an hdf5 dataset which has pre-featurized trajectory data, I can make an enum that selects which columns to use. The config.yaml might look like:

    eval: |
        Pipeline([
                ('featurizer', AbstractFeaturizer(columns=[0, 1,2])),
                ('tica', tICA(n_components=4)),
                ('cluster', MiniBatchKMeans()),
                ('msm', MarkovStateModel(n_timescales=5, verbose=False)),
        ])

search_space:
  featurizer__columns:
    choices:
      - [0, 1, 2]
      - [3, 4, 5]
      - [range(3), 4] #support for ranges would be nice
      - 'LOO' # leave one out
    type: enum

Hyperparameter optimization stalls

I'm not sure MOE is always working the way it's supposed to. In this case, it seems to be sampling along the parameter bounds and suggesting the same points repeatedly.

----------------------------------------------------------------------
Beginning iteration                                             8 / 10
----------------------------------------------------------------------
Loading trials database: sqlite:///villin.db...
History contains: 125 trials
Choosing next hyperparameters with moe...
  {'tica__n_components': 2, 'tica__lag_time': 1, 'cluster__n_clusters': 100, 'tica__gamma': 0.099999999999999895, 'featurizer__types': ['phi', 'psi', 'chi1']}
(moe took 6.656 s)

[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:  1.0min
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  5.6min finished
[CV] Using MSMBuilder API n_samples averaging
[CV]   n_train_samples: [717706, 716515, 717335, 711555, 723199]
[CV]   n_test_samples: [730784, 731975, 731155, 736935, 725291]
Loading trials database: sqlite:///villin.db...

....

----------------------------------------------------------------------
Beginning iteration                                            10 / 10
----------------------------------------------------------------------
Loading trials database: sqlite:///villin.db...
History contains: 171 trials
Choosing next hyperparameters with moe...
  {'tica__n_components': 2, 'tica__lag_time': 1, 'cluster__n_clusters': 100, 'tica__gamma': 0.099999999999999895, 'featurizer__types': ['phi', 'psi', 'chi1']}
(moe took 10.258 s)

[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:  1.2min
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  6.3min finished
[CV] Using MSMBuilder API n_samples averaging
[CV]   n_train_samples: [708766, 708440, 725087, 722092, 727750]
[CV]   n_test_samples: [739724, 740050, 723403, 726398, 720740]
Loading trials database: sqlite:///villin.db...

Boolean options aren't supported by Bokeh

I'm using a Boolean enum as one of my parameters, but it seems like JSON encoding in Bokeh can't handle True/False values.

Session output file 'plot.html' already exists, will be overwritten.
An unexpected error has occurred with osprey (version 0.3_47_ge3604d0-py2.7.egg), please
consider sending the following traceback to the osprey GitHub issue tracker at:

        https://github.com/rmcgibbo/osprey/issues

Traceback (most recent call last):
  File "/Users/cu3alibre/anaconda/bin/osprey", line 9, in <module>
    load_entry_point('osprey==0.3-47-ge3604d0', 'console_scripts', 'osprey')()
  File "build/bdist.macosx-10.5-x86_64/egg/osprey/cli/main.py", line 35, in main
  File "build/bdist.macosx-10.5-x86_64/egg/osprey/cli/main.py", line 40, in args_func
  File "build/bdist.macosx-10.5-x86_64/egg/osprey/cli/parser_plot.py", line 8, in func
  File "build/bdist.macosx-10.5-x86_64/egg/osprey/execute_plot.py", line 41, in execute
  File "/Users/cu3alibre/anaconda/lib/python2.7/site-packages/bokeh/plotting.py", line 254, in show
    save(filename, obj=plot)
  File "/Users/cu3alibre/anaconda/lib/python2.7/site-packages/bokeh/plotting.py", line 305, in save
    html = file_html(doc, resources, _default_file['title'])
  File "/Users/cu3alibre/anaconda/lib/python2.7/site-packages/bokeh/embed.py", line 120, in file_html
    script, div = components(plot_object, resources)
  File "/Users/cu3alibre/anaconda/lib/python2.7/site-packages/bokeh/embed.py", line 48, in components
    all_models = serialize_json(plot_object.dump()),
  File "/Users/cu3alibre/anaconda/lib/python2.7/site-packages/bokeh/protocol.py", line 111, in serialize_json
    return json.dumps(obj, cls=encoder, **kwargs)
  File "/Users/cu3alibre/anaconda/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/Users/cu3alibre/anaconda/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/cu3alibre/anaconda/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/Users/cu3alibre/anaconda/lib/python2.7/site-packages/bokeh/protocol.py", line 108, in default
    return self.transform_python_types(obj)
  File "/Users/cu3alibre/anaconda/lib/python2.7/site-packages/bokeh/protocol.py", line 89, in transform_python_types
    return super(BokehJSONEncoder, self).default(obj)
  File "/Users/cu3alibre/anaconda/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: False is not JSON serializable

Add warp=log option to search space variables with type=int

https://github.com/pandegroup/osprey/blob/master/osprey/search_space.py, cc @brookehus
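
For discussion, a minimal sketch of what a log-warped integer variable could look like: draw uniformly in log space, then round back to an integer inside the bounds. The function name and signature below are hypothetical and are not taken from search_space.py.

```python
import math
import random


def sample_log_int(low, high, rng=random):
    """Draw an integer from [low, high] with log-uniform spacing.

    Hypothetical helper for illustration; not part of osprey.
    """
    if low < 1 or high < low:
        raise ValueError("need 1 <= low <= high for a log warp")
    # Sample uniformly between log(low) and log(high), then undo the log.
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Round to the nearest integer and clip to the requested bounds.
    return min(max(int(round(value)), low), high)
```

Small values are drawn much more often than large ones, which is the point of the warp for parameters such as a cluster count that spans several orders of magnitude.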

Strange MOE Behavior

I'm finding that MOE's constant liar only suggests boundary cases, even with a randomly sampled history of 800+ points.

Fix sklearn 0.16 check_arrays issue

Logo for docs

Replace MOE with GPy

Found this GP module: http://sheffieldml.github.io/GPy/

It seems to have better documentation and some nice tutorials.

Problem in sklearn version check

Traceback (most recent call last):                                              
  File "/home/harrigan/opt/python/bin//osprey", line 9, in <module>             
    load_entry_point('osprey==0.4', 'console_scripts', 'osprey')()              
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.4-py3.4.egg/osprey/cli/main.py", line 35, in main
    args_func(args, p)                                                          
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.4-py3.4.egg/osprey/cli/main.py", line 40, in args_func
    args.func(args, p)                                                          
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.4-py3.4.egg/osprey/cli/parser_worker.py", line 7, in func
    from ..execute_worker import execute                                        
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.4-py3.4.egg/osprey/execute_worker.py", line 20, in <module>
    from .fit_estimator import fit_and_score_estimator                          
  File "/home/harrigan/opt/python/lib/python3.4/site-packages/osprey-0.4-py3.4.egg/osprey/fit_estimator.py", line 19, in <module>
    if LooseVersion(sklearn.__version__) < LooseVersion('0.15.0'):              
  File "/home/harrigan/opt/python/lib/python3.4/distutils/version.py", line 58, in __lt__
    c = self._cmp(other)                                                        
  File "/home/harrigan/opt/python/lib/python3.4/distutils/version.py", line 343, in _cmp
    if self.version < other.version:                                            
TypeError: unorderable types: str() < int()
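
One possible way to avoid comparing a str with an int, as a minimal sketch: compare only the leading numeric components of the version string. The helper name is hypothetical, and this assumes that ignoring suffixes such as "dev" or "-git" is acceptable for the check.

```python
import re


def numeric_version(version_string, width=3):
    """Return the leading numeric components of a version as a tuple of ints.

    Hypothetical helper for illustration; parsing stops at the first
    component that does not start with a digit.
    """
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad with zeros so that "0.15" and "0.15.0" compare equal.
    parts += [0] * (width - len(parts))
    return tuple(parts[:width])
```

The check would then read something like numeric_version(sklearn.__version__) < (0, 15, 0), which only ever compares integers.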

Update for 'mixtape' -> 'msmbuilder' changeover

CSV dumps aren't readily usable

I'm not sure why yet, but CSV files from osprey won't load into an Excel spreadsheet.

Fix msmbuilder test

This test really should be fixed or removed, rather than relying on msmbuilder not being installed so that it gets skipped.

Connection to MySQL server lost during query

Traceback (most recent call last):
  File "/home/cxh/anaconda/bin/osprey", line 9, in <module>
    load_entry_point('osprey==0.3-47-ge3604d0-dirty', 'console_scripts', 'osprey')()
  File "build/bdist.linux-x86_64/egg/osprey/cli/main.py", line 35, in main
  File "build/bdist.linux-x86_64/egg/osprey/cli/main.py", line 40, in args_func
  File "build/bdist.linux-x86_64/egg/osprey/cli/parser_worker.py", line 8, in func
  File "build/bdist.linux-x86_64/egg/osprey/execute_worker.py", line 61, in execute
  File "build/bdist.linux-x86_64/egg/osprey/execute_worker.py", line 100, in initialize_trial
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 776, in commit
    self.transaction.commit()
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 377, in commit
    self._prepare_impl()
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 357, in _prepare_impl
    self.session.flush()
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1919, in flush
    self._flush(objects)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
    transaction.rollback(_capture_exception=True)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2001, in _flush
    flush_context.execute()
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
    rec.execute(self)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 526, in execute
    uow
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 65, in save_obj
    mapper, table, insert)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 602, in _emit_insert_statements
    execute(statement, params)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 321, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
    context)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1160, in _handle_dbapi_exception
    exc_info
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 951, in _execute_context
    context)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
sqlalchemy.exc.OperationalError: (OperationalError) (2013, 'Lost connection to MySQL server during query') 'INSERT INTO trials_v1 (project_name, status, parameters, mean_cv_score, cv_scores, started, completed, elapsed, host, user, traceback, config_sha1) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)' ('default', 'PENDING', '{"cluster__compute_labels": true, "cluster__max_iter": 100, "cluster__verbose": 0, "cluster__init": "k-means++", "cluster__n_init": 3, "msm__verbose": false, "cluster__random_state": null, "msm__ergodic_cutoff": 1, "msm__n_timescales": 5, "cluster__batch_size": 10000, "tica__weighted_transform": false, "cluster__max_no_improvement": 10, "cluster__reassignment_ratio": 0.01, "tica__gamma": 1e-10, "msm__reversible_type": "mle", "tica__lag_time": 1035, "cluster__init_size": null, "cluster__tol": 0.0, "msm__lag_time": 1000, "tica__n_components": 4, "cluster__n_clusters": 93, "msm__prior_counts": 0}', None, None, datetime.datetime(2014, 11, 8, 17, 43, 9, 180020), None, None, 'biox3-4-9.stanford.edu', 'cxh', None, '0c218ee0d6ed4559c1214e5d3039b62bd740da89')
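
A possible mitigation, sketched under assumptions: wrap the database write in a retry loop with exponential backoff. The helper below is hypothetical and generic; in osprey the retryable exception would presumably be sqlalchemy.exc.OperationalError, and the session would likely need a rollback before each retry.

```python
import time


def call_with_retries(operation, retryable=(Exception,), attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on the given exception types.

    Hypothetical helper for illustration. The delay doubles after every
    failed attempt; the last failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing sleep as a parameter keeps the helper easy to test without real delays.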

update travis.yml

running on legacy

PCA causes a value error

With ('pca', PCA()) in the configuration file, I get the following error. cc: @cxhernandez

------------------------------------------------------------------------------
Exception encountered while fitting model
------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bhusic/source/osprey/osprey/execute_worker.py", line 125, in run_single_trial
    estimator, params, cv=cv, scoring=scoring, X=X, y=y, verbose=1)
  File "/home/bhusic/source/osprey/osprey/fit_estimator.py", line 59, in fit_and_score_estimator
    for train, test in cv)
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 183, in __init__
    self.results = batch()
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/home/bhusic/source/osprey/osprey/fit_estimator.py", line 124, in _fit_and_score
    estimator.fit(X_train, **fit_params)
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/pipeline.py", line 164, in fit
    Xt, fit_params = self._pre_transform(X, y, **fit_params)
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/pipeline.py", line 145, in _pre_transform
    Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 241, in fit_transform
    U, S, V = self._fit(X)
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 268, in _fit
    X = check_array(X)
  File "/home/bhusic/miniconda3/lib/python2.7/site-packages/sklearn/utils/validation.py", line 393, in check_array
    array = array.astype(np.float64)
ValueError: setting an array element with a sequence.
------------------------------------------------------------------------------

NaNs cause database failure

sqlalchemy.exc.DataError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (DataError) 

(1264, "Out of range value for column 'mean_test_score' at row 1", None)
'UPDATE trials_v3 SET status=?, mean_test_score=?, mean_train_score=?, train_scores=?, test_scores=?, n_train_samples=?, n_test_samples=? WHERE trials_v3.id = ?'
('SUCCEEDED', nan, 5.9117118738995851, '[5.875094311095048, 5.964270097753747, 5.826748523773368, 5.970600009795744, 5.927208228433113]', '[NaN, 5.4756195605412055, -0.22265625, 5.674961419430565, 4.636925810916765]', '[33863, 33863, 36000, 33863, 33863]', '[40000, 40000, 37863, 40000, 40000]', 68)

This is with MySQL. It would be nice if it could insert a NaN instead of raising an exception.

edit: added some line breaks to make it slightly more readable
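
One possible workaround, as a minimal sketch: map NaN to None before the row is written, so the driver stores NULL. This assumes the score column is nullable; both function names are hypothetical.

```python
import math


def nan_to_none(value):
    """Return None for a float NaN so a SQL driver writes NULL.

    Hypothetical helper for illustration; other values pass through.
    """
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def sanitize_scores(scores):
    """Apply nan_to_none to every entry of a list of scores."""
    return [nan_to_none(score) for score in scores]
```

Applied to the test scores above, the first fold would become None and the remaining four values would be left untouched.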

cross-validated rmsd k-medoids without hogging memory?

I was hoping to use MiniBatchKMedoids with the RMSD metric in Osprey. Am I correct in believing that the 5-fold cross-validation requires loading all of the data into memory up front? For example, the following (non-cross-validated) script seems to use lazy loading and low memory:

# Only the modules actually used below are imported.
from msmbuilder import cluster, dataset

n_clusters = 2

# The trajectories appear to be read lazily from disk rather than all at once.
trajectories = dataset.MDTrajDataset("./trajectories/*.h5")

# Cluster the frames using RMSD as the distance metric.
clusterer = cluster.MiniBatchKMedoids(n_clusters=n_clusters, metric="rmsd")
clusterer.fit(trajectories)

[FR] Implement ShuffleSplit

see: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.ShuffleSplit.html#sklearn.cross_validation.ShuffleSplit
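
For reference, a rough pure-Python sketch of the shuffle-split idea: each iteration permutes the sample indices and holds out a fixed fraction as the test set. This is not scikit-learn's implementation, and the function name and parameters are made up here.

```python
import random


def shuffle_split(n_samples, n_iter=3, test_size=0.2, seed=0):
    """Yield (train_indices, test_indices) pairs from random permutations.

    Hypothetical helper for illustration; unlike K-fold, the test sets of
    different iterations may overlap.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(test_size * n_samples)))
    for _ in range(n_iter):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # The first n_test shuffled indices form the test set.
        yield indices[n_test:], indices[:n_test]
```

The number of iterations and the test fraction are chosen independently, which is the main difference from the cv: 5 style of configuration.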

More interactive result summaries

It could be nice if we shifted towards presenting the Osprey results in Jupyter notebooks rather than just generating a static HTML page:

example: https://github.com/JosPolfliet/pandas-profiling
demo: http://nbviewer.jupyter.org/github/JosPolfliet/pandas-profiling/blob/master/examples/meteorites.ipynb

That said, this might take the form of recommending tools like this to the end user rather than implementing anything within Osprey.

Remove references to MOE

The docs still reference MOE.

bokeh is broken

We need to replace "from bokeh.objects import HoverTool, ColumnDataSource" with "from bokeh.models import HoverTool, ColumnDataSource" (as of Nov. 2014).
bk.hold() and that style of plotting have been deprecated (https://github.com/bokeh/bokeh/blob/7af50c4dc2dd9c2335a7a8a7855a4bb2f90bc6d2/sphinx/source/docs/releases/0.7.0.rst).
The API is so unstable that I suspect we should stay away.
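
If we do keep bokeh, one way to tolerate the module rename would be to try both locations in order. A minimal sketch with a hypothetical generic helper (the bokeh usage is shown only as a comment and is untested):

```python
import importlib


def import_first(module_names):
    """Return the first module in module_names that imports successfully.

    Hypothetical helper for illustration; raises ImportError if none do.
    """
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as error:
            last_error = error
    raise ImportError("none of %r could be imported" % (module_names,)) from last_error


# Possible usage, assuming one of the two bokeh layouts is installed:
# bokeh_models = import_first(["bokeh.models", "bokeh.objects"])
# HoverTool = bokeh_models.HoverTool
# ColumnDataSource = bokeh_models.ColumnDataSource
```

This only papers over the import path; it would not help with the deprecated bk.hold() plotting style.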

----------------------------------------------------------------------
Beginning iteration                                            1 / 500
----------------------------------------------------------------------
Loading trials database: sqlite:///osprey-trials.db...
History contains: 160 trials
Choosing next hyperparameters with moe...
An unexpected error has occurred with osprey (version 0.3), please
consider sending the following traceback to the osprey GitHub issue tracker at:

        https://github.com/rmcgibbo/osprey/issues

Traceback (most recent call last):
  File "/home/rmcgibbo/miniconda/bin/osprey", line 9, in <module>
    load_entry_point('osprey==0.3', 'console_scripts', 'osprey')()
  File "/home/rmcgibbo/miniconda/lib/python2.7/site-packages/osprey/cli/main.py", line 35, in main
    args_func(args, p)
  File "/home/rmcgibbo/miniconda/lib/python2.7/site-packages/osprey/cli/main.py", line 40, in args_func
    args.func(args, p)
  File "/home/rmcgibbo/miniconda/lib/python2.7/site-packages/osprey/cli/parser_worker.py", line 8, in func
    execute(args, parser)
  File "/home/rmcgibbo/miniconda/lib/python2.7/site-packages/osprey/execute_worker.py", line 60, in execute
    sessionbuilder=config.trialscontext)
  File "/home/rmcgibbo/miniconda/lib/python2.7/site-packages/osprey/execute_worker.py", line 82, in initialize_trial
    params = strategy.suggest(history, searchspace)
  File "/home/rmcgibbo/miniconda/lib/python2.7/site-packages/osprey/strategies.py", line 210, in suggest
    return self._build_response(results, searchspace)
  File "/home/rmcgibbo/miniconda/lib/python2.7/site-packages/osprey/strategies.py", line 261, in _build_response
    out[var.name] = var.point_from_moe(float(moevalue))
  File "/home/rmcgibbo/miniconda/lib/python2.7/site-packages/osprey/search_space.py", line 194, in point_from_moe
    return self.choices[int(np.round(moevalue))]
IndexError: list index out of range
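
The last frame indexes self.choices with a rounded float, so a value that rounds outside the valid range raises. A minimal sketch of one possible guard, clipping the index before the lookup (the function name is hypothetical, and plain round is used here in place of np.round):

```python
def choice_from_float(choices, value):
    """Map a float to an entry of choices, clipping to the valid index range.

    Hypothetical helper for illustration; not the osprey implementation.
    """
    if not choices:
        raise ValueError("choices must not be empty")
    index = int(round(value))
    # Clip so that values slightly outside [0, len - 1] still map to an entry.
    index = min(max(index, 0), len(choices) - 1)
    return choices[index]
```

Whether clipping is the right behavior, or whether the bounds passed to MOE should be tightened instead, is a separate question.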

[FR] Run from MOE API

Just add:

from argparse import Namespace
from moe.views.rest.gp_next_points_epi import GpNextPointsEpi
response = GpNextPointsEpi(Namespace(json_body=request)).gp_next_points_epi_view()

Nan checking in "best score so far"

I think there needs to be an explicit NaN check in the "best score so far" printout, as it currently prints nan occasionally. I think this gets fixed upon insertion into the database.

----------------------------------------------------------------------
Beginning iteration                                           17 / 500
----------------------------------------------------------------------
Loading trials database: sqlite:///osprey-trials.db...
History contains: 16 trials
Choosing next hyperparameters with random...
  {'tica__n_components': 7, 'tica__gamma': 2.8307579721219806e-09, 'cluster__n_clusters': 20}
(random took 0.000 s)

[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    6.4s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   30.4s finished
[CV] Using MSMBuilder API n_samples averaging
[CV]   n_train_samples: [249997, 249227, 254918, 251878, 254748]
[CV]   n_test_samples: [65195, 65965, 60274, 63314, 60444]
Loading trials database: sqlite:///osprey-trials.db...
Success! Model score = nan
(best score so far   = nan)

----------------------------------------------------------------------
Beginning iteration                                           18 / 500
----------------------------------------------------------------------
Loading trials database: sqlite:///osprey-trials.db...
History contains: 17 trials
Choosing next hyperparameters with random...
  {'tica__n_components': 6, 'tica__gamma': 1.5482447952196421e-09, 'cluster__n_clusters': 28}
(random took 0.000 s)

[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    8.1s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   32.0s finished
[CV] Using MSMBuilder API n_samples averaging
[CV]   n_train_samples: [249997, 249227, 254918, 251878, 254748]
[CV]   n_test_samples: [65195, 65965, 60274, 63314, 60444]
Loading trials database: sqlite:///osprey-trials.db...
Success! Model score = 3.157471
(best score so far   = 6.148823)
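
A minimal sketch of the kind of check meant here, filtering NaN before taking the maximum. The function name is hypothetical, and how osprey actually computes the printed value is not shown above. Note that Python's built-in max can return nan when a NaN comes first, since every comparison against NaN is False.

```python
import math


def best_score(scores):
    """Return the largest score that is neither None nor NaN, or None.

    Hypothetical helper for illustration.
    """
    valid = [s for s in scores if s is not None and not math.isnan(s)]
    return max(valid) if valid else None
```

With this, an iteration whose own score is nan would still report the best valid score from the history.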

[cxh@cn97 /nobackup/cxh/osprey]$ osprey worker config.yaml 
======================================================================
= osprey is a tool for machine learning hyperparameter optimization. =
======================================================================

osprey version:  0.3_42_gf489100_dirty-py2.7.egg
time:            November 05, 2014  3:55 PM
hostname:        cn97.local
cwd:             /hsgs/nobackup/cxh/osprey
pid:             54372

Loading config file:     config.yaml...
e libcudart.so.5.0: cannot open shared object file: No such file or directory
e libcudart.so.5.0: cannot open shared object file: No such file or directory

Loading dataset...
  4453 elements without labels
Instantiated estimator:
  Pipeline(steps=[('tica', tICA(gamma=0.05, lag_time=1, n_components=None, weighted_transform=False)), ('cluster', MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++',
        init_size=None, max_iter=100, max_no_improvement=10, n_clusters=8,
        n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0,
        verbose=0)), ('msm', MarkovStateModel(ergodic_cutoff=1, lag_time=1, n_timescales=5, prior_counts=0,
         reversible_type='mle', verbose=False))])
Hyperparameter search space:
  tica__n_components        (int)          2 <= x <= 12
  tica__lag_time            (int)        750 <= x <= 2000
  tica__gamma               (float) 0.000000 <= x <  0.100000
  cluster__n_clusters       (int)         12 <= x <= 500

----------------------------------------------------------------------
Beginning iteration                                              1 / 1
----------------------------------------------------------------------
Loading trials database: mysql://cxh:*****@mysql-1.98ef623d-cxhernandez.node.tutum.io:49155/sirtuin...
History contains: 0 trials
Choosing next hyperparameters with moe...
/home/cxh/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
An unexpected error has occurred with osprey (version 0.3_42_gf489100_dirty-py2.7.egg), please
consider sending the following traceback to the osprey GitHub issue tracker at:

        https://github.com/rmcgibbo/osprey/issues

Traceback (most recent call last):
  File "/home/cxh/anaconda/bin/osprey", line 9, in <module>
    load_entry_point('osprey==0.3-42-gf489100-dirty', 'console_scripts', 'osprey')()
  File "build/bdist.linux-x86_64/egg/osprey/cli/main.py", line 35, in main
  File "build/bdist.linux-x86_64/egg/osprey/cli/main.py", line 40, in args_func
  File "build/bdist.linux-x86_64/egg/osprey/cli/parser_worker.py", line 8, in func
  File "build/bdist.linux-x86_64/egg/osprey/execute_worker.py", line 61, in execute
  File "build/bdist.linux-x86_64/egg/osprey/execute_worker.py", line 84, in initialize_trial
  File "build/bdist.linux-x86_64/egg/osprey/strategies.py", line 217, in suggest
  File "build/bdist.linux-x86_64/egg/osprey/strategies.py", line 317, in _call_moe_locally
  File "/home/cxh/anaconda/lib/python2.7/site-packages/moe/views/rest/gp_next_points_constant_liar.py", line 85, in gp_next_points_constant_liar_view
    self.get_lie_value(params),
  File "/home/cxh/anaconda/lib/python2.7/site-packages/moe/views/rest/gp_next_points_constant_liar.py", line 57, in get_lie_value
    return numpy.amin(points_sampled_values)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2211, in amin
    out=out, keepdims=keepdims)
  File "/home/cxh/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity
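
The failure happens with "History contains: 0 trials", where the minimum of an empty array is taken. One possible guard, sketched with hypothetical names: fall back to a random suggestion until there is at least one completed trial.

```python
def suggest_with_fallback(history, model_suggest, random_suggest):
    """Use random_suggest() while history is empty, else model_suggest(history).

    Hypothetical helper for illustration; both callables are supplied by
    the caller and stand in for the model-based and random strategies.
    """
    if not history:
        return random_suggest()
    return model_suggest(history)
```

The same idea could take a minimum number of trials instead of just checking for an empty history.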