
optuna's Introduction

Optuna: A hyperparameter optimization framework


🔗 Website | 📃 Docs | ⚙️ Install Guide | 📝 Tutorial | 💡 Examples

Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API. Thanks to our define-by-run API, the code written with Optuna enjoys high modularity, and the user of Optuna can dynamically construct the search spaces for the hyperparameters.

🔥 Key Features

Optuna has modern functionalities as follows:

Basic Concepts

We use the terms study and trial as follows:

  • Study: optimization based on an objective function
  • Trial: a single execution of the objective function

Please refer to the sample code below. The goal of a study is to find out the optimal set of hyperparameter values (e.g., regressor and svr_c) through multiple trials (e.g., n_trials=100). Optuna is a framework designed for automation and acceleration of optimization studies.

Sample code with scikit-learn


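import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.metrics
import sklearn.model_selection
import sklearn.svm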

# Define an objective function to be minimized.
def objective(trial):

    # Invoke suggest methods of a Trial object to generate hyperparameters.
    regressor_name = trial.suggest_categorical('regressor', ['SVR', 'RandomForest'])
    if regressor_name == 'SVR':
        svr_c = trial.suggest_float('svr_c', 1e-10, 1e10, log=True)
        regressor_obj = sklearn.svm.SVR(C=svr_c)
    else:
        rf_max_depth = trial.suggest_int('rf_max_depth', 2, 32)
        regressor_obj = sklearn.ensemble.RandomForestRegressor(max_depth=rf_max_depth)

    X, y = sklearn.datasets.fetch_california_housing(return_X_y=True)
    X_train, X_val, y_train, y_val = sklearn.model_selection.train_test_split(X, y, random_state=0)

    regressor_obj.fit(X_train, y_train)
    y_pred = regressor_obj.predict(X_val)

    error = sklearn.metrics.mean_squared_error(y_val, y_pred)

    return error  # An objective value linked with the Trial object.

study = optuna.create_study()  # Create a new study.
study.optimize(objective, n_trials=100)  # Invoke optimization of the objective function.

Note

More examples can be found in optuna/optuna-examples.

The examples cover diverse problem setups such as multi-objective optimization, constrained optimization, pruning, and distributed optimization.

Installation

Optuna is available at the Python Package Index and on Anaconda Cloud.

# PyPI
$ pip install optuna
# Anaconda Cloud
$ conda install -c conda-forge optuna

Important

Optuna supports Python 3.7 or newer.

Also, we provide Optuna docker images on DockerHub.

Integrations

Optuna has integration features with various third-party libraries. Integrations can be found in optuna/optuna-integration, and their documentation is available here.

Supported integration libraries

Web Dashboard

Optuna Dashboard is a real-time web dashboard for Optuna. You can check the optimization history, hyperparameter importance, etc. in graphs and tables. You don't need to create a Python script to call Optuna's visualization functions. Feature requests and bug reports are welcome!


optuna-dashboard can be installed via pip:

$ pip install optuna-dashboard

Tip

Please check out the convenience of Optuna Dashboard using the sample code below.

Sample code to launch Optuna Dashboard

Save the following code as optimize_toy.py.

import optuna


def objective(trial):
    x1 = trial.suggest_float("x1", -100, 100)
    x2 = trial.suggest_float("x2", -100, 100)
    return x1 ** 2 + 0.01 * x2 ** 2


study = optuna.create_study(storage="sqlite:///db.sqlite3")  # Create a new study with database.
study.optimize(objective, n_trials=100)

Then try the commands below:

# Run the study specified above
$ python optimize_toy.py

# Launch the dashboard based on the storage `sqlite:///db.sqlite3`
$ optuna-dashboard sqlite:///db.sqlite3
...
Listening on http://localhost:8080/
Hit Ctrl-C to quit.

Communication

Contribution

Any contributions to Optuna are more than welcome!

If you are new to Optuna, please check the good first issues. They are relatively simple, well-defined, and often good starting points for you to get familiar with the contribution workflow and other developers.

If you have already contributed to Optuna, we recommend the other contribution-welcome issues.

For general guidelines on how to contribute to the project, take a look at CONTRIBUTING.md.

Reference

If you use Optuna in one of your research projects, please cite our KDD paper "Optuna: A Next-generation Hyperparameter Optimization Framework":

BibTeX
@inproceedings{akiba2019optuna,
  title={{O}ptuna: A Next-Generation Hyperparameter Optimization Framework},
  author={Akiba, Takuya and Sano, Shotaro and Yanase, Toshihiko and Ohta, Takeru and Koyama, Masanori},
  booktitle={The 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={2623--2631},
  year={2019}
}

optuna's People

Contributors

alnusjaponica, c-bata, contramundum53, crcrpar, crissman, cross32768, eukaryo, g-votte, gen740, harupy, hideakiimamura, himkt, hvy, iwiwi, keisuke-umezawa, knshnb, nabenabe0928, norihitoishida, not522, nyanhi, nzw0301, philipmay, rotaki, sile, toshihikoyanase, vladskripniuk, xadrianzetx, y0z, ytknzw, ytsmiling


optuna's Issues

Feature request: conda recipe

Hello,

Conda is a centralized repository for many libraries, open source or otherwise. I'd like to request that a conda package be made for this project.

Thanks very much for your time,
Sean

Error with 'set_user_attr' when using ChainerMNStudy

I believe this issue is related to multi-thread mode and arises when trying to set node-dependent user attributes. I encountered it when attempting more advanced forms of hyperparameter adjustment (e.g., PBT).

Trace shown below:

trial.set_user_attr(key, value)
  File "/opt/conda/lib/python3.6/site-packages/optuna/trial.py", line 379, in set_user_attr
    self.storage.set_trial_user_attr(self.trial_id, key, value)
  File "/opt/conda/lib/python3.6/site-packages/optuna/storages/rdb/storage.py", line 362, in set_trial_user_attr
    self._commit(session)
  File "/opt/conda/lib/python3.6/site-packages/optuna/storages/rdb/storage.py", line 541, in _commit
    sys.exc_info()[2])
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/opt/conda/lib/python3.6/site-packages/optuna/storages/rdb/storage.py", line 531, in _commit
    session.commit()
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 954, in commit
    self.transaction.commit()
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 467, in commit
    self._prepare_impl()
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 447, in _prepare_impl
    self.session.flush()
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2313, in flush
    self._flush(objects)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2440, in _flush
    transaction.rollback(_capture_exception=True)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2404, in _flush
    flush_context.execute()
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 395, in execute
    rec.execute(self)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 560, in execute
    uow
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
    mapper, table, insert)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 872, in _emit_insert_statements
    execute(statement, params)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 248, in reraise
    raise value.with_traceback(tb)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/opt/conda/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
optuna.structs.StorageInternalError: An exception is raised during the commit. This typically happens due to invalid data in the commit, e.g. exceeding max length. (The actual exception is as follows: IntegrityError('(sqlite3.IntegrityError) UNIQUE constraint failed: trial_user_attributes.trial_id, trial_user_attributes.key',))
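The final error says the pair `(trial_id, key)` must be unique in the `trial_user_attributes` table. That means a second insert for the same trial and key fails at commit time, which is consistent with several nodes writing the same key for one trial. The standard-library `sqlite3` sketch below uses a hypothetical, simplified version of that table, where only the UNIQUE pair mirrors the error message. It reproduces the failure and shows one workaround, `INSERT OR REPLACE`, which overwrites the existing row. This is an illustration, not Optuna's actual schema or fix.

```python
import sqlite3

# Hypothetical simplified table; only UNIQUE (trial_id, key) mirrors the error.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trial_user_attributes ("
    " trial_id INTEGER NOT NULL,"
    " key TEXT NOT NULL,"
    " value_json TEXT,"
    " UNIQUE (trial_id, key))"
)


def naive_set(trial_id, key, value_json):
    # Plain INSERT: writing the same (trial_id, key) twice violates UNIQUE.
    conn.execute(
        "INSERT INTO trial_user_attributes (trial_id, key, value_json) VALUES (?, ?, ?)",
        (trial_id, key, value_json),
    )


def upsert_set(trial_id, key, value_json):
    # INSERT OR REPLACE removes the conflicting row and inserts the new one.
    conn.execute(
        "INSERT OR REPLACE INTO trial_user_attributes (trial_id, key, value_json)"
        " VALUES (?, ?, ?)",
        (trial_id, key, value_json),
    )


naive_set(0, "node", '"rank0"')
try:
    naive_set(0, "node", '"rank1"')  # same trial, same key
except sqlite3.IntegrityError as e:
    print("IntegrityError:", e)

upsert_set(0, "node", '"rank1"')
print(conn.execute("SELECT value_json FROM trial_user_attributes").fetchall())
```

The replace makes the last writer win. If each node really needs its own value, giving each node a distinct key avoids the conflict altogether.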

Incompatibility with old versions of SQLAlchemy.

Connecting to PostgreSQL with old versions of SQLAlchemy fails with the error sqlalchemy.exc.CompileError: Postgresql ENUM type requires a name. The error goes away once SQLAlchemy is updated.

For example:

>>> import sqlalchemy
>>> sqlalchemy.__version__
'1.0.13'
>>> from pfnopt.storages import RDBStorage
>>> RDBStorage(url='postgresql://pfnopt:somepassword@localhost:5432/some_db')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/storages/rdb.py", line 85, in __init__
    Base.metadata.create_all(self.engine)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/schema.py", line 3695, in create_all
    tables=tables)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1856, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1481, in _run_visitor
    **kwargs).traverse_single(element)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
    return meth(obj, **kw)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/ddl.py", line 720, in visit_metadata
    _ddl_runner=self)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/event/attr.py", line 256, in __call__
    fn(*args, **kw)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/util/langhelpers.py", line 546, in __call__
    return getattr(self.target, self.name)(*arg, **kw)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/sqltypes.py", line 1040, in _on_metadata_create
    t._on_metadata_create(target, bind, **kw)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/dialects/postgresql/base.py", line 1379, in _on_metadata_create
    self.create(bind=bind, checkfirst=checkfirst)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/dialects/postgresql/base.py", line 1317, in create
    bind.execute(CreateEnumType(self))
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/ddl.py", line 68, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 962, in _execute_ddl
    compiled = ddl.compile(dialect=dialect)
  File "<string>", line 1, in <lambda>
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/elements.py", line 494, in compile
    return self._compiler(dialect, bind=bind, **kw)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/ddl.py", line 26, in _compiler
    return dialect.ddl_compiler(dialect, self, **kw)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/compiler.py", line 190, in __init__
    self.string = self.process(self.statement, **compile_kwargs)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/compiler.py", line 213, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/sql/visitors.py", line 81, in _compiler_dispatch
    return meth(self, **kw)
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/dialects/postgresql/base.py", line 1613, in visit_create_enum_type
    self.preparer.format_type(type_),
  File "/Users/sano/anaconda3/envs/pfnopt-35/lib/python3.5/site-packages/sqlalchemy/dialects/postgresql/base.py", line 1857, in format_type
    raise exc.CompileError("Postgresql ENUM type requires a name.")
sqlalchemy.exc.CompileError: Postgresql ENUM type requires a name.

Population Based Training [ChainerMN Feature request]

A very useful addition to the pruning feature with ChainerMN would be the ability to use Population Based Training automatically. In certain cases it greatly reduces the cost of the hyperparameter search, in both time and compute.

Basically, every n trials, prune/overwrite the models (model parameters and hyperparameters) of the m worst-performing trials with those of the o best trials, sample new hyperparameters according to some scheme (e.g., TPE), and continue the optimization process from that point forward (instead of re-initializing the model at the start of every trial). This is pretty straightforward in the single-node multi-GPU setting but a bit more cumbersome in the multi-node multi-GPU setting.
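The exploit/explore step described above can be sketched in plain Python. This is not an Optuna or ChainerMN API. The population layout, the names `exploit_and_explore` and `perturb_lr`, and the multiplicative learning-rate perturbation (standing in for a sampler such as TPE) are all assumptions made for illustration.

```python
import copy
import random


def exploit_and_explore(population, m, o, perturb, rng):
    """One PBT-style exploit/explore step over a population (a sketch).

    population: list of dicts with "score" (higher is better), "hparams"
    (a dict) and "weights" (any model state). The m worst members are
    overwritten with copies of members drawn from the o best, and their
    hyperparameters are then passed through `perturb`.
    """
    assert m + o <= len(population), "best and worst groups must not overlap"
    ranked = sorted(population, key=lambda member: member["score"], reverse=True)
    best, worst = ranked[:o], ranked[len(ranked) - m:]
    for member in worst:
        donor = rng.choice(best)
        # Exploit: copy the donor's model state instead of re-initializing.
        member["weights"] = copy.deepcopy(donor["weights"])
        # Explore: derive new hyperparameters from the donor's.
        member["hparams"] = perturb(dict(donor["hparams"]), rng)
    return population


def perturb_lr(hparams, rng):
    # Hypothetical explore scheme: scale the learning rate by 0.8 or 1.2.
    hparams["lr"] *= rng.choice([0.8, 1.2])
    return hparams


rng = random.Random(0)
population = [
    {"score": score, "hparams": {"lr": lr}, "weights": [score]}
    for score, lr in [(0.9, 0.1), (0.5, 0.01), (0.2, 0.001), (0.7, 0.05)]
]
exploit_and_explore(population, m=1, o=2, perturb=perturb_lr, rng=rng)
```

After the step, the replaced member's `"score"` is stale. Training would continue from the copied weights and re-evaluate it before the next step.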

Getting information about each trial as soon as it completes?

It would be nice to have a programmatic way to get information about each trial as soon as it completes (at least in the sequential optimization case...).

Something like:

for trial in study.optimize_iter(objective, n_trials=10):
    # trial.params contains the set of parameters that were last tested
    # trial.value contains the corresponding objective value
    # trial.trial_id contains the trial ID
    pass

My main use case for this is to be able to run some kind of "callback function" every time a new best set of hyper-parameters is found.

As far as I understand, there is no easy way to do this -- except maybe by doing this directly in the objective function but this is not very convenient.

Does that make sense?

Any plan to support ensemble formation?

This is a feature request.

When tuning hyperparameters, we usually form ensembles to further boost performance. Do you plan to implement a (semi-)automatic way to form ensembles?

Resampling failed points

Hello,

Thanks again for an amazing repository.

While Optuna allows for failed and pruned trials, the sampler does not seem to adjust for them. As a result, the sampler tends to resample the failed point a whole lot. For reference, of 11k points in my database, 27 did not fail.

I rely on the failure mechanism to handle model configurations that are too large for memory. In the case that the sampler cannot or will not be updated to account for failed or pruned trials, what mechanism do you suggest using with optuna to handle constraints?

Thanks very much,
Sean

Optuna Firestore Storage

Hi,

Thanks for the great product!

Today, I have released Optuna Firestore Storage. It's an adapter for optuna to use Firestore as a backend DB. I'm glad to hear any feedback.

The code of optuna is really well designed, so it's easy to implement this kind of extension. I hope optuna will gain more and more users.

As this is not an issue, I will close this thread soon.

Thanks again,

View multiple studies in dashboard

Currently, you have to pass the study name (e.g., "example-study") to the dashboard CLI when launching the dashboard:
optuna dashboard --storage "sqlite:///example.db" --study "example-study"

It would be helpful if the study CLI argument were optional and we could select and view different studies in the database from within the dashboard.

Thanks,

Improve timeout with multiple threads

Currently, timeout with multiple threads is implemented using the next(timeout) method of the iterator returned by ThreadPool.imap. However, because this abruptly kills threads, some internal state may become inconsistent. We need to make the behavior match the single-thread version. In other words, when the execution time exceeds the time limit, we should wait until the last job completes.
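The intended behavior is to stop submitting new jobs at the deadline and let running ones finish. It can be sketched with the standard library's `concurrent.futures` instead of `ThreadPool.imap`. This is an illustration under that substitution, not Optuna's implementation. `run_with_timeout` and `job` are hypothetical names, and the total wall time can exceed the limit by up to one job's duration.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_with_timeout(job, n_jobs, timeout_seconds):
    """Run `job` repeatedly on n_jobs threads until timeout_seconds elapse.

    Once the deadline passes, no new job is started, but jobs that are
    already running are allowed to finish; none is interrupted.
    """
    deadline = time.monotonic() + timeout_seconds
    results = []
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        running = set()
        while time.monotonic() < deadline:
            # Keep n_jobs jobs in flight while there is time left.
            while len(running) < n_jobs:
                running.add(pool.submit(job))
            remaining = max(0.0, deadline - time.monotonic())
            done, running = wait(running, timeout=remaining, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
        # Deadline reached: start nothing new, but wait for in-flight jobs.
        done, _ = wait(running)
        results.extend(future.result() for future in done)
    return results
```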

`TPESampler._sample_categorical` fails with PostgreSQL backend

TPESampler._sample_categorical fails with PostgreSQL backend. This happens because:

  • TPESampler._sample_categorical returns an integer as numpy.int32.
  • The integer value is input to storage class without any cast.
  • SQLAlchemy with the psycopg2 backend does not support numpy.int32 input, although it does support Python int.

Repro steps

With any objective function that uses categorical sampling (e.g., the one in chainer_mnist.py), invoke minimize as:

study = pfnopt.create_study(storage=SOME_POSTGRES_URL)
pfnopt.minimize(objective, n_trials=100, study=study)

It fails once n_startup_trials trials have run.
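The root cause listed above is a type mismatch: a `numpy.int32` is not a Python `int`, so a DB driver that only accepts built-in types can reject it. The sketch below shows the usual remedy, which converts NumPy scalars to built-in Python values with `.item()` before handing them to storage. `to_builtin` is a hypothetical helper name, not part of pfnopt or Optuna.

```python
import numpy as np


def to_builtin(value):
    """Convert a NumPy scalar to the equivalent built-in Python value.

    Hypothetical helper; values that are not NumPy scalars pass through.
    """
    if isinstance(value, np.generic):
        return value.item()  # np.int32 -> int, np.float64 -> float, ...
    return value


index = np.int32(2)  # e.g. a categorical index produced by a sampler
print(isinstance(index, int))              # False: np.int32 is not an int
print(isinstance(to_builtin(index), int))  # True
```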

Dynamic sampling value ranges fail with `RDBStorage`

Under the current define-by-run search space specification, some users might naturally want to use dynamic sampling value ranges as follows:

import pfnopt
import pfnopt.storages


def obj(client):
    x = client.sample_uniform('x', 0, 1)
    y = client.sample_uniform('y', 0, x)
    return x - y


def main():
    storage = pfnopt.storages.RDBStorage('sqlite:///:memory:')
    pfnopt.minimize(obj, n_trials=2, storage=storage)


if __name__ == '__main__':
    main()

Currently, it fails:

  File "(...)rdb.py", line 135, in set_study_param_distribution
    assert distribution_rdb == distribution
AssertionError

Support multi-thread mode with RDB storage

In the current implementation, RDBStorage is incompatible with multi-thread mode. This is because, in some RDB backends, a DB object cannot be shared among multiple threads. For example, sqlite3 raises the following error when a session is used in different threads:

(sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 140736163832640 and this is thread id 123145433608192 

Though we've decided not to support this combination for now, we could revisit this issue after wrapping up multi-node/multi-process mode.

sample_categorical does not work with RDBStorage

sample_categorical raises errors when storage is an RDBStorage, due to a cast problem.

Repro:

from pfnopt import minimize
from pfnopt.storage import RDBStorage
from pfnopt.study import create_new_study


def objective(client):
    
    y = client.sample_categorical('y', (-1.0, 1.0))
    return y


if __name__ == '__main__':

    storage = RDBStorage('sqlite:///:memory:')
    study = create_new_study(storage)

    minimize(objective, storage=storage, n_trials=500)

Error messages:

Traceback (most recent call last):
  File "/Users/sano/PycharmProjects/pfnopt/repro.py", line 17, in <module>
    minimize(objective, storage=storage, n_trials=500)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 145, in minimize
    study.run(func, n_trials, timeout_seconds, n_jobs)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 66, in run
    self._run_sequential(func, n_trials, timeout_seconds)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 88, in _run_sequential
    result = func(client)
  File "/Users/sano/PycharmProjects/pfnopt/repro.py", line 8, in objective
    y = client.sample_categorical('y', (-1.0, 1.0))
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/client.py", line 33, in sample_categorical
    return self._sample(name, distributions.CategoricalDistribution(choices=choices))
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/client.py", line 90, in _sample
    self.storage, self.study_id, name, distribution)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/samplers/tpe.py", line 54, in sample
    param_distribution, below_param_values, above_param_values)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/samplers/tpe.py", line 80, in _sample_categorical
    upper=len(choices), size=(self.n_ei_candidates, ), rng=self.rng)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/samplers/_hyperopt.py", line 627, in sample_categorical
    counts_b = np.bincount(obs_below, minlength=upper, weights=weights_b)
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

This does not happen when n_trials is small (e.g., 5).
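The final `TypeError` comes from NumPy refusing to pass a float array to `np.bincount`, which only counts non-negative integers. The traceback suggests that the observed categorical indices reached `_sample_categorical` as floats. A round trip through the RDB storage is a plausible cause, but the report does not confirm it. This also fits the note about small `n_trials`: presumably the early trials are drawn by TPE's random startup sampling and never reach this code path. Below is a minimal NumPy sketch of the failure and of a defensive cast. `count_categories` is a hypothetical helper.

```python
import numpy as np


def count_categories(observed, n_choices):
    """Count how often each categorical index was observed.

    Hypothetical helper: `observed` may come back from storage as floats
    (e.g. [0.0, 1.0]), which np.bincount rejects, so validate and cast first.
    """
    observed = np.asarray(observed)
    as_int = observed.astype(np.int64)
    if not np.array_equal(as_int, observed):
        raise ValueError("categorical indices must be whole numbers")
    return np.bincount(as_int, minlength=n_choices)


obs = np.array([0.0, 1.0, 1.0])  # indices into the choices (-1.0, 1.0)
try:
    np.bincount(obs, minlength=2)
except TypeError as e:
    print("TypeError:", e)
print(count_categories(obs, 2))  # [1 2]
```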

`pfnopt.minimize` fails under `storage=None` (default)

import pfnopt


def obj(client):
    x = client.sample_uniform('x', 0.1, 0.2)
    return x


def main():
    pfnopt.minimize(obj, n_trials=2)


if __name__ == '__main__':
    main()

It fails with:

AttributeError: 'NoneType' object has no attribute 'get_study_uuid_from_id'

Complete full trial count regardless of pruned trials.

Related to this discussion

When trials are pruned, it would be useful to have an option that automatically re-runs them so that the full trial count is reached, e.g., chainermn_study.optimize(objective, n_trials=25, rerun_pruned=True) such that len(study.trials) == n_trials == n_trials + (rerun_trials - pruned_trials)

Timing of Pruning

I'd like to ask about the timing of pruning in the following cases:

  1. Different intervals between trial.report calls (e.g., every 100 or 1000 iterations).
  2. Multiple training runs in progress at the same time.
  3. Over-fitting.

Optuna is a great project. Thanks.

Client interface improvement

  • Mimic Trial's member variables with properties.
  • complete should not be called by users, so we should hide that method.

Using pruning causes 2x more computation time

Hello, I'd like to know about the implementation of pruning.
When I timed the pruned.py snippet from https://optuna.readthedocs.io/en/stable/tutorial/pruning.html with a few changes, I found that pruning takes much more computation time than not using it, as shown below.

Code

import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection
from functools import partial

import optuna

def objective(trial, prune=False):
    iris = sklearn.datasets.load_iris()
    classes = list(set(iris.target))
    train_x, test_x, train_y, test_y = \
        sklearn.model_selection.train_test_split(iris.data, iris.target, test_size=0.25, random_state=0)

    alpha = trial.suggest_loguniform('alpha', 1e-5, 1e-1)
    clf = sklearn.linear_model.SGDClassifier(alpha=alpha)

    for step in range(100):
        clf.partial_fit(train_x, train_y, classes=classes)

        # Report intermediate objective value.
        intermediate_value = 1.0 - clf.score(test_x, test_y)
        trial.report(intermediate_value, step)

        # Handle pruning based on the intermediate value.
        if prune:
            if trial.should_prune(step):
                raise optuna.structs.TrialPruned()

    return 1.0 - clf.score(test_x, test_y)

# with pruning
study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
%time study.optimize(partial(objective, prune=True), n_trials=30)
    # out
    # CPU times: user 6.82 s, sys: 93.8 ms, total: 6.92 s
    # Wall time: 7.05 s

# without pruning
study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
%time study.optimize(partial(objective, prune=False), n_trials=30)
    # out
    # CPU times: user 2.62 s, sys: 78.4 ms, total: 2.7 s
    # Wall time: 2.93 s

What I expected from pruning is a shorter computation time, since it stops optimisation before it converges.
Why does this happen? Am I misunderstanding how pruning works?

dashboard catches ValueError

Currently, PFNOpt allows a nan objective value, e.g.,

def objective(client: BaseClient) -> float:
    client.sample_uniform('x', 0, 1)

    return np.nan

With a nan value, the dashboard raises the following error when it reloads data.

[E 2018-04-11 12:06:31,858] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x117b426a8>, <Future finished exception=ValueError('Out-of bounds index (66) in patch for column: trial_id',)>)
Traceback (most recent call last):
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/bokeh/server/session.py", line 51, in _needs_document_lock_wrapper
    result = yield yield_for_all_futures(func(self, *args, **kwargs))
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/bokeh/util/tornado.py", line 39, in yield_for_all_futures
    result = yield future
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 296, in wrapper
    result = func(*args, **kwargs)
  File "/Users/sano/anaconda3/lib/python3.6/types.py", line 248, in wrapped
    coro = func(*args, **kwargs)
  File "/Users/sano/PycharmProjects/reviews/pfnopt/pfnopt/dashboard.py", line 207, in update_callback
    self.all_trials_widget.update(current_trials, new_trials)
  File "/Users/sano/PycharmProjects/reviews/pfnopt/pfnopt/dashboard.py", line 135, in update
    self.cds.patch(patch_dict)
  File "/Users/sano/anaconda3/lib/python3.6/site-packages/bokeh/models/sources.py", line 569, in patch
    raise ValueError("Out-of bounds index (%d) in patch for column: %s" % (ind, name))
ValueError: Out-of bounds index (66) in patch for column: trial_id

It searches only a small `parameter` area when using pruning.

When using pruning, I encountered a problem where many trials sample similar parameters.
I cannot share the original code, but I made a similar toy script below.

import optuna


count = 0

def objective(trial: optuna.Trial):
    global count
    a = trial.suggest_uniform('a', 0, 1)
    b = trial.suggest_uniform('b', 0, 1)

    print(a, b)

    if count > 10:
        raise optuna.structs.TrialPruned('pruned!')

    count += 1
    return 1

if __name__ == '__main__':
    study = optuna.create_study()
    study.optimize(objective, 100)

Running it produces the following output (sorry for the length):

0.9193297609499109 0.2951020724586255
[I 2018-12-12 22:41:26,449] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.07133285126371469 0.7913286482928995
[I 2018-12-12 22:41:26,660] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.18891975710874032 0.5979379868952668
[I 2018-12-12 22:41:26,870] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.4037803166887389 0.5556959131058304
[I 2018-12-12 22:41:27,081] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.5486918451302252 0.008951124018864665
[I 2018-12-12 22:41:27,290] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.22330875913552062 0.14336088783151546
[I 2018-12-12 22:41:27,497] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.2873568183345594 0.4439617700115277
[I 2018-12-12 22:41:27,708] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.5538432424257924 0.2169753830498714
[I 2018-12-12 22:41:27,919] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.5127323722154805 0.5340531117541805
[I 2018-12-12 22:41:28,129] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.7841730503441033 0.2835078605459713
[I 2018-12-12 22:41:28,336] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.9546588242819174 0.16202682451250522
[I 2018-12-12 22:41:28,543] Finished a trial resulted in value: 1.0. Current best value is 1.0 with parameters: {'a': 0.9193297609499109, 'b': 0.2951020724586255}.
0.9983748946513037 0.6555629703151846
[I 2018-12-12 22:41:28,754] Setting trial status as TrialState.PRUNED. pruned!
0.9927668704654466 0.7035276767714436
[I 2018-12-12 22:41:28,966] Setting trial status as TrialState.PRUNED. pruned!
0.9987580086217962 0.6628887019713245
[I 2018-12-12 22:41:29,180] Setting trial status as TrialState.PRUNED. pruned!
0.9284772067129109 0.6705110328580711
[I 2018-12-12 22:41:29,390] Setting trial status as TrialState.PRUNED. pruned!
0.9848210255379823 0.6671137737909589
[I 2018-12-12 22:41:29,603] Setting trial status as TrialState.PRUNED. pruned!
0.9953697388365024 0.6755573530081922
[I 2018-12-12 22:41:29,810] Setting trial status as TrialState.PRUNED. pruned!
0.9991738276818234 0.7173958962072531
[I 2018-12-12 22:41:30,023] Setting trial status as TrialState.PRUNED. pruned!
0.8991800344271141 0.6927683451172044
[I 2018-12-12 22:41:30,232] Setting trial status as TrialState.PRUNED. pruned!
0.9934512890815811 0.7088344944318075
[I 2018-12-12 22:41:30,444] Setting trial status as TrialState.PRUNED. pruned!
0.9903050248879492 0.689579918125413
[I 2018-12-12 22:41:30,656] Setting trial status as TrialState.PRUNED. pruned!
0.9163965495782573 0.6780475754902275
[I 2018-12-12 22:41:30,864] Setting trial status as TrialState.PRUNED. pruned!
0.9617450047338343 0.6806464718971055
[I 2018-12-12 22:41:31,077] Setting trial status as TrialState.PRUNED. pruned!
0.9832071113177652 0.7012388287383466
[I 2018-12-12 22:41:31,291] Setting trial status as TrialState.PRUNED. pruned!
0.9323896427950873 0.70081701699815
[I 2018-12-12 22:41:31,502] Setting trial status as TrialState.PRUNED. pruned!
0.9476696183631225 0.6904819670780827
[I 2018-12-12 22:41:31,714] Setting trial status as TrialState.PRUNED. pruned!
0.9283879624992482 0.7026143097522725
[I 2018-12-12 22:41:31,925] Setting trial status as TrialState.PRUNED. pruned!
0.9564054553207888 0.7236265488557467
[I 2018-12-12 22:41:32,138] Setting trial status as TrialState.PRUNED. pruned!
0.975684885384283 0.6761515105075317
[I 2018-12-12 22:41:32,350] Setting trial status as TrialState.PRUNED. pruned!
0.9745888444263756 0.709527987409857
[I 2018-12-12 22:41:32,563] Setting trial status as TrialState.PRUNED. pruned!

It seems that the last trials keep sampling very similar parameters.

Does anyone have insight into this problem?
Or is this the expected behavior?

Race condition in set_study_param_distribution

When a study is run from multiple processes with RDBStorage, it fails because the same param is set twice.

Error log:

(pfnopt) bash-3.2$ python experiment.py 6f42b83d-f4b5-4081-810b-e79d625fc7dc & python experiment.py 6f42b83d-f4b5-4081-810b-e79d625fc7dc &
[1] 39942
[2] 39943
(pfnopt) bash-3.2$
(pfnopt) bash-3.2$
(pfnopt) bash-3.2$
(pfnopt) bash-3.2$
(pfnopt) bash-3.2$ Traceback (most recent call last):
  File "/Users/sano/anaconda3/envs/pfnopt/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2837, in one
    ret = self.one_or_none()
  File "/Users/sano/anaconda3/envs/pfnopt/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2816, in one_or_none
    "Multiple rows were found for one_or_none()")
sqlalchemy.orm.exc.MultipleResultsFound: Multiple rows were found for one_or_none()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "experiment.py", line 23, in <module>
    minimize(objective, study=study, storage=storage, n_trials=1000)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 145, in minimize
    study.run(func, n_trials, timeout_seconds, n_jobs)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 66, in run
    self._run_sequential(func, n_trials, timeout_seconds)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 88, in _run_sequential
    result = func(client)
  File "experiment.py", line 9, in objective
    x = client.sample_uniform('x', -10000, 10000)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/client.py", line 24, in sample_uniform
    return self._sample(name, distributions.UniformDistribution(low=low, high=high))
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/client.py", line 91, in _sample
    self.storage.set_trial_param(self.trial_id, name, param_value_in_internal_repr)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/storage/rdb.py", line 170, in set_trial_param
    filter(StudyParam.param_name == param_name).one()
  File "/Users/sano/anaconda3/envs/pfnopt/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2840, in one
    "Multiple rows were found for one()")
sqlalchemy.orm.exc.MultipleResultsFound: Multiple rows were found for one()
Traceback (most recent call last):
  File "/Users/sano/anaconda3/envs/pfnopt/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2837, in one
    ret = self.one_or_none()
  File "/Users/sano/anaconda3/envs/pfnopt/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2816, in one_or_none
    "Multiple rows were found for one_or_none()")
sqlalchemy.orm.exc.MultipleResultsFound: Multiple rows were found for one_or_none()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "experiment.py", line 23, in <module>
    minimize(objective, study=study, storage=storage, n_trials=1000)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 145, in minimize
    study.run(func, n_trials, timeout_seconds, n_jobs)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 66, in run
    self._run_sequential(func, n_trials, timeout_seconds)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/study.py", line 88, in _run_sequential
    result = func(client)
  File "experiment.py", line 9, in objective
    x = client.sample_uniform('x', -10000, 10000)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/client.py", line 24, in sample_uniform
    return self._sample(name, distributions.UniformDistribution(low=low, high=high))
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/client.py", line 91, in _sample
    self.storage.set_trial_param(self.trial_id, name, param_value_in_internal_repr)
  File "/Users/sano/PycharmProjects/pfnopt/pfnopt/storage/rdb.py", line 170, in set_trial_param
    filter(StudyParam.param_name == param_name).one()
  File "/Users/sano/anaconda3/envs/pfnopt/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2840, in one
    "Multiple rows were found for one()")
sqlalchemy.orm.exc.MultipleResultsFound: Multiple rows were found for one()

[1]-  Exit 1                  python experiment.py 6f42b83d-f4b5-4081-810b-e79d625fc7dc
[2]+  Exit 1                  python experiment.py 6f42b83d-f4b5-4081-810b-e79d625fc7dc

Here, experiment.py is as follows:

import sys

from pfnopt import Study, minimize
from pfnopt.storage import RDBStorage


def objective(client):

    x = client.sample_uniform('x', -10000, 10000)
    y = client.sample_uniform('y', -10000, 10000)

    return (x - 2) ** 2 + y


if __name__ == '__main__':

    url = 'postgresql://user:password@dburl/dbname'  # modify here
    study_uuid = sys.argv[1]

    storage = RDBStorage(url)
    study = Study(study_uuid, storage)

    minimize(objective, study=study, storage=storage, n_trials=1000)
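The traceback shows a check-then-insert race: both processes see that `x` is not registered yet, both insert it, and the later `.one()` query finds two rows. One common way to close that window is to let the database enforce uniqueness and make the insert idempotent. The sketch below uses stdlib `sqlite3` with a hypothetical `study_params` table; it is not pfnopt's actual schema or fix. `INSERT OR IGNORE` is SQLite syntax; PostgreSQL's equivalent is `INSERT ... ON CONFLICT DO NOTHING`.

```python
import os
import sqlite3
import tempfile


def create_schema(conn: sqlite3.Connection) -> None:
    # The UNIQUE constraint makes the database, not the client, reject duplicates.
    with conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS study_params (
                id INTEGER PRIMARY KEY,
                study_id INTEGER NOT NULL,
                param_name TEXT NOT NULL,
                distribution_json TEXT NOT NULL,
                UNIQUE (study_id, param_name)
            )
            """
        )


def set_study_param_distribution(conn, study_id, param_name, distribution_json):
    """Hypothetical idempotent registration of a parameter distribution."""
    with conn:  # commits on success, rolls back on error
        # A second writer's insert is silently ignored instead of creating a duplicate row.
        conn.execute(
            "INSERT OR IGNORE INTO study_params (study_id, param_name, distribution_json) "
            "VALUES (?, ?, ?)",
            (study_id, param_name, distribution_json),
        )
    (stored,) = conn.execute(
        "SELECT distribution_json FROM study_params WHERE study_id = ? AND param_name = ?",
        (study_id, param_name),
    ).fetchone()
    # Whoever inserted first wins; a conflicting definition is reported, not duplicated.
    if stored != distribution_json:
        raise ValueError(f"{param_name!r} is already registered as {stored}")
    return stored


# Two connections to one database file stand in for the two processes.
path = os.path.join(tempfile.mkdtemp(), "study.db")
conn_a = sqlite3.connect(path)
conn_b = sqlite3.connect(path)
create_schema(conn_a)

dist = '{"name": "UniformDistribution", "low": -10000, "high": 10000}'
set_study_param_distribution(conn_a, 1, "x", dist)
set_study_param_distribution(conn_b, 1, "x", dist)

count = conn_a.execute("SELECT COUNT(*) FROM study_params").fetchone()[0]
print(count)  # 1
```

Because the lookup after the insert always finds exactly one row, the `.one()`-style query can no longer hit `MultipleResultsFound`.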

How to make use of existing experimental results?

Hi! I want to search for the best hyperparameters with Optuna, but I don't want to start from scratch, because I have already run some experiments with manually set hyperparameters and some of them got good results. Is it possible to write these hyperparameter settings and their corresponding results to Optuna's storage, and then start Optuna to continue searching with the help of these completed experiments?

For example, can I do something like optuna import --path /some/existent/experiments/path, and then optuna study optimize?

On the other hand, I think I currently have a workaround: I set each manually chosen hyperparameter as the only possible value for the suggest_* functions, which write the hyperparameters to the storage. After this stage, I change suggest_* to the broad range I want to search and start Optuna. Is this workable?

For example, if the word dimension was manually set to 100, I use trial.suggest_int("word_dim", 100, 101) to store this setting.

`test_minimize_timeout` sometimes fails under multi-thread settings

________________________ test_minimize_timeout[None--1] ________________________

n_trials = None, n_jobs = -1

    @pytest.mark.parametrize('n_trials, n_jobs', itertools.product(
        (1, 2, 50, None),  # n_trials
        (1, 2, 10, -1),  # n_jobs
    ))
    def test_minimize_timeout(n_trials, n_jobs):
        # type: (int, int) -> None
    
        sleep_sec = 0.1
        timeout_sec = 1.0
    
        f = Func(sleep_sec=sleep_sec)
        study = pfnopt.minimize(f, n_trials=n_trials, n_jobs=n_jobs, timeout_seconds=timeout_sec)
    
>       assert f.n_calls == len(study.trials)
E       assert 37 == 40
E        +  where 37 = <tests.test_study.Func object at 0x7f893ddc9f28>.n_calls
E        +  and   40 = len([<pfnopt.trial.Trial object at 0x7f893e69c780>, <pfnopt.trial.Trial object at 0x7f893e69c668>, <pfnopt.trial.Trial obj... at 0x7f893e69ca90>, <pfnopt.trial.Trial object at 0x7f893e69c0f0>, <pfnopt.trial.Trial object at 0x7f893e69c2b0>, ...])
E        +    where [<pfnopt.trial.Trial object at 0x7f893e69c780>, <pfnopt.trial.Trial object at 0x7f893e69c668>, <pfnopt.trial.Trial obj... at 0x7f893e69ca90>, <pfnopt.trial.Trial object at 0x7f893e69c0f0>, <pfnopt.trial.Trial object at 0x7f893e69c2b0>, ...] = <pfnopt.study.Study object at 0x7f893ddc98d0>.trials

tests/test_study.py:99: AssertionError
===================== 1 failed, 38 passed in 14.17 seconds =====================
Exited with code 1
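The assertion compares the objective's own call counter (37) with the number of trials the study recorded (40). The `Func` helper is not shown in the issue, but if it increments its counter with a plain `self.n_calls += 1`, one thing worth ruling out is that this read-modify-write is not atomic across threads. A minimal sketch of a lock-protected counter, using a hypothetical `CountingFunc` class and a thread pool in place of the study:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CountingFunc:
    """Hypothetical objective that counts its calls safely across threads."""

    def __init__(self, sleep_sec=None):
        self.n_calls = 0
        self.sleep_sec = sleep_sec
        self._lock = threading.Lock()

    def __call__(self, trial):
        # `self.n_calls += 1` is a read-modify-write, so guard it with a lock.
        with self._lock:
            self.n_calls += 1
        if self.sleep_sec is not None:
            time.sleep(self.sleep_sec)
        return 1.0


# Call the objective concurrently from 10 threads (the trial argument is unused here).
f = CountingFunc()
with ThreadPoolExecutor(max_workers=10) as executor:
    list(executor.map(f, range(1000)))

print(f.n_calls)  # 1000
```

If the mismatch persists with a locked counter, the gap may instead come from the study side, for example trials being registered around the timeout boundary without the objective being called.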

Visualization not working

Hello.
I ran into an error with the following code:

import optuna

study = optuna.create_study()
...
optuna.visualization.plot_intermediate_values(study)

This shows:

AttributeError                            Traceback (most recent call last)
<ipython-input-11-04cb7fde5fca> in <module>
----> 1 optuna.visualization.plot_intermediate_values(study)

AttributeError: module 'optuna' has no attribute 'visualization'

Do I need to do anything extra to use visualization?

Document or user-friendly error message about psycopg2 installation

Without psycopg2 or psycopg2-binary installed, RDBStorage does not work with PostgreSQL.

$ pfnopt mkstudy --url postgresql://postgres:mysecretpassword@localhost:15432/postgres
No module named 'psycopg2'

Even though psycopg2 is not going to be added as a dependency, we had better clarify in the documentation or the error message that users need an additional pip install to integrate with PostgreSQL.
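A sketch of the kind of user-friendly error message the issue asks for. `check_postgresql_driver` is a hypothetical helper, not part of pfnopt or Optuna; it assumes SQLAlchemy's convention that a bare `postgresql://` URL uses the psycopg2 driver, and it checks for the module before any connection is attempted.

```python
import importlib.util


def check_postgresql_driver(url: str) -> None:
    """Hypothetical pre-flight check that explains how to install the PostgreSQL driver."""
    scheme = url.split("://", 1)[0]
    dialect, _, driver = scheme.partition("+")
    if dialect != "postgresql":
        return  # other backends (e.g. sqlite) are not checked here
    # SQLAlchemy defaults to psycopg2 when no "+driver" suffix is given.
    driver = driver or "psycopg2"
    if driver == "psycopg2" and importlib.util.find_spec("psycopg2") is None:
        raise ImportError(
            "RDBStorage with PostgreSQL requires the psycopg2 driver. "
            "Install it with `pip install psycopg2-binary` (or `pip install psycopg2`)."
        )


check_postgresql_driver("sqlite:///example.db")  # no-op for non-PostgreSQL URLs
```

Both `psycopg2` and `psycopg2-binary` install a module named `psycopg2`, which is why a single `find_spec` lookup covers either package.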
