optuna-distributed's People

Contributors

xadrianzetx

optuna-distributed's Issues

Can't use `plot_optimization_history`

import optuna
import optuna_distributed
from optuna.visualization import plot_optimization_history

client = None  # Enables local asynchronous optimization.
study = optuna_distributed.from_study(optuna.create_study(direction="maximize"), client=client)
study.optimize(lambda trial: run_algo(trial, df), n_trials=10)  # run_algo and df are my own function and data
print(study.best_params)
fig = plot_optimization_history(study)
fig.show()

As a workaround I made the wrapper subclass Study:

class DistributedStudy(Study):

Continuation of distributed study from pickled study

Describe the bug
Continuing optimization from a pickled distributed study with multiple objectives raises an error.

To Reproduce

  1. Define a multi-objective function
  2. Create a distributed study with n_jobs > 1
  3. Save it to a pickle file using joblib
  4. Load it using joblib
  5. Create a distributed study from the loaded study
  6. Call optimize again with n_jobs > 1
  7. Get the error below

Expected behavior
The study should continue optimizing from where it left off.

Additional context
The error looks like this:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 65, in run
    message.process(self.study, self.manager)
  File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/messages/suggest.py", line 44, in process
    trial = Trial(study, self._trial_id)
  File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna/trial/_trial.py", line 53, in __init__
    self._study_id = self.study._study_id
AttributeError: 'DistributedStudy' object has no attribute '_study_id'. Did you mean: '_study'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/study.py", line 192, in optimize
    event_loop.run(terminal, timeout, catch)
  File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 72, in run
    self._fail_unfinished_trials()
  File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 94, in _fail_unfinished_trials
    self.study._storage.set_trial_state_values(trial._trial_id, TrialState.FAIL)
AttributeError: 'DistributedStudy' object has no attribute '_storage'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/repos/private/project/optimize_simulation.py", line 178, in <module>
    study.optimize(objective, n_trials=args.trials, n_jobs=args.n_jobs)
  File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/study.py", line 205, in optimize
    self._study._storage.remove_session()
AttributeError: 'DistributedStudy' object has no attribute '_storage'

I'm on Ubuntu.

Support Python 3.11

Is your feature request related to a problem? Please describe.
Optuna started supporting Python 3.11 in version 3.1.0, so we should follow suit.

Describe the solution you'd like
Add Python 3.11 to pyproject.toml and to the CI matrix.

Describe alternatives you've considered

Additional context

Implement `timeout` argument in `DistributedStudy.optimize`

Is your feature request related to a problem? Please describe.
Currently the timeout argument of optimize exists but has no effect. This functionality should be implemented to close the gap with Optuna's behavior.

Describe the solution you'd like
Implement functionality similar to Optuna's timeout.

Describe alternatives you've considered

Additional context
This requires fully functional DistributedManager.stop_optimization (#14).
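The intended semantics can be sketched with stdlib primitives. The helper below is purely illustrative (its name and structure are assumptions, not the actual optuna-distributed event loop): once the deadline passes, no new trials are scheduled.

```python
import time


def optimize_with_timeout(objective, n_trials, timeout=None):
    # Hypothetical sketch of `timeout` semantics: stop scheduling new
    # trials once the deadline has passed. Not the real event loop.
    start = time.monotonic()
    results = []
    for i in range(n_trials):
        if timeout is not None and time.monotonic() - start >= timeout:
            break  # deadline reached: stop scheduling new trials
        results.append(objective(i))
    return results


# A fast objective finishes all trials well within a generous deadline.
print(len(optimize_with_timeout(lambda i: i * i, n_trials=5, timeout=10.0)))  # → 5
```

Stopping trials that are already running is the harder half of the problem, which is why this depends on `stop_optimization` (#14).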

AttributeError: Can't pickle local object '_distributable.<locals>._wrapper'

Describe the bug
I can't run the sample code from the README.md. After running it I get:

python optuna-distributed.py
[I 2023-01-14 19:19:16,896] A new study created in memory with name: no-name-50b57926-30b7-4a96-82d8-35a8a0f13503
Traceback (most recent call last):
  File "~/project/optuna-distributed.py", line 21, in <module>
    study.optimize(objective, n_trials=10)
  File "~/project/venv/lib/python3.9/site-packages/optuna_distributed/study.py", line 184, in optimize
    event_loop.run(terminal, timeout, catch)
  File "~/project/venv/lib/python3.9/site-packages/optuna_distributed/eventloop.py", line 61, in run
    self.manager.create_futures(self.study, self.objective)
  File "project/venv/lib/python3.9/site-packages/optuna_distributed/managers/local.py", line 63, in create_futures
    p.start()
  File "~/miniforge3/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "~/miniforge3/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "~/miniforge3/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "~/miniforge3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "~/miniforge3/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "~/miniforge3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "~/miniforge3/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_distributable.<locals>._wrapper'

To Reproduce
I have an M1 (Apple silicon) Mac mini. I created a Python 3.9 virtual environment, created a file called optuna-distributed.py, and added the code:

import random
import time

import optuna
import optuna_distributed
from dask.distributed import Client


def objective(trial):
    x = trial.suggest_float("x", -100, 100)
    y = trial.suggest_categorical("y", [-1, 0, 1])
    # Some expensive model fit happens here...
    time.sleep(random.uniform(1.0, 2.0))
    return x**2 + y


if __name__ == "__main__":
    # client = Client("<your.cluster.scheduler.address>")  # Enables distributed optimization.
    client = None  # Enables local asynchronous optimization.
    study = optuna_distributed.from_study(optuna.create_study(), client=client)
    study.optimize(objective, n_trials=10)
    print(study.best_value)

Expected behavior
To get logs and the study.best_value printed in the console.
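The traceback points at a general CPython limitation rather than anything macOS-specific: the `spawn` start method (the default on macOS) must pickle the process target, and a function defined inside another function cannot be pickled. A minimal stdlib demonstration (`make_wrapper` is a hypothetical stand-in for `_distributable`):

```python
import pickle


def make_wrapper():
    # Mimics `_distributable`, which returns a locally defined closure.
    def _wrapper():
        return 42

    return _wrapper


try:
    pickle.dumps(make_wrapper())
except AttributeError as exc:
    # e.g. "Can't pickle local object 'make_wrapper.<locals>._wrapper'"
    print(type(exc).__name__)  # → AttributeError
```

On Linux the default `fork` start method sidesteps pickling entirely, which is why the same script can behave differently there.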

Default to `multiprocessing` backend when `LocalCluster` is used

Is your feature request related to a problem? Please describe.
Cleanup after timeouts or interrupts is problematic when LocalCluster is used, and it gets even more convoluted when we consider other environments such as Jupyter notebooks.

Describe the solution you'd like
All of these problems can be avoided if we default to the multiprocessing backend when the user only wants local asynchronous optimization.

Describe alternatives you've considered
Keep the current design and cover the new edge cases in the distributed optimization manager.

Additional context

Send kill signal to running trials when `stop_optimization` is called

Is your feature request related to a problem? Please describe.
Currently manager.stop_optimization will only cancel scheduled trials, but is unable to stop already running ones. This is a limitation of Dask (or really of Python itself), as described in this Stack Overflow thread.

Describe the solution you'd like
Make stop_optimization emit a signal to all running trials (e.g. using a Dask Variable) and make sure each trial periodically checks for this signal, as we can't assume users will do it manually in objective functions. If a trial reads the stop signal, optimization should be halted immediately.
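The cooperative pattern described above can be sketched with stdlib primitives, with a `threading.Event` standing in for the shared Dask signal (illustrative only; names and the polling granularity are assumptions):

```python
import threading
import time

stop_requested = threading.Event()  # stands in for a shared Dask Variable


def trial_loop():
    # The trial runner polls the stop signal between units of work,
    # since user objectives cannot be assumed to check it themselves.
    while not stop_requested.is_set():
        time.sleep(0.01)  # one unit of "trial" work


worker = threading.Thread(target=trial_loop)
worker.start()
time.sleep(0.05)
stop_requested.set()  # stop_optimization emits the signal
worker.join()
print("stopped cleanly:", not worker.is_alive())  # → stopped cleanly: True
```

The key design point is that the check happens in the trial runner, not in the user's objective, so cancellation works regardless of how the objective is written.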

Describe alternatives you've considered
Do nothing and wait for Dask to implement thread interrupts, as discussed in dask/distributed#4694.

Additional context

Drop support for Python 3.7

Is your feature request related to a problem? Please describe.
Dask dropped Python 3.7 in 2022.02.1. Since that release is quite old now, it makes sense to stop supporting 3.7 here as well, especially since its EOL is only a few months away.

Describe the solution you'd like
Drop Python 3.7 from pyproject.toml, adjust the classifiers, adjust the README, remove typing-extensions, and drop 3.7 from the CI matrix.

Describe alternatives you've considered

Additional context

Switch to PEP 518 style distribution

Is your feature request related to a problem? Please describe.
Since setup.py is considered legacy, we should switch completely to pyproject.toml.

Describe the solution you'd like
Build requirements are moved to pyproject.toml and setup.py is removed.
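A minimal PEP 518-style sketch of what would replace setup.py (illustrative only; the actual metadata and dependency list would be carried over from the existing setup.py):

```toml
# Illustrative fragment; real metadata comes from setup.py.
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "optuna-distributed"
requires-python = ">=3.8"
```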

Describe alternatives you've considered

Additional context
https://peps.python.org/pep-0518/

Implement `catch` argument in `DistributedStudy.optimize`

Is your feature request related to a problem? Please describe.
Currently the catch argument of optimize exists but has no effect. This functionality should be implemented to close the gap with Optuna's behavior.

Describe the solution you'd like
Implement functionality similar to Optuna's catch.
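The intended semantics can be sketched in plain Python (a hypothetical helper mirroring `optuna.study.Study.optimize`; not the actual implementation): exceptions listed in `catch` mark the trial as failed but let the study continue, while anything else propagates.

```python
def run_trials(objective, n_trials, catch=()):
    # Hypothetical sketch of `catch` semantics: listed exception types
    # fail the trial but keep the study going; others propagate.
    states = []
    for i in range(n_trials):
        try:
            objective(i)
            states.append("COMPLETE")
        except catch:
            states.append("FAIL")
    return states


def flaky(i):
    # Hypothetical objective that fails on its second trial.
    if i == 1:
        raise ValueError("bad trial")


print(run_trials(flaky, 3, catch=(ValueError,)))  # → ['COMPLETE', 'FAIL', 'COMPLETE']
```

With the default `catch=()` nothing is caught, matching Optuna's default of propagating any objective exception.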

Describe alternatives you've considered

Additional context
https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html#optuna.study.Study.optimize

Log level

Hi Adrian,

Thanks for this work.

I am using your API, but I cannot configure the Optuna log level. Every time I run a study, a lot of messages, one per trial, fill the log no matter whether I configure it beforehand with:

optuna.logging.disable_default_handler()
optuna.logging.set_verbosity(optuna.logging.WARNING)
optuna.logging.disable_propagation()

Is there a way to avoid this?

Thanks in advance
Aura
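For reference, the same suppression can be attempted through the Python stdlib directly. This is only a sketch: whether it silences optuna-distributed's own messages depends on which logger they are written to, which is exactly what the "Use builtin logging" issue below is about.

```python
import logging

# Raise the threshold on the "optuna" logger via the stdlib API.
logging.getLogger("optuna").setLevel(logging.WARNING)

# INFO records are now filtered out on that logger.
print(logging.getLogger("optuna").isEnabledFor(logging.INFO))  # → False
```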

Optuna-distributed in HPC environment

Dear Adrian Zuber,

I would like to ask whether optuna-distributed can be used with MySQL storage on an HPC cluster. I am currently trying to parallelize my optimization by submitting multiple processes to Slurm, which usually comes with some concurrency overhead. Having recently discovered other approaches (thread-based vs. process-based, in-memory vs. RDB storage, dask-optuna vs. optuna-distributed, etc.), I wonder which parallelization configuration you consider best. I would highly appreciate it if you could share your experience on this.

Best regards
Ábel

Use builtin logging

Is your feature request related to a problem? Please describe.
Currently logging is handled by Optuna's logger. This introduces an unnecessary dependency and causes some minor bugs (e.g. no logs for pruned trials). Logging should be handled via Python builtins to avoid these problems.

Describe the solution you'd like
Use the builtin logging APIs to handle all logging within the package.

Describe alternatives you've considered

Additional context

Enforce mypy in CI

Typing is quite lax in places; this should be fixed as soon as possible to avoid piling up technical debt.
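A minimal CI step could look like the following (illustrative GitHub Actions fragment; the step name and invocation are assumptions, and flags would be tightened as the annotations improve):

```yaml
# Illustrative fragment for the existing CI workflow.
- name: Run mypy
  run: |
    pip install mypy
    mypy optuna_distributed
```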
