xadrianzetx / optuna-distributed
Distributed hyperparameter optimization made easy
License: MIT License
import optuna
import optuna_distributed

# client = Client("<your.cluster.scheduler.address>")  # Enables distributed optimization.
client = None  # Enables local asynchronous optimization.
study = optuna_distributed.from_study(optuna.create_study(direction="maximize"), client=client)
study.optimize(lambda trial: run_algo(trial, df), n_trials=10)  # run_algo and df are user-defined
print(study.best_params)

from optuna.visualization import plot_optimization_history

fig = plot_optimization_history(study)
fig.show()
As a workaround I did:
class DistributedStudy(Study):
Describe the bug
Continuing optimization from pickled distributed study with multiple objectives raises error.
To Reproduce
Expected behavior
Expected behavior is continuation of study.
Additional context
The error looks like this:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 65, in run
message.process(self.study, self.manager)
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/messages/suggest.py", line 44, in process
trial = Trial(study, self._trial_id)
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna/trial/_trial.py", line 53, in __init__
self._study_id = self.study._study_id
AttributeError: 'DistributedStudy' object has no attribute '_study_id'. Did you mean: '_study'?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/study.py", line 192, in optimize
event_loop.run(terminal, timeout, catch)
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 72, in run
self._fail_unfinished_trials()
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 94, in _fail_unfinished_trials
self.study._storage.set_trial_state_values(trial._trial_id, TrialState.FAIL)
AttributeError: 'DistributedStudy' object has no attribute '_storage'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/repos/private/project/optimize_simulation.py", line 178, in <module>
study.optimize(objective, n_trials=args.trials, n_jobs=args.n_jobs)
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/study.py", line 205, in optimize
self._study._storage.remove_session()
AttributeError: 'DistributedStudy' object has no attribute '_storage'
I'm on Ubuntu.
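The tracebacks above all point at missing attribute delegation: DistributedStudy wraps an optuna Study but does not forward private attributes such as _study_id and _storage to it. A minimal sketch of the delegation idea behind the workaround (names are hypothetical, not the package's actual implementation):

```python
class StudyProxy:
    """Forward unknown attribute lookups to a wrapped study object.

    Sketch of the workaround idea: attributes not found on the proxy
    itself are delegated to the underlying study, so lookups such as
    _study_id or _storage resolve against the real Study instance.
    """

    def __init__(self, study):
        # Bypass __getattr__/__setattr__ machinery during construction.
        object.__setattr__(self, "_study", study)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails on the proxy.
        return getattr(self._study, name)
```

With this in place, `proxy._study_id` resolves to the wrapped study's `_study_id` instead of raising AttributeError.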
Is your feature request related to a problem? Please describe.
Optuna started supporting Python 3.11 in release 3.1.0, so we should follow.
Describe the solution you'd like
Add Python 3.11 to pyproject.toml and CI.
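The CI side of this could look like the following GitHub Actions matrix entry (a hypothetical fragment; the exact workflow layout and supported version list may differ):

```yaml
strategy:
  matrix:
    # Add "3.11" alongside the versions already tested.
    python-version: ["3.8", "3.9", "3.10", "3.11"]
```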
Describe alternatives you've considered
Additional context
Is your feature request related to a problem? Please describe.
Currently the timeout argument in optimize exists but has no effect. This functionality should be implemented to close the gap with Optuna's behavior.
Describe the solution you'd like
Implement functionality similar to Optuna's timeout.
Describe alternatives you've considered
Additional context
This requires a fully functional DistributedManager.stop_optimization (#14).
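One way a timeout could be enforced is a wall-clock check on every pass through the event loop. A minimal, backend-agnostic sketch (function and return values are hypothetical, not the package's API):

```python
import time


def run_event_loop(process_message, timeout=None):
    """Sketch of a timeout-aware event loop.

    Processes messages until there are none left or the optional
    timeout elapses, mirroring Optuna's optimize(timeout=...) which
    caps total wall-clock time rather than per-trial time.
    """
    start = time.monotonic()
    while True:
        if timeout is not None and time.monotonic() - start >= timeout:
            return "timed-out"  # would trigger cleanup / stop_optimization
        if not process_message():
            return "finished"
```

On "timed-out", the real implementation would still need stop_optimization to cancel in-flight work, which is why #14 is a prerequisite.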
Any plans to add optuna.visualization plotting support?
Describe the bug
I can't run the sample code in README.md. After running it, I get:
python optuna-distributed.py
[I 2023-01-14 19:19:16,896] A new study created in memory with name: no-name-50b57926-30b7-4a96-82d8-35a8a0f13503
Traceback (most recent call last):
File "~/project/optuna-distributed.py", line 21, in <module>
study.optimize(objective, n_trials=10)
File "~/project/venv/lib/python3.9/site-packages/optuna_distributed/study.py", line 184, in optimize
event_loop.run(terminal, timeout, catch)
File "~/project/venv/lib/python3.9/site-packages/optuna_distributed/eventloop.py", line 61, in run
self.manager.create_futures(self.study, self.objective)
File "project/venv/lib/python3.9/site-packages/optuna_distributed/managers/local.py", line 63, in create_futures
p.start()
File "~/miniforge3/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "~/miniforge3/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "~/miniforge3/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "~/miniforge3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "~/miniforge3/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "~/miniforge3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "~/miniforge3/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_distributable.<locals>._wrapper'
To Reproduce
I have an M1 Apple silicon Mac Mini. I created a Python 3.9 virtual environment, created a file called optuna-distributed.py, and added the code:
import random
import time

import optuna
import optuna_distributed
from dask.distributed import Client


def objective(trial):
    x = trial.suggest_float("x", -100, 100)
    y = trial.suggest_categorical("y", [-1, 0, 1])
    # Some expensive model fit happens here...
    time.sleep(random.uniform(1.0, 2.0))
    return x**2 + y


if __name__ == "__main__":
    # client = Client("<your.cluster.scheduler.address>")  # Enables distributed optimization.
    client = None  # Enables local asynchronous optimization.
    study = optuna_distributed.from_study(optuna.create_study(), client=client)
    study.optimize(objective, n_trials=10)
    print(study.best_value)
Expected behavior
To get logs and the study.best_value printed to the console.
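The AttributeError in the traceback stems from macOS (and Windows) defaulting to the "spawn" multiprocessing start method, which pickles the process target; functions defined inside another function are pickled by reference and that reference cannot be resolved. A minimal standalone repro of the underlying Python behavior, independent of this package:

```python
import pickle


def make_wrapper():
    # A function defined inside another function, like the
    # '_distributable.<locals>._wrapper' closure in the traceback.
    def _wrapper():
        return 42

    return _wrapper


try:
    pickle.dumps(make_wrapper())
    print("picklable")
except (AttributeError, pickle.PicklingError):
    # Same failure mode as "Can't pickle local object ..." above.
    print("unpicklable")
```

Under the "fork" start method (the Linux default at the time) the target is never pickled, which is why the README sample runs on Ubuntu but not here.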
Is your feature request related to a problem? Please describe.
Cleanup after timeouts or interrupts is problematic when LocalCluster is used, and it gets even more convoluted when we consider other environments such as Jupyter notebooks.
Describe the solution you'd like
All these problems can be avoided if we default to the multiprocessing backend when the user only wants local asynchronous optimization.
Describe alternatives you've considered
Keep the current design and cover new edge cases in distributed optimization manager.
Additional context
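The proposal amounts to a backend switch on whether a Dask client was supplied. A trivial sketch of that dispatch (hypothetical helper, not the package's actual code):

```python
def choose_backend(client):
    """Pick an optimization backend based on the supplied client.

    No Dask client means the user only wants local asynchronous
    optimization, so fall back to a multiprocessing-based manager;
    otherwise use the distributed Dask-backed manager.
    """
    if client is None:
        return "multiprocessing"
    return "dask"
```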
Is your feature request related to a problem? Please describe.
Following Optuna's log message changes in the 3.1.0 release.
Describe the solution you'd like
In our case, the log line that needs to be adjusted to match Optuna is in messages/failed.py.
Describe alternatives you've considered
Additional context
optuna/optuna#3857
Is your feature request related to a problem? Please describe.
Currently manager.stop_optimization will only cancel scheduled trials, but is unable to stop already running ones. This is a limitation of Dask (or really of Python itself), as described in this Stack Overflow thread.
Describe the solution you'd like
Make stop_optimization emit a signal to all running trials (e.g. using a Dask Variable) and make sure each trial periodically checks for this signal, since we can't assume users will do it manually in objective functions. If a trial reads the stop signal, optimization should be halted immediately.
Describe alternatives you've considered
Do nothing and wait for Dask to implement thread interrupts as discussed in dask/distributed#4694.
Additional context
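Since running threads cannot be interrupted from outside, the signal has to be polled cooperatively: workers check a shared flag between units of work. A local sketch of that pattern using threading.Event (in a Dask deployment, dask.distributed.Variable could play the flag's role across workers):

```python
import threading


def run_trial(stop_flag, n_steps):
    """Run a trial in small steps, polling the stop flag between steps.

    Sketch of cooperative cancellation: the trial itself checks the
    shared flag, so the caller does not need to interrupt the thread.
    """
    for step in range(n_steps):
        if stop_flag.is_set():
            return ("stopped", step)
        # ... one unit of the objective's work would happen here ...
    return ("completed", n_steps)


stop = threading.Event()
stop.set()  # simulate stop_optimization emitting the signal
print(run_trial(stop, 100))  # halts before doing any work
```

The check would live in the trial wrapper rather than user code, matching the point above that users can't be expected to poll manually.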
Sometimes optimization continues past n_trials. This only happens with the local manager, so it is most likely related to the multiprocessing pool controller.
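One way such an overrun could be prevented is to guard dispatch with the global trial budget, so the controller never starts work past n_trials. A hypothetical sketch, not the package's actual controller:

```python
def dispatch_trials(n_trials, already_started, start_trial):
    """Start new trials only while the global budget allows (sketch).

    'already_started' counts trials handed to the pool so far; the
    guard stops the controller from scheduling past n_trials even if
    it is invoked again while earlier trials are still running.
    """
    started = 0
    while already_started + started < n_trials:
        start_trial()
        started += 1
    return started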
Is your feature request related to a problem? Please describe.
Dask dropped Python 3.7 in 2022.02.1. Since that release is quite old now, it makes sense to stop supporting 3.7 here as well, especially since its EOL is only a few months away.
Describe the solution you'd like
Drop Python 3.7 from pyproject.toml, adjust the classifiers, adjust the README, remove typing-extensions, and drop 3.7 from the CI matrix.
Describe alternatives you've considered
Additional context
Is your feature request related to a problem? Please describe.
Since setup.py is considered legacy, we should switch completely to pyproject.toml.
Describe the solution you'd like
Build requirements are moved to pyproject.toml, and setup.py is removed.
Describe alternatives you've considered
Additional context
https://peps.python.org/pep-0518/
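Per PEP 518, the build requirements end up in a build-system table. A minimal sketch, assuming a setuptools backend:

```toml
[build-system]
# setuptools>=61 supports PEP 621 project metadata in pyproject.toml.
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
```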
Is your feature request related to a problem? Please describe.
Currently the catch argument in optimize exists but has no effect. This functionality should be implemented to close the gap with Optuna's behavior.
Describe the solution you'd like
Implement functionality similar to Optuna's catch.
Describe alternatives you've considered
Additional context
https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html#optuna.study.Study.optimize
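In Optuna, catch=(SomeError,) lets optimization continue when the objective raises one of the listed exception types, marking only that trial as failed. A backend-agnostic sketch of those semantics (hypothetical loop, not the package's implementation):

```python
def optimize_with_catch(objective, n_trials, catch=()):
    """Run trials, tolerating exceptions listed in 'catch' (sketch).

    Mirrors Optuna's semantics: a caught exception fails the single
    trial and the loop continues; any other exception propagates and
    aborts the whole optimization.
    """
    states = []
    for trial_number in range(n_trials):
        try:
            objective(trial_number)
            states.append("COMPLETE")
        except catch:
            # Trial fails, optimization continues with the next one.
            states.append("FAIL")
    return states
```

Note that the default catch=() catches nothing, matching Optuna's default of letting objective exceptions abort the run.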
Hi Adrian,
Thanks for this work.
I am using your API, but I cannot configure the Optuna log level. Every time I run a study, a lot of messages for each trial fill the log, no matter whether I previously configured it:
optuna.logging.disable_default_handler()
optuna.logging.set_verbosity(optuna.logging.WARNING)
optuna.logging.disable_propagation()
Is there a way to avoid this?
Thanks in advance
Aura
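A standard-library way to raise the threshold, assuming the messages go through the "optuna" logger:

```python
import logging

# Assumption: per-trial messages are emitted through the standard
# "optuna" logger; raising its level suppresses INFO-level output.
logging.getLogger("optuna").setLevel(logging.WARNING)
```

One caveat worth checking here: if trials run in spawned worker processes, they do not inherit the parent's logger configuration, so configuring logging in the main script may have no effect on messages produced inside workers; that would explain why the calls above appear to be ignored.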
Dear Adrian Zuber,
I would like to ask whether optuna-distributed could be used with MySQL storage on an HPC cluster. I am currently trying to parallelize my optimization by submitting multiple processes to Slurm, which usually comes with concurrency overhead. Having recently discovered other approaches (thread-based vs. process-based, InMemory storage vs. RDB storage, dask-optuna vs. optuna-distributed, etc.), I wonder which parallelization configuration you consider best. I would highly appreciate it if you could share your experience on this.
Best regards
Ábel
We want to make sure releases of dependencies do not cause any regressions.
Is your feature request related to a problem? Please describe.
Currently logging is handled by Optuna's logger. This introduces an unnecessary dependency and causes some minor bugs (e.g. no logs for pruned trials). Logging should be handled via Python builtins to avoid these problems.
Describe the solution you'd like
Use the builtin logging APIs to handle all logging within the package.
Describe alternatives you've considered
Additional context
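The conventional standard-library pattern for this is a module-level logger with a NullHandler, so the package stays silent unless the application configures logging. A sketch (logger name assumed, helper hypothetical):

```python
import logging

# Package-level logger; NullHandler keeps the library quiet unless
# the application opts in by configuring logging itself.
logger = logging.getLogger("optuna_distributed")
logger.addHandler(logging.NullHandler())


def report_trial_finished(number):
    # Library code logs only through the module logger, with lazy
    # %-formatting so the string is built only if emitted.
    logger.info("Trial %d finished.", number)
```

This also removes the dependency on Optuna's logging setup, so messages for pruned trials can be emitted like any others.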
Is your feature request related to a problem? Please describe.
Following Optuna API changes in the 3.1.0 release.
Describe the solution you'd like
Describe alternatives you've considered
Additional context
optuna/optuna#4098
Describe the bug
Currently the license table in pyproject.toml is misconfigured, resulting in a cluttered meta section on PyPI (see additional context).
To Reproduce
Visit https://pypi.org/project/optuna-distributed/ :^)
Expected behavior
The license name is displayed without its full contents, just like on https://pypi.org/project/optuna/.
Additional context
License table spec
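One way the table might be corrected, assuming setuptools' PEP 621 handling (where pointing the table at a file embeds the entire file contents into the package metadata, which is what clutters the PyPI page):

```toml
[project]
# Short text keeps PyPI's meta section to a single line;
# license = { file = "LICENSE" } would embed the whole file.
license = { text = "MIT License" }
classifiers = ["License :: OSI Approved :: MIT License"]
```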
Typing is quite lax in places, which should be fixed as soon as possible to avoid piling up technical debt.