xadrianzetx / optuna-distributed
Distributed hyperparameter optimization made easy
License: MIT License
import optuna
import optuna_distributed

# client = Client("<your.cluster.scheduler.address>")  # Enables distributed optimization.
client = None  # Enables local asynchronous optimization.
study = optuna_distributed.from_study(optuna.create_study(direction="maximize"), client=client)
study.optimize(lambda trial: run_algo(trial, df), n_trials=10)  # run_algo and df are user-defined
print(study.best_params)

from optuna.visualization import plot_optimization_history

fig = plot_optimization_history(study)
fig.show()
As a workaround I did:
class DistributedStudy(Study):
Describe the bug
Continuing optimization from pickled distributed study with multiple objectives raises error.
To Reproduce
Expected behavior
Expected behavior is continuation of study.
Additional context
The error looks like this:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 65, in run
message.process(self.study, self.manager)
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/messages/suggest.py", line 44, in process
trial = Trial(study, self._trial_id)
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna/trial/_trial.py", line 53, in __init__
self._study_id = self.study._study_id
AttributeError: 'DistributedStudy' object has no attribute '_study_id'. Did you mean: '_study'?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/study.py", line 192, in optimize
event_loop.run(terminal, timeout, catch)
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 72, in run
self._fail_unfinished_trials()
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/eventloop.py", line 94, in _fail_unfinished_trials
self.study._storage.set_trial_state_values(trial._trial_id, TrialState.FAIL)
AttributeError: 'DistributedStudy' object has no attribute '_storage'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/repos/private/project/optimize_simulation.py", line 178, in <module>
study.optimize(objective, n_trials=args.trials, n_jobs=args.n_jobs)
File "/home/user/miniconda3/envs/bot/lib/python3.10/site-packages/optuna_distributed/study.py", line 205, in optimize
self._study._storage.remove_session()
AttributeError: 'DistributedStudy' object has no attribute '_storage'
I'm on Ubuntu.
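The tracebacks above all point at missing attribute delegation: DistributedStudy wraps an optuna Study but does not forward private attributes such as _study_id and _storage to it. A minimal sketch of the delegation idea behind the workaround (names are hypothetical, not the package's actual implementation):

```python
class StudyProxy:
    """Forward unknown attribute lookups to a wrapped study object.

    Sketch of the workaround idea: attributes not found on the proxy
    itself are delegated to the underlying study, so lookups such as
    _study_id or _storage resolve against the real Study instance.
    """

    def __init__(self, study):
        # Bypass __getattr__/__setattr__ machinery during construction.
        object.__setattr__(self, "_study", study)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails on the proxy.
        return getattr(self._study, name)
```

With this in place, `proxy._study_id` resolves to the wrapped study's `_study_id` instead of raising AttributeError.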
Is your feature request related to a problem? Please describe.
Optuna started supporting Python 3.11 in release 3.1.0, so we should follow.
Describe the solution you'd like
Add Python 3.11 to pyproject.toml and CI.
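The CI side of this could look like the following GitHub Actions matrix entry (a hypothetical fragment; the exact workflow layout and supported version list may differ):

```yaml
strategy:
  matrix:
    # Add "3.11" alongside the versions already tested.
    python-version: ["3.8", "3.9", "3.10", "3.11"]
```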
Describe alternatives you've considered
Additional context
Is your feature request related to a problem? Please describe.
Currently the timeout argument in optimize exists but has no effect. This functionality should be implemented to close the gap with Optuna's behavior.
Describe the solution you'd like
Implement functionality similar to Optuna's timeout.
Describe alternatives you've considered
Additional context
This requires a fully functional DistributedManager.stop_optimization (#14).
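One way a timeout could be enforced is a wall-clock check on every pass through the event loop. A minimal, backend-agnostic sketch (function and return values are hypothetical, not the package's API):

```python
import time


def run_event_loop(process_message, timeout=None):
    """Sketch of a timeout-aware event loop.

    Processes messages until there are none left or the optional
    timeout elapses, mirroring Optuna's optimize(timeout=...) which
    caps total wall-clock time rather than per-trial time.
    """
    start = time.monotonic()
    while True:
        if timeout is not None and time.monotonic() - start >= timeout:
            return "timed-out"  # would trigger cleanup / stop_optimization
        if not process_message():
            return "finished"
```

On "timed-out", the real implementation would still need stop_optimization to cancel in-flight work, which is why #14 is a prerequisite.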
Any plans to add optuna.visualization plotting support?
Describe the bug
I can't run the sample code in README.md. After running it, I get:
python optuna-distributed.py
[I 2023-01-14 19:19:16,896] A new study created in memory with name: no-name-50b57926-30b7-4a96-82d8-35a8a0f13503
Traceback (most recent call last):
File "~/project/optuna-distributed.py", line 21, in <module>
study.optimize(objective, n_trials=10)
File "~/project/venv/lib/python3.9/site-packages/optuna_distributed/study.py", line 184, in optimize
event_loop.run(terminal, timeout, catch)
File "~/project/venv/lib/python3.9/site-packages/optuna_distributed/eventloop.py", line 61, in run
self.manager.create_futures(self.study, self.objective)
File "project/venv/lib/python3.9/site-packages/optuna_distributed/managers/local.py", line 63, in create_futures
p.start()
File "~/miniforge3/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "~/miniforge3/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "~/miniforge3/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "~/miniforge3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "~/miniforge3/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "~/miniforge3/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "~/miniforge3/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_distributable.<locals>._wrapper'
To Reproduce
I have an M1 Apple silicon Mac Mini. I created a Python 3.9 virtual environment, created a file called optuna-distributed.py, and added the code:
import random
import time

import optuna
import optuna_distributed
from dask.distributed import Client


def objective(trial):
    x = trial.suggest_float("x", -100, 100)
    y = trial.suggest_categorical("y", [-1, 0, 1])
    # Some expensive model fit happens here...
    time.sleep(random.uniform(1.0, 2.0))
    return x**2 + y


if __name__ == "__main__":
    # client = Client("<your.cluster.scheduler.address>")  # Enables distributed optimization.
    client = None  # Enables local asynchronous optimization.
    study = optuna_distributed.from_study(optuna.create_study(), client=client)
    study.optimize(objective, n_trials=10)
    print(study.best_value)
Expected behavior
To get logs and the study.best_value printed to the console.
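The AttributeError in the traceback stems from macOS (and Windows) defaulting to the "spawn" multiprocessing start method, which pickles the process target; functions defined inside another function are pickled by reference and that reference cannot be resolved. A minimal standalone repro of the underlying Python behavior, independent of this package:

```python
import pickle


def make_wrapper():
    # A function defined inside another function, like the
    # '_distributable.<locals>._wrapper' closure in the traceback.
    def _wrapper():
        return 42

    return _wrapper


try:
    pickle.dumps(make_wrapper())
    print("picklable")
except (AttributeError, pickle.PicklingError):
    # Same failure mode as "Can't pickle local object ..." above.
    print("unpicklable")
```

Under the "fork" start method (the Linux default at the time) the target is never pickled, which is why the README sample runs on Ubuntu but not here.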
Is your feature request related to a problem? Please describe.
Cleanup after timeouts or interrupts is problematic when LocalCluster is used, and it gets even more convoluted when we consider other environments such as Jupyter notebooks.
Describe the solution you'd like
All these problems can be avoided if we default to the multiprocessing backend when the user only wants local asynchronous optimization.
Describe alternatives you've considered
Keep the current design and cover new edge cases in distributed optimization manager.
Additional context
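The proposal amounts to a backend switch on whether a Dask client was supplied. A trivial sketch of that dispatch (hypothetical helper, not the package's actual code):

```python
def choose_backend(client):
    """Pick an optimization backend based on the supplied client.

    No Dask client means the user only wants local asynchronous
    optimization, so fall back to a multiprocessing-based manager;
    otherwise use the distributed Dask-backed manager.
    """
    if client is None:
        return "multiprocessing"
    return "dask"
```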
Is your feature request related to a problem? Please describe.
Following Optuna's log message changes in the 3.1.0 release.
Describe the solution you'd like
In our case, the log line that needs to be adjusted to match Optuna is in messages/failed.py.
Describe alternatives you've considered
Additional context
optuna/optuna#3857
Is your feature request related to a problem? Please describe.
Currently manager.stop_optimization will only cancel scheduled trials, but is unable to stop already running ones. This is a limitation of Dask (or really of Python itself), as described in this Stack Overflow thread.
Describe the solution you'd like
Make stop_optimization emit a signal to all running trials (e.g. using a Dask Variable) and make sure each trial periodically checks for this signal, since we can't assume users will do it manually in objective functions. If a trial reads the stop signal, optimization should be halted immediately.
Describe alternatives you've considered
Do nothing and wait for Dask to implement thread interrupts as discussed in dask/distributed#4694.
Additional context
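Since running threads cannot be interrupted from outside, the signal has to be polled cooperatively: workers check a shared flag between units of work. A local sketch of that pattern using threading.Event (in a Dask deployment, dask.distributed.Variable could play the flag's role across workers):

```python
import threading


def run_trial(stop_flag, n_steps):
    """Run a trial in small steps, polling the stop flag between steps.

    Sketch of cooperative cancellation: the trial itself checks the
    shared flag, so the caller does not need to interrupt the thread.
    """
    for step in range(n_steps):
        if stop_flag.is_set():
            return ("stopped", step)
        # ... one unit of the objective's work would happen here ...
    return ("completed", n_steps)


stop = threading.Event()
stop.set()  # simulate stop_optimization emitting the signal
print(run_trial(stop, 100))  # halts before doing any work
```

The check would live in the trial wrapper rather than user code, matching the point above that users can't be expected to poll manually.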
Sometimes optimization continues past n_trials. This only happens with the local manager, so it is most likely related to the multiprocessing pool controller.
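One way such an overrun could be prevented is to guard dispatch with the global trial budget, so the controller never starts work past n_trials. A hypothetical sketch, not the package's actual controller:

```python
def dispatch_trials(n_trials, already_started, start_trial):
    """Start new trials only while the global budget allows (sketch).

    'already_started' counts trials handed to the pool so far; the
    guard stops the controller from scheduling past n_trials even if
    it is invoked again while earlier trials are still running.
    """
    started = 0
    while already_started + started < n_trials:
        start_trial()
        started += 1
    return started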
Is your feature request related to a problem? Please describe.
Dask dropped Python 3.7 in 2022.02.1. Since that release is quite old now, it makes sense to stop supporting 3.7 here as well, especially since its EOL is only a few months away.
Describe the solution you'd like
Drop Python 3.7 from pyproject.toml, adjust the classifiers, adjust the README, remove typing-extensions, and drop 3.7 from the CI matrix.
Describe alternatives you've considered
Additional context
Is your feature request related to a problem? Please describe.
Since setup.py is considered legacy, we should switch completely to pyproject.toml.
Describe the solution you'd like
Build requirements are moved to pyproject.toml, and setup.py is removed.
Describe alternatives you've considered
Additional context
https://peps.python.org/pep-0518/
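Per PEP 518, the build requirements end up in a build-system table. A minimal sketch, assuming a setuptools backend:

```toml
[build-system]
# setuptools>=61 supports PEP 621 project metadata in pyproject.toml.
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
```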
Is your feature request related to a problem? Please describe.
Currently the catch argument in optimize exists but has no effect. This functionality should be implemented to close the gap with Optuna's behavior.
Describe the solution you'd like
Implement functionality similar to Optuna's catch.
Describe alternatives you've considered
Additional context
https://optuna.readthedocs.io/en/stable/reference/generated/optuna.study.Study.html#optuna.study.Study.optimize
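In Optuna, catch=(SomeError,) lets optimization continue when the objective raises one of the listed exception types, marking only that trial as failed. A backend-agnostic sketch of those semantics (hypothetical loop, not the package's implementation):

```python
def optimize_with_catch(objective, n_trials, catch=()):
    """Run trials, tolerating exceptions listed in 'catch' (sketch).

    Mirrors Optuna's semantics: a caught exception fails the single
    trial and the loop continues; any other exception propagates and
    aborts the whole optimization.
    """
    states = []
    for trial_number in range(n_trials):
        try:
            objective(trial_number)
            states.append("COMPLETE")
        except catch:
            # Trial fails, optimization continues with the next one.
            states.append("FAIL")
    return states
```

Note that the default catch=() catches nothing, matching Optuna's default of letting objective exceptions abort the run.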
Hi Adrian,
Thanks for this work.
I am using your API, but I cannot configure the Optuna log level. Every time I run a study, a lot of messages for each trial fill the log, no matter whether I previously configured it:
optuna.logging.disable_default_handler()
optuna.logging.set_verbosity(optuna.logging.WARNING)
optuna.logging.disable_propagation()
Is there a way to avoid this?
Thanks in advance
Aura
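A standard-library way to raise the threshold, assuming the messages go through the "optuna" logger:

```python
import logging

# Assumption: per-trial messages are emitted through the standard
# "optuna" logger; raising its level suppresses INFO-level output.
logging.getLogger("optuna").setLevel(logging.WARNING)
```

One caveat worth checking here: if trials run in spawned worker processes, they do not inherit the parent's logger configuration, so configuring logging in the main script may have no effect on messages produced inside workers; that would explain why the calls above appear to be ignored.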
Dear Adrian Zuber,
I would like to ask whether optuna-distributed could be used with MySQL storage on an HPC cluster. I am currently trying to parallelize my optimization by submitting multiple processes to Slurm, which usually comes with concurrency overhead. Having recently discovered other approaches (thread-based vs. process-based, InMemory storage vs. RDB storage, dask-optuna vs. optuna-distributed, etc.), I wonder which parallelization configuration you consider best. I would highly appreciate it if you could share your experience on this.
Best regards
Ábel
We want to make sure releases of dependencies do not cause any regressions.
Is your feature request related to a problem? Please describe.
Currently logging is handled by Optuna's logger. This introduces an unnecessary dependency and causes some minor bugs (e.g. no logs for pruned trials). Logging should be handled via Python builtins to avoid these problems.
Describe the solution you'd like
Use the builtin logging APIs to handle all logging within the package.
Describe alternatives you've considered
Additional context
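The conventional standard-library pattern for this is a module-level logger with a NullHandler, so the package stays silent unless the application configures logging. A sketch (logger name assumed, helper hypothetical):

```python
import logging

# Package-level logger; NullHandler keeps the library quiet unless
# the application opts in by configuring logging itself.
logger = logging.getLogger("optuna_distributed")
logger.addHandler(logging.NullHandler())


def report_trial_finished(number):
    # Library code logs only through the module logger, with lazy
    # %-formatting so the string is built only if emitted.
    logger.info("Trial %d finished.", number)
```

This also removes the dependency on Optuna's logging setup, so messages for pruned trials can be emitted like any others.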
Is your feature request related to a problem? Please describe.
Following Optuna API changes in the 3.1.0 release.
Describe the solution you'd like
Describe alternatives you've considered
Additional context
optuna/optuna#4098
Describe the bug
Currently the license table in pyproject.toml is misconfigured, resulting in a cluttered meta section on PyPI (see additional context).
To Reproduce
Visit https://pypi.org/project/optuna-distributed/ :^)
Expected behavior
The license name is displayed without its full contents, just like on https://pypi.org/project/optuna/.
Additional context
License table spec
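One way the table might be corrected, assuming setuptools' PEP 621 handling (where pointing the table at a file embeds the entire file contents into the package metadata, which is what clutters the PyPI page):

```toml
[project]
# Short text keeps PyPI's meta section to a single line;
# license = { file = "LICENSE" } would embed the whole file.
license = { text = "MIT License" }
classifiers = ["License :: OSI Approved :: MIT License"]
```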
Typing is quite lax in places, which should be fixed as soon as possible to avoid piling up technical debt.