allenai / allentune
Hyperparameter Search for AllenNLP
License: Apache License 2.0
I tried to run the tests and got a path error. Looking into the fixture:
https://github.com/allenai/allentune/blob/master/tests/fixtures/classifier.jsonnet#L7
I was stunned to see a hard-coded path taken from one of the developers' computers.
Would you mind fixing that?
Same as the title.
Hey,
It is not clear how to use RayExecutor.
Where can I find a working example or documentation on how to use it?
Thanks
Hi, I'm a fan of allentune. I installed allennlp==1.1.0, which should be based on torch 1.6, but allentune requires torch 1.5, so the two are currently incompatible.
If I go back to allennlp==1.0.0, some of my code would need to change; for example, BertTokenizer.from_pretrained can no longer be used.
Are there any suggestions?
Thank you very much!
Contrary to the instructions, I installed the latest allennlp library. Unfortunately, I got the following error:
Traceback (most recent call last):
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 918, in save_global
obj2, parent = _getattribute(module, name)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 266, in _getattribute
.format(name, obj))
AttributeError: Can't get local attribute 'wrap_function.<locals>.WrappedFunc' on <function wrap_function at 0x2b6ace9a7d90>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 639, in save_global
return Pickler.save_global(self, obj, name=name)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 922, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle <class 'ray.tune.trainable.wrap_function.<locals>.WrappedFunc'>: it's not found as ray.tune.trainable.wrap_function.<locals>.WrappedFunc
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/net/people/plgapohl/python-albert-pytorch/bin/allentune", line 11, in <module>
load_entry_point('allentune', 'console_scripts', 'allentune')()
File "/net/people/plgapohl/allentune/allentune/commands/__init__.py", line 67, in main
args.func(args)
File "/net/people/plgapohl/allentune/allentune/commands/search.py", line 126, in search_from_args
executor.run(args)
File "/net/people/plgapohl/allentune/allentune/modules/ray_executor.py", line 94, in run
self.run_distributed(run_func, args)
File "/net/people/plgapohl/allentune/allentune/modules/ray_executor.py", line 58, in run_distributed
register_trainable("run", run_func)
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/tune/registry.py", line 49, in register_trainable
_global_registry.register(TRAINABLE_CLASS, name, trainable)
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/tune/registry.py", line 88, in register
self._to_flush[(category, key)] = pickle.dumps(value)
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 881, in dumps
cp.dump(obj)
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 268, in dump
return Pickler.dump(self, obj)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 648, in save_global
return self.save_dynamic_class(obj)
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 495, in save_dynamic_class
save(clsdict)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 410, in save_function
self.save_function_tuple(obj)
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 553, in save_function_tuple
save(state)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 808, in _batch_appends
save(tmp[0])
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 410, in save_function
self.save_function_tuple(obj)
File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 553, in save_function_tuple
save(state)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 808, in _batch_appends
save(tmp[0])
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 496, in save
rv = reduce(self.proto)
TypeError: can't pickle _thread.RLock objects
Any idea how I can fix that?
Hi there!
Should allentune work with the latest version of AllenNLP, i.e. the one from master, a 1.0.0 release candidate?
Thanks!
The allentune plot command supports plotting multiple models, but this is not surfaced well in the README.
There isn't an easy way to merge multiple experiment results (you have to manually concatenate the files together).
An allentune merge subcommand would probably be useful here, which would merge multiple experiment files together. You could then run allentune plot on the resulting jsonl.
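A minimal sketch of what such a merge could do, assuming each experiment file is newline-delimited JSON; the merge_results helper and the file names are hypothetical, not part of allentune:

```python
import json

def merge_results(input_paths, output_path):
    """Concatenate several .jsonl experiment files into one.

    Each input line is parsed and re-serialized so that a malformed
    line fails loudly instead of silently corrupting the merged file.
    """
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines between records
                    record = json.loads(line)
                    out.write(json.dumps(record) + "\n")
```

The merged file could then be handed to allentune plot as a single experiment.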
When multiple runners with high CPU usage are started, they use all available PyTorch threads by default. This slows down the machine due to massive rescheduling of CPUs.
An alternative would be to limit the number of PyTorch threads to the number of CPUs per trial.
I created a pull request (#5) as an example of how this could be resolved. Having an additional command line argument might be useful.
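The idea can be sketched as follows: compute a per-trial thread budget and hand it to torch.set_num_threads before training starts. The helper name and the defensive fallback are assumptions for illustration, not allentune's or PR #5's actual code:

```python
def threads_per_trial(total_cpus, cpus_per_trial):
    """Intra-op thread budget for one trial.

    Capping PyTorch at the CPUs reserved for the trial avoids
    oversubscription when several trials share one machine.
    """
    if cpus_per_trial <= 0:
        return 1  # defensive fallback for unset/invalid values
    return max(1, min(total_cpus, int(cpus_per_trial)))

# Inside each trial, before training, one would then call e.g.:
#   import os, torch
#   torch.set_num_threads(threads_per_trial(os.cpu_count(), cpus_per_trial))
```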
Allentune fails to run due to a method being renamed in the most recent version of AllenNLP (v0.9.1+).
================================================================== ERRORS ==================================================================
________________________________________________ ERROR collecting tests/test_example_run.py ________________________________________________
ImportError while importing test module '/home/wfu/packages/allentune/tests/test_example_run.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_example_run.py:1: in <module>
from allentune.modules import AllenNlpRunner, RayExecutor
allentune/modules/__init__.py:1: in <module>
from allentune.modules.allennlp_runner import AllenNlpRunner
allentune/modules/allennlp_runner.py:14: in <module>
from allennlp.common.util import import_modules_and_submodules
E ImportError: cannot import name 'import_modules_and_submodules'
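One way to cope with a rename like this is a small resolver that tries several attribute names in order. This is a generic sketch; the older allennlp name shown in the comment (import_submodules) is an assumption that should be verified against the installed version:

```python
import importlib

def resolve_attr(module_name, *candidate_names):
    """Return the first existing attribute among several candidate names.

    Useful when a library renames a function between releases.
    """
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {candidate_names} found in {module_name}")

# Hypothetical use in allentune/modules/allennlp_runner.py:
# import_modules_and_submodules = resolve_attr(
#     "allennlp.common.util",
#     "import_modules_and_submodules",  # allennlp >= 1.0
#     "import_submodules",              # older releases (assumed name)
# )
```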
Hi! I realize that allentune is pinned to allennlp==1.0.0, but I was hoping to use it for some experiments on allennlp 1.3.0. I ran into the following issue that doesn't seem related to allennlp, and I was wondering if you had any pointers on how to fix it. Specifically, it looks like ray is trying to load allentune from GCS (something remote?) and having trouble unpickling the allentune.commands module, even though it's installed in my Python environment. Do you have any pointers on workarounds or fixes?
Additional info: I was able to run pytest -v . on my installation, and all tests passed.
Here's how I'm invoking allentune:
i=ner_mt_mbert; allentune search --experiment-name $i --num-cpus 14 --log-dir ./tmp --search-space config/lib/search.json --num-samples 1 --base-config config/${i}.jsonnet --include-package modules
And the log trace:
2021-01-11 13:12:36,121 - INFO - allentune.modules.ray_executor - Init Ray with 14 CPUs and 1 GPUs.
2021-01-11 13:12:36,124 INFO resource_spec.py:212 -- Starting Ray with 104.54 GiB memory available for workers and up to 48.8 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2021-01-11 13:12:36,518 WARNING services.py:923 -- Redis failed to start, retrying now.
2021-01-11 13:12:37,210 INFO services.py:1165 -- View the Ray dashboard at localhost:8265
2021-01-11 13:12:37,631 - INFO - allentune.modules.ray_executor - Run Configuration: {'ner_mt_mbert': {'run': 'run', 'resources_per_trial': {'cpu': 1, 'gpu': 1}, 'config': {'RANDOM_SEED': <function RandomSearch.random_integer.<locals>.<lambda> at 0x7f0b1e6e17a0>, 'NUMPY_SEED': <function RandomSearch.random_integer.<
locals>.<lambda> at 0x7f0b1e6e97a0>, 'PYTORCH_SEED': <function RandomSearch.random_integer.<locals>.<lambda> at 0x7f0b1e6e9440>}, 'local_dir': './tmp', 'num_samples': 1}}
2021-01-11 13:12:37,667 WARNING tune.py:318 -- Tune detects GPUs, but no trials are using GPUs. To enable trials to use GPUs, set tune.run(resources_per_trial={'gpu': 1}...) which allows Tune to expose 1 GPU to each trial. You can also override `Trainable.default_resource_request` if using the Trainable API.
== Status ==
Memory usage on this node: 156.3/251.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/14 CPUs, 1/1 GPUs, 0.0/104.54 GiB heap, 0.0/33.64 GiB objects
Result logdir: /homes/gws/echau18/research-lr-ssmba/tmp/ner_mt_mbert
Number of trials: 1 (1 RUNNING)
+-----------------+----------+-------+
| Trial name | status | loc |
|-----------------+----------+-------|
| run_bb506_00000 | RUNNING | |
+-----------------+----------+-------+
2021-01-11 13:12:38,520 WARNING worker.py:1047 -- Failed to unpickle actor class 'ImplicitFunc' for actor ID 45b95b1c0100. Traceback:
Traceback (most recent call last):
File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/function_manager.py", line 494, in _load_actor_class_from_gcs
actor_class = pickle.loads(pickled_class)
ModuleNotFoundError: No module named 'allentune.commands'
2021-01-11 13:12:38,525 ERROR trial_runner.py:520 -- Trial run_bb506_00000: Error processing event.
Traceback (most recent call last):
File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 468, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/worker.py", line 1474, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TemporaryActor.train() (pid=45844, ip=128.208.3.44)
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 442, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 445, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 446, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 400, in ray._raylet.execute_task.function_executor
RuntimeError: The actor with name ImplicitFunc failed to be imported, and so cannot execute this method.
(pid=45844) 2021-01-11 13:12:38,515 ERROR function_manager.py:496 -- Failed to load actor class ImplicitFunc.
(pid=45844) Traceback (most recent call last):
(pid=45844) File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/function_manager.py", line 494, in _load_actor_class_from_gcs
(pid=45844) actor_class = pickle.loads(pickled_class)
(pid=45844) ModuleNotFoundError: No module named 'allentune.commands'
== Status ==
Memory usage on this node: 156.6/251.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/14 CPUs, 0/1 GPUs, 0.0/104.54 GiB heap, 0.0/33.64 GiB objects
Result logdir: /homes/gws/echau18/research-lr-ssmba/tmp/ner_mt_mbert
Number of trials: 1 (1 ERROR)
+-----------------+----------+-------+
| Trial name | status | loc |
|-----------------+----------+-------|
| run_bb506_00000 | ERROR | |
+-----------------+----------+-------+
Number of errored trials: 1
+-----------------+--------------+---------------------------------------------------------------------------------------------------+
| Trial name | # failures | error file |
|-----------------+--------------+---------------------------------------------------------------------------------------------------|
| run_bb506_00000 | 1 | /homes/gws/echau18/research-lr-ssmba/tmp/ner_mt_mbert/run_0_2021-01-11_13-12-37nbrmel3w/error.txt |
+-----------------+--------------+---------------------------------------------------------------------------------------------------+
2021-01-11 13:12:38,550 - ERROR - allentune.modules.ray_executor - Error during run of experiment 'ner_mt_mbert': ('Trials did not complete', [run_bb506_00000])
I'm using allentune with allennlp==2.4.0. First, it would be nice if the explicit requirement on allennlp==1.0.0 was removed from setup.py; that way there's at least a chance that users can continue to install the library as other dependencies move on. You can always set it to allennlp>=1.0.0, and users can choose to install 1.0.0 before allentune if they want to replicate the original configuration.
Line 38 in 437e98c
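The relaxed pin could look like this in setup.py. This is a fragment for illustration only; the surrounding install_requires entries are assumptions, not allentune's actual dependency list:

```python
# setup.py (fragment, illustrative only)
install_requires = [
    "allennlp>=1.0.0",  # was: allennlp==1.0.0
    "ray>=0.8.6",       # assumed entry, shown only for context
]
```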
I had to leave ray==0.8.6 due to a dependency clash.
I'm not sure if it's a result of running on a later version of allennlp or not, but the search command worked fine. However, report errored. On further investigation, the report script was looking for a stdout.log file, while the file was actually called out.log.
It was a simple fix to update; I simply changed the line below:
allentune/allentune/commands/report.py
Line 50 in 437e98c
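A one-line change just swaps one hard-coded name for another; a slightly more defensive sketch accepts either historical name. The find_log_file helper below is hypothetical, not the actual code in allentune/commands/report.py:

```python
import os

def find_log_file(trial_dir):
    """Return the trial's log file path, accepting either known name.

    Different ray/allentune versions have written the trial log as
    stdout.log or out.log; try both rather than hard-coding one.
    """
    for name in ("stdout.log", "out.log"):
        path = os.path.join(trial_dir, name)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"no stdout.log or out.log in {trial_dir}")
```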
This looks like a really useful toolchain, but ray==0.8.6 isn't even on pip anymore.
I would really appreciate if this repository got an update!