
allentune's Issues

Documentation on RayExecutor?

Hey,
It is not clear how to use RayExecutor. Where can I find a working example or documentation on how to use it?
Thanks
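
For reference, the call chain visible in the tracebacks further down this page (allentune/commands/search.py calling executor.run(args) in allentune/modules/ray_executor.py) and the import in tests/test_example_run.py suggest a programmatic usage roughly like the sketch below. This is only a reconstruction from those files; the argparse flags mirror the `allentune search` invocation shown later on this page and are assumptions, not documented API.

# Rough sketch reconstructed from tests/test_example_run.py and the
# executor.run(args) call chain in the tracebacks below; flag names are
# assumptions mirroring the `allentune search` CLI, not documented API.
import argparse

from allentune.modules import AllenNlpRunner, RayExecutor

parser = argparse.ArgumentParser()
parser.add_argument("--experiment-name", type=str, required=True)
parser.add_argument("--base-config", type=str, required=True)
parser.add_argument("--search-space", type=str, required=True)
parser.add_argument("--num-cpus", type=int, default=1)
parser.add_argument("--num-gpus", type=int, default=0)
parser.add_argument("--num-samples", type=int, default=1)
parser.add_argument("--log-dir", type=str, default="./logs")
args = parser.parse_args()

runner = AllenNlpRunner()       # wraps an allennlp training run for each sampled config
executor = RayExecutor(runner)  # schedules the trials through ray.tune
executor.run(args)              # same entry point that search_from_args() hits in the tracebacks below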

how to use allennlp 1.1.0 with allentune

Hi, I'm a fan of allentune. I installed allennlp==1.1.0, which depends on torch==1.6, but allentune requires torch==1.5, so the two are currently incompatible.
If I go back to allennlp==1.0.0, some of my code has to change; for example, BertTokenizer.from_pretrained can no longer be used.
Are there any suggestions?
Thank you very much!

The library does not work with the latest allennlp

Contrary to the instructions, I installed the latest allennlp library. Unfortunately, I got the following error:

Traceback (most recent call last):
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 918, in save_global
    obj2, parent = _getattribute(module, name)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 266, in _getattribute
    .format(name, obj))
AttributeError: Can't get local attribute 'wrap_function.<locals>.WrappedFunc' on <function wrap_function at 0x2b6ace9a7d90>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 639, in save_global
    return Pickler.save_global(self, obj, name=name)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 922, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle <class 'ray.tune.trainable.wrap_function.<locals>.WrappedFunc'>: it's not found as ray.tune.trainable.wrap_function.<locals>.WrappedFunc

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/net/people/plgapohl/python-albert-pytorch/bin/allentune", line 11, in <module>
    load_entry_point('allentune', 'console_scripts', 'allentune')()
  File "/net/people/plgapohl/allentune/allentune/commands/__init__.py", line 67, in main
    args.func(args)
  File "/net/people/plgapohl/allentune/allentune/commands/search.py", line 126, in search_from_args
    executor.run(args)
  File "/net/people/plgapohl/allentune/allentune/modules/ray_executor.py", line 94, in run
    self.run_distributed(run_func, args)
  File "/net/people/plgapohl/allentune/allentune/modules/ray_executor.py", line 58, in run_distributed
    register_trainable("run", run_func)
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/tune/registry.py", line 49, in register_trainable
    _global_registry.register(TRAINABLE_CLASS, name, trainable)
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/tune/registry.py", line 88, in register
    self._to_flush[(category, key)] = pickle.dumps(value)
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 881, in dumps
    cp.dump(obj)
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 268, in dump
    return Pickler.dump(self, obj)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 648, in save_global
    return self.save_dynamic_class(obj)
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 495, in save_dynamic_class
    save(clsdict)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 410, in save_function
    self.save_function_tuple(obj)
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 553, in save_function_tuple
    save(state)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 781, in save_list
    self._batch_appends(obj)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 808, in _batch_appends
    save(tmp[0])
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 410, in save_function
    self.save_function_tuple(obj)
  File "/net/people/plgapohl/python-albert-pytorch/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle.py", line 553, in save_function_tuple
    save(state)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 781, in save_list
    self._batch_appends(obj)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 808, in _batch_appends
    save(tmp[0])
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/net/software/local/python/3.6.5/lib/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
TypeError: can't pickle _thread.RLock objects

Any idea how I can fix that?

AllenNLP updated version

Hi there!

Should allentune work with the latest version of AllenNLP, i.e. the one from master (a 1.0.0 release candidate)?

Thanks!

Merge multiple experiments together + plot their results

The allentune plot command supports plotting multiple models, but

  1. this is not surfaced well in the README, and

  2. there isn't an easy way to merge multiple experiment results (you have to manually concatenate the files together).

An allentune merge subcommand that merges multiple experiment result files together would probably be useful here. You could then run allentune plot on the resulting jsonl (see the sketch below).
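
Until such a subcommand exists, a manual merge is just a concatenation of the per-experiment JSON-lines files. A minimal sketch, where the glob pattern and output name are hypothetical placeholders for wherever each experiment wrote its results:

# Minimal sketch: concatenate per-experiment JSON-lines result files into one
# file that `allentune plot` can consume. The glob pattern and output name are
# hypothetical; point them at wherever each experiment wrote its results.
import glob

with open("merged_results.jsonl", "w") as merged:
    for path in sorted(glob.glob("experiments/*/results.jsonl")):
        with open(path) as single:
            for line in single:
                if line.strip():                      # skip blank lines between files
                    merged.write(line.rstrip("\n") + "\n")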

High cpu usage can slow down experiments

When multiple runners with high CPU usage are started, each uses all available PyTorch threads by default. This slows down the machine due to massive rescheduling of CPUs.

An alternative would be to limit the number of PyTorch threads to the number of CPUs per trial (see the sketch below).

I created a pull request (#5) as an example of how this could be resolved. An additional command line argument might be useful.
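
For anyone hitting this before a fix lands, the core of the workaround is a couple of PyTorch calls at the start of each trial. A minimal sketch (not PR #5 itself), assuming the number of CPUs allotted per trial is known:

# Minimal sketch (not PR #5 itself): cap PyTorch's thread pools to the CPUs
# allotted to a single trial so concurrent trials don't oversubscribe the machine.
# `cpus_per_trial` is an assumed variable matching the trial's CPU allocation.
import torch

cpus_per_trial = 1
torch.set_num_threads(cpus_per_trial)          # intra-op parallelism
torch.set_num_interop_threads(cpus_per_trial)  # inter-op parallelism; call before any parallel work starts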

Tests fail with versions of AllenNLP v0.9.1 onward

Allentune fails to run due to a method being renamed in the most recent version of AllenNLP (v0.9.1+).

================================================================== ERRORS ==================================================================
________________________________________________ ERROR collecting tests/test_example_run.py ________________________________________________
ImportError while importing test module '/home/wfu/packages/allentune/tests/test_example_run.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_example_run.py:1: in <module>
    from allentune.modules import AllenNlpRunner, RayExecutor
allentune/modules/__init__.py:1: in <module>
    from allentune.modules.allennlp_runner import AllenNlpRunner
allentune/modules/allennlp_runner.py:14: in <module>
    from allennlp.common.util import import_modules_and_submodules
E   ImportError: cannot import name 'import_modules_and_submodules'
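
A hedged workaround sketch until the pin is sorted out: guard the import so the runner works with both the old and the new AllenNLP helper name. This assumes older releases expose import_submodules (the 0.8/0.9-era name) and newer ones import_module_and_submodules; check which spelling your installed version actually provides.

# Hedged compatibility shim (a sketch, not the allentune fix): try the newer
# AllenNLP name first and fall back to the 0.8/0.9-era helper. Which spelling a
# given release exposes varies, so both branches are guarded.
try:
    from allennlp.common.util import import_module_and_submodules
except ImportError:
    from allennlp.common.util import import_submodules as import_module_and_submodules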

Issues running allentune on allennlp > 1.0.0

Hi! I realize that allentune is pinned to allennlp==1.0.0, but I was hoping to use it for some experiments on allennlp 1.3.0. I ran into the following issue, which doesn't seem related to allennlp. Specifically, it looks like ray is trying to load allentune from GCS (something remote?) and has trouble unpickling the allentune.commands module, even though it is installed in my Python environment. Do you have any pointers on workarounds or fixes?

Additional info: I was able to run pytest -v . on my installation, and all tests passed.

Here's how I'm invoking allentune:

i=ner_mt_mbert; allentune search --experiment-name $i --num-cpus 14 --log-dir ./tmp --search-space config/lib/search.json --num-samples 1 --base-config config/${i}.jsonnet --include-package modules

And the log trace:

2021-01-11 13:12:36,121 - INFO - allentune.modules.ray_executor - Init Ray with 14 CPUs and 1 GPUs.
2021-01-11 13:12:36,124 INFO resource_spec.py:212 -- Starting Ray with 104.54 GiB memory available for workers and up to 48.8 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2021-01-11 13:12:36,518 WARNING services.py:923 -- Redis failed to start, retrying now.
2021-01-11 13:12:37,210 INFO services.py:1165 -- View the Ray dashboard at localhost:8265
2021-01-11 13:12:37,631 - INFO - allentune.modules.ray_executor - Run Configuration: {'ner_mt_mbert': {'run': 'run', 'resources_per_trial': {'cpu': 1, 'gpu': 1}, 'config': {'RANDOM_SEED': <function RandomSearch.random_integer.<locals>.<lambda> at 0x7f0b1e6e17a0>, 'NUMPY_SEED': <function RandomSearch.random_integer.<locals>.<lambda> at 0x7f0b1e6e97a0>, 'PYTORCH_SEED': <function RandomSearch.random_integer.<locals>.<lambda> at 0x7f0b1e6e9440>}, 'local_dir': './tmp', 'num_samples': 1}}
2021-01-11 13:12:37,667 WARNING tune.py:318 -- Tune detects GPUs, but no trials are using GPUs. To enable trials to use GPUs, set tune.run(resources_per_trial={'gpu': 1}...) which allows Tune to expose 1 GPU to each trial. You can also override `Trainable.default_resource_request` if using the Trainable API.
== Status ==
Memory usage on this node: 156.3/251.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/14 CPUs, 1/1 GPUs, 0.0/104.54 GiB heap, 0.0/33.64 GiB objects
Result logdir: /homes/gws/echau18/research-lr-ssmba/tmp/ner_mt_mbert
Number of trials: 1 (1 RUNNING)
+-----------------+----------+-------+
| Trial name      | status   | loc   |
|-----------------+----------+-------|
| run_bb506_00000 | RUNNING  |       |
+-----------------+----------+-------+


2021-01-11 13:12:38,520 WARNING worker.py:1047 -- Failed to unpickle actor class 'ImplicitFunc' for actor ID 45b95b1c0100. Traceback:
Traceback (most recent call last):
  File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/function_manager.py", line 494, in _load_actor_class_from_gcs
    actor_class = pickle.loads(pickled_class)
ModuleNotFoundError: No module named 'allentune.commands'

2021-01-11 13:12:38,525 ERROR trial_runner.py:520 -- Trial run_bb506_00000: Error processing event.
Traceback (most recent call last):
  File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 468, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/worker.py", line 1474, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TemporaryActor.train() (pid=45844, ip=128.208.3.44)
  File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 442, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 445, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 446, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 400, in ray._raylet.execute_task.function_executor
RuntimeError: The actor with name ImplicitFunc failed to be imported, and so cannot execute this method.
(pid=45844) 2021-01-11 13:12:38,515     ERROR function_manager.py:496 -- Failed to load actor class ImplicitFunc.
(pid=45844) Traceback (most recent call last):
(pid=45844)   File "/homes/gws/echau18/miniconda3/envs/lr-ssmba/lib/python3.7/site-packages/ray/function_manager.py", line 494, in _load_actor_class_from_gcs
(pid=45844)     actor_class = pickle.loads(pickled_class)
(pid=45844) ModuleNotFoundError: No module named 'allentune.commands'
== Status ==
Memory usage on this node: 156.6/251.8 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/14 CPUs, 0/1 GPUs, 0.0/104.54 GiB heap, 0.0/33.64 GiB objects
Result logdir: /homes/gws/echau18/research-lr-ssmba/tmp/ner_mt_mbert
Number of trials: 1 (1 ERROR)
+-----------------+----------+-------+
| Trial name      | status   | loc   |
|-----------------+----------+-------|
| run_bb506_00000 | ERROR    |       |
+-----------------+----------+-------+
Number of errored trials: 1
+-----------------+--------------+---------------------------------------------------------------------------------------------------+
| Trial name      |   # failures | error file                                                                                        |
|-----------------+--------------+---------------------------------------------------------------------------------------------------|
| run_bb506_00000 |            1 | /homes/gws/echau18/research-lr-ssmba/tmp/ner_mt_mbert/run_0_2021-01-11_13-12-37nbrmel3w/error.txt |
+-----------------+--------------+---------------------------------------------------------------------------------------------------+

2021-01-11 13:12:38,550 - ERROR - allentune.modules.ray_executor - Error during run of experiment 'ner_mt_mbert': ('Trials did not complete', [run_bb506_00000])

Expecting stdout.log rather than out.log

I'm using allentune with allennlp==2.4.0. First, it would be nice if the explicit requirement on allennlp==1.0.0 were removed from setup.py; that way, users can at least keep installing the library as other dependencies move on. You could set it to allennlp>=1.0.0, and anyone who wants to replicate the original configuration can still install 1.0.0 before allentune.

"allennlp==1.0.0",

I had to stay on ray==0.8.6 due to a dependency clash.

I'm not sure whether it's a result of running on a later version of allennlp, but the search command worked fine. However, report errored. On further investigation, the report script was looking for a stdout.log file, while the file was actually called out.log.

The fix was simple; I changed the line below:

with open(os.path.join(dir, "stdout.log"), 'r') as stdout_file:
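
For reference, the change amounts to swapping the file name. A slightly more defensive sketch that tries both names, since the log file's name appears to depend on the ray version in use:

# Sketch of the fix: try out.log (the name on the reporter's setup) and fall back
# to stdout.log for setups that still produce the old name.
import os

dir = "."  # stands in for the trial-directory variable used in the original snippet
log_path = os.path.join(dir, "out.log")
if not os.path.exists(log_path):
    log_path = os.path.join(dir, "stdout.log")
with open(log_path, "r") as stdout_file:
    contents = stdout_file.read()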
