
anyrl-py's People

Contributors

frenchie4111, matwilso, unixpickle

anyrl-py's Issues

Simplify model state representation

Right now, there are three different types a model state can be: NoneType, an array, or a tuple of arrays. In the future, it would be nice to unify this.

I propose that states will always be tuples. An empty tuple corresponds to a stateless model; otherwise, the tuple may contain one element (the standard state case), or more than one element. Also, instead of a stateful property, there could just be a num_states property that indicates how many tuple elements are in the state.
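A minimal sketch of the proposed convention, assuming hypothetical helper names normalize_state and num_states (neither exists in anyrl-py today). It only illustrates how the three current representations could map onto tuples:

```python
def normalize_state(state):
    """Convert a legacy model state into a tuple.

    None becomes an empty tuple (stateless model), a single array becomes
    a one-element tuple, and an existing tuple is returned unchanged.
    """
    if state is None:
        return ()
    if isinstance(state, tuple):
        return state
    return (state,)


def num_states(state):
    """Count the tuple elements in a state; 0 means the model is stateless."""
    return len(normalize_state(state))
```

With this convention, a check such as num_states(state) == 0 could stand in for the current stateful property.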

Use placeholders for assignments in sync_from_root()

In MPIOptimizer.sync_from_root(), we create a constant and only use it once. It would be nice if we created some placeholders and re-used them for every sync_from_root() call. This would also avoid filling up the graph with useless constants.

Memory-saving model for "rolling" hidden states

Some RNNCells, like the WaveNet cell implemented in this repository, encode a queue in their recurrent hidden state. For example, the first state might be (A, B, C, D), then the next state might be (B, C, D, E), etc. Currently, models like RNNCellAC will not be able to optimize memory usage in this case.

Thus, I propose a stateful model that wraps another model and re-uses state arrays when possible. It should guarantee that, if a numpy array in the new state matches a numpy array in the previous state, then the old numpy array is reused in the new state. This should greatly reduce memory consumption for WaveNet when doing rollouts and saving the states in model_outs.
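The reuse guarantee could be sketched roughly as follows. The function name reuse_state_entries and its equal parameter are hypothetical, and the wrapper model itself is not shown. For numpy arrays, a comparison such as numpy.array_equal would have to be passed as equal, since == on arrays is elementwise.

```python
import operator


def reuse_state_entries(prev_state, new_state, equal=operator.eq):
    """Return new_state with entries replaced by equal objects from prev_state.

    Both states are tuples. Whenever an entry of new_state compares equal to
    an entry of prev_state, the old object is kept so that only one copy of
    the data stays referenced. Entries with no match are passed through.
    """
    result = []
    for entry in new_state:
        for old in prev_state:
            if equal(old, entry):
                entry = old  # Keep the old object instead of the new copy.
                break
        result.append(entry)
    return tuple(result)
```

For a queue-like state that moves from (A, B, C, D) to (B, C, D, E), only E would be a new object after this step. The comparison cost grows with the product of the two tuple lengths, so this is a trade of compute for memory.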

Suggestions of writing some documentation

Hi unixpickle,
anyrl-py is one of the most complete and terse reinforcement learning frameworks I've ever seen.
It is great work!
Would you like to write some documentation to make anyrl-py more popular?

Best wishes,
Simon

Show explained variation for PPO/A2C

Explained variation shows how much better your value function is than a simple mean estimator. It is a useful metric to tell how well your value function is actually doing. Right now, the critic loss value is pretty much useless.
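For reference, this metric (often called explained variance) is usually computed as 1 - Var(returns - values) / Var(returns). Below is a minimal pure-Python sketch. The function name and the NaN convention for constant returns are assumptions, not the library's code.

```python
from statistics import pvariance


def explained_variance(values, returns):
    """Compute 1 - Var(returns - values) / Var(returns).

    A result of 1 means perfect predictions, 0 means no better than
    predicting the mean return, and negative means worse than the mean.
    """
    var_returns = pvariance(returns)
    if var_returns == 0:
        # The ratio is undefined when every return is identical.
        return float('nan')
    residuals = [ret - val for ret, val in zip(returns, values)]
    return 1 - pvariance(residuals) / var_returns
```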

ZeroDivisionError in noisy_net_dense

File "C:\Users\simon\Desktop\anyrl\anyrl\models\dqn_dist.py", line 550, in isolated_net
aux = dense(aux_batch, dense_units, activation='relu')
File "C:\Users\simon\Desktop\anyrl\anyrl\models\dqn_scalar.py", line 242, in noisy_net_dense
stddev = 1 / sqrt(num_inputs)
ZeroDivisionError: float division by zero
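The traceback suggests that num_inputs was 0 when the standard deviation was computed. Why it was 0 is not stated here. As a hedged sketch, a guard like the hypothetical helper below would turn the failure into a clearer error message; it is not the actual fix in anyrl-py.

```python
from math import sqrt


def noisy_net_stddev(num_inputs):
    """Return 1 / sqrt(num_inputs), rejecting layers with no inputs."""
    if num_inputs <= 0:
        raise ValueError(
            'noisy layer needs at least one input, got %r' % (num_inputs,))
    return 1 / sqrt(num_inputs)
```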

Zeros in feed_dict cause act and val to be NaN in feed-forward

In anyrl/models/feedforward_ac.py, the step function may put zero values into feed_dict.
This may cause self.session.run((self.actor_out, self.critic_out), feed_dict) to return act and val as arrays containing NaNs, which may eventually raise an exception.

To reproduce the following traceback, modify examples/cartpole.py, replacing 'CartPole-v0' with 'RoboschoolHumanoid-v1'.
(Installing Roboschool and adding import roboschool may be necessary.)

Traceback (most recent call last):
  File "/home/user/py/run.py", line 72, in <module>
    main()
  File "/home/user/py/cartpole.py", line 24, in main
    run_algorithm(algo)
  File "/home/user/py/cartpole.py", line 49, in run_algorithm
    rollouts = roller.rollouts()
  File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/rollouts/rollers.py", line 48, in rollouts
    model_out = self.model.step([obs], states)
  File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/models/feedforward_ac.py", line 60, in step
    'actions': self.action_dist.sample(act),
  File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/spaces/continuous.py", line 40, in sample
    return np.random.normal(loc=means, scale=stddevs)
  File "mtrand.pyx", line 1656, in mtrand.RandomState.normal
ValueError: scale < 0
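One way to surface this kind of problem earlier is to validate the distribution parameters before sampling. The helper below is a hypothetical debugging aid that works on flat sequences of floats. It is not part of anyrl-py and does not address the root cause.

```python
import math


def check_distribution_params(means, stddevs):
    """Raise early if a parameter is non-finite or a stddev is negative."""
    for name, values in (('means', means), ('stddevs', stddevs)):
        for index, value in enumerate(values):
            if not math.isfinite(value):
                raise FloatingPointError(
                    '%s[%d] is not finite: %r' % (name, index, value))
    for index, value in enumerate(stddevs):
        if value < 0:
            raise ValueError(
                'stddevs[%d] is negative: %r' % (index, value))
```

Calling such a check right before the sampling step would point at the first bad value instead of failing inside the random number generator.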

Fix slice : position

A lot of places have x[start: end], with a space after the :. This is because I switched to autopep8, and it got rid of one space but not the other.

Create legitimate documentation

In particular, it would be nice to turn the existing docstrings, classes, and functions into a browsable web-friendly format.

PPO: return list of dicts instead of tuples

Right now, we return (actor_loss, explained, entropy, clipped) tuples. This is too rigid, since it is quite possible that more keys will be added later, or in a subclass. For example, if we add value function clipping, we will want another field for that.
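A rough sketch of what a dict-based return value could look like. The function names and the value_clipped key are hypothetical; only the four existing field names come from the tuple above.

```python
def as_term_dict(actor_loss, explained, entropy, clipped, **extra_terms):
    """Pack the current tuple fields into a dict, allowing extra keys."""
    terms = {
        'actor_loss': actor_loss,
        'explained': explained,
        'entropy': entropy,
        'clipped': clipped,
    }
    terms.update(extra_terms)
    return terms


def mean_of_terms(term_dicts):
    """Average every key over a non-empty list of term dicts."""
    count = len(term_dicts)
    return {key: sum(terms[key] for terms in term_dicts) / count
            for key in term_dicts[0]}
```

Code that only reads the keys it knows about keeps working when a subclass adds more, which is the flexibility a fixed-length tuple lacks.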

Discussion: Import and Export Rainbow Models

Greetings,

I am looking into using this library for a project. Specifically, I would like to export a model trained with Rainbow on a training set so that it can be used on another (unknown) test set.

Looking forward to your wisdom.

Thanks,
Coast

Question: How to import/export model?

Hello,
I was wondering if anyone could help me with exporting a model after training so that it can continue to train afterwards. I'm new to TensorFlow and am using Colab, so help would be appreciated. I already saw the other issue saying to use savemodels of TensorFlow, but I don't understand how to use them correctly.
