unixpickle / anyrl-py

A reinforcement learning framework

Python 100.00%

anyrl-py's Issues

Simplify model state representation

Right now, a model state can be one of three different types: NoneType, an array, or a tuple of arrays. In the future, it would be nice to unify this.

I propose that states will always be tuples. An empty tuple corresponds to a stateless model; otherwise, the tuple may contain one element (the standard state case), or more than one element. Also, instead of a stateful property, there could just be a num_states property that indicates how many tuple elements are in the state.
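A minimal sketch of the proposed convention, assuming a state is any of the three legacy forms described above. The helper names `normalize_state` and `count_states` are hypothetical and are not part of anyrl-py.

```python
def normalize_state(state):
    """Convert a legacy model state into the proposed tuple form."""
    if state is None:
        return ()          # stateless model: empty tuple
    if isinstance(state, tuple):
        return state       # already a tuple of arrays
    return (state,)        # single array: the standard one-element case


def count_states(state):
    """Hypothetical stand-in for the proposed num_states property."""
    return len(normalize_state(state))
```

With this convention, a caller can check `count_states(state) == 0` instead of consulting a separate stateful flag.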

Add reset() method to MultiBinaryPadEnv

Use placeholders for assignments in sync_from_root()

In MPIOptimizer.sync_from_root(), we create a constant and only use it once. It would be nice if we created some placeholders and re-used them for every sync_from_root. This would also avoid filling up the graph with useless constants.

Memory-saving model for "rolling" hidden states

Some RNNCells, like the WaveNet cell implemented in this repository, encode a queue in their recurrent hidden state. For example, the first state might be (A, B, C, D), then the next state might be (B, C, D, E), etc. Currently, models like RNNCellAC will not be able to optimize memory usage in this case.

Thus, I propose a stateful model that wraps another model and re-uses state arrays when possible. It should guarantee that, if a numpy array in the new state matches a numpy array in the previous state, then the old numpy array is reused in the new state. This should greatly reduce memory consumption for WaveNet when doing rollouts and saving the states in model_outs.
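The reuse guarantee could be sketched roughly as follows, assuming states are tuples of numpy arrays. The function `reuse_state_arrays` is a hypothetical helper, not the wrapper model itself, and it compares arrays by value, which is only one possible way to detect a match.

```python
import numpy as np


def reuse_state_arrays(prev_state, new_state):
    """Return new_state with equal arrays replaced by the old objects."""
    result = []
    for new_arr in new_state:
        # Look for an array in the previous state with identical contents.
        match = next((old for old in prev_state
                      if old.shape == new_arr.shape
                      and old.dtype == new_arr.dtype
                      and np.array_equal(old, new_arr)), None)
        result.append(new_arr if match is None else match)
    return tuple(result)
```

For a queue-like state that moves from (A, B, C, D) to (B, C, D, E), only E would be a newly retained array after this step. The value comparison costs time proportional to the state size, so a real implementation might prefer a cheaper identity check where the cell allows it.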

Gaussian dist should scale mean

Let's say we have a Gaussian scaled to the range [-50, 50]. To get the mean to 25, the network would still have to output a large number (25). Ideally, the network's outputs should be in a reasonable range, like [-1, 1]. See https://github.com/unixpickle/anyrl-py/blob/master/anyrl/spaces/continuous.py#L68
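One way to read this request is as an affine rescaling of the raw network output. The sketch below assumes the raw mean is meant to lie roughly in [-1, 1]; `scale_mean` is a hypothetical helper and does not reflect the code at the linked line.

```python
def scale_mean(raw_mean, low, high):
    """Map a network output in roughly [-1, 1] onto the range [low, high]."""
    center = (high + low) / 2.0
    half_width = (high - low) / 2.0
    return center + raw_mean * half_width
```

Under this mapping, a raw output of 0.5 for the range [-50, 50] gives a mean of 25, so the network no longer needs to emit large values.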

Support FusedRNNCell

This may make it possible to have more optimized backprop.

Can pandas version requirement be loosened?

I'm getting a version conflict when trying to install the games repo.

Move README TODOs to issues

Use huber loss in scalar DQN by default

Suggestions of writing some documentation

Hi unixpickle,
anyrl-py is the most complete and terse reinforcement learning framework I've ever seen.
It is great work!
Would you like to write some documentation to make anyrl-py more popular?

Best wishes,
Simon

Advantage-based reward normalization

Simplify RNN model using tf.boolean_mask?

This may allow us to completely remove the mask from TF actor-critics. This is obviously a big breaking change.

Implement TRPO

This is a "nice to have", but not a necessity.

pass dtype to gym.spaces.Box()

Show explained variation for PPO/A2C

Explained variation shows how much better your value function is than a simple mean estimator. It is a useful metric to tell how well your value function is actually doing. Right now, the critic loss value is pretty much useless.
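A minimal sketch of the metric, using one common definition (often called explained variance): one minus the variance of the residuals divided by the variance of the targets. The function name and the choice of this particular definition are assumptions, not the library's implementation.

```python
import numpy as np


def explained_variance(predictions, targets):
    """Return 1 - Var[targets - predictions] / Var[targets]."""
    predictions = np.asarray(predictions, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    target_var = np.var(targets)
    if target_var == 0:
        # Undefined when the targets never vary.
        return float('nan')
    return float(1.0 - np.var(targets - predictions) / target_var)
```

A value near 1 means the value function tracks the returns closely, and a value near 0 means it does no better than predicting a constant. Negative values indicate it is doing worse than a constant.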

ZeroDivisionError in noisy_net_dense

  File "C:\Users\simon\Desktop\anyrl\anyrl\models\dqn_dist.py", line 550, in isolated_net
    aux = dense(aux_batch, dense_units, activation='relu')
  File "C:\Users\simon\Desktop\anyrl\anyrl\models\dqn_scalar.py", line 242, in noisy_net_dense
    stddev = 1 / sqrt(num_inputs)
ZeroDivisionError: float division by zero
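The traceback shows that num_inputs was zero when the standard deviation was computed. As an illustration only, a guard like the one below would turn the failure into a clearer error. `noisy_net_stddev` is a hypothetical helper, and this is not necessarily how the library should fix the issue.

```python
from math import sqrt


def noisy_net_stddev(num_inputs):
    """Compute 1 / sqrt(num_inputs), rejecting empty inputs up front."""
    if num_inputs <= 0:
        raise ValueError(
            'noisy layer needs at least one input feature, got %d'
            % num_inputs)
    return 1 / sqrt(num_inputs)
```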

Zeros in feed_dict cause act and val to be NaN in feed-forward

In anyrl/models/feedforward_ac.py, the step function may put zero values in feed_dict.
This may cause self.session.run((self.actor_out, self.critic_out), feed_dict) to return act and val as arrays containing NaNs, which may eventually raise an exception.

To reproduce the following traceback, modify examples/cartpole.py, replacing 'CartPole-v0' with 'RoboschoolHumanoid-v1'.
(Installing Roboschool and adding import roboschool may be necessary.)

Traceback (most recent call last):
  File "/home/user/py/run.py", line 72, in <module>
    main()
  File "/home/user/py/cartpole.py", line 24, in main
    run_algorithm(algo)
  File "/home/user/py/cartpole.py", line 49, in run_algorithm
    rollouts = roller.rollouts()
  File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/rollouts/rollers.py", line 48, in rollouts
    model_out = self.model.step([obs], states)
  File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/models/feedforward_ac.py", line 60, in step
    'actions': self.action_dist.sample(act),
  File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/spaces/continuous.py", line 40, in sample
    return np.random.normal(loc=means, scale=stddevs)
  File "mtrand.pyx", line 1656, in mtrand.RandomState.normal
ValueError: scale < 0

Benchmarks for rollouts and training

Import and Export Rainbow's replay buffer

Greetings!

Do you have any suggestions for transferring Rainbow's replay buffer along with the model checkpoints?

Thanks,
coast

Fix slice : position

A lot of places have x[start: end], with a space after the :. This is because I switched to autopep8, and it got rid of one space but not the other.
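A rough way to find and fix these slices is a regular expression over each source line. The pattern below is a crude sketch that only handles a simple name or number before the colon; it can misfire inside string literals, so the resulting diff should be reviewed by hand.

```python
import re

# "[name: rest]" with exactly one space after the colon.
SLICE_SPACE = re.compile(r'\[([\w.]+): ([^\[\]\s:][^\[\]]*)\]')


def fix_slice_spacing(line):
    """Rewrite x[start: end] as x[start:end] within one line of source."""
    return SLICE_SPACE.sub(r'[\1:\2]', line)
```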

Support for dict spaces

PPO: add option for value-function clipping

Create legitimate documentation

In particular, it would be nice to turn the existing docstrings, classes, and functions into a browsable web-friendly format.

Quantile regression in Rainbow

Distributional Reinforcement Learning with Quantile Regression

This supposedly improves the performance of distributional DQN, so I'd like to add it as an option for distributional Q-networks.

Atomic save_vars

& tests
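A common way to make a save atomic is to write to a temporary file in the same directory and then rename it over the destination. The sketch below assumes the variables are serialized with pickle, which may not match what save_vars does; `atomic_pickle_dump` is a hypothetical name.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path):
    """Write obj to path so readers never see a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as out_file:
            pickle.dump(obj, out_file)
            out_file.flush()
            os.fsync(out_file.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies mid-write, the previous file at the destination path is left untouched.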

PPO: return list of dicts instead of tuples

Right now, we return (actor_loss, explained, entropy, clipped) tuples. This is too rigid, since it is quite possible that more keys will be added later, or in a subclass. For example, if we add value function clipping, we will want another field for that.
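To illustrate why dicts are less rigid, here is a small sketch in which each training step reports a dict and a consumer averages whichever keys it cares about. The helper `mean_stat` and the extra key `value_clipped` are hypothetical.

```python
def mean_stat(stats, key):
    """Average one metric over a list of per-step dicts, skipping absences."""
    values = [entry[key] for entry in stats if key in entry]
    return sum(values) / len(values) if values else None


# One dict per optimization step; later steps may carry extra keys.
example_stats = [
    {'actor_loss': 0.5, 'explained': 0.25, 'entropy': 1.0, 'clipped': 3},
    {'actor_loss': 0.25, 'explained': 0.75, 'entropy': 1.5, 'clipped': 1,
     'value_clipped': 2},
]
```

Code that only reads `actor_loss` keeps working when a subclass adds `value_clipped`, whereas tuple unpacking would break.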

softmax_cross_entropy_with_logits deprecated

We should check if softmax_cross_entropy_with_logits_v2 exists, and use that if possible.
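The check could be done with getattr. The sketch below uses a stand-in namespace so it runs without TensorFlow; in real code the module would be tf.nn. The helper `pick_function` is hypothetical.

```python
from types import SimpleNamespace


def pick_function(module, preferred, fallback):
    """Return module.<preferred> if it exists, else module.<fallback>."""
    func = getattr(module, preferred, None)
    if func is None:
        func = getattr(module, fallback)
    return func


# Stand-in for an older tf.nn that lacks the _v2 function.
old_nn = SimpleNamespace(
    softmax_cross_entropy_with_logits=lambda **kwargs: 'v1')
loss_fn = pick_function(old_nn,
                        'softmax_cross_entropy_with_logits_v2',
                        'softmax_cross_entropy_with_logits')
```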

Use weights in distrib DQN loss

Unify CNN and MLP models with a single base class

Typo in `recurrent.py`

I said "statefull" when I meant "stateful".

Discussion: Import and Export Rainbow Models

Greetings,

I am looking into using this library for a project. Specifically, I need to export a Rainbow model that was pre-trained on the training set so that it can be used on another (unknown) test set.

Looking forward to your wisdom.

Thanks,
Coast

Conversion from rollouts to DQN transitions

This way, you can use a Roller in conjunction with DQN (instead of needing to use a Player).
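A minimal sketch of such a conversion, assuming a rollout is given as parallel lists with one more observation than actions. The function name, argument layout, and dict keys are all hypothetical and do not reflect anyrl-py's actual rollout or transition formats.

```python
def rollout_to_transitions(observations, actions, rewards, terminal):
    """Turn one rollout into a list of one-step transition dicts."""
    if not len(actions) == len(rewards) == len(observations) - 1:
        raise ValueError('expected one more observation than actions')
    transitions = []
    last = len(actions) - 1
    for t, (action, reward) in enumerate(zip(actions, rewards)):
        # Only the final step of a finished episode has no next observation.
        is_last = terminal and t == last
        transitions.append({
            'obs': observations[t],
            'action': action,
            'reward': reward,
            'new_obs': None if is_last else observations[t + 1],
            'is_last': is_last,
        })
    return transitions
```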

Question: How to import/export model?

Hello
I was wondering if anyone could help me with exporting a model after training so it can continue to train afterwards. I'm new to TensorFlow and am using Colab, so help would be appreciated. I already saw the other issue saying to use TensorFlow's saved models, but I don't understand how to use them correctly.
