unixpickle / anyrl-py
A reinforcement learning framework
Right now, there are three different types a model state can be: NoneType, an array, or a tuple of arrays. In the future, it would be nice to unify this.
I propose that states will always be tuples. An empty tuple corresponds to a stateless model; otherwise, the tuple may contain one element (the standard state case), or more than one element. Also, instead of a stateful property, there could just be a num_states property that indicates how many tuple elements are in the state.
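A minimal sketch of the proposed convention, assuming nothing about anyrl's actual code: the helper names normalize_state and num_states below are hypothetical and only illustrate how the three current representations could map onto tuples.

```python
import numpy as np


def normalize_state(state):
    """Map None, a single array, or a tuple of arrays onto a tuple.

    Hypothetical helper, not part of anyrl.
    """
    if state is None:
        # Stateless model: empty tuple.
        return ()
    if isinstance(state, tuple):
        # Already a tuple of arrays.
        return state
    # Single array: the standard one-element case.
    return (state,)


def num_states(state):
    """Number of tuple elements in the normalized state."""
    return len(normalize_state(state))


hidden = np.zeros((4, 8))
print(num_states(None), num_states(hidden), num_states((hidden, hidden)))
```

With this shape, a check like num_states(state) > 0 would take the place of a separate stateful flag.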
In MPIOptimizer.sync_from_root(), we create a constant and only use it once. It would be nice if we created some placeholders and re-used them for every sync_from_root call. This would also avoid filling up the graph with useless constants.
Some RNNCells, like the WaveNet cell implemented in this repository, encode a queue in their recurrent hidden state. For example, the first state might be (A, B, C, D), then the next state might be (B, C, D, E), etc. Currently, models like RNNCellAC will not be able to optimize memory usage in this case.
Thus, I propose a stateful model that wraps another model and re-uses state arrays when possible. It should guarantee that, if a numpy array in the new state matches a numpy array in the previous state, then the old numpy array is reused in the new state. This should greatly reduce memory consumption for WaveNet when doing rollouts and saving the states in model_outs.
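The reuse guarantee could be sketched roughly as follows. This is an illustration under assumptions, not the proposed wrapper itself: reuse_state_arrays is a hypothetical name, states are assumed to be tuples of numpy arrays, and "matches" is taken to mean equal shape, dtype, and values.

```python
import numpy as np


def reuse_state_arrays(prev_state, new_state):
    """Return new_state with arrays replaced by equal arrays from prev_state.

    Hypothetical helper. Comparing by value costs time proportional to the
    array sizes, traded for holding fewer distinct arrays in memory.
    """
    result = []
    for new_arr in new_state:
        match = new_arr
        for old_arr in prev_state:
            same_layout = (old_arr.shape == new_arr.shape
                           and old_arr.dtype == new_arr.dtype)
            if same_layout and np.array_equal(old_arr, new_arr):
                # Keep the old object so the duplicate can be freed.
                match = old_arr
                break
        result.append(match)
    return tuple(result)


# Queue-like state: (A, B, C) followed by (B, C, D).
a, b, c = np.full(2, 1.0), np.full(2, 2.0), np.full(2, 3.0)
prev = (a, b, c)
new = (b.copy(), c.copy(), np.full(2, 4.0))
shared = reuse_state_arrays(prev, new)
print(shared[0] is b, shared[1] is c)
```

Only the newest queue entry remains a fresh array; the shifted entries point back at the arrays already stored for the previous timestep.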
Let's say we have a Gaussian scaled to the range [-50, 50]. In order to get a mean of 25, the network would still have to output a large number (25). Ideally, the network's outputs should be in a reasonable range, like [-1, 1]. See https://github.com/unixpickle/anyrl-py/blob/master/anyrl/spaces/continuous.py#L68
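One possible way to keep the network output small is to rescale it by the bounds of the space. The sketch below is an assumption about how that could look, not what continuous.py currently does; scale_mean is a hypothetical name.

```python
import numpy as np


def scale_mean(raw_mean, low, high):
    """Map a raw network output in roughly [-1, 1] onto [low, high].

    Hypothetical helper: a raw output of 0 lands on the center of the
    range, and +/-1 land on the bounds.
    """
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return center + np.asarray(raw_mean) * half_range


# For the range [-50, 50], a raw output of 0.5 gives a mean of 25.
print(scale_mean(0.5, -50.0, 50.0))
```

Under this mapping the network only needs to output 0.5 to reach a mean of 25.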
This may make it possible to have more optimized backprop.
I'm getting a version conflict when trying to install the games repo.
Hi unixpickle,
anyrl-py is the most complete and terse reinforcement learning framework I've ever seen.
It is great work!
Would you like to write some documentation to make anyrl-py more popular?
Best wishes,
Simon
This may allow us to completely remove the mask from TF actor-critics. This is obviously a big breaking change.
This is a "nice to have", but not a necessity.
Explained variation shows how much better your value function is than a simple mean estimator. It is a useful metric to tell how well your value function is actually doing. Right now, the critic loss value is pretty much useless.
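For reference, this metric (commonly called explained variance) is usually computed as 1 - Var(targets - predictions) / Var(targets). The sketch below assumes that definition; the function name and the NaN return for constant targets are illustrative choices, not anyrl's API.

```python
import numpy as np


def explained_variance(predictions, targets):
    """1 - Var(targets - predictions) / Var(targets).

    1.0 means a perfect value function, 0.0 means no better than
    predicting the mean, and negative values mean worse than the mean.
    """
    predictions = np.asarray(predictions, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    target_var = np.var(targets)
    if target_var == 0:
        # Undefined when the targets are constant.
        return float('nan')
    return float(1.0 - np.var(targets - predictions) / target_var)


returns = [1.0, 2.0, 3.0, 4.0]
print(explained_variance(returns, returns))
print(explained_variance([2.5] * 4, returns))
```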
File "C:\Users\simon\Desktop\anyrl\anyrl\models\dqn_dist.py", line 550, in isolated_net
aux = dense(aux_batch, dense_units, activation='relu')
File "C:\Users\simon\Desktop\anyrl\anyrl\models\dqn_scalar.py", line 242, in noisy_net_dense
stddev = 1 / sqrt(num_inputs)
ZeroDivisionError: float division by zero
In anyrl/models/feedforward_ac.py, the function step may assign zero-values in feed_dict. This may cause self.session.run((self.actor_out, self.critic_out), feed_dict) to return act and val as arrays containing NaNs, which may eventually raise an exception.
In order to reproduce the following traceback, simply modify examples/cartpole.py, replacing 'CartPole-v0' with 'RoboschoolHumanoid-v1'. (Installing Roboschool and including import roboschool may be necessary.)
Traceback (most recent call last):
File "/home/user/py/run.py", line 72, in <module>
main()
File "/home/user/py/cartpole.py", line 24, in main
run_algorithm(algo)
File "/home/user/py/cartpole.py", line 49, in run_algorithm
rollouts = roller.rollouts()
File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/rollouts/rollers.py", line 48, in rollouts
model_out = self.model.step([obs], states)
File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/models/feedforward_ac.py", line 60, in step
'actions': self.action_dist.sample(act),
File "/home/user/py/env/lib64/python3.6/site-packages/anyrl/spaces/continuous.py", line 40, in sample
return np.random.normal(loc=means, scale=stddevs)
File "mtrand.pyx", line 1656, in mtrand.RandomState.normal
ValueError: scale < 0
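When debugging a failure like this, it can help to fail early with a clearer message, before the values reach np.random.normal. The check below is only a debugging sketch; check_finite is a hypothetical helper, and where to call it (for example on the act and val arrays returned by session.run) is an assumption.

```python
import numpy as np


def check_finite(name, values):
    """Raise a descriptive error if values contains NaN or infinity."""
    values = np.asarray(values)
    if not np.all(np.isfinite(values)):
        raise FloatingPointError('%s contains NaN or infinity' % name)
    return values


# Passes through unchanged when every entry is finite.
means = check_finite('means', [0.1, -0.3])

# Raises as soon as a NaN shows up, naming the offending array.
try:
    check_finite('stddevs', [1.0, float('nan')])
except FloatingPointError as exc:
    message = str(exc)
    print(message)
```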
Greetings!
Do you have any suggestions to transfer Rainbow's replay buffer with the model checkpoints?
Thanks,
coast
A lot of places have x[start: end], with a space after the :. This is because I switched to autopep8, and it got rid of one space but not the other.
In particular, it would be nice to turn the existing docstrings, classes, and functions into a browsable web-friendly format.
Distributional Reinforcement Learning with Quantile Regression
This supposedly improves the performance of distributional DQN, so I'd like to add it as an option for distributional Q-networks.
& tests
Right now, we return (actor_loss, explained, entropy, clipped) tuples. This is too rigid, since it is quite possible that more keys will be added later, or in a subclass. For example, if we add value function clipping, we will want another field for that.
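One less rigid shape would be a dict keyed by term name. The following is a sketch only: loss_terms and the value_clipped key are hypothetical and just show how a subclass could add a field without breaking callers that unpack by position.

```python
def loss_terms(actor_loss, explained, entropy, clipped, **extra):
    """Collect loss terms in a dict so new keys can be added later."""
    terms = {
        'actor_loss': actor_loss,
        'explained': explained,
        'entropy': entropy,
        'clipped': clipped,
    }
    # Extra terms, e.g. from a subclass, are merged in by name.
    terms.update(extra)
    return terms


terms = loss_terms(0.1, 0.5, 1.2, 3, value_clipped=7)
print(sorted(terms))
```

Callers that only read terms['entropy'] keep working when more keys appear.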
We should check if softmax_cross_entropy_with_logits_v2 exists, and use that if possible.
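The check could be done with plain attribute detection. In the sketch below, pick_cross_entropy is a hypothetical helper, and stand-in objects replace tf.nn so that the example runs without TensorFlow installed; in practice the argument would presumably be tf.nn.

```python
from types import SimpleNamespace


def pick_cross_entropy(nn_module):
    """Prefer softmax_cross_entropy_with_logits_v2 when it exists."""
    v2 = getattr(nn_module, 'softmax_cross_entropy_with_logits_v2', None)
    if v2 is not None:
        return v2
    return nn_module.softmax_cross_entropy_with_logits


# Stand-ins for an older and a newer tf.nn (assumption: not real TensorFlow).
old_nn = SimpleNamespace(
    softmax_cross_entropy_with_logits=lambda **kwargs: 'v1')
new_nn = SimpleNamespace(
    softmax_cross_entropy_with_logits=lambda **kwargs: 'v1',
    softmax_cross_entropy_with_logits_v2=lambda **kwargs: 'v2')

print(pick_cross_entropy(old_nn)(), pick_cross_entropy(new_nn)())
```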
I said "statefull" when I meant "stateful".
Greetings,
I am looking into using this library for a project. Specifically, I am looking for a way to export a model pre-trained with Rainbow on the training set, for use on another (unknown) test set.
Looking forward to your wisdom.
Thanks,
Coast
This way, you can use a Roller in conjunction with DQN (instead of needing to use a Player).
Hello
I was wondering if anyone could help me with exporting a model after training so it can continue to train afterwards. I'm new to TensorFlow and am using Colab, so help would be appreciated. I already saw the other issue saying to use savemodels of TensorFlow, but I don't understand how to use them correctly.