kngwyu / rainy Goto Github PK

:umbrella: Deep RL agents with PyTorch:umbrella:

License: Apache License 2.0

Python 99.94% Makefile 0.06%

pytorch reinforcement-learning deep-reinforcement-learning dqn a2c acktr ppo td3 ddpg sac option-critic

rainy's Issues

Revisit Agent classes

We currently have two types of agents:

OneStepAgent
- for DQN-like algorithms
- executes one step, stores the transition in the replay buffer, and trains the agent on sampled transitions
NStepParallelAgent
- for A2C-like algorithms
- executes N steps in parallel environments and trains the policy in an online manner

This two-way division is practical but lacks flexibility.
E.g., we cannot extend OneStep algorithms to a batched-parallel style without rewriting the whole process.

So we should redefine the agent hierarchy in terms of a few important properties, such as:

Online/Offline (i.e., whether a replay buffer is used or not)
MultiStep/OneStep
Not Parallel/Batch Parallel/Async Parallel
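
One way to picture the property-based hierarchy is to describe each agent as a point along the three axes listed above. The sketch below is only an illustration: `DataSource`, `Parallelism`, and `AgentSpec` are hypothetical names that do not exist in rainy, and `nsteps=5` is an arbitrary example value.

```python
from dataclasses import dataclass, replace
from enum import Enum


class DataSource(Enum):
    ONLINE = "online"  # train directly on freshly collected rollouts
    REPLAY = "replay"  # train on transitions sampled from a replay buffer


class Parallelism(Enum):
    NONE = "none"
    BATCH = "batch"
    ASYNC = "async"


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of an agent along the three axes."""

    data_source: DataSource
    nsteps: int
    parallelism: Parallelism

    @property
    def is_multistep(self) -> bool:
        return self.nsteps > 1


# The two existing agent types expressed as points in this space.
ONE_STEP = AgentSpec(DataSource.REPLAY, nsteps=1, parallelism=Parallelism.NONE)
NSTEP_PARALLEL = AgentSpec(DataSource.ONLINE, nsteps=5, parallelism=Parallelism.BATCH)

# A batched-parallel DQN-like agent is just one more combination.
BATCHED_ONE_STEP = replace(ONE_STEP, parallelism=Parallelism.BATCH)
```

With such a description, extending a OneStep algorithm to a batched-parallel style changes one property instead of requiring a new class hierarchy.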

Revisit experiment workflow

Use pandas/xarray instead of hand-made Log object
Integration with Jupyter
Real time experiment tracking

Save/Load running mean&std with model

This is needed to evaluate PyBullet envs correctly.
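
A minimal sketch of the idea, assuming the normalization statistics can be reduced to a few numbers: the running mean and variance are written into the same checkpoint as the model state and restored together. `RunningMeanStd` here is a simplified scalar stand-in for rms.py, and `save_checkpoint`/`load_checkpoint` are hypothetical helpers; `pickle` is used only to keep the example dependency-free.

```python
import pickle
from dataclasses import asdict, dataclass


@dataclass
class RunningMeanStd:
    """Scalar running mean/variance (simplified stand-in for rms.py)."""

    mean: float = 0.0
    var: float = 1.0
    count: float = 1e-4

    def update(self, batch):
        # Merge the batch statistics into the running statistics.
        n = len(batch)
        b_mean = sum(batch) / n
        b_var = sum((x - b_mean) ** 2 for x in batch) / n
        delta = b_mean - self.mean
        total = self.count + n
        m2 = self.var * self.count + b_var * n + delta ** 2 * self.count * n / total
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total


def save_checkpoint(path, model_state, obs_rms):
    # Store the normalizer statistics next to the model parameters.
    with open(path, "wb") as f:
        pickle.dump({"model": model_state, "obs_rms": asdict(obs_rms)}, f)


def load_checkpoint(path):
    # Restore both the model parameters and the normalizer statistics.
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["model"], RunningMeanStd(**ckpt["obs_rms"])
```

Without the restored statistics, an evaluation run would normalize observations differently from training, which is presumably why evaluation is currently incorrect.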

Customizable CLI

Our CLI tool should be more customizable, like RLPy3's.

Choose eval or not after training

Some environments are available only as a VecEnv (e.g., coinrun), so it should be possible to set eval_env to None.
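
The requested behaviour could look roughly like this. `run_experiment` and its callbacks are hypothetical and not part of rainy's API; the only point is that evaluation is skipped when `eval_env` is None.

```python
from typing import Callable, Optional


def run_experiment(
    train: Callable[[], None],
    evaluate: Callable[[object], float],
    eval_env: Optional[object] = None,
) -> Optional[float]:
    """Train, then evaluate only if an evaluation env was provided."""
    train()
    if eval_env is None:
        # No single-env evaluator available (e.g. VecEnv-only environments).
        return None
    return evaluate(eval_env)
```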

lr_cooler and clip_cooler don't work correctly

Rainy/rainy/config.py

Line 157 in 21987a3

def lr_cooler(self, initial: float) -> Cooler:

They subtract (initial - minimum) / max_steps from the value every time they are called, but they are called only max_steps / (nsteps * nworkers) times in A2C, PPO, and ACKTR.

This should be fixed, and the benchmark in #8 should be run again.
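
The effect can be reproduced with a small stand-alone sketch. `LinearCooler` below is written from the description in this issue, not copied from rainy, and the `steps_per_call` parameter is a hypothetical fix; the numbers (1000 steps, 5 nsteps, 4 workers) are arbitrary.

```python
class LinearCooler:
    """Linear decay from `initial` to `minimum`, as described in this issue."""

    def __init__(self, initial, minimum, max_steps, steps_per_call=1):
        self.value = initial
        self.minimum = minimum
        # Amount removed per call; steps_per_call=1 reproduces the reported bug.
        self.delta = (initial - minimum) / max_steps * steps_per_call

    def __call__(self):
        self.value = max(self.value - self.delta, self.minimum)
        return self.value


max_steps, nsteps, nworkers = 1000, 5, 4
calls = max_steps // (nsteps * nworkers)  # number of updates: 50

buggy = LinearCooler(1.0, 0.0, max_steps)
fixed = LinearCooler(1.0, 0.0, max_steps, steps_per_call=nsteps * nworkers)
for _ in range(calls):
    buggy_value = buggy()
    fixed_value = fixed()
```

After all 50 updates, `buggy_value` has only decayed to about 0.95, while `fixed_value` reaches the minimum (up to floating-point error).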

Refactor files from baselines

rms.py and atari_wrappers.py are copied from OpenAI Baselines and are not well written.
They need refactoring.

Implement NormalizeReward/Obs for single Env

Currently they exist only in the parallel wrappers.

Add an example of overriding subcommands

Use gym.wrappers.AtariPreprocessing

https://github.com/openai/gym#what-s-new
It was introduced in v0.15.2.
We currently use wrappers ported from baselines, but now is a good time to migrate.

Report running stats of loss

Currently the loss is reported only occasionally; we should compute running statistics of the loss and report them.
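
One possible way to do this is Welford's online algorithm, which updates the mean and variance one value at a time without storing the loss history. `RunningStats` is a hypothetical helper and the loss values are made up for illustration.

```python
import math


class RunningStats:
    """Online mean / standard deviation via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation of the values seen so far.
        return math.sqrt(self._m2 / self.n) if self.n > 0 else 0.0


stats = RunningStats()
for loss in [0.9, 0.7, 0.8, 0.4]:  # example loss values
    stats.update(loss)
```

The agent would call `update` on every training step and log `stats.mean` and `stats.std` at each reporting interval.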

Report KL-divergence between old/new policy

It may be useful for debugging and parameter tuning.
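
For discrete action spaces, the quantity to report could be computed as in the sketch below. `categorical_kl` is a hypothetical helper working on plain probability lists; an actual implementation would operate on the policy's batched distributions.

```python
import math


def categorical_kl(old_probs, new_probs, eps=1e-8):
    """KL(old || new) for two discrete action distributions."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(old_probs, new_probs)
    )


unchanged = categorical_kl([0.5, 0.5], [0.5, 0.5])  # identical policies
shifted = categorical_kl([0.5, 0.5], [0.9, 0.1])    # policy moved after update
```

A value that grows sharply between updates suggests the step size or clipping range is too large.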

Errors in bootdqn_cartpole.py?

Hi @kngwyu,

When I ran your bootdqn_cartpole.py like this:

python examples\bootdqn_cartpole.py train

I got:

Traceback (most recent call last):
  File "examples\bootdqn_cartpole.py", line 21, in <module>
    ) -> Config:
  File "C:\rainy\cli.py", line 227, in decorator
    rainy_cli(obj=_CLIContext(f, agent, agent_selector, script_path))
  File "C:\Miniconda3\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "C:\Miniconda3\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "C:\Miniconda3\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Miniconda3\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Miniconda3\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "C:\Miniconda3\lib\site-packages\click\decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\rainy\cli.py", line 52, in train
    experiment.train(eval_render=eval_render)
  File "C:\rainy\experiment.py", line 97, in train
    for res in self.ag.train_episodes(self.config.max_steps):
  File "C:\rainy\agents\base.py", line 274, in train_episodes
    self.store_transition(state, action, *transition[:-1])  # Do not pass info
TypeError: store_transition() takes 4 positional arguments but 6 were given

Thanks for your help!

BTW:
I can run your dqn_cartpole.py like this:
python examples\dqn_cartpole.py train
with no problems.
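
For reference, the argument counts in the error message can be reproduced in isolation. This is only an illustration of the mismatch, not a diagnosis: the `Agent` class and its three-parameter `store_transition` are hypothetical, and the four-element transition tuple is inferred from the `transition[:-1]` call in the traceback.

```python
class Agent:
    def store_transition(self, state, action, reward):
        # Hypothetical signature: 4 positional arguments counting `self`.
        pass


# Assumed layout: (next_state, reward, done, info); info is dropped by [:-1].
transition = ("next_state", 1.0, False, {"info": None})

try:
    # state + action + 3 unpacked values + self = 6 positional arguments.
    Agent().store_transition("state", 0, *transition[:-1])
    message = ""
except TypeError as err:
    message = str(err)
```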

rainy's Issues

Revisit Agent classes

Revisit experiment workflow

Save/Load running mean&std with model

Customizable CLI

Choose eval or not after training

lr_cooler and clip_cooler doesn't work correctly

Refactor files from baselines

Implement NormalizeReward/Obs for single Env

Add an example of overriding subcommands

Use gym.wrappers.AtariPreprocessing

Report running stats of loss

Fix dataparallel

Report KL-divergence between old/new policy

Errors in bootdqn_cartpole.py?

Evaluate modes for RNNs are broken
