Continuous control with deep reinforcement learning

An implementation of DDPG (Deep Deterministic Policy Gradient).

Experiments

Game	Epochs	Training Time	Model Parameters (total)
MountainCarContinuous-v0	1000	30 min	299,032
Pendulum-v0	1000	30 min	299,536
3DBall	to be updated	to be updated	to be updated

Todo

Fix the problem where the action converges in the wrong direction once training exceeds 200 epochs.
Test more games.
Add an argument parser.

Update (2019.08.27)

Save error and notation fixed
argparser added

Update (2019.08.30)

Changed the sampling method in replaybuffer.py.
Added a new test result.
Pendulum-v0 is now being tested.
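
The README does not say what the new sampling method in replaybuffer.py is. As a point of reference only, here is a minimal sketch of the uniform random sampling that DDPG replay buffers commonly use. The class name, method names, and transition layout are assumptions for illustration, not the repository's actual code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions.

    Hypothetical sketch; not the repository's replaybuffer.py.
    """

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current size.
        # Requires at least one stored transition.
        k = min(batch_size, len(self.buffer))
        batch = random.sample(list(self.buffer), k)
        # Transpose the list of transitions into one tuple per field.
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from past transitions breaks the correlation between consecutive steps of an episode, which is the usual reason for using a replay buffer with an off-policy method such as DDPG.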

Plot

MountainCarContinuous-v0

2019.08.27

Beyond 200 epochs, all models (train and test) diverge.
- I tried adjusting the batch size, learning rate, activation function, model size, and noise size, but the problem was not resolved.

2019.08.30

It doesn't converge at all.
- I tried an almost identical model made by someone else. As far as I can tell it looks the same as mine, and it appears to converge, but my model did not.

2019.08.30

I changed the learning rate of the Critic model from 0.001 to 0.0001 (after trying several values).
- This shows that the model can be trained well by adjusting the learning rate. I took the idea from TRPO and PPO, which handle changes to the model parameters carefully.
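
To make the effect of that change concrete, the sketch below compares how far one gradient step moves a set of weights under the two critic learning rates. It assumes plain gradient descent; the optimizer the repository actually uses is not stated, and the weights and gradients are made-up values for illustration.

```python
def sgd_step(params, grads, lr):
    """Return the parameters after one plain gradient-descent step."""
    return [p - lr * g for p, g in zip(params, grads)]


def step_size(old, new):
    """Euclidean norm of the parameter change."""
    return sum((n - o) ** 2 for o, n in zip(old, new)) ** 0.5


params = [0.5, -1.0, 2.0]    # hypothetical critic weights
grads = [10.0, -20.0, 5.0]   # hypothetical (large) gradients of the critic loss

old_lr, new_lr = 0.001, 0.0001  # the two critic learning rates from the note above
change_old = step_size(params, sgd_step(params, grads, old_lr))
change_new = step_size(params, sgd_step(params, grads, new_lr))

# With plain SGD the step shrinks in proportion to the learning rate.
print(change_old / change_new)
```

TRPO and PPO limit how much the policy changes per update with a KL constraint or a clipped objective. Lowering the learning rate is a cruder way to keep each update small, but it follows the same intuition.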

Run

python main.py

To change the hyper-parameters, run "python main.py --help" to see the available options.

Options:

'--epochs', type=int, default=100, help='number of epochs, (default: 100)'
'--e', type=str, default='MountainCarContinuous-v0', help='environment name, (default: MountainCarContinuous-v0)'
#- '--d', type=bool, default=False, help='train and test alternately. (default : False)'
'--t', type=bool, default=True, help="True if training, False if test. (default: True)"
'--r', type=bool, default=False, help='rendering the game environment. (default : False)'
'--b', type=int, default=128, help='train batch size. (default : 128)'
'--v', type=bool, default=False, help='verbose mode. (default : False)'
#- '--n', type=bool, default=True, help='reward normalization. (default : True)'
'--sp', type=int, default=True, help='save point. epochs // sp. (default : 100)'
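
One caveat with the options above: in argparse, type=bool converts any non-empty string to True, so "--t False" would still enable training. The sketch below shows one possible workaround using a small converter. The helper str2bool and the function build_parser are hypothetical names, not part of the repository, and '--sp' is left out because its listed default and help text disagree.

```python
import argparse


def str2bool(value):
    """Parse common true/false spellings (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    text = value.lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Rebuild a subset of the listed options with explicit boolean parsing."""
    parser = argparse.ArgumentParser(description="DDPG options (sketch)")
    parser.add_argument("--epochs", type=int, default=100, help="number of epochs")
    parser.add_argument("--e", type=str, default="MountainCarContinuous-v0",
                        help="environment name")
    parser.add_argument("--t", type=str2bool, default=True,
                        help="True if training, False if test")
    parser.add_argument("--r", type=str2bool, default=False,
                        help="render the game environment")
    parser.add_argument("--b", type=int, default=128, help="train batch size")
    parser.add_argument("--v", type=str2bool, default=False, help="verbose mode")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(["--e", "Pendulum-v0", "--t", "False"])
print(args.e, args.t, args.epochs)
```

With this converter, "--t False" selects test mode as the help text suggests, while unspecified options keep the defaults listed above.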

joeljosephjin / ddpg-mountain-car-continuous
