
Comments (8)

KeirSimmons commented on June 24, 2024

I've left it running on Pong for 4 days now on an 8-core CPU, roughly 10,000 episodes per worker. The average reward (over the last 100 episodes) has not increased even slightly; it sits around -20.5 to -21.0 (the minimum reward).

The same happens with the default VizDoom setup: only the minimum reward is achieved. Has this codebase been tested by anyone else?


MatheusMRFM commented on June 24, 2024

I had the same problem with both the Doom environment and Pong. For the Doom environment, I only trained the agent with the parameters defined in the current version, but for Pong I tested several different network architectures with different learning rates and optimizers.

But in the end I couldn't get it to work even after training for a day. In fact, it ended up converging to a policy where it always moves up or always moves down (not both). I then tried recreating this code from scratch, but ended up with the same problem (my code is in my GitHub account).

I would really appreciate any hints.


DMTSource commented on June 24, 2024

I have also struggled a lot with other environments: the network always collapses to a single action and accomplishes nothing. It works well enough for the provided Doom example, but tuning the network for any other task seems oddly difficult. @IbrahimSobh really seemed to fight with this in the Doom Health level. His use of frame skipping helped save time, but the inability to learn such tasks had me wondering whether there is a more fundamental problem going on.
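For context, the frame skipping mentioned above usually amounts to a small wrapper along these lines (a hedged sketch; the class name and default skip count are my own illustration, not code from this repo or from @IbrahimSobh, and it assumes a reasonably recent gym API): repeat each chosen action for k environment frames and only hand every k-th observation to the agent.

```python
import gym

class FrameSkip(gym.Wrapper):
    """Repeat each action for `skip` frames, summing the rewards."""

    def __init__(self, env, skip=4):
        super(FrameSkip, self).__init__(env)
        self._skip = skip

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

# Usage: env = FrameSkip(gym.make('Pong-v0'), skip=4)
```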


MatheusMRFM commented on June 24, 2024

I managed to get the network to learn a good policy for the given Doom environment. My mistake was that, in order to fix the NaN problem in the original version (it happens when the policy outputs a zero probability for an action, which produces a NaN after taking the log of zero), I added a small value to the policy (1e-8). I then realized that this value wasn't small enough and was interfering with the results. After changing it to 1e-13, the network converged for the Doom environment in about 6k episodes (across all threads).
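A minimal sketch of that epsilon fix (TF 1.x, with illustrative placeholder names rather than the repo's actual variables): add a tiny constant inside the log so that a zero-probability action no longer produces NaN, and keep the constant small enough not to bias the loss.

```python
import tensorflow as tf

# Illustrative shapes/names; the real tensors come from the A3C network.
policy = tf.placeholder(tf.float32, [None, 3], name='policy')           # softmax output
actions_onehot = tf.placeholder(tf.float32, [None, 3], name='actions')  # chosen actions
advantages = tf.placeholder(tf.float32, [None], name='advantages')

# Probability the policy assigned to the action actually taken.
responsible_outputs = tf.reduce_sum(policy * actions_onehot, axis=1)

# 1e-8 turned out to be large enough to distort training; 1e-13 did not.
log_policy = tf.log(responsible_outputs + 1e-13)
policy_loss = -tf.reduce_sum(log_policy * advantages)
```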

But the problem still persists for the Pong environment: I still can't get it to learn, since it keeps converging to a bad policy where the agent executes only one action. I'm actually using a different network setup, the same one used by the OpenAI gym A3C implementation: 4 convolutional layers with 32 filters, 3x3 kernels, and strides of 2, followed by an LSTM layer with an output of 256 (there is no hidden layer between the convolutional and LSTM layers). It also uses the Adam optimizer with a learning rate of 1e-4. But it still doesn't work.
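For reference, here is a hedged TF 1.x sketch of that architecture. The 84x84x4 input, the 6-action output, the ELU activation, and all variable names are my own assumptions, not taken from any particular repo; only the 4 conv layers (32 filters, 3x3, stride 2), the 256-unit LSTM with no intermediate dense layer, and Adam at 1e-4 are from the description above.

```python
import tensorflow as tf

# Assumed 84x84 frames, 4 stacked channels; assumed 6 discrete actions (Pong).
frames = tf.placeholder(tf.float32, [None, 84, 84, 4], name='frames')

x = frames
for i in range(4):
    # 4 conv layers: 32 filters, 3x3 kernels, stride 2, no dense layer afterwards.
    x = tf.layers.conv2d(x, filters=32, kernel_size=3, strides=2,
                         padding='same', activation=tf.nn.elu,
                         name='conv%d' % i)

# With 'same' padding and stride 2: 84 -> 42 -> 21 -> 11 -> 6.
flat = tf.reshape(x, [-1, 6 * 6 * 32])

# Feed the rollout to the LSTM as a single sequence (time dimension = batch size),
# as is common in A3C implementations that process one rollout at a time.
lstm = tf.contrib.rnn.BasicLSTMCell(256)
rnn_in = tf.expand_dims(flat, 0)                       # [1, time, features]
lstm_out, lstm_state = tf.nn.dynamic_rnn(lstm, rnn_in, dtype=tf.float32)
lstm_out = tf.reshape(lstm_out, [-1, 256])

policy = tf.layers.dense(lstm_out, 6, activation=tf.nn.softmax, name='policy')
value = tf.layers.dense(lstm_out, 1, name='value')

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
```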

I really don't understand why it still doesn't work.


zhaolewen commented on June 24, 2024

Hello,
I'm implementing A3C without LSTM based on this repository and others, and one thing I'm fairly sure is broken is the shared optimizer.
I'm trying the SpaceInvaders-v0 env. When the optimizer object is passed naively to the threads, the game score stays pretty much at the level of random play. With one optimizer per thread, the current performance is about 280 points on average (random play is about 140).
My code is still running, but the rate of improvement is a bit disappointing given that I'm using 16 threads.

By the way, Denny Britz's repository also uses separate optimizers per thread:
https://github.com/dennybritz/reinforcement-learning/blob/master/PolicyGradient/a3c/estimators.py
This A3C repo written in PyTorch also uses normal per-thread optimizers, but the author has additionally written shared versions of Adam and RMSprop:
https://github.com/dgriff777/rl_a3c_pytorch/blob/master/shared_optim.py
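To make the per-thread variant concrete, here is a minimal, self-contained TF 1.x sketch (a toy regression loss stands in for the A3C loss; the variable names, shapes, and gradient-clipping value are all illustrative). Each worker builds its own AdamOptimizer, so each thread keeps its own moment estimates, while gradients computed on the local copy are still applied to the shared global variables.

```python
import tensorflow as tf

# Shared ("global") parameters that every worker updates.
with tf.variable_scope('global'):
    global_w = tf.get_variable('w', shape=[4, 2])

def build_worker(scope):
    """Each worker gets its own local copy, its own loss, and its own optimizer."""
    with tf.variable_scope(scope):
        local_w = tf.get_variable('w', shape=[4, 2])
        x = tf.placeholder(tf.float32, [None, 4], name='x')
        target = tf.placeholder(tf.float32, [None, 2], name='target')
        loss = tf.reduce_mean(tf.square(tf.matmul(x, local_w) - target))

        # Per-thread optimizer: constructed inside the worker scope rather than
        # created once in the main thread and passed to every worker.
        optimizer = tf.train.AdamOptimizer(learning_rate=1e-4, name='Adam_' + scope)

        # Gradients w.r.t. the local copy, applied to the shared global weights.
        grads = tf.gradients(loss, [local_w])
        grads, _ = tf.clip_by_global_norm(grads, 40.0)
        train_op = optimizer.apply_gradients(zip(grads, [global_w]))

        # Pull the latest global weights into the local copy before each rollout.
        sync_op = local_w.assign(global_w)
    return x, target, train_op, sync_op

workers = [build_worker('worker_%d' % i) for i in range(4)]
init_op = tf.global_variables_initializer()
```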


mkisantal commented on June 24, 2024

@zhaolewen Which TF version are you using?

I also implemented my own repo based on this one. With newer TF versions (>1.0) it seemed to work. However, I had to revert to 0.8 due to hardware issues (I'm trying to run my network on a TK1 board), and now it does not seem to learn anything... Of course I had to make tons of changes to make my code compatible with the old Python API, but I'm wondering whether this shared optimizer issue could be related to earlier TF versions.


zhaolewen commented on June 24, 2024

Hi @mkisantal
I'm using TF 1.2, I think. Well, if it's caused by differences between TF versions, that would be quite tricky...


mkisantal commented on June 24, 2024

Now I'm running a test with separate optimizers to see if it solves the issue. But yeah, there could be tons of other reasons for the problems I'm experiencing, since I reverted from 1.4 back to 0.8, and TF was under heavy development between those versions.

