
Comments (8)

KeirSimmons commented on June 24, 2024

I've left it running on Pong for 4 days now on an 8-core CPU, roughly 10,000 episodes per worker. The average reward (over the last 100 episodes) has not increased even slightly; it sits around -20.5 to -21.0 (the minimum reward).

The same happens with the default VizDoom setup: only the minimum reward is achieved. Has this codebase been tested by anyone else?


MatheusMRFM commented on June 24, 2024

I had the same problem with both the Doom environment and Pong. For the Doom environment, I only trained the agent with the parameters defined in the current version, but for Pong I tested several different network architectures with different learning rates and optimizers.

But in the end I couldn't get it to work even after training for a day. In fact, it ended up converging to a policy where it always moves up or always moves down (not both). I then tried recreating this code from scratch, but ended up with the same problem (my code is in my GitHub account).

I would really appreciate any hints.


DMTSource commented on June 24, 2024

I have also struggled a lot with other environments: the network always collapses to a single action and accomplishes nothing. It works well enough for the provided Doom example, but tuning the network for any other task seems oddly difficult. @IbrahimSobh really seemed to fight with this in the Doom Health level. His use of frame skipping helped save time, but the inability to learn such tasks had me wondering whether there is a more fundamental problem going on.
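For context, the frame skipping mentioned above usually amounts to a small wrapper along these lines (a hedged sketch; the class name and default skip count are my own illustration, not code from this repo or from @IbrahimSobh, and it assumes a reasonably recent gym API): repeat each chosen action for k environment frames and only hand every k-th observation to the agent.

```python
import gym

class FrameSkip(gym.Wrapper):
    """Repeat each action for `skip` frames, summing the rewards."""

    def __init__(self, env, skip=4):
        super(FrameSkip, self).__init__(env)
        self._skip = skip

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

# Usage: env = FrameSkip(gym.make('Pong-v0'), skip=4)
```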


MatheusMRFM commented on June 24, 2024

I managed to get the network to learn a good policy for the given Doom environment. My mistake was that, in order to fix the NaN problem in the original version (it happens when the policy outputs a zero probability for an action, which produces a NaN after taking the log of zero), I added a small value to the policy (1e-8). I then realized that this value wasn't small enough and was interfering with the results. After changing it to 1e-13, the network converged for the Doom environment in about 6k episodes (across all threads).
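A minimal sketch of that epsilon fix (TF 1.x, with illustrative placeholder names rather than the repo's actual variables): add a tiny constant inside the log so that a zero-probability action no longer produces NaN, and keep the constant small enough not to bias the loss.

```python
import tensorflow as tf

# Illustrative shapes/names; the real tensors come from the A3C network.
policy = tf.placeholder(tf.float32, [None, 3], name='policy')           # softmax output
actions_onehot = tf.placeholder(tf.float32, [None, 3], name='actions')  # chosen actions
advantages = tf.placeholder(tf.float32, [None], name='advantages')

# Probability the policy assigned to the action actually taken.
responsible_outputs = tf.reduce_sum(policy * actions_onehot, axis=1)

# 1e-8 turned out to be large enough to distort training; 1e-13 did not.
log_policy = tf.log(responsible_outputs + 1e-13)
policy_loss = -tf.reduce_sum(log_policy * advantages)
```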

But the problem still persists for the Pong environment: I still can't get it to learn, since it keeps converging to a bad policy where the agent executes only one action. I'm actually using a different network setup, the same one used by the OpenAI gym A3C implementation: 4 convolutional layers with 32 filters, 3x3 kernels, and strides of 2, followed by an LSTM layer with an output of 256 (there is no hidden layer between the convolutional and LSTM layers). It also uses the Adam optimizer with a learning rate of 1e-4. But it still doesn't work.
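For reference, here is a hedged TF 1.x sketch of that architecture. The 84x84x4 input, the 6-action output, the ELU activation, and all variable names are my own assumptions, not taken from any particular repo; only the 4 conv layers (32 filters, 3x3, stride 2), the 256-unit LSTM with no intermediate dense layer, and Adam at 1e-4 are from the description above.

```python
import tensorflow as tf

# Assumed 84x84 frames, 4 stacked channels; assumed 6 discrete actions (Pong).
frames = tf.placeholder(tf.float32, [None, 84, 84, 4], name='frames')

x = frames
for i in range(4):
    # 4 conv layers: 32 filters, 3x3 kernels, stride 2, no dense layer afterwards.
    x = tf.layers.conv2d(x, filters=32, kernel_size=3, strides=2,
                         padding='same', activation=tf.nn.elu,
                         name='conv%d' % i)

# With 'same' padding and stride 2: 84 -> 42 -> 21 -> 11 -> 6.
flat = tf.reshape(x, [-1, 6 * 6 * 32])

# Feed the rollout to the LSTM as a single sequence (time dimension = batch size),
# as is common in A3C implementations that process one rollout at a time.
lstm = tf.contrib.rnn.BasicLSTMCell(256)
rnn_in = tf.expand_dims(flat, 0)                       # [1, time, features]
lstm_out, lstm_state = tf.nn.dynamic_rnn(lstm, rnn_in, dtype=tf.float32)
lstm_out = tf.reshape(lstm_out, [-1, 256])

policy = tf.layers.dense(lstm_out, 6, activation=tf.nn.softmax, name='policy')
value = tf.layers.dense(lstm_out, 1, name='value')

optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
```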

I really don't understand why it still doesn't work.


zhaolewen commented on June 24, 2024

Hello,
I'm implementing A3C without LSTM based on this repository and others, and one thing I'm fairly sure is broken is the shared optimizer.
I'm trying the SpaceInvaders-v0 env. When the optimizer object is passed naively to the threads, the game score stays pretty much at the level of random play. With one optimizer per thread, the current performance is about 280 points on average (random play is about 140).
My code is still running, but the rate of improvement is a bit disappointing given that I'm using 16 threads.

By the way, Denny Britz's repository also uses separate optimizers per thread:
https://github.com/dennybritz/reinforcement-learning/blob/master/PolicyGradient/a3c/estimators.py
This A3C repo written in PyTorch also uses normal per-thread optimizers, but the author has additionally written shared versions of Adam and RMSprop:
https://github.com/dgriff777/rl_a3c_pytorch/blob/master/shared_optim.py
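To make the per-thread variant concrete, here is a minimal, self-contained TF 1.x sketch (a toy regression loss stands in for the A3C loss; the variable names, shapes, and gradient-clipping value are all illustrative). Each worker builds its own AdamOptimizer, so each thread keeps its own moment estimates, while gradients computed on the local copy are still applied to the shared global variables.

```python
import tensorflow as tf

# Shared ("global") parameters that every worker updates.
with tf.variable_scope('global'):
    global_w = tf.get_variable('w', shape=[4, 2])

def build_worker(scope):
    """Each worker gets its own local copy, its own loss, and its own optimizer."""
    with tf.variable_scope(scope):
        local_w = tf.get_variable('w', shape=[4, 2])
        x = tf.placeholder(tf.float32, [None, 4], name='x')
        target = tf.placeholder(tf.float32, [None, 2], name='target')
        loss = tf.reduce_mean(tf.square(tf.matmul(x, local_w) - target))

        # Per-thread optimizer: constructed inside the worker scope rather than
        # created once in the main thread and passed to every worker.
        optimizer = tf.train.AdamOptimizer(learning_rate=1e-4, name='Adam_' + scope)

        # Gradients w.r.t. the local copy, applied to the shared global weights.
        grads = tf.gradients(loss, [local_w])
        grads, _ = tf.clip_by_global_norm(grads, 40.0)
        train_op = optimizer.apply_gradients(zip(grads, [global_w]))

        # Pull the latest global weights into the local copy before each rollout.
        sync_op = local_w.assign(global_w)
    return x, target, train_op, sync_op

workers = [build_worker('worker_%d' % i) for i in range(4)]
init_op = tf.global_variables_initializer()
```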


mkisantal commented on June 24, 2024

@zhaolewen Which TF version are you using?

I also implemented my own repo based on this one. With newer TF versions (>1.0) it seemed to work. However, I had to revert to 0.8 due to hardware issues (I'm trying to run my network on a TK1 board), and now it does not seem to learn anything... Of course I had to make tons of changes to make my code compatible with the old Python API, but I'm wondering whether this shared optimizer issue could be related to earlier TF versions.


zhaolewen commented on June 24, 2024

Hi @mkisantal
I'm using TF 1.2, I think. Well, if it's caused by differences between TF versions, that would be quite tricky...


mkisantal commented on June 24, 2024

Now I'm running a test with separate optimizers to see if it solves the issue. But yeah, there could be tons of other reasons for the problems I'm experiencing, since I reverted from 1.4 back to 0.8, and TF was under heavy development between those versions.

