
Comments (27)

nakosung avatar nakosung commented on June 19, 2024

Google has released DeepMind's Atari-playing code, which is written in Lua/Theano.

from dqn-in-the-caffe.

arashno avatar arashno commented on June 19, 2024

Do you have a link for their code?

AjayTalati avatar AjayTalati commented on June 19, 2024

https://sites.google.com/a/deepmind.com/dqn/

muupan avatar muupan commented on June 19, 2024

@arashno
Your results are bad enough that I suspect there must be some problem. Which versions of Caffe and ALE are you using? My Pong experiment used:

Unfortunately, training does not stop on its own at the moment. You can easily modify dqn_main.cpp to make it stop by checking dqn.current_iteration(), though.
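A minimal sketch of such a stopping check, assuming a hypothetical DQN-like class with a current_iteration() accessor (the names here are illustrative, not the actual dqn-in-the-caffe API):

```cpp
#include <cassert>

// Hypothetical stand-in for the DQN class; only the iteration
// counter matters for this sketch.
class Dqn {
 public:
  int current_iteration() const { return iteration_; }
  void update() { ++iteration_; }  // one training step
 private:
  int iteration_ = 0;
};

// Run training until a maximum iteration count is reached,
// mirroring the check one could add to dqn_main.cpp.
int train(Dqn& dqn, int max_iter) {
  while (dqn.current_iteration() < max_iter) {
    dqn.update();
  }
  return dqn.current_iteration();
}
```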

arashno avatar arashno commented on June 19, 2024

@muupan
I am using your recommended version of Caffe and ALE 0.4.4.
There seems to be a problem with ALE's randomization: ALE reports that it is using a TIME random seed, but each time I evaluate my trained networks, ALE plays the exact same game.
How can I fix this?
Thanks

arashno avatar arashno commented on June 19, 2024

@muupan
I still have problems with the results.
Could you please help me, and also share your own results?
Thanks

muupan avatar muupan commented on June 19, 2024

Sorry for the late reply. Could you give this trained model a try?
https://www.dropbox.com/s/vrvpu69d1cr4a7d/ec2_pong_5m.caffemodel?dl=0
./dqn -gui -evaluate -model ec2_pong_5m.caffemodel -rom pong.bin
It's not the one used in the demo video, but it also plays Pong successfully on my PC. If it works in your environment too, the problem is confined to training.

arashno avatar arashno commented on June 19, 2024

@muupan
Thanks for your reply.
I tried your trained model.
The score was +17 (21 to 4) for DQN, so it works.
I ran the model several times, and it played the exact same game every time. I would expect some variation between runs; is this normal for the code, or is something wrong with my environment?
My second question: what could be wrong with my training?
My last question is about parameters: the Lua code (see the comments above for the link) uses slightly different training parameters. What parameters should I use to achieve good results?
(From the name of your trained model file, it seems you used 5 million iterations, which differs from the default parameters. Are you using a different parameter set?)
Thanks

muupan avatar muupan commented on June 19, 2024

There are at least two random generators in the program: one used by ALE and one used by DQN. The ALE random generator might not affect the results at all, because Pong is probably a deterministic game. The seed of the DQN random generator is set to zero in the constructor of the DQN class, so it will choose actions the same way across runs whenever the network parameters are the same. You can change that behavior by modifying the seed value in the code.

I don't have any clear idea of what could be wrong with your training. If your five trained nets are completely identical, try other seed values; it's possible you were just unlucky.
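The effect of a fixed seed can be seen with a plain std::mt19937, which is the kind of generator a DQN implementation might use for (epsilon-greedy) action selection; this is a sketch, not the actual dqn-in-the-caffe code:

```cpp
#include <chrono>
#include <random>
#include <vector>

// Draw n "actions" from a uniform distribution using the given seed.
// ALE exposes 18 discrete actions, hence the range [0, 17].
std::vector<int> sample_actions(unsigned seed, int n) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(0, 17);
  std::vector<int> actions;
  for (int i = 0; i < n; ++i) actions.push_back(dist(gen));
  return actions;
}

// A time-based seed, as one could use instead of the fixed 0.
unsigned time_seed() {
  return static_cast<unsigned>(
      std::chrono::system_clock::now().time_since_epoch().count());
}
```

With a fixed seed of 0 the action sequence is identical on every run; seeding with time_seed() instead makes each run differ.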

My uploaded model used slightly different parameters:

net: "dqn.prototxt"
solver_type: ADADELTA
momentum: 0.95
base_lr: 0.2
lr_policy: "fixed"
stepsize: 10000000
max_iter: 10000000
display: 100
snapshot: 50000
snapshot_prefix: "dqn"

This solver doesn't decay the learning rate, and training lasts 10 million iterations. In my observation, using it eventually gives better results than using the default solver.

There are many differences between their Lua code and mine, not only in parameter values but also in algorithm details. For example, they use RMSProp for optimization while mine uses AdaDelta.
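For reference, the core RMSProp update being discussed looks roughly like this (a textbook sketch, not the exact variant used in the DeepMind code):

```cpp
#include <cmath>

// One RMSProp step: keep a running average of squared gradients and
// scale the learning rate by its square root, so the effective step
// size adapts to the recent gradient magnitude.
struct RmsProp {
  double mean_sq = 0.0;
  double lr, decay, eps;
  explicit RmsProp(double lr, double decay = 0.9, double eps = 1e-8)
      : lr(lr), decay(decay), eps(eps) {}
  double step(double param, double grad) {
    mean_sq = decay * mean_sq + (1.0 - decay) * grad * grad;
    return param - lr * grad / (std::sqrt(mean_sq) + eps);
  }
};

// Sanity check: minimize f(x) = x^2, whose gradient is 2x.
double minimize_quadratic(double x0, int steps) {
  RmsProp opt(0.01);
  double x = x0;
  for (int i = 0; i < steps; ++i) x = opt.step(x, 2.0 * x);
  return x;
}
```

Because the step size is normalized by the gradient magnitude, RMSProp moves at roughly lr per step regardless of scale, which is one reason it needs a small learning rate.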

nakosung avatar nakosung commented on June 19, 2024

Based on my experience, RMSProp outperforms AdaDelta.

muupan avatar muupan commented on June 19, 2024

@nakosung
You're probably right, given that the DeepMind people still use RMSProp in their Nature paper. I chose AdaDelta only because it was available as a PR at the time.

nakosung avatar nakosung commented on June 19, 2024

@muupan My RMSProp implementation is available at nakosung/caffe@1509647963e. It is a little weird because of the 'fluent pattern' I introduced.

arashno avatar arashno commented on June 19, 2024

@nakosung @muupan
Thanks for your comments.
Are there any other major differences between the Lua code and this code?
I tried to use RMSProp: I downloaded nakosung/caffe@1509647 (the whole branch) and built it. The source files seem OK, but when I try to run it I get this error:
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.SolverParameter: 3:1: Unknown enumeration value of "RMSPROP" for field "solver_type".
What's wrong with it?

arashno avatar arashno commented on June 19, 2024

The proto file also seems OK, because it contains RMSPROP = 4; on line 150.
I have cleaned and rebuilt both Caffe and dqn-in-the-caffe, but the problem still exists.

nakosung avatar nakosung commented on June 19, 2024

@arashno Maybe that changelist doesn't produce a good build. Could you try a newer changelist? Sorry for the inconvenience. (Or maybe your repo contains two different versions of the generated proto files.)

arashno avatar arashno commented on June 19, 2024

@nakosung
Sorry, there was a problem with my rebuild.
Now when I use RMSProp, the Q-values grow very large!
Thanks

nakosung avatar nakosung commented on June 19, 2024

@arashno An exploding network is a common problem for DQN because training is iterative. You can try various techniques to avoid it (like dropout).
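One concrete technique against exploding Q-values, used in the DeepMind Atari papers (though not part of the code discussed in this thread), is clipping rewards to a bounded range before forming the bootstrap target; a sketch:

```cpp
#include <algorithm>

// Clip a reward to [-1, 1], as done in the DeepMind Atari papers,
// so that one game's reward scale cannot blow up the Q-targets.
double clip_reward(double r) {
  return std::max(-1.0, std::min(1.0, r));
}

// Q-learning target built from a clipped reward; gamma is the
// discount factor applied to the bootstrapped next-state value.
double q_target(double reward, double max_next_q, double gamma) {
  return clip_reward(reward) + gamma * max_next_q;
}
```

With rewards bounded to [-1, 1], the fixed-point Q-values are bounded by 1 / (1 - gamma), which keeps the regression targets in a sane range across games.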

arashno avatar arashno commented on June 19, 2024

@nakosung
But I am just trying to replicate the original paper.
The authors used RMSProp and did not use dropout.
They also did not mention any other technique to avoid exploding Q-values.

muupan avatar muupan commented on June 19, 2024

@arashno
You need to carefully select the learning rate and discount factor for Q-learning; otherwise the Q-values can diverge.

nakosung avatar nakosung commented on June 19, 2024

@arashno My RMSProp implementation requires a tiny learning rate.

In my experience, training a DQN is not as straightforward as other well-known problems. Because the process is iterative, a tiny difference can lead to divergence. If you want to reproduce the DeepMind paper's results, I recommend trying their implementation.

muupan avatar muupan commented on June 19, 2024

@mohammad63
I have no experience with @nakosung's RMSProp implementation, so please don't ask me how to compile it.

arashno avatar arashno commented on June 19, 2024

@nakosung
Did you try your RMSProp implementation with this DQN code?
What parameters did you use?
How good were the results?
Does RMSProp work better for you than AdaDelta?
Thanks

nakosung avatar nakosung commented on June 19, 2024

@arashno I haven't tried it with muupan's DQN; I'm running my experiments with my own implementation. The parameters I used were roughly lr = 0.001 and momentum ≈ 0.6, but I don't remember what value I used for the RMSProp decay factor. In my experience, AdaDelta doesn't seem as good as RMSProp at keeping the network healthy (it is more sensitive to glitches).

arashno avatar arashno commented on June 19, 2024

@nakosung
Is your DQN implementation publicly available?

nakosung avatar nakosung commented on June 19, 2024

@arashno Unfortunately, no.

arashno avatar arashno commented on June 19, 2024

It seems I was just unlucky; the results are acceptable now.

toweln avatar toweln commented on June 19, 2024

I ran the ec2_pong_5m.caffemodel above, and I always get -21. What could be the problem?
