Comments (6)
Hi Fausto,
It is a tricky thing, and if you have empirical results either way I would be interested in seeing what they suggest. The thinking behind using tau for incremental updates is that the single big change to the target network in the original DQN architecture can actually be disruptive to training, since the target Q-values are suddenly drawn from a potentially very different distribution than they were a moment ago. The idea behind tau is to eliminate this with a slow, smooth change over time. As you say, though, the interpolation may produce target values the original Q-network would never have produced. Overall I think this is alright, since it matters more that the target values are relatively stable and uncorrelated with the primary network than that they are exactly correct. (The Q-values from the primary network aren't actually correct themselves, just closer approximations.) If they are somewhat off, that is okay, since they will continue to be pushed in the right direction. I hope that long explanation provided some context. As I said earlier, though, it may be that one ends up working better than the other depending on the specific task.
I will, however, change the wording in the notebook to make it clear that the tau updating strategy is being employed.
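As a minimal sketch of the tau interpolation being discussed (plain NumPy, with hypothetical parameter lists standing in for the TensorFlow variables in the notebook), each target parameter takes a small step toward the corresponding main-network parameter:

```python
import numpy as np

def soft_update(target_params, main_params, tau):
    """Move each target parameter a fraction tau toward the main network.

    With tau = 1.0 this is a hard copy (the original DQN scheme);
    with a small tau (e.g. 0.001) the target drifts slowly and smoothly.
    """
    return [tau * m + (1.0 - tau) * t
            for m, t in zip(main_params, target_params)]

# Toy example: one "weight matrix" per network.
main = [np.ones((2, 2))]
target = [np.zeros((2, 2))]

target = soft_update(target, main, tau=0.001)
print(target[0][0, 0])  # 0.001: the target moved 0.1% of the way toward the main net
```

Repeating this small step at every training iteration replaces the periodic wholesale copy, which is what keeps the target distribution from jumping.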
from deeprl-agents.
Hello Arthur,
Thank you for your answer. I will try to come up with some results today using your simple grid world env.
In relation to the first question in my message, can you confirm that the update you perform right at the beginning of training (right after sess.run(init)) to make the two networks (main and target) equal should actually be applied with tau != 1.0?
I apologize for not realizing you were referring specifically to the initial setting of the networks. You are right that they should be initialized to the same values at the beginning of training. Though, since those values are initialized randomly in both cases (and as such produce random Q-values), it may not make much of an empirical difference.
I have tried your simple grid world (size 5x5): 10K iterations of pre-training, 10K iterations of annealing the chance of a random action, an experience replay buffer of 250K, learning rate 0.0001, tau 0.001.
It seems that making the networks equal at the beginning of training actually harms performance.
In orange: the performance when, at the beginning, we perform just a standard update with tau = 0.001.
In purple: the performance when, at the beginning, we make the networks equal with tau = 1.0 and then use the usual tau = 0.001.
In green: the performance when, at the beginning, we don't do any parameter update.
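The three initialization variants above can be sketched as follows (a hypothetical NumPy stand-in for the notebook's TensorFlow update ops; soft_update is the usual tau interpolation):

```python
import numpy as np

def soft_update(target, main, tau):
    # Standard tau interpolation toward the main network's parameters.
    return [tau * m + (1.0 - tau) * t for m, t in zip(main, target)]

rng = np.random.default_rng(0)
main = [rng.standard_normal((2, 2))]
target = [rng.standard_normal((2, 2))]  # independently initialized

# Orange: one standard soft update at the start (networks stay nearly independent).
orange = soft_update(target, main, tau=0.001)

# Purple: hard copy at the start (tau = 1.0); soft updates follow during training.
purple = soft_update(target, main, tau=1.0)

# Green: no initial update at all.
green = target

print(np.allclose(purple[0], main[0]))  # True: purple starts fully correlated
print(np.allclose(green[0], main[0]))   # False: green starts independent
```

The reported result is that the purple variant (fully correlated start) learned worst, consistent with the point that the target network's value comes from being decorrelated from the primary network.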
Hopefully there are no bugs around that influence the performance in this evaluation.
(I can think of reasons why the loss would go up a little after going down; now I'm looking into checking that it still goes down as the algorithm runs further.)
Thanks for running these experiments! It looks like having the networks start correlated is indeed detrimental. I am going to remove the initial update line.