
Comments (4)

ranahanocka commented on August 20, 2024

Any update on this? I am also wondering the same thing.

If this implementation uses rewards to train the LSTM, then it is no longer a model-based approach (which would be very different from the whole concept of the paper).


ctallec commented on August 20, 2024

We did model the reward as well in the previous version. That does not change the fact that this is a model-based approach: the reward is simply included among the things about the world we aim to model. Besides, I've just pushed a version where the reward is no longer modelled by default. I don't expect that to significantly change the results.


ranahanocka commented on August 20, 2024

Thanks for the reply. But I don't understand how the rewards that drive the controller would also drive the model of the world (the dynamics). A huge advantage of model-based RL is the separation of the reward from the dynamics, right?

For example, the trained LSTM will be used to train the controller to follow a lane. Suppose the LSTM also used the lane-following reward during training; now, if I wanted a different policy (e.g., having the car drive on the left side of the road, because it's England), it wouldn't be enough to just retrain the controller. My dynamics model may no longer work for this new policy, since the reward changed.

Re: your new fix
BTW, the MDRNN network (models/mdrnn.py) still regresses a reward.
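For readers without the repo open, here is a minimal, illustrative sketch of the kind of MDN-RNN being discussed (names and dimensions are made up for illustration; the real class lives in models/mdrnn.py). The point is that the network keeps a dedicated output for the reward, regardless of whether that output is trained:

    import torch
    import torch.nn as nn

    class MDRNNSketch(nn.Module):
        """Illustrative MDN-RNN: an LSTM over (latent, action) pairs whose
        output parametrizes a Gaussian mixture over the next latent, plus
        scalar heads for reward and terminal prediction."""

        def __init__(self, latents=32, actions=3, hiddens=256, gaussians=5):
            super().__init__()
            self.rnn = nn.LSTM(latents + actions, hiddens)
            # One linear layer produces mixture means, log-sigmas and logits,
            # plus one scalar each for the reward and terminal heads.
            self.out_linear = nn.Linear(hiddens, (2 * latents + 1) * gaussians + 2)
            self.latents, self.gaussians = latents, gaussians

        def forward(self, actions, latents):
            # actions: (seq, batch, actions); latents: (seq, batch, latents)
            outs, _ = self.rnn(torch.cat([actions, latents], dim=-1))
            out = self.out_linear(outs)
            stride = self.gaussians * self.latents
            mus = out[..., :stride]
            sigmas = torch.exp(out[..., stride:2 * stride])
            logpi = out[..., 2 * stride:2 * stride + self.gaussians]  # unnormalized
            rewards = out[..., -2]   # the reward head in question
            dones = out[..., -1]     # terminal head
            return mus, sigmas, logpi, rewards, dones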


ctallec commented on August 20, 2024

What you might be missing here is that the model is trained to predict the reward, not to optimize it. Modelling the problem's reward is unlikely to degrade your dynamics model: if the dynamics are perfectly modelled, then predicting the reward should not be problematic and should come at little computational overhead (since the reward is mostly a deterministic function of the dynamics' hidden state). On the other hand, if the dynamics are very hard to model, you are giving the network hints as to which parts of the environment may be of use for your own task, and probably for many other tasks as well.
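To make the predict-versus-optimize distinction concrete, here is a hedged sketch (a hypothetical helper, reusing the illustrative forward signature from the sketch above, not the repo's actual training code):

    import torch.nn.functional as F

    # Supervised fit of the reward head: the target is the reward the
    # environment actually returned, so gradients move the *prediction*
    # towards the data, never the policy towards a higher reward.
    def reward_prediction_step(mdrnn, optimizer, actions, latents, observed_rewards):
        _, _, _, predicted_rewards, _ = mdrnn(actions, latents)
        loss = F.mse_loss(predicted_rewards, observed_rewards)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Reward *optimization* happens elsewhere: the controller's parameters
    # are searched (with CMA-ES in this repo) to maximize cumulative reward,
    # with the world model frozen during that search.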

Predicting the reward for one task may be useful in a wide variety of tasks other than the original one. In general, model-based RL is all about predicting things that are not necessarily directly related to your own task, but which could help as side information. Typically, in your example, if you want both to move forward and to remain on the left side of the road, having access to variables that are predictive of whether you are moving forward or not (basically the variables you learnt by modelling the reward of the original task) is likely to be useful. This is quite general: if your task is to learn how to walk, having access to variables that are predictive of how high your head is (which is a proper reward for the task "standing up") is also likely to be useful.

Overall, what I am saying is that by additionally modelling the original task's reward, we are not constraining the model, since we are only telling it to predict more things, not fewer. The only problem there could be is one of network capacity: the network might only be capable of modelling either the dynamics or the reward, but not both. As the reward is a simple function of the dynamics, and is low dimensional compared to the latent dynamics, my guess is that this is not the bottleneck here. Hope this makes things clearer. If not, I'd be happy to discuss this further.

Besides, if I haven't made any mistake in the code (entirely possible), the reward is no longer regressed. The reward prediction loss is zeroed by default in trainmdrnn.py (l137-142). You still have a network head that could be used to predict the reward, but this head is no longer trained, and the corresponding error is no longer backpropagated in the LSTM:
    if include_reward:
        mse = f.mse_loss(rs, reward)
        scale = LSIZE + 2
    else:
        mse = 0
        scale = LSIZE + 1
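For context, a paraphrased sketch of how that branch folds into the total loss in the training script (not a verbatim quote): gmm is the mixture negative log-likelihood on the next latent and bce the terminal-prediction loss; when include_reward is False, mse is the constant 0, so nothing from the reward head is backpropagated.

    # Paraphrased from the surrounding code in trainmdrnn.py:
    #   gmm - GMM negative log-likelihood of the next latent
    #   bce - binary cross-entropy on the terminal flag
    #   mse - reward term from the branch above (a plain 0 by default)
    # With mse == 0, the sum carries no reward gradient, so the reward head
    # contributes nothing to the LSTM's updates.
    loss = (gmm + bce + mse) / scale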

