
Comments (6)

Trinkle23897 commented on June 30, 2024

Yes, you are right: those are the discounted returns.
By the way, before training the policy, the code rescales (normalizes) the returns:

batch.returns = (r - r.mean()) / (r.std() + self._eps)

I'll add a --rew-norm argument to test_pg.py later.
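
For readers who want to try the scaling step in isolation, here is a minimal NumPy sketch of the formula above. The function name `normalize_returns` and the default `eps=1e-8` are assumptions made for illustration; the thread does not state the value of `self._eps`.

```python
import numpy as np


def normalize_returns(returns, eps=1e-8):
    """Shift returns to zero mean and scale them to roughly unit std.

    `eps` is an assumed small constant that guards against division by
    zero when all returns are identical.
    """
    returns = np.asarray(returns, dtype=np.float64)
    return (returns - returns.mean()) / (returns.std() + eps)


# Hypothetical returns, for illustration only.
normalized = normalize_returns([1.0, 2.0, 3.0, 4.0])
```

After this step the values have mean close to 0 and standard deviation close to 1, whatever the original reward scale was.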

from tianshou.

yingchengyang commented on June 30, 2024

Thanks a lot! :)


Trinkle23897 commented on June 30, 2024

Assume we have a trajectory with 4 timesteps: s0, s1, s2, s3

timestep   0    1    2    3
reward     r0   r1   r2   r3
done       0    0    0    1

Once we have a discount factor g (in test_pg.py it is 0.9), we compute the returns as:
G3 = r3
G2 = r2 + r3 * g
G1 = r1 + r2 * g + r3 * g^2
G0 = r0 + r1 * g + r2 * g^2 + r3 * g^3
compute_episodic_return does exactly this; in other words, batch.returns == G.
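
The recursion behind G0..G3 can be sketched as a single backward pass over one trajectory. This is only an illustration of the formulas above, not Tianshou's actual compute_episodic_return; the function name `discounted_returns` and the reward values are made up.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} for one finished trajectory."""
    rewards = np.asarray(rewards, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical rewards r0..r3 with g = 0.9.
G = discounted_returns([1.0, 2.0, 3.0, 4.0], gamma=0.9)
# G[3] = 4.0
# G[2] = 3.0 + 4.0 * 0.9 = 6.6
# G[1] = 2.0 + 3.0 * 0.9 + 4.0 * 0.81 = 7.94
# G[0] = 1.0 + 2.0 * 0.9 + 3.0 * 0.81 + 4.0 * 0.729 = 8.146
```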

Tianshou collects different trajectories but stores them chronologically. You can check out the Collector's documentation for more details.
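
When several trajectories sit back to back in one array, the done flags mark where each one ends, so the running return has to be reset there. The sketch below assumes a flat layout with one reward and one done flag per step; `episodic_returns` is a hypothetical helper, not a Tianshou function.

```python
import numpy as np


def episodic_returns(rewards, dones, gamma=0.9):
    """Discounted returns over concatenated trajectories.

    dones[t] == 1 marks the last step of a trajectory, so no reward
    from a later trajectory leaks into an earlier one.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # trajectory boundary: start a fresh return
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Two hypothetical trajectories stored chronologically: steps 0-1 and 2-4.
R = episodic_returns([1, 1, 1, 1, 1], [0, 1, 0, 0, 1], gamma=0.5)
# First trajectory:  [1.5, 1.0]
# Second trajectory: [1.75, 1.5, 1.0]
```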


yingchengyang commented on June 30, 2024

Thanks! So is every element of batch.returns the discounted return of a whole trajectory, like G0? Or can some elements be the discounted return of only part of a trajectory, like G1, G2, G3?

Also, why use rew-norm? I tried training without it and found that pg converges much more slowly than with it. Thanks a lot!


Trinkle23897 commented on June 30, 2024

len(G) == len(r), so batch.returns holds one return per timestep, including the partial ones such as G1, G2, G3.
If the trajectory has T timesteps, G_i is computed over r_i, r_{i+1}, r_{i+2}, ..., r_{T-1}.
Network training is quite unstable without normalization. This is just a code-level trick; many papers do not mention it explicitly.


yingchengyang commented on June 30, 2024

Thanks a lot!

