Yes, you are right, that's the discounted returns.
Btw, before training the policy, it performs a reward scaling (see tianshou/tianshou/policy/modelfree/pg.py, line 74 in 205698d).
I'll add a `--rew-norm` argument to test_pg.py later.
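For readers unfamiliar with the test scripts, here is a rough sketch of how such a switch could be exposed on the command line with `argparse`. The helper `get_args` and the exact flag definition are hypothetical; the real test_pg.py may name, type, or default the option differently.

```python
import argparse


def get_args(argv=None):
    """Hypothetical CLI parser; not copied from tianshou's test_pg.py."""
    parser = argparse.ArgumentParser()
    # Discount factor; 0.9 matches the value mentioned for test_pg.py in this thread.
    parser.add_argument("--gamma", type=float, default=0.9)
    # Hypothetical boolean switch; argparse exposes "--rew-norm" as args.rew_norm.
    parser.add_argument("--rew-norm", action="store_true",
                        help="standardize returns before the policy update")
    return parser.parse_args(argv)


args = get_args(["--rew-norm"])
print(args.rew_norm, args.gamma)
```

Passing an explicit list to `parse_args` keeps the example independent of `sys.argv`; from a shell you would simply append `--rew-norm` to the command.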
Thanks a lot! :)
Assume we have a trajectory with 4 timesteps: s0, s1, s2, s3
timestep | 0 | 1 | 2 | 3 |
---|---|---|---|---|
reward | r0 | r1 | r2 | r3 |
done | 0 | 0 | 0 | 1 |
Given a discount factor g (in test_pg.py it is 0.9), we compute the returns as:
G3 = r3
G2 = r2 + r3 * g
G1 = r1 + r2 * g + r3 * g^2
G0 = r0 + r1 * g + r2 * g^2 + r3 * g^3
`compute_episodic_return` does exactly this; in other words, `batch.returns == G`.
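The formulas above can be computed in a single backward pass, because each return reuses the next one: G_t = r_t + g * G_{t+1}. The sketch below checks that recursion against the closed form for G0. The function name `discounted_returns` and the reward values are hypothetical, for illustration only; this is not tianshou's implementation.

```python
import math


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every t of one finished trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already-computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


g = 0.9                    # discount factor used in test_pg.py
r = [1.0, 2.0, 3.0, 4.0]   # hypothetical values for r0..r3
G = discounted_returns(r, g)

# The recursion reproduces the closed form G0 = r0 + r1*g + r2*g^2 + r3*g^3.
closed_form_G0 = r[0] + r[1] * g + r[2] * g**2 + r[3] * g**3
print(math.isclose(G[0], closed_form_G0))  # True
```

The backward pass costs O(T) instead of the O(T^2) needed to evaluate every closed-form sum separately.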
Tianshou collects different trajectories but stores them chronologically. You can check out the Collector's documentation for more details.
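Because several trajectories sit back to back in one flat buffer, the backward pass has to restart at every episode boundary, which is what the `done` flag marks. Below is a simplified sketch, assuming every trajectory in the buffer is finished (no bootstrapping from a value function) and that `done[t]` is true only on the last step of an episode. The helper `episodic_returns` is hypothetical and is not tianshou's `compute_episodic_return`.

```python
import numpy as np


def episodic_returns(rew, done, gamma):
    """Discounted returns over a flat, chronologically ordered buffer.

    Assumes every stored trajectory is complete, so the running return is
    reset to 0 whenever we step backwards onto the last step of an episode.
    """
    rew = np.asarray(rew, dtype=np.float64)
    done = np.asarray(done, dtype=bool)
    returns = np.zeros_like(rew)
    running = 0.0
    for t in range(len(rew) - 1, -1, -1):
        if done[t]:
            running = 0.0  # step t ends its episode: ignore the following episode
        running = rew[t] + gamma * running
        returns[t] = running
    return returns


# Two finished trajectories stored back to back: A = steps 0-2, B = steps 3-4.
rew = [1.0, 1.0, 1.0, 2.0, 2.0]
done = [0, 0, 1, 0, 1]
out = episodic_returns(rew, done, gamma=0.5)
print(out)  # values: 1.75, 1.5, 1.0, 3.0, 2.0
```

Without the reset, step 2 (the end of trajectory A) would wrongly absorb the rewards of trajectory B.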
Thanks! So is every element of batch.returns the discounted return of a whole trajectory, like G0? Or can some elements be the discounted return of only part of a trajectory, like G1, G2, G3?
I'd also like to ask why rew-norm is used. I tried training without rew-norm and found that PG converges much more slowly than with it. Thanks a lot!
`len(G) == len(r)`: every timestep gets its own return, so batch.returns holds G0, G1, G2 and G3, not only G0.
If the trajectory has T timesteps, G_i is computed over r_i, r_{i+1}, r_{i+2}, ..., r_{T-1}.
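Written as one formula for a trajectory of T timesteps (for T = 4 it reduces to G0 through G3 above):

```latex
G_i = \sum_{k=i}^{T-1} g^{\,k-i}\, r_k = r_i + g\, G_{i+1},
\qquad G_{T-1} = r_{T-1},
\qquad i = 0, \dots, T-1
```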
Network training is quite unstable without this normalization. It is just a code-level trick, and many papers do not mention it explicitly.
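A minimal sketch of what return normalization can look like, assuming the scaling is a plain standardization of batch.returns (zero mean, unit standard deviation). The helper `normalize_returns` and the small `eps` guard are hypothetical and not copied from pg.py. One practical effect: the standardized returns no longer depend on the raw reward scale, so the size of the policy-gradient step does not either.

```python
import numpy as np


def normalize_returns(returns, eps=1e-8):
    """Standardize returns to roughly zero mean and unit std.

    Hypothetical helper; eps avoids division by zero when all returns are equal.
    """
    returns = np.asarray(returns, dtype=np.float64)
    return (returns - returns.mean()) / (returns.std() + eps)


raw = np.array([1.75, 1.5, 1.0, 3.0, 2.0])
norm = normalize_returns(raw)

# Multiplying every reward by 1000 leaves the normalized returns (almost) unchanged.
scaled = normalize_returns(raw * 1000.0)
print(np.allclose(norm, scaled))  # True
```

Without this step, an environment with rewards in the hundreds produces gradients hundreds of times larger than one with rewards near 1, which makes a fixed learning rate hard to choose.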
Thanks a lot!