Hey Kevin, I hope you are doing well. I noticed a small bug where the step functio

Replacing `done` with `truncated` and `terminated` about dtqn HOT 6 OPEN

ashok-arora commented on August 22, 2024

Replacing `done` with `truncated` and `terminated`

from dtqn.

Comments (6)

kevslinger commented on August 22, 2024

Hi Ashok,

This is a somewhat complicated issue. In the older versions of gym, this truncated variable was stored inside info (see

DTQN/run.py

Line 371 in 79ccf8b

if info.get("TimeLimit.truncated", False):

). This new gym version with 5 returns values is a breaking change, and updating to it would mean one could no longer run DTQN on the original environments. At the same time, leaving it in its current state would mean you cannot run DTQN on "modern" (updated) environments.

With that in mind, there are a few ways we can proceed with this:
1.) Do nothing, and leave the repo in its current (outdated) state.
2.) Update the repo to account for the new version of gym, while keeping the paper branch in its original, old form. Then experiments could be run on the main branch with new environments, and the paper branch with old environments.
3.) Create a new branch from either the main branch and/or paper branch to accommodate for new gym environments and allow people to switch between branches based on the version of their environment.
4.) Come up with a sort of gym wrapper to interface between the two styles of environments. We could either wrap the old environments so that they now return obs, reward, terminated, truncated, info instead of obs, reward, done, info, and then update the codebase accordingly, OR we could wrap the new environments so that they return obs, reward, done, info and put truncated inside the info variable, the same as the old method.

Probably option 4 is the best way. Let me know what you think. I of course will welcome pull requests towards this issue

from dtqn.

ashok-arora commented on August 22, 2024

Thank you for the detailed response.

Yeah, the fourth option seems like the straightforward way since splitting branches may lead to confusion for future users. To maintain compatibility, we can pass in an arg of maybe --gym old (returns obs, reward, done, info) or --gym new (returns obs, reward, terminated, truncated, info) that handles it accordingly, with --gym old being the default to maintain compatibility with the paper branch.

On a sidenote, I am wondering how

DTQN/run.py

Line 371 in 79ccf8b

if info.get("TimeLimit.truncated", False):

is able to calculate the maximum time that should be decided for the truncated part. Would it make more sense to define the time limit ourselves depending upon the the velocity and the distance between heaven and hell?

from dtqn.

kevslinger commented on August 22, 2024

That sounds like a good idea, we can use the gym-version flag to determine if we need to wrap the environment in a gym wrapper to shrink the size of the returned tuple from 5 arguments to 4.

As for the TimeLimit.truncated part, that is a bit of old gym magic -- by supplying the max_episode_steps argument when adding an environment to the gym register, the backend wraps the environment in a TimeLimitWrapper which then ends the episode and sets TimeLimit.truncated: True in the info as part of the return of the step function. For example,

DTQN/envs/__init__.py

Line 47 in 79ccf8b

max_episode_steps=200,

means the episode will always end after 200 steps. I don't think we need to do anything to change that

from dtqn.

ashok-arora commented on August 22, 2024

That sounds like a good idea, we can use the gym-version flag to determine if we need to wrap the environment in a gym wrapper to shrink the size of the returned tuple from 5 arguments to 4.

Okay sure, I'll send in a PR with the gym-version flag.

As for the TimeLimit.truncated part, that is a bit of old gym magic -- by supplying the max_episode_steps argument when adding an environment to the gym register, the backend wraps the environment in a TimeLimitWrapper which then ends the episode and sets TimeLimit.truncated: True in the info as part of the return of the step function. For example,

DTQN/envs/__init__.py

Line 47 in 79ccf8b

max_episode_steps=200,

means the episode will always end after 200 steps. I don't think we need to do anything to change that

and what is the basis for the choice of 200 for max_episode_steps, and similarly other values for the other environments?

from dtqn.

kevslinger commented on August 22, 2024

In general, we want to give the agents the ability to successfully end an episode while exploring, so it the max_episode_steps needs to be large enough such that this is possible. However, we don't want to allow the episodes to go on forever, as this may fill the replay buffer with many useless experiences. This is not a value I spent a lot of time tuning

from dtqn.

ashok-arora commented on August 22, 2024

Apologies for the delay. I got sidetracked due to coursework. I am working on this and the info of gridverse envs is an empty dict {} for all the timesteps, is it supposed to be so?

from dtqn.

Replacing `done` with `truncated` and `terminated` about dtqn HOT 6 OPEN

Comments (6)

Related Issues (11)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent