kevslinger commented on August 22, 2024

Hi Ashok,

This is a somewhat complicated issue. In older versions of gym, the truncated flag was stored inside info; see DTQN/run.py, line 371 in 79ccf8b:

    if info.get("TimeLimit.truncated", False):

The new gym API, in which step returns five values, is a breaking change: updating to it would mean DTQN could no longer run on the original environments. At the same time, leaving the code in its current state means DTQN cannot run on "modern" (updated) environments.

With that in mind, there are a few ways we can proceed with this:
1.) Do nothing, and leave the repo in its current (outdated) state.
2.) Update the repo for the new version of gym, while keeping the paper branch in its original, old form. Experiments could then be run on the main branch with new environments, and on the paper branch with old environments.
3.) Create a new branch from the main branch and/or the paper branch to accommodate new gym environments, and let people switch between branches based on the version of their environment.
4.) Write a gym wrapper to interface between the two styles of environments. We could either wrap the old environments so that they return obs, reward, terminated, truncated, info instead of obs, reward, done, info, and then update the codebase accordingly, OR wrap the new environments so that they return obs, reward, done, info and put truncated inside the info variable, as the old API did.
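A minimal sketch of option 4, covering both directions. It assumes only that an environment object exposes reset() and step(); it does not import gym, and the class names NewToOldStepAPI and OldToNewStepAPI are hypothetical, not gym's own wrappers. It also assumes the newer API's reset() returns (obs, info), as in gym 0.26 and later.

```python
from typing import Any, Dict, Tuple


class NewToOldStepAPI:
    """Hypothetical: expose a new-style env (5-tuple step) via the old 4-tuple API."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def reset(self, **kwargs: Any) -> Any:
        # Assumption: new-style reset() returns (obs, info); old code expects obs only.
        obs, _info = self.env.reset(**kwargs)
        return obs

    def step(self, action: Any) -> Tuple[Any, float, bool, Dict[str, Any]]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        info = dict(info)  # copy so the wrapped env's dict is not mutated
        if truncated and not terminated:
            # Old gym reported time-limit cut-offs through this info key.
            info["TimeLimit.truncated"] = True
        return obs, reward, terminated or truncated, info


class OldToNewStepAPI:
    """Hypothetical: expose an old-style env (4-tuple step) via the new 5-tuple API."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def reset(self, **kwargs: Any) -> Tuple[Any, Dict[str, Any]]:
        # Old-style reset() returns only obs; pair it with an empty info dict.
        return self.env.reset(**kwargs), {}

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
        obs, reward, done, info = self.env.step(action)
        # Recover truncation from the old info key; anything else that ends
        # the episode is treated as a real termination.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info
```

Wrapping new environments with something like NewToOldStepAPI would leave the existing info.get("TimeLimit.truncated", False) check in run.py untouched; in the actual repo these would more naturally subclass gym.Wrapper.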

Option 4 is probably the best way. Let me know what you think. Of course, I welcome pull requests addressing this issue.

from dtqn.

ashok-arora commented on August 22, 2024

Thank you for the detailed response.

Yeah, the fourth option seems like the most straightforward way, since splitting branches may confuse future users. To maintain compatibility, we could pass in an argument such as --gym old (returns obs, reward, done, info) or --gym new (returns obs, reward, terminated, truncated, info) and handle each case accordingly, with --gym old as the default to stay compatible with the paper branch.
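A minimal sketch of the proposed flag using argparse. The flag name --gym and its old/new values follow the proposal above; the parse_args helper is hypothetical and not part of the current run.py.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="DTQN training (sketch)")
    # Hypothetical flag: "old" keeps the 4-tuple step API used by the paper branch.
    parser.add_argument(
        "--gym",
        choices=["old", "new"],
        default="old",
        help="old: step returns (obs, reward, done, info); "
        "new: step returns (obs, reward, terminated, truncated, info)",
    )
    return parser.parse_args(argv)
```

Calling parse_args() with no arguments yields args.gym == "old", so existing commands keep their current behavior; passing --gym new would signal that the environment needs to be wrapped.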

On a side note, I am wondering how DTQN/run.py (line 371 in 79ccf8b):

    if info.get("TimeLimit.truncated", False):

is able to determine the maximum time after which an episode should be truncated. Would it make more sense to define the time limit ourselves, based on the velocity and the distance between heaven and hell?

kevslinger commented on August 22, 2024

That sounds like a good idea. We can use the gym-version flag to decide whether to wrap the environment in a gym wrapper that shrinks the returned tuple from five elements to four.

As for the TimeLimit.truncated part, that is a bit of old gym magic: when you supply the max_episode_steps argument while registering an environment with gym, the backend wraps the environment in a TimeLimit wrapper, which ends the episode and sets TimeLimit.truncated: True in the info returned by the step function. For example,

    max_episode_steps=200,

means the episode will be cut off after at most 200 steps. I don't think we need to change anything there.
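A simplified imitation of what old gym's time-limit wrapper does with max_episode_steps, written without importing gym so it runs standalone. The class name TimeLimitSketch is hypothetical, and gym's real TimeLimit wrapper has additional details; this only illustrates the step counting and the info key that run.py checks.

```python
from typing import Any, Dict, Tuple


class TimeLimitSketch:
    """Hypothetical, simplified stand-in for old gym's TimeLimit wrapper (4-tuple API)."""

    def __init__(self, env: Any, max_episode_steps: int) -> None:
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self, **kwargs: Any) -> Any:
        self._elapsed_steps = 0  # start counting afresh each episode
        return self.env.reset(**kwargs)

    def step(self, action: Any) -> Tuple[Any, float, bool, Dict[str, Any]]:
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self.max_episode_steps:
            info = dict(info)
            # Truncated only if the env did not already end the episode itself.
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info
```

Because the wrapper owns the step counter, the training loop never needs to compute the limit itself; it only reads the flag from info.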

ashok-arora commented on August 22, 2024

Okay sure, I'll send in a PR with the gym-version flag.

Also, what is the basis for choosing 200 for max_episode_steps, and the corresponding values for the other environments?

kevslinger commented on August 22, 2024

In general, we want to give the agents the chance to successfully end an episode while exploring, so max_episode_steps needs to be large enough for that to be possible. However, we don't want episodes to go on forever, as this may fill the replay buffer with many useless experiences. This is not a value I spent a lot of time tuning.

ashok-arora commented on August 22, 2024

Apologies for the delay; I got sidetracked by coursework. I am working on this, and the info returned by the gridverse envs is an empty dict {} at every timestep. Is that expected?
