Comments (5)

aviralkumar2907 commented on July 17, 2024

The data points are stored in trajectory order, so obs[t+1] is the next state after obs[t]. However, MuJoCo environments do not return terminal=True when a trajectory ends due to a timeout. This induces non-Markovian dependencies, because the reward-to-go then depends on how many steps remain before the episode is cut off.

We have provided a sample loading function here: https://github.com/rail-berkeley/d4rl_evaluations/blob/master/bear/examples/bear_hdf5_d4rl.py#L18

(Note the for loop, which distinguishes an episode that ends due to a timeout from one that ends due to a terminal flag.)
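
For readers who want to see the idea without opening the link, here is a minimal sketch of such a loop. It is not the linked function: `build_transitions` and `max_episode_steps` are hypothetical names, and the sketch assumes the dataset is a dict of flat, trajectory-ordered arrays with an optional `timeouts` array. When `timeouts` is missing, it assumes that an episode reaching the step limit ended by timeout.

```python
import numpy as np


def build_transitions(dataset, max_episode_steps):
    """Build (s, a, r, s', done) arrays from a flat, trajectory-ordered dataset.

    Hypothetical helper: `dataset` is assumed to be a dict of arrays with keys
    'observations', 'actions', 'rewards', 'terminals', and optionally 'timeouts'.
    """
    n = dataset["rewards"].shape[0]
    has_timeouts = "timeouts" in dataset
    obs, act, rew, next_obs, done = [], [], [], [], []
    episode_step = 0

    for i in range(n - 1):
        terminal = bool(dataset["terminals"][i])
        if has_timeouts:
            timed_out = bool(dataset["timeouts"][i])
        else:
            # Assumption: hitting the step limit means the episode timed out.
            timed_out = episode_step == max_episode_steps - 1

        if timed_out and not terminal:
            # observations[i + 1] is the first state of the next episode,
            # so (s_i, s_{i+1}) is not a real transition: skip it.
            episode_step = 0
            continue

        obs.append(dataset["observations"][i])
        act.append(dataset["actions"][i])
        rew.append(dataset["rewards"][i])
        # For a true terminal, s' also belongs to the next episode, but
        # done=True tells the learner not to bootstrap from it.
        next_obs.append(dataset["observations"][i + 1])
        done.append(terminal)
        episode_step = 0 if terminal else episode_step + 1

    return {
        "observations": np.array(obs),
        "actions": np.array(act),
        "rewards": np.array(rew),
        "next_observations": np.array(next_obs),
        "terminals": np.array(done),
    }
```

The design choice here is to drop the last step of a timed-out episode rather than mark it terminal, so that no transition bridges two episodes and no timeout is treated as a true termination.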

aviralkumar2907 commented on July 17, 2024

I am closing this issue for now, but let us know if there are any concerns.

justinjfu commented on July 17, 2024

We're adding a new function in #36

Wenxuan-Zhou commented on July 17, 2024

It seems that next_observations is not correctly aligned with observations. For example, when I use "dataset = d4rl.qlearning_dataset(env)" to load the hopper-expert-v0 dataset, I get:

In [42]: dataset['observations'][1306]
Out[42]:
array([ 1.1960315 , -0.12238141, -0.27723378, -0.1835278 , 0.7594525 ,
3.751946 , -1.1062441 , 0.8567439 , -2.6299708 , 0.46845496,
-2.8050947 ], dtype=float32)

In [43]: dataset['next_observations'][1306]
Out[43]:
array([ 1.2480359e+00, -5.7157112e-04, -2.6452148e-03, -3.2997034e-03,
-2.6625939e-04, -1.1605565e-03, 1.4991794e-03, 4.3500989e-04,
-4.7029392e-03, -6.1305630e-04, -1.0700644e-03], dtype=float32)

In [45]: dataset['observations'][308]
Out[45]:
array([ 1.2471584 , -0.15857491, -0.5438693 , 0.00728944, 0.743322 ,
2.6159544 , -2.4281225 , -0.06683858, 0.5487718 , -0.1945495 ,
-3.872534 ], dtype=float32)

In [46]: dataset['next_observations'][308]
Out[46]:
array([ 1.2462358e+00, -3.1487644e-03, 5.2377465e-04, -3.9951322e-03,
2.3823448e-03, 3.3964110e-03, 1.5305470e-03, -1.3778923e-03,
9.2762033e-04, 3.9863074e-04, 9.7913540e-04], dtype=float32)

In [47]: dataset['terminals'][308]
Out[47]: False

These next_observations look like the beginning of a new episode (the first number is around 1.25 and the others are around 0), so they should not be paired with these observations.

I checked the original dataset without using d4rl.qlearning_dataset. It seems that indices 309, 1308, and 2307 are the beginnings of new episodes.
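
A rough way to scan for such cases automatically is sketched below. This is only a heuristic built on the pattern visible in the arrays above (first entry near 1.25, all other entries near 0). The names `looks_like_reset` and `suspicious_transitions`, and the default values for `reset_height` and `tol`, are my own assumptions and would need adjusting for other environments.

```python
import numpy as np


def looks_like_reset(obs, reset_height=1.25, tol=0.01):
    """Heuristic: does this Hopper observation resemble a freshly reset state?

    Assumes a reset state has its first entry close to `reset_height`
    and every other entry close to zero (both within `tol`).
    """
    obs = np.asarray(obs, dtype=np.float64)
    first_ok = abs(obs[0] - reset_height) < tol
    rest_ok = np.all(np.abs(obs[1:]) < tol)
    return bool(first_ok and rest_ok)


def suspicious_transitions(next_observations, terminals, **kwargs):
    """Return indices i where next_observations[i] looks like a reset state
    even though terminals[i] is False."""
    return [
        i
        for i, (nxt, term) in enumerate(zip(next_observations, terminals))
        if not term and looks_like_reset(nxt, **kwargs)
    ]
```

Calling suspicious_transitions(dataset['next_observations'], dataset['terminals']) returns the indices worth inspecting by hand. A hit is not proof of a bug, only a prompt to look at that transition.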

Wenxuan-Zhou commented on July 17, 2024

Hi @aviralkumar2907 @justinjfu, could you help me re-open this issue? It seems I'm not able to re-open it myself.
