jannerm / trajectory-transformer

Code for the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem"

Home Page: https://trajectory-transformer.github.io

License: MIT License

Python 66.55% Dockerfile 2.13% Shell 31.32%

trajectory-transformer's Issues

Some questions about hyperparameters in newer version

Dear Author,

I am very interested in your work and am trying to reproduce your experimental results.

Previously I almost matched the score reported in the older version of your paper (approximately 70 on average across the 3 * 3 datasets). However, I noticed that you updated the paper on arXiv in November, and the score for TT (quantile) went up to 78.9 on average across the 3 * 3 datasets.

I also noticed that you list your beam search hyperparameters in Appendix E, where k_act is 20. These differ from your config file (config/offline.py), where the default k_act is None and cdf_act is 0.6. Did you change the hyperparameters to obtain the higher score? If so, could you please update the config file so that I can reproduce your results?

Thanks!

double forward in goal gpt

Hi! I noticed one more thing that is not straightforward in the goal-conditioned version of GPT.

Here:

trajectory-transformer/trajectory/models/transformers.py

Lines 288 to 295 in e0b5f12

    gx = torch.cat([goal_embeddings, x], dim=1)
    gx = self.blocks(gx)
    x = gx[:, self.observation_dim:]
    #### /goal

    x = self.blocks(x)
    ## [ B x T x embedding_dim ]
    x = self.ln_f(x)

After you concatenate the goal embeddings with the main sequence, you apply self.blocks twice. Is that how it is intended to work? Shouldn't one pass be enough, since all embeddings will already carry the needed information about the goal through the attention mechanism?

Issue with mc_bin_client.py

While trying to run the provided script, I ran into the following error:

Traceback (most recent call last):
  File "scripts/train.py", line 6, in <module>
    import trajectory.utils as utils
  File "/home/rs/18CS91P06/Bill_payment/trajectory-transformer-master/trajectory/utils/__init__.py", line 1, in <module>
    from .setup import Parser, watch
  File "/home/rs/18CS91P06/Bill_payment/trajectory-transformer-master/trajectory/utils/setup.py", line 6, in <module>
    from tap import Tap
  File "/home/rs/18CS91P06/anaconda3/envs/trajectory/lib/python3.8/site-packages/tap.py", line 6, in <module>
    from mc_bin_client import mc_bin_client, memcacheConstants as Constants
  File "/home/rs/18CS91P06/anaconda3/envs/trajectory/lib/python3.8/site-packages/mc_bin_client/mc_bin_client.py", line 14, in <module>
    from memcacheConstants import REQ_MAGIC_BYTE, RES_MAGIC_BYTE
ModuleNotFoundError: No module named 'memcacheConstants'
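
A possible diagnostic, offered as a sketch rather than a confirmed fix: the traceback shows `from tap import Tap` resolving to a single file, `site-packages/tap.py`, which then pulls in `mc_bin_client`. As far as I can tell, the `Tap` class comes from the typed-argument-parser package, which installs a `tap/` package directory, so a lone `tap.py` suggests that an unrelated distribution named `tap` is shadowing it. The helper names below (`module_origin`, `looks_like_single_file`) are hypothetical and only use the standard library.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return None if spec is None else spec.origin


def looks_like_single_file(origin: Optional[str], name: str) -> bool:
    """True when the module is a lone `<name>.py`, as in the traceback above."""
    return origin is not None and origin.replace("\\", "/").endswith(f"/{name}.py")


if __name__ == "__main__":
    origin = module_origin("tap")
    print("tap resolves to:", origin)
    if looks_like_single_file(origin, "tap"):
        # Assumption: typed-argument-parser would resolve to .../tap/__init__.py instead.
        print("This is a single-file tap.py, probably not typed-argument-parser.")
```

If the check reports a single-file `tap.py`, uninstalling that distribution and installing typed-argument-parser in the same environment would be the natural thing to try.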

Reproduce help for AntMaze tasks

Hello, I'm interested in the AntMaze tasks and found that the released code does not cover them. Could you please provide the relevant code (including how to combine the Q-value and the concrete training/planning configs) as well as the pretrained models? I would appreciate your help very much!

long-term prediction

Hi, how can I reproduce the trajectory predictions in Figure 2 of the paper?
Does this long-term prediction also need discretized states and actions? I would appreciate more details about how the long-term prediction experiment is implemented.

No registered env with id: halfcheetah-medium-v2

When I run "python scripts/train.py --dataset halfcheetah-medium-v2", the following exception occurs: "gym.error.UnregisteredEnv: No registered env with id: halfcheetah-medium-v2". My gym and mujoco versions are the same as those in environment.yml.

Can't run train script

Hi, I keep getting an error from this line: from tap import Tap

What is this module? I can't find the right package to install, and the one I have does not work.

Question about D4RL-gym dataset version

Hi, I recently read your paper and it inspired me a lot; it is no doubt a good paper. However, I am confused about the version of the D4RL dataset used for the baselines you compare against. I noticed that in "Appendix C Baseline performance sources", the results of BC, MOPO (by the way, I didn't find MOPO in your experiments section) and MBOP are taken from their original papers, all of which use the D4RL-gym-v0 datasets.
Because the performance of CQL on D4RL-gym-v0 [1] differs greatly from that on D4RL-gym-v2 [2] on several datasets, I wonder whether the scores of the above baselines would also change greatly on D4RL-gym-v2, or whether you have evidence that they would not, since you compare these scores directly.

KeyError: 'halfcheetah-medium-v2'

Dear author,

After installing and downloading the pretrained models and plans, I still have trouble running the command
python scripts/train.py --dataset halfcheetah-medium-v2

(trajectory) qz@qz:~/trajectory-transformer$ python scripts/train.py --dataset halfcheetah-medium-v2

[ utils/setup ] Reading config: config.offline:halfcheetah_medium_v2
[ utils/setup ] Not using overrides | config: config.offline | dataset: halfcheetah_medium_v2
[ utils/setup ] Made savepath: logs/halfcheetah-medium-v2/gpt/azure
[ utils/setup ] Saved args to logs/halfcheetah-medium-v2/gpt/azure/args.json
Traceback (most recent call last):
  File "/home/qz/anaconda3/envs/trajectory/lib/python3.6/site-packages/gym/envs/registration.py", line 121, in spec
    return self.env_specs[id]
KeyError: 'halfcheetah-medium-v2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/train.py", line 25, in <module>
    env = datasets.load_environment(args.dataset)
  File "/home/qz/trajectory-transformer/trajectory/datasets/d4rl.py", line 81, in load_environment
    wrapped_env = gym.make(name)
  File "/home/qz/anaconda3/envs/trajectory/lib/python3.6/site-packages/gym/envs/registration.py", line 145, in make
    return registry.make(id, **kwargs)
  File "/home/qz/anaconda3/envs/trajectory/lib/python3.6/site-packages/gym/envs/registration.py", line 89, in make
    spec = self.spec(path)
  File "/home/qz/anaconda3/envs/trajectory/lib/python3.6/site-packages/gym/envs/registration.py", line 131, in spec
    raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
gym.error.UnregisteredEnv: No registered env with id: halfcheetah-medium-v2

Thank you very much for your attention.

[Question] Output shape of heads

Thank you for such interesting work.

I'm really interested in your work and am trying to understand your code,
but I wonder why the head network has an output size of "#vocabulary + 1".
Could you explain this for me?

Jax code

Hi,

May I ask whether it would be possible to provide a JAX version of the code?

Best

ERROR: Could not find a version that satisfies the requirement dm-control

Hi,

I encounter an error when creating the Conda environment. It seems that the version of dm-control (from D4RL) is unavailable.
Is there any recommended version for the D4RL package? The following is the full error message.

Pip subprocess error:
  Running command git clone -q https://github.com/JannerM/d4rl.git /tmp/pip-req-build-enrmm6ao
  Running command git rev-parse -q --verify 'sha^d5719e2c6ef6ab3b1c678a846c02621abb8074a4'
  Running command git fetch -q https://github.com/JannerM/d4rl.git d5719e2c6ef6ab3b1c678a846c02621abb8074a4
  Running command git checkout -q d5719e2c6ef6ab3b1c678a846c02621abb8074a4
  WARNING: Missing build requirements in pyproject.toml for mujoco-py==2.0.2.13 from https://files.pythonhosted.org/packages/2f/48/b108057c1a23c8da9f4cdc7a7c46ab7cec49c3563c0706d50f2527de6ba0/mujoco-py-2.0.2.13.tar.gz#sha256=d6ae66276b565af9063597fda70683a89c7356290f5ac3961b794ee90ec50eea (from -r /708HDD/hungyh/trajectory-transformer/condaenv.l4fl7x8g.requirements.txt (line 4)).
  WARNING: The project does not specify a build backend, and pip cannot fall back to setuptools without 'wheel'.
  Running command git clone -q git://github.com/deepmind/dm_control /tmp/pip-install-ehe12w5d/dm-control_d0b0cee6667746188485b8f85955e996
  fatal: unable to connect to github.com:
  github.com[0: 20.27.177.113]: errno=Connection timed out

WARNING: Discarding git+git://github.com/deepmind/dm_control@90f00e4e80af56abb9f905070d0c152845db5602#egg=dm_control. Command errored out with exit status 128: git clone -q git://github.com/deepmind/dm_control /tmp/pip-install-ehe12w5d/dm-control_d0b0cee6667746188485b8f85955e996 Check the logs for full command output.
  Running command git clone -q git://github.com/aravindr93/mjrl /tmp/pip-install-ehe12w5d/mjrl_82435f9d000845a69223ca68a1c237e4
  fatal: unable to connect to github.com:
  github.com[0: 20.27.177.113]: errno=Connection timed out

WARNING: Discarding git+git://github.com/aravindr93/mjrl@master#egg=mjrl. Command errored out with exit status 128: git clone -q git://github.com/aravindr93/mjrl /tmp/pip-install-ehe12w5d/mjrl_82435f9d000845a69223ca68a1c237e4 Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement dm-control (unavailable) (from d4rl) (from versions: 0.0.286587932, 0.0.286955599, 0.0.288398964, 0.0.288483845, 0.0.295778102, 0.0.300771433, 0.0.312466143, 0.0.318037100, 0.0.318066097, 0.0.319497192, 0.0.322773188, 0.0.355168290, 0.0.364896371)
ERROR: No matching distribution found for dm-control (unavailable)

failed

CondaEnvException: Pip failed

Thanks!
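
A note on the log above, as a hedged reading rather than a confirmed diagnosis: both failing clones use `git://github.com/...` URLs, and GitHub no longer serves the unauthenticated `git://` protocol, so the timeouts may come from the protocol rather than from the dm-control version itself (a firewall would produce the same symptom). The sketch below only illustrates the URL rewrite; `to_https` is a hypothetical helper, and the two requirement strings are copied from the warnings in the log.

```python
def to_https(requirement: str) -> str:
    """Rewrite an unauthenticated git:// GitHub URL to https://, leaving the rest intact."""
    return requirement.replace("git://github.com/", "https://github.com/")


# Requirement strings taken from the pip warnings in the log above.
requirements = [
    "git+git://github.com/deepmind/dm_control@90f00e4e80af56abb9f905070d0c152845db5602#egg=dm_control",
    "git+git://github.com/aravindr93/mjrl@master#egg=mjrl",
]

for requirement in requirements:
    print(to_https(requirement))
```

Because these requirements are declared inside the d4rl package rather than in this repository, git's own `url.<base>.insteadOf` setting is probably the more practical way to apply the same rewrite without editing any files.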

Pretrained Model in AntMaze

Hello, I'm interested in the AntMaze tasks and notice that currently the pretrained models in AntMaze are not provided. Will you provide the pretrained models in AntMaze in the future? Thank you!

Found 0 GPUs for rendering. Using device 0. Device id outside of range of available devices. Failed to initialize OpenGL

Hello, when I run plan.py, the call to make_renderer() fails at the step "self.viewer = mjc.MjRenderContextOffscreen(self.env.sim)" with the following error:

Found 0 GPUs for rendering. Using device 0.
Device id outside of range of available devices.
Traceback (most recent call last):
  File "scripts/plan.py", line 41, in <module>
    renderer = utils.make_renderer(args)
  File "/home/user/projects/trajectory-transformer/trajectory/utils/rendering.py", line 22, in make_renderer
    return render_class(args.dataset, observation_dim=observation.size)
  File "/home/user/projects/trajectory-transformer/trajectory/utils/rendering.py", line 96, in __init__
    self.viewer = mjc.MjRenderContextOffscreen(self.env.sim,device_id=-1)
  File "mjrendercontext.pyx", line 46, in mujoco_py.cymj.MjRenderContext.__init__
  File "mjrendercontext.pyx", line 114, in mujoco_py.cymj.MjRenderContext._setup_opengl_context
  File "opengl_context.pyx", line 130, in mujoco_py.cymj.OffscreenOpenGLContext.__init__
RuntimeError: Failed to initialize OpenGL.

How can I solve this problem?

Question about the visualisation of four-rooms

Hi,

I saw that Figure 6 of the paper shows trajectories drawn in the four-rooms environment. Since the observations in this environment are images, could you share the code you used to draw the trajectories?

Calculation of value expectation

Hi,

I'm having trouble understanding how you calculate expectation from probabilities and thresholds here.

trajectory-transformer/trajectory/utils/discretization.py

Lines 108 to 123 in c77076d

    def expectation(self, probs, subslice):
        '''
            probs : [ B x N ]
        '''
        if torch.is_tensor(probs):
            probs = to_np(probs)

        ## [ N ]
        thresholds = self.thresholds[:, subslice]
        ## [ B ]
        left = probs @ thresholds[:-1]
        right = probs @ thresholds[1:]

        avg = (left + right) / 2.
        return avg

I understand that thresholds are quantiles calculated from the empirical distribution, but it's hard for me to grasp why you can get the expectation from the average of those two matrix multiplications.

Could you give me an explanation, or point me to a page or reference I could look at?

Thanks.
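
One way to read the snippet, offered as a sketch and not as the authors' explanation: if `probs` holds one probability per bin and the thresholds are the N + 1 bin edges, then `left` is the sum of p_i times the lower edge and `right` is the sum of p_i times the upper edge. Their average is the sum of p_i times the bin midpoint, which is the expectation when each bin is represented by its midpoint. The plain-Python version below drops the batch dimension and uses hypothetical bin edges and probabilities.

```python
def expectation(probs, thresholds):
    """Expected value under per-bin probabilities, computed as in the quoted snippet.

    probs      : N probabilities, one per bin
    thresholds : N + 1 bin edges in increasing order
    """
    assert len(thresholds) == len(probs) + 1
    left = sum(p * t for p, t in zip(probs, thresholds[:-1]))   # sum_i p_i * lower edge
    right = sum(p * t for p, t in zip(probs, thresholds[1:]))   # sum_i p_i * upper edge
    return (left + right) / 2.0


def expectation_midpoints(probs, thresholds):
    """The same quantity written directly as sum_i p_i * midpoint_i."""
    mids = [(lo + hi) / 2.0 for lo, hi in zip(thresholds[:-1], thresholds[1:])]
    return sum(p * m for p, m in zip(probs, mids))


thresholds = [0.0, 1.0, 2.0, 4.0]   # hypothetical bin edges (3 bins)
probs = [0.5, 0.25, 0.25]           # hypothetical predicted distribution

print(expectation(probs, thresholds))            # 1.375
print(expectation_midpoints(probs, thresholds))  # 1.375
```

Both functions agree because averaging is linear: (sum p_i a_i + sum p_i b_i) / 2 equals sum p_i (a_i + b_i) / 2.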

imitation learning results on HalfCheetah env

Hi!
I noticed that I can't get good results on the HalfCheetah environment with imitation learning (plain beam search decoding by log-probability), even after long training and without overfitting, although I can on Hopper. I also noticed that the imitation learning section of the paper only presents results on Hopper and Walker2d.

Have you encountered the same difficulties, or did you not test in this environment? If so, were there any particular reasons for this?

Novel Dataset Preparation

Do you have any recommendations or resources you could point me to for preparing a novel dataset for use in Trajectory Transformer?
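
Not an answer from the authors, but a possible starting point: the training script in this repository loads D4RL datasets, so one option is to arrange new data in the same flat layout that D4RL uses (parallel arrays keyed by `observations`, `actions`, `rewards`, `terminals` and `timeouts`). Whether the loader needs anything beyond that is an assumption to check against trajectory/datasets/d4rl.py. The function name and the episode format below are hypothetical.

```python
def episodes_to_flat_dataset(episodes):
    """Flatten a list of episodes into D4RL-style parallel lists.

    Each episode is a list of (observation, action, reward) steps. The final step of
    every episode is marked in `timeouts`; whether it should instead be flagged in
    `terminals` depends on how the episode actually ended.
    """
    data = {"observations": [], "actions": [], "rewards": [], "terminals": [], "timeouts": []}
    for episode in episodes:
        for i, (obs, act, rew) in enumerate(episode):
            is_last = i == len(episode) - 1
            data["observations"].append(list(obs))
            data["actions"].append(list(act))
            data["rewards"].append(float(rew))
            data["terminals"].append(False)
            data["timeouts"].append(is_last)
    return data


# Two tiny hypothetical episodes with 2-dimensional observations and 1-dimensional actions.
episodes = [
    [((0.0, 1.0), (0.5,), 1.0), ((1.0, 2.0), (0.1,), 0.0)],
    [((3.0, 4.0), (0.2,), 2.0)],
]
dataset = episodes_to_flat_dataset(episodes)
print(dataset["timeouts"])
```

In practice the lists would be converted to arrays before being handed to the loader; keeping episode boundaries in `terminals` or `timeouts` matters because the model is trained on sequences within an episode.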

About the gym env.

Hi, I am running this project without Docker, just in VS Code.
However, when I run

python scripts/train.py --dataset halfcheetah-medium-v2

I get the output below. How can I fix it?
Thank you so much.

python scripts/train.py --dataset halfcheetah-medium-v2
Warning: Mujoco-based envs failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'mjrl'
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
pybullet build time: May 20 2022 19:44:17
[ utils/setup ] Reading config: config.offline:halfcheetah_medium_v2
[ utils/setup ] Not using overrides | config: config.offline | dataset: halfcheetah_medium_v2
[ utils/setup ] Saved args to logs/halfcheetah-medium-v2/gpt/azure/args.json
Traceback (most recent call last):
  File "/home/**/software/anaconda3/envs/trajectory/lib/python3.8/site-packages/gym/envs/registration.py", line 121, in spec
    return self.env_specs[id]
KeyError: 'halfcheetah-medium-v2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/train.py", line 26, in <module>
    env = datasets.load_environment(args.dataset)
  File "/home/**/Desktop/TT/trajectory-transformer/trajectory/datasets/d4rl.py", line 84, in load_environment
    wrapped_env = gym.make(name)
  File "/home/**/software/anaconda3/envs/trajectory/lib/python3.8/site-packages/gym/envs/registration.py", line 145, in make
    return registry.make(id, **kwargs)
  File "/home/**/software/anaconda3/envs/trajectory/lib/python3.8/site-packages/gym/envs/registration.py", line 89, in make
    spec = self.spec(path)
  File "/home/**/software/anaconda3/envs/trajectory/lib/python3.8/site-packages/gym/envs/registration.py", line 131, in spec
    raise error.UnregisteredEnv('No registered env with id: {}'.format(id))
gym.error.UnregisteredEnv: No registered env with id: halfcheetah-medium-v2
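
A possible reading of this log, not a confirmed diagnosis: the warning "Mujoco-based envs failed to import" followed by "No module named 'mjrl'" appears before the registration error, so the D4RL MuJoCo environments may simply never have been registered with gym. The sketch below pairs each D4RL import warning with the module it names; `failed_imports` is a hypothetical helper, and `LOG` is copied from the output above.

```python
import re

# Warning lines copied from the log above.
LOG = """\
Warning: Mujoco-based envs failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'mjrl'
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
"""

PATTERN = re.compile(
    r"Warning: (?P<group>.+?) failed to import\.[^\n]*\n"
    r"No module named '(?P<module>[^']+)'"
)


def failed_imports(log: str) -> dict:
    """Map each 'X failed to import' warning to the module named on the following line."""
    return {m.group("group"): m.group("module") for m in PATTERN.finditer(log)}


for group, module in failed_imports(LOG).items():
    print(f"{group}: missing module '{module}'")
```

If this reading is right, only the `mjrl` line matters for halfcheetah-medium-v2; the Flow and CARLA warnings concern other D4RL task families.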

purpose of pad_to_full_observation

Hi! First of all, thank you for such interesting work!

I'm trying to figure out how trajectories are represented in this work. As far as I understand, after the transformer blocks we get a tensor of shape
[batch, block_size, embedding_dim]. In a normal transformer we would just pass this to the head, for example nn.Linear(embedding_dim, vocab_size), and get logits for prediction.

Why wouldn't that work here? What is the intuition behind the padding and reshape (and the ein linear layer) that you use? It doesn't seem to be mentioned in the paper.

Also, what is the stop token? There seem to be no special cases for ending in the beam plan. Is it just for done?

Thanks!
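
One plausible intuition, which nothing in this thread confirms: each position inside a transition (an observation dimension, an action dimension, the reward, and so on) is a different variable with its own discretization, so a separate output head per position may fit better than one shared nn.Linear. Padding the sequence to a multiple of the transition length would then allow it to be reshaped so that all tokens at the same within-transition position are processed together. The sketch below shows only that padding and grouping arithmetic on plain lists; the function names and the token layout are hypothetical.

```python
def pad_and_group(tokens, transition_dim, pad_value=None):
    """Pad a token sequence to a multiple of `transition_dim` and split it into transitions."""
    n_pad = (transition_dim - len(tokens) % transition_dim) % transition_dim
    padded = list(tokens) + [pad_value] * n_pad
    groups = [padded[i:i + transition_dim] for i in range(0, len(padded), transition_dim)]
    return groups, n_pad


def by_position(groups):
    """Transpose so that entry d holds every token at position d within a transition."""
    return [list(column) for column in zip(*groups)]


# Hypothetical layout: 2 observation dims, 1 action dim, reward, value -> transition_dim = 5.
tokens = ["s0", "s1", "a0", "r", "v", "s0'", "s1'"]
groups, n_pad = pad_and_group(tokens, transition_dim=5)

print(n_pad)                 # 3
print(by_position(groups))   # one list per within-transition position
```

With this grouping, position 0 collects every first observation dimension, position 2 every action, and so on, which is the shape a per-position head would need; the padded entries would be discarded after the head is applied.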
