conglu1997 / v-d4rl

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

License: MIT License

Python 100.00%

v-d4rl's Issues

Failed to run DrQ+BC evaluation

Dear developer,

Thanks for your work.
I followed your instructions strictly but failed to run the DrQ+BC evaluation, with the following error:
Traceback (most recent call last):
  File "drqbc/train.py", line 20, in <module>
    import dmc
  File "/vd4rl/drqbc/dmc.py", line 14, in <module>
    from envs.distracting_control.suite import distracting_wrapper
ModuleNotFoundError: No module named 'envs'
Have you run into a similar problem before?

Many thanks,
Levi
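
A possible workaround, sketched under an assumption the thread does not confirm: that the `envs` package sits at the repository root, one level above `drqbc/`. When a script is launched as `python drqbc/train.py`, Python puts the script's own directory (`drqbc/`) on `sys.path`, not the current working directory, so a top-level `envs` package would not be importable. The helper name `add_repo_root_to_path` below is hypothetical.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_path: str) -> Path:
    """Insert the directory two levels above `script_path` into sys.path.

    Assumption: the script lives in <repo_root>/drqbc/ and the `envs`
    package lives in <repo_root>/envs/.
    """
    repo_root = Path(script_path).resolve().parent.parent
    root_str = str(repo_root)
    if root_str not in sys.path:
        # Prepend so the repository's packages win over same-named installs.
        sys.path.insert(0, root_str)
    return repo_root
```

Calling `add_repo_root_to_path(__file__)` at the top of `drqbc/train.py`, before `import dmc`, would be one way to use it. Setting the `PYTHONPATH` environment variable to the repository root before launching should have the same effect without touching the code.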

AttributeError: 'EfficientReplayBuffer' object has no attribute 'valid'

Hello,

The new paper is so inspiring!!

When I ran the command python drqbc/train.py task_name=offline_walker_walk_random offline_dir=vd4rl_data/main/walker_walk/random/84px nstep=3 seed=0, I got this error:

Error executing job with overrides: ['task_name=offline_walker_walk_random', 'offline_dir=vd4rl_data/main/walker_walk/random/84px', 'nstep=3', 'seed=0']
Traceback (most recent call last):
  File "drqbc/train.py", line 315, in main
    workspace.train_offline(cfg.dataset_dir)
  File "/home/mgz/project/v-d4rl-main/drqbc/train.py", line 283, in train_offline
    metrics = self.agent.update(self.replay_buffer, self.global_step)
  File "/home/mgz/project/v-d4rl-main/drqbc/drqv2.py", line 263, in update
    batch = next(replay_buffer)
  File "/home/mgz/project/v-d4rl-main/drqbc/numpy_replay_buffer.py", line 92, in __next__
    indices = np.random.choice(self.valid.nonzero()[0], size=self.batch_size)
AttributeError: 'EfficientReplayBuffer' object has no attribute 'valid'

I created the environment using drqbc/conda_env.yml without any other changes. How can I resolve this issue? Thanks!

A complete list of all tasks for V-D4RL

Because the task names are easy to confuse, it would help to provide a complete list of all V-D4RL tasks,
similar to https://github.com/Farama-Foundation/d4rl/wiki/Tasks

Torchrl data omission

Hi, I saw that you recently uploaded the dataset to the torchrl repository. It is great that I can access it easily through torch tensordict. However, I am writing to report an issue I encountered while attempting to download the main->humanoid_walk->medium-replay dataset from the V-D4RL benchmarks through torchrl.

(I pulled both the torchrl and tensordict packages directly from their GitHub repos; the pip release does not seem to have been updated yet.)

(…)03b080197e0b44c08694cb699fff5ce6-501.npz: 100%|██████████████████████████████████| 2.30M/2.30M [00:20<00:00, 112kB/s]
file=/tmp/tmpdnm5926d/datasets--conglu--vd4rl/snapshots/6001dd3a96d44c22e2a6c5c8f937ba0f840c4d50/vd4rl/main/humanoid_wal
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[1], line 11
      9         for pixel in [64, 84]:
     10             print(f'task, type, pixel:, {task}, {type}, {pixel}')
---> 11             d = VD4RLExperienceReplay(f"main/{task}/{type}/{pixel}px", batch_size=4, image_size=50, download='force')
     12 for batch in d:   
     13     print(batch)

File ~/anaconda3/envs/vd4rl/lib/python3.9/site-packages/torchrl/data/datasets/vd4rl.py:200, in VD4RLExperienceReplay.__init__(self, dataset_id, batch_size, root, download, sampler, writer, collate_fn, pin_memory, prefetch, transform, split_trajs, totensor, image_size, **env_kwargs)
    198         except FileNotFoundError:
    199             pass
--> 200     storage = self._download_and_preproc(dataset_id, data_path=self.data_path)
    201 elif self.split_trajs and not os.path.exists(self.data_path):
    202     storage = self._make_split()

File ~/anaconda3/envs/vd4rl/lib/python3.9/site-packages/torchrl/data/datasets/vd4rl.py:308, in VD4RLExperienceReplay._download_and_preproc(cls, dataset_id, data_path)
    306             td_save = tdc[0]
    307         tds.append(td)
--> 308         total_steps += td.shape[0]
    310 # From this point, the local paths are non needed anymore
    311 td_save = td_save.expand(total_steps).memmap_like(data_path, num_threads=32)

IndexError: tuple index out of range

This error occurs only for the humanoid_walk medium-replay dataset, which I cannot download. I've verified that my setup and versions are compatible as per the documentation, yet the problem persists. I suspect some files (perhaps .npz files) are missing from the Hugging Face Hub or elsewhere.

Could you please look into this matter? Any guidance on resolving this error or confirming whether this might be a known issue with a potential workaround would be highly appreciated.

Thank you very much for your time and assistance.

Is image observation of the Franka Kitchen dataset supported?

Hi~
I would like to know whether image observations for the Franka Kitchen dataset are supported.
Thank you!

About the corresponding proprioceptive states for the dataset

Hi, thank you for this great work! I noticed that the visual observations are generated by a proprioceptive SAC agent, but could not find the states corresponding to the images in the downloaded dataset; is it possible to acquire the proprioceptive states somewhere? I saw the behavior agent training script in the README, but it seems hard to deterministically reproduce the training / data collection process, as various sources of randomness are present. Thank you for your time!

Torchrl dataset problem.

pytorch/rl#1833 (comment)

Hi again! Thanks for forwarding this problem to the author of torchrl.

However, the author suggested that it may be a problem with the vd4rl dataset (perhaps some omitted data).

According to the error message, the data seems to be off by one item (perhaps an .npz file)!

TensorDict(
    fields={
        action: MemoryMappedTensor(shape=torch.Size([500, 21]), device=cpu, dtype=torch.float32, is_shared=False),
        discount: MemoryMappedTensor(shape=torch.Size([500]), device=cpu, dtype=torch.float64, is_shared=False),
        image: MemoryMappedTensor(shape=torch.Size([500, 64, 64, 3]), device=cpu, dtype=torch.uint8, is_shared=False),
        is_first: MemoryMappedTensor(shape=torch.Size([501]), device=cpu, dtype=torch.bool, is_shared=False),
        is_last: MemoryMappedTensor(shape=torch.Size([501]), device=cpu, dtype=torch.bool, is_shared=False),
        is_terminal: MemoryMappedTensor(shape=torch.Size([501]), device=cpu, dtype=torch.bool, is_shared=False),
        reward: MemoryMappedTensor(shape=torch.Size([500]), device=cpu, dtype=torch.float64, is_shared=False)},
    batch_size=torch.Size([]),
    device=cpu,
    is_shared=False)

I'm wondering whether a single file failed to upload somewhere.

Could you please look into this matter once again?

I appreciate your time.

I think this dataset would be powerful and very easy to access if torchrl supported it fully.
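
The printout above shows a leading dimension of 500 for some fields (action, discount, image, reward) and 501 for others (is_first, is_last, is_terminal). A minimal sketch for spotting such a mismatch directly in one episode file, assuming the episode is a NumPy `.npz` archive whose arrays share a time axis as their first dimension. The helper names `leading_dims` and `mismatched_keys` are hypothetical, and this only detects the inconsistency; it does not say which length is the intended one.

```python
from collections import Counter

import numpy as np


def leading_dims(npz_path):
    """Map each array name in the archive to the size of its first axis."""
    with np.load(npz_path) as episode:
        return {
            key: (episode[key].shape[0] if episode[key].ndim > 0 else 0)
            for key in episode.files
        }


def mismatched_keys(dims):
    """Return the keys whose length differs from the most common length."""
    if not dims:
        return []
    most_common_len = Counter(dims.values()).most_common(1)[0][0]
    return sorted(key for key, n in dims.items() if n != most_common_len)
```

Running `mismatched_keys(leading_dims(path))` over every file of a dataset would list which fields, in which files, disagree with the rest.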

Generating Humanoid Dataset

Hi, this might not be the right place, but I am wondering which hyperparameters you used to train the SAC agent (the data collection policy) for Humanoid Walk. The default hyperparameters reach expert-level performance within 1M steps for Walker Walk and Cheetah Run. I use this codebase as mentioned in the README.

Questions about TimeLimit of DMC vision and dataset in pytorch version

Thank you for your outstanding work. May I ask why the timelimit for DMC vision is 0 and not 1000?

Another thing I'd like to confirm: in each episode, the first timestep's "action" and "reward" are 0, right, because they're offset by one step from "image"?

Also, if I want to port the dataset-related code to PyTorch, how should I make sure it uses the full offline data? The current code appears to select sub-trajectories randomly.
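
A minimal sketch of the alignment the question describes, under the assumption (not confirmed in this thread) that every array in an episode has the same length and that index 0 of "action" and "reward" is zero padding, so that `action[t+1]` and `reward[t+1]` belong to the step from `image[t]` to `image[t+1]`. The function name `episode_to_transitions` is hypothetical. Enumerating every transition this way, rather than sampling sub-trajectories, is one way to make sure the whole episode is used.

```python
import numpy as np


def episode_to_transitions(image, action, reward):
    """Split one episode into aligned (obs, action, reward, next_obs) arrays.

    Assumed layout: action[0] and reward[0] are padding; action[t + 1] and
    reward[t + 1] describe the step from image[t] to image[t + 1].
    """
    image = np.asarray(image)
    action = np.asarray(action)
    reward = np.asarray(reward)
    obs = image[:-1]        # observations before each step
    next_obs = image[1:]    # observations after each step
    act = action[1:]        # drop the padded first action
    rew = reward[1:]        # drop the padded first reward
    return obs, act, rew, next_obs
```

An episode with T + 1 frames yields T transitions; concatenating the outputs over all episode files gives a flat dataset that can be wrapped in any PyTorch-style loader.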

Questions about the number of eval episodes

As the paper notes, the experimental results are averaged over six random seeds. Can I ask how many eval_episodes were used for each method (DV2, CQL, etc.) in the evaluation phase? I found the visual-input setting (V-D4RL) to be more unstable than proprioceptive states (D4RL).

Default parameter settings seem hard to train (offline DV2)

Hi, I am currently using the default config settings (dmc_vision, dmc_walker_walk) to train offline DV2 on the mixed walker_walk dataset, but the eval return differs greatly from the result in the paper. Here are my questions; I hope to see your response.

In the paper's setting, the world model is trained for 800 epochs and the agent for 2400 epochs, but in the default config file the world model training epoch count is 25001, and in the code the world model is trained for 100 steps per epoch. The agent training setting also does not match the paper and is far larger (2e5 epochs in the code, 2400 epochs in the paper). Does this need to be modified?
I tried different dataset settings, and the eval returns are all far below the paper's results. For example, on the mixed dataset the highest eval return is only 240, whereas in the paper it is nearly 600. Can you share your loss curves or anything else that could help me find the problem?

Here is a screenshot of my world model training loss curve (mixed dataset)

Here is my agent training loss curve (using the last checkpoint of the world model) (mixed dataset)

Potential bug in `EfficientReplayBuffer`

Hi,

Upon reading the code, I think the n-step reward computation is wrong, i.e.:

def gather_nstep_indices(self, indices):
    n_samples = indices.shape[0]
    all_gather_ranges = np.stack([np.arange(indices[i] - self.frame_stack, indices[i] + self.nstep)
                                    for i in range(n_samples)], axis=0) % self.buffer_size
    gather_ranges = all_gather_ranges[:, self.frame_stack:]  # bs x nstep
    obs_gather_ranges = all_gather_ranges[:, :self.frame_stack]
    nobs_gather_ranges = all_gather_ranges[:, -self.frame_stack:]

    all_rewards = self.rew[gather_ranges]

    # Could implement below operation as a matmul in pytorch for marginal additional speed improvement
    rew = np.sum(all_rewards * self.discount_vec, axis=1, keepdims=True)

Take self.frame_stack=1, self.nstep=1 and, say, indices[0] = 1. If the experiences are written as (s, a, r, s', a', r', s''), then the sampled experience will be (s, a, r', s') instead of (s, a, r, s'). The fix would be

gather_ranges = all_gather_ranges[:, self.frame_stack-1:-1]  # bs x nstep

What do you think? Did I miss something?
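
To make the example concrete, here is a small stand-alone sketch that reproduces only the index arithmetic from the snippet above, outside the buffer class, and returns the reward indices selected by the original slice next to those selected by the proposed one. The function name `gather_index_ranges` and the dictionary keys are hypothetical. Whether the original slice is actually a bug depends on how the buffer aligns stored rewards with observations, which the quoted code does not show.

```python
import numpy as np


def gather_index_ranges(indices, frame_stack, nstep, buffer_size):
    """Recompute the index ranges used for observations and rewards."""
    indices = np.asarray(indices)
    all_ranges = np.stack(
        [np.arange(i - frame_stack, i + nstep) for i in indices], axis=0
    ) % buffer_size
    return {
        "obs": all_ranges[:, :frame_stack],
        "next_obs": all_ranges[:, -frame_stack:],
        # Slice used in the quoted code.
        "reward_original": all_ranges[:, frame_stack:],
        # Slice proposed in this issue (shifted one step earlier).
        "reward_proposed": all_ranges[:, frame_stack - 1:-1],
    }


ranges = gather_index_ranges([1], frame_stack=1, nstep=1, buffer_size=10)
for name, idx in ranges.items():
    print(name, idx.tolist())
```

For the case in the issue (frame_stack=1, nstep=1, index 1), the observation comes from slot 0 and the next observation from slot 1; the original slice reads the reward from slot 1, while the proposed slice reads it from slot 0.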

Visual Distraction Experiment

Hi, I have two questions.

How can I reproduce the experiments in Section 5.1, especially the part that uses different percentages of shifted data?
I am a bit confused about the amount of data used. The first sentence of paragraph 5 in Section 5.1 says that cheetah-run medium-expert uses 1M datapoints. Aren't there only 200K datapoints for that task?

Anyway, thank you for the nice codebase!

Only 50K data for 64px on Hugging Face

It seems that there are only 100 npz files on Hugging Face (while there are 400 npz files on Google Drive).

Can you update the Hugging Face repository?

Inconsistency between 64px and 84px datasets

Thanks for the lovely open-source benchmark. However, I noticed inconsistencies between the 64px and 84px datasets. Specifically, I observed that the number of transitions in the 64px dataset is less than in the 84px one. For instance, the cheetah_run_medium_replay dataset in 84px has 200k transitions (which aligns with the description in your paper), but the 64px version only has 50k. I'm wondering if you might have missed uploading some of the data for the 64px version.
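
A quick way to compare the two downloads, sketched under the assumption that each dataset directory holds `.npz` episode files and that each file contains a `reward` array with one entry per stored step (the key name is taken from the torchrl printout in another issue; the exact per-file length is not assumed). The function name `count_transitions` is hypothetical.

```python
from pathlib import Path

import numpy as np


def count_transitions(dataset_dir, key="reward"):
    """Return (number of .npz files, total entries of `key` across them)."""
    files = sorted(Path(dataset_dir).glob("*.npz"))
    total = 0
    for path in files:
        with np.load(path) as episode:
            total += len(episode[key])
    return len(files), total
```

Calling it on the 64px and 84px directories of the same task should show directly whether the file counts or the step totals differ.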

'offline_dir' or 'dataset_dir'?

Hello again!

When I tried the given command

python drqbc/train.py task_name=offline_${ENVNAME}_${TYPE} offline_dir=vd4rl_data/main/${ENV_NAME}/${TYPE}/84px nstep=3 seed=0

I got this error:

Could not override 'offline_dir'.
To append to your config use +offline_dir=vd4rl_data/main/walker_walk/expert/84px
Key 'offline_dir' is not in struct
    full_key: offline_dir
    object_type=dict

So I changed offline_dir to dataset_dir, and then it works. Is that right?

python drqbc/train.py task_name=offline_${ENVNAME}_${TYPE} dataset_dir=vd4rl_data/main/${ENV_NAME}/${TYPE}/84px nstep=3 seed=0
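
The error message reflects a general Hydra rule: a plain `key=value` override must name a key that already exists in the config, while `+key=value` appends a new one. The sketch below is a rough stand-alone imitation of that check in plain Python, not Hydra's actual implementation. It ignores nested keys and other override syntax, and the set of config keys passed in is only an illustrative guess built from the commands and traceback in these issues.

```python
def check_overrides(config_keys, overrides):
    """Return override keys that are neither in the config nor '+'-prefixed."""
    problems = []
    for item in overrides:
        key, _, _value = item.partition("=")
        if key.startswith("+"):
            # '+key=value' appends a new key, so it need not exist yet.
            continue
        if key not in config_keys:
            problems.append(key)
    return problems


# Illustrative key set only; the real config may contain more keys.
known = {"task_name", "dataset_dir", "nstep", "seed"}
print(check_overrides(known, ["task_name=offline_walker_walk_expert",
                              "offline_dir=vd4rl_data/main/walker_walk/expert/84px",
                              "nstep=3", "seed=0"]))
```

With that key set, only `offline_dir` is reported, which matches the observation that switching to `dataset_dir` lets the command run.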