
Comments (27)

StoneT2000 commented on August 23, 2024

I'll investigate this today, what's the submission ID?

xtli12 commented on August 23, 2024

It's 6463324ee09b0d30f6f2680c

StoneT2000 commented on August 23, 2024

Tried rerunning the submission and there weren't any errors, it seems. I'll rerun it and save some videos to see if it's your policy behaving weirdly.

StoneT2000 commented on August 23, 2024

@xtli12 these were some of the resulting videos after rerunning your submission on the train set of data

0.mp4
1.mp4
2.mp4
8.mp4

It seems the policy is quite poor (it never goes towards the handle?).

Can you verify that your policy was trained on these particular assets, and that it works locally with non-zero success?

StoneT2000 commented on August 23, 2024

The closest one I've found looks like this:

14.mp4

xtli12 commented on August 23, 2024

@StoneT2000 These were the local resulting videos for the specific assets you uploaded:

0.mp4
1.mov
2.mp4
14.mp4

The local results seem better than your rerun results, and there were many successful local results for this submission, such as:

7.mov
25.mov
3.mov

xuanlinli17 commented on August 23, 2024

Could you test your user_solution.py locally with docker and with your checkpoint? Make sure you are loading the correct checkpoint.

xtli12 commented on August 23, 2024

Yes, I have tested my user_solution.py locally with docker and it worked correctly, and I have also checked that the checkpoint is right. What's more, I uploaded a checkpoint with a local success rate of 0.47 under env_name=OpenCabinetDoor-v1 (submission ID 64658ae7e09b0d593af2680f), and its score on the leaderboard is:

[leaderboard screenshot]

The strange thing is that I didn't train this checkpoint under env_name=PushChair-v1, yet it has a score on the leaderboard. On the other hand, I have a local score of 0.47 under env_name=OpenCabinetDoor-v1, but the score under OpenCabinetDoor-v1/train is 0. So I think the user_solution.py in my submitted docker image is right, and I have checked that the checkpoint is also right.

Besides, I submitted a docker image which had a local success rate of 0.23 under env_name=OpenCabinetDrawer-v1, and its score on OpenCabinetDrawer-v1/train is 0.048 (this submission ID is 6465d792e09b0da4ccf26812).

Are OpenCabinetDoor-v1/train and OpenCabinetDrawer-v1/train the same as the train set I downloaded?

StoneT2000 commented on August 23, 2024

@xtli12
I ran the following as well using the public evaluation method (not the challenge's hidden train configurations)

# submission: 6463324ee09b0d30f6f2680c
cd 0408_1 &&  python -m mani_skill2.evaluation.run_evaluation -e "OpenCabinetDoor-v1" -o out -n 100

I see no success, however. The generated videos seem similar.

Trying 6465d792e09b0da4ccf26812 now

OpenCabinetDoor and OpenCabinetDrawer are different environments, although they share assets. (It's testing opening revolute vs prismatic joints)

StoneT2000 commented on August 23, 2024

Ran

# submission 6465d792e09b0da4ccf26812
cd 0408_1 && python -m mani_skill2.evaluation.run_evaluation -e "OpenCabinetDrawer-v1" -o out -n 50

Same issue

0.mp4

I'm going to try and dig through the code a bit more to see what's happening. It does seem like the actions are somehow constant? (The robot is moving in one smooth direction.)

StoneT2000 commented on August 23, 2024

@xtli12 could you run this after generating each action

print(np.abs(action).mean(), action)

What I'm seeing is something like this:

0.8592395 [ 0.9996552  -0.99749184  0.11628906  0.9973375  -0.4441153  -0.99427223
  0.9854965   0.9999647  -0.917217   -0.9997946   1.        ]
0.8791752 [ 0.9999976  -0.999967    0.14011787  0.999825   -0.5562259  -0.99953717
  0.99939114  0.99999994 -0.9758689  -0.99999654  1.        ]
0.8688345 [ 0.99999356 -0.999944    0.04204816  0.9998341  -0.5399283  -0.99957144
  0.99853987  0.9999999  -0.97732574 -0.9999947   1.        ]
0.8693352 [ 0.9996279  -0.99732757  0.1255527   0.9986975  -0.5120498  -0.9967714
  0.98351383  0.9999751  -0.9492529  -0.9999177   1.        ]

The values seem quite large, which leads me to suspect you may have some bug, or your local version has a fix not committed to the image you submitted?
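(For reference, a minimal self-contained sketch of where such a debug print could sit; the DebugPolicy class, the 11-dim action space, and the random actions are placeholders for illustration, not the submitted policy or the challenge's exact interface.)

import numpy as np

class DebugPolicy:
    """Placeholder policy used only to show where the debug print goes."""
    def __init__(self, action_dim=11, seed=0):
        self.rng = np.random.default_rng(seed)
        self.action_dim = action_dim

    def act(self, observation):
        # Real inference would go here; random actions stand in for it.
        action = self.rng.uniform(-1.0, 1.0, size=self.action_dim).astype(np.float32)
        # Mean absolute magnitudes saturating near 1.0 on most dimensions
        # usually point to un-normalized or clipped network outputs.
        print(np.abs(action).mean(), action)
        return action

if __name__ == "__main__":
    policy = DebugPolicy()
    for _ in range(3):
        policy.act(observation=None)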

If it's not clear by then, we can hop on a Zoom / Discord office hour; let me know your availability.

xtli12 commented on August 23, 2024

Oh, I made a mistake in thinking that using

ENV_ID="OpenCabinetDoor-v1" OUTPUT_DIR="tmp" NUM_EPISODES=1
python -m mani_skill2.evaluation.run_evaluation -e ${ENV_ID} -o ${OUTPUT_DIR} -n ${NUM_EPISODES}

in my container would test whether the docker image is working correctly. However, when I used

ENV_ID="OpenCabinetDoor-v1" OUTPUT_DIR="tmp" NUM_EPISODES=100
python -m mani_skill2.evaluation.run_evaluation -e ${ENV_ID} -o ${OUTPUT_DIR} -n ${NUM_EPISODES}

the success rate in episode_results.json is 0, so I think there may be some bugs in my process of creating the docker image. I am trying to figure it out.

xtli12 commented on August 23, 2024

I have checked the Dockerfile and user_solution.py, but I couldn't find any errors. Additionally, I used the same Dockerfile and user_solution.py (the only changes being the .ckpt file, which was trained under env_name=PickCube-v0, and the control_mode) to create a docker image, but when I ran

python -m mani_skill2.evaluation.run_evaluation -e "PickCube-v0" -o out -n 100

in my local container, the success rate was the same as when I didn't use the docker image. Could you please create a submission to see if it will have a score under OpenCabinetDoor or OpenCabinetDrawer?

StoneT2000 commented on August 23, 2024

It seems your PickCube submission is fine. To ensure we are on the same page, can you tell me the exact docker ID of the submission that has the 20%+ success rate on OpenCabinetDrawer/OpenCabinetDoor? And provide the exact commands you use to 1. run the docker (docker run ...), 2. set up anything in the docker, and 3. run the evaluation command.

You can send me your docker ID by email to preserve privacy: [email protected]

xtli12 commented on August 23, 2024

Detailed information and the attachment file have been sent via email.

StoneT2000 commented on August 23, 2024

Just saw, checking tomorrow!

StoneT2000 commented on August 23, 2024

I replied to your email regarding your submission; see my tests there.

StoneT2000 commented on August 23, 2024

After private discussion with the original poster, the conclusion so far is as follows:

The scores on the leaderboard are indeed correct, and the leaderboard is evaluating submissions correctly. The problem lies in the user_solution.py file, which leverages ManiSkill2-Learn. Currently, OpenCabinetDoor and the other ManiSkill1 environments do not return the agent's base_pose values in the observation; to address that, the ManiSkill2-Learn framework automatically adds it using the actual environment's robot information.

However, in submissions the user_solution.py file creates a dummy environment to process observations, which includes adding the base_pose. This dummy environment is not the actual environment, so there is a mismatch in results: running the ManiSkill2-Learn evaluation produces successes, but running the actual evaluation, where there is no access to the actual environment, fails because the base_pose values are wrong. (This is also why the robot looks like it's only moving in one direction; the observation is not really changing.)

We are currently discussing fixes for this.
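(As a toy illustration of the mismatch, with made-up names rather than the actual ManiSkill2-Learn code: the problematic pattern fills in base_pose from a separate dummy environment that is never stepped, whereas the intended behavior is to take it from the observation the actual evaluation environment returns, as the fix described below provides.)

import numpy as np

def add_base_pose_from_dummy_env(obs, dummy_robot_pose):
    # Problematic pattern: the dummy env is never stepped, so this pose is
    # effectively constant and the processed observation barely changes.
    return {**obs, "base_pose": np.asarray(dummy_robot_pose, dtype=np.float32)}

def add_base_pose_from_obs(obs):
    # Intended pattern after the fix: the environment itself reports the pose
    # inside the observation (the "agent"/"base_pose" keys are assumptions here).
    return {**obs, "base_pose": np.asarray(obs["agent"]["base_pose"], dtype=np.float32)}

if __name__ == "__main__":
    obs_from_real_env = {"agent": {"base_pose": [0.3, -0.1, 0.0, 1.0, 0.0, 0.0, 0.0]}}
    print(add_base_pose_from_dummy_env(obs_from_real_env, np.zeros(7)))
    print(add_base_pose_from_obs(obs_from_real_env))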

xuanlinli17 commented on August 23, 2024

ManiSkill2 and ManiSkill2-Learn have been updated. You can use the latest env/wrappers.py from ManiSkill2-Learn, which returns the correct base pose and tcp pose(s) for MS1 environments.

xtli12 commented on August 23, 2024

Hi, when I test the docker image which uses the latest ManiSkill2-Learn, an error comes out like this:

  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/mani_skill2/evaluation/run_evaluation.py", line 151, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/mani_skill2/evaluation/run_evaluation.py", line 132, in main
    evaluator.setup(args.env_id, UserPolicy, env_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mani_skill2/evaluation/run_evaluation.py", line 23, in setup
    super().setup(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mani_skill2/evaluation/evaluator.py", line 30, in setup
    self.policy = policy_cls(
  File "/root/0408_1/user_solution.py", line 35, in __init__
    env_params = get_env_info(cfg.env_cfg)
  File "/root/0408_1/maniskill2_learn/env/env_utils.py", line 83, in get_env_info
    vec_env = build_vec_env(env_cfg.copy()) if vec_env is None else vec_env
  File "/root/0408_1/maniskill2_learn/env/env_utils.py", line 224, in build_vec_env
    vec_env = SingleEnv2VecEnv(cfgs, **vec_env_kwargs)
  File "/root/0408_1/maniskill2_learn/env/vec_env.py", line 266, in __init__
    self._init_obs_space()
  File "/root/0408_1/maniskill2_learn/env/vec_env.py", line 201, in _init_obs_space
    self.observation_space = convert_observation_to_space(self.reset(idx=np.arange(self.num_envs)))
  File "/root/0408_1/maniskill2_learn/env/vec_env.py", line 279, in reset
    return self._unsqueeze(self._env.reset(*args, **kwargs))
  File "/root/0408_1/maniskill2_learn/env/wrappers.py", line 97, in reset
    obs = self.env.reset(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 27, in reset
    return self.env.reset(**kwargs)
  File "/root/0408_1/maniskill2_learn/env/wrappers.py", line 224, in reset
    return self.observation(obs)
  File "/root/0408_1/maniskill2_learn/env/wrappers.py", line 421, in observation
    tcp_poses = observation["extra"]["tcp_pose"]
KeyError: 'tcp_pose'

xuanlinli17 commented on August 23, 2024

The error doesn't occur on my end. Could you check that you have the latest ManiSkill2? If so, could you check whether you have the correct command?
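(One quick check, not from the thread, to confirm which ManiSkill2 build is installed inside the image; this assumes it was pip-installed under the distribution name mani-skill2, and the version string will not show the exact git commit.)

from importlib.metadata import version, PackageNotFoundError

try:
    # Distribution name assumed to be "mani-skill2" (the PyPI name).
    print("mani-skill2:", version("mani-skill2"))
except PackageNotFoundError:
    print("mani-skill2 is not installed via pip in this environment")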

xtli12 commented on August 23, 2024

Hi, when I use the old version of wrappers.py, the error disappears. However, the error still persists when I use the latest version of wrappers.py. Do I need to make any other changes when using the latest version of wrappers.py?

xuanlinli17 commented on August 23, 2024

No changes needed. Do you have the latest version of ManiSkill2 (commit 538ab6)? If not, there will be an error.

Also, please post your command here.

xtli12 commented on August 23, 2024

Oh, I found that the issue might be that I didn't update the mani-skill2 package in my Dockerfile. Now it has a score on the leaderboard. However, I have two concerns.

The first concern is that the leaderboard doesn't seem to fully utilize the dataset, showing completion percentages like 94% or 80%:

[leaderboard progress screenshot]

The second concern is that the score on the train dataset is 0.488, but the score on the test dataset is only 0.04:

[leaderboard score screenshot]

Is this normal?

StoneT2000 commented on August 23, 2024

I will investigate now and also check your submission. I feel like you most likely overfit to the training set (the test set uses different assets).

I will also look into the strange sub-100% completion percentages.

StoneT2000 commented on August 23, 2024

OK, I checked the submission. The numerical results shown (0.488, 0.04) are indeed correct. Your policy is close to opening many of the test assets but just falls a little short. I tried a few other seeds and the values range from 0.04 to 0.08, so there's some amount of randomness here.

The percentage-completion issue can be ignored; it's likely a race condition in how the progress bars are updated, plus potentially network issues.

It also seems that there may have been some non-deterministic parts that weren't handled. The server now has all evaluations seeded properly.

xtli12 commented on August 23, 2024

Ok, thank you very much!
