
Comments (16)

zqh0253 commented on June 24, 2024

The camera setting I use is dict(size=[320, 180], position=[2.0, 0.0, 1.4], rotation=[0, 0, 0], fov=100). I use the default FullTown01-v1 suite to collect data and FullTown02-v2 for evaluation.
Note that these settings are based on an old version of DI-drive (commit f532c9e9a6b26386a933049c1754ca5262d76e0a).
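For illustration, a camera entry with these values might look like the following in a DI-drive observation config. The surrounding field names are assumptions rather than a verified excerpt from that old commit:

```python
# A sketch of the camera observation entry, using the values above.
# The config schema is an assumption based on typical DI-drive configs,
# not a verified excerpt from commit f532c9e9.
camera_obs = dict(
    name='rgb',                  # assumed sensor name
    type='rgb',
    size=[320, 180],             # width x height of the rendered frames
    position=[2.0, 0.0, 1.4],    # x, y, z offset in the ego-vehicle frame
    rotation=[0, 0, 0],          # pitch, yaw, roll in degrees
    fov=100,                     # horizontal field of view in degrees
)
```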

zqh0253 commented on June 24, 2024

Why do we have a rindex 6510787 in the dataset?

This number is used to fix a bug in the index. The data part is not ready yet and I am still working on this.

There are 8,642,040 frames in total, and if we sample them at an interval of 10, there would be 0.8M samples in total, which does not match 1.3M.

The current 0.8M frames are a small portion of the whole set of YouTube frames I experimented with. I picked the video clips whose visual appearance is close to Carla to form these 0.8M frames. This does not affect the pretraining quality on Carla downstream tasks. I will consider uploading the whole dataset in the future.

For pre-training, do you take the checkpoint with the best accuracy on the 30% test set, or the one from the last epoch?

I use the last epoch. I found the test accuracy stable during training, so I simply pick the last checkpoint.

For imitation learning training, do you use the last-epoch (epoch 100) checkpoint for evaluation?

This one is a little tricky. Due to IL's distribution-shift problem, I found that the test performance varies greatly between epochs. It is also hard to decide which epoch to pick, since we do not have a validation environment. So for each pretrained weight, I evaluate the (i*10)-th checkpoints for i = 3, ..., 10 and report the highest success rate.
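In other words, the selection protocol could be sketched like this; evaluate_success_rate and the checkpoint naming are hypothetical placeholders, not code from the repo:

```python
# Hypothetical sketch of the checkpoint-selection protocol described above:
# evaluate the 30th, 40th, ..., 100th epoch checkpoints and report the best.
best_sr, best_ckpt = -1.0, None
for epoch in range(30, 101, 10):
    ckpt = f"checkpoints/epoch_{epoch}.pth"   # assumed naming scheme
    sr = evaluate_success_rate(ckpt)          # hypothetical evaluation helper
    if sr > best_sr:
        best_sr, best_ckpt = sr, ckpt
print(f"reported checkpoint: {best_ckpt} (success rate {best_sr:.1f})")
```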

zqh0253 commented on June 24, 2024

Hi, thanks for your interest in our work. You can find the frames here: OneDrive link.
After downloading, you can run

cat sega* > frames.zip

to get the zip file.
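As an optional sanity check (an assumption on my side, not part of the release instructions), the reassembled archive can be verified and extracted with Python's standard zipfile module:

```python
# Optional: verify the reassembled archive before extracting it.
import zipfile

with zipfile.ZipFile("frames.zip") as zf:
    # testzip() returns the name of the first corrupted member, or None.
    assert zf.testzip() is None, "frames.zip appears to be corrupted"
    zf.extractall("frames/")
```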

penghao-wu commented on June 24, 2024

Thank you very much! Another question about the downstream tasks: do you also use the DI-drive engine for imitation learning and collect data using the default settings? Also, which Carla version is used for training and evaluation?

zqh0253 commented on June 24, 2024

Yes, I use the DI-drive engine for imitation learning, and the Carla version I use is 0.9.9.4.

Starting from the default settings, I change several things, including the camera setting (to match the pretrained image resolution).

penghao-wu commented on June 24, 2024

Thanks. Could you provide the camera settings you used, including the size, position, and fov? Also, could you provide more details about the collection and evaluation suites? For example, do you use the default FullTown01-v1 suite and weather to collect data, and which evaluation suite and weather are used (straight / one turn / navigation / navigation with dynamics)? That would be very helpful.
Thanks a lot.

penghao-wu commented on June 24, 2024

Thanks for your help. I appreciate it a lot.

penghao-wu commented on June 24, 2024

Sorry to bother you again. I have a few questions about the training that I'd like to confirm.

  • Why do we have a rindex 6510787 in the dataset? Is it used to fix the index problem in dir-65?
  • There are 8,642,040 frames in total, and if we sample them at an interval of 10, there would be 0.8M samples in total, which does not match 1.3M.
  • For pre-training, do you take the checkpoint with the best accuracy on the 30% test set, or the one from the last epoch?
  • For imitation learning training, do you use the last-epoch (epoch 100) checkpoint for evaluation?

Thanks in advance.

penghao-wu commented on June 24, 2024

Hi, I follow your instructions and train an agent (pretrained from ImageNet) using 4K data. However, its SR evaluated on the FullTown02-v2 suite is $37.3 \pm 3.1$, which is higher than the $21.3 \pm 7.5$ reported in the paper. Do I miss any details, or are other modifications needed? What are the possible reasons in your opinion? I am using Carla 0.9.9.4 and the same DI-drive version as you. I sample the 10% data uniformly (e.g. data_4K = data_40K[::10]); is this the same way you do it, or should I choose data_4K = data_40K[:4000]?

Besides, do you plan to release the pre-calculated steer values for the uploaded 80K frames, or the code for the inverse dynamics model? If not, could you please share more details about the model structure so that I can implement and train it myself?

Also, as DI-drive only contains a PPO model with BEV input, could you provide your model file or model details for PPO training?

Thanks a lot!

zqh0253 commented on June 24, 2024

Hi,

I sample the 10% data uniformly (e.g. data_4K = data_40K[::10]); is this the same way you do it, or should I choose data_4K = data_40K[:4000]?

I reduce the dataset size at the trajectory level, i.e. data_40K[:4000]. Since redundancy exists in adjacent frames, reducing at the trajectory level creates a harder problem than reducing at the frame level (data_40K[::10]). So I think the performance gap you reported is expected.
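The two strategies can be contrasted in a short sketch; `trajectories` here is a hypothetical list of per-trajectory frame lists, not a structure from the repo:

```python
# Hypothetical illustration of the two subsampling strategies discussed above.
# `trajectories` is assumed to be a list of per-trajectory frame lists.

# Frame-level: every 10th frame of the flattened dataset (data_40K[::10]).
all_frames = [frame for traj in trajectories for frame in traj]
frame_level_10pct = all_frames[::10]

# Trajectory-level (what the author used, data_40K[:4000]): keep the first
# 10% of trajectories with all of their frames. Adjacent frames are largely
# redundant, so this variant covers fewer distinct scenes and is harder.
trajectory_level_10pct = trajectories[: len(trajectories) // 10]
```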

Do you plan to release the pre-calculated steer values?

Yes, I am working on this part and will release it in the future. Stay tuned.

PPO training.

I did not experiment much with the PPO model design. A ResNet-34 backbone is used to extract the visual feature. The feature then goes through an MLP and is concatenated with the velocity to form the output of the encoder.
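A minimal PyTorch sketch of that encoder might look as follows; the hidden size and layer layout are assumptions, since only the backbone and the velocity concatenation are specified above:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PPOEncoder(nn.Module):
    """Sketch of the described encoder: ResNet-34 features -> MLP, then
    concatenated with velocity. The 256-d hidden size is an assumption."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        backbone = models.resnet34(weights=None)
        backbone.fc = nn.Identity()   # keep the 512-d pooled feature
        self.backbone = backbone
        self.mlp = nn.Sequential(nn.Linear(512, feat_dim), nn.ReLU(inplace=True))

    def forward(self, image: torch.Tensor, velocity: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W); velocity: (B, 1)
        feat = self.mlp(self.backbone(image))
        return torch.cat([feat, velocity], dim=1)   # (B, feat_dim + 1)
```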

zqh0253 commented on June 24, 2024

Let's keep this issue dataset-relevant only. If you have any further questions about training, feel free to open a new one.

SiyuanHuang95 commented on June 24, 2024

Hi, thanks for sharing your great work!

  • In this issue, you mentioned that you picked the clips with a close visual appearance to Carla. What are the criteria for visual-appearance similarity? Does picking these 0.8M frames bring better performance, or does it just save training cost?
  • Did you follow the default suite of DI-Drive for the IL Carla dataset generation?

Best,

zqh0253 commented on June 24, 2024

Hi, thanks for your interest in our work.

  1. There are no clear criteria; I simply removed some driving videos with extreme weather to save training cost. More carefully designed measures would help, for example computing the feature distance between Carla frames and the frames of each video, then sorting all the videos by that distance (see the sketch after this list).
  2. Yes, I follow the default suite of DI-Drive for IL dataset generation, except for the several settings mentioned here.
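A rough sketch of that feature-distance idea, with all names hypothetical (the `videos` structure and the source of the embeddings are assumptions):

```python
# Hypothetical sketch of the filtering idea in (1): rank YouTube videos by
# how close their frames are to Carla frames in some feature space.
import numpy as np

def video_distance(video_feats: np.ndarray, carla_feats: np.ndarray) -> float:
    """Mean distance from each video frame to its nearest Carla frame.
    Inputs are (N, D) and (M, D) L2-normalized feature matrices."""
    sims = video_feats @ carla_feats.T        # cosine similarities, (N, M)
    return float((1.0 - sims.max(axis=1)).mean())

# `videos` maps video id -> feature matrix; keep the most Carla-like videos.
ranked_ids = sorted(videos, key=lambda vid: video_distance(videos[vid], carla_feats))
kept = ranked_ids[: int(0.1 * len(ranked_ids))]   # e.g. keep the closest 10%
```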

SiyuanHuang95 commented on June 24, 2024

zqh0253 commented on June 24, 2024

  1. We didn't conduct experiments comparing different dataset sizes. Indeed, with more diversity, the pre-trained model could be even stronger.
  2. From the very start, DI-drive did not support Carla 0.9.9; that is why we used an old version.

SiyuanHuang95 commented on June 24, 2024

  1. Okay, thanks.
  2. Thanks for the information.
