
occworld's Introduction

Hi there 👋

I'm Wenzhao Zheng, a postdoctoral fellow at BAIR, UC Berkeley, working with Prof. Kurt Keutzer. I received my Ph.D. and B.S. from Tsinghua University, supervised by Prof. Jie Zhou and Prof. Jiwen Lu.

Previous Efforts

We built the first academic surround-camera 3D occupancy prediction model, TPVFormer 🎉.

Current Interests

🦙Large Models + 🚙Autonomous Driving -> 🤖AGI

  • 🦙 Large Models: Efficient/Small LLMs, Multimodal Models, Video Generation Models, Large Action Models...
  • 🚙 Autonomous Driving: 3D Occupancy Prediction, End-to-End Driving, World Models, 3D Scene Reconstruction...

Collaborations

If you want to work with me (in person or remotely) at 🐻UC Berkeley (co-supervised by Prof. Kurt Keutzer), 💜Tsinghua University (co-supervised by Prof. Jiwen Lu), or 🔴Peking University (co-supervised by Prof. Shanghang Zhang), feel free to drop me an email at [email protected]. I can provide GPUs if we are a good fit.

occworld's People

Contributors

chen-wl20, gusongen, wzzheng


occworld's Issues

Attempts at lower CUDA versions?

The OccWorld project is truly great, and I am delighted to see it open-sourced. However, I am facing a dilemma: despite the elegant model design and innovative conceptualization, the installation process seems ambiguous to me, and many of the pinned versions appear quite new. Has the author tried installing it with slightly older versions, such as CUDA 11.3? I look forward to your reply.
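
For reference, PyTorch still distributes CUDA 11.3 builds, so an install along the following lines may work; whether OccWorld's other pinned dependencies (mmengine, etc.) accept it is untested, so treat this as a starting point rather than a supported configuration:

    pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113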

Training hours

Thank you for your contributions! Roughly how long does it take to train the network with 8 RTX 4090 GPUs?

Evaluating on the mini dataset

Hello! Firstly, I would like to thank you for your amazing work!

I noticed that the pickle files provided in README.md are for the full dataset (v1.0-trainval). Could you also provide the pickle files for the mini (v1.0-mini) dataset?

If I want to evaluate OccWorld on the mini dataset, what other files will I need? Can I use the gts files from Occ3D directly, since Occ3D seems to support the mini dataset?

Again, thank you for your time!
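
A minimal sketch of one way to build mini-split pkls yourself, assuming the trainval pkl is a plain dict keyed by scene name (which the indexing in dataset.py, quoted in a later issue, suggests); the file names below are illustrative:

    # Filter the trainval infos pkl down to the official v1.0-mini scenes.
    # Assumes a dict keyed by scene name; adjust if the pkl has a wrapper dict.
    import pickle
    from nuscenes.utils.splits import mini_train, mini_val  # official mini scene names

    with open('nuscenes_infos_train_temporal.pkl', 'rb') as f:  # illustrative name
        infos = pickle.load(f)

    mini_scenes = set(mini_train + mini_val)
    mini_infos = {name: v for name, v in infos.items() if name in mini_scenes}

    with open('nuscenes_infos_mini_temporal.pkl', 'wb') as f:
        pickle.dump(mini_infos, f)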

Evaluation protocol

Hi, thank you for sharing this good work.

I have a question about the results of the main paper.

In Table 1, does the column '2s' report the mIoU averaged over 1.5–2s, or the mIoU at exactly 2s?


NaN when training OccWorld

Hi, thanks for your wonderful work.

When I trained the OccWorld model in the second stage, the loss became NaN after epoch 35. In the last epoch, the evaluation metrics were:

2023/12/10 16:25:53 - mmengine - INFO - Current val iou is [11.32928878068924, 11.171085387468338, 11.522497981786728, 11.246349662542343, 11.434061080217361, 11.45360991358757, 11.66500672698021, 11.577576398849487, 11.563246697187424, 11.626467853784561, 11.70591413974762, 11.869920790195465, 11.870458722114563, 11.824838817119598, 11.806531250476837] while the best val iou is [30.42658567428589, 34.664684534072876, 35.00288724899292, 34.887245297431946, 34.17853116989136, 34.29064750671387, 34.21706259250641, 34.07915234565735, 33.918631076812744, 33.681508898735046, 33.49155783653259, 33.52125287055969, 33.82187783718109, 33.90370011329651, 33.536869287490845]
2023/12/10 16:25:53 - mmengine - INFO - Current val miou is [1.2357821030123513, 1.3048479963532267, 1.3326199349274461, 1.286004960317822, 1.3091895496900714, 1.2971919745562928, 1.313220287727959, 1.2903256293879273, 1.2739735182977336, 1.2695381125120226, 1.268610595691237, 1.278414056115948, 1.2753155520733666, 1.2545286294292002, 1.2432696814785766] while the best val miou is [18.178497386329315, 21.72152917174732, 22.131569595897897, 22.358492542715634, 21.344122553572937, 21.65812363519388, 21.089138879495508, 20.842806030722226, 20.499638687161838, 20.376817619099334, 20.342478550532284, 20.2685889952323, 20.782285183668137, 20.823075622320175, 20.936191344962403]

I'm not sure why this happens.

For the first stage, training the VQVAE, I obtained the best results at epoch 146:

2023/12/09 09:21:31 - mmengine - INFO - Current val iou is [63.11628818511963, 63.11272978782654, 63.088274002075195, 63.07087540626526, 63.056015968322754, 63.0810022354126, 63.00719976425171, 62.911272048950195, 62.83813118934631, 62.745869159698486] while the best val iou is [63.202375173568726, 63.272738456726074, 63.23975324630737, 63.15116882324219, 63.2210910320282, 63.14346790313721, 63.13709020614624, 63.109904527664185, 63.041579723358154, 63.05258274078369]
2023/12/09 09:21:31 - mmengine - INFO - Current val miou is [66.91237109548905, 66.66993341025184, 66.66781867251676, 66.60014688968658, 66.64343599010917, 66.61616002812106, 66.6671146364773, 66.45773175884696, 66.08651210280026, 66.2292769726585] while the best val miou is [66.91237109548905, 66.98254785116981, 67.08006876356461, 66.98910103124732, 66.95465007249047, 66.91935833762673, 66.94301542113809, 67.0062887317994, 66.87079089529374, 66.90992292235879]

Could you help me find the reasons? Thanks in advance.
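
A generic PyTorch debugging sketch for such NaNs, not specific to OccWorld: enable anomaly detection to locate the op that first produces a NaN, and clip gradients to rule out explosion (the model, data, and max_norm below are placeholders):

    import torch
    import torch.nn as nn

    torch.autograd.set_detect_anomaly(True)  # raise at the backward op that first yields NaN/Inf

    model = nn.Linear(8, 1)  # stand-in for the world model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(32, 8)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        # clip exploding gradients before stepping; max_norm=1.0 is a common default
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()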

Six-view projection angle calculation in the visualization

Hello, I want to add six perspective-view projections to your visualization code, referring to the visualization code of TPVFormer. However, there is a noticeable difference between the projection angle given by [0.7, 1.3, 0] and the original image. How is [0.7, 1.3, 0] calculated?
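
Where [0.7, 1.3, 0] comes from is repo-specific and not documented here, but the projection it would feed into is the standard homogeneous one; a minimal sketch:

    # Project ego/lidar-frame points into an image given a 4x4 lidar2img matrix
    # (intrinsics @ extrinsics). Generic geometry, not OccWorld's exact code.
    import numpy as np

    def project_points(points_xyz, lidar2img):
        pts = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)  # (N, 4)
        cam = pts @ lidar2img.T                       # homogeneous image coordinates
        depth = cam[:, 2:3]
        uv = cam[:, :2] / np.clip(depth, 1e-5, None)  # perspective divide
        in_front = depth[:, 0] > 0                    # keep points in front of the camera
        return uv[in_front], depth[in_front]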

How to make the pkl files?

Thanks for the amazing work!

How are the pkl files generated? In particular, I want to know how rel_poses and gt_mode are obtained.
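
For rel_poses, the usual derivation in nuScenes pipelines is the SE(3) transform between consecutive ego poses; a hedged sketch of that convention (the exact OccWorld code may differ, and gt_mode is a separate field):

    # Relative ego pose between consecutive frames from nuScenes ego_pose records.
    import numpy as np
    from pyquaternion import Quaternion

    def pose_to_matrix(ego_pose):
        # ego_pose: nuScenes record with 'rotation' (wxyz quaternion) and 'translation'
        mat = np.eye(4)
        mat[:3, :3] = Quaternion(ego_pose['rotation']).rotation_matrix
        mat[:3, 3] = ego_pose['translation']
        return mat

    def relative_pose(pose_t, pose_t1):
        # expresses frame t+1 in the coordinates of frame t
        return np.linalg.inv(pose_to_matrix(pose_t)) @ pose_to_matrix(pose_t1)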

Visualization results are confusing. 🤔

Thank you for your inspiring work. I tried to reproduce the results: the avg val IoU is 26.15 and the avg val mIoU is 16.72, which are similar to the results in the paper.

However, my visualization results confuse me. During training, the Transformer takes the 1st–15th frames as input and predicts the 2nd–16th frames. Here are some visualization results.

  • 2nd GT (image)
  • 3rd GT (image)
  • 16th GT (image)
  • 2nd Predict (image)
  • 3rd Predict (image)
  • 16th Predict (image)

The results are confusing: even the reconstructions of the 2nd and 3rd frames are unsatisfactory, and I cannot find any correspondence between them. @wzzheng Could the authors provide any help?

About Codebook Training

Great work on the world model! I have a small question: since the codebook in VQ-VAE training easily collapses, did the authors apply any tricks when training the codebook?
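
One widely used trick against collapse (not claiming it is what OccWorld does) is restarting dead codes from current encoder outputs; a minimal sketch:

    # Re-seed codebook entries that were (almost) never selected since the last check.
    import torch

    @torch.no_grad()
    def restart_dead_codes(codebook, usage_count, encoder_outputs, threshold=1):
        # codebook: (K, D) embedding weight; usage_count: (K,) selection counts;
        # encoder_outputs: (N, D) flattened latents from the current batch
        dead = usage_count < threshold
        n_dead = int(dead.sum())
        if n_dead > 0:
            idx = torch.randint(0, encoder_outputs.size(0), (n_dead,))
            codebook[dead] = encoder_outputs[idx]  # copy random latents over dead codes
        usage_count.zero_()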

Question about the pose_mode

Excellent work, and thanks for releasing the code!
I went through the code and ran into an unclear point in dataset.py, here:
gt_modes.append(self.nusc_infos[scene_name][idx+i]['pose_mode'])

The pose_mode seems to be a three-dimensional vector.
What does it mean? Could the authors give an explanation?
Thanks in advance!
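
For what it's worth, in several nuScenes planning codebases a three-dimensional "mode" vector is a one-hot high-level command (turn left / go straight / turn right) derived from future lateral displacement. Whether OccWorld follows this convention is an assumption, not confirmed by the code; a sketch of that convention:

    # ASSUMPTION: 3-dim one-hot command from future lateral motion (x-forward, y-left).
    import numpy as np

    def command_one_hot(rel_poses, lateral_threshold=2.0):
        # rel_poses: (T, 4, 4) future poses relative to the current frame
        lateral = rel_poses[-1][1, 3]        # y-displacement at the horizon
        if lateral > lateral_threshold:      # veering left
            return np.array([1.0, 0.0, 0.0])
        if lateral < -lateral_threshold:     # veering right
            return np.array([0.0, 0.0, 1.0])
        return np.array([0.0, 1.0, 0.0])     # roughly straight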

Regarding the scene tokenizer

Hi, you mentioned in the paper that the scene tokenizer encodes high-level scene information. Have you done any experiments without the scene tokenizer?

About Dataset

Hi,

Thanks for releasing the code. Do I need to download the whole nuScenes dataset, or just the lidarseg part?

Thanks!

About Evaluating

Hi,
Thanks for your help. I just finished training with 4 x RTX 3090 GPUs. After evaluating, I see that the results differ significantly from yours.
Could you tell me where I can change the tokenizer settings and other hyperparameters?
Thanks!

Training gets IoU = 0.0 after 60 epochs.

I cloned the repo and executed the command python train.py --py-config config/occworld.py --work-dir out/occworld on 4 GPUs. I think I skipped the step "a VQVAE should be trained using a similar command before training" and got almost no accuracy. Could you explain in more detail how to train the VQVAE first and then train OccWorld?
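
For reference, the intended two-stage sequence appears to be: train the VQVAE with the same entry point but a VQVAE config, then train OccWorld on top of it. The VQVAE config filename below is a guess; check the config/ directory for the actual name:

    python train.py --py-config config/train_vqvae.py --work-dir out/vqvae
    python train.py --py-config config/occworld.py --work-dir out/occworld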

Empty pkl files on pan.baidu.

Thanks for your great work and timely open-sourcing. Could you check the Baidu Pan link for the pkl files again? Thanks.
