
occworld's Introduction

Hi there 👋

I'm Wenzhao Zheng, a postdoctoral fellow at BAIR, UC Berkeley, working with Prof. Kurt Keutzer. I received my Ph.D. and B.S. from Tsinghua University, supervised by Prof. Jie Zhou and Prof. Jiwen Lu.

Previous Efforts

We built the first academic surround-camera 3D occupancy prediction model, TPVFormer 🎉.

Current Interests

🦙Large Models + 🚙Autonomous Driving -> 🤖AGI

  • 🦙 Large Models: Efficient/Small LLMs, Multimodal Models, Video Generation Models, Large Action Models...
  • 🚙 Autonomous Driving: 3D Occupancy Prediction, End-to-End Driving, World Models, 3D Scene Reconstruction...

Collaborations

If you want to work with me (in person or remotely) at 🐻UC Berkeley (co-supervised by Prof. Kurt Keutzer), 💜Tsinghua University (co-supervised by Prof. Jiwen Lu), or 🔴Peking University (co-supervised by Prof. Shanghang Zhang), feel free to drop me an email at [email protected]. I can provide GPUs if we are a good fit.

occworld's People

Contributors

chen-wl20, gusongen, wzzheng


occworld's Issues

Attempts at lower CUDA versions?

The OccWorld project is truly great, and I am delighted to see it open-sourced. However, I am facing a dilemma: despite the elegant model design and innovative conceptualization, the installation process seems ambiguous to me, and many of the pinned versions appear quite new. Has the author tried installing it with slightly older versions, such as CUDA 11.3? I look forward to your reply.
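
For reference, PyTorch still distributes CUDA 11.3 builds, so an install along the following lines may work; whether OccWorld's other pinned dependencies (mmengine, etc.) accept it is untested, so treat this as a starting point rather than a supported configuration:

    pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113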

Training hours

Thank you for your contributions! Roughly how long does it take to train the network with 8 RTX 4090 GPUs?

Evaluating on the mini dataset

Hello! Firstly, I would like to thank you for your amazing work!

I noticed that the pickle files provided in README.md are for the full dataset (v1.0-trainval). Could you also provide the pickle files for the mini (v1.0-mini) dataset?

If I want to evaluate OccWorld on the mini dataset, what other files will I need? Can I use the gts files from Occ3D directly, since Occ3D seems to support the mini dataset?

Again, thank you for your time!
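
A minimal sketch of one way to build mini-split pkls yourself, assuming the trainval pkl is a plain dict keyed by scene name (which the indexing in dataset.py, quoted in a later issue, suggests); the file names below are illustrative:

    # Filter the trainval infos pkl down to the official v1.0-mini scenes.
    # Assumes a dict keyed by scene name; adjust if the pkl has a wrapper dict.
    import pickle
    from nuscenes.utils.splits import mini_train, mini_val  # official mini scene names

    with open('nuscenes_infos_train_temporal.pkl', 'rb') as f:  # illustrative name
        infos = pickle.load(f)

    mini_scenes = set(mini_train + mini_val)
    mini_infos = {name: v for name, v in infos.items() if name in mini_scenes}

    with open('nuscenes_infos_mini_temporal.pkl', 'wb') as f:
        pickle.dump(mini_infos, f)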

Evaluation protocol

Hi, thank you for sharing this good work.

I have a question about the results of the main paper.

In Table 1, does the column '2s' report the mIoU averaged over 1.5–2s, or the mIoU at exactly 2s?


NaN when training OccWorld

Hi, thanks for your wonderful work.

When I trained the OccWorld model in the second stage, the loss became NaN after epoch 35. In the last epoch, the evaluation metrics were:

2023/12/10 16:25:53 - mmengine - INFO - Current val iou is [11.32928878068924, 11.171085387468338, 11.522497981786728, 11.246349662542343, 11.434061080217361, 11.45360991358757, 11.66500672698021, 11.577576398849487, 11.563246697187424, 11.626467853784561, 11.70591413974762, 11.869920790195465, 11.870458722114563, 11.824838817119598, 11.806531250476837] while the best val iou is [30.42658567428589, 34.664684534072876, 35.00288724899292, 34.887245297431946, 34.17853116989136, 34.29064750671387, 34.21706259250641, 34.07915234565735, 33.918631076812744, 33.681508898735046, 33.49155783653259, 33.52125287055969, 33.82187783718109, 33.90370011329651, 33.536869287490845]
2023/12/10 16:25:53 - mmengine - INFO - Current val miou is [1.2357821030123513, 1.3048479963532267, 1.3326199349274461, 1.286004960317822, 1.3091895496900714, 1.2971919745562928, 1.313220287727959, 1.2903256293879273, 1.2739735182977336, 1.2695381125120226, 1.268610595691237, 1.278414056115948, 1.2753155520733666, 1.2545286294292002, 1.2432696814785766] while the best val miou is [18.178497386329315, 21.72152917174732, 22.131569595897897, 22.358492542715634, 21.344122553572937, 21.65812363519388, 21.089138879495508, 20.842806030722226, 20.499638687161838, 20.376817619099334, 20.342478550532284, 20.2685889952323, 20.782285183668137, 20.823075622320175, 20.936191344962403]

I'm not sure why this happens.

For the first stage, training the VQVAE, I obtained the best results at epoch 146:

2023/12/09 09:21:31 - mmengine - INFO - Current val iou is [63.11628818511963, 63.11272978782654, 63.088274002075195, 63.07087540626526, 63.056015968322754, 63.0810022354126, 63.00719976425171, 62.911272048950195, 62.83813118934631, 62.745869159698486] while the best val iou is [63.202375173568726, 63.272738456726074, 63.23975324630737, 63.15116882324219, 63.2210910320282, 63.14346790313721, 63.13709020614624, 63.109904527664185, 63.041579723358154, 63.05258274078369]
2023/12/09 09:21:31 - mmengine - INFO - Current val miou is [66.91237109548905, 66.66993341025184, 66.66781867251676, 66.60014688968658, 66.64343599010917, 66.61616002812106, 66.6671146364773, 66.45773175884696, 66.08651210280026, 66.2292769726585] while the best val miou is [66.91237109548905, 66.98254785116981, 67.08006876356461, 66.98910103124732, 66.95465007249047, 66.91935833762673, 66.94301542113809, 67.0062887317994, 66.87079089529374, 66.90992292235879]

Could you help me find the reasons? Thanks in advance.
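
A generic PyTorch debugging sketch for such NaNs, not specific to OccWorld: enable anomaly detection to locate the op that first produces a NaN, and clip gradients to rule out explosion (the model, data, and max_norm below are placeholders):

    import torch
    import torch.nn as nn

    torch.autograd.set_detect_anomaly(True)  # raise at the backward op that first yields NaN/Inf

    model = nn.Linear(8, 1)  # stand-in for the world model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(32, 8)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        # clip exploding gradients before stepping; max_norm=1.0 is a common default
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()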

Six-view projection angle calculation in the visualization

Hello, I want to add six perspective-view projections to your visualization code, referring to the visualization code of TPVFormer. However, there is a noticeable difference between the projection angle given by [0.7, 1.3, 0] and the original image. How is [0.7, 1.3, 0] calculated?
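
Where [0.7, 1.3, 0] comes from is repo-specific and not documented here, but the projection it would feed into is the standard homogeneous one; a minimal sketch:

    # Project ego/lidar-frame points into an image given a 4x4 lidar2img matrix
    # (intrinsics @ extrinsics). Generic geometry, not OccWorld's exact code.
    import numpy as np

    def project_points(points_xyz, lidar2img):
        pts = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)  # (N, 4)
        cam = pts @ lidar2img.T                       # homogeneous image coordinates
        depth = cam[:, 2:3]
        uv = cam[:, :2] / np.clip(depth, 1e-5, None)  # perspective divide
        in_front = depth[:, 0] > 0                    # keep points in front of the camera
        return uv[in_front], depth[in_front]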

How to make the pkl files?

Thanks for the amazing work!

How are the pkl files generated? In particular, I want to know how rel_poses and gt_mode are obtained.
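
For rel_poses, the usual derivation in nuScenes pipelines is the SE(3) transform between consecutive ego poses; a hedged sketch of that convention (the exact OccWorld code may differ, and gt_mode is a separate field):

    # Relative ego pose between consecutive frames from nuScenes ego_pose records.
    import numpy as np
    from pyquaternion import Quaternion

    def pose_to_matrix(ego_pose):
        # ego_pose: nuScenes record with 'rotation' (wxyz quaternion) and 'translation'
        mat = np.eye(4)
        mat[:3, :3] = Quaternion(ego_pose['rotation']).rotation_matrix
        mat[:3, 3] = ego_pose['translation']
        return mat

    def relative_pose(pose_t, pose_t1):
        # expresses frame t+1 in the coordinates of frame t
        return np.linalg.inv(pose_to_matrix(pose_t)) @ pose_to_matrix(pose_t1)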

Visualization results are confusing. 🤔

Thank you for your inspiring work. I tried to reproduce the results: the avg val IoU is 26.15 and the avg val mIoU is 16.72, which are similar to the results in the paper.

However, my visualization results confuse me. During training, the Transformer takes the 1st–15th frames as input and predicts the 2nd–16th frames. Here are some visualization results.

  • 2nd GT (image)
  • 3rd GT (image)
  • 16th GT (image)
  • 2nd Predict (image)
  • 3rd Predict (image)
  • 16th Predict (image)

The results are confusing: even the reconstructions of the 2nd and 3rd frames are unsatisfactory, and I cannot find any correspondence between them. @wzzheng Could the authors provide any help?

About Codebook Training

Great work on the world model! I have a small question: since the codebook in VQ-VAE training easily collapses, did the authors apply any tricks when training the codebook?
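
One widely used trick against collapse (not claiming it is what OccWorld does) is restarting dead codes from current encoder outputs; a minimal sketch:

    # Re-seed codebook entries that were (almost) never selected since the last check.
    import torch

    @torch.no_grad()
    def restart_dead_codes(codebook, usage_count, encoder_outputs, threshold=1):
        # codebook: (K, D) embedding weight; usage_count: (K,) selection counts;
        # encoder_outputs: (N, D) flattened latents from the current batch
        dead = usage_count < threshold
        n_dead = int(dead.sum())
        if n_dead > 0:
            idx = torch.randint(0, encoder_outputs.size(0), (n_dead,))
            codebook[dead] = encoder_outputs[idx]  # copy random latents over dead codes
        usage_count.zero_()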

Question about the pose_mode

Excellent work, and thanks for releasing the code!
I went through the code and ran into an unclear point in dataset.py, here:
gt_modes.append(self.nusc_infos[scene_name][idx+i]['pose_mode'])

The pose_mode seems to be a three-dimensional vector.
What does it mean? Could the authors give an explanation?
Thanks in advance!
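
For what it's worth, in several nuScenes planning codebases a three-dimensional "mode" vector is a one-hot high-level command (turn left / go straight / turn right) derived from future lateral displacement. Whether OccWorld follows this convention is an assumption, not confirmed by the code; a sketch of that convention:

    # ASSUMPTION: 3-dim one-hot command from future lateral motion (x-forward, y-left).
    import numpy as np

    def command_one_hot(rel_poses, lateral_threshold=2.0):
        # rel_poses: (T, 4, 4) future poses relative to the current frame
        lateral = rel_poses[-1][1, 3]        # y-displacement at the horizon
        if lateral > lateral_threshold:      # veering left
            return np.array([1.0, 0.0, 0.0])
        if lateral < -lateral_threshold:     # veering right
            return np.array([0.0, 0.0, 1.0])
        return np.array([0.0, 1.0, 0.0])     # roughly straight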

Regarding the scene tokenizer

Hi, you mentioned in the paper that the scene tokenizer encodes high-level scene information. Have you done any experiments without the scene tokenizer?

About Dataset

Hi,

Thanks for releasing the code. Do I need to download the whole nuScenes dataset, or just the lidarseg part?

Thanks!

About Evaluating

Hi,
Thanks for your help. I just finished training with 4 x RTX 3090 GPUs. After evaluating, I see that the results differ significantly from yours.
Could you tell me where I can change the tokenizer settings and other hyperparameters?
Thanks!

Training gets IoU = 0.0 after 60 epochs.

I cloned the repo and executed the command python train.py --py-config config/occworld.py --work-dir out/occworld on 4 GPUs. I think I skipped the step "a VQVAE should be trained using a similar command before training" and got almost no accuracy. Could you explain in more detail how to train the VQVAE first and then train OccWorld?
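
For reference, the intended two-stage sequence appears to be: train the VQVAE with the same entry point but a VQVAE config, then train OccWorld on top of it. The VQVAE config filename below is a guess; check the config/ directory for the actual name:

    python train.py --py-config config/train_vqvae.py --work-dir out/vqvae
    python train.py --py-config config/occworld.py --work-dir out/occworld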

Empty pkl files on pan.baidu.

Thanks for your great work and timely open-sourcing. Could you check the Baidu Pan link for the pkl files again? Thanks.
