huangjunjie2017 / bevdet



Official code base of the BEVDet series.

License: Apache License 2.0

Python 98.73% Shell 0.70% MATLAB 0.16% Dockerfile 0.15% C++ 0.11% Cuda 0.14%

bevdet's People

scott-mao 24werewolf 184338740 fudan-autonomous-driving-perception collector-m allenpeng0209 synsin0 mengxingshifen1218 rick-sunrise aaronswei birdflies tuskaw githubfragments lzhbrian drilistbox tangal0203 l-net-1992 hailuo0112 qingsong99 wondervictor jlqzzz leedonus jiweimaster hexroytom rookielike ithink3iam yan811 whqing gaondong lsm666 einstein10147 zhumingxu zhengfangwu piglogic-cyber zhuokunyao yanggui19891007 anti-destiny yangdaiyu123 concerttttt pycoco cancaries programmermw1986 jameschen23 zhaozp15 keyuli you-old updating00 pyten noticeable mandymo zoricheng zazuone fangwudi jie311 zoeveryday zhouth-shiny fangliancheng huangzhengxiang gg-bonds jiayuzou2020 barcelona16 hehangtian bruce1408 qhfan klingner hxyhxy612 jiangyongyu1 laoyang1994 zjufkq sainttelant ai-jie01 jfortissj qinhuaping qiuhuan clw5180 hren20 haotian0717 chonghaosima marongbo houxin-j zhoumaomin zhouleidcc tanjingme hongbo123467 xiaolong-rrl a1exr fire2323 seanzhang777 jizhishutong zzzzzpppf jenny0420 aromaticj soverngity canwang-sjtu destinyls amitsuveerqtiswe nuaasxr af-74413592 wanglaotou jishumiao95

bevdet's Issues

The output format

Hello, thanks for sharing the paper. Since you perform the final detection in BEV space, I am wondering whether the output of your network is in BEV bins (the same as Lift-Splat-Shoot). If so, when you do the evaluation, do you need to project the BEV outputs back onto each image?
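If the head is CenterPoint-style (the per-task heatmap losses in the training log further down suggest so), each peak in a BEV cell plus a predicted sub-cell offset can be decoded straight into metric ego-frame coordinates, and nuScenes evaluates boxes in 3D rather than in image space. A minimal sketch of that bin-to-metric mapping, assuming a square grid from -51.2 m to 51.2 m at 0.8 m per cell; these bounds are illustrative, so read the real ones from your config:

```python
import math

# Illustrative grid: [-51.2, 51.2) m in x and y at 0.8 m per cell -> 128 x 128 bins.
X_MIN, Y_MIN, RES = -51.2, -51.2, 0.8


def bin_to_metric(ix, iy, off_x=0.5, off_y=0.5):
    """Map a BEV cell index plus a sub-cell offset in [0, 1) to ego-frame metres."""
    return X_MIN + (ix + off_x) * RES, Y_MIN + (iy + off_y) * RES


def metric_to_bin(x, y):
    """Inverse lookup: which BEV cell contains the ego-frame point (x, y)."""
    return math.floor((x - X_MIN) / RES), math.floor((y - Y_MIN) / RES)


# A heatmap peak in cell (64, 70) with a predicted offset of (0.25, 0.5):
x, y = bin_to_metric(64, 70, 0.25, 0.5)
print(round(x, 3), round(y, 3))  # 0.2 5.2
```

The offsets are what let a coarse 0.8 m grid still produce box centres with sub-cell precision.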

The mASE and mAOE are too large

Thanks for your great work.
When I train BEVDet with the swinting.py config, I find that mASE and mAOE are too large at test time.

The pretrained model you provide:

The model trained from scratch:

Do you know what could be causing this problem?
Best wishes

Questions about retraining

Hi. When I retrain BEVDet-sttiny with the config file bevdet-sttiny.py for 20 epochs, I get the following results on the val set:

mAP: 0.3049
mATE: 0.6762
mASE: 0.2743
mAOE: 0.5235
mAVE: 0.9782
mAAE: 0.2653
NDS: 0.3807
Eval time: 103.4s

Per-class results:
Object Class            AP      ATE     ASE     AOE     AVE     AAE
car                     0.509   0.539   0.160   0.118   0.946   0.227
truck                   0.209   0.677   0.219   0.113   0.955   0.222
bus                     0.326   0.677   0.187   0.087   2.124   0.468
trailer                 0.157   1.020   0.231   0.375   1.061   0.173
construction_vehicle    0.069   0.829   0.484   1.126   0.096   0.375
pedestrian              0.333   0.750   0.305   1.360   0.882   0.544
motorcycle              0.243   0.729   0.265   0.618   1.544   0.110
bicycle                 0.198   0.543   0.281   0.794   0.217   0.002
traffic_cone            0.501   0.512   0.324   nan     nan     nan
barrier                 0.503   0.486   0.286   0.120   nan     nan

It seems the NDS is lower than that of the model you provided (38.1 vs. 40.4).
More information can be found in the log file (http://www.junbin.xyz/fileURL/20220607_222110.log). Can you help me? Thanks.
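To see which terms drive an NDS gap, it helps to recompute the score from its parts. The nuScenes Detection Score weights mAP by 5 and each of the five mean true-positive errors by 1, after turning each error into a score of 1 - min(1, error); the total is divided by 10. A minimal sketch with the numbers from the run above:

```python
def nds(m_ap, tp_errors):
    """nuScenes Detection Score from mAP and the five mean TP errors."""
    # Each TP error becomes a score in [0, 1]; errors of 1 or more score 0.
    tp_scores = {k: 1.0 - min(1.0, v) for k, v in tp_errors.items()}
    return (5.0 * m_ap + sum(tp_scores.values())) / 10.0, tp_scores


# Numbers from the retraining run above.
errors = {"mATE": 0.6762, "mASE": 0.2743, "mAOE": 0.5235,
          "mAVE": 0.9782, "mAAE": 0.2653}
score, parts = nds(0.3049, errors)
print(f"NDS = {score:.4f}")  # NDS = 0.3807
for name, s in parts.items():
    # Contribution of each TP metric to NDS (weight 1/10 each).
    print(f"{name}: {s / 10:.4f}")
```

With these numbers, the mAVE of 0.9782 contributes only about 0.002 to NDS, so velocity error is worth comparing first against the released model's log.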

Config of BEVDet-base

Hi @HuangJunJie2017, thanks for sharing this wonderful work. Could you please share the config files of BEVDet-base and BEVDet4D-base?

FPS question about BEVDet4D-Tiny

Great piece of work. I have some questions about the FPS stated in the README. I downloaded the checkpoints and used the commands below.

with acceleration

python tools/analysis_tools/benchmark.py configs/bevdet/bevdet-sttiny-accelerated.py $checkpoint

without acceleration

python tools/analysis_tools/benchmark.py configs/bevdet/bevdet-sttiny.py $checkpoint

For BEVDet4D-tiny, I only get 3.6 FPS on an A100 GPU. For BEVDet-tiny with acceleration, I only get 12 FPS. Both results are lower than the numbers listed in the README/paper. Did I do anything wrong?
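Measured FPS depends heavily on how timing is done: warm-up iterations, GPU synchronization, batch size, data loading, and the GPU itself. The exact protocol behind the README numbers isn't stated here, so the sketch below only shows the generic pattern for a fair measurement; `measure_fps` is a hypothetical helper, not part of the repo's benchmark script.

```python
import time


def measure_fps(step, n_iters=100, n_warmup=10, sync=None):
    """Time `step()` and return iterations per second.

    step: zero-argument callable running one inference (e.g. one batch).
    sync: optional callable that blocks until queued GPU work is done,
          e.g. torch.cuda.synchronize; without it, asynchronous CUDA
          launches can make the timings unreliable.
    """
    for _ in range(n_warmup):  # exclude one-off costs such as kernel autotuning
        step()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()
    return n_iters / (time.perf_counter() - start)


# Stand-in workload: a 10 ms sleep should measure at no more than ~100 FPS.
fps = measure_fps(lambda: time.sleep(0.01), n_iters=20, n_warmup=2)
print(f"{fps:.1f} FPS")
```

When comparing against published numbers, it is worth checking that warm-up, synchronization and the timed region (model only vs. including data loading) match.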

Visualize output of model

Thanks for your great work.

I want to draw the 3D detections on the multi-camera images, or in BEV format.
Is there a script that meets this requirement?
I tried test.py, but it failed with errors.
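For the BEV view, the main step is turning each box into its four ground-plane corners and mapping them onto a top-down canvas. A minimal sketch, assuming length runs along the heading and yaw is measured counter-clockwise from +x; box origin and yaw conventions differ between mmdet3d versions, so check yours before relying on it. The canvas bounds and resolution are illustrative.

```python
import math


def bev_corners(x, y, length, width, yaw):
    """Four ground-plane corners of a box centred at (x, y).

    Assumed convention: `length` runs along the heading, and yaw is measured
    counter-clockwise from the +x axis in radians.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    local = [(length / 2, width / 2), (-length / 2, width / 2),
             (-length / 2, -width / 2), (length / 2, -width / 2)]
    # Rotate each local corner by yaw, then translate to the box centre.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in local]


def to_pixels(points, x_min=-51.2, y_min=-51.2, metres_per_px=0.1):
    """Map metric BEV points to integer pixel coordinates of a top-down canvas.

    Image rows grow downward, so you may want to flip the y axis when drawing.
    """
    return [(round((px - x_min) / metres_per_px), round((py - y_min) / metres_per_px))
            for px, py in points]


corners = bev_corners(10.0, 5.0, 4.0, 2.0, 0.0)
print(corners)  # [(12.0, 6.0), (8.0, 6.0), (8.0, 4.0), (12.0, 4.0)]
print(to_pixels(corners))
```

The pixel corners can then be drawn as a closed polygon, for example with `cv2.polylines` or `matplotlib.patches.Polygon`.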

How to run inference on self-prepared images?

Hi, I have some self-prepared images and their camera parameters. How can I run inference with your pretrained model?
I could not find a 'bev_demo' Python file under the /demo folder. Could you please give me some advice?
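For your own rig you need, per camera, the same three pieces of calibration that nuScenes stores in its calibrated_sensor records: a 3x3 intrinsic matrix, and the camera's pose in the ego frame as a rotation quaternion [w, x, y, z] plus a translation in metres. How BEVDet's pipeline reads them from its info files is not shown here; the sketch below is only a generic pinhole projection to sanity-check that a set of parameters is consistent. The quaternion, translation and intrinsic values are illustrative.

```python
def quat_to_rot(w, x, y, z):
    """Rotation matrix (row-major nested lists) from a unit quaternion [w, x, y, z]."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def ego_to_pixel(point, cam_rotation, cam_translation, intrinsic):
    """Project an ego-frame point into a camera image.

    cam_rotation / cam_translation: camera-to-ego pose, p_ego = R @ p_cam + t.
    intrinsic: 3x3 pinhole matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Returns (u, v) in pixels, or None if the point is behind the camera.
    """
    R = quat_to_rot(*cam_rotation)
    d = [point[i] - cam_translation[i] for i in range(3)]
    # Invert the pose: p_cam = R^T (p_ego - t).
    p_cam = [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]
    if p_cam[2] <= 0:
        return None
    u = intrinsic[0][0] * p_cam[0] / p_cam[2] + intrinsic[0][2]
    v = intrinsic[1][1] * p_cam[1] / p_cam[2] + intrinsic[1][2]
    return u, v


# Illustrative front camera: optical axis along ego +x, 1.5 m forward and up.
q_front = (0.5, -0.5, 0.5, -0.5)
t_front = (1.5, 0.0, 1.5)
K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]
print(ego_to_pixel((11.5, 0.0, 1.5), q_front, t_front, K))  # (800.0, 450.0)
```

A point 1 m to the left of the car projects to u = 700, left of the image centre, which is a quick way to check the axis conventions of your own calibration.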

Evaluation results on nuScenes using pretrained weights

Hi Junjie,

I appreciate your excellent work. I am trying to evaluate the provided BEVDet-Tiny model on the nuScenes val set.
The command was "bash ./tools/dist_test.sh configs/bevdet/bevdet-sttiny.py checkpoints/bevdet-sttiny-pure.pth 1 --eval bbox --out ./workdirs/bevdey-sttiny-eval-results.pkl", and I got the following results:

mAP: 0.2751
mATE: 0.7179
mASE: 0.2738
mAOE: 0.5512
mAVE: 0.8749
mAAE: 0.2206
NDS: 0.3737
Eval time: 120.9s

Per-class results:
Object Class            AP      ATE     ASE     AOE     AVE     AAE
car                     0.441   0.631   0.167   0.131   1.037   0.254
truck                   0.197   0.757   0.225   0.125   0.828   0.227
bus                     0.283   0.680   0.185   0.139   1.895   0.350
trailer                 0.132   1.053   0.224   0.463   0.547   0.068
construction_vehicle    0.066   0.795   0.484   1.174   0.095   0.358
pedestrian              0.301   0.788   0.305   1.320   0.848   0.412
motorcycle              0.235   0.704   0.262   0.612   1.438   0.090
bicycle                 0.182   0.607   0.265   0.875   0.310   0.006
traffic_cone            0.445   0.616   0.333   nan     nan     nan
barrier                 0.468   0.547   0.287   0.122   nan     nan

I am wondering why I could not reproduce the reported mAP (30.8). Did I miss something here?
Thank you.
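The mean metrics are class averages, and classes whose TP error is undefined (reported as nan, e.g. orientation for traffic cones) are skipped. Recomputing the averages from the per-class table above reproduces the reported mAP and mAOE up to rounding, and shows where the orientation error comes from. A minimal sketch, consistent with the numbers here but not the nuScenes devkit code itself:

```python
import math

NAN = float("nan")

# Per-class (AP, AOE) from the evaluation above; traffic_cone AOE is nan.
per_class = {
    "car": (0.441, 0.131), "truck": (0.197, 0.125), "bus": (0.283, 0.139),
    "trailer": (0.132, 0.463), "construction_vehicle": (0.066, 1.174),
    "pedestrian": (0.301, 1.320), "motorcycle": (0.235, 0.612),
    "bicycle": (0.182, 0.875), "traffic_cone": (0.445, NAN),
    "barrier": (0.468, 0.122),
}


def nan_mean(values):
    """Mean over classes, skipping those reported as nan."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)


m_ap = nan_mean(ap for ap, _ in per_class.values())
m_aoe = nan_mean(aoe for _, aoe in per_class.values())
print(f"mAP ~ {m_ap:.4f}, mAOE ~ {m_aoe:.4f}")  # mAP ~ 0.2750, mAOE ~ 0.5512
```

In this table, pedestrian (1.320) and construction_vehicle (1.174) together account for roughly half of the summed orientation error, so a large mAOE mostly reflects those two classes.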

Question about the positional shift between two candidate features

The paper says that the network predicts the position shift that is irrelevant to the ego-motion, but it seems that the position shift is given as input: as far as I know, the alignment matrix is provided by the dataset pipeline.
May I ask which part of the network predicts the position shift?
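One way to read the argument: the alignment matrix from the pipeline only removes the ego-motion part of the shift, by moving the previous frame into the current ego frame using known ego poses. What remains between the two aligned frames is each object's own displacement, which is zero for static objects and relates to velocity for moving ones. A minimal 2D sketch of that split, assuming ego-to-global poses given as (x, y, yaw); it illustrates the geometry only, not the repository's feature-warping code, and the time step is hypothetical.

```python
import math


def se2(x, y, yaw):
    """Homogeneous 3x3 transform for a 2D pose (x, y, yaw)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]


def inv_se2(T):
    """Inverse of a rigid 2D transform: [R^T, -R^T t]."""
    c, s, x, y = T[0][0], T[1][0], T[0][2], T[1][2]
    return [[c, s, -(c * x + s * y)], [-s, c, s * x - c * y], [0.0, 0.0, 1.0]]


def apply(T, p):
    """Apply a homogeneous 2D transform to the point p = (x, y)."""
    return (T[0][0] * p[0] + T[0][1] * p[1] + T[0][2],
            T[1][0] * p[0] + T[1][1] * p[1] + T[1][2])


def prev_to_curr(p_prev, pose_prev, pose_curr):
    """Ego-motion compensation: previous ego frame -> global -> current ego frame."""
    return apply(inv_se2(pose_curr), apply(pose_prev, p_prev))


pose_prev, pose_curr = se2(0.0, 0.0, 0.0), se2(2.0, 0.0, 0.0)  # ego drove 2 m forward
dt = 0.5  # hypothetical time between the two frames, in seconds

# Static object at global (10, 5); moving object goes from global (10, 5) to (11, 5).
static_prev = apply(inv_se2(pose_prev), (10.0, 5.0))
static_curr = apply(inv_se2(pose_curr), (10.0, 5.0))
moving_curr = apply(inv_se2(pose_curr), (11.0, 5.0))

aligned = prev_to_curr(static_prev, pose_prev, pose_curr)
shift_static = (static_curr[0] - aligned[0], static_curr[1] - aligned[1])
shift_moving = (moving_curr[0] - aligned[0], moving_curr[1] - aligned[1])
print(shift_static, shift_moving)  # (0.0, 0.0) (1.0, 0.0)
print("velocity:", shift_moving[0] / dt, "m/s")
```

Without the alignment, the static object would appear to move 2 m backwards between frames, and the network would have to untangle that ego-motion itself.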

Share detection results for the val set in a .json file #4

Hi, could you release the detection results on the validation set in the standard .json format? I would like to perform some statistical analysis on the results. Thanks.
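For reference, the standard nuScenes detection-results file is a JSON object with a "meta" block describing the input modalities and a "results" map from sample token to a list of boxes with the fields shown below. As far as we know, mmdet3d's NuScenesDataset writes a file in this format during bbox evaluation. A minimal sketch of the structure plus a per-class count, the kind of statistic the request mentions; the token and box values are made up.

```python
import json
from collections import Counter

# One detection in the nuScenes detection-results format (values illustrative).
detection = {
    "sample_token": "hypothetical_sample_token",
    "translation": [601.2, 1642.8, 1.1],   # box centre, global frame, metres
    "size": [1.9, 4.6, 1.7],               # width, length, height
    "rotation": [0.7071, 0.0, 0.0, 0.7071],  # quaternion w, x, y, z
    "velocity": [0.0, 0.0],                # vx, vy in m/s
    "detection_name": "car",
    "detection_score": 0.83,
    "attribute_name": "vehicle.parked",
}

submission = {
    "meta": {"use_camera": True, "use_lidar": False, "use_radar": False,
             "use_map": False, "use_external": False},
    "results": {detection["sample_token"]: [detection]},
}


def class_counts(sub, min_score=0.0):
    """Count detections per class at or above a score threshold."""
    return Counter(d["detection_name"]
                   for boxes in sub["results"].values()
                   for d in boxes if d["detection_score"] >= min_score)


text = json.dumps(submission)
print(class_counts(json.loads(text)))  # Counter({'car': 1})
```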

Visualization on output

Hi,
I tried to visualize the output on the nuScenes mini dataset with the command:
python tools/test.py configs/bevdet4d/bevdet4d-sttiny.py checkpoints/bevdet4d-sttiny-pure.pth --eval 'mAP' --eval-options 'show=False' 'out_dir=/xxx/project/BEVDet/result'
The model outputs the evaluation metrics, but nothing is saved in the result folder. I find it a bit confusing to visualize the output. Can you help?

TypeError: 'module' object is not callable in view_transformer.py when training bevdet4d-sttiny

Hello, thanks for your great work.
I encountered an error when training bevdet4d-sttiny:

TypeError: 'module' object is not callable
bev_feat = self.img_view_transformer.voxel_pooling(geom, volume)
File "/ws/BEVDet/mmdet3d/models/necks/view_transformer.py", line 165, in voxel_pooling
self.nx[1])

I looked into the code:

BEVDet/mmdet3d/models/necks/view_transformer.py

Line 164 in 196fed9

final = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0],

and it seems that the following modification is needed:

modify:
final = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])
to
final = bev_pool.bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])

With the modification, training starts without error, but I don't know whether it is right.
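The TypeError means that, at that line, the name bev_pool refers to a module rather than a function. This happens when a package module and the function inside it share a name and the import brings in the module. The fix above calls the function through the module; binding the function itself once would be equivalent. A self-contained illustration with a stand-in module (the real import path in view_transformer.py is not shown in the issue, so the names here are hypothetical):

```python
import types

# Stand-in for a module named `bev_pool` that defines a function of the same
# name (hypothetical; it mirrors the name clash in the issue).
bev_pool = types.ModuleType("bev_pool")
bev_pool.bev_pool = lambda *args: f"pooled {len(args)} inputs"

try:
    bev_pool("x", "geom_feats")           # calling the module itself
except TypeError as err:
    print("TypeError:", err)              # says the module object is not callable

print(bev_pool.bev_pool("x", "geom_feats"))  # pooled 2 inputs

# Binding the function once avoids the clash at every call site:
pool_fn = bev_pool.bev_pool
```

Either way, the function receives exactly the same arguments, so the change by itself should not alter the training math.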

my env:

Python: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0]
GCC: gcc (GCC) 7.3.0
PyTorch: 1.9.0
TorchVision: 0.10.0
OpenCV: 4.6.0
MMCV: 1.3.13
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.2+196fed9

Problem running mono_det_demo.py

Hi,
I was trying to run inference on the test data under the demo/data folder with the BEVDet pretrained model, and it fails as follows:

`(bevdet) qianch@sv2-px:/mnt/data10/qianch/project/BEVDet$ python demo/mono_det_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.jpg demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525_mono3d.coco.json configs/bevdet/bevdet-sttiny.py checkpoints/bevdet4d-sttiny-pure.pth
/mnt/data10/qianch/project/BEVDet/mmdet3d/models/backbones/swin.py:622: UserWarning: DeprecationWarning: pretrained is a deprecated, please use "init_cfg" instead
warnings.warn('DeprecationWarning: pretrained is a deprecated, '
load checkpoint from local path: checkpoints/bevdet4d-sttiny-pure.pth
The model and loaded state dict do not match exactly

size mismatch for img_bev_encoder_backbone.layers.0.0.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for img_bev_encoder_backbone.layers.0.0.downsample.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
unexpected key in source state_dict: pre_process_net.layers.0.0.conv1.weight, pre_process_net.layers.0.0.bn1.weight, pre_process_net.layers.0.0.bn1.bias, pre_process_net.layers.0.0.bn1.running_mean, pre_process_net.layers.0.0.bn1.running_var, pre_process_net.layers.0.0.bn1.num_batches_tracked, pre_process_net.layers.0.0.conv2.weight, pre_process_net.layers.0.0.bn2.weight, pre_process_net.layers.0.0.bn2.bias, pre_process_net.layers.0.0.bn2.running_mean, pre_process_net.layers.0.0.bn2.running_var, pre_process_net.layers.0.0.bn2.num_batches_tracked, pre_process_net.layers.0.0.downsample.weight, pre_process_net.layers.0.0.downsample.bias, pre_process_net.layers.0.1.conv1.weight, pre_process_net.layers.0.1.bn1.weight, pre_process_net.layers.0.1.bn1.bias, pre_process_net.layers.0.1.bn1.running_mean, pre_process_net.layers.0.1.bn1.running_var, pre_process_net.layers.0.1.bn1.num_batches_tracked, pre_process_net.layers.0.1.conv2.weight, pre_process_net.layers.0.1.bn2.weight, pre_process_net.layers.0.1.bn2.bias, pre_process_net.layers.0.1.bn2.running_mean, pre_process_net.layers.0.1.bn2.running_var, pre_process_net.layers.0.1.bn2.num_batches_tracked

Traceback (most recent call last):
File "demo/mono_det_demo.py", line 46, in
main()
File "demo/mono_det_demo.py", line 31, in main
model = init_model(args.config, args.checkpoint, device=args.device)
File "/mnt/data10/qianch/project/BEVDet/mmdet3d/apis/inference.py", line 61, in init_model
if 'CLASSES' in checkpoint['meta']:
KeyError: 'meta'`

However, when I used the demo provided by the official mmdet3d with the FCOS configs, it gave correct results.
Can you help?
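The traceback stops at init_model reading checkpoint['meta'], so the loaded checkpoint apparently has no 'meta' entry, as can happen with a weights-only file. A defensive lookup like the hypothetical helper below avoids the KeyError; where to apply it in inference.py, and which class list to fall back on, are left as assumptions. Separately, the command pairs configs/bevdet/bevdet-sttiny.py with checkpoints/bevdet4d-sttiny-pure.pth, which may explain the size-mismatch and unexpected-key warnings above.

```python
def checkpoint_classes(checkpoint, fallback):
    """Return class names stored in a checkpoint, or `fallback` if absent.

    Weights-only checkpoints may carry just a 'state_dict' and no 'meta'
    entry, which is what makes checkpoint['meta'] raise KeyError.
    """
    meta = checkpoint.get("meta") or {}
    return meta.get("CLASSES", fallback)


weights_only = {"state_dict": {}}                            # no 'meta' key
full = {"state_dict": {}, "meta": {"CLASSES": ("car", "truck")}}
print(checkpoint_classes(weights_only, ("car",)))            # ('car',)
print(checkpoint_classes(full, ("car",)))                    # ('car', 'truck')
```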

Train error of bevdet4d-sttiny

When I use the master branch to train the model on nuScenes:
python tools/train.py configs/bevdet4d/bevdet4d-sttiny.py --work-dir try

the following error appears:

RuntimeError: Expected to mark a variable ready only once.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug
The code runs fine on a single GPU with the following command:
python tools/train.py configs/bevdet/bevdet-sttiny.py
However, when I switch to distributed training with
./tools/dist_train.sh configs/bevdet/bevdet-sttiny.py 8
the program throws the following error:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.3) Incorrect unused parameter detection. The return value of the `forward` function is inspected by the distributed data parallel wrapper to figure out if any of the module's parameters went unused. For unused parameters, DDP would not expect gradients from then. However, if an unused parameter becomes part of the autograd graph at a later point in time (e.g., in a reentrant backward when using `checkpoint`), the gradient will show up unexpectedly. If all parameters in the model participate in the backward pass, you can disable unused parameter detection by passing the keyword argument `find_unused_parameters=False` to `torch.nn.parallel.DistributedDataParallel`.
I've read the error message and found a similar post, where the suggested solution is to set find_unused_parameters=False. However, I have checked this argument in mmdetection, and it is already False by default.

Reproduction

What command or script did you run?

./tools/dist_train.sh configs/bevdet/bevdet-sttiny.py 8

Did you make any modifications on the code or config? Did you understand what you have modified?
The only modification I've made is deleting the Swin Transformer registered by mmdetection before the definition of the self-implemented Swin Transformer in this repo.
Specifically, I've inserted del BACKBONES._module_dict['SwinTransformer'] before this line.
Otherwise, MMCV throws an error because of the duplicate model definition.
What dataset did you use?
nuScenes

Environment

Please run python mmdet3d/utils/collect_env.py to collect the necessary environment information and paste it here.

Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA TITAN RTX
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.9.0
OpenCV: 4.6.0
MMCV: 1.3.18
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.17.0
MMSegmentation: 0.18.0
MMDetection3D: 0.17.2+

You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
  I installed PyTorch within a Docker container using pip

Error traceback
If applicable, paste the error traceback here.

Traceback (most recent call last):
  File "./tools/train.py", line 224, in <module>
    main()
  File "./tools/train.py", line 213, in main
    train_model(
  File "/home/users/Code/BEVDet/mmdet3d/apis/train.py", line 28, in train_model
    train_detector(
  File "/opt/conda/lib/python3.8/site-packages/mmdet/apis/train.py", line 174, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/hooks/optimizer.py", line 35, in after_train_iter
    runner.outputs['loss'].backward()
  File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 112, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.3) Incorrect unused parameter detection. The return value of the `forward` function is inspected by the distributed data parallel wrapper to figure out if any of the module's parameters went unused. For unused parameters, DDP would not expect gradients from then. However, if an unused parameter becomes part of the autograd graph at a later point in time (e.g., in a reentrant backward when using `checkpoint`), the gradient will show up unexpectedly. If all parameters in the model participate in the backward pass, you can disable unused parameter detection by passing the keyword argument `find_unused_parameters=False` to `torch.nn.parallel.DistributedDataParallel`.

NuScenesDataset: __init__() got an unexpected keyword argument 'img_info_prototype'

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug
A clear and concise description of what the bug is.

Reproduction

What command or script did you run?

A placeholder for the command.

Did you make any modifications on the code or config? Did you understand what you have modified?
What dataset did you use?

Environment

Please run python mmdet3d/utils/collect_env.py to collect the necessary environment information and paste it here.
You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback
If applicable, paste the error traceback here.

A placeholder for traceback.

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
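The error says that the NuScenesDataset class actually being constructed has no img_info_prototype parameter. One plausible cause, which is our assumption rather than something the report confirms, is that Python imports a different mmdet3d (for example a pip-installed copy) instead of the one in this repository. After `from mmdet3d.datasets import NuScenesDataset`, `inspect.getsourcefile(NuScenesDataset)` shows which file the class came from, and its signature can be checked as sketched below with hypothetical stand-in classes:

```python
import inspect


def accepts_kwarg(cls, name):
    """True if cls(...) lists `name` as a parameter or takes **kwargs.

    A **kwargs catch-all may still forward the argument to a parent class
    that rejects it, so treat True as "not rejected at this level".
    """
    params = inspect.signature(cls).parameters
    return name in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class UpstreamDataset:                   # hypothetical stand-in
    def __init__(self, ann_file, pipeline=None):
        self.ann_file = ann_file


class PatchedDataset(UpstreamDataset):  # hypothetical stand-in
    def __init__(self, ann_file, img_info_prototype=None, **kwargs):
        super().__init__(ann_file, **kwargs)
        self.img_info_prototype = img_info_prototype


print(accepts_kwarg(UpstreamDataset, "img_info_prototype"))  # False
print(accepts_kwarg(PatchedDataset, "img_info_prototype"))   # True
```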

TypeError: save_for_backward can only save variables, but argument 1 is of type tuple

Sorry to bother you again. I encountered the error "TypeError: save_for_backward can only save variables, but argument 1 is of type tuple", detailed below:
File "..BEVDet/tools/train_debug.py", line 217, in main
train_model(
File "..BEVDet/mmdet3d/apis/train.py", line 28, in train_model
train_detector(
File "..BEVDet/.plu/mmdetection/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "..BEVDet/.plu/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "..BEVDet/.plu/mmcv/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "..BEVDet/.plu/mmcv/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "..BEVDet/.plu/mmcv/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "..BEVDet/.plu/mmdetection/mmdet/models/detectors/base.py", line 237, in train_step
losses = self(**data)
File "/workspace/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "..BEVDet/.plu/mmcv/mmcv/runner/fp16_utils.py", line 98, in new_func
return old_func(*args, **kwargs)
File "..BEVDet/mmdet3d/models/detectors/base.py", line 59, in forward
return self.forward_train(**kwargs)
File "..BEVDet/mmdet3d/models/detectors/bevdet.py", line 85, in forward_train
img_feats, pts_feats = self.extract_feat(
File "..BEVDet/mmdet3d/models/detectors/bevdet.py", line 46, in extract_feat
img_feats = self.extract_img_feat(img, img_metas)
File "..BEVDet/mmdet3d/models/detectors/bevdet.py", line 39, in extract_img_feat
x = self.image_encoder(img[0])
File "..BEVDet/mmdet3d/models/detectors/bevdet.py", line 25, in image_encoder
x = self.img_backbone(imgs)
File "/workspace/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "..BEVDet/mmdet3d/models/backbones/swin.py", line 804, in forward
x, hw_shape, out, out_hw_shape = stage(x, hw_shape)
File "/workspace/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "..BEVDet/mmdet3d/models/backbones/swin.py", line 519, in forward
x = checkpoint.checkpoint(block, x, hw_shape)
File "/workspace/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 177, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
TypeError: save_for_backward can only save variables, but argument 1 is of type tuple

Could you give me some suggestions? Thanks a lot.
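The traceback ends in torch.utils.checkpoint with hw_shape, a tuple, passed to checkpoint.checkpoint alongside x. Our understanding is that older PyTorch releases save every positional argument of the reentrant checkpoint with save_for_backward, which only accepts tensors; newer releases relax this. A common workaround is to capture the non-tensor argument in a closure so checkpoint only receives tensors. A minimal sketch with a hypothetical stand-in block, not the repo's swin.py:

```python
import torch
from torch.utils import checkpoint


class ShapeAwareBlock(torch.nn.Module):
    """Hypothetical stand-in for a block whose forward takes (x, hw_shape)."""

    def __init__(self, dim):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x, hw_shape):
        h, w = hw_shape
        assert x.shape[1] == h * w, "x is expected as (B, H*W, C)"
        return self.proj(x)


def run_block(block, x, hw_shape, with_cp):
    if with_cp and x.requires_grad:
        # Only tensors go through checkpoint; hw_shape is captured by the closure.
        return checkpoint.checkpoint(lambda t: block(t, hw_shape), x)
    return block(x, hw_shape)


block = ShapeAwareBlock(8)
x = torch.randn(2, 4 * 4, 8, requires_grad=True)
y = run_block(block, x, (4, 4), with_cp=True)
y.sum().backward()
print(y.shape, block.proj.weight.grad is not None)  # torch.Size([2, 16, 8]) True
```

Recent PyTorch versions may warn that use_reentrant should be passed explicitly; that argument does not exist in 1.8, so it is left out here.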

Question about intermediate BEV feature of the previous frame

Hello, thanks for your great work!
Is the BEV feature of the previous frame saved from the previous computation, or computed at the same time as the current frame? It seems that reusing the previous result would take less time at inference.
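Whether BEVDet4D reuses a cached feature at test time isn't stated in this thread; the sketch below only shows the caching pattern the question describes, with each frame's BEV feature computed once and handed over as the "previous" feature for the next frame of the same scene. The class and its first-frame fallback are hypothetical, and a real implementation would still need to align the cached feature to the current ego frame.

```python
class PrevBEVCache:
    """Hypothetical per-scene cache of the last BEV feature.

    On the first frame of a scene there is no history; falling back to the
    current feature is an assumption of this sketch.
    """

    def __init__(self, compute_bev):
        self.compute_bev = compute_bev
        self.scene = None
        self.feat = None

    def step(self, scene_token, frame):
        curr = self.compute_bev(frame)
        prev = self.feat if self.scene == scene_token else curr
        self.scene, self.feat = scene_token, curr
        return curr, prev


calls = []


def fake_bev(frame):
    """Stand-in for the expensive image-to-BEV computation."""
    calls.append(frame)
    return f"bev({frame})"


cache = PrevBEVCache(fake_bev)
frames = [("s1", 0), ("s1", 1), ("s1", 2), ("s2", 0)]
outputs = [cache.step(scene, frame) for scene, frame in frames]
print(outputs)
print(len(calls))  # 4: one BEV computation per frame
```

Computing both frames from scratch would double the image-side work per frame, which is the saving the question points at.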

usage of checkpoint

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug
TypeError: forward() missing 1 required positional argument: 'hw_shape'

Reproduction

What command or script did you run?
with_cp = True in swin.

torch   == 1.8.1

AttributeError: 'LoadMultiViewImageFromFiles_BEVDet' object has no attribute 'camera_model_consistent'

Hi, thanks for your great work!!

But I ran into the error "AttributeError: 'LoadMultiViewImageFromFiles_BEVDet' object has no attribute 'camera_model_consistent'". Could you help me solve this? Thanks a lot!

I simply worked around the error by setting "self.camera_model_consistent" to None by default. Is that right for reproducing your great work?
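An AttributeError like this means some method reads self.camera_model_consistent but no code path assigned it, for example because __init__ never sets it. Assigning a default in __init__ is the usual pattern, as in the hypothetical stand-in below; whether None reproduces the authors' behaviour depends on how the attribute is used, which the issue doesn't show, so treat that default as an assumption.

```python
class LoaderSketch:
    """Hypothetical stand-in for a pipeline step with an optional attribute."""

    def __init__(self, camera_model_consistent=None):
        # Always assign in __init__ so every code path can read it.
        self.camera_model_consistent = camera_model_consistent

    def describe(self):
        if self.camera_model_consistent is None:
            return "option not set: default behaviour"
        return f"option set to {self.camera_model_consistent!r}"


class LegacyLoader:
    """Stand-in for an object built from a class that never set the attribute."""


print(LoaderSketch().describe())  # option not set: default behaviour
# Reading defensively also works for objects built without the attribute:
print(getattr(LegacyLoader(), "camera_model_consistent", None))  # None
```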

Why is my self-trained PyTorch model much bigger than your pretrained model?

Hi, I trained a BEVDet model using the config file configs/bevdet/bevdet-sttiny.py. The checkpoint of each epoch is almost 660 MB, while your pretrained model 'bevdet-sttiny-pure.pth' is only 214 MB.

Why is my self-trained model so big? I just used your config file and did not make any changes to the code.
I tried to export the 'swin-transformer' backbone to ONNX but failed. Do you know how to export the Swin Transformer to an ONNX file?
In your paper, the backbone can also be 'resnet50' or 'resnet101', but I cannot find config files based on a ResNet backbone. Where can I find the ResNet-based config files?
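A likely explanation for the size, assuming the config trains with an Adam-family optimizer: checkpoints saved during training by mmcv also store the optimizer state, and Adam keeps two extra tensors per parameter (first and second moment estimates), so such a checkpoint can be roughly three times the weights alone. 3 × 214 MB ≈ 642 MB, close to the 660 MB observed. The "-pure" suffix of the released file suggests the training state was stripped, though that is our reading. A sketch that keeps only the weights and metadata:

```python
def strip_training_state(ckpt, keep=("state_dict", "meta")):
    """Drop training-only entries (e.g. 'optimizer') from a checkpoint dict."""
    return {k: v for k, v in ckpt.items() if k in keep}


def strip_checkpoint_file(src, dst):
    import torch  # imported here so the dict helper above works without torch

    # Only load checkpoints you trust; recent PyTorch versions may need
    # weights_only=False for files that contain arbitrary metadata.
    ckpt = torch.load(src, map_location="cpu")
    torch.save(strip_training_state(ckpt), dst)


example = {"meta": {"epoch": 20}, "state_dict": {"w": 1}, "optimizer": {"state": {}}}
print(sorted(strip_training_state(example)))  # ['meta', 'state_dict']
```

For example, strip_checkpoint_file("epoch_20.pth", "epoch_20-stripped.pth"), with both paths hypothetical. The stripped file can be used for evaluation but not to resume training.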

Loss Convergence Issue with Accelerated BEVPool

Hi, thanks for your great work on BEVDet-series. However, I have some questions about the code.

I have to modify this line

BEVDet/mmdet3d/models/necks/view_transformer.py

Line 164 in 2e559ff

final = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0],

into bev_pool.bev_pool() to get training to run.
I noticed that the author added BEVPool support in this proposal (thanks again!), so I gave it a try, but found that it does not converge correctly with BEVDet (not 4D; I will try that later). I found a related issue, #39, where the author said that BEVPool can be used for training with the latest code (I cloned the code this afternoon). So am I missing something needed to make it work?

# Here is a training log with BEVPool
2022-07-18 18:51:05,435 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
2022-07-18 18:51:05,436 - mmdet - INFO - Checkpoints will be saved to /nfs/chenzehui/code/BEVDet/work_dirs/bevdet-sttiny_2subset_1x by HardDiskBackend.
2022-07-18 18:54:31,813 - mmdet - INFO - Epoch [1][50/1931]	lr: 2.000e-04, eta: 2 days, 5:02:20, time: 4.125, data_time: 0.173, memory: 23265, task0.loss_xy: 0.1530, task0.loss_z: 0.2406, task0.loss_whl: 0.2138, task0.loss_yaw: 0.3044, task0.loss_vel: 0.0994, task0.loss_heatmap: 29.2677, task1.loss_xy: 0.1517, task1.loss_z: 0.2693, task1.loss_whl: 0.3785, task1.loss_yaw: 0.3092, task1.loss_vel: 0.0783, task1.loss_heatmap: 697.6227, task2.loss_xy: 0.1605, task2.loss_z: 0.2596, task2.loss_whl: 0.5476, task2.loss_yaw: 0.3127, task2.loss_vel: 0.1090, task2.loss_heatmap: 1109.0859, task3.loss_xy: 0.1758, task3.loss_z: 0.1999, task3.loss_whl: 0.2430, task3.loss_yaw: 0.3137, task3.loss_vel: 0.0132, task3.loss_heatmap: 176.5259, task4.loss_xy: 0.1672, task4.loss_z: 0.2124, task4.loss_whl: 0.2097, task4.loss_yaw: 0.3255, task4.loss_vel: 0.0821, task4.loss_heatmap: 2236.7699, task5.loss_xy: 0.1589, task5.loss_z: 0.2017, task5.loss_whl: 0.2519, task5.loss_yaw: 0.3238, task5.loss_vel: 0.0426, task5.loss_heatmap: 182.5948, loss: 4438.3761, grad_norm: 31941.2649
2022-07-18 18:56:23,583 - mmdet - INFO - Epoch [1][100/1931]	lr: 2.001e-04, eta: 1 day, 16:50:53, time: 2.235, data_time: 0.048, memory: 23265, task0.loss_xy: 0.1272, task0.loss_z: 0.2319, task0.loss_whl: 0.0732, task0.loss_yaw: 0.2989, task0.loss_vel: 0.0959, task0.loss_heatmap: 2.7546, task1.loss_xy: 0.1298, task1.loss_z: 0.2616, task1.loss_whl: 0.1806, task1.loss_yaw: 0.3013, task1.loss_vel: 0.0684, task1.loss_heatmap: 4.3315, task2.loss_xy: 0.1311, task2.loss_z: 0.2597, task2.loss_whl: 0.1698, task2.loss_yaw: 0.3076, task2.loss_vel: 0.1020, task2.loss_heatmap: 5.0317, task3.loss_xy: 0.1283, task3.loss_z: 0.1878, task3.loss_whl: 0.1633, task3.loss_yaw: 0.3005, task3.loss_vel: 0.0060, task3.loss_heatmap: 3.3123, task4.loss_xy: 0.1333, task4.loss_z: 0.1964, task4.loss_whl: 0.1455, task4.loss_yaw: 0.3121, task4.loss_vel: 0.0752, task4.loss_heatmap: 7.2879, task5.loss_xy: 0.1284, task5.loss_z: 0.2023, task5.loss_whl: 0.2038, task5.loss_yaw: 0.3080, task5.loss_vel: 0.0391, task5.loss_heatmap: 3.1014, loss: 31.0883, grad_norm: 109.3359
2022-07-18 18:58:13,957 - mmdet - INFO - Epoch [1][150/1931]	lr: 2.001e-04, eta: 1 day, 12:38:41, time: 2.208, data_time: 0.045, memory: 23265, task0.loss_xy: 0.1268, task0.loss_z: 0.2266, task0.loss_whl: 0.0681, task0.loss_yaw: 0.2994, task0.loss_vel: 0.0967, task0.loss_heatmap: 2.7255, task1.loss_xy: 0.1279, task1.loss_z: 0.2594, task1.loss_whl: 0.1746, task1.loss_yaw: 0.3029, task1.loss_vel: 0.0704, task1.loss_heatmap: 3.5595, task2.loss_xy: 0.1273, task2.loss_z: 0.2533, task2.loss_whl: 0.1576, task2.loss_yaw: 0.3035, task2.loss_vel: 0.1016, task2.loss_heatmap: 4.1184, task3.loss_xy: 0.1266, task3.loss_z: 0.1784, task3.loss_whl: 0.1542, task3.loss_yaw: 0.3046, task3.loss_vel: 0.0051, task3.loss_heatmap: 3.0206, task4.loss_xy: 0.1275, task4.loss_z: 0.1727, task4.loss_whl: 0.1413, task4.loss_yaw: 0.3051, task4.loss_vel: 0.0846, task4.loss_heatmap: 4.0120, task5.loss_xy: 0.1263, task5.loss_z: 0.1932, task5.loss_whl: 0.2001, task5.loss_yaw: 0.3066, task5.loss_vel: 0.0354, task5.loss_heatmap: 2.9454, loss: 25.5394, grad_norm: 15.3327
2022-07-18 19:00:04,427 - mmdet - INFO - Epoch [1][200/1931]	lr: 2.002e-04, eta: 1 day, 10:32:01, time: 2.209, data_time: 0.046, memory: 23265, task0.loss_xy: 0.1259, task0.loss_z: 0.2289, task0.loss_whl: 0.0648, task0.loss_yaw: 0.2985, task0.loss_vel: 0.0928, task0.loss_heatmap: 2.7089, task1.loss_xy: 0.1266, task1.loss_z: 0.2612, task1.loss_whl: 0.1694, task1.loss_yaw: 0.3001, task1.loss_vel: 0.0666, task1.loss_heatmap: 3.5405, task2.loss_xy: 0.1275, task2.loss_z: 0.2546, task2.loss_whl: 0.1593, task2.loss_yaw: 0.3038, task2.loss_vel: 0.1045, task2.loss_heatmap: 4.0661, task3.loss_xy: 0.1253, task3.loss_z: 0.1857, task3.loss_whl: 0.1474, task3.loss_yaw: 0.2997, task3.loss_vel: 0.0042, task3.loss_heatmap: 3.0315, task4.loss_xy: 0.1285, task4.loss_z: 0.1815, task4.loss_whl: 0.1393, task4.loss_yaw: 0.3059, task4.loss_vel: 0.0833, task4.loss_heatmap: 3.9987, task5.loss_xy: 0.1255, task5.loss_z: 0.1963, task5.loss_whl: 0.1911, task5.loss_yaw: 0.3041, task5.loss_vel: 0.0368, task5.loss_heatmap: 2.9208, loss: 25.4057, grad_norm: 14.5524
2022-07-18 19:01:55,960 - mmdet - INFO - Epoch [1][250/1931]	lr: 2.004e-04, eta: 1 day, 9:18:32, time: 2.231, data_time: 0.046, memory: 23265, task0.loss_xy: 0.1252, task0.loss_z: 0.2246, task0.loss_whl: 0.0643, task0.loss_yaw: 0.2988, task0.loss_vel: 0.0963, task0.loss_heatmap: 2.7047, task1.loss_xy: 0.1259, task1.loss_z: 0.2597, task1.loss_whl: 0.1703, task1.loss_yaw: 0.3025, task1.loss_vel: 0.0733, task1.loss_heatmap: 3.5218, task2.loss_xy: 0.1300, task2.loss_z: 0.2519, task2.loss_whl: 0.1580, task2.loss_yaw: 0.3054, task2.loss_vel: 0.1031, task2.loss_heatmap: 4.1050, task3.loss_xy: 0.1244, task3.loss_z: 0.1825, task3.loss_whl: 0.1500, task3.loss_yaw: 0.2989, task3.loss_vel: 0.0041, task3.loss_heatmap: 2.8970, task4.loss_xy: 0.1283, task4.loss_z: 0.1793, task4.loss_whl: 0.1368, task4.loss_yaw: 0.3060, task4.loss_vel: 0.0742, task4.loss_heatmap: 3.9614, task5.loss_xy: 0.1256, task5.loss_z: 0.1969, task5.loss_whl: 0.1891, task5.loss_yaw: 0.3061, task5.loss_vel: 0.0367, task5.loss_heatmap: 2.9075, loss: 25.2254, grad_norm: 13.1805
2022-07-18 19:03:46,359 - mmdet - INFO - Epoch [1][300/1931]	lr: 2.005e-04, eta: 1 day, 8:26:02, time: 2.208, data_time: 0.043, memory: 23265, task0.loss_xy: 0.1252, task0.loss_z: 0.2251, task0.loss_whl: 0.0642, task0.loss_yaw: 0.2987, task0.loss_vel: 0.0917, task0.loss_heatmap: 2.6897, task1.loss_xy: 0.1256, task1.loss_z: 0.2599, task1.loss_whl: 0.1693, task1.loss_yaw: 0.3003, task1.loss_vel: 0.0688, task1.loss_heatmap: 3.5091, task2.loss_xy: 0.1247, task2.loss_z: 0.2510, task2.loss_whl: 0.1617, task2.loss_yaw: 0.2970, task2.loss_vel: 0.1042, task2.loss_heatmap: 4.0313, task3.loss_xy: 0.1248, task3.loss_z: 0.1777, task3.loss_whl: 0.1520, task3.loss_yaw: 0.3020, task3.loss_vel: 0.0041, task3.loss_heatmap: 2.8762, task4.loss_xy: 0.1269, task4.loss_z: 0.1714, task4.loss_whl: 0.1345, task4.loss_yaw: 0.3025, task4.loss_vel: 0.0789, task4.loss_heatmap: 3.9629, task5.loss_xy: 0.1254, task5.loss_z: 0.1967, task5.loss_whl: 0.1903, task5.loss_yaw: 0.3057, task5.loss_vel: 0.0366, task5.loss_heatmap: 2.9230, loss: 25.0888, grad_norm: 13.3612
2022-07-18 19:05:36,911 - mmdet - INFO - Epoch [1][350/1931]	lr: 2.007e-04, eta: 1 day, 7:48:20, time: 2.211, data_time: 0.044, memory: 23265, task0.loss_xy: 0.1259, task0.loss_z: 0.2261, task0.loss_whl: 0.0640, task0.loss_yaw: 0.2994, task0.loss_vel: 0.0954, task0.loss_heatmap: 2.6892, task1.loss_xy: 0.1261, task1.loss_z: 0.2570, task1.loss_whl: 0.1697, task1.loss_yaw: 0.3015, task1.loss_vel: 0.0697, task1.loss_heatmap: 3.5118, task2.loss_xy: 0.1246, task2.loss_z: 0.2546, task2.loss_whl: 0.1592, task2.loss_yaw: 0.3028, task2.loss_vel: 0.1029, task2.loss_heatmap: 4.0432, task3.loss_xy: 0.1251, task3.loss_z: 0.1863, task3.loss_whl: 0.1519, task3.loss_yaw: 0.3024, task3.loss_vel: 0.0041, task3.loss_heatmap: 2.7846, task4.loss_xy: 0.1250, task4.loss_z: 0.1772, task4.loss_whl: 0.1340, task4.loss_yaw: 0.3047, task4.loss_vel: 0.0778, task4.loss_heatmap: 3.9504, task5.loss_xy: 0.1257, task5.loss_z: 0.1921, task5.loss_whl: 0.1871, task5.loss_yaw: 0.3066, task5.loss_vel: 0.0371, task5.loss_heatmap: 2.9013, loss: 24.9965, grad_norm: 11.6952
2022-07-18 19:07:27,161 - mmdet - INFO - Epoch [1][400/1931]	lr: 2.009e-04, eta: 1 day, 7:19:03, time: 2.205, data_time: 0.046, memory: 23265, task0.loss_xy: 0.1251, task0.loss_z: 0.2281, task0.loss_whl: 0.0647, task0.loss_yaw: 0.2981, task0.loss_vel: 0.0978, task0.loss_heatmap: 2.6814, task1.loss_xy: 0.1268, task1.loss_z: 0.2599, task1.loss_whl: 0.1701, task1.loss_yaw: 0.3009, task1.loss_vel: 0.0681, task1.loss_heatmap: 3.5153, task2.loss_xy: 0.1271, task2.loss_z: 0.2627, task2.loss_whl: 0.1586, task2.loss_yaw: 0.3031, task2.loss_vel: 0.1004, task2.loss_heatmap: 4.0954, task3.loss_xy: 0.1259, task3.loss_z: 0.1844, task3.loss_whl: 0.1533, task3.loss_yaw: 0.2998, task3.loss_vel: 0.0041, task3.loss_heatmap: 2.7527, task4.loss_xy: 0.1280, task4.loss_z: 0.1776, task4.loss_whl: 0.1376, task4.loss_yaw: 0.3045, task4.loss_vel: 0.0738, task4.loss_heatmap: 3.9593, task5.loss_xy: 0.1259, task5.loss_z: 0.1932, task5.loss_whl: 0.1867, task5.loss_yaw: 0.3057, task5.loss_vel: 0.0362, task5.loss_heatmap: 2.8725, loss: 25.0048, grad_norm: 11.8404
2022-07-18 19:09:17,147 - mmdet - INFO - Epoch [1][450/1931]	lr: 2.012e-04, eta: 1 day, 6:55:24, time: 2.200, data_time: 0.040, memory: 23265, task0.loss_xy: 0.1261, task0.loss_z: 0.2243, task0.loss_whl: 0.0647, task0.loss_yaw: 0.2981, task0.loss_vel: 0.0945, task0.loss_heatmap: 2.6934, task1.loss_xy: 0.1254, task1.loss_z: 0.2544, task1.loss_whl: 0.1711, task1.loss_yaw: 0.3005, task1.loss_vel: 0.0705, task1.loss_heatmap: 3.5251, task2.loss_xy: 0.1250, task2.loss_z: 0.2509, task2.loss_whl: 0.1648, task2.loss_yaw: 0.3018, task2.loss_vel: 0.1045, task2.loss_heatmap: 4.0741, task3.loss_xy: 0.1268, task3.loss_z: 0.1879, task3.loss_whl: 0.1537, task3.loss_yaw: 0.3011, task3.loss_vel: 0.0042, task3.loss_heatmap: 2.7686, task4.loss_xy: 0.1252, task4.loss_z: 0.1752, task4.loss_whl: 0.1351, task4.loss_yaw: 0.3031, task4.loss_vel: 0.0777, task4.loss_heatmap: 3.9659, task5.loss_xy: 0.1261, task5.loss_z: 0.1937, task5.loss_whl: 0.1901, task5.loss_yaw: 0.3048, task5.loss_vel: 0.0362, task5.loss_heatmap: 2.8905, loss: 25.0354, grad_norm: 11.9946
2022-07-18 19:11:07,773 - mmdet - INFO - Epoch [1][500/1931]	lr: 2.014e-04, eta: 1 day, 6:37:04, time: 2.212, data_time: 0.043, memory: 23265, task0.loss_xy: 0.1253, task0.loss_z: 0.2267, task0.loss_whl: 0.0643, task0.loss_yaw: 0.2982, task0.loss_vel: 0.0960, task0.loss_heatmap: 2.6811, task1.loss_xy: 0.1265, task1.loss_z: 0.2567, task1.loss_whl: 0.1698, task1.loss_yaw: 0.3013, task1.loss_vel: 0.0713, task1.loss_heatmap: 3.5068, task2.loss_xy: 0.1269, task2.loss_z: 0.2555, task2.loss_whl: 0.1569, task2.loss_yaw: 0.2997, task2.loss_vel: 0.1028, task2.loss_heatmap: 4.0546, task3.loss_xy: 0.1243, task3.loss_z: 0.1803, task3.loss_whl: 0.1468, task3.loss_yaw: 0.2983, task3.loss_vel: 0.0038, task3.loss_heatmap: 2.7066, task4.loss_xy: 0.1246, task4.loss_z: 0.1709, task4.loss_whl: 0.1354, task4.loss_yaw: 0.3036, task4.loss_vel: 0.0753, task4.loss_heatmap: 3.9158, task5.loss_xy: 0.1257, task5.loss_z: 0.1920, task5.loss_whl: 0.1893, task5.loss_yaw: 0.3050, task5.loss_vel: 0.0363, task5.loss_heatmap: 2.8840, loss: 24.8384, grad_norm: 11.3248
2022-07-18 19:13:01,873 - mmdet - INFO - Epoch [1][550/1931]	lr: 2.017e-04, eta: 1 day, 6:26:34, time: 2.282, data_time: 0.091, memory: 23265, task0.loss_xy: 0.1262, task0.loss_z: 0.2271, task0.loss_whl: 0.0648, task0.loss_yaw: 0.2987, task0.loss_vel: 0.0934, task0.loss_heatmap: 2.6767, task1.loss_xy: 0.1248, task1.loss_z: 0.2595, task1.loss_whl: 0.1696, task1.loss_yaw: 0.3019, task1.loss_vel: 0.0666, task1.loss_heatmap: 3.4998, task2.loss_xy: 0.1279, task2.loss_z: 0.2516, task2.loss_whl: 0.1545, task2.loss_yaw: 0.3011, task2.loss_vel: 0.1050, task2.loss_heatmap: 4.0685, task3.loss_xy: 0.1240, task3.loss_z: 0.1799, task3.loss_whl: 0.1474, task3.loss_yaw: 0.2976, task3.loss_vel: 0.0038, task3.loss_heatmap: 2.6423, task4.loss_xy: 0.1260, task4.loss_z: 0.1817, task4.loss_whl: 0.1323, task4.loss_yaw: 0.3031, task4.loss_vel: 0.0701, task4.loss_heatmap: 3.9359, task5.loss_xy: 0.1258, task5.loss_z: 0.1919, task5.loss_whl: 0.1865, task5.loss_yaw: 0.3059, task5.loss_vel: 0.0366, task5.loss_heatmap: 2.8677, loss: 24.7766, grad_norm: 11.7977

The loss saturates at about 25.0, while my non-accelerated version reaches a loss of 16.0 at iteration 550 :(

Can't run inference with the master code

Thanks for your error report and we appreciate it a lot.

Describe the bug
Core dump after running inference on around 4000 samples.

Reproduction

python tools/test.py ./configs/bevdet/bevdet-sttiny.py ./bevdet-sttiny-pure.pth --eval bbox

Environment

TorchVision: 0.9.1+cu111
OpenCV: 4.5.5
MMCV: 1.4.0
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.0
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.2+f0647e7

Incorporate BEVFusion's BEV Pooling operation into BEVDet

Hello, I have been trying to replace BEVDet's QuickCumSum operation with BEVFusion's BEV Pooling operation.
https://github.com/mit-han-lab/bevfusion/tree/main/mmdet3d/ops/bev_pool

To do so, I simply replaced

BEVDet/mmdet3d/models/necks/view_transformer.py, lines 164 to 169 in e2f4b40:

x, geom_feats = QuickCumsum.apply(x, geom_feats, ranks)
# griddify (B x C x Z x X x Y)
final = torch.zeros((B, C, nx[2], nx[1], nx[0]), device=x.device)
final[geom_feats[:, 3], :, geom_feats[:, 2], geom_feats[:, 1], geom_feats[:, 0]] = x
# collapse Z

with

x = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])

where bev_pool is BEVFusion's bev_pool CUDA operation.

However, although there is a significant speed-up, the loss does not decrease as expected (around 14 at the end of epoch 5, whereas it should be around 9.5).

Looking at the papers, they seem to be equivalent pooling operations, but I was hoping for some guidance in case I missed something.

Thank you!
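To check whether the two operators really produce the same grid, it may help to compare both against a slow but simple reference. The sketch below is plain PyTorch, not BEVDet's or BEVFusion's code; the function name is hypothetical, and it assumes geom_feats columns are (x, y, z, batch) as in the indexing line quoted above. It sums features per voxel (what the QuickCumsum trick computes) and writes the (B, C, Z, Y, X) layout that the replaced BEVDet lines produce. It is worth verifying, rather than assuming, that the CUDA op returns the same axis order: if its output came back as (B, C, Z, X, Y), it would need out.transpose(-2, -1) before Z is collapsed, and a transposed BEV map could plausibly explain a loss that falls but plateaus high.

```python
import torch


def bev_sum_pool_reference(x, geom_feats, B, nx):
    """Sum-pool frustum features into a dense BEV grid (reference sketch).

    x:          (N, C) float features, one row per kept frustum point.
    geom_feats: (N, 4) integer voxel indices, columns assumed (x, y, z, batch).
    nx:         grid sizes (nx_x, nx_y, nx_z).
    Returns:    (B, C, Z, Y, X), the layout written by
                final[geom_feats[:, 3], :, geom_feats[:, 2],
                      geom_feats[:, 1], geom_feats[:, 0]] = x
    """
    C = x.shape[1]
    gx, gy, gz, gb = geom_feats.long().unbind(dim=1)
    # Channels-last grid so that index_put_ only needs tensor indices
    grid = torch.zeros((B, nx[2], nx[1], nx[0], C), dtype=x.dtype, device=x.device)
    # accumulate=True sums every point that falls into the same voxel
    grid.index_put_((gb, gz, gy, gx), x, accumulate=True)
    return grid.permute(0, 4, 1, 2, 3).contiguous()


torch.manual_seed(0)
B, C, nx, N = 2, 3, (4, 5, 1), 50
geom = torch.stack([
    torch.randint(0, nx[0], (N,)),
    torch.randint(0, nx[1], (N,)),
    torch.randint(0, nx[2], (N,)),
    torch.randint(0, B, (N,)),
], dim=1)
feats = torch.randn(N, C)
ref = bev_sum_pool_reference(feats, geom, B, nx)
print(ref.shape)  # torch.Size([2, 3, 1, 5, 4])
# For another op's output `other`, compare (ref - other).abs().max() and,
# if that is large, (ref - other.transpose(-2, -1)).abs().max().
```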

Question about BEV Augmentation

According to the paper, the BEV augmentation is implemented by changing the features and the GT simultaneously:

In practice, the operations are conducted both on the output feature of the view transformer and the 3D object detection targets to keep their spatial consistency.

However, in the codebase I can only find BEV flip code in the RandomFlip3D class in mmdet3d/datasets/pipelines/transforms_3d.py, which changes the projection matrix and the GT simultaneously in the data loader.

So I am confused about the inconsistency between the paper and the code.
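One way to see why the two descriptions can agree: if the view transformer splats points using the image-to-LiDAR transform, then folding a flip into that transform (and applying the same flip to the GT boxes) in the data loader yields the same BEV map as flipping the BEV features themselves. The toy NumPy sketch below checks this for a flip that negates y. It counts points per cell instead of splatting features, assumes a grid symmetric about the ego origin, and the function name and grid sizes are hypothetical.

```python
import numpy as np


def splat_counts(points, n_cells, half_range):
    """Toy 'splat': count points per BEV cell on a square grid indexed [y, x].

    The grid covers [-half_range, half_range) on both axes, i.e. it is
    symmetric about the ego origin.
    """
    res = 2.0 * half_range / n_cells
    ix = np.floor((points[:, 0] + half_range) / res).astype(int)
    iy = np.floor((points[:, 1] + half_range) / res).astype(int)
    keep = (ix >= 0) & (ix < n_cells) & (iy >= 0) & (iy < n_cells)
    grid = np.zeros((n_cells, n_cells))
    np.add.at(grid, (iy[keep], ix[keep]), 1.0)
    return grid


# Flip that negates y, applied to the 3D geometry (equivalently, folded
# into the image-to-LiDAR matrix); GT boxes would get the same flip.
FLIP_Y = np.diag([1.0, -1.0, 1.0])

rng = np.random.default_rng(0)
pts = rng.uniform(-50.0, 50.0, size=(1000, 3))

flipped_in_loader = splat_counts(pts @ FLIP_Y.T, n_cells=128, half_range=51.2)
flipped_feature = splat_counts(pts, n_cells=128, half_range=51.2)[::-1, :]
print(np.array_equal(flipped_in_loader, flipped_feature))
```

Doing the flip in the data loader also keeps the GT and the geometry in one place, so no separate feature-space operation is needed in the model.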

Lower results when evaluating released BEVDet checkpoint

Hello, I tried to evaluate the released BEVDet checkpoint as-is on my setup, but I get:

mAP: 0.2751
mATE: 0.7179
mASE: 0.2738
mAOE: 0.5512
mAVE: 0.8747
mAAE: 0.2205
NDS: 0.3737
Eval time: 107.4s

Per-class results:
Object Class          AP     ATE    ASE    AOE    AVE    AAE
car                   0.441  0.631  0.167  0.131  1.037  0.254
truck                 0.197  0.757  0.225  0.125  0.828  0.227
bus                   0.283  0.680  0.185  0.139  1.895  0.350
trailer               0.132  1.053  0.224  0.463  0.547  0.068
construction_vehicle  0.066  0.795  0.484  1.174  0.095  0.358
pedestrian            0.301  0.788  0.305  1.320  0.848  0.412
motorcycle            0.235  0.704  0.262  0.612  1.437  0.090
bicycle               0.182  0.607  0.265  0.875  0.310  0.006
traffic_cone          0.445  0.616  0.333  nan    nan    nan
barrier               0.468  0.547  0.287  0.122  nan    nan

which is lower than the expected 30.8/40.4 mAP/NDS.

I am using A6000 GPUs, torch 1.10.1, cudatoolkit 11.3. Do you know what might be the issue?

I get exactly the same numbers as #15 (@BoLang615), although I believe I am using the latest version. I would appreciate any pointers.

Thank you!
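For reference, the nuScenes detection score combines mAP with the five mean true-positive errors as NDS = (5 * mAP + sum over TP metrics of (1 - min(1, error))) / 10. Recomputing it from the numbers above reproduces the reported 0.3737 and shows how much each term contributes; for example, the high mAVE (0.8747) adds only about 0.0125 to NDS. The helper name is hypothetical.

```python
def nuscenes_nds(mAP, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors
    (mATE, mASE, mAOE, mAVE, mAAE)."""
    # Each error is clipped at 1 and turned into a score in [0, 1]
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mAP + sum(tp_scores)) / 10.0


reported = {"mATE": 0.7179, "mASE": 0.2738, "mAOE": 0.5512,
            "mAVE": 0.8747, "mAAE": 0.2205}
print(round(nuscenes_nds(0.2751, reported), 4))  # 0.3737
```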

GPU memory consumption

Hi @HuangJunJie2017, thanks for sharing this wonderful work. I am trying to reproduce bevdet-sttiny, but I cannot train it with samples_per_gpu=8 on my 3090 GPUs because of OOM; the maximum samples_per_gpu that fits on my side is 6. My environment info is as follows:

Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.9.1+cu111
OpenCV: 4.5.5
MMCV: 1.3.13
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.14.0
MMSegmentation: 0.20.2
MMDetection3D: 0.17.2+f0647e7

Compared with your environment, I think the major difference is PyTorch 1.9 vs. 1.8. Could this be the reason for the OOM? Have you tried it on PyTorch 1.8?

Best,
Xuyang.

I found that the gt_bbox passed to the forward_train function is flipped compared with the gt_bbox output by CBGSDataset. Where is the code that performs this flip?

Data augmentation

When you resize the image from (900, 1600) to (256, 704), do you need to apply the same change to the bounding-box ground truth?
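In Lift-Splat-Shoot-style view transformers (BEVDet's ViewTransformerLiftSplatShoot appears to follow the same idea), image-space resizing and cropping are usually recorded as a 2D affine transform and inverted when frustum points are lifted back to 3D, so the 3D boxes in the LiDAR frame would not need to change. The sketch below illustrates this for a resize of the (900, 1600) image followed by a top crop to (256, 704); the scale 704/1600, the crop offset, and the helper name are illustrative assumptions, not values read from the codebase.

```python
import numpy as np


def resize_crop_matrix(scale, crop_x, crop_y):
    """3x3 homogeneous matrix mapping original pixels (u, v, 1) to augmented pixels."""
    return np.array([
        [scale, 0.0, -crop_x],
        [0.0, scale, -crop_y],
        [0.0, 0.0, 1.0],
    ])


scale = 704 / 1600                  # width 1600 -> 704, height 900 -> 396
crop_y = round(900 * scale) - 256   # drop the top 140 rows to reach 256
A = resize_crop_matrix(scale, 0.0, crop_y)

uv_orig = np.array([800.0, 600.0, 1.0])   # a pixel in the 900x1600 image
uv_aug = A @ uv_orig                      # where it lands in the 256x704 input
uv_back = np.linalg.solve(A, uv_aug)      # undo the augmentation before lifting
print(uv_aug[:2], uv_back[:2])
```

Because lifting works on the un-augmented pixel coordinates (through the camera intrinsics), the camera rays, and hence the 3D targets, stay the same; only 3D/BEV-space augmentations need to touch the boxes.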

When will the code be released? Thank you!

When will the code be open-sourced?

About View Transformer

Hello,
Thank you for your amazing work.

I have a question regarding the view transformer module.
So far, I have figured out that it uses the nuScenes dataset's sensor2lidar_rotation and sensor2lidar_translation, which are stored in the rots and trans variables, respectively.

My question is: what would be the corresponding rotation and translation if I want to use the KITTI dataset?
The KITTI dataset has the following matrices:
P0: camera 0 projection matrix after rectification, a 3x4 array

P1: camera 1 projection matrix after rectification, a 3x4 array

P2: camera 2 projection matrix after rectification, a 3x4 array

P3: camera 3 projection matrix after rectification, a 3x4 array

R0_rect: rectifying rotation matrix, a 4x4 array

Tr_velo_to_cam: transformation from Velodyne coordinates to camera coordinates, a 4x4 array

Tr_imu_to_velo: transformation from IMU coordinates to Velodyne coordinates, a 4x4 array

Which information should I use for trans and rots variables?

Thank you for your help.
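A minimal sketch, assuming the standard KITTI convention x_img ~ P2 @ R0_rect @ Tr_velo_to_cam @ x_velo for the left color camera (camera 2). Composing and inverting that chain gives a camera-to-LiDAR rotation and translation analogous to nuScenes' sensor2lidar_rotation and sensor2lidar_translation, and the intrinsics come from the left 3x3 block of P2. The helper names are hypothetical; R0_rect and Tr_velo_to_cam may be passed either in their raw 3x3/3x4 form or padded to 4x4.

```python
import numpy as np


def to_4x4(m):
    """Pad a 3x3 or 3x4 matrix to a 4x4 homogeneous transform (4x4 passes through)."""
    out = np.eye(4)
    out[: m.shape[0], : m.shape[1]] = m
    return out


def kitti_cam2_to_lidar(P2, R0_rect, Tr_velo_to_cam):
    """Camera-2-to-LiDAR rotation and translation, plus camera-2 intrinsics.

    Assumes P2 = K @ [I | t2], where t2 offsets the rectified camera-0 frame
    to camera 2. The result follows the sensor2lidar convention:
    x_lidar = rot @ x_cam + tran.
    """
    K = P2[:3, :3]
    t2 = np.linalg.solve(K, P2[:3, 3])
    rect_to_cam2 = np.eye(4)
    rect_to_cam2[:3, 3] = t2
    velo_to_cam2 = rect_to_cam2 @ to_4x4(R0_rect) @ to_4x4(Tr_velo_to_cam)
    cam2_to_velo = np.linalg.inv(velo_to_cam2)
    return cam2_to_velo[:3, :3], cam2_to_velo[:3, 3], K


# Usage with matrices read from a KITTI calib file:
# rot, tran, K = kitti_cam2_to_lidar(P2, R0_rect, Tr_velo_to_cam)
```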

Confusion about reproducing BEVDet4D with the ResNet-101 config

Referring to bevdet4d-sttiny.py, I want to reproduce the ResNet-101 setting pretrained by FCOS. I changed the image size to 928 * 1600 and use the last two features in the FPN fusion, but with a batch size of only 1 on my device (48 GB of memory), more than 32 GB is used. I already use FP16 in training, but each GPU can only afford a batch size of 2. Could you help me reproduce the batch size of 64 in your paper? Maybe there are some errors in my config file.
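The total batch size in data-parallel training is samples_per_gpu times the number of GPUs times the number of gradient-accumulation steps. A small sketch of that arithmetic; the 8-GPU count is a hypothetical example, not a statement about the paper's hardware, and the helper name is made up. With 2 samples per GPU on 8 GPUs, reaching 64 would take 4 accumulation steps.

```python
def accumulation_steps(target_batch, samples_per_gpu, num_gpus):
    """Gradient-accumulation steps needed to reach target_batch.

    Raises if the target is not an exact multiple of the per-step batch,
    since the effective batch would then silently differ from the target.
    """
    per_step = samples_per_gpu * num_gpus
    steps = target_batch // per_step
    if steps * per_step != target_batch:
        raise ValueError(f"{target_batch} is not a multiple of {per_step}")
    return steps


print(accumulation_steps(64, samples_per_gpu=2, num_gpus=8))  # 4
```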

NuScenesDataset: __init__() got an unexpected keyword argument 'img_info_prototype'

Hello, I got this error when using your *.pth checkpoint for model inference.

Exploration on Transformer-based Head

Hi, since plenty of models use transformer-based heads (BEVFormer, PolarFormer, PETR), I wonder whether you have tried one. I tried a transformer-based head with a Swin-T backbone, initialized from a pretrained BEVDet-T. The head is similar to the one in Object-DGCNN. However, the model does not seem to converge well (it ends up with 1.2 mAP), so I wonder whether you have made any attempts at this :)
Here is my train config:

_base_ = ['../_base_/datasets/nus-3d.py',
          '../_base_/schedules/cyclic_20e.py',
          '../_base_/default_runtime.py']
# Global
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
# For nuScenes we usually do 10-class detection
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]

data_config={
    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',
             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],
    'Ncams': 6,
    'input_size': (256, 704),
    'src_size': (900, 1600),

    # Augmentation
    'resize': (-0.06, 0.11),
    'rot': (-5.4, 5.4),
    'flip': True,
    'crop_h': (0.0, 0.0),
    'resize_test':0.04,
}

# Model
grid_config={
        'xbound': [-51.2, 51.2, 0.8],
        'ybound': [-51.2, 51.2, 0.8],
        'zbound': [-10.0, 10.0, 20.0],
        'dbound': [1.0, 60.0, 1.0],}

voxel_size = [0.1, 0.1, 0.2]

numC_Trans=64

model = dict(
    type='BEVDet',
    img_backbone=dict(
        type='SwinTransformer',
        pretrained='data/pretrain_models/swin_tiny_patch4_window7_224.pth',
        pretrain_img_size=224,
        embed_dims=96,
        patch_size=4,
        window_size=7,
        mlp_ratio=4,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        strides=(4, 2, 2, 2),
        out_indices=(2, 3,),
        qkv_bias=True,
        qk_scale=None,
        patch_norm=True,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.0,
        use_abs_pos_embed=False,
        act_cfg=dict(type='GELU'),
        norm_cfg=dict(type='LN', requires_grad=True),
        pretrain_style='official',
        output_missing_index_as_none=False),
    img_neck=dict(
        type='FPN_LSS',
        in_channels=384+768,
        out_channels=512,
        extra_upsample=None,
        input_feature_index=(0,1),
        scale_factor=2),
    img_view_transformer=dict(type='ViewTransformerLiftSplatShoot',
                              grid_config=grid_config,
                              data_config=data_config,
                              numC_Trans=numC_Trans),
    img_bev_encoder_backbone = dict(type='ResNetForBEVDet', numC_input=numC_Trans),
    img_bev_encoder_neck = dict(type='FPN_LSS',
                                in_channels=numC_Trans*8+numC_Trans*2,
                                out_channels=256),
    pts_bbox_head=dict(
        type='DGCNN3DHead',
        num_query=300,
        num_classes=10,
        in_channels=256,
        sync_cls_avg_factor=True,
        with_box_refine=True,
        as_two_stage=False,
        # share_conv_channel=256,
        # tasks=[
        #     dict(num_class=10, class_names=class_names),
        # ],
        transformer=dict(
            type='DeformableDetrTransformer',
            encoder=dict(
                type='DetrTransformerEncoder',
                num_layers=2,
                transformerlayers=dict(
                    type='BaseTransformerLayer',
                    attn_cfgs=dict(
                        type='MultiScaleDeformableAttention', embed_dims=256),
                    feedforward_channels=1024,
                    ffn_dropout=0.1,
                    operation_order=('self_attn', 'norm', 'ffn', 'norm'))),
            decoder=dict(
                type='Deformable3DDetrTransformerDecoder',
                num_layers=6,
                return_intermediate=True,
                transformerlayers=dict(
                    type='DetrTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(
                            type='MultiheadAttention',
                            embed_dims=256,
                            num_heads=8,
                            dropout=0.1),
                        dict(
                            type='MultiScaleDeformableAttention',
                            embed_dims=256)
                    ],
                    feedforward_channels=1024,
                    ffn_dropout=0.1,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm')))),
        bbox_coder=dict(
            type='NMSFreeCoder',
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            pc_range=point_cloud_range,
            max_num=300,
            voxel_size=voxel_size,
            num_classes=10), 
        positional_encoding=dict(
            type='SinePositionalEncoding',
            num_feats=128,
            normalize=True,
            offset=-0.5),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=0.5),
        loss_iou=dict(type='GIoULoss', loss_weight=0.0)), # For DETR compatibility. 
    # model training and testing settings
    train_cfg=dict(pts=dict(
        grid_size=[1024, 1024, 1],
        voxel_size=voxel_size,
        point_cloud_range=point_cloud_range,
        out_size_factor=8,
        assigner=dict(
            type='HungarianAssigner3D',
            cls_cost=dict(type='FocalLossCost', weight=2.0),
            reg_cost=dict(type='BBox3DL1Cost', weight=0.5),
            iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head. 
            pc_range=point_cloud_range))),
    test_cfg=dict(
        pts=dict(
            use_rotate_nms=True,
            nms_across_levels=True,
            nms_pre=1000,
            nms_thr=0.2,
            score_thr=0.05,
            min_bbox_size=0,
            max_num=100)
    ))



# Data
dataset_type = 'NuScenesDataset'
data_root = 'data/nuscenes/'
file_client_args = dict(backend='disk')


train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config),
    dict(
        type='LoadPointsFromFile',
        dummy=True,
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.3925, 0.3925],
        scale_ratio_range=[0.95, 1.05],
        translation_std=[0, 0, 0],
        update_img2lidar=True),
    dict(
        type='RandomFlip3D',
        sync_2d=False,
        flip_ratio_bev_horizontal=0.5,
        flip_ratio_bev_vertical=0.5,
        update_img2lidar=True),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['img_inputs', 'gt_bboxes_3d', 'gt_labels_3d'],
         meta_keys=('filename', 'ori_shape', 'img_shape', 'lidar2img',
                            'depth2img', 'cam2img', 'pad_shape',
                            'scale_factor', 'flip', 'pcd_horizontal_flip',
                            'pcd_vertical_flip', 'box_mode_3d', 'box_type_3d',
                            'img_norm_cfg', 'pcd_trans', 'sample_idx',
                            'pcd_scale_factor', 'pcd_rotation', 'pts_filename',
                            'transformation_3d_flow', 'img_info'))
]

test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config),
    # load lidar points for --show in test.py only
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points','img_inputs'])
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['img_inputs'])
]

input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)

data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type='CBGSDataset',
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'nuscenes_infos_train.pkl',
            pipeline=train_pipeline,
            classes=class_names,
            test_mode=False,
            use_valid_flag=True,
            modality=input_modality,
            load_interval=2,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR',
            img_info_prototype='bevdet')),
    val=dict(pipeline=test_pipeline, classes=class_names,
        modality=input_modality, img_info_prototype='bevdet'),
    test=dict(pipeline=test_pipeline, classes=class_names,
        modality=input_modality, img_info_prototype='bevdet'))

# Optimizer
lr_config = dict(
    policy='cyclic',
    target_ratio=(5, 1e-4),
    cyclic_times=1,
    step_ratio_up=0.4,
)

optimizer = dict(
    type='AdamW', 
    lr=2e-4,
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
            'img_neck': dict(lr_mult=0.1),
            'img_view_transformer': dict(lr_mult=0.1),
            'img_bev_encoder_backbone': dict(lr_mult=0.1),
            'img_bev_encoder_neck': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)
evaluation = dict(interval=6, pipeline=eval_pipeline)
load_from='/nfs/chenzehui/code/BEVDet/work_dirs/bevdet-sttiny/epoch_20.pth'
checkpoint_config = dict(interval=6)
total_epochs = 12
runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

After retraining, I found that the projected results are always shifted forward, whereas with the official model they are normal. Has the author encountered this?

Model training details for the nuScenes test split

Hi, could you please share some details regarding the training for the test split? With Swin-Small and an input size of 768 x 2112, how can you train the model on a 3090?
Looking forward to your reply.

Does bevdet-sttiny-accelerated.py need more GPU memory than bevdet-sttiny.py?

First of all, thank you for the open codebase.

sys.platform: linux
Python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) [GCC 9.4.0]
CUDA available: True
GPU 0,1: TITAN RTX
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0+cu102
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=s
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/roon -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBIn-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-paramet-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabe-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-TH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF,USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.0+cu102
OpenCV: 4.6.0
MMCV: 1.3.13
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.2+

I just use the base config with bevdet-sttiny-accelerated.py, and it raises a CUDA out-of-memory error.

FP16 Training

Hi, I was wondering if you have had any success with incorporating FP16?

I have run some experiments, but the large heatmap loss (>10k) at the beginning seems to make it difficult. Furthermore, training randomly produces NaNs a few hundred iterations in.

Also, what is '/mnt/cfs/algorithm/junjie.huang/models/resnet50-0676ba61.pth' in the recently released R50 config? Is it just the torchvision pretrained model that can be loaded via checkpoint='torchvision://resnet50'?
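For comparison, PyTorch's native mixed precision uses dynamic loss scaling: torch.cuda.amp.GradScaler skips the optimizer step whenever a scaled gradient contains inf or NaN and lowers the scale for the next iteration, instead of writing corrupted weights. The sketch below shows one training step with it. This is a swapped-in technique (native AMP, not mmcv's FP16 hook that BEVDet configs would use), the helper name and the clipping value max_norm=35.0 are arbitrary placeholders, and on a CPU-only machine autocast and the scaler are disabled, so it simply runs in full precision.

```python
import torch
from torch import nn


def amp_train_step(model, inputs, targets, loss_fn, optimizer, scaler, max_norm=35.0):
    """One mixed-precision training step with dynamic loss scaling (sketch)."""
    optimizer.zero_grad(set_to_none=True)
    # Run eligible ops in float16; a no-op when the scaler is disabled (CPU)
    with torch.cuda.amp.autocast(enabled=scaler.is_enabled()):
        loss = loss_fn(model(inputs), targets)
    # Multiply the loss by the current scale so small fp16 gradients do not underflow
    scaler.scale(loss).backward()
    # Unscale before clipping so max_norm applies to the true gradient magnitudes
    scaler.unscale_(optimizer)
    nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    # Skips the update if any gradient is inf/NaN, then adjusts the scale
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()


use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
x = torch.randn(16, 8, device=device)
y = torch.randn(16, 1, device=device)
loss = amp_train_step(model, x, y, nn.functional.mse_loss, optimizer, scaler)
print(float(loss))
```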

Question about the inference speed.

Hello, thanks for the great work!
I have found that the inference speeds reported in BEVDet and BEVDet4D seem to differ.
The inference speed of BEVDet (704*256) is about 7-8 FPS, while in BEVDet4D it is 15.6 FPS. What causes this inconsistency?
Did you measure inference speed with batch size 4 or batch size 1?
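A reported FPS depends on whether it counts batches or samples, on the batch size, and on warm-up and GPU synchronization. A small timing sketch that makes those choices explicit; the helper name and the warm-up/iteration counts are arbitrary assumptions, not the BEVDet benchmark script.

```python
import time

import torch
from torch import nn


@torch.no_grad()
def measure_fps(model, inputs, warmup=5, iters=20):
    """Samples per second for repeated forward passes on one fixed batch."""
    model.eval()
    batch_size = inputs.shape[0]
    # CUDA kernels run asynchronously, so wait for them before reading the clock
    sync = torch.cuda.synchronize if inputs.is_cuda else (lambda: None)
    for _ in range(warmup):  # exclude one-off costs such as lazy initialization
        model(inputs)
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        model(inputs)
    sync()
    elapsed = time.perf_counter() - start
    # Counts samples, not batches: one batch of 4 yields 4 samples
    return batch_size * iters / elapsed


net = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
for bs in (1, 4):
    print(bs, measure_fps(net, torch.randn(bs, 64)))
```

With a larger batch, each forward pass yields several samples, so samples per second can be noticeably higher than at batch size 1 on the same GPU; numbers measured under different batch sizes are therefore not directly comparable.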

Questions about model training

Great work!
Regarding model training, I tried to use a CenterPoint head for 3D detection in BEV space with image inputs. The training loss looks fine to me; however, the resulting AP is 0. I have also tried your training configurations, such as the optimizer type and schedule, but the resulting AP is still near 0, which makes no sense.
It would be great if you could share some insights.

Problem when testing

I prepared the environment and the nuScenes dataset according to the guide. However, when running test.py, I ran into this problem.
Could you please help me find what is wrong? Thanks!

Question about BEVDET4D

Hi author,

In the extract_img_feat function of BEVDetSequentialES (BEVDet/mmdet3d/models/detectors/bevdet.py, line 300 in f0647e7: tran = trans[0]), why are trans[0] and rots[0] used for both frames? Shouldn't we use trans[0] for the current frame and trans[1] for the adjacent frame?

Thanks a lot!

Collecting reproducing results of BEVDet-Tiny

We observe variance when reproducing BEVDet-Tiny. It seems to stem from unstable velocity prediction. We are collecting more reproduction results in this issue.

Logfile for BEVDet

Hello, would it be possible to get a Google Drive mirror for the log files? I am unable to access them through the Baidu link.

Thank you for your work

How to use your pretrained model for inference on the nuScenes mini dataset?

Many thanks for your great work!
I want to use your pretrained model (bevdet-sttiny-pure.pth) to run inference on the nuScenes mini dataset, using the following command:
python tools/test.py configs/bevdet/bevdet-sttiny.py pretrained_models/bevdet-sttiny-pure.pth --show --show-dir infer_res/

But I encounter an error:
BEVDet-master/mmdet3d/models/detectors/mvx_two_stage.py", line 466, in show_results
if isinstance(data['points'][0], DC):
KeyError: 'points'

I checked, and there are only two keys in data, dict_keys(['img_metas', 'img_inputs']), with no 'points' key.
Could you please give me some advice on how to do inference?
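Since show_results reads data['points'], it matters which keys the test pipeline actually collects. A small hypothetical helper that lists the keys gathered by every Collect3D step, including steps nested under 'transforms' (as in MultiScaleFlipAug3D); loading the config with mmcv's Config.fromfile is shown only as a usage example. If 'points' does not appear, the pipeline used for --show presumably lacks LoadPointsFromFile and 'points' in Collect3D.

```python
def collected_keys(pipeline):
    """Return the data keys gathered by every Collect3D step in a pipeline.

    Recurses into wrappers such as MultiScaleFlipAug3D that nest their
    steps under 'transforms'.
    """
    keys = []
    for step in pipeline:
        if step.get("type") == "Collect3D":
            keys.extend(step.get("keys", []))
        keys.extend(collected_keys(step.get("transforms", [])))
    return keys


# Usage with an mmcv 1.x config:
# from mmcv import Config
# cfg = Config.fromfile("configs/bevdet/bevdet-sttiny.py")
# print(collected_keys(cfg.data.test.pipeline))
```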

Questions on how to train and do inference on Waymo perception dataset

Hi, Junjie
Thanks for your great work, and sorry to bother you. I have some questions about how to train and run inference on the Waymo perception dataset. Does the code support this? If so, do I just need to follow BEVDet/docs/data_preparation.md, convert the Waymo dataset into KITTI format, and train on it? Or should I modify the existing code? Could you give me some detailed instructions?

Error on training and testing

I have tried running training and testing within the Docker environment provided in the repo. However, some of the versions in the Docker image were not compatible, so I had to change them. Has anyone had the same error and could give me a hand?

For training I use:
python tools/train.py configs/bevdet/bevdet-sttiny.py
I then get:

Traceback (most recent call last):
  File "tools/train.py", line 224, in <module>
    main()
  File "tools/train.py", line 183, in main
    test_cfg=cfg.get('test_cfg'))
  File "/mmdetection3d/mmdet3d/models/builder.py", line 84, in build_model
    return build_detector(cfg, train_cfg=train_cfg, test_cfg=test_cfg)
  File "/mmdetection3d/mmdet3d/models/builder.py", line 58, in build_detector
    cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
KeyError: 'BEVDet is not in the models registry'

For testing I use:
python tools/test.py configs/bevdet/bevdet-sttiny.py bevdet-sttiny-pure.pth --show --show-dir ./tmp/

I get the following error:
Traceback (most recent call last):
  File "tools/test.py", line 226, in <module>
    main()
  File "tools/test.py", line 164, in main
    dataset = build_dataset(cfg.data.test)
  File "/mmdetection3d/mmdet3d/datasets/builder.py", line 41, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: NuScenesDataset: __init__() got an unexpected keyword argument 'img_info_prototype'

I am using the following versions:
TorchVision: 0.7.0
OpenCV: 4.6.0
MMCV: 1.3.13
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.2+
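Both tracebacks end in the config-driven builder: the `type` key of a config dict is looked up in a registry, a `KeyError` is raised if no class was registered under that name, and any exception raised by the constructor is re-raised prefixed with the class name. The toy registry below imitates that flow so both messages can be reproduced. It is a simplified sketch, not mmcv's actual `Registry`; the `NuScenesDataset` stand-in and the config values are hypothetical. Because a class only enters a registry when the module defining it is imported, and both tracebacks run through `/mmdetection3d/mmdet3d/...`, one possibility worth checking is that the container imports an mmdetection3d checkout that lacks BEVDet's detector and dataset changes.

```python
from typing import Any, Callable, Dict, Type


class Registry:
    """Toy name -> class registry imitating mmcv-style config builders (simplified sketch)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._module_dict: Dict[str, Type] = {}

    def register_module(self) -> Callable[[Type], Type]:
        # The class is recorded only when this decorator runs,
        # i.e. only when the module that defines the class is imported.
        def _register(cls: Type) -> Type:
            self._module_dict[cls.__name__] = cls
            return cls
        return _register

    def build(self, cfg: Dict[str, Any], **default_args: Any) -> Any:
        args = dict(cfg)
        obj_type = args.pop("type")
        if obj_type not in self._module_dict:
            # Same shape as: KeyError: 'BEVDet is not in the models registry'
            raise KeyError(f"{obj_type} is not in the {self.name} registry")
        for key, value in default_args.items():
            args.setdefault(key, value)
        obj_cls = self._module_dict[obj_type]
        try:
            # Every remaining config key becomes a constructor keyword argument.
            return obj_cls(**args)
        except Exception as e:
            # Same shape as: TypeError: NuScenesDataset: __init__() got an unexpected keyword argument ...
            raise type(e)(f"{obj_cls.__name__}: {e}")


MODELS = Registry("models")
DATASETS = Registry("dataset")


@DATASETS.register_module()
class NuScenesDataset:
    # Hypothetical stand-in for a dataset class that does not accept BEVDet's extra key.
    def __init__(self, ann_file: str) -> None:
        self.ann_file = ann_file


if __name__ == "__main__":
    try:
        MODELS.build({"type": "BEVDet"})  # nothing registered under "BEVDet"
    except KeyError as e:
        print(e)
    try:
        DATASETS.build({"type": "NuScenesDataset", "ann_file": "x.pkl",
                        "img_info_prototype": "bevdet"})  # unknown keyword argument
    except TypeError as e:
        print(e)
```

Registering a `BEVDet` class before calling `MODELS.build` makes the first lookup succeed, and building the dataset without the unknown key makes the second one succeed, which is why both errors point at which package's classes are actually being imported.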

questions about DETR3D inference speed

In the description of paper, "For monocular paradigms like FCOS3D and PGD, the inference speeds are divided by a factor of 6, as they take each image as an independent sample."

Is the DETR3D result in Table 2 (2.0 FPS) also divided by 6, or should it be 12 FPS (2 × 6)?
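The normalization described in the quoted sentence converts a per-image speed into a per-sample speed by dividing by the number of camera images in one sample (six on nuScenes). A minimal sketch of the conversion in both directions; the function names are hypothetical and the code does not settle which convention Table 2 applies to DETR3D.

```python
NUM_CAMERAS = 6  # nuScenes surround-view camera images per sample


def per_sample_fps(per_image_fps: float, num_cameras: int = NUM_CAMERAS) -> float:
    """Samples per second for a model that processes each camera image independently."""
    return per_image_fps / num_cameras


def per_image_fps(per_sample: float, num_cameras: int = NUM_CAMERAS) -> float:
    """Inverse conversion: undo the division by the number of cameras."""
    return per_sample * num_cameras


if __name__ == "__main__":
    print(per_sample_fps(12.0))
    print(per_image_fps(2.0))
```

Applying the inverse to the reported 2.0 FPS gives the 12 FPS figure raised in the question; whether that inverse is appropriate depends on whether DETR3D's number was normalized at all.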

MMCV mmdet3d version issue

With the current Docker image, the library versions are incompatible with each other:

Reproduction

docker build -t mmdetection3d docker/ --no-cache

Then:

docker run --gpus all --shm-size=8g -it -v /mnt/nas/experiments/3D_restoration:/mmdetection3d/data mmdetection3d

What command or script did you run?

python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes

  File "tools/create_data.py", line 6, in <module>
    from tools.data_converter import kitti_converter as kitti
  File "/mmdetection3d/tools/data_converter/kitti_converter.py", line 9, in <module>
    from mmdet3d.core.bbox import box_np_ops, points_cam2img
  File "/mmdetection3d/mmdet3d/__init__.py", line 5, in <module>
    import mmseg
  File "/opt/conda/lib/python3.7/site-packages/mmseg/__init__.py", line 59, in <module>
    f'MMCV=={mmcv.__version__} is used but incompatible. ' \
AssertionError: MMCV==1.3.8 is used but incompatible. Please install mmcv>=(1, 3, 13, 0, 0, 0), <=(1, 4, 0, 0, 0, 0).

However, if I install a newer version of MMCV, a similar error appears for a different library.
None of the workarounds I have found so far solve it.

System:
Ubuntu 20.04
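The AssertionError is a version-range check that MMSegmentation performs at import time: the installed MMCV version, padded into a tuple, must lie between the two bounds printed in the message. A minimal stdlib sketch of that kind of check, assuming plain dotted numeric versions; `version_tuple` and `mmcv_is_compatible` are hypothetical helpers, a simplified stand-in for the comparison behind the message rather than MMSegmentation's actual code.

```python
import re
from typing import Tuple

# Bounds copied from the AssertionError above (inclusive on both ends).
MIN_MMCV = "1.3.13"
MAX_MMCV = "1.4.0"


def version_tuple(version: str, length: int = 6) -> Tuple[int, ...]:
    """Turn '1.3.8' into (1, 3, 8, 0, 0, 0) so versions compare as tuples.

    Simplification: only the leading digits of each dot-separated component
    are kept, so suffixes such as '+' or 'rc1' are ignored.
    """
    parts = []
    for component in version.split("."):
        match = re.match(r"\d+", component)
        parts.append(int(match.group()) if match else 0)
    parts = parts[:length]
    return tuple(parts + [0] * (length - len(parts)))


def mmcv_is_compatible(mmcv_version: str, low: str = MIN_MMCV, high: str = MAX_MMCV) -> bool:
    """Inclusive range check: low <= mmcv_version <= high."""
    return version_tuple(low) <= version_tuple(mmcv_version) <= version_tuple(high)


if __name__ == "__main__":
    for v in ("1.3.8", "1.3.13", "1.4.0", "1.4.1"):
        print(v, mmcv_is_compatible(v))
```

Passing the container's `mmcv.__version__` (1.3.8 in this report) returns False, matching the assertion. Running the same kind of check for each library pair before rebuilding the image shows which upgrade triggers the next conflict.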

	x, geom_feats = QuickCumsum.apply(x, geom_feats, ranks)

	# griddify (B x C x Z x X x Y)
	final = torch.zeros((B, C, nx[2], nx[1], nx[0]), device=x.device)
	final[geom_feats[:, 3], :, geom_feats[:, 2], geom_feats[:, 1], geom_feats[:, 0]] = x
	# collapse Z
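The fragment above sum-pools frustum features that fall into the same voxel (`QuickCumsum`) and then scatters them into a dense BEV grid. The self-contained toy below walks through both steps under stated assumptions: `geom_feats` rows are `(x, y, z, batch)` integer voxel indices, `nx` holds the grid size in `(X, Y, Z)` order, and rows are sorted by voxel rank before pooling. `cumsum_trick` is modeled on the plain cumsum trick in the Lift-Splat-Shoot reference code, which `QuickCumsum` wraps as an autograd function with a custom backward; `voxel_ranks`, `griddify`, and the demo tensors are hypothetical illustrations.

```python
import torch


def voxel_ranks(geom_feats: torch.Tensor, B: int, nx) -> torch.Tensor:
    # Mixed-radix id: rows with equal rank lie in the same (x, y, z, batch) voxel.
    return (geom_feats[:, 0] * (nx[1] * nx[2] * B)
            + geom_feats[:, 1] * (nx[2] * B)
            + geom_feats[:, 2] * B
            + geom_feats[:, 3])


def cumsum_trick(x: torch.Tensor, geom_feats: torch.Tensor, ranks: torch.Tensor):
    # Sum features per voxel; requires rows sorted by rank.
    x = x.cumsum(0)
    # Keep only the last row of every run of equal ranks.
    kept = torch.ones(x.shape[0], device=x.device, dtype=torch.bool)
    kept[:-1] = ranks[1:] != ranks[:-1]
    x, geom_feats = x[kept], geom_feats[kept]
    # Differences of consecutive running totals are the per-voxel sums.
    x = torch.cat((x[:1], x[1:] - x[:-1]))
    return x, geom_feats


def griddify(x: torch.Tensor, geom_feats: torch.Tensor, B: int, nx) -> torch.Tensor:
    C = x.shape[1]
    # Dense grid laid out as B x C x Z x Y x X when nx = (X, Y, Z).
    final = torch.zeros((B, C, nx[2], nx[1], nx[0]), dtype=x.dtype, device=x.device)
    # The index tensors are split by the ':' slice, so the N axis moves to the
    # front and the indexed view has shape (N, C), matching x row for row.
    final[geom_feats[:, 3], :, geom_feats[:, 2], geom_feats[:, 1], geom_feats[:, 0]] = x
    # Collapse Z by concatenating the Z slices along channels: B x (C*Z) x Y x X.
    return torch.cat(final.unbind(dim=2), 1)


if __name__ == "__main__":
    B, nx = 2, (4, 3, 2)  # batch size and (X, Y, Z) grid size
    geom = torch.tensor([[0, 0, 0, 0], [3, 2, 1, 1], [0, 0, 0, 0]])
    feats = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    ranks = voxel_ranks(geom, B, nx)
    order = ranks.argsort()
    pooled, voxels = cumsum_trick(feats[order], geom[order], ranks[order])
    bev = griddify(pooled, voxels, B, nx)
    print(bev.shape)  # torch.Size([2, 4, 3, 4])
    print(bev[0, :2, 0, 0])  # tensor([6., 8.]): the two points in voxel (0, 0, 0, 0), summed
```

Under the `(X, Y, Z)` assumption for `nx`, `torch.zeros((B, C, nx[2], nx[1], nx[0]))` is laid out B x C x Z x Y x X, so the `(B x C x Z x X x Y)` comment in the fragment would not describe that ordering; which one is intended depends on how `nx` is defined in the surrounding code.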

with acceleration

without acceleration
