huangjunjie2017 / bevdet
Official code base of the BEVDet series.
License: Apache License 2.0
Hello, thanks for sharing the paper. Since you use BEV space for the final detection, I am wondering whether the output of your network is in BEV bin images (the same as Lift-Splat-Shoot). If so, when you run the evaluation, do you need to project the BEV outputs back to each image?
Hi. When I retrain BEVDet-sttiny with the config file bevdet-sttiny.py for 20 epochs, I get the following result on the val set:
mAP: 0.3049
mATE: 0.6762
mASE: 0.2743
mAOE: 0.5235
mAVE: 0.9782
mAAE: 0.2653
NDS: 0.3807
Eval time: 103.4s
Per-class results:
Object Class          AP     ATE    ASE    AOE    AVE    AAE
car                   0.509  0.539  0.160  0.118  0.946  0.227
truck                 0.209  0.677  0.219  0.113  0.955  0.222
bus                   0.326  0.677  0.187  0.087  2.124  0.468
trailer               0.157  1.020  0.231  0.375  1.061  0.173
construction_vehicle  0.069  0.829  0.484  1.126  0.096  0.375
pedestrian            0.333  0.750  0.305  1.360  0.882  0.544
motorcycle            0.243  0.729  0.265  0.618  1.544  0.110
bicycle               0.198  0.543  0.281  0.794  0.217  0.002
traffic_cone          0.501  0.512  0.324  nan    nan    nan
barrier               0.503  0.486  0.286  0.120  nan    nan
The NDS seems lower than that of the model you provided (38.1 vs. 40.4).
More information can be found in the log file (http://www.junbin.xyz/fileURL/20220607_222110.log). Can you help me? Thanks.
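For readers comparing such numbers, the summary metrics can be cross-checked against the nuScenes detection score definition, NDS = (5 * mAP + sum over the five TP errors of (1 - min(1, error))) / 10. The following is a minimal sketch of that formula; the function name `nuscenes_nds` is made up for illustration and is not part of the nuScenes devkit.

```python
def nuscenes_nds(mAP, mATE, mASE, mAOE, mAVE, mAAE):
    """NDS = (5 * mAP + sum(1 - min(1, err))) / 10 over the five TP errors."""
    tp_errors = [mATE, mASE, mAOE, mAVE, mAAE]
    # Each error is clipped at 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mAP + sum(tp_scores)) / 10.0


# Summary metrics reported above for the retrained BEVDet-sttiny model.
retrained = nuscenes_nds(0.3049, 0.6762, 0.2743, 0.5235, 0.9782, 0.2653)
print(round(retrained, 4))  # 0.3807
```

The result matches the reported NDS of 0.3807, so the gap to the released model comes from the individual metrics rather than from how the score is aggregated.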
Hi @HuangJunJie2017 thanks for sharing this wonderful work. Could you please share the config file of BEVDet-base and BEVDet4D-base?
Great piece of work. I have a question about the FPS stated in the README file. I downloaded the checkpoints and used the commands below.
python tools/analysis_tools/benchmark.py configs/bevdet/bevdet-sttiny-accelerated.py $checkpoint
python tools/analysis_tools/benchmark.py configs/bevdet/bevdet-sttiny.py $checkpoint
For BEVDet4D-tiny, I can only get 3.6 FPS on an A100 GPU. For BEVDet-tiny with acceleration, I can only get 12 FPS. Both results are lower than the numbers listed in the README/paper. Did I do anything wrong?
Thanks for your great work.
I want to draw the 3D detections on the multi-camera images or in BEV format.
Are there any scripts that meet my requirements?
I tried test.py, but it failed with errors.
Hi, I have some self-prepared images and camera parameters. How can I run inference on them using your pretrained model?
I could not find a 'bev_demo' Python file under the /demo folder. Could you please give me some advice?
Hi Junjie,
I appreciate your excellent work. I am trying to evaluate the provided BEVDet-Tiny model on the nuScenes val set.
The command was "bash ./tools/dist_test.sh configs/bevdet/bevdet-sttiny.py checkpoints/bevdet-sttiny-pure.pth 1 --eval bbox --out ./workdirs/bevdey-sttiny-eval-results.pkl", and I got the following results:
mAP: 0.2751
mATE: 0.7179
mASE: 0.2738
mAOE: 0.5512
mAVE: 0.8749
mAAE: 0.2206
NDS: 0.3737
Eval time: 120.9s
Per-class results:
Object Class          AP     ATE    ASE    AOE    AVE    AAE
car                   0.441  0.631  0.167  0.131  1.037  0.254
truck                 0.197  0.757  0.225  0.125  0.828  0.227
bus                   0.283  0.680  0.185  0.139  1.895  0.350
trailer               0.132  1.053  0.224  0.463  0.547  0.068
construction_vehicle  0.066  0.795  0.484  1.174  0.095  0.358
pedestrian            0.301  0.788  0.305  1.320  0.848  0.412
motorcycle            0.235  0.704  0.262  0.612  1.438  0.090
bicycle               0.182  0.607  0.265  0.875  0.310  0.006
traffic_cone          0.445  0.616  0.333  nan    nan    nan
barrier               0.468  0.547  0.287  0.122  nan    nan
I am wondering why I could not get the reported mAP value (30.8). Did I miss something here?
Thank you.
The paper says that the network predicts a position shift that is independent of the ego-motion, but the position shift seems to be given as input. As far as I know, the alignment matrix is provided by the dataset pipeline.
May I ask in which part the network predicts the position shift?
Hi, could you guys release the detection results of the validation set in the standard .json format? I wish to perform some stat analysis over the results. Thanks.
Hi,
I tried to visualize the output on the nuScenes mini dataset using the command:
python tools/test.py configs/bevdet4d/bevdet4d-sttiny.py checkpoints/bevdet4d-sttiny-pure.pth --eval 'mAP' --eval-options 'show=False' 'out_dir=/xxx/project/BEVDet/result'
The model outputs the eval metrics, but nothing is saved in the result folder. I find visualizing the output a bit confusing. Can you help?
Hello, thanks for your great work.
I encountered an error training bevdet4d-sttiny:
TypeError: 'module' object is not callable
bev_feat = self.img_view_transformer.voxel_pooling(geom, volume)
File "/ws/BEVDet/mmdet3d/models/necks/view_transformer.py", line 165, in voxel_pooling
self.nx[1])
I looked into the code and changed:
final = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])
to
final = bev_pool.bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])
With the modification, training starts without error, but I don't know whether it is correct.
My env:
Python: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0]
GCC: gcc (GCC) 7.3.0
PyTorch: 1.9.0
TorchVision: 0.10.0
OpenCV: 4.6.0
MMCV: 1.3.13
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.2+196fed9
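The message `'module' object is not callable` is what Python raises when a name bound to a module is called as if it were a function. The sketch below reproduces it with the standard-library `math` module as a stand-in. Whether `bev_pool` is bound to a module in a given checkout depends on how the import is written there, which is an assumption that would need checking against the installed code.

```python
import math

# Calling a module object reproduces the error message from the report.
try:
    math(2.0)  # 'math' is a module, not a function
except TypeError as exc:
    message = str(exc)
    print(message)

# Calling the function that lives inside the module works as expected.
value = math.sqrt(4.0)
print(value)
```

If the name really refers to a module that contains a function of the same name, then calling `bev_pool.bev_pool(...)` is the matching fix, just as `math.sqrt` is here.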
Hi,
I was trying to run inference on the test data under the demo/data folder using the BEVDet pretrained model, and it fails as follows:
`(bevdet) qianch@sv2-px:/mnt/data10/qianch/project/BEVDet$ python demo/mono_det_demo.py demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525.jpg demo/data/nuscenes/n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525_mono3d.coco.json configs/bevdet/bevdet-sttiny.py checkpoints/bevdet4d-sttiny-pure.pth
/mnt/data10/qianch/project/BEVDet/mmdet3d/models/backbones/swin.py:622: UserWarning: DeprecationWarning: pretrained is a deprecated, please use "init_cfg" instead
warnings.warn('DeprecationWarning: pretrained is a deprecated, '
load checkpoint from local path: checkpoints/bevdet4d-sttiny-pure.pth
The model and loaded state dict do not match exactly
size mismatch for img_bev_encoder_backbone.layers.0.0.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for img_bev_encoder_backbone.layers.0.0.downsample.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
unexpected key in source state_dict: pre_process_net.layers.0.0.conv1.weight, pre_process_net.layers.0.0.bn1.weight, pre_process_net.layers.0.0.bn1.bias, pre_process_net.layers.0.0.bn1.running_mean, pre_process_net.layers.0.0.bn1.running_var, pre_process_net.layers.0.0.bn1.num_batches_tracked, pre_process_net.layers.0.0.conv2.weight, pre_process_net.layers.0.0.bn2.weight, pre_process_net.layers.0.0.bn2.bias, pre_process_net.layers.0.0.bn2.running_mean, pre_process_net.layers.0.0.bn2.running_var, pre_process_net.layers.0.0.bn2.num_batches_tracked, pre_process_net.layers.0.0.downsample.weight, pre_process_net.layers.0.0.downsample.bias, pre_process_net.layers.0.1.conv1.weight, pre_process_net.layers.0.1.bn1.weight, pre_process_net.layers.0.1.bn1.bias, pre_process_net.layers.0.1.bn1.running_mean, pre_process_net.layers.0.1.bn1.running_var, pre_process_net.layers.0.1.bn1.num_batches_tracked, pre_process_net.layers.0.1.conv2.weight, pre_process_net.layers.0.1.bn2.weight, pre_process_net.layers.0.1.bn2.bias, pre_process_net.layers.0.1.bn2.running_mean, pre_process_net.layers.0.1.bn2.running_var, pre_process_net.layers.0.1.bn2.num_batches_tracked
Traceback (most recent call last):
File "demo/mono_det_demo.py", line 46, in
main()
File "demo/mono_det_demo.py", line 31, in main
model = init_model(args.config, args.checkpoint, device=args.device)
File "/mnt/data10/qianch/project/BEVDet/mmdet3d/apis/inference.py", line 61, in init_model
if 'CLASSES' in checkpoint['meta']:
KeyError: 'meta'`
However, when I used the demo provided by the official MMDetection3D with the FCOS configs, it gave the correct result.
Can you help?
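Two things stand out in the log above. The command pairs configs/bevdet/bevdet-sttiny.py with checkpoints/bevdet4d-sttiny-pure.pth, which may explain the size mismatches and unexpected keys. The final `KeyError: 'meta'` means the loaded checkpoint dictionary has no 'meta' entry. The sketch below shows a defensive lookup on plain dictionaries. `read_classes` is a hypothetical helper written for illustration, not the code in mmdet3d/apis/inference.py.

```python
def read_classes(checkpoint, default=None):
    """Return checkpoint['meta']['CLASSES'] if present, else `default`."""
    # dict.get avoids the KeyError raised by checkpoint['meta'].
    meta = checkpoint.get('meta', {})
    return meta.get('CLASSES', default)


# A checkpoint that only carries weights (no 'meta' key).
stripped = {'state_dict': {}}
# A checkpoint that also carries metadata.
full = {'state_dict': {}, 'meta': {'CLASSES': ('car', 'truck')}}

print(read_classes(stripped))  # None
print(read_classes(full))      # ('car', 'truck')
```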
Checklist
Describe the bug
The code runs fine on a single GPU with the following command:
python tools/train.py configs/bevdet/bevdet-sttiny.py
However, when I switch to distributed training:
./tools/dist_train.sh configs/bevdet/bevdet-sttiny.py 8
the program throws the following error:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons:
1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes
2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
3) Incorrect unused parameter detection. The return value of the `forward` function is inspected by the distributed data parallel wrapper to figure out if any of the module's parameters went unused. For unused parameters, DDP would not expect gradients from then. However, if an unused parameter becomes part of the autograd graph at a later point in time (e.g., in a reentrant backward when using `checkpoint`), the gradient will show up unexpectedly. If all parameters in the model participate in the backward pass, you can disable unused parameter detection by passing the keyword argument `find_unused_parameters=False` to `torch.nn.parallel.DistributedDataParallel`.
I've read the error message and found a similar post, but its suggested solution is to switch to find_unused_parameters=False. However, I have manually checked this argument in mmdetection, and it is already set to False by default.
Reproduction
./tools/dist_train.sh configs/bevdet/bevdet-sttiny.py 8
Did you make any modifications on the code or config? Did you understand what you have modified?
The only modification I've made is deleting the Swin Transformer registered in mmdetection before the definition of this repo's self-implemented Swin Transformer.
Specifically, I've inserted del BACKBONES._module_dict['SwinTransformer'] before that definition.
Otherwise, MMCV throws an error because of the duplicate model definition.
What dataset did you use?
nuScenes
Environment
Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA TITAN RTX
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.9.0
OpenCV: 4.6.0
MMCV: 1.3.18
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.17.0
MMSegmentation: 0.18.0
MMDetection3D: 0.17.2+
Error traceback
Traceback (most recent call last):
File "./tools/train.py", line 224, in <module>
main()
File "./tools/train.py", line 213, in main
train_model(
File "/home/users/Code/BEVDet/mmdet3d/apis/train.py", line 28, in train_model
train_detector(
File "/opt/conda/lib/python3.8/site-packages/mmdet/apis/train.py", line 174, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
self.call_hook('after_train_iter')
File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
getattr(hook, fn_name)(self)
File "/opt/conda/lib/python3.8/site-packages/mmcv/runner/hooks/optimizer.py", line 35, in after_train_iter
runner.outputs['loss'].backward()
File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
Variable._execution_engine.run_backward(
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
return self._forward_cls.backward(self, *args) # type: ignore
File "/opt/conda/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 112, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
Variable._execution_engine.run_backward(
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.3) Incorrect unused parameter detection. The return value of the `forward` function is inspected by the distributed data parallel wrapper to figure out if any of the module's parameters went unused. For unused parameters, DDP would not expect gradients from then. However, if an unused parameter becomes part of the autograd graph at a later point in time (e.g., in a reentrant backward when using `checkpoint`), the gradient will show up unexpectedly. If all parameters in the model participate in the backward pass, you can disable unused parameter detection by passing the keyword argument `find_unused_parameters=False` to `torch.nn.parallel.DistributedDataParallel`.
Thanks for your error report and we appreciate it a lot.
Sorry to bother you again. I encountered the error "TypeError: save_for_backward can only save variables, but argument 1 is of type tuple", detailed below:
File "..BEVDet/tools/train_debug.py", line 217, in main
train_model(
File "..BEVDet/mmdet3d/apis/train.py", line 28, in train_model
train_detector(
File "..BEVDet/.plu/mmdetection/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "..BEVDet/.plu/mmcv/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "..BEVDet/.plu/mmcv/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "..BEVDet/.plu/mmcv/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "..BEVDet/.plu/mmcv/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "..BEVDet/.plu/mmdetection/mmdet/models/detectors/base.py", line 237, in train_step
losses = self(**data)
File "/workspace/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "..BEVDet/.plu/mmcv/mmcv/runner/fp16_utils.py", line 98, in new_func
return old_func(*args, **kwargs)
File "..BEVDet/mmdet3d/models/detectors/base.py", line 59, in forward
return self.forward_train(**kwargs)
File "..BEVDet/mmdet3d/models/detectors/bevdet.py", line 85, in forward_train
img_feats, pts_feats = self.extract_feat(
File "..BEVDet/mmdet3d/models/detectors/bevdet.py", line 46, in extract_feat
img_feats = self.extract_img_feat(img, img_metas)
File "..BEVDet/mmdet3d/models/detectors/bevdet.py", line 39, in extract_img_feat
x = self.image_encoder(img[0])
File "..BEVDet/mmdet3d/models/detectors/bevdet.py", line 25, in image_encoder
x = self.img_backbone(imgs)
File "/workspace/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "..BEVDet/mmdet3d/models/backbones/swin.py", line 804, in forward
x, hw_shape, out, out_hw_shape = stage(x, hw_shape)
File "/workspace/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "..BEVDet/mmdet3d/models/backbones/swin.py", line 519, in forward
x = checkpoint.checkpoint(block, x, hw_shape)
File "/workspace/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 177, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
TypeError: save_for_backward can only save variables, but argument 1 is of type tuple
Could you give some suggestions? Thanks a lot.
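The traceback above ends in `checkpoint.checkpoint(block, x, hw_shape)`, where `hw_shape` is a tuple, and the error says that only tensors ("variables") can be saved. One common way around this kind of restriction is to bind the non-tensor argument before checkpointing, so that only the tensor is passed through. The sketch below illustrates the pattern without PyTorch: `FakeTensor`, `strict_checkpoint`, and `block` are hypothetical stand-ins, and whether the same change is appropriate for the repo's swin.py is an assumption.

```python
from functools import partial


class FakeTensor:
    """Stand-in for a tensor; only used to make the sketch runnable."""

    def __init__(self, value):
        self.value = value


def strict_checkpoint(fn, *args):
    """Hypothetical stand-in that, like the failing call, accepts tensors only."""
    for i, arg in enumerate(args):
        if not isinstance(arg, FakeTensor):
            raise TypeError(
                f"can only save variables, but argument {i} is of type "
                f"{type(arg).__name__}")
    return fn(*args)


def block(x, hw_shape):
    """Toy block: scales the input by the number of spatial positions."""
    h, w = hw_shape
    return FakeTensor(x.value * h * w)


x = FakeTensor(2.0)
hw_shape = (4, 8)

try:
    strict_checkpoint(block, x, hw_shape)  # tuple passed positionally
    failed = False
except TypeError:
    failed = True

# Binding the tuple beforehand leaves only the tensor as a checkpoint argument.
out = strict_checkpoint(partial(block, hw_shape=hw_shape), x)
print(failed, out.value)  # True 64.0
```

With the real library, the equivalent idea would be to wrap the block in a closure or `functools.partial` that already holds `hw_shape`, and to pass only `x` to the checkpoint call.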
Hello, thanks for your great work!
Is the BEV feature of the previous frame saved from the previous computation, or is it computed at the same time as the current frame? Reusing the previous result would seem to cost less time at inference.
Thanks for your error report and we appreciate it a lot.
Checklist
Describe the bug
TypeError: forward() missing 1 required positional argument: 'hw_shape'
Reproduction
Set with_cp = True in swin, with torch == 1.8.1.
Hi, thanks for your great work!!
But I ran into the error "AttributeError: 'LoadMultiViewImageFromFiles_BEVDet' object has no attribute 'camera_model_consistent'". Could you help me solve this? Thanks a lot!!
I simply worked around the error by setting "self.camera_model_consistent" to None by default. Is that correct for reproducing your work?
Hi, I trained a BEVDet model using the config file configs/bevdet/bevdet-sttiny.py. The checkpoint of each epoch is almost 660M, while your pretrained model 'bevdet-sttiny-pure.pth' is only 214M.
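A likely explanation, stated here as an assumption rather than a confirmed answer, is that training checkpoints also store the optimizer state, while a released file contains only the weights. Adam-style optimizers keep two extra buffers per parameter, which would make a training checkpoint roughly three times the size of the weights alone (3 x 214M is about 642M). The sketch below strips such a checkpoint using plain dictionaries. The key names follow the usual MMCV layout ('meta', 'state_dict', 'optimizer'), and `strip_checkpoint` is a hypothetical helper.

```python
def strip_checkpoint(checkpoint, keep=('state_dict', 'meta')):
    """Return a copy of the checkpoint without optimizer state."""
    return {key: value for key, value in checkpoint.items() if key in keep}


# Toy stand-in for a training checkpoint; real values would be tensors.
training_ckpt = {
    'meta': {'epoch': 20},
    'state_dict': {'w': [0.1, 0.2]},
    'optimizer': {'exp_avg': [0.0, 0.0], 'exp_avg_sq': [0.0, 0.0]},
}

slim = strip_checkpoint(training_ckpt)
print(sorted(slim))  # ['meta', 'state_dict']
```

With real files, the same filtering would be applied between loading the checkpoint on the CPU and saving it again.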
Hi, thanks for your great work on BEVDet-series. However, I have some questions about the code.
bev_pool.bev_pool() to make it train correctly.
# Here is a training log with BEVPool
2022-07-18 18:51:05,435 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
2022-07-18 18:51:05,436 - mmdet - INFO - Checkpoints will be saved to /nfs/chenzehui/code/BEVDet/work_dirs/bevdet-sttiny_2subset_1x by HardDiskBackend.
2022-07-18 18:54:31,813 - mmdet - INFO - Epoch [1][50/1931] lr: 2.000e-04, eta: 2 days, 5:02:20, time: 4.125, data_time: 0.173, memory: 23265, task0.loss_xy: 0.1530, task0.loss_z: 0.2406, task0.loss_whl: 0.2138, task0.loss_yaw: 0.3044, task0.loss_vel: 0.0994, task0.loss_heatmap: 29.2677, task1.loss_xy: 0.1517, task1.loss_z: 0.2693, task1.loss_whl: 0.3785, task1.loss_yaw: 0.3092, task1.loss_vel: 0.0783, task1.loss_heatmap: 697.6227, task2.loss_xy: 0.1605, task2.loss_z: 0.2596, task2.loss_whl: 0.5476, task2.loss_yaw: 0.3127, task2.loss_vel: 0.1090, task2.loss_heatmap: 1109.0859, task3.loss_xy: 0.1758, task3.loss_z: 0.1999, task3.loss_whl: 0.2430, task3.loss_yaw: 0.3137, task3.loss_vel: 0.0132, task3.loss_heatmap: 176.5259, task4.loss_xy: 0.1672, task4.loss_z: 0.2124, task4.loss_whl: 0.2097, task4.loss_yaw: 0.3255, task4.loss_vel: 0.0821, task4.loss_heatmap: 2236.7699, task5.loss_xy: 0.1589, task5.loss_z: 0.2017, task5.loss_whl: 0.2519, task5.loss_yaw: 0.3238, task5.loss_vel: 0.0426, task5.loss_heatmap: 182.5948, loss: 4438.3761, grad_norm: 31941.2649
2022-07-18 18:56:23,583 - mmdet - INFO - Epoch [1][100/1931] lr: 2.001e-04, eta: 1 day, 16:50:53, time: 2.235, data_time: 0.048, memory: 23265, task0.loss_xy: 0.1272, task0.loss_z: 0.2319, task0.loss_whl: 0.0732, task0.loss_yaw: 0.2989, task0.loss_vel: 0.0959, task0.loss_heatmap: 2.7546, task1.loss_xy: 0.1298, task1.loss_z: 0.2616, task1.loss_whl: 0.1806, task1.loss_yaw: 0.3013, task1.loss_vel: 0.0684, task1.loss_heatmap: 4.3315, task2.loss_xy: 0.1311, task2.loss_z: 0.2597, task2.loss_whl: 0.1698, task2.loss_yaw: 0.3076, task2.loss_vel: 0.1020, task2.loss_heatmap: 5.0317, task3.loss_xy: 0.1283, task3.loss_z: 0.1878, task3.loss_whl: 0.1633, task3.loss_yaw: 0.3005, task3.loss_vel: 0.0060, task3.loss_heatmap: 3.3123, task4.loss_xy: 0.1333, task4.loss_z: 0.1964, task4.loss_whl: 0.1455, task4.loss_yaw: 0.3121, task4.loss_vel: 0.0752, task4.loss_heatmap: 7.2879, task5.loss_xy: 0.1284, task5.loss_z: 0.2023, task5.loss_whl: 0.2038, task5.loss_yaw: 0.3080, task5.loss_vel: 0.0391, task5.loss_heatmap: 3.1014, loss: 31.0883, grad_norm: 109.3359
2022-07-18 18:58:13,957 - mmdet - INFO - Epoch [1][150/1931] lr: 2.001e-04, eta: 1 day, 12:38:41, time: 2.208, data_time: 0.045, memory: 23265, task0.loss_xy: 0.1268, task0.loss_z: 0.2266, task0.loss_whl: 0.0681, task0.loss_yaw: 0.2994, task0.loss_vel: 0.0967, task0.loss_heatmap: 2.7255, task1.loss_xy: 0.1279, task1.loss_z: 0.2594, task1.loss_whl: 0.1746, task1.loss_yaw: 0.3029, task1.loss_vel: 0.0704, task1.loss_heatmap: 3.5595, task2.loss_xy: 0.1273, task2.loss_z: 0.2533, task2.loss_whl: 0.1576, task2.loss_yaw: 0.3035, task2.loss_vel: 0.1016, task2.loss_heatmap: 4.1184, task3.loss_xy: 0.1266, task3.loss_z: 0.1784, task3.loss_whl: 0.1542, task3.loss_yaw: 0.3046, task3.loss_vel: 0.0051, task3.loss_heatmap: 3.0206, task4.loss_xy: 0.1275, task4.loss_z: 0.1727, task4.loss_whl: 0.1413, task4.loss_yaw: 0.3051, task4.loss_vel: 0.0846, task4.loss_heatmap: 4.0120, task5.loss_xy: 0.1263, task5.loss_z: 0.1932, task5.loss_whl: 0.2001, task5.loss_yaw: 0.3066, task5.loss_vel: 0.0354, task5.loss_heatmap: 2.9454, loss: 25.5394, grad_norm: 15.3327
2022-07-18 19:00:04,427 - mmdet - INFO - Epoch [1][200/1931] lr: 2.002e-04, eta: 1 day, 10:32:01, time: 2.209, data_time: 0.046, memory: 23265, task0.loss_xy: 0.1259, task0.loss_z: 0.2289, task0.loss_whl: 0.0648, task0.loss_yaw: 0.2985, task0.loss_vel: 0.0928, task0.loss_heatmap: 2.7089, task1.loss_xy: 0.1266, task1.loss_z: 0.2612, task1.loss_whl: 0.1694, task1.loss_yaw: 0.3001, task1.loss_vel: 0.0666, task1.loss_heatmap: 3.5405, task2.loss_xy: 0.1275, task2.loss_z: 0.2546, task2.loss_whl: 0.1593, task2.loss_yaw: 0.3038, task2.loss_vel: 0.1045, task2.loss_heatmap: 4.0661, task3.loss_xy: 0.1253, task3.loss_z: 0.1857, task3.loss_whl: 0.1474, task3.loss_yaw: 0.2997, task3.loss_vel: 0.0042, task3.loss_heatmap: 3.0315, task4.loss_xy: 0.1285, task4.loss_z: 0.1815, task4.loss_whl: 0.1393, task4.loss_yaw: 0.3059, task4.loss_vel: 0.0833, task4.loss_heatmap: 3.9987, task5.loss_xy: 0.1255, task5.loss_z: 0.1963, task5.loss_whl: 0.1911, task5.loss_yaw: 0.3041, task5.loss_vel: 0.0368, task5.loss_heatmap: 2.9208, loss: 25.4057, grad_norm: 14.5524
2022-07-18 19:01:55,960 - mmdet - INFO - Epoch [1][250/1931] lr: 2.004e-04, eta: 1 day, 9:18:32, time: 2.231, data_time: 0.046, memory: 23265, task0.loss_xy: 0.1252, task0.loss_z: 0.2246, task0.loss_whl: 0.0643, task0.loss_yaw: 0.2988, task0.loss_vel: 0.0963, task0.loss_heatmap: 2.7047, task1.loss_xy: 0.1259, task1.loss_z: 0.2597, task1.loss_whl: 0.1703, task1.loss_yaw: 0.3025, task1.loss_vel: 0.0733, task1.loss_heatmap: 3.5218, task2.loss_xy: 0.1300, task2.loss_z: 0.2519, task2.loss_whl: 0.1580, task2.loss_yaw: 0.3054, task2.loss_vel: 0.1031, task2.loss_heatmap: 4.1050, task3.loss_xy: 0.1244, task3.loss_z: 0.1825, task3.loss_whl: 0.1500, task3.loss_yaw: 0.2989, task3.loss_vel: 0.0041, task3.loss_heatmap: 2.8970, task4.loss_xy: 0.1283, task4.loss_z: 0.1793, task4.loss_whl: 0.1368, task4.loss_yaw: 0.3060, task4.loss_vel: 0.0742, task4.loss_heatmap: 3.9614, task5.loss_xy: 0.1256, task5.loss_z: 0.1969, task5.loss_whl: 0.1891, task5.loss_yaw: 0.3061, task5.loss_vel: 0.0367, task5.loss_heatmap: 2.9075, loss: 25.2254, grad_norm: 13.1805
2022-07-18 19:03:46,359 - mmdet - INFO - Epoch [1][300/1931] lr: 2.005e-04, eta: 1 day, 8:26:02, time: 2.208, data_time: 0.043, memory: 23265, task0.loss_xy: 0.1252, task0.loss_z: 0.2251, task0.loss_whl: 0.0642, task0.loss_yaw: 0.2987, task0.loss_vel: 0.0917, task0.loss_heatmap: 2.6897, task1.loss_xy: 0.1256, task1.loss_z: 0.2599, task1.loss_whl: 0.1693, task1.loss_yaw: 0.3003, task1.loss_vel: 0.0688, task1.loss_heatmap: 3.5091, task2.loss_xy: 0.1247, task2.loss_z: 0.2510, task2.loss_whl: 0.1617, task2.loss_yaw: 0.2970, task2.loss_vel: 0.1042, task2.loss_heatmap: 4.0313, task3.loss_xy: 0.1248, task3.loss_z: 0.1777, task3.loss_whl: 0.1520, task3.loss_yaw: 0.3020, task3.loss_vel: 0.0041, task3.loss_heatmap: 2.8762, task4.loss_xy: 0.1269, task4.loss_z: 0.1714, task4.loss_whl: 0.1345, task4.loss_yaw: 0.3025, task4.loss_vel: 0.0789, task4.loss_heatmap: 3.9629, task5.loss_xy: 0.1254, task5.loss_z: 0.1967, task5.loss_whl: 0.1903, task5.loss_yaw: 0.3057, task5.loss_vel: 0.0366, task5.loss_heatmap: 2.9230, loss: 25.0888, grad_norm: 13.3612
2022-07-18 19:05:36,911 - mmdet - INFO - Epoch [1][350/1931] lr: 2.007e-04, eta: 1 day, 7:48:20, time: 2.211, data_time: 0.044, memory: 23265, task0.loss_xy: 0.1259, task0.loss_z: 0.2261, task0.loss_whl: 0.0640, task0.loss_yaw: 0.2994, task0.loss_vel: 0.0954, task0.loss_heatmap: 2.6892, task1.loss_xy: 0.1261, task1.loss_z: 0.2570, task1.loss_whl: 0.1697, task1.loss_yaw: 0.3015, task1.loss_vel: 0.0697, task1.loss_heatmap: 3.5118, task2.loss_xy: 0.1246, task2.loss_z: 0.2546, task2.loss_whl: 0.1592, task2.loss_yaw: 0.3028, task2.loss_vel: 0.1029, task2.loss_heatmap: 4.0432, task3.loss_xy: 0.1251, task3.loss_z: 0.1863, task3.loss_whl: 0.1519, task3.loss_yaw: 0.3024, task3.loss_vel: 0.0041, task3.loss_heatmap: 2.7846, task4.loss_xy: 0.1250, task4.loss_z: 0.1772, task4.loss_whl: 0.1340, task4.loss_yaw: 0.3047, task4.loss_vel: 0.0778, task4.loss_heatmap: 3.9504, task5.loss_xy: 0.1257, task5.loss_z: 0.1921, task5.loss_whl: 0.1871, task5.loss_yaw: 0.3066, task5.loss_vel: 0.0371, task5.loss_heatmap: 2.9013, loss: 24.9965, grad_norm: 11.6952
2022-07-18 19:07:27,161 - mmdet - INFO - Epoch [1][400/1931] lr: 2.009e-04, eta: 1 day, 7:19:03, time: 2.205, data_time: 0.046, memory: 23265, task0.loss_xy: 0.1251, task0.loss_z: 0.2281, task0.loss_whl: 0.0647, task0.loss_yaw: 0.2981, task0.loss_vel: 0.0978, task0.loss_heatmap: 2.6814, task1.loss_xy: 0.1268, task1.loss_z: 0.2599, task1.loss_whl: 0.1701, task1.loss_yaw: 0.3009, task1.loss_vel: 0.0681, task1.loss_heatmap: 3.5153, task2.loss_xy: 0.1271, task2.loss_z: 0.2627, task2.loss_whl: 0.1586, task2.loss_yaw: 0.3031, task2.loss_vel: 0.1004, task2.loss_heatmap: 4.0954, task3.loss_xy: 0.1259, task3.loss_z: 0.1844, task3.loss_whl: 0.1533, task3.loss_yaw: 0.2998, task3.loss_vel: 0.0041, task3.loss_heatmap: 2.7527, task4.loss_xy: 0.1280, task4.loss_z: 0.1776, task4.loss_whl: 0.1376, task4.loss_yaw: 0.3045, task4.loss_vel: 0.0738, task4.loss_heatmap: 3.9593, task5.loss_xy: 0.1259, task5.loss_z: 0.1932, task5.loss_whl: 0.1867, task5.loss_yaw: 0.3057, task5.loss_vel: 0.0362, task5.loss_heatmap: 2.8725, loss: 25.0048, grad_norm: 11.8404
2022-07-18 19:09:17,147 - mmdet - INFO - Epoch [1][450/1931] lr: 2.012e-04, eta: 1 day, 6:55:24, time: 2.200, data_time: 0.040, memory: 23265, task0.loss_xy: 0.1261, task0.loss_z: 0.2243, task0.loss_whl: 0.0647, task0.loss_yaw: 0.2981, task0.loss_vel: 0.0945, task0.loss_heatmap: 2.6934, task1.loss_xy: 0.1254, task1.loss_z: 0.2544, task1.loss_whl: 0.1711, task1.loss_yaw: 0.3005, task1.loss_vel: 0.0705, task1.loss_heatmap: 3.5251, task2.loss_xy: 0.1250, task2.loss_z: 0.2509, task2.loss_whl: 0.1648, task2.loss_yaw: 0.3018, task2.loss_vel: 0.1045, task2.loss_heatmap: 4.0741, task3.loss_xy: 0.1268, task3.loss_z: 0.1879, task3.loss_whl: 0.1537, task3.loss_yaw: 0.3011, task3.loss_vel: 0.0042, task3.loss_heatmap: 2.7686, task4.loss_xy: 0.1252, task4.loss_z: 0.1752, task4.loss_whl: 0.1351, task4.loss_yaw: 0.3031, task4.loss_vel: 0.0777, task4.loss_heatmap: 3.9659, task5.loss_xy: 0.1261, task5.loss_z: 0.1937, task5.loss_whl: 0.1901, task5.loss_yaw: 0.3048, task5.loss_vel: 0.0362, task5.loss_heatmap: 2.8905, loss: 25.0354, grad_norm: 11.9946
2022-07-18 19:11:07,773 - mmdet - INFO - Epoch [1][500/1931] lr: 2.014e-04, eta: 1 day, 6:37:04, time: 2.212, data_time: 0.043, memory: 23265, task0.loss_xy: 0.1253, task0.loss_z: 0.2267, task0.loss_whl: 0.0643, task0.loss_yaw: 0.2982, task0.loss_vel: 0.0960, task0.loss_heatmap: 2.6811, task1.loss_xy: 0.1265, task1.loss_z: 0.2567, task1.loss_whl: 0.1698, task1.loss_yaw: 0.3013, task1.loss_vel: 0.0713, task1.loss_heatmap: 3.5068, task2.loss_xy: 0.1269, task2.loss_z: 0.2555, task2.loss_whl: 0.1569, task2.loss_yaw: 0.2997, task2.loss_vel: 0.1028, task2.loss_heatmap: 4.0546, task3.loss_xy: 0.1243, task3.loss_z: 0.1803, task3.loss_whl: 0.1468, task3.loss_yaw: 0.2983, task3.loss_vel: 0.0038, task3.loss_heatmap: 2.7066, task4.loss_xy: 0.1246, task4.loss_z: 0.1709, task4.loss_whl: 0.1354, task4.loss_yaw: 0.3036, task4.loss_vel: 0.0753, task4.loss_heatmap: 3.9158, task5.loss_xy: 0.1257, task5.loss_z: 0.1920, task5.loss_whl: 0.1893, task5.loss_yaw: 0.3050, task5.loss_vel: 0.0363, task5.loss_heatmap: 2.8840, loss: 24.8384, grad_norm: 11.3248
2022-07-18 19:13:01,873 - mmdet - INFO - Epoch [1][550/1931] lr: 2.017e-04, eta: 1 day, 6:26:34, time: 2.282, data_time: 0.091, memory: 23265, task0.loss_xy: 0.1262, task0.loss_z: 0.2271, task0.loss_whl: 0.0648, task0.loss_yaw: 0.2987, task0.loss_vel: 0.0934, task0.loss_heatmap: 2.6767, task1.loss_xy: 0.1248, task1.loss_z: 0.2595, task1.loss_whl: 0.1696, task1.loss_yaw: 0.3019, task1.loss_vel: 0.0666, task1.loss_heatmap: 3.4998, task2.loss_xy: 0.1279, task2.loss_z: 0.2516, task2.loss_whl: 0.1545, task2.loss_yaw: 0.3011, task2.loss_vel: 0.1050, task2.loss_heatmap: 4.0685, task3.loss_xy: 0.1240, task3.loss_z: 0.1799, task3.loss_whl: 0.1474, task3.loss_yaw: 0.2976, task3.loss_vel: 0.0038, task3.loss_heatmap: 2.6423, task4.loss_xy: 0.1260, task4.loss_z: 0.1817, task4.loss_whl: 0.1323, task4.loss_yaw: 0.3031, task4.loss_vel: 0.0701, task4.loss_heatmap: 3.9359, task5.loss_xy: 0.1258, task5.loss_z: 0.1919, task5.loss_whl: 0.1865, task5.loss_yaw: 0.3059, task5.loss_vel: 0.0366, task5.loss_heatmap: 2.8677, loss: 24.7766, grad_norm: 11.7977
The loss saturates at about 25.0, while my non-accelerated version reaches a loss of 16.0 at iteration 550 :(
Thanks for your error report and we appreciate it a lot.
Describe the bug
Core dump after running inference on around 4000 samples.
Reproduction
python tools/test.py ./configs/bevdet/bevdet-sttiny.py ./bevdet-sttiny-pure.pth --eval bbox
Environment
TorchVision: 0.9.1+cu111
OpenCV: 4.5.5
MMCV: 1.4.0
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.0
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.2+f0647e7
Hello, I have been trying to replace BEVDet's QuickCumSum operation with BEVFusion's BEV Pooling operation.
https://github.com/mit-han-lab/bevfusion/tree/main/mmdet3d/ops/bev_pool
To do so, I simply replaced
BEVDet/mmdet3d/models/necks/view_transformer.py
Lines 164 to 169 in e2f4b40
with
x = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])
where bev_pool is BEVFusion's bev_pool CUDA operation.
However, I find that although there is significant speed up, the loss is not decreasing as expected (around 14 at end of epoch 5, while it should be around 9.5).
Looking at the papers, they seem to be equivalent pooling operations, but I was hoping for some guidance in case I missed something.
Thank you!
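One way to narrow this down is to check both pooling paths against a plain scatter-add on a tiny input. The following is a minimal pure-Python sketch, assuming scalar per-point features and integer voxel ranks. `cumsum_pool` follows the sort / running-sum / difference idea used by Lift-Splat-Shoot style pooling; both function names are illustrative and come from neither codebase.

```python
from collections import defaultdict
from itertools import accumulate


def scatter_add_pool(feats, ranks):
    """Reference: sum the features that share a voxel rank."""
    out = defaultdict(float)
    for f, r in zip(feats, ranks):
        out[r] += f
    return dict(out)


def cumsum_pool(feats, ranks):
    """Sort by rank, take a running sum, keep the last entry of each
    rank group, then subtract the previously kept running sum."""
    order = sorted(range(len(ranks)), key=ranks.__getitem__)
    feats = [feats[i] for i in order]
    ranks = [ranks[i] for i in order]
    running = list(accumulate(feats))
    out, prev = {}, 0.0
    for i, r in enumerate(ranks):
        is_last_of_group = i == len(ranks) - 1 or ranks[i + 1] != r
        if is_last_of_group:
            out[r] = running[i] - prev
            prev = running[i]
    return out


feats = [1.0, 2.0, 3.0, 4.0, 5.0]
ranks = [2, 0, 2, 1, 0]
same = scatter_add_pool(feats, ranks) == cumsum_pool(feats, ranks)
```

If the two agree on toy data, as they should for sum pooling, the gap is more likely in how the replacement is wired in. Things worth checking (guesses, not confirmed causes) are the order of the grid-size arguments, the axis order of the coordinates in geom_feats, and the layout of the returned tensor.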
In the paper description, the bev aug operation is implemented by changing feature and gt simultaneously.
In practice, the operations are conducted both on the output feature of the view transformer and the 3D object detection targets to keep their spatial consistency.
However, in the code base, I can only find the BEV flip code in class RandomFlip3D, in mmdet3d/datasets/pipelines/transforms_3d.py, which changes the projection matrix and the GT simultaneously in the dataloader.
So I am confused about the inconsistency between the paper and code.
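The two descriptions can give the same result: flipping the coordinates before the points are splatted into the BEV grid produces the same grid as splatting first and then mirroring the feature map, as long as the grid is symmetric about the origin. A minimal sketch of that equivalence follows, assuming the [-51.2, 51.2, 0.8] bounds used in the configs in this thread. The helper names are hypothetical and this is not the repository's implementation.

```python
import math

LOWER, UPPER, STEP = -51.2, 51.2, 0.8
N = round((UPPER - LOWER) / STEP)  # 128 cells per axis


def cell(v):
    """Index of the BEV cell that contains coordinate v."""
    return math.floor((v - LOWER) / STEP)


def splat(points):
    """Accumulate (x, y, feature) points into a sparse BEV grid."""
    grid = {}
    for x, y, f in points:
        key = (cell(x), cell(y))
        grid[key] = grid.get(key, 0.0) + f
    return grid


points = [(10.3, -4.1, 1.0), (10.5, -4.3, 2.0), (-30.0, 22.2, 0.5)]

# Option A: flip the y coordinate of every point, then splat.
flip_then_splat = splat([(x, -y, f) for x, y, f in points])

# Option B: splat first, then mirror the grid along the y axis.
splat_then_flip = {(i, N - 1 - j): f for (i, j), f in splat(points).items()}
```

Points that sit exactly on a cell boundary can land in different cells under the two options, so the match is exact only away from boundaries.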
Hello, I have tried to evaluate released BEVDet checkpoint as-is on my setup, but I get
mAP: 0.2751
mATE: 0.7179
mASE: 0.2738
mAOE: 0.5512
mAVE: 0.8747
mAAE: 0.2205
NDS: 0.3737
Eval time: 107.4s
Per-class results:
Object Class AP ATE ASE AOE AVE AAE
car 0.441 0.631 0.167 0.131 1.037 0.254
truck 0.197 0.757 0.225 0.125 0.828 0.227
bus 0.283 0.680 0.185 0.139 1.895 0.350
trailer 0.132 1.053 0.224 0.463 0.547 0.068
construction_vehicle 0.066 0.795 0.484 1.174 0.095 0.358
pedestrian 0.301 0.788 0.305 1.320 0.848 0.412
motorcycle 0.235 0.704 0.262 0.612 1.437 0.090
bicycle 0.182 0.607 0.265 0.875 0.310 0.006
traffic_cone 0.445 0.616 0.333 nan nan nan
barrier 0.468 0.547 0.287 0.122 nan nan
which is lower than the expected 30.8/40.4 mAP/NDS.
I am using A6000 GPUs, torch 1.10.1, cudatoolkit 11.3. Do you know what might be the issue?
I find that I have the exact same numbers as #15 @BoLang615, but I believe I am using the latest version. I would appreciate any pointers for this.
Thank you!
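For reference, NDS can be recomputed from the other metrics in the table, which helps to see where a gap comes from. The sketch below uses the nuScenes definition, NDS = (5 * mAP + sum over the five TP errors of (1 - min(1, error))) / 10, and reproduces the 0.3737 reported above. The function name is illustrative.

```python
def nds(m_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors."""
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Errors in the order mATE, mASE, mAOE, mAVE, mAAE, as reported above.
reported = nds(0.2751, [0.7179, 0.2738, 0.5512, 0.8747, 0.2205])
print(f"{reported:.4f}")
```

Each TP error can move NDS by at most 0.1, while mAP carries half of the score, so the roughly 3-point NDS gap here is mostly explained by the lower mAP.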
Hi @HuangJunJie2017 thanks for sharing this wonderful work. I am trying to reproduce bevdet-sttiny but find that I cannot train it with samples_per_gpu=8 on my 3090 GPUs because of OOM. The maximum value for samples_per_gpu on my side is 6. The following is my environment info:
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu111
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.9.1+cu111
OpenCV: 4.5.5
MMCV: 1.3.13
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.14.0
MMSegmentation: 0.20.2
MMDetection3D: 0.17.2+f0647e7
Compared with your env, I think the major difference is PyTorch 1.9 vs 1.8. I am wondering whether this is the reason for the OOM. Have you tried it on PyTorch 1.8?
Best,
Xuyang.
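If samples_per_gpu has to stay at 6, the effective batch size changes and the learning rate may need to follow. A small sketch of that arithmetic is below, assuming the reference setup is 8 GPUs with samples_per_gpu=8 and lr=2e-4 (the value in the configs in this thread), and using the common linear scaling heuristic. Whether linear scaling is the right choice for AdamW here is an assumption, not something the authors state.

```python
def effective_batch(samples_per_gpu, num_gpus):
    """Total number of samples per optimizer step."""
    return samples_per_gpu * num_gpus


def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: keep lr / batch size constant."""
    return base_lr * new_batch / base_batch


reference = effective_batch(8, 8)  # 64
reduced = effective_batch(6, 8)    # 48
lr = scaled_lr(2e-4, reference, reduced)
print(reference, reduced, lr)
```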
When you resize the image from (900, 1600) into (256, 704), do you need to make the same change on the bounding box ground truth?
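In a pinhole camera model, resizing and cropping only change pixel coordinates, so the change can be folded into the camera intrinsics while 3D boxes stay as they are. The sketch below illustrates this, assuming the ground truth is stored in 3D lidar coordinates (box_type_3d='LiDAR' in the configs in this thread). The intrinsic values and the crop offset are made up for illustration, and the helper names are hypothetical.

```python
def project(K, point):
    """Project a 3D camera-frame point with a 3x3 intrinsic matrix."""
    x, y, z = point
    u = (K[0][0] * x + K[0][2] * z) / z
    v = (K[1][1] * y + K[1][2] * z) / z
    return u, v


def adjust_intrinsics(K, scale, crop_left, crop_top):
    """Intrinsics after resizing by `scale`, then cropping an offset."""
    return [
        [K[0][0] * scale, 0.0, K[0][2] * scale - crop_left],
        [0.0, K[1][1] * scale, K[1][2] * scale - crop_top],
        [0.0, 0.0, 1.0],
    ]


# Illustrative intrinsics for a 1600x900 image.
K = [[1266.0, 0.0, 816.0], [0.0, 1266.0, 491.0], [0.0, 0.0, 1.0]]
# 1600x900 scaled by 0.44 gives 704x396; cropping 140 rows leaves 704x256.
scale, crop_left, crop_top = 0.44, 0.0, 140.0

point = (2.0, 1.0, 20.0)
u, v = project(K, point)
u_aug, v_aug = project(adjust_intrinsics(K, scale, crop_left, crop_top), point)
```

Projecting with the adjusted intrinsics gives the same pixel as projecting with the original ones and then applying the resize and crop, so only 2D annotations, if any are used, would need the same transform.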
Hello,
Thank you for your amazing work.
I have a question regarding the view transformer module.
So far, I have figured out that it uses the nuScenes dataset's sensor2lidar_rotation and sensor2lidar_translation information, which are stored in the rots and trans variables respectively.
My question is what the corresponding rotation and translation matrices would be if I want to use the KITTI dataset.
KITTI dataset has following matrices:
P0: camera0 projection matrix after rectification, a 3x4 array
P1: camera1 projection matrix after rectification, a 3x4 array
P2: camera2 projection matrix after rectification, a 3x4 array
P3: camera3 projection matrix after rectification, a 3x4 array
R0_rect: rectifying rotation matrix, a 4x4 array
Tr_velo_to_cam: transformation from Velodyne coordinate to camera coordinate, a 4x4 array
Tr_imu_to_velo: transformation from IMU coordinate to Velodyne coordinate, a 4x4 array
Which information should I use for the trans and rots variables?
Thank you for your help.
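Since rots and trans hold a camera-to-lidar transform, one option for KITTI is to invert the Velodyne-to-camera chain. The sketch below assumes the usual KITTI relation pixel = P_i @ R0_rect @ Tr_velo_to_cam @ x_velo, with R0_rect and Tr_velo_to_cam given as 4x4 homogeneous matrices as listed above, and P_i factored as K @ [I | t_i]. The function names are hypothetical, and whether the result drops straight into the view transformer is untested.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def cam_to_velo(r0_rect, tr_velo_to_cam, p_i):
    """Return (rots, trans) mapping rectified camera-i points to Velodyne."""
    m = matmul(r0_rect, tr_velo_to_cam)      # Velodyne -> rectified camera 0
    rot = [row[:3] for row in m[:3]]
    t = [row[3] for row in m[:3]]
    fx, fy = p_i[0][0], p_i[1][1]
    cx, cy = p_i[0][2], p_i[1][2]
    # Offset of camera i in the rectified frame: solve K @ t_i = P_i[:, 3].
    tz = p_i[2][3]
    t_i = [(p_i[0][3] - cx * tz) / fx, (p_i[1][3] - cy * tz) / fy, tz]
    rots = transpose(rot)                    # inverse of a rotation matrix
    shift = [[t[k] + t_i[k]] for k in range(3)]
    trans = [-row[0] for row in matmul(rots, shift)]
    return rots, trans
```

The camera intrinsics would then come from the left 3x3 block of the chosen P_i (P2 for the left color camera).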
Referring to bevdet4d-sttiny.py, I want to reproduce the ResNet-101 setting pretrained by FCOS. I changed the image size to 928 * 1600 and use the last two features in the FPN fusion, but even with a batch size of only 1 on my device (48 GB memory), more than 32 GB of memory is used. I already use FP16 in training, but each GPU can only afford a batch size of 2. Can you help me reproduce the batch size of 64 from your paper? Maybe there are some errors in my config file.
Hello, I hit an error when I use your *.pth to run model inference.
Hi, since there are plenty of models working with a transformer-based head (BEVFormer, PolarFormer, PETR), I wonder if you have tried a transformer-based head? I tried one with a Swin-T backbone and initialized it from a pretrained BEVDet-T. The transformer head is similar to the one in Object-DGCNN. However, the model does not seem to converge well (it ends up with 1.2 mAP). Therefore I wonder if you have made some attempts at it :)
Here is my train config:
_base_ = ['../_base_/datasets/nus-3d.py',
'../_base_/schedules/cyclic_20e.py',
'../_base_/default_runtime.py']
# Global
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
# For nuScenes we usually do 10-class detection
class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
data_config={
'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',
'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],
'Ncams': 6,
'input_size': (256, 704),
'src_size': (900, 1600),
# Augmentation
'resize': (-0.06, 0.11),
'rot': (-5.4, 5.4),
'flip': True,
'crop_h': (0.0, 0.0),
'resize_test':0.04,
}
# Model
grid_config={
'xbound': [-51.2, 51.2, 0.8],
'ybound': [-51.2, 51.2, 0.8],
'zbound': [-10.0, 10.0, 20.0],
'dbound': [1.0, 60.0, 1.0],}
voxel_size = [0.1, 0.1, 0.2]
numC_Trans=64
model = dict(
type='BEVDet',
img_backbone=dict(
type='SwinTransformer',
pretrained='data/pretrain_models/swin_tiny_patch4_window7_224.pth',
pretrain_img_size=224,
embed_dims=96,
patch_size=4,
window_size=7,
mlp_ratio=4,
depths=[2, 2, 6, 2],
num_heads=[3, 6, 12, 24],
strides=(4, 2, 2, 2),
out_indices=(2, 3,),
qkv_bias=True,
qk_scale=None,
patch_norm=True,
drop_rate=0.,
attn_drop_rate=0.,
drop_path_rate=0.0,
use_abs_pos_embed=False,
act_cfg=dict(type='GELU'),
norm_cfg=dict(type='LN', requires_grad=True),
pretrain_style='official',
output_missing_index_as_none=False),
img_neck=dict(
type='FPN_LSS',
in_channels=384+768,
out_channels=512,
extra_upsample=None,
input_feature_index=(0,1),
scale_factor=2),
img_view_transformer=dict(type='ViewTransformerLiftSplatShoot',
grid_config=grid_config,
data_config=data_config,
numC_Trans=numC_Trans),
img_bev_encoder_backbone = dict(type='ResNetForBEVDet', numC_input=numC_Trans),
img_bev_encoder_neck = dict(type='FPN_LSS',
in_channels=numC_Trans*8+numC_Trans*2,
out_channels=256),
pts_bbox_head=dict(
type='DGCNN3DHead',
num_query=300,
num_classes=10,
in_channels=256,
sync_cls_avg_factor=True,
with_box_refine=True,
as_two_stage=False,
# share_conv_channel=256,
# tasks=[
# dict(num_class=10, class_names=class_names),
# ],
transformer=dict(
type='DeformableDetrTransformer',
encoder=dict(
type='DetrTransformerEncoder',
num_layers=2,
transformerlayers=dict(
type='BaseTransformerLayer',
attn_cfgs=dict(
type='MultiScaleDeformableAttention', embed_dims=256),
feedforward_channels=1024,
ffn_dropout=0.1,
operation_order=('self_attn', 'norm', 'ffn', 'norm'))),
decoder=dict(
type='Deformable3DDetrTransformerDecoder',
num_layers=6,
return_intermediate=True,
transformerlayers=dict(
type='DetrTransformerDecoderLayer',
attn_cfgs=[
dict(
type='MultiheadAttention',
embed_dims=256,
num_heads=8,
dropout=0.1),
dict(
type='MultiScaleDeformableAttention',
embed_dims=256)
],
feedforward_channels=1024,
ffn_dropout=0.1,
operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
'ffn', 'norm')))),
bbox_coder=dict(
type='NMSFreeCoder',
post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
pc_range=point_cloud_range,
max_num=300,
voxel_size=voxel_size,
num_classes=10),
positional_encoding=dict(
type='SinePositionalEncoding',
num_feats=128,
normalize=True,
offset=-0.5),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=2.0),
loss_bbox=dict(type='L1Loss', loss_weight=0.5),
loss_iou=dict(type='GIoULoss', loss_weight=0.0)), # For DETR compatibility.
# model training and testing settings
train_cfg=dict(pts=dict(
grid_size=[1024, 1024, 1],
voxel_size=voxel_size,
point_cloud_range=point_cloud_range,
out_size_factor=8,
assigner=dict(
type='HungarianAssigner3D',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBox3DL1Cost', weight=0.5),
iou_cost=dict(type='IoUCost', weight=0.0), # Fake cost. This is just to make it compatible with DETR head.
pc_range=point_cloud_range))),
test_cfg=dict(
pts=dict(
use_rotate_nms=True,
nms_across_levels=True,
nms_pre=1000,
nms_thr=0.2,
score_thr=0.05,
min_bbox_size=0,
max_num=100)
))
# Data
dataset_type = 'NuScenesDataset'
data_root = 'data/nuscenes/'
file_client_args = dict(backend='disk')
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config),
dict(
type='LoadPointsFromFile',
dummy=True,
coord_type='LIDAR',
load_dim=5,
use_dim=5,
file_client_args=file_client_args),
dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
dict(
type='GlobalRotScaleTrans',
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0],
update_img2lidar=True),
dict(
type='RandomFlip3D',
sync_2d=False,
flip_ratio_bev_horizontal=0.5,
flip_ratio_bev_vertical=0.5,
update_img2lidar=True),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectNameFilter', classes=class_names),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['img_inputs', 'gt_bboxes_3d', 'gt_labels_3d'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'lidar2img',
'depth2img', 'cam2img', 'pad_shape',
'scale_factor', 'flip', 'pcd_horizontal_flip',
'pcd_vertical_flip', 'box_mode_3d', 'box_type_3d',
'img_norm_cfg', 'pcd_trans', 'sample_idx',
'pcd_scale_factor', 'pcd_rotation', 'pts_filename',
'transformation_3d_flow', 'img_info'))
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config),
# load lidar points for --show in test.py only
dict(
type='LoadPointsFromFile',
coord_type='LIDAR',
load_dim=5,
use_dim=5,
file_client_args=file_client_args),
dict(
type='MultiScaleFlipAug3D',
img_scale=(1333, 800),
pts_scale_ratio=1,
flip=False,
transforms=[
dict(
type='DefaultFormatBundle3D',
class_names=class_names,
with_label=False),
dict(type='Collect3D', keys=['points','img_inputs'])
])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config),
dict(
type='DefaultFormatBundle3D',
class_names=class_names,
with_label=False),
dict(type='Collect3D', keys=['img_inputs'])
]
input_modality = dict(
use_lidar=False,
use_camera=True,
use_radar=False,
use_map=False,
use_external=False)
data = dict(
samples_per_gpu=8,
workers_per_gpu=4,
train=dict(
type='CBGSDataset',
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file=data_root + 'nuscenes_infos_train.pkl',
pipeline=train_pipeline,
classes=class_names,
test_mode=False,
use_valid_flag=True,
modality=input_modality,
load_interval=2,
# we use box_type_3d='LiDAR' in kitti and nuscenes dataset
# and box_type_3d='Depth' in sunrgbd and scannet dataset.
box_type_3d='LiDAR',
img_info_prototype='bevdet')),
val=dict(pipeline=test_pipeline, classes=class_names,
modality=input_modality, img_info_prototype='bevdet'),
test=dict(pipeline=test_pipeline, classes=class_names,
modality=input_modality, img_info_prototype='bevdet'))
# Optimizer
lr_config = dict(
policy='cyclic',
target_ratio=(5, 1e-4),
cyclic_times=1,
step_ratio_up=0.4,
)
optimizer = dict(
type='AdamW',
lr=2e-4,
paramwise_cfg=dict(
custom_keys={
'img_backbone': dict(lr_mult=0.1),
'img_neck': dict(lr_mult=0.1),
'img_view_transformer': dict(lr_mult=0.1),
'img_bev_encoder_backbone': dict(lr_mult=0.1),
'img_bev_encoder_neck': dict(lr_mult=0.1),
}),
weight_decay=0.01)
evaluation = dict(interval=6, pipeline=eval_pipeline)
load_from='/nfs/chenzehui/code/BEVDet/work_dirs/bevdet-sttiny/epoch_20.pth'
checkpoint_config = dict(interval=6)
total_epochs = 12
runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')
])
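A quick sanity check for a config like this is to confirm that the BEV grid produced by the view transformer has the resolution the head settings imply. The sketch below recomputes both from the values posted above, assuming the grid follows the Lift-Splat-Shoot convention of (upper - lower) / step cells per axis. How DGCNN3DHead actually consumes grid_size and out_size_factor is not shown in this thread, so treat the comparison as a consistency hint only.

```python
def bev_cells(bound):
    """Number of cells along one axis for a [lower, upper, step] bound."""
    lower, upper, step = bound
    return round((upper - lower) / step)


grid_config = {
    'xbound': [-51.2, 51.2, 0.8],
    'ybound': [-51.2, 51.2, 0.8],
    'zbound': [-10.0, 10.0, 20.0],
    'dbound': [1.0, 60.0, 1.0],
}
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
voxel_size = [0.1, 0.1, 0.2]
out_size_factor = 8

# Grid produced by the view transformer: (x cells, y cells, z cells).
bev_shape = tuple(bev_cells(grid_config[k]) for k in ('xbound', 'ybound', 'zbound'))
# Resolution implied by train_cfg: 1024 voxels / out_size_factor.
x_extent = point_cloud_range[3] - point_cloud_range[0]
head_cells = round(x_extent / voxel_size[0]) // out_size_factor
# Number of discrete depth bins between 1 m and 60 m.
depth_bins = bev_cells(grid_config['dbound'])
print(bev_shape, head_cells, depth_bins)
```

Here both sides come out at 128 cells per axis, so the spatial resolution itself looks consistent; the z range differs between point_cloud_range (-5.0 to 3.0) and zbound (-10.0 to 10.0), which may or may not matter for the head.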
Hi, could you please share some details regarding the training for the test split? With Swin-small and an input size of 768 x 2112, how can you train the model on a 3090?
Looking forward to your reply.
First of all, thank you for the open codebase.
sys.platform: linux
Python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) [GCC 9.4.0]
CUDA available: True
GPU 0,1: TITAN RTX
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0+cu102
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=s
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/roon -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBIn-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-paramet-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabe-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-TH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF,USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.10.0+cu102
OpenCV: 4.6.0
MMCV: 1.3.13
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.2+
I just use the base config and bevdet-sttiny-accelerated.py, and training raises a CUDA out of memory error.
Hi, I was wondering if you have had any success with incorporating FP16?
I have done some experiments, but the large value of the heatmap loss (>10k) at the beginning seems to make it difficult. Further, training randomly produces NaNs hundreds of iterations in.
Also, what is "'/mnt/cfs/algorithm/junjie.huang/models/resnet50-0676ba61.pth'" in the recently released R50 config? Is this just the torchvision pretrained model that can be loaded via checkpoint='torchvision://resnet50'?
Hello, thanks for the great work!
I have found that the inference speeds reported for BEVDet and BEVDet4D seem different.
The inference speed reported in BEVDet (704*256) is about 7-8 FPS, while in BEVDet4D it is 15.6 FPS. What causes the inconsistency?
Was the inference speed measured with batch size 4 or batch size 1?
Great work!
Regarding model training, I tried to use the CenterPoint head for 3D detection in BEV space with image inputs. The training loss looks fine to me, but the resulting AP is 0. I have also tried your training configuration, such as the optimizer type and schedule, and the resulting AP is still near 0, which makes no sense.
It would be great if you could share some insights.
Hi author,
in the extract_img_feat function in BEVDetSequentialES (
BEVDet/mmdet3d/models/detectors/bevdet.py
Line 300 in f0647e7
Thanks a lot!
We observe a variance in reproducing BEVDet-Tiny. It seems the variance comes from unstable velocity prediction. We are trying to collect more reproduction results in this issue.
Hello, would it be possible to get a google drive mirror for the logfiles? I am unable to access them through the baidu link.
Thank you for your work
Many thanks for your great work!
I want to use your pretrained model (named bevdet-sttiny-pure.pth) to run inference on the nuScenes mini dataset, and I use the following command:
python tools/test.py configs/bevdet/bevdet-sttiny.py pretrained_models/bevdet-sttiny-pure.pth --show --show-dir infer_res/
But I encounter an error:
BEVDet-master/mmdet3d/models/detectors/mvx_two_stage.py", line 466, in show_results
if isinstance(data['points'][0], DC):
KeyError: 'points'
I checked and there are only two keys in data, dict_keys(['img_metas', 'img_inputs']), with no 'points' key.
Could you please give me some advice on how to do inference?
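A config posted in another issue in this thread loads lidar points in the test pipeline specifically for --show and collects 'points' alongside 'img_inputs'. A sketch of that pipeline is shown below, copied from that config with data_config abbreviated. Whether the released bevdet-sttiny.py needs exactly this change is an assumption.

```python
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
file_client_args = dict(backend='disk')
# Abbreviated: use the full data_config from your own config file.
data_config = {'input_size': (256, 704), 'src_size': (900, 1600)}

test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config),
    # Load lidar points for --show in test.py only.
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            # 'points' must be collected, otherwise show_results
            # cannot find data['points'].
            dict(type='Collect3D', keys=['points', 'img_inputs'])
        ])
]
```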
Hi, Junjie
Thanks for your great work. Sorry to bother you. I have some questions about how to train and run inference on the Waymo perception dataset. I wonder whether the code supports this. If so, do I just need to follow BEVDet/docs/data_preparation.md, convert the Waymo dataset into KITTI format, and train on it? Or should I modify the existing code files? Can you give me some detailed instructions?
I have tried running training and testing within the Docker environment that is provided in the repo. However, some of the versions in the Docker image were not compatible, so I had to change them. Maybe someone has had the same error and could give me a hand?
For training I use:
python tools/train.py configs/bevdet/bevdet-sttiny.py
I get then
Traceback (most recent call last):
File "tools/train.py", line 224, in <module>
main()
File "tools/train.py", line 183, in main
test_cfg=cfg.get('test_cfg'))
File "/mmdetection3d/mmdet3d/models/builder.py", line 84, in build_model
return build_detector(cfg, train_cfg=train_cfg, test_cfg=test_cfg)
File "/mmdetection3d/mmdet3d/models/builder.py", line 58, in build_detector
cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
return self.build_func(*args, **kwargs, registry=self)
File "/opt/conda/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'BEVDet is not in the models registry'
For testing I use:
python tools/test.py configs/bevdet/bevdet-sttiny.py bevdet-sttiny-pure.pth --show --show-dir ./tmp/
I get the following error:
Traceback (most recent call last):
File "tools/test.py", line 226, in <module>
main()
File "tools/test.py", line 164, in main
dataset = build_dataset(cfg.data.test)
File "/mmdetection3d/mmdet3d/datasets/builder.py", line 41, in build_dataset
dataset = build_from_cfg(cfg, DATASETS, default_args)
File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: NuScenesDataset: __init__() got an unexpected keyword argument 'img_info_prototype'
I am using the following versions:
TorchVision: 0.7.0
OpenCV: 4.6.0
MMCV: 1.3.13
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.2+
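Both tracebacks resolve mmdet3d to /mmdetection3d/mmdet3d/..., and both errors (BEVDet missing from the registry, NuScenesDataset rejecting img_info_prototype) would be consistent with importing an mmdet3d that does not contain the BEVDet additions. That is a guess, but it is cheap to check which copy Python actually imports. A small sketch using only the standard library follows; the helper name is illustrative.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Expect a path inside the BEVDet checkout here.
    print(module_origin("mmdet3d"))
```

If the printed path points at a stock MMDetection3D checkout, installing the BEVDet repository itself in development mode inside the container would be the next thing to try.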
The paper states: "For monocular paradigms like FCOS3D and PGD, the inference speeds are divided by a factor of 6, as they take each image as an independent sample."
Is the DETR3D result in Table 2 (2.0 FPS) also divided by 6, or should it be 12 FPS (2 * 6)?
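The conversion in the quoted sentence is simple arithmetic: a method that processes one image at a time needs six forward passes per nuScenes sample, so its per-sample speed is its per-image speed divided by six. A tiny sketch follows, with illustrative function names. It does not settle which convention the DETR3D number in the table uses.

```python
def per_sample_fps(per_image_fps, num_cams=6):
    """Speed in samples per second for a single-image method."""
    return per_image_fps / num_cams


def per_image_fps(sample_fps, num_cams=6):
    """Inverse conversion: images per second from samples per second."""
    return sample_fps * num_cams


print(per_sample_fps(12.0))  # 2.0
print(per_image_fps(2.0))    # 12.0
```

A multi-view method that takes all six images as one sample would be reported per sample directly, with no division.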
With the current docker, there is a problem between versions of the libraries:
Reproduction
docker build -t mmdetection3d docker/ --no-cache
Then:
docker run --gpus all --shm-size=8g -it -v /mnt/nas/experiments/3D_restoration:/mmdetection3d/data mmdetection3d
What command or script did you run?
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
File "tools/create_data.py", line 6, in <module>
from tools.data_converter import kitti_converter as kitti
File "/mmdetection3d/tools/data_converter/kitti_converter.py", line 9, in <module>
from mmdet3d.core.bbox import box_np_ops, points_cam2img
File "/mmdetection3d/mmdet3d/__init__.py", line 5, in <module>
import mmseg
File "/opt/conda/lib/python3.7/site-packages/mmseg/__init__.py", line 59, in <module>
f'MMCV=={mmcv.__version__} is used but incompatible. ' \
AssertionError: MMCV==1.3.8 is used but incompatible. Please install mmcv>=(1, 3, 13, 0, 0, 0), <=(1, 4, 0, 0, 0, 0).
However, if I install a newer version of MMCV, a similar error appears for a different library.
So far the workarounds found do not help to solve it.
System:
Ubuntu 20.04
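The assertion compares versions numerically, which is why 1.3.8 fails even though it looks larger than 1.3.13 as a string. A minimal sketch of such a range check follows, assuming plain dotted versions; suffixes such as rc1 or +cu111 are not handled, and the helper names are illustrative rather than the ones mmseg uses.

```python
def parse_version(text):
    """Turn '1.3.13' into (1, 3, 13) so the comparison is numeric."""
    return tuple(int(part) for part in text.split('.'))


def in_range(version, minimum, maximum):
    """True if minimum <= version <= maximum, compared numerically."""
    return parse_version(minimum) <= parse_version(version) <= parse_version(maximum)


print(in_range('1.3.8', '1.3.13', '1.4.0'))   # False
print(in_range('1.3.13', '1.3.13', '1.4.0'))  # True
print('1.3.8' >= '1.3.13')                    # True: string comparison misleads
```

Running the same check against the bounds required by each of mmdet, mmseg and mmdet3d can help find a single MMCV version that satisfies all of them; the environments posted in this thread use MMCV 1.3.13 or 1.4.0.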