chushihyun / mt-detr
[WACV 2023] MT-DETR: Robust End-to-end Multimodal Detection with Confidence Fusion: Official Pytorch Implementation
Hello,
I get an error while running inference with python tools/test.py configs/mt_detr/mt_detr_c+l+r.py checkpoint/model/mt_detr_c+l+r.pth:
Traceback (most recent call last):
File "tools/test.py", line 251, in <module>
main()
File "tools/test.py", line 216, in main
outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
File "/home/user/user/mt_detr/mmdet/apis/test.py", line 36, in single_gpu_test
imgs = tensor2imgs(img_tensor, **img_metas[0]['img_norm_cfg'])
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/image/misc.py", line 32, in tensor2imgs
assert len(mean) == 3
AssertionError
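It seems img_norm_cfg is defined for three modalities, so mean and std each hold nine values (three per modality), whereas tensor2imgs asserts len(mean) == 3. With the Camera Only config, it works fine.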
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53, 123.675, 116.28, 103.53, 123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375, 58.395, 57.12, 57.375, 58.395, 57.12, 57.375],
    to_rgb=False)
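One possible workaround for the visualization step, as a minimal sketch under stated assumptions: the assertion fires inside tensor2imgs, which expects a 3-value mean, so only a 3-value slice of the normalization config could be handed to it. The helper name camera_norm_cfg is hypothetical, and the assumption that the first three channels belong to the camera image is not confirmed by the repository.

```python
def camera_norm_cfg(img_norm_cfg, channels=3):
    """Return a copy of img_norm_cfg limited to the first `channels` values.

    Assumption (not confirmed by the repository): the leading channels of the
    fused input are the camera image.
    """
    return dict(
        mean=list(img_norm_cfg['mean'][:channels]),
        std=list(img_norm_cfg['std'][:channels]),
        to_rgb=img_norm_cfg.get('to_rgb', False),
    )


# The nine-value config quoted in the post.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53] * 3,
    std=[58.395, 57.12, 57.375] * 3,
    to_rgb=False,
)

cam_cfg = camera_norm_cfg(img_norm_cfg)
print(cam_cfg)
```

The image tensor passed to tensor2imgs would need the same slicing (for example img_tensor[:, :3]), again assuming the camera channels come first.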
I tried the same values as the Camera Only config instead, but encountered the error below. Code later in the pipeline evidently expects the nine-value layout above, so changing it does not help.
Traceback (most recent call last):
File "tools/test.py", line 251, in <module>
main()
File "tools/test.py", line 216, in main
outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
File "/home/user/user/mt_detr/mmdet/apis/test.py", line 25, in single_gpu_test
for i, data in enumerate(data_loader):
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/user/user/mt_detr/mmdet/datasets/custom.py", line 192, in __getitem__
return self.prepare_test_img(idx)
File "/home/user/user/mt_detr/mmdet/datasets/custom.py", line 235, in prepare_test_img
return self.pipeline(results)
File "/home/user/user/mt_detr/mmdet/datasets/pipelines/compose.py", line 40, in __call__
data = t(data)
File "/home/user/user/mt_detr/mmdet/datasets/pipelines/test_time_aug.py", line 106, in __call__
data = self.transforms(_results)
File "/home/user/user/mt_detr/mmdet/datasets/pipelines/compose.py", line 40, in __call__
data = t(data)
File "/home/user/user/mt_detr/mmdet/datasets/pipelines/transforms.py", line 665, in __call__
results[key] = mmcv.imnormalize(results[key], self.mean, self.std,
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/image/photometric.py", line 22, in imnormalize
return imnormalize_(img, mean, std, to_rgb)
File "/home/user/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/image/photometric.py", line 42, in imnormalize_
cv2.cvtColor(img, cv2.COLOR_BGR2RGB, img) # inplace
cv2.error: OpenCV(4.8.1) /io/opencv/modules/imgproc/src/color.simd_helpers.hpp:92: error: (-2:Unspecified error) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<3, 4>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0, 2, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::<unnamed>::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'
> Invalid number of channels in input image:
> 'VScn::contains(scn)'
> where
> 'scn' is 9
Any help would be appreciated.
Thanks,
K
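A possible reading of the second traceback, offered as a sketch rather than a confirmed diagnosis: the failing frame is the cv2.cvtColor call inside imnormalize_, and the OpenCV message says the input has 9 channels where only 3 or 4 are accepted. That call is only reached when to_rgb is enabled, so a config with to_rgb=True cannot be applied to a nine-channel input. The helper check_norm_cfg below is hypothetical and only illustrates checking a config against the channel count before running the pipeline.

```python
def check_norm_cfg(num_channels, mean, std, to_rgb):
    """Raise ValueError if a normalization config cannot fit the image.

    Hypothetical helper: it mirrors the two constraints visible in the
    tracebacks, it is not part of mmcv or MT-DETR.
    """
    if to_rgb and num_channels not in (3, 4):
        # The BGR->RGB conversion only accepts 3- or 4-channel images.
        raise ValueError(
            f'to_rgb=True needs 3 or 4 channels, got {num_channels}')
    if len(mean) != num_channels or len(std) != num_channels:
        raise ValueError(
            f'mean/std need {num_channels} values, '
            f'got {len(mean)} and {len(std)}')
    return True


rgb_mean = [123.675, 116.28, 103.53]
rgb_std = [58.395, 57.12, 57.375]

# Nine channels with the nine-value config and to_rgb=False pass the check.
check_norm_cfg(9, rgb_mean * 3, rgb_std * 3, to_rgb=False)

# Nine channels with to_rgb=True are rejected, as in the traceback.
try:
    check_norm_cfg(9, rgb_mean * 3, rgb_std * 3, to_rgb=True)
except ValueError as err:
    print(err)
```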
Hello,
While trying to train a model, I am getting the below registry error. Any help would be highly appreciated.
Thanks,
K
Update:
This issue only appeared after installing apex from NVIDIA. I followed these steps to install apex.
Without apex, it works perfectly fine.
2023-10-28 13:26:22,048 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
GPU 0: Tesla V100-PCIE-16GB
CUDA_HOME: /usr/local/cuda-10.2
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (GCC) 10.1.0
PyTorch: 1.10.0+cu102
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.0+cu102
OpenCV: 4.8.1
MMCV: 1.3.17
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMDetection: 2.14.0+b76d7cd
------------------------------------------------------------
2023-10-28 13:26:24,328 - mmdet - INFO - Distributed training: False
2023-10-28 13:26:26,626 - mmdet - INFO - Config:
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='AutoAugment',
policies=[[{
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'keep_ratio':
True
}],
[{
'type': 'Resize',
'img_scale': [(400, 4200), (500, 4200), (600, 4200)],
'multiscale_mode': 'value',
'keep_ratio': True
}, {
'type': 'RandomCrop',
'crop_type': 'absolute_range',
'crop_size': (384, 600),
'allow_negative_crop': True
}, {
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'override':
True,
'keep_ratio':
True
}]]),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=1),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=1),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=1,
workers_per_gpu=8,
train=dict(
type='CocoDataset',
ann_file='data/coco_annotation/train_clear_simple.json',
img_prefix='data/cam_stereo_left_lut/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='AutoAugment',
policies=[[{
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
'multiscale_mode':
'value',
'keep_ratio':
True
}],
[{
'type': 'Resize',
'img_scale': [(400, 4200), (500, 4200),
(600, 4200)],
'multiscale_mode': 'value',
'keep_ratio': True
}, {
'type': 'RandomCrop',
'crop_type': 'absolute_range',
'crop_size': (384, 600),
'allow_negative_crop': True
}, {
'type':
'Resize',
'img_scale': [(480, 1333), (512, 1333),
(544, 1333), (576, 1333),
(608, 1333), (640, 1333),
(672, 1333), (704, 1333),
(736, 1333), (768, 1333),
(800, 1333)],
'multiscale_mode':
'value',
'override':
True,
'keep_ratio':
True
}]]),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=1),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
],
filter_empty_gt=False,
classes=('Vehicle', 'Pedestrian')),
val=dict(
type='CocoDataset',
ann_file='data/coco_annotation/val_clear_simple.json',
img_prefix='data/cam_stereo_left_lut/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=1),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
],
classes=('Vehicle', 'Pedestrian')),
test=dict(
type='CocoDataset',
ann_file='data/coco_annotation/test_clear_simple.json',
img_prefix='data/cam_stereo_left_lut/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=1),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
],
classes=('Vehicle', 'Pedestrian')))
evaluation = dict(interval=1, metric='bbox')
checkpoint_config = dict(interval=1)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
model = dict(
type='DeformableDETR',
backbone=dict(
type='CameraOnly',
net1='ConvNeXt',
net2='ResNet',
net3='ResNet',
args1=dict(
in_chans=3,
depths=[3, 3, 27, 3],
dims=[128, 256, 512, 1024],
drop_path_rate=0.7,
layer_scale_init_value=1.0,
out_indices=(1, 2, 3),
pretrained='checkpoint/convnext_base_22k_1k_384.pth'),
args2=dict(
depth=50,
num_stages=4,
base_channels=1,
out_indices=(1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(
type='Pretrained', checkpoint='torchvision://resnet50')),
args3=dict(
depth=50,
num_stages=4,
base_channels=1,
out_indices=(1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(
type='Pretrained', checkpoint='torchvision://resnet50'))),
neck=dict(
type='ChannelMapper',
in_channels=[256, 512, 1024],
kernel_size=1,
out_channels=256,
act_cfg=None,
norm_cfg=dict(type='GN', num_groups=32),
num_outs=4),
bbox_head=dict(
type='DeformableDETRHead',
num_query=300,
num_classes=2,
in_channels=2048,
sync_cls_avg_factor=True,
as_two_stage=True,
transformer=dict(
type='DeformableDetrTransformer',
encoder=dict(
type='DetrTransformerEncoder',
num_layers=6,
transformerlayers=dict(
type='BaseTransformerLayer',
attn_cfgs=dict(
type='MultiScaleDeformableAttention', embed_dims=256),
feedforward_channels=1024,
ffn_dropout=0.1,
operation_order=('self_attn', 'norm', 'ffn', 'norm'))),
decoder=dict(
type='DeformableDetrTransformerDecoder',
num_layers=6,
return_intermediate=True,
transformerlayers=dict(
type='DetrTransformerDecoderLayer',
attn_cfgs=[
dict(
type='MultiheadAttention',
embed_dims=256,
num_heads=8,
dropout=0.1),
dict(
type='MultiScaleDeformableAttention',
embed_dims=256)
],
feedforward_channels=1024,
ffn_dropout=0.1,
operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
'ffn', 'norm')))),
positional_encoding=dict(
type='SinePositionalEncoding',
num_feats=128,
normalize=True,
offset=-0.5),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=2.0),
loss_bbox=dict(type='L1Loss', loss_weight=5.0),
loss_iou=dict(type='GIoULoss', loss_weight=2.0),
with_box_refine=True),
train_cfg=dict(
assigner=dict(
type='HungarianAssigner',
cls_cost=dict(type='FocalLossCost', weight=2.0),
reg_cost=dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'),
iou_cost=dict(type='IoUCost', iou_mode='giou', weight=2.0))),
test_cfg=dict(max_per_img=100))
optimizer = dict(
constructor='LearningRateDecayOptimizerConstructor',
type='AdamW',
lr=0.0001,
betas=(0.9, 0.999),
weight_decay=0.05,
paramwise_cfg=dict(
decay_rate=0.7, decay_type='layer_wise_multi', num_layers=12))
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))
lr_config = dict(policy='step', step=[27, 33])
runner = dict(type='EpochBasedRunner', max_epochs=36)
classes = ('Vehicle', 'Pedestrian')
work_dir = '/home/username/username/link_scratch_dir/username/model_weights/mt_detr_weights/work_dirs/camera_only_single_gpu'
gpu_ids = range(0, 1)
Traceback (most recent call last):
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmdet/models/detectors/deformable_detr.py", line 9, in __init__
super(DETR, self).__init__(*args, **kwargs)
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmdet/models/detectors/single_stage.py", line 31, in __init__
self.backbone = build_backbone(backbone)
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmdet/models/builder.py", line 19, in build_backbone
return BACKBONES.build(cfg)
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/utils/registry.py", line 212, in build
return self.build_func(*args, **kwargs, registry=self)
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
raise KeyError(
KeyError: 'CameraOnly is not in the models registry'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tools/train.py", line 188, in <module>
main()
File "tools/train.py", line 158, in main
model = build_detector(
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmdet/models/builder.py", line 57, in build_detector
return DETECTORS.build(
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/utils/registry.py", line 212, in build
return self.build_func(*args, **kwargs, registry=self)
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/home/username/anaconda3/envs/mt_detr/lib/python3.8/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "DeformableDETR: 'CameraOnly is not in the models registry'"
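One thing the traceback suggests, stated as an assumption rather than a confirmed cause: every mmdet frame above resolves under site-packages, not under a repository checkout, so a separately installed mmdet may be the one being imported, and it would not register the CameraOnly backbone. A minimal sketch for checking where a package is imported from, using only the standard library (the helper names package_location and is_inside are hypothetical):

```python
import importlib.util
from pathlib import Path


def package_location(name):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_inside(path, root):
    """Return True if `path` lies under the directory `root`."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


# Run from the repository root; 'mmdet' is the package named in the traceback.
location = package_location('mmdet')
if location is None:
    print('mmdet is not importable from this environment')
else:
    print('mmdet resolves to', location)
    print('inside current checkout:', is_inside(location, Path.cwd()))
```

If the location is outside the checkout, the custom backbones of the fork may simply not be the code that gets imported.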
Any news on when pretrained models and code will be released?
Hello, thank you for your excellent work! I'm having trouble with multi-GPU training. Since your experiments were conducted on a single Nvidia A6000 GPU, I'm wondering whether you have tried parallel training on multiple GPUs, or whether you would be kind enough to help me solve this problem. Thanks!
Hello,
Do we need apex installed for MT-DETR? I keep getting the "apex is not installed" message in the terminal.
I tried installing it with pip install -v --no-cache-dir ., but it messes up my environment. I suspect issue #2 is caused solely by the apex installation.
PyTorch: 1.10.0+cu102
Python: 3.8.18
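A message like this is commonly printed by an optional-import guard, where the code continues without the package. Whether MT-DETR uses exactly this pattern is an assumption; the sketch below only illustrates the idea, and the helper name optional_import is hypothetical.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


apex = optional_import('apex')
if apex is None:
    # Training can still proceed as long as no apex-specific feature is used.
    print('apex is not installed')
```

Under that assumption, the message is informational and apex would only be needed for features that explicitly depend on it.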