
imted's Introduction

imTED: Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection


Code for our ICCV 2023 paper Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection. A blog post in Chinese is available here.

The code is based on mmdetection; please refer to get_started.md and MMDET_README.md to set up the environment and prepare the data.

Config Files, Performance, and Trained Weights

We provide 9 configuration files in the configs directory.

| Config File | Backbone | Epochs | Box AP | Mask AP | Download |
| --- | --- | --- | --- | --- | --- |
| imted_faster_rcnn_vit_small_3x_coco | ViT-S | 36 | 48.2 | - | model |
| imted_faster_rcnn_vit_base_3x_coco | ViT-B | 36 | 52.9 | - | model |
| imted_faster_rcnn_vit_large_3x_coco | ViT-L | 36 | 55.4 | - | model |
| imted_mask_rcnn_vit_small_3x_coco | ViT-S | 36 | 48.7 | 42.7 | model |
| imted_mask_rcnn_vit_base_3x_coco | ViT-B | 36 | 53.3 | 46.4 | model |
| imted_mask_rcnn_vit_large_3x_coco | ViT-L | 36 | 55.5 | 48.1 | model |
| imted_faster_rcnn_vit_base_2x_base_training_coco | ViT-B | 24 | 50.6 | - | model |
| imted_faster_rcnn_vit_base_2x_finetuning_10shot_coco | ViT-B | 108 | 23.0 | - | model |
| imted_faster_rcnn_vit_base_2x_finetuning_30shot_coco | ViT-B | 108 | 30.4 | - | model |

MAE Pre-training

The pre-trained models are trained with the official MAE code. For ViT-S, we use a 4-layer decoder with dimension 256 for 800 epochs of pre-training. For ViT-B and ViT-L, we use an 8-layer decoder with dimension 512 for 1600 epochs of pre-training; their pre-trained weights can be downloaded from the official MAE weights.
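For reference, here is a minimal sketch of how a ViT-S MAE model with such a 4-layer, 256-dimensional decoder could be defined with the MaskedAutoencoderViT class from the official MAE repository; the decoder head count and the import path are assumptions made for illustration, not part of this repository.

from functools import partial

import torch.nn as nn
from models_mae import MaskedAutoencoderViT  # models_mae.py from the official MAE repo (assumed to be on the path)

# Sketch only: ViT-S encoder paired with a 4-layer, dim-256 decoder for pre-training.
mae_vit_small_patch16 = MaskedAutoencoderViT(
    patch_size=16, embed_dim=384, depth=12, num_heads=6,           # standard ViT-S encoder
    decoder_embed_dim=256, decoder_depth=4, decoder_num_heads=8,   # decoder head count is an assumption
    mlp_ratio=4, norm_layer=partial(nn.LayerNorm, eps=1e-6))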

Last Step of Preparation

For all experiments, remember to modify the path of pre-trained weights in the configuration files, e.g. configs/imted/imted_faster_rcnn_vit_small_3x_coco.py.
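As an illustration, the pre-trained weight path is a top-level field in the config; the filename below is a placeholder for your downloaded MAE checkpoint, not a file shipped with this repository.

# Illustrative excerpt of configs/imted/imted_faster_rcnn_vit_small_3x_coco.py
pretrained = 'path/to/mae_pretrain_vit_small.pth'  # point this at your MAE pre-trained weights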

For few-shot experiments, please refer to FsDet for data preparation. Remember to modify the path of the JSON annotation files in the configuration files, e.g. configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_base_training_coco.py. The JSON files used for few-shot training and evaluation can also be downloaded from here.
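A minimal sketch of where those paths live, following the standard mmdetection data dict; all paths below are placeholders.

# Illustrative excerpt of a few-shot config; replace the placeholders with your annotation files.
data = dict(
    train=dict(ann_file='path/to/few_shot_train.json'),
    val=dict(ann_file='path/to/val.json'),
    test=dict(ann_file='path/to/test.json'))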

Evaluating with 1 GPU

tools/dist_test.sh "path/to/config/file.py" "path/to/trained/weights.pth" 1 --eval bbox
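For Mask R-CNN configs, both box and mask metrics can be requested in the standard mmdetection way, for example:

tools/dist_test.sh "path/to/config/file.py" "path/to/trained/weights.pth" 1 --eval bbox segm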

Training with 8 GPUs

tools/dist_train.sh "path/to/config/file.py" 8 
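Training can also be launched on a single GPU without the distributed wrapper, e.g.:

python tools/train.py "path/to/config/file.py"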

Few-shot Training with 8 GPUs

Base Training

tools/dist_train.sh configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_base_training_coco.py 8 

Finetuning

Replace the checkpoint path with the path to your own checkpoint from base training, or just use our provided checkpoint here.
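A minimal sketch of this setting, assuming the finetuning config picks up the base-training weights through mmdetection's load_from field; the path is a placeholder.

# Illustrative excerpt of configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_finetuning_30shot_coco.py
load_from = 'path/to/base_training_checkpoint.pth'  # your base-training result or the provided checkpoint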

tools/dist_train.sh configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_finetuning_30shot_coco.py 8 

Acknowledgement

This project is based on MAE, mmdetection, and timm. Thanks for their wonderful work.

Some works based on imTED

Citation

If you find imTED useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@inproceedings{liu2023integrally,
  title={Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection},
  author={Liu, Feng and Zhang, Xiaosong and Peng, Zhiliang and Guo, Zonghao and Wan, Fang and Ji, Xiangyang and Ye, Qixiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6825--6834},
  year={2023}
}

imted's People

Contributors

liewfeng


imted's Issues

A question about the few-shot finetuning configuration regarding the FsDet preparation: "Which annotation file to provide is not clear"

I have a custom dataset with the following summary:

| class  | images | boxes  |
| ------ | ------ | ------ |
| car    | 8144   | 188839 |
| person | 9228   | 139964 |
| Total  | 9946   | 328803 |

There are 2 base classes in my dataset. Later I prepared another class with 355 images and ~2000 boxes and merged it into my main dataset. Then I prepared my split using prepare_coco_few_shot.py, and it has exactly the format described, for my 3 classes.

Training:
I trained my model without the 3rd (novel) class [i.e., I had 2 classes during training] using configs/imted/imted_faster_rcnn_vit_base_mae_3x_coco.py and obtained a pretrained model; let's call it X.
I then proceeded to finetune the model on the few-shot task for the novel class. I created a configuration file containing all 3 classes, with the novel one last. I am confused about how to fill in the part below in the few-shot configuration file:

pretrained = "path/to/the/X.pth"
.....

classes =  ('truck', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 
'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 
'tennis racket', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 
'pizza', 'donut', 'cake', 'bed', 'toilet', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 
'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'boat', 
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'bottle', 'chair', 'couch', 'potted plant', 'dining table', 'tv')
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        pipeline=train_pipeline,
        classes = classes, 
        ann_file='path/of/annotations/json'),
    val=dict(
        classes = classes, 
        ann_file='path/of/annotations/json'),
    test=dict(
        classes = classes, 
        ann_file='path/of/annotations/json'))

At first, I used the configuration below:

classes =  ('person', 'car', 'mynovel')
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        pipeline=train_pipeline,
        classes = classes, 
        ann_file= 'data/cocosplit/seed1/full_box_30shot_mynovel_trainval.json',
        img_prefix='data/merged_dataset/all/'), 
    val=dict(
        classes = classes, 
        ann_file='data/merged_dataset/annotations/fsod_test.json',
        img_prefix='data/merged_dataset/all/'), 
    test=dict(
        classes = classes, 
        ann_file= 'data/merged_dataset/annotations/fsod_test.json', 
        img_prefix='data/merged_dataset/all/'))  

However, the results I am getting at epoch 102 are:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.004
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.003
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.043
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.043
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.043
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.003
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.030
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.120

Note that my base classes (car and person) got around 0.724 AP on the above metrics.

Also, I left load_from = None, as I didn't know what to put there; I only have one pretrained model, and it is the one I am already loading.

I also considered training with the fsod_train split, but it contains more than 30 shots for the novel class. I checked the original trainvalno5k.json and it is the same.

To summarize, I am having problems writing the configuration file for the few-shot setting. I think how you use datasplit/{trainvalno5k, 5k}.json and the novel shots (the JSON files under cocosplit/seeds) is ambiguous, and I would be very happy if you could shed some light on this topic.

Weights of ViT-S

Hello, I am trying to reproduce your work; however, I can't find the weights for the ViT-S model. The weights for ViT-B and ViT-L are available, but not for ViT-S.

Would it be possible to provide a link to download the ViT-S weights?

Thanks

route

[screenshot attachment]

May I ask what file path should be placed here?

I got an error when I used the command to evaluate

I got an error when I used the following command to evaluate:

bash ./tools/dist_test.sh work_dirs/imted_faster_rcnn_vit_base_2x_finetuning_CID/imted_faster_rcnn_vit_base_2x_finetuning_10shot_coco.py work_dirs/imted_faster_rcnn_vit_base_2x_finetuning_CID/epoch_18.pth 1 --eval bbox

[screenshot of the error attached]

Problem when finetuning few-shot model

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug
Apparently, the LayerDecayOptimizerConstructorBackboneFronzen is not in the optimizer builder registry. Where can I find it?

Reproduction

I use the following command

python tools/train.py configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_finetuning_10shot_coco.py

Error traceback

KeyError: 'LayerDecayOptimizerConstructorBackboneFronzen is not in the optimizer builder registry'

I tried to search for LayerDecayOptimizerConstructorBackboneFronzen in the code but I can't find it. Please tell me where I can find the code associated with this optimizer constructor.

Thanks.

json

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        pipeline=train_pipeline,
        classes=classes,
        ann_file='path/of/annotations/json'),
    val=dict(
        classes=classes,
        ann_file='path/of/annotations/json'),
    test=dict(
        classes=classes,
        ann_file='path/of/annotations/json'))

Hello, I would like to ask how the JSON annotation file needs to be configured. Since I changed the path to full_box_10shot_frisbee_trainval.json, I keep getting the following error:

2023-10-29 15:52:05,608 - mmdet - INFO - load model from: pre/mae_pretrain_vit_base_full.pth
2023-10-29 15:52:05,954 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: mask_token, decoder_pos_embed, norm.weight, norm.bias, decoder_embed.weight, decoder_embed.bias, decoder_blocks.0.norm1.weight, decoder_blocks.0.norm1.bias, decoder_blocks.0.attn.qkv.weight, decoder_blocks.0.attn.qkv.bias, decoder_blocks.0.attn.proj.weight, decoder_blocks.0.attn.proj.bias, decoder_blocks.0.norm2.weight, decoder_blocks.0.norm2.bias, decoder_blocks.0.mlp.fc1.weight, decoder_blocks.0.mlp.fc1.bias, decoder_blocks.0.mlp.fc2.weight, decoder_blocks.0.mlp.fc2.bias, decoder_blocks.1.norm1.weight, decoder_blocks.1.norm1.bias, decoder_blocks.1.attn.qkv.weight, decoder_blocks.1.attn.qkv.bias, decoder_blocks.1.attn.proj.weight, decoder_blocks.1.attn.proj.bias, decoder_blocks.1.norm2.weight, decoder_blocks.1.norm2.bias, decoder_blocks.1.mlp.fc1.weight, decoder_blocks.1.mlp.fc1.bias, decoder_blocks.1.mlp.fc2.weight, decoder_blocks.1.mlp.fc2.bias, decoder_blocks.2.norm1.weight, decoder_blocks.2.norm1.bias, decoder_blocks.2.attn.qkv.weight, decoder_blocks.2.attn.qkv.bias, decoder_blocks.2.attn.proj.weight, decoder_blocks.2.attn.proj.bias, decoder_blocks.2.norm2.weight, decoder_blocks.2.norm2.bias, decoder_blocks.2.mlp.fc1.weight, decoder_blocks.2.mlp.fc1.bias, decoder_blocks.2.mlp.fc2.weight, decoder_blocks.2.mlp.fc2.bias, decoder_blocks.3.norm1.weight, decoder_blocks.3.norm1.bias, decoder_blocks.3.attn.qkv.weight, decoder_blocks.3.attn.qkv.bias, decoder_blocks.3.attn.proj.weight, decoder_blocks.3.attn.proj.bias, decoder_blocks.3.norm2.weight, decoder_blocks.3.norm2.bias, decoder_blocks.3.mlp.fc1.weight, decoder_blocks.3.mlp.fc1.bias, decoder_blocks.3.mlp.fc2.weight, decoder_blocks.3.mlp.fc2.bias, decoder_blocks.4.norm1.weight, decoder_blocks.4.norm1.bias, decoder_blocks.4.attn.qkv.weight, decoder_blocks.4.attn.qkv.bias, decoder_blocks.4.attn.proj.weight, decoder_blocks.4.attn.proj.bias, decoder_blocks.4.norm2.weight, decoder_blocks.4.norm2.bias, decoder_blocks.4.mlp.fc1.weight, decoder_blocks.4.mlp.fc1.bias, decoder_blocks.4.mlp.fc2.weight, decoder_blocks.4.mlp.fc2.bias, decoder_blocks.5.norm1.weight, decoder_blocks.5.norm1.bias, decoder_blocks.5.attn.qkv.weight, decoder_blocks.5.attn.qkv.bias, decoder_blocks.5.attn.proj.weight, decoder_blocks.5.attn.proj.bias, decoder_blocks.5.norm2.weight, decoder_blocks.5.norm2.bias, decoder_blocks.5.mlp.fc1.weight, decoder_blocks.5.mlp.fc1.bias, decoder_blocks.5.mlp.fc2.weight, decoder_blocks.5.mlp.fc2.bias, decoder_blocks.6.norm1.weight, decoder_blocks.6.norm1.bias, decoder_blocks.6.attn.qkv.weight, decoder_blocks.6.attn.qkv.bias, decoder_blocks.6.attn.proj.weight, decoder_blocks.6.attn.proj.bias, decoder_blocks.6.norm2.weight, decoder_blocks.6.norm2.bias, decoder_blocks.6.mlp.fc1.weight, decoder_blocks.6.mlp.fc1.bias, decoder_blocks.6.mlp.fc2.weight, decoder_blocks.6.mlp.fc2.bias, decoder_blocks.7.norm1.weight, decoder_blocks.7.norm1.bias, decoder_blocks.7.attn.qkv.weight, decoder_blocks.7.attn.qkv.bias, decoder_blocks.7.attn.proj.weight, decoder_blocks.7.attn.proj.bias, decoder_blocks.7.norm2.weight, decoder_blocks.7.norm2.bias, decoder_blocks.7.mlp.fc1.weight, decoder_blocks.7.mlp.fc1.bias, decoder_blocks.7.mlp.fc2.weight, decoder_blocks.7.mlp.fc2.bias, decoder_norm.weight, decoder_norm.bias, decoder_pred.weight, decoder_pred.bias

missing keys in source state_dict: fpn1.0.weight, fpn1.0.bias, fpn1.1.weight, fpn1.1.bias, fpn1.1.running_mean, fpn1.1.running_var, fpn1.3.weight, fpn1.3.bias, fpn2.0.weight, fpn2.0.bias

2023-10-29 15:52:05,979 - mmdet - INFO - loading checkpoint for <class 'mmdet.models.roi_heads.bbox_heads.mae_bbox_head.MAEBBoxHead'>
Use load_from_local loader
2023-10-29 15:52:06,150 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: cls_token, mask_token, decoder_norm.weight, decoder_norm.bias, decoder_pred.weight, decoder_pred.bias

missing keys in source state_dict: fc_cls.weight, fc_cls.bias, fc_reg.weight, fc_reg.bias, decoder_box_norm.weight, decoder_box_norm.bias

Use load_from_local loader
Traceback (most recent call last):
File "./tools/test.py", line 220, in
main()
File "./tools/test.py", line 177, in main
checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 513, in load_checkpoint
checkpoint = _load_checkpoint(filename, map_location, logger)
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 451, in _load_checkpoint
return CheckpointLoader.load_checkpoint(filename, map_location, logger)
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 244, in load_checkpoint
return checkpoint_loader(filename, map_location)
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 261, in load_from_local
checkpoint = torch.load(filename, map_location=map_location)
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/serialization.py", line 585, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/serialization.py", line 755, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xe2'.
Traceback (most recent call last):
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in
main()
File "/root/miniconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/open-mmlab/bin/python', '-u', './tools/test.py', '--local_rank=0', 'configs/imted/few_shot/imted_faster_rcnn_vit_base_2x_finetuning_10shot_coco.py', 'work_dirs/imted_faster_rcnn_vit_base_2x_finetuning_10shot_coco/mae_vit_small_800e.pth', '--launcher', 'pytorch', '--eval', 'bbox']' returned non-zero exit status 1.

Codes for the baseline (ViT based Faster R-CNN)

Hi, many thanks for releasing the code for imTED.

I am very interested in the comparison between the ViT-based Faster R-CNN baseline and imTED; however, I did not find the code for the baseline. Could you kindly provide the code related to the baseline, e.g., the model and the training configuration? Thanks for your time and consideration!

About the environment?

Thank you for your work. I would like to know the environment for this project, such as the mmcv version; it does not seem to work with newer versions.

ValueError: imTED: checkpoint path pre/Weights/ is invalid

Hello, I would like to ask about the following. I provided the MAE pre-training weights, but it still gives an error. Could you help me understand it?

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/dfs/data/imTED/mmdet/models/detectors/imted.py", line 20, in init
super(imTED, self).init(
File "/dfs/data/imTED/mmdet/models/detectors/two_stage.py", line 48, in init
self.init_weights(pretrained=pretrained)
File "/dfs/data/imTED/mmdet/models/detectors/two_stage.py", line 68, in init_weights
self.backbone.init_weights(pretrained=pretrained)
File "/dfs/data/imTED/mmdet/models/backbones/vision_transformer.py", line 132, in init_weights
raise ValueError(f"checkpoint path {pretrained} is invalid")
ValueError: checkpoint path pre/Weights/ is invalid

question about load checkpoint

Hello, thanks for your excellent work! I met an issue when loading weights: there are many unexpected keys, as shown in the attached screenshot. Is this normal?

mmcv

May I ask which version of mmcv needs to be installed? When I install 2.0.0, I cannot use Config and DictAction, nor mmcv.parallel and mmcv.runner.

Did you try other shot settings, like 20-shot?

Hello, is there any difference between the config files for 10-shot and 30-shot? If I want to test the effect at 20-shot, what should I change?

How to feed the RoI features after the MFM into the MAE decoder?

I am trying to migrate this project (imTED) to mmdetection 3.x. However, I have some doubts about how to send the RoI features from the Multi-scale Feature Modulator to the MAE decoder.
I found that the bbox_feats have a shape like torch.Size([512, 384, 7, 7]), but self.bbox_head, i.e., the MAE decoder, requires inputs like x.shape = [b_s, dim, W, H]. How can I solve this?

def _bbox_forward(self, x, rois):
        """Box head forward function used in both training and testing."""
        # TODO: a more flexible way to decide which feature maps to use
        if self.with_mfm:
            ss_bbox_feats = self.ss_bbox_roi_extractor(
                [x[-1]], rois)
            x = [self.mfm_fc(x[i]) for i in range(self.ms_bbox_roi_extractor.num_inputs)]
            ms_bbox_feats = self.ms_bbox_roi_extractor(
                x[:self.ms_bbox_roi_extractor.num_inputs], rois) # multi scale

            factor = self.mfm_factor.reshape(1, -1, 1, 1).expand_as(ms_bbox_feats)
            bbox_feats = ss_bbox_feats + ms_bbox_feats * factor
        else:
            bbox_feats = self.bbox_roi_extractor(
                x[:self.bbox_roi_extractor.num_inputs], rois)
        if self.with_shared_head:
            bbox_feats = self.shared_head(bbox_feats)
        cls_score, bbox_pred = self.bbox_head(bbox_feats)
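As background on the shape question above, RoI features of shape [num_rois, dim, H, W] are conventionally flattened into a token sequence before entering a transformer-style head. The snippet below only illustrates that standard reshape with the shapes from the question; it is not necessarily how imTED's MAEBBoxHead handles it.

import torch

# Hypothetical shapes matching the question: 512 RoIs, 384 channels, 7x7 pooled features.
bbox_feats = torch.randn(512, 384, 7, 7)        # [num_rois, dim, H, W]
tokens = bbox_feats.flatten(2).transpose(1, 2)  # [num_rois, H*W, dim] = [512, 49, 384]
# Each RoI now provides a 49-token sequence that a transformer decoder can consume.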
