
CDN

Code for our NeurIPS 2021 paper "Mining the Benefits of Two-stage and One-stage HOI Detection".

Contributed by Aixi Zhang*, Yue Liao*, Si Liu, Miao Lu, Yongliang Wang, Chen Gao and Xiaobo Li.

Installation

Install the dependencies.

pip install -r requirements.txt
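
If you want a quick sanity check of the environment before moving on, a minimal snippet such as the following (not part of the repository, and assuming only that torch and torchvision were installed by requirements.txt) prints the versions that several issues below ask about:

import torch
import torchvision

# Report the installed versions and whether a CUDA device is visible.
print('torch:', torch.__version__)
print('torchvision:', torchvision.__version__)
print('CUDA available:', torch.cuda.is_available())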

Data preparation

HICO-DET

The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of using the original annotation files, we use the annotation files provided by the PPDM authors. The annotation files can be downloaded from here and have to be placed as follows.

data
 └─ hico_20160224_det
     |─ annotations
     |   |─ trainval_hico.json
     |   |─ test_hico.json
     |   └─ corre_hico.npy
     :
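
To confirm that the annotation files ended up in the expected location, a small check like the one below can help. This is only an illustrative sketch; it assumes the two JSON files parse with the standard json module and only reports basic sizes.

import json
import numpy as np

ann_dir = 'data/hico_20160224_det/annotations'
with open(f'{ann_dir}/trainval_hico.json') as f:
    trainval = json.load(f)
with open(f'{ann_dir}/test_hico.json') as f:
    test = json.load(f)
corre = np.load(f'{ann_dir}/corre_hico.npy')

# Print basic statistics to confirm the files were found and parsed.
print('trainval entries:', len(trainval))
print('test entries:', len(test))
print('corre_hico shape:', corre.shape)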

V-COCO

First clone the V-COCO repository from here and follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and create the directories as follows.

CDN
 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be conducted as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion, because vsrl_utils.py in the v-coco repository raises an error with Python 3.

The V-COCO annotations in the HOIA format, corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json, will be generated in the annotations directory.
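
A quick way to confirm that the conversion succeeded is to check that the three generated files exist, for example with the following sketch (not part of the repository):

from pathlib import Path

ann_dir = Path('data/v-coco/annotations')
for name in ('trainval_vcoco.json', 'test_vcoco.json', 'corre_vcoco.npy'):
    # Report which of the expected output files are present.
    status = 'found' if (ann_dir / name).exists() else 'MISSING'
    print(name, status)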

Pre-trained model

Download the pretrained DETR detector for ResNet50, put it in the params directory, and convert its parameters as follows.

python convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2stage-q64.pth \
        --num_queries 64

python convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2stage.pth \
        --dataset vcoco
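
If you want to verify that a converted checkpoint carries the expected number of queries, something like the sketch below can be used. It assumes the converted file keeps DETR's state-dict layout, i.e. a 'model' dictionary containing a query_embed.weight tensor; if the layout differs, adapt the keys accordingly.

import torch

ckpt = torch.load('params/detr-r50-pre-2stage-q64.pth', map_location='cpu')
state_dict = ckpt['model']

# The first dimension of the query embedding should match --num_queries (64 here).
print('query_embed.weight shape:', tuple(state_dict['query_embed.weight'].shape))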

Training

After the preparation, you can start training with the following commands. The whole training is split into two steps: CDN base model training and dynamic re-weighting training. The training of CDN-S for HICO-DET and V-COCO is shown below.

HICO-DET

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre-2stage-q64.pth \
        --output_dir logs \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 90 \
        --lr_drop 60 \
        --use_nms_filter

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained logs/checkpoint_last.pth \
        --output_dir logs/ \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 10 \
        --freeze_mode 1 \
        --obj_reweight \
        --verb_reweight \
        --lr 1e-5 \
        --lr_backbone 1e-6 \
        --use_nms_filter

V-COCO

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre-2stage.pth \
        --output_dir logs \
        --dataset_file vcoco \
        --hoi_path data/v-coco \
        --num_obj_classes 81 \
        --num_verb_classes 29 \
        --backbone resnet50 \
        --num_queries 100 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 90 \
        --lr_drop 60 \
        --use_nms_filter

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained logs/checkpoint_last.pth \
        --output_dir logs/ \
        --dataset_file vcoco \
        --hoi_path data/v-coco \
        --num_obj_classes 81 \
        --num_verb_classes 29 \
        --backbone resnet50 \
        --num_queries 100 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 10 \
        --freeze_mode 1 \
        --verb_reweight \
        --lr 1e-5 \
        --lr_backbone 1e-6 \
        --use_nms_filter

Evaluation

HICO-DET

You can conduct the evaluation with trained parameters for HICO-DET as follows.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained pretrained/hico_cdn_s.pth \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --eval \
        --use_nms_filter

V-COCO

First, you need to add the following main function to vsrl_eval.py in data/v-coco.

if __name__ == '__main__':
  import sys

  vsrl_annot_file = 'data/vcoco/vcoco_test.json'
  coco_file = 'data/instances_vcoco_all_2014.json'
  split_file = 'data/splits/vcoco_test.ids'

  vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)

  det_file = sys.argv[1]
  vcocoeval._do_eval(det_file, ovr_thresh=0.5)

Next, for the official V-COCO evaluation, a pickle file of detection results has to be generated. You can generate the file and then evaluate it as follows.

python generate_vcoco_official.py \
        --param_path pretrained/vcoco_cdn_s.pth \
        --save_path vcoco.pickle \
        --hoi_path data/v-coco \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --use_nms_filter

cd data/v-coco
python vsrl_eval.py vcoco.pickle
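
If you want to peek at the generated file before running the official evaluation, a tiny sketch like the following (not part of the repository; it only assumes the file unpickles into a standard Python container) can be used:

import pickle

with open('vcoco.pickle', 'rb') as f:
    detections = pickle.load(f)

# Report the container type and its size; the per-entry format is defined by generate_vcoco_official.py.
print(type(detections), len(detections))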

Results

HICO-DET

Model          Full (D)  Rare (D)  Non-rare (D)  Full (KO)  Rare (KO)  Non-rare (KO)  Download
CDN-S (R50)    31.44     27.39     32.64         34.09      29.63      35.42          model
CDN-B (R50)    31.78     27.55     33.05         34.53      29.73      35.96          model
CDN-L (R101)   32.07     27.19     33.53         34.79      29.48      36.38          model

D: Default, KO: Known object

V-COCO

Model          Scenario 1  Scenario 2  Download
CDN-S (R50)    61.68       63.77       model
CDN-B (R50)    62.29       64.42       model
CDN-L (R101)   63.91       65.89       model

Citation

Please consider citing our paper if it helps your research.

@article{zhang2021mining,
  title={Mining the Benefits of Two-stage and One-stage HOI Detection},
  author={Zhang, Aixi and Liao, Yue and Liu, Si and Lu, Miao and Wang, Yongliang and Gao, Chen and Li, Xiaobo},
  journal={arXiv preprint arXiv:2108.05077},
  year={2021}
}

License

CDN is released under the Apache 2.0 license. See LICENSE for additional details.

Acknowledgements

Some of the code is built upon PPDM, DETR, and QPIC. Thanks to them for their great work!


cdn's Issues

The provided model is not the best.

Hi~ the CDN-L (R101) model you provide gives 63.01 (Scenario 1) and 64.45 (Scenario 2), which is not the same as the results of 63.91 and 65.89 in the paper. Could you provide the model with the best performance? Thanks a lot.

dynamic re-weighting causes performance degradation when reproducing

Hi,
thanks for sharing the code! Great work!

I have a small question about reproducing your result.
I ran the CDN-S model (R50, 3+3). It gave a result of about 31.5 or 31.2 (I ran it twice) after the first training stage (training the whole model with the regular loss). But after the second training stage (decoupled training) finished, the performance dropped to 31.0 and 30.4 for these two runs respectively. For full mAP, rare mAP, and non-rare mAP, this trick seems not to be helpful.

So I wonder what could have gone wrong during my reproduction, or what the reason could be. I will paste the commands and log below. Thanks. Nice day :3

How to evaluate with KO mode?

Thanks so much for your work! Would it be possible for you to release the code for evaluating under the KO mode? I am running into trouble with that.

Request for visual.py

Hello,

Thanks for your great work.
I ran your code and got the result.json file.
Now I want to visualize the images and predicted annotations to analyze the error cases.

Can you provide a visualization script?
Thank you!

Confusion about parameter conversion

Hello,
Thanks a lot for your work. As I can see in the line mentioned below, you are renaming the DETR decoder layers to stage2_decoder. However, in your model there is no stage2_decoder. Does that mean you are not initializing the decoders from DETR?

ps['model'][k.replace('decoder', 'stage2_decoder')] = ps['model'][k].clone()

About the HOI-A dataset?

Thanks for the great work. I was wondering whether CDN would work on the HOI-A dataset. Do you have any suggestions if I want to test it? Thanks.

Evaluation of pretrained model on HICO-Det is lower than reported.

Hi, I downloaded your CDN-S/B/L pretrained models and evaluated them locally with your script.
The results are all lower than the reported ones. Specifically:
Model S: mAP 0.3150153102455715, mAP rare 0.27220402283111156, mAP non-rare 0.32780309739534524, mean max recall 0.6390865334740454
Model B: mAP 0.31671305044814135, mAP rare 0.27124143008976015, mAP non-rare 0.3302954825032422, mean max recall 0.6436087461863998
Model L: mAP 0.3196075598370539, mAP rare 0.2724063291553318, mAP non-rare 0.3337066287419839, mean max recall 0.6492344626931886

I'm wondering what caused the inconsistency. What's the version of opencv-python in your environment? I'm using opencv 4.5.1.

corre_vcoco.npy file

Thank you for your good work! Because the pycocotools installation for Python 2.7 is not supported on my computer, I cannot generate the converted file. Please provide the corre_vcoco.npy file. Thanks!

Two questions about the details

Hello, thanks for your research! I have two questions:

  1. CDN gives 100 predicted HOI triplets for every image of V-COCO and then applies PNMS to the top-100 results. After checking your code, I found that you don't set a threshold to filter the final results. So the final number of predicted HOI triplets may be 40 or 50 at least, but the ground truth for one image has no more than 10 HOI triplets. How do you deal with this?

  2. In your paper, you do some useful research about human-object pair generation (using the HO-pair decoder to replace the Faster R-CNN in the iCAN baseline). Recently I have also wanted to run the same experiment, but I'm confused about the details. In CDN, we have 100 predictions for every image, so we can apply the Hungarian algorithm to match the predictions and labels for a batch. But if we use the HO-pair decoder to replace Faster R-CNN in iCAN and then set a threshold to generate human-object pairs, then one image may generate k1 HO pairs while another image generates k2 HO pairs (k1 is not equal to k2). Under these circumstances, should we apply the Hungarian algorithm to match predictions and labels for every image separately?

Sorry for the long and complex questions.

ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops'

A small tip, in case future researchers run into a similar issue.

My torchvision.__version__ is "0.12.0+cu113", and this line

if float(torchvision.__version__[:3]) < 0.7:

should be modified as:

# if float(torchvision.__version__[:3]) < 0.7:
if int(torchvision.__version__.split('.')[1]) < 7:  # my torchvision.__version__ is "0.12.0+cu113"
    from torchvision.ops import _new_empty_tensor
    from torchvision.ops.misc import _output_size
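
For reference, a more robust version of this check (a sketch, not code from the repository, and assuming the packaging package is installed) parses the version string instead of slicing it, which also handles local suffixes such as +cu113:

import torchvision
from packaging import version

# parse() understands strings like '0.12.0+cu113'; compare the release tuple against (0, 7).
if version.parse(torchvision.__version__).release < (0, 7):
    from torchvision.ops import _new_empty_tensor
    from torchvision.ops.misc import _output_size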

Ablation experiments for CDN

Hi, Dr. Liao,

Thank you for your excellent work on CDN.
We have a question: in CDN, you report the performance of different variants on the test set in order to choose the best framework.
Do you split off a validation set for HICO-DET, and if not, can you tell us why? We have encountered considerable difficulties with this.

Thank you very much for your consideration. I look forward to hearing from you soon.

Sincerely yours,
yaoyaosanqi.

v-coco

Thank you very much for your outstanding work!
An error occurred when I ran the convert_vcoco_annotations.py script: the two files vcoco_trainval.json and vcoco_test.json cannot be found.
How should I get them?

Real-time capability

Hello,
Very interesting paper and project.
I would be interested to know how many frames per second the model achieves, and on which hardware.

Best regards

Code of experiment

Hi!

Thanks for your impressive work! I noticed that there is an experiment which uses pairs generated by HO-PD while the interaction classifier is iCAN. The performance gain (+9 mAP) is really impressive! I really want to try it myself. Could you please release the code for this experiment?

Thanks a lot!

Touger

In `SetCriterionHOI` `__init__`, how are the initial object (`self.obj_nums_init`) and verb (`self.verb_nums_init`) numbers set?

In SetCriterionHOI __init__, how are the initial object (self.obj_nums_init) and verb (self.verb_nums_init) numbers set? The two arrays for HICO-DET and V-COCO are hard-coded here.

Are these just counts over the categories from the ground-truth annotations? I counted the verb categories using the following code snippet, executed at the end of HICODetection __init__ here.

      from collections import Counter
      hoi_annotations = [
          ann['hoi_category_id']
          for annotation in self.annotations
          for ann in annotation['hoi_annotation']
      ]
      hoi_annotations_valid = {
          k: v
          for k, v in Counter(hoi_annotations).items()
          if k in self._valid_verb_ids
      }
      verb_nums = [(k, hoi_annotations_valid[k]) for k in self._valid_verb_ids]
The resulting counts were:

verb_nums = [
(1, 176), (2, 98), (3, 56), (4, 181), (5, 198), (6, 75), (7, 284), (8, 319), (9, 4), (10, 274),
(11, 23), (12, 1995), (13, 138), (14, 121), (15, 27), (16, 17), (17, 116), (18, 87), (19, 2206), (20, 1904),
(21, 2250), (22, 180), (23, 2), (24, 396), (25, 45), (26, 257), (27, 127), (28, 5), (29, 16), (30, 439),
(31, 85), (32, 103), (33, 214), (34, 39), (35, 117), (36, 11), (37, 176), (38, 63), (39, 4051), (40, 757),
(41, 380), (42, 2385), (43, 1626), (44, 44), (45, 2), (46, 401), (47, 154), (48, 160), (49, 540), (50, 51),
(51, 7), (52, 17), (53, 23), (54, 527), (55, 247), (56, 6), (57, 463), (58, 29), (59, 27), (60, 71),
(61, 1519), (62, 992), (63, 6), (64, 1), (65, 226), (66, 27), (67, 1), (68, 367), (69, 60), (70, 94),
(71, 1), (72, 22), (73, 26), (74, 591), (75, 175), (76, 1051), (77, 1), (78, 4), (79, 158), (80, 76),
(81, 6), (82, 70), (83, 49), (84, 3), (85, 1), (86, 102), (87, 52), (88, 112), (89, 80), (90, 1266),
(91, 1), (92, 327), (93, 28), (94, 228), (95, 913), (96, 55), (97, 10), (98, 245), (99, 96), (100, 5),
(101, 5), (102, 52), (103, 32), (104, 49), (105, 6), (106, 127), (107, 504), (108, 5), (109, 2338), (110, 3209),
(111, 235), (112, 97), (113, 2), (114, 12), (115, 11), (116, 326), (117, 26)
]

As shown above, the obtained verb_nums did not match self.verb_nums_init here.

questions about `args.use_matching`

Thank you for the nice work on HOI. I'm currently following your work.

I have a question about args.use_matching. It seems that you did not use the matching_embed when training the model.
Is it true that all the reported models were trained with args.use_matching=False?

Custom Dataset

Hello,

Thanks for this great project! I have a quick question: I'm adapting the CDN code to a custom dataset, but I can't seem to understand the format. I have converted my dataset to the V-COCO format and I was able to extract the dict keys (['image_id', 'ann_id', 'role_object_id', 'label', 'action_name', 'role_name', 'include']), but I don't understand the pattern in trainval.json, train.json, val.json, and test.json.

Results.json file has too many predictions.

Hi, I ran the test code for HICO with hico_cdn_s.pth and got an mAP of 31.3616%. Nice work!

It also generates a results.json file. However, when I reviewed the file, I found that it contains too many predictions.
I don't know why this is (I'm new to HOI).

Can you provide your results.json or offer some advice?

Thank you!

(PS: I will upload my results.json file for you to check. Thanks again.)

How to draw a heatmap for decoder?

Sorry to interrupt, but I'm very interested in how to get a heatmap for the decoder like Fig. 3 in the paper, because the input and output of the decoder are both query tensors of shape (100, N, 256) and the multihead_attn function is wrapped by torch. I would appreciate any advice.

HICO-DET fine-tuned DETR

Hi, thank you for sharing such nice work!

I currently use HOI models on custom datasets, and I wonder if you have plans to share a pretrained DETR fine-tuned on HICO-DET.

Thanks

some code questions

for verb_score in verb_scores:
    # 64 query corresponding to verb class
    # verb_query = np.sum(np.sort(verb_score.numpy())[::-1][:topN])
    verb_query = torch.max(verb_score)
    # find the highest value corresponding to every query
    obj_scores[index] = verb_query  # total 9 verb query put to the object_score query
    index += 1
thres = np.sort(obj_scores.numpy())[::-1][topN]
keep = obj_scores > thres
print(keep)
out_sub_boxes = outputs['pred_sub_boxes']
out_obj_boxes = outputs['pred_obj_boxes']
What does keep = obj_scores > thres mean? And are my annotations right?

Expected training time

Hi,

Thank you so much for sharing this awesome work!
I was wondering how long it usually takes (i.e. hours or days) to train CDN given the settings explained in the paper.
Also, do you know how sensitive the model is to batch size? For instance, if I wanted to iterate faster and use a larger batch size (while changing the lr accordingly) for training, would that significantly affect the performance?

I guess I could run these experiments myself and figure it out, but I wanted to know in advance before launching jobs given the already limited compute resources.

Thanks!

CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm`

Getting the error, RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling 'cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)' while trying the following training command:

python -m torch.distributed.launch \
        --nproc_per_node=1 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre-2stage-q64.pth \
        --output_dir logs \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 90 \
        --lr_drop 60 \
        --use_nms_filter

I am using Python 3.7 and CUDA 10.1.

Questions about the interactive score

Thanks for your contribution to HOI. While reading your paper and code, a question came up that confuses me:
I didn't find the variable in the codebase corresponding to the interactive score, and I also failed to find where this score is trained. The post-processing seems to use only the verb score and the maximum object score. Could you tell me where you define the variable and how you use it?

vs = vs * os.unsqueeze(1)

Versions

@YueLiao, can you please list the versions of the following used for this repository?

  • python
  • cuda
  • cython
  • opencv-python
