wangt-cn / vc-r-cnn Goto Github PK

View Code? Open in Web Editor NEW

350.0 14.0 61.0 2.4 MB

[CVPR 2020] The official pytorch implementation of ``Visual Commonsense R-CNN''

License: MIT License

Dockerfile 0.30% Python 87.50% Shell 0.56% HTML 0.30% C++ 2.21% Cuda 9.14%

vc-r-cnn's Introduction

Visual Commonsense R-CNN (VC R-CNN)

[NEW]: We have provided the training code of VC R-CNN and detailed readme file. 🌟

[NEW]: the VC Feature pretrained on MSCOCO is provided. Just have a try! 🌟

This repository contains the official PyTorch implementation and the proposed VC feature for CVPR 2020 Paper "Visual Commonsense R-CNN". For technical details, please refer to:

Visual Commonsense R-CNN
Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
Key Words: Causal Intervention; Visual Common Sense; Representation Learning
[Paper], [Zhihu Article], [15min Slides], [Video]

Bibtex

If you find our VC feature and code helpful, please kindly consider citing:

@inproceedings{wang2020visual,
  title={Visual commonsense r-cnn},
  author={Wang, Tan and Huang, Jianqiang and Zhang, Hanwang and Sun, Qianru},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10760--10770},
  year={2020}
}

Overview
- Highlights
- What can you get from this repo? [The Road Map]
VC Feature
- Pretrained on COCO
- Downstream Vision & Language Tasks
VC R-CNN Framework
Our experience you may need

Overview

This project aims to build a visual commonsense representation learning framework based on the current object detection codebase with un-/self-supervised learning.

Highlights:

VC Feature
- Effective: Our visual commonsense representation encodes the``sense-making'' knowledge between object RoIs with causal intervention rather than just trivial correlation prediction. Compared to the previous widely used Up-Down Feature, our VC can be regarded as an effective supplementary knowledge that models the interaction between objects for the downstream tasks.
- Easy to Use: As we introduced in our paper, the VC Feature is extracted by providing the RoI boxes coordinates. Then the VC Feature can be just concatenated on the previous visual object features (e.g., Up-Down Feature) and ready to roll. (Ps: the concatenation maybe too simple for some cases or tasks, users can try something else and welcome feedback.)
- Easy to Expand: With a learned VC R-CNN framework, we can easily extract VC Features for any images and prepare them as an ``augmentation feature'' for the currently used representation conveniently.
VC R-CNN
- Fast, Memory-efficient, Multi-GPU: Our VC R-CNN framework is based on the well-known maskrcnn-benchmark from facebook. Therefore, our VC R-CNN just inherit all its advantages. (It's pity that the detectron2 had not been released when I am working on this project, however maskrcnn-benchmark can be a stable version.)
- Support customized dataset: Users can easily add COCO-style datasets to train VC R-CNN on other images.

What can you get from this repo? [The Road Map]

I want to use your VC Feature pretrained on COCO:
- Download the VC Feature on COCO and concatenate it on Up-Down feature for usage.
- You can also try other methods to use the VC Feature rather than just concatenation.
I want to retrain your VC R-CNN on COCO:
- Perform Training on COCO Dataset
I want to train the VC R-CNN on my own dataset and extract VC Features:

VC Feature

For easy-to-use, here we directly provide the pretrained VC Features on the entire MSCOCO dataset based on the Up-Down feature's boxes in the below links (The link is updated). The features are stored in tar.gz format. The previous features can be found in OLD_FEATURE.

10-100 VC Features per image:

COCO 2014 Train/Val Image Features (123K / 5G) Google Drive
COCO 2014 Testing Image Features (41K / 2G) Google Drive
COCO 2015 Testing Image Features (81K / 3G) Google Drive

10-100 Updown Features per image:

For those who may have no access to the Up-Down feature, here we also provide the Updown feature here. Then you can directly use numpy.concatenate (The feature dimension is 3072 : 2048+1024):

COCO 2014 Train/Val Image Features (123K / 21G) Google Drive
COCO 2014 Testing Image Features (41K / 7G) Google Drive
COCO 2015 Testing Image Features (81K / 13G) Google Drive

10-100 Updown Boxes

For users can extract VC Features if they want, here we also provide the Updown feature box coordinates:

COCO 2014 Train/Val Image Boxes Google Drive
COCO 2014 Testing Image Features Google Drive
COCO 2015 Testing Image Features Google Drive

How to use after download

Unzip the file with command:

tar -xzvf file_name

The feature format (The shape of each numpy file is [n x 1024]):

coco_trainval/test_year
 |---image_id1.npy
 |---image_id2.npy
  ...
 |---image_idN.npy

Concatenate on the previous feature in the downstream task training.

Downstream Vision & Language Tasks

Please check Downstream Tasks for more details:

Some tips for using in downstream tasks

We recommend users to add the dimension of the start multi-layers (embedding layer, fc and so on) in the downstream networks since the feature size add from 2048 to 3072 (for Up-Down Feature).
The learning rate can be slighted reduced.
We find the self-attentive operation on feature (e.g., the refining encoder in AoANet) may hurt the effectiveness of our VC Feature. Details can be kindly found at the bottom of Page 7 in our paper.
The concatenation of Up-Down and VC Feature maybe too simple for some downstream tasks. It can be regarded as a baseline and I believe there would be more potential on VC Feature.

VC R-CNN Framework

Installation

Please check INSTALL.md for installation instructions.

Perform Training on COCO Dataset

Prepare Training Data

First, you need to download the COCO dataset and annotations. We assume that you save them in /path_to_COCO_dataset/
Then you need modify the path in vc_rcnn/config/paths_catalog.py, containing the DATA_DIR and DATASETS path.

Training Parameters

default.py: OUTPUT_DIR denotes the model output dir. TENSORBOARD_EXPERIMENT is the tensorboard loger output dir. Another parameter the user may need notice is the SOLVER.IMS_PER_BATCH which denotes the number of total images per batch.
Config file (e.g., e2e_mask_rcnn_R_101_FPN_1x.yaml): The main parameters the user may pay attention to is the training schedule and learning rate, and the used dataset.
Parameters about VC: They are in the end of default.py with annotations. Users can make changes according to their own situation.

Running

Most of the configuration files that we provide assume that we are running 2 images on each GPU with 8 GPUs. In order to be able to run it on fewer GPUs, there are a few possibilities:

1. Single GPU Training: Modify the cfg parameters. Here is an example:

python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" --skip-test SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000

Ps: To running more images on one GPU, you can refer to the [maskrcnn-benchmark].

2. Multi-GPU training: The maskrcnn-benchmark directly support the multi-gpu training with torch.distributed.launch. You can run the command like (you need change $NGPUS to the num of GPU you use):

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "path/to/config/file.yaml" --skip-test MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN images_per_gpu x 1000

Notes:

In our experiments, we adopted e2e_mask_rcnn_R_101_FPN_1x.yaml without the Mask Branch (set False) as our config file.
When training VC, actually we need not test scripts, thus we set --skip-test to skip the test process after training. The test script is used to extract vc feature. Or if you design your own test, you can remove --skip-test.
The MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN denotes that the proposals are selected for per the batch rather than per image in the default training. The value is calculated by 1000 x images-per-gpu. Here we have 2 images per GPU, therefore we set the number as 1000 x 2 = 2000. If we have 8 images per GPU, the value should be set as 8000. See #672@maskrcnn-benchmark for more details.
Please note that the learning rate & iteration change rule follows the scheduling rules from Detectron, which means the lr need to be set 2x if the number of GPUs become 2x. In our methods, the learning rate is set for 4 GPUs and each GPU has 2 images.
In my practice, the learning rate can not be best customized since the VC training is not a supervised model and you cannot measure the goodness of the VC model from training procedure. We have provide a general suitable learning rate and you can make some slight modification.
You can turn on the Tensorboard logger by add --use-tensorboard into command (Need to install tensorflow and tensorboardx first).
The confounder dictionary dic_coco.npy and the prior stat_prob.npy are in the tools.

Evaluation (Feature Extraction)

1. Using your own model

Since the goal of our VC R-CNN is to train the visual commonsense representations by self-supervised learning, we have no metrics for evaluation and we treat it as the feature extraction process.

Specifically, you can just run the following command to achieve the features.

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_net.py --config-file "path/to/config/file.yaml" TEST.IMS_PER_BATCH images_per_gpu x $GPUS

Please note that before running, you need to set the suitable path for BOUNDINGBOX_FILE and FEATURE_SAVE_PATH in default.py. (Recall that just given image and bounding box coordinate, our VC R-CNN can extract the VC Feature)

2. Using our pretrained VC model on COCO

Here we also provide our pretrained VC model. You can put it into the model dictionary and set the last_checkpoint with the absolute path of model_final.pth. Then run the command:

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_net.py --config-file "path/to/config/file.yaml" TEST.IMS_PER_BATCH images_per_gpu x $GPUS

Add your Customized Dataset

1. Training on customized dataset

For learning VC Feature on your own dataset, the crux is to make your own dataset COCO-style (can refer to the data format in detection task) and design the dataloader file, for example coco.py and openimages.py. Here we provide an example for reference.

from vc_rcnn.structures.bounding_box import BoxList

class MyDataset(object):
    def __init__(self, ...):
        # load the paths and image annotation files you will need in __getitem__

    def __getitem__(self, idx):
        # load the image as a PIL Image
        image = ...

        # load the bounding boxes as a list of list of boxes.
        boxes = [[0, 0, 10, 10], [10, 20, 50, 50]]
        # and labels
        labels = torch.tensor([10, 20])

        # create a BoxList from the boxes. Please pay attention to the box FORM (XYXY or XYWH or another)
        boxlist = BoxList(boxes, image.size, mode="xyxy")
        # add the labels to the boxlist
        boxlist.add_field("labels", labels)
		# Here you can also add many other characters to the boxlist in addition to the labels, for example `image_id', `category_id' and so on.
        
        if self.transforms:
            image, boxlist = self.transforms(image, boxlist)

        # return the image, the boxlist and the idx in your dataset
        return image, boxlist, idx

    def get_img_info(self, idx):
        # get img_height and img_width. This is used if
        # we want to split the batches according to the aspect ratio
        # of the image, as it can be more efficient than loading the
        # image from disk
        return {"height": img_height, "width": img_width}

Then, you need modify the following files:

vc_rcnn/data/datasets/__init__.py: add it to __all__
vc_rcnn/config/paths_catalog.py: DatasetCatalog.DATASETS and corresponding if clause in DatasetCatalog.get()

2. Extracting features of customized dataset

Recall that with the trained VC R-CNN, we can directly extract VC Features given raw images and bounding box coordinates. Therefore, the method to design dataloader is similar to the above. The only difference is you may want to load box coordinates file for feature extraction and the labels, classes is unnecessary.

You can also refer to openimages.py and vcr.py.

3. Some Tips and Traps

As our experiment results shown in paper, training our VC R-CNN on a larger dataset cannot bring much gain to the downstream tasks on other datasets. The probable reason maybe the COCO is enough to learn the commonsense feature for its downstream tasks. Therefore we suggest users: if you want to perform downstream tasks on Dataset A, you can firstly train our VC on the Dataset A.
When you design the Dataloader file, the most important thing is to pay attention to the box format (XYXY or XYWH) and adopt the correct command to load them. I have made this mistakes at the beginning of my project.

Our experience you may need

Here we provide our experience (mainly the failure 2333) in training our VC R-CNN and using VC Feature. And we hope it can provide some help or possible ideas to the users to further develop this field :)

1. For Training VC R-CNN

After reading the paper MoCo: Momentum Contrast for Unsupervised Visual Representation Learning by Kaiming He, I have tried to construct a better dictionary learning scheme for our VC Feature self-supervised learning. In our current implementation, the dictionary keep constant during training and we want to borrow the idea from MoCo to UPDATE the confounder dictionary iteratively. Since the iterative step (How often is the dictionary updated) can be set arbitrarily, I have tried a few steps but the result is similar. We want to further explore this in our future work.
We have tried to add an 'Observation Window' into data sampling, which means for each image we just sample a window contains, for example 10 objects randomly each time. We want the model can learn the latent spatial relationship at the same time, however, the results can be worse.

2. For the VC Feature

As we discussed in our paper, our VC Feature achieves a less significant gain on VQA task than that for image captioning. We thought the possible reason can be the limited ability of the current question understanding. We are also wondering if we can train the Vision & Language Commonsense Representations in the future.
In all the downstream tasks, we just concatenate the VC Feature on the previous Up-Down feature. This operation maybe too simple and I believe it does NOT reach VC 's full potential. My own bandwidth is limited but I know if more researchers try to use it and design more suitable downstream models, maybe we can create more better results :)
The evaluation for the feature (self-supervised learning) can be too trivial and hard. This is the problem for all the self-supervised learning problem. The model performance cannot be estimated in training procedure. And if the things you want to learn is the feature (just like us), you need to evaluate it on many downstream tasks. Therefore, how to find a more effective way to evaluate the learning process can be a good point for research.

Acknowledgement

I really appreciate Kaihua Tang, Yulei Niu, Xu Yang, Jiaxin Qi, Xinting Hu and Dong Zhang for their greatly helpful advice and lending me GPUs!

If you have any questions or concerns, please kindly email to Tan Wang.

vc-r-cnn's People

Contributors

Stargazers

Watchers

vc-r-cnn's Issues

Inference to single image

Hi,
I am trying to modify your code to inference to single image (with given path and detected boxes coordinates in xyxy form).
Here is what I have added to your inference.py:

However, when I run the test_net.py, which will call the single_inference function, I faced this error:
RuntimeError: size mismatch, m1: [43008 x 7], m2: [2048 x 1024] at C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src\THC/generic/THCTensorMathBlas.cu:290

Full traceback:
File "test_net.py", line 188, in
main()
File "test_net.py", line 145, in main
single_inference(model, img, boxes, "cuda")
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\engine\inference.py", line 184, in single_inference
prediction = single_image_compute(model, img, boxes, device, inference_timer)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\engine\inference.py", line 84, in single_image_compute
output = model(pil_img.to(device), targets)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\detector\generalized_rcnn.py", line 61, in forward
x, result, detector_losses = self.roi_heads(features, proposals, targets)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\roi_heads.py", line 26, in forward
x, detections, loss_box = self.box(features, proposals, targets)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\box_head\box_head.py", line 52, in forward
class_logits_causal_list = self.causal_predictor(x, proposals)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\box_head\roi_box_predictors.py", line 96, in forward
xzs = [self.z_dic(feature_pre_obj, dic_z, prior) for feature_pre_obj in feature_split]
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\box_head\roi_box_predictors.py", line 96, in
xzs = [self.z_dic(feature_pre_obj, dic_z, prior) for feature_pre_obj in feature_split]
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\box_head\roi_box_predictors.py", line 111, in z_dic
attention = torch.mm(self.Wy(y), self.Wz(dic_z).t()) / (self.embedding_size ** 0.5)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\functional.py", line 1372, in linear
output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [43008 x 7], m2: [2048 x 1024] at C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src\THC/generic/THCTensorMathBlas.cu:290

Please help me to understand your model better ^^ Thank you ;)

error: identifier "AT_CHECK" is undefined

I followed the instructions to install VC-R-CNN, however when I run the last command python setup.py build develop in install.sh, I got errors:

error: identifier "AT_CHECK" is undefined

I tried the solution in another issue , I replaced AT_CHECK by TORCH_CHECK in file vc_rcnn/csrc/cuda/deform_conv_cuda.cu and vc_rcnn/csrc/cuda/deform_pool_cuda.cu and it worked.

Actually, I don't know why this happened.
If anyone get the same error, I hope it helps.

Links to VC feature not working

Hi. Thanks for your great work!

The links to VC feature are not working. Can you please update them again?

Hyperparameters for Multi-GPU training

Hi, @Wangt-CN

I use the following commands to perform multi-gpu training:

export NGPUS=4
CUDA_VISIBLE_DEVICES=2,3,4,5 python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000

Should I add the hyperparameters from the single-gpu training command?

python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000

Problems with following install.md file while reproducing

Hello! First of all, thank you for your hard work.

While I am reproducing VC R-CNN, there was a big problem.

In the INSTALL.md file, there is a command like
"conda install -c pytorch pytorch-nightly torchvision cudatoolkit=9.0"
and this command doesn't work.

Since I do not know the exact version of pytorch-nightly, so I can not follow the other command also.
(Docker commands are also do not work)

So, I carefully ask you to share the complete docker image file (not the docker file in the /docker directory)

Regards.

Seeking suggestion regarding combining VC features with other visual features

Hi,
Thanks for your fantastic work! I am trying to apply your work for videos. For that, I am trying to combine the VC features with the I3D features. While doing so, I am facing a few challenges. First of all, I have seen that for each frame of a video I get VC features with Nx1024 size where N represents the detected bounding boxes in the object which doesn't match with the size of I3D features. So, I was doing elementwise addition of all the features of the N bounding boxes to get a single feature representation of shape 1024.

Do you think it's a good idea? Will the features be preserved if I do addition like this? If not, do you have a better idea on how to do it so that I can combine with the I3D features?

Thanks!

Up_Down_VC downstream task

Hi,
Thank you for such a great open-source here.
I have succeeded in making an inference to the raw image base on ruotianluo 's codebase.
I am also interest on using your VC Feature there, so I guess it will work with VC concatenated with UpDown Feature on MSCOCO dataset.
About raw data inference, it is mentioned by ruotianluo that it will not work with UpDown Feature. Therefore, is it possible to concatenate your VC Feature with the pre-trained ResNet features, or we should use another mechanism? Thank you very much.

The links to "10-100 VC Features per image" and "10-100 Updown Features per image" are invalid

Hi, the links to "10-100 VC Features per image" and "10-100 Updown Features per image" are invalid. Can you update new links?
Thanks!

nan loss while training

I use
CUDA_VISIBLE_DEVICES=2 python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" --skip-test SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000

this command to follow your instruction and I use coco 2017 train and val data.

While training, the loss keeps around 8 and did not drop.
after 6000 steps, the model spits nan loss.

do you have any idea why nan loss is coming?
What is the problem?

transformer refining

Hello. Thanks for your great work! It's really a big contribution to the CV community.
You've mentioned that the performance is worse when you feed your concatenated features (BU + VC) to the transformer refining model in AoANet directly. Have you tried refining first (running only BU features through the AoANet refiner), getting the refined features, and then concatenating those refined features with the VC features?

Can you kindly provide VC features on VCR dataset?

Sir, thank you for your great work and it insights me a lot. My current reaseach topic is visual commonsense reasoning, so I hope you can kindly provide extracted VC features on VCR dataset for me.

code

excuse me,when will you show your code,sorry to disturb you

VC Feature's link dead

The VC Feature in Google Drive is dead,would you please update them again?

Code for making dic_coco.npy and the prior stat_prob.npy

Could you also share the code for making dic_coco.npy and the prior stat_prob.npy? Thanks
And in order to construct dic_coco.npy with ground-truth bboxes, I should modify the modeling/detector/generalized_rcnn.py in maskrcnn-benchmark as following, right?

        # we directly use bounding box coordinates from ground truth label
        if self.training:
            proposals = [target for target in targets]
        else:
            devices = features[0].get_device()
            proposals = [target.to(devices) for target in targets]

Where can I get 'BOUNDINGBOX_FILE'?

Hello. Thanks for your great work!

Where can I get 'BOUNDINGBOX_FILE' cocobu_box which is needed for feature extraction?

some problems

Hello author, I would like to ask how Figure 7 in the paper was drawn, and where does the feature mentioned in Figure 7 represent the output of the selected model?

The problem in reproducing the results of image captioning

Hi,
Thank you for your great work.

I'm trying to reproduce the results of image captioning by following steps:

Download Karpathy splits of COCO, and run the code of "scripts/prepro_labels.py" to prepare the data.
Download the Bottom-up and VC features with your link.
Train the model with the cross entropy loss:
"python train.py --id topdown --caption_model topdown --input_json data/cocotalk.json --input_label_h5 data/cocotalk_label.h5 --input_att_dir_vc [the/path/to/VC_Feature/trainval] --input_att_dir [the/path/to/Updown_Feature] --batch_size 50 --learning_rate 3e-4 --checkpoint_path log_topdown --save_checkpoint_every 2200 --val_images_use 5000 --rnn_size 2048 --input_encoding_size 1024 --max_epochs 30 --language_eval 1"
Evaluate the model with the code:
python eval.py --model log_topdown/model-best.pth --infos_path log_topdown/infos_topdown-best.pkl --dump_images 0 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 50 --split test
The results are:
{'Bleu_1': 0.7625701835246635, 'Bleu_2': 0.6021042790224688, 'Bleu_3': 0.46398074453035226, 'Bleu_4': 0.35592428819070027, 'METEOR': 0.27917788348120276, 'ROUGE_L': 0.566515050577319, 'CIDEr': 1.136820918673527, 'bad_count_rate': 0.0014}
which are much lower than the reported.

So, my question is that are there any important settings when reproducing the reported results?

questions about dict Z

Z is built by making average on RoIs of the same class, how to deal with the different size of RoIs in one class?
How can I get the dict Z of my customized dataset?

_C.DIC_FILE not found when run test_net.py

Hi,
Thank you for your great work.
I have read your tutorial to run the test_net.py. However, I don't know how to get the pre-prepared dictionary file path for intervention (numpy format) (default is: '/.../dic_coco.npy')
So could you please guide me where could I download these files in Parameters of VC and change the directory? Thank you.

36 VC Features per image

大佬，请问能够提供36 VC Features per image的版本吗，我想做vqa方面的研究，想尝试使用您的VC特征，但希望能是每张图36个对象的版本

cannot import name '_C' from 'vc_rcnn'

'_C' in 'vc_rcnn/' appears to be missing. Command line and error is below:

$ python3 tools/test_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" TEST.IMS_PER_BATCH images_per_gpu x $GPUS
Traceback (most recent call last):
File "tools/test_net.py", line 13, in
from vc_rcnn.data import make_data_loader
File "./vc_rcnn/data/init.py", line 2, in
from .build import make_data_loader
File "./vc_rcnn/data/build.py", line 11, in
from . import datasets as D
File "./vc_rcnn/data/datasets/init.py", line 3, in
from .coco import COCODataset
File "./vc_rcnn/data/datasets/coco.py", line 8, in
from vc_rcnn.structures.segmentation_mask import SegmentationMask
File "./vc_rcnn/structures/segmentation_mask.py", line 5, in
from vc_rcnn.layers.misc import interpolate
File "./vc_rcnn/layers/init.py", line 10, in
from .nms import nms
File "./vc_rcnn/layers/nms.py", line 3, in
from vc_rcnn import _C
ImportError: cannot import name '_C' from 'vc_rcnn' (./vc_rcnn/init.py)

How can I use pretrained VC-R-CNN for inference on a specify image?

As the title said, I would like to know the way to use pretrained model to generate caption for a specify image.

Where "last_checkpoint" should be modified to reflect the absolute path of "model_final.pth"?

"2. Using our pretrained VC model on COCO

Here we also provide our pretrained VC model. You can put it into the model dictionary and set the last_checkpoint with the absolute path of model_final.pth. Then run the command:

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_net.py --config-file "path/to/config/file.yaml" TEST.IMS_PER_BA"

Can you please describe where "last_checkpoint" should be modified to reflect the absolute path of "model_final.pth"?

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.