vc-r-cnn's Issues

Inference on a single image

Hi,
I am trying to modify your code to run inference on a single image (given an image path and detected box coordinates in xyxy format).
Here is what I have added to your inference.py:

[screenshots of the added code omitted]

However, when I run test_net.py, which calls the single_inference function, I get this error:
RuntimeError: size mismatch, m1: [43008 x 7], m2: [2048 x 1024] at C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src\THC/generic/THCTensorMathBlas.cu:290

Full traceback:
File "test_net.py", line 188, in
main()
File "test_net.py", line 145, in main
single_inference(model, img, boxes, "cuda")
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\engine\inference.py", line 184, in single_inference
prediction = single_image_compute(model, img, boxes, device, inference_timer)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\engine\inference.py", line 84, in single_image_compute
output = model(pil_img.to(device), targets)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\detector\generalized_rcnn.py", line 61, in forward
x, result, detector_losses = self.roi_heads(features, proposals, targets)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\roi_heads.py", line 26, in forward
x, detections, loss_box = self.box(features, proposals, targets)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\box_head\box_head.py", line 52, in forward
class_logits_causal_list = self.causal_predictor(x, proposals)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\box_head\roi_box_predictors.py", line 96, in forward
xzs = [self.z_dic(feature_pre_obj, dic_z, prior) for feature_pre_obj in feature_split]
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\box_head\roi_box_predictors.py", line 96, in
xzs = [self.z_dic(feature_pre_obj, dic_z, prior) for feature_pre_obj in feature_split]
File "D:\Video_Surveillance_project\VC-R-CNN\vc_rcnn\modeling\roi_heads\box_head\roi_box_predictors.py", line 111, in z_dic
attention = torch.mm(self.Wy(y), self.Wz(dic_z).t()) / (self.embedding_size ** 0.5)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "D:\Anaconda3\envs\vc_rcnn\lib\site-packages\torch\nn\functional.py", line 1372, in linear
output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [43008 x 7], m2: [2048 x 1024] at C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src\THC/generic/THCTensorMathBlas.cu:290

Please help me to understand your model better ^^ Thank you ;)
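
A side note on the error itself (not an official answer): the size mismatch means the per-RoI features reaching self.Wy have 7 channels instead of the expected 2048, so the image and target preparation upstream is probably not what the model expects. Below is a minimal, untested sketch of how a single image and its boxes might be prepared, assuming vc_rcnn keeps the usual maskrcnn-benchmark helpers (build_transforms, BoxList, to_image_list); img_path and boxes_xyxy are placeholders.

    # Minimal sketch, assuming vc_rcnn keeps the maskrcnn-benchmark API
    # (build_transforms, BoxList, to_image_list). Not the authors' code.
    import torch
    from PIL import Image
    from vc_rcnn.config import cfg
    from vc_rcnn.data.transforms import build_transforms
    from vc_rcnn.structures.bounding_box import BoxList
    from vc_rcnn.structures.image_list import to_image_list

    def prepare_single_image(img_path, boxes_xyxy, device="cuda"):
        pil_img = Image.open(img_path).convert("RGB")
        w, h = pil_img.size

        # Wrap the boxes in a BoxList sized to the original image ...
        target = BoxList(torch.as_tensor(boxes_xyxy, dtype=torch.float32), (w, h), mode="xyxy")

        # ... then apply the same test-time transforms used for COCO, which resize
        # and normalize the image and rescale the target to the new image size.
        transforms = build_transforms(cfg, is_train=False)
        image, target = transforms(pil_img, target)

        # The model expects an ImageList, not a raw tensor.
        image_list = to_image_list(image, cfg.DATALOADER.SIZE_DIVISIBILITY).to(device)
        return image_list, [target.to(device)]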

Can you kindly provide VC features on the VCR dataset?

Sir, thank you for your great work; it has inspired me a lot. My current research topic is visual commonsense reasoning, so I hope you can kindly provide the extracted VC features on the VCR dataset.

questions about dict Z

  1. Z is built by averaging the RoI features of the same class; how do you deal with RoIs of different sizes within one class? (See the sketch after this list.)
  2. How can I get the dict Z for my own customized dataset?
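
A side note on question 1 (not an official answer): the features being averaged come out of RoIAlign plus the box head, so every RoI is already a fixed-length vector (2048-d for the ResNet-101 backbone) regardless of its spatial size. A rough sketch of building such a class dictionary from per-box features and labels; the feature dimension and class count are assumptions, not the authors' exact procedure:

    import numpy as np

    def build_class_dictionary(features, labels, num_classes=81, feat_dim=2048):
        """Average fixed-length RoI features per class.

        features: (N, feat_dim) RoI features already pooled to a fixed size
                  (e.g. by RoIAlign + the box head), so RoI size no longer matters.
        labels:   (N,) class index of each RoI.
        Returns a (num_classes, feat_dim) dictionary; unseen classes stay zero.
        """
        dic = np.zeros((num_classes, feat_dim), dtype=np.float32)
        counts = np.zeros(num_classes, dtype=np.int64)
        for feat, cls in zip(features, labels):
            dic[cls] += feat
            counts[cls] += 1
        nonzero = counts > 0
        dic[nonzero] /= counts[nonzero][:, None]
        return dic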

Code for making dic_coco.npy and the prior stat_prob.npy

  1. Could you also share the code for making dic_coco.npy and the prior stat_prob.npy? Thanks

  2. And in order to construct dic_coco.npy with ground-truth bboxes, I should modify modeling/detector/generalized_rcnn.py in maskrcnn-benchmark as follows, right?

        # we directly use bounding box coordinates from ground truth label
        if self.training:
            proposals = [target for target in targets]
        else:
            devices = features[0].get_device()
            proposals = [target.to(devices) for target in targets]
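
A side note, not the authors' answer: one plausible (unverified) construction of the prior stat_prob.npy is the normalized class frequency over the training annotations, since the backdoor adjustment needs a prior over the dictionary entries. A sketch under that assumption (paths, class indexing and normalization are guesses):

    import json
    import numpy as np

    def class_frequency_prior(instances_json, num_classes=81):
        """Guess at how a prior like stat_prob.npy could be built: the normalized
        frequency of each object class over COCO-style training annotations.
        Not the authors' code; category ids are assumed remapped to 0..num_classes-1."""
        with open(instances_json) as f:
            anns = json.load(f)["annotations"]
        counts = np.zeros(num_classes, dtype=np.float64)
        for ann in anns:
            counts[ann["category_id"]] += 1
        return counts / counts.sum()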

Hyperparameters for Multi-GPU training

Hi, @Wangt-CN

I use the following commands to perform multi-gpu training:

export NGPUS=4
CUDA_VISIBLE_DEVICES=2,3,4,5 python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000

Should I add the hyperparameters from the single-gpu training command?

python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000
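
A side note, not an official answer: the maskrcnn-benchmark configs this repo builds on normally follow the linear scaling rule, so the solver settings are scaled with the total batch size rather than copied from the single-GPU command. For example, with 4 GPUs and 2 images per GPU (4x the single-GPU batch), an illustrative, unofficial command would be:

export NGPUS=4
CUDA_VISIBLE_DEVICES=2,3,4,5 python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01 SOLVER.MAX_ITER 180000 SOLVER.STEPS "(120000, 160000)" TEST.IMS_PER_BATCH 4 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000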

some problems

Hello author, I would like to ask how Figure 7 in the paper was drawn, and which output of the model the features shown in Figure 7 are taken from?

Where "last_checkpoint" should be modified to reflect the absolute path of "model_final.pth"?

"2. Using our pretrained VC model on COCO

Here we also provide our pretrained VC model. You can put it into the model directory and set the last_checkpoint with the absolute path of model_final.pth. Then run the command:

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_net.py --config-file "path/to/config/file.yaml" TEST.IMS_PER_BA"

Can you please describe where "last_checkpoint" should be modified so that it points to the absolute path of "model_final.pth"?
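
A side note, not the authors' answer: in maskrcnn-benchmark, which this repo is built on, "last_checkpoint" is a plain text file that the checkpointer looks for in the output directory (cfg.OUTPUT_DIR); its only content is the path of the checkpoint to load. Assuming vc_rcnn keeps that behavior, creating it could look like this (paths are placeholders):

    import os

    # Assumption: vc_rcnn uses the maskrcnn-benchmark checkpointer, which reads a
    # plain-text file named "last_checkpoint" inside cfg.OUTPUT_DIR containing one
    # line: the path of the checkpoint to load. Paths below are placeholders.
    output_dir = "path/to/OUTPUT_DIR"          # must match cfg.OUTPUT_DIR in your yaml
    checkpoint = "/absolute/path/to/model_final.pth"

    with open(os.path.join(output_dir, "last_checkpoint"), "w") as f:
        f.write(checkpoint)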

Seeking suggestion regarding combining VC features with other visual features

Hi,
Thanks for your fantastic work! I am trying to apply your work to videos, by combining the VC features with I3D features. While doing so, I am facing a few challenges. First of all, for each frame of a video I get VC features of size N x 1024, where N is the number of detected bounding boxes in that frame, which does not match the size of the I3D features. So I have been doing element-wise addition over the features of the N bounding boxes to get a single 1024-d representation per frame.

Do you think this is a good idea? Will the information be preserved if I add the features like this? If not, do you have a better idea of how to do it so that I can combine them with the I3D features?

Thanks!
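
A side note, not the authors' recommendation: one common alternative to summation is mean pooling over the boxes followed by concatenation with the clip-level feature; a sum grows in magnitude with the number of boxes N, while the mean stays scale-independent. A sketch (dimensions and variable names are assumptions):

    import torch

    def frame_vc_vector(vc_feats: torch.Tensor) -> torch.Tensor:
        """Pool per-box VC features (N x 1024) into one frame-level vector.
        Mean pooling keeps the magnitude independent of the number of boxes N,
        unlike summation, whose scale grows with N."""
        return vc_feats.mean(dim=0)

    def fuse_with_i3d(vc_feats: torch.Tensor, i3d_feat: torch.Tensor) -> torch.Tensor:
        """Concatenate the pooled VC vector with the I3D feature for the same frame/clip.
        The dimensions (1024 for VC, 1024 for I3D here) are assumptions."""
        return torch.cat([frame_vc_vector(vc_feats), i3d_feat], dim=-1)

    # usage sketch
    vc = torch.randn(12, 1024)      # 12 detected boxes in one frame
    i3d = torch.randn(1024)         # clip-level I3D feature
    fused = fuse_with_i3d(vc, i3d)  # shape: (2048,)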

The problem in reproducing the results of image captioning

Hi,
Thank you for your great work.

I'm trying to reproduce the image captioning results with the following steps:

  1. Download Karpathy splits of COCO, and run the code of "scripts/prepro_labels.py" to prepare the data.
  2. Download the Bottom-up and VC features with your link.
  3. Train the model with the cross entropy loss:
    "python train.py --id topdown --caption_model topdown --input_json data/cocotalk.json --input_label_h5 data/cocotalk_label.h5 --input_att_dir_vc [the/path/to/VC_Feature/trainval] --input_att_dir [the/path/to/Updown_Feature] --batch_size 50 --learning_rate 3e-4 --checkpoint_path log_topdown --save_checkpoint_every 2200 --val_images_use 5000 --rnn_size 2048 --input_encoding_size 1024 --max_epochs 30 --language_eval 1"
  4. Evaluate the model with the code:
    python eval.py --model log_topdown/model-best.pth --infos_path log_topdown/infos_topdown-best.pkl --dump_images 0 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 50 --split test
  5. The results are:
    {'Bleu_1': 0.7625701835246635, 'Bleu_2': 0.6021042790224688, 'Bleu_3': 0.46398074453035226, 'Bleu_4': 0.35592428819070027, 'METEOR': 0.27917788348120276, 'ROUGE_L': 0.566515050577319, 'CIDEr': 1.136820918673527, 'bad_count_rate': 0.0014}
    which are much lower than the reported results.

So my question is: are there any important settings needed to reproduce the reported results?

Up_Down_VC downstream task

Hi,
Thank you for such a great open-source release.
I have succeeded in running inference on raw images based on ruotianluo's codebase.
I am also interested in using your VC features there, and I guess they would work concatenated with the UpDown features on the MSCOCO dataset.
Regarding raw-image inference, ruotianluo mentioned that it will not work with the UpDown features. Therefore, is it possible to concatenate your VC features with pre-trained ResNet features, or should we use another mechanism? Thank you very much.

code

Excuse me, when will you release your code? Sorry to disturb you.

error: identifier "AT_CHECK" is undefined

I followed the instructions to install VC-R-CNN; however, when I ran the last command in install.sh, python setup.py build develop, I got this error:

error: identifier "AT_CHECK" is undefined

I tried the solution from another issue: I replaced AT_CHECK with TORCH_CHECK in the files vc_rcnn/csrc/cuda/deform_conv_cuda.cu and vc_rcnn/csrc/cuda/deform_pool_cuda.cu, and it worked.

Actually, I don't know why this happened.
If anyone gets the same error, I hope this helps.
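
For context: AT_CHECK was deprecated and later removed from newer PyTorch releases in favor of TORCH_CHECK, which is why older CUDA extensions fail to compile against them. A small, hypothetical helper that applies the same replacement before re-running python setup.py build develop:

    # Hypothetical helper: replace AT_CHECK with TORCH_CHECK in the two CUDA sources
    # named above, then re-run `python setup.py build develop`. Run from the repo root.
    from pathlib import Path

    for rel in ("vc_rcnn/csrc/cuda/deform_conv_cuda.cu",
                "vc_rcnn/csrc/cuda/deform_pool_cuda.cu"):
        path = Path(rel)
        path.write_text(path.read_text().replace("AT_CHECK", "TORCH_CHECK"))
        print("patched", rel)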

nan loss while training

I use this command, following your instructions, with the COCO 2017 train and val data:
CUDA_VISIBLE_DEVICES=2 python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" --skip-test SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000

During training, the loss stays around 8 and does not drop.
After about 6000 steps, the model produces NaN losses.

Do you have any idea why the loss becomes NaN?
What could the problem be?

Problems following the INSTALL.md file while reproducing

Hello! First of all, thank you for your hard work.

While reproducing VC R-CNN, I ran into a big problem.

In the INSTALL.md file, there is a command like
"conda install -c pytorch pytorch-nightly torchvision cudatoolkit=9.0"
and this command doesn't work.

Since I do not know the exact version of pytorch-nightly, I cannot follow the other commands either.
(The Docker commands do not work either.)

So I would kindly ask you to share a complete Docker image (not just the Dockerfile in the /docker directory).

Regards.

cannot import name '_C' from 'vc_rcnn'

'_C' in 'vc_rcnn/' appears to be missing. The command line and error are below:

$ python3 tools/test_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" TEST.IMS_PER_BATCH images_per_gpu x $GPUS
Traceback (most recent call last):
File "tools/test_net.py", line 13, in <module>
from vc_rcnn.data import make_data_loader
File "./vc_rcnn/data/__init__.py", line 2, in <module>
from .build import make_data_loader
File "./vc_rcnn/data/build.py", line 11, in <module>
from . import datasets as D
File "./vc_rcnn/data/datasets/__init__.py", line 3, in <module>
from .coco import COCODataset
File "./vc_rcnn/data/datasets/coco.py", line 8, in <module>
from vc_rcnn.structures.segmentation_mask import SegmentationMask
File "./vc_rcnn/structures/segmentation_mask.py", line 5, in <module>
from vc_rcnn.layers.misc import interpolate
File "./vc_rcnn/layers/__init__.py", line 10, in <module>
from .nms import nms
File "./vc_rcnn/layers/nms.py", line 3, in <module>
from vc_rcnn import _C
ImportError: cannot import name '_C' from 'vc_rcnn' (./vc_rcnn/__init__.py)
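
A side note, not an official answer: _C is the compiled C++/CUDA extension produced by python setup.py build develop, so this import error usually means the build step did not finish or was run in a different Python environment. A quick, hypothetical check:

    # Hypothetical sanity check: the `_C` module is the compiled extension produced by
    # `python setup.py build develop`; if it cannot be located, the build step likely
    # failed or was run in a different Python environment.
    import importlib.util

    spec = importlib.util.find_spec("vc_rcnn._C")
    if spec is None:
        print("vc_rcnn._C not found - re-run `python setup.py build develop` in this environment")
    else:
        print("compiled extension found at", spec.origin)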

transformer refining

Hello. Thanks for your great work! It's really a big contribution to the CV community.
You've mentioned that the performance is worse when you feed your concatenated features (BU + VC) to the transformer refining model in AoANet directly. Have you tried refining first (running only BU features through the AoANet refiner), getting the refined features, and then concatenating those refined features with the VC features?

_C.DIC_FILE not found when running test_net.py

Hi,
Thank you for your great work.
I have read your tutorial for running test_net.py. However, I don't know how to get the pre-prepared dictionary file for the intervention (numpy format; the default path is '/.../dic_coco.npy').
So could you please tell me where I can download these files listed under "Parameters of VC" and how to change the directory? Thank you.
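
A side note: once the file is downloaded, a quick way to confirm it loads is shown below; the path is a placeholder, and the expected shape (roughly one row per object class) is an assumption:

    import numpy as np

    # Path is a placeholder; point it at the downloaded dictionary file and set the
    # same path in the config (_C.DIC_FILE) or your yaml before running test_net.py.
    dic = np.load("/path/to/dic_coco.npy")
    print(dic.shape)  # expected (assumption): one row per object class, e.g. (num_classes, feature_dim)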

36 VC Features per image

Hi, could you please provide a version with 36 VC features per image? I am doing research on VQA and would like to try your VC features, but I need the version with 36 objects per image.
