MSDN's Issues

GPU memory leakage during evaluation

Thanks for your work. I tried to run eval.sh and got a "CUDA out of memory" error.
Environment:
-- python 2.7
-- pytorch 0.4.1
-- cuda 9.0
-- gpu: NVIDIA Titan X

I found it's caused by the class RoIPoolFunction(Function). In the forward function of this class there are assignment expressions such as self.output = output, and if I comment these expressions out, the code works. My guess is that when the model runs in eval mode, tensors like self.output (or their gradients?) are never released, so the memory leak happens.
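For reference, a minimal sketch of a common workaround (an assumption, not the repo's own fix): run the evaluation forward pass under torch.no_grad() so autograd does not retain intermediate tensors, which usually stops the per-batch memory growth even if RoIPoolFunction still stores tensors on self. The model and input below are stand-ins, not the MSDN network.

    import torch
    import torch.nn as nn

    net = nn.Conv2d(3, 8, 3).cuda()               # stand-in for the real model
    im_data = torch.randn(1, 3, 600, 800).cuda()  # stand-in for an input batch

    net.eval()
    with torch.no_grad():                         # available since PyTorch 0.4
        output = net(im_data)                     # no graph is built, so nothing is retained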

Version of code

Hi, thank you for your code.

I would like to know which versions of torch, CUDA, and Python you used.

Best regards, and thank you.

Results for PredCls and PhrCls

Hi Yikang,

I'm looking to replicate the results for the other Visual Genome scene graph evaluation modes. To get the test results under your evaluation, I would need to run something like the following, right?

        if args.mode == 'sggen':
            total_cnt_t, rel_cnt_correct_t = net.evaluate(
                im_data, im_info, gt_objects.numpy()[0], gt_relationships.numpy()[0], gt_regions.numpy()[0],
                top_Ns = top_Ns, nms=True)
        elif args.mode == 'phrcls':
            total_cnt_t, rel_cnt_correct_t = net.evaluate(
                im_data, im_info, gt_objects.numpy()[0], gt_relationships.numpy()[0], gt_regions.numpy()[0],
                top_Ns = top_Ns, nms=False, use_gt_boxes=True, use_gt_regions=False)
        elif args.mode == 'predcls':
            total_cnt_t, rel_cnt_correct_t = net.evaluate(
                im_data, im_info, gt_objects.numpy()[0], gt_relationships.numpy()[0], gt_regions.numpy()[0],
                top_Ns = top_Ns, nms=False, use_gt_boxes=True, use_gt_regions=False, only_predicate=True)

I had to change a couple of things too:

  1. Hierarchical_Descriptive_Model.evaluate throws an error when use_gt_boxes=True because im_info is a 1 x 3 tensor. I got the best results when I uncommented the division by the image scale (which makes sense, as the GT boxes are then at the same scale as the ROI proposals). Can you confirm that, e.g., gt_boxes_object = gt_objects[:, :4] is right?

  2. https://github.com/yikang-li/MSDN/blob/master/faster_rcnn/MSDN.py#L118 seems to contain a bug, because only the top few ROIs are overwritten. Can you confirm that it should be changed to object_rois = object_rois_gt?

However, even after these changes, I can't match your paper's results for PredCls and PhrCls. For PredCls, for instance, I get around 37% R@50 and 46% R@100. Is there something else you did to get these numbers?

Thanks! -Rowan

Some problems with train & eval

Thank you for your detailed project explanation.

With your provided data and code, I have been trying to train and evaluate.

For training, your README states:
CUDA_VISIBLE_DEVICES=0 python train_rpn_region.py
Did you mean train_rpn.py instead of train_rpn_region.py?

Also, I can't find faster_rcnn.roi_data_layer.roidb.

Furthermore, in the evaluation code the RPN_v3 module doesn't exist. Should from RPN_v3 import RPN be from RPN import RPN?

Also, eval still doesn't work; I get the following error:

➜  MSDN git:(master) ✗ bash eval.sh
Traceback (most recent call last):
  File "train_hdn.py", line 13, in <module>
    from faster_rcnn.MSDN import Hierarchical_Descriptive_Model
  File "/home/junho/MSDN/faster_rcnn/MSDN.py", line 45, in <module>
    class Hierarchical_Descriptive_Model(MSDN_base):
NameError: name 'MSDN_base' is not defined

eval.sh is your provided eval script:

CUDA_VISIBLE_DEVICES=0 python train_hdn.py \
	--resume_training --resume_model ./pretrained_models/HDN_1_iters_alt_normal_I_LSTM_with_bias_with_dropout_0_5_nembed_256_nhidden_512_with_region_regression_resume_SGD_best.h5 \
	--dataset_option=normal --MPS_iter=1 \
	--caption_use_bias --caption_use_dropout \
	--rnn_type LSTM_normal

I am referring to the faster_rcnn_pytorch code (https://github.com/longcw/faster_rcnn_pytorch), but I still need some help because your implementation is a fork of it.
Thank you very much for releasing the code.

GPU memory usage

I am seeing a memory issue when running evaluation on CUDA 9: there is a GPU memory leak in which the image tensor does not get released after a batch.
On CUDA 8 the issue does not appear.

Unable to build the Cython modules for nms and the roi_pooling layer

Hi!
I am studying your MSDN paper and trying to run your model on my computer. When I try to execute cd MSDN-master/faster_rcnn && ./make.sh, it gives the following error:

Traceback (most recent call last):
  File "setup.py", line 59, in <module>
    CUDA = locate_cuda()
  File "setup.py", line 52, in locate_cuda
    for k, v in cudaconfig.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'
Compiling roi pooling kernels by nvcc...
./make.sh: line 10: nvcc: command not found
Traceback (most recent call last):
  File "build.py", line 3, in <module>
    from torch.utils.ffi import create_extension
  File "/home/faaiz/anaconda3/lib/python3.7/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Please help if you know anything about the errors I am getting.
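For the first traceback, a hedged note: cudaconfig.iteritems() is Python 2 syntax and fails under Python 3, where the equivalent call is items(); the values below are illustrative, not copied from setup.py. (The later torch.utils.ffi error also suggests a newer PyTorch than the 0.4.x builds reported working in other issues.)

    # Hedged sketch, not the repo's setup.py: dict.items() works on both Python 2 and 3.
    cudaconfig = {'home': '/usr/local/cuda', 'nvcc': '/usr/local/cuda/bin/nvcc'}  # illustrative values
    for k, v in cudaconfig.items():   # .iteritems() was removed in Python 3
        print('%s -> %s' % (k, v))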

About training set

Hi, thank you for your nice code.
In Section 4.1 of the paper, you mention that the training set contains 70998 images; however, the provided cleansed Visual Genome dataset contains 46164 images for training. What's the difference between the two datasets, and do they give similar results?
Thank you

Performance of model

Hi Yikang.
I am studying your code.

I trained the model starting from the provided RPN model and evaluated it using eval.sh, but I got lower performance than the provided full model. My trained model's performance is as follows:

Recall@50: 8.772%
Recall@100: 10.908%

The training command I used is:
CUDA_VISIBLE_DEVICES=0 python train_hdn.py \
	--load_RPN \
	--saved_model_path=./output/RPN/RPN_region_full_best.h5 \
	--dataset_option=normal --enable_clip_gradient \
	--step_size=2 \
	--MPS_iter=1 \
	--caption_use_bias \
	--caption_use_dropout \
	--rnn_type LSTM_normal

How can I get normal performance?

Also, I found differences between the git code and the paper:
the paper's MPS_iter is 2, but the code's baseline is 1;
the paper's message passing method is Message_Passing_Unit_v2 (add), but the code's baseline is Message_Passing_Unit_v1 (mean).
Could you let me know why they are different?

PredCls, PhrCls

Hi, thank you for your nice code.
Would you please tell me how PredCls and PhrCls are measured?

Failed to use the caption function in faster_rcnn/MSDN.py

When I pass an img_path to the caption function in faster_rcnn/MSDN.py, I get the error:

File "/home/wangsijin/projects/new_MSDN/faster_rcnn/RPN.py", line 127, in forward
  im_data = Variable(im_data.cuda())
AttributeError: 'numpy.ndarray' object has no attribute 'cuda'

Then I converted the ndarray to a tensor using torch.from_numpy() and got the error:

RuntimeError: Given groups=1, weight[64, 3, 3, 3], so expected input[1, 600, 800, 3] to have 3 channels, but got 600 channels instead

Then I used .transpose() to reshape the 1 x 600 x 800 x 3 tensor to 1 x 3 x 600 x 800, and got an error again:

File "/home/wangsijin/projects/new_MSDN/faster_rcnn/network.py", line 64, in np_to_variable
  v = Variable(torch.from_numpy(x).type(dtype))
RuntimeError: the given numpy array has zero-sized dimensions. Zero-sized dimensions are not supported in PyTorch

I don't know how to solve it and hope someone can help me. Thanks a lot.
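For reference, a minimal preprocessing sketch (illustrative only; it is an assumption that the caption path expects the usual NCHW layout, and the image here is a placeholder): the convolution wants a 1 x 3 x H x W float tensor, so permute the H x W x 3 array before moving it to the GPU.

    import numpy as np
    import torch

    im = np.zeros((600, 800, 3), dtype=np.uint8)        # placeholder for the loaded image
    im_data = torch.from_numpy(im.astype(np.float32))   # H x W x C
    im_data = im_data.permute(2, 0, 1).unsqueeze(0)     # 1 x C x H x W
    if torch.cuda.is_available():
        im_data = im_data.cuda()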

Dataset in the paper and evaluating object detection performance

Hi, I have some questions regarding the dataset and evaluating object detection.
At the moment, I'm using the normal dataset (train_normal.json, 46164 images) to train a Faster R-CNN similar to the one used in your work for object detection on Visual Genome, and I want to reach the same mAP as in your paper (6.72 for Faster R-CNN only). From this issue, I know that you used a different dataset in the paper, so I have some questions:

  1. How can I get the exact same training and testing datasets as in your paper?
  2. How did you evaluate the object detection mAP of Faster R-CNN and of your MSDN?

Thank you!

Build/Dependency Problems

When trying to build faster_rcnn, I get the following error:

~/s/e/M/faster_rcnn> ./make.sh 
  File "setup.py", line 89
    print extra_postargs
                       ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(extra_postargs)?
Compiling roi pooling kernels by nvcc...
Traceback (most recent call last):
  File "build.py", line 3, in <module>
    from torch.utils.ffi import create_extension
  File "/usr/lib/python3.7/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Maybe the README should explicitly state Python 2 if that is the issue? Maybe a requirements.txt should be provided for creating a conda environment if older package versions are needed?

I tried using a python=2.7 anaconda env, and got the same result.

GRU Unit

Hi, in your code there is a GRU unit; I know it's used to update the features. But from the code, it seems not exactly like what you describe in the paper. To my understanding, there should be an FC layer on object_sub and a different FC layer on object_obj. But in your code, it seems you calculate the average of object_sub and object_obj, then use one FC layer on that average and another FC layer on feature_obj. Did I misunderstand something?

    GRU_input_feature_object = (object_sub + object_obj) / 2.
    out_feature_object = feature_obj + self.GRU_object(GRU_input_feature_object, feature_obj)
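A minimal illustration of the two readings contrasted in this question (illustrative tensor sizes and layers, not the repo's actual modules):

    import torch
    import torch.nn as nn

    # Illustrative shapes; the real hidden size in the repo may differ.
    object_sub = torch.randn(8, 512)    # subject-side messages
    object_obj = torch.randn(8, 512)    # object-side messages

    # Reading A (the questioner's understanding of the paper): a separate FC
    # layer for each message before the update.
    fc_sub, fc_obj = nn.Linear(512, 512), nn.Linear(512, 512)
    msg_a = fc_sub(object_sub) + fc_obj(object_obj)

    # Reading B (the quoted code): average the two messages, then let the GRU
    # cell apply its own single input transformation.
    msg_b = (object_sub + object_obj) / 2.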

Using Pre-trained Models

Hi, thank you for your nice code.
Would you please upload code with which one can use the pre-trained models on a query image? (i.e., one gives an image to the code, and the code, using the pre-trained models, outputs its corresponding sentences).

'Tensor' object has no attribute 'astype'

File "/home/frank/MSDN/faster_rcnn/fast_rcnn/bbox_transform.py", line 78, in bbox_transform_inv_hdn
boxes = boxes.astype(deltas.dtype, copy=False)
AttributeError: 'Tensor' object has no attribute 'astype'
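A hedged workaround sketch (an assumption, not a confirmed repo patch), assuming deltas is still a numpy array at this point: convert the incoming torch.Tensor back to numpy so the existing astype() call in bbox_transform_inv_hdn works unchanged. The arrays below are illustrative stand-ins.

    import numpy as np
    import torch

    deltas = np.zeros((4, 4), dtype=np.float32)      # illustrative stand-in
    boxes = torch.zeros(4, 4)                        # what the traceback says arrives here

    if torch.is_tensor(boxes):                       # convert back to numpy first
        boxes = boxes.detach().cpu().numpy()
    boxes = boxes.astype(deltas.dtype, copy=False)   # the original line now succeeds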

Facing ffi error

ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Please suggest how I can resolve this issue.

Loss: NaN problem

Thank you for your nice code.
I am studying your MSDN paper, so I tried to run your code.
The code ran successfully, but I got this problem:

Epoch: [0][1000/15000] [lr: 0.01] [Solver: SGD]
Batch_Time: 0.322s FRCNN Loss: nan RPN Loss: 0.9130
[Loss] obj_cls_loss: nan obj_box_loss: nan pred_cls_loss: nan, caption_loss: 9.0235, region_box_loss: nan, region_objectness_loss: nan
[object] tp: 0.00, tf: 0.00, fg/bg=(46/175)
[predicate] tp: 0.00, tf: 0.00, fg/bg=(99/412)
[region] tp: 0.00, tf: 0.00, fg/bg=(48/76)

Most of the losses are NaN, and there is no warning in the console.
If you know the reason for this problem, please give me some help.

Thank you!

Fail to download data (trained models, cleansed VG dataset)

Hi there,

Firstly, thank you for your fantastic work! I am unable to access the files for 1) the trained full model, 2) the trained RPN, and 3) the cleansed Visual Genome dataset from your steps 4 & 5. The Dropbox links seem to be down. Could you please check them for me?

Thanks :)

About Inverse Weight

Hello,
thanks for your beautiful code.

I want to ask how I can transform the 'objects' and 'predicate' entries from unicode to float by myself. Is there a function that can be used to do this?

I am now trying to use your code with ImageNet, but I can't find anywhere on the internet how to obtain the inverse weight.

I'm looking forward to your answer.

Thank you.
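For reference, a minimal sketch of one common reading of an "inverse weight": inverse class-frequency weights computed from the annotation labels. This is an assumption about what the provided weight files contain, not something confirmed by the repo.

    import numpy as np

    def inverse_frequency_weights(labels, num_classes):
        # Count how often each object/predicate class appears, then weight
        # each class by the (normalized) inverse of its frequency.
        counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
        weights = 1.0 / np.maximum(counts, 1.0)       # guard against empty classes
        return weights / weights.sum() * num_classes  # normalize to a mean weight of ~1

    # Illustrative predicate labels gathered from training annotations:
    print(inverse_frequency_weights(np.array([0, 0, 1, 2, 2, 2]), num_classes=3))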

RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'other'

Hello, yikang-li!
When I run CUDA_VISIBLE_DEVICES=0 python train_rpn.py, I get this error:

/home/tp/MSDN-master/faster_rcnn/RPN.py:140: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob = F.softmax(rpn_cls_score_reshape)
/home/tp/.local/lib/python2.7/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train_rpn.py", line 194, in <module>
    main()
  File "train_rpn.py", line 73, in main
    train(train_loader, net, optimizer, epoch)
  File "train_rpn.py", line 117, in train
    target_net(im_data, im_info.numpy(), gt_objects.numpy()[0], gt_regions.numpy()[0])
  File "/home/tp/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tp/MSDN-master/faster_rcnn/RPN.py", line 182, in forward
    self.build_loss(rpn_cls_score_reshape, rpn_bbox_pred, rpn_data)
  File "/home/tp/MSDN-master/faster_rcnn/RPN.py", line 240, in build_loss
    rpn_loss_box = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, size_average=False) / (fg_cnt + 1e-4)
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'other'

How do I resolve this error? I am using Python 2.7, CUDA 9.0, and PyTorch 0.4.1.
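A hedged workaround sketch (an assumption, not a confirmed repo patch): the error says the second argument to smooth_l1_loss is a LongTensor, so casting the regression targets to float, e.g. rpn_bbox_targets.float() inside RPN.build_loss, should resolve the mismatch. The standalone snippet below reproduces the mismatch and the fix with illustrative tensors.

    import torch
    import torch.nn.functional as F

    rpn_bbox_pred = torch.randn(2, 4)                        # float predictions
    rpn_bbox_targets = torch.zeros(2, 4, dtype=torch.long)   # integer targets trigger the error
    loss = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets.float())  # the cast resolves it
    print(loss.item())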
