msracver / deformable-convnets

Deformable Convolutional Networks

License: MIT License

Python 73.88% Batchfile 0.01% Shell 0.01% Makefile 0.02% C 0.40% Cuda 18.09% C++ 7.59%

deformable-convnets's Issues

Would you open source your deformable-conv and deformable-roipool in caffe?

Questions about the deform_psroi_demo.py

I have two questions about the details of deform_psroi_demo.py:
1. Why does the offset of the bounding box (red box) come from the output of the rfcn_cls_offset layer rather than the output of the rfcn_bbox_offset layer? I think the latter is more related to the sub-bbox location. Or is it because rfcn_cls_offset is related to the foreground object?
2. Why is trans_std set to 0.1 in the function show_dpsroi_offset? Thank you!

The question of offset

Hi, in the deformable convolution, the offset is taken from the output of a separately configured convolution layer. I am curious why you handle it that way rather than adding parameters, such as an extra weight parameter, to the deformable convolution operator itself?
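For reference, the Deformable ConvNets paper writes the operation roughly as below. The notation follows the paper, not this thread: R is the regular sampling grid (for a 3x3 kernel, the nine positions from (-1, -1) to (1, 1)), w holds the learned kernel weights, and x is read with bilinear interpolation at fractional positions. The offsets Δp_n differ at every output location p_0 and depend on the input feature map, which is presumably why they arrive as an operator input computed by a preceding convolution rather than being stored as a fixed weight.

```latex
y(\mathbf{p}_0) = \sum_{\mathbf{p}_n \in \mathcal{R}} w(\mathbf{p}_n)\, x(\mathbf{p}_0 + \mathbf{p}_n + \Delta\mathbf{p}_n)
```

The weights of the layer that predicts the offsets are still learned by backpropagation, so in that sense the extra parameters do exist; they just live in the offset-predicting layer.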

Does the Faster R-CNN implementation support class-agnostic regression and OHEM?

In faster_rcnn/cfgs/resnet_v1_101_v712_rcnn_end2end.yaml, both options are set to false, but I assumed the code does support class-agnostic regression and OHEM. So I set both options to true and ran training, but the detection results are very poor: only a few objects are detected.

mxnet compile failed

System configuration: Ubuntu 14.04, CUDA 8, cuDNN 5
Error output when compiling MXNet:
/usr/include/c++/4.8/bits/stl_vector.h:919:7: note: no known conversion for argument 1 from ‘nnvm::dim_t* {aka long int*}’ to ‘unsigned int*&&’
make: *** [build/src/operator/custom/custom.o] Error 1
Does anyone know how to fix it?

Demo issue

Traceback (most recent call last):
  File "./rfcn/demo.py", line 129, in <module>
    main()
  File "./rfcn/demo.py", line 50, in main
    sym = sym_instance.get_symbol(config, is_train=False)
  File "/mnt/Deformable-ConvNets/rfcn/symbols/resnet_v1_101_rfcn_dcn.py", line 725, in get_symbol
    relu1 = self.get_resnet_v1_conv5(conv_feat)
  File "/mnt/Deformable-ConvNets/rfcn/symbols/resnet_v1_101_rfcn_dcn.py", line 633, in get_resnet_v1_conv5
    res5a_branch2b = mx.contrib.symbol.DeformableConvolution(name='res5a_branch2b', data=res5a_branch2a_relu, offset=res5a_branch2b_offset,
AttributeError: 'module' object has no attribute 'DeformableConvolution'
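This AttributeError usually means the MXNet build being imported does not contain the extra operators. A quick way to confirm that is to check which operator names the contrib namespace exposes. The sketch below uses a stand-in object so it runs without MXNet. The helper name is hypothetical, and the second operator name is assumed from the repository's operator files rather than taken from this traceback.

```python
from types import SimpleNamespace


def missing_operators(namespace, names):
    """Return the operator names that `namespace` does not expose."""
    return [name for name in names if not hasattr(namespace, name)]


# Stand-in for an MXNet build compiled WITHOUT the extra operators.
# With a real install you would pass `mx.contrib.symbol` instead.
stock_build = SimpleNamespace(SomeOtherOp=object())

# "DeformableConvolution" comes from the traceback above;
# "DeformablePSROIPooling" is an assumed name for the pooling operator.
required = ["DeformableConvolution", "DeformablePSROIPooling"]
print(missing_operators(stock_build, required))
```

If the list is non-empty on a real install, Python is most likely picking up a different MXNet build than the one the operators were compiled into.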

Error when Train My Own DataSets

Hi @Orpine,
I've read the Deformable ConvNets paper, and it's amazing! I now have a face dataset to train on, so I changed pascal_voc.py and config.py from 21 classes to 2 classes. I ran this:
python ./experiments/rfcn/rfcn_end2end_train_test.py --cfg ./experiments/rfcn/cfgs/resnet_v1_101_voc0712_rfcn_dcn_end2end_ohem.yaml

but it fails with:

[14:53:44] /mnt/data1/daniel/mxnet0/dmlc-core/include/dmlc/./logging.h:304[14:53:44] /mnt/data1/daniel/mxnet0/dmlc-core/include/dmlc/./logging.h:304: : [14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal

Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]

[14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal

Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]

terminate called after throwing an instance of 'dmlc::Error'
  what():  [14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal

Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]

terminate called recursively
Segmentation fault(core dumped)

I want to know why this happened and how to solve it.
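"CUDA: invalid device ordinal" is the error CUDA reports when a GPU id is requested that does not exist on the machine. One plausible cause here is that the gpus entry of the config lists more devices than are installed. The sketch below checks a gpus string against a device count. Both function names are hypothetical, and the device count must be supplied by the caller (for example, read off nvidia-smi).

```python
def parse_gpu_ids(gpus):
    """Turn a config string such as '0,1,2,3' into a list of ints."""
    return [int(part) for part in gpus.split(",") if part.strip()]


def invalid_gpu_ids(gpus, device_count):
    """Return the requested ids that do not exist on this machine."""
    return [i for i in parse_gpu_ids(gpus) if i < 0 or i >= device_count]


# Example: the config asks for four GPUs but only one is installed.
print(invalid_gpu_ids("0,1,2,3", device_count=1))  # [1, 2, 3]
print(invalid_gpu_ids("0", device_count=1))        # []
```

If the first call describes your setup, editing the gpus entry in the .yaml file to list only existing devices is the first thing to try.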

The question of import

When I run the R-FCN demo, it reports an error like this. Can someone help me?

MXNet version

How can I download this specific MXNet version (commit 62ecb60)? It does not seem to be an available MXNet release.

Training Error

I got the following error when I tried to train on VOC data.
I use Python 3.6 and the newest MXNet.

Error in proposal_target.infer_shape: Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/mxnet-0.10.0-py3.6.egg/mxnet/operator.py", line 621, in infer_shape_entry
ret = op_prop.infer_shape(shapes)
File "/media/Deformable-ConvNets/rfcn/operator_py/proposal_target.py", line 102, in infer_shape
rois = rpn_rois_shape[0] + gt_boxes_shape[0] if self._batch_rois == -1 else self._batch_rois
IndexError: list index out of range

I also found that in_shape[1] is None in the infer_shape function (line 100) of rfcn/operator_py/proposal_target.py, and in_shape[0] is [300 5].

Any built-in data augmentations?

Thanks for your great work!

My training dataset is pretty small, and I wonder whether your code has any built-in data augmentations. If so, how do I configure them?

Thanks!

Could you provide the pretrained models on BaiduYun?

I cannot download your pretrained models from OneDrive:
./model/rfcn_dcn_coco-0000.params
./model/rfcn_coco-0000.params
./model/rcnn_dcn_coco-0000.params
./model/rcnn_coco-0000.params
./model/deeplab_dcn_cityscapes-0000.params
./model/deeplab_cityscapes-0000.params
./model/deform_conv-0000.params
./model/deform_psroi-0000.params

Could you provide these pretrained models on BaiduYun?
Thanks!

ImportError: cannot import name bbox_overlaps_cython

I feel like this is a stupid question, but after finishing the installation I ran python ./rfcn/demo.py and got:

Traceback (most recent call last):
  File "./rfcn/demo.py", line 17, in <module>
    from utils.image import resize, transform
  File "/net/mlfs01/export/users/cyma/codes/Deformable-ConvNets/rfcn/../lib/utils/image.py", line 6, in <module>
    from bbox.bbox_transform import clip_boxes
  File "/net/mlfs01/export/users/cyma/codes/Deformable-ConvNets/rfcn/../lib/bbox/bbox_transform.py", line 6, in <module>
    from bbox import bbox_overlaps_cython
ImportError: cannot import name bbox_overlaps_cython

Evidently Python cannot import from the bbox.pyx file.
Adding the following before from bbox import bbox_overlaps_cython in bbox_transform.py forces it to import from the .pyx file.

import pyximport
pyximport.install()

from bbox import bbox_overlaps_cython

But I suspect there is something wrong with my settings or installation (no error was reported while installing MXNet).

Has anyone faced the same issue before?

pip list:

Cython (0.25.2)
Django (1.11.1)
easydict (1.6)
image (1.5.5)
mxnet (0.9.5)
numpy (1.13.0rc2)
olefile (0.44)
opencv-python (3.2.0.6)
Pillow (4.1.1)
pip (9.0.1)
pytz (2017.2)
PyYAML (3.12)
setuptools (27.2.0)
wheel (0.29.0)

TypeError: init_params() got an unexpected keyword argument 'allow_extra'

Hi, I ran into trouble while running this script:
python experiments/rfcn/rfcn_end2end_train_test.py --cfg experiments/rfcn/cfgs/resnet_v1_101_voc0712_rfcn_dcn_end2end_ohem.yaml

After the first epoch, I got:

CNNLogLoss=0.776314, RCNNL1Loss=0.329874,
Epoch[0] Batch [9900] Speed: 4.22 samples/sec Train-RPNAcc=0.942865, RPNLogLoss=0.154231, RPNL1Loss=0.071810, RCNNAcc=0.809554, RCNNLogLoss=0.773895, RCNNL1Loss=0.330007,
Epoch[0] Batch [10000] Speed: 4.22 samples/sec Train-RPNAcc=0.943149, RPNLogLoss=0.153486, RPNL1Loss=0.071483, RCNNAcc=0.809500, RCNNLogLoss=0.771593, RCNNL1Loss=0.329987,
Traceback (most recent call last):
  File "experiments/rfcn/rfcn_end2end_train_test.py", line 19, in <module>
    train_end2end.main()
  File "experiments/rfcn/../../rfcn/train_end2end.py", line 164, in main
    config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
  File "experiments/rfcn/../../rfcn/train_end2end.py", line 157, in train_net
    arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
  File "experiments/rfcn/../../rfcn/core/module.py", line 990, in fit
    self.set_params(arg_params, aux_params)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/module/base_module.py", line 651, in set_params
    allow_extra=allow_extra)
TypeError: init_params() got an unexpected keyword argument 'allow_extra'

Could you give me a hint?
My computer has a single GTX 1080 Ti.
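The traceback shows MXNet 0.10.1's base_module.set_params passing allow_extra to an init_params that does not accept it. That pattern typically appears when a subclass written against an older MXNet overrides init_params with the older signature. This is an inference from the traceback, not something verified against the repository. The sketch below reproduces the mismatch with a hypothetical stand-in function and shows how to detect it. It uses Python 3's inspect.signature, whereas the log above is from Python 2.7.

```python
import inspect


def accepts_keyword(func, name):
    """True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in for an override written against an older base class.
def init_params_old(initializer=None, arg_params=None, aux_params=None,
                    allow_missing=False, force_init=False):
    return "ok"


def call_init_params(func, **kwargs):
    """Drop keywords the target does not understand before calling it."""
    supported = {k: v for k, v in kwargs.items() if accepts_keyword(func, k)}
    return func(**supported)


print(accepts_keyword(init_params_old, "allow_extra"))  # False
print(call_init_params(init_params_old, allow_missing=True, allow_extra=False))  # ok
```

Filtering keywords is only a workaround. Building the MXNet version the repository was written for, or adding the missing argument to the overriding method, addresses the mismatch more directly.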

Question about the implementation of function deformable_im2col() in deformable_im2col.h

How should I understand LOG(FATAL) << "not implemented" in the following code?

template <typename DType>
inline void deformable_im2col(mshadow::Stream<cpu>* s,
  const DType* data_im, const DType* data_offset, 
  const TShape& im_shape, const TShape& col_shape, const TShape& kernel_shape,
  const TShape& pad, const TShape& stride, const TShape& dilation, 
  const uint32_t deformable_group, DType* data_col) {
  if (2 == kernel_shape.ndim()) {
	  LOG(FATAL) << "not implemented";
  } else {
	  LOG(FATAL) << "not implemented";
  }
}
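The function quoted above is the mshadow::Stream<cpu> overload, and both branches abort, so this file evidently provides no CPU path; presumably only the GPU version is implemented. To show what a deformable im2col computes, here is a minimal pure-Python sketch for a single channel with stride 1. It is not the repository's kernel: the function names and the offsets[i][j][k] layout are assumptions made for illustration.

```python
import math


def bilinear(img, y, x):
    """Sample a 2-D list at fractional (y, x); points outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return value


def deformable_im2col(img, offsets, kernel=3, pad=1):
    """Return one column of kernel*kernel samples per output position.

    offsets[i][j][k] is the (dy, dx) shift for kernel tap k at output
    position (i, j); this layout is an assumption of the sketch.
    """
    h, w = len(img), len(img[0])
    out_h = h + 2 * pad - kernel + 1
    out_w = w + 2 * pad - kernel + 1
    cols = []
    for i in range(out_h):
        for j in range(out_w):
            col = []
            for k in range(kernel * kernel):
                ky, kx = divmod(k, kernel)
                dy, dx = offsets[i][j][k]
                # Regular grid position plus the learned, fractional offset.
                col.append(bilinear(img, i - pad + ky + dy, j - pad + kx + dx))
            cols.append(col)
    return cols


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zero = [[[(0.0, 0.0)] * 9 for _ in range(3)] for _ in range(3)]
print(deformable_im2col(img, zero)[4])  # centre position: the plain 3x3 patch
```

With all offsets at zero this reduces to an ordinary im2col, which is a convenient sanity check for any real implementation.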

RPNL1Loss=nan

What does this mean? As far as I know, this is not correct.
In Epoch[0], RPNL1Loss was 0.403792 at first. After that it is always nan.

Epoch[0] Batch [300] Speed: 1.15 samples/sec Train-RPNAcc=0.812539, RPNLogLoss=0.570043, RPNL1Loss=nan, RCNNAcc=0.767390, RCNNLogLoss=3.596807, RCNNL1Loss=0.010698,
Epoch[0] Batch [400] Speed: 1.16 samples/sec Train-RPNAcc=0.821910, RPNLogLoss=0.599778, RPNL1Loss=nan, RCNNAcc=0.818267, RCNNLogLoss=3.577709, RCNNL1Loss=0.008031,
Epoch[0] Batch [500] Speed: 1.14 samples/sec Train-RPNAcc=0.828243, RPNLogLoss=0.617132, RPNL1Loss=nan, RCNNAcc=0.848381, RCNNLogLoss=3.396140, RCNNL1Loss=0.006434,
Epoch[0] Batch [600] Speed: 1.15 samples/sec Train-RPNAcc=0.832391, RPNLogLoss=0.628403, RPNL1Loss=nan, RCNNAcc=0.869345, RCNNLogLoss=2.870057, RCNNL1Loss=0.005387,
Epoch[0] Batch [700] Speed: 1.16 samples/sec Train-RPNAcc=0.834384, RPNLogLoss=0.636100, RPNL1Loss=nan, RCNNAcc=0.883437, RCNNLogLoss=2.499374, RCNNL1Loss=0.004622,
Epoch[0] Batch [800] Speed: 1.16 samples/sec Train-RPNAcc=0.836050, RPNLogLoss=0.641726, RPNL1Loss=nan, RCNNAcc=0.894673, RCNNLogLoss=2.215910, RCNNL1Loss=0.004046,
Epoch[0] Batch [900] Speed: 1.14 samples/sec Train-RPNAcc=0.837489, RPNLogLoss=0.645880, RPNL1Loss=nan, RCNNAcc=0.903519, RCNNLogLoss=1.994231, RCNNL1Loss=0.003597,
Epoch[0] Batch [1000] Speed: 1.16 samples/sec Train-RPNAcc=0.838438, RPNLogLoss=0.649027, RPNL1Loss=nan, RCNNAcc=0.910550, RCNNLogLoss=1.816863, RCNNL1Loss=0.003238,
Epoch[0] Batch [1100] Speed: 1.16 samples/sec Train-RPNAcc=0.839179, RPNLogLoss=0.650772, RPNL1Loss=nan, RCNNAcc=0.915993, RCNNLogLoss=1.673953, RCNNL1Loss=0.003987,
Epoch[0] Batch [1200] Speed: 1.15 samples/sec Train-RPNAcc=0.839684, RPNLogLoss=0.650882, RPNL1Loss=nan, RCNNAcc=0.920535, RCNNLogLoss=1.553478, RCNNL1Loss=0.003658,
Epoch[0] Batch [1300] Speed: 1.15 samples/sec Train-RPNAcc=0.840592, RPNLogLoss=0.649718, RPNL1Loss=nan, RCNNAcc=0.924115, RCNNLogLoss=1.452725, RCNNL1Loss=0.003903,
Epoch[0] Batch [1400] Speed: 1.14 samples/sec Train-RPNAcc=0.841459, RPNLogLoss=0.647637, RPNL1Loss=nan, RCNNAcc=0.927468, RCNNLogLoss=1.363495, RCNNL1Loss=0.003740,
Epoch[0] Batch [1500] Speed: 1.16 samples/sec Train-RPNAcc=0.841013, RPNLogLoss=0.645282, RPNL1Loss=nan, RCNNAcc=0.930463, RCNNLogLoss=1.284670, RCNNL1Loss=0.003492,
Epoch[0] Batch [1600] Speed: 1.16 samples/sec Train-RPNAcc=0.841707, RPNLogLoss=0.642184, RPNL1Loss=nan, RCNNAcc=0.932781, RCNNLogLoss=1.216500, RCNNL1Loss=0.003344,
Epoch[0] Batch [1700] Speed: 1.16 samples/sec Train-RPNAcc=0.841856, RPNLogLoss=0.638847, RPNL1Loss=nan, RCNNAcc=0.935341, RCNNLogLoss=1.151333, RCNNL1Loss=0.003149,
Epoch[0] Batch [1800] Speed: 1.16 samples/sec Train-RPNAcc=0.842007, RPNLogLoss=0.635248, RPNL1Loss=nan, RCNNAcc=0.937574, RCNNLogLoss=1.095160, RCNNL1Loss=0.004580,
Epoch[0] Batch [1900] Speed: 1.17 samples/sec Train-RPNAcc=0.842244, RPNLogLoss=0.631594, RPNL1Loss=nan, RCNNAcc=0.939325, RCNNLogLoss=1.043886, RCNNL1Loss=0.004343,
Epoch[0] Batch [2000] Speed: 1.17 samples/sec Train-RPNAcc=0.842616, RPNLogLoss=0.627715, RPNL1Loss=nan, RCNNAcc=0.941225, RCNNLogLoss=0.996324, RCNNL1Loss=0.004129,
Epoch[0] Batch [2100] Speed: 1.18 samples/sec Train-RPNAcc=0.843182, RPNLogLoss=0.623727, RPNL1Loss=nan, RCNNAcc=0.942862, RCNNLogLoss=0.953685, RCNNL1Loss=0.003934,
Epoch[0] Batch [2200] Speed: 1.18 samples/sec Train-RPNAcc=0.843663, RPNLogLoss=0.619788, RPNL1Loss=nan, RCNNAcc=0.944095, RCNNLogLoss=0.915521, RCNNL1Loss=0.003757,
Epoch[0] Batch [2300] Speed: 1.17 samples/sec Train-RPNAcc=0.844234, RPNLogLoss=0.615760, RPNL1Loss=nan, RCNNAcc=0.945496, RCNNLogLoss=0.879742, RCNNL1Loss=0.003595,
Epoch[0] Batch [2400] Speed: 1.18 samples/sec Train-RPNAcc=0.844505, RPNLogLoss=0.611821, RPNL1Loss=nan, RCNNAcc=0.947011, RCNNLogLoss=0.846207, RCNNL1Loss=0.003446,
Epoch[0] Batch [2500] Speed: 1.16 samples/sec Train-RPNAcc=0.844367, RPNLogLoss=0.608176, RPNL1Loss=nan, RCNNAcc=0.947858, RCNNLogLoss=0.819818, RCNNL1Loss=0.004606,
Epoch[0] Batch [2600] Speed: 1.18 samples/sec Train-RPNAcc=0.844443, RPNLogLoss=0.604457, RPNL1Loss=nan, RCNNAcc=0.948941, RCNNLogLoss=0.791787, RCNNL1Loss=0.004434,
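The log alone does not show where the NaN comes from. With Faster R-CNN style code, one commonly reported cause is the annotations: degenerate or out-of-image boxes can make the regression targets take the log of a non-positive number. The sketch below screens a list of boxes for such cases. The function name and the (x1, y1, x2, y2) pixel format are assumptions, not the repository's data structures.

```python
def find_bad_boxes(boxes, width, height):
    """Return (index, reason) for boxes that could break regression targets.

    boxes: iterable of (x1, y1, x2, y2) in pixel coordinates (assumed format).
    """
    bad = []
    for idx, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 <= x1 or y2 <= y1:
            bad.append((idx, "non-positive width or height"))
        elif x1 < 0 or y1 < 0 or x2 >= width or y2 >= height:
            bad.append((idx, "outside image"))
    return bad


boxes = [(10, 20, 50, 80), (30, 30, 30, 60), (-1, 5, 40, 40)]
print(find_bad_boxes(boxes, width=100, height=100))
# [(1, 'non-positive width or height'), (2, 'outside image')]
```

If the annotations pass this check, the learning rate and the bounding-box normalization settings are the next things to examine.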

Training hangs midway without exiting, and GPU-Util drops to 0%

Thank you for sharing this wonderful work; it is really helpful.
I ran into a problem during training.
Training hangs in epoch 0 at batch 3300 (or at other batches) and GPU-Util drops to 0%. Why?

Problem when installing the Python package

When I go to this step:
cd python
sudo python setup.py install
there is a TypeError:
  File "setup.py", line 83, in <module>
    **kwargs)
  File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
    dist.run_commands()
  File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/usr/lib/python2.7/site-packages/setuptools/command/install.py", line 73, in run
    self.do_egg_install()
  File "/usr/lib/python2.7/site-packages/setuptools/command/install.py", line 101, in do_egg_install
    cmd.run()
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 380, in run
    self.easy_install(spec, not self.no_deps)
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 604, in easy_install
    return self.install_item(None, spec, tmpdir, deps, True)
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 655, in install_item
    self.process_distribution(spec, dist, deps)
  File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 701, in process_distribution
    distreq.project_name, distreq.specs, requirement.extras
TypeError: __init__() takes exactly 2 arguments (4 given)
Please tell me how to solve it. Thanks!

Can I use CUDA 7.5 rather than CUDA 8.0?

Has anybody tried CUDA 7.5? The GPU driver on our server does not allow me to install CUDA 8.0.

How to fine-tune rfcn and faster_rcnn

How can I fine-tune rfcn and faster_rcnn from my own pretrained model on another, small dataset?
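The config dumps elsewhere in this listing contain network.pretrained and network.pretrained_epoch entries, and the released model files are named like rfcn_dcn_coco-0000.params. That matches MXNet's prefix-epoch checkpoint naming, so pointing those two entries at your own checkpoint is a plausible starting point for fine-tuning. The helper below is hypothetical. It only shows which file names a given prefix and epoch imply, and whether the training script also needs the symbol file is not confirmed here.

```python
def checkpoint_files(prefix, epoch):
    """File names MXNet's checkpoint convention implies for a prefix and epoch."""
    return ("%s-symbol.json" % prefix, "%s-%04d.params" % (prefix, epoch))


# Values taken from the 'pretrained' and 'pretrained_epoch' config entries.
symbol_file, params_file = checkpoint_files("./model/pretrained_model/resnet_v1_101", 0)
print(params_file)  # ./model/pretrained_model/resnet_v1_101-0000.params
```

When the number of classes changes, the final classification and regression layers no longer match the checkpoint's shapes, so those layers normally have to be re-initialized rather than loaded.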

Pretrained model for Deformable Faster R-CNN on MSCOCO

Hi,

Thank you again for sharing this amazing repository.

I noticed that, although the training scripts are provided for Deformable Faster R-CNN, the pre-trained models for Faster R-CNN and Deformable Faster R-CNN are missing. I can also see that you already have the results of Deformable Faster R-CNN on MS COCO.

Could you please kindly share the pre-trained model for Deformable Faster R-CNN (2fc), ResNet-v1-101 with us?

Thank you!

questions about "kernel_dim_ = conv_in_channels_ / group_ * param_.kernel.Size();" in rfcn/operator_cxx/deformable_convolution-inl.h

In Caffe:
//kernel_dim_ = C * H * W
kernel_dim_ = this->blobs_[0]->count(1);
weight_offset_ = conv_out_channels_ * kernel_dim_ / group_;
but here:
kernel_dim_ = conv_in_channels_ / group_ * param_.kernel.Size();
weight_offset_ = conv_out_channels_ * kernel_dim_ / group_;

Why not kernel_dim_ = conv_in_channels_ * param_.kernel.Size()? Does it need to be divided by group_?
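In a grouped convolution each output filter only sees conv_in_channels_ / group_ input channels, so the number of weights per filter is (C_in / group) * kH * kW. Caffe's weight blob has shape (C_out, C_in / group, kH, kW), so count(1) already includes the division by the group count, and the two expressions should agree. The worked example below checks this with illustrative sizes; the function names are hypothetical.

```python
def kernel_dim(in_channels, group, kernel_h, kernel_w):
    """Weights per output filter in a grouped convolution."""
    assert in_channels % group == 0
    return in_channels // group * kernel_h * kernel_w


def weight_offset(out_channels, in_channels, group, kernel_h, kernel_w):
    """Number of weights that belong to one group."""
    assert out_channels % group == 0
    return out_channels * kernel_dim(in_channels, group, kernel_h, kernel_w) // group


def caffe_count_1(weight_shape):
    """Product of all dims after the first, like blobs_[0]->count(1)."""
    total = 1
    for dim in weight_shape[1:]:
        total *= dim
    return total


c_in, c_out, g, k = 64, 128, 4, 3
caffe_shape = (c_out, c_in // g, k, k)      # Caffe stores C_in / group here
print(kernel_dim(c_in, g, k, k))            # 144
print(caffe_count_1(caffe_shape))           # 144
print(weight_offset(c_out, c_in, g, k, k))  # 4608
```

So the "C" in the Caffe comment is the per-group channel count, not the full input channel count.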

Got error when running demo. (Operator _zeros cannot be run)

Thanks for the great work.

I followed the installation steps but got an error when running 'python ./rfcn/demo.py'.

Environment

Ubuntu16.04, GCC 5.4
GTX1060
MXNet Installation validated successfully.

Error:

[15:00:14] src/c_api/c_api_ndarray.cc:392: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

Full:

(mxnet) xx@xx-xx:~/PycharmProjects/Deformable-ConvNets-master$ python ./rfcn/demo.py 
{'CLASS_AGNOSTIC': True,
 'MXNET_VERSION': 'mxnet',
 'SCALES': [(600, 1000)],
 'TEST': {'BATCH_IMAGES': 1,
          'CXX_PROPOSAL': False,
          'HAS_RPN': True,
          'NMS': 0.3,
          'PROPOSAL_MIN_SIZE': 0,
          'PROPOSAL_NMS_THRESH': 0.7,
          'PROPOSAL_POST_NMS_TOP_N': 2000,
          'PROPOSAL_PRE_NMS_TOP_N': 20000,
          'RPN_MIN_SIZE': 0,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'max_per_image': 100,
          'test_epoch': 8},
 'TRAIN': {'ALTERNATE': {'RCNN_BATCH_IMAGES': 0,
                         'RPN_BATCH_IMAGES': 0,
                         'rfcn1_epoch': 0,
                         'rfcn1_lr': 0,
                         'rfcn1_lr_step': '',
                         'rfcn2_epoch': 0,
                         'rfcn2_lr': 0,
                         'rfcn2_lr_step': '',
                         'rpn1_epoch': 0,
                         'rpn1_lr': 0,
                         'rpn1_lr_step': '',
                         'rpn2_epoch': 0,
                         'rpn2_lr': 0,
                         'rpn2_lr_step': '',
                         'rpn3_epoch': 0,
                         'rpn3_lr': 0,
                         'rpn3_lr_step': ''},
           'ASPECT_GROUPING': True,
           'BATCH_IMAGES': 1,
           'BATCH_ROIS': -1,
           'BATCH_ROIS_OHEM': 128,
           'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZATION_PRECOMPUTED': True,
           'BBOX_REGRESSION_THRESH': 0.5,
           'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_WEIGHTS': array([ 1.,  1.,  1.,  1.]),
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'CXX_PROPOSAL': False,
           'ENABLE_OHEM': True,
           'END2END': True,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'FLIP': True,
           'RESUME': True,
           'RPN_BATCH_SIZE': 256,
           'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 0,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 300,
           'RPN_PRE_NMS_TOP_N': 6000,
           'SHUFFLE': True,
           'begin_epoch': 5,
           'end_epoch': 8,
           'lr': 0.0005,
           'lr_factor': 0.1,
           'lr_step': '5.333',
           'model_prefix': 'e2e',
           'momentum': 0.9,
           'warmup': False,
           'warmup_lr': 5e-05,
           'warmup_step': 1000,
           'wd': 0.0005},
 'dataset': {'NUM_CLASSES': 81,
             'dataset': 'coco',
             'dataset_path': './data/coco',
             'image_set': 'train2014+val2014',
             'proposal': 'rpn',
             'root_path': './data',
             'test_image_set': 'test-dev2015'},
 'default': {'frequent': 20, 'kvstore': 'device'},
 'gpus': '0',
 'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
             'ANCHOR_SCALES': [4, 8, 16, 32],
             'FIXED_PARAMS': ['conv1',
                              'bn_conv1',
                              'res2',
                              'bn2',
                              'gamma',
                              'beta'],
             'FIXED_PARAMS_SHARED': ['conv1',
                                     'bn_conv1',
                                     'res2',
                                     'bn2',
                                     'res3',
                                     'bn3',
                                     'res4',
                                     'bn4',
                                     'gamma',
                                     'beta'],
             'IMAGE_STRIDE': 0,
             'NUM_ANCHORS': 12,
             'PIXEL_MEANS': array([ 103.06,  115.9 ,  123.15]),
             'RCNN_FEAT_STRIDE': 16,
             'RPN_FEAT_STRIDE': 16,
             'pretrained': './model/pretrained_model/resnet_v1_101',
             'pretrained_epoch': 0},
 'output_path': './output/rfcn',
 'symbol': 'resnet_v1_101_rfcn'}
[15:00:14] /home/zehao/mxnet/dmlc-core/include/dmlc/./logging.h:304: [15:00:14] src/c_api/c_api_ndarray.cc:392: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

Stack trace returned 10 entries:
[bt] (0) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd8c20081bc]
[bt] (1) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x8c9) [0x7fd8c2e09c39]
[bt] (2) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7fd8b930c57c]
[bt] (3) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x1f5) [0x7fd8b930bcd5]
[bt] (4) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x3e6) [0x7fd8b9303376]
[bt] (5) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(+0x9db3) [0x7fd8b92fadb3]
[bt] (6) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53) [0x7fd8ca996e93]
[bt] (7) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x715d) [0x7fd8caa4980d]
[bt] (8) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e) [0x7fd8caa4bc3e]
[bt] (9) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47) [0x7fd8caa4b1f7]

Traceback (most recent call last):
  File "./rfcn/demo.py", line 129, in <module>
    main()
  File "./rfcn/demo.py", line 89, in main
    arg_params=arg_params, aux_params=aux_params)
  File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/tester.py", line 29, in __init__
    self._mod.bind(provide_data, provide_label, for_training=False)
  File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/module.py", line 839, in bind
    for_training, inputs_need_grad, force_rebind=False, shared_module=None)
  File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/module.py", line 396, in bind
    state_names=self._state_names)
  File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/DataParallelExecutorGroup.py", line 186, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/DataParallelExecutorGroup.py", line 272, in bind_exec
    shared_group))
  File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/DataParallelExecutorGroup.py", line 545, in _bind_ith_exec
    context, self.logger)
  File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/DataParallelExecutorGroup.py", line 523, in _get_or_reshape
    arg_arr = nd.zeros(arg_shape, context, dtype=arg_type)
  File "/home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/ndarray.py", line 980, in zeros
    return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype)
  File "<string>", line 36, in _zeros
  File "/home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/base.py", line 84, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [15:00:14] src/c_api/c_api_ndarray.cc:392: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

Stack trace returned 10 entries:
[bt] (0) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd8c20081bc]
[bt] (1) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x8c9) [0x7fd8c2e09c39]
[bt] (2) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7fd8b930c57c]
[bt] (3) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x1f5) [0x7fd8b930bcd5]
[bt] (4) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x3e6) [0x7fd8b9303376]
[bt] (5) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(+0x9db3) [0x7fd8b92fadb3]
[bt] (6) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53) [0x7fd8ca996e93]
[bt] (7) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x715d) [0x7fd8caa4980d]
[bt] (8) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e) [0x7fd8caa4bc3e]
[bt] (9) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47) [0x7fd8caa4b1f7]

solved

NameError: name 'resnet_v2_101_rfcn' is not defined

I want to change the symbol 'resnet_v1_101_rfcn' to another symbol, 'resnet_v2_101_rfcn', so I copied the file, renamed it to resnet_v2_101_rfcn, and changed the class name to resnet_v2_101_rfcn as well. Then I changed the file name and the symbol name in the corresponding .yaml file and ran python experiments/rfcn/rfcn_end2end_train_test.py --cfg experiments/rfcn/cfgs/resnet_v2_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml. This error occurs: NameError: name 'resnet_v2_101_rfcn' is not defined
What should I do?
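The NameError suggests the training script looks the symbol up by its config name among the names it has imported. If so, the new module also has to be imported wherever the existing symbol modules are imported, for example in the symbols package's __init__.py. That wiring is an assumption about the repository, not something verified here. The sketch below mimics the lookup with stand-in names and shows why copying the file alone is not enough.

```python
class resnet_v1_101_rfcn(object):
    """Stand-in for a symbol class that has been imported."""


# Names visible to the script; only the v1 class has been imported.
namespace = {"resnet_v1_101_rfcn": resnet_v1_101_rfcn}


def resolve_symbol(name, namespace):
    """Look a symbol class up by its config name, with a clearer error."""
    try:
        return namespace[name]
    except KeyError:
        known = ", ".join(sorted(namespace))
        raise NameError("symbol %r is not imported; known symbols: %s" % (name, known))


print(resolve_symbol("resnet_v1_101_rfcn", namespace).__name__)
try:
    resolve_symbol("resnet_v2_101_rfcn", namespace)
except NameError as err:
    print(err)
```

Once the new class is added to the namespace (that is, once the new module is imported), the same lookup succeeds.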

Deeplab implementation is different

@orpine I just noticed that your DeepLab implementation has no ASPP or atrous convolution after the res5c layer. In the actual DeepLab implementation, the shared conv layer (res5c) is followed by 4 atrous convolution layers with varying dilation parameters (namely 4, 8, 16, 24). Is there any specific reason you chose to follow it with fc6 and a score layer instead?

conv_feat = self.get_resnet_conv(data)
fc6_bias = mx.symbol.Variable('fc6_bias', lr_mult=2.0)
fc6_weight = mx.symbol.Variable('fc6_weight', lr_mult=1.0)
fc6 = mx.symbol.Convolution(data=conv_feat, kernel=(1, 1), pad=(0, 0), num_filter=1024, name="fc6", bias=fc6_bias, weight=fc6_weight,workspace=self.workspace)
relu_fc6 = mx.sym.Activation(data=fc6, act_type='relu', name='relu_fc6')
score_bias = mx.symbol.Variable('score_bias', lr_mult=2.0)
score_weight = mx.symbol.Variable('score_weight', lr_mult=1.0)
score = mx.symbol.Convolution(data=relu_fc6, kernel=(1, 1), pad=(0, 0), num_filter=num_classes, name="score", bias=score_bias,weight=score_weight, workspace=self.workspace)

Where are deformable_psroi_pooling and deformable_convolution called?

I am new to MXNet. I looked into the deformable convolution and pooling implementations in /rfcn/operator_cxx, but where are they integrated into MXNet and called?

cannot run demo

Hello,

I would like to test your demo, but I got an error.

My environment: Ubuntu 14.04, Tesla K80, CUDA 8.0

I installed MXNet at commit 62ecb60 and copied your additional operators to $(YOUR_MXNET_FOLDER)/src/operator/contrib. MXNet compiled successfully. After this, I started testing your code. However, when I ran the demo with python ./rfcn/demo.py, I got the following error.

kelin@vision-kevin-gpu-exp:~/code/Deformable-ConvNets$ python ./rfcn/demo.py 
libdc1394 error: Failed to initialize libdc1394
{'CLASS_AGNOSTIC': True,
 'MXNET_VERSION': 'mxnet',
 'SCALES': [(600, 1000)],
 'TEST': {'BATCH_IMAGES': 1,
          'CXX_PROPOSAL': False,
          'HAS_RPN': True,
          'NMS': 0.3,
          'PROPOSAL_MIN_SIZE': 0,
          'PROPOSAL_NMS_THRESH': 0.7,
          'PROPOSAL_POST_NMS_TOP_N': 2000,
          'PROPOSAL_PRE_NMS_TOP_N': 20000,
          'RPN_MIN_SIZE': 0,
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'max_per_image': 100,
          'test_epoch': 8},
 'TRAIN': {'ALTERNATE': {'RCNN_BATCH_IMAGES': 0,
                         'RPN_BATCH_IMAGES': 0,
                         'rfcn1_epoch': 0,
                         'rfcn1_lr': 0,
                         'rfcn1_lr_step': '',
                         'rfcn2_epoch': 0,
                         'rfcn2_lr': 0,
                         'rfcn2_lr_step': '',
                         'rpn1_epoch': 0,
                         'rpn1_lr': 0,
                         'rpn1_lr_step': '',
                         'rpn2_epoch': 0,
                         'rpn2_lr': 0,
                         'rpn2_lr_step': '',
                         'rpn3_epoch': 0,
                         'rpn3_lr': 0,
                         'rpn3_lr_step': ''},
           'ASPECT_GROUPING': True,
           'BATCH_IMAGES': 1,
           'BATCH_ROIS': -1,
           'BATCH_ROIS_OHEM': 128,
           'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZATION_PRECOMPUTED': True,
           'BBOX_REGRESSION_THRESH': 0.5,
           'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_WEIGHTS': array([ 1.,  1.,  1.,  1.]),
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'CXX_PROPOSAL': False,
           'ENABLE_OHEM': True,
           'END2END': True,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'FLIP': True,
           'RESUME': True,
           'RPN_BATCH_SIZE': 256,
           'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_MIN_SIZE': 0,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 300,
           'RPN_PRE_NMS_TOP_N': 6000,
           'SHUFFLE': True,
           'begin_epoch': 5,
           'end_epoch': 8,
           'lr': 0.0005,
           'lr_factor': 0.1,
           'lr_step': '5.333',
           'model_prefix': 'e2e',
           'momentum': 0.9,
           'warmup': False,
           'warmup_lr': 5e-05,
           'warmup_step': 1000,
           'wd': 0.0005},
 'dataset': {'NUM_CLASSES': 81,
             'dataset': 'coco',
             'dataset_path': './data/coco',
             'image_set': 'train2014+val2014',
             'proposal': 'rpn',
             'root_path': './data',
             'test_image_set': 'test-dev2015'},
 'default': {'frequent': 20, 'kvstore': 'device'},
 'gpus': '0',
 'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
             'ANCHOR_SCALES': [4, 8, 16, 32],
             'FIXED_PARAMS': ['conv1',
                              'bn_conv1',
                              'res2',
                              'bn2',
                              'gamma',
                              'beta'],
             'FIXED_PARAMS_SHARED': ['conv1',
                                     'bn_conv1',
                                     'res2',
                                     'bn2',
                                     'res3',
                                     'bn3',
                                     'res4',
                                     'bn4',
                                     'gamma',
                                     'beta'],
             'IMAGE_STRIDE': 0,
             'NUM_ANCHORS': 12,
             'PIXEL_MEANS': array([ 103.06,  115.9 ,  123.15]),
             'RCNN_FEAT_STRIDE': 16,
             'RPN_FEAT_STRIDE': 16,
             'pretrained': './model/pretrained_model/resnet_v1_101',
             'pretrained_epoch': 0},
 'output_path': './output/rfcn',
 'symbol': 'resnet_v1_101_rfcn'}
[05:41:04] /home/kelin/code/origin_mxnet/mxnet/dmlc-core/include/dmlc/logging.h:300: [05:41:04] /home/kelin/code/origin_mxnet/mxnet/mshadow/mshadow/./stream_gpu-inl.h:45: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 8 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f096f760088]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(+0xe73ea8) [0x7f096f7aaea8]
[bt] (3) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7f096f795b8c]
[bt] (4) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]

[05:41:04] /home/kelin/code/origin_mxnet/mxnet/dmlc-core/include/dmlc/logging.h:300: [05:41:04] src/engine/./threaded_engine.h:329: [05:41:04] /home/kelin/code/origin_mxnet/mxnet/mshadow/mshadow/./stream_gpu-inl.h:45: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 8 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f096f760088]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(+0xe73ea8) [0x7f096f7aaea8]
[bt] (3) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7f096f795b8c]
[bt] (4) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x371) [0x7f096f795e71]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]

terminate called after throwing an instance of 'dmlc::Error'
  what():  [05:41:04] src/engine/./threaded_engine.h:329: [05:41:04] /home/kelin/code/origin_mxnet/mxnet/mshadow/mshadow/./stream_gpu-inl.h:45: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 8 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f096f760088]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(+0xe73ea8) [0x7f096f7aaea8]
[bt] (3) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7f096f795b8c]
[bt] (4) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 6 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x371) [0x7f096f795e71]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]

Aborted (core dumped)

If I understand the code correctly, we can simply ignore Failed to initialize libdc1394. The main problem should be

mxnet/mshadow/mshadow/./stream_gpu-inl.h:45: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Since I am able to run MXNet's own demos (such as image classification on CIFAR10), my hardware setup should be fine. I have no idea what causes this error. Could you please help me with it? Thanks!
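The log above itself suggests rerunning with MXNET_ENGINE_TYPE set to NaiveEngine so that operations run synchronously and the backtrace points at the failing call. A minimal sketch of doing that from Python follows; the helper names naive_engine_env and run_demo_synchronously are hypothetical, and the script path is assumed to be relative to the repository root.

```python
import os
import subprocess
import sys


def naive_engine_env(base_env=None):
    """Return a copy of the environment with MXNet's synchronous engine selected."""
    env = dict(os.environ if base_env is None else base_env)
    # Variable name and value are taken from the error message in the log.
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return env


def run_demo_synchronously(script="./rfcn/demo.py"):
    """Run the demo in a child process so the setting does not persist afterwards."""
    return subprocess.call([sys.executable, script], env=naive_engine_env())
```

Calling run_demo_synchronously() from the repository root should reproduce the crash with a more informative backtrace. Because the variable is set only for the child process, there is nothing to reset after debugging.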

Question about deformable_convolution implementation

Here, input_offset_dim_ is defined as the product of ishape's dimensions from 1 to offset_shape.ndim(). I wonder whether it should instead be the product of offset_shape's dimensions, since it is used to address the offset here.

How to get the sampling points

I noticed that you plotted the sampling locations in red for different activation units, which is very helpful for understanding deformable ConvNets. How do I obtain these sampling points? How can I find the locations that correspond to a deformable filter?

Error when testing on the Cityscapes dataset

I tried to test my trained model on the Cityscapes dataset with the following command:

python experiments/deeplab/deeplab_test.py --cfg experiments/deeplab/cfgs/deeplab_resnet_v1_101_cityscapes_segmentation_dcn.yaml

However, it gave me this error:

Traceback (most recent call last):
  File "experiments/deeplab/deeplab_test.py", line 20, in <module>
    test.main()
  File "experiments/deeplab/../../deeplab/test.py", line 99, in main
    test_deeplab()
  File "experiments/deeplab/../../deeplab/test.py", line 95, in test_deeplab
    pred_eval(predictor, test_data, imdb, vis=args.vis, ignore_cache=args.ignore_cache, logger=logger)
  File "experiments/deeplab/../../deeplab/core/tester.py", line 102, in pred_eval
    evaluation_results = imdb.evaluate_segmentations(all_segmentation_result)
  File "experiments/deeplab/../../deeplab/../lib/dataset/cityscape.py", line 182, in evaluate_segmentations
    info = self._py_evaluate_segmentation()
  File "experiments/deeplab/../../deeplab/../lib/dataset/cityscape.py", line 241, in _py_evaluate_segmentation
    seg_pred = np.array(Image.open(res_save_path)).astype('float32')
  File "/home/haowang/software/miniconda2/lib/python2.7/site-packages/PIL/Image.py", line 2410, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: './output/cityscape/deeplab_resnet_v1_101_cityscapes_segmentation_dcn/leftImg8bit_val/results/frankfurt/frankfurt_000001_059642.png'

I guess there is a bug in ./lib/dataset/cityscape.py, line 179, where

if not pred_segmentations:
    self.write_segmentation_result(pred_segmentations)

should be

if pred_segmentations:
    self.write_segmentation_result(pred_segmentations)
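To illustrate why the guard matters, here is a minimal sketch. FakeSegmentationDataset is a hypothetical stand-in, not the repository's class, and it assumes pred_segmentations is a plain Python list. With the quoted guard, a non-empty list is never written, so a later step that reads the result files would find nothing on disk.

```python
class FakeSegmentationDataset(object):
    """Hypothetical stand-in that models only the guard around the write call."""

    def __init__(self):
        self.written = []

    def write_segmentation_result(self, pred_segmentations):
        # Record what would have been written to disk.
        self.written.extend(pred_segmentations or [])

    def evaluate_original(self, pred_segmentations):
        # Guard as quoted in the report: writes only when the list is empty.
        if not pred_segmentations:
            self.write_segmentation_result(pred_segmentations)
        return len(self.written)

    def evaluate_proposed(self, pred_segmentations):
        # Guard as proposed in the report: writes when predictions exist.
        if pred_segmentations:
            self.write_segmentation_result(pred_segmentations)
        return len(self.written)


preds = ["frankfurt_000001_059642.png"]
original_count = FakeSegmentationDataset().evaluate_original(preds)
proposed_count = FakeSegmentationDataset().evaluate_proposed(preds)
```

Under these assumptions original_count is 0 and proposed_count is 1, which would be consistent with the missing-file IOError above. Whether this is the actual cause in the repository is the reporter's guess.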

TypeError: _update_params_on_kvstore()

Hi,
I ran into trouble while running this script:
python experiments/rfcn/rfcn_end2end_train_test.py --cfg experiments/rfcn/cfgs/resnet_v1_101_voc0712_rfcn_dcn_end2end_ohem.yaml

At the first epoch, I got this error:
Traceback (most recent call last):
  File "experiments/rfcn/rfcn_end2end_train_test.py", line 19, in <module>
    train_end2end.main()
  File "experiments/rfcn/../../rfcn/train_end2end.py", line 164, in main
    config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
  File "experiments/rfcn/../../rfcn/train_end2end.py", line 157, in train_net
    arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
  File "experiments/rfcn/../../rfcn/core/module.py", line 969, in fit
    self.update()
  File "experiments/rfcn/../../rfcn/core/module.py", line 1051, in update
    self._curr_module.update()
  File "experiments/rfcn/../../rfcn/core/module.py", line 572, in update
    self._kvstore)
TypeError: _update_params_on_kvstore() takes exactly 4 arguments (3 given)

I guessed that the error might be caused by mismatched Python and MXNet versions, so I removed the version that was installed on my computer and reinstalled it this way:

cd $(DCN_ROOT)/
git clone --recursive https://github.com/dmlc/mxnet.git
cd mxnet/
git checkout 62ecb60
git submodule update
make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1

cd python
sudo python setup.py install

I have also checked the locations of python and mxnet as follows:
which python
/usr/bin/python
python
import mxnet
mxnet.__path__
['/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet']

With this configuration, the error is still present.

Could you please give me some advice on how to overcome this issue?
I would greatly appreciate your help.

How 'num_deformable_group' works?

In deform_conv_demo.py, num_deformable_group is set to 1, but in resnet_v1_101_rfcn_dcn.py it is set to 4. I know how deformable convolution works when num_deformable_group=1, but how does it work when num_deformable_group=4? Is it analogous to group convolution in a regular convolution layer? Thanks!

MXNetError when I tried to run python ./rfcn/demo.py

zhangboshen@smart-gpu-server1:~/src/mxnet/Deformable-ConvNets$ python ./rfcn/demo.py
{'CLASS_AGNOSTIC': True,
'MXNET_VERSION': 'mxnet',
'SCALES': [(600, 1000)],
'TEST': {'BATCH_IMAGES': 1,
'CXX_PROPOSAL': False,
'HAS_RPN': True,
'NMS': 0.3,
'PROPOSAL_MIN_SIZE': 0,
'PROPOSAL_NMS_THRESH': 0.7,
'PROPOSAL_POST_NMS_TOP_N': 2000,
'PROPOSAL_PRE_NMS_TOP_N': 20000,
'RPN_MIN_SIZE': 0,
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'max_per_image': 100,
'test_epoch': 8},
'TRAIN': {'ALTERNATE': {'RCNN_BATCH_IMAGES': 0,
'RPN_BATCH_IMAGES': 0,
'rfcn1_epoch': 0,
'rfcn1_lr': 0,
'rfcn1_lr_step': '',
'rfcn2_epoch': 0,
'rfcn2_lr': 0,
'rfcn2_lr_step': '',
'rpn1_epoch': 0,
'rpn1_lr': 0,
'rpn1_lr_step': '',
'rpn2_epoch': 0,
'rpn2_lr': 0,
'rpn2_lr_step': '',
'rpn3_epoch': 0,
'rpn3_lr': 0,
'rpn3_lr_step': ''},
'ASPECT_GROUPING': True,
'BATCH_IMAGES': 1,
'BATCH_ROIS': -1,
'BATCH_ROIS_OHEM': 128,
'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZATION_PRECOMPUTED': True,
'BBOX_REGRESSION_THRESH': 0.5,
'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_WEIGHTS': array([ 1., 1., 1., 1.]),
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'CXX_PROPOSAL': False,
'ENABLE_OHEM': True,
'END2END': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'FLIP': True,
'RESUME': True,
'RPN_BATCH_SIZE': 256,
'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': 0,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'SHUFFLE': True,
'begin_epoch': 5,
'end_epoch': 8,
'lr': 0.0005,
'lr_factor': 0.1,
'lr_step': '5.333',
'model_prefix': 'e2e',
'momentum': 0.9,
'warmup': False,
'warmup_lr': 5e-05,
'warmup_step': 1000,
'wd': 0.0005},
'dataset': {'NUM_CLASSES': 81,
'dataset': 'coco',
'dataset_path': './data/coco',
'image_set': 'train2014+val2014',
'proposal': 'rpn',
'root_path': './data',
'test_image_set': 'test-dev2015'},
'default': {'frequent': 20, 'kvstore': 'device'},
'gpus': '0',
'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
'ANCHOR_SCALES': [4, 8, 16, 32],
'FIXED_PARAMS': ['conv1',
'bn_conv1',
'res2',
'bn2',
'gamma',
'beta'],
'FIXED_PARAMS_SHARED': ['conv1',
'bn_conv1',
'res2',
'bn2',
'res3',
'bn3',
'res4',
'bn4',
'gamma',
'beta'],
'IMAGE_STRIDE': 0,
'NUM_ANCHORS': 12,
'PIXEL_MEANS': array([ 103.06, 115.9 , 123.15]),
'RCNN_FEAT_STRIDE': 16,
'RPN_FEAT_STRIDE': 16,
'pretrained': './model/pretrained_model/resnet_v1_101',
'pretrained_epoch': 0},
'output_path': './output/rfcn',
'symbol': 'resnet_v1_101_rfcn'}
[16:21:10] /home/zhangboshen/mxnet/dmlc-core/include/dmlc/logging.h:304: [16:21:10] src/c_api/c_api_ndarray.cc:385: Operator _zeros cannot be run; requires at least one of FCompute, NDArrayFunction, FCreateOperator be registered

Stack trace returned 10 entries:
[bt] (0) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f0e1ff9981c]
[bt] (1) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(Z20ImperativeInvokeImplRKN4nnvm9NodeAttrsEiPPvPiPS4+0xaca) [0x7f0e209c35da]
[bt] (2) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x142) [0x7f0e209c3d52]
[bt] (3) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7f0e10a9531c]
[bt] (4) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x1f5) [0x7f0e10a94a75]
[bt] (5) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x3e6) [0x7f0e10a8c126]
[bt] (6) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(+0x9ce3) [0x7f0e10a83ce3]
[bt] (7) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53) [0x7f0e2e401dc3]
[bt] (8) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6a67) [0x7f0e2e4b36c7]
[bt] (9) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e) [0x7f0e2e4b61ce]

Traceback (most recent call last):
  File "./rfcn/demo.py", line 130, in <module>
    main()
  File "./rfcn/demo.py", line 90, in main
    arg_params=arg_params, aux_params=aux_params)
  File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/tester.py", line 29, in __init__
    self._mod.bind(provide_data, provide_label, for_training=False)
  File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/module.py", line 839, in bind
    for_training, inputs_need_grad, force_rebind=False, shared_module=None)
  File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/module.py", line 396, in bind
    state_names=self._state_names)
  File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/DataParallelExecutorGroup.py", line 186, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/DataParallelExecutorGroup.py", line 272, in bind_exec
    shared_group))
  File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/DataParallelExecutorGroup.py", line 545, in _bind_ith_exec
    context, self.logger)
  File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/DataParallelExecutorGroup.py", line 523, in _get_or_reshape
    arg_arr = nd.zeros(arg_shape, context, dtype=arg_type)
  File "/home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/ndarray.py", line 1028, in zeros
    return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
  File "<string>", line 15, in _zeros
  File "/home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/_ctypes/ndarray.py", line 73, in _imperative_invoke
    c_array(ctypes.c_char_p, [c_str(str(val)) for val in vals])))
  File "/home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/base.py", line 85, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [16:21:10] src/c_api/c_api_ndarray.cc:385: Operator _zeros cannot be run; requires at least one of FCompute, NDArrayFunction, FCreateOperator be registered

Active Convolution: another CVPR 2017 paper that shares the same idea

Bug in the callback functions of R-FCN and Faster R-CNN?

mx.nd.repeat() and mx.nd.tile() behave differently. Should the callback use mx.nd.tile() instead of mx.nd.repeat()?
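The difference between the two operations can be sketched with NumPy, whose repeat and tile are assumed here to have the same semantics as mx.nd.repeat and mx.nd.tile. The array x is illustrative only and is not taken from the callback code.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# repeat duplicates each element (here: each row) in place.
repeated = np.repeat(x, 2, axis=0)
# -> [[1, 2], [1, 2], [3, 4], [3, 4]]

# tile concatenates whole copies of the array.
tiled = np.tile(x, (2, 1))
# -> [[1, 2], [3, 4], [1, 2], [3, 4]]
```

Both results have the same shape, so swapping one for the other does not raise an error; only the ordering of the values changes. That is why such a mix-up can go unnoticed.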

How to initialize the deformable conv layer?

Is it initialized with zeros or from a pretrained model?

Get the error: rpn data not found at ./data/cache/rpn_data/voc_2007_trainval_rpn.pkl

When I train Faster R-CNN, I get the error:

rpn data not found at ./data/cache/rpn_data/voc_2007_trainval_rpn.pkl

Where can I get voc_2007_trainval_rpn.pkl?

train error

Thank you for sharing this wonderful work; it is really helpful.
I encountered a problem when training:
AttributeError: 'module' object has no attribute 'DeformableConvolution'. Why?

I want to train the deeplab-dcn version. How do I make the image_set list?

I am trying to train the deeplab-dcn version, and I'm new to MXNet.

In Caffe, I wrote the dataset list to a text file and put its path in the prototxt.
Each line of the list had the form "image gt_image \n".
In MXNet, how do I set the image_set path and build the list?
Is it in a format other than a text file?
What was modified from the original MXNet?
Was only the deformable_convolution layer added?

Can the original Caffe-based implementation you mentioned be shared?

no error with latest mxnet 0.10.1 during training

running time

With an image size of 800x1200, I get around 7.5 samples per second when training on 8 P6000 GPUs. For r-fcn, I used to get 16 samples per second using a caffe implementation for the same image size. Is this speed in line with your observations, or something is wrong with my runtime environment?

Training Error

When I train the DCN model on the PASCAL VOC 2012 dataset, I encounter the following problem. Can anyone please help explain where the error comes from and how to fix it?
libpng error: Read Error
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "experiments/deeplab/../../deeplab/../lib/utils/PrefetchingIter.py", line 60, in prefetch_func
    self.next_batch[i] = self.iters[i].next()
  File "experiments/deeplab/../../deeplab/core/loader.py", line 185, in next
    self.get_batch_parallel()
  File "experiments/deeplab/../../deeplab/core/loader.py", line 234, in get_batch_parallel
    rst = [multiprocess_result.get() for multiprocess_result in multiprocess_results]
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
ValueError: zero-size array to reduction operation minimum which has no identity
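The final ValueError is the message NumPy raises when a minimum is taken over an empty array. One possible cause, stated here as an assumption rather than a diagnosis, is that an image or label failed to load (note the libpng read error above) and left an empty array behind. A minimal sketch that reproduces the message and guards against it follows; safe_min is a hypothetical helper, not part of the repository.

```python
import numpy as np


def safe_min(values):
    """Return the minimum of values, or None when there is nothing to reduce."""
    arr = np.asarray(values)
    if arr.size == 0:
        return None
    return arr.min()


# Reproduce the error from the traceback on an empty array.
try:
    np.asarray([]).min()
    message = ""
except ValueError as err:
    message = str(err)
```

Checking each image and label file for readability before training would be one way to confirm or rule out this explanation.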

How to get features from ROIs?

Hi,

Thank you for sharing this amazing repo. I would like to use your proposed method to get the features (maps or vectors) from ROIs obtained from RPN.

Is there any way to do that? If so, could you please guide me through it? For example, by pruning the network or by directly reading activations from the middle of the network.

Thank you!

The Result for Faster RCNN

Thanks for sharing your wonderful work. I noticed that you only provide the train and test scripts for R-FCN, but in your paper you also conduct experiments using Faster R-CNN. Would you mind sharing the results for the Faster R-CNN detector in this MXNet framework?

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Hi, thank you for releasing the code.
I have set up the requirements according to your instructions. However, I hit this problem when I run the code. Could you help me?

The channel number of DeformableConvolution layer and corresponding offset layer?

In the provided symbol file resnet_v1_101_rfcn_dcn.py, the number of channels (filters) of the deformable convolution layer is 512 and the number of filters of the corresponding offset layer is 72. How should these two numbers be set? For example, if I want to construct a deformable network for CIFAR-10, the number of filters of the deformable convolution layer should be smaller than 512; should I choose 256 or 128?
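As a hedged note on the 72: in deformable convolution the offset branch predicts one (y, x) pair per kernel sampling location for each deformable group. Assuming a 3x3 kernel and num_deformable_group=4 (the value another question on this page reports for resnet_v1_101_rfcn_dcn.py), that gives 2 * 3 * 3 * 4 = 72. The helper offset_channels below is hypothetical and only restates this arithmetic.

```python
def offset_channels(kernel_h, kernel_w, num_deformable_group=1):
    """Channels the offset layer needs: one (y, x) pair per kernel tap per group."""
    return 2 * kernel_h * kernel_w * num_deformable_group


rfcn_offsets = offset_channels(3, 3, num_deformable_group=4)    # 72
single_group = offset_channels(3, 3, num_deformable_group=1)    # 18
```

Under this reading the offset width does not depend on the number of filters of the deformable convolution itself, so choosing 128 or 256 filters for a smaller network would leave the offset layer's filter count unchanged for the same kernel size and group count.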

RFCN-DCN with Soft-NMS by bharatsingh430

Hi all,
you mentioned the third party implementation of bharatsingh430 in your README.
I cannot ask my question there (disabled?), therefore I try it here.
I think the model of bharatsingh430 is not compatible with your demo. When I replace the model with 'rfcn_dcn_coco-0008.params' downloaded from here, I get the following error:

Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (60,) to.shape=(48,)

Any ideas?

Segmentation Fault during Deformable faster r-cnn training

Hi,

.....................................................error log ................................................................................
'pretrained_epoch': 0},
'output_path': './output/rcnn/imagenet_vid',
'symbol': 'resnet_v1_101_rcnn_dcn'}
num_images 53639
ImageNetVID_DET_train_30classes gt roidb loaded from ./data/cache/ImageNetVID_DET_train_30classes_gt_roidb.pkl
append flipped images to roidb
num_images 57834
ImageNetVID_VID_train_15frames gt roidb loaded from ./data/cache/ImageNetVID_VID_train_15frames_gt_roidb.pkl
append flipped images to roidb
filtered 3316 roidb entries: 222946 -> 219630
Segmentation fault (core dumped)

......................................................error log..........................................................................................

Whenever I try to run Deformable Faster R-CNN training, it ends with a segmentation fault, whether I use VOC or COCO. I have tried on two servers with 8 GPUs each, and both show the same fault. Could you please give me a hint as to what the problem may be?