msracver / deformable-convnets
Deformable Convolutional Networks
License: MIT License
In deform_psroi_demo.py, I have two questions about the details:
1. Why does the offset of the bounding box (red box) come from the output of the rfcn_cls_offset layer rather than the output of the rfcn_bbox_offset layer? I would expect the latter to be more closely related to the sub-box locations. Or is it because rfcn_cls_offset is tied to the foreground object?
2. Why is trans_std set to 0.1 in the function show_dpsroi_offset? Thank you!
Hi, in the deformable convolution, the offset is taken from the output of a separately configured convolution layer. I am curious why you handle it that way rather than adding extra parameters, such as another weight parameter, to the deformable convolution operator itself?
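As the paper describes it, the offsets are predicted from the preceding feature map by an extra convolution layer, so they differ at every output location and for every input. A fixed weight stored inside the operator could not do that. The following is a minimal pure-Python sketch of the sampling step for one output location, assuming a single channel and a 3x3 kernel. The function names and the toy image are made up for illustration and are not the repository's code.

```python
import math

def bilinear_sample(img, y, x):
    """Bilinearly interpolate img (a list of rows) at fractional (y, x).

    Positions outside the image contribute zero.
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value

def deformable_response(img, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable kernel centred at (cy, cx).

    offsets[i][j] = (dy, dx) shifts kernel tap (i, j) at this output location.
    In a real network these values are the output of the offset conv layer.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear_sample(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

# Toy 5x5 image with value 5*row + col, and an all-ones kernel.
img = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]
zero = [[(0.0, 0.0)] * 3 for _ in range(3)]    # zero offsets: ordinary convolution
shift = [[(0.0, 0.5)] * 3 for _ in range(3)]   # every tap moved half a pixel right

plain = deformable_response(img, kernel, zero, 2, 2)
shifted = deformable_response(img, kernel, shift, 2, 2)
print(plain, shifted)
```

With zero offsets the result equals a regular 3x3 convolution at that location. Non-zero offsets move the sampling points, and bilinear interpolation keeps the operation differentiable with respect to the offsets.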
In faster_rcnn/cfgs/resnet_v1_101_v712_rcnn_end2end.yaml, I see that the class-agnostic and OHEM options are both set to false, but I believe the code does support them. So I set both options to true and ran training, but the detection results are very poor: only a few objects are detected.
System configuration: Ubuntu 14.04, CUDA 8, cuDNN 5
Error info when compiling mxnet:
/usr/include/c++/4.8/bits/stl_vector.h:919:7: note: no known conversion for argument 1 from ‘nnvm::dim_t* {aka long int*}’ to ‘unsigned int*&&’
make: *** [build/src/operator/custom/custom.o] Error 1
Does anyone know how to fix it?
Traceback (most recent call last):
File "./rfcn/demo.py", line 129, in <module>
main()
File "./rfcn/demo.py", line 50, in main
sym = sym_instance.get_symbol(config, is_train=False)
File "/mnt/Deformable-ConvNets/rfcn/symbols/resnet_v1_101_rfcn_dcn.py", line 725, in get_symbol
relu1 = self.get_resnet_v1_conv5(conv_feat)
File "/mnt/Deformable-ConvNets/rfcn/symbols/resnet_v1_101_rfcn_dcn.py", line 633, in get_resnet_v1_conv5
res5a_branch2b = mx.contrib.symbol.DeformableConvolution(name='res5a_branch2b', data=res5a_branch2a_relu, offset=res5a_branch2b_offset,
AttributeError: 'module' object has no attribute 'DeformableConvolution'
Hi @Orpine,
I've read the Deformable ConvNets paper, and it's amazing! I have a face dataset to train on, so I changed pascal_voc.py and config.py from 21 classes to 2 classes. I ran this:
python ./experiments/rfcn/rfcn_end2end_train_test.py --cfg ./experiments/rfcn/cfgs/resnet_v1_101_voc0712_rfcn_dcn_end2end_ohem.yaml
but it fails with:
[14:53:44] /mnt/data1/daniel/mxnet0/dmlc-core/include/dmlc/./logging.h:304[14:53:44] /mnt/data1/daniel/mxnet0/dmlc-core/include/dmlc/./logging.h:304: : [14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal
Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]
[14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal
Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]
terminate called after throwing an instance of 'dmlc::Error'
what(): [14:53:44] /mnt/data1/daniel/mxnet0/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess CUDA: invalid device ordinal
Stack trace returned 6 entries:
[bt] (0) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x29) [0x7fe9694c9ac9]
[bt] (1) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN7mshadow9SetDeviceINS_3gpuEEEvi+0xb8) [0x7fe96a166c18]
[bt] (2) /mnt/data1/daniel/Python-2.7.13/build/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x20) [0x7fe96a169460]
[bt] (3) /lib64/libstdc++.so.6(+0xb5220) [0x7fe98ce1d220]
[bt] (4) /lib64/libpthread.so.0(+0x7dc5) [0x7fe995a29dc5]
[bt] (5) /lib64/libc.so.6(clone+0x6d) [0x7fe99504f73d]
terminate called recursively
Segmentation fault(core dumped)
I want to know why this happened and how to solve it.
How can I download the specific MXNet version (commit 62ecb60)? It does not seem to be an available MXNet release.
I got the following error when I tried to train on VOC data.
I am using Python 3.6 and the newest MXNet.
Error in proposal_target.infer_shape: Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/mxnet-0.10.0-py3.6.egg/mxnet/operator.py", line 621, in infer_shape_entry
ret = op_prop.infer_shape(shapes)
File "/media/Deformable-ConvNets/rfcn/operator_py/proposal_target.py", line 102, in infer_shape
rois = rpn_rois_shape[0] + gt_boxes_shape[0] if self._batch_rois == -1 else self._batch_rois
IndexError: list index out of range
I found that in_shape[1] is None in the infer_shape function (line 100) of rfcn/operator_py/proposal_target.py, and in_shape[0] is [300 5].
Thanks for your great work!
My training dataset is pretty small, and I wonder whether the code has any built-in data augmentation. If so, how do I configure it?
Thanks!
I cannot download your pretrained models from OneDrive:
./model/rfcn_dcn_coco-0000.params
./model/rfcn_coco-0000.params
./model/rcnn_dcn_coco-0000.params
./model/rcnn_coco-0000.params
./model/deeplab_dcn_cityscapes-0000.params
./model/deeplab_cityscapes-0000.params
./model/deform_conv-0000.params
./model/deform_psroi-0000.params
Could you provide the pretrained models on BaiduYun?
Thanks!
I feel like this is a stupid question, but when I finished the installation and ran python ./rfcn/demo.py, I got:
Traceback (most recent call last):
File "./rfcn/demo.py", line 17, in <module>
from utils.image import resize, transform
File "/net/mlfs01/export/users/cyma/codes/Deformable-ConvNets/rfcn/../lib/utils/image.py", line 6, in <module>
from bbox.bbox_transform import clip_boxes
File "/net/mlfs01/export/users/cyma/codes/Deformable-ConvNets/rfcn/../lib/bbox/bbox_transform.py", line 6, in <module>
from bbox import bbox_overlaps_cython
ImportError: cannot import name bbox_overlaps_cython
Obviously, Python cannot import from the bbox.pyx file.
Adding the following before from bbox import bbox_overlaps_cython in bbox_transform.py forces it to import from the .pyx file:
import pyximport
pyximport.install()
from bbox import bbox_overlaps_cython
But I feel like there is something wrong with my setup or installation (no errors were reported while installing MXNet).
Has anyone faced the same issue before?
Cython (0.25.2)
Django (1.11.1)
easydict (1.6)
image (1.5.5)
mxnet (0.9.5)
numpy (1.13.0rc2)
olefile (0.44)
opencv-python (3.2.0.6)
Pillow (4.1.1)
pip (9.0.1)
pytz (2017.2)
PyYAML (3.12)
setuptools (27.2.0)
wheel (0.29.0)
Hi, I ran into trouble while running this script:
python experiments/rfcn/rfcn_end2end_train_test.py --cfg experiments/rfcn/cfgs/resnet_v1_101_voc0712_rfcn_dcn_end2end_ohem.yaml
After the first epoch, I got:
CNNLogLoss=0.776314, RCNNL1Loss=0.329874,
Epoch[0] Batch [9900] Speed: 4.22 samples/sec Train-RPNAcc=0.942865, RPNLogLoss=0.154231, RPNL1Loss=0.071810, RCNNAcc=0.809554, RCNNLogLoss=0.773895, RCNNL1Loss=0.330007,
Epoch[0] Batch [10000] Speed: 4.22 samples/sec Train-RPNAcc=0.943149, RPNLogLoss=0.153486, RPNL1Loss=0.071483, RCNNAcc=0.809500, RCNNLogLoss=0.771593, RCNNL1Loss=0.329987,
Traceback (most recent call last):
File "experiments/rfcn/rfcn_end2end_train_test.py", line 19, in <module>
train_end2end.main()
File "experiments/rfcn/../../rfcn/train_end2end.py", line 164, in main
config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
File "experiments/rfcn/../../rfcn/train_end2end.py", line 157, in train_net
arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
File "experiments/rfcn/../../rfcn/core/module.py", line 990, in fit
self.set_params(arg_params, aux_params)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet/module/base_module.py", line 651, in set_params
allow_extra=allow_extra)
TypeError: init_params() got an unexpected keyword argument 'allow_extra'
Would you mind giving me a hint?
My computer has a single GTX 1080 Ti.
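The traceback above shows set_params in MXNet 0.10.1's base_module.py passing allow_extra to init_params, while the init_params reached through rfcn/core/module.py rejects that keyword. This looks like a version mismatch between the installed MXNet and the repository's own Module copy. The sketch below reproduces the pattern with hypothetical stand-in classes (BaseModule, OldModule and PatchedModule are not MXNet classes) and shows one possible workaround, which is to accept the new keyword in the override. Building the MXNet commit the repository was written against would be the other option.

```python
class BaseModule(object):
    """Stand-in for a newer framework base class (hypothetical)."""

    def set_params(self, arg_params, aux_params, allow_missing=False,
                   force_init=True, allow_extra=False):
        # The newer base class forwards an extra keyword to init_params.
        self.init_params(arg_params=arg_params, aux_params=aux_params,
                         allow_missing=allow_missing, force_init=force_init,
                         allow_extra=allow_extra)


class OldModule(BaseModule):
    """Override written against the older signature, without allow_extra."""

    def init_params(self, arg_params=None, aux_params=None,
                    allow_missing=False, force_init=False):
        self.params = (arg_params, aux_params)


class PatchedModule(BaseModule):
    """Same override, extended to accept the new keyword."""

    def init_params(self, arg_params=None, aux_params=None,
                    allow_missing=False, force_init=False, allow_extra=False):
        self.params = (arg_params, aux_params)


def try_set(module):
    """Call set_params and report whether the signature mismatch is hit."""
    try:
        module.set_params({"w": 1}, {})
        return "ok"
    except TypeError as err:
        return "TypeError: %s" % err


old_result = try_set(OldModule())
new_result = try_set(PatchedModule())
print(old_result)
print(new_result)
```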
How should I understand LOG(FATAL) << "not implemented" in the following code?
template <typename DType>
inline void deformable_im2col(mshadow::Stream<cpu>* s,
const DType* data_im, const DType* data_offset,
const TShape& im_shape, const TShape& col_shape, const TShape& kernel_shape,
const TShape& pad, const TShape& stride, const TShape& dilation,
const uint32_t deformable_group, DType* data_col) {
if (2 == kernel_shape.ndim()) {
LOG(FATAL) << "not implemented";
} else {
LOG(FATAL) << "not implemented";
}
}
What does this mean? As far as I can tell, this is not correct.
In Epoch[0], RPNL1Loss = 0.403792 at first; after that it is always nan.
Epoch[0] Batch [300] Speed: 1.15 samples/sec Train-RPNAcc=0.812539, RPNLogLoss=0.570043, RPNL1Loss=nan, RCNNAcc=0.767390, RCNNLogLoss=3.596807, RCNNL1Loss=0.010698,
Epoch[0] Batch [400] Speed: 1.16 samples/sec Train-RPNAcc=0.821910, RPNLogLoss=0.599778, RPNL1Loss=nan, RCNNAcc=0.818267, RCNNLogLoss=3.577709, RCNNL1Loss=0.008031,
Epoch[0] Batch [500] Speed: 1.14 samples/sec Train-RPNAcc=0.828243, RPNLogLoss=0.617132, RPNL1Loss=nan, RCNNAcc=0.848381, RCNNLogLoss=3.396140, RCNNL1Loss=0.006434,
Epoch[0] Batch [600] Speed: 1.15 samples/sec Train-RPNAcc=0.832391, RPNLogLoss=0.628403, RPNL1Loss=nan, RCNNAcc=0.869345, RCNNLogLoss=2.870057, RCNNL1Loss=0.005387,
Epoch[0] Batch [700] Speed: 1.16 samples/sec Train-RPNAcc=0.834384, RPNLogLoss=0.636100, RPNL1Loss=nan, RCNNAcc=0.883437, RCNNLogLoss=2.499374, RCNNL1Loss=0.004622,
Epoch[0] Batch [800] Speed: 1.16 samples/sec Train-RPNAcc=0.836050, RPNLogLoss=0.641726, RPNL1Loss=nan, RCNNAcc=0.894673, RCNNLogLoss=2.215910, RCNNL1Loss=0.004046,
Epoch[0] Batch [900] Speed: 1.14 samples/sec Train-RPNAcc=0.837489, RPNLogLoss=0.645880, RPNL1Loss=nan, RCNNAcc=0.903519, RCNNLogLoss=1.994231, RCNNL1Loss=0.003597,
Epoch[0] Batch [1000] Speed: 1.16 samples/sec Train-RPNAcc=0.838438, RPNLogLoss=0.649027, RPNL1Loss=nan, RCNNAcc=0.910550, RCNNLogLoss=1.816863, RCNNL1Loss=0.003238,
Epoch[0] Batch [1100] Speed: 1.16 samples/sec Train-RPNAcc=0.839179, RPNLogLoss=0.650772, RPNL1Loss=nan, RCNNAcc=0.915993, RCNNLogLoss=1.673953, RCNNL1Loss=0.003987,
Epoch[0] Batch [1200] Speed: 1.15 samples/sec Train-RPNAcc=0.839684, RPNLogLoss=0.650882, RPNL1Loss=nan, RCNNAcc=0.920535, RCNNLogLoss=1.553478, RCNNL1Loss=0.003658,
Epoch[0] Batch [1300] Speed: 1.15 samples/sec Train-RPNAcc=0.840592, RPNLogLoss=0.649718, RPNL1Loss=nan, RCNNAcc=0.924115, RCNNLogLoss=1.452725, RCNNL1Loss=0.003903,
Epoch[0] Batch [1400] Speed: 1.14 samples/sec Train-RPNAcc=0.841459, RPNLogLoss=0.647637, RPNL1Loss=nan, RCNNAcc=0.927468, RCNNLogLoss=1.363495, RCNNL1Loss=0.003740,
Epoch[0] Batch [1500] Speed: 1.16 samples/sec Train-RPNAcc=0.841013, RPNLogLoss=0.645282, RPNL1Loss=nan, RCNNAcc=0.930463, RCNNLogLoss=1.284670, RCNNL1Loss=0.003492,
Epoch[0] Batch [1600] Speed: 1.16 samples/sec Train-RPNAcc=0.841707, RPNLogLoss=0.642184, RPNL1Loss=nan, RCNNAcc=0.932781, RCNNLogLoss=1.216500, RCNNL1Loss=0.003344,
Epoch[0] Batch [1700] Speed: 1.16 samples/sec Train-RPNAcc=0.841856, RPNLogLoss=0.638847, RPNL1Loss=nan, RCNNAcc=0.935341, RCNNLogLoss=1.151333, RCNNL1Loss=0.003149,
Epoch[0] Batch [1800] Speed: 1.16 samples/sec Train-RPNAcc=0.842007, RPNLogLoss=0.635248, RPNL1Loss=nan, RCNNAcc=0.937574, RCNNLogLoss=1.095160, RCNNL1Loss=0.004580,
Epoch[0] Batch [1900] Speed: 1.17 samples/sec Train-RPNAcc=0.842244, RPNLogLoss=0.631594, RPNL1Loss=nan, RCNNAcc=0.939325, RCNNLogLoss=1.043886, RCNNL1Loss=0.004343,
Epoch[0] Batch [2000] Speed: 1.17 samples/sec Train-RPNAcc=0.842616, RPNLogLoss=0.627715, RPNL1Loss=nan, RCNNAcc=0.941225, RCNNLogLoss=0.996324, RCNNL1Loss=0.004129,
Epoch[0] Batch [2100] Speed: 1.18 samples/sec Train-RPNAcc=0.843182, RPNLogLoss=0.623727, RPNL1Loss=nan, RCNNAcc=0.942862, RCNNLogLoss=0.953685, RCNNL1Loss=0.003934,
Epoch[0] Batch [2200] Speed: 1.18 samples/sec Train-RPNAcc=0.843663, RPNLogLoss=0.619788, RPNL1Loss=nan, RCNNAcc=0.944095, RCNNLogLoss=0.915521, RCNNL1Loss=0.003757,
Epoch[0] Batch [2300] Speed: 1.17 samples/sec Train-RPNAcc=0.844234, RPNLogLoss=0.615760, RPNL1Loss=nan, RCNNAcc=0.945496, RCNNLogLoss=0.879742, RCNNL1Loss=0.003595,
Epoch[0] Batch [2400] Speed: 1.18 samples/sec Train-RPNAcc=0.844505, RPNLogLoss=0.611821, RPNL1Loss=nan, RCNNAcc=0.947011, RCNNLogLoss=0.846207, RCNNL1Loss=0.003446,
Epoch[0] Batch [2500] Speed: 1.16 samples/sec Train-RPNAcc=0.844367, RPNLogLoss=0.608176, RPNL1Loss=nan, RCNNAcc=0.947858, RCNNLogLoss=0.819818, RCNNL1Loss=0.004606,
Epoch[0] Batch [2600] Speed: 1.18 samples/sec Train-RPNAcc=0.844443, RPNLogLoss=0.604457, RPNL1Loss=nan, RCNNAcc=0.948941, RCNNLogLoss=0.791787, RCNNL1Loss=0.004434,
Thank you for sharing this wonderful work; it is really helpful.
I encountered a problem when training.
Training stalls in epoch 0 at batch 3300 (or at other batches) and GPU utilization drops to 0%. Why?
When I get to this step:
cd python
sudo python setup.py install
there is a TypeError:
File "setup.py", line 83, in <module>
**kwargs)
File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib/python2.7/site-packages/setuptools/command/install.py", line 73, in run
self.do_egg_install()
File "/usr/lib/python2.7/site-packages/setuptools/command/install.py", line 101, in do_egg_install
cmd.run()
File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 380, in run
self.easy_install(spec, not self.no_deps)
File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 604, in easy_install
return self.install_item(None, spec, tmpdir, deps, True)
File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 655, in install_item
self.process_distribution(spec, dist, deps)
File "/usr/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 701, in process_distribution
distreq.project_name, distreq.specs, requirement.extras
TypeError: __init__() takes exactly 2 arguments (4 given)
Please tell me how to solve it. Thanks!
Has anybody tried CUDA 7.5? The GPU driver on our server does not allow me to install CUDA 8.0.
How can I fine-tune R-FCN and Faster R-CNN from my own pretrained model on another small dataset?
Hi,
Thank you again for sharing this amazing repository.
I noticed that, although the training scripts are provided for Deformable Faster R-CNN, the pre-trained models for Faster R-CNN and Deformable Faster R-CNN are missing. I can also see that you already have the results of Deformable Faster R-CNN on MS COCO.
Could you please kindly share the pre-trained model for Deformable Faster R-CNN (2fc), ResNet-v1-101 with us?
Thank you!
In Caffe,
//kernel_dim_ = C * H * W
kernel_dim_ = this->blobs_[0]->count(1);
weight_offset_ = conv_out_channels_ * kernel_dim_ / group_;
but here,
kernel_dim_ = conv_in_channels_ / group_ * param_.kernel.Size();
weight_offset_ = conv_out_channels_ * kernel_dim_ / group_;
Why not kernel_dim_ = conv_in_channels_ * param_.kernel.Size()? Does it need to be divided by group_?
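One way to check the two formulas against each other: in Caffe, the weight blob of a grouped convolution has shape (out_channels, in_channels / group, kernel_h, kernel_w), so count(1) already contains the division by group. The C in the "C * H * W" comment is therefore the per-group channel count. The sketch below works through the arithmetic with made-up sizes; the helper names are illustrative only.

```python
def caffe_kernel_dim(weight_shape):
    """count(1): product of every weight-blob dimension after the output axis."""
    dim = 1
    for size in weight_shape[1:]:
        dim *= size
    return dim


def mxnet_kernel_dim(in_channels, group, kernel):
    """conv_in_channels_ / group_ * param_.kernel.Size() from the snippet above."""
    kernel_h, kernel_w = kernel
    return in_channels // group * kernel_h * kernel_w


# Made-up example sizes.
in_channels, out_channels, group, kernel = 64, 128, 4, (3, 3)

# Caffe stores grouped-convolution weights with only the per-group input channels.
caffe_weight_shape = (out_channels, in_channels // group, kernel[0], kernel[1])

caffe_dim = caffe_kernel_dim(caffe_weight_shape)
mxnet_dim = mxnet_kernel_dim(in_channels, group, kernel)

# weight_offset_ is the number of weights owned by one group.
weight_offset = out_channels * mxnet_dim // group
total_weights = out_channels * (in_channels // group) * kernel[0] * kernel[1]

print(caffe_dim, mxnet_dim, weight_offset, total_weights)
```

Both expressions give the same kernel_dim_ here, and group * weight_offset_ equals the total number of weights, so the two code paths agree under these assumptions.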
Thanks for the great work.
I followed the installation steps but got an error when running 'python ./rfcn/demo.py'.
Ubuntu16.04, GCC 5.4
GTX1060
MXNet Installation validated successfully.
[15:00:14] src/c_api/c_api_ndarray.cc:392: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered
(mxnet) xx@xx-xx:~/PycharmProjects/Deformable-ConvNets-master$ python ./rfcn/demo.py
{'CLASS_AGNOSTIC': True,
'MXNET_VERSION': 'mxnet',
'SCALES': [(600, 1000)],
'TEST': {'BATCH_IMAGES': 1,
'CXX_PROPOSAL': False,
'HAS_RPN': True,
'NMS': 0.3,
'PROPOSAL_MIN_SIZE': 0,
'PROPOSAL_NMS_THRESH': 0.7,
'PROPOSAL_POST_NMS_TOP_N': 2000,
'PROPOSAL_PRE_NMS_TOP_N': 20000,
'RPN_MIN_SIZE': 0,
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'max_per_image': 100,
'test_epoch': 8},
'TRAIN': {'ALTERNATE': {'RCNN_BATCH_IMAGES': 0,
'RPN_BATCH_IMAGES': 0,
'rfcn1_epoch': 0,
'rfcn1_lr': 0,
'rfcn1_lr_step': '',
'rfcn2_epoch': 0,
'rfcn2_lr': 0,
'rfcn2_lr_step': '',
'rpn1_epoch': 0,
'rpn1_lr': 0,
'rpn1_lr_step': '',
'rpn2_epoch': 0,
'rpn2_lr': 0,
'rpn2_lr_step': '',
'rpn3_epoch': 0,
'rpn3_lr': 0,
'rpn3_lr_step': ''},
'ASPECT_GROUPING': True,
'BATCH_IMAGES': 1,
'BATCH_ROIS': -1,
'BATCH_ROIS_OHEM': 128,
'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZATION_PRECOMPUTED': True,
'BBOX_REGRESSION_THRESH': 0.5,
'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_WEIGHTS': array([ 1., 1., 1., 1.]),
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'CXX_PROPOSAL': False,
'ENABLE_OHEM': True,
'END2END': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'FLIP': True,
'RESUME': True,
'RPN_BATCH_SIZE': 256,
'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': 0,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'SHUFFLE': True,
'begin_epoch': 5,
'end_epoch': 8,
'lr': 0.0005,
'lr_factor': 0.1,
'lr_step': '5.333',
'model_prefix': 'e2e',
'momentum': 0.9,
'warmup': False,
'warmup_lr': 5e-05,
'warmup_step': 1000,
'wd': 0.0005},
'dataset': {'NUM_CLASSES': 81,
'dataset': 'coco',
'dataset_path': './data/coco',
'image_set': 'train2014+val2014',
'proposal': 'rpn',
'root_path': './data',
'test_image_set': 'test-dev2015'},
'default': {'frequent': 20, 'kvstore': 'device'},
'gpus': '0',
'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
'ANCHOR_SCALES': [4, 8, 16, 32],
'FIXED_PARAMS': ['conv1',
'bn_conv1',
'res2',
'bn2',
'gamma',
'beta'],
'FIXED_PARAMS_SHARED': ['conv1',
'bn_conv1',
'res2',
'bn2',
'res3',
'bn3',
'res4',
'bn4',
'gamma',
'beta'],
'IMAGE_STRIDE': 0,
'NUM_ANCHORS': 12,
'PIXEL_MEANS': array([ 103.06, 115.9 , 123.15]),
'RCNN_FEAT_STRIDE': 16,
'RPN_FEAT_STRIDE': 16,
'pretrained': './model/pretrained_model/resnet_v1_101',
'pretrained_epoch': 0},
'output_path': './output/rfcn',
'symbol': 'resnet_v1_101_rfcn'}
[15:00:14] /home/zehao/mxnet/dmlc-core/include/dmlc/./logging.h:304: [15:00:14] src/c_api/c_api_ndarray.cc:392: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered
Stack trace returned 10 entries:
[bt] (0) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd8c20081bc]
[bt] (1) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x8c9) [0x7fd8c2e09c39]
[bt] (2) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7fd8b930c57c]
[bt] (3) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x1f5) [0x7fd8b930bcd5]
[bt] (4) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x3e6) [0x7fd8b9303376]
[bt] (5) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(+0x9db3) [0x7fd8b92fadb3]
[bt] (6) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53) [0x7fd8ca996e93]
[bt] (7) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x715d) [0x7fd8caa4980d]
[bt] (8) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e) [0x7fd8caa4bc3e]
[bt] (9) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47) [0x7fd8caa4b1f7]
Traceback (most recent call last):
File "./rfcn/demo.py", line 129, in <module>
main()
File "./rfcn/demo.py", line 89, in main
arg_params=arg_params, aux_params=aux_params)
File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/tester.py", line 29, in __init__
self._mod.bind(provide_data, provide_label, for_training=False)
File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/module.py", line 839, in bind
for_training, inputs_need_grad, force_rebind=False, shared_module=None)
File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/module.py", line 396, in bind
state_names=self._state_names)
File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/DataParallelExecutorGroup.py", line 186, in __init__
self.bind_exec(data_shapes, label_shapes, shared_group)
File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/DataParallelExecutorGroup.py", line 272, in bind_exec
shared_group))
File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/DataParallelExecutorGroup.py", line 545, in _bind_ith_exec
context, self.logger)
File "/home/zehao/PycharmProjects/Deformable-ConvNets-master/rfcn/core/DataParallelExecutorGroup.py", line 523, in _get_or_reshape
arg_arr = nd.zeros(arg_shape, context, dtype=arg_type)
File "/home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/ndarray.py", line 980, in zeros
return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype)
File "<string>", line 36, in _zeros
File "/home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/base.py", line 84, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [15:00:14] src/c_api/c_api_ndarray.cc:392: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered
Stack trace returned 10 entries:
[bt] (0) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd8c20081bc]
[bt] (1) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/site-packages/mxnet-0.9.5-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x8c9) [0x7fd8c2e09c39]
[bt] (2) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7fd8b930c57c]
[bt] (3) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x1f5) [0x7fd8b930bcd5]
[bt] (4) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x3e6) [0x7fd8b9303376]
[bt] (5) /home/zehao/anaconda2/envs/mxnet/lib/python2.7/lib-dynload/_ctypes.so(+0x9db3) [0x7fd8b92fadb3]
[bt] (6) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53) [0x7fd8ca996e93]
[bt] (7) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x715d) [0x7fd8caa4980d]
[bt] (8) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e) [0x7fd8caa4bc3e]
[bt] (9) /home/zehao/anaconda2/envs/mxnet/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8b47) [0x7fd8caa4b1f7]
I want to change the symbol 'resnet_v1_101_rfcn' to another symbol, 'resnet_v2_101_rfcn', so I copied this file, renamed it to resnet_v2_101_rfcn, and also changed the class name to resnet_v2_101_rfcn. Then I changed the name of the corresponding .yaml file and the symbol name inside it, and ran python experiments/rfcn/rfcn_end2end_train_test.py --cfg experiments/rfcn/cfgs/resnet_v2_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml
This error occurred: NameError: name 'resnet_v2_101_rfcn' is not defined
What should I do?
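A NameError like this usually means the new module was never imported into the scope where its name is looked up. The sketch below assumes, without having checked it against the repository, that the training script builds the network by evaluating the string "<symbol>.<symbol>" and that the symbols package lists its modules explicitly in its __init__ file. Under that assumption, the new file also needs an import line there. The helpers and the namespace dict are hypothetical and only imitate that lookup.

```python
import types


def make_symbol_module(name):
    """Build a fake module that defines a class of the same name (hypothetical)."""
    module = types.ModuleType(name)
    setattr(module, name, type(name, (object,), {}))
    return module


# Names the package is assumed to expose; only the v1 symbol is registered.
namespace = {"resnet_v1_101_rfcn": make_symbol_module("resnet_v1_101_rfcn")}


def build_symbol(symbol_name, scope):
    """Resolve '<name>.<name>' from a string, as an eval-based loader would."""
    return eval(symbol_name + "." + symbol_name, scope)()


def try_build(symbol_name, scope):
    """Return the class name on success, or the NameError text on failure."""
    try:
        return type(build_symbol(symbol_name, scope)).__name__
    except NameError as err:
        return "NameError: %s" % err


# Before the new module is registered, the lookup fails as in the report.
before = try_build("resnet_v2_101_rfcn", namespace)

# Registering it (the analogue of adding an import line) makes the lookup work.
namespace["resnet_v2_101_rfcn"] = make_symbol_module("resnet_v2_101_rfcn")
after = try_build("resnet_v2_101_rfcn", namespace)

print(before)
print(after)
```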
@orpine I just noticed that in your DeepLab implementation there is no ASPP or atrous convolution after the res5c layer. In the actual DeepLab implementation, the shared conv layer (res5c) is followed by 4 atrous convolution layers with different dilation parameters (namely 4, 8, 16, 24). Is there a specific reason you chose to follow it with fc6 and a score layer instead?
conv_feat = self.get_resnet_conv(data)
fc6_bias = mx.symbol.Variable('fc6_bias', lr_mult=2.0)
fc6_weight = mx.symbol.Variable('fc6_weight', lr_mult=1.0)
fc6 = mx.symbol.Convolution(data=conv_feat, kernel=(1, 1), pad=(0, 0), num_filter=1024, name="fc6", bias=fc6_bias, weight=fc6_weight,workspace=self.workspace)
relu_fc6 = mx.sym.Activation(data=fc6, act_type='relu', name='relu_fc6')
score_bias = mx.symbol.Variable('score_bias', lr_mult=2.0)
score_weight = mx.symbol.Variable('score_weight', lr_mult=1.0)
score = mx.symbol.Convolution(data=relu_fc6, kernel=(1, 1), pad=(0, 0), num_filter=num_classes, name="score", bias=score_bias,weight=score_weight, workspace=self.workspace)
I am new to MXNet. I looked into the deformable convolution and pooling implementations in /rfcn/operator_cxx, but where are they integrated into and called from MXNet?
Hello,
I would like to test your demo, but I got an error.
My environment: Ubuntu 14.04, Tesla K80, CUDA 8.0
I installed MXNet at commit 62ecb60, copied your additional operators to $(YOUR_MXNET_FOLDER)/src/operator/contrib, and compiled MXNet successfully. After this, I started testing your code. However, when I ran the demo with python ./rfcn/demo.py, I got the following error.
kelin@vision-kevin-gpu-exp:~/code/Deformable-ConvNets$ python ./rfcn/demo.py
libdc1394 error: Failed to initialize libdc1394
{'CLASS_AGNOSTIC': True,
'MXNET_VERSION': 'mxnet',
'SCALES': [(600, 1000)],
'TEST': {'BATCH_IMAGES': 1,
'CXX_PROPOSAL': False,
'HAS_RPN': True,
'NMS': 0.3,
'PROPOSAL_MIN_SIZE': 0,
'PROPOSAL_NMS_THRESH': 0.7,
'PROPOSAL_POST_NMS_TOP_N': 2000,
'PROPOSAL_PRE_NMS_TOP_N': 20000,
'RPN_MIN_SIZE': 0,
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'max_per_image': 100,
'test_epoch': 8},
'TRAIN': {'ALTERNATE': {'RCNN_BATCH_IMAGES': 0,
'RPN_BATCH_IMAGES': 0,
'rfcn1_epoch': 0,
'rfcn1_lr': 0,
'rfcn1_lr_step': '',
'rfcn2_epoch': 0,
'rfcn2_lr': 0,
'rfcn2_lr_step': '',
'rpn1_epoch': 0,
'rpn1_lr': 0,
'rpn1_lr_step': '',
'rpn2_epoch': 0,
'rpn2_lr': 0,
'rpn2_lr_step': '',
'rpn3_epoch': 0,
'rpn3_lr': 0,
'rpn3_lr_step': ''},
'ASPECT_GROUPING': True,
'BATCH_IMAGES': 1,
'BATCH_ROIS': -1,
'BATCH_ROIS_OHEM': 128,
'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZATION_PRECOMPUTED': True,
'BBOX_REGRESSION_THRESH': 0.5,
'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_WEIGHTS': array([ 1., 1., 1., 1.]),
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'CXX_PROPOSAL': False,
'ENABLE_OHEM': True,
'END2END': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'FLIP': True,
'RESUME': True,
'RPN_BATCH_SIZE': 256,
'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': 0,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'SHUFFLE': True,
'begin_epoch': 5,
'end_epoch': 8,
'lr': 0.0005,
'lr_factor': 0.1,
'lr_step': '5.333',
'model_prefix': 'e2e',
'momentum': 0.9,
'warmup': False,
'warmup_lr': 5e-05,
'warmup_step': 1000,
'wd': 0.0005},
'dataset': {'NUM_CLASSES': 81,
'dataset': 'coco',
'dataset_path': './data/coco',
'image_set': 'train2014+val2014',
'proposal': 'rpn',
'root_path': './data',
'test_image_set': 'test-dev2015'},
'default': {'frequent': 20, 'kvstore': 'device'},
'gpus': '0',
'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
'ANCHOR_SCALES': [4, 8, 16, 32],
'FIXED_PARAMS': ['conv1',
'bn_conv1',
'res2',
'bn2',
'gamma',
'beta'],
'FIXED_PARAMS_SHARED': ['conv1',
'bn_conv1',
'res2',
'bn2',
'res3',
'bn3',
'res4',
'bn4',
'gamma',
'beta'],
'IMAGE_STRIDE': 0,
'NUM_ANCHORS': 12,
'PIXEL_MEANS': array([ 103.06, 115.9 , 123.15]),
'RCNN_FEAT_STRIDE': 16,
'RPN_FEAT_STRIDE': 16,
'pretrained': './model/pretrained_model/resnet_v1_101',
'pretrained_epoch': 0},
'output_path': './output/rfcn',
'symbol': 'resnet_v1_101_rfcn'}
[05:41:04] /home/kelin/code/origin_mxnet/mxnet/dmlc-core/include/dmlc/logging.h:300: [05:41:04] /home/kelin/code/origin_mxnet/mxnet/mshadow/mshadow/./stream_gpu-inl.h:45: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered
Stack trace returned 8 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f096f760088]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(+0xe73ea8) [0x7f096f7aaea8]
[bt] (3) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7f096f795b8c]
[bt] (4) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]
[05:41:04] /home/kelin/code/origin_mxnet/mxnet/dmlc-core/include/dmlc/logging.h:300: [05:41:04] src/engine/./threaded_engine.h:329: [05:41:04] /home/kelin/code/origin_mxnet/mxnet/mshadow/mshadow/./stream_gpu-inl.h:45: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered
Stack trace returned 8 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f096f760088]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(+0xe73ea8) [0x7f096f7aaea8]
[bt] (3) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7f096f795b8c]
[bt] (4) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
Stack trace returned 6 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x371) [0x7f096f795e71]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]
terminate called after throwing an instance of 'dmlc::Error'
what(): [05:41:04] src/engine/./threaded_engine.h:329: [05:41:04] /home/kelin/code/origin_mxnet/mxnet/mshadow/mshadow/./stream_gpu-inl.h:45: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered
Stack trace returned 8 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow6StreamINS_3gpuEE4WaitEv+0xd8) [0x7f096f760088]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(+0xe73ea8) [0x7f096f7aaea8]
[bt] (3) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x8c) [0x7f096f795b8c]
[bt] (4) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]
An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
Stack trace returned 6 entries:
[bt] (0) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f096eeef06c]
[bt] (1) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x371) [0x7f096f795e71]
[bt] (2) /home/kelin/code/origin_mxnet/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvvEZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE1_clEvEUlvE_E9_M_invokeERKSt9_Any_data+0x60) [0x7f096f798f00]
[bt] (3) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb1a60) [0x7f0985ef0a60]
[bt] (4) /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f09887ce184]
[bt] (5) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f09884faffd]
Aborted (core dumped)
If I understand the code correctly, we can simply ignore the "Failed to initialize libdc1394" message. The main problem should be:
mxnet/mshadow/mshadow/./stream_gpu-inl.h:45: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered
Since I am able to run MXNet's demos (such as image classification on CIFAR10), my hardware setup should be okay. I have no idea what causes this error. Could you please help me with it? Thanks!
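The log above itself suggests setting MXNET_ENGINE_TYPE to NaiveEngine so that the failing operation shows up in a synchronous backtrace. A minimal sketch of doing that from Python follows; the helper names are hypothetical, and it assumes the variable must be set before mxnet is imported in the process:

```python
import os


def use_naive_engine(env=os.environ):
    """Request MXNet's synchronous engine, as the error message suggests.

    Returns the previous value so it can be restored after debugging.
    """
    previous = env.get("MXNET_ENGINE_TYPE")
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return previous


def restore_engine(previous, env=os.environ):
    """Put MXNET_ENGINE_TYPE back (unset it if it was unset before)."""
    if previous is None:
        env.pop("MXNET_ENGINE_TYPE", None)
    else:
        env["MXNET_ENGINE_TYPE"] = previous


# Intended use (assumption: call this before `import mxnet`):
#   saved = use_naive_engine()
#   ... import mxnet and run the failing code, e.g. under gdb ...
#   restore_engine(saved)
```

Restoring the old value afterwards follows the log's own reminder to set the variable back once debugging is done.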
I noticed that you plotted the sampling locations (red points) for different activation units, which is very helpful for understanding deformable ConvNets. How do I get these sampling points? How can I find the locations that correspond to a deformable filter?
I tried to test my trained model on the Cityscapes dataset with the following command:
python experiments/deeplab/deeplab_test.py --cfg experiments/deeplab/cfgs/deeplab_resnet_v1_101_cityscapes_segmentation_dcn.yaml
However, it gave me this error:
Traceback (most recent call last):
File "experiments/deeplab/deeplab_test.py", line 20, in
test.main()
File "experiments/deeplab/../../deeplab/test.py", line 99, in main
test_deeplab()
File "experiments/deeplab/../../deeplab/test.py", line 95, in test_deeplab
pred_eval(predictor, test_data, imdb, vis=args.vis, ignore_cache=args.ignore_cache, logger=logger)
File "experiments/deeplab/../../deeplab/core/tester.py", line 102, in pred_eval
evaluation_results = imdb.evaluate_segmentations(all_segmentation_result)
File "experiments/deeplab/../../deeplab/../lib/dataset/cityscape.py", line 182, in evaluate_segmentations
info = self._py_evaluate_segmentation()
File "experiments/deeplab/../../deeplab/../lib/dataset/cityscape.py", line 241, in _py_evaluate_segmentation
seg_pred = np.array(Image.open(res_save_path)).astype('float32')
File "/home/haowang/software/miniconda2/lib/python2.7/site-packages/PIL/Image.py", line 2410, in open
fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: './output/cityscape/deeplab_resnet_v1_101_cityscapes_segmentation_dcn/leftImg8bit_val/results/frankfurt/frankfurt_000001_059642.png'
I guess there is a bug in ./lib/dataset/cityscape.py, line 179, where
if not pred_segmentations:
    self.write_segmentation_result(pred_segmentations)
should be
if pred_segmentations:
    self.write_segmentation_result(pred_segmentations)
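To make the difference between the two guards concrete, here is a small stand-alone sketch. FakeImdb is a hypothetical stand-in for the dataset class, not the repository's code; only the two `if` conditions are taken from the report above:

```python
class FakeImdb(object):
    """Hypothetical stand-in for the dataset class; only the guard matters."""

    def __init__(self):
        self.written = []

    def write_segmentation_result(self, pred_segmentations):
        # Pretend to save each prediction to disk.
        self.written.extend(pred_segmentations)

    def evaluate_reported(self, pred_segmentations):
        # Guard as quoted in the report: nothing is written when results exist.
        if not pred_segmentations:
            self.write_segmentation_result(pred_segmentations)
        return len(self.written)

    def evaluate_proposed(self, pred_segmentations):
        # Proposed guard: write whenever results were passed in.
        if pred_segmentations:
            self.write_segmentation_result(pred_segmentations)
        return len(self.written)
```

With a non-empty list, evaluate_reported writes nothing, which would be consistent with the missing PNG in the traceback. If the predictions were a NumPy array rather than a list, a truthiness test would raise, so an explicit `is not None` check may be the safer variant.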
Hi,
I ran into trouble while running this script:
python experiments/rfcn/rfcn_end2end_train_test.py --cfg experiments/rfcn/cfgs/resnet_v1_101_voc0712_rfcn_dcn_end2end_ohem.yaml
At the first epoch, I got this error:
Traceback (most recent call last):
File "experiments/rfcn/rfcn_end2end_train_test.py", line 19, in
train_end2end.main()
File "experiments/rfcn/../../rfcn/train_end2end.py", line 164, in main
config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
File "experiments/rfcn/../../rfcn/train_end2end.py", line 157, in train_net
arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
File "experiments/rfcn/../../rfcn/core/module.py", line 969, in fit
self.update()
File "experiments/rfcn/../../rfcn/core/module.py", line 1051, in update
self._curr_module.update()
File "experiments/rfcn/../../rfcn/core/module.py", line 572, in update
self._kvstore)
TypeError: _update_params_on_kvstore() takes exactly 4 arguments (3 given)
I guessed that the error might be caused by mismatched Python and MXNet versions, so I removed the versions installed on my computer and reinstalled them as follows:
cd $(DCN_ROOT)/
git clone --recursive https://github.com/dmlc/mxnet.git
cd mxnet/
git checkout 62ecb60
git submodule update
make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1
cd python
sudo python setup.py install
I also checked the locations of python and mxnet as follows:
which python
/usr/bin/python
python
import mxnet
mxnet.__path__
['/usr/local/lib/python2.7/dist-packages/mxnet-0.10.1-py2.7.egg/mxnet']
With this configuration, the error is still present.
Could you please give me some advice on how to overcome this issue?
I would really appreciate your help.
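The TypeError says the repository's module.py passes three arguments to a function that, in the installed MXNet, takes four. This suggests the imported MXNet build may not be the one the repository expects. A hedged sketch for checking this is below. The helper names are hypothetical, and it assumes `_update_params_on_kvstore` lives in `mxnet.model`, which should be verified against the installed copy:

```python
import inspect


def positional_arg_names(func):
    """Return the positional parameter names that func accepts."""
    # getfullargspec exists on Python 3; fall back to getargspec on Python 2.
    getspec = getattr(inspect, "getfullargspec", None) or inspect.getargspec
    return list(getspec(func).args)


def describe_installed_mxnet():
    """Report version, path and update-function arguments of the imported mxnet.

    Returns None when mxnet cannot be imported at all.
    """
    try:
        import mxnet
        from mxnet import model
    except ImportError:
        return None
    func = getattr(model, "_update_params_on_kvstore", None)
    return {
        "version": getattr(mxnet, "__version__", "unknown"),
        "path": list(mxnet.__path__),
        "update_args": positional_arg_names(func) if func else None,
    }
```

If `update_args` lists four names while the caller in rfcn/core/module.py supplies three, the two code bases are out of sync, and either the MXNet checkout or the copied module.py would need to match the other.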
In deform_conv_demo.py, num_deformable_group is set to 1, but in resnet_v1_101_rfcn_dcn.py it is set to 4. I know how deformable convolution works when num_deformable_group=1, but how does it work when num_deformable_group=4? Is it the same as group convolution in a regular convolution layer? Thanks!
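As far as I understand the operator, num_deformable_group splits the input channels into that many contiguous groups, and each group samples with its own set of offsets. The filter weights are not partitioned, which is what distinguishes it from ordinary group convolution. The sketch below only illustrates that channel-to-group mapping under this assumption; the function name is hypothetical:

```python
def deformable_group_of(channel, in_channels, num_deformable_group):
    """Index of the offset group that a given input channel would use.

    Assumes input channels are split into equal, contiguous groups.
    """
    if in_channels % num_deformable_group != 0:
        raise ValueError("in_channels must be divisible by num_deformable_group")
    channels_per_group = in_channels // num_deformable_group
    return channel // channels_per_group
```

With 512 input channels and 4 groups, channels 0 to 127 would share one offset field, channels 128 to 255 the next, and so on. With a single group, every channel uses the same offsets.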
zhangboshen@smart-gpu-server1:~/src/mxnet/Deformable-ConvNets$ python ./rfcn/demo.py
{'CLASS_AGNOSTIC': True,
'MXNET_VERSION': 'mxnet',
'SCALES': [(600, 1000)],
'TEST': {'BATCH_IMAGES': 1,
'CXX_PROPOSAL': False,
'HAS_RPN': True,
'NMS': 0.3,
'PROPOSAL_MIN_SIZE': 0,
'PROPOSAL_NMS_THRESH': 0.7,
'PROPOSAL_POST_NMS_TOP_N': 2000,
'PROPOSAL_PRE_NMS_TOP_N': 20000,
'RPN_MIN_SIZE': 0,
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'max_per_image': 100,
'test_epoch': 8},
'TRAIN': {'ALTERNATE': {'RCNN_BATCH_IMAGES': 0,
'RPN_BATCH_IMAGES': 0,
'rfcn1_epoch': 0,
'rfcn1_lr': 0,
'rfcn1_lr_step': '',
'rfcn2_epoch': 0,
'rfcn2_lr': 0,
'rfcn2_lr_step': '',
'rpn1_epoch': 0,
'rpn1_lr': 0,
'rpn1_lr_step': '',
'rpn2_epoch': 0,
'rpn2_lr': 0,
'rpn2_lr_step': '',
'rpn3_epoch': 0,
'rpn3_lr': 0,
'rpn3_lr_step': ''},
'ASPECT_GROUPING': True,
'BATCH_IMAGES': 1,
'BATCH_ROIS': -1,
'BATCH_ROIS_OHEM': 128,
'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZATION_PRECOMPUTED': True,
'BBOX_REGRESSION_THRESH': 0.5,
'BBOX_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_WEIGHTS': array([ 1., 1., 1., 1.]),
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'CXX_PROPOSAL': False,
'ENABLE_OHEM': True,
'END2END': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'FLIP': True,
'RESUME': True,
'RPN_BATCH_SIZE': 256,
'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_MIN_SIZE': 0,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'SHUFFLE': True,
'begin_epoch': 5,
'end_epoch': 8,
'lr': 0.0005,
'lr_factor': 0.1,
'lr_step': '5.333',
'model_prefix': 'e2e',
'momentum': 0.9,
'warmup': False,
'warmup_lr': 5e-05,
'warmup_step': 1000,
'wd': 0.0005},
'dataset': {'NUM_CLASSES': 81,
'dataset': 'coco',
'dataset_path': './data/coco',
'image_set': 'train2014+val2014',
'proposal': 'rpn',
'root_path': './data',
'test_image_set': 'test-dev2015'},
'default': {'frequent': 20, 'kvstore': 'device'},
'gpus': '0',
'network': {'ANCHOR_RATIOS': [0.5, 1, 2],
'ANCHOR_SCALES': [4, 8, 16, 32],
'FIXED_PARAMS': ['conv1',
'bn_conv1',
'res2',
'bn2',
'gamma',
'beta'],
'FIXED_PARAMS_SHARED': ['conv1',
'bn_conv1',
'res2',
'bn2',
'res3',
'bn3',
'res4',
'bn4',
'gamma',
'beta'],
'IMAGE_STRIDE': 0,
'NUM_ANCHORS': 12,
'PIXEL_MEANS': array([ 103.06, 115.9 , 123.15]),
'RCNN_FEAT_STRIDE': 16,
'RPN_FEAT_STRIDE': 16,
'pretrained': './model/pretrained_model/resnet_v1_101',
'pretrained_epoch': 0},
'output_path': './output/rfcn',
'symbol': 'resnet_v1_101_rfcn'}
[16:21:10] /home/zhangboshen/mxnet/dmlc-core/include/dmlc/logging.h:304: [16:21:10] src/c_api/c_api_ndarray.cc:385: Operator _zeros cannot be run; requires at least one of FCompute, NDArrayFunction, FCreateOperator be registered
Stack trace returned 10 entries:
[bt] (0) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f0e1ff9981c]
[bt] (1) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(Z20ImperativeInvokeImplRKN4nnvm9NodeAttrsEiPPvPiPS4+0xaca) [0x7f0e209c35da]
[bt] (2) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x142) [0x7f0e209c3d52]
[bt] (3) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7f0e10a9531c]
[bt] (4) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x1f5) [0x7f0e10a94a75]
[bt] (5) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x3e6) [0x7f0e10a8c126]
[bt] (6) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(+0x9ce3) [0x7f0e10a83ce3]
[bt] (7) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53) [0x7f0e2e401dc3]
[bt] (8) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6a67) [0x7f0e2e4b36c7]
[bt] (9) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e) [0x7f0e2e4b61ce]
Traceback (most recent call last):
File "./rfcn/demo.py", line 130, in
main()
File "./rfcn/demo.py", line 90, in main
arg_params=arg_params, aux_params=aux_params)
File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/tester.py", line 29, in __init__
self._mod.bind(provide_data, provide_label, for_training=False)
File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/module.py", line 839, in bind
for_training, inputs_need_grad, force_rebind=False, shared_module=None)
File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/module.py", line 396, in bind
state_names=self._state_names)
File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/DataParallelExecutorGroup.py", line 186, in __init__
self.bind_exec(data_shapes, label_shapes, shared_group)
File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/DataParallelExecutorGroup.py", line 272, in bind_exec
shared_group))
File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/DataParallelExecutorGroup.py", line 545, in _bind_ith_exec
context, self.logger)
File "/home/zhangboshen/src/mxnet/Deformable-ConvNets/rfcn/core/DataParallelExecutorGroup.py", line 523, in _get_or_reshape
arg_arr = nd.zeros(arg_shape, context, dtype=arg_type)
File "/home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/ndarray.py", line 1028, in zeros
return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
File "", line 15, in _zeros
File "/home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/_ctypes/ndarray.py", line 73, in _imperative_invoke
c_array(ctypes.c_char_p, [c_str(str(val)) for val in vals])))
File "/home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/base.py", line 85, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [16:21:10] src/c_api/c_api_ndarray.cc:385: Operator _zeros cannot be run; requires at least one of FCompute, NDArrayFunction, FCreateOperator be registered
Stack trace returned 10 entries:
[bt] (0) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f0e1ff9981c]
[bt] (1) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(Z20ImperativeInvokeImplRKN4nnvm9NodeAttrsEiPPvPiPS4+0xaca) [0x7f0e209c35da]
[bt] (2) /home/zhangboshen/anaconda2/lib/python2.7/site-packages/mxnet-0.10.1-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x142) [0x7f0e209c3d52]
[bt] (3) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7f0e10a9531c]
[bt] (4) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x1f5) [0x7f0e10a94a75]
[bt] (5) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x3e6) [0x7f0e10a8c126]
[bt] (6) /home/zhangboshen/anaconda2/lib/python2.7/lib-dynload/_ctypes.so(+0x9ce3) [0x7f0e10a83ce3]
[bt] (7) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53) [0x7f0e2e401dc3]
[bt] (8) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x6a67) [0x7f0e2e4b36c7]
[bt] (9) /home/zhangboshen/anaconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e) [0x7f0e2e4b61ce]
mx.nd.repeat() and mx.nd.tile() behave differently. So should it be mx.nd.tile() instead of mx.nd.repeat()?
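For readers unsure about the difference: repeat duplicates each element in place, while tile duplicates the whole sequence. The pure-Python sketch below mimics those semantics on flat lists; it does not call MXNet, and the function names are hypothetical:

```python
def repeat_elements(values, repeats):
    """Repeat each element in place: [a, b] -> [a, a, b, b]."""
    return [v for v in values for _ in range(repeats)]


def tile_sequence(values, reps):
    """Repeat the whole sequence: [a, b] -> [a, b, a, b]."""
    return list(values) * reps
```

Which of the two is correct depends on the element order the consuming code expects, so the call site in question needs to be checked against that.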
Is it initialized from zero or pretrained model?
When I train Faster R-CNN, I get this error:
rpn data not found at ./data/cache/rpn_data/voc_2007_trainval_rpn.pkl
Where can I get voc_2007_trainval_rpn.pkl?
Thank you for sharing this wonderful work; it is really helpful.
I encountered a problem when training:
AttributeError: 'module' object has no attribute 'DeformableConvolution'. Why?
I am trying to train the deeplab-dcn version, and I'm new to MXNet.
In Caffe, I put the dataset list into a text file and wrote its path in the prototxt.
Each line of the dataset list had the form "image gt_image \n".
In MXNet, how do I set up the image_set path and list?
Is it a format other than a text file?
What was modified relative to the original MXNet?
Was only the deformable_convolution layer added?
Can the original Caffe-based implementation you mentioned be shared?
With an image size of 800x1200, I get around 7.5 samples per second when training on 8 P6000 GPUs. For R-FCN, I used to get 16 samples per second using a Caffe implementation for the same image size. Is this speed in line with your observations, or is something wrong with my runtime environment?
When I train the DCN model on the PASCAL VOC 2012 dataset, I encounter the following problem. Can anyone please help explain where the error comes from and how to eliminate it?
libpng error: Read Error
Exception in thread Thread-7:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "experiments/deeplab/../../deeplab/../lib/utils/PrefetchingIter.py", line 60, in prefetch_func
self.next_batch[i] = self.iters[i].next()
File "experiments/deeplab/../../deeplab/core/loader.py", line 185, in next
self.get_batch_parallel()
File "experiments/deeplab/../../deeplab/core/loader.py", line 234, in get_batch_parallel
rst = [multiprocess_result.get() for multiprocess_result in multiprocess_results]
File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
raise self._value
ValueError: zero-size array to reduction operation minimum which has no identity
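A "libpng error: Read Error" followed by a zero-size array often points to an image file that is truncated or otherwise unreadable, although that is only a guess here. The sketch below is a stdlib-only check (all names hypothetical) that flags PNG files lacking the standard 8-byte signature or the closing IEND chunk, assuming the dataset's files are PNGs:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# An IEND chunk is always: zero length, the type "IEND", and a fixed CRC.
PNG_TRAILER = b"\x00\x00\x00\x00IEND\xaeB`\x82"


def is_complete_png(data):
    """True if the bytes start with the PNG signature and end with IEND."""
    return data.startswith(PNG_SIGNATURE) and data.endswith(PNG_TRAILER)


def find_suspect_pngs(paths):
    """Return the paths whose contents fail the check above."""
    suspects = []
    for path in paths:
        with open(path, "rb") as handle:
            if not is_complete_png(handle.read()):
                suspects.append(path)
    return suspects
```

A file with extra bytes after IEND would also be flagged, so a hit means "worth inspecting" rather than "definitely corrupt".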
Hi,
Thank you for sharing this amazing repo. I would like to use your proposed method to get the features (maps or vectors) from ROIs obtained from RPN.
Is there any way to do that? If so, could you please kindly guide me through it? For example, by pruning the network or directly reading activations from the middle of the network.
Thank you!
Thanks for sharing your wonderful work. I noticed that you only provide the train and test scripts for R-FCN, but in your paper you also conduct experiments using Faster R-CNN. Would you mind sharing the results for the Faster R-CNN detector in this MXNet framework?
Hi, thank you for releasing the code.
I have set up the requirements according to your suggestions. However, I ran into this problem when running the code. Could you help me?
In the provided symbol file resnet_v1_101_rfcn_dcn.py, the deformable convolution layer has 512 channels (filters) and the corresponding offset layer has 72 filters. How should these two numbers be set? For example, if I want to build a deformable network for CIFAR-10, the filter count of the deformable convolution layer should be smaller than 512; should I choose 256 or 128?
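On the second number: the offset layer predicts one (y, x) pair per kernel sampling location per deformable group. So 72 is consistent with a 3x3 kernel and num_deformable_group=4. The width of the deformable convolution itself (512 here) is an ordinary design choice and does not enter this count. A small sketch of the arithmetic, with a hypothetical function name:

```python
def offset_channels(kernel_h, kernel_w, num_deformable_group):
    """Filters needed by the offset layer feeding a deformable convolution.

    Two values (a y and an x displacement) per kernel sampling location,
    repeated for every deformable group.
    """
    return 2 * kernel_h * kernel_w * num_deformable_group
```

For a smaller network that keeps a 3x3 kernel with a single deformable group, the offset layer would therefore need 18 filters, whichever of 256 or 128 is chosen for the main layer.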
Hi all,
You mentioned the third-party implementation by bharatsingh430 in your README.
I cannot ask my question there (disabled?), so I am trying here.
I think the model of bharatsingh430 is not compatible with your demo. When I replace the model with 'rfcn_dcn_coco-0008.params' downloaded from here, I get the following error:
Check failed: from.shape() == to->shape() operands shape mismatchfrom.shape = (60,) to.shape=(48,)
Any ideas?
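One possible reading of the 60 versus 48 mismatch, offered only as a guess: an RPN box-regression layer has 4 outputs per anchor. The configuration printed earlier on this page (4 anchor scales, 3 ratios, NUM_ANCHORS 12) would give 48, whereas 60 would correspond to 15 anchors. The sketch below just does that arithmetic. The function name and the five-scale list in the note are hypothetical and not taken from either repository:

```python
def rpn_bbox_pred_channels(anchor_scales, anchor_ratios):
    """Outputs of an RPN box-regression layer: 4 per (scale, ratio) anchor."""
    num_anchors = len(anchor_scales) * len(anchor_ratios)
    return 4 * num_anchors
```

Under this reading, scales [4, 8, 16, 32] with ratios [0.5, 1, 2] give 48, and any five scales (for example a hypothetical [2, 4, 8, 16, 32]) with the same ratios give 60. The anchor settings in the yaml would then have to match those the downloaded model was trained with.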
Hi,
.....................................................error log ................................................................................
'pretrained_epoch': 0},
'output_path': './output/rcnn/imagenet_vid',
'symbol': 'resnet_v1_101_rcnn_dcn'}
num_images 53639
ImageNetVID_DET_train_30classes gt roidb loaded from ./data/cache/ImageNetVID_DET_train_30classes_gt_roidb.pkl
append flipped images to roidb
num_images 57834
ImageNetVID_VID_train_15frames gt roidb loaded from ./data/cache/ImageNetVID_VID_train_15frames_gt_roidb.pkl
append flipped images to roidb
filtered 3316 roidb entries: 222946 -> 219630
Segmentation fault (core dumped)
......................................................error log..........................................................................................
When I try to train Deformable Faster R-CNN, it always ends with a segmentation fault, whether I use VOC or COCO. I have tried on two 8-GPU servers and both show the same fault. Could you please give me a hint about what the problem might be?