fpn's Introduction

fpn's People

Contributors

xmyqsh

fpn's Issues

Where to download Resnet50.npy

When I try to train FPN, I get the following exception:
Exception: Check your pretrained model data/pretrain_model/Resnet50.npy

Where can I download Resnet50.npy?

Training loss is NaN

I noticed that lib/networks/network.py has been modified.
The previous version works fine, but the modified version leads to gradient explosion.
By the way, the dataset I use is Caltech Pedestrian Detection.
Apologies for my poor English :)

alt_opt testing problem

The alt_opt training is done. But when I try to test the network, the file ../tools/test_net.py doesn't exist.

Experimental result

Cool code! I wonder whether you have reproduced the exact results of the FPN paper?

Best!
guangxing

UnknownError (see above for traceback): KeyError: b'TRAIN'

When I started end-to-end training, the following error occurred:

W tensorflow/core/framework/op_kernel.cc:1158] Unknown: KeyError: b'TRAIN'

Caused by op 'RPN/rpn_rois/PyFunc', defined at:
  File "./faster_rcnn/train_net.py", line 101, in <module>
    network = get_network(args.network_name)
  File "./faster_rcnn/../lib/networks/factory.py", line 22, in get_network
    return FPN_train()
  File "./faster_rcnn/../lib/networks/FPN_train.py", line 25, in __init__
    self.setup()
  File "./faster_rcnn/../lib/networks/FPN_train.py", line 427, in setup
    .proposal_layer(_feat_stride[2:], anchor_size[2:], 'TRAIN',name = 'rpn_rois'))
  File "./faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
    layer_output = op(self, layer_input, *args, **kwargs)
  File "./faster_rcnn/../lib/networks/network.py", line 345, in proposal_layer
    [tf.float32]),
  File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
    name=name)
  File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/gtx980a-206a/anaconda/envs/py3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

UnknownError (see above for traceback): KeyError: b'TRAIN'
[[Node: RPN/rpn_rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT], token="pyfunc_1", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/Reshape_2/_717, RPN/rpn_bbox_pred/BiasAdd/_719, RPN/Reshape_5/_721, RPN/rpn_bbox_pred_1/BiasAdd/_723, RPN/Reshape_8/_725, RPN/rpn_bbox_pred_2/BiasAdd/_727, RPN/Reshape_11/_729, RPN/rpn_bbox_pred_3/BiasAdd/_731, RPN/Reshape_14/_733, RPN/rpn_bbox_pred_4/BiasAdd/_735, _arg_im_info_0_5, RPN/rpn_rois/PyFunc/input_11, RPN/rpn_rois/PyFunc/input_12, RPN/rpn_rois/PyFunc/input_13)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]
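
A possible explanation, offered as a guess from the traceback rather than a confirmed diagnosis: the paths show Python 3.6, and under Python 3 a string passed through tf.py_func arrives inside the Python callback as bytes. The 'TRAIN' argument would then show up as b'TRAIN', and a dictionary lookup with that key fails. The sketch below decodes the key before the lookup. The helper name normalize_cfg_key and the small cfg dictionary are hypothetical stand-ins, not code from this repository.

```python
def normalize_cfg_key(cfg_key):
    """Return cfg_key as text, decoding it first if it arrived as bytes."""
    if isinstance(cfg_key, bytes):
        return cfg_key.decode("utf-8")
    return cfg_key


# Hypothetical stand-in for the repository's configuration object.
cfg = {
    "TRAIN": {"RPN_POST_NMS_TOP_N": 2000},
    "TEST": {"RPN_POST_NMS_TOP_N": 300},
}

# b"TRAIN" is what a py_func callback would receive under Python 3.
key = normalize_cfg_key(b"TRAIN")
post_nms_top_n = cfg[key]["RPN_POST_NMS_TOP_N"]
print(key, post_nms_top_n)
```

If this is indeed the cause, the decode would go at the top of the Python function that receives the 'TRAIN' / 'TEST' string, before it is used as a key.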

VGG based FPN model

I want to use FPN with a VGG model since I only have 2 GTX 980 cards with 8 GB of memory. Do you plan to share a VGG-based model?

Results on COCO 2014, with TFFRCNN baseline

Hi, I've been looking for a working TensorFlow implementation of FPN for some time now, and I think that this one actually works :).

I'm using TFFRCNN to establish a baseline (this repo also seems to be a direct port of that, but I could be mistaken?). First I tried training on Pascal VOC 2007 and testing on Pascal VOC 2007. That, sadly, didn't give an increase in accuracy (TFFRCNN reported 0.7 mAP and this repo reported 0.698 mAP; both were trained for 160k iterations), but the RPN loss during training was really good, so that gave me hope :)

But the COCO dataset seems to be a better candidate for testing this: first, because this is what the authors of the FPN paper report on, and second, because the COCO evaluation metrics are significantly more fine-grained.

Below are the results:


Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.17, 0.20, 0.03
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.34, 0.37, 0.03
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.16, 0.20, 0.04
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.03, 0.08, 0.05
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.18, 0.23, 0.05
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.29, 0.27, -0.02
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.19, 0.21, 0.02
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.27, 0.33, 0.06
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.27, 0.33, 0.06
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.05, 0.13, 0.08
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.30, 0.39, 0.09
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.47, 0.47, 0.00

where the 3 numbers at the end of each line are (TFFRCNN, FPN, difference between the two).

The paper uses a slightly different training and testing set (training_set + train35k for training and minival for testing; minival is only 5k images, while val is 40k images). But the relative differences between Faster R-CNN and FPN are in the same ballpark. We're seeing large increases in performance for small instances (both AR and AP), which is exactly what FPN sets out to do. So congrats! @xmyqsh :). The only result that's worse than TFFRCNN is for large instances, but that may be remedied by two things. First, I only used (P3-P5) for the class/bbox heads (similar to the paper), but I see you now use P6 as well. Second, I accidentally used OHEM when training/testing TFFRCNN, and not for FPN, so the test is actually not completely fair to FPN.

I'm going to focus on implementing RoiAlign and attaching a mask head, so we can maybe replicate the results of Mask R-CNN.

PS. @xmyqsh, should I open a pull request so that we can all train/test on COCO? (I just took the COCO dataset code from TFFRCNN and made a few changes to the training code so that instances with no gt boxes in the training set are handled.)

cannot convert float infinity to integer

When training on my own dataset, some images contain none of the objects I want to annotate, so those images have no labels. During training I get:
2017-11-15 21:07:00.175803: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: exceptions.ValueError: attempt to get argmax of an empty sequence
However, when I set the labels of those images to 0 0 0 0 and generate the XML files, training fails with:
tensorflow.python.framework.errors_impl.UnknownError: exceptions.OverflowError: cannot convert float infinity to integer @xmyqsh
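
Where exactly the infinity arises in this code base is not established in this thread. Degenerate placeholder boxes such as 0 0 0 0 are a common source of non-finite values in detection pipelines, so one hedged precaution is to validate boxes before they reach training. The sketch below assumes boxes are stored as [x1, y1, x2, y2] in pixel coordinates; the function name valid_box_mask is hypothetical.

```python
import numpy as np


def valid_box_mask(boxes):
    """Return a boolean mask marking boxes that are finite, non-negative
    and have strictly positive width and height."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    finite = np.isfinite(boxes).all(axis=1)
    non_negative = (boxes >= 0).all(axis=1)
    positive_size = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    return finite & non_negative & positive_size


boxes = np.array([[0, 0, 0, 0],        # placeholder box with no extent
                  [10, 20, 50, 80]],   # ordinary box
                 dtype=np.float32)
mask = valid_box_mask(boxes)
print(mask)
```

Images whose boxes are all rejected then have no ground truth at all, which has to be handled separately (for example by skipping them) rather than by writing a dummy box.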

ValueError: attempt to get argmax of an empty sequence

Please help me. When I train on my own data, I get:
2017-11-15 21:07:00.175803: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
Traceback (most recent call last):
File "./faster_rcnn/train_net.py", line 109, in
restore=bool(int(args.restore)))
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 409, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 263, in train_model
cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]

Caused by op u'RPN/rpn-data/PyFunc', defined at:
File "./faster_rcnn/train_net.py", line 101, in
network = get_network(args.network_name)
File "./faster_rcnn/../lib/networks/factory.py", line 22, in get_network
return FPN_train()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 25, in init
self.setup()
File "./faster_rcnn/../lib/networks/FPN_train.py", line 418, in setup
.anchor_target_layer(_feat_stride[2:], anchor_size[2:], name = 'rpn-data'))
File "./faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "./faster_rcnn/../lib/networks/network.py", line 380, in anchor_target_layer
[tf.float32,tf.float32,tf.float32,tf.float32])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): exceptions.ValueError: attempt to get argmax of an empty sequence
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _arg_gt_boxes_0_3, _arg_gt_ishard_0_4, _arg_dontcare_areas_0_2, _arg_im_info_0_5, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13/_619 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_2711_gradients/RPN/rpn_bbox_pred_reshape_concat_grad/Gather_13", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]
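
The error is raised in the 'rpn-data' (anchor target) op, which is consistent with an image that has no ground-truth boxes, although the thread does not confirm this. One simple workaround is to drop such images before training. The sketch below assumes the training set is a list of dicts with a "boxes" array per image, which is the usual layout in Faster R-CNN style code but is an assumption here; filter_empty_roidb is a hypothetical helper.

```python
import numpy as np


def filter_empty_roidb(roidb):
    """Keep only the entries that have at least one ground-truth box."""
    return [entry for entry in roidb if len(entry["boxes"]) > 0]


# Two toy entries: the first image has no annotated objects.
roidb = [
    {"image": "a.jpg", "boxes": np.zeros((0, 4), dtype=np.float32)},
    {"image": "b.jpg", "boxes": np.array([[10, 20, 50, 80]], dtype=np.float32)},
]
filtered = filter_empty_roidb(roidb)
print(len(roidb) - len(filtered), "entries without ground truth dropped")
```

Dropping images is the least invasive option; keeping them as pure negatives needs support for empty ground truth inside the anchor target code.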

running test_net.py gives no detection result

I managed to train on my own dataset. After 120k iterations I ran a test using a command similar to
CUDA_VISIBLE_DEVICES=0 python ./faster_rcnn/test_net.py --gpu 0 --weights output/FPN_end2end/voc_0712_trainval/FPN_iter_370000.ckpt --imdb voc_0712_test --cfg ./experiments/cfgs/FPN_end2end.yml --network FPN_test

But for some images the detection results are empty, while the others look normal.
I have my own loader inherited from imdb, and it works well with the Faster R-CNN implementation from this repository:

https://github.com/ruotianluo/pytorch-faster-rcnn

I'm totally confused. Any suggestions would be appreciated, thanks.

nms model cannot be imported

Hi @xmyqsh ,

When I run the training command, there is an error message
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 23, in <module>
    from lib.fast_rcnn.train import get_training_roidb, train_net
  File "./faster_rcnn/../lib/fast_rcnn/__init__.py", line 9, in <module>
    from . import train
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 15, in <module>
    from .nms_wrapper import nms_wrapper
  File "./faster_rcnn/../lib/fast_rcnn/nms_wrapper.py", line 14, in <module>
    from ..nms.gpu_nms import gpu_nms
ImportError: No module named gpu_nms
Command exited with non-zero status 1

Do you have any idea how to fix this?

Thank you!
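
The import failure suggests that gpu_nms is a compiled extension that has not been built in this checkout; the build step for this repository is not shown in the thread. Until the extension is available, a pure NumPy non-maximum suppression can serve as a slower stand-in. This is a minimal sketch of standard greedy NMS, not the repository's implementation; it assumes detections are rows of [x1, y1, x2, y2, score] and uses the inclusive-pixel (+1) area convention.

```python
import numpy as np


def py_nms(dets, thresh):
    """Greedy NMS. Returns indices of kept detections, best score first."""
    dets = np.asarray(dets, dtype=np.float32).reshape(-1, 5)
    x1, y1, x2, y2, scores = (dets[:, i] for i in range(5))
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with all remaining boxes.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Keep only boxes that do not overlap the current one too much.
        order = rest[iou <= thresh]
    return keep


dets = np.array([
    [10, 10, 50, 50, 0.9],
    [12, 12, 52, 52, 0.8],      # overlaps heavily with the first box
    [100, 100, 150, 150, 0.7],  # far away from both
])
keep = py_nms(dets, 0.5)
print(keep)
```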

Train New Dataset

Hi @xmyqsh, first of all, thanks for your work. The code is neat and wonderful. But could you add more information about the dataset structure and how to use this code for training and testing? Also, switching the training set to my own dataset seems a little difficult; could you give me some advice? pascal_voc.py in datasets contains many functions, and I am not sure whether I need to rewrite all of them. (My dataset structure is similar to VOC.)
@ouchjm
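
For a dataset laid out like VOC, the part that usually has to match is the annotation loader. As an illustration only, here is a minimal sketch of reading boxes and class labels from a VOC-style XML annotation with the standard library. The function name load_voc_boxes and the class mapping are hypothetical, and which functions of pascal_voc.py actually need changing is not settled in this thread.

```python
import xml.etree.ElementTree as ET


def load_voc_boxes(xml_text, class_to_index):
    """Parse a VOC-style annotation and return (boxes, labels).

    Objects whose class name is not in class_to_index are skipped.
    """
    root = ET.fromstring(xml_text)
    boxes, labels = [], []
    for obj in root.findall("object"):
        name = obj.find("name").text.strip()
        if name not in class_to_index:
            continue
        bndbox = obj.find("bndbox")
        box = [float(bndbox.find(tag).text)
               for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(box)
        labels.append(class_to_index[name])
    return boxes, labels


example = """<annotation>
  <object>
    <name>person</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

boxes, labels = load_voc_boxes(example, {"__background__": 0, "person": 1})
print(boxes, labels)
```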

TypeError: exceptions must be old-style classes or derived from BaseException, not Command exited with non-zero status 1

Please help me
Loading pretrained model weights from data/pretrain_model/Resnet50.npy
Traceback (most recent call last):
File "./tools/train_net_alt_opt.py", line 109, in
max_iters=max_iters)
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 513, in train_net
sw.train_model(sess, vs_names, max_iters)
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 406, in train_model
self.train_rpn(sess, vs_names[0], max_iters[0], init_model=self.pretrained_mod
File "./tools/../lib/fast_rcnn/train_opt_alt.py", line 154, in train_rpn
raise 'Check your pretrained model {:s}'.format(init_model)
TypeError: exceptions must be old-style classes or derived from BaseException, not
Command exited with non-zero status 1
8.84user 0.76system 0:08.69elapsed 110%CPU (0avgtext+0avgdata 928076maxresident)k
0inputs+2224outputs (0major+134721minor)pagefaults 0swaps
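
The TypeError here hides the real problem. The traceback shows the code executing raise 'Check your pretrained model ...', and raising a plain string is not allowed, so Python reports the TypeError instead of the intended message. The intended message points at the pretrained model, most plausibly a missing or unreadable data/pretrain_model/Resnet50.npy. A minimal sketch of the same check with a proper exception class follows; check_pretrained_model is a hypothetical helper, not a function from this repository.

```python
import os


def check_pretrained_model(path):
    """Raise a real exception (not a string) if the weights file is missing."""
    if not os.path.isfile(path):
        raise IOError("Check your pretrained model {:s}".format(path))
    return path


try:
    # Hypothetical path used only to trigger the error branch.
    check_pretrained_model("data/pretrain_model/does_not_exist.npy")
except IOError as err:
    message = str(err)
    print(message)
```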

tensorflow.python.framework.errors

When I run FPN_alt_opt.sh, the following problem occurs:
dxt@dxt-System-Product-Name:~/FPN-master (2)$ ./experiments/scripts/FPN_alt_opt.sh 0 FPN_alt_opt pascal_voc0712

  • set -e
  • export PYTHONUNBUFFERED=True
  • PYTHONUNBUFFERED=True
  • GPU_ID=0
  • NET=FPN_alt_opt
  • NET_lc=fpn_alt_opt
  • DATASET=pascal_voc0712
  • array=($@)
  • len=3
  • EXTRA_ARGS=
  • EXTRA_ARGS_SLUG=
  • case $DATASET in
  • TRAIN_IMDB=voc_0712_trainval
  • TEST_IMDB=voc_0712_test
  • PT_DIR=pascal_voc
  • CFG=experiments/cfgs/FPN_alt_opt.yml
    ++ date +%Y_%m_%d_%H_%M_%S
  • LOG=experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59
  • exec
    ++ tee -a experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59
    tee: experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59: No such file or directory
  • echo Logging output to experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59
    Logging output to experiments/logs/faster_rcnn_end2end_FPN_alt_opt_.txt.2018_03_06_21_06_59
  • CUDA_VISIBLE_DEVICES=0
  • time python ./tools/train_net_alt_opt.py --gpu 0 --weights data/pretrain_model/Resnet50.npy --imdb voc_0712_trainval --cfg experiments/cfgs/FPN_alt_opt.yml --network FPN_alt_opt_train_test
    Traceback (most recent call last):
    File "./tools/train_net_alt_opt.py", line 26, in
    from lib.networks.factory import get_network
    File "./tools/../lib/networks/init.py", line 8, in
    from .FPN_train import FPN_train
    File "./tools/../lib/networks/FPN_train.py", line 9, in
    from .network import Network
    File "./tools/../lib/networks/network.py", line 5, in
    from ..roi_pooling_layer import roi_pooling_op as roi_pool_op
    File "./tools/../lib/roi_pooling_layer/init.py", line 7, in
    import roi_pooling_op
    File "./tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in
    _roi_pooling_module = tf.load_op_library(filename)
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
    tensorflow.python.framework.errors_impl.NotFoundError: ./tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3
    Command exited with non-zero status 1
    1.84user 0.35system 0:02.14elapsed 102%CPU (0avgtext+0avgdata 198848maxresident)k
    0inputs+32outputs (0major+49923minor)pagefaults 0swaps

@xmyqsh I don't know how to solve this. Please help me, thanks!!!
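
A hint for reading the undefined symbol: the fragment B5cxx11 in a mangled C++ name is GCC's [abi:cxx11] tag. It indicates that roi_pooling.so was compiled against the newer libstdc++ string ABI and is looking for a StrCat symbol that the installed TensorFlow library does not export, which commonly happens when TensorFlow itself was built with the older ABI. The usual remedy in that situation is to rebuild the custom op with -D_GLIBCXX_USE_CXX11_ABI=0 added to the compiler flags; whether that applies to this particular setup is an assumption. The small check below only detects the tag and uses a hypothetical helper name.

```python
def has_cxx11_abi_tag(mangled_symbol):
    """Return True if a mangled C++ symbol carries GCC's [abi:cxx11] tag."""
    return "B5cxx11" in mangled_symbol


# Symbol name as it appears in the log above.
symbol = "ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3"

if has_cxx11_abi_tag(symbol):
    print("Symbol uses the cxx11 ABI; consider rebuilding the op with "
          "-D_GLIBCXX_USE_CXX11_ABI=0")
```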

stuck in loading the resnet50

I am using two GTX 970s to run the network, and it just gets stuck while loading Resnet50.npy. Maybe the memory is too small, but in that case I would expect it to get stuck at a later stage rather than while loading Resnet50.

2017-08-16 09:00:28.165008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1 
2017-08-16 09:00:28.165016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y Y 
2017-08-16 09:00:28.165028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1:   Y Y 
2017-08-16 09:00:28.165033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
2017-08-16 09:00:28.165036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 970, pci bus id: 0000:06:00.0)
Computing bounding-box regression targets...

19.63user 1.20system 0:25.18elapsed 82%CPU (0avgtext+0avgdata 1314772maxresident)k
583592inputs+3008outputs (159major+346781minor)pagefaults 0swaps
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Loading pretrained model weights from data/pretrain_model/Resnet50.npy
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 164, in train_model
    raise 'Check your pretrained model {:s}'.format(self.pretrained_model)
TypeError: exceptions must be old-style classes or derived from BaseException, not str
Command exited with non-zero status 1
19.63user 1.20system 0:25.18elapsed 82%CPU (0avgtext+0avgdata 1314772maxresident)k
583592inputs+3008outputs (159major+346781minor)pagefaults 0swaps

Can not get the final checkpoint.

Hi, I use your model to train on VOC2007.
I get the following checkpoints:

FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.data-00000-of-00001
FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.index
FPN_alt_opt_stage1_Fast_RCNN_iter_4.ckpt.meta
FPN_alt_opt_stage1_RPN_iter_8.ckpt.data-00000-of-00001
FPN_alt_opt_stage1_RPN_iter_8.ckpt.index
FPN_alt_opt_stage1_RPN_iter_8.ckpt.meta
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.data-00000-of-00001
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.index
FPN_alt_opt_stage2_Fast_RCNN_iter_4.ckpt.meta
FPN_alt_opt_stage2_RPN_iter_8.ckpt.data-00000-of-00001
FPN_alt_opt_stage2_RPN_iter_8.ckpt.index
FPN_alt_opt_stage2_RPN_iter_8.ckpt.meta

But there is no final checkpoint, and I get this error:
./tools/test_net.py --gpu 0 --weights --imdb voc_2007_test --cfg experiments/cfgs/FPN_alt_opt.yml --network FPN_alt_opt_train_test
python: can't open file './tools/test_net.py': [Errno 2] No such file or directory

How can I fix it?
Thank you!

Getting -1 for map using VOC07+12 Trainval

Hi, thank you for your code. I'm new to this. I use your network and changed the backbone to ResNeXt, using a ResNeXt-50 model converted from a Caffe model, but I get -1 mAP for all classes. Can you tell me why? Thank you, and please forgive my poor English!

How to run test_net? (to get the mAP of my task)

Hi,
Thanks for your excellent work! I'm a new learner and it has helped me a lot.
I noticed that there is no test_net.py in FPN/faster_rcnn/.

So, could you tell me how to run a test? I need to know the mAP of my task.

When I use the test_net.py from another project, I get the following error:
Traceback (most recent call last):
  File "faster_rcnn/test_net.py", line 85, in <module>
    network = get_network(args.network_name)
  File "faster_rcnn/../lib/networks/factory.py", line 19, in get_network
    return FPN_test()
  File "faster_rcnn/../lib/networks/FPN_test.py", line 25, in __init__
    self.setup()
  File "faster_rcnn/../lib/networks/FPN_test.py", line 231, in setup
    .fc(n_classes, relu=False, name='cls_score')
  File "faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
    layer_output = op(self, layer_input, *args, **kwargs)
  File "faster_rcnn/../lib/networks/network.py", line 390, in fc
    dim *= d
TypeError: unsupported operand type(s) for *=: 'int' and 'NoneType'

How can I fix this?

Thanks again :)
@xmyqsh
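
Reading the traceback, the fc layer appears to multiply the static dimensions of its input to get the flattened size, and one of those dimensions is None, meaning it is unknown when the graph is built. That is an inference from the line dim *= d, not something confirmed in this thread. The sketch below reproduces the computation with an explicit check so the unknown axis is reported clearly; flattened_dim is a hypothetical helper and the shape is represented as a plain Python list.

```python
def flattened_dim(shape):
    """Product of all non-batch dimensions of a static shape.

    shape is a list such as [None, 7, 7, 256]; the first entry is the batch
    axis and may be None, every other entry must be a known integer.
    """
    dim = 1
    for axis, d in enumerate(shape[1:], start=1):
        if d is None:
            raise ValueError(
                "axis {} of input shape {} is unknown at graph-construction "
                "time; the layer feeding fc must have a static shape".format(axis, shape))
        dim *= d
    return dim


print(flattened_dim([None, 7, 7, 256]))

try:
    flattened_dim([None, None, None, 256])
except ValueError as err:
    print(err)
```

If the input to fc comes from an op whose output shape is not statically known, setting or reshaping to a fixed spatial size before the fc layer is the usual way to make this computation succeed.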

no module named cython_bbox

When I train on my dataset end to end, I get this error:
from ..utils.cython_bbox import bbox_overlaps
ImportError: no module named cython_bbox
and I can't find cython_bbox.py in utils.
Please help me, thanks!
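
The name cython_bbox suggests a Cython extension, which exists as a compiled module rather than a .py file, so it has to be built before it can be imported; the build step for this repository is not shown in this thread. As a stopgap, bbox_overlaps can be approximated in pure NumPy. This is a sketch of a pairwise IoU computation under the inclusive-pixel (+1) convention, which is an assumption about what the compiled version does.

```python
import numpy as np


def bbox_overlaps_np(boxes, query_boxes):
    """Pairwise IoU between (N, 4) boxes and (K, 4) query boxes.

    Boxes are [x1, y1, x2, y2]; the result has shape (N, K).
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    query_boxes = np.asarray(query_boxes, dtype=np.float64).reshape(-1, 4)
    box_areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    query_areas = ((query_boxes[:, 2] - query_boxes[:, 0] + 1)
                   * (query_boxes[:, 3] - query_boxes[:, 1] + 1))
    # Broadcast to (N, K) intersection widths and heights.
    iw = (np.minimum(boxes[:, None, 2], query_boxes[None, :, 2])
          - np.maximum(boxes[:, None, 0], query_boxes[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], query_boxes[None, :, 3])
          - np.maximum(boxes[:, None, 1], query_boxes[None, :, 1]) + 1)
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    union = box_areas[:, None] + query_areas[None, :] - inter
    return inter / union


overlaps = bbox_overlaps_np([[0, 0, 9, 9]],
                            [[0, 0, 9, 9], [20, 20, 29, 29]])
print(overlaps)
```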

Memory leak problem with proposal_layer.py

Hi, I ran into a memory leak when running inference on images with the trained FPN model.

In my case, I load the model with net = caffe.Net(...) in Python and call net.forward() to get the detection results (scores, predicted bboxes). I monitored the program's memory usage and found that it grows with every inference. I looked into the problem and found it might be related to the proposal_layer.py file.

I found that if I comment out the lines after

leveled_rois = [None] * 5

there is no memory leak. However, if I comment out only the lines from the following line onward (including this line)
for level_idx in xrange(0, 5):

the memory leak appears. So I think the memory leak is caused by the following lines.
leveled_idxs = [[], [], [], [], []]
for idx, roi in enumerate(rpn_rois):
    level_idx = level(roi) - 2
    leveled_idxs[level_idx].append(idx)

Has anybody else seen this problem, and is my guess correct?
Any opinions will be appreciated.
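
Whether the loop above is really the source of the leak is not confirmed here. If it is, one way to test the guess is to replace the per-ROI Python loop with a vectorized version that builds no intermediate Python lists. The sketch below uses the level assignment rule from the FPN paper, k = floor(k0 + log2(sqrt(w * h) / 224)), clipped to levels 2 through 6 to match the five groups in the snippet; that the file's level(roi) function implements exactly this rule is an assumption, and both function names are hypothetical.

```python
import numpy as np


def assign_fpn_levels(rois, k0=4, canonical_size=224.0, k_min=2, k_max=6):
    """Map each [x1, y1, x2, y2] ROI to a pyramid level in [k_min, k_max]."""
    rois = np.asarray(rois, dtype=np.float32).reshape(-1, 4)
    w = rois[:, 2] - rois[:, 0] + 1
    h = rois[:, 3] - rois[:, 1] + 1
    k = np.floor(k0 + np.log2(np.sqrt(w * h) / canonical_size))
    return np.clip(k, k_min, k_max).astype(np.int64)


def group_by_level(levels, k_min=2, k_max=6):
    """Return one index array per level, lowest level first."""
    return [np.where(levels == k)[0] for k in range(k_min, k_max + 1)]


rois = np.array([[0, 0, 31, 31],      # 32 x 32 box, clipped up to level 2
                 [0, 0, 299, 299],    # 300 x 300 box
                 [0, 0, 599, 599]],   # 600 x 600 box
                dtype=np.float32)
levels = assign_fpn_levels(rois)
groups = group_by_level(levels)
print(levels, [g.tolist() for g in groups])
```

If memory still grows with this version in place, the leak is more likely elsewhere in the layer.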

InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.

Hello, training ran successfully for 100 iterations, then the following error suddenly appeared. Does anyone have an idea what causes this?

iter: 0 / 200000, total loss: 18.0627, rpn_loss_cls: 0.9430, rpn_loss_box: 10.3524, loss_cls: 4.4159, loss_box: 2.3514, lr: 0.000500
speed: 3.120s / iter
2017-12-29 13:22:04.715605: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2001 get requests, put_count=1760 evicted_count=1000 eviction_rate=0.568182 and unsatisfied allocation rate=0.670165
2017-12-29 13:22:04.715654: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
image: 025553069_K1210503_T001_1_10.jpg iter: 20 / 200000, total loss: 3.2476, rpn_loss_cls: 0.2447, rpn_loss_box: 2.8628, loss_cls: 0.0771, loss_box: 0.0630, lr: 0.000500
speed: 1.036s / iter
image: 030446539_K1221297_419_1_05.jpg iter: 40 / 200000, total loss: 4.8578, rpn_loss_cls: 0.1838, rpn_loss_box: 3.7386, loss_cls: 0.4705, loss_box: 0.4649, lr: 0.000500
speed: 1.033s / iter
2017-12-29 13:22:46.655261: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3000 get requests, put_count=3099 evicted_count=1000 eviction_rate=0.322685 and unsatisfied allocation rate=0.308
2017-12-29 13:22:46.655287: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
image: 030454261_K1221455_T001_5_04.jpg iter: 60 / 200000, total loss: 2.1451, rpn_loss_cls: 0.1146, rpn_loss_box: 1.7099, loss_cls: 0.2562, loss_box: 0.0643, lr: 0.000500
speed: 0.708s / iter
image: 025913309_K1214499_161_1_07.jpg iter: 80 / 200000, total loss: 2.7814, rpn_loss_cls: 0.2832, rpn_loss_box: 1.0201, loss_cls: 0.6792, loss_box: 0.7989, lr: 0.000500
speed: 1.077s / iter
image: 030141742_K1217637_285_1_28.jpg iter: 100 / 200000, total loss: 2.7696, rpn_loss_cls: 0.2172, rpn_loss_box: 0.8301, loss_cls: 0.7361, loss_box: 0.9862, lr: 0.000500
speed: 0.945s / iter
Traceback (most recent call last):
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 82, in call
ret = func(*args)
File "faster_rcnn/../lib/rpn_msr/anchor_target_layer.py", line 151, in anchor_target_layer
argmax_overlaps = overlaps.argmax(axis=1) # (A)
ValueError: attempt to get argmax of an empty sequence
2017-12-29 13:23:49.293478: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_0: see error log.
2017-12-29 13:23:49.330989: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
Traceback (most recent call last):
File "faster_rcnn/train_net.py", line 106, in
restore=bool(int(args.restore)))
File "faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "faster_rcnn/../lib/fast_rcnn/train.py", line 261, in train_model
cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10/_663 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_379_gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]

Caused by op u'RPN/rpn-data/PyFunc', defined at:
File "faster_rcnn/train_net.py", line 98, in
network = get_network(args.network_name)
File "faster_rcnn/../lib/networks/factory.py", line 22, in get_network
return FPN_train()
File "faster_rcnn/../lib/networks/FPN_train.py", line 25, in init
self.setup()
File "faster_rcnn/../lib/networks/FPN_train.py", line 418, in setup
.anchor_target_layer(_feat_stride[2:], anchor_size[2:], name = 'rpn-data'))
File "faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
layer_output = op(self, layer_input, *args, **kwargs)
File "faster_rcnn/../lib/networks/network.py", line 380, in anchor_target_layer
[tf.float32,tf.float32,tf.float32,tf.float32])
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
name=name)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10/_663 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_379_gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]]

Command exited with non-zero status 1
167.60user 9.99system 2:55.72elapsed 101%CPU (0avgtext+0avgdata 2398032maxresident)k
0inputs+3640outputs (0major+1128621minor)pagefaults 0swaps

Problem when loading pretrained model weights from data/pretrain_model/Resnet50.npy

num gt: 2
num fg: 17
num bg: 111
cudaCheckError() failed in ROIPoolForward: invalid device function
cudaCheckError() failed in ROIPoolForward: driver shutting down
2017-08-21 10:19:34.578317: F tensorflow/stream_executor/cuda/cuda_driver.cc:312] Check failed: CUDA_SUCCESS == cuCtxSetCurrent(cuda_context->context()) (0 vs. 4)
2017-08-21 10:19:34.578347: E tensorflow/stream_executor/stream.cc:289] Error recording event in stream: error recording CUDA event on stream 0x61a7d80: CUDA_ERROR_DEINITIALIZED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2017-08-21 10:19:34.578429: E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_DEINITIALIZED
2017-08-21 10:19:34.578450: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:203] Unexpected Event status: 1
Command terminated by signal 6
65.76user 10.62system 1:13.23elapsed 104%CPU (0avgtext+0avgdata 1891988maxresident)k
0inputs+3016outputs (0major+831469minor)pagefaults 0swaps
@xmyqsh, I ran into this problem. How can I solve it?

Memory allocation problem

Hi, I encountered a memory allocation problem during initialization (solving).

InternalError (see above for traceback): Dst tensor is not initialized. [[Node: fc7_new/weights/Momentum/Initializer/Const = Const[_class=["loc:@fc7_new/weights"], dtype=DT_FLOAT, value=Tensor<type: float shape: [4096,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

and

Limit: 5111519641
InUse: 5111519488
MaxInUse: 5111519488
NumAllocs: 148
MaxAllocSize: 4991561984

My setting:

  • single GPU, 11.57G GPU memory available
  • (same as faster rcnn) 1 image, shorter side 600, batch size 128, rpn batch size 256
  • same error if I change to batch size 16, rpn batch size 32

So I assume the model is too large to fit in memory?

TensorFlow does seem to take up a lot of memory during initialization. Has anyone encountered or resolved this issue?

Thanks!
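
For scale, the allocator figures quoted above can be converted into more readable units. This is only a small arithmetic sketch in Python using the numbers from this issue; the helper name `to_gib` is made up for illustration.

```python
# Figures copied from the allocator report in this issue (bytes).
LIMIT = 5111519641
IN_USE = 5111519488

def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / float(2 ** 30)

# The tensor that failed to initialize is the Momentum slot for
# fc7_new/weights: shape [4096, 4096], float32 (4 bytes per element).
fc7_slot_bytes = 4096 * 4096 * 4

print("limit:    %.2f GiB" % to_gib(LIMIT))
print("in use:   %.2f GiB" % to_gib(IN_USE))
print("free:     %d bytes" % (LIMIT - IN_USE))
print("fc7 slot: %d MiB" % (fc7_slot_bytes // 2 ** 20))
```

The limit works out to roughly 4.76 GiB, well below the 11.57G reported as available, and only 153 bytes were free when a 64 MiB tensor was requested. One possible reading is that the process was given only part of the card (for example because the GPU was shared or the memory was capped), rather than the model alone exceeding it. That is an interpretation of the numbers, not a confirmed diagnosis.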

Rationale behind rpn loss in build_loss

In the build_loss function, I see that the labels are filtered for only foreground/background regions before calculating the rpn class loss and that the SmoothL1Loss is normalized by the number of foreground regions. I don't see this in the original Faster-RCNN implementation. @xmyqsh Can you please explain why this is necessary?
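
To make the question concrete, here is a minimal pure-Python sketch of the two steps described above: dropping ignored anchors before the classification loss, and dividing the summed Smooth L1 term by the foreground count. It is not the repository's `build_loss`; the function names, the label convention (1 = foreground, 0 = background, -1 = ignored), `sigma=3.0`, and the `max(..., 1)` guard are all assumptions for illustration.

```python
import math

def smooth_l1(x, sigma=3.0):
    """Smooth L1: quadratic for |x| < 1/sigma^2, linear otherwise."""
    s2 = sigma * sigma
    ax = abs(x)
    return 0.5 * s2 * x * x if ax < 1.0 / s2 else ax - 0.5 / s2

def rpn_losses(labels, fg_probs, box_deltas, box_targets):
    """labels: 1 = foreground, 0 = background, -1 = ignored anchor.
    fg_probs: predicted foreground probability per anchor.
    box_deltas / box_targets: one list of 4 values per anchor."""
    # Classification: keep only anchors labelled foreground or background.
    kept = [(l, p) for l, p in zip(labels, fg_probs) if l != -1]
    cls_loss = -sum(math.log(p if l == 1 else 1.0 - p) for l, p in kept) / len(kept)

    # Regression: sum Smooth L1 over foreground anchors only, then
    # divide by the foreground count (guarded against zero).
    fg = [i for i, l in enumerate(labels) if l == 1]
    reg_sum = sum(smooth_l1(d - t)
                  for i in fg
                  for d, t in zip(box_deltas[i], box_targets[i]))
    reg_loss = reg_sum / max(len(fg), 1)
    return cls_loss, reg_loss

labels = [1, 0, -1]
probs = [0.5, 0.5, 0.9]
deltas = [[0.0, 0.0, 0.0, 0.0]] * 3
targets = [[2.0, 0.0, 0.0, 0.0], [9.0] * 4, [9.0] * 4]
print(rpn_losses(labels, probs, deltas, targets))
```

In this sketch the ignored anchor contributes to neither loss, and the background anchor's box targets are never used. As far as I know, the Caffe version of Faster R-CNN gets the same classification filtering implicitly through `ignore_label: -1` on the softmax loss layer, so the explicit filter here may simply be the TensorFlow equivalent.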

Shape mismatch error when training on new data

When I train on my own data using
nohup ./experiments/scripts/FPN_end2end.sh 0 FPN pascal_voc2007 --set RNG_SEED 42 TRAIN.SCALES "[800]" > FPN.log 2>&1 &
I get this error:

ignore bn4f_branch2a offset
ignore res5a_branch2c weights
ignore res5a_branch2b weights
ignore res5a_branch2a weights
ignore res3d_branch2b weights
ignore res3d_branch2c weights
ignore res3d_branch2a weights
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:61: RuntimeWarning: overflow encountered in exp
  pred_w = np.exp(dw) * widths[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:61: RuntimeWarning: overflow encountered in multiply
  pred_w = np.exp(dw) * widths[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:62: RuntimeWarning: overflow encountered in exp
  pred_h = np.exp(dh) * heights[:, np.newaxis]
./faster_rcnn/../lib/fast_rcnn/bbox_transform.py:62: RuntimeWarning: overflow encountered in multiply
  pred_h = np.exp(dh) * heights[:, np.newaxis]
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 277, in train_model
    bbox_pred = bbox_pred * np.tile(self.bbox_stds, (bbox_pred.shape[0], 1)) + \
ValueError: operands could not be broadcast together with shapes (128,84) (128,8) 
Command exited with non-zero status 1
13.00user 2.02system 0:14.48elapsed 103%CPU (0avgtext+0avgdata 2385504maxresident)k
0inputs+3096outputs (0major+371289minor)pagefaults 0swaps

I don't know how to fix it.
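
The two shapes in the ValueError can be decoded, assuming the usual Fast R-CNN layout of 4 box coordinates per class. The helper names below are hypothetical and the snippet is only a sketch of that arithmetic.

```python
def bbox_pred_width(num_classes):
    """Columns in a per-class box regression output: 4 coordinates per class."""
    return 4 * num_classes

def classes_from_width(width):
    """Inverse: how many classes a regression output of this width implies."""
    if width % 4 != 0:
        raise ValueError("width %d is not a multiple of 4" % width)
    return width // 4

# Shapes taken from the ValueError above: (128, 84) and (128, 8).
network_width, dataset_width = 84, 8
print(classes_from_width(network_width))  # classes implied by bbox_pred
print(classes_from_width(dataset_width))  # classes implied by bbox_stds
```

Under that assumption, `bbox_pred` has 21 classes (the 20 PASCAL VOC classes plus background) while `bbox_stds` has 2 (one object class plus background). So the network may still be built for 21 classes while the dataset defines only 2. Where the class count is set in this repository is not shown here, so that part would need to be checked in the network definition.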

FPN ROI Choosing

Hi there! While reading through Feature Pyramid Networks for Object Detection, I found a formula for choosing the feature map for an RoI based on the size of the region proposal. Can you show me how you implemented this? I would like to implement FPN on the new Object Detection API provided by TensorFlow.
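
For reference, the formula in question is Eq. 1 of the FPN paper, k = floor(k0 + log2(sqrt(w*h) / 224)) with k0 = 4. Below is a minimal Python sketch of that formula. It is not taken from this repository; the function name and the clamp to levels 2 through 5 (matching P2 to P5) are assumptions.

```python
import math

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Pyramid level P_k for an RoI of width w and height h (in pixels).

    An RoI of the canonical ImageNet size (224 x 224) maps to level k0;
    halving the RoI scale moves one level down (finer resolution).
    """
    k = int(math.floor(k0 + math.log2(math.sqrt(w * h) / canonical)))
    # Clamp to the levels that actually exist in the pyramid.
    return max(k_min, min(k_max, k))

for size in (32, 112, 224, 448, 1000):
    print(size, fpn_level(size, size))
```

Each RoI would then be pooled from the feature map of the level returned here, for example by grouping RoIs per level and running RoI pooling once per group.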

dataset structure

Hi, you reference the folder pascal_voc0712 in the training/test scripts, but what is the structure of that folder?

I guess it contains the PASCAL VOC 2007 and PASCAL VOC 2012 datasets, but I can't find instructions on how to create that folder.

Encounter this error: tensorflow.python.framework.errors_impl.NotFoundError

When I run train_net.py, I get the following error:

Traceback (most recent call last):
  File "faster_rcnn/train_net.py", line 26, in <module>
    from lib.networks.factory import get_network
  File "faster_rcnn/../lib/networks/__init__.py", line 8, in <module>
    from .FPN_train import FPN_train
  File "faster_rcnn/../lib/networks/FPN_train.py", line 9, in <module>
    from .network import Network
  File "faster_rcnn/../lib/networks/network.py", line 5, in <module>
    from ..roi_pooling_layer import roi_pooling_op as roi_pool_op
  File "faster_rcnn/../lib/roi_pooling_layer/__init__.py", line 7, in <module>
    import roi_pooling_op
  File "faster_rcnn/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: faster_rcnn/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

What is the cause? Could anybody give some advice?
Thank you.
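
The undefined symbol in the last line is a mangled C++ name, and decoding it shows what the loader could not find. The sketch below handles only the tiny subset of the Itanium C++ ABI mangling needed for this symbol; the function name `demangle_simple` is made up, and a real tool such as `c++filt` would be the usual choice.

```python
def demangle_simple(symbol):
    """Decode a small subset of Itanium C++ ABI mangled names.

    Handles only _ZTI / _ZTV / _ZTS followed by a nested name made of
    length-prefixed identifiers (N ... E). Anything else is returned unchanged.
    """
    prefixes = {
        "_ZTI": "typeinfo for ",
        "_ZTV": "vtable for ",
        "_ZTS": "typeinfo name for ",
    }
    for prefix, label in prefixes.items():
        if symbol.startswith(prefix + "N") and symbol.endswith("E"):
            body = symbol[len(prefix) + 1:-1]
            parts, i = [], 0
            while i < len(body):
                j = i
                while j < len(body) and body[j].isdigit():
                    j += 1
                if j == i:
                    return symbol  # encoding outside this sketch's subset
                n = int(body[i:j])
                parts.append(body[j:j + n])
                i = j + n
            return label + "::".join(parts)
    return symbol

print(demangle_simple("_ZTIN10tensorflow8OpKernelE"))
```

The symbol decodes to the typeinfo for `tensorflow::OpKernel`, so `roi_pooling.so` expects a TensorFlow core class that is not visible when the library is loaded. A likely cause, though not confirmed from this log, is that the op was compiled against a different TensorFlow build than the one installed, or was not linked against TensorFlow's shared framework library when `roi_pooling.so` was built.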
