
fpn_tensorflow's Introduction

Feature Pyramid Networks for Object Detection

Note

A development version based on FPN.
Supports multi-GPU training!

Abstract

This is a TensorFlow re-implementation of Feature Pyramid Networks for Object Detection.

This project is based on Faster R-CNN and was completed by YangXue and YangJirui.

Trained on VOC 2007 trainval and tested on VOC 2007 test (this project also supports COCO training).


Comparison

use_voc2007_metric

Models | mAP | sheep | horse | bicycle | bottle | cow | sofa | bus | dog | cat | person | train | diningtable | aeroplane | car | pottedplant | tvmonitor | chair | bird | boat | motorbike
Faster-RCNN resnet50_v1 | 73.09 | 72.11 | 85.63 | 77.74 | 55.82 | 81.19 | 67.34 | 82.44 | 85.66 | 87.34 | 77.49 | 79.13 | 62.65 | 76.54 | 84.01 | 47.90 | 74.13 | 50.09 | 76.81 | 60.34 | 77.47
Faster-RCNN resnet101_v1 | 74.63 | 76.35 | 86.18 | 79.87 | 58.73 | 83.4 | 74.75 | 80.03 | 85.4 | 86.55 | 78.24 | 76.07 | 70.89 | 78.52 | 86.26 | 47.80 | 76.34 | 52.14 | 78.06 | 58.90 | 78.04
Faster-RCNN mobilenet_v2 | 50.34 | 46.99 | 68.45 | 65.89 | 28.16 | 53.21 | 46.96 | 57.80 | 38.60 | 44.12 | 66.20 | 60.49 | 52.40 | 56.06 | 72.68 | 26.91 | 49.99 | 30.18 | 39.38 | 38.54 | 64.74
FPN resnet50_v1 | 74.26 | 73.27 | 82.23 | 82.99 | 61.27 | 80.59 | 72.73 | 81.37 | 85.26 | 84.76 | 80.33 | 77.43 | 65.31 | 79.18 | 85.78 | 46.47 | 73.10 | 55.99 | 76.11 | 59.80 | 81.19
FPN resnet101_v1 | 76.14 | 74.63 | 85.13 | 81.67 | 63.79 | 82.43 | 77.83 | 83.07 | 86.45 | 85.82 | 81.08 | 81.01 | 71.22 | 80.01 | 86.30 | 48.05 | 73.89 | 56.99 | 78.33 | 62.91 | 82.24
FPN resnet101_v1+ | 75.71 | 74.83 | 83.55 | 82.47 | 65.49 | 77.85 | 71.74 | 80.98 | 86.61 | 87.14 | 81.02 | 77.76 | 71.26 | 79.82 | 86.78 | 51.64 | 77.45 | 56.12 | 79.44 | 60.55 | 81.69
FPN resnet101_v1++ | 75.89 | 76.05 | 84.22 | 80.29 | 63.21 | 83.04 | 78.69 | 81.81 | 86.61 | 85.61 | 79.75 | 79.78 | 71.27 | 80.33 | 86.24 | 49.03 | 76.81 | 56.32 | 78.51 | 60.37 | 79.91

+: SHARE_NET=False
++: SHORT_SIDE_LEN=800, FAST_RCNN_MINIBATCH_SIZE=512
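The use_voc2007_metric setting presumably selects the PASCAL VOC 2007 definition of AP. In the VOC 2007 devkit and in py-faster-rcnn's voc_eval, that definition takes AP as the average, over the 11 recall thresholds 0, 0.1, ..., 1.0, of the highest precision reached at or beyond each threshold. mAP is then the plain mean of the 20 per-class APs. The stdlib sketch below shows both and assumes the precision/recall lists have already been computed. voc07_ap and mean_ap are illustrative names, not this repo's functions. The sketch reproduces the mAP of the first table row:

```python
def voc07_ap(recall, precision):
    """11-point interpolated AP (VOC 2007 style).

    For each threshold t in 0.0, 0.1, ..., 1.0, take the highest precision
    among points whose recall is >= t (0 if there is none), then average the 11 values.
    """
    ap = 0.0
    for i in range(11):
        t = i / 10.0
        candidates = [p for r, p in zip(recall, precision) if r >= t]
        ap += (max(candidates) if candidates else 0.0) / 11.0
    return ap


def mean_ap(per_class_ap):
    """mAP is the unweighted mean of the per-class APs."""
    return sum(per_class_ap) / len(per_class_ap)


# Per-class APs from the "Faster-RCNN resnet50_v1" row of the table above.
resnet50_faster_rcnn = [
    72.11, 85.63, 77.74, 55.82, 81.19, 67.34, 82.44, 85.66, 87.34, 77.49,
    79.13, 62.65, 76.54, 84.01, 47.90, 74.13, 50.09, 76.81, 60.34, 77.47,
]
print(round(mean_ap(resnet50_faster_rcnn), 2))  # 73.09, matching the mAP column
print(voc07_ap([0.5, 1.0], [1.0, 0.5]))  # equals (6 * 1.0 + 5 * 0.5) / 11
```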

COCO

Model | Backbone | Train Schedule | GPU | Image/GPU | FP16 | Box AP (Mask AP)
Faster (ours) | R50v1-FPN | 1X | 1X TITAN Xp | 1 | no | 36.1
Faster (ours) | R50v1-FPN | 1X | 4X TITAN Xp | 1 | no | 36.1
Faster (Face++ & Detectron) | R50v1-FPN | 1X | 8X TITAN Xp | 2 | no | 36.4
Faster (SimpleDet) | R50v1-FPN | 1X | 8X 1080Ti | 2 | no | 36.5


My Development Environment

1、python3.5 (Anaconda recommended)
2、cuda9.0 (to use CUDA 8, set CUDA9 = False in cfgs.py)
3、opencv (cv2)
4、tfplot (pip package: tensorflow-plot)
5、tensorflow == 1.10
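Check the environment list from the interpreter that will actually run the tools. Several issues below (for example, "No module named tfplot") may come down to a package that was installed for a different interpreter. The stdlib sketch below makes that check. The mapping from import names to pip names is an assumption: tfplot is published on PyPI as tensorflow-plot, OpenCV as opencv-python, and GPU builds of TensorFlow 1.x as tensorflow-gpu.

```python
import importlib.util
import sys

# Import name -> pip package to install if it is missing (assumed mapping).
REQUIRED = {
    "cv2": "opencv-python",
    "tfplot": "tensorflow-plot",
    "tensorflow": "tensorflow==1.10",  # tensorflow-gpu==1.10 for the GPU build
}


def missing_modules(required):
    """Return the pip names of top-level modules this interpreter cannot find."""
    missing = []
    for module_name, pip_name in required.items():
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(REQUIRED) or "none")
```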

Download Model

Please download the resnet50_v1 and resnet101_v1 models pre-trained on ImageNet and put them in $PATH_ROOT/data/pretrained_weights.

Data Format

├── VOCdevkit
│   ├── VOCdevkit_train
│       ├── Annotation
│       ├── JPEGImages
│   ├── VOCdevkit_test
│       ├── Annotation
│       ├── JPEGImages
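Each file under Annotation is assumed to be a standard PASCAL VOC XML file. Such a file has a `<size>` element and one `<object>` element (with `<name>` and `<bndbox>`) per box, and its file stem matches an image in JPEGImages. The stdlib sketch below inspects one annotation before conversion to tfrecord. read_voc_boxes is a hypothetical helper, not part of the repo.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Parse one VOC-style annotation.

    Returns (width, height, [(name, xmin, ymin, xmax, ymax), ...]).
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        bndbox = obj.find("bndbox")
        # Some tools write float coordinates, so parse as float before int.
        coords = [int(float(bndbox.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((name, *coords))
    return width, height, boxes
```

Usage: `read_voc_boxes(open(path).read())` for any file in Annotation. Because the result is a plain tuple, it is easy to compare against the images or to draw the boxes for a visual check.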

Compile

cd $PATH_ROOT/libs/box_utils/cython_utils
python setup.py build_ext --inplace
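The import errors reported in the issues below ("No module named 'libs.box_utils.cython_utils.cython_bbox'") are what you would expect in three situations: this step did not run, it failed, or it ran under a different Python than the one used for training. A mismatch matters because compiled extension filenames carry the interpreter's ABI tag (.so on Linux/macOS, .pyd on Windows). The stdlib sketch below lists the cython_bbox builds that the current interpreter could import. find_built_extension is a hypothetical helper.

```python
import importlib.machinery
from pathlib import Path


def find_built_extension(directory, module_name="cython_bbox"):
    """Return extension files in `directory` that this interpreter could import as `module_name`."""
    # EXTENSION_SUFFIXES lists the suffixes the running interpreter accepts,
    # e.g. '.cpython-35m-x86_64-linux-gnu.so' and '.so' on Linux, '.pyd' variants on Windows.
    wanted = {module_name + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES}
    return sorted(p.name for p in Path(directory).iterdir() if p.name in wanted)
```

Usage, from $PATH_ROOT: `find_built_extension("libs/box_utils/cython_utils")`. An empty list means the extension needs to be rebuilt with the same python that runs train.py.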

Demo (available)

Select a configuration file in the folder ($PATH_ROOT/libs/configs/) and copy its contents into cfgs.py, then download the corresponding weights.

cd $PATH_ROOT/tools
python inference.py --data_dir='/PATH/TO/IMAGES/' \
                    --save_dir='/PATH/TO/SAVE/RESULTS/' \
                    --GPU='0'
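The configuration step can also be scripted. The sketch below assumes that the config files sit next to cfgs.py in $PATH_ROOT/libs/configs/ (cfgs_res50_fpn.py is one such file, mentioned in the issues). activate_config is hypothetical, and it keeps a .bak copy of the previous cfgs.py.

```python
import shutil
from pathlib import Path


def activate_config(configs_dir, config_name, target_name="cfgs.py"):
    """Copy configs_dir/config_name over configs_dir/target_name, backing up the old file."""
    configs_dir = Path(configs_dir)
    source = configs_dir / config_name
    target = configs_dir / target_name
    if not source.is_file():
        raise FileNotFoundError(str(source))
    if target.exists():
        # Keep the previous configuration as cfgs.py.bak.
        shutil.copyfile(str(target), str(target.with_name(target.name + ".bak")))
    shutil.copyfile(str(source), str(target))  # str() keeps this working on Python 3.5
    return target
```

Usage: `activate_config("$PATH_ROOT/libs/configs", "cfgs_res50_fpn.py")`, with $PATH_ROOT expanded. Then download the weights that match that config.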

Eval

cd $PATH_ROOT/tools
python eval.py --eval_imgs='/PATH/TO/IMAGES/' \
               --annotation_dir='/PATH/TO/TEST/ANNOTATION/' \
               --GPU='0'
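eval.py presumably reports VOC-style AP. Under the PASCAL VOC protocol, a detection counts as a true positive when its IoU with a not-yet-matched ground-truth box of the same class exceeds 0.5. The minimal sketch below implements that overlap test and assumes (xmin, ymin, xmax, ymax) boxes in continuous coordinates. The VOC devkit and py-faster-rcnn add +1 to widths and heights for pixel-inclusive boxes, which shifts the IoU of small boxes slightly. iou and is_match are hypothetical helpers.

```python
def iou(box_a, box_b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes in continuous coordinates."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(det_box, gt_box, threshold=0.5):
    """VOC-style overlap criterion: IoU strictly above the threshold."""
    return iou(det_box, gt_box) > threshold
```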

Train

1、If you want to train on your own data, please note:

(1) Modify parameters (such as CLASS_NUM, DATASET_NAME, VERSION, etc.) in $PATH_ROOT/libs/configs/cfgs.py
(2) Add category information in $PATH_ROOT/libs/label_name_dict/lable_dict.py     
(3) Add data_name to line 76 of $PATH_ROOT/data/io/read_tfrecord.py 
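Step (2) asks for category information. The sketch below builds the name/label maps and assumes the common Faster R-CNN convention: label 0 is reserved for background, foreground classes are numbered from 1, and CLASS_NUM counts only the foreground classes. The class names, the 'back_ground' key spelling and build_label_maps are all hypothetical, so check them against the existing entries in lable_dict.py.

```python
# Hypothetical categories for a custom dataset; replace with your own.
CLASS_NAMES = ["car", "person", "bicycle"]


def build_label_maps(class_names):
    """Name->label map with background as 0, plus its inverse (label->name)."""
    name_label_map = {"back_ground": 0}  # assumed background key
    for label, name in enumerate(class_names, start=1):
        name_label_map[name] = label
    label_name_map = {label: name for name, label in name_label_map.items()}
    return name_label_map, label_name_map


NAME_LABEL_MAP, LABEL_NAME_MAP = build_label_maps(CLASS_NAMES)
CLASS_NUM = len(CLASS_NAMES)  # assumed to count foreground classes only
```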

2、make tfrecord

cd $PATH_ROOT/data/io/  
python convert_data_to_tfrecord.py --VOC_dir='/PATH/TO/VOCdevkit/VOCdevkit_train/' \
                                   --xml_dir='Annotation' \
                                   --image_dir='JPEGImages' \
                                   --save_name='train' \
                                   --img_format='.jpg' \
                                   --dataset='pascal'
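Before converting, it can help to confirm that every annotation has a matching image. An XML file without its image, or an image without its XML, usually points to a path or extension mismatch. The stdlib sketch below mirrors the --VOC_dir, --xml_dir, --image_dir and --img_format arguments above. match_voc_pairs is a hypothetical helper. Note that the glob is case-sensitive on Linux, so .JPG files will not match '.jpg'.

```python
from pathlib import Path


def match_voc_pairs(voc_dir, xml_dir="Annotation", image_dir="JPEGImages", img_format=".jpg"):
    """Return (paired stems, XMLs without images, images without XMLs), each sorted."""
    root = Path(voc_dir)
    xml_stems = {p.stem for p in (root / xml_dir).glob("*.xml")}
    img_stems = {p.stem for p in (root / image_dir).glob("*" + img_format)}
    return (
        sorted(xml_stems & img_stems),
        sorted(xml_stems - img_stems),
        sorted(img_stems - xml_stems),
    )
```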

3、train

cd $PATH_ROOT/tools
python train.py

4、multi-gpu train

cd $PATH_ROOT/tools
python multi_gpu_train.py

Tensorboard

cd $PATH_ROOT/output/summary
tensorboard --logdir=.


Reference

1、https://github.com/endernewton/tf-faster-rcnn
2、https://github.com/zengarden/light_head_rcnn
3、https://github.com/tensorflow/models/tree/master/research/object_detection
4、https://github.com/CharlesShang/FastMaskRCNN
5、https://github.com/matterport/Mask_RCNN
6、https://github.com/msracver/Deformable-ConvNets


fpn_tensorflow's Issues

pretrained weights

When I trained my model from the pretrained model, the variables were all zeros, like this:

model restore from pretrained mode, path is : /home/..//data/pretrained_weights/resnet_50.ckpt
resnet_v1_50/conv1/weights:0
resnet_v1_50/conv1/BatchNorm/gamma:0
resnet_v1_50/conv1/BatchNorm/beta:0
resnet_v1_50/conv1/BatchNorm/moving_mean:0
resnet_v1_50/conv1/BatchNorm/moving_variance:0
resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/weights:0
resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/gamma:0
resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/beta:0
resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_mean:0
resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/moving_variance:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/gamma:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/beta:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/moving_mean:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/BatchNorm/moving_variance:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/weights:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/gamma:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/beta:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/moving_mean:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/BatchNorm/moving_variance:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/weights:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/gamma:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/beta:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/moving_mean:0
resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/BatchNorm/moving_variance:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/weights:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/gamma:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/beta:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/moving_mean:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/BatchNorm/moving_variance:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/weights:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/gamma:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/beta:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/moving_mean:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/BatchNorm/moving_variance:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/weights:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/gamma:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/beta:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/moving_mean:0
resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/BatchNorm/moving_variance:0
resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/weights:0
...
...

map!

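Hello, I split the VOC2007 dataset into training and test sets at an 8:2 ratio and trained for 150,000 iterations following your training procedure. The final mAP was 0.804, which differs considerably from your results. Did something go wrong on my side?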

When training with MobileNetV2 as the backbone network, I got this error:

Traceback (most recent call last):
File "/home/litao/Algorithm/FPN_Tensorflow-master/tools/train.py", line 188, in
train()
File "/home/litao/Algorithm/FPN_Tensorflow-master/tools/train.py", line 48, in train
gtboxes_batch=gtboxes_and_label)
File "../libs/networks/build_whole_network.py", line 390, in build_whole_detection_network
for level_name, p in zip(cfgs.LEVLES, P_list):
File "/home/litao/anaconda2/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 431, in iter
"Tensor objects are not iterable when eager execution is not "
TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn.

train.py

Traceback (most recent call last):
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/qzhang/Desktop/FPN_Tensorflow-master/tools/train.py", line 187, in
train()
File "/home/qzhang/Desktop/FPN_Tensorflow-master/tools/train.py", line 145, in train
_, global_stepnp = sess.run([train_op, global_step])
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

Caused by op 'get_batch/batch', defined at:
File "/home/qzhang/Desktop/FPN_Tensorflow-master/tools/train.py", line 187, in
train()
File "/home/qzhang/Desktop/FPN_Tensorflow-master/tools/train.py", line 35, in train
is_training=True)
File "/home/qzhang/Desktop/FPN_Tensorflow-master/data/io/read_tfrecord.py", line 98, in next_batch
dynamic_pad=True)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 988, in batch
name=name)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 762, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 483, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3480, in queue_dequeue_many_v2
component_types=component_types, timeout_ms=timeout_ms, name=name)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/qzhang/.conda/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1718, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

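Problem training on my own data!

Hello, I am training with your code. At first I tested it on the first few dozen images of pascal_voc, and training worked. But after modifying the code to train on my own data, which is also in Pascal VOC format, the following problem occurred: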

Traceback (most recent call last):
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 194, in
train()
File "train.py", line 174, in train
_, global_stepnp, summary_str = sess.run([train_op, global_step, summary_op])
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

Caused by op 'get_batch/batch', defined at:
File "train.py", line 194, in
train()
File "train.py", line 34, in train
is_training=True)
File "../data/io/read_tfrecord.py", line 98, in next_batch
dynamic_pad=True)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 988, in batch
name=name)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 762, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 476, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3480, in queue_dequeue_many_v2
component_types=component_types, timeout_ms=timeout_ms, name=name)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/amax/panzhihao/FPN_Tensorflow/venv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

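I searched for this and found that it may be caused by inconsistent data sizes, but my data has previously run without problems in a Caffe program. Could you please advise? Thanks!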
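The OutOfRangeError above says the input queue was closed while empty (requested 1, current size 0). That suggests the reader produced no examples at all, so one thing worth ruling out is an empty or wrongly located tfrecord file. TFRecord files frame each record as an 8-byte little-endian length, a 4-byte masked CRC of the length, the payload, and a 4-byte masked CRC of the payload. This means records can be counted without TensorFlow. The sketch below does that and skips CRC verification. count_tfrecord_records is a hypothetical helper.

```python
import struct


def count_tfrecord_records(path):
    """Count records in a TFRecord file by walking its framing (CRCs are not verified)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return count  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated length header after %d records" % count)
            (length,) = struct.unpack("<Q", header)
            if len(f.read(4)) != 4:  # masked CRC32C of the length field
                raise ValueError("truncated length CRC")
            if len(f.read(length)) != length:  # serialized record, presumably a tf.train.Example
                raise ValueError("truncated payload")
            if len(f.read(4)) != 4:  # masked CRC32C of the payload
                raise ValueError("truncated payload CRC")
            count += 1
```

A count of 0, or a file that is missing at the path the reader uses, would be consistent with the error above.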

ModuleNotFoundError: No module named 'libs.box_utils.cython_utils.cython_bbox'

Traceback (most recent call last):
File "inference.py", line 17, in
from libs.networks import build_whole_network
File "../libs/networks/build_whole_network.py", line 19, in
from libs.detection_oprations.anchor_target_layer_without_boxweight import anchor_target_layer
File "../libs/detection_oprations/anchor_target_layer_without_boxweight.py", line 15, in
from libs.box_utils.cython_utils.cython_bbox import bbox_overlaps
ModuleNotFoundError: No module named 'libs.box_utils.cython_utils.cython_bbox'

Error loading the model

NotFoundError (see above for traceback): Key build_rpn/rpn_bbox_pred_P2/biases not found in checkpoint

CUDA 8.0

Hello, if I want to keep using CUDA 8.0 rather than installing CUDA 9.0 and switching between them, how should I set up cfgs.py?

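When running inference.py for testing, the output contains no detection boxes.

Hello, we used the "voc_149999model.ckpt.data-00000-of-00001" file you provided as the weights to run detection on the test images in the demos folder. However, the output shows that no detection was performed at all: the test images contain no detection boxes.

In other words, how does your code load the weight file?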
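TensorFlow V2-format checkpoints are a set of files that share one prefix: voc_149999model.ckpt.index, voc_149999model.ckpt.data-00000-of-00001, and usually .meta. tf.train.Saver.restore expects that prefix rather than the .data-... shard. It is not established that this is the cause here, and how inference.py locates the weights should be checked against the code. The stdlib sketch below normalizes any shard path to its prefix and confirms that the .index file is present. checkpoint_prefix is a hypothetical helper.

```python
import re
from pathlib import Path

# Suffixes of the files that make up one TensorFlow V2 checkpoint.
_SHARD_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Map e.g. 'x.ckpt.data-00000-of-00001' to 'x.ckpt', checking that 'x.ckpt.index' exists."""
    prefix = _SHARD_SUFFIX.sub("", str(path))
    if not Path(prefix + ".index").is_file():
        raise FileNotFoundError(prefix + ".index")
    return prefix
```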

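No module named tfplot

Hi, on Ubuntu I created a virtual environment in PyCharm and opened the project. The tensorflow-plot package was installed in that environment successfully, but running the demo reports:
File "../libs/networks/resnet.py", line 12, in
ImportError: No module named tfplot
What is the cause? Please advise, thanks!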

Training on my own dataset works, but testing fails


model restore from : /raid/cc/extend/FPN_Tensorflow/output/trained_weights/FPN_Res101_20181201/voc_120001model.ckpt
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:06:00.0
Total memory: 11.93GiB
Free memory: 11.81GiB
2018-12-05 17:04:54.307583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2018-12-05 17:04:54.307590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2018-12-05 17:04:54.307597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
restore model
2018-12-05 17:04:58.326784: I tensorflow/core/kernels/logging_ops.cc:79] SHAPE4***[0]
2018-12-05 17:04:58.326785: I tensorflow/core/kernels/logging_ops.cc:79] SHAPE3***[0]
2018-12-05 17:04:58.326818: I tensorflow/core/kernels/logging_ops.cc:79] SHAPE2***[555]
2018-12-05 17:04:58.326790: I tensorflow/core/kernels/logging_ops.cc:79] SHAPE5***[0]
/raid/cc/extend/FPN_Tensorflow/data/demos/img_6.jpg image cost 2.74012804031s:[>>>>> ]13% 1/82018-12-05 17:04:59.071122: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
2018-12-05 17:04:59.071244: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: assign_levels/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-12-05 17:04:59.071258: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: assign_levels/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-12-05 17:04:59.071267: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: assign_levels/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-12-05 17:04:59.071294: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: assign_levels/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-12-05 17:04:59.071301: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: assign_levels/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
2018-12-05 17:04:59.071300: W tensorflow/core/framework/op_kernel.cc:1192] Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: assign_levels/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
Traceback (most recent call last):
File "/home/cc/extend/FPN_Tensorflow/tools/test.py", line 138, in
inference_save_path=args.save_dir)
File "/home/cc/extend/FPN_Tensorflow/tools/test.py", line 105, in test
detect(det_net=faster_rcnn, inference_save_path=inference_save_path, real_test_imgname_list=test_imgname_list)
File "/home/cc/extend/FPN_Tensorflow/tools/test.py", line 60, in detect
feed_dict={img_plac: raw_img[:, :, ::-1]} # cv is BGR. But need RGB
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: assign_levels/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
[[Node: postprocess_fastrcnn/concat_2/_1287 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_5171_postprocess_fastrcnn/concat_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]

Caused by op u'assign_levels/Where', defined at:
File "/home/cc/extend/FPN_Tensorflow/tools/test.py", line 138, in
inference_save_path=args.save_dir)
File "/home/cc/extend/FPN_Tensorflow/tools/test.py", line 105, in test
detect(det_net=faster_rcnn, inference_save_path=inference_save_path, real_test_imgname_list=test_imgname_list)
File "/home/cc/extend/FPN_Tensorflow/tools/test.py", line 35, in detect
gtboxes_batch=None)
File "/home/cc/extend/FPN_Tensorflow/libs/networks/build_whole_network.py", line 493, in build_whole_detection_network
rois_list = self.assign_levels(all_rois=rois) # rois_list: [P2_rois, P3_rois, P4_rois, P5_rois]
File "/home/cc/extend/FPN_Tensorflow/libs/networks/build_whole_network.py", line 253, in assign_levels
bbox_targets=bbox_targets)
File "/home/cc/extend/FPN_Tensorflow/libs/networks/build_whole_network.py", line 218, in get_rois
level_i_indices = tf.reshape(tf.where(tf.equal(levels, level_i)), [-1])
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 2365, in where
return gen_array_ops.where(input=condition, name=name)
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4053, in where
result = _op_def_lib.apply_op("Where", input=input, name=name)
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/cc/miniconda3/envs/chinese-ocr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
[[Node: assign_levels/Where = Where_device="/job:localhost/replica:0/task:0/gpu:0"]]
[[Node: postprocess_fastrcnn/concat_2/_1287 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_5171_postprocess_fastrcnn/concat_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]

Process finished with exit code 1

I am using CUDA 8.0. Please advise!

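Loading the same checkpoint for testing gives different results each time

After training, loading the same checkpoint gives different results every time. Looking at the eval.py code:

sess.run(tf.global_variables_initializer())

This invalidates the values loaded from the checkpoint: all variables are re-initialized, so the model has random weights and each run gives a different result.

Is this a small bug? Please advise, thanks!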

feature c2

feature_dict = {'C2': end_points_C2['{}/block1/unit_2/bottleneck_v1'
Why was unit_2/bottleneck_v1 chosen for C2? Could unit_3 be used instead?
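One plausible reading assumes the backbone is TF-slim's resnet_v1_50. That network has blocks of 3, 4, 6 and 3 units with block strides 2, 2, 2 and 1, applies each block's stride in its last unit, and uses a stride-4 stem (conv1 plus max-pooling). Under that layout, block1/unit_2 is the last stride-4 feature, while unit_3 is already downsampled to stride 8, the resolution FPN assigns to C3. This is an inference, not the author's stated reason. The sketch below tabulates the output stride of every unit under those assumptions.

```python
# Assumed TF-slim resnet_v1_50 layout: (block name, number of units, block stride).
# Slim applies each block's stride in the block's LAST unit; all other units use stride 1.
RESNET_V1_50_BLOCKS = [("block1", 3, 2), ("block2", 4, 2), ("block3", 6, 2), ("block4", 3, 1)]


def unit_output_strides(blocks=RESNET_V1_50_BLOCKS, stem_stride=4):
    """Map 'blockN/unit_M' to that unit's output stride relative to the input image."""
    strides = {}
    current = stem_stride  # conv1 (stride 2) + max-pool (stride 2)
    for name, num_units, block_stride in blocks:
        for unit in range(1, num_units + 1):
            if unit == num_units:
                current *= block_stride
            strides["%s/unit_%d" % (name, unit)] = current
    return strides


strides = unit_output_strides()
print(strides["block1/unit_2"], strides["block1/unit_3"])  # 4 8
```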

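Hello, I have some questions

Hello, in your comparison experiments you changed the values of SHARE_NET, SHORT_SIDE_LEN and FAST_RCNN_MINIBATCH_SIZE. What was the reasoning behind these changes? Also, have you published any related papers? I would like to read them.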

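NaN during training

Hello, when training with this code the loss becomes NaN. It should not be a data problem: training with your other codebase succeeded, and the tfrecord was generated with that codebase. With this code, however, the loss becomes NaN after about 2,800 iterations. Please advise, thanks!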
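The cause of the NaN here is not established. One mechanism worth checking involves the box regression targets. Faster R-CNN style targets use log(gt_w / anchor_w) and log(gt_h / anchor_h), so any ground-truth box with zero or negative width or height yields -inf or NaN in TensorFlow. Such boxes can arise, for example, when xmax <= xmin after clipping or because of a coordinate typo. The stdlib sketch below shows the encoding and a check for such boxes. Both helpers are hypothetical.

```python
import math


def encode_wh(gt_w, gt_h, anchor_w, anchor_h):
    """Width/height regression targets: log(gt / anchor).

    Needs gt_w > 0 and gt_h > 0: math.log raises ValueError otherwise,
    whereas tf.log would silently return -inf (for 0) or NaN (for negatives).
    """
    return math.log(gt_w / float(anchor_w)), math.log(gt_h / float(anchor_h))


def find_degenerate_boxes(boxes):
    """Indices of (xmin, ymin, xmax, ymax) boxes with non-positive width or height."""
    return [i for i, (x1, y1, x2, y2) in enumerate(boxes) if x2 <= x1 or y2 <= y1]


print(find_degenerate_boxes([(0, 0, 10, 10), (5, 5, 5, 9), (3, 3, 1, 8)]))  # [1, 2]
```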

Errors with cuDNN

I met this problem when I tried to train for my dataset.
F tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 256 spatial: 14 14 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
Aborted (core dumped)

eval.py

Solved, thank you.

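About the precision reported by eval

Why is the final precision so low? The values in the papers are all quite high. Is there something else that needs to be computed?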
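Assume the printed precision is the last point of the cumulative precision/recall curve (this has not been verified against eval.py). In that case a low value is expected. Once every low-confidence detection is counted, false positives dominate, even though precision at lower recall, and therefore AP, can be high. The sketch below builds the curve from scored detections. precision_recall_curve is hypothetical, and it takes per-detection true-positive flags that have already been decided by IoU matching.

```python
def precision_recall_curve(scores, is_tp, num_gt):
    """Cumulative precision/recall after each detection, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precision, recall = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precision.append(tp / float(tp + fp))
        recall.append(tp / float(num_gt))
    return precision, recall


p, r = precision_recall_curve([0.9, 0.8, 0.3, 0.2, 0.1], [True, True, False, False, True], num_gt=3)
print(p)  # [1.0, 1.0, 0.6666666666666666, 0.5, 0.6]
print(r[-1])  # 1.0
```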

ImportError: No module named 'libs.box_utils.cython_utils.cython_bbox'

"C:\Program Files\Anaconda3\python.exe" D:/mycode/FPN_Tensorflow-master2/tools/train.py
++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--
D:\mycode\FPN_Tensorflow-master2
Traceback (most recent call last):
File "D:/mycode/FPN_Tensorflow-master2/tools/train.py", line 16, in
from libs.networks import build_whole_network
File "D:\mycode\FPN_Tensorflow-master2\libs\networks\build_whole_network.py", line 19, in
from libs.detection_oprations.anchor_target_layer_without_boxweight import anchor_target_layer
File "D:\mycode\FPN_Tensorflow-master2\libs\detection_oprations\anchor_target_layer_without_boxweight.py", line 15, in
from libs.box_utils.cython_utils.cython_bbox import bbox_overlaps
ImportError: No module named 'libs.box_utils.cython_utils.cython_bbox'

Process finished with exit code 1

"Tensor objects are not iterable when eager execution is not enabled" when training FPN_MobileNetv2

Traceback (most recent call last):
File "/home/litao/Algorithm/FPN_Tensorflow-master/tools/train.py", line 188, in
train()
File "/home/litao/Algorithm/FPN_Tensorflow-master/tools/train.py", line 48, in train
gtboxes_batch=gtboxes_and_label)
File "../libs/networks/build_whole_network.py", line 390, in build_whole_detection_network
for level_name, p in zip(cfgs.LEVLES, P_list):
File "/home/litao/anaconda2/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 431, in iter
"Tensor objects are not iterable when eager execution is not "
TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn.

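No detection boxes in the demo output

Hello, I ran into a problem running your inference script.
My steps were:
1. Downloaded resnet50_v1 into the pretrained_weights folder
2. Downloaded the VOC2007 dataset (standard VOC data) into my home directory
3. Compiled by running python setup.py build_ext --inplace in the cython_utils folder
4. Copied the contents of cfgs_res50_fpn.py into cfgs.py
5. Downloaded the weights and put them in the output/trained_weights folder
6. Ran the demo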
python inference.py --data_dir='/home/rw/FPN_Tensorflow-master/tools/demos' --save_dir='/home/rw/FPN_Tensorflow-master/tools/inference_results/1' --GPU='0'

The output is shown below:

var_in_graph: resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean:0
var_in_ckpt: resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/BatchNorm/moving_mean



restore from pretrained_weighs in IMAGE_NET
2019-01-22 13:24:02.512718: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-22 13:24:02.824773: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-22 13:24:02.825682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.733
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.53GiB
2019-01-22 13:24:02.825710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-01-22 13:24:08.674217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-22 13:24:08.674294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2019-01-22 13:24:08.674315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2019-01-22 13:24:08.681519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5294 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
restore model
/home/rw/FPN_Tensorflow-master/tools/demos/000144.jpg image cost 9.31491208076477s:[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]100% 1/1(tf1.10)

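Looking at the images in the result/1 folder, I found that only the image size had changed; none of the detection boxes shown in the GIF were drawn. It must be something I did wrong. Which of the steps above did I get wrong?
I did not get the detection boxes I wanted. I hope you can help me.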
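In the log above, the line "restore from pretrained_weighs in IMAGE_NET" may indicate that only the ImageNet backbone was restored, not trained detector weights. That would be consistent with getting no boxes, but it is an inference from the log text, not a confirmed diagnosis. An earlier log on this page shows trained weights restored from a path of the form .../output/trained_weights/FPN_Res101_20181201/voc_120001model.ckpt, which suggests one subfolder per configuration. The stdlib sketch below finds the highest-step checkpoint in such a folder by its .index file. latest_checkpoint_by_step is hypothetical, and the actual lookup in inference.py may differ (tf.train.latest_checkpoint, for instance, relies on the "checkpoint" bookkeeping file instead).

```python
import re
from pathlib import Path

# Matches names like 'voc_149999model.ckpt.index', capturing the prefix and the step.
_CKPT_INDEX = re.compile(r"^(?P<prefix>.*?(?P<step>\d+)model\.ckpt)\.index$")


def latest_checkpoint_by_step(weights_dir):
    """Return (step, prefix path) of the highest-step checkpoint in weights_dir, or None."""
    best = None
    for index_file in Path(weights_dir).glob("*.index"):
        match = _CKPT_INDEX.match(index_file.name)
        if match is None:
            continue
        step = int(match.group("step"))
        if best is None or step > best[0]:
            best = (step, str(index_file.with_name(match.group("prefix"))))
    return best
```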

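Hello, I have a few questions and hope you can answer them. My training results look like the following; where could it have gone wrong?

The data looks like this: [images]

But the training results look like this. Why?

Figure 1: this is the result; boxes drawn from my XML files look correct. Question 1: did something go wrong when making the tfrecord?
Figure 2
Figure 3: loss curves
Figure 4
Figure 5: Question 2: what does this plot mean?
Figure 6
Figure 7
Question 3: for this kind of data, to train better, do I need to set the item shown in Figure 6 to my own data? How should parameters 1 and 2 in Figure 7 be changed?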

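Question 4: training is very slow for me. With about 1,000 images on a single 1080 Ti, 150,000 iterations take about 20 hours. Is this normal? Is there a way to speed it up?

I would be very grateful if you could answer my questions. Thank you very much!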
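A quick sanity check on the numbers in Question 4 assumes one image per iteration (the Image/GPU column of the COCO table lists 1 for this repo's runs). Twenty hours over 150,000 iterations is 0.48 s per iteration. At one image per iteration, 150,000 iterations over about 1,000 images means about 150 passes through the data, which may be far more than a dataset of that size needs. The sketch below does the arithmetic. training_budget is a hypothetical helper.

```python
def training_budget(hours, iterations, num_images, images_per_iter=1):
    """Return (seconds per iteration, number of passes over the dataset)."""
    seconds_per_iter = hours * 3600.0 / iterations
    epochs = iterations * images_per_iter / float(num_images)
    return seconds_per_iter, epochs


print(training_budget(20, 150000, 1000))  # (0.48, 150.0)
```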

Hello, I got a very strange error when running eval.

I configured the image and annotation paths and ran the script. The error is:
InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid configuration argument
The server runs CUDA 8 + TF 1.4.

Skipping cancelled enqueue attempt with queue not closed

Hello, when I ran this code again, it crashed at iteration 9991. There was a warning, "Skipping cancelled enqueue attempt with queue not closed", followed immediately by the crash.
@yangxue0827 Could you advise how to solve this?

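Hello, in what order should this project be run?

Hello, this is my first object detection project, and I do not really understand the workflow. Following your instructions for inference.py and eval.py, both raised errors. Do I need to train first before these two scripts can run correctly? Also, could you briefly explain what inference.py does? Does it just draw boxes on the original images?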

Version issue

Hello, which version of TensorFlow do you use? I ran into problems running inference with version 1.5.
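The README pins tensorflow == 1.10, and 1.5 is several releases older. It is not established that the failure comes from that gap, but checking the version is cheap. Comparing version strings directly is misleading: "1.5" > "1.10" is True as strings. The sketch below compares numeric components instead. version_tuple and same_minor are hypothetical helpers.

```python
import re


def version_tuple(version):
    """'1.10.1' -> (1, 10, 1); stops at the first component without a leading number."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def same_minor(installed, pinned="1.10"):
    """True if `installed` has the same major.minor as `pinned`."""
    return version_tuple(installed)[:2] == version_tuple(pinned)[:2]


print("1.5" > "1.10")  # True: string comparison gets this wrong
print(version_tuple("1.5.0") < version_tuple("1.10"))  # True
# In practice: import tensorflow as tf; print(same_minor(tf.__version__))
```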

Error when using mobilenetv2

I set mobilenetv2 as the feature extraction network in cfgs.py, but got an error. Do you know the cause?
Traceback (most recent call last):
File "train.py", line 186, in
train()
File "train.py", line 48, in train
gtboxes_batch=gtboxes_and_label)
File "../libs/networks/build_whole_network.py", line 383, in build_whole_detection_network
for level_name, p in zip(cfgs.LEVLES, P_list):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 431, in iter
"Tensor objects are not iterable when eager execution is not "
TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn.
