robertcsordas / rfcn-tensorflow



RFCN implementation in TensorFlow

Python 77.59% Makefile 1.27% C++ 15.64% C 5.50%

tensorflow deep-learning object-detection rfcn machine-learning

rfcn-tensorflow's Introduction

TensorFlow implementation of RFCN

The paper is available at https://arxiv.org/abs/1605.06409.

Building

The ROI pooling and the MS COCO loader need to be compiled first. To do so, run make in the root directory of the project. If you need special linker or compiler options, edit BoxEngine/ROIPooling/Makefile.

NOTE: If you have multiple Python versions on your system and want to use one other than "python", set the PYTHON environment variable before calling make. For example: PYTHON=python3 make

You may get undefined symbol errors when loading the .so file. This happens if you built TensorFlow yourself and the Makefile fails to auto-detect its ABI version. In that case the log contains errors like "tensorflow.python.framework.errors_impl.NotFoundError: BoxEngine/ROIPooling/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE". To fix it, clean the project (make clean) and rebuild it with the USE_OLD_EABI=0 flag (USE_OLD_EABI=0 make).
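To check which C++ ABI your TensorFlow build uses, you can look for the _GLIBCXX_USE_CXX11_ABI define in the flags returned by tf.sysconfig.get_compile_flags() (available in TensorFlow 1.4 and later). The helper below is a hypothetical sketch, not part of this project; it only parses such a flag list and returns None if the define is absent.

```python
def cxx11_abi_from_flags(flags):
    """Return the _GLIBCXX_USE_CXX11_ABI value (0 or 1) found in a list of
    compiler flags, or None if the define is not present."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None
```

For example, if cxx11_abi_from_flags(tf.sysconfig.get_compile_flags()) returns 1, TensorFlow was built with the C++11 ABI. Undefined symbols containing "cxx11", as in the message above, point the same way, which is the situation the USE_OLD_EABI=0 rebuild addresses (assuming that flag maps onto this define).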

You may want to build ROI pooling without GPU support. Use the USE_GPU=0 flag to turn off the CUDA part of the code.

You may want to install python dependencies by running:

pip install --user -r packages.txt

Testing

You can run trained models with test.py. The model path should be given without a file extension (i.e. without .data* and .index); an example command is shown in the Pretrained model section below.

Pretrained model

You can download a pretrained model from here:

http://xdever.engineerjs.com/rfcn-tensorflow-export.tar.bz2

Extract it to your project directory. Then you can run the network with the following command:

./test.py -n export/model -i <input image> -o <output image>
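The value passed to -n is the checkpoint prefix: export/model.index and export/model.data-00000-of-00001 both belong to the prefix export/model. A small sketch that strips these suffixes (the helper name is hypothetical; .meta is included because TensorFlow 1.x savers also write that file):

```python
import re

# Suffixes TensorFlow 1.x appends to a checkpoint prefix:
# ".index", ".meta" and sharded data files such as ".data-00000-of-00001".
_CKPT_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Strip a TensorFlow checkpoint file suffix so the result can be passed to -n."""
    return _CKPT_SUFFIX.sub("", path)
```

For example, checkpoint_prefix("export/model.data-00000-of-00001") gives "export/model", and a path that is already a prefix is returned unchanged.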

NOTE: This pretrained model was not hyperparameter-optimized in any way. The model can (and will) perform much better when tuned. Try different learning rates and different balances between the classification and regression losses. Optimal values are highly test dependent.

Training the network

For training the network you will first need to download the MS COCO dataset. Download the needed files and extract them to a directory with the following structure:

<COCO>
├─  annotations
│    ├─  instances_train2014.json
│    └─  ...
│
├─  train2014
└─  ...
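Before starting a long run, it can help to verify this layout. The sketch below checks only the two entries named in the tree above; the function name is hypothetical, and main.py may need further files that the tree elides with "...".

```python
import os


def missing_coco_paths(coco_root):
    """Return the expected COCO paths (from the tree above) that do not exist."""
    expected = [
        os.path.join(coco_root, "annotations", "instances_train2014.json"),
        os.path.join(coco_root, "train2014"),
    ]
    return [p for p in expected if not os.path.exists(p)]
```

An empty list means both entries are in place.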

Run the following command: ./main.py -dataset <COCO> -name <savedir>

<COCO> - full path to the coco root directory
<savedir> - path where files will be saved. This directory and its subdirectories will be automatically created.

The <savedir> will have the following structure:

<savedir>
├─  preview
│    └─  preview.jpg - preview snapshots from training process.
│
├─  save - TensorFlow checkpoint directory
│    ├─  checkpoint
│    ├─  model_*.*
│    └─  ...
└─  args.json - saved command line arguments.

You can always kill the training process and resume it later just by running ./main.py -name <savedir> without any other parameters. All command line parameters will be saved and reloaded automatically.
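To run a model you trained yourself with test.py, pass the latest checkpoint prefix from <savedir>/save. The checkpoint file in that directory is the text file written by TensorFlow's Saver, and its model_checkpoint_path entry names the newest checkpoint. tf.train.latest_checkpoint(os.path.join(savedir, "save")) performs this lookup inside TensorFlow; the dependency-free sketch below (hypothetical helper name) does a simplified version of the same thing.

```python
import os
import re


def latest_checkpoint_prefix(savedir):
    """Return the newest checkpoint prefix recorded in <savedir>/save/checkpoint,
    or None if the file or the entry is missing."""
    state_file = os.path.join(savedir, "save", "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            # Only the model_checkpoint_path line, not all_model_checkpoint_paths.
            m = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if m:
                path = m.group(1)
                # Relative entries are relative to the directory holding the file.
                if not os.path.isabs(path):
                    path = os.path.join(savedir, "save", path)
                return path
    return None
```

The returned path already lacks the .data* and .index suffixes, so it can be given to ./test.py -n as is.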

License

The software is under Apache 2.0 license. See http://www.apache.org/licenses/LICENSE-2.0 for further details.

Notes

This code requires TensorFlow >= 1.0 (the last known working version is 1.4.1). It was tested with Python 3.6, but it should also work with Python 2.

rfcn-tensorflow's People

jangkyung yulongpo andrei-pokrovsky xmyqsh zilongzhong sunxingxingtf ddkang embeddedsamurai jiakui kasyoukin statml sophiealex wanjinchang sunjieee benjamesbabala fireae templeblock lyk125 difoi baiyancheng20 chunfeima nikitos9000 zhanghaoinf jassonvia bwuzhang jsmilemsj yushanshan05 walkoncross chelovekhe zgsxwsdxg morusu obendidi hyzcn robopassion hajungong007 qinhuaping lichentsai eljefec simmoncn unyqhz davidvuong luoshanwei lzzzfelipe ibunny01 wjyyao xtanitfy deeprrl cvtower schperics transcendentsky fortisaqua horaccefeng dreadlord1984 sinianyutian machanic huipengzhang yu-jingrui wangjuenew jidebingfeng 1784266476 jianangao super-ljg jacke121 yamlong dawin2015 5059 myatmo fengyh3 xwzcwq youchaoqin liangxiaotian sunshinezhihuo grseb9s merlin2013 zoujuny aust-hansen jqcai kyle-nguyennn kyubeomlee123 senliuy tony32769 fendaq sumihui styjb tonygongjc zhouyonglong dr-zhuang gq124 fcinter hlesmqh breakend2010 maxiaoliang1989120 uwuneng zm66260 xingliujia cghzxlcq0201 hellogiantman1989 kwan-ywan zjulixin qsxymal

rfcn-tensorflow's Issues

Runtime TypeError in CPU version

Greetings,

Thank you for the code.

I am trying to compile a CPU-only version of the code and run the ./test.py script. I removed the nvcc lines in BoxEngine/ROIPooling/Makefile and commented out the GPU-related lines in the source code. The .so compiles with no issues, but running ./test.py yields the following error:

~/RFCN-tensorflow$ ./test.py -n export/model -i ./test_imgs/1.jpg -o o.jpg
Training network from end
('Anchors: ', [[64, 64], [90, 45], [45, 90], [128, 128], [181, 90], [90, 181], [256, 256], [362, 181], [181, 362], [512, 512], [724, 362], [362, 724]])
Traceback (most recent call last):
  File "./test.py", line 51, in <module>
    net = BoxInceptionResnet(image, len(categories), name="boxnet")
  File "/home/username/RFCN-tensorflow/BoxInceptionResnet.py", line 58, in __init__
    BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
  File "/home/username/RFCN-tensorflow/BoxEngine/BoxNetwork.py", line 36, in __init__
    self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)
  File "/home/username/RFCN-tensorflow/BoxEngine/RPN.py", line 230, in getPositiveOutputs
    boxes, scores = self.filterOutputBoxes(self.boxes, self.scores, preNmsCount=preNmsCount, nmsThreshold=nmsThreshold, maxOutSize=maxOutSize)
  File "/home/username/RFCN-tensorflow/BoxEngine/RPN.py", line 221, in filterOutputBoxes
    scores, boxes = tf.cond(tf.shape(scores)[0] > preNmsCount , lambda: tf.tuple(MultiGather.gatherTopK(scores, preNmsCount, [boxes])), lambda: tf.tuple([scores, boxes]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1753, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(fn1)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1659, in BuildCondBranch
    original_result = fn()
  File "/home/username/RFCN-tensorflow/BoxEngine/RPN.py", line 221, in <lambda>
    scores, boxes = tf.cond(tf.shape(scores)[0] > preNmsCount , lambda: tf.tuple(MultiGather.gatherTopK(scores, preNmsCount, [boxes])), lambda: tf.tuple([scores, boxes]))
  File "/home/username/RFCN-tensorflow/Utils/MultiGather.py", line 30, in gatherTopK
    values, indices = tf.cond(isMoreThanK, lambda: tf.nn.top_k(t, k=k, sorted=sorted), lambda: tf.tuple([t, tf.zeros((0,1), tf.int32)]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1777, in cond
    "Incompatible return types of fn1 and fn2: {}".format(e))
TypeError: Incompatible return types of fn1 and fn2: The two structures don't have the same sequence type. First structure has type <class 'tensorflow.python.ops.gen_nn_ops.TopKV2'>, while second structure has type <type 'list'>.

Could you suggest what the problem might be?
I am using Ubuntu 16.04, python 2.7, tensorflow 1.1.0.
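The error says fn1 returns a TopKV2 while fn2 returns a list. tf.nn.top_k returns a namedtuple, whereas the fallback branch in Utils/MultiGather.py wraps its result in tf.tuple([...]), which returns a plain list, and tf.cond in TensorFlow 1.1 rejects branches whose outer sequence types differ. The pure-Python sketch below only mimics that check; TopKV2 here and same_sequence_type are hypothetical stand-ins, not TensorFlow code. Under that reading, an untested fix would be to make the first branch return a list as well, for example lambda: list(tf.nn.top_k(t, k=k, sorted=sorted)).

```python
import collections

# Hypothetical stand-in for the namedtuple that tf.nn.top_k returns
# (its class is named TopKV2 in the error message above).
TopKV2 = collections.namedtuple("TopKV2", ["values", "indices"])


def same_sequence_type(a, b):
    """Simplified analogue of the structure check tf.cond applies to its two
    branch results: the outer sequence types and lengths must match."""
    return type(a) is type(b) and len(a) == len(b)


top_k_branch = TopKV2(values=[0.9, 0.7], indices=[4, 1])  # what fn1 returns
fallback_branch = [[0.9, 0.7], [4, 1]]                      # tf.tuple(...) gives a list

mismatch = same_sequence_type(top_k_branch, fallback_branch)     # False
fixed = same_sequence_type(list(top_k_branch), fallback_branch)  # True
```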

Error while running demo file -> KeyError: 'Repeat_1'

Hi @xdever,
While running the demo test.py, I get the error below. I cannot figure out what exactly the problem is.

anand@aicenter001:~/Desktop/anand/RFCN-tensorflow$ ./test.py -n pretrained/export/model.data-00000-of-00001 -i PEDS-CityTrafficB.jpg -o PEDS-CityTrafficB1.jpg

Training network from end
Traceback (most recent call last):
File "./test.py", line 51, in <module>
net = BoxInceptionResnet(image, len(categories), name="boxnet")
File "/home/anand/Desktop/anand/RFCN-tensorflow/BoxInceptionResnet.py", line 43, in __init__
scale_16 = self.googleNet.getOutput("Repeat_1")[:,1:-1,1:-1,:]
File "/home/anand/Desktop/anand/RFCN-tensorflow/InceptionResnetV2.py", line 277, in getOutput
return self.endPoints[name]
KeyError: 'Repeat_1'

Could you please help me understand the possible reason and how I can fix this?

Thank you.
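The traceback shows that InceptionResnetV2.getOutput indexes self.endPoints directly, so the KeyError only says that "Repeat_1" is absent. A hypothetical variant that also lists the names which do exist can show whether the endpoint is registered under a different name in your TensorFlow version; that a naming difference is the cause is an assumption worth checking, not a confirmed diagnosis.

```python
def get_output(end_points, name):
    """Look up an endpoint by name; if it is missing, report the available names."""
    try:
        return end_points[name]
    except KeyError:
        available = ", ".join(sorted(end_points))
        raise KeyError("Endpoint %r not found. Available endpoints: %s" % (name, available))
```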

How do I train on my own dataset? I didn't see the training code.

How to fine-tune

I want to do logo detection, so I just want to fine-tune the model provided at http://xdever.engineerjs.com/rfcn-tensorflow-export.tar.bz2. In main.py I can set the parameter trainFrom to train from a given layer, but how can I train the layers starting from the model I already have?

Extract coordinates of the each boundary box

Hello,
Can I retrieve the coordinates (x, y, width, height) of the bounding box of each object and save them to a text file? If so, how can I extract them?

Thank you
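This thread does not say where test.py keeps the detected boxes, so the sketch below assumes you already have, for each detection, a box as (x0, y0, x1, y1) corner coordinates in pixels, a score and a class label. The corner format and the function names are assumptions. It converts each box to x, y, width, height and writes one detection per line.

```python
def boxes_to_lines(boxes, scores, classes):
    """Format detections as 'class score x y width height' lines.

    Assumes each box is (x0, y0, x1, y1) in pixel coordinates.
    """
    lines = []
    for (x0, y0, x1, y1), score, cls in zip(boxes, scores, classes):
        lines.append("%s %.4f %.1f %.1f %.1f %.1f" % (cls, score, x0, y0, x1 - x0, y1 - y0))
    return lines


def save_boxes(path, boxes, scores, classes):
    """Write the formatted detections to a text file, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(boxes_to_lines(boxes, scores, classes)) + "\n")
```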

I have some problems. Can you tell me why?

Training network from end
Traceback (most recent call last):
File "test.py", line 51, in <module>
net = BoxInceptionResnet(image, len(categories), name="boxnet")
File "/home/vision/jsw/tf/RFCN-tensorflow-master/BoxInceptionResnet.py", line 43, in __init__
scale_16 = self.googleNet.getOutput("Repeat_1")[:,1:-1,1:-1,:]
File "/home/vision/jsw/tf/RFCN-tensorflow-master/InceptionResnetV2.py", line 275, in getOutput
return self.endPoints[name]
KeyError: 'Repeat_1'

How to Log?

I see nothing in save/log.
Also, when I look at timeline.json, ROI pooling appears to run on the CPU. Is that expected?

Help! Why do we need to add GT boxes when calculating the refinement loss?

#Add GT boxes
posBoxes = tf.concat([posBoxes,refBoxes], 0)
posRefIndices = tf.concat([posRefIndices, tf.reshape(tf.range(tf.shape(refClasses)[0]), [-1,1])], 0)
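A likely reason, common in Faster R-CNN-style training, is that ground-truth boxes are ideal positive samples: appending them to the positives guarantees that every GT box contributes to the refinement loss even when no proposal overlaps it well. The NumPy sketch below mirrors the two lines above with hypothetical example values; it counts GT boxes with refBoxes instead of refClasses, assuming both hold one entry per GT box.

```python
import numpy as np

# Hypothetical example: 3 positive proposals matched to 2 GT boxes.
posBoxes = np.array([[10, 10, 50, 50], [12, 8, 48, 52], [100, 100, 150, 160]], dtype=np.float32)
posRefIndices = np.array([[0], [0], [1]], dtype=np.int32)  # index of the matched GT box
refBoxes = np.array([[11, 9, 49, 51], [102, 98, 148, 162]], dtype=np.float32)

# Add GT boxes: each GT box becomes an extra positive sample matched to itself.
posBoxes = np.concatenate([posBoxes, refBoxes], 0)
posRefIndices = np.concatenate([posRefIndices, np.arange(refBoxes.shape[0]).reshape(-1, 1)], 0)
```

After the concatenation there are 5 positives, and the last two point to GT boxes 0 and 1 respectively.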

How to understand the offset?

I see that RPN and PSROIPooling both have an offset, which you set to 32. I don't understand it: what is it used for, and where does it come from?

Incompatible shapes: [8873,1] vs. [9432,2]

I'm using TF 1.13. I would be glad to use 1.4, but it depends on CUDA 8, and I'm already on CUDA 10, with my other projects depending on it. So going back to TF 1.4 / CUDA 8 is not an option, sorry.

Once I ignored the error mentioned in issue #40, the next showstopper is a message about incompatible shapes. If the tensor sizes were fixed, I could probably have searched through the code and fixed it, but the problem is that the tensor size and dimension change every time. It could be a plain vector with 25k versus 45k values, or a 2-D matrix of approximately [XXXX, 1] vs. [YYYY, 2], where YYYY is always slightly larger than XXXX, and both can range from 7k to 20k.

Here are the outputs from subsequent runs. Nothing changed in the data; I just ran rm -rf saved; ./model -dataset COCO, and the numbers keep changing. Could anyone explain the reason and how to fix this, please?

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 25896 values, but the requested shape has 44928
	 [[node optimizer/gradients/boxnet/Box/BoxNetwork/RPN/mergeBoxData/Reshape_grad/Reshape (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [8873,1] vs. [9432,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [12270,1] vs. [12948,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [8280,1] vs. [9988,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7359,1] vs. [8320,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [17058,1] vs. [18868,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

Helping! error when run test.py

When I run test.py, I encounter this:

liyirong@liyirong-Alienware-17-R3:~/RFCN-tensorflow$ ./test.py
/home/liyirong/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Training network from end
Anchors: [[64, 64], [90, 45], [45, 90], [128, 128], [181, 90], [90, 181], [256, 256], [362, 181], [181, 362], [512, 512], [724, 362], [362, 724]]
Traceback (most recent call last):
File "./test.py", line 56, in <module>
input = PreviewIO.PreviewInput(opt.i)
File "/home/liyirong/RFCN-tensorflow/Utils/PreviewIO.py", line 17, in __init__
if os.path.isdir(self.path):
File "/home/liyirong/anaconda3/lib/python3.6/genericpath.py", line 42, in isdir
st = os.stat(s)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

I can't figure it out. Can anyone help?
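The traceback shows PreviewIO.PreviewInput(opt.i) receiving None, which is what happens when test.py is started without -i (the command above has no arguments at all). Running it as in the README, ./test.py -n export/model -i <input image> -o <output image>, avoids this. The sketch below shows how marking the options as required with argparse would turn the crash into a clear usage error; whether test.py parses its options with argparse, and the function name, are assumptions.

```python
import argparse


def parse_test_args(argv=None):
    """Sketch of test.py-style options with -n, -i and -o marked as required."""
    parser = argparse.ArgumentParser(description="Run a trained RFCN model")
    parser.add_argument("-n", required=True, help="checkpoint prefix, without .data* and .index")
    parser.add_argument("-i", required=True, help="input image")
    parser.add_argument("-o", required=True, help="output image")
    return parser.parse_args(argv)
```

With this, a missing -i makes argparse print a usage message and exit instead of failing later inside os.stat.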

Errors for training!

Hello Xdever!
The test.py script works, but when I tried main.py, there were errors.
I downloaded MS COCO 2014 and extracted it. Then I used the following command:
python main.py -dataset /home/hfl/RFCN-tensorflow-master-test1/COCO -name /home/hfl/RFCN-tensorflow-master-test1/export2
Then I get errors like this:

WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/weights/Adam" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/weights/Adam_1" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/BatchNorm/beta/Adam" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/BatchNorm/beta/Adam_1" not found in file to load.
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Logits/biases
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Logits/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/moving_variance
WARNING: Unused variable: InceptionResnetV2/Logits/Logits/biases
WARNING: Unused variable: InceptionResnetV2/Logits/Logits/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/moving_mean
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_variance
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean
WARNING: Unused variable: global_step
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta
Done.
BoxLoader: Loaded 123287 files.
2018-03-16 21:19:23.107145: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107192: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107238: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107249: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "main.py", line 157, in <module>
res = runManager.modRun(i)
File "/home/hfl/RFCN-tensorflow-master-test1/Utils/RunManager.py", line 97, in modRun
return self.runAndMerge(feed_dict, options=options if options is not None else self.options, run_metadata=run_metadata if run_metadata is not None else self.run_metadata)
File "/home/hfl/RFCN-tensorflow-master-test1/Utils/RunManager.py", line 71, in runAndMerge
res = self.sess.run(self.inputTensors, feed_dict=feed_dict, options=options, run_metadata=run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 9489 values, but the requested shape has 12356
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
[[Node: optimizer/gradients/cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/PosRoiPooling_grad/Shape/_2039 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6985_...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape', defined at:
File "main.py", line 119, in <module>
trainOp=createUpdateOp()
File "main.py", line 106, in createUpdateOp
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 460, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in gradients
lambda: grad_fn(op, *out_grads))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 377, in _MaybeCompile
return grad_fn() # Exit early
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 504, in _ReshapeGrad
return [array_ops.reshape(grad, array_ops.shape(op.inputs[0])), None]
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3903, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

...which was originally created as op u'RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2', defined at:
File "main.py", line 99, in <module>
tf.losses.add_loss(net.getLoss(boxes, classes))
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/BoxNetwork.py", line 50, in getLoss
return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 176, in loss
return tf.cond(tf.shape(refBoxes)[0]>0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2018, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1868, in BuildCondBranch
original_result = fn()
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 176, in <lambda>
return tf.cond(tf.shape(refBoxes)[0]>0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 163, in calcLoss
positiveLosses, negativeLosses = calcAllLosses(inAnchros, inBoxes, inRawSizes, inScores, inBoxSizes)
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 142, in calcAllLosses
classificationLoss = tf.nn.softmax_cross_entropy_with_logits(logits=scores, labels=refScores)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
return func(*args, **kwargs)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1960, in softmax_cross_entropy_with_logits
labels=labels, logits=logits, dim=dim, name=name)

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 9489 values, but the requested shape has 12356
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
[[Node: optimizer/gradients/cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/PosRoiPooling_grad/Shape/_2039 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6985_...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

No result for score lower than 0.5

Hi, I set the test threshold to 0.01, but there are only bounding boxes with scores higher than 0.5. Can I get results with scores lower than 0.5?
Thank you so much.

TensorFlow 1.7.0: tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 8218 values, but the requested shape has 9988

Done.
BoxLoader: Loaded 123287 files.
2018-07-09 01:22:09.707764: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-07-09 01:22:09.707820: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-07-09 01:22:09.707840: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-07-09 01:22:09.707848: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
status, run_metadata)
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 8218 values, but the requested shape has 9988
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
[[Node: cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/imgCoordinatesToHeatmapCoordinates/Cast/_1977 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5800_...nates/Cast", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

how to compute val loss during training

Hi, how can I compute the validation loss during training?

Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./inception_resnet_v2_2016_08_30.ckpt

TF 1.13 says: "The two structures don't have the same nested structure."

Inference works perfectly with the pre-trained model.

However, when I try to train, I get the following error. It looks like things have changed from TF 1.4 to TF 1.13. I'd be glad if you could tell me how to fix this.

Traceback (most recent call last):
  File "./main.py", line 118, in <module>
    trainOp=createUpdateOp()
  File "./main.py", line 105, in createUpdateOp
    grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 664, in gradients
    unconnected_gradients)
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 965, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 420, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 965, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_grad.py", line 84, in _SwitchGrad
    return merge(grad, name="cond_grad")[0], None
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 406, in merge
    nest.assert_same_structure(inputs[0], v, expand_composites=True)
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/util/nest.py", line 249, in assert_same_structure
    % (str(e), str1, str2))
ValueError: The two structures don't have the same nested structure.

First structure: type=IndexedSlices str=IndexedSlices(indices=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/getNegativeLosses/GatherNd/Switch_grad/cond_grad/range:0", shape=(?,), dtype=int32), values=Tensor("optimizer/gradients/zeros_5:0", shape=(?,), dtype=float32), dense_shape=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/getNegativeLosses/GatherNd/Switch_grad/cond_grad/Shape:0", shape=(1,), dtype=int32))

Second structure: type=IndexedSlices str=IndexedSlices(indices=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/getNegativeLosses/GatherNd_grad/Squeeze:0", shape=(?,), dtype=int32), values=IndexedSlices(indices=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/Merge_grad/cond_grad/Switch_1:1", shape=(?,), dtype=int32), values=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/Merge_grad/tuple/control_dependency_1:0", shape=(?,), dtype=float32), dense_shape=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/Merge_grad/cond_grad/Switch_2:1", shape=(1,), dtype=int32)), dense_shape=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/getNegativeLosses/GatherNd_grad/Shape:0", shape=(1,), dtype=int32))

Need help badly!

hi xdever,
I'm completely new to deep learning and TensorFlow, so my question may seem very basic.
I don't understand the sentence "Model path should be given without file extension (without .data* and .index)." Does it mean that I need my own trained model before running test.py?
When I run test.py, it fails with "path expected str, bytes or os.PathLike object, not NoneType".
I just want to run RFCN to get some results first. What should I do now?
I hope you can answer soon.

Anchors: how to choose anchors for a custom dataset?

Hello,
I've trained RFCN on my own dataset (4K images 💯) to detect some custom objects. The results are not half bad, but I need more precision, so I was wondering whether the selection of anchors plays an important role in the final precision, and if so, what method should be used to choose them?

Thank you !
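The anchors printed in the logs above ([64, 64], [90, 45], [45, 90], ...) are consistent with four base sizes (64, 128, 256, 512) and three width/height ratios (1, 2 and 1/2) at roughly constant area. The sketch below reproduces that list; it is a reconstruction from the printed values, not the repository's own code, and make_anchors is a hypothetical name. For a custom dataset, one common approach (an assumption here, not something this repository provides) is to pick base sizes and ratios that cover the widths, heights and aspect ratios of your ground-truth boxes.

```python
import math


def make_anchors(base_sizes, aspect_ratios):
    """Return [width, height] anchors with width * height ~= size**2 and
    width / height == ratio, truncated to integers."""
    anchors = []
    for size in base_sizes:
        for ratio in aspect_ratios:
            width = int(size * math.sqrt(ratio))
            height = int(size / math.sqrt(ratio))
            anchors.append([width, height])
    return anchors
```

make_anchors([64, 128, 256, 512], [1.0, 2.0, 0.5]) yields the same twelve anchors as the "Anchors:" line in the logs.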

Result on coco

Hello,
Did you test your code on the COCO test or COCO val dataset? What performance did you get?

cannot open shared object file

Traceback (most recent call last):
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/ROIPooling/ROIPoolingWrapper.py", line 21, in <module>
roiPoolingModule = tf.load_op_library("BoxEngine/ROIPooling/roi_pooling.so")
File "/home/vvk/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/home/vvk/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: BoxEngine/ROIPooling/roi_pooling.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test.py", line 24, in <module>
from BoxInceptionResnet import BoxInceptionResnet
File "/home/vvk/Music/RFCN-tensorflow-master/BoxInceptionResnet.py", line 23, in <module>
from BoxEngine.BoxNetwork import BoxNetwork
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/__init__.py", line 1, in <module>
from BoxEngine.BoxNetwork import *
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/BoxNetwork.py", line 17, in <module>
from BoxEngine.BoxRefinementNetwork import BoxRefinementNetwork
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/BoxRefinementNetwork.py", line 19, in <module>
from BoxEngine.ROIPooling import positionSensitiveRoiPooling
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/ROIPooling/__init__.py", line 2, in <module>
from .ROIPoolingWrapper import *
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/ROIPooling/ROIPoolingWrapper.py", line 23, in <module>
roiPoolingModule = tf.load_op_library("./roi_pooling.so")
File "/home/vvk/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/home/vvk/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./roi_pooling.so: cannot open shared object file: No such file or directory
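
This NotFoundError means roi_pooling.so does not exist at the paths tried; per the README, running make in the project root should build it. Both load attempts in the traceback use paths relative to the current working directory, so launching from another directory can also fail. A sketch of resolving the library next to the wrapper module instead, with a clearer error; find_op_library is a hypothetical helper, not existing repository code.

```python
import os


def find_op_library(name, search_dirs, exists=os.path.isfile):
    """Return the absolute path of the first `name` found in `search_dirs`.

    Raises IOError with a build hint instead of TensorFlow's bare NotFoundError.
    """
    for directory in search_dirs:
        path = os.path.join(directory, name)
        if exists(path):
            return os.path.abspath(path)
    raise IOError("%s not found in %s; run make in the project root first"
                  % (name, ", ".join(search_dirs)))
```

Inside ROIPoolingWrapper.py this could be used as tf.load_op_library(find_op_library("roi_pooling.so", [os.path.dirname(os.path.abspath(__file__))])), which no longer depends on where python was started.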

Why so few results?

Although I lowered the confidence threshold, there are only a few results. Why is that? I need the less confident detections to compute average precision, etc.

Training parameters

You say that you change trainFrom after 100k-200k iterations. I trained my model for 110k iterations and then stopped it with Ctrl+C. Then I changed trainFrom in main.py, but it seems that I cannot change the parameters and trainFrom is still -1. How can I change trainFrom?
I also use MobileNet instead of InceptionResnet; MobileNet has 1024 output channels instead of 1536. Should I change it to 1536?
I also don't know how to write logs for TensorBoard. I see Utils/Summary.py; how should I use it?
I am very sorry to bother you, but I sincerely hope for your help. Thanks!

Image for readme

Where is the file pycocotools/_mask.py?

Excuse me, how can I import Dataset.coco.pycocotools._mask as _mask in the file
RFCN-tensorflow/Dataset/coco/pycocotools/mask.py?

Training on my own vehicle dataset

After training (vehicle and background only), the mAP was only 0.773. During training, the parameters were unchanged except for the number of categories (set to 1), and I used the pretrained model (inception_resnet_v2_2016_08_30.ckpt). Can you tell me why?

Evaluating a model

Hello, is there code to evaluate a trained model using COCO metrics?
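
One standard route, sketched here without claiming the repository ships it: write detections in the COCO results format (a JSON list of objects with image_id, category_id, bbox as [x, y, width, height], and score) and score them with pycocotools' COCOeval. The sketch assumes the network's boxes are corner coordinates (x1, y1, x2, y2) in original-image pixels, and that class indices have already been mapped back to COCO category ids, which are not contiguous (they run from 1 to 90 with gaps). Both assumptions should be checked against the code. to_coco_results, save_results and evaluate_bbox are hypothetical names.

```python
import json


def to_coco_results(image_id, boxes, scores, category_ids):
    """Convert corner-format boxes (x1, y1, x2, y2) into COCO result dicts."""
    results = []
    for (x1, y1, x2, y2), score, cat in zip(boxes, scores, category_ids):
        results.append({
            "image_id": int(image_id),
            "category_id": int(cat),
            # COCO expects [x, y, width, height] in pixels.
            "bbox": [float(x1), float(y1), float(x2 - x1), float(y2 - y1)],
            "score": float(score),
        })
    return results


def save_results(results, path):
    """Write the collected result dicts for all images to a JSON file."""
    with open(path, "w") as f:
        json.dump(results, f)


def evaluate_bbox(annotation_file, results_file):
    """Print the standard COCO bbox metrics and return COCOeval.stats."""
    # Imported lazily so the conversion helpers work without pycocotools.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    ground_truth = COCO(annotation_file)
    detections = ground_truth.loadRes(results_file)
    evaluator = COCOeval(ground_truth, detections, "bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()
    return evaluator.stats
```

Results from every validation image would be concatenated, saved once, and then scored against the matching annotation file (for val2014, instances_val2014.json). Keeping low-score detections in the file matters, since AP is computed over the whole ranked list.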

Validation results on the COCO val2014 dataset were poor, with mAP below 10%.

Hello, I would like to ask how many iterations the model you provided was trained for. Has it undergone hyperparameter optimization? After training for 250,000 iterations, why do I get only about 8% mAP on COCO val2014?

Windows version

Could you please provide a Windows version?

TensorFlow 2

Hello,

I am a beginner trying to run this with TensorFlow. Does this model work on TensorFlow 2? If not, is there anywhere I can find an implementation for TensorFlow 2?

All the best,
Than

How can we adapt nonlocal variables to Python 2.7?

In InceptionResnetV2.py, there are two nonlocal variables called trainBnEntered and currBlock. In Python 2.7, the nonlocal keyword does not exist. When I comment them out, fine-tuning earlier layers seems to become problematic. How should we fix the code to make it work with Python 2.7?
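
The usual Python 2.7 substitute for nonlocal is to keep the shared state in a mutable object (a dict, a one-element list, or an object attribute) and mutate it instead of rebinding the name. Simply commenting the declarations out makes the inner assignments create new local variables, so the outer values never change, which could explain the broken fine-tuning. A sketch of the pattern: the variable names come from the issue, but the logic around them is hypothetical, not the code in InceptionResnetV2.py.

```python
def make_block_tracker():
    """Return a callback that updates shared state without using `nonlocal`.

    Works on Python 2.7 and 3: the closure mutates a dict instead of
    rebinding outer variables, so no `nonlocal` declaration is needed.
    """
    state = {"trainBnEntered": False, "currBlock": 0}

    def enter_block(train_bn):
        # Hypothetical logic: count blocks and remember whether a block with
        # trainable batch norm has been entered yet.
        state["currBlock"] += 1
        if train_bn:
            state["trainBnEntered"] = True
        return state["currBlock"], state["trainBnEntered"]

    return enter_block
```

Applied to the real file, each read or write of trainBnEntered and currBlock inside the nested function would become state["trainBnEntered"] and state["currBlock"].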

Multi-GPU training

Hello Xdever! I would like to train the network on multiple GPUs, but with this version, when I make several GPUs visible (e.g. GPU 0 and 1), training is not accelerated even though both GPUs are fully occupied.
How can multi-GPU training be achieved? Thanks.

ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory

I'm using the pretrained model mentioned in the README, but TensorFlow doesn't seem to be happy.

>> python test.py -n ./Models/example1/ -i TestImages/71d002a9-5e0d-4e91-844e-0f85ce18323d.jpg -o test.jpg
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Training network from end
('Anchors: ', [[64, 64], [90, 45], [45, 90], [128, 128], [181, 90], [90, 181], [256, 256], [362, 181], [181, 362], [512, 512], [724, 362], [362, 724]])
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.683
pciBusID 0000:04:00.0
Total memory: 7.92GiB
Free memory: 7.84GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:04:00.0)
Resuming ./Models/example1/
Traceback (most recent call last):
  File "test.py", line 80, in <module>
    if not CheckpointLoader.loadCheckpoint(sess, None, opt.n, ignoreVarsInFileNotInSess=True):
  File "xxx/RFCN-tensorflow/Utils/CheckpointLoader.py", line 67, in loadCheckpoint
    varsToRead, loadedVars = getCheckpointVarList(last)
  File "xxx/RFCN-tensorflow/Utils/CheckpointLoader.py", line 20, in getCheckpointVarList
    reader = tf.contrib.framework.load_checkpoint(file)
  File "xxx/rfcn-tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/framework/checkpoint_utils.py", line 62, in load_checkpoint
    "given directory %s" % filepattern)
ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory ./Models/example1/

These are my pip dependencies:

backports.weakref==1.0rc1
bleach==1.5.0
Cython==0.26.1
funcsigs==1.0.2
html5lib==0.9999999
Markdown==2.2.0
mock==2.0.0
numpy==1.13.1
opencv-python==3.3.0.10
pbr==3.1.1
protobuf==3.4.0
six==1.11.0
tensorflow-gpu==1.0.0
tensorflow-tensorboard==0.1.6
Werkzeug==0.12.2

and I'm running on Ubuntu 16.04. Any ideas why this might be the case? I suspect it has something to do with the model.

NMS on GPU

I found that the NMS operations run on the CPU. Is there any way to switch them to the GPU?

Why does it not run with CUDA 9?

Error when training with coco2014: InvalidArgumentError (see above for traceback): Incompatible shapes: [9645,1] vs. [9988,2]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [9645,1] vs. [9988,2]
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/ExpandDims, RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg:1)]]
[[Node: cond/getRefinementLoss/cond/getPosLoss/classRefinementLoss/getBoxScores/roiMean/positionSensitiveRoiPooling/imgCoordinatesToHeatmapCoordinates/Cast/_1923 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4910_...nates/Cast", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

This problem has been bothering me for a long time; thanks for any help.

How do I train on my own data?

Where are the steps?

Incompatible return types of true_fn and false_fn

Dataset/coco2014
loading annotations into memory...
Done (t=12.48s)
creating index...
index created!
Loaded 82783 COCO images
Dataset/coco2014
loading annotations into memory...
Done (t=8.34s)
creating index...
index created!
Loaded 40504 COCO images
Traceback (most recent call last):
  File "./main.py", line 89, in <module>
    images, boxes, classes = Augment.augment(*dataset.get())
  File "/home/hp03/liyahui/runed_ok_RFCN-tensorflow/Dataset/Augment.py", line 59, in augment
    image, boxes = mirror(image, boxes)
  File "/home/hp03/liyahui/runed_ok_RFCN-tensorflow/Dataset/Augment.py", line 52, in mirror
    return tf.cond(uniform_random < 0.5, lambda: tf.tuple([image, boxes]), lambda: doMirror(image, boxes))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 296, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1843, in cond
    "Incompatible return types of true_fn and false_fn: {}".format(e))
TypeError: Incompatible return types of true_fn and false_fn: The two structures don't have the same sequence type. First structure has type <type 'list'>, while second structure has type <type 'tuple'>.

Please tell me what I should do. Thanks!

batch_size larger than 1

How can the repo most easily be modified to have minibatch sizes larger than 1 image? I am willing to put some time into it, but need a little direction.

Error running make

When I run make, I get this error:

make -C BoxEngine/ROIPooling all
make[1]: Entering directory '/content/drive/My Drive/RFCN-tensorflow/BoxEngine/ROIPooling'
g++ -std=c++11 -O2 -D GOOGLE_CUDA=1 -c -o roi_pooling.o roi_pooling.cc -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -fPIC -O2
g++ -std=c++11 -O2 -D GOOGLE_CUDA=1 -c -o roi_pooling_grad.o roi_pooling_grad.cc -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -fPIC -O2
/usr/local/cuda/bin/nvcc -std=c++11 -c -o roi_pooling.cu.o roi_pooling.cu.cc -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_37 -use_fast_math
/usr/local/cuda/bin/nvcc -std=c++11 -c -o roi_pooling_grad.cu.o roi_pooling_grad.cu.cc -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_37 -use_fast_math
g++ -std=c++11 -O2 -D GOOGLE_CUDA=1 -shared -o roi_pooling.so roi_pooling.o roi_pooling_grad.o roi_pooling.cu.o roi_pooling_grad.cu.o -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -fPIC -O2 -L /usr/local/cuda/targets/x86_64-linux/lib -L /usr/local/cuda/lib64/ -L /usr/local/cuda/extras/CUPTI/lib64/ -lcudart -L /usr/local/lib/python3.6/dist-packages/tensorflow_core -ltensorflow_framework
/usr/bin/ld: cannot find -ltensorflow_framework
collect2: error: ld returned 1 exit status
Makefile:32: recipe for target 'roi_pooling.so' failed
make[1]: *** [roi_pooling.so] Error 1
make[1]: Leaving directory '/content/drive/My Drive/RFCN-tensorflow/BoxEngine/ROIPooling'
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 2
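
The linker cannot find -ltensorflow_framework because recent pip wheels (TensorFlow 1.15, which uses the tensorflow_core directory seen in the log, and 2.x) typically ship the library only as a versioned file such as libtensorflow_framework.so.1 or libtensorflow_framework.so.2, without an unversioned libtensorflow_framework.so. GNU ld can link an exact file name with -l:<file>. A sketch that picks the right flag from a directory listing; framework_link_flags is a hypothetical helper, and in TensorFlow versions that provide it, tf.sysconfig.get_link_flags() should report equivalent flags directly.

```python
import re

_VERSIONED = re.compile(r"^libtensorflow_framework\.so\.(\d+)$")


def framework_link_flags(tf_lib_dir, filenames):
    """Linker flags for libtensorflow_framework in `tf_lib_dir`.

    `filenames` is the directory listing, e.g. os.listdir(tf_lib_dir).
    """
    if "libtensorflow_framework.so" in filenames:
        # Unversioned file present: the classic -l flag works.
        return ["-L" + tf_lib_dir, "-ltensorflow_framework"]
    versioned = []
    for name in filenames:
        match = _VERSIONED.match(name)
        if match:
            versioned.append((int(match.group(1)), name))
    if not versioned:
        raise ValueError("no libtensorflow_framework found in " + tf_lib_dir)
    # Use the highest version; GNU ld's -l:<file> links that exact file name.
    _, name = max(versioned)
    return ["-L" + tf_lib_dir, "-l:" + name]
```

With d = tf.sysconfig.get_lib(), framework_link_flags(d, os.listdir(d)) gives the flags that could replace the -L .../tensorflow_core -ltensorflow_framework part of the link line in BoxEngine/ROIPooling/Makefile.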

It should be USE_OLD_EABI=1, instead of USE_OLD_EABI=0

You can find this in the Makefile:
ifeq (${USE_OLD_EABI}, 1)
EABIFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0
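
Per the Makefile excerpt above, USE_OLD_EABI=1 adds -D_GLIBCXX_USE_CXX11_ABI=0, so the right value depends on how the installed TensorFlow itself was compiled. Rather than guessing, the flag can be read from TensorFlow. The sketch assumes a TensorFlow version that provides tf.sysconfig.get_compile_flags(); use_old_eabi is a hypothetical helper. For reference, an undefined symbol containing B5cxx11, as in the README's example error, is the mangling of a function compiled with the new C++11 ABI, which suggests the installed TensorFlow uses the old one.

```python
ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI="


def use_old_eabi(compile_flags):
    """Map TensorFlow's compile flags to the Makefile's USE_OLD_EABI value.

    USE_OLD_EABI=1 makes the Makefile add -D_GLIBCXX_USE_CXX11_ABI=0, so it
    should be 1 exactly when TensorFlow itself was built with that flag.
    """
    for flag in compile_flags:
        if flag.startswith(ABI_FLAG):
            return 1 if flag[len(ABI_FLAG):] == "0" else 0
    # No explicit flag: assume the compiler default (the new ABI on GCC 5+).
    return 0
```

For example, print(use_old_eabi(tf.sysconfig.get_compile_flags())) followed by make clean and USE_OLD_EABI=<printed value> make.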

What is the profile argument?

What does the profile argument mean?
It is the break condition for the training loop, and I don't understand it.

OHEM

Have you tried OHEM? I tried it with several different parameter settings and found that it seems to have no effect. I don't know whether my code is wrong or it really has no effect.
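
For reference, the core of OHEM (online hard example mining) is to compute the loss for every ROI in a forward pass and back-propagate only through the highest-loss ones. A framework-free sketch of that selection step, assuming a flat list of per-ROI losses; the original method also applies NMS to the ROIs before selecting, which is omitted here. In TensorFlow the same selection can be done on a loss tensor with tf.nn.top_k. ohem_select and ohem_loss are hypothetical names, not code from this repository.

```python
def ohem_select(losses, num_hard):
    """Indices of the num_hard examples with the highest loss, in original order."""
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:num_hard])


def ohem_loss(losses, num_hard):
    """Mean loss over the selected hard examples only (0.0 if none)."""
    keep = ohem_select(losses, num_hard)
    if not keep:
        return 0.0
    return sum(losses[i] for i in keep) / float(len(keep))
```

If num_hard is close to the total number of ROIs, the selection keeps nearly everything and OHEM behaves like plain training, which is one thing worth checking when it appears to have no effect.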

What is the purpose of offset?

In
https://github.com/xdever/RFCN-tensorflow/blob/master/BoxEngine/RPN.py#L54
https://github.com/xdever/RFCN-tensorflow/blob/master/BoxEngine/BoxRefinementNetwork.py#L47
https://github.com/xdever/RFCN-tensorflow/blob/master/BoxEngine/ROIPooling/ROIPoolingWrapper.py#L31

you pass a mysterious offset=[32,32]. What is the intention of this offset?

I got this problem when running the network.

@xdever
Hi, I got this problem. Can you tell me how to close the queue?

W tensorflow/core/kernels/queue_base.cc:294] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
*** Error in `python': free(): invalid pointer: 0x0000000001361c80 ***

TypeError while running test.py

I ran the command below:
CUDA_VISIBLE_DEVICES= python test.py -n /data1/export/model -i /data/coco/images/train2014/COCO_train2014_000000144590.jpg -o result.jpg

and got a TypeError:
TypeError: Incompatible return types of true_fn and false_fn: The two structures don't have the same sequence type. First structure has type <class 'tensorflow.python.ops.gen_nn_ops.TopKV2'>, while second structure has type <type 'list'>.

How can I solve this problem?

make error on OS X - Darwin Kernel Version 16.3.0 RELEASE_X86_64 x86_64

Undefined symbols for architecture x86_64:
"tensorflow::DEVICE_CPU", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::TensorShape::DestructorOutOfLine()", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::TensorShape::TensorShape(tensorflow::gtl::ArraySlice)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper const&)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpDefBuilder::Input(tensorflow::StringPiece)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpDefBuilder::Output(tensorflow::StringPiece)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpDefBuilder::OpDefBuilder(tensorflow::StringPiece)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::kernel_factory::OpKernelRegistrar::InitInternal(tensorflow::KernelDef const*, tensorflow::StringPiece, tensorflow::OpKernel* (*)(tensorflow::OpKernelConstruction*))", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpKernelContext::CtxFailure(tensorflow::Status)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::OpKernelContext::allocate_output(int, tensorflow::TensorShape const&, tensorflow::Tensor**)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::OpKernelContext::CtxFailureWithWarning(tensorflow::Status)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::OpKernelContext::input(int)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::KernelDefBuilder::Device(char const*)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::KernelDefBuilder::KernelDefBuilder(char const*)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpDef::~OpDef()", referenced from:
tensorflow::OpDefBuilder::~OpDefBuilder() in roi_pooling.o
tensorflow::OpDefBuilder::~OpDefBuilder() in roi_pooling_grad.o
"tensorflow::Status::Status(tensorflow::error::Code, tensorflow::StringPiece)", referenced from:
tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*) in roi_pooling.o
tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*) in roi_pooling_grad.o
"tensorflow::strings::StrCat(tensorflow::strings::AlphaNum const&)", referenced from:
tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*) in roi_pooling.o
tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*) in roi_pooling_grad.o
"tensorflow::OpKernel::OpKernel(tensorflow::OpKernelConstruction*)", referenced from:
$_0::__invoke(tensorflow::OpKernelConstruction*) in roi_pooling.o
$_0::__invoke(tensorflow::OpKernelConstruction*) in roi_pooling_grad.o
"tensorflow::OpKernel::~OpKernel()", referenced from:
PosRoiPoolingCpu::~PosRoiPoolingCpu() in roi_pooling.o
PosRoiPoolingCpu::~PosRoiPoolingCpu() in roi_pooling.o
PosRoiPoolingCpuGrad::~PosRoiPoolingCpuGrad() in roi_pooling_grad.o
PosRoiPoolingCpuGrad::~PosRoiPoolingCpuGrad() in roi_pooling_grad.o
"tensorflow::internal::LogMessageFatal::LogMessageFatal(char const*, int)", referenced from:
tensorflow::TensorShape::dims() const in roi_pooling.o
tensorflow::KernelDefBuilder::~KernelDefBuilder() in roi_pooling.o
tensorflow::TensorShape::dims() const in roi_pooling_grad.o
tensorflow::KernelDefBuilder::~KernelDefBuilder() in roi_pooling_grad.o
"tensorflow::internal::LogMessageFatal::~LogMessageFatal()", referenced from:
tensorflow::TensorShape::dims() const in roi_pooling.o
tensorflow::KernelDefBuilder::~KernelDefBuilder() in roi_pooling.o
tensorflow::TensorShape::dims() const in roi_pooling_grad.o
tensorflow::KernelDefBuilder::~KernelDefBuilder() in roi_pooling_grad.o
"tensorflow::TensorShape::dim_size(int) const", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::Tensor::tensor_data() const", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"typeinfo for tensorflow::OpKernel", referenced from:
typeinfo for PosRoiPoolingCpu in roi_pooling.o
typeinfo for PosRoiPoolingCpuGrad in roi_pooling_grad.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
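
These undefined symbols are all TensorFlow internals, so the linker had no TensorFlow library it could resolve them against. TensorFlow's guide to building custom ops notes that on macOS the extra linker flag -undefined dynamic_lookup is required when building the .so, which defers resolution until the library is loaded into the Python process. A sketch of choosing such platform-specific flags; platform_link_flags is hypothetical, and wiring it into BoxEngine/ROIPooling/Makefile is left open.

```python
import sys


def platform_link_flags(platform=None):
    """Extra linker flags for a TensorFlow custom-op shared library."""
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        # Leave TensorFlow symbols unresolved at link time; they are looked up
        # when tf.load_op_library loads the .so into the Python process.
        return ["-undefined", "dynamic_lookup"]
    return []
```

On Linux the function returns no extra flags, so the existing link line there is unchanged.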

custom op roi_pooling.so error with tf 1.4

When running the code (thank you!), tf.load_op_library in ROIPoolingWrapper.py fails with the error "roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE".

It loads in TF 1.3, but there are other issues there (fixed in TF 1.4) that prevent the code from running.