robertcsordas / rfcn-tensorflow

RFCN implementation in TensorFlow

Python 77.59% Makefile 1.27% C++ 15.64% C 5.50%

tensorflow deep-learning object-detection rfcn machine-learning

rfcn-tensorflow's Issues

ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory

I'm using the pre-trained model mentioned in the README, but TensorFlow doesn't seem to be happy with it.

>> python test.py -n ./Models/example1/ -i TestImages/71d002a9-5e0d-4e91-844e-0f85ce18323d.jpg -o test.jpg
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Training network from end
('Anchors: ', [[64, 64], [90, 45], [45, 90], [128, 128], [181, 90], [90, 181], [256, 256], [362, 181], [181, 362], [512, 512], [724, 362], [362, 724]])
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.683
pciBusID 0000:04:00.0
Total memory: 7.92GiB
Free memory: 7.84GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:04:00.0)
Resuming ./Models/example1/
Traceback (most recent call last):
  File "test.py", line 80, in <module>
    if not CheckpointLoader.loadCheckpoint(sess, None, opt.n, ignoreVarsInFileNotInSess=True):
  File "xxx/RFCN-tensorflow/Utils/CheckpointLoader.py", line 67, in loadCheckpoint
    varsToRead, loadedVars = getCheckpointVarList(last)
  File "xxx/RFCN-tensorflow/Utils/CheckpointLoader.py", line 20, in getCheckpointVarList
    reader = tf.contrib.framework.load_checkpoint(file)
  File "xxx/rfcn-tensorflow/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/framework/checkpoint_utils.py", line 62, in load_checkpoint
    "given directory %s" % filepattern)
ValueError: Couldn't find 'checkpoint' file or checkpoints in given directory ./Models/example1/

These are my pip dependencies:

backports.weakref==1.0rc1
bleach==1.5.0
Cython==0.26.1
funcsigs==1.0.2
html5lib==0.9999999
Markdown==2.2.0
mock==2.0.0
numpy==1.13.1
opencv-python==3.3.0.10
pbr==3.1.1
protobuf==3.4.0
six==1.11.0
tensorflow-gpu==1.0.0
tensorflow-tensorboard==0.1.6
Werkzeug==0.12.2

and I'm running on Ubuntu 16.04. Any ideas why this might be the case? I suspect it has something to do with the model.
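A quick way to narrow this down is to look at what the model directory actually contains before TensorFlow tries to load it. The sketch below is TensorFlow-free and assumes the usual TensorFlow 1.x checkpoint layout (a `checkpoint` state file next to `*.index` and `*.data-*` files, or older single-file `*.ckpt` checkpoints). The helper name `describe_checkpoint_dir` is hypothetical and not part of this repository.

```python
import os


def describe_checkpoint_dir(path):
    """Report which checkpoint-related files exist in `path`.

    Assumes the common TensorFlow 1.x file layout; not taken from this repository.
    """
    is_dir = os.path.isdir(path)
    names = sorted(os.listdir(path)) if is_dir else []
    return {
        "exists": is_dir,
        # Small text file that records the name of the latest checkpoint.
        "has_state_file": "checkpoint" in names,
        # V2-format checkpoints store an index file next to the data shards.
        "index_files": [n for n in names if n.endswith(".index")],
        # V1-format checkpoints were single files, often named *.ckpt.
        "ckpt_files": [n for n in names if n.endswith(".ckpt")],
    }
```

Calling `describe_checkpoint_dir("./Models/example1/")` and finding `exists` false, or no state file and no index or ckpt files, would suggest the archive was not extracted into the directory passed with `-n`.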

Windows version

Could you please provide a Windows version?

Hello Xdever!
test.py works fine, but when I tried main.py I got errors.
I downloaded MS COCO 2014 and extracted it, then ran the following command:
python main.py -dataset /home/hfl/RFCN-tensorflow-master-test1/COCO -name /home/hfl/RFCN-tensorflow-master-test1/export2
The errors look like this:

WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/weights/Adam" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/weights/Adam_1" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/BatchNorm/beta/Adam" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/BatchNorm/beta/Adam_1" not found in file to load.
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Logits/biases
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Logits/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/moving_variance
WARNING: Unused variable: InceptionResnetV2/Logits/Logits/biases
WARNING: Unused variable: InceptionResnetV2/Logits/Logits/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/moving_mean
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_variance
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean
WARNING: Unused variable: global_step
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta
Done.
BoxLoader: Loaded 123287 files.
2018-03-16 21:19:23.107145: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107192: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107238: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107249: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "main.py", line 157, in
res = runManager.modRun(i)
File "/home/hfl/RFCN-tensorflow-master-test1/Utils/RunManager.py", line 97, in modRun
return self.runAndMerge(feed_dict, options=options if options is not None else self.options, run_metadata=run_metadata if run_metadata is not None else self.run_metadata)
File "/home/hfl/RFCN-tensorflow-master-test1/Utils/RunManager.py", line 71, in runAndMerge
res = self.sess.run(self.inputTensors, feed_dict=feed_dict, options=options, run_metadata=run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 9489 values, but the requested shape has 12356
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
[[Node: optimizer/gradients/cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/PosRoiPooling_grad/Shape/_2039 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6985_...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op u'optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape', defined at:
File "main.py", line 119, in
trainOp=createUpdateOp()
File "main.py", line 106, in createUpdateOp
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 460, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in gradients
lambda: grad_fn(op, *out_grads))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 377, in _MaybeCompile
return grad_fn() # Exit early
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in
lambda: grad_fn(op, *out_grads))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 504, in _ReshapeGrad
return [array_ops.reshape(grad, array_ops.shape(op.inputs[0])), None]
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3903, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1650, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

...which was originally created as op u'RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2', defined at:
File "main.py", line 99, in
tf.losses.add_loss(net.getLoss(boxes, classes))
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/BoxNetwork.py", line 50, in getLoss
return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 176, in loss
return tf.cond(tf.shape(refBoxes)[0]>0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2018, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1868, in BuildCondBranch
original_result = fn()
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 176, in
return tf.cond(tf.shape(refBoxes)[0]>0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 163, in calcLoss
positiveLosses, negativeLosses = calcAllLosses(inAnchros, inBoxes, inRawSizes, inScores, inBoxSizes)
File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 142, in calcAllLosses
classificationLoss = tf.nn.softmax_cross_entropy_with_logits(logits=scores, labels=refScores)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
return func(*args, **kwargs)
File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1960, in softmax_cross_entropy_with_logits
labels=labels, logits=logits, dim=dim, name=name)

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 9489 values, but the requested shape has 12356
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
[[Node: optimizer/gradients/cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/PosRoiPooling_grad/Shape/_2039 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6985_...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Result on coco

Hello,
Did you test your code on the COCO test or COCO val dataset? What performance did you get?

why too few results?

Although I lowered the confidence threshold, there are only a few results. Why is that? I need the less confident detections to compute average precision, etc.
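Average precision is computed over the full ranked list of detections, so a hard score cut-off removes the low-confidence tail that contributes recall. The following is a minimal sketch of a non-interpolated AP, assuming each detection has already been matched against the ground truth as a true or false positive. It is a simplification, not the official COCO evaluation, and `average_precision` is a hypothetical helper.

```python
def average_precision(scored_hits, num_ground_truth):
    """Non-interpolated average precision for one class.

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision measured at the rank of each true positive.
            precision_sum += true_positives / float(rank)
    # Missed objects contribute zero, so dropping detections lowers the result.
    return precision_sum / num_ground_truth
```

With four detections scored 0.9 (hit), 0.8 (miss), 0.4 (hit), 0.2 (hit) and three annotated objects, keeping only scores of at least 0.5 leaves a single true positive, so the result is lower than when the whole list is used.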

Incompatible return types of true_fn and false_fn

Dataset/coco2014
loading annotations into memory...
Done (t=12.48s)
creating index...
index created!
Loaded 82783 COCO images
Dataset/coco2014
loading annotations into memory...
Done (t=8.34s)
creating index...
index created!
Loaded 40504 COCO images
Traceback (most recent call last):
File "./main.py", line 89, in
images, boxes, classes = Augment.augment(*dataset.get())
File "/home/hp03/liyahui/runed_ok_RFCN-tensorflow/Dataset/Augment.py", line 59, in augment
image, boxes = mirror(image, boxes)
File "/home/hp03/liyahui/runed_ok_RFCN-tensorflow/Dataset/Augment.py", line 52, in mirror
return tf.cond(uniform_random < 0.5, lambda: tf.tuple([image, boxes]), lambda: doMirror(image, boxes))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 296, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1843, in cond
"Incompatible return types of true_fn and false_fn: {}".format(e))
TypeError: Incompatible return types of true_fn and false_fn: The two structures don't have the same sequence type. First structure has type <type 'list'>, while second structure has type <type 'tuple'>.

Please tell me what I should do. Thanks!
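The message says that one branch of `tf.cond` returns a list while the other returns a tuple. One possible workaround, sketched here under the assumption that both branches return the same number of tensors in the same order, is to normalize both results to a plain list. The wrapper `as_list_branch` is hypothetical and not part of this repository or of TensorFlow.

```python
def as_list_branch(fn):
    """Wrap a branch function so that sequence results are always plain lists."""
    def wrapped():
        result = fn()
        # Tuples and namedtuples become lists; lists are copied unchanged.
        if isinstance(result, (list, tuple)):
            return list(result)
        # Anything else (for example a single tensor) is passed through.
        return result
    return wrapped


# Hypothetical use inside Dataset/Augment.py (requires TensorFlow 1.x):
#   tf.cond(uniform_random < 0.5,
#           as_list_branch(lambda: tf.tuple([image, boxes])),
#           as_list_branch(lambda: doMirror(image, boxes)))
```

The same idea would apply to any other `tf.cond` call whose branches return different container types.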

how to compute val loss during training

Hi, how can I add the computation of validation loss during training?

Runtime TypeError in CPU version

Greetings,

Thank you for the code.

I am trying to compile a CPU-only version of the code and launch the ./test.py script. I removed the nvcc lines in BoxEngine/ROIPooling/Makefile and commented out the GPU-related lines in the source code. The .so compiles with no issues, but running ./test.py yields the following error:

~/RFCN-tensorflow$ ./test.py -n export/model -i ./test_imgs/1.jpg -o o.jpg
Training network from end
('Anchors: ', [[64, 64], [90, 45], [45, 90], [128, 128], [181, 90], [90, 181], [256, 256], [362, 181], [181, 362], [512, 512], [724, 362], [362, 724]])
Traceback (most recent call last):
  File "./test.py", line 51, in <module>
    net = BoxInceptionResnet(image, len(categories), name="boxnet")
  File "/home/username/RFCN-tensorflow/BoxInceptionResnet.py", line 58, in __init__
    BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
  File "/home/username/RFCN-tensorflow/BoxEngine/BoxNetwork.py", line 36, in __init__
    self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)
  File "/home/username/RFCN-tensorflow/BoxEngine/RPN.py", line 230, in getPositiveOutputs
    boxes, scores = self.filterOutputBoxes(self.boxes, self.scores, preNmsCount=preNmsCount, nmsThreshold=nmsThreshold, maxOutSize=maxOutSize)
  File "/home/username/RFCN-tensorflow/BoxEngine/RPN.py", line 221, in filterOutputBoxes
    scores, boxes = tf.cond(tf.shape(scores)[0] > preNmsCount , lambda: tf.tuple(MultiGather.gatherTopK(scores, preNmsCount, [boxes])), lambda: tf.tuple([scores, boxes]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1753, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(fn1)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1659, in BuildCondBranch
    original_result = fn()
  File "/home/username/RFCN-tensorflow/BoxEngine/RPN.py", line 221, in <lambda>
    scores, boxes = tf.cond(tf.shape(scores)[0] > preNmsCount , lambda: tf.tuple(MultiGather.gatherTopK(scores, preNmsCount, [boxes])), lambda: tf.tuple([scores, boxes]))
  File "/home/username/RFCN-tensorflow/Utils/MultiGather.py", line 30, in gatherTopK
    values, indices = tf.cond(isMoreThanK, lambda: tf.nn.top_k(t, k=k, sorted=sorted), lambda: tf.tuple([t, tf.zeros((0,1), tf.int32)]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1777, in cond
    "Incompatible return types of fn1 and fn2: {}".format(e))
TypeError: Incompatible return types of fn1 and fn2: The two structures don't have the same sequence type. First structure has type <class 'tensorflow.python.ops.gen_nn_ops.TopKV2'>, while second structure has type <type 'list'>.

Could you possibly suggest what could be the problem?
I am using Ubuntu 16.04, python 2.7, tensorflow 1.1.0.

How to train on my own data?

Where are the steps?

custom op roi_pooling.so error with tf 1.4

When running the code (thank you!) the tf.load_op_library in ROIPoolingWrapper.py yields the error, "roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE".

It loads in tf 1.3, but other issues (which are fixed in tf 1.4) prevent the code from running there.

Extract coordinates of the each boundary box

Hello,
Can I retrieve the coordinates (x, y, width, height) of the bounding box for each object and save them to a text file? If so, how can I extract them?

Thank you
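A minimal sketch of the conversion and export step, assuming the detector returns each box as corner coordinates (x0, y0, x1, y1) together with a score and a class label. That output format is an assumption, not something confirmed by the repository, and both function names are hypothetical.

```python
def corners_to_xywh(box):
    """Convert (x0, y0, x1, y1) corner coordinates to (x, y, width, height)."""
    x0, y0, x1, y1 = box
    return (x0, y0, x1 - x0, y1 - y0)


def save_boxes(path, boxes, scores, labels):
    """Write one detection per line: label score x y width height."""
    with open(path, "w") as f:
        for box, score, label in zip(boxes, scores, labels):
            x, y, w, h = corners_to_xywh(box)
            f.write("%s %.4f %.1f %.1f %.1f %.1f\n" % (label, score, x, y, w, h))
```

For example, `save_boxes("boxes.txt", [(10, 20, 110, 70)], [0.9], ["person"])` would write a single line describing a 100 by 50 box whose top-left corner is at (10, 20).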

why is not run in cuda9

I have some problems. Can you tell me why?

Training network from end
Traceback (most recent call last):
File "test.py", line 51, in
net = BoxInceptionResnet(image, len(categories), name="boxnet")
File "/home/vision/jsw/tf/RFCN-tensorflow-master/BoxInceptionResnet.py", line 43, in init
scale_16 = self.googleNet.getOutput("Repeat_1")[:,1:-1,1:-1,:]
File "/home/vision/jsw/tf/RFCN-tensorflow-master/InceptionResnetV2.py", line 275, in getOutput
return self.endPoints[name]
KeyError: 'Repeat_1'

make error in OS X - Darwin Kernel Version 16.3.0 RELEASE_X86_64 x86_64

Undefined symbols for architecture x86_64:
"tensorflow::DEVICE_CPU", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::TensorShape::DestructorOutOfLine()", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::TensorShape::TensorShape(tensorflow::gtl::ArraySlice)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper const&)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpDefBuilder::Input(tensorflow::StringPiece)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpDefBuilder::Output(tensorflow::StringPiece)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpDefBuilder::OpDefBuilder(tensorflow::StringPiece)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::kernel_factory::OpKernelRegistrar::InitInternal(tensorflow::KernelDef const*, tensorflow::StringPiece, tensorflow::OpKernel* ()(tensorflow::OpKernelConstruction))", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpKernelContext::CtxFailure(tensorflow::Status)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::OpKernelContext::allocate_output(int, tensorflow::TensorShape const&, tensorflow::Tensor**)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::OpKernelContext::CtxFailureWithWarning(tensorflow::Status)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::OpKernelContext::input(int)", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::KernelDefBuilder::Device(char const*)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::KernelDefBuilder::KernelDefBuilder(char const*)", referenced from:
__GLOBAL__sub_I_roi_pooling.cc in roi_pooling.o
__GLOBAL__sub_I_roi_pooling_grad.cc in roi_pooling_grad.o
"tensorflow::OpDef::~OpDef()", referenced from:
tensorflow::OpDefBuilder::~OpDefBuilder() in roi_pooling.o
tensorflow::OpDefBuilder::~OpDefBuilder() in roi_pooling_grad.o
"tensorflow::Status::Status(tensorflow::error::Code, tensorflow::StringPiece)", referenced from:
tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*) in roi_pooling.o
tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*) in roi_pooling_grad.o
"tensorflow::strings::StrCat(tensorflow::strings::AlphaNum const&)", referenced from:
tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*) in roi_pooling.o
tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*) in roi_pooling_grad.o
"tensorflow::OpKernel::OpKernel(tensorflow::OpKernelConstruction*)", referenced from:
$_0::__invoke(tensorflow::OpKernelConstruction*) in roi_pooling.o
$_0::__invoke(tensorflow::OpKernelConstruction*) in roi_pooling_grad.o
"tensorflow::OpKernel::~OpKernel()", referenced from:
PosRoiPoolingCpu::~PosRoiPoolingCpu() in roi_pooling.o
PosRoiPoolingCpu::~PosRoiPoolingCpu() in roi_pooling.o
PosRoiPoolingCpuGrad::~PosRoiPoolingCpuGrad() in roi_pooling_grad.o
PosRoiPoolingCpuGrad::~PosRoiPoolingCpuGrad() in roi_pooling_grad.o
"tensorflow::internal::LogMessageFatal::LogMessageFatal(char const*, int)", referenced from:
tensorflow::TensorShape::dims() const in roi_pooling.o
tensorflow::KernelDefBuilder::~KernelDefBuilder() in roi_pooling.o
tensorflow::TensorShape::dims() const in roi_pooling_grad.o
tensorflow::KernelDefBuilder::~KernelDefBuilder() in roi_pooling_grad.o
"tensorflow::internal::LogMessageFatal::~LogMessageFatal()", referenced from:
tensorflow::TensorShape::dims() const in roi_pooling.o
tensorflow::KernelDefBuilder::~KernelDefBuilder() in roi_pooling.o
tensorflow::TensorShape::dims() const in roi_pooling_grad.o
tensorflow::KernelDefBuilder::~KernelDefBuilder() in roi_pooling_grad.o
"tensorflow::TensorShape::dim_size(int) const", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"tensorflow::Tensor::tensor_data() const", referenced from:
ComputePosRoiPooling(tensorflow::OpKernelContext*, bool) in roi_pooling.o
ComputePosRoiPoolingGrad(tensorflow::OpKernelContext*, bool) in roi_pooling_grad.o
"typeinfo for tensorflow::OpKernel", referenced from:
typeinfo for PosRoiPoolingCpu in roi_pooling.o
typeinfo for PosRoiPoolingCpuGrad in roi_pooling_grad.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

It should be USE_OLD_EABI=1, instead of USE_OLD_EABI=0

You can find this in the Makefile:
ifeq (${USE_OLD_EABI}, 1)
EABIFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0
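Which value is right depends on the ABI the installed TensorFlow binary was built with. A small sketch of how to derive it from a list of compiler flags follows. It assumes a list in the form that `tf.sysconfig.get_compile_flags()` reports in recent TensorFlow 1.x releases; the helper names are hypothetical.

```python
def cxx11_abi_setting(compile_flags):
    """Return "0" or "1" if a -D_GLIBCXX_USE_CXX11_ABI flag is present, else None."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in compile_flags:
        if flag.startswith(prefix):
            return flag[len(prefix):]
    return None


def use_old_eabi(compile_flags):
    """Map the ABI flag to the Makefile switch: ABI=0 corresponds to USE_OLD_EABI=1."""
    return 1 if cxx11_abi_setting(compile_flags) == "0" else 0


# With TensorFlow installed, the flags could be obtained like this (assumption):
#   import tensorflow as tf
#   print(use_old_eabi(tf.sysconfig.get_compile_flags()))
```

If the reported flags contain `-D_GLIBCXX_USE_CXX11_ABI=0`, building with `USE_OLD_EABI=1` keeps the custom op on the same ABI as TensorFlow.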

Validation results on the COCO val 2014 data set were poor, with mAP less than 10%.

Hello, I would like to ask how many iterations the provided model was trained for. Has it undergone hyper-parameter optimization? After I trained for 250,000 iterations, why do I get only about 8% mAP on COCO val 2014?

Image for readme

No result for score lower than 0.5

Hi, I set the test threshold to 0.01, but there are only bounding boxes with scores higher than 0.5. Can I get results with scores lower than 0.5?
Thank you so much.

anchors .. how to choose anchors for custom dataset ?

Hello,
I've trained the RFCN on my own dataset (4K images 💯) to detect some custom objects. The results are not half bad, but I need more precision, so I was wondering whether the selection of anchors plays an important role in the final precision and, if so, what the method for choosing anchors is.

Thank you!
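The anchor list printed in the logs above (`[64, 64], [90, 45], [45, 90], [128, 128], ...`) is consistent with four base sizes combined with three aspect ratios at roughly constant area. The sketch below reproduces that list. It is a reconstruction from the printed values, not the repository's own code, and whether each pair means width/height or height/width is not stated in the logs.

```python
import math


def make_anchors(base_sizes=(64, 128, 256, 512), aspect_ratios=(1.0, 2.0, 0.5)):
    """Build anchor sizes from base sizes and aspect ratios.

    For every base size, the area stays close to size*size while the
    aspect ratio changes; values are truncated to integers.
    """
    anchors = []
    for size in base_sizes:
        for ratio in aspect_ratios:
            first = int(size * math.sqrt(ratio))
            second = int(size / math.sqrt(ratio))
            anchors.append([first, second])
    return anchors
```

For a custom dataset, one option is to choose `base_sizes` and `aspect_ratios` so that they cover the range of object sizes and shapes found in the annotations, for example `make_anchors(base_sizes=(32, 64, 128))` if the objects are small.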

Error make

When I run make, I get this error:

make -C BoxEngine/ROIPooling all
make[1]: Entering directory '/content/drive/My Drive/RFCN-tensorflow/BoxEngine/ROIPooling'
g++ -std=c++11 -O2 -D GOOGLE_CUDA=1 -c -o roi_pooling.o roi_pooling.cc -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -fPIC -O2
g++ -std=c++11 -O2 -D GOOGLE_CUDA=1 -c -o roi_pooling_grad.o roi_pooling_grad.cc -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -fPIC -O2
/usr/local/cuda/bin/nvcc -std=c++11 -c -o roi_pooling.cu.o roi_pooling.cu.cc -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_37 -use_fast_math
/usr/local/cuda/bin/nvcc -std=c++11 -c -o roi_pooling_grad.cu.o roi_pooling_grad.cu.cc -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_37 -use_fast_math
g++ -std=c++11 -O2 -D GOOGLE_CUDA=1 -shared -o roi_pooling.so roi_pooling.o roi_pooling_grad.o roi_pooling.cu.o roi_pooling_grad.cu.o -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include -I /usr/local/lib/python3.6/dist-packages/tensorflow_core/include/external/nsync/public -I /usr/local/cuda/include -fPIC -O2 -L /usr/local/cuda/targets/x86_64-linux/lib -L /usr/local/cuda/lib64/ -L /usr/local/cuda/extras/CUPTI/lib64/ -lcudart -L /usr/local/lib/python3.6/dist-packages/tensorflow_core -ltensorflow_framework
/usr/bin/ld: cannot find -ltensorflow_framework
collect2: error: ld returned 1 exit status
Makefile:32: recipe for target 'roi_pooling.so' failed
make[1]: *** [roi_pooling.so] Error 1
make[1]: Leaving directory '/content/drive/My Drive/RFCN-tensorflow/BoxEngine/ROIPooling'
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 2
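`cannot find -ltensorflow_framework` means the linker found no file named exactly `libtensorflow_framework.so` in the `-L` directories. Some TensorFlow pip packages appear to ship only a versioned file such as `libtensorflow_framework.so.1`; that this is the cause here is an assumption. The sketch below builds linker flags from whatever file is actually present. The helper name is hypothetical, and `tf.sysconfig.get_link_flags()` can be used to cross-check the result where it is available.

```python
import glob
import os


def framework_link_flags(tf_lib_dir):
    """Build linker flags for the libtensorflow_framework file found in tf_lib_dir."""
    pattern = os.path.join(tf_lib_dir, "libtensorflow_framework.so*")
    candidates = sorted(glob.glob(pattern))
    if not candidates:
        return None
    # The unversioned name sorts first, so it is preferred when present.
    name = os.path.basename(candidates[0])
    if name == "libtensorflow_framework.so":
        return ["-L" + tf_lib_dir, "-ltensorflow_framework"]
    # A versioned file is not matched by -ltensorflow_framework;
    # GNU ld accepts an exact file name through -l:NAME.
    return ["-L" + tf_lib_dir, "-l:" + name]
```

If the function returns a `-l:` flag, replacing `-ltensorflow_framework` in the Makefile with that flag may let the link step succeed.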

cannot open shared object file

Traceback (most recent call last):
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/ROIPooling/ROIPoolingWrapper.py", line 21, in
roiPoolingModule = tf.load_op_library("BoxEngine/ROIPooling/roi_pooling.so")
File "/home/vvk/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/home/vvk/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: BoxEngine/ROIPooling/roi_pooling.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test.py", line 24, in
from BoxInceptionResnet import BoxInceptionResnet
File "/home/vvk/Music/RFCN-tensorflow-master/BoxInceptionResnet.py", line 23, in
from BoxEngine.BoxNetwork import BoxNetwork
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/init.py", line 1, in
from BoxEngine.BoxNetwork import *
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/BoxNetwork.py", line 17, in
from BoxEngine.BoxRefinementNetwork import BoxRefinementNetwork
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/BoxRefinementNetwork.py", line 19, in
from BoxEngine.ROIPooling import positionSensitiveRoiPooling
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/ROIPooling/init.py", line 2, in
from .ROIPoolingWrapper import *
File "/home/vvk/Music/RFCN-tensorflow-master/BoxEngine/ROIPooling/ROIPoolingWrapper.py", line 23, in
roiPoolingModule = tf.load_op_library("./roi_pooling.so")
File "/home/vvk/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/home/vvk/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./roi_pooling.so: cannot open shared object file: No such file or directory

Incompatible shapes: [8873,1] vs. [9432,2]

I'm using TF 1.13. I would be glad to use 1.4, but it depends on CUDA 8 and I'm already on CUDA 10, which my other projects depend on. So going back to TF 1.4 / CUDA 8 is not an option, sorry.

Once I ignored the error mentioned in issue #40, the next show stopper is a message about incompatible shapes. Given fixed tensor sizes, I could probably have searched through the code and fixed it, but the problem is that the tensor size and dimensions change every time. It can be a plain vector with 25k versus 45k values, or a 2-D matrix with approximately [XXXX, 1] vs. [YYYY, 2], where YYYY is always slightly larger than XXXX and both can take values from 7k to 20k.

Here are the outputs from subsequent runs. Nothing changed in the data; I just ran rm -rf saved; ./model -dataset COCO, and the numbers keep changing. Could anyone explain the reason and how to fix this, please?

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 25896 values, but the requested shape has 44928
	 [[node optimizer/gradients/boxnet/Box/BoxNetwork/RPN/mergeBoxData/Reshape_grad/Reshape (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [8873,1] vs. [9432,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [12270,1] vs. [12948,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [8280,1] vs. [9988,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7359,1] vs. [8320,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [17058,1] vs. [18868,2]
	 [[node optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul (defined at ./main.py:105) ]]
	 [[optimizer/gradients/cond/getRefinementLoss/cond_1/getNetLoss/classRefinementLoss/getBoxScores/roiMean/Mean_grad/truediv/_2009]]

NMS in GPU

I find that the NMS operations run on the CPU. Is there any way to switch them to the GPU?

Where is the file pycocotools/_mask.py

Excuse me, how can I import Dataset.coco.pycocotools._mask as _mask in the file
RFCN-tensorflow/Dataset/coco/pycocotools/mask.py?

Error while running demo file -> KeyError: 'Repeat_1'

Hi @xdever,
While running the demo test.py, I get the error below. I am not able to understand what exactly the problem is.

anand@aicenter001:~/Desktop/anand/RFCN-tensorflow$ ./test.py -n pretrained/export/model.data-00000-of-00001 -i PEDS-CityTrafficB.jpg -o PEDS-CityTrafficB1.jpg

Training network from end
Traceback (most recent call last):
File "./test.py", line 51, in <module>
net = BoxInceptionResnet(image, len(categories), name="boxnet")
File "/home/anand/Desktop/anand/RFCN-tensorflow/BoxInceptionResnet.py", line 43, in __init__
scale_16 = self.googleNet.getOutput("Repeat_1")[:,1:-1,1:-1,:]
File "/home/anand/Desktop/anand/RFCN-tensorflow/InceptionResnetV2.py", line 277, in getOutput
return self.endPoints[name]
KeyError: 'Repeat_1'

Could you please help me understand the possible cause and how I can fix this?

Thank you.

How to understand offset?

I find that RPN and PSROIPooling both have an offset, and you set it to 32. I can't understand it. What is it used for and where does it come from?

ohem

Have you tried OHEM? I tried it and it seems to have no effect. I tried several different parameters. I don't know whether it is an error in my code or whether it really has no effect.

evaluate model

Hello, is there code to evaluate a trained model using COCO metrics?

Error when training with coco2014: InvalidArgumentError (see above for traceback): Incompatible shapes: [9645,1] vs. [9988,2]

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [9645,1] vs. [9988,2]
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg_grad/ExpandDims, RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg:1)]]
[[Node: cond/getRefinementLoss/cond/getPosLoss/classRefinementLoss/getBoxScores/roiMean/positionSensitiveRoiPooling/imgCoordinatesToHeatmapCoordinates/Cast/_1923 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4910_...nates/Cast", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

This problem has been bothering me for a long time. Thanks for any help.

I want to know how to train on my own dataset. I didn't see the code for training

How can we adapt nonlocal vars to Python 2.7?

In InceptionResnetV2.py, there are two nonlocal variables called trainBnEntered and currBlock. In Python 2.7, the keyword nonlocal is not defined. When I comment them out, fine-tuning the earlier layers seems to break. How should we change the code to make it work with Python 2.7?
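
One common Python 2.7 workaround is to keep the shared values in a mutable container: the inner function then mutates the container instead of rebinding a name from the enclosing scope, so `nonlocal` is not required. The sketch below only illustrates the pattern. `trainBnEntered` and `currBlock` are the names from the question, while `build_blocks`, `enter_block` and the surrounding logic are hypothetical and do not reproduce the actual code in InceptionResnetV2.py.

```python
def build_blocks(block_names, train_from):
    """Toy stand-in for a network builder; the structure is hypothetical."""
    # The dict is mutated in place, so the inner function never rebinds a
    # name from the enclosing scope and `nonlocal` is not needed.
    state = {"trainBnEntered": False, "currBlock": 0}
    trainable = []

    def enter_block(name):
        state["currBlock"] += 1
        if name == train_from:
            # From this block on, layers are marked as trainable.
            state["trainBnEntered"] = True
        trainable.append((name, state["trainBnEntered"]))

    for name in block_names:
        enter_block(name)
    return trainable, state["currBlock"]
```

The same code runs unchanged on Python 3, so the pattern does not require maintaining two versions of the file.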

Training Param

You say that you change trainFrom after 100k-200k iterations. I trained my model for 110k iterations and then shut it down with Ctrl+C. Then I changed trainFrom in main.py, but it seems that I cannot change the parameters and trainFrom is still -1. How can I change trainFrom?
I also use MobileNet instead of InceptionResnet; the MobileNet output has 1024 channels instead of 1536. Should I change it to 1536?
And I don't know how to get logs for use with TensorBoard. I see Utils/Summary.py. How should I use it?
I am very sorry to bother you, but I sincerely hope for your help. Thanks!

Tensorflow 2

Hello,

I am a beginner trying to run this on TensorFlow. Does this model work on TensorFlow 2? If not, is there anywhere I can find an implementation for TensorFlow 2?

All the best,
Than

Need help badly!!!!!!!!!!!!!

Hi xdever,
I'm completely new to deep learning and TensorFlow, so I guess my question may seem very stupid to you.
Well, I don't know the meaning of this sentence: "Model path should be given without file extension (without .data* and .index)." Does it mean that I need my own trained model before running test.py?
Because when I run test.py, it fails and says "path expected str, bytes or os.PathLike object, not NoneType".
Anyway, I just want to run the RFCN to get some results first. What should I do now?
I hope you can answer me quickly.
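
A TensorFlow checkpoint is stored as several files that share a common prefix (for example model.index and model.data-00000-of-00001), and the quoted README sentence asks for that shared prefix rather than the name of any single file. As an illustration only, the hypothetical helper below (not part of the repository) strips the two suffixes named in the README from a path:

```python
import re

# Matches the suffixes mentioned in the README: ".index" and ".data-*".
_CHECKPOINT_SUFFIX = re.compile(r"\.(index|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Return path without a trailing .index or .data-XXXXX-of-XXXXX suffix.

    Hypothetical helper for illustration; paths without such a suffix are
    returned unchanged.
    """
    return _CHECKPOINT_SUFFIX.sub("", path)
```

For instance, `checkpoint_prefix("pretrained/export/model.data-00000-of-00001")` gives `"pretrained/export/model"`, which is the form the `-n` option expects according to the README.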

batch_size larger than 1

How can the repo most easily be modified to have minibatch sizes larger than 1 image? I am willing to put some time into it, but need a little direction.

What is the purpose of offset?

in
https://github.com/xdever/RFCN-tensorflow/blob/master/BoxEngine/RPN.py#L54
https://github.com/xdever/RFCN-tensorflow/blob/master/BoxEngine/BoxRefinementNetwork.py#L47
https://github.com/xdever/RFCN-tensorflow/blob/master/BoxEngine/ROIPooling/ROIPoolingWrapper.py#L31

You pass a mysterious offset=[32,32]. What is the intention of this offset?

TypeError while running test.py

I run the command below
CUDA_VISIBLE_DEVICES= python test.py -n /data1/export/model -i /data/coco/images/train2014/COCO_train2014_000000144590.jpg -o result.jpg

I get this TypeError:
TypeError: Incompatible return types of true_fn and false_fn: The two structures don't have the same sequence type. First structure has type <class 'tensorflow.python.ops.gen_nn_ops.TopKV2'>, while second structure has type <type 'list'>.

How can I solve this problem?

Help! Why do we need to add GT boxes when calculating the refinement loss?

#Add GT boxes
posBoxes = tf.concat([posBoxes,refBoxes], 0)
posRefIndices = tf.concat([posRefIndices, tf.reshape(tf.range(tf.shape(refClasses)[0]), [-1,1])], 0)

Multi-GPUs training

Hello Xdever! I want to train the network on multiple GPUs, but in the current version, when I make several GPUs visible (such as GPU 0 and 1), training is not accelerated even though both GPUs are fully occupied.
How can I enable multi-GPU training? Thanks.

Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./inception_resnet_v2_2016_08_30.ckpt

tensorflow 1.7.0: tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 8218 values, but the requested shape has 9988

Done.
BoxLoader: Loaded 123287 files.
2018-07-09 01:22:09.707764: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-07-09 01:22:09.707820: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-07-09 01:22:09.707840: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-07-09 01:22:09.707848: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
status, run_metadata)
File "/home/bigtree/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 8218 values, but the requested shape has 9988
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
[[Node: cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/imgCoordinatesToHeatmapCoordinates/Cast/_1977 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5800_...nates/Cast", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

How to Log?

I see nothing in save/log.
Also, when I look at timeline.json, ROI pooling appears to run on the CPU?

train own vehicle dataset

After training on my dataset (only vehicle and background), the mAP was only 0.773. During training the parameters were unchanged, except for the categories (set to 1), and I used the pretrained model (inception_resnet_v2_2016_08_30.ckpt). Can you tell me why?

How to fine-tune

I want to do logo detection, so I just want to fine-tune the provided model http://xdever.engineerjs.com/rfcn-tensorflow-export.tar.bz2. In main.py, I can set the trainFrom parameter to train from a given layer, but how can I train the layers starting from the model I already have?

What is the profile argument

What does the profile argument mean?
It is the break condition for the training loop, and I don't understand it.

TF 1.13 says: "The two structures don't have the same nested structure."

Inference works perfectly with the pre-trained model.

However, when I try to train, I get the following error. It looks like things have changed from TF 1.4 to TF 1.13. I'd be glad if you could tell me how to fix this.

Traceback (most recent call last):
  File "./main.py", line 118, in <module>
    trainOp=createUpdateOp()
  File "./main.py", line 105, in createUpdateOp
    grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 664, in gradients
    unconnected_gradients)
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 965, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 420, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 965, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_grad.py", line 84, in _SwitchGrad
    return merge(grad, name="cond_grad")[0], None
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 406, in merge
    nest.assert_same_structure(inputs[0], v, expand_composites=True)
  File "/home/lenik/tfenv/local/lib/python2.7/site-packages/tensorflow/python/util/nest.py", line 249, in assert_same_structure
    % (str(e), str1, str2))
ValueError: The two structures don't have the same nested structure.

First structure: type=IndexedSlices str=IndexedSlices(indices=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/getNegativeLosses/GatherNd/Switch_grad/cond_grad/range:0", shape=(?,), dtype=int32), values=Tensor("optimizer/gradients/zeros_5:0", shape=(?,), dtype=float32), dense_shape=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/getNegativeLosses/GatherNd/Switch_grad/cond_grad/Shape:0", shape=(1,), dtype=int32))

Second structure: type=IndexedSlices str=IndexedSlices(indices=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/getNegativeLosses/GatherNd_grad/Squeeze:0", shape=(?,), dtype=int32), values=IndexedSlices(indices=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/Merge_grad/cond_grad/Switch_1:1", shape=(?,), dtype=int32), values=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/Merge_grad/tuple/control_dependency_1:0", shape=(?,), dtype=float32), dense_shape=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/Merge_grad/cond_grad/Switch_2:1", shape=(1,), dtype=int32)), dense_shape=Tensor("optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/cond_1/getNegativeLosses/GatherNd_grad/Shape:0", shape=(1,), dtype=int32))

Help! Error when running test.py

When I run test.py, I encounter this:

liyirong@liyirong-Alienware-17-R3:~/RFCN-tensorflow$ ./test.py
/home/liyirong/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Training network from end
Anchors: [[64, 64], [90, 45], [45, 90], [128, 128], [181, 90], [90, 181], [256, 256], [362, 181], [181, 362], [512, 512], [724, 362], [362, 724]]
Traceback (most recent call last):
File "./test.py", line 56, in <module>
input = PreviewIO.PreviewInput(opt.i)
File "/home/liyirong/RFCN-tensorflow/Utils/PreviewIO.py", line 17, in __init__
if os.path.isdir(self.path):
File "/home/liyirong/anaconda3/lib/python3.6/genericpath.py", line 42, in isdir
st = os.stat(s)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

I can't figure it out. Can anyone help?
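
In the trace above, test.py was started without any options, so `opt.i` is None by the time it reaches `os.path.isdir`. The sketch below shows one way a command-line front end could reject that case up front. It is a hypothetical stand-in written with argparse and is not the actual option parsing in test.py; only the option names `-n`, `-i` and `-o` are taken from the commands shown in these issues.

```python
import argparse


def parse_args(argv):
    """Parse options for a test.py-like script (hypothetical front end)."""
    parser = argparse.ArgumentParser(description="Run detection on an input")
    # required=True makes argparse exit with a usage message instead of
    # silently passing None further down the program.
    parser.add_argument("-n", required=True,
                        help="checkpoint path without file extension")
    parser.add_argument("-i", required=True,
                        help="input image or directory")
    parser.add_argument("-o", default="",
                        help="output file (optional in this sketch)")
    return parser.parse_args(argv)
```

With the script as it stands, passing both `-n` and `-i` on the command line, as in the other reports on this page, avoids the None path.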

I got this problem when running the network.

@xdever
Hi, I got this problem. Can you tell me how to close the queue?

W tensorflow/core/kernels/queue_base.cc:294] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
*** Error in `python': free(): invalid pointer: 0x0000000001361c80 ***
