
SegLink

Tips: A more recent scene text detection algorithm: PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link

Contents:

  1. Introduction
  2. Installation & requirements
  3. Datasets
  4. Problems
  5. Models
  6. Test your own images
  7. Training and evaluation
  8. Some Comments

Introduction

This is a re-implementation of the SegLink text detection algorithm described in the paper "Detecting Oriented Text in Natural Images by Linking Segments" by Baoguang Shi, Xiang Bai, and Serge Belongie.

Installation & requirements

  1. tensorflow-gpu 1.1.0

  2. cv2 (OpenCV). I'm using 2.4.9.1; other 2.x versions should be OK too. If not, try switching to the same version as mine.

  3. download the project pylib and add its src folder to your PYTHONPATH (see the sketch below)

If any other requirement is unmet, just install it following the error message.
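For a quick sanity check of the requirements above, the following sketch verifies the cv2 version and puts pylib on the import path from within Python (equivalent to extending PYTHONPATH in your shell). The ~/pylib location is an assumption; point it at your own clone.

import os
import sys

import cv2
print(cv2.__version__)  # expect a 2.x version, e.g. 2.4.9.1

# Same effect as: export PYTHONPATH=$PYTHONPATH:~/pylib/src
sys.path.insert(0, os.path.expanduser('~/pylib/src'))  # assumed clone location
import util  # pylib's top-level module should now import cleanly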

Datasets

  1. SynthText

  2. ICDAR2015

Convert them into the tfrecords format using the scripts in the datasets directory if you want to train your own model.
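If you want to sanity-check a converted file before training, here is a minimal sketch; the filename is hypothetical, and the feature keys follow datasets/dataset_utils.py (quoted in the issues further down this page).

import tensorflow as tf

path = 'icdar2015_train.tfrecord'  # hypothetical output of the conversion scripts
count = 0
for record in tf.python_io.tf_record_iterator(path):
    count += 1
print('%d records in %s' % (count, path))

example = tf.train.Example()
example.ParseFromString(record)  # parse the last record read above
print(example.features.feature['image/filename'])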

Problems

The convergence speed of my SegLink is quite slow compared with that described in the paper. For example, the authors report that a good result can be obtained by training on SynthText for fewer than 100k iterations and on IC15-train for fewer than 10k iterations. With my implementation, however, I have to train on SynthText for about 200k iterations and on IC15-train for more than another 100k iterations to get a competitive result.

Several reasons may contribute to the slow convergence of my model:

  1. Batch size. I don't have four 12 GB Titans for training, as described in the paper. Instead, I trained my model on two 8 GB GeForce GTX 1080s or two Titans.
  2. Learning rate. The paper uses 10^-3 followed by 10^-4, but I adopted a fixed learning rate of 10^-4 (see the schedule sketch below).
  3. Different initialization model. I used the pretrained VGG model from SSD-Caffe on COCO, because I thought it would be better than VGG trained on ImageNet. However, that view does not seem to hold.
  4. Maybe some other differences exist; I am not sure.

Models

Two models trained on SynthText and IC15-train can be downloaded:

  1. seglink-384. Trained using an image size of 384x384, the same as in the paper. Its Hmean is comparable to the result reported in the paper.

The hust_orientedText entry is the paper's result.

  2. seglink-512. Trained using an image size of 512x512; its Hmean is about one point better than the 384x384 model's.

Both models have been trained:

  • on SynthText for about 200k iterations, and on IC15-train for 100k~200k iterations

  • with a fixed learning rate of 10^-4

  • on two GPUs

  • 384: GTX 1080, batch_size = 24; 512: Titan, batch_size = 20

Both models perform best at seg_conf_threshold=0.8 and link_conf_threshold=0.5, which is another difference from the paper, where 0.9 and 0.7 are used respectively.

Test your own images

Use the script test_seglink.py; a shortcut has been created as scripts/test.sh.

Go to the seglink root directory and execute the command:


./scripts/test.sh GPU_ID CKPT_PATH DATASET_DIR

For example:


./scripts/test.sh 0 ~/models/seglink/model.ckpt-217867  ~/dataset/ICDAR2015/Challenge4/ch4_training_images

I have only tested my models on IC15-test, but any other images can be used for testing: just put your images into a directory and pass its path as DATASET_DIR.

A bunch of txt files and a zip file are created after testing. If you are using IC15-test, you can upload the zip file to the ICDAR evaluation server directly.

The txt files are placed in a subdirectory of the checkpoint directory; they contain the detected bounding boxes and can be visualized using the script visualize_detection_result.py.

The command looks like:


python visualize_detection_result.py \
    --image=<directory where your images are put> \
    --det=<directory of the txt files output by test_seglink.py> \
    --output=<output directory for detection results drawn on images>

For example:


python visualize_detection_result.py \
    --image=~/dataset/ICDAR2015/Challenge4/ch4_training_images/ \
    --det=~/models/seglink/seglink_icdar2015_without_ignored/eval/icdar2015_train/model.ckpt-72885/seg_link_conf_th_0.900000_0.700000/txt \
    --output=~/temp/no-use/seglink_result_512_train

Training and evaluation

Training requires converting the data into tfrecords first; the conversion scripts are in the datasets directory. The scripts train_seglink.py and eval_seglink.py are the training and evaluation scripts respectively. In particular, I have implemented an offline evaluation function that calculates Recall/Precision/Hmean in the same way as the ICDAR test server and can be used for cross-validation and grid search (see the sketch below). The resulting scores may differ slightly from those of the test server, but not by much. Sorry for the incomplete documentation here; read and modify these scripts if you want to train your own model.
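For example, a grid search over the two confidence thresholds can be driven as below. This is just a sketch: it assumes eval_seglink.py accepts the flags shown elsewhere on this page, and the checkpoint/tfrecords paths need to be replaced with your own.

import itertools
import subprocess

for seg_th, link_th in itertools.product([0.7, 0.8, 0.9], [0.4, 0.5, 0.6, 0.7]):
    subprocess.check_call([
        'python', 'eval_seglink.py',
        '--checkpoint_path=./seglink/model.ckpt-136750',  # use your checkpoint
        '--dataset_name=icdar2015',
        '--dataset_split_name=test',
        '--dataset_dir=./tf_records',
        '--seg_conf_threshold=%.1f' % seg_th,
        '--link_conf_threshold=%.1f' % link_th,
    ])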

Some Comments

Thanks should be given to the authors of the SegLink paper: Baoguang Shi, Xiang Bai, and Serge Belongie.

EAST is another text detection paper accepted by CVPR 2017, and its reported result is better than SegLink's. But when both use the same VGG16 backbone, their performance is quite similar.

Contact me through GitHub issues if you have any problems.

Some Notes On Implementation Detail

How the ground truth is calculated (in Chinese): http://fromwiz.com/share/s/34GeEW1RFx7x2iIM0z1ZXVvc2yLl5t2fTkEg2ZVhJR2n50xg


Issues

Meet a question.

Hello, I have benefited greatly from this open source project. Thank you for this elegant code, but I am puzzled by the following code.

"points_in_bbox_mask = points_in_bbox_mask.intersection(config.default_anchor_center_set)"
( in ~/seglink/tf_extended/seglink.py)

According to the code above, I think we can only find the default boxes/anchors whose centers are on the edge of the word bounding box, not inside the word bounding box.
But according to the paper, a default box is labeled as positive iff (1) the center of the box is inside the word bounding box.
Could you help me solve this problem? Thank you in advance~ @dengdan

Slow evaluation speed...

It is me again....
I tried to speed up evaluation by changing eval_seglink.py's batch_size to 8 instead of 1, since evaluating my entire dataset of about 14000 images took me 40 minutes:
config.init_config(image_shape,
                   batch_size = 8,
                   seg_conf_threshold = FLAGS.seg_conf_threshold,
                   link_conf_threshold = FLAGS.link_conf_threshold,
                   train_with_ignored = FLAGS.train_with_ignored,
                   seg_loc_loss_weight = FLAGS.seg_loc_loss_weight,
                   link_cls_loss_weight = FLAGS.link_cls_loss_weight,
                   )

but changing it to 8 reports an error:
ValueError: slice index 1 of dimension 0 out of bounds. for 'evaluation_512x512/strided_slice_4' (op: 'StridedSlice') with input shapes: [1,5460], [2], [2], [2] and with computed input tensors: input[1] = <1 0>, input[2] = <2 0>, input[3] = <1 1>.

What does it mean? Can it only evaluate with a batch size of one?

pythonpath

How do I add the src folder to PYTHONPATH?

About bboxes_filter_overlap

I read the code and something confuses me.
In the process of data augmentation, the following function appears:

bboxes_filter_overlap(labels, bboxes, xs, ys, threshold, scope=None, assign_negative = False)

May the values in the bboxes parameter be negative?

I am looking forward to your help!

test algorithm on own images

I am proceeding exactly as you describe. However, once I run the code I receive:

DataLossError (see above for traceback): Unable to open table file /home.net/vs17dow/Desktop/A_M-arbeit/G_Code/E_seglink/models/model.ckpt-136750.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_24 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_24/tensor_names, save/RestoreV2_24/shape_and_slices)]]
[[Node: save/RestoreV2_56/_73 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_260_save/RestoreV2_56", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

./scripts/test.sh 0 ~/Desktop/A_M-arbeit/G_Code/E_seglink/models/seglink/model.ckpt-136750.data-00000-of-00001 ~/Desktop/A_M-arbeit/G_Code/E_seglink/datasets/mydata/

I do not see an error in my command. Any advice on how to proceed here?

Best

Valentin

Trained Model Links not working

It seems that the links for the trained models seglink-384 and seglink-512 are no longer active. Could someone please update these to working links? Thanks.

Export for serving from TFS

Hello,

Great job with the seglink model code as well as with making the checkpoint available!

I would like to save a seglink checkpoint as a graph (e.g saved_model.pb) for serving from TFS -
https://www.tensorflow.org/serving/

TFS requires that there be no py_func usage (so whatever numpy operations you carry out in the function must be converted to tf operations) - see this reference link: tensorflow/serving#495

My questions / requests to you at this point:

  1. Do you plan to implement an export graph function, similar to the inception saved model function in the TFS example ( https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/inception_saved_model.py ) ?
  2. If not, is it possible for you to convert your numpy ops to pure tf ops in the functions that are called from tf.py_func() in tf_extended/seglink.py, and make it available as another fork?

Thank you very much,
Regards,
Buvana

poor performance using seglink-384

I have tested ICDAR2013 with the seglink-384 model and visualized the resulting images. seg_conf is set to 0.8 and link_conf to 0.5. However, the performance was very poor. Can you help explain why?

Errors while changing the basenet

When I try to change the VGG net to ResNet, it doesn't work.

I mainly changed the vgg.py file like this:

def basenet(inputs):
    logit, endpoints =resnet_50(inputs)
    endpoints['conv4_3'] = endpoints['vgg/resnet_50/block2/unit_2']
    endpoints['fc7'] = endpoints['vgg/resnet_50/block3/unit_4']
    return endpoints['fc7'], endpoints

# try to keep the outputs the same as the original net

However, it doesn't work and reports:

Traceback (most recent call last):
  File "/home/moon/seglink-master/train_seglink.py", line 276, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/moon/seglink-master/train_seglink.py", line 271, in main
    train_op = create_clones(batch_queue)
  File "/home/moon/seglink-master/train_seglink.py", line 220, in create_clones
    averaged_gradients = sum_gradients(gradients)
  File "/home/moon/seglink-master/train_seglink.py", line 164, in sum_gradients
    grad = tf.add_n(grads, name = v.op.name + '_summed_gradients')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1918, in add_n
    raise ValueError("inputs must be a list of at least one Tensor with the "
ValueError: inputs must be a list of at least one Tensor with the same dtype and shape

My coarse ResNet implementation is as follows:

import tensorflow as tf
import collections

slim = tf.contrib.slim

Block = collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])

def subsample(inputs, factor, scope=None):
    if factor == 1:
        return inputs
    else:
        return slim.max_pool2d(inputs, [1, 1], stride= factor, scope=scope)

def bottleneck(inputs,
               depth,
               depth_bottleneck,
               stride,
               outputs_collections='collections',
               scope=None):

    with tf.variable_scope(scope) as sc:
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
        preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')
        if depth==depth_in:
            shortcut = subsample(inputs, stride, 'shortcut')
        else:

            shortcut = slim.conv2d(preact, depth, [1, 1],
                                   stride=stride, normalizer_fn=None,
                                   activation_fn=None, scope='shortcut')
        residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1, scope='conv1')

        residual = slim.conv2d(residual, depth_bottleneck, [3, 3], stride=stride, padding='SAME', scope='conv2')

        residual = slim.conv2d(residual, depth, [1, 1], stride=1, scope='conv3')

        output = shortcut+residual

        return slim.utils.collect_named_outputs(outputs_collections, sc.name, output)



def resnet_50(input):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block(
            'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block(
            'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        Block(
            'block4', bottleneck, [(2048, 512, 1)] * 3)]
    net = input
    net = slim.conv2d(net, 64, 7, stride=2, scope='conv1', padding='SAME')
    net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
    with tf.variable_scope('resnet_50'):
        for i, block in enumerate(blocks):
            with tf.variable_scope(block.scope):
                args = block.args
                for j, arg in enumerate(args):
                    depth, depth_bottleneck, stride = arg
                    net = bottleneck(net, depth, depth_bottleneck, stride, scope='unit_'+str(j))
    endpoints = slim.utils.convert_collection_to_dict('collections')
    return net, endpoints

Can you help me figure it out?
Is there an example of changing the basenet?
Thank you!
@dengdan

pre trained model

Can any one of the models be uploaded to OneDrive or Google Drive?

ssd-caffe on coco

Can you provide the Caffe model trained using SSD-Caffe on COCO? Thank you.

About the Negative link in the paper

Hi, I'm checking your code, and thank you for it.
But I didn't find the negative link mentioned in the paper. Has this part not been implemented in the code?
Thank you.

checkpoint

What files should be put into seglink_synthtext?
How are the meta-format files generated?
Thanks.

How to train on my own datasets?

Hi dengdan, thank you for your hard work.
I am trying to train the seglink model on my own datasets, and I have run into the following situations:

  1. I only have one GPU, a TITAN Xp. I have rewritten your training scripts, but I get some warnings.
  2. Which pretrained model should I prepare for the training process? I got the ImageNet VGG16 checkpoint from the SSD-tensorflow project; which part of the code should I rewrite to train from this pretrained model?

Thank you again, your work is awesome.

Why re-implement?

Hi,

Could you elaborate on why you chose to re-implement?

Best regards,
Jesper

How to know the word in the detected text?

Hello, thanks for the great contribution. I have installed the SegLink system successfully, but I would like to know: is there a way to recognize the word in the detected text?

regards

Trained model

Could you please share a Drive or Dropbox link with a quicker server? Unfortunately, I cannot download any of the models. Once I start downloading, it tells me 20 h... for 300 MB? As a consequence, Chrome stops the download.

BR

Valentin

Maybe I found a mistake?

# datasets/dataset_utils.py
example = tf.train.Example(features=tf.train.Features(feature={
            'image/shape': int64_feature(list(shape)),
            'image/object/bbox/xmin': float_feature(list(bboxes[:, 0])),
            'image/object/bbox/ymin': float_feature(list(bboxes[:, 1])),
            'image/object/bbox/xmax': float_feature(list(bboxes[:, 2])),
            'image/object/bbox/ymax': float_feature(list(bboxes[:, 3])),
            'image/object/bbox/x1': float_feature(list(oriented_bboxes[:, 0])),
            'image/object/bbox/y1': float_feature(list(oriented_bboxes[:, 1])),
            'image/object/bbox/x2': float_feature(list(oriented_bboxes[:, 2])),
            'image/object/bbox/y2': float_feature(list(oriented_bboxes[:, 3])),
            'image/object/bbox/x3': float_feature(list(oriented_bboxes[:, 4])),
            'image/object/bbox/y3': float_feature(list(oriented_bboxes[:, 5])),
            'image/object/bbox/x4': float_feature(list(oriented_bboxes[:, 6])),
            'image/object/bbox/y4': float_feature(list(oriented_bboxes[:, 7])),
            'image/object/bbox/label': int64_feature(labels),
            'image/object/bbox/label_text': bytes_feature(labels_text),
            'image/object/bbox/ignored': int64_feature(ignored),
            'image/format': bytes_feature(image_format),
            'image/filename': bytes_feature(filename),
            'image/encoded': bytes_feature(image_data)}))
    return example

Should 'xmax' use column 4 and 'ymax' column 5?

 'image/object/bbox/xmax': float_feature(list(bboxes[:, 4])),
 'image/object/bbox/ymax': float_feature(list(bboxes[:, 5])),

The ground-truth format is (columns 0-7 are the four corner coordinates, followed by the text):

0   1   2   3   4   5   6   7   text
506 224 580 226 581 250 507 247 HARRY

Should ymax be 226 or 250? Does this have an impact on the final result?

import error

Dear Deng:
I'm sorry, I am new to the TensorFlow framework and got the following error:

++ set -e
++ export CUDA_VISIBLE_DEVICES=0
++ CUDA_VISIBLE_DEVICES=0
++ CHECKPOINT_PATH=/home/jin/zs/seglink/seglink-512/
++ DATASET_DIR=/home/jin/data/ch4/ch4_test_imgs
++ python test_seglink.py --checkpoint_path=/home/jin/zs/seglink/seglink-512/ --gpu_memory_fraction=-1 --seg_conf_threshold=0.8 --link_conf_threshold=0.5 --dataset_dir=/home/jin/data/ch4/ch4_test_imgs
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "test_seglink.py", line 9, in <module>
    from tensorflow.contrib.training.python.training import evaluation
ImportError: cannot import name evaluation

I will appreciate it very much!
Thanks!

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Retval[0] does not have value

INFO:tensorflow:global step 109662: loss = 5.3843 (0.160 sec/step)
INFO:tensorflow:global step 109663: loss = 4.5832 (0.256 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Retval[0] does not have value
INFO:tensorflow:global step 109664: loss = 8.8361 (0.098 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "./train_seglink.py", line 275, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./train_seglink.py", line 271, in main
    train(train_op)
  File "./train_seglink.py", line 260, in train
    session_config = sess_config
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 759, in train
    sv.saver.save(sess, sv.save_path, global_step=sv.global_step)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 296, in stop_on_exception
    yield
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 494, in run
    self.run_loop()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 994, in run_loop
    self._sv.global_step])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Retval[0] does not have value
My TF version is 1.2.1; I also tried running on 1.1.0 and got the same error.

About transform_cv_rect

When a rectangle is rotated to the horizontal direction, should the width side be horizontal?
Should the length of the width side be larger than the length of the height side?

The input parameter rects is generated by the minAreaRect function of OpenCV.
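For context, a minimal illustration of that input convention, assuming OpenCV 2.x where minAreaRect returns ((cx, cy), (w, h), theta) with theta in [-90, 0); the corner coordinates reuse the HARRY example quoted earlier on this page:

import cv2
import numpy as np

# Four corners of a word box, as in the ICDAR ground-truth format
pts = np.array([[506, 224], [580, 226], [581, 250], [507, 247]], dtype=np.float32)
(cx, cy), (w, h), theta = cv2.minAreaRect(pts)
rect = np.array([cx, cy, w, h, theta], dtype=np.float32)
print(rect)  # note: OpenCV itself does not guarantee w >= h here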


def transform_cv_rect(rects):
    only_one = False
    if len(np.shape(rects)) == 1:
        rects = np.expand_dims(rects, axis = 0)
        only_one = True
    assert np.shape(rects)[1] == 5, 'The shape of rects must be (N, 5), but meet %s'%(str(np.shape(rects)))
    rects = np.asarray(rects, dtype = np.float32).copy()
    num_rects = np.shape(rects)[0]
    for idx in xrange(num_rects):
        cx, cy, w, h, theta = rects[idx, ...];
        #assert theta < 0 and theta >= -90, "invalid theta: %f"%(theta) 
        if abs(theta) > 45 or (abs(theta) == 45 and w < h):
            w, h = [h, w]
            theta = 90 + theta
        rects[idx, ...] = [cx, cy, w, h, theta]
    if only_one:
        return rects[0, ...]
    return rects 

After using the above function, it seems it cannot guarantee that the width is larger than the height.

I'm so confused.

finetune

Could you take some time to write a detailed fine-tuning document or guide?
I would be very grateful if it takes shape.

TypeError: not all arguments converted during string formatting

Hi. I want to test the pretrained model on my image. When I run './scripts/test.sh 0 model/model.ckpt-217867 /home/wh/work/jyh_dataset/testimage', I get this error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 465, in format
    record.message = record.getMessage()
  File "/usr/lib/python2.7/logging/__init__.py", line 329, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting

os: ubuntu 16.04
python:2.7
opencv:3.3.0
tensorflow-gpu 1.4.0

training issue

I'm facing the issue below while training with train_seglink.py:

TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'

Can anyone please explain the issue?
screenshot from 2017-11-22 11-10-00

detected bounding box slightly rotated to one direction

Hi dengdan, thanks for your repo helping me with my project. I am now training on my own dataset, starting from your pre-trained 512 weights. Everything goes fine, but I met a problem when doing inference with my trained weights: the bboxes found by my weights are all slightly rotated in a particular orientation. More specifically, all horizontal bboxes are slightly rotated clockwise by various degrees, like 5 to 15 degrees. This is weird since no bboxes are rotated anti-clockwise. Before I check the code carefully, could you give me a hint about what might be going wrong? Thanks in advance.

using_moving_average not found

@dengdan When I set using_moving_average=1, the program reports an error:
NotFoundError (see above for traceback): Key vgg/conv5/conv5_2/weights/ExponentialMovingAverage not found in checkpoint. How do I use this feature in this repo?

exceptions.AttributeError: 'module' object has no attribute 'cv' when running test_seglink.py

I ran test_seglink.py, and it hits the error below when it reaches the line "image_bboxes = sess.run([bboxes_pred], feed_dict = {image:image_data, image_shape:image_data.shape})".
I use tensorflow-gpu (1.2.0) and Python 2.7.

Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('/home/user/MZH/seglink-master/test_seglink.py', args='--dataset_dir=datasets/ICDAR-Test-Images --checkpoint_path=/home/user/MZH/seglink-master/seglink-384/model.ckpt-136750', wdir='/home/user/MZH/seglink-master')
  File "/home/user/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "/home/user/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
    builtins.execfile(filename, *where)
  File "/home/user/MZH/seglink-master/test_seglink.py", line 164, in <module>
    tf.app.run()
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/user/MZH/seglink-master/test_seglink.py", line 160, in main
    eval()
  File "/home/user/MZH/seglink-master/test_seglink.py", line 147, in eval
    image_bboxes = sess.run([bboxes_pred], feed_dict = {image:image_data, image_shape:image_data.shape})
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)

UnknownError: exceptions.AttributeError: 'module' object has no attribute 'cv'
  [[Node: test/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](test/strided_slice_4, test/strided_slice_5, test/strided_slice_2, test/strided_slice_3, test/PyFunc/input_4, test/PyFunc/input_5)]]

Caused by op u'test/PyFunc', defined at:
  File "/home/user/anaconda2/lib/python2.7/site-packages/spyder/utils/ipython/start_kernel.py", line 231, in <module>
    main()
  File "/home/user/anaconda2/lib/python2.7/site-packages/spyder/utils/ipython/start_kernel.py", line 227, in main
    kernel.start()
  File "/home/user/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/home/user/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/user/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/user/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/user/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/user/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/user/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/user/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/user/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/user/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/user/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/user/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/user/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/user/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/user/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2827, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/user/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('/home/user/MZH/seglink-master/test_seglink.py', args='--dataset_dir=datasets/ICDAR-Test-Images --checkpoint_path=/home/user/MZH/seglink-master/seglink-384/model.ckpt-136750', wdir='/home/user/MZH/seglink-master')
  File "/home/user/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "/home/user/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
    builtins.execfile(filename, *where)
  File "/home/user/MZH/seglink-master/test_seglink.py", line 164, in <module>
    tf.app.run()
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/user/MZH/seglink-master/test_seglink.py", line 160, in main
    eval()
  File "/home/user/MZH/seglink-master/test_seglink.py", line 98, in eval
    link_conf_threshold = config.link_conf_threshold)
  File "tf_extended/seglink.py", line 680, in tf_seglink_to_bbox
    tf.float32);
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
    name=name)
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/user/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

UnknownError (see above for traceback): exceptions.AttributeError: 'module' object has no attribute 'cv'
  [[Node: test/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](test/strided_slice_4, test/strided_slice_5, test/strided_slice_2, test/strided_slice_3, test/PyFunc/input_4, test/PyFunc/input_5)]]

error when running test_seglink.py

I have successfully converted the SynthText and ICDAR2015 datasets to tfrecords, but when I run test_seglink.py I get the following error:
AttributeError: 'NoneType' object has no attribute 'startswith'

When I add focal loss, evaluation is very slow

Sorry to bother you.
I changed the loss to focal loss, and it trains normally.
However, when I try to evaluate its effect, evaluation is very slow and I can't understand why.
This is the code:

    def focal_loss(self, onehot_labels, cls_preds,
                   alpha=0.25, gamma=2.0, name=None, scope=None):
        with tf.name_scope(scope, 'focal_loss', [cls_preds, onehot_labels]) as sc:
            logits = tf.convert_to_tensor(cls_preds)
            onehot_labels = tf.convert_to_tensor(onehot_labels)
        precise_logits = tf.cast(logits, tf.float32) if (
            logits.dtype == tf.float16) else logits
        onehot_labels = tf.cast(onehot_labels, precise_logits.dtype)
        predictions = tf.nn.sigmoid(precise_logits)
        predictions_pt = tf.where(tf.equal(onehot_labels, 1), predictions, 1. - predictions)
        # add small value to avoid 0
        epsilon = 1e-8
        alpha_t = tf.scalar_mul(alpha, tf.ones_like(onehot_labels, dtype=tf.float32))
        alpha_t = tf.where(tf.equal(onehot_labels, 1.0), alpha_t, 1 - alpha_t)
        losses = tf.reduce_mean(-alpha_t * tf.pow(1. - predictions_pt, gamma) * tf.log(predictions_pt + epsilon),
                               name=name)
        return losses

def build_loss(self, seg_labels, seg_offsets, link_labels, do_summary=True):
    batch_size = config.batch_size_per_gpu

    # note that for label values in both seg_labels and link_labels:
    #    -1 stands for negative
    #     1 stands for positive
    #     0 stands for ignored
    def get_pos_and_neg_masks(labels):
        if config.train_with_ignored:
            pos_mask = labels >= 0
            neg_mask = tf.logical_not(pos_mask)
        else:
            pos_mask = tf.equal(labels, 1)
            neg_mask = tf.equal(labels, -1)

        return pos_mask, neg_mask

    def OHNM_single_image(scores, n_pos, neg_mask):
        """Online Hard Negative Mining.
            scores: the scores of being predicted as negative cls
            n_pos: the number of positive samples
            neg_mask: mask of negative samples
            Return:
                the mask of selected negative samples.
                if n_pos == 0, no negative samples will be selected.
        """

        def has_pos():
            n_neg = n_pos * config.max_neg_pos_ratio
            max_neg_entries = tf.reduce_sum(tf.cast(neg_mask, tf.int32))
            n_neg = tf.minimum(n_neg, max_neg_entries)
            n_neg = tf.cast(n_neg, tf.int32)
            neg_conf = tf.boolean_mask(scores, neg_mask)
            vals, _ = tf.nn.top_k(-neg_conf, k=n_neg)
            threshold = vals[-1]  # a negtive value
            selected_neg_mask = tf.logical_and(neg_mask, scores <= -threshold)
            return tf.cast(selected_neg_mask, tf.float32)

        def no_pos():
            return tf.zeros_like(neg_mask, tf.float32)

        return tf.cond(n_pos > 0, has_pos, no_pos)

    def OHNM_batch(neg_conf, pos_mask, neg_mask):
        selected_neg_mask = []
        for image_idx in xrange(batch_size):
            image_neg_conf = neg_conf[image_idx, :]
            image_neg_mask = neg_mask[image_idx, :]
            image_pos_mask = pos_mask[image_idx, :]
            n_pos = tf.reduce_sum(tf.cast(image_pos_mask, tf.int32))
            selected_neg_mask.append(OHNM_single_image(image_neg_conf, n_pos, image_neg_mask))

        selected_neg_mask = tf.stack(selected_neg_mask)
        selected_mask = tf.cast(pos_mask, tf.float32) + selected_neg_mask
        return selected_mask

    # OHNM on segments
    seg_neg_scores = self.seg_scores[:, :, 0]
    seg_pos_mask, seg_neg_mask = get_pos_and_neg_masks(seg_labels)
    seg_selected_mask = OHNM_batch(seg_neg_scores, seg_pos_mask, seg_neg_mask)
    n_seg_pos = tf.reduce_sum(tf.cast(seg_pos_mask, tf.float32))

    with tf.name_scope('seg_cls_loss'):
        def has_pos():
            #seg_cls_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
            #    logits=self.seg_score_logits,
            #    labels=tf.cast(seg_pos_mask, dtype=tf.int32))
            #return tf.reduce_sum(seg_cls_loss * seg_selected_mask) / n_seg_pos
            seg_cls_loss = self.focal_loss(tf.one_hot(seg_labels, 2), self.seg_score_logits)
            return seg_cls_loss
        def no_pos():
            return tf.constant(.0);

        seg_cls_loss = tf.cond(n_seg_pos > 0, has_pos, no_pos)
        tf.add_to_collection(tf.GraphKeys.LOSSES, seg_cls_loss)

    def smooth_l1_loss(pred, target, weights):
        diff = pred - target
        abs_diff = tf.abs(diff)
        abs_diff_lt_1 = tf.less(abs_diff, 1)
        if len(target.shape) != len(weights.shape):
            loss = tf.reduce_sum(tf.where(abs_diff_lt_1, 0.5 * tf.square(abs_diff), abs_diff - 0.5), axis=2)
            return tf.reduce_sum(loss * tf.cast(weights, tf.float32))
        else:
            loss = tf.where(abs_diff_lt_1, 0.5 * tf.square(abs_diff), abs_diff - 0.5)
            return tf.reduce_sum(loss * tf.cast(weights, tf.float32))

    with tf.name_scope('seg_loc_loss'):
        def has_pos():
            seg_loc_loss = smooth_l1_loss(self.seg_offsets, seg_offsets,
                                          seg_pos_mask) * config.seg_loc_loss_weight / n_seg_pos
            names = ['loc_cx_loss', 'loc_cy_loss', 'loc_w_loss', 'loc_h_loss', 'loc_theta_loss']
            sub_loc_losses = []
            from tensorflow.python.ops import control_flow_ops
            for idx, name in enumerate(names):
                name_loss = smooth_l1_loss(self.seg_offsets[:, :, idx], seg_offsets[:, :, idx],
                                           seg_pos_mask) * config.seg_loc_loss_weight / n_seg_pos
                name_loss = tf.identity(name_loss, name=name)
                if do_summary:
                    tf.summary.scalar(name, name_loss)
                sub_loc_losses.append(name_loss)
            seg_loc_loss = control_flow_ops.with_dependencies(sub_loc_losses, seg_loc_loss)
            return seg_loc_loss

        def no_pos():
            return tf.constant(.0);

        seg_loc_loss = tf.cond(n_seg_pos > 0, has_pos, no_pos)
        tf.add_to_collection(tf.GraphKeys.LOSSES, seg_loc_loss)

    link_neg_scores = self.link_scores[:, :, 0]
    link_pos_mask, link_neg_mask = get_pos_and_neg_masks(link_labels)
    link_selected_mask = OHNM_batch(link_neg_scores, link_pos_mask, link_neg_mask)
    n_link_pos = tf.reduce_sum(tf.cast(link_pos_mask, dtype=tf.float32))
    with tf.name_scope('link_cls_loss'):
        def has_pos():
            #link_cls_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
            #    logits=self.link_score_logits,
            #    labels=tf.cast(link_pos_mask, tf.int32))
            #return tf.reduce_sum(link_cls_loss * link_selected_mask) / n_link_pos
            link_cls_loss = self.focal_loss(tf.one_hot(link_labels, 2), self.link_score_logits)
            return link_cls_loss
        def no_pos():
            return tf.constant(.0);

        link_cls_loss = tf.cond(n_link_pos > 0, has_pos, no_pos) * config.link_cls_loss_weight
        tf.add_to_collection(tf.GraphKeys.LOSSES, link_cls_loss)

    if do_summary:
        tf.summary.scalar('seg_cls_loss', seg_cls_loss)
        tf.summary.scalar('seg_loc_loss', seg_loc_loss)
        tf.summary.scalar('link_cls_loss', link_cls_loss)

Thanks in advance.

tensorflow pretrained model

Can you provide a pretrained model in TensorFlow format, converted from vgg_coco_ssd_512x512_iter_360000.caffemodel?
Thanks!

eval.sh waiting for checkpoint

Hi,
I'm trying to run eval.sh to do offline evaluation, since the Robust Reading Competition website is down.
I got this message and the process gets stuck there. Any advice?

INFO:tensorflow:Waiting for new checkpoint at /home/rp/code/seglink/seglink-512/model.ckpt-217867

The f-measure of evaluation

I used the command


python eval_seglink.py --checkpoint_path=./seglink/model.ckpt-136750  --dataset_name=icdar2015 --dataset_split_name=test --dataset_dir=./tf_records

to evaluate the model you provided (the seglink-384 model).

I changed 'seg_conf_threshold' and 'link_conf_threshold' to 0.8 and 0.5 respectively.

When I set the test image size to 384x384, the result is
Recall, Precision, Fmean = [0.48117587][0.72381693][0.57806695]

When I set the test image size to 512x512, the result is
Recall, Precision, Fmean = [0.61840743][0.78477693][0.69172925]

It doesn't match the results you provided.
Is there anything I missed?
