argman / east


A tensorflow implementation of EAST text detector

License: GNU General Public License v3.0

Python 12.94% Makefile 0.05% C++ 86.22% CSS 0.02% HTML 0.74% Shell 0.02%

tensorflow text-detection deep-learning ocr

east's Issues

Trying to run EAST on Windows, I got the following error

Division by zero; maybe try "arctan2"

In icdar.py (line 362), angle and theta are calculated with np.arctan, so there is a risk of dividing by zero.
I changed it to np.arctan2(dy, -dx), and it seems to work well.
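
The difference can be seen in a minimal sketch that uses Python's standard math module rather than NumPy (an assumption made to keep it dependency-free; dy and dx are illustrative values, not the variables in icdar.py). The ratio form fails when dx is zero, while the two-argument form still returns a well-defined angle. Note that atan2 also picks the quadrant from the signs of both arguments, so the argument order and signs should be checked against the original expression before swapping it in.

```python
import math


def angle_from_ratio(dy, dx):
    # One-argument form: the division fails when dx == 0.
    return math.atan(dy / dx)


def angle_from_atan2(dy, dx):
    # Two-argument form: no division, so dx == 0 is handled.
    return math.atan2(dy, dx)


if __name__ == "__main__":
    print(angle_from_atan2(1.0, 0.0))  # a vertical edge
    try:
        angle_from_ratio(1.0, 0.0)
    except ZeroDivisionError as exc:
        print("ratio form failed:", exc)
```

With NumPy floats the division would produce inf or nan with a runtime warning instead of raising, which is harder to notice.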

@argman, the code gives large errors. Can you correct all the common mistakes?

Does `from icdar import restore_rectangle` need so much time?

When I run python3 eval.py or python3 run_demo_server.py, it executes from icdar import restore_rectangle. However, the terminal shows "1000 training images in ./data/train/", so it seems to load the training images, and this takes a long time. Has anybody else seen the same thing?

Want to change dice coefficient function to class-balanced cross entropy function.

I tried to change this code to a class-balanced cross-entropy function.

def dice_coefficient(y_true_cls, y_pred_cls,
                     training_mask):
    '''
    dice loss
    :param y_true_cls:
    :param y_pred_cls:
    :param training_mask:
    :return:
    '''
    eps = 1e-5
    intersection = tf.reduce_sum(y_true_cls * y_pred_cls * training_mask)
    union = tf.reduce_sum(y_true_cls * training_mask) + tf.reduce_sum(y_pred_cls * training_mask) + eps
    loss = 1. - (2 * intersection / union)
    tf.summary.scalar('classification_dice_loss', loss)
    return loss

However, I don't understand why there is a training mask and what its role is. I would be thankful if somebody could explain :) Thanks
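
One way to see what training_mask does in the function above is a plain-Python sketch of the same formula on flat lists (the list-based signature and the sample values are illustrative assumptions, not the repository's TensorFlow code). Every term is multiplied by the mask, so a pixel whose mask value is 0 contributes nothing to the loss, whatever the prediction there. My assumption is that such pixels mark regions to be ignored during training, for example boxes labeled '###'.

```python
def masked_dice_loss(y_true, y_pred, mask, eps=1e-5):
    """Dice loss over flat lists; pixels with mask == 0 are ignored."""
    intersection = sum(t * p * m for t, p, m in zip(y_true, y_pred, mask))
    union = (sum(t * m for t, m in zip(y_true, mask))
             + sum(p * m for p, m in zip(y_pred, mask))
             + eps)
    return 1.0 - 2.0 * intersection / union


if __name__ == "__main__":
    y_true = [1.0, 1.0, 0.0, 0.0]
    mask = [1.0, 1.0, 1.0, 0.0]  # last pixel is masked out
    # The prediction at the masked pixel differs, the loss does not.
    print(masked_dice_loss(y_true, [0.9, 0.8, 0.1, 0.0], mask))
    print(masked_dice_loss(y_true, [0.9, 0.8, 0.1, 1.0], mask))
```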

More training details

Hi,

I am trying to exactly reproduce your released model. Could you provide some more details about the training? From the README it looks like you use 14 images per GPU, and I see you've mentioned training with 4 GPUs. Was your total batch size then 56? Did you adjust the learning rate for such a large batch size, or did you use the default?

Also, you mention using the ICDAR 2013 training set as well. Is there anything special here, or is the sampling between ICDAR 2015 and 2013 simply 1:1?

Any more details that you think may be relevant?

Btw. Small typo in the readme "Thanks for the author's (@zxytim) help! Please site his paper if you find this useful." site -> cite

Thanks for releasing the code. It's great!

about balanced cross-entropy loss

The code uses the dice_coefficient loss rather than the balanced cross-entropy loss from the paper, so I followed the paper and tried the balanced cross-entropy loss. However, performance with it is very poor and does not reach the result in the paper. I can't figure out why the dice_coefficient loss works better than the balanced cross-entropy loss.

Problem during training

Hi, @argman
I get the following error during training:

Step 000830, model loss 0.0111, total loss 0.0264, 71.25 seconds/step, 0.39 examples/second
Step 000840, model loss 0.0121, total loss 0.0272, 71.00 seconds/step, 0.39 examples/second
Step 000850, model loss 0.0124, total loss 0.0274, 71.36 seconds/step, 0.39 examples/second
Step 000860, model loss 0.0130, total loss 0.0279, 71.22 seconds/step, 0.39 examples/second
Step 000870, model loss 0.0107, total loss 0.0255, 71.07 seconds/step, 0.39 examples/second
Step 000880, model loss 0.0109, total loss 0.0256, 70.99 seconds/step, 0.39 examples/second
StepTraceback (most recent call last):
  File ".../EAST/icdar.py", line 657, in generator
    score_map, geo_map, training_mask = generate_rbox((new_h, new_w), text_polys, text_tags)
  File ".../EAST/icdar.py", line 520, in generate_rbox
    if point_dist_to_line(p1, new_p2, p0) > point_dist_to_line(p1, new_p2, p3):
  File ".../EAST/icdar.py", line 248, in point_dist_to_line
    return np.linalg.norm(np.cross(p2 - p1, p1 - p3)) / np.linalg.norm(p2 - p1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'

However, training proceeds without stopping.
Do you know anything about this problem? Is it a serious problem for the resulting model?

Hello, can the SCUT_FORU_DB dataset be used for testing?

[RFC] Roadmap

To serve better as a baseline for further research and those who just want a fast text detector, we are planning to polish this repo from "just works" to "works great". Here are our current plans:

As we both have full-time jobs, this roadmap is not subject to a timetable. If you want to take one of the tasks above, please start a dedicated issue for that task and kindly submit a pull request.

Also, any suggestions are warmly welcomed.

problem in the output

I have trained a model with this command:

python multigpu_train.py --gpu_list=0,1,2 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/backup/EAST/
--text_scale=1024 --training_data_path=/DATA/EAST/data/ --geometry=RBOX --learning_rate=0.0001 --num_readers=12

and I've waited until:

Step 007130, model loss 0.0316, total loss 0.0827, 7.33 seconds/step, 5.73 examples/second

First question: should I let it train for more iterations, or is this enough?

Second question:
The outputs for all the images seem to be of one size. Why is this happening?
I don't see much variation in the output dimensions.

Examples:

So what is missing for it to be able to detect blocks of text?

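Hello, I am running the EAST model on the MSRA-TD500 dataset and get the following error. What is the cause? Is it a problem with the data?

The data format is as follows: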

Text scale

I have a lot of different text regions with different dimensions and scales. How can I set the "TEXT SCALE" parameter correctly?
How do I choose the right number?

And does it depend on the "INPUT SIZE" parameter?

Tried to write class-balanced cross entropy, but couldn't understand it.

I just want to modify loss function, from dice coefficient to class balanced xentropy, but I still don't get what to change.

def batch_flatten(x):
    """
    Flatten the tensor except the first dimension.
    """
    shape = x.get_shape().as_list()[1:]
    if None not in shape:
        return tf.reshape(x, [-1, int(np.prod(shape))])
    return tf.reshape(x, tf.stack([tf.shape(x)[0], -1]))

def xentropy(y_true_cls, y_pred_cls,
             training_mask):
    eps = 1e-7

    z = batch_flatten(y_pred_cls)
    y = tf.cast(batch_flatten(y_true_cls), tf.float32)

    count_neg = tf.reduce_sum((1. - y) * training_mask)
    count_pos = tf.reduce_sum(y * training_mask)
    beta = count_neg / (count_neg + count_pos)
    loss_pos = -beta * tf.reduce_mean(y * tf.log(z + eps))

    loss_neg = (1. - beta) * tf.reduce_mean((1. - y) * tf.log(1. - z + eps))
    cost = tf.subtract(loss_pos, loss_neg, name=name)

    return cost

Would this code work?
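
For comparison, here is a small dependency-free sketch of a class-balanced cross-entropy over flat lists, written under my own assumptions rather than taken from the repository: beta is the masked fraction of negative pixels, predictions are clipped away from 0 and 1, and the mask multiplies each per-pixel term as well as the counts. Two things stand out in the snippet above by contrast: name is passed to tf.subtract but never defined, and training_mask is used only to compute beta, not to weight the loss terms.

```python
import math


def balanced_cross_entropy(y_true, y_pred, mask, eps=1e-7):
    """Class-balanced cross-entropy over flat lists of equal length."""
    count_pos = sum(t * m for t, m in zip(y_true, mask))
    count_neg = sum((1.0 - t) * m for t, m in zip(y_true, mask))
    total = count_pos + count_neg
    if total == 0:
        return 0.0  # every pixel is masked out
    beta = count_neg / total  # weight applied to the positive class
    loss = 0.0
    for t, p, m in zip(y_true, y_pred, mask):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        pixel = -(beta * t * math.log(p)
                  + (1.0 - beta) * (1.0 - t) * math.log(1.0 - p))
        loss += m * pixel
    return loss / total
```

Because positives are usually rare in a score map, beta is close to 1 and the few text pixels receive most of the weight.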

Can I run it in CPU mode? I have no GPU devices.

Results on Arabic and Urdu Datasets

Hello Argman!!!!

I hope you are in the best of health. Here are some of the results from a model trained on an Arabic dataset.

Our main purpose was to detect Urdu news tickers, so I'm sending you those too.

You can use the photos any way you want; just cite my GitHub link.

I can also give you the model if you want!

Thanks Again
Burhan Ul Tayyab

Using my own dataset to train the model

Hi, @argman
I want to train the model on my own dataset. What do I need to do?
How do I create the gt text files? And what does each parameter stand for?

How to generate the train dataset?

@zxytim @argman Hello, thanks for sharing! I am curious about the format of the training dataset. Could you provide a brief description of it? Thanks very much!
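
Judging only from the load_annoataion function quoted further down this page, each line of a gt text file appears to hold eight comma-separated numbers (the four corner points x1,y1 to x4,y4) followed by a label, with '*' or '###' marking a region that is flagged True in text_tags. The sketch below parses one such line; the helper name parse_gt_line and the sample line are made up for illustration.

```python
import csv
import io


def parse_gt_line(line):
    """Split one annotation line into (polygon, label, ignore_flag)."""
    row = next(csv.reader(io.StringIO(line)))
    row = [cell.strip('\ufeff') for cell in row]  # drop a leading BOM
    coords = [float(v) for v in row[:8]]
    polygon = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    label = row[-1]
    ignore = label in ('*', '###')
    return polygon, label, ignore


if __name__ == "__main__":
    # Hypothetical annotation line: four corners, then the transcription.
    print(parse_gt_line("10,20,110,20,110,60,10,60,HELLO"))
```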

problem of choosing "max_side_len" in eval.py

@argman Hi. In eval.py, with the default max_side_len=2400, the inference result is strange: large text is not detected, although even very small text is. However, when max_side_len is set to 512, the same as INPUT_SIZE, very large text is detected correctly but small text is missed. Thanks!

about pvanet training

Hi, have you tried PVANet as the base network? I tried PVANet using Caffe but ran into an overfitting problem.
My training set is 950 images from the ICDAR 2015 training set (the other 50 images serve as a validation set) plus 229 images from ICDAR 2013.
The model is trained with online data augmentation, which includes scaling and rotations between ±30°. The IoU loss overfits badly: when the training IoU loss descends to 0.25, the validation IoU loss still stays high at 0.7. I have checked everything I can think of and still cannot solve this problem. Please help me, Mr. Argman! I have spent two months on this problem.

@argman, I have trained the model for 15000 iterations, which took 3 days. I used the MSRA-TD500 dataset, which contains Chinese and English text (1000 images in total), but the results are poor. Is something wrong?

Score Map Generation

In Section 3.3.1, the reference length is ri = min(D(pi, p(i mod 4)+1), D(pi, p((i+3) mod 4)+1)). When i = 1, r1 = min(D(p1, p2), D(p1, p1)). So r1 = 0, doesn't it? Can you explain this in more detail, or point to the part of the code that computes it? Thanks!
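
The index arithmetic in the question can be checked directly. In the sketch below (1-based indices as in the formula; the function names are mine), an offset of +3 maps every vertex back to itself, which is what produces r1 = 0, whereas an offset of +2 gives the previous vertex. Assuming the intent is the shorter of the two edges meeting at p_i, the reference length would then be computed from the two neighbours:

```python
import math


def next_index(i):
    return (i % 4) + 1  # 1 -> 2, 2 -> 3, 3 -> 4, 4 -> 1


def index_with_offset(i, offset):
    return ((i + offset) % 4) + 1


def reference_lengths(quad):
    """quad: four (x, y) vertices in order; returns [r1, r2, r3, r4]."""
    lengths = []
    for i in range(1, 5):
        p = quad[i - 1]
        nxt = quad[next_index(i) - 1]
        prev = quad[index_with_offset(i, 2) - 1]
        lengths.append(min(math.hypot(p[0] - nxt[0], p[1] - nxt[1]),
                           math.hypot(p[0] - prev[0], p[1] - prev[1])))
    return lengths


if __name__ == "__main__":
    print([index_with_offset(i, 3) for i in range(1, 5)])  # same vertex
    print([index_with_offset(i, 2) for i in range(1, 5)])  # previous vertex
    print(reference_lengths([(0, 0), (4, 0), (4, 2), (0, 2)]))
```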

Performance comparison between CTPN and EAST?

Am I doing right?

I changed the loss function and tried to train on my data with EAST. However, when I looked at how training was going, I found something weird.

The pictures above are the input data and the corresponding score map. Shouldn't the gt area be black and everything else white, instead of what the picture shows? (In the picture, the gt area is white and everything else is black.)

Cannot compile lanms

I have .jpg images in the folder and am trying to run eval.py. I have trained the model and have a checkpoint file.
The command I am using is: python3 eval.py --test_data_path=/home/kamranjanjua/EAST/icdarData/ --gpu_list=0 --checkpoint_path=/modelse/ --output_path=/home/kamranjanjua/EAST/output_icdar/

icdarData folder contains the images.

However, when I run it, the error I get is: raise RuntimeError('Cannot compile lanms: {}'.format(BASE_DIR))

Any solution?
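
The RuntimeError only says that the make step returned a non-zero status; the compiler's own message is what usually identifies the cause. A minimal sketch for capturing it, assuming Python 3.5+ and that the build is an ordinary make -C call on the lanms directory (the helper name run_and_report is hypothetical, and the directory has to be supplied by hand):

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, combined stdout/stderr text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Pass the lanms directory of your checkout as the first argument.
    lanms_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    code, output = run_and_report(["make", "-C", lanms_dir])
    print("exit code:", code)
    print(output)
```

If make itself is not installed, subprocess.run raises FileNotFoundError instead of returning an exit code, which is a different failure from a compile error.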

cannot detect single char

I ran eval.py to detect text, but none of the single-digit numbers in this image were detected. Could someone tell me why? Thanks.

question about fit_line()

@zxytim @argman I think one line of code may be wrong,
in the function fit_line() of icdar.py:
def fit_line(p1, p2):
    # fit a line ax+by+c = 0
    if p1[0] == p1[1]:  # Here I think this should change to: if p1[0] == p2[0]:
        return [1., 0., -p1[0]]
    else:
        [k, b] = np.polyfit(p1, p2, deg=1)
        return [k, -1., b]
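
One reading of the snippet, based on the np.polyfit(x, y, deg) signature: p1 is passed as the x argument and p2 as the y argument, so p1 would hold the two x-coordinates and p2 the two y-coordinates. Under that assumption, p1[0] == p1[1] does test for a vertical line. The sketch below spells this out with explicit names and plain arithmetic instead of np.polyfit (both substitutions are mine); how icdar.py actually calls fit_line should be checked before changing the condition.

```python
def fit_line(xs, ys):
    """Line through two points, as [a, b, c] with a*x + b*y + c = 0.

    xs = [x_first, x_second], ys = [y_first, y_second].
    """
    if xs[0] == xs[1]:
        # Vertical line x = xs[0]; the slope form below would divide by 0.
        return [1.0, 0.0, -xs[0]]
    k = (ys[1] - ys[0]) / (xs[1] - xs[0])  # slope
    b = ys[0] - k * xs[0]                  # intercept
    return [k, -1.0, b]


if __name__ == "__main__":
    print(fit_line([2, 2], [0, 5]))  # vertical line x = 2
    print(fit_line([0, 2], [1, 5]))  # y = 2x + 1
```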

Question on EAST/icdar.py

def load_annoataion(p):
    text_polys = []
    text_tags = []
    if not os.path.exists(p):
        return np.array(text_polys, dtype=np.float32)
    with open(p, 'r') as f:
        reader = csv.reader(f)
        for line in reader:
            label = line[-1]
            # strip BOM. \ufeff for python3,  \xef\xbb\bf for python2
            line = [i.strip('\ufeff').strip('\xef\xbb\xbf') for i in line]
            x1, y1, x2, y2, x3, y3, x4, y4 = list(map(float, line[:8]))
            text_polys.append([[x1, y1], [x2, y2], [x3, y3], [x4, y4]])
            if label == '*' or label == '###':
                text_tags.append(True)
            else:
                text_tags.append(False)
        return np.array(text_polys, dtype=np.float32), np.array(text_tags, dtype=np.bool)

Here, why is text_tags set to True, not False, when the label is '*' or '###'? Shouldn't it be the other way around? And if so, what happens when the label contains actual text?

When I run run_demo_server.py on Linux with Python 2.7, it always fails with a lanms error. How can I fix it?

How to reproduce the performance of f1-score=80.83 on ICDAR2015?

Hi,

I can't reproduce the 80.83 F1-score when directly running

python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ \
--text_scale=512 --training_data_path=/data/ocr/icdar2015/ --geometry=RBOX --learning_rate=0.0001 --num_readers=24 \
--pretrained_model_path=/tmp/resnet_v1_50.ckpt

on the ICDAR2015+2013 training images.

Could you please tell me the parameter configuration of the experiment that achieves the 80.83 F1-score?
Specifically: 1. the batch size per GPU; 2. the number of GPUs; 3. the initial learning rate; 4. the number of steps you trained the model for.

Thank you very much.

weird training problems !

I'm trying to run multigpu_train.py on Ubuntu 14.04 with Python 2.7.6 and TensorFlow 1.1.0, but I get an incomprehensible error:
KeyError: 'pool4'

classification loss function

Thanks for sharing this excellent repo. I noticed that the classification loss function used in the code is different from the one in the paper: you use the dice coefficient instead of cross-entropy. Could you provide more detail on this part?

I used the model directly for testing; the localization is wrong and the results are poor. Is the training set the reason?

can't compile lanms on Windows

Hello, when I run python run_demo_server.py on Windows, I get an error:
(image)

It says the error is at import lanms. I looked, and __init__.py in the lanms folder executes:
if subprocess.call(['make', '-C', BASE_DIR]) != 0: # return value
    raise RuntimeError('Cannot compile lanms: {}'.format(BASE_DIR))
The error occurs here. Do you know how to solve it? Thanks!

Questions about restore_rectangle_rbox

The function restore_rectangle_rbox in icdar.py is so complicated that, even after spending a lot of time reading and studying it, I still can't understand it. Could you provide more information or comments about this function?

Training errors 'Cross point does not exist'

When I train the model on the ICDAR2015 dataset, I get this error:

Cross point does not exist
Traceback (most recent call last):
  File "/home/lairf/EAST/icdar.py", line 657, in generator
    score_map, geo_map, training_mask = generate_rbox((new_h, new_w), text_polys, text_tags)
  File "/home/lairf/EAST/icdar.py", line 520, in generate_rbox
    if point_dist_to_line(p1, new_p2, p0) > point_dist_to_line(p1, new_p2, p3):
  File "/home/lairf/EAST/icdar.py", line 248, in point_dist_to_line
    return np.linalg.norm(np.cross(p2 - p1, p1 - p3)) / np.linalg.norm(p2 - p1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'

Why does this happen? Does it affect the training result?
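
The two messages look related: if the helper that intersects two lines returns None for parallel lines (which "Cross point does not exist" suggests), the next arithmetic on that value would raise exactly this kind of TypeError. A hedged sketch of the pattern, using lines in the a*x + b*y + c = 0 form; the function names and the raise-on-None policy are illustrative assumptions, not the repository's code:

```python
def line_cross_point(line1, line2, tol=1e-12):
    """Intersect a1*x + b1*y + c1 = 0 with a2*x + b2*y + c2 = 0.

    Returns (x, y), or None when the lines are parallel.
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < tol:
        return None
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return (x, y)


def safe_cross_point(line1, line2):
    """Raise a clear error instead of letting None leak into arithmetic."""
    point = line_cross_point(line1, line2)
    if point is None:
        raise ValueError("lines are parallel; skip this polygon")
    return point
```

A caller that catches the ValueError and skips the polygon would at least keep the failure explicit instead of surfacing later as a TypeError.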

Training Issue

Hi!!!

I have been trying to start training this model on Arabic datasets, but as soon as I start it, it gives me this message:
"poly in wrong direction"

Each line of the dataset consists of 9 values: eight coordinates comprising (x0, y0) to (x3, y3) in clockwise order, and one word describing the selected region.

I am using TensorFlow v1.2 with Python 3.5, and I have successfully run the demo on my server.

Could you please guide me on this issue?

Thanks
Burhan Ul Tayyab
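
The message suggests that the loader checks the winding order of each quadrangle. Whether icdar.py does it this way is an assumption on my part, but the usual test is the sign of the shoelace (signed) area. In image coordinates, where y grows downward, a polygon that looks clockwise on screen has a positive shoelace sum. The helper names below are hypothetical:

```python
def signed_area(poly):
    """Shoelace formula: 0.5 * sum(x_i * y_{i+1} - x_{i+1} * y_i)."""
    total = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_clockwise_on_screen(poly):
    # With y pointing down (image coordinates), positive area means the
    # vertices run clockwise as displayed.
    return signed_area(poly) > 0


def make_clockwise(poly):
    # Keep the first vertex and reverse the order of the remaining ones.
    if is_clockwise_on_screen(poly):
        return list(poly)
    return [poly[0]] + list(reversed(poly[1:]))
```

Running a check like this over every annotation line before training would show which polygons trigger the message.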

could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Hi, @argman

I have the following problem during training:

....
2017-08-31 18:22:46.750620: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-08-31 18:22:46.750670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-08-31 18:22:46.750683: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)

Do you know this problem? (I used 2 GPUs.)

A forked Python 2 version without C++ nms

I have made a Python 2 compatible fork.
https://github.com/AKSHAYUBHAT/EAST

Training problem

I have a dataset of 12982 images.
When I started training on it, I used 24 readers, but all I see is:

) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:23:00.0)
12982 training images in /media/rmmal/data/East_data/
(the line above is printed 24 times in total)

The GPU utilization is 95% and all 24 cores are at 100%.

It has been 1 hour now and nothing has changed.

Is something wrong?

The pre-trained model can't achieve 80% F1-score.

I ran the east_icdar2015_resnet_v1_50_rbox model/model.ckpt-49491 on the ICDAR2015-TRW public_test_data, then evaluated the result with the detection_eval_tool, but only got an F1-score of 0.4697. I don't know what is wrong.

Could you share the trained model?

Multi scale test

Test error: undefined symbol: PyInstanceMethod_Type

When I run python eval.py, I get the error: ImportError: home/EAST-master/lanms/adaptor.so: undefined symbol: PyInstanceMethod_Type

GPU usage is zero?

@argman @zxytim Hi, I found a new problem: the Volatile GPU-Util is 0 but the GPU memory usage is about 23 G. I printed the running log and saw that the model keeps loading the dataset the whole time. Why does the model not do any actual computation on the GPU?

Possible error in locality_aware_nms.py

On line 68 of locality_aware_nms.py, you wrote:
return standard_nms(np.array(polys), thres)
However, in the paper, the authors wrote:
return STANDARDNMS(S)
Does this give better performance, or is it just an error?
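
For reference, the overall shape of locality-aware NMS as I understand it from the paper's pseudocode: neighbouring geometries are merged in a single pass, and the merged set is then handed to a standard NMS. The sketch below is a simplification with axis-aligned boxes [x1, y1, x2, y2, score] instead of quadrangles; it assumes positive scores and reuses one threshold for both steps, and none of it is the repository's code.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def weighted_merge(g, p):
    """Score-weighted average of coordinates; the scores are summed."""
    w = g[4] + p[4]
    return [(g[i] * g[4] + p[i] * p[4]) / w for i in range(4)] + [w]


def standard_nms(boxes, thres):
    """Greedy NMS: keep the best box, drop boxes overlapping a kept one."""
    keep = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, kept) <= thres for kept in keep):
            keep.append(box)
    return keep


def nms_locality(boxes, thres=0.3):
    """Merge consecutive overlapping boxes, then run standard NMS."""
    merged = []
    p = None
    for g in boxes:  # boxes are assumed to arrive in row-first order
        if p is not None and iou(g, p) > thres:
            p = weighted_merge(g, p)
        else:
            if p is not None:
                merged.append(p)
            p = g
    if p is not None:
        merged.append(p)
    return standard_nms(merged, thres)
```

In this form the threshold passed to the final standard NMS is simply a parameter that the pseudocode leaves implicit.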

Error occurs when trying eval.py

When I try to use eval.py, an error occurs, and it looks like something may be wrong with adaptor.so (e.g. it was compiled with an unsuitable g++). I'm using g++ 5.4.0.

The error report is like this:

Find 1 images
40795 text boxes before nms
Traceback (most recent call last):
  File "eval.py", line 194, in <module>
    tf.app.run()
  File "/home/aqua/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "eval.py", line 160, in main
    boxes, timer = detect(score_map=score, geo_map=geometry, timer=timer)
  File "eval.py", line 98, in detect
    boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
  File "/home/aqua/EAST/lanms/__init__.py", line 12, in merge_quadrangle_n9
    from .adaptor import merge_quadrangle_n9 as nms_impl
ImportError: /home/aqua/EAST/lanms/adaptor.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEmmPKcm

Can you help me fix it?

slim.get_trainable_variables(),

Hello argman!
My TensorFlow version is 1.01, but I encounter the following problem:

  File "multigpu_train.py", line 135, in main
    variable_restore_op = slim.assign_from_checkpoint_fn(FLAGS.pretrained_model_path, slim.get_trainable_variables(),
AttributeError: 'module' object has no attribute 'get_trainable_variables'
I checked the TensorFlow slim API in IPython, and the function "get_trainable_variables" is not available in my version.

So maybe you should consider upgrading the required TF version.

About the python version

It seems that we need TensorFlow with Python 3.x to support lanms, which eval.py uses. Is it possible to run eval.py using TensorFlow with Python 2.7?

Thanks.

Model Testing Issue

Hello !!!

I've successfully trained the model on an Arabic dataset. However, when I test the model, it just returns the same image as before without any text boxes. Can you please help me with that? I've checked the paths again and again, and they are correct.

Thanks
Burhan Ul Tayyab
