argman / east
A tensorflow implementation of EAST text detector
License: GNU General Public License v3.0
In icdar.py, angle and theta are calculated with np.arctan (L362), so there is a risk of dividing by zero when the denominator is 0.
I changed it to np.arctan2(dy, -dx), and it seems to work well.
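A minimal sketch of the difference, assuming dx and dy are the coordinate differences between two polygon vertices. The helper names are illustrative and are not taken from icdar.py. np.arctan needs an explicit division, which breaks down when dx is 0, whereas np.arctan2 receives the two components separately:

```python
import numpy as np

def angle_arctan(dy, dx):
    """Angle via an explicit division; breaks down when dx == 0."""
    return np.arctan(dy / dx)

def angle_arctan2(dy, dx):
    """Angle via arctan2; there is no division, so dx == 0 is handled."""
    return np.arctan2(dy, dx)

# Vertical edge: dx is exactly zero, arctan2 still returns pi / 2.
vertical = angle_arctan2(1.0, 0.0)

# For dx > 0 the two functions agree.
same = np.isclose(angle_arctan(1.0, 2.0), angle_arctan2(1.0, 2.0))
```

One caveat: np.arctan returns angles between -pi/2 and pi/2, while np.arctan2 returns angles between -pi and pi, so the two results differ by pi when dx is negative. Code that relies on the narrower range may need checking after such a change.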
When I run python3 eval.py or python3 run_demo_server.py, it executes from icdar import restore_rectangle. However, the terminal then shows "1000 training images in ./data/train/", so it appears to load the training images, which takes a very long time. Has anybody else run into this?
I tried to change this code to a class-balanced cross-entropy function.
def dice_coefficient(y_true_cls, y_pred_cls,
                     training_mask):
    '''
    dice loss
    :param y_true_cls:
    :param y_pred_cls:
    :param training_mask:
    :return:
    '''
    eps = 1e-5
    intersection = tf.reduce_sum(y_true_cls * y_pred_cls * training_mask)
    union = tf.reduce_sum(y_true_cls * training_mask) + tf.reduce_sum(y_pred_cls * training_mask) + eps
    loss = 1. - (2 * intersection / union)
    tf.summary.scalar('classification_dice_loss', loss)
    return loss
However, I don't understand why there is a training mask or what its role is. I would be thankful if somebody could explain :) Thanks
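In the function above, training_mask multiplies every term, so a pixel whose mask value is 0 contributes nothing to either the intersection or the union. A NumPy re-statement of the same formula (a sketch for illustration, with a made-up four-pixel example) makes the effect visible:

```python
import numpy as np

def dice_loss_np(y_true, y_pred, mask, eps=1e-5):
    """Same arithmetic as dice_coefficient, written with NumPy."""
    intersection = np.sum(y_true * y_pred * mask)
    union = np.sum(y_true * mask) + np.sum(y_pred * mask) + eps
    return 1.0 - 2.0 * intersection / union

y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([1.0, 0.0, 0.0, 1.0])   # wrong on pixels 1 and 3

full_mask = np.ones(4)
partial_mask = np.array([1.0, 0.0, 1.0, 0.0])   # hide the two wrong pixels

loss_full = dice_loss_np(y_true, y_pred, full_mask)
loss_masked = dice_loss_np(y_true, y_pred, partial_mask)
```

With the full mask the loss is about 0.5; with the two mispredicted pixels masked out it drops to almost 0. A mask like this is typically used to keep regions that should not be judged either way (for example unreadable text) out of the loss.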
Hi,
I am trying to exactly reproduce your released model. Could you provide some more details about the training? From the readme it looks like you use 14 images per GPU, and I see you've mentioned training with 4 GPUs. Was your total batch size then 56? Did you adjust the learning rate at all for such a large batch size, or was the default one used?
Also, you mention using the icdar2013 training set as well. Is there anything special here, or is the sampling between icdar2015 and 2013 1:1?
Are there any more details that you think may be relevant?
Btw, a small typo in the readme: "Thanks for the author's (@zxytim) help! Please site his paper if you find this useful." site -> cite
Thanks for releasing the code. It's great!
The code uses the dice_coefficient loss rather than the balanced cross-entropy loss from the paper. Following the paper, I tried the balanced cross-entropy loss, but the performance is very poor and does not reach the result reported in the paper. I can't figure out why the dice_coefficient loss works better than the balanced cross-entropy loss.
Hi, @argman
I get the following error during training:
,,,
Step 000830, model loss 0.0111, total loss 0.0264, 71.25 seconds/step, 0.39 examples/second
Step 000840, model loss 0.0121, total loss 0.0272, 71.00 seconds/step, 0.39 examples/second
Step 000850, model loss 0.0124, total loss 0.0274, 71.36 seconds/step, 0.39 examples/second
Step 000860, model loss 0.0130, total loss 0.0279, 71.22 seconds/step, 0.39 examples/second
Step 000870, model loss 0.0107, total loss 0.0255, 71.07 seconds/step, 0.39 examples/second
Step 000880, model loss 0.0109, total loss 0.0256, 70.99 seconds/step, 0.39 examples/second
StepTraceback (most recent call last):
  File ".../EAST/icdar.py", line 657, in generator
    score_map, geo_map, training_mask = generate_rbox((new_h, new_w), text_polys, text_tags)
  File ".../EAST/icdar.py", line 520, in generate_rbox
    if point_dist_to_line(p1, new_p2, p0) > point_dist_to_line(p1, new_p2, p3):
  File ".../EAST/icdar.py", line 248, in point_dist_to_line
    return np.linalg.norm(np.cross(p2 - p1, p1 - p3)) / np.linalg.norm(p2 - p1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
However, training proceeds without stopping.
Do you know anything about this problem? Is it a serious problem for the resulting model?
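A traceback followed by continued training is what one would see if the data generator catches per-sample exceptions, prints them, and moves on to the next sample. The sketch below shows that pattern in isolation; robust_samples and the toy process function are hypothetical names for this illustration, not code from icdar.py.

```python
import traceback

def robust_samples(items, process):
    """Yield process(item) for each item, skipping items that raise."""
    for item in items:
        try:
            yield process(item)
        except Exception:
            # Report the failure, then carry on with the next item.
            traceback.print_exc()
            continue

# The second item is None, so `None - 0.5` raises the same kind of
# TypeError as in the report; the other two items still come through.
results = list(robust_samples([1.0, None, 3.0], lambda v: v - 0.5))
```

Under that assumption, the only effect of each error is that one training sample is dropped, which matters little unless it happens for a large share of the data.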
To serve better as a baseline for further research and those who just want a fast text detector, we are planning to polish this repo from "just works" to "works great". Here are our current plans:
As we both have full-time jobs, this roadmap will not be subject to a timetable. If you want to take one of the tasks above, please start a dedicated issue for that task and kindly submit a pull request.
Also, any suggestions are warmly welcomed.
I have trained a model with this command:
python multigpu_train.py --gpu_list=0,1,2 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/backup/EAST/
--text_scale=1024 --training_data_path=/DATA/EAST/data/ --geometry=RBOX --learning_rate=0.0001 --num_readers=12
and I waited until:
Step 007130, model loss 0.0316, total loss 0.0827, 7.33 seconds/step, 5.73 examples/second
First question: should I let it run for more iterations, or is this enough?
Second question: the outputs for all the images seem to be of one size. Why is this happening?
I couldn't see much variation in the output dimensions, so what is missing to be able to detect blocks of text?
I have a lot of different text regions with different dimensions and scales. How can I set the "TEXT SCALE" parameter correctly? How do I choose the right number, and does it depend on the "INPUT SIZE" parameter?
I just want to change the loss function from the dice coefficient to class-balanced cross-entropy, but I still don't understand what to change.
def batch_flatten(x):
    """
    Flatten the tensor except the first dimension.
    """
    shape = x.get_shape().as_list()[1:]
    if None not in shape:
        return tf.reshape(x, [-1, int(np.prod(shape))])
    return tf.reshape(x, tf.stack([tf.shape(x)[0], -1]))
def xentropy(y_true_cls, y_pred_cls,
             training_mask):
    eps = 1e-7
    z = batch_flatten(y_pred_cls)
    y = tf.cast(batch_flatten(y_true_cls), tf.float32)
    count_neg = tf.reduce_sum((1. - y) * training_mask)
    count_pos = tf.reduce_sum(y * training_mask)
    beta = count_neg / (count_neg + count_pos)
    loss_pos = -beta * tf.reduce_mean(y * tf.log(z + eps))
    loss_neg = (1. - beta) * tf.reduce_mean((1. - y) * tf.log(1. - z + eps))
    cost = tf.subtract(loss_pos, loss_neg, name=name)
Would this code work?
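As posted, the snippet has a few visible gaps: name is never defined, the function does not return cost, and training_mask is used for the pixel counts but not for the loss terms themselves. The NumPy sketch below shows one way the same class-balanced formula could look with those gaps closed. It is an illustration of the arithmetic under those assumptions, not the repository's loss and not a tested TensorFlow drop-in.

```python
import numpy as np

def balanced_xentropy_np(y_true, y_pred, mask, eps=1e-7):
    """Class-balanced cross-entropy over the pixels where mask == 1."""
    y = y_true.astype(np.float64)
    z = np.clip(y_pred.astype(np.float64), eps, 1.0 - eps)
    count_pos = np.sum(y * mask)
    count_neg = np.sum((1.0 - y) * mask)
    # beta is the share of negative pixels; it up-weights the rare positives.
    beta = count_neg / (count_neg + count_pos + eps)
    per_pixel = -(beta * y * np.log(z)
                  + (1.0 - beta) * (1.0 - y) * np.log(1.0 - z))
    # Masked mean: ignored pixels affect neither numerator nor denominator.
    return np.sum(per_pixel * mask) / (np.sum(mask) + eps)

y_true = np.array([1, 0, 0, 0])
mask = np.array([1.0, 1.0, 1.0, 0.0])          # last pixel is ignored
good = np.array([0.9, 0.1, 0.1, 0.1])
good_but_noisy = np.array([0.9, 0.1, 0.1, 0.9])  # differs only where masked
bad = np.array([0.1, 0.9, 0.9, 0.9])

loss_good = balanced_xentropy_np(y_true, good, mask)
loss_noisy = balanced_xentropy_np(y_true, good_but_noisy, mask)
loss_bad = balanced_xentropy_np(y_true, bad, mask)
```

Here loss_good and loss_noisy are equal because the two predictions differ only at the masked pixel, and loss_bad is larger than both.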
Hello Argman!!!!
I hope you are in the best of health. Here are some results from a model trained on an Arabic dataset.
Our main purpose was to detect Urdu news tickers, so I'm sending you those too.
You can use the photos any way you want; just cite my GitHub link.
I can also give you the model if you want!
Thanks Again
Burhan Ul Tayyab
Hi, @argman
I want to train the model on my own dataset. What do I need to do?
How do I create the gt text files, and what does each field stand for?
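Judging from load_annoataion in icdar.py, which is quoted in another issue on this page, each line of a gt file holds eight comma-separated coordinates (x1, y1 through x4, y4, the four corners of the text region) followed by a label, and a label of * or ### marks a region that is flagged rather than treated as normal text. A small sketch of producing such lines follows; format_gt_lines is a hypothetical helper, and the example coordinates are made up.

```python
import csv
import io

def format_gt_lines(regions):
    """regions: list of (quad, label); quad is four (x, y) corner points."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for quad, label in regions:
        # Flatten [(x1, y1), ..., (x4, y4)] into x1, y1, ..., x4, y4.
        coords = [int(v) for point in quad for v in point]
        writer.writerow(coords + [label])
    return buf.getvalue()

example = format_gt_lines([
    ([(10, 20), (110, 20), (110, 60), (10, 60)], "hello"),
    ([(5, 80), (40, 80), (40, 95), (5, 95)], "###"),   # flagged region
])
```

Using the csv module keeps the output readable by the csv.reader call in load_annoataion even when a label itself contains a comma.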
@argman hi, in eval.py, with the default max_side_line=2400 the inference result is strange: large text is not detected, while even very small text is. However, when max_side_line is set to 512, the same as INPUT_SIZE, very large text is detected correctly but small text is ignored. Thanks!
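The trade-off can be reasoned about with a small sketch. It assumes that evaluation shrinks an image only when its longer side exceeds the limit and then rounds both sides down to a multiple of 32; the function name, the parameter names, and the rounding rule are assumptions for illustration and not a copy of eval.py.

```python
def resized_shape(h, w, max_side=2400, multiple=32):
    """Return the (height, width) an image would be resized to."""
    longest = max(h, w)
    ratio = float(max_side) / longest if longest > max_side else 1.0
    new_h, new_w = int(h * ratio), int(w * ratio)
    # Round down to a multiple of 32, but never below one multiple.
    new_h = max(multiple, new_h // multiple * multiple)
    new_w = max(multiple, new_w // multiple * multiple)
    return new_h, new_w

# A 720x1280 frame is left at nearly full size under a 2400 limit ...
large_limit = resized_shape(720, 1280, max_side=2400)
# ... while a 1000x2000 image is halved under a 1000 limit.
small_limit = resized_shape(1000, 2000, max_side=1000)
```

Under these assumptions a high limit leaves text at its original pixel size, so text much larger than what the model saw in 512-pixel training crops may be missed, while a low limit shrinks everything and pushes small text below the detectable size. Running inference at more than one scale is a common way around this.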
Hi, have you tried PVANet as the base network? I tried PVANet using Caffe but ran into an overfitting problem.
My training set is 950 images from the ICDAR 2015 training set (the other 50 images are used as a validation set) plus 229 images from ICDAR 2013.
The model is trained with online data augmentation, which includes scaling and rotations between ±30°. The IoU loss overfits badly: when the training IoU loss has descended to 0.25, the validation IoU loss still stays high at 0.7. I think I have checked everything, and I still cannot solve this problem. Please help me, Mr. Argman! I have spent two months on this problem.... 555555
In Section 3.3.1, the reference length is r_i = min(D(p_i, p_((i mod 4)+1)), D(p_i, p_(((i+3) mod 4)+1))). When i = 1, r_1 = min(D(p_1, p_2), D(p_1, p_1)), so r_1 = 0, doesn't it? Can you explain this in more detail, or point to the part of the code that computes it? Thanks!
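The arithmetic in the question is right: with 1-based indices, ((i+3) mod 4)+1 maps i back to itself, whereas ((i+2) mod 4)+1 gives the previous vertex (4, 1, 2, 3 for i = 1..4). Reading the reference length as the shorter of the two edges that meet at a vertex, it can be sketched as follows with 0-based indices. The function name is made up, and this is not the repository's code.

```python
import numpy as np

def reference_lengths(poly):
    """For each of the 4 vertices, the length of its shorter adjacent edge."""
    poly = np.asarray(poly, dtype=np.float64)
    lengths = []
    for i in range(4):
        d_next = np.linalg.norm(poly[i] - poly[(i + 1) % 4])   # edge to next vertex
        d_prev = np.linalg.norm(poly[i] - poly[(i - 1) % 4])   # edge to previous vertex
        lengths.append(min(d_next, d_prev))
    return lengths

# A 4 x 2 rectangle: every vertex touches one edge of length 4 and one of length 2.
r = reference_lengths([(0, 0), (4, 0), (4, 2), (0, 2)])
```

With this reading no reference length is zero unless two neighbouring vertices coincide.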
I changed the loss function and tried to train on my data with EAST. However, when I looked at how training was going, I found something odd.
The pictures above are the input data and the corresponding score map. Shouldn't the gt area be black and everything else white, rather than what the picture shows? (In the picture, the gt area is white and everything else is black.)
I have .jpg images in a folder and I am trying to run eval.py. I have trained the model and have a checkpoint file.
The command I am using is: python3 eval.py --test_data_path=/home/kamranjanjua/EAST/icdarData/ --gpu_list=0 --checkpoint_path=/modelse/ --output_path=/home/kamranjanjua/EAST/output_icdar/
The icdarData folder contains the images.
However, when I run it, I get this error: raise RuntimeError('Cannot compile lanms: {}'.format(BASE_DIR))
Is there any solution?
def load_annoataion(p):
    text_polys = []
    text_tags = []
    if not os.path.exists(p):
        return np.array(text_polys, dtype=np.float32)
    with open(p, 'r') as f:
        reader = csv.reader(f)
        for line in reader:
            label = line[-1]
            # strip BOM. \ufeff for python3, \xef\xbb\bf for python2
            line = [i.strip('\ufeff').strip('\xef\xbb\xbf') for i in line]
            x1, y1, x2, y2, x3, y3, x4, y4 = list(map(float, line[:8]))
            text_polys.append([[x1, y1], [x2, y2], [x3, y3], [x4, y4]])
            if label == '*' or label == '###':
                text_tags.append(True)
            else:
                text_tags.append(False)
        return np.array(text_polys, dtype=np.float32), np.array(text_tags, dtype=np.bool)
Here, why is text_tag set to True, not False, when the label is '*' or '###'? Shouldn't it be the other way round? And if so, what happens when the label contains actual text?
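One reading that makes the code consistent is that the tag means "ignore this region" rather than "this is text": in ICDAR-style annotations ### marks text that is too unclear to be judged. Under that reading a True tag would switch the region off in the training mask, while regions with a real transcription stay on. The sketch below illustrates the idea with bounding boxes standing in for a proper polygon fill; build_training_mask is a hypothetical helper, not the repository's generate_rbox.

```python
import numpy as np

def build_training_mask(shape, polys, ignore_tags):
    """Return a mask of ones with zeros over every ignored region."""
    mask = np.ones(shape, dtype=np.uint8)
    for poly, ignore in zip(polys, ignore_tags):
        if not ignore:
            continue   # normal text: stays part of the loss
        xs, ys = poly[:, 0], poly[:, 1]
        x0, x1 = int(xs.min()), int(np.ceil(xs.max()))
        y0, y1 = int(ys.min()), int(np.ceil(ys.max()))
        mask[y0:y1, x0:x1] = 0   # bounding box used instead of polygon fill
    return mask

polys = np.array([
    [[2, 2], [5, 2], [5, 4], [2, 4]],   # labelled '###' -> ignored
    [[6, 6], [9, 6], [9, 8], [6, 8]],   # labelled with real text
], dtype=np.float32)
tags = np.array([True, False])

mask = build_training_mask((10, 10), polys, tags)
```

Only the first region is zeroed, so the second one, whose label is ordinary text, still counts towards the loss.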
Hi,
I can't reproduce the 80.83 F1-score when I directly run
python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ \
--text_scale=512 --training_data_path=/data/ocr/icdar2015/ --geometry=RBOX --learning_rate=0.0001 --num_readers=24 \
--pretrained_model_path=/tmp/resnet_v1_50.ckpt
on the ICDAR2015+2013 training images.
Could you please tell me the parameter configuration of the experiment that achieves the 80.83 F1-score?
Specifically: 1. the batch size per GPU; 2. the number of GPUs; 3. the initial learning rate; 4. the number of steps you trained the model for.
Thank you very much.
Thanks for sharing this excellent repo. I noticed that the classification loss function used in the code is different from the one in the paper: you use the dice coefficient instead of cross-entropy. Could you provide more detail on this part?
The function restore_rectangle_rbox in icdar.py is so complicated that, even after spending a lot of time reading and studying it, I still can't understand it. Could you provide more information or comments about this function?
When I train the model on the ICDAR2015 dataset, I get this error:
Cross point does not exist
Traceback (most recent call last):
  File "/home/lairf/EAST/icdar.py", line 657, in generator
    score_map, geo_map, training_mask = generate_rbox((new_h, new_w), text_polys, text_tags)
  File "/home/lairf/EAST/icdar.py", line 520, in generate_rbox
    if point_dist_to_line(p1, new_p2, p0) > point_dist_to_line(p1, new_p2, p3):
  File "/home/lairf/EAST/icdar.py", line 248, in point_dist_to_line
    return np.linalg.norm(np.cross(p2 - p1, p1 - p3)) / np.linalg.norm(p2 - p1)
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
Why does this happen? Does it influence the training result?
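The message "Cross point does not exist" followed by a NoneType error suggests that an intersection of two lines was requested for lines that are parallel, that the lookup returned None, and that the None was then used in arithmetic. The sketch below reproduces that chain with hypothetical helpers (cross_point and safe_dist are illustrative names, and lines are given as (a, b, c) with a*x + b*y + c = 0, which is an assumption of this example).

```python
import numpy as np

def cross_point(line1, line2):
    """Intersect two lines a*x + b*y + c = 0; None if they are parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None   # parallel lines have no single cross point
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return np.array([x, y])

def safe_dist(p1, p2, p3):
    """Distance from p3 to the line through p1 and p2, or None if a point is missing."""
    if p1 is None or p2 is None or p3 is None:
        return None
    d, e = p2 - p1, p1 - p3
    return abs(d[0] * e[1] - d[1] * e[0]) / np.linalg.norm(d)

parallel = cross_point((1.0, 0.0, -1.0), (1.0, 0.0, -3.0))   # x = 1 and x = 3
crossing = cross_point((1.0, 0.0, -1.0), (0.0, 1.0, -2.0))   # x = 1 and y = 2
```

A guard such as the one in safe_dist lets the caller skip a degenerate polygon explicitly instead of failing on `None - float`.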
Hi!!!
I have been trying to start training this model on Arabic datasets, but as soon as I try to start it, it gives me this:
"poly in wrong direction "
Each annotation in the dataset consists of 9 values: the coordinates (x0, y0) to (x3, y3), in clockwise order, and one word describing the selected region.
I am using Tensorflow v1.2 with Python 3.5, and I have successfully run the demo on my server.
Could you please guide me on this issue?
Thanks
Burhan Ul Tayyab
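A message like this usually comes from an orientation check on the four vertices. The sketch below shows one such check based on a signed area. The sign convention (negative area for vertices that run clockwise on screen, with y pointing down) and the re-ordering rule are assumptions of this example, and both function names are made up.

```python
import numpy as np

def signed_area(poly):
    """Signed area of a polygon given as an (N, 2) array of vertices."""
    x, y = poly[:, 0], poly[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    return float(np.sum((x_next - x) * (y_next + y)) / 2.0)

def fix_direction(poly):
    """Reverse the vertex order (keeping vertex 0 first) if the area is positive."""
    if signed_area(poly) > 0:
        return poly[[0, 3, 2, 1]]
    return poly

# Image coordinates: x to the right, y downwards.
clockwise = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)
counter = np.array([[0, 0], [0, 1], [1, 1], [1, 0]], dtype=np.float32)

area_cw = signed_area(clockwise)
area_ccw = signed_area(counter)
fixed = fix_direction(counter)
```

If the check in the training code works along these lines, annotations that trigger the message are ordered the opposite way from what is expected, and swapping the second and fourth vertex is enough to correct them.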
Hi, @argman
I have the following problem during training:
....
2017-08-31 18:22:46.750620: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-08-31 18:22:46.750670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-08-31 18:22:46.750683: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Do you know this problem? (I used 2 GPUs.)
I have made a Python 2 compatible fork.
https://github.com/AKSHAYUBHAT/EAST
I have a dataset of 12982 images.
When I started training on it, I used 24 readers, but all I see is:
) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:23:00.0)
12982 training images in /media/rmmal/data/East_data/
[the line above is printed 24 times in total]
GPU utilization is 95% and all 24 cores are at 100%.
It has been 1 hour so far and nothing has changed.
Has something gone wrong?
I ran the east_icdar2015_resnet_v1_50_rbox model/model.ckpt-49491 on the ICDAR2015-TRW public_test_data and then evaluated the result with the detection_eval_tool, but I only get an F1-score of 0.4697. I don't know what is wrong.
When I run python eval.py, I get this error: ImportError: home/EAST-master/lanms/adaptor.so: undefined symbol: PyInstanceMethod_Type
@argman @zxytim Hi, I found a new problem: the Volatile GPU-Util is 0 but GPU Memory-Usage is about 23 GB. I printed the running log and saw that the model keeps loading the dataset the whole time. Why does the model not do any actual computation on the GPU?
nvidia-smi info:
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   5  Tesla M60 24GB       On  | 0000:85:00.0     Off |                  Off |
| N/A   35C    P0    56W / 250W |  23377MiB / 24472MiB |      0%      Default |
In line 68 of locality_aware_nms.py, you have written:
return standard_nms(np.array(polys), thres)
However, in the paper, the authors wrote:
return STANDARDNMS(S)
Does this give better performance, or is it just an error?
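The two lines look like the same final step: the merged candidates are handed to a standard NMS, which needs an overlap threshold in any concrete implementation even if pseudocode leaves it implicit. The sketch below shows that overall flow (merge neighbouring candidates in scan order, then run standard NMS on what is left). To stay short it uses axis-aligned boxes [x1, y1, x2, y2, score] instead of quadrangles; that simplification and all helper names are assumptions of this example, not the repository's implementation.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes [x1, y1, x2, y2, score]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def weighted_merge(g, p):
    """Average the coordinates weighted by score; the scores add up."""
    out = np.zeros(5)
    out[:4] = (g[4] * g[:4] + p[4] * p[:4]) / (g[4] + p[4])
    out[4] = g[4] + p[4]
    return out

def standard_nms(boxes, thres):
    """Keep the best-scoring box, drop boxes overlapping it, repeat."""
    order = list(np.argsort(boxes[:, 4])[::-1])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thres]
    return boxes[keep]

def nms_locality(boxes, thres=0.3):
    """Merge consecutive overlapping boxes, then apply standard NMS."""
    merged, p = [], None
    for g in boxes:
        if p is not None and iou(g, p) > thres:
            p = weighted_merge(g, p)
        else:
            if p is not None:
                merged.append(p)
            p = g
    if p is not None:
        merged.append(p)
    if not merged:
        return np.zeros((0, 5))
    return standard_nms(np.array(merged), thres)

candidates = np.array([
    [0, 0, 10, 10, 0.9],
    [1, 0, 11, 10, 0.8],     # overlaps the first box heavily
    [50, 50, 60, 60, 0.7],   # far away from both
], dtype=np.float64)
result = nms_locality(candidates, thres=0.3)
```

In this example the first two candidates are merged into one box with the summed score, and the third survives unchanged.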
When I try to use eval.py, an error occurs, and it looks like something may be wrong with adaptor.so (e.g. it was compiled with an unsuitable g++). I'm using g++ 5.4.0.
The error report looks like this:
Find 1 images
40795 text boxes before nms
Traceback (most recent call last):
  File "eval.py", line 194, in <module>
    tf.app.run()
  File "/home/aqua/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "eval.py", line 160, in main
    boxes, timer = detect(score_map=score, geo_map=geometry, timer=timer)
  File "eval.py", line 98, in detect
    boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
  File "/home/aqua/EAST/lanms/__init__.py", line 12, in merge_quadrangle_n9
    from .adaptor import merge_quadrangle_n9 as nms_impl
ImportError: /home/aqua/EAST/lanms/adaptor.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEmmPKcm
Can you help me fix it?
Hello argman!
My tensorflow version is 1.01, but I encounter the following problem:
File "multigpu_train.py", line 135, in main
variable_restore_op = slim.assign_from_checkpoint_fn(FLAGS.pretrained_model_path, slim.get_trainable_variables(),
AttributeError: 'module' object has no attribute 'get_trainable_variables'
I checked the tensorflow slim API in IPython, and the function "get_trainable_variables" is not available in my version.
So maybe you should consider raising the required TF version.
It seems that we have to use tensorflow with python3.x to support lanms, which eval.py relies on. Is it possible to use tensorflow with python2.7 to run eval.py?
Thanks.
Hello !!!
I've successfully trained the model on an Arabic dataset. However, when I try to test the model, it just returns the same image as before, without any text boxes. Can you please help me with that? I've checked the paths again and again, and they are correct.
Thanks
Burhan Ul Tayyab