
e2e-mlt's People

Contributors

alwc, michalbusta, pcampr, yash0307

e2e-mlt's Issues

per-feature text/no-text confidence score

Dear @MichalBusta ,
According to your paper, the prediction output of the Text Localization part consists of seven channels:
a per-feature text/no-text confidence score, the coordinates of the bounding box, and an angle parameter.

Then I checked your code here:

E2E-MLT/models.py

Lines 428 to 430 in 2858358

segm_pred = torch.sigmoid(self.act(x))
rbox = torch.sigmoid(self.rbox(x)) * 128
angle = torch.sigmoid(self.angle(x)) * 2 - 1

and couldn't find any value corresponding to that confidence score.

Can you help to explain it?
Thanks a lot,

Kei,
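
For reference, my own reading of those three lines (an assumption, not something confirmed by the author): the sigmoid segm_pred map is itself the per-feature text/no-text confidence, it just isn't given a separate name in the code. A minimal sketch of using it that way:

# Hedged sketch: treat the sigmoid segmentation map as the per-feature
# text/no-text confidence; segm_pred is assumed to be the tensor produced
# by the first of the three lines quoted above.
def text_confidence(segm_pred, threshold=0.5):
    confidence = segm_pred                 # values in [0, 1] after the sigmoid
    text_mask = confidence > threshold     # hypothetical cut-off for "is text"
    return confidence, text_mask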

loss always -2.00

When I train on my own dataset, the values of loss and seg_loss are always equal to -2.00, and the other values are 0. Is there a problem? I have not changed anything except the dataset and the batch_size (32), and training starts from scratch.

seg_loss: -1.101

When I train on my own data, seg_loss is always a negative number. Is this normal? Thanks.

I am unable to load the e2e-mlt model, incompatible tensor sizes. Any help?

Traceback (most recent call last):
File "/home/rmunshi/E2E-MLT/net_utils.py", line 30, in load_net
v.copy_(param)
RuntimeError: The size of tensor a (8400) must match the size of tensor b (7500) at non-singleton dimension 0
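
A common workaround (not part of this repo) is to copy only the checkpoint tensors whose shapes match the current model, so a checkpoint with a differently sized output layer (8400 vs. 7500 here, i.e. a different codec size) can still initialise the rest of the network. A minimal sketch, assuming the .h5 checkpoint stores one dataset per parameter name, which is the layout net_utils.load_net appears to iterate over:

import h5py
import numpy as np
import torch

def load_net_skip_mismatch(fname, net):
    # Copy matching parameters; report and skip everything else.
    h5f = h5py.File(fname, mode='r')
    for k, v in net.state_dict().items():
        if k not in h5f:
            print('missing in checkpoint:', k)
            continue
        param = torch.from_numpy(np.asarray(h5f[k]))
        if v.shape != param.shape:
            print('shape mismatch, skipping:', k, tuple(param.shape), '->', tuple(v.shape))
            continue
        v.copy_(param)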

Some problems when I train the model

When I use ICDAR2015 to train the model:
Inside the file sample_train_data/MLT/trainMLT.txt are the icdar2015 localization training images, such as icdar-2015-Ch4/Train/img_1.jpg, and inside sample_train_data/MLT_CROPS/gt.txt are the icdar2015 recognition training images, such as word_1.png, "Genaxis Theatre".
I have not changed other paths. When I train the model by:
python3 train.py -train_list=sample_train_data/MLT/trainMLT.txt -batch_size=8 -num_readers=5 -debug=0 -input_size=512 -ocr_batch_size=256 -ocr_feed_list=sample_train_data/MLT_CROPS/gt.txt
the output is:
root@10ca3ad2a7d1:/home/zy/jupyter/recognition/spotter/E2E-MLT-master# python3 train.py -train_list=sample_train_data/MLT/trainMLT.txt -batch_size=8 -num_readers=5 -debug=0 -input_size=512 -ocr_batch_size=256 -ocr_feed_list=sample_train_data/MLT_CROPS/gt.txt
Using E2E-MLT
loading model from e2e-mlt.h5
e2e-mlt.h5
1000 training images in sample_train_data/MLT/trainMLT.txt
1000 training images in sample_train_data/MLT/trainMLT.txt
1000 training images in sample_train_data/MLT/trainMLT.txt
1000 training images in sample_train_data/MLT/trainMLT.txt
1000 training images in sample_train_data/MLT/trainMLT.txt
4468 training images in sample_train_data/MLT_CROPS/gt.txt
4468 training images in sample_train_data/MLT_CROPS/gt.txt
I waited for half an hour, but there was no more output. Can you help me? Thank you.

about codec.txt

Hi @MichalBusta, there are 7398 characters in codec.txt. In models.py, why is self.conv11 = Conv2d(256, 8400, (1, 1), padding=(0,0))? I think it should be 7399.

data generator random seed specification not working?

I'd like to be able to train in a repeatable manner; in particular, generate batches of images such that every time I start training from scratch, the images are presented for training in the same order. The data_util.GeneratorEnqueuer class accepts a random_seed parameter, but even when I set it to a fixed number, the images are in a different order each time I train. I'm only using one reader. @MichalBusta, any ideas?
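
For what it is worth, one thing that helps isolate the problem is seeding every RNG the generator and augmentation code might touch before the enqueuer is built; whether data_util actually forwards random_seed to its reader processes is a separate question (each worker process generally needs its own seed). The helper below is my own, not part of this repo:

import random
import numpy as np
import torch

def seed_everything(seed=1234):
    # Seed every RNG the data pipeline might use. Note that reader/worker
    # processes spawned later still need to be seeded inside the worker.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)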

Your EAST implementation vs. argman/EAST

I notice that your EAST implementation performs much better than argman/EAST when there are large fonts.

Here is the prediction from your model:

screen shot 2018-11-16 at 11 32 27 am

And here is the prediction from argman/EAST

screen shot 2018-11-16 at 11 32 45 am

Do you mind sharing the key components behind this improved detection? Thanks!

Need help to understand pts=np.roll(pts, 2)

I'm digging deeper into your repo and I cannot understand the code below:

E2E-MLT/data_gen.py

Lines 122 to 123 in 2858358

if is_icdar:
    pts = np.roll(pts, 2)

I've checked some examples and found out that it will shift an array like:
[1224. 2041. 1537. 2041. 1537. 2134. 1224. 2134.]
into:
[1224. 2134. 1224. 2041. 1537. 2041. 1537. 2134.]

I know how the numpy.roll function works, but I don't understand why you did it only for the ICDAR dataset.
Can you help to explain this?
Thanks,
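
For anyone else puzzled by this, a small standalone demo of what the roll does to an 8-value ICDAR polygon: np.roll(pts, 2) moves the last (x, y) pair to the front, so it changes which corner of the quadrilateral is listed first without changing the polygon itself.

import numpy as np

pts = np.array([1224., 2041., 1537., 2041., 1537., 2134., 1224., 2134.])
print(np.roll(pts, 2))   # the last (x, y) pair becomes the first corner
# [1224. 2134. 1224. 2041. 1537. 2041. 1537. 2134.]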

403 Forbidden

Hello, MichalBusta
I tried to download the model, but I got the following error:

Forbidden
You don't have permission to access /public_datasets/SyntText/e2e-mlt.h5 on this server.
Apache/2.4.25 (Debian) Server at ptak.felk.cvut.cz Port 80

score_map vs. training_masks

Dear @MichalBusta,
I'm trying to understand the difference between score_map and training_masks, which are generated from:

return score_map, geo_map, training_mask, gt_idx, gt_out, labels_out

I have some questions:

  1. Which one is more important for training the model? (I checked and found that score_map always contains all ground-truth boxes, but training_mask does not.)
  2. Why do you generate both score_map and training_masks? (From my attempt to understand your work, I think score_map alone is enough. Please correct me if I have misunderstood, thanks!)
  3. Why do you add a box to training_mask only if its text contains " ", according to these lines of code:

    E2E-MLT/data_gen.py

    Lines 478 to 496 in 2858358

    if txt.find(" ") != -1:
        pts_line = np.copy(pts2)
        c1 = (pts[1] + pts[2]) / 2
        dw1 = (pts[2] - c1) / 1.2
        pts_line[2] = c1 + dw1
        dw2 = (pts[1] - c1) / 1.2
        pts_line[1] = c1 + dw2
        c1 = (pts[0] + pts[3]) / 2
        dw1 = (pts[3] - c1) / 1.2
        pts_line[3] = c1 + dw1
        dw2 = (pts[0] - c1) / 1.2
        pts_line[0] = c1 + dw2
        cv2.fillPoly(training_mask, np.asarray([pts_line.round()], np.int32), 0)

    Hope to hear from you soon,
    Thanks in advance,

Specific format of annotation

Can you please explain the annotation format? Usually there is only 1 integer and 4 float numbers, but I see 5 float numbers here.

Error when running demo.py

Hello @MichalBusta
I have a problem running demo.py
Apparently I cannot import get_boxes. The warp-ctc loss was installed without a problem, however this is the error I am getting:

make: Entering directory '/home/caffe/E2E-MLT/nms'
g++ -o adaptor.so -I include -std=c++11 -O3 -I/home/caffe/anaconda3/envs/pytorch36/include/python3.6m -I/home/caffe/anaconda3/envs/pytorch36/include/python3.6m -Wno-unused-result -Wsign-compare -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -fdebug-prefix-map==/usr/local/src/conda/- -fdebug-prefix-map==/usr/local/src/conda-prefix -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -flto -DNDEBUG -fwrapv -O3 -Wall -L/home/caffe/anaconda3/envs/pytorch36/lib/python3.6/config-3.6m-x86_64-linux-gnu -L/home/caffe/anaconda3/envs/pytorch36/lib -lpython3.6m -lpthread -ldl -lutil -lrt -lm -Xlinker -export-dynamic adaptor.cpp include/clipper/clipper.cpp --shared -fPIC
g++: error: unrecognized command line option ‘-fno-plt’
Makefile:10: recipe for target 'adaptor.so' failed
make: *** [adaptor.so] Error 1
make: Leaving directory '/home/caffe/E2E-MLT/nms'
Traceback (most recent call last):
File "demo.py", line 10, in <module>
from nms import get_boxes
File "/home/caffe/E2E-MLT/nms/__init__.py", line 8, in <module>
raise RuntimeError('Cannot compile nms: {}'.format(BASE_DIR))
RuntimeError: Cannot compile nms: /home/caffe/E2E-MLT/nms

Any idea? Thanks in advance for your help.

Detect rectangle box only

Hi @MichalBusta,
First of all, thanks for your great work!
My training dataset has only samples with rectangular boxes! After 200 epochs of training, the model predicts like the image below:
Annotation 2019-05-23 142022
Because of wrong angle detection, this leads to wrong OCR results!

So I am thinking about a model with no angle (or a fixed angle) that predicts rectangular boxes only.
I know that your model with angle is really awesome, but in my case I only need to predict rectangular boxes.
How can I do it?

Thanks a lot!
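
Not an official answer, but one pragmatic workaround is to zero the predicted angle map before the boxes are decoded, so every detection is treated as axis-aligned. The name angle_pred below is a stand-in for whatever the angle output is called at your decoding step; treat this as a sketch, not the author's method:

import torch

def force_axis_aligned(angle_pred):
    # Replace predicted angles with zeros so the decoded rotated boxes
    # degenerate into axis-aligned rectangles.
    return torch.zeros_like(angle_pred)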

Function of ocr_feed_list

hi @MichalBusta
I am sorry for this naive question.

I want to fine tune your model e2e-mlt.h5 with my dataset. For this, I have various images along with the ground truth of the texts in the image.

Now, the train.py has 2 parameters:

  • -train_list
  • -ocr_feed_list

-train_list: points to the directory where you have images along with their gt
-ocr_feed_list: points to the directory where you have cropped words

Is having cropped words mandatory for training? And is there any way to train the model without using the cropped word images (i.e. only using scene images with the gt of all the text in them)?

My priority is not only to achieve better text localisation but also better text recognition, hence the OCR branch needs to be trained too, but using the gt of the text present in the scene image and not separate cropped word images.

process_boxes() bug: does not iterate over all samples in the batch

There appears to be a bug in the process_boxes() function in train.py.

Line 70 should be:

 for bid in range(iou_pred.size(0)):

not

 for bid in range(iou_pred[0].size(0)):

In the main function of train.py, iou_pred[0] is passed to process_boxes(), not iou_pred. As written, it will not iterate over all samples in the batch, but over the length of the second dimension of iou_pred[0] (which is 1, the number of channels of the score map, a single black/white channel in this case).

What does label named ### mean?

Dear @MichalBusta,
Some lines in ground-truth
522,145,535,144,537,160,523,161,###
535,141,598,137,599,155,537,159,BUTTER
608,136,649,133,650,150,609,153,TRUE
649,132,723,125,724,143,650,150,###

What does ### mean? What is its meaning for training?

Thanks,

Just curious what are the bbox values in the raw dataset labels?

The MLT format labels have been given. Thanks a lot!

But I am curious: what do these floats in the raw label mean?
0 0.46809631347656255 0.8593862826680818 0.0739853187168 0.0808596013931 -0.0298556635598 منذ

The first number is the language label, e.g., 0 for Arabic, 2 for Chinese...
The 2nd and 3rd numbers are the center point of the bounding box.
But I have no idea about the others; thanks in advance if someone can clear this up.
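
In case it helps others reading this thread, here is how I would parse such a line. My guess (not confirmed here) is that the two remaining floats are the normalised box width and height and the last float is the rotation angle; the parser itself is hypothetical, not code from the repo:

def parse_synth_label(line):
    # Hypothetical parser for a raw label line such as:
    # "0 0.4680... 0.8593... 0.0739... 0.0808... -0.0298... منذ"
    parts = line.split()
    lang = int(parts[0])                        # language id (0 = Arabic, 2 = Chinese, ...)
    cx, cy = float(parts[1]), float(parts[2])   # normalised box centre
    w, h = float(parts[3]), float(parts[4])     # assumed: normalised width / height
    angle = float(parts[5])                     # assumed: rotation angle
    text = ' '.join(parts[6:])                  # transcription
    return lang, cx, cy, w, h, angle, text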

A small bug in ocr_utils.py

Hello Busta,
Maybe there is a bug in the function print_seq_ext (ocr_utils.py, line 28):
if c > 3 and c < len(codec):
should be:
if c > 3 and c < (len(codec)+4):

training problem: the program is stopped

When I train, I meet the problem below.

CUDA_VISIBLE_DEVICES="0,1" python3 train_ocr.py -train_list=sample_train_data/MLT/trainMLT.txt -valid_list=data/valid/valid.txt -model=e2e-mlt.h5 -debug=1 -batch_size=8 -num_readers=5

7398
loading model from e2e-mlt.h5
e2e-mlt.h5
2 training images in sample_train_data/MLT/trainMLT.txt
2 training images in sample_train_data/MLT/trainMLT.txt
2 training images in sample_train_data/MLT/trainMLT.txt
2 training images in sample_train_data/MLT/trainMLT.txt
2 training images in sample_train_data/MLT/trainMLT.txt

The program seems to have stopped. It is most likely a problem with the data. Please help me if you have solved this problem, thank you!

Questions on training the model from scratch

First of all, congrats on getting the best paper award at the 3rd International Workshop on Robust Reading and releasing the v2 of the paper on arXiv!

A few days ago I started trying to train the model from scratch and I've come across some questions:

1/ Why do you use instance norm instead of other normalization methods like batch norm, group norm, etc.? Have you compared the results of using different normalization methods?

1.b/ How come for conv_dw_in the InstanceNorm2d layer doesn't set affine=True (by default affine=False, which means there are no learnable parameters for that instance norm layer), while the rest of your InstanceNorm2d layers have affine=True? (A quick standalone check of this flag is sketched below.)

2/ How long does it take to train the e2e-mlt.h5 model? Was it trained on multiple GPUs?

3/ Have you tried using transfer learning, like how argman/EAST uses a pretrained ResNet-50?

Thanks!
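
Regarding question 1.b, a quick standalone check of what the affine flag changes in PyTorch: nn.InstanceNorm2d defaults to affine=False, so it carries no learnable weight or bias, while affine=True adds a per-channel weight and bias.

import torch.nn as nn

plain = nn.InstanceNorm2d(64)                # affine=False by default: no learnable parameters
learned = nn.InstanceNorm2d(64, affine=True) # adds per-channel weight and bias

print(sum(p.numel() for p in plain.parameters()))    # 0
print(sum(p.numel() for p in learned.parameters()))  # 128 (64 weights + 64 biases)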

process_boxes() unknown chars and misidentifies chars

I started training again and noticed that many characters are not being identified as existing in codec_rev. The data is from icdar2015, icdar2017 (MLT) and icdar2019 (MLT), and the provided codec.txt is used.

Stranger still, the same error (unknown char) is showing up for data from icdar2015, which is composed entirely of English characters.

unknown-chars

As shown by the image above, the character "थ" is not found in codec_rev, yet is reported as coming from the GT of image 277 from icdar2015. However, this is the actual GT of image 277:

gt-277

Is there some file encoding for codec.txt that I must set? Can you provide some information about why this is happening?
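
For what it is worth, this is how I would check whether a character really is in codec.txt when the file is read as UTF-8. The assumption that codec.txt is one run of characters (and the lack of any index offset) is mine, not necessarily how the repo builds codec_rev:

with open('codec.txt', 'r', encoding='utf-8') as f:
    codec = f.read().replace('\n', '').replace('\r', '')

codec_rev = {ch: idx for idx, ch in enumerate(codec)}
print('थ' in codec_rev, len(codec))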

get error chars when running demo.py

When I run python demo.py -model=e2e-mlt.h5 on my image using the pretrained model, I get garbage characters like:

Using cuda ... 炯 豚树糊V 鐫혁 労潔菉 瘙늠캄视胎興瘙菁綜헤誉숯덧顾 瞑주诈밟赌漓誉 瘙룹覆払멸戈 捉貯Ω汽 逢茉帛ш²터勇 炯 君ш羞 鐫労성炯 嫉물孤勇軒扮挣労彬 송労厂孤蓝風卓炯勇客瞑쭉텟嫉扮勇 瞑瘙勇آ喪宅 勇逢興凰勇も隣亨г닫驶姨송钰琢漬勃逢因逢® ヅ労塔厂赌곡労 尬 .豚己珊ш独备 炯鐫합勇г힉炯柚機労孤树苛翔軒扮찜廠肯码띈穫송労뱃띈孤漬�拼払慢炯끼南拼勇煸糖炯勇炯آ拓勇瞑ュш孤勇戈軒ш擘宅г杏萃赌굳г逢夢 労勇悉慢防炯视 굳汽赌퓨 廠╩V軒浩漬송労棒卓苛밍炯펍ヅ堅逃庶ш²拐勇 헤 몬ø 杏 捉封绩앉树V労뱃孤漬树勇瞑糖炯勇码客骏밍马蠡結송戈己凰ш勇гآ顾炯켭炯 7外旧慶誉鼠 廠殖îヅ송币শ琢牟¥勇炯琢顾炯候 钿 夢ル緒视匾물瘙ø펍숱 汽ш 拡咖勇 炯 豚菉障噜廠戈讃آ간勇弟厂视آ彬3锡송労 힉炯阁坤띈孤찜も뒷興찜굳г逢卓候물 树勇可簧视勇 卓苛송労펍ш夢労杏 誉썼반庶ш ²咖씻 炯独몬ø盛根廠焱蠡안炯柚尬労 充兵송펍费貯 굳逢扮翰鐫V翔缤漬戈勇風힉■橋텟瞑牟豚众遺펍汽ш간勇 炯 힉ø沈縛ḥ╩逑系룹임 ă惯俏勇г閉求炯켭 菉잘 漬г 룩균漱 汰尬労妍ぼ众良涼╩翘尼년 渣 헤崩戈 噜讐 盯╩誉쟁 旺ш굳轻룩尬候鉄漬邦╩杏ш臭若労শ臭ш孤련誉ぃ鐫 労绩 묽労誉ֳ孤顾炯勇巷漬찜 訴7淺7南廠勇 炯 縛労视去 币労囤Ω孤굳守牟蠡汽²茸瞑 헤崩 砂贺菉ω칠╩겉誉ム树旺멸勇R轻瘙尬候粤 未╩ш牟ш 労헤囤争僅顾勇防入瞑鐫 労绩 脯 瞑労—ш纽顾候府瞑찜貯 淺南鎖书 헤 몬豚菉障备 鐫간╩V煸炯舊V測视労彬惑担송煸Ð炯勇風浜夢瞑诈労誉란آ 찡澤룹炯ш搓ш遺労丧 ш간勇 炯 揉آ힉労간労劈逢孤誉싼牟树 f孤兵炯视囤Ω孤顾렷녹 ш瞑労守ш牟视홈郴챌찜횟労塑综눙勇 炯 앨泼崭备╩労련왕ш찜련暦犊捉㡢瑅ш茗孤居 旺 헤Ω춥抬터룹形산封柚杏炯란猕豚祸帛 淺²閠 柜

About training

Hi, thanks for sharing this amazing code.

I have some questions and would like to ask a favor.

I ran the demo using your pre-trained model.

It works fine, but sometimes it recognizes the wrong language (real: Korean word, pred: Chinese word).

So, I am trying to make my own model for two languages (Korean, English).

In your README you shared Synthetic MLT Data, but there isn't an English dataset.

  1. Can you also share an English dataset like the other languages?

I know there is English data (http://www.robots.ox.ac.uk/~vgg/data/scenetext/)

Honestly, I have no idea how to do the pre-processing for training.

So if you share your English dataset, it would be very helpful for me.

And I have a question about the loss value in the training process.

epoch 29[297400], loss: -0.740, bbox_loss: 1.441, seg_loss: -1.745, ang_loss: 0.142, ctc_loss: inf, rec: 0.53351 in 0.008

ctc_loss looks weird, but when I run the demo using my trained model, it works fine.

  2. I want to make sure whether I am using the dataset properly or not.

Here is my sample.

  1. "train_list" in train.py

ant hill_102_1

4 0.6138464864095052 0.9437334566510797 0.100082146022 0.0380848404567 -0.0165453118083 승인
4 0.7581702677408855 0.9419909103653437 0.15256916863 0.0400553852073 -0.0165452067969 신청을
4 0.09844179789225262 0.4498745809216279 0.127543301498 0.0471818991097 0.0601694282711 킬러와
4 0.14116963704427085 0.934938899501977 0.153929254901 0.0502906237088 -0.00324781919693 클락슨은
4 0.30354619344075523 0.9330855692100061 0.0971145094056 0.0515150747853 -0.0032484308943 같은
4 0.127603022257487 0.5502413635996427 0.106233448994 0.0524479823728 0.0817744338364 인수를
4 0.1683737055460612 0.8213418556825959 0.105878256893 0.0450125720438 -0.0295469131212 수요와
4 0.7358302307128907 0.8595470419185295 0.195197484112 0.0518889909319 -0.0354592041309 갑이다
4 0.5390819803873699 0.710513298238861 0.090994413401 0.0300411324556 0.395192956606 우리당은
4 0.04096550623575847 0.7650540020053984 0.0663619143835 0.0345938411181 -0.0283670735217 2005
4 0.1388903299967448 0.7581292590955748 0.0735245075651 0.0328411962243 -0.0289554923933 03:05
4 0.1029958724975586 0.311124815557995 0.159051668836 0.0493272034958 0.0643214178445 아니라
  2. "ocr_feed_list" in train.py

word_27583

My questions are quite basic since I'm a newbie in this field. Sorry for that (and for my bad English).

Weird bounding box results

Today I tested the model with a few images and I'm getting some weird results; there are 2 cases that I'll highlight:

  1. Here are three OCR prediction results

latin_1

chinese_1

latin_3

Notice that some of the text recognition predictions include words/characters from the neighboring bounding boxes. For example, in the first image you can see that the bounding box predicting "at $50000" contains only "$50000", but it still predicts "at" for some reason. Is that a bug?

  2. For tiny sequences of characters, I'm getting some really weird bounding boxes

latin_2

Notice that for this particular image I didn't resize the image (i.e. im_resized = im in demo.py). Here is the input image if you are interested in reproducing the predictions:
table_with_tiny_numbers

Can't find script identification model

  1. It seems the current implementation of the E2E model only outputs the "word transcription". Are you planning to release the part that outputs the "script identification"?

  2. There are 3 outputs from ocr_image: det_text, conf, and dec_s. I know what det_text is and I think conf means confidence, but what about dec_s?

Here is a partial log of the output of the ocr_image function:

conf: [[ 0 87]] dec_s: [[87]]
conf: [[ 0 69]] dec_s: [[63 69]]
conf: [[ 0 47]] dec_s: [[47]]
conf: [[ 0 48]] dec_s: [[47 48]]
conf: [[ 4 47]] dec_s: [[46 47]]
conf: [[ 4 95]] dec_s: [[94 95]]
conf: [[ 3 39]] dec_s: [[38 39]]
conf: [[ 4 55]] dec_s: [[54 55]]
conf: [[ 0 63]] dec_s: [[55 63]]
conf: [[ 0 71]] dec_s: [[70 71]]
conf: [[ 3 47]] dec_s: [[ 9 46 47]]
...

For conf, what does the number at index 0 represent? The number at index 1 is the confidence in the range [0, 100], right? For dec_s, I'm not sure how to interpret the numbers. Do you mind explaining it in depth?

Thank you for releasing the code btw!

Model Evaluation

Is there a script for model evaluation for detection and/or end-to-end?

ctc_loss gets inf values and unknown chars

When I use the pre-trained model you provide to continue training on the MLT-2019 dataset, the ctc_loss gets inf values at most steps. Is there something wrong with it?

Also, there are some characters that seem not to be included in the dictionary. Will this affect the training performance?

The screen prints look like:

epoch 12[205000], loss: 8.969, bbox_loss: 5.751, seg_loss: -0.604, ang_loss: 3.349, ctc_loss: 9.002, rec: 0.00000 in -0.000
Unknown char: 铣
Unknown char: 綦
epoch 12[205100], loss: 5.823, bbox_loss: 5.086, seg_loss: -0.910, ang_loss: 2.095, ctc_loss: inf, rec: 0.02500 in 0.000
epoch 12[205200], loss: 2.607, bbox_loss: 3.810, seg_loss: -1.047, ang_loss: 0.875, ctc_loss: inf, rec: 0.02069 in -0.000
Unknown char: 铣
epoch 12[205300], loss: 2.268, bbox_loss: 3.619, seg_loss: -0.995, ang_loss: 0.727, ctc_loss: inf, rec: 0.03540 in -0.003
Unknown char: 桷
Unknown char: 灏
Unknown char: 綦
epoch 12[205400], loss: 1.631, bbox_loss: 3.175, seg_loss: -1.080, ang_loss: 0.562, ctc_loss: inf, rec: 0.00775 in -0.000
epoch 12[205500], loss: 1.843, bbox_loss: 3.320, seg_loss: -1.135, ang_loss: 0.659, ctc_loss: inf, rec: 0.02113 in 0.000
Unknown char: 捌
epoch 12[205600], loss: 1.517, bbox_loss: 2.975, seg_loss: -1.124, ang_loss: 0.577, ctc_loss: 7.738, rec: 0.05634 in 0.000
epoch 12[205700], loss: 1.242, bbox_loss: 2.867, seg_loss: -1.183, ang_loss: 0.496, ctc_loss: inf, rec: 0.01439 in 0.000
Unknown char: 綦
epoch 12[205800], loss: 1.263, bbox_loss: 2.826, seg_loss: -1.203, ang_loss: 0.527, ctc_loss: inf, rec: 0.02899 in 0.001
Unknown char: 灏
epoch 12[205900], loss: 1.187, bbox_loss: 2.795, seg_loss: -1.212, ang_loss: 0.501, ctc_loss: inf, rec: 0.05825 in 0.000
epoch 12[206000], loss: 1.216, bbox_loss: 2.815, seg_loss: -1.169, ang_loss: 0.489, ctc_loss: inf, rec: 0.03731 in -0.001
Unknown char: 螃
Unknown char: 捌
Unknown char: 閩
epoch 12[206100], loss: 1.194, bbox_loss: 2.801, seg_loss: -1.191, ang_loss: 0.492, ctc_loss: inf, rec: 0.04032 in -0.006
epoch 12[206200], loss: 0.760, bbox_loss: 2.559, seg_loss: -1.231, ang_loss: 0.356, ctc_loss: inf, rec: 0.04487 in 0.024
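
Not sure whether this is the cause here, but infinite CTC losses typically come from batches whose target sequence is longer than the downsampled input width. If one were using PyTorch's built-in CTC loss (the repo appears to use warp-ctc instead), such terms can be zeroed out with zero_infinity, roughly like this sketch with toy shapes:

import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)          # zero out infinite loss terms
log_probs = torch.randn(50, 4, 7400).log_softmax(2)    # (T, N, C), toy values
targets = torch.randint(1, 7400, (4, 30), dtype=torch.long)
input_lengths = torch.full((4,), 50, dtype=torch.long)
target_lengths = torch.randint(10, 30, (4,), dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)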

Vertical text recognition

Dear @MichalBusta,

This label is confusing me. These are Japanese vertical texts, and I didn't find any algorithm in your work that can handle vertical text-line recognition.

img_5407

Sorry for my lack of knowledge if your work can already handle it; I've only worked in the OCR field for a year.
Can you show me which part of your work handles vertical text lines?
It would be really interesting if my next work could contribute a vertical text recognition module to your project.
Thanks,

build_docker.sh does not work

Since there is a docker folder in your project, I've tried to run build_docker.sh to create the environment.

But build_docker.sh doesn't work.

Is there a working Docker setup available for this project?

how to train English scene text alone?

@MichalBusta
I appreciate your nice work!
I am new to scene text detection & recognition, which is part of my future work, and I'd like to know the procedure for training English scene text alone with my dataset.
Could you please teach me how to train an English scene text model with my dataset?

bad image

Hello, I'm sorry to bother you.
When I run train_ocr.py, it always shows the "bad image" message:
搜狗截图20190428222431
The validation dataset I used is part of the IC15 test word images. Could you please tell me why this condition happens?
Thanks a lot!

Hello author, I get this error when running demo.py

Traceback (most recent call last):
File "demo.py", line 10, in <module>
from nms import get_boxes
File "/media/chen/软件/DeepCode/E2E-MLT/nms/__init__.py", line 8, in <module>
raise RuntimeError('Cannot compile nms: {}'.format(BASE_DIR))
RuntimeError: Cannot compile nms: /media/chen/软件/DeepCode/E2E-MLT/nms

It seems that it needs to compile the files inside but fails to compile them.

Training_ocr all zeros

Hello, I want to retrain the model for Chinese only. However, when I run python train_ocr.py, the predicted labels are all zeros, and the CTC loss goes up and down.

default

Have you ever encountered this problem before? I also tried changing the learning rate, but nothing helps.

RuntimeError: CUDA out of memory

Hello, I have met a problem when training the model, as follows:

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 983.69 MiB total capacity; 209.53 MiB already allocated; 69.56 MiB free; 6.47 MiB cached)

Could you give me any solutions?
Thank you!

VIDEOIO ERROR while running demo.py in CPU machine

Hi, while running the demo as follows:
python3 demo.py -model=e2e-mlt.h5 -cuda=0

I am getting the following error:

make: Entering directory '/home/virenv/ocv-td/E2E/E2E-MLT-master/nms'
make: 'adaptor.so' is up to date.
make: Leaving directory '/home/virenv/ocv-td/E2E/E2E-MLT-master/nms'
e2e-mlt.h5
VIDEOIO ERROR: V4L: can't open camera by index 0

I have changed the line
sp = torch.load(fname)
to sp = torch.load(fname, map_location='cpu') in net_utils.py.

(Running in a virtual environment with Python 3.5.2, Ubuntu 16.04, CPU machine, no CUDA, torch version (CPU) '1.0.1.post2', OpenCV 4.0.0.)
Thank you.
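
The "can't open camera by index 0" message suggests the demo falls back to grabbing frames from a webcam through OpenCV when no input image is given. On a headless CPU machine one could instead feed a single image, roughly as below; the exact flags demo.py accepts may differ, so treat this as a sketch rather than the script's actual interface:

import cv2

# Read a local image instead of relying on cv2.VideoCapture(0),
# which is what the V4L error points to.
im = cv2.imread('sample.jpg')
if im is None:
    raise SystemExit('could not read sample.jpg')
# ... pass `im` to the network the same way demo.py does after grabbing a frame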

The OCR loss goes to nan

I use my own Japanese dataset and crop all words into single word images.
Then I use train_ocr to train the OCR network (using e2e-mltrctw.h5 as the pretrained model, but changing the output size of the model from 7500 to 4748, which is the number of word types in my dataset). But the loss goes to nan very fast. Is there some reason for that? Thanks!

These are the training losses:
683464 training images in data/crop_train_images/crop_trainkuzushi.txt
683464 training images in data/crop_train_images/crop_trainkuzushi.txt
683464 training images in data/crop_train_images/crop_trainkuzushi.txt
683464 training images in data/crop_train_images/crop_trainkuzushi.txt
epoch 0[0], loss: 55.214, lr: 0.00010
epoch 0[500], loss: 54.610, lr: 0.00010
epoch 0[1000], loss: 14.609, lr: 0.00010
epoch 0[1500], loss: 7.219, lr: 0.00010
epoch 0[2000], loss: 6.109, lr: 0.00010
epoch 0[2500], loss: 5.536, lr: 0.00010
epoch 0[3000], loss: 4.826, lr: 0.00010
epoch 0[3500], loss: 4.030, lr: 0.00010
epoch 0[4000], loss: 3.301, lr: 0.00010
epoch 0[4500], loss: nan, lr: 0.00010
epoch 1[5000], loss: nan, lr: 0.00010
save model: backup2/E2E_5000.h5
epoch 1[5500], loss: nan, lr: 0.00010
epoch 1[6000], loss: nan, lr: 0.00010
epoch 1[6500], loss: nan, lr: 0.00010
epoch 1[7000], loss: nan, lr: 0.00010
epoch 1[7500], loss: nan, lr: 0.00010
epoch 1[8000], loss: nan, lr: 0.00010
epoch 1[8500], loss: nan, lr: 0.00010
epoch 1[9000], loss: nan, lr: 0.00010
epoch 1[9500], loss: nan, lr: 0.00010
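
Not from the repo, but two guards that often stop this kind of nan a few thousand steps in are skipping batches with a non-finite loss and clipping gradients. A minimal sketch of what that could look like inside a generic training step (the loss_fn and batch arguments are placeholders, not this repo's API):

import torch

def training_step(model, optimizer, loss_fn, batch, max_norm=5.0):
    # Skip non-finite losses and clip gradients, two common guards when a
    # CTC-style loss suddenly blows up to nan/inf mid-training.
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    if not torch.isfinite(loss):
        return None                      # skip this batch entirely
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()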
