
e2e-mlt's People

Contributors

alwc, michalbusta, pcampr, yash0307

e2e-mlt's Issues

per-feature text/no-text confidence score

Dear @MichalBusta ,
According to your paper, the prediction output of the Text Localization part consists of seven channels:
a per-feature text/no-text confidence score, the coordinates of the bounding box, and an angle parameter.

Then I checked your code here:

E2E-MLT/models.py

Lines 428 to 430 in 2858358

segm_pred = torch.sigmoid(self.act(x))
rbox = torch.sigmoid(self.rbox(x)) * 128
angle = torch.sigmoid(self.angle(x)) * 2 - 1

and couldn't find any value corresponding to that confidence score.

Can you help to explain it?
Thanks a lot,

Kei,
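
For reference, my own reading of those three lines (an assumption, not something confirmed by the author): the sigmoid segm_pred map is itself the per-feature text/no-text confidence, it just isn't given a separate name in the code. A minimal sketch of using it that way:

# Hedged sketch: treat the sigmoid segmentation map as the per-feature
# text/no-text confidence; segm_pred is assumed to be the tensor produced
# by the first of the three lines quoted above.
def text_confidence(segm_pred, threshold=0.5):
    confidence = segm_pred                 # values in [0, 1] after the sigmoid
    text_mask = confidence > threshold     # hypothetical cut-off for "is text"
    return confidence, text_mask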

loss always -2.00

When I train on my own dataset, the values of loss and seg_loss are always equal to -2.00, and the other values are 0. Is there a problem? I have not changed anything except the dataset and the batch_size (32), and training starts from scratch.

seg_loss: -1.101

When I train on my own data, seg_loss is always a negative number. Is this normal? Thanks.

I am unable to load the e2e-mlt model, incompatible tensor sizes. Any help?

Traceback (most recent call last):
File "/home/rmunshi/E2E-MLT/net_utils.py", line 30, in load_net
v.copy_(param)
RuntimeError: The size of tensor a (8400) must match the size of tensor b (7500) at non-singleton dimension 0
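
A common workaround (not part of this repo) is to copy only the checkpoint tensors whose shapes match the current model, so a checkpoint with a differently sized output layer (8400 vs. 7500 here, i.e. a different codec size) can still initialise the rest of the network. A minimal sketch, assuming the .h5 checkpoint stores one dataset per parameter name, which is the layout net_utils.load_net appears to iterate over:

import h5py
import numpy as np
import torch

def load_net_skip_mismatch(fname, net):
    # Copy matching parameters; report and skip everything else.
    h5f = h5py.File(fname, mode='r')
    for k, v in net.state_dict().items():
        if k not in h5f:
            print('missing in checkpoint:', k)
            continue
        param = torch.from_numpy(np.asarray(h5f[k]))
        if v.shape != param.shape:
            print('shape mismatch, skipping:', k, tuple(param.shape), '->', tuple(v.shape))
            continue
        v.copy_(param)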

Some problems when I train the model

When I use ICDAR2015 to train the model:
Inside the file sample_train_data/MLT/trainMLT.txt are the icdar2015 localization training images, such as icdar-2015-Ch4/Train/img_1.jpg, and inside sample_train_data/MLT_CROPS/gt.txt are the icdar2015 recognition training images, such as word_1.png, "Genaxis Theatre".
I have not changed other paths. When I train the model by:
python3 train.py -train_list=sample_train_data/MLT/trainMLT.txt -batch_size=8 -num_readers=5 -debug=0 -input_size=512 -ocr_batch_size=256 -ocr_feed_list=sample_train_data/MLT_CROPS/gt.txt
the output is:
root@10ca3ad2a7d1:/home/zy/jupyter/recognition/spotter/E2E-MLT-master# python3 train.py -train_list=sample_train_data/MLT/trainMLT.txt -batch_size=8 -num_readers=5 -debug=0 -input_size=512 -ocr_batch_size=256 -ocr_feed_list=sample_train_data/MLT_CROPS/gt.txt
Using E2E-MLT
loading model from e2e-mlt.h5
e2e-mlt.h5
1000 training images in sample_train_data/MLT/trainMLT.txt
1000 training images in sample_train_data/MLT/trainMLT.txt
1000 training images in sample_train_data/MLT/trainMLT.txt
1000 training images in sample_train_data/MLT/trainMLT.txt
1000 training images in sample_train_data/MLT/trainMLT.txt
4468 training images in sample_train_data/MLT_CROPS/gt.txt
4468 training images in sample_train_data/MLT_CROPS/gt.txt
I waited for half an hour, but there was no more output. Can you help me? Thank you.

about codec.txt

Hi @MichalBusta, there are 7398 characters in codec.txt. In models.py, why is self.conv11 = Conv2d(256, 8400, (1, 1), padding=(0,0))? I think it should be 7399.

data generator random seed specification not working?

I'd like to be able to train in a repeatable manner; in particular, generate batches of images such that every time I start training from scratch, the images are presented for training in the same order. The data_util.GeneratorEnqueuer class accepts a random_seed parameter, but even when I set it to a fixed number, the images are in a different order each time I train. I'm only using one reader. @MichalBusta, any ideas?
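
For what it is worth, one thing that helps isolate the problem is seeding every RNG the generator and augmentation code might touch before the enqueuer is built; whether data_util actually forwards random_seed to its reader processes is a separate question (each worker process generally needs its own seed). The helper below is my own, not part of this repo:

import random
import numpy as np
import torch

def seed_everything(seed=1234):
    # Seed every RNG the data pipeline might use. Note that reader/worker
    # processes spawned later still need to be seeded inside the worker.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)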

Your EAST implementation vs. argman/EAST

I notice that your EAST implementation performs much better than argman/EAST when there are large fonts.

Here is the prediction from your model:

screen shot 2018-11-16 at 11 32 27 am

And here is the prediction from argman/EAST

screen shot 2018-11-16 at 11 32 45 am

Do you mind sharing the key components behind this improved detection? Thanks!

Need help to understand pts=np.roll(pts, 2)

I'm digging deeper into your repo and I cannot understand the code below:

E2E-MLT/data_gen.py

Lines 122 to 123 in 2858358

if is_icdar:
    pts = np.roll(pts, 2)

I've checked some examples and found out that it will shift an array like:
[1224. 2041. 1537. 2041. 1537. 2134. 1224. 2134.]
into:
[1224. 2134. 1224. 2041. 1537. 2041. 1537. 2134.]

I know how the numpy.roll function works, but I don't understand why you did it only for the ICDAR dataset.
Can you help to explain this?
Thanks,
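
For anyone else puzzled by this, a small standalone demo of what the roll does to an 8-value ICDAR polygon: np.roll(pts, 2) moves the last (x, y) pair to the front, so it changes which corner of the quadrilateral is listed first without changing the polygon itself.

import numpy as np

pts = np.array([1224., 2041., 1537., 2041., 1537., 2134., 1224., 2134.])
print(np.roll(pts, 2))   # the last (x, y) pair becomes the first corner
# [1224. 2134. 1224. 2041. 1537. 2041. 1537. 2134.]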

403 Forbidden

Hello, MichalBusta
I tried to download the model, but I got the following error:

Forbidden
You don't have permission to access /public_datasets/SyntText/e2e-mlt.h5 on this server.
Apache/2.4.25 (Debian) Server at ptak.felk.cvut.cz Port 80

score_map vs. training_masks

Dear @MichalBusta,
I'm trying to understand the difference between score_map and training_masks, which are generated from:

return score_map, geo_map, training_mask, gt_idx, gt_out, labels_out

I have some questions:

  1. Which one is more important for training the model? (I checked and found that score_map always contains all ground-truth boxes, but training_mask does not.)
  2. Why do you generate both score_map and training_masks? (From my attempt to understand your work, I think score_map alone is enough. Please correct me if I have misunderstood, thanks!)
  3. Why do you add a box to training_mask only if its text contains " ", according to these lines of code:

    E2E-MLT/data_gen.py

    Lines 478 to 496 in 2858358

    if txt.find(" ") != -1:
        pts_line = np.copy(pts2)
        c1 = (pts[1] + pts[2]) / 2
        dw1 = (pts[2] - c1) / 1.2
        pts_line[2] = c1 + dw1
        dw2 = (pts[1] - c1) / 1.2
        pts_line[1] = c1 + dw2
        c1 = (pts[0] + pts[3]) / 2
        dw1 = (pts[3] - c1) / 1.2
        pts_line[3] = c1 + dw1
        dw2 = (pts[0] - c1) / 1.2
        pts_line[0] = c1 + dw2
        cv2.fillPoly(training_mask, np.asarray([pts_line.round()], np.int32), 0)

    Hope to hear from you soon,
    Thanks in advance,

Specific format of annotation

Can you please explain the annotation format? Usually there is only 1 integer and 4 float numbers, but I see 5 float numbers here.

Error when running demo.py

Hello @MichalBusta
I have a problem running demo.py
Apparently I cannot import get_boxes. The warp-ctc loss was installed without a problem, however this is the error I am getting:

make: Entering directory '/home/caffe/E2E-MLT/nms'
g++ -o adaptor.so -I include -std=c++11 -O3 -I/home/caffe/anaconda3/envs/pytorch36/include/python3.6m -I/home/caffe/anaconda3/envs/pytorch36/include/python3.6m -Wno-unused-result -Wsign-compare -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -fdebug-prefix-map==/usr/local/src/conda/- -fdebug-prefix-map==/usr/local/src/conda-prefix -fuse-linker-plugin -ffat-lto-objects -flto-partition=none -flto -DNDEBUG -fwrapv -O3 -Wall -L/home/caffe/anaconda3/envs/pytorch36/lib/python3.6/config-3.6m-x86_64-linux-gnu -L/home/caffe/anaconda3/envs/pytorch36/lib -lpython3.6m -lpthread -ldl -lutil -lrt -lm -Xlinker -export-dynamic adaptor.cpp include/clipper/clipper.cpp --shared -fPIC
g++: error: unrecognized command line option ‘-fno-plt’
Makefile:10: recipe for target 'adaptor.so' failed
make: *** [adaptor.so] Error 1
make: Leaving directory '/home/caffe/E2E-MLT/nms'
Traceback (most recent call last):
File "demo.py", line 10, in <module>
from nms import get_boxes
File "/home/caffe/E2E-MLT/nms/__init__.py", line 8, in <module>
raise RuntimeError('Cannot compile nms: {}'.format(BASE_DIR))
RuntimeError: Cannot compile nms: /home/caffe/E2E-MLT/nms

Any idea? Thanks in advance for your help.

Detect rectangle box only

Hi @MichalBusta,
First of all, thanks for your great work!
My training dataset has only samples with rectangular boxes! After 200 epochs of training, the model predicts like the image below:
Annotation 2019-05-23 142022
Because of wrong angle detection, this leads to wrong OCR results!

So I am thinking about a model with no angle (or a fixed angle) that predicts rectangular boxes only.
I know that your model with angle is really awesome, but in my case I only need to predict rectangular boxes.
How can I do it?

Thanks a lot!
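
Not an official answer, but one pragmatic workaround is to zero the predicted angle map before the boxes are decoded, so every detection is treated as axis-aligned. The name angle_pred below is a stand-in for whatever the angle output is called at your decoding step; treat this as a sketch, not the author's method:

import torch

def force_axis_aligned(angle_pred):
    # Replace predicted angles with zeros so the decoded rotated boxes
    # degenerate into axis-aligned rectangles.
    return torch.zeros_like(angle_pred)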

Function of ocr_feed_list

hi @MichalBusta
I am sorry for this naive question.

I want to fine tune your model e2e-mlt.h5 with my dataset. For this, I have various images along with the ground truth of the texts in the image.

Now, the train.py has 2 parameters:

  • -train_list
  • -ocr_feed_list

-train_list: points to the directory where you have images along with their gt
-ocr_feed_list: points to the directory where you have cropped words

Is having cropped words mandatory for training? And is there any way to train the model without using the cropped word images (i.e. only using scene images with the gt of all the text in them)?

My priority is not only to achieve better text localisation but also better text recognition, hence the OCR branch needs to be trained too, but using the gt of the text present in the scene image and not separate cropped word images.

process_boxes() bug: does not iterate over all samples in the batch

There appears to be a bug in the process_boxes() function in train.py.

Line 70 should be:

 for bid in range(iou_pred.size(0)):

not

 for bid in range(iou_pred[0].size(0)):

In the main function of train.py, iou_pred[0] is passed to process_boxes(), not iou_pred. As written, it will not iterate over all samples in the batch, but over the length of the second dimension of iou_pred[0] (which is 1, the number of channels of the score map, a single black/white channel in this case).

What does label named ### mean?

Dear @MichalBusta,
Some lines in ground-truth
522,145,535,144,537,160,523,161,###
535,141,598,137,599,155,537,159,BUTTER
608,136,649,133,650,150,609,153,TRUE
649,132,723,125,724,143,650,150,###

What does ### mean? What is its meaning for training?

Thanks,

Just curious what are the bbox values in the raw dataset labels?

The MLT format labels have been given. Thanks a lot!

But I am curious: what do these floats in the raw label mean?
0 0.46809631347656255 0.8593862826680818 0.0739853187168 0.0808596013931 -0.0298556635598 منذ

The first number is the language label, e.g., 0 for Arabic, 2 for Chinese...
The 2nd and 3rd numbers are the center point of the bounding box.
But I have no idea about the others; thanks in advance if someone can clear this up.
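
In case it helps others reading this thread, here is how I would parse such a line. My guess (not confirmed here) is that the two remaining floats are the normalised box width and height and the last float is the rotation angle; the parser itself is hypothetical, not code from the repo:

def parse_synth_label(line):
    # Hypothetical parser for a raw label line such as:
    # "0 0.4680... 0.8593... 0.0739... 0.0808... -0.0298... منذ"
    parts = line.split()
    lang = int(parts[0])                        # language id (0 = Arabic, 2 = Chinese, ...)
    cx, cy = float(parts[1]), float(parts[2])   # normalised box centre
    w, h = float(parts[3]), float(parts[4])     # assumed: normalised width / height
    angle = float(parts[5])                     # assumed: rotation angle
    text = ' '.join(parts[6:])                  # transcription
    return lang, cx, cy, w, h, angle, text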

A small bug in ocr_utils.py

Hello Busta,
Maybe there is a bug in the function print_seq_ext (ocr_utils.py, line 28):
if c > 3 and c < len(codec):
should be:
if c > 3 and c < (len(codec)+4):

training problem: the program is stopped

When I train, I meet the problem below.

CUDA_VISIBLE_DEVICES="0,1" python3 train_ocr.py -train_list=sample_train_data/MLT/trainMLT.txt -valid_list=data/valid/valid.txt -model=e2e-mlt.h5 -debug=1 -batch_size=8 -num_readers=5

7398
loading model from e2e-mlt.h5
e2e-mlt.h5
2 training images in sample_train_data/MLT/trainMLT.txt
2 training images in sample_train_data/MLT/trainMLT.txt
2 training images in sample_train_data/MLT/trainMLT.txt
2 training images in sample_train_data/MLT/trainMLT.txt
2 training images in sample_train_data/MLT/trainMLT.txt

The program seems to have stopped. It is most likely a problem with the data. Please help me if you have solved this problem, thank you!

Questions on training the model from scratch

First of all, congrats on getting the best paper award at the 3rd International Workshop on Robust Reading and releasing the v2 of the paper on arXiv!

A few days ago I started trying to train the model from scratch and I've come across some questions:

1/ Why do you use instance norm instead of other normalization methods like batch norm, group norm, etc.? Have you compared the results of using different normalization methods?

1.b/ How come for conv_dw_in the InstanceNorm2d layer doesn't set affine=True (by default affine=False, which means there are no learnable parameters for that instance norm layer), while the rest of your InstanceNorm2d layers have affine=True? (A quick standalone check of this flag is sketched below.)

2/ How long does it take to train the e2e-mlt.h5 model? Was it trained on multiple GPUs?

3/ Have you tried using transfer learning, like how argman/EAST uses a pretrained ResNet-50?

Thanks!
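
Regarding question 1.b, a quick standalone check of what the affine flag changes in PyTorch: nn.InstanceNorm2d defaults to affine=False, so it carries no learnable weight or bias, while affine=True adds a per-channel weight and bias.

import torch.nn as nn

plain = nn.InstanceNorm2d(64)                # affine=False by default: no learnable parameters
learned = nn.InstanceNorm2d(64, affine=True) # adds per-channel weight and bias

print(sum(p.numel() for p in plain.parameters()))    # 0
print(sum(p.numel() for p in learned.parameters()))  # 128 (64 weights + 64 biases)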

process_boxes() unknown chars and misidentifies chars

I started training again and noticed that many characters are not being identified as existing in codec_rev. The data is from icdar2015, icdar2017 (MLT) and icdar2019 (MLT), and the provided codec.txt is used.

Stranger still, the same error (unknown char) is showing up for data from icdar2015, which is composed entirely of English characters.

unknown-chars

As shown by the image above, the character "थ" is not found in codec_rev, yet is reported as coming from the GT of image 277 from icdar2015. However, this is the actual GT of image 277:

gt-277

Is there some file encoding for codec.txt that I must set? Can you provide some information about why this is happening?
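
For what it is worth, this is how I would check whether a character really is in codec.txt when the file is read as UTF-8. The assumption that codec.txt is one run of characters (and the lack of any index offset) is mine, not necessarily how the repo builds codec_rev:

with open('codec.txt', 'r', encoding='utf-8') as f:
    codec = f.read().replace('\n', '').replace('\r', '')

codec_rev = {ch: idx for idx, ch in enumerate(codec)}
print('थ' in codec_rev, len(codec))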

get error chars when running demo.py

When I run python demo.py -model=e2e-mlt.h5 on my image using the pretrained model, I get garbage characters like:

Using cuda ... 炯 豚树糊V 鐫혁 労潔菉 瘙늠캄视胎興瘙菁綜헤誉숯덧顾 瞑주诈밟赌漓誉 瘙룹覆払멸戈 捉貯Ω汽 逢茉帛ш²터勇 炯 君ш羞 鐫労성炯 嫉물孤勇軒扮挣労彬 송労厂孤蓝風卓炯勇客瞑쭉텟嫉扮勇 瞑瘙勇آ喪宅 勇逢興凰勇も隣亨г닫驶姨송钰琢漬勃逢因逢® ヅ労塔厂赌곡労 尬 .豚己珊ш独备 炯鐫합勇г힉炯柚機労孤树苛翔軒扮찜廠肯码띈穫송労뱃띈孤漬�拼払慢炯끼南拼勇煸糖炯勇炯آ拓勇瞑ュш孤勇戈軒ш擘宅г杏萃赌굳г逢夢 労勇悉慢防炯视 굳汽赌퓨 廠╩V軒浩漬송労棒卓苛밍炯펍ヅ堅逃庶ш²拐勇 헤 몬ø 杏 捉封绩앉树V労뱃孤漬树勇瞑糖炯勇码客骏밍马蠡結송戈己凰ш勇гآ顾炯켭炯 7外旧慶誉鼠 廠殖îヅ송币শ琢牟¥勇炯琢顾炯候 钿 夢ル緒视匾물瘙ø펍숱 汽ш 拡咖勇 炯 豚菉障噜廠戈讃آ간勇弟厂视آ彬3锡송労 힉炯阁坤띈孤찜も뒷興찜굳г逢卓候물 树勇可簧视勇 卓苛송労펍ш夢労杏 誉썼반庶ш ²咖씻 炯独몬ø盛根廠焱蠡안炯柚尬労 充兵송펍费貯 굳逢扮翰鐫V翔缤漬戈勇風힉■橋텟瞑牟豚众遺펍汽ш간勇 炯 힉ø沈縛ḥ╩逑系룹임 ă惯俏勇г閉求炯켭 菉잘 漬г 룩균漱 汰尬労妍ぼ众良涼╩翘尼년 渣 헤崩戈 噜讐 盯╩誉쟁 旺ш굳轻룩尬候鉄漬邦╩杏ш臭若労শ臭ш孤련誉ぃ鐫 労绩 묽労誉ֳ孤顾炯勇巷漬찜 訴7淺7南廠勇 炯 縛労视去 币労囤Ω孤굳守牟蠡汽²茸瞑 헤崩 砂贺菉ω칠╩겉誉ム树旺멸勇R轻瘙尬候粤 未╩ш牟ш 労헤囤争僅顾勇防入瞑鐫 労绩 脯 瞑労—ш纽顾候府瞑찜貯 淺南鎖书 헤 몬豚菉障备 鐫간╩V煸炯舊V測视労彬惑担송煸Ð炯勇風浜夢瞑诈労誉란آ 찡澤룹炯ш搓ш遺労丧 ш간勇 炯 揉آ힉労간労劈逢孤誉싼牟树 f孤兵炯视囤Ω孤顾렷녹 ш瞑労守ш牟视홈郴챌찜횟労塑综눙勇 炯 앨泼崭备╩労련왕ш찜련暦犊捉㡢瑅ш茗孤居 旺 헤Ω춥抬터룹形산封柚杏炯란猕豚祸帛 淺²閠 柜

About training

Hi, thanks for sharing this amazing code.

I have some questions and would like to ask a favor.

I ran the demo using your pre-trained model.

It works fine, but sometimes it recognizes the wrong language (real: Korean word, pred: Chinese word).

So, I am trying to make my own model for two languages (Korean, English).

In your README you shared Synthetic MLT Data, but there isn't an English dataset.

  1. Can you also share an English dataset like the other languages?

I know there is English data (http://www.robots.ox.ac.uk/~vgg/data/scenetext/)

Honestly, I have no idea how to do the pre-processing for training.

So if you share your English dataset, it would be very helpful for me.

And I have a question about the loss value in the training process.

epoch 29[297400], loss: -0.740, bbox_loss: 1.441, seg_loss: -1.745, ang_loss: 0.142, ctc_loss: inf, rec: 0.53351 in 0.008

ctc_loss looks weird, but when I run the demo using my trained model, it works fine.

  2. I want to make sure whether I am using the dataset properly or not.

Here is my sample.

  1. "train_list" in train.py

ant hill_102_1

4 0.6138464864095052 0.9437334566510797 0.100082146022 0.0380848404567 -0.0165453118083 승인
4 0.7581702677408855 0.9419909103653437 0.15256916863 0.0400553852073 -0.0165452067969 신청을
4 0.09844179789225262 0.4498745809216279 0.127543301498 0.0471818991097 0.0601694282711 킬러와
4 0.14116963704427085 0.934938899501977 0.153929254901 0.0502906237088 -0.00324781919693 클락슨은
4 0.30354619344075523 0.9330855692100061 0.0971145094056 0.0515150747853 -0.0032484308943 같은
4 0.127603022257487 0.5502413635996427 0.106233448994 0.0524479823728 0.0817744338364 인수를
4 0.1683737055460612 0.8213418556825959 0.105878256893 0.0450125720438 -0.0295469131212 수요와
4 0.7358302307128907 0.8595470419185295 0.195197484112 0.0518889909319 -0.0354592041309 갑이다
4 0.5390819803873699 0.710513298238861 0.090994413401 0.0300411324556 0.395192956606 우리당은
4 0.04096550623575847 0.7650540020053984 0.0663619143835 0.0345938411181 -0.0283670735217 2005
4 0.1388903299967448 0.7581292590955748 0.0735245075651 0.0328411962243 -0.0289554923933 03:05
4 0.1029958724975586 0.311124815557995 0.159051668836 0.0493272034958 0.0643214178445 아니라
  2. "ocr_feed_list" in train.py

word_27583

My questions are quite basic since I'm a newbie in this field. Sorry for that (and for my bad English).

Weird bounding box results

Today I tested the model with a few images and I'm getting some weird results; there are 2 cases that I'll highlight:

  1. Here are three OCR prediction results

latin_1

chinese_1

latin_3

Notice that some of the text recognition predictions include words/characters from the neighboring bounding boxes. For example, in the first image you can see that the bounding box predicting "at $50000" contains only "$50000", but it still predicts "at" for some reason. Is that a bug?

  2. For tiny sequences of characters, I'm getting some really weird bounding boxes

latin_2

Notice that for this particular image I didn't resize the image (i.e. im_resized = im in demo.py). Here is the input image if you are interested in reproducing the predictions:
table_with_tiny_numbers

Can't find script identification model

  1. It seems the current implementation of the E2E model only outputs the "word transcription". Are you planning to release the part that outputs the "script identification"?

  2. There are 3 outputs from ocr_image: det_text, conf, and dec_s. I know what det_text is and I think conf means confidence, but what about dec_s?

Here is a partial log of the output of the ocr_image function:

conf: [[ 0 87]] dec_s: [[87]]
conf: [[ 0 69]] dec_s: [[63 69]]
conf: [[ 0 47]] dec_s: [[47]]
conf: [[ 0 48]] dec_s: [[47 48]]
conf: [[ 4 47]] dec_s: [[46 47]]
conf: [[ 4 95]] dec_s: [[94 95]]
conf: [[ 3 39]] dec_s: [[38 39]]
conf: [[ 4 55]] dec_s: [[54 55]]
conf: [[ 0 63]] dec_s: [[55 63]]
conf: [[ 0 71]] dec_s: [[70 71]]
conf: [[ 3 47]] dec_s: [[ 9 46 47]]
...

For conf, what does the number at index 0 represent? The number at index 1 is the confidence in the range [0, 100], right? For dec_s, I'm not sure how to interpret the numbers. Do you mind explaining it in depth?

Thank you for releasing the code btw!

Model Evaluation

Is there a script for model evaluation for detection and/or end-to-end?

ctc_loss gets inf values and unknown chars

When I use the pre-trained model you provide to continue training on the MLT-2019 dataset, the ctc_loss gets inf values at most steps. Is there something wrong with it?

Also, there are some characters that seem not to be included in the dictionary. Will this affect the training performance?

The screen prints look like:

epoch 12[205000], loss: 8.969, bbox_loss: 5.751, seg_loss: -0.604, ang_loss: 3.349, ctc_loss: 9.002, rec: 0.00000 in -0.000
Unknown char: 铣
Unknown char: 綦
epoch 12[205100], loss: 5.823, bbox_loss: 5.086, seg_loss: -0.910, ang_loss: 2.095, ctc_loss: inf, rec: 0.02500 in 0.000
epoch 12[205200], loss: 2.607, bbox_loss: 3.810, seg_loss: -1.047, ang_loss: 0.875, ctc_loss: inf, rec: 0.02069 in -0.000
Unknown char: 铣
epoch 12[205300], loss: 2.268, bbox_loss: 3.619, seg_loss: -0.995, ang_loss: 0.727, ctc_loss: inf, rec: 0.03540 in -0.003
Unknown char: 桷
Unknown char: 灏
Unknown char: 綦
epoch 12[205400], loss: 1.631, bbox_loss: 3.175, seg_loss: -1.080, ang_loss: 0.562, ctc_loss: inf, rec: 0.00775 in -0.000
epoch 12[205500], loss: 1.843, bbox_loss: 3.320, seg_loss: -1.135, ang_loss: 0.659, ctc_loss: inf, rec: 0.02113 in 0.000
Unknown char: 捌
epoch 12[205600], loss: 1.517, bbox_loss: 2.975, seg_loss: -1.124, ang_loss: 0.577, ctc_loss: 7.738, rec: 0.05634 in 0.000
epoch 12[205700], loss: 1.242, bbox_loss: 2.867, seg_loss: -1.183, ang_loss: 0.496, ctc_loss: inf, rec: 0.01439 in 0.000
Unknown char: 綦
epoch 12[205800], loss: 1.263, bbox_loss: 2.826, seg_loss: -1.203, ang_loss: 0.527, ctc_loss: inf, rec: 0.02899 in 0.001
Unknown char: 灏
epoch 12[205900], loss: 1.187, bbox_loss: 2.795, seg_loss: -1.212, ang_loss: 0.501, ctc_loss: inf, rec: 0.05825 in 0.000
epoch 12[206000], loss: 1.216, bbox_loss: 2.815, seg_loss: -1.169, ang_loss: 0.489, ctc_loss: inf, rec: 0.03731 in -0.001
Unknown char: 螃
Unknown char: 捌
Unknown char: 閩
epoch 12[206100], loss: 1.194, bbox_loss: 2.801, seg_loss: -1.191, ang_loss: 0.492, ctc_loss: inf, rec: 0.04032 in -0.006
epoch 12[206200], loss: 0.760, bbox_loss: 2.559, seg_loss: -1.231, ang_loss: 0.356, ctc_loss: inf, rec: 0.04487 in 0.024
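
Not sure whether this is the cause here, but infinite CTC losses typically come from batches whose target sequence is longer than the downsampled input width. If one were using PyTorch's built-in CTC loss (the repo appears to use warp-ctc instead), such terms can be zeroed out with zero_infinity, roughly like this sketch with toy shapes:

import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)          # zero out infinite loss terms
log_probs = torch.randn(50, 4, 7400).log_softmax(2)    # (T, N, C), toy values
targets = torch.randint(1, 7400, (4, 30), dtype=torch.long)
input_lengths = torch.full((4,), 50, dtype=torch.long)
target_lengths = torch.randint(10, 30, (4,), dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)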

Vertical text recognition

Dear @MichalBusta,

This label is confusing me. These are Japanese vertical texts, and I didn't find any algorithm in your work that can handle vertical text-line recognition.

img_5407

Sorry for my lack of knowledge if your work can already handle it; I've only worked in the OCR field for a year.
Can you show me which part of your work handles vertical text lines?
It would be really interesting if my next work could contribute a vertical text recognition module to your project.
Thanks,

build_docker.sh does not work

Since there is a docker folder in your project, I've tried to run build_docker.sh to create the environment.

But build_docker.sh doesn't work.

Is there a working Docker setup available for this project?

how to train English scene text alone?

@MichalBusta
I appreciate your nice work!
I am new to scene text detection & recognition, which is part of my future work, and I'd like to know the procedure for training English scene text alone with my dataset.
Could you please teach me how to train an English scene text model with my dataset?

bad image

Hello, I'm sorry to bother you.
When I run train_ocr.py, it always shows the "bad image" message:
搜狗截图20190428222431
The validation dataset I used is part of the IC15 test word images. Could you please tell me why this condition happens?
Thanks a lot!

Hello author, I get this error when running demo.py

Traceback (most recent call last):
File "demo.py", line 10, in <module>
from nms import get_boxes
File "/media/chen/软件/DeepCode/E2E-MLT/nms/__init__.py", line 8, in <module>
raise RuntimeError('Cannot compile nms: {}'.format(BASE_DIR))
RuntimeError: Cannot compile nms: /media/chen/软件/DeepCode/E2E-MLT/nms

It seems that it needs to compile the files inside but fails to compile them.

Training_ocr all zeros

Hello, I want to retrain the model for Chinese only. However, when I run python train_ocr.py, the predicted labels are all zeros, and the CTC loss goes up and down.

default

Have you ever encountered this problem before? I also tried changing the learning rate, but nothing helps.

RuntimeError: CUDA out of memory

Hello, I have met a problem when training the model, as follows:

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 983.69 MiB total capacity; 209.53 MiB already allocated; 69.56 MiB free; 6.47 MiB cached)

Could you give me any solutions?
Thank you!

VIDEOIO ERROR while running demo.py in CPU machine

Hi, while running the demo as follows:
python3 demo.py -model=e2e-mlt.h5 -cuda=0

I am getting the following error:

make: Entering directory '/home/virenv/ocv-td/E2E/E2E-MLT-master/nms'
make: 'adaptor.so' is up to date.
make: Leaving directory '/home/virenv/ocv-td/E2E/E2E-MLT-master/nms'
e2e-mlt.h5
VIDEOIO ERROR: V4L: can't open camera by index 0

I have changed the line
sp = torch.load(fname)
to sp = torch.load(fname, map_location='cpu') in net_utils.py.

(Running in a virtual environment with Python 3.5.2, Ubuntu 16.04, CPU machine, no CUDA, torch version (CPU) '1.0.1.post2', OpenCV 4.0.0.)
Thank you.
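
The "can't open camera by index 0" message suggests the demo falls back to grabbing frames from a webcam through OpenCV when no input image is given. On a headless CPU machine one could instead feed a single image, roughly as below; the exact flags demo.py accepts may differ, so treat this as a sketch rather than the script's actual interface:

import cv2

# Read a local image instead of relying on cv2.VideoCapture(0),
# which is what the V4L error points to.
im = cv2.imread('sample.jpg')
if im is None:
    raise SystemExit('could not read sample.jpg')
# ... pass `im` to the network the same way demo.py does after grabbing a frame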

The OCR loss goes to nan

I use my own Japanese dataset and crop all words into single word images.
Then I use train_ocr to train the OCR network (using e2e-mltrctw.h5 as the pretrained model, but changing the output size of the model from 7500 to 4748, which is the number of word types in my dataset). But the loss goes to nan very fast. Is there some reason for that? Thanks!

These are the training losses:
683464 training images in data/crop_train_images/crop_trainkuzushi.txt
683464 training images in data/crop_train_images/crop_trainkuzushi.txt
683464 training images in data/crop_train_images/crop_trainkuzushi.txt
683464 training images in data/crop_train_images/crop_trainkuzushi.txt
epoch 0[0], loss: 55.214, lr: 0.00010
epoch 0[500], loss: 54.610, lr: 0.00010
epoch 0[1000], loss: 14.609, lr: 0.00010
epoch 0[1500], loss: 7.219, lr: 0.00010
epoch 0[2000], loss: 6.109, lr: 0.00010
epoch 0[2500], loss: 5.536, lr: 0.00010
epoch 0[3000], loss: 4.826, lr: 0.00010
epoch 0[3500], loss: 4.030, lr: 0.00010
epoch 0[4000], loss: 3.301, lr: 0.00010
epoch 0[4500], loss: nan, lr: 0.00010
epoch 1[5000], loss: nan, lr: 0.00010
save model: backup2/E2E_5000.h5
epoch 1[5500], loss: nan, lr: 0.00010
epoch 1[6000], loss: nan, lr: 0.00010
epoch 1[6500], loss: nan, lr: 0.00010
epoch 1[7000], loss: nan, lr: 0.00010
epoch 1[7500], loss: nan, lr: 0.00010
epoch 1[8000], loss: nan, lr: 0.00010
epoch 1[8500], loss: nan, lr: 0.00010
epoch 1[9000], loss: nan, lr: 0.00010
epoch 1[9500], loss: nan, lr: 0.00010
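
Not from the repo, but two guards that often stop this kind of nan a few thousand steps in are skipping batches with a non-finite loss and clipping gradients. A minimal sketch of what that could look like inside a generic training step (the loss_fn and batch arguments are placeholders, not this repo's API):

import torch

def training_step(model, optimizer, loss_fn, batch, max_norm=5.0):
    # Skip non-finite losses and clip gradients, two common guards when a
    # CTC-style loss suddenly blows up to nan/inf mid-training.
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    if not torch.isfinite(loss):
        return None                      # skip this batch entirely
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()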
