chenjun2hao / attention_ocr.pytorch

This repository implements an encoder-decoder model with attention for OCR

Python 100.00%

attention-model ocr pytorch attentionocr

attention_ocr.pytorch's Introduction

attention-ocr.pytorch:Encoder+Decoder+attention model

This repository implements an encoder-decoder model with attention for OCR. The encoder uses CNN + Bi-LSTM and the decoder uses GRU. This repository is modified from https://github.com/meijieru/crnn.pytorch
An earlier open-source version had some problems and only handled images of a fixed width. The model has since been modified to support recognition of variable-width images, so it serves the same purpose as CRNN. Because of time constraints, no pretrained model is provided this time; one will be added later.

requirements

pytorch 0.4.1
opencv_python

cd Attention_ocr.pytorch
pip install -r requirements.txt

Test

pretrained model coming soon

Train

Here I use a small subset of Synthetic_Chinese_String_Dataset: more than 270,000 images for training and 20,000 images for testing. Download the image data from Baidu.
The train_list.txt and test_list.txt files have the following form:

# path/to/image_name.jpg label
path/AttentionData/50843500_2726670787.jpg 情笼罩在他们满是沧桑
path/AttentionData/57724421_3902051606.jpg 心态的松弛决定了比赛
path/AttentionData/52041437_3766953320.jpg 虾的鲜美自是不可待言
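A minimal sketch of reading a list file in this format, assuming each line holds an image path and its label separated by the first space. The helper names `parse_list_line` and `read_list` are hypothetical and are not taken from the repository:

```python
def parse_list_line(line):
    """Split one 'path label' line into (image_path, label)."""
    line = line.rstrip("\n")
    # Split on the first space only, so labels containing spaces stay intact.
    image_path, _, label = line.partition(" ")
    return image_path, label


def read_list(lines):
    """Parse an iterable of lines, skipping blank lines and '#' comments."""
    samples = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        samples.append(parse_list_line(line))
    return samples


samples = read_list([
    "# path/to/image_name.jpg label",
    "path/AttentionData/50843500_2726670787.jpg 情笼罩在他们满是沧桑",
])
print(samples)
```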

Change the trainlist and vallist parameters in train.py, then start training:

cd Attention_ocr.pytorch
python train.py --trainlist ./data/ch_train.txt --vallist ./data/ch_test.txt

Training output then appears in the terminal. The decoderV2 model is used as the decoder.

The previous version

git checkout AttentionOcrV1

Reference

TO DO

replace LSTM with Conv1D, which can greatly accelerate inference
replace the CNN backbone with Inception net or DenseNet
implement the decoder with a transformer model


attention_ocr.pytorch's Issues

Increasing batch_size of validation set throws tensor size mismatch error

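Hello, a question: why is decoder_hidden = decoder.initHidden(b).cuda() reset every time during attention decoder training?

Hello, I have a question: why is decoder_hidden = decoder.initHidden(b).cuda() reset every time during attention decoder training? My understanding is that the encoder output should provide a decoder_hidden state. Could you explain? @chenjun2hao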

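The first few losses are normal, so why are the later losses abnormal, dropping from above 5 to 0.1?

It seems that only the first few images are used for training; the loss drops very quickly, and then every test prediction is empty

Image location and decoder_5.pth location

Variable-length recognition problem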

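Hello, when testing variable-length input with the open-source model you provided, I ran into two problems:
1. Variable image width:
transformer = dataset.resizeNormalize((280, 32)); any width other than 280 raises an error. CRNN instead scales the image to a height of 32 and scales the width by the same ratio, so its input is (x, 32).
2. Variable text length:
Possibly because every training sample has 10 characters, the prediction always has about 10 characters no matter how many characters are in the image?

For example, take the image

remove a few of its characters, and still run recognition with a 280*32 input.

The results look like this:
predict_str:，__不愿意意意资 (9 characters) => prob:0.002346405293792486

predict_str:**通信信位主办、《 (10 characters) => prob:0.05960559844970703

predict_str:，（通信学会主主府 (9 characters) => prob:0.000349084148183465

predict_str:叶国通信学会主里”《 (10 characters) => prob:0.01799328438937664

How can this be solved? Does training need to use variable-length samples? Thanks.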
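The CRNN-style preprocessing described in the first point (fix the height at 32 and scale the width by the same ratio) can be sketched as below. The function `target_size` and its `min_width` parameter are hypothetical names used for illustration; whether the released model accepts widths other than 280 is not confirmed here.

```python
def target_size(width, height, target_height=32, min_width=8):
    """Return (new_width, new_height) that keeps the aspect ratio at a fixed height."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / float(height)
    # Guard against degenerate, near-zero widths for very narrow images.
    new_width = max(min_width, int(round(width * scale)))
    return new_width, target_height


print(target_size(560, 64))  # (280, 32)
print(target_size(300, 48))  # (200, 32)
```

Images resized this way have different widths, so batching them would additionally require padding or grouping by width.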

'unexpected key "cnn.7.num_batches_tracked" in state_dict'

Error when running demo.py:
loading pretrained models ......
Traceback (most recent call last):
File "demo.py", line 34, in
encoder.load_state_dict(torch.load(encoder_path))
File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
.format(name))
KeyError: 'unexpected key "cnn.7.num_batches_tracked" in state_dict'
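The `num_batches_tracked` buffer of BatchNorm layers was introduced in PyTorch 0.4.1, so a checkpoint saved with that version contains keys that older releases do not recognize. Besides upgrading to the version listed in the requirements, one common workaround is to drop those keys before loading. The sketch below assumes the checkpoint is a plain mapping of names to tensors; `drop_num_batches_tracked` is a hypothetical helper, not part of the repository.

```python
from collections import OrderedDict


def drop_num_batches_tracked(state_dict):
    """Return a copy of state_dict without BatchNorm 'num_batches_tracked' entries."""
    return OrderedDict(
        (key, value)
        for key, value in state_dict.items()
        if not key.endswith("num_batches_tracked")
    )


# Stand-in for torch.load(encoder_path); the values would normally be tensors.
checkpoint = OrderedDict([
    ("cnn.7.weight", "w"),
    ("cnn.7.num_batches_tracked", "n"),
])
print(list(drop_num_batches_tracked(checkpoint)))
```

In demo.py the filtered mapping would then be passed to `encoder.load_state_dict(...)` in place of the raw `torch.load(encoder_path)` result.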

KeyError: ' ' error

Hello, when I run your code it raises KeyError: ' '. What causes this?
if isinstance(text, str):
text = [self.dict[item] for item in text]
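A KeyError of this kind typically means that a label contains a character (here a space) that is missing from the character dictionary. A minimal sketch for finding such characters before training, where `char_to_index` stands in for `self.dict` and `missing_characters` is a hypothetical helper:

```python
def missing_characters(labels, char_to_index):
    """Return the sorted list of label characters absent from the dictionary."""
    missing = set()
    for label in labels:
        for char in label:
            if char not in char_to_index:
                missing.add(char)
    return sorted(missing)


char_to_index = {"a": 0, "b": 1}
print(missing_characters(["ab", "a b"], char_to_index))  # [' ']
```

Any character it reports would need to be added to the dictionary file or removed from the labels.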

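Model accuracy problem

Hello, I re-ran your code, but after 21 epochs the recognition accuracy is still very low and does not reach the results you reported. What do you think might be causing this?

GO and END_TOKEN?

Are GO (START_TOKEN) and END_TOKEN added during training here? I only see them added in target_txt_decode in crnn.lang, but that function is never called.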
data = val_iter.next()
cpu_images, cpu_texts = data
...
target_variable = converter.encode(cpu_texts)
target_variable = target_variable.cuda()
decoder_input = target_variable[0].cuda()
Here, the decoder_input used for validation (no teacher forcing) should be a GO (START_TOKEN), but it looks like it takes the first character of cpu_texts?
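For reference, a minimal sketch of what the question expects: each target sequence is wrapped in start and end tokens, so that the first decoder input is the start token rather than the first character of the label. The token indices, the `offset`, and the function `encode_with_markers` are assumptions for illustration and do not describe what `converter.encode` actually does.

```python
SOS_TOKEN = 0  # assumed index of the GO / start-of-sequence token
EOS_TOKEN = 1  # assumed index of the end-of-sequence token


def encode_with_markers(text, char_to_index, offset=2):
    """Map text to indices, wrapped in start and end tokens."""
    # Character indices are shifted so they do not collide with the two markers.
    body = [char_to_index[char] + offset for char in text]
    return [SOS_TOKEN] + body + [EOS_TOKEN]


char_to_index = {"a": 0, "b": 1}
target = encode_with_markers("ab", char_to_index)
decoder_input = target[0]  # the first decoder input is the start token
print(target, decoder_input)
```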

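What exactly does text_length in Class Attention() refer to?

AssertionError: Torch not compiled with CUDA enabled. Does this program have to run on a GPU?

Hello, my computer has only a CPU, with no GPU and no CUDA installed, and runs Ubuntu. Can this program run normally on it? I tried several times and every attempt failed. The cause seems to be that CUDA is required, but I have no GPU. Is there any way to get the program to run?

Hyperparameter settings

Are the hyperparameters the program defaults, and after roughly how many epochs does the model converge?

Can inference be changed to batch prediction?

Inference currently predicts only one image at a time, which is rather slow

When will the pretrained model be released?

Can multi-line text be trained with attention?

1. With a fixed two lines, 4 characters on the first line and 7 on the second, can attention be trained without splitting the lines?

2. I tried CTC and it did not work

Training error: AttributeError: 'str' object has no attribute 'decode'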

AttributeError Traceback (most recent call last)
~/Attention_ocr.pytorch-master/train.py in
9 import numpy as np
10 import os
---> 11 import src.utils as utils
12 import src.dataset as dataset
13 import time

~/Attention_ocr.pytorch-master/src/utils.py in
16 data = f.readlines()
17 alphabet = [x.rstrip() for x in data]
---> 18 alphabet = ''.join(alphabet).decode('utf-8') # without decode, Python 2 produces garbled text
19
20
The error is raised when decode is called
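In Python 3, `str` is already Unicode and has no `decode` method, which explains the error. A minimal sketch of reading the alphabet file with an explicit encoding instead, so that no `decode` call is needed; `load_alphabet` is a hypothetical helper and the temporary file only stands in for the real dictionary file:

```python
import io
import os
import tempfile


def load_alphabet(path):
    """Read one character per line and join them into a single str."""
    with io.open(path, "r", encoding="utf-8") as f:
        return "".join(line.rstrip("\n") for line in f)


# Demonstration with a temporary file standing in for the real dictionary.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("情\n笼\n罩\n")
alphabet = load_alphabet(tmp.name)
os.remove(tmp.name)
print(alphabet)
```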

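The decoder predicts one character at a time; isn't this slow?

Variable-length test images

Hello,
I am not yet able to run your program.
First I would like to ask: can this model recognize longer text-line images?
I looked at the demo program, which sets a maximum of 15 characters. Is this value fixed?
Thanks.

About the loss function: isn't the CRNN loss function CTC loss? Why does your code use NLLLoss? I am a beginner and do not quite understand; I would appreciate an explanation. Thanks!

As in the title

Why is the loss larger when resuming from a previous training run than when training from scratch? I suspect something is wrong

Dictionary file char_std_5990.txt not found

Hello, I wanted to test this, but I get a message that the dictionary file char_std_5990.txt cannot be found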

RuntimeError: expand(torch.cuda.FloatTensor{[16, 71]}, size=[71]):

python3 change

Many thanks for sharing the code!

I found that this code must have been developed with Python 2.7.

To run experiments with Python 3.x, I had to change some parts that deal with unicode and UTF-8.

Here is what I did.
dataset.py:
label = line_splits[1]#.decode('utf-8')
utils.py (line 53):
if isinstance(text, str): # python3 string default is unicode #unicode):

ref: https://stackoverflow.com/questions/4987327/how-do-i-check-if-a-string-is-unicode-or-ascii

Thanks again for sharing the code. It is very helpful for studying DNNs.

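What is the difference between your repo and da03's attention-ocr, apart from the different frameworks?

Hello, thank you very much for your code. I am using it to understand Attention-OCR, but there are some things I do not understand.

I would like to know what "teacher forcing: feed the target label as the next input" is doing.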
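Teacher forcing means that during training the ground-truth token from the label is fed to the decoder as its next input, instead of the token the decoder itself just predicted. The toy sketch below only shows how the input sequence differs between the two modes; `decoder_inputs` and the stub `always_seven` are hypothetical and do not reproduce the repository's training loop.

```python
def decoder_inputs(targets, predict_next, teacher_forcing):
    """Return the inputs fed to the decoder; input i is used to predict targets[i + 1].

    targets: list of token indices, starting with the start token.
    predict_next: callable mapping the current input token to a predicted token.
    """
    inputs = [targets[0]]
    for step in range(1, len(targets) - 1):
        predicted = predict_next(inputs[-1])
        # With teacher forcing, the ground-truth token replaces the prediction.
        inputs.append(targets[step] if teacher_forcing else predicted)
    return inputs


def always_seven(token):
    """Toy stand-in for a decoder step that always predicts token 7."""
    return 7


targets = [0, 4, 5, 6]
print(decoder_inputs(targets, always_seven, teacher_forcing=True))   # [0, 4, 5]
print(decoder_inputs(targets, always_seven, teacher_forcing=False))  # [0, 7, 7]
```

With teacher forcing an early mistake does not propagate to later steps, which usually makes training more stable; at inference time no labels exist, so the decoder always consumes its own predictions.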

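What exactly does text_length in Class Attention() refer to?

When will the new model be available? Is the file too large? I have never been able to download it

Are two optimizers used for the encoder and decoder in order to converge better?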

load decoder_path error

Hi, thanks for your excellent work. I get this error:
RuntimeError: Error(s) in loading state_dict for decoder:
size mismatch for decoder.embedding.weight: copying a param of torch.Size([17765, 256]) from checkpoint, where the shape is torch.Size([5992, 256]) in current model.
size mismatch for decoder.out.bias: copying a param of torch.Size([17765]) from checkpoint, where the shape is torch.Size([5992]) in current model.
size mismatch for decoder.out.weight: copying a param of torch.Size([17765, 256]) from checkpoint, where the shape is torch.Size([5992, 256]) in current model.

It looks like the model you list for inference has a size mismatch. How can I fix it?
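The first dimension of `decoder.embedding.weight` and `decoder.out.weight` is the number of output classes, so an error like this suggests that the character dictionary used to build the model differs in size from the one the checkpoint was trained with. A minimal sketch for listing such mismatches, working on plain name-to-shape mappings; `shape_mismatches` is a hypothetical helper:

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """List (name, checkpoint_shape, model_shape) for parameters whose shapes differ."""
    return [
        (name, checkpoint_shapes[name], model_shapes[name])
        for name in sorted(checkpoint_shapes)
        if name in model_shapes and checkpoint_shapes[name] != model_shapes[name]
    ]


# Shapes copied from the error message above.
checkpoint_shapes = {
    "decoder.out.bias": (17765,),
    "decoder.embedding.weight": (17765, 256),
}
model_shapes = {
    "decoder.out.bias": (5992,),
    "decoder.embedding.weight": (5992, 256),
}
for name, ckpt_shape, model_shape in shape_mismatches(checkpoint_shapes, model_shapes):
    print(name, ckpt_shape, "!=", model_shape)
```

With real PyTorch objects, the mappings could be built with `{k: tuple(v.shape) for k, v in state_dict.items()}` for both the loaded checkpoint and `model.state_dict()`.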

The decoder weight loading code loads the encoder instead

if opt.decoder:
    print('loading pretrained encoder model from %s' % opt.decoder)
    encoder.load_state_dict(torch.load(opt.encoder))

The code above is supposed to load the decoder, but it actually loads the encoder, which makes all later test results wrong
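A sketch of the fix this report implies: each checkpoint path is loaded into its own module. The function `load_pretrained` and the `FakeModule` stand-in are hypothetical and exist only so that the snippet runs without PyTorch; in the real script `load_fn` would be `torch.load` and the paths would come from `opt.encoder` and `opt.decoder`.

```python
def load_pretrained(encoder, decoder, encoder_path, decoder_path, load_fn):
    """Load each checkpoint into its own module; load_fn would be torch.load."""
    if encoder_path:
        print('loading pretrained encoder model from %s' % encoder_path)
        encoder.load_state_dict(load_fn(encoder_path))
    if decoder_path:
        print('loading pretrained decoder model from %s' % decoder_path)
        # The decoder checkpoint goes into the decoder, not the encoder.
        decoder.load_state_dict(load_fn(decoder_path))


class FakeModule(object):
    """Stand-in for an nn.Module that records what was loaded."""

    def __init__(self):
        self.loaded = None

    def load_state_dict(self, state):
        self.loaded = state


encoder, decoder = FakeModule(), FakeModule()
load_pretrained(encoder, decoder, "enc.pth", "dec.pth",
                load_fn=lambda path: {"from": path})
print(encoder.loaded, decoder.loaded)
```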