courao / ocr.pytorch

An OCR project implemented in pure PyTorch, including text detection and recognition

License: MIT License

Python 100.00%

ocr text-detection text-recognition ctpn crnn ocr-pytorch

ocr.pytorch's Introduction

ocr.pytorch

An OCR project implemented in pure PyTorch.
Text detection is based on CTPN and text recognition is based on CRNN.
More detection and recognition methods will be supported!

Prerequisite

python-3.5+
pytorch-0.4.1+
torchvision-0.2.1
opencv-3.4.0.14
numpy-1.14.3

All of these can be installed through pip except pytorch and torchvision. Both of those depend on your CUDA version, so follow the installation instructions on PyTorch's official site.

Detection

Detection is based on CTPN; some code is borrowed from pytorch_ctpn.

Recognition

Recognition is based on CRNN; some code is borrowed from crnn.pytorch.

Test

Download the pretrained models from Baidu Netdisk (extract code: u2ff) or Google Drive and put the files into the checkpoints directory. Then run

python3 demo.py

The image files in ./test_images will be run through text detection and recognition, and the results will be stored in ./test_result.

If you want to test a single image, run

python3 test_one.py [filename]

Train

Training code is in the train_code directory.
Train CTPN
Train CRNN

License

MIT License

ocr.pytorch's Issues

The provided CRNN models do not work

Sorry to bother you after all this time, but why does only one of the provided CRNN models work?
RuntimeError: Error(s) in loading state_dict for CRNN:
Missing key(s) in state_dict: "conv1.weight", "conv1.bias", "conv2.weight", "conv2.bias", "conv3_1.weight", "conv3_1.bias", "bn3.weight", "bn3.bias", "bn3.running_mean", "bn3.running_var", "conv3_2.weight", "conv3_2.bias", "conv4_1.weight", "conv4_1.bias", "bn4.weight", "bn4.bias", "bn4.running_mean", "bn4.running_var", "conv4_2.weight", "conv4_2.bias", "conv5.weight", "conv5.bias", "bn5.weight", "bn5.bias", "bn5.running_mean", "bn5.running_var".
Unexpected key(s) in state_dict: "cnn.conv0.weight", "cnn.conv0.bias", "cnn.conv1.weight", "cnn.conv1.bias", "cnn.conv2.weight", "cnn.conv2.bias", "cnn.batchnorm2.weight", "cnn.batchnorm2.bias", "cnn.batchnorm2.running_mean", "cnn.batchnorm2.running_var", "cnn.batchnorm2.num_batches_tracked", "cnn.conv3.weight", "cnn.conv3.bias", "cnn.conv4.weight", "cnn.conv4.bias", "cnn.batchnorm4.weight", "cnn.batchnorm4.bias", "cnn.batchnorm4.running_mean", "cnn.batchnorm4.running_var", "cnn.batchnorm4.num_batches_tracked", "cnn.conv5.weight", "cnn.conv5.bias", "cnn.conv6.weight", "cnn.conv6.bias", "cnn.batchnorm6.weight", "cnn.batchnorm6.bias", "cnn.batchnorm6.running_mean", "cnn.batchnorm6.running_var", "cnn.batchnorm6.num_batches_tracked".
size mismatch for rnn.1.embedding.weight: copying a param with shape torch.Size([5997, 512]) from checkpoint, the shape in current model is torch.Size([5835, 512]).
size mismatch for rnn.1.embedding.bias: copying a param with shape torch.Size([5997]) from checkpoint, the shape in current
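
The error above lists keys the model expects but the checkpoint lacks, and keys the checkpoint has but the model does not. A minimal sketch for summarizing such a mismatch before calling load_state_dict is given here. It works on plain key collections, and `diff_keys` is a hypothetical helper, not part of the repository. With PyTorch, the inputs would be `model.state_dict().keys()` and the keys of the dictionary returned by `torch.load(path, map_location='cpu')`.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring load_state_dict's report."""
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    # Keys the model wants but the file lacks.
    missing = [k for k in model_keys if k not in ckpt_set]
    # Keys the file has but the model does not know.
    unexpected = [k for k in checkpoint_keys if k not in model_set]
    return missing, unexpected


# A few keys taken from the error message above (abbreviated).
model_keys = ["conv1.weight", "conv1.bias", "rnn.1.embedding.weight"]
checkpoint_keys = ["cnn.conv0.weight", "cnn.conv0.bias", "rnn.1.embedding.weight"]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

When every convolution key in the checkpoint carries a prefix such as cnn. that the model's keys do not, the checkpoint was most likely saved from a different model class than the one being constructed.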

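train loss = nan during CRNN training

1. My dataset looks like this (3.6 million images)

2. My config looks like this

3. The run looks like this

Could the author please help with this? I would be very grateful.

A question for the author: while training CTPN, my loss stalls at around 0.7 and will not converge. The image resolution is 957 x 532. What changes should I make?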

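CRNN reports that str cannot be used; I do not know how to fix it, please advise

I do not know how to deal with this problem. Searching Baidu turned up nothing, and I am not sure whether it is an issue with my environment. Could someone take a look?

Loss also suddenly becomes nan

Hello, my loss also turns into nan. Searching online, I found one workaround that adds a hook function (link: https://discuss.pytorch.org/t/ctcloss-performance-of-pytorch-1-0-0/27524). Following that approach, I defined backward_hook and added crnn.register_backward_hook(backward_hook) on the second-to-last line. Is this fix logically sound?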
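
Apart from gradient hooks, one well-known source of an infinite CTC loss (which then shows up as nan after backpropagation) is a sample whose target cannot be aligned to the available time steps: CTC needs at least one frame per label plus one blank between each pair of identical adjacent labels. The sketch below checks that condition. `min_ctc_frames` and `is_alignable` are hypothetical helper names, and whether this is the cause in any particular run is an assumption to verify.

```python
def min_ctc_frames(target):
    """Smallest number of time steps CTC needs to emit `target`."""
    # Each pair of identical neighbours needs a blank frame between them.
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def is_alignable(num_frames, target):
    """True if a CTC alignment of `target` exists within `num_frames` steps."""
    return num_frames >= min_ctc_frames(target)


print(min_ctc_frames("hello"))   # 5 labels plus 1 blank between the two l's
print(is_alignable(5, "hello"))
print(is_alignable(6, "hello"))
```

Running such a check over the dataset, with the number of frames the network produces for each image width, would flag samples that can never be aligned. PyTorch's nn.CTCLoss also accepts zero_infinity=True, which zeroes infinite losses and their gradients instead of propagating them.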

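Spaces are not recognized

Could you explain how to return the coordinates of the recognized text in the original image?

As the title says.

What is going on with CRNN here? Aborted (core dumped)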

free(): invalid next size (normal)
Aborted (core dumped)

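Error after modifying crnn_recognizer.py

Hello, I ran into the same problem as the earlier poster. I changed line 100 of crnn_recognizer.py to def __init__(self, model_path='/root/zjut/ocr.pytorch/checkpoints/CRNN.pth'). Running 'python demo.py' then fails with the following: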
Traceback (most recent call last):
  File "/root/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/root/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/lib/python/old_ptvsd/ptvsd/main.py", line 432, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/lib/python/old_ptvsd/ptvsd/main.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/root/anaconda3/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/root/anaconda3/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/root/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/zjut/ocr.pytorch/demo.py", line 10, in <module>
    from ocr import ocr
  File "/root/zjut/ocr.pytorch/ocr.py", line 6, in <module>
    recognizer = PytorchOcr()
  File "/root/zjut/ocr.pytorch/recognize/crnn_recognizer.py", line 111, in __init__
    self.model.load_state_dict({k.replace('module.', ''): v for k, v in torch.load(model_path).items()})
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CRNN:
Missing key(s) in state_dict: "conv1.weight", "conv1.bias", "conv2.weight", "conv2.bias", "conv3_1.weight", "conv3_1.bias", "bn3.weight", "bn3.bias", "bn3.running_mean", "bn3.running_var", "conv3_2.weight", "conv3_2.bias", "conv4_1.weight", "conv4_1.bias", "bn4.weight", "bn4.bias", "bn4.running_mean", "bn4.running_var", "conv4_2.weight", "conv4_2.bias", "conv5.weight", "conv5.bias", "bn5.weight", "bn5.bias", "bn5.running_mean", "bn5.running_var".
Unexpected key(s) in state_dict: "cnn.conv0.weight", "cnn.conv0.bias", "cnn.conv1.weight", "cnn.conv1.bias", "cnn.conv2.weight", "cnn.conv2.bias", "cnn.batchnorm2.weight", "cnn.batchnorm2.bias", "cnn.batchnorm2.running_mean", "cnn.batchnorm2.running_var", "cnn.batchnorm2.num_batches_tracked", "cnn.conv3.weight", "cnn.conv3.bias", "cnn.conv4.weight", "cnn.conv4.bias", "cnn.batchnorm4.weight", "cnn.batchnorm4.bias", "cnn.batchnorm4.running_mean", "cnn.batchnorm4.running_var", "cnn.batchnorm4.num_batches_tracked", "cnn.conv5.weight", "cnn.conv5.bias", "cnn.conv6.weight", "cnn.conv6.bias", "cnn.batchnorm6.weight", "cnn.batchnorm6.bias", "cnn.batchnorm6.running_mean", "cnn.batchnorm6.running_var", "cnn.batchnorm6.num_batches_tracked".
size mismatch for rnn.1.embedding.weight: copying a param with shape torch.Size([5997, 512]) from checkpoint, the shape in current model is torch.Size([5835, 512]).
size mismatch for rnn.1.embedding.bias: copying a param with shape torch.Size([5997]) from checkpoint, the shape in current model is torch.Size([5835]).
The CRNN.pth used here is the one you provided on Baidu Netdisk.
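
The traceback shows the checkpoint keys being rewritten with k.replace('module.', ''), the usual way to load weights that were saved from an nn.DataParallel wrapper. A slightly stricter sketch removes the prefix only when it leads the key. `strip_prefix` is a hypothetical helper, and the dictionary below uses placeholder values rather than real tensors. Note that this renaming cannot help with the error above, where the checkpoint keys start with cnn. and the layer names themselves differ from the model's.

```python
def strip_prefix(state_dict, prefix="module."):
    """Return a copy of `state_dict` with a leading `prefix` removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Placeholder values stand in for tensors.
saved = {
    "module.cnn.conv0.weight": 0,
    "module.rnn.1.embedding.bias": 1,
    "submodule.weight": 2,   # str.replace would wrongly turn this into "subweight"
}
print(strip_prefix(saved))
```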

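Training CRNN

Hello author, I have a question: I trained CRNN on my own synthetic dataset and the loss went down, but testing on images produces no results. What might be going wrong?

Confused by a piece of code, here to ask the author

At line 43 of ocr.pytorch/detect/ctpn_predict.py,
image = image.astype(np.float32) - config.IMAGE_MEAN
I do not understand the purpose of this step (is it mean subtraction?) or how the value of config.IMAGE_MEAN was chosen.
I am new to machine vision and this is a beginner question; thanks in advance to the author.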
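
The line in question does look like per-channel mean subtraction, a common preprocessing step that centers pixel values around zero. A minimal sketch, assuming IMAGE_MEAN holds one value per color channel. The numbers below are the widely used ImageNet RGB means and are an assumption here, so check them against the repository's config.

```python
import numpy as np

# Assumed per-channel means (ImageNet RGB); the project's config.IMAGE_MEAN may differ.
IMAGE_MEAN = [123.68, 116.779, 103.939]

# A dummy 2x2 RGB image with every pixel set to 128.
image = np.full((2, 2, 3), 128, dtype=np.uint8)

# Convert to float first so the result can go negative, then let NumPy
# broadcast the three means across the height and width axes.
centered = image.astype(np.float32) - IMAGE_MEAN

print(centered[0, 0])
```

Means of this kind are normally computed once over the dataset the backbone was pretrained on, and the same values have to be used at training and at prediction time.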

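A question for the author: can this model achieve its goal using PyTorch's nn.CTCLoss?

Thanks to the author. My question is this: if I train with your train_pytorch_ctc.py (my dataset consists of variable-length license plates, 4 to 9 characters, where the width W changes with the number of characters and the height h varies slightly), is nn.CTCLoss up to the job?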
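
CTC is designed for unaligned, variable-length targets, and nn.CTCLoss takes per-sample input_lengths and target_lengths for that purpose; the targets may be passed as one concatenated 1-D sequence. A sketch of preparing such targets in plain Python follows. `encode_batch`, the alphabet, and the plate strings are illustrative assumptions, with index 0 reserved for the CTC blank.

```python
def encode_batch(labels, alphabet):
    """Flatten a batch of strings into class indices plus per-sample lengths."""
    # Index 0 is reserved for the CTC blank, so characters start at 1.
    char_to_index = {ch: i + 1 for i, ch in enumerate(alphabet)}
    flat, lengths = [], []
    for text in labels:
        flat.extend(char_to_index[ch] for ch in text)
        lengths.append(len(text))
    return flat, lengths


alphabet = "0123456789ABC"   # toy alphabet for the example
targets, target_lengths = encode_batch(["A123", "B0C456789"], alphabet)
print(targets)
print(target_lengths)
```

These lists would then be wrapped in integer tensors and passed to the loss together with the number of output frames the network produces for each image.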

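Error running the demo

Large performance gap between training mode and inference mode

Hello author! With both your resnet-ctc and the original cnn-ctc I reach over 96% training accuracy, but validation accuracy is only 10%. For the same image, whether it comes from the training set or the validation set, results under model.train() are acceptable while results under model.eval() are completely wrong. I am also using nn.CTCLoss. What could be the problem? Thanks for any reply.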
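
A large gap between model.train() and model.eval() on the same image often points at layers that behave differently in the two modes, BatchNorm in particular: in training mode it normalizes with the current batch's statistics, in evaluation mode with running averages accumulated during training. The toy below is a simplified pure-Python stand-in (not PyTorch's implementation) that shows how the two modes diverge when the running statistics have not caught up. Whether this explains the report above is an assumption.

```python
from statistics import fmean, pvariance


class TinyBatchNorm:
    """1-D batch norm over a list of floats, without learnable scale or shift."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum, self.eps = momentum, eps
        self.running_mean, self.running_var = 0.0, 1.0
        self.training = True

    def __call__(self, batch):
        if self.training:
            mean, var = fmean(batch), pvariance(batch)
            # Running estimates move only a fraction `momentum` per batch.
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        else:
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


bn = TinyBatchNorm()
batch = [100.0, 102.0, 98.0, 101.0]

train_out = bn(batch)    # normalized with the batch's own statistics
bn.training = False
eval_out = bn(batch)     # normalized with the still immature running statistics

print(train_out)
print(eval_out)
```

If the real model shows the same pattern, it is worth checking whether the running statistics had enough training steps to settle and whether the batches seen in training resemble the inputs used at evaluation time.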

Upload pretrained models to other host

Can you please upload the pretrained models to a site other than pan.baidu? Most non-Chinese users can't download from there.
Maybe https://www.mediafire.com/, https://www.4shared.com/, https://mega.nz/, Google Drive or https://zippyshare.com/
Thank you very much.

CRNN

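Could you describe the training data format for CRNN?

Deploying CRNN in C++

Have you tried deploying the recognition model with C++?

CRNN dataset

Does this model support datasets with varying image sizes and varying label lengths?

Loss above 140 and accuracy 0 when training CRNN

For the CRNN part of this framework, could someone explain how train.py, train_python_ctc.py, keys.py, and recognizer.py are used? I get a very large loss when training on Chinese.

CRNN

If the images fed to CRNN are taller than 32 pixels, and I resize the training data to height 32 beforehand and also resize the text regions detected by CTPN to height 32 before passing them to the trained CRNN model, will text taller than 32 pixels still be recognized correctly?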
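
Resizing both the training images and the CTPN crops to the same fixed height is the usual way to feed a recognizer that expects height 32, as described in the question; scaling the width by the same factor keeps the characters from being distorted. A sketch of the size computation follows. `resized_size` is a hypothetical helper, and the actual resize would be done with an image library such as OpenCV.

```python
def resized_size(width, height, target_height=32):
    """Size after scaling to `target_height` while keeping the aspect ratio."""
    scale = target_height / height
    new_width = max(1, round(width * scale))   # never collapse to zero width
    return new_width, target_height


print(resized_size(300, 64))    # a 300x64 text crop
print(resized_size(957, 532))   # a full 957x532 image
```

Clamping the width to at least 1 avoids passing a zero-width size to the resize call for very tall, narrow crops.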

Can the CTPN batch_size not be greater than 1?

Why?

Question about CRNN recognition

Hello author, can the CRNN in this project be used to recognize special symbols such as ℃?

Training CRNN & extracting the CTPN detection

@courao Thank you for your hard work.

Will you release the CRNN training code and documentation?
Can you add an option to extract the text lines detected by CTPN?

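Runtime environment

Could you post your runtime environment? Thanks.

Cannot recognize spaces in English passages

Hello, I have studied this code and have two questions:

1. train_code provides three CRNN training scripts: ctc, ctcV2, and the torch version of CTC. Which one was actually used for training?
2. I have written a similar project myself, but when my trained model recognizes full English sentences the words run together, so the spaces between words are not recognized. Most models I have tried from the web have the same problem with English. However, the CRNN-1010.pth model you mention can recognize some of the spaces. Did it receive any special handling?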
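
In a CTC-based recognizer a space can only be predicted if it is one of the output classes, that is, if the space character is in the alphabet and the training labels contain it. A greedy decoding sketch, assuming index 0 is the blank and a toy alphabet that ends with a space; `ctc_greedy_decode` is illustrative and not the repository's decoder.

```python
def ctc_greedy_decode(indices, alphabet, blank=0):
    """Collapse repeated indices, drop blanks, and map the rest to characters."""
    chars = []
    previous = blank
    for index in indices:
        if index != blank and index != previous:
            chars.append(alphabet[index - 1])   # index 0 is the blank
        previous = index
    return "".join(chars)


alphabet = "abcdefghijklmnopqrstuvwxyz "   # note the trailing space character
space = alphabet.index(" ") + 1

# Per-frame argmax for "hi ok": h h - i - (space) - o k k
frames = [8, 8, 0, 9, 0, space, 0, 15, 11, 11]
print(repr(ctc_greedy_decode(frames, alphabet)))
```

If the alphabet had no space entry, the frame carrying the space would have to be predicted as blank or as some other character, and the words would come out joined.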

Why is my accuracy always 0?

Hi author, I am a beginner. I trained the model on my own dataset using your implementation, but the accuracy was 0% after every epoch even though the training loss kept decreasing. Why is that?

inference for batch?

Hi!

Can I extract the text from a batch? I don't understand how to do it.

loss=0.44 on the SROIE dataset

@courao
I have been training for 1 day on the SROIE dataset and the loss is still 0.44!
It works well on other datasets, but not on SROIE.
Am I doing something wrong?
dataset download link

Can CRNN be used for the Khmer language?

I tried training a CRNN model, but the result I keep getting is:

Not Covering Char: ១ - 6113
Not Covering Char: ១ - 6113
Not Covering Char: ៩ - 6121
Not Covering Char: ៧ - 6119
Not Covering Char: ១ - 6113
Not Covering Char: ទ - 6033
Not Covering Char: ស - 6047
Not Covering Char: រ - 6042
error
Train loss: 0.000000

Start val
~/image0-1.jpg
~/image0-1.jpg
pred :—眯恂
target:គោគ្គនាមនិងនាមៈ កូល វន្ធសហា
0.0
ocr_acc: 0.000000

How can I train the CRNN model successfully? Thank you for your help.
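
The numbers in the log match the Unicode code points of the Khmer characters (for example ១ is 6113), which suggests the labels contain characters that are absent from the alphabet the model was built with. A sketch for listing such characters before training follows. `uncovered_chars` and the stand-in alphabet are assumptions; the real alphabet would come from the project's keys or alphabet file.

```python
def uncovered_chars(labels, alphabet):
    """Map each label character missing from `alphabet` to its code point."""
    known = set(alphabet)
    return {ch: ord(ch) for text in labels for ch in text if ch not in known}


alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"   # stand-in for the real alphabet
labels = ["១៩៧១", "abc"]

for ch, code in uncovered_chars(labels, alphabet).items():
    print("Not Covering Char:", ch, "-", code)
```

Every character reported this way would need to be added to the alphabet, which changes the number of output classes, so the final layer of a pretrained model would no longer match.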

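Questions about training on my own data

Hello author, I ran into some problems while training your CTPN network on my own data:
1. After loading the pretrained model and training on my own annotated data, detection actually got worse.
2. While training CTPN, the loss stopped decreasing.
3. Do the input images need to be rescaled to a particular size?
I hope you can help.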

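GPU training

Hello author, running your CRNN training code on the CPU raises a ValueError (weight or height must be greater than 0), which I was able to fix by changing the parameters in trans.py. But training on the GPU with the same parameters still raises the ValueError. What is the cause?

How is ctpn_model_v2.py used?

Hello, I noticed that compared with ctpn_model.py it adds RPN_Loss and rpn_refiment, yet the training code never mentions this file. What was the reasoning behind these differences? Thanks.

Several different CRNN models in crnn.py

Could the author please explain these different CRNN models? Is there one that does not restrict the input image size?

Thanks to the author for answering my questions; sharing how to obtain CRNN datasets for anyone who needs it

The training data comes in two parts:
1. Generated data. There is quite a bit of related work on GitHub for this.
You can download a dataset here: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (password: lu7m)
To generate your own, see this code: https://github.com/Sanster/text_renderer
2. Data from my own project, though the amount is fairly small.
More of this is of course better, but it is hard to collect and label.

I switched to a different CTPN pretrained model for training and the loss came down, but the trained model detects nothing. My image is below; I want to detect the digit region of a dial. What could be the cause?

Are there requirements for the images to be recognized?

Hello author, when I run your project on some images I get a cannot identified image file error. What causes this? Are there requirements on the input images?

Is this project the same as the chinese-ocr project?

I have recently been studying OCR. Does the author have any good datasets to share?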

Any way to use this on Android in offline mode?

Hello, I am new to PyTorch and image manipulation. I was wondering if this codebase can be linked with Android code base as listed here: https://pytorch.org/mobile/home.

Thank you.

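Empty predictions after retraining on my own dataset

Hello author. I formatted my own dataset according to the dataset specification and the training code runs, but the prediction is always an empty string. I have already updated the alphabet.pkl file. What might be going wrong?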

Multi-GPU training

@courao
How to train CTPN using 2 GPUs?

Missing key(s) in state_dict, the weights do not match crnn.py

Hi, I used the weight file you provided and modified only line 100 of crnn_recognizer.py, changing it to def __init__(self, model_path='checkpoints/CRNN.pth'). Running 'python demo.py' then fails with the following error:

Traceback (most recent call last):
  File "demo.py", line 2, in <module>
    from ocr import ocr
  File "/media/hgh/HGH_30/plate/ocr.pytorch-master/ocr.py", line 6, in <module>
    recognizer = PytorchOcr()
  File "/media/hgh/HGH_30/plate/ocr.pytorch-master/recognize/crnn_recognizer.py", line 111, in __init__
    self.model.load_state_dict(torch.load(model_path))
  File "/mnt/home/hgh/anaconda2/envs/py3_torch4.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CRNN_v2:
Missing key(s) in state_dict: "conv1_1.weight", "conv1_1.bias", "bn1_1.weight", "bn1_1.bias", "bn1_1.running_mean", "bn1_1.running_var", "conv1_2.weight", "conv1_2.bias", "bn1_2.weight", "bn1_2.bias", "bn1_2.running_mean", "bn1_2.running_var", "conv2_1.weight", "conv2_1.bias", "bn2_1.weight", "bn2_1.bias", "bn2_1.running_mean", "bn2_1.running_var", "conv2_2.weight", "conv2_2.bias", "bn2_2.weight", "bn2_2.bias", "bn2_2.running_mean", "bn2_2.running_var", "conv3_1.weight", "conv3_1.bias", "bn3_1.weight", "bn3_1.bias", "bn3_1.running_mean", "bn3_1.running_var", "conv3_2.weight", "conv3_2.bias", "bn3_2.weight", "bn3_2.bias", "bn3_2.running_mean", "bn3_2.running_var", "conv4_1.weight", "conv4_1.bias", "bn4_1.weight", "bn4_1.bias", "bn4_1.running_mean", "bn4_1.running_var", "conv4_2.weight", "conv4_2.bias", "bn4_2.weight", "bn4_2.bias", "bn4_2.running_mean", "bn4_2.running_var", "bn5.weight", "bn5.bias", "bn5.running_mean", "bn5.running_var".
Unexpected key(s) in state_dict: "cnn.conv0.weight", "cnn.conv0.bias", "cnn.conv1.weight", "cnn.conv1.bias", "cnn.conv2.weight", "cnn.conv2.bias", "cnn.batchnorm2.weight", "cnn.batchnorm2.bias", "cnn.batchnorm2.running_mean", "cnn.batchnorm2.running_var", "cnn.batchnorm2.num_batches_tracked", "cnn.conv3.weight", "cnn.conv3.bias", "cnn.conv4.weight", "cnn.conv4.bias", "cnn.batchnorm4.weight", "cnn.batchnorm4.bias", "cnn.batchnorm4.running_mean", "cnn.batchnorm4.running_var", "cnn.batchnorm4.num_batches_tracked", "cnn.conv5.weight", "cnn.conv5.bias", "cnn.conv6.weight", "cnn.conv6.bias", "cnn.batchnorm6.weight", "cnn.batchnorm6.bias", "cnn.batchnorm6.running_mean", "cnn.batchnorm6.running_var", "cnn.batchnorm6.num_batches_tracked".
size mismatch for rnn.1.embedding.weight: copying a param of torch.Size([5835, 512]) from checkpoint, where the shape is torch.Size([5997, 512]) in current model.
size mismatch for rnn.1.embedding.bias: copying a param of torch.Size([5835]) from checkpoint, where the shape is torch.Size([5997]) in current model.

Does the weight file you provide correspond to the network? Thanks!

Format for Label files in CTPN Training

Hi,
Could you please provide an example label file for training CTPN?

Data used to train pretrained model

Can you describe more about the data used to train your pretrained model? About the language, number of samples, label format...

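Could you release the dataset you used to train CRNN?

Hello, I have some questions about the CRNN training data. My training data varies widely in scale (the image width w ranges from 8px to 300px). Reading your training code, I see that each batch is padded with blank space: the largest w in the batch is chosen and every narrower image is padded up to that width. Does this padding affect recognition?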
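
The padding scheme described here can be sketched as follows. This is an illustration under assumptions (grayscale arrays of equal height, zero as the pad value, and the hypothetical helper `pad_batch`), not the repository's collate code.

```python
import numpy as np


def pad_batch(images, pad_value=0):
    """Right-pad grayscale images of equal height to the widest one in the batch."""
    height = images[0].shape[0]
    max_width = max(img.shape[1] for img in images)
    batch = np.full((len(images), height, max_width), pad_value, dtype=images[0].dtype)
    for i, img in enumerate(images):
        batch[i, :, : img.shape[1]] = img   # copy content, leave the rest as padding
    return batch


# Two dummy 32-pixel-high crops, 8 and 300 pixels wide.
narrow = np.ones((32, 8), dtype=np.float32)
wide = np.ones((32, 300), dtype=np.float32)

batch = pad_batch([narrow, wide])
print(batch.shape)
print(batch[0].sum() / batch[0].size)   # fraction of the narrow sample that is real content
```

With widths from 8px to 300px, a narrow sample in such a batch is almost entirely padding. Grouping images of similar width into the same batch is one common way to limit that.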

Error when running demo.py

C:\Users\Administrator\AppData\Local\Programs\Python\Python37\python.exe C:/Users/Administrator/Desktop/pytorch/ocr.pytorch-master/demo.py
Traceback (most recent call last):
./test_result\test_images\t1.txt
  File "C:/Users/Administrator/Desktop/pytorch/ocr.pytorch-master/demo.py", line 29, in <module>
    txt_f = open(txt_file, 'w')
FileNotFoundError: [Errno 2] No such file or directory: './test_result\test_images\t1.txt'

Process finished with exit code 1

I noticed that after demo.py runs it clears the contents of the test_result folder, and then this error is raised.
Could the author please take a look?
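
The failing path './test_result\test_images\t1.txt' still contains the test_images folder name, so open() appears to be asked to write into a subdirectory of test_result that does not exist. One way around it, sketched under assumptions: build the output name from the image's base name only and make sure the result directory exists first. `result_path` is a hypothetical helper, not code from demo.py, and the .png extension is only an example.

```python
import os
import tempfile


def result_path(image_path, result_dir):
    """Build the .txt output path from the image's base name only."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    os.makedirs(result_dir, exist_ok=True)   # recreate the folder if it was cleared
    return os.path.join(result_dir, stem + ".txt")


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "test_result")
    txt_file = result_path(os.path.join("test_images", "t1.png"), out_dir)
    with open(txt_file, "w") as txt_f:
        txt_f.write("ok")
    print(os.path.basename(txt_file))
```

Using os.path.basename and os.path.join keeps the path handling correct for whichever separator the operating system uses.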

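How many epochs did you train for?

I trained on icdar2015 with the project's training code and found that pred is empty during training, so the accuracy is 0; I do not know why. I also used CRNN.pth and CRNN-1010.pth as pretrained models and tested them in online_test, and the accuracy was also very low. Has anyone else run into this?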

I made a pytorch-lightning implementation of your CTPN

Hi! I've made a pytorch-lightning implementation of ctpn, mainly by using your code. Pytorch-lightning has many nice features, such as training with tpus/multiple gpus by changing one line of code, 16-bit precision, works on cpu (nice for testing), automatic learning rate finder... Would you be open to a pull request? Link to fork here! I'm in the process of converting your CRNN to pytorch-lightning as well.

Here's the simplified training loop:

import pytorch_lightning as pl

# ICDARDataModule, CTPN_Model, config, max_epochs and the callbacks are defined in the fork.
datamodule = ICDARDataModule(
    config.icdar17_mlt_img_dir,
    config.icdar17_mlt_gt_dir,
    batch_size=1,
    num_workers=config.num_workers,
    shuffle=True,
)

len_train_dataset = len(datamodule.train_data)

model = CTPN_Model()

trainer = pl.Trainer(
    gpus=1,  # number of GPUs; 0 to run on the CPU
    max_epochs=max_epochs,
    log_every_n_steps=1,
    callbacks=[
        LoadCheckpoint(config.pretrained_weights),
        InitializeWeights(),
        LossAndCheckpointCallback(config, len_train_dataset),
    ],
)

trainer.fit(model, datamodule)