ocr.pytorch's Introduction

ocr.pytorch

An OCR project implemented purely in PyTorch.
Text detection is based on CTPN and text recognition is based on CRNN.
More detection and recognition methods will be supported!

Prerequisite

  • python-3.5+
  • pytorch-0.4.1+
  • torchvision-0.2.1
  • opencv-3.4.0.14
  • numpy-1.14.3

All of these can be installed through pip except pytorch and torchvision. Both of those depend on your CUDA version, so it is best to follow the installation instructions on PyTorch's official site.

Detection

Detection is based on CTPN; some code is borrowed from pytorch_ctpn. Several detection results: detect1 detect2

Recognition

Recognition is based on CRNN; some code is borrowed from crnn.pytorch.

Test

Download the pretrained models from Baidu Netdisk (extraction code: u2ff) or Google Drive and put the files into checkpoints. Then run

python3 demo.py

Text detection and recognition will be run on the image files in ./test_images, and the results will be stored in ./test_result.

If you want to test a single image, run

python3 test_one.py [filename]

Train

The training code is in the train_code directory.
Train CTPN
Train CRNN

Licence

MIT License

ocr.pytorch's People

Contributors

courao

ocr.pytorch's Issues

The provided CRNN models are not usable

Sorry to bother you after so long. Why does only one of the provided CRNN models work?
RuntimeError: Error(s) in loading state_dict for CRNN:
Missing key(s) in state_dict: "conv1.weight", "conv1.bias", "conv2.weight", "conv2.bias", "conv3_1.weight", "conv3_1.bias", "bn3.weight", "bn3.bias", "bn3.running_mean", "bn3.running_var", "conv3_2.weight", "conv3_2.bias", "conv4_1.weight", "conv4_1.bias", "bn4.weight", "bn4.bias", "bn4.running_mean", "bn4.running_var", "conv4_2.weight", "conv4_2.bias", "conv5.weight", "conv5.bias", "bn5.weight", "bn5.bias", "bn5.running_mean", "bn5.running_var".
Unexpected key(s) in state_dict: "cnn.conv0.weight", "cnn.conv0.bias", "cnn.conv1.weight", "cnn.conv1.bias", "cnn.conv2.weight", "cnn.conv2.bias", "cnn.batchnorm2.weight", "cnn.batchnorm2.bias", "cnn.batchnorm2.running_mean", "cnn.batchnorm2.running_var", "cnn.batchnorm2.num_batches_tracked", "cnn.conv3.weight", "cnn.conv3.bias", "cnn.conv4.weight", "cnn.conv4.bias", "cnn.batchnorm4.weight", "cnn.batchnorm4.bias", "cnn.batchnorm4.running_mean", "cnn.batchnorm4.running_var", "cnn.batchnorm4.num_batches_tracked", "cnn.conv5.weight", "cnn.conv5.bias", "cnn.conv6.weight", "cnn.conv6.bias", "cnn.batchnorm6.weight", "cnn.batchnorm6.bias", "cnn.batchnorm6.running_mean", "cnn.batchnorm6.running_var", "cnn.batchnorm6.num_batches_tracked".
size mismatch for rnn.1.embedding.weight: copying a param with shape torch.Size([5997, 512]) from checkpoint, the shape in current model is torch.Size([5835, 512]).
size mismatch for rnn.1.embedding.bias: copying a param with shape torch.Size([5997]) from checkpoint, the shape in current

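Error after modifying crnn_recognizer.py

Hello, I ran into the same problem as the earlier poster. I changed line 100 of crnn_recognizer.py to def __init__(self, model_path='/root/zjut/ocr.pytorch/checkpoints/CRNN.pth'). Running 'python demo.py' then fails with the following output: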
Traceback (most recent call last):
  File "/root/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/root/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2019.11.50794/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/root/anaconda3/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/root/anaconda3/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/root/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/zjut/ocr.pytorch/demo.py", line 10, in <module>
    from ocr import ocr
  File "/root/zjut/ocr.pytorch/ocr.py", line 6, in <module>
    recognizer = PytorchOcr()
  File "/root/zjut/ocr.pytorch/recognize/crnn_recognizer.py", line 111, in __init__
    self.model.load_state_dict({k.replace('module.', ''): v for k, v in torch.load(model_path).items()})
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CRNN:
Missing key(s) in state_dict: "conv1.weight", "conv1.bias", "conv2.weight", "conv2.bias", "conv3_1.weight", "conv3_1.bias", "bn3.weight", "bn3.bias", "bn3.running_mean", "bn3.running_var", "conv3_2.weight", "conv3_2.bias", "conv4_1.weight", "conv4_1.bias", "bn4.weight", "bn4.bias", "bn4.running_mean", "bn4.running_var", "conv4_2.weight", "conv4_2.bias", "conv5.weight", "conv5.bias", "bn5.weight", "bn5.bias", "bn5.running_mean", "bn5.running_var".
Unexpected key(s) in state_dict: "cnn.conv0.weight", "cnn.conv0.bias", "cnn.conv1.weight", "cnn.conv1.bias", "cnn.conv2.weight", "cnn.conv2.bias", "cnn.batchnorm2.weight", "cnn.batchnorm2.bias", "cnn.batchnorm2.running_mean", "cnn.batchnorm2.running_var", "cnn.batchnorm2.num_batches_tracked", "cnn.conv3.weight", "cnn.conv3.bias", "cnn.conv4.weight", "cnn.conv4.bias", "cnn.batchnorm4.weight", "cnn.batchnorm4.bias", "cnn.batchnorm4.running_mean", "cnn.batchnorm4.running_var", "cnn.batchnorm4.num_batches_tracked", "cnn.conv5.weight", "cnn.conv5.bias", "cnn.conv6.weight", "cnn.conv6.bias", "cnn.batchnorm6.weight", "cnn.batchnorm6.bias", "cnn.batchnorm6.running_mean", "cnn.batchnorm6.running_var", "cnn.batchnorm6.num_batches_tracked".
size mismatch for rnn.1.embedding.weight: copying a param with shape torch.Size([5997, 512]) from checkpoint, the shape in current model is torch.Size([5835, 512]).
size mismatch for rnn.1.embedding.bias: copying a param with shape torch.Size([5997]) from checkpoint, the shape in current model is torch.Size([5835]).
The CRNN.pth used here is the one you provided on Baidu Netdisk.
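The error above reports two separate problems: parameter names that differ between the checkpoint and the model, and an output layer whose size differs (5997 vs. 5835). Both would be consistent with the checkpoint having been saved from a different network definition or alphabet than the one being constructed, though the thread does not confirm this. A minimal sketch for comparing the two sets of names before calling load_state_dict follows; the helper names strip_prefix and diff_keys are hypothetical and not part of the repository.

```python
def strip_prefix(state_dict, prefix="module."):
    """Drop a leading prefix (e.g. added by nn.DataParallel) from every key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def diff_keys(model_keys, checkpoint_keys):
    """Return the parameter names each side has that the other lacks."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),      # expected by the model
        "unexpected": sorted(checkpoint_keys - model_keys),   # only in the checkpoint
    }


# Toy stand-ins for model.state_dict() and the loaded checkpoint (values omitted).
model_state = {"conv1.weight": None, "rnn.0.weight": None}
checkpoint = strip_prefix({"module.cnn.conv0.weight": None, "module.rnn.0.weight": None})

report = diff_keys(model_state, checkpoint)
print(report)
```

With PyTorch, the same functions could be fed model.state_dict() and torch.load(path, map_location="cpu"). If the two name lists describe different layers, renaming keys will not help; the model class that matches the checkpoint is needed.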

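Training CRNN

Hello author, I would like to ask for advice. I trained CRNN on a dataset I synthesized myself and the loss went down, but testing on images produces no output. Where might the problem be?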

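A question about one piece of code

Line 43 of ocr.pytorch/detect/ctpn_predict.py:
image = image.astype(np.float32) - config.IMAGE_MEAN
I don't understand the purpose of this step (is it mean subtraction?) or how the value of config.IMAGE_MEAN was chosen.
I'm new to machine vision, so this is a fairly basic question. Thanks for your help.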
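The quoted line is indeed a per-channel mean subtraction: NumPy broadcasts a length-3 vector across every pixel of an H x W x 3 image. The sketch below illustrates this on a tiny image. The mean values used are the ImageNet per-channel means often paired with VGG-style backbones; that choice and the channel order are assumptions for illustration, so check config.py for the value the project actually uses.

```python
import numpy as np

# Assumed values for illustration only; the real config.IMAGE_MEAN may differ.
IMAGE_MEAN = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def subtract_mean(image, mean=IMAGE_MEAN):
    """Convert to float32 and subtract one mean per channel (broadcast over H and W)."""
    return image.astype(np.float32) - mean


# A 2 x 3 image with 3 channels, every pixel set to 128.
image = np.full((2, 3, 3), 128, dtype=np.uint8)
centered = subtract_mean(image)

print(centered.shape, centered.dtype)
print(centered[0, 0])  # one pixel after centering
```

Centering inputs with the same statistics that the backbone saw during pretraining keeps the input distribution close to what the pretrained weights expect, which is the usual reason for this step.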

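Large performance gap between training mode and prediction mode

Hello author! With both your resnet-ctc and the original cnn-ctc, training accuracy reaches above 96%, but validation accuracy is only 10%. For the same image, whether it comes from the training set or the validation set, the result is acceptable under model.train() but completely wrong under model.eval(). I am also using nn.CTCLoss. What could be causing this? Thanks for your reply.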

CRNN

Could you describe the training data format for CRNN?

CRNN dataset

Does this model support datasets with varying image sizes and varying label lengths?

CRNN

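Suppose the images fed to the CRNN network are taller than 32 pixels. If I resize the data to height 32 before training CRNN, and also resize the text regions detected by CTPN to height 32 before feeding them to the trained CRNN model, will text taller than 32 pixels be recognized correctly?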
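The resizing described in this question can be sketched as a small size calculation that fixes the height at 32 and scales the width by the same factor, so characters are not stretched. The function name target_size is hypothetical, and whether the project's own preprocessing keeps the aspect ratio is not stated in the thread.

```python
def target_size(width, height, target_height=32):
    """Return (new_width, new_height) with the height fixed and aspect ratio kept."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be greater than 0")
    scale = target_height / height
    # Never let the width collapse to 0 for very narrow crops.
    new_width = max(1, round(width * scale))
    return new_width, target_height


print(target_size(300, 64))   # a crop twice as tall as the target
print(target_size(100, 48))
```

The resulting pair can be passed to whatever resize routine is in use, for example cv2.resize(image, (new_width, new_height)). Applying the same calculation at training time and at inference time keeps the two input distributions consistent.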

A question about CRNN recognition

Hello author, can the CRNN in this project recognize special symbols, for example characters such as ℃?

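Spaces in English passages are not recognized

Hello, I have studied this code and have two questions:

  1. train_code provides three CRNN training scripts: ctc, ctcV2, and the torch version of CTC. Which one was actually used for training?
  2. I have written a similar project myself, but when my trained model recognizes whole English sentences, the words run together, so the spaces between words are not recognized. Most models I have tried from the internet have the same problem with English. However, the CRNN-1010.pth model you mention can recognize some of the spaces. Was any special handling done for this?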
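On the second question, one general point can be illustrated independently of this project: a CTC model can only emit characters that are in its alphabet, so a space has to be an ordinary class (and appear in the training labels) before it can be predicted. The sketch below is a plain greedy CTC decoder. It assumes index 0 is the blank and character i of the alphabet maps to index i + 1; that convention is an assumption, not something the thread confirms for this repository.

```python
def ctc_greedy_decode(indices, alphabet, blank=0):
    """Collapse repeated indices, drop blanks, and map the rest to characters."""
    chars = []
    previous = blank
    for index in indices:
        if index != blank and index != previous:
            chars.append(alphabet[index - 1])  # index 0 is reserved for the blank
        previous = index
    return "".join(chars)


# The alphabet includes a space as its third character (index 3 after the blank).
alphabet = "ab "
frame_predictions = [1, 1, 0, 3, 3, 0, 2]

decoded = ctc_greedy_decode(frame_predictions, alphabet)
print(repr(decoded))  # prints 'a b'
```

If the alphabet were just "ab", no sequence of frame predictions could produce the space, which is one possible reason words run together in some pretrained models.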

Why is my accuracy always 0?

Hi author, I am a beginner. I trained the model on my own dataset using your implementation, but the accuracy was 0% after every epoch even though the training loss kept decreasing. Why is that?

Can CRNN be used for the Khmer language?

I tried to train the CRNN model, but the result I get is always:

Not Covering Char: ១ - 6113
Not Covering Char: ១ - 6113
Not Covering Char: ៩ - 6121
Not Covering Char: ៧ - 6119
Not Covering Char: ១ - 6113
Not Covering Char: ទ - 6033
Not Covering Char: ស - 6047
Not Covering Char: រ - 6042
error
Train loss: 0.000000

Start val
~/image0-1.jpg
~/image0-1.jpg
pred :—眯恂
target:គោគ្គនាមនិងនាមៈ កូល វន្ធសហា
0.0
ocr_acc: 0.000000

How can I train the CRNN model successfully? Thank you for your help.
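The numbers in the "Not Covering Char" lines match the Unicode code points of the characters shown (for example, ord("១") is 6113), which suggests the Khmer characters in the labels are not in the alphabet the training script uses. A minimal sketch for checking coverage and building an alphabet from the labels follows. The function names are hypothetical, and how the project loads its alphabet is not shown in the thread.

```python
def uncovered_chars(labels, alphabet):
    """Return the characters that occur in the labels but not in the alphabet."""
    known = set(alphabet)
    return sorted({ch for label in labels for ch in label if ch not in known})


def build_alphabet(labels):
    """Build an alphabet string containing every character used in the labels."""
    return "".join(sorted({ch for label in labels for ch in label}))


labels = ["១៩៧១", "ទសរ"]
latin_alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

missing = uncovered_chars(labels, latin_alphabet)
print([(ch, ord(ch)) for ch in missing])

khmer_alphabet = build_alphabet(labels)
print(uncovered_chars(labels, khmer_alphabet))  # prints []
```

Changing the alphabet also changes the number of output classes, so a pretrained checkpoint built for a different alphabet would not load into the final layer unchanged.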

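Questions about training on my own data

Hello author, I ran into some problems when training your CTPN network on my own data:
1. After loading the pretrained model and training on my own labeled data, detection quality actually got worse.
2. While training CTPN, the loss stopped decreasing.
3. Do the input images need to be rescaled to a particular size?
I hope you can help.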

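GPU training

Hello author, when I run your CRNN training code on the CPU I get a ValueError (weight, presumably width, or height must be greater than 0), which I can resolve by changing the parameters in trans.py. But training on the GPU with the same parameters still raises the ValueError. What is the reason?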

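How is ctpn_model_v2.py used?

Hello, I noticed that compared with ctpn_model.py it adds RPN_Loss and rpn_refiment, but the training code does not mention this file. What was the reasoning behind these differences? Thanks.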

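Thanks to the author for answering my question about obtaining the CRNN dataset; sharing it here for anyone who needs it

The training data comes in two parts:
1. Generated data. There is quite a lot of related work on GitHub for this.
The dataset can be downloaded here: https://pan.baidu.com/s/1QkI7kjah8SPHwOQ40rS1Pw (password: lu7m)
To generate your own, see this code: https://github.com/Sanster/text_renderer
2. Data from my own project, although the amount is fairly small.
More of this is of course better, but it is hard to collect and label.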

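Are there requirements for the images to be recognized?

Hello author, when I run your project on some images I get a "cannot identified image file" error. What causes this? Are there requirements for the input images?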
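If the images are opened with Pillow, a message of this kind usually means the file's bytes do not match any image format Pillow recognizes, for example a truncated download or a non-image file with an image extension. A quick, library-free check is to look at the first bytes of the file. The sketch below covers four common formats only; the function names are hypothetical.

```python
# Leading byte signatures of a few common image formats.
SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}


def sniff_image_format(header):
    """Return the format name whose signature starts the header, or None."""
    for name, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return name
    return None


def sniff_file(path):
    """Read the first bytes of a file and guess its image format."""
    with open(path, "rb") as handle:
        return sniff_image_format(handle.read(16))


print(sniff_image_format(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(sniff_image_format(b"<html><body>Not found</body></html>"))
```

A result of None for a file named like an image is a strong hint that the file itself, rather than the OCR code, is the problem.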

Missing key(s) in state_dict: the weights do not match crnn.py

Hi, I am using the weight file you provided and changed only line 100 of crnn_recognizer.py, to def __init__(self, model_path='checkpoints/CRNN.pth'). Running 'python demo.py' then fails with the following error:

Traceback (most recent call last):
  File "demo.py", line 2, in <module>
    from ocr import ocr
  File "/media/hgh/HGH_30/plate/ocr.pytorch-master/ocr.py", line 6, in <module>
    recognizer = PytorchOcr()
  File "/media/hgh/HGH_30/plate/ocr.pytorch-master/recognize/crnn_recognizer.py", line 111, in __init__
    self.model.load_state_dict(torch.load(model_path))
  File "/mnt/home/hgh/anaconda2/envs/py3_torch4.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for CRNN_v2:
Missing key(s) in state_dict: "conv1_1.weight", "conv1_1.bias", "bn1_1.weight", "bn1_1.bias", "bn1_1.running_mean", "bn1_1.running_var", "conv1_2.weight", "conv1_2.bias", "bn1_2.weight", "bn1_2.bias", "bn1_2.running_mean", "bn1_2.running_var", "conv2_1.weight", "conv2_1.bias", "bn2_1.weight", "bn2_1.bias", "bn2_1.running_mean", "bn2_1.running_var", "conv2_2.weight", "conv2_2.bias", "bn2_2.weight", "bn2_2.bias", "bn2_2.running_mean", "bn2_2.running_var", "conv3_1.weight", "conv3_1.bias", "bn3_1.weight", "bn3_1.bias", "bn3_1.running_mean", "bn3_1.running_var", "conv3_2.weight", "conv3_2.bias", "bn3_2.weight", "bn3_2.bias", "bn3_2.running_mean", "bn3_2.running_var", "conv4_1.weight", "conv4_1.bias", "bn4_1.weight", "bn4_1.bias", "bn4_1.running_mean", "bn4_1.running_var", "conv4_2.weight", "conv4_2.bias", "bn4_2.weight", "bn4_2.bias", "bn4_2.running_mean", "bn4_2.running_var", "bn5.weight", "bn5.bias", "bn5.running_mean", "bn5.running_var".
Unexpected key(s) in state_dict: "cnn.conv0.weight", "cnn.conv0.bias", "cnn.conv1.weight", "cnn.conv1.bias", "cnn.conv2.weight", "cnn.conv2.bias", "cnn.batchnorm2.weight", "cnn.batchnorm2.bias", "cnn.batchnorm2.running_mean", "cnn.batchnorm2.running_var", "cnn.batchnorm2.num_batches_tracked", "cnn.conv3.weight", "cnn.conv3.bias", "cnn.conv4.weight", "cnn.conv4.bias", "cnn.batchnorm4.weight", "cnn.batchnorm4.bias", "cnn.batchnorm4.running_mean", "cnn.batchnorm4.running_var", "cnn.batchnorm4.num_batches_tracked", "cnn.conv5.weight", "cnn.conv5.bias", "cnn.conv6.weight", "cnn.conv6.bias", "cnn.batchnorm6.weight", "cnn.batchnorm6.bias", "cnn.batchnorm6.running_mean", "cnn.batchnorm6.running_var", "cnn.batchnorm6.num_batches_tracked".
size mismatch for rnn.1.embedding.weight: copying a param of torch.Size([5835, 512]) from checkpoint, where the shape is torch.Size([5997, 512]) in current model.
size mismatch for rnn.1.embedding.bias: copying a param of torch.Size([5835]) from checkpoint, where the shape is torch.Size([5997]) in current model.

Does the weight file you provide correspond to the network? Thanks!

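Could you release the dataset you used to train CRNN?

Hello, I have some questions about the CRNN training data. Suppose my training data varies a lot in scale (the image width w ranges from about 8px to 300px). I read your training code: within a batch, the data pipeline pads with blank space, taking the largest w in the batch and padding every image narrower than that up to it. Does this padding affect recognition?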
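The batching scheme described in this question (pad every image to the widest one in the batch) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual collate code: images are assumed to be single-channel arrays that already share the same height, padding is added on the right, and pad_batch is a hypothetical name.

```python
import numpy as np


def pad_batch(images, pad_value=0):
    """Right-pad same-height 2-D images to the widest width and stack them."""
    height = images[0].shape[0]
    max_width = max(image.shape[1] for image in images)
    batch = np.full((len(images), height, max_width), pad_value, dtype=images[0].dtype)
    widths = []
    for i, image in enumerate(images):
        batch[i, :, : image.shape[1]] = image
        widths.append(image.shape[1])  # keep the true widths for later use
    return batch, widths


narrow = np.ones((32, 8), dtype=np.float32)
wide = np.ones((32, 20), dtype=np.float32)

batch, widths = pad_batch([narrow, wide])
print(batch.shape, widths)
```

Returning the original widths alongside the batch leaves room to ignore the padded columns downstream. With a range as wide as 8px to 300px, grouping images of similar width into the same batch is a common way to reduce how much of each batch is padding.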

Error when running demo.py

C:\Users\Administrator\AppData\Local\Programs\Python\Python37\python.exe C:/Users/Administrator/Desktop/pytorch/ocr.pytorch-master/demo.py
Traceback (most recent call last):
./test_result\test_images\t1.txt
  File "C:/Users/Administrator/Desktop/pytorch/ocr.pytorch-master/demo.py", line 29, in <module>
    txt_f = open(txt_file, 'w')
FileNotFoundError: [Errno 2] No such file or directory: './test_result\test_images\t1.txt'

Process finished with exit code 1

I noticed that demo.py clears the contents of the test_result folder when it runs, and then this error is raised.
Could the author please take a look?
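Judging only from the path in the error message, the output file name still contains the test_images\ directory part, so open() is asked to write into ./test_result\test_images, which does not exist. One plausible cause is that the script derives the file name by splitting on "/" while Windows paths use "\"; the thread does not confirm this. A minimal sketch of a separator-independent way to build the output path follows; result_txt_path is a hypothetical helper, not a function from demo.py.

```python
import os
import tempfile


def result_txt_path(image_path, result_dir):
    """Build '<result_dir>/<image stem>.txt', creating result_dir if needed."""
    # Treat both "/" and "\" as separators so the directory part is dropped on any OS.
    name = image_path.replace("\\", "/").rsplit("/", 1)[-1]
    stem = os.path.splitext(name)[0]
    os.makedirs(result_dir, exist_ok=True)
    return os.path.join(result_dir, stem + ".txt")


with tempfile.TemporaryDirectory() as tmp:
    out = result_txt_path("./test_images\\t1.jpg", os.path.join(tmp, "test_result"))
    with open(out, "w") as handle:
        handle.write("demo")
    print(os.path.basename(out))  # prints: t1.txt
```

Calling os.makedirs with exist_ok=True before writing also covers the case where the result folder was just cleared and not recreated.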

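How many epochs did you train for?

I trained on icdar2015 with the training code in this project and found that the predicted pred is empty during training, which makes the accuracy 0; I don't know why. In addition, I used CRNN.pth and CRNN-1010.pth as pretrained models and tested them in online_test, and the accuracy was also very low. Has anyone else run into this?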

I made a pytorch-lightning implementation of your CTPN

Hi! I've made a pytorch-lightning implementation of ctpn, mainly by using your code. Pytorch-lightning has many nice features, such as training with tpus/multiple gpus by changing one line of code, 16-bit precision, works on cpu (nice for testing), automatic learning rate finder... Would you be open to a pull request? Link to fork here! I'm in the process of converting your CRNN to pytorch-lightning as well.

Here's the simplified training loop:

import pytorch_lightning as pl

# ICDARDataModule, CTPN_Model, config, max_epochs, and the three callbacks
# are not defined in this snippet; they presumably come from the linked fork.
datamodule = ICDARDataModule(
    config.icdar17_mlt_img_dir,
    config.icdar17_mlt_gt_dir,
    batch_size=1,
    num_workers=config.num_workers,
    shuffle=True,
)

len_train_dataset = len(datamodule.train_data)

model = CTPN_Model()

trainer = pl.Trainer(
    gpus=1,  # number of gpus, 0 if you want to use cpu
    max_epochs=max_epochs,
    log_every_n_steps=1,
    callbacks=[
        LoadCheckpoint(config.pretrained_weights),
        InitializeWeights(),
        LossAndCheckpointCallback(config, len_train_dataset),
    ],
)

trainer.fit(model, datamodule)

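Could you provide the training code?

Hello author, I tried one of the models you provided and it works quite well. Thank you very much for sharing the code.
I look forward to your character recognition training code and dataset.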

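License and ID document recognition

Hello, I ran your demo directly on a driver's license, but the result was not ideal; accuracy was lower than with traditional layout segmentation. Do I need to retrain on a new dataset? I'm a beginner and don't really understand hyperparameter tuning. Can you recommend any tutorials that walk through text recognition code?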
