
srn.pytorch's Introduction

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Unofficial PyTorch implementation of the paper. In addition to the global semantic reasoning module (GSRM), it includes the parallel visual attention module (PVAM) and the visual-semantic fusion decoder (VSFD). The semantic reasoning network (SRN) can be trained end-to-end.
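The PVAM lets every reading-order position attend to all visual features at once, which is what allows SRN to emit all characters in parallel. Below is a minimal sketch of that idea. It assumes the additive attention form given in the paper, e_{t,j} = W_e^T tanh(W_o f_o(O_t) + W_v v_j), where f_o embeds reading order t. The class and parameter names are hypothetical, so this is not the repository's module. Visual features are treated as a 1D sequence, matching the 1D ResNet features this repo uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParallelVisualAttention(nn.Module):
    """Hypothetical PVAM sketch: each reading-order slot t attends over all
    N visual positions, so all max_len glimpses are computed in one pass."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.max_len = max_len
        self.order_embed = nn.Embedding(max_len, d_model)   # f_o(O_t)
        self.w_o = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.w_e = nn.Linear(d_model, 1, bias=False)

    def forward(self, visual: torch.Tensor) -> torch.Tensor:
        # visual: [B, N, D] feature sequence from the backbone
        order = torch.arange(self.max_len, device=visual.device)
        q = self.w_o(self.order_embed(order))                # [T, D]
        k = self.w_v(visual)                                 # [B, N, D]
        # Broadcast to [B, T, N, D] and score every (t, j) pair.
        e = self.w_e(torch.tanh(q[None, :, None, :] + k[:, None, :, :])).squeeze(-1)
        alpha = F.softmax(e, dim=-1)                         # [B, T, N], normalized over positions
        return torch.bmm(alpha, visual)                      # [B, T, D] aligned visual features


# Example: 26 feature positions and 25 character slots (both assumed values).
pvam = ParallelVisualAttention(d_model=64, max_len=25)
glimpses = pvam(torch.randn(2, 26, 64))
print(glimpses.shape)
```

Because no step depends on the previous step's output, the whole sequence of glimpses comes from a single batched matrix product instead of a recurrent loop.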

At present, this implementation does not reach the accuracy reported in the paper. Part of the code is borrowed from deep-text-recognition-benchmark.

Model

[Figure: SRN model architecture]

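Result

A numeric suffix on a dataset name gives the number of test images in that split (for example, IC13_857 and IC13_1015 are two versions of the IC13 test set).

Dataset       Accuracy (%)
IIIT5k_3000   84.600
SVT           83.617
IC03_860      92.907
IC03_867      92.849
IC13_857      90.315
IC13_1015     88.177
IC15_1811     71.010
IC15_2077     68.064
SVTP          71.008
CUTE80        68.641

total_accuracy: 80.597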


Features

  • predict all characters at once (in parallel)
  • DistributedDataParallel training
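The README lists DistributedDataParallel training but gives no launch command. Here is a hedged sketch of the general pattern, not this repository's train.py. It uses the gloo backend on CPU with a single process so it runs without GPUs. The `nn.Linear` model is a stand-in for SRN, and the function name is hypothetical. For real multi-GPU training you would start one process per GPU (for example with `torch.distributed.launch`, or `torchrun` in newer PyTorch) and typically use the nccl backend.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def ddp_train_step(rank: int, world_size: int, init_file: str) -> float:
    """Hypothetical helper: one optimization step inside a DDP process group."""
    # File-based rendezvous; every process must point at the same file.
    dist.init_process_group("gloo", init_method=f"file://{init_file}",
                            rank=rank, world_size=world_size)
    try:
        model = DDP(nn.Linear(8, 4))                  # stand-in for the SRN model (CPU, no device_ids)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        x, y = torch.randn(16, 8), torch.randint(0, 4, (16,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                               # DDP all-reduces gradients across processes here
        opt.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


# Example (single process):
# ddp_train_step(0, 1, os.path.join(tempfile.mkdtemp(), "ddp_init"))
```

Each process would also usually read its own shard of the data, for example through `torch.utils.data.distributed.DistributedSampler`.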

Requirements

PyTorch >= 1.1.0

Test

  1. download the evaluation data from deep-text-recognition-benchmark

  2. download the pretrained model from Baidu (password: d2qn)

  3. test on the evaluation data

python test.py --eval_data path-to-data --saved_model path-to-model
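test.py is expected to handle loading the checkpoint itself. If you load the downloaded weights in your own script instead (for example, to export the model), a model saved while wrapped in DataParallel or DistributedDataParallel stores every key with a `module.` prefix, which an unwrapped model will reject. A small hypothetical helper, assuming the file holds a plain state_dict:

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Hypothetical helper: drop the prefix that DataParallel/DDP add to
    parameter names so the weights load into an unwrapped model."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )
```

Usage would look like `model.load_state_dict(strip_module_prefix(torch.load(path, map_location="cpu")))`, where `model` is built with the same options used for training. Print a few keys first to check whether the prefix is actually present.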

Train

  1. download the training data from deep-text-recognition-benchmark

  2. train from scratch

python train.py --train_data path-to-train-data --valid_data path-to-valid-data

Reference

  1. bert_ocr.pytorch
  2. deep-text-recognition-benchmark
  3. 2D Attentional Irregular Scene Text Recognizer
  4. Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Differences from the original paper

  • use a ResNet backbone that outputs 1D features instead of ResNet-FPN 2D features
  • fuse visual and semantic features by element-wise addition instead of the gated unit in the visual-semantic fusion decoder
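To make the second difference concrete: the paper's VSFD combines the aligned visual feature v_t and the semantic feature s_t through a learned gate, z_t = σ(W_z [v_t; s_t]) and f_t = z_t * v_t + (1 - z_t) * s_t, while this repository adds them. Below is a minimal sketch of both, assuming the two features already share the dimension d_model. The class and function names are hypothetical.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Gated fusion as described in the paper (sketch):
    z = sigmoid(W_z [v; s]),  f = z * v + (1 - z) * s."""

    def __init__(self, d_model: int):
        super().__init__()
        self.w_z = nn.Linear(2 * d_model, d_model)

    def forward(self, visual: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        z = torch.sigmoid(self.w_z(torch.cat([visual, semantic], dim=-1)))
        return z * visual + (1 - z) * semantic       # per-channel convex mix


def additive_fusion(visual: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
    # The simpler variant used here instead of the gate: element-wise addition.
    return visual + semantic


v, s = torch.randn(2, 25, 64), torch.randn(2, 25, 64)
fused = GatedFusion(64)(v, s)                         # [2, 25, 64]
```

The gate lets the model decide, per channel and per position, how much to trust the visual or the semantic branch. Plain addition gives both branches a fixed, equal weight.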

Other

It is difficult to reach the accuracy reported in the paper; I hope more people will try it and share their results.

srn.pytorch's People

Contributors

chenjun2hao


srn.pytorch's Issues

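Error when converting the PyTorch model to a .pt file

I'm a beginner. First, many thanks to the author for open-sourcing this. I trained a model and want to convert it into a .pt model that can be called from C++, but torch.jit.trace() seems unable to trace NumPy operations and for loops. Is that really the cause? I couldn't find a reasonable explanation on the PyTorch GitHub either. Some NumPy operations can be converted to torch, but not all of them. Is there a way to replace NumPy entirely with torch, for example for the computations inside the for loops?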
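Some background on why this happens. torch.jit.trace runs the model once on an example input and records only the tensor operations it sees. NumPy calls are invisible to it, so their results are frozen as constants. Python loops and `if` statements are unrolled for that one input. There are two standard remedies: rewrite the NumPy/loop code with tensor operations, or compile data-dependent control flow with torch.jit.script. The sketch below shows the first option on a hypothetical post-processing step that finds the first EOS in each predicted sequence. `PostProcess`, `eos_id=0`, and the 37-class output are illustrative assumptions, not names from this repository.

```python
import torch
import torch.nn as nn


def first_eos_length(pred_ids: torch.Tensor, eos_id: int) -> torch.Tensor:
    """Index of the first EOS in each row (T if there is none), without a Python loop.

    Replaces: for t in range(T): if pred[b, t] == eos_id: length = t; break
    """
    T = pred_ids.size(1)
    positions = torch.arange(T, device=pred_ids.device).expand_as(pred_ids)  # [B, T]
    # Non-EOS positions are replaced by T, so the row minimum is the first EOS index.
    return torch.where(pred_ids == eos_id, positions, torch.full_like(pred_ids, T)).min(dim=1).values


class PostProcess(nn.Module):
    """Hypothetical head: argmax over classes, then per-row EOS length."""

    def __init__(self, eos_id: int):
        super().__init__()
        self.eos_id = eos_id

    def forward(self, logits: torch.Tensor):
        pred_ids = logits.argmax(dim=-1)                     # [B, T]
        return pred_ids, first_eos_length(pred_ids, self.eos_id)


# Only tensor ops remain, so the trace records the whole computation.
traced = torch.jit.trace(PostProcess(eos_id=0), torch.randn(2, 25, 37))
```

The traced module can then be saved with `traced.save(...)` and loaded from C++ via LibTorch. Logic that truly cannot be vectorized is a candidate for torch.jit.script instead.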

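No convergence with Baidu Chinese data

I used Baidu's Chinese data for Chinese text recognition without changing the related configuration, but training does not converge: the loss drops to about 15 and then stops decreasing. Has anyone run a similar experiment?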

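On visualizing the attention maps in the paper

Thank you for your work and for sharing it. One question has puzzled me for a long time: how do you draw the attention maps shown in the paper, i.e. how do you visualize this process? Could you tell me the method? Or, if you could open-source the plotting code, I would be very grateful. Thank you.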
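One common way to produce such figures follows. Take the attention weights each character slot places on the visual positions, such as the PVAM's softmax weights. Reshape them to the feature-map grid, upsample them to the image size, and overlay them on the input image. The function below is a hypothetical sketch of the tensor part. With this repo's 1D features the grid height would be 1. The overlay itself (for example, matplotlib `imshow` of the image followed by the heatmap with `alpha=0.5`) is left out.

```python
import torch
import torch.nn.functional as F


def attention_to_heatmaps(attn: torch.Tensor, feat_hw, image_hw) -> torch.Tensor:
    """Hypothetical helper: per-character attention -> image-sized heatmaps.

    attn:     [T, N] weights over N = h * w visual positions
    feat_hw:  (h, w) of the feature map the weights were computed on
    image_hw: (H, W) of the input image
    returns   [T, H, W] maps rescaled to [0, 1] for overlaying
    """
    h, w = feat_hw
    H, W = image_hw
    T = attn.size(0)
    maps = attn.reshape(T, 1, h, w)
    maps = F.interpolate(maps, size=(H, W), mode="bilinear", align_corners=False)
    flat = maps.reshape(T, -1)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    flat = (flat - lo) / (hi - lo).clamp(min=1e-8)   # per-character min-max scaling
    return flat.reshape(T, H, W)
```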

Why do we use two padding tokens ('$' and '#')?

Thanks a lot for sharing the awesome work :)

I'm a little confused about the padding tokens. From the paper, it seems the authors use only one token to pad the sentence to batch_max_length. I also tried that with your implementation; however, your implementation with two tokens produces much better results.

So I'm wondering whether there is a specific reason for choosing the two-token method. Any information would be greatly appreciated.
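To make the two schemes concrete: with two tokens, one symbol marks where the word ends and another fills the remaining slots, so the targets keep a distinct end marker. In the single-token variant, every padded slot would simply repeat the EOS id. Below is a minimal pure-Python sketch. It assumes '$' is the end-of-sequence token and '#' the padding token; the issue names both tokens but not their roles, so this assignment is an assumption. It also assumes targets have max_len + 1 slots to leave room for the EOS. Decoding mirrors the parallel prediction listed under Features: all positions are predicted at once, and the string is cut at the first EOS or PAD.

```python
def encode_labels(texts, charset, max_len, eos="$", pad="#"):
    """Hypothetical two-token encoder: text ids, one EOS, then PAD up to max_len + 1."""
    vocab = [eos, pad] + list(charset)
    index = {ch: i for i, ch in enumerate(vocab)}
    batch = []
    for text in texts:
        ids = [index[ch] for ch in text[:max_len]] + [index[eos]]
        ids += [index[pad]] * (max_len + 1 - len(ids))
        batch.append(ids)
    return batch, vocab


def decode_parallel(pred_ids, vocab, eos="$", pad="#"):
    """All positions are predicted at once; keep characters before the first EOS/PAD."""
    words = []
    for ids in pred_ids:
        chars = []
        for i in ids:
            if vocab[i] in (eos, pad):
                break
            chars.append(vocab[i])
        words.append("".join(chars))
    return words


batch, vocab = encode_labels(["ab", "c"], "abc", max_len=4)
print(batch)                          # [[2, 3, 0, 1, 1], [4, 0, 1, 1, 1]]
print(decode_parallel(batch, vocab))  # ['ab', 'c']
```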

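On Chinese text recognition accuracy

Hello, thank you very much for open-sourcing the code; I am currently studying it. I have been following this paper because its experimental results are really good, and it looks like a promising solution for visually similar Chinese characters in particular. Have you evaluated the model on a Chinese dataset?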

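Relation to Tencent's 2DAttentionalIrregularSceneTextRecognizer

Hello, I have a few questions from reading the paper. For example, is there a specific reason for adding the reading order to the PVAM? Also, this paper's approach feels very close to that of 2DAttentionalIrregularSceneTextRecognizer; it just seems to stack more attention...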

How to change the model to fit more image sizes

It seems the images have to be resized to a width of 100. If an image is not resized to width 100, training raises a RuntimeError:

The size of tensor a (65) must match the size of tensor b (26) at non-singleton dimension 1

File "SRN_modules.py", line 68, in forward
    return x + self.pos_table[:, :x.size(1)].clone().detach()

How can I change the model to fit other image sizes?

File "SRN_modules.py", line 65, in forward
    return x + self.pos_table[:, :x.size(1)].clone().detach()
RuntimeError: The size of tensor a (320) must match the size of tensor b (256) at non-singleton dimension 1

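Hello, and thank you very much for the open-source code; it's excellent. I have a question: does the dimension that n_position corresponds to equal the width w of the features after visual feature extraction? Since that width varies, this would easily cause errors. Is my understanding correct?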


@chenjun2hao
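The tracebacks point at a positional table with a fixed length (n_position): the visual feature sequence must not be longer than the table. From the error messages above, a width-100 input appears to give 26 feature positions, while wider inputs give more (65, 320), so the slice comes up short. This supports the reading that n_position tracks the feature width. A hedged sketch of one workaround follows, assuming pos_table is the usual sinusoidal table and d_hid is even. It builds the table and rebuilds it whenever a longer sequence arrives. The class name and defaults are hypothetical. This only fixes the addition in the traceback: other layers that assume a fixed number of positions, or a model pretrained at width 100, may still need changes or retraining. The simpler alternative is to set n_position at least as large as the longest feature sequence you expect.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEncoding(nn.Module):
    """Hypothetical replacement for a fixed-size pos_table: the table covers
    n_position steps and is rebuilt on the fly for longer inputs (d_hid even)."""

    def __init__(self, d_hid: int, n_position: int = 256):
        super().__init__()
        self.d_hid = d_hid
        self.register_buffer("pos_table", self._table(n_position, d_hid))

    @staticmethod
    def _table(n_position: int, d_hid: int) -> torch.Tensor:
        position = torch.arange(n_position, dtype=torch.float).unsqueeze(1)          # [n, 1]
        div = torch.exp(torch.arange(0, d_hid, 2, dtype=torch.float) * (-math.log(10000.0) / d_hid))
        table = torch.zeros(n_position, d_hid)
        table[:, 0::2] = torch.sin(position * div)                                    # even dims
        table[:, 1::2] = torch.cos(position * div)                                    # odd dims
        return table.unsqueeze(0)                                                     # [1, n, d]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, L, d_hid]; L depends on the image width after the backbone.
        if x.size(1) > self.pos_table.size(1):
            self.pos_table = self._table(x.size(1), self.d_hid).to(x)  # grow the buffer
        return x + self.pos_table[:, :x.size(1)].clone().detach()
```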

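Padding during training

Hello, I used this framework to train on Chinese scene-text OCR data, but the SRN loss never decreases, and switching to CTC loss doesn't help either. Has anyone trained this successfully? I'd appreciate any guidance. About padding during training: the paper pads with EOS. When computing the cross-entropy loss with ignore_index=PAD, is the loss for the final EOS character of each text line also ignored?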
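On the last question: with `nn.CrossEntropyLoss(ignore_index=PAD)`, only positions whose target equals the PAD id are skipped, and the mean is taken over the remaining positions. An EOS that terminates a text line is an ordinary target class, so its step still contributes to the loss, as long as its id differs from PAD. By the same logic, if a single symbol served as both EOS and padding and that id were passed as ignore_index, the terminating EOS would be ignored too, and the model would get no signal about where to stop. The ids below are hypothetical; match them to your label converter.

```python
import torch
import torch.nn as nn

PAD_ID, EOS_ID, NUM_CLASSES = 1, 0, 6   # hypothetical ids for illustration

criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)

# One text line: 2 characters + EOS, padded to 5 steps.
target = torch.tensor([[3, 4, EOS_ID, PAD_ID, PAD_ID]])          # [B, T]
logits = torch.randn(1, 5, NUM_CLASSES)                          # [B, T, C]
loss = criterion(logits.view(-1, NUM_CLASSES), target.view(-1))

# Same value by hand: average over non-PAD steps only (the EOS step is included).
keep = target.view(-1) != PAD_ID
manual = nn.functional.cross_entropy(logits.view(-1, NUM_CLASSES)[keep], target.view(-1)[keep])
```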
