chenjun2hao / srn.pytorch

Unofficial PyTorch implementation of Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Python 100.00%

srn ocr-recognition cvpr2020

srn.pytorch's Introduction

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

Unofficial PyTorch implementation of the paper. It integrates not only the global semantic reasoning module (GSRM) but also the parallel visual attention module (PVAM) and the visual-semantic fusion decoder (VSFD). The whole semantic reasoning network (SRN) can be trained end-to-end.

At present, this implementation does not reach the accuracy reported in the paper. Part of the code is borrowed from deep-text-recognition-benchmark.

model

result

Dataset        Accuracy (%)
IIIT5k_3000    84.600
SVT            83.617
IC03_860       92.907
IC03_867       92.849
IC13_857       90.315
IC13_1015      88.177
IC15_1811      71.010
IC15_2077      68.064
SVTP           71.008
CUTE80         68.641

total_accuracy: 80.597

Feature

predicts all characters at once (in parallel)
DistributedDataParallel training
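Parallel prediction comes from the parallel visual attention module (PVAM). Every reading-order slot t gets its own embedding f_o(O_t), and all slots attend over the visual features simultaneously, so no step has to wait for the previous character. The sketch below applies the paper's scoring function, e_{t,j} = w_e^T tanh(W_o f_o(O_t) + W_v v_j), to a 1D feature sequence as used in this implementation. It is written in numpy to keep it self-contained; the weight names and sizes (26 feature positions, 25 slots) are illustrative assumptions, not the repository's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pvam(v, order_emb, W_o, W_v, w_e):
    """Parallel visual attention over a 1D feature sequence (sketch).

    v:         (N, d) visual features, N = feature width
    order_emb: (T, d) one embedding f_o(O_t) per reading-order slot t
    W_o, W_v:  (d, d) projections;  w_e: (d,) scoring vector
    Returns glimpses g (T, d) and attention weights alpha (T, N).
    """
    q = order_emb @ W_o.T                       # (T, d): W_o f_o(O_t)
    k = v @ W_v.T                               # (N, d): W_v v_j
    # scores for all (t, j) pairs at once: e[t, j] = w_e . tanh(q_t + k_j)
    e = np.tanh(q[:, None, :] + k[None, :, :]) @ w_e   # (T, N)
    alpha = softmax(e, axis=1)                  # normalize over feature positions
    g = alpha @ v                               # (T, d) weighted sums of features
    return g, alpha

# illustrative sizes: 26 feature positions, 25 reading-order slots, d = 8
rng = np.random.default_rng(0)
d, N, T = 8, 26, 25
g, alpha = pvam(rng.normal(size=(N, d)), rng.normal(size=(T, d)),
                rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                rng.normal(size=d))
print(g.shape, alpha.shape)  # (25, 8) (25, 26)
```

Because the queries depend only on the slot index, not on previously decoded characters, all T glimpses are computed in one batched operation.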

Requirements

PyTorch >= 1.1.0

Test

download the evaluation data from deep-text-recognition-benchmark
download the pretrained model from Baidu (password: d2qn)
test on the evaluation data

python test.py --eval_data path-to-data --saved_model path-to-model

Train

download the training data from deep-text-recognition-benchmark
train from scratch

python train.py --train_data path-to-train-data --valid_data path-to-valid-data

Reference

differences from the original paper

uses ResNet 1D features instead of ResNet-FPN 2D features
uses addition instead of the gated unit in the visual-semantic fusion decoder
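The second difference replaces the paper's gated fusion with a plain sum. In the paper, a gate z_t = sigmoid(W_z [g_t; s_t]) mixes the aligned visual feature g_t and the semantic feature s_t as f_t = z_t * g_t + (1 - z_t) * s_t. The numpy sketch below contrasts the two variants; the shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(g, s, W_z):
    """Gated fusion as described in the paper.

    g, s: (T, d) visual and semantic features; W_z: (d, 2d).
    z = sigmoid(W_z [g; s]);  f = z * g + (1 - z) * s
    """
    z = sigmoid(np.concatenate([g, s], axis=-1) @ W_z.T)   # (T, d) gate values
    return z * g + (1.0 - z) * s

def add_fusion(g, s):
    """Simplified fusion used in this implementation: element-wise sum."""
    return g + s

rng = np.random.default_rng(0)
g, s = rng.normal(size=(25, 8)), rng.normal(size=(25, 8))
f_gated = gated_fusion(g, s, rng.normal(size=(8, 16)))
f_add = add_fusion(g, s)
```

With W_z set to zero the gate is 0.5 everywhere and the gated output is exactly half the sum, so the additive variant behaves like a fixed, non-learned gate up to a scale factor.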

other

It is difficult to reach the accuracy reported in the paper; I hope more people will try it and share their results.

srn.pytorch's Issues

Could you upload the TRW15 test set used in the paper? The dataset cannot be downloaded.

Error when converting the PyTorch model to .pt

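I'm a beginner. First, many thanks to the author for open-sourcing this. I trained a model and want to convert it into a .pt model that can be called from C++, but torch.jit.trace() seems unable to trace numpy operations and for loops. Is that really the reason? I couldn't find a reasonable explanation on the PyTorch GitHub either. Some numpy operations can be converted to torch, but not all of them. Is there a way to replace numpy entirely with torch, for example for the computations inside the for loops?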
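On the tracing question: torch.jit.trace records the tensor operations executed for the example input, so values computed through numpy are frozen as constants and Python loops are unrolled for that particular input. A common workaround is to rewrite such post-processing with vectorized tensor operations. The numpy sketch below shows the pattern for a hypothetical step that blanks out everything after the first EOS; the same logic can be expressed with torch.cumsum, comparisons and torch.where. The EOS index is an assumption.

```python
import numpy as np

EOS = 1  # hypothetical index of the end-of-sequence class

def mask_after_eos_loop(pred):
    """Loop version: keep tokens before the first EOS, set the rest to -1."""
    out = pred.copy()
    for b in range(pred.shape[0]):
        for t in range(pred.shape[1]):
            if pred[b, t] == EOS:
                out[b, t:] = -1
                break
    return out

def mask_after_eos_vectorized(pred):
    """Same result without Python loops: a position is blanked once an EOS
    has appeared at or before it."""
    seen_eos = np.cumsum(pred == EOS, axis=1) > 0   # (B, T) boolean
    return np.where(seen_eos, -1, pred)

pred = np.array([[5, 7, 1, 3],
                 [4, 4, 4, 4]])
print(mask_after_eos_vectorized(pred).tolist())  # [[5, 7, -1, -1], [4, 4, 4, 4]]
```

The vectorized version has no data-dependent control flow, which is what makes it safe to trace.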

Training on Baidu Chinese data does not converge

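I used Baidu's Chinese data for Chinese text recognition without changing the related configuration, but training does not converge: after the loss drops to about 15 it shows no tendency to decrease further. Has anyone run similar experiments?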

The PVAM module is different from PaddleOCR's

acc is always 0

acc 0?

Alternative way to download the pretrained model

Could you please share the model so that people without a Baidu account can download it?

Why not use ResNet50-FPN?

About visualizing the attention maps in the paper

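Thank you for your work and for sharing it. One question has puzzled me for a long time: how to draw the attention maps shown in the paper, that is, how to visualize this process. Could you tell me how to do it? If you could open-source the plotting code, I would be very grateful. Thank you.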
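For the attention-map question: this implementation works on a 1D feature sequence, so each character's attention is a vector over N feature columns (for example the PVAM attention weights for one reading-order slot). One rough way to visualize it is to stretch that vector to the image width and overlay it on the input. The nearest-neighbour column mapping below is an assumption; it ignores the overlap of receptive fields, so the result is only approximate. The sketch produces arrays; displaying them (for example with matplotlib's imshow) is left out.

```python
import numpy as np

def attention_heatmap(alpha_t, img_h, img_w):
    """Turn one character's attention weights over N 1D feature positions
    into an (img_h, img_w) map in [0, 1] by nearest-neighbour upsampling."""
    n = alpha_t.shape[0]
    # map every pixel column to the feature position covering it
    cols = np.minimum((np.arange(img_w) * n) // img_w, n - 1)
    row = alpha_t[cols]
    row = (row - row.min()) / (row.max() - row.min() + 1e-8)  # scale to [0, 1]
    return np.tile(row, (img_h, 1))                           # same for every pixel row

def overlay(gray_img, heat, weight=0.5):
    """Blend a grayscale image with values in [0, 1] and the heatmap."""
    return (1 - weight) * gray_img + weight * heat

# toy example: all attention on feature position 10 of 26, image 32 x 100
alpha_t = np.zeros(26)
alpha_t[10] = 1.0
heat = attention_heatmap(alpha_t, 32, 100)
blended = overlay(np.full((32, 100), 0.5), heat)
```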

Hi, what format should the dataset be in when training from scratch?

I couldn't get training to work with the MJ dataset either; it raises: assert len(datasets) > 0, 'datasets should not be an empty iterable'. What causes this?
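The assertion message comes from PyTorch's torch.utils.data.ConcatDataset, which refuses an empty list of datasets. Assuming train.py keeps the data loading of deep-text-recognition-benchmark, the training root is walked and only leaf folders whose path contains one of the --select_data names (default MJ-ST) are loaded as LMDB datasets; each LMDB is built with that project's create_lmdb_dataset.py from a ground-truth file of image_path<TAB>label lines. If only MJ is present while ST is still selected, the ST group comes out empty. The helper below is a hypothetical pre-flight check of that layout; the extra test for an LMDB data.mdb file is an addition of this sketch.

```python
import os

def find_lmdb_dirs(root, select_data):
    """Return leaf folders under `root` whose path contains one of the names in
    `select_data` and that hold an LMDB data.mdb file (hypothetical helper)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirnames:                      # only leaf folders are treated as datasets
            continue
        selected = any(name in dirpath for name in select_data)
        if selected and "data.mdb" in filenames:
            found.append(dirpath)
    return sorted(found)
```

Calling find_lmdb_dirs(train_root, ["ST"]) and getting an empty list points to the cause above. If MJ is the only dataset available, passing --select_data MJ --batch_ratio 1 should avoid the empty group, again assuming the options are unchanged from deep-text-recognition-benchmark.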

Is the failure to reproduce the paper's accuracy caused by a different training set?

The paper states: "The proposed model is trained only on two synthetic datasets, namely Synth90K [13, 14] and SynthText [9] without finetuning on other datasets." That is, the paper trains on Synth90K and SynthText.

https://github.com/clovaai/deep-text-recognition-benchmark trains on the two datasets MJSynth (MJ) and SynthText (ST).

Existing reference

It is already implemented in the https://github.com/PaddlePaddle/PaddleOCR project; not sure whether that is useful.

Is opt.input_channel = 1 by default?

Does inference use grayscale images by default? That seems likely to lower performance.

Is the argmax module in GSRB differentiable? It shouldn't be, right?

Why do we use two padding tokens ('$' and '#')?

Thanks a lot for sharing the awesome work :)

Here I'm a little confused about the padding tokens. From the paper, it seems the authors use only one token to pad the sentence to batch_max_length. I also tried this with your implementation, and it turns out that your implementation with two tokens produces much better results.

So I'm wondering whether there is a specific reason for choosing the two-token method. Any information would be greatly appreciated.
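The two-token question is easier to discuss with a concrete encoding. Below is a hypothetical scheme with one end-of-sequence token and one padding token: the label holds the characters, then a single EOS right after the last character, then PAD in every remaining position. Which of '$' and '#' plays which role in this repository is not stated here, so the sketch uses neutral placeholder names.

```python
EOS, PAD = "<eos>", "<pad>"   # hypothetical special tokens

def build_vocab(charset):
    """Special tokens first, then the regular characters."""
    tokens = [EOS, PAD] + list(charset)
    return {tok: i for i, tok in enumerate(tokens)}

def encode(word, vocab, max_len):
    """Encode `word` as max_len + 1 indices: characters, one EOS, then PAD."""
    if len(word) > max_len:
        raise ValueError("word longer than max_len")
    ids = [vocab[c] for c in word] + [vocab[EOS]]
    return ids + [vocab[PAD]] * (max_len + 1 - len(ids))

vocab = build_vocab("0123456789abcdefghijklmnopqrstuvwxyz")
print(encode("ab", vocab, 5))  # [12, 13, 0, 1, 1, 1]
```

Keeping EOS and PAD distinct lets a loss skip the filler positions while still supervising the end of the word. With the 36-character alphanumeric set plus the two special tokens, this vocabulary has 38 entries.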

copying a param with shape torch.Size([38, 512]) from checkpoint, the shape in current model is torch.Size([39, 512]).

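Hi, when running the demo program with the iter_65000.pth model, I got this error: copying a param with shape torch.Size([38, 512]) from checkpoint, the shape in current model is torch.Size([39, 512]). The demo's default model should be iter_30000. Is the problem in how I load the model, or do I need to adjust some parameter? Thanks.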
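A [38, 512] weight is typically a classifier or character-embedding matrix whose first dimension is the number of classes, so the checkpoint was trained with 38 classes while the current model is built with 39. A likely cause is a different character set (the --character option) or a different number of special tokens between training and the demo; the default 36-character alphanumeric set plus two special tokens gives 38, but that arithmetic is an assumption about this repository. The helper below is a hypothetical debugging aid that lists every parameter whose shape differs.

```python
def shape_mismatches(ckpt_shapes, model_shapes):
    """List parameters whose shapes differ between a checkpoint and a model.

    Both arguments map parameter names to shape tuples; with PyTorch they can be
    built as {k: tuple(v.shape) for k, v in state_dict.items()}.
    Each entry is (name, checkpoint_shape, model_shape); None marks a missing key.
    """
    report = []
    for name in sorted(set(ckpt_shapes) | set(model_shapes)):
        a = ckpt_shapes.get(name)
        b = model_shapes.get(name)
        a = tuple(a) if a is not None else None
        b = tuple(b) if b is not None else None
        if a != b:
            report.append((name, a, b))
    return report

# hypothetical parameter names; shapes taken from the error message
ckpt = {"pred.weight": (38, 512), "backbone.conv.weight": (64, 1, 3, 3)}
model = {"pred.weight": (39, 512), "backbone.conv.weight": (64, 1, 3, 3)}
for name, a, b in shape_mismatches(ckpt, model):
    print(name, a, "->", b)  # pred.weight (38, 512) -> (39, 512)
```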

About recognition accuracy on Chinese text

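Hi, thank you very much for open-sourcing the code; I'm currently studying it. I have been following this paper because its experimental results are really good, and it looks like a promising solution especially for visually similar Chinese characters. Have you evaluated the model on a Chinese dataset?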

Relation to Tencent's 2DAttentionalIrregularSceneTextRecognizer

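Hi, I have some questions from reading the paper. For example, is there a particular reason for adding the reading order in PVAM? Also, this paper feels very close in approach to 2DAttentionalIrregularSceneTextRecognizer, just with more attention stacked on top...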

What is your license?

Is the paper's accuracy not reached because of the training set?

I noticed that the SynthText dataset in this training set is not complete; it contains only a little over 2 million images.

Where can I download the "BAIDU" datasets?

How to change the module to fit more image sizes?

It seems that images are resized to a width of 100.
If an image isn't resized to a width of 100, training reports a RuntimeError:
The size of tensor a (65) must match the size of tensor b (26) at non-singleton dimension 1

File SRN_modules.py line 68, in forward
return x + self.pos_table[:, :x.size(1)].clone().detach()

How could I change the module to fit more kinds of image sizes?

File "SRN_modules.py", line 65, in forward
    return x + self.pos_table[:, :x.size(1)].clone().detach()
RuntimeError: The size of tensor a (320) must match the size of tensor b (256) at non-singleton dimension 1

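Hi, thank you very much for your open-source code, it's great. I have a question: does the dimension corresponding to n_position equal the width w of the visual features after extraction? But the width varies, so this easily leads to errors, doesn't it? Is my understanding correct?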

@chenjun2hao
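On n_position: the traceback line adds pos_table[:, :x.size(1)] to the features, so the table needs at least as many rows as the feature sequence is long; for the 1D features used here that length follows the feature width. In the reports above, a 100-pixel-wide input gives 26 positions, while wider inputs give 65 or 320, exceeding tables of 26 and 256 rows. Assuming pos_table is the usual sinusoidal table, one sketch of a fix is to build it with n_position set to the longest feature length you expect, or to compute it on the fly for each input. This is written in numpy for self-containment, and other layers may also depend on the feature length; it only addresses the positional table.

```python
import numpy as np

def sinusoid_table(n_position, d_hid):
    """Standard sinusoidal position table of shape (n_position, d_hid)."""
    pos = np.arange(n_position)[:, None]                 # (n_position, 1)
    i = np.arange(d_hid)[None, :]                        # (1, d_hid)
    angle = pos / np.power(10000, 2 * (i // 2) / d_hid)
    table = np.zeros((n_position, d_hid))
    table[:, 0::2] = np.sin(angle[:, 0::2])              # even dims: sine
    table[:, 1::2] = np.cos(angle[:, 1::2])              # odd dims: cosine
    return table

def add_position(x, table):
    """x: (batch, seq_len, d_hid). Fail with a clear message instead of a
    broadcasting error when the sequence is longer than the table."""
    seq_len = x.shape[1]
    if seq_len > table.shape[0]:
        raise ValueError(f"feature length {seq_len} exceeds n_position={table.shape[0]}")
    return x + table[None, :seq_len]

table = sinusoid_table(26, 512)   # sized for a 100-pixel-wide input
feats = np.zeros((2, 65, 512))    # features from a wider input
try:
    add_position(feats, table)
except ValueError as err:
    print(err)  # feature length 65 exceeds n_position=26
out = add_position(feats, sinusoid_table(65, 512))  # a larger table fits
```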

Padding (PAD) issue in training

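Hi, I used this framework to train on Chinese scene-text OCR data, but the SRN loss never decreases, and switching to CTC loss doesn't help either. Has anyone trained on such data? I'd appreciate some guidance. About padding: the paper pads with EOS. When computing the cross-entropy loss with ignore_index=PAD, should the loss for the EOS at the end of each text line also be ignored?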
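On the PAD question: if padding positions are excluded with ignore_index, the EOS position after the last character normally still has to contribute to the loss, otherwise the model is never taught where a word ends. If the padding token is the same symbol as EOS, setting ignore_index to it would drop that supervision as well, which is one argument for keeping the two tokens separate. The numpy sketch below computes the masked cross-entropy the way PyTorch's CrossEntropyLoss does with ignore_index and mean reduction: PAD positions are skipped and the average is taken over the remaining ones. The token indices are assumptions.

```python
import numpy as np

def masked_cross_entropy(logits, targets, ignore_index):
    """Mean cross-entropy over positions whose target != ignore_index.

    logits: (B, T, C) unnormalized scores; targets: (B, T) class indices.
    EOS positions are kept; only ignore_index (PAD) positions are skipped.
    """
    z = logits - logits.max(axis=-1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    mask = targets != ignore_index
    safe_targets = np.where(mask, targets, 0)                # placeholder index, masked below
    nll = -np.take_along_axis(log_probs, safe_targets[..., None], axis=-1)[..., 0]
    return (nll * mask).sum() / max(mask.sum(), 1)

EOS_ID, PAD_ID = 0, 1   # hypothetical indices
rng = np.random.default_rng(0)
targets = np.array([[5, 7, EOS_ID, PAD_ID, PAD_ID]])   # two characters, EOS, padding
loss = masked_cross_entropy(rng.normal(size=(1, 5, 10)), targets, ignore_index=PAD_ID)
```

Here three positions (two characters and the EOS) contribute to the loss and the two PAD positions do not.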