Comments (23)
@Xiangyu-CAS For ICDAR15, we used a simple sampling strategy that allows the network to output word-level bounding boxes directly. We increased the number of negative samples collected from the spaces between words. For example, in each batch we set the ratio of positive samples, background negatives, and between-word negatives to (0.5, 0.4, 0.1). This encourages the model to output word-level bboxes directly, without further post-processing. In this work, we aim to provide a fundamental solution for text detection. We believe that the performance on ICDAR15 could be improved considerably by using a more powerful word-splitting approach and by enabling our method to handle multi-oriented text. Thank you:-).
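A minimal sketch of that batch composition, assuming the anchor indices have already been split into three pools (positives, background negatives, between-word negatives). The function name `sample_batch` and the batch size of 128 are illustrative assumptions, not taken from the CTPN code.

```python
import numpy as np

def sample_batch(pos_idx, bg_neg_idx, space_neg_idx, batch_size=128,
                 ratios=(0.5, 0.4, 0.1), rng=None):
    """Draw anchor indices for one minibatch with a fixed ratio of
    positives, background negatives and between-word (space) negatives."""
    rng = np.random.default_rng() if rng is None else rng
    pools = [np.asarray(p) for p in (pos_idx, bg_neg_idx, space_neg_idx)]
    # Target count per pool; the last pool absorbs any rounding remainder.
    targets = [int(round(batch_size * r)) for r in ratios[:-1]]
    targets.append(batch_size - sum(targets))
    picked = []
    for pool, n in zip(pools, targets):
        n = min(n, len(pool))  # a short pool simply yields fewer samples
        picked.append(rng.permutation(pool)[:n])  # sample without replacement
    return picked
```

With `batch_size=128` this draws 64 positives, 51 background negatives and 13 space negatives. If a pool is short, this sketch returns a smaller batch; py-faster-rcnn's RPN sampler instead fills the shortfall of positives with negatives.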
from ctpn.
@398766201
Hi, I ran into trouble when compiling Caffe with cuDNN 5.0; the problem is described in #9.
Did you have the same problem when you compiled Caffe with cuDNN 5.0?
Thank you!
Yes, I encountered the same problem. As the author said, cuDNN 5.0 is not compatible; this project is based on cuDNN 3.0. So I did not use cuDNN and just commented out the USE_CUDNN line in Makefile.config.
Building without cuDNN does not reduce performance or processing speed; the only cost is much higher GPU memory usage.
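For reference, the switch in question is the `USE_CUDNN := 1` line of Caffe's Makefile.config. A small illustrative helper (the name `disable_cudnn` is hypothetical) that comments it out could look like this; editing the file by hand works just as well.

```python
import re

# Matches an active (uncommented) cuDNN switch, keeping any leading indentation.
_CUDNN_ON = re.compile(r"^([ \t]*)(USE_CUDNN\s*:=\s*1)", re.MULTILINE)

def disable_cudnn(config_text):
    """Return Makefile.config text with `USE_CUDNN := 1` commented out."""
    return _CUDNN_ON.sub(r"\1# \2", config_text)
```

Read Makefile.config, pass its text through `disable_cudnn`, write it back, and then rebuild Caffe (e.g. `make clean && make`) so the change takes effect. An already-commented line is left unchanged.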
@Xiangyu-CAS
Thank you very much for your answer! 😁
@tianzhi0549 Thank you very much for your reply, it's so kind of you!
@tianzhi0549 Thanks a lot for your work and your kind reply!
If I read this right, when two or more tilted lines are close to each other, the word-splitting solution may still produce bboxes containing nearby characters, which will hurt recognition accuracy. Also, Chinese text lines are usually quite long with no spaces in between, so the bbox will be full of background noise or nearby characters. Is it possible to get tightly bounded bboxes in those cases? Thank you!
@crazylyf It is still an open problem to handle these complicated cases perfectly. The method cannot produce bounding polygons, so it cannot fit a text line well if the line is strongly inclined. If your goal is to detect multi-oriented text, I suggest trying methods that were originally designed for multi-oriented text. Thank you:-).
Got it. Thank you~
@tianzhi0549 Hello! I have run into this problem too, and I am wondering how to sample the space regions between words. Do you collect them by hand cropping or with some algorithm?
@LearnerInGithub In my implementation, I obtained the space regions algorithmically. If two ground-truth boxes lie approximately on the same line (judged by their IoU in the vertical direction) and there are no words in the region between them, that region is selected as a space region. I implemented this with two naive nested loops over all the GT boxes.
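A minimal sketch of that two-loop mining step, assuming GT word boxes in (x1, y1, x2, y2) form. The vertical-IoU threshold of 0.7 and the function names are assumptions; the thread does not give a threshold.

```python
def vertical_iou(a, b):
    """IoU of the vertical extents [y1, y2] of two boxes (x1, y1, x2, y2)."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union if union > 0 else 0.0

def mine_space_regions(gt_boxes, v_iou_thresh=0.7):
    """Return boxes covering the gap between horizontally adjacent GT words."""
    spaces = []
    for i, a in enumerate(gt_boxes):
        for j, b in enumerate(gt_boxes):
            # Only consider pairs on the same line with b to the right of a.
            if i == j or b[0] <= a[2] or vertical_iou(a, b) < v_iou_thresh:
                continue
            gap = (a[2], min(a[1], b[1]), b[0], max(a[3], b[3]))
            # Reject the gap if any other word box overlaps it.
            blocked = any(
                k not in (i, j) and c[0] < gap[2] and c[2] > gap[0]
                and c[1] < gap[3] and c[3] > gap[1]
                for k, c in enumerate(gt_boxes))
            if not blocked:
                spaces.append(gap)
    return spaces
```

For three words on one line, the gap between the first and third word is rejected because the middle word overlaps it, so only gaps between neighbouring words survive.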
@Xiangyu-CAS Thank you for sharing your algorithm; it looks very intuitive. I will try to add it to my CTPN code.
@Xiangyu-CAS A new question: how do I feed the selected space regions into the minibatches? In the original Faster R-CNN implementation, only non-background class bboxes are stored in gt_boxes, so how can I add my selected space regions to the input minibatches?
@LearnerInGithub The same way as non-background bboxes. First, get the GT boxes of the space regions. Second, label anchors whose overlap with a space GT box is > 0.5 as negative anchors. Third, set the ratio of space negative anchors to 10% and the ratio of background anchors to 40%.
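A minimal NumPy sketch of the second step, assuming anchors and space GT boxes as (x1, y1, x2, y2) arrays and py-faster-rcnn's RPN label convention (1 = positive, 0 = negative, -1 = ignored). The function names, and the rule that an anchor already labelled positive stays positive, are assumptions.

```python
import numpy as np

def iou_matrix(anchors, boxes):
    """Pairwise IoU between (N, 4) anchors and (M, 4) boxes (x1, y1, x2, y2)."""
    ix1 = np.maximum(anchors[:, None, 0], boxes[None, :, 0])
    iy1 = np.maximum(anchors[:, None, 1], boxes[None, :, 1])
    ix2 = np.minimum(anchors[:, None, 2], boxes[None, :, 2])
    iy2 = np.minimum(anchors[:, None, 3], boxes[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def mark_space_negatives(labels, anchors, space_boxes, thresh=0.5):
    """Label anchors overlapping any space GT box by > thresh as negatives.

    Returns (new_labels, is_space_neg) so space negatives can be sampled
    separately from ordinary background negatives.
    """
    labels = labels.copy()
    if len(space_boxes) == 0:
        return labels, np.zeros(len(anchors), dtype=bool)
    max_iou = iou_matrix(anchors, space_boxes).max(axis=1)
    # Keep anchors that are already positive for a text box.
    is_space_neg = (max_iou > thresh) & (labels != 1)
    labels[is_space_neg] = 0
    return labels, is_space_neg
```

The boolean mask lets the 10% space-negative share be drawn separately from the 40% background share when the minibatch is assembled.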
@Xiangyu-CAS A question about testing: I tested the CTPN pre-trained model on ICDAR2013, but it only gives 0.002 AP, and I am very confused by this. Have you tested your model on ICDAR2013, and could you give me some advice?
@LearnerInGithub The test code provided by CTPN rescales the image by a factor f, so the bbox coordinates it outputs are in the rescaled image's frame. Revise demo.py as follows and you will obtain the correct bboxes at the original image size:
im, f = resize_im(im, cfg.SCALE, cfg.MAX_SCALE)
write_result(RESULT_DIR, im_name, text_lines / f)
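A sketch of why dividing by `f` works, assuming (as the snippet implies) that the image is scaled uniformly by a single factor `f` chosen from `cfg.SCALE` and `cfg.MAX_SCALE`. The helper names `scale_factor` and `boxes_to_original` are hypothetical. If `text_lines` carries a trailing score column, only the coordinate columns should be divided.

```python
import numpy as np

def scale_factor(h, w, scale, max_scale=None):
    """Factor f that makes the short side `scale`, unless that would push
    the long side past `max_scale` (assumed to mirror resize_im)."""
    f = float(scale) / min(h, w)
    if max_scale is not None and f * max(h, w) > max_scale:
        f = float(max_scale) / max(h, w)
    return f

def boxes_to_original(text_lines, f):
    """Map (x1, y1, x2, y2[, score]) rows from the resized image back."""
    out = np.asarray(text_lines, dtype=float).copy()
    out[:, :4] /= f  # divide coordinates only; any score column is untouched
    return out
```

For a 1000x2000 image with `scale=600` and `max_scale=1000`, the short-side rule would give 0.6, but the long side would then be 1200, so `f` is capped at 0.5 and every detected coordinate must be doubled to land on the original image.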
@Xiangyu-CAS Yes, the test result has now improved to about 20%, but it is still far from the 88% reported in the paper. What do I need to do to get 80%+ with the pre-trained model?
@LearnerInGithub That's the only modification I made to the test code, and I got 87.5% directly. I suspect the problem is caused by a bbox coordinate mismatch; you should check it carefully.
@Xiangyu-CAS I switched to testing with the CTPN test module and now it works fine, but I still have problems with the py-faster-rcnn test module; maybe they use different evaluation standards.
@Xiangyu-CAS Now I want to train the model on ICDAR2015. I converted the ICDAR2015 GT from (x1, y1, x2, y2, x3, y3, x4, y4) to (xmin, ymin, W, H), but the trained model's test result on ICDAR2015 is abnormally low, only about 10%. When I visualized the detection results, I found a lot of redundant space between the text and the detected bboxes. How did you handle the ICDAR2015 GT so that CTPN could be trained on it?
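The conversion described above can be sketched as follows (function names are hypothetical). The fill ratio illustrates one plausible source of the loose boxes: for a tilted word, the enclosing axis-aligned box also contains a lot of background, and a model trained on such boxes learns to predict them.

```python
def quad_to_xywh(quad):
    """Axis-aligned (xmin, ymin, W, H) enclosing an ICDAR2015 quadrilateral
    given as (x1, y1, x2, y2, x3, y3, x4, y4)."""
    xs, ys = quad[0::2], quad[1::2]
    xmin, ymin = min(xs), min(ys)
    return xmin, ymin, max(xs) - xmin, max(ys) - ymin

def fill_ratio(quad):
    """Fraction of the enclosing box actually covered by the quad."""
    xs, ys = quad[0::2], quad[1::2]
    # Shoelace formula for the quadrilateral's area.
    area = 0.5 * abs(sum(xs[i] * ys[(i + 1) % 4] - xs[(i + 1) % 4] * ys[i]
                         for i in range(4)))
    _, _, w, h = quad_to_xywh(quad)
    return area / (w * h)
```

A horizontal word fills its box completely, while a 100-pixel-wide word tilted so that it rises 20 pixels fills only half of its enclosing box.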
@LearnerInGithub To be honest, my model failed on ICDAR2015 too, reaching only 50%. I am confused by your description "redundant space between the text and detected bbox". Do you mean the detected bbox covers the target text correctly but fails at accurate localization? You can try dividing the GT into a sequence of tilted bboxes. Space samples will help too. However, the performance is still far from 60%. I think tianzhi owes us a lot of details.
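A rough sketch of one way to divide a quadrilateral GT into a sequence of fixed-width boxes that follow the tilt, in the spirit of CTPN's 16-pixel-wide proposals. The vertex order, the grid alignment, and the names `split_quad` and `edge_y` are assumptions, and the left and right sides are treated as near-vertical.

```python
import math

def edge_y(p, q, x):
    """y on the line through points p and q at abscissa x."""
    (x0, y0), (x1, y1) = p, q
    if x1 == x0:
        return y0
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def split_quad(quad, width=16):
    """Split a quad (x1, y1, ..., x4, y4), ordered top-left, top-right,
    bottom-right, bottom-left, into fixed-width boxes (x1, y1, x2, y2)."""
    tl, tr, br, bl = [(quad[i], quad[i + 1]) for i in range(0, 8, 2)]
    xmin, xmax = min(tl[0], bl[0]), max(tr[0], br[0])
    boxes = []
    x = math.floor(xmin / width) * width  # align strips to the anchor grid
    while x < xmax:
        # Evaluate the top and bottom edges over the part of the strip
        # that lies inside the quad.
        x_lo, x_hi = max(x, xmin), min(x + width, xmax)
        tops = [edge_y(tl, tr, x_lo), edge_y(tl, tr, x_hi)]
        bottoms = [edge_y(bl, br, x_lo), edge_y(bl, br, x_hi)]
        boxes.append((x, min(tops), x + width, max(bottoms)))
        x += width
    return boxes
```

Each strip's box is only as tall as the text inside that strip, so the sequence hugs a tilted word far more tightly than a single enclosing box; the last strip may overhang the quad because strips stay on the grid.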
Faster R-CNN is much more promising than CTPN on ICDAR2015. A few papers have been released that deal with ICDAR2015:
"Arbitrary-Oriented Scene Text Detection via Rotation Proposals"
"Detecting oriented text in natural images by linking segments"
"Deep Direct Regression for Multi-Oriented Scene Text Detection" (strongly recommended): it achieved 83% on ICDAR2015 and 91% on ICDAR2013, which is state-of-the-art and ranked first on the competition website.
@Xiangyu-CAS Yes, that's what I meant. I went through the detection results one by one: the detected bboxes are too large, even though they contain the text region.
@Xiangyu-CAS I have downloaded the paper and skimmed it; the results really do seem good! But I also noticed that a team from CASIA called NLPR_CASIA got 82.76 %, 84.76 % and 83.75 %, and is now No. 1. I'm not sure whether the paper "Deep Direct Regression for Multi-Oriented Scene Text Detection" is their work...
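Assuming the three NLPR_CASIA figures are recall, precision and F-measure (the usual column order on the ICDAR leaderboard), the third one is consistent with the harmonic mean of the first two:

$$
F = \frac{2PR}{P+R} = \frac{2 \times 0.8476 \times 0.8276}{0.8476 + 0.8276} \approx 0.8375
$$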
@LearnerInGithub As I mentioned, you probably trained your CTPN model on horizontal bbox sequences, so your detection results are horizontal bbox sequences as well. BTW, the proposal connection function should also be revised to output tilted rectangles.
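One way to revise the connection step, sketched under assumptions: fit least-squares lines to the tops and bottoms of the proposals in one group and keep the four corners, instead of collapsing them into an axis-aligned box. The name `fit_tilted_line` and the use of proposal centres as fit points are illustrative choices, not CTPN's code.

```python
import numpy as np

def fit_tilted_line(proposals):
    """Merge fine-scale proposals (x1, y1, x2, y2) of one text line into a
    tilted quadrilateral: top-left, top-right, bottom-right, bottom-left."""
    p = np.asarray(proposals, dtype=float)
    x0, x1 = p[:, 0].min(), p[:, 2].max()
    if len(p) == 1:  # a single proposal cannot define a slope
        return [(x0, p[0, 1]), (x1, p[0, 1]), (x1, p[0, 3]), (x0, p[0, 3])]
    cx = (p[:, 0] + p[:, 2]) / 2.0             # proposal centres along x
    top_k, top_b = np.polyfit(cx, p[:, 1], 1)  # top edge: y = k * x + b
    bot_k, bot_b = np.polyfit(cx, p[:, 3], 1)  # bottom edge
    return [(x0, top_k * x0 + top_b), (x1, top_k * x1 + top_b),
            (x1, bot_k * x1 + bot_b), (x0, bot_k * x0 + bot_b)]
```

Because the top and bottom lines are fitted independently, the output can also taper slightly when text height changes along the line; forcing a shared slope would give a true rotated rectangle.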
That paper is NLPR_CASIA's publication; you can check the authors' organization.
Related Issues (20)
- CUDA version and cuDNN version issues
- Any other implemetation
- please reveal details of training ,thanks?
- Optimize Convolution layers with MobileNet
- Which research paper is used to implement this?
- How to generate training labels?
- demo link died
- Bath size when training LSTM ?
- dimention
- why the result use the author's model is lower the the artical.
- The project is very unfriendly; compiling is troublesome
- The text_recs returned has a problem
- Can you share your dataset that mentioned in the article?
- what is the dataset to get the trained model?
- Problems running it
- Issue while running demo.py "Creating layer proposal"
- Pytorch vision?
- Hello, could you provide the pre-trained model for download?
- error:Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, Concat, ContrastiveLoss, Convolution, Crop, Data,
- Could you provide the training dataset?