
Comments (23)

tianzhi0549 avatar tianzhi0549 commented on June 18, 2024 1

@Xiangyu-CAS For the ICDAR15, we used a simple sampling strategy that allows the network to be able to output word-level bounding boxes directly. We increased the number of negative samples collected from the spaces between words. For example, we controlled the ratio of positive samples, negative ones from background and between words as (0.5, 0.4, 0.1) in each batch. This encourages the model to directly output word-level bboxes without further post-processing. In this work, we aim to provide a fundamental solution for text detection. We believe that the performance on ICDAR15 could be improved considerably by using a more powerful approach for word splitting, and enabling our method to handle multi-oriented texts. Thank you:-).

from ctpn.
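The (0.5, 0.4, 0.1) batch ratio described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the label encoding (1 = positive, 0 = background negative, -2 = space negative) is an assumption made here for clarity:

```python
import numpy as np

def sample_minibatch(labels, batch_size=128, ratios=(0.5, 0.4, 0.1), seed=None):
    """Sample anchor indices so each batch holds roughly 50% positives,
    40% background negatives, and 10% negatives from spaces between words.

    `labels`: int array with 1 = positive, 0 = background negative,
    -2 = space negative (hypothetical encoding), anything else ignored.
    """
    rng = np.random.default_rng(seed)
    picked = []
    for cls, ratio in zip((1, 0, -2), ratios):
        pool = np.flatnonzero(labels == cls)
        n = min(int(batch_size * ratio), len(pool))  # quota for this class
        picked.append(rng.choice(pool, size=n, replace=False))
    return np.concatenate(picked)
```

If one pool is smaller than its quota, the batch simply comes out short; a production sampler would typically top up from the background pool instead.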

junedgar avatar junedgar commented on June 18, 2024

@398766201
Hi, I ran into trouble when I compiled Caffe with cuDNN 5.0; the problem is described in #9.
Did you have the same problem when you compiled Caffe with cuDNN 5.0?
Thank you!

from ctpn.

Xiangyu-CAS avatar Xiangyu-CAS commented on June 18, 2024

Yes, I encountered the same problem. As the author said, cuDNN 5.0 is not compatible; this project is based on cuDNN 3.0. So I did not use cuDNN at all, just commented out the USE_CUDNN line in Makefile.config.
Building without cuDNN does not reduce accuracy or processing speed; the only cost is much higher GPU memory usage.

from ctpn.

junedgar avatar junedgar commented on June 18, 2024

@Xiangyu-CAS
thank you very much for your answer! 😁

from ctpn.

Xiangyu-CAS avatar Xiangyu-CAS commented on June 18, 2024

@tianzhi0549 Thank you very much for your reply, it's so kind of you!

from ctpn.

crazylyf avatar crazylyf commented on June 18, 2024

@tianzhi0549 Thanks a lot for your work, and your kind reply!
If I read this right, when two or more tilted lines are close to each other, the word-splitting solution may still produce bboxes containing nearby characters, which will hurt recognition accuracy. Also, Chinese text lines are often quite long and have no spaces in between, so the bbox will be full of background noise or nearby characters. Is it possible to get tightly bounded bboxes in those cases? Thank you!

from ctpn.

tianzhi0549 avatar tianzhi0549 commented on June 18, 2024

@crazylyf it is still an open problem to handle these complicated cases perfectly. The method cannot produce bounding polygons and therefore it cannot fit the text line well if the text line is too inclined. If your goal is to detect multi-oriented text, I suggest that you could try the methods that are originally designed for multi-oriented text. Thank you:-).

from ctpn.

crazylyf avatar crazylyf commented on June 18, 2024

Get it. Thank you~

from ctpn.

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@tianzhi0549 Hello! I also faced this problem and am very curious how you sample the space regions between words. Do you collect them by hand-cropping, or with some algorithm?

from ctpn.

Xiangyu-CAS avatar Xiangyu-CAS commented on June 18, 2024

@LearnerInGithub In my implementation, I obtained the space regions algorithmically. If two ground-truth boxes are approximately on the same line (judged by their vertical IoU) and there are no words in the region between them, that region is selected as a space region. I implemented this with two naive loops, traversing all the GT boxes.

from ctpn.
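The two-loop search described above might look like the sketch below. This is a minimal illustration under stated assumptions, not Xiangyu-CAS's actual code: the vertical-IoU threshold and the rule for rejecting gaps that contain a third box are choices made here.

```python
import numpy as np

def vertical_iou(a, b):
    """Overlap of two (x1, y1, x2, y2) boxes along the y axis only."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union if union > 0 else 0.0

def find_space_regions(gt_boxes, min_viou=0.7):
    """Two naive loops over all GT box pairs: for every pair roughly on
    the same line (high vertical IoU), take the horizontal gap between
    them as a candidate space region, and keep it only if no other GT
    box intrudes into it horizontally."""
    spaces = []
    for i, a in enumerate(gt_boxes):
        for j, b in enumerate(gt_boxes):
            if j <= i or vertical_iou(a, b) < min_viou:
                continue
            left, right = sorted((a, b), key=lambda box: box[0])
            if right[0] <= left[2]:  # boxes touch or overlap: no gap
                continue
            gap = (left[2], min(left[1], right[1]),
                   right[0], max(left[3], right[3]))
            if all(k in (i, j) or c[2] <= gap[0] or c[0] >= gap[2]
                   for k, c in enumerate(gt_boxes)):
                spaces.append(gap)
    return spaces
```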

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@Xiangyu-CAS Thank you for sharing the algorithm; it looks very intuitive. I will try to add it to my CTPN code.

from ctpn.

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@Xiangyu-CAS A new question comes up: how do I feed the picked space regions into the minibatches? I find that the original Faster-RCNN implementation only stores non-background bboxes in gt_boxes, so how can I add my picked space regions to the input mini-batches?

from ctpn.

Xiangyu-CAS avatar Xiangyu-CAS commented on June 18, 2024

@LearnerInGithub The same way as non-background bboxes. First, get the GT boxes of the space regions. Second, label anchors whose overlap with a space GT box is > 0.5 as negative anchors. Third, set the ratio of space negative anchors to 10% and background anchors to 40% in each batch.

from ctpn.
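The labeling step above could be sketched like this. The -2 code for space negatives is a hypothetical encoding chosen here for illustration (stock Faster-RCNN uses only 1/0/-1 for positive/negative/ignore), and `box_iou` is a helper defined for this sketch:

```python
import numpy as np

def box_iou(anchors, box):
    """IoU of each row of an (N, 4) anchor array with one (x1, y1, x2, y2) box."""
    ix1 = np.maximum(anchors[:, 0], box[0])
    iy1 = np.maximum(anchors[:, 1], box[1])
    ix2 = np.minimum(anchors[:, 2], box[2])
    iy2 = np.minimum(anchors[:, 3], box[3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_b = (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area_a + area_b - inter)

def label_space_anchors(labels, anchors, space_boxes, thresh=0.5):
    """Relabel anchors overlapping a space GT box (> thresh IoU) as
    space negatives (-2), leaving positive anchors (1) untouched."""
    labels = labels.copy()
    for sb in space_boxes:
        hit = box_iou(anchors, sb) > thresh
        labels[hit & (labels != 1)] = -2
    return labels
```

After this pass, the minibatch sampler can draw its 10% space-negative quota from the -2 pool and the 40% background quota from the remaining 0-labeled anchors.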

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@Xiangyu-CAS A question about testing: I tested the CTPN pre-trained model on ICDAR2013, but it only gives 0.002 AP, so I am very confused. Have you tested your model on ICDAR2013? Could you give me some advice?

from ctpn.

Xiangyu-CAS avatar Xiangyu-CAS commented on June 18, 2024

@LearnerInGithub In the test code provided by CTPN, the image is resized to a fixed size, and so are the bbox coordinates. Revise demo.py as follows and you will obtain the right bboxes at the original size:

im, f = resize_im(im, cfg.SCALE, cfg.MAX_SCALE)

write_result(RESULT_DIR, im_name, text_lines / f)

from ctpn.

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@Xiangyu-CAS Yes, the testing result has now improved to about 20%, but that is still far from the 88% reported in the paper. What do I need to do to get an 80%+ testing result with the pre-trained model?

from ctpn.

Xiangyu-CAS avatar Xiangyu-CAS commented on June 18, 2024

@LearnerInGithub That's the only modification I made to the test code, and I got 87.5% directly. I suppose your result is caused by a bbox coordinate mismatch; maybe you should check it carefully.

from ctpn.

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@Xiangyu-CAS I switched to testing with the CTPN test module and now it works fine, but I still have problems with the test module of py-faster-rcnn; maybe they use different evaluation standards.

from ctpn.

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@Xiangyu-CAS Now I want to train the model on ICDAR2015. I converted the ICDAR2015 GT from (x1, y1, x2, y2, x3, y3, x4, y4) to (xmin, ymin, W, H), but the test result of the trained model on ICDAR2015 is abnormally low, only about 10%. When I visualized the detection results, I found a lot of redundant space between the text and the detected bbox. How did you handle the ICDAR2015 GT so that CTPN could train on it?

from ctpn.
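For reference, the quad-to-rectangle conversion described above amounts to taking the axis-aligned bounding box of the four points, which is exactly why tilted GT yields loose boxes with redundant space around the text. A minimal sketch:

```python
import numpy as np

def quad_to_xywh(quad):
    """Convert one ICDAR2015 GT entry (x1, y1, ..., x4, y4) to
    axis-aligned (xmin, ymin, W, H). For tilted quads this bounding
    box is necessarily loose: the more inclined the text, the more
    background it encloses."""
    pts = np.asarray(quad, dtype=float).reshape(4, 2)
    xmin, ymin = pts.min(axis=0)
    xmax, ymax = pts.max(axis=0)
    return xmin, ymin, xmax - xmin, ymax - ymin
```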

Xiangyu-CAS avatar Xiangyu-CAS commented on June 18, 2024

@LearnerInGithub To be honest, my model failed on ICDAR2015 too, at only 50%. I am confused by your description "redundant space between the text and detected bbox". Do you mean the detected bbox covers the target text correctly but fails at accurate localization? You can try dividing the GT into a sequence of tilted bboxes; space samples will help too. However, the performance is still far below 60%. I think tianzhi owes us a lot of details.
Faster RCNN is much more promising than CTPN on ICDAR2015. A few papers have been released that deal with ICDAR2015:
"Arbitrary-Oriented Scene Text Detection via Rotation Proposals"
"Detecting Oriented Text in Natural Images by Linking Segments"
"Deep Direct Regression for Multi-Oriented Scene Text Detection" (strongly recommended; it achieved 83% on ICDAR2015 and 91% on ICDAR2013, which is state of the art, ranked first on the competition website).

from ctpn.
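The "tilted bbox sequence" idea above can be sketched by slicing each GT quad into fixed-width strips (CTPN's anchors are 16 px wide) and interpolating the top and bottom edges inside each strip. This is a rough illustration assuming clockwise point order starting from the top-left, not the actual code either commenter used:

```python
import numpy as np

def quad_to_strip_boxes(quad, strip_w=16):
    """Split one tilted GT quad (x1, y1, ..., x4, y4) into a sequence
    of narrow axis-aligned boxes of width `strip_w`, so an anchor-based
    detector like CTPN can learn tight per-strip extents."""
    tl, tr, br, bl = np.asarray(quad, dtype=float).reshape(4, 2)

    def interp(p, q, x):
        # y on segment p -> q at horizontal position x, clamped to the segment
        t = np.clip((x - p[0]) / max(q[0] - p[0], 1e-6), 0, 1)
        return p[1] + t * (q[1] - p[1])

    x0, x1 = min(tl[0], bl[0]), max(tr[0], br[0])
    boxes = []
    for xs in np.arange(x0, x1, strip_w):
        xe = min(xs + strip_w, x1)
        top = min(interp(tl, tr, xs), interp(tl, tr, xe))  # highest top edge
        bot = max(interp(bl, br, xs), interp(bl, br, xe))  # lowest bottom edge
        boxes.append((xs, top, xe, bot))
    return boxes
```

For a horizontal quad this degenerates to evenly cutting the box into 16 px columns; for an inclined one, each strip hugs its local segment of the text line instead of the full loose bounding box.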

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@Xiangyu-CAS Yes, that's what I meant. I looked at the detection results one by one: the detected bboxes are too large, even though they do contain the text region.

from ctpn.

LearnerInGithub avatar LearnerInGithub commented on June 18, 2024

@Xiangyu-CAS I have downloaded the paper and roughly looked through it; the results really do seem good! But I also notice that a team from CASIA called NLPR_CASIA got 82.76% / 84.76% / 83.75%, which is now No. 1. I am not sure whether the paper "Deep Direct Regression for Multi-Oriented Scene Text Detection" is their work...

from ctpn.

Xiangyu-CAS avatar Xiangyu-CAS commented on June 18, 2024

@LearnerInGithub As I mentioned, you probably trained your CTPN model on horizontal bbox sequences, so you obtained detection results as horizontal bbox sequences. BTW, the proposal connection function should also be revised to output tilted rectangles.

That paper is the publication of NLPR_CASIA; you can check the organization.

from ctpn.

