Comments (13)

chunhui999 commented on May 26, 2024

@tonghe90 I have another question. If I use ICDAR2015, how do I generate the "mask_gt" and "mask_iou_angle" data? Looking forward to your reply.

from textspotter.

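chunhui999 commented on May 26, 2024

@tonghe90 After reading the code, I found that the text labels for the recognition branch are also stored in gt_bbox. For a given gt_bbox, the first 8 elements are the bbox coordinates, the 9th element is the length of the text label, and the label_len elements starting from the 10th are the text label itself. The full label text is split into individual elements here: what type are they, and how is the conversion done?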
layer {
  name: 'iou_maps_angles'
  type: 'Python'
  bottom: 'gt_bbox'
  top: 'rois'
  top: 'sample_gt_cont'
  top: 'sample_gt_label_input'
  top: "sample_gt_label_output"
  ......
}

crazysal commented on May 26, 2024

mask_gt is generated only for datasets that have character-level annotations, i.e. SynthText. See Section 2.3 of the paper for the training strategy.

mask_iou_angle is generated from the output of the EAST proposals in the RBOX (rotated rectangle bounding box) case. The EAST output has 5 channels: the distances from each pixel to the four sides of the box, plus the angle.
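
As a rough illustration of the 5-channel RBOX output described above, here is a minimal sketch that turns one pixel's four distances and angle into corner points. The function name, the channel order (top, right, bottom, left) and the rotation sign are assumptions for illustration, not taken from the textspotter or EAST code.

```python
import math


def rbox_to_quad(x, y, d_top, d_right, d_bottom, d_left, angle):
    """Turn one pixel's RBOX prediction into 4 corner points.

    Assumption: the box is built axis-aligned around (x, y) and then
    rotated about (x, y) by `angle` radians. Channel order and rotation
    sign are illustrative only.
    """
    # Corner offsets relative to the pixel, clockwise from top-left
    # (image coordinates, y pointing down).
    rel = [
        (-d_left, -d_top),
        (d_right, -d_top),
        (d_right, d_bottom),
        (-d_left, d_bottom),
    ]
    c, s = math.cos(angle), math.sin(angle)
    # Standard 2D rotation of each offset, then shift back to the pixel.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in rel]


if __name__ == "__main__":
    # With angle 0 the result is just an axis-aligned box around the pixel.
    print(rbox_to_quad(10, 20, 2, 3, 4, 5, 0.0))
```

With `angle = 0` the corners reduce to `(x - d_left, y - d_top)` and so on, which is an easy way to sanity-check a data layer before adding rotation.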

sample_gt_cont is a vector of zeros and ones with the same shape as the gt labels. It controls the continuity of the LSTM hidden state: the hidden state is multiplied by 0 at the step where prediction of a new box starts, and by 1 at every other step.
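
A minimal sketch of that continuation idea, assuming the boxes are laid out back to back in one flat sequence. The helper names are hypothetical, and the real layer may arrange the values differently (Caffe's recurrent layers expect a time-major T x N array of indicators).

```python
def make_cont(label_lengths):
    """Build a flat 0/1 continuation vector for boxes processed back to back.

    Each box contributes a 0 at its first step (reset the hidden state)
    followed by 1s (carry the hidden state over).
    """
    cont = []
    for n in label_lengths:
        if n > 0:
            cont.extend([0] + [1] * (n - 1))
    return cont


def apply_cont(hidden, cont_t):
    """Gate the previous hidden state: 0 resets it, 1 keeps it unchanged."""
    return [h * cont_t for h in hidden]


if __name__ == "__main__":
    cont = make_cont([3, 2])      # two boxes with 3 and 2 time steps
    print(cont)
    print(apply_cont([0.5, -1.0], cont[0]))  # first step of a box: reset
    print(apply_cont([0.5, -1.0], cont[1]))  # later step: carried over
```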

sample_gt_label_input: a one-hot encoding or character embedding of each ground-truth label. Its shape is also used to pad sequences shorter than the maximum length of 25.
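
A small sketch of padding a label to length 25 and one-hot encoding it. The pad value of 0 and the helper names are assumptions; the thread does not say which value or encoding the repository actually uses.

```python
MAX_LEN = 25  # maximum sequence length mentioned in this thread


def pad_labels(indices, max_len=MAX_LEN, pad_value=0):
    """Right-pad a list of class indices to max_len (pad_value is assumed)."""
    if len(indices) > max_len:
        raise ValueError("label longer than max_len")
    return list(indices) + [pad_value] * (max_len - len(indices))


def one_hot(indices, num_classes):
    """One-hot encode each class index as a list of 0/1 values."""
    return [[1 if k == i else 0 for k in range(num_classes)] for i in indices]


if __name__ == "__main__":
    padded = pad_labels([3, 1, 4])
    print(len(padded), padded[:5])
    print(one_hot([2, 0], num_classes=3))
```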

sample_gt_label_output: similar to the above, but used at inference time. It keeps track of how many decoder steps to predict, given what was fed in as the previous input.

Please correct me if I'm wrong.

chunhui999 commented on May 26, 2024

@crazysal Thanks for your reply. I think you are right, and it helps me a lot.

chunhui999 commented on May 26, 2024

@crazysal Could you tell me how to handle the text labels, and what the format of the text label in gt_bbox is?

wenston2006 commented on May 26, 2024

@crazysal Have you managed to reproduce the training code? I tried to reproduce the training part based on @tonghe's code, but I ran into a segmentation fault.

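wenston2006 commented on May 26, 2024

@chunhui999 @crazysal Looking closely at the code, I found that the first 8 elements are the coordinates, the 10th is the label length, and the 9th is unused. I may have got this wrong; note that Python indices start from 0.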

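wenston2006 commented on May 26, 2024

@crazysal For the data layer, I modified @argman's EAST Python data layer. I commented out loss_4s and iou_loss and trained only the softmax loss for text recognition, but for some reason I ran into a memory overflow. What code did you use for your data layer, and how did you write it? Would adding my own data layer and an iou_loss layer on top of the code @tonghe provided be enough to train successfully?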

chunhui999 commented on May 26, 2024

@wenston2006 You are right about the index; I overlooked that earlier. So if we ignore the 9th element and shift the rest forward, is your gt_label format like this? (x1, y1, x2, y2, x3, y3, x4, y4, len, 't', 'e', 'x', 't')
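
To make the layout under discussion concrete, here is a minimal sketch of packing and unpacking one such row. Caffe blobs hold floating-point numbers, so the characters have to be stored as numeric indices; the `CHAR_TO_INDEX` mapping and both helper names are hypothetical, not the repository's actual encoding. The `len_index` argument covers the variant with an unused 9th slot.

```python
# Hypothetical character table: index 0 is left free, 'a' -> 1, ..., 'z' -> 26.
CHARSET = "abcdefghijklmnopqrstuvwxyz"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(CHARSET)}


def pack_gt_bbox(quad, text, char_to_index=CHAR_TO_INDEX):
    """Flatten 4 corner points and a text label into one numeric row.

    Layout: x1, y1, x2, y2, x3, y3, x4, y4, len, c1, ..., c_len
    """
    coords = [float(v) for point in quad for v in point]
    if len(coords) != 8:
        raise ValueError("expected 4 (x, y) corner points")
    labels = [float(char_to_index[ch]) for ch in text]
    return coords + [float(len(labels))] + labels


def unpack_gt_bbox(row, len_index=8):
    """Split a row back into coordinates and integer label indices.

    len_index is the 0-based position of the label length: 8 for the
    layout above, 9 if an unused slot sits between coordinates and length.
    """
    coords = list(row[:8])
    label_len = int(row[len_index])
    start = len_index + 1
    labels = [int(v) for v in row[start:start + label_len]]
    return coords, labels


if __name__ == "__main__":
    row = pack_gt_bbox([(0, 0), (10, 0), (10, 5), (0, 5)], "text")
    print(row)
    print(unpack_gt_bbox(row))
```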

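wenston2006 commented on May 26, 2024

@chunhui999 That is my understanding, but I currently get a memory overflow (segmentation fault) during training, and it is not yet clear whether the problem is in the data layer or in another layer.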

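chunhui999 commented on May 26, 2024

@wenston2006 I also ran into the memory overflow problem. It is probably caused by the input image size: I halved the resize size (following the memory overflow I hit earlier during testing), and then training worked.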
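
A minimal sketch of computing such a halved input size. The helper name and the snapping of each side to a multiple of 32 are assumptions for illustration (fully convolutional detectors often need sides divisible by the network stride); the thread does not say which sizes were actually used.

```python
def halved_size(height, width, factor=0.5, multiple=32):
    """Scale an input size by `factor` and snap each side to `multiple`.

    Both defaults are illustrative assumptions, not values from the repo.
    """
    def snap(v):
        # Round to the nearest multiple, but never go below one multiple.
        return max(multiple, int(round(v * factor / multiple)) * multiple)

    return snap(height), snap(width)


if __name__ == "__main__":
    print(halved_size(1280, 2240))
    print(halved_size(720, 1280))
```

Halving both sides cuts the number of pixels, and with it the activation memory of the convolutional layers, to roughly a quarter.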

ustczhouyu commented on May 26, 2024

@wenston2006 Did you manage to train successfully? How were the results?

ZDDEAN commented on May 26, 2024

Could you share a script for converting the SynthText format to the ICDAR format? Thanks!
