Comments (26)

summerwbb commented on June 1, 2024

Hello, what format should my labels be in?

from ocr.pytorch.

commented on June 1, 2024

@courao Could you release the CRNN training code?

courao commented on June 1, 2024

@deepseek Thank you for your interest. I'll update this repository at the end of my internship (around 20 Sep.).

courao commented on June 1, 2024

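@summerwbb
When training CTPN, for historical reasons we simply keep the training images and their label files in two separate folders. Each label file is named after its image file with the prefix 'gt_' and is saved with a 'txt' extension.
The contents of a label file follow the ICDAR17 MLT dataset format. A sample line:

x1,y1,x2,y2,x3,y3,x4,y4,Chinese,你好

The first 8 numbers on each line are the coordinates of the four corner points in the order top-left, top-right, bottom-right, bottom-left. The 9th field is the language and the 10th is the text content. Instances are separated by newlines.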
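The label layout described above can be sketched as a small parser. This is only an illustration, not the repository's own loader: the helper names `gt_name_for` and `parse_label_line` are hypothetical, and it assumes integer-valued pixel coordinates and that any extra commas belong to the text field.

```python
import os


def gt_name_for(image_name):
    """Label file name for an image: 'gt_' + image stem + '.txt' (hypothetical helper)."""
    stem, _ = os.path.splitext(os.path.basename(image_name))
    return "gt_" + stem + ".txt"


def parse_label_line(line):
    """Split one label line into (points, language, text) (hypothetical helper)."""
    # Drop a possible byte-order mark and surrounding whitespace.
    cleaned = line.lstrip("\ufeff").strip()
    # Split at most 9 times so commas inside the text stay in the last field.
    fields = cleaned.split(",", 9)
    if len(fields) < 10:
        raise ValueError("expected 10 comma-separated fields, got %d" % len(fields))
    coords = [int(float(v)) for v in fields[:8]]
    # Pair the numbers up: top-left, top-right, bottom-right, bottom-left.
    points = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return points, fields[8], fields[9]
```

For example, `gt_name_for("images/img_1.jpg")` gives `gt_img_1.txt`, and parsing a line yields the four corner points followed by the language and the text.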

summerwbb commented on June 1, 2024

For VOC format, should I provide xmin,ymin,xmax,ymax (with x along the image width)? Or should I convert the labels into boxes that are 16 pixels wide?

summerwbb commented on June 1, 2024

Could I add you on WeChat? w1432128357

courao commented on June 1, 2024

@summerwbb For VOC format you can refer to the pytorch_ctpn repository. I use the ICDAR format, so I'm not very familiar with how VOC is parsed.
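One possible route, which the thread does not confirm, is to rewrite each VOC box as a four-corner ICDAR-style line so the ICDAR loader can be reused. The sketch below assumes axis-aligned boxes with integer coordinates. The function name and the placeholder language and text values are hypothetical.

```python
def voc_box_to_icdar_line(xmin, ymin, xmax, ymax, language="Latin", text="placeholder"):
    """Turn an axis-aligned VOC box into an 8-coordinate label line.

    The language and text defaults are placeholders (an assumption),
    not values required by any dataset or loader.
    """
    # Corner order: top-left, top-right, bottom-right, bottom-left.
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    coords = ",".join("%d,%d" % (x, y) for x, y in corners)
    return "%s,%s,%s" % (coords, language, text)
```

Whether the boxes then need to be cut into 16-pixel-wide pieces depends on the loader and is not settled by this sketch.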

courao commented on June 1, 2024

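@CCxiaoLL Files in GIF format are probably animated images. You can delete them; they make up a small share of the data, so it shouldn't matter much.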

summerwbb commented on June 1, 2024

@courao Thank you very much for your answer! I can train now.

lofyol commented on June 1, 2024

Hello, when I train CTPN it fails at Ep:1/29--Batch:100/10000 with AttributeError: 'NoneType' object has no attribute 'shape'. I'm training on the 2019 MLT data and don't know why this happens.

courao commented on June 1, 2024

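@CCxiaoLL
Change this block of code. When reading an image fails (for example because the file is corrupted), I substitute a default image:

    #####for read error, use default image#####
    if img is None:
        print(img_path)
        # Record the unreadable file so it can be inspected later.
        with open('error_imgs.txt', 'a') as f:
            f.write('{}\n'.format(img_path))
        img_name = 'img_2647.jpg'
        img_path = os.path.join(self.datadir, img_name)
        img = cv2.imread(img_path)
    #####for read error, use default image#####

I use 'img_2647.jpg' here. You can swap in a different image and it should be fine.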
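The reason the default image matters: `cv2.imread` returns `None` rather than raising when a file is missing or cannot be decoded. If the fallback file is absent from your dataset, `img` stays `None` and the later `.shape` access fails. A minimal sketch of the same fallback with an explicit check follows. The function name and the injectable `reader` argument are hypothetical, not the repository's code; `reader` is assumed to be any callable that returns `None` on failure, such as `cv2.imread`.

```python
def load_with_fallback(img_path, fallback_path, reader, error_log="error_imgs.txt"):
    """Read an image, falling back to a default image if reading fails.

    `reader` is any callable that returns None when a file cannot be read
    (for example cv2.imread); it is injected so this sketch stays self-contained.
    """
    img = reader(img_path)
    if img is None:
        # Record the unreadable file so it can be inspected later.
        with open(error_log, "a") as f:
            f.write("{}\n".format(img_path))
        img = reader(fallback_path)
        if img is None:
            # Fail loudly here instead of on a later `.shape` access.
            raise FileNotFoundError(
                "fallback image could not be read: {}".format(fallback_path)
            )
    return img
```

With OpenCV installed, a call might look like `load_with_fallback(img_path, default_path, cv2.imread)`, where `default_path` points at an image known to exist.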

lofyol commented on June 1, 2024

Hello, one more question. When training reaches Batch:331/10000, it fails at for batch_i, (imgs, clss, regrs) in enumerate(dataloader) with TypeError: function takes exactly 5 arguments (1 given). What could cause this kind of error? Is the problem in the dataloader?

courao commented on June 1, 2024

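@CCxiaoLL I'm not sure about this one. If convenient, print the current image's file name on every iteration in dataset.py to find out which image triggers the error, then send us the image and its label and we can help you track down the problem.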
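An alternative to printing inside the dataset is to walk it one sample at a time, outside the DataLoader, and collect every index that raises. The sketch below assumes the dataset supports `len()` and integer indexing, as PyTorch map-style datasets do. The function name and the optional `describe` callback (mapping an index to a file name) are hypothetical.

```python
def find_failing_samples(dataset, describe=None):
    """Index every sample once and collect the ones that raise.

    `describe` is an optional callable mapping an index to something
    readable, such as the image file name (a hypothetical hook).
    """
    failures = []
    for index in range(len(dataset)):
        try:
            dataset[index]
        except Exception as exc:
            name = describe(index) if describe else index
            failures.append((name, repr(exc)))
    return failures
```

Because this runs in a single process and in order, each reported failure maps directly to one sample.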

lofyol commented on June 1, 2024

I printed the image path in the getitem function of the ICDARDatasets class:
exist pretrained False
Epoch 1/30
##################################################
./images/tr_img_00001.jpg
./images/tr_img_00002.jpg
./images/tr_img_00003.jpg
Ep:1/29--Batch:0/10000
batch: loss_cls:0.7115--loss_regr:0.1776--loss:0.8892
Epoch: loss_cls:0.7115--loss_regr:0.1776--loss:0.8892
Ep:1/29--Batch:1/10000
batch: loss_cls:0.7349--loss_regr:0.2523--loss:0.9872
Epoch: loss_cls:0.7232--loss_regr:0.2150--loss:0.9382
./images/tr_img_00004.jpg
Ep:1/29--Batch:2/10000
batch: loss_cls:0.7296--loss_regr:0.1385--loss:0.8682
Epoch: loss_cls:0.7253--loss_regr:0.1895--loss:0.9148
./images/tr_img_00005.jpg
Ep:1/29--Batch:3/10000
batch: loss_cls:0.7352--loss_regr:0.2917--loss:1.0269
Epoch: loss_cls:0.7278--loss_regr:0.2151--loss:0.9429
./images/tr_img_00006.jpg
./images/tr_img_00007.jpg
Ep:1/29--Batch:4/10000
batch: loss_cls:0.7357--loss_regr:0.2526--loss:0.9883
Epoch: loss_cls:0.7294--loss_regr:0.2226--loss:0.9519
Ep:1/29--Batch:5/10000
batch: loss_cls:0.7288--loss_regr:0.2467--loss:0.9755
Epoch: loss_cls:0.7293--loss_regr:0.2266--loss:0.9559
It goes on like this until the error: img_path and batch_i do not line up. Changing num_workers in the config file didn't help either. It always fails right after Batch: 331/10000.

courao commented on June 1, 2024

@CCxiaoLL
Please send the 3 images before and after ./images/tr_img_00331.jpg along with their labels. I suspect one of those labels has a problem.
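If a label is the suspect, the label folder can also be checked in bulk before training. This is a rough sketch, not the repository's code: the function name is hypothetical, and it assumes UTF-8 label files in the 10-field layout described earlier in the thread.

```python
import os


def find_bad_label_lines(gt_dir):
    """Return (file name, line number, reason) for every malformed label line."""
    problems = []
    for name in sorted(os.listdir(gt_dir)):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(gt_dir, name)
        # utf-8-sig drops a leading byte-order mark if one is present.
        with open(path, encoding="utf-8-sig", errors="replace") as f:
            for line_no, line in enumerate(f, start=1):
                line = line.strip()
                if not line:
                    continue
                fields = line.split(",", 9)
                if len(fields) < 10:
                    problems.append((name, line_no, "fewer than 10 fields"))
                    continue
                try:
                    [float(v) for v in fields[:8]]
                except ValueError:
                    problems.append((name, line_no, "non-numeric coordinate"))
    return problems
```

An empty result only rules out formatting problems. It says nothing about whether the image files themselves can be read.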

lofyol commented on June 1, 2024

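I only added an images folder and a labels folder under train_ctpn, changed icdar17_mlt_img_dir and gt_dir in the config file to the corresponding paths, changed gt_path in the datasets file, and set shuffle to false in the ctpn_train file. I didn't change anything else. I'm using the 2019 MLT training data. I'm not sure how to upload images and labels here, so here is a Baidu link: https://pan.baidu.com/s/1-D5Aztiz847JP2q12rt9JQ . Sorry for the trouble, and thank you.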

lofyol commented on June 1, 2024

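One more question. If I delete the image and label at the failing position, training continues until the next failing position. What I don't understand is that, in the 2019 MLT data I downloaded, the images and labels at the failing positions look no different from the others. Do you know what might be causing this?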

courao commented on June 1, 2024

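@CCxiaoLL I looked at your data and tested it, and haven't found a problem so far. If you set num_workers to 0, img_path and batch_i will line up. Deleting the images and labels at the failing positions only treats the symptom, so if possible, could you pin down exactly which image is causing the problem?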

foocker commented on June 1, 2024

I'm waiting for your release. Thanks.
I have completed it... emmm

courao commented on June 1, 2024

Hi, the training code for CRNN has been updated.
The recognition API has been updated as well.

17350220163 commented on June 1, 2024

Hello, I have a question: what are the initial pretrained weights?

foocker commented on June 1, 2024

> Hi, the training code for CRNN has been updated.
> The recognition API has been updated as well.

Will the detection module be changed to support arbitrary orientations?

ZHANG-hengzhi commented on June 1, 2024

Experts, I'd like to ask why the CTPN model I trained myself detects nothing at prediction time. I'm detecting bank cards.

liyihao1230 commented on June 1, 2024

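Hello,
I have a question about the CTPN training part.

From the code, it looks like the last two fields of the ICDAR dataset format (x1,y1,x2,y2,x3,y3,x4,y4,Chinese,你好), 'Chinese,你好', are not used.
My understanding is that cls indicates whether each anchor contains text (0 or 1), and that this cls is derived from regr.
Is that right?

Also, when adding my own training data, leaving out the last two fields ('Chinese,你好') shouldn't affect the results, right?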

liyihao1230 commented on June 1, 2024

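Looking further, cls seems to be separate from regr, but each goes through a 1*1 convolution layer and is then reshaped. The result should be a probability for a 0-or-1 classification, right?

Would it be OK to use softmax here?
Also, using just a single 1*1 convolution layer would work too, right?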

xxllp commented on June 1, 2024

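> Experts, I'd like to ask why the CTPN model I trained myself detects nothing at prediction time. I'm detecting bank cards.

What did you annotate in your samples?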
