whai362 / psenet

1.2K 55.0 343.0 16.51 MB

Official PyTorch implementation of PSENet.

License: Apache License 2.0

Python 99.00% Shell 0.15% Cython 0.85%

icdar2015 total-text ctw1500 psenet

psenet's Introduction

My homepage: https://whai362.github.io/

psenet's People

Contributors

Stargazers

Watchers

Forkers

fendaq likehuaer daoyijushi zhuguangqiang cwbjyy zgsxwsdxg yuckfu tobechao caozhengquan dreadlord1984 yuanhang8605 rkshuai wuyunxiangwyx lijun20 tsing-cv cwfxcz higherwang deftruth xwb123 zhangjiekui ieee820 noirwinter xiaomaxiao ouya-bytes liuheng92 chiukin pzw520125 yisampi runauto barongeng blackarrow3542 jeffrey98-ai jstzwjr teresasun521 jacklongking happog 10183308 hbulaoma yiran-thu pkang2017 alwc holygen duanjiaqi hityzy1122 fengdashuai taowenleon shulga xiaoyubing billyzju stoneabc sanster jiangxiaoyan normalappler fanofjava interstate50 leo-xxx keyky bensonlp janzd xiangliu886 jhaiwang zhaoqj2016 fujingling xiaowen-ttkx dfayzur float123 zenozhouzhao ionvision huibinny xxfly hxf930620 yushenxiang zhenxing9968 shiyuan0806 lzd0825 xuliangfrdc xiaohuihuichao 15802662151 vincyqian kk52099 viig99 yuxianmeng 1252187392 aiplus2019 kts12345 utilefuzzball lucaslu1987 huziling ericdoug-qi zhengjiawen liuwenhaha wangjie8682 eglxiang sapjunior yifan-zhao lijian10086 lilianglaoding xieenze wuzuowuyou zhenxingsh

psenet's Issues

progressive scale expansion algorithm

Can you open-source the progressive scale expansion algorithm on its own?
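
A minimal pure-Python sketch of the progressive scale expansion idea as the paper describes it: label the connected components of the smallest kernel, then grow those labels breadth-first through each larger kernel, with each pixel going to whichever label reaches it first. This is not the repository's compiled implementation; the function names are hypothetical, and details such as discarding tiny components by area are omitted.

```python
from collections import deque

import numpy as np

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def connected_components(binary):
    """Label 4-connected foreground regions; returns an int32 map (0 = background)."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                count += 1
                labels[y, x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """kernels: binary maps ordered from the smallest kernel S_1 to the full map S_n."""
    labels = connected_components(kernels[0])
    h, w = labels.shape
    for kernel in kernels[1:]:
        # seed the breadth-first search with every pixel labeled so far
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = cy + dy, cx + dx
                # claim unlabeled pixels inside the larger kernel; first come, first served
                if 0 <= ny < h and 0 <= nx < w and kernel[ny, nx] and labels[ny, nx] == 0:
                    labels[ny, nx] = labels[cy, cx]
                    queue.append((ny, nx))
    return labels
```

Two seeds that share one larger kernel stay separate instances: the boundary pixel goes to the label that reaches it first, which is what keeps adjacent text lines apart.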

How to handle the coordinates of skewed-text and curved-text datasets when training on them together

How should the coordinates of skewed-text and curved-text datasets be handled when they are trained together? Irregular text comes in many shapes. Should the CTW1500 polygons be reduced to 4 coordinate pairs? And what about the superfluous background that would then be included at inference?
By the way, could you explain the CTW1500 coordinate format in detail? Thanks a lot.
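
Because PSENet is supervised with pixel-level segmentation maps rather than regressed coordinates, one way to mix 4-point quadrilaterals with 14-point CTW1500 polygons is to rasterize every polygon into a mask, so the vertex count stops mattering. The sketch below is a hypothetical pure-NumPy helper using the even-odd rule on pixel centers; a library routine such as OpenCV's fillPoly does the same job in practice.

```python
import numpy as np


def polygon_mask(points, h, w):
    """Rasterize a polygon with any number of (x, y) vertices into an h x w bool mask.

    Even-odd rule, tested at pixel centers (x + 0.5, y + 0.5).
    """
    pts = np.asarray(points, dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((h, w), dtype=bool)
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        # does this edge cross the horizontal line through the pixel center?
        crosses = (y1 > py) != (y2 > py)
        # x of the crossing; horizontal edges give inf/nan but never have crosses=True
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_cross)
    return inside
```

A square given with 4 vertices and the same square given with 8 vertices produce identical masks, which is the property that lets the two annotation styles share one training pipeline.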

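How are images resized to 640×640 during training or testing?

Looking at the author's code in icdar2015_loader.py, I found that the resizing is done with the random_crop function. What is the principle behind it, and is resizing this way reasonable?
Could someone experienced explain? Thanks very much.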
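
A crop does not rescale anything: it cuts a fixed 640×640 window out of the (possibly already rescaled) image, and the same window must be cut from every label map so they stay aligned. The sketch below is a plain uniform crop with zero padding for small images; the helper name is hypothetical, and the repository's random_crop may choose the window differently (the issue does not show its code).

```python
import numpy as np


def random_crop_pad(imgs, size=640, rng=None):
    """Cut the same random size x size window from every array in imgs.

    imgs: arrays sharing height and width (image, text mask, kernel maps, ...).
    Windows that fall short of size are zero-padded on the bottom/right.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = imgs[0].shape[:2]
    top = rng.integers(0, max(h - size, 0) + 1)
    left = rng.integers(0, max(w - size, 0) + 1)
    out = []
    for img in imgs:
        patch = img[top:top + size, left:left + size]
        pad = [(0, size - patch.shape[0]), (0, size - patch.shape[1])]
        pad += [(0, 0)] * (img.ndim - 2)  # leave channel axes untouched
        out.append(np.pad(patch, pad))
    return out
```

Because pixels keep their original scale, text sizes in the crops follow whatever scale augmentation was applied before cropping, unlike an EAST-style crop followed by a direct resize.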

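Question about the coordinates of output detections

The output coordinates are written to a txt file, but the starting vertex of each detected text box is not fixed: some start from the bottom-left vertex, others from the bottom-right. How can I make every line in the txt file start from the same vertex?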
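
The issue does not show how the boxes are produced, but the ordering can be fixed as a post-processing step. A minimal sketch with a hypothetical helper name: sort the four vertices by angle around their centroid, which in image coordinates (y pointing down) gives clockwise order on screen, then rotate the list so the vertex with the smallest x + y (the top-left) comes first.

```python
import math


def order_clockwise(points):
    """Return quad vertices starting at the top-left, clockwise in image coordinates."""
    cx = sum(p[0] for p in points) / float(len(points))
    cy = sum(p[1] for p in points) / float(len(points))
    # with y pointing down, increasing atan2 angle runs clockwise on screen
    pts = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # rotate so the vertex with the smallest x + y comes first
    start = min(range(len(pts)), key=lambda i: pts[i][0] + pts[i][1])
    return pts[start:] + pts[:start]


print(order_clockwise([(10, 10), (0, 10), (0, 0), (10, 0)]))
```

For boxes rotated close to 45°, two vertices can nearly tie on x + y, so the chosen start may flip; a stricter rule (for example topmost, then leftmost) can be swapped in.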

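About detection performance on curved-text datasets

Hello, the current code only includes data loading, evaluation, and testing for ICDAR2015. How do I test on curved-text datasets (CTW-1500 or Total-Text)?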

Pre-trained models on other cloud storage?

Hi,

I cannot access Baidu Yun. It would be much appreciated if anyone could share the models on Dropbox, Google Drive, or OneDrive.

Thanks.

Hello, has the code been open-sourced yet?

ValueError: need more than 2 values to unpack

Testing with the trained model:

CUDA_VISIBLE_DEVICES=0 python test_ctw1500.py --scale 1 --resume ./model/ctw1500_res50_pretrain_ic17.pth.tar
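
The issue includes no traceback, so the failing line is unknown. In Python 2 this message means a 2-tuple was unpacked into three names. One well-known trigger in OpenCV code is cv2.findContours, which returns (image, contours, hierarchy) in OpenCV 3.x but (contours, hierarchy) in OpenCV 2.4 and 4.x; whether that is the culprit here is an assumption. A version-agnostic sketch with a hypothetical helper name:

```python
def contours_from(find_contours_result):
    """Extract contours from cv2.findContours output in any OpenCV version.

    OpenCV 3.x returns (image, contours, hierarchy); 2.4 and 4.x return
    (contours, hierarchy). Contours are second to last in both layouts.
    """
    return find_contours_result[-2]
```

Usage would be contours = contours_from(cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)) in place of a fixed three-name unpacking.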

Where does gt.zip come from, and any suggestions about this result?

1. I trained on ICDAR2015 Challenge 4 for 600 epochs with batch_size changed to 32; the log looks like this:
0.000010 0.394331 0.889751 0.852450
0.000010 0.416218 0.898702 0.863312
0.000010 0.406228 0.874345 0.837213
0.000010 0.378316 0.900977 0.864393

2. Testing produced this puzzling result:

Eval result is
Calculated!{"recall": 0.0, "precision": 0.0, "hmean": 0, "AP": 0}
What went wrong? I followed the README exactly, changing only batch_size from 16 to 32, and the training log matches yours.

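Great work

Is there any code or reproduction available? I'm very interested in this work... looking forward to the release of the code and more of the paper.

Finally, the wait is over!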

About OHEM

Dear author,
It's an honor for me to read your work, Shape Robust Text Detection with Progressive Scale Expansion Network, which is excellent. However, I am a little confused about how to apply OHEM to segmentation, since it was originally designed for detection.
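
For segmentation, OHEM can be applied per pixel: every positive (text) pixel is kept, and only the highest-scoring negative pixels, up to three times the number of positives (the 1:3 ratio used in the paper), enter the loss. The NumPy sketch below shows that selection; the function name and the fallback when there are no positives are assumptions, not the repository's code.

```python
import numpy as np


def ohem_mask(score, gt_text, training_mask, neg_ratio=3):
    """Bool mask of pixels for the text loss: all positives plus the hardest negatives."""
    valid = training_mask > 0.5
    pos = (gt_text > 0.5) & valid
    neg = (gt_text <= 0.5) & valid
    pos_num = int(pos.sum())
    neg_num = min(pos_num * neg_ratio, int(neg.sum()))
    if pos_num == 0 or neg_num == 0:
        return valid  # fallback (assumption): use every valid pixel
    # hardest negatives = background pixels with the highest predicted text score
    neg_scores = np.sort(score[neg])[::-1]
    threshold = neg_scores[neg_num - 1]
    return pos | (neg & (score >= threshold))
```

Ties at the threshold can admit slightly more than neg_num negatives; that is harmless for a loss mask.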

Any plans to support Python 3?

Where is the code? Also, more documentation, please

@whai362 Where is the code for PSENet?
Please also include documentation on how to train.

A high resolution is required to get good results.

Hi, we have implemented your method in TensorFlow. We find that to get good results we have to resize images to a very large size, so it's not very efficient in practice. We now use your method to detect long text at large angles; for normal text and horizontal long text lines, we have a much faster method.

waiting for code!

Waiting for the open-source release!

Asking for the proof of the formula for generating d

Hello, I am confused about your formula for computing d, which uses r². Could you explain the relationship between r and d and how the formula is derived? Thank you!

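Loading CTW1500

import io

import numpy as np

def get_bboxes(img, gt_path):
    """Read CTW1500 polygons from gt_path, normalized by image width and height."""
    h, w = img.shape[0:2]
    # 'utf-8-sig' drops a leading UTF-8 BOM ('\xef\xbb\xbf'), replacing util.str.remove_all
    with io.open(gt_path, encoding='utf-8-sig') as f:
        lines = [line.strip() for line in f if line.strip()]
    bboxes = []
    tags = []
    for line in lines:
        gt = line.split(',')

        # (x1, y1): reference point; the 14 polygon points are offsets from it
        x1 = int(gt[0])
        y1 = int(gt[1])

        # gt[4:32]: 14 (x, y) offset pairs; gt[2] and gt[3] are not used here
        bbox = np.asarray([int(v) for v in gt[4:32]], dtype=np.float64)
        bbox = bbox + np.array([x1, y1] * 14, dtype=np.float64)
        # normalize x by width and y by height
        bbox = bbox / np.array([w, h] * 14, dtype=np.float64)

        bboxes.append(bbox)
        tags.append(True)
    return np.array(bboxes), tags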

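What is the format of the CTW-1500 annotation files? I couldn't find a detailed explanation. Each line has 32 values; 14 points account for 28 coordinate values, so what are the extra 4 values?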

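Question about ls_loss

"ignore the pixels of non-text region in the segmentation result Sn to avoid a certain redundancy."

Only pixels with Sn > 0.5 take part in the computation, but early in training Sn presumably predicts nothing, so wouldn't ls_loss be 0?
Are there more details?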
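
The paper's shrink loss is L_s = 1 − Σ_{i=1}^{n−1} D(S_i·W, G_i·W) / (n − 1), with W = 1 where S_n > 0.5 and 0 elsewhere, and D the dice coefficient 2ΣS·G / (ΣS² + ΣG²). A NumPy sketch follows; the smoothing term eps is an assumption. With it, an all-zero W does not give a loss of 0: the dice coefficient becomes 0, so every kernel term contributes 1.

```python
import numpy as np


def dice_loss(pred, gt, mask, eps=1e-3):
    """1 - dice coefficient over the masked pixels; eps (assumed) avoids 0/0."""
    pred = pred * mask
    gt = gt * mask
    inter = np.sum(pred * gt)
    denom = (np.sum(pred * pred) + eps) + (np.sum(gt * gt) + eps)
    return 1.0 - 2.0 * inter / denom


def shrink_loss(kernel_preds, kernel_gts, full_pred, thresh=0.5):
    """Average dice loss of S_1..S_{n-1}, restricted to pixels where S_n > thresh."""
    w = (full_pred > thresh).astype(np.float64)  # the mask W
    losses = [dice_loss(p, g, w) for p, g in zip(kernel_preds, kernel_gts)]
    return float(np.mean(losses))
```

So early in training, when S_n predicts little text, L_s sits near 1 rather than 0; it simply carries little useful gradient until the complete-text map improves.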

FPS

Running the author's test_ic15.py, I get only 0.65 FPS on a K40c GPU with 12 GB. How can I speed it up?

Error: adaptor.so: undefined symbol: _Py_ZeroStruct

Hello, after building adaptor.so with the Makefile, importing it fails with "undefined symbol: _Py_ZeroStruct". What is causing this?
Has anyone else run into it?
Thanks
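
_Py_ZeroStruct belongs to the Python 2 C API (it backs Py_False); Python 3 has no such symbol. So the error suggests adaptor.so was compiled against Python 2 headers but is being imported by a Python 3 interpreter (an inference from the symbol name, not confirmed in the issue). The snippet below, run with the same interpreter that imports the module, prints the version, header directory, and extension suffix the build should target, for comparison with the paths in the Makefile.

```python
import sys
import sysconfig


def build_info():
    """Details a C extension must be compiled against to load in this interpreter."""
    return {
        "version": "%d.%d" % sys.version_info[:2],
        "include_dir": sysconfig.get_paths()["include"],
        # e.g. '.cpython-36m-x86_64-linux-gnu.so' on Linux; None on Python 2
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
    }


if __name__ == "__main__":
    for key, value in build_info().items():
        print("%s: %s" % (key, value))
```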

Network output

Why does the final network output not use a sigmoid, but instead outputs = (torch.sign(outputs - args.binary_th) + 1) / 2?
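
The expression is a hard threshold, not a probability: it yields 1 where the output exceeds binary_th, 0 where it is below, and 0.5 exactly at the threshold. Because the sigmoid is strictly increasing, thresholding raw outputs at t selects the same pixels as thresholding sigmoid outputs at sigmoid(t), so a sigmoid before binarization would not change the binary map. This assumes the values being thresholded are the raw network outputs, which the issue does not show. A NumPy equivalent:

```python
import numpy as np


def binarize(outputs, binary_th):
    """(sign(x - th) + 1) / 2: 1 above the threshold, 0 below, 0.5 exactly at it."""
    return (np.sign(outputs - binary_th) + 1) / 2


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([-2.0, 1.0, 3.0])
print(binarize(x, 1.0))  # values 0, 0.5 and 1
```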

About ICDAR2015

Hi, I'd like to know your ICDAR2015 result when fine-tuning from a model trained on another dataset, because my own result is only 76% when training on ICDAR2015 alone.

Training Data used

Thanks for sharing your work. Was the posted model trained only on ICDAR2015, or was it pre-trained on ImageNet/SynthText?

Thanks in advance.

Is FPN important?

I find that FPN doesn't improve the results; is that right?

What is the F-score on RCTW2017?

What is the F-score on RCTW2017? Can you tell us some details?

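Model

Hello, both model links (Baidu Yun and OneDrive) open and download fine, but the downloaded files are corrupted and fail to extract. Others and I have tried on different computers, always with the same result. @whai362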
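
Checkpoints named .pth.tar are usually not tar archives despite the extension: torch.save writes a pickle stream (older PyTorch) or a zip container (PyTorch 1.6 and later), and the file is meant to be passed to torch.load rather than extracted. Whether that explains the "corrupted" reports is an assumption. A hypothetical helper that guesses what a downloaded file actually is from its contents:

```python
import tarfile
import zipfile


def checkpoint_format(path):
    """Guess the container format of a downloaded checkpoint file."""
    if zipfile.is_zipfile(path):
        return "zip"  # e.g. torch.save output from PyTorch >= 1.6
    if tarfile.is_tarfile(path):
        return "tar"
    with open(path, "rb") as f:
        head = f.read(2)
    if head[:1] == b"\x80":
        return "pickle"  # pickle protocol >= 2, e.g. legacy torch.save output
    return "unknown"
```

A result of "pickle" or "zip" means the file should be loaded directly; "unknown" (for example an HTML error page saved under the checkpoint's name) points to a genuinely failed download.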

ERROR: PyUnicodeUCS2_AsUTF8String

Apart from recompiling Python, is there any other solution?
Traceback (most recent call last):
  File "/PSENet/test_ic15.py", line 19, in <module>
    from pse import pse
  File "/PSENet/pse/__init__.py", line 11, in <module>
    from .adaptor import pse as cpse
ImportError: */PSENet/pse/adaptor.so: undefined symbol: PyUnicodeUCS2_AsUTF8String
Environment:
conda
Python 2.7.13
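
In Python 2, the C API's Unicode functions carry the interpreter's internal width in their names: PyUnicodeUCS2_* for narrow builds and PyUnicodeUCS4_* for wide ones. The error therefore suggests adaptor.so was compiled against headers from a narrow (UCS2) Python while the conda Python 2.7.13 loading it is a different build, so rebuilding adaptor.so against the running interpreter's own headers may be enough, without recompiling Python (an inference from the symbol name). The helper below, with a hypothetical name, reports the running build via sys.maxunicode, which is 0xFFFF on narrow builds and 0x10FFFF on wide ones (and always 0x10FFFF since Python 3.3).

```python
import sys


def unicode_build(maxunicode=None):
    """Classify a Python build by sys.maxunicode: 'UCS2' (narrow) or 'UCS4' (wide)."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


if __name__ == "__main__":
    print(unicode_build())
```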

The MLT2017 data used for pre-training

The model trained on ICDAR2015, starting from the model pre-trained on MLT2017, cannot detect Chinese words. Did the MLT2017 pre-training not use the Chinese data? Which MLT2017 data were used for pre-training? Thanks.

About the settings of your experiments

I use ResNet-152 as the backbone with a batch of 16×3×640×640, as advised in the paper, on an NVIDIA K40 with 12 GB of memory, but training fails with "Out of Memory". Your experiments were run on NVIDIA 1080 Ti GPUs, which have only 11 GB. Can you provide some details about the experimental settings?

What does training_mask mean?

Hi, can you tell me what training_mask is?
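
A common meaning of such a mask in text detectors, offered here as a hedged guess at its role in this repository: a per-pixel weight that is 0 over "do not care" text regions (for example ICDAR2015 boxes transcribed as ###) and 1 elsewhere, so those pixels are excluded from the loss. A sketch with hypothetical names, built from an instance label map and a list of ignored instance ids:

```python
import numpy as np


def make_training_mask(instance_map, ignored_ids):
    """1 where pixels count toward the loss, 0 inside ignored text instances."""
    mask = np.ones(instance_map.shape, dtype=np.float32)
    for inst_id in ignored_ids:
        mask[instance_map == inst_id] = 0
    return mask
```

Multiplying per-pixel losses by this mask (or passing it to a masked loss) keeps the network from being penalized for either detecting or missing illegible text.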

About pretraining on ICDAR2017

Dear Author:
Thanks for releasing the code!
However, I'm a little confused about the pre-training on ICDAR2017 listed in the results on GitHub. It differs slightly from the setting in the paper, where ICDAR2017 and ICDAR2015 are mixed into a single training set. Could you provide more training details (epochs, lr, lr_scheduler, and so on) for the ICDAR2017 pre-training process? Thanks!
Best Wishes!

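Why are images resized to such a large size (over 2000) at test time?

I don't quite understand; could someone experienced explain? The training images are nowhere near that large.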
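
A long-side resize keeps the aspect ratio and scales the longer side to a target length. The sketch below only shows what that does to a 1280×720 ICDAR2015 image; it does not explain the authors' choice. The target of 2240 and the rounding to multiples of 32 (so feature maps at stride 32 align) are assumptions, since the issue only says "over 2000".

```python
def scaled_size(h, w, long_size=2240, multiple=32):
    """Resize (h, w) so the longer side is about long_size, keeping the aspect ratio.

    Both sides are rounded to a multiple of `multiple` (assumed 32 here).
    """
    scale = long_size / float(max(h, w))
    new_h = max(multiple, int(round(h * scale / multiple)) * multiple)
    new_w = max(multiple, int(round(w * scale / multiple)) * multiple)
    return new_h, new_w


print(scaled_size(720, 1280))              # (1248, 2240)
print(scaled_size(720, 1280, multiple=1))  # (1260, 2240)
```

So an ICDAR2015 test image is enlarged by a factor of 1.75, whereas training sees 640×640 crops; the crop size limits the training input, not the scale of the text inside it.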

Is my result right?

ResNet-50, 1s, no pre-training, with the author-provided ic_2015 weights:
Calculated!{"recall": 0.7881559942224362, "precision": 0.8309644670050761, "hmean": 0.80899431677786, "AP": 0}
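
The hmean reported by the ICDAR evaluation is the harmonic mean of precision and recall, taken as 0 when both are 0. Recomputing it from the precision and recall above is a quick consistency check (hypothetical helper name):

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


print(hmean(0.8309644670050761, 0.7881559942224362))  # about 0.808994
```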

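Data augmentation

In the paper, it is mentioned that
…; does this mean a 640×640 patch is cropped directly from the image, or, as in EAST, a random region is cropped and its width and height are then resized directly to 640?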

Is ResNet-50 the backbone used in ICPR MTWI 2018 Challenge 2, or something else?

In ICPR MTWI 2018 Challenge 2, your result is F = 75.2. Which backbone did you use: ResNet-50, ResNet-101, or ResNet-152?

1×1×1 repeated n times, or 1×1×n?

Hi, I want to know whether the output score map is produced by a 1×1×1 layer repeated n times or by a single 1×1×n layer. Looking forward to your reply.
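
Assuming the question is about the final 1×1 convolution that produces the n kernel maps: the two options compute the same thing. A 1×1 convolution is a per-pixel matrix multiply over channels, so n single-output 1×1 convolutions with independent weights are exactly one 1×1 convolution with n outputs whose weight rows are stacked. This does not say which form the authors wrote; the helper name is hypothetical.

```python
import numpy as np


def conv1x1(features, weight, bias):
    """1x1 convolution: features (C, H, W), weight (K, C), bias (K,) -> (K, H, W)."""
    return np.einsum("kc,chw->khw", weight, features) + bias[:, None, None]


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 5))  # C=8 feature channels on a 4x5 map
w = rng.standard_normal((6, 8))     # n=6 output maps
b = rng.standard_normal(6)

joint = conv1x1(x, w, b)  # one 1x1xn layer
separate = np.concatenate([conv1x1(x, w[k:k + 1], b[k:k + 1]) for k in range(6)])
```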

All the model links are dead; could you update them?

Pretrained model "resnet101"

When training with the option --arch='resnet101', an error is raised:
  File "/root/data/workspace/PSENet/models/fpn_resnet.py", line 477, in resnet101
    pretrained_model = model_zoo.load_url(model_urls['resnet101'])
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/model_zoo.py", line 65, in load_url
    hash_prefix = HASH_REGEX.search(filename).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
Is the link invalid?
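
The traceback shows load_url extracting a hash prefix from the checkpoint's file name with a regular expression and calling .group(1) on the match. If the file name at the end of the URL has no "-<hex>." part, search returns None and this exact AttributeError follows, so the resnet101 entry in model_urls may point to a file name without that suffix (the URL itself is not shown in the issue). The sketch reproduces the check; the pattern is assumed to match the one in that PyTorch version, and the helper name is hypothetical.

```python
import re

# assumed to mirror HASH_REGEX in the PyTorch version from the traceback
HASH_REGEX = re.compile(r"-([a-f0-9]*)\.")


def hash_prefix(url):
    """Return the hex hash suffix in a checkpoint URL's file name, or None."""
    filename = url.rsplit("/", 1)[-1]
    match = HASH_REGEX.search(filename)
    return match.group(1) if match else None
```

torchvision's own ResNet-101 file name, resnet101-5d3b4d8f.pth, passes this check; a plain resnet101.pth would not.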

License?

Please add a proper license file so the code can be reused legally. MIT, Apache 2.0, or 3-clause BSD seem to be the most popular choices.

(Also, you added code from wkentaro/pytorch-fcn, which is MIT-licensed, so maybe MIT it is?)

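About training speed

My GPUs are two 1080s, batchsize=10, num_work=0, and I am training on a dataset of about 30,000 images. It is painfully slow: according to the displayed time, 400 epochs would take about 166 days. Is this normal, or is there any way to speed it up?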
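
A back-of-the-envelope check of the reported numbers (about 30,000 images, batch size 10, 400 epochs, about 166 days) gives the implied time per iteration. Roughly 12 s per iteration is likely far more than the forward and backward pass alone. If num_work is the DataLoader's num_workers, a value of 0 means all image decoding and augmentation run in the main process between steps, which is a plausible bottleneck; the issue includes no profiling, so that is an assumption.

```python
def seconds_per_iteration(total_days, epochs, num_images, batch_size):
    """Wall-clock seconds per training step implied by a total time estimate."""
    iters_per_epoch = num_images / float(batch_size)
    total_seconds = total_days * 24 * 3600
    return total_seconds / (epochs * iters_per_epoch)


print(round(seconds_per_iteration(166, 400, 30000, 10), 2))  # 11.95
```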

About ICDAR2017

Could you tell me the details of your ICDAR2017 result?

Label Generation

Hello, could you provide the code for label generation?

Code

When will the code be uploaded?

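The first five output maps are wrong

Hello, I use n=6. During training I found that the results of the first five maps are worse than those of the last one. I have trained for 20 epochs so far. Should I keep training, or is my Ls loss implemented incorrectly? Is W in Ls really required?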

[Question] Will the source code be available ?

Hello sir @whai362
This repository is 5 months old and, even without any source code in it, still gets tons of stars.
So when will you upload the source code?

A mistake in scale

In test_ic15.py, line 143:
scale = (org_img.shape[0] * 1.0 / pred.shape[0], org_img.shape[1] * 1.0 / pred.shape[1])
This may be wrong, since
shape[0] -> h
shape[1] -> w
so I think it should be
scale = (org_img.shape[1] * 1.0 / pred.shape[1], org_img.shape[0] * 1.0 / pred.shape[0]) @whai362
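
The point in the report can be checked in isolation. NumPy shapes are (height, width), while OpenCV contour points are (x, y), so a point's x must be multiplied by the width ratio and its y by the height ratio. Whether line 143 is actually wrong depends on how the tuple is applied afterwards, which the issue does not show. When the image was resized by the same factor in both directions, the two ratios are (nearly) equal and the order hardly matters. A sketch with a hypothetical helper name:

```python
import numpy as np


def rescale_points(points, org_shape, pred_shape):
    """Map (x, y) points from the prediction map back to the original image.

    org_shape and pred_shape are NumPy shapes, i.e. (height, width, ...).
    """
    sx = org_shape[1] * 1.0 / pred_shape[1]  # width ratio, applies to x
    sy = org_shape[0] * 1.0 / pred_shape[0]  # height ratio, applies to y
    return np.asarray(points, dtype=np.float64) * np.array([sx, sy])


print(rescale_points([[10, 10]], (720, 1280, 3), (360, 320)))
```

Here the width ratio is 4 and the height ratio is 2, so (10, 10) maps to (40, 20); swapping the ratios would give (20, 40).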