stvir / pmtd

Pyramid Mask Text Detector designed by SenseTime Video Intelligence Research team.

pmtd's Issues

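Problems encountered during training

Hello author, how should I run training? Does the training data need to be converted with the provided generate_icdar2017.py file? Beyond that, is it enough to run train.py directly? I look forward to your reply, and apologies for any disturbance.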

Why is the output of the network None?

I used the code to train on my own data, but I get 0 boxes when I run PMTD_demo.py. Can you tell me what to do?

I have implemented your PMTD ideas, but the results are not good enough.

Question about score threshold of Bbox Branch

Q1: When I set the score threshold to 0.05, the Mask R-CNN default, the precision was very low. When I set the score threshold to 0.5, the F-measure matched the reported score (88.20% on the ICDAR 2015 test set), but the recall and precision did not match the scores in the paper.

Method              Precision (%)   Recall (%)   F-measure (%)
Baseline of PMTD    85.84           90.55        88.14
Our Baseline        92.50           84.20        88.20

Q2: Have you done an ablation study on data augmentation, RPN anchors, and OHEM? In my experiments, data augmentation and OHEM improve the performance, but the RPN anchor modification does not help.
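
For Q1, it may help to recall that the F-measure is the harmonic mean of precision and recall, so two quite different precision/recall pairs can land on nearly the same F-measure. The sketch below recomputes the F-measure for the two rows of the table above, assuming the values are percentages; the function name `f_measure` is illustrative and not taken from the PMTD code. The recomputed values agree with the reported ones to within about 0.05 points, which is plausibly rounding.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows copied from the table in this issue: (precision, recall, reported F-measure).
rows = {
    "Baseline of PMTD": (85.84, 90.55, 88.14),
    "Our Baseline": (92.50, 84.20, 88.20),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F = {f_measure(p, r):.2f}, reported = {reported:.2f}")
```

A higher score threshold discards low-confidence boxes, which typically raises precision and lowers recall; the F-measure alone does not show where on that trade-off a model sits.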

About configurations

First, thank you for your paper and GitHub page.
Your work is very useful for studying text detection with a Mask R-CNN baseline.
I am reproducing the results of PMTD, but my results are a little worse (Mask R-CNN baseline: 60% F-measure on the MLT dataset),
so I am trying to figure out what is wrong with my configuration.
It would be very helpful if you could provide the config file (.yaml), or let me know the RPN.ANCHOR_STRIDE setting (currently, I'm using (4, 8, 16, 32, 64)).
Thanks!

RuntimeError: invalid argument 2: non-empty 4D input tensor expected but got: [0 x 256 x 14 x 14]

On executing tools/test_net.py, I am getting a runtime error. I am using the default configurations with the pretrained model. When I increase the value of IMS_PER_BATCH, the error vanishes, however, the predictions that I obtain after this are highly incomplete, with most of the words not being detected.

  File "tools/test_net.py", line 131, in <module>
    main()
  File "tools/test_net.py", line 116, in main
    output_folder=output_folder,
  File "/home/pranav/PMTD/maskrcnn_benchmark/engine/inference.py", line 82, in inference
    predictions = compute_on_dataset(model, data_loader, device, inference_timer)
  File "/home/pranav/PMTD/maskrcnn_benchmark/engine/inference.py", line 28, in compute_on_dataset
    output = model(images)
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pranav/PMTD/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 52, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets)
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pranav/PMTD/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 39, in forward
    x, detections, loss_mask = self.mask(mask_features, detections, targets)
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pranav/PMTD/maskrcnn_benchmark/modeling/roi_heads/mask_head/mask_head.py", line 71, in forward
    mask_logits = self.predictor(x)
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pranav/PMTD/maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_predictors.py", line 33, in forward
    x = F.relu(self.conv5_mask(x))
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/container.py", line 97, in forward
    input = module(input)
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/upsampling.py", line 134, in forward
    return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
  File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/functional.py", line 2523, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: invalid argument 2: non-empty 4D input tensor expected but got: [0 x 256 x 14 x 14] at /opt/conda/conda-bld/pytorch-nightly_1553749764730/work/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:21
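
The leading 0 in `[0 x 256 x 14 x 14]` suggests that the mask head was called with zero regions, that is, no detections survived for that batch, and the upsampling kernel in this PyTorch build rejects an empty input. One possible workaround is to skip the mask head when there is nothing to process. The sketch below shows only the guard pattern in plain Python; `run_mask_head` and `strict_head` are hypothetical names, not functions from maskrcnn_benchmark, and this is not confirmed as the authors' fix.

```python
from typing import Callable, List, Sequence


def run_mask_head(rois: Sequence[float],
                  mask_head: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """Call mask_head only when at least one ROI is present."""
    if len(rois) == 0:
        # Nothing to predict: return an empty result instead of calling the head.
        return []
    return mask_head(rois)


def strict_head(rois: Sequence[float]) -> List[float]:
    """Stand-in for a head that, like the kernel above, rejects empty input."""
    if len(rois) == 0:
        raise RuntimeError("non-empty input expected")
    return [r * 2.0 for r in rois]


print(run_mask_head([], strict_head))          # guard avoids the RuntimeError
print(run_mask_head([1.0, 2.5], strict_head))  # normal path
```

A guard like this only avoids the crash. It does not explain why no boxes were detected, which is a separate question (score threshold, weights, or input preprocessing).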

Question about Algorithm 1 Plane Clustering

Q1. Why does Algorithm 1 take the segmentation result of the whole image (H*W points) as input, while its output is only a single text bounding box (4 planes)?

Q2. What are the details of the INITPLANES function? What are the parameters (A, B, D) after calling it? I cannot tell from the paper.

Thanks!

Too many open files

demo

recommended configuration for a smaller batch size setting

Dear author,
Do you have a recommended configuration for a smaller batch size? I get NaN with batch_size=36 and LR=0.04, even when I use 1*binary_cross_entropy loss. When I reduce the LR to 0.004 or 0.001, the model does not seem to converge well. I even tried the AMSGrad optimizer with different LRs.

By the way, I calculate the cropped text area via cv2.findContours(). Is that OK?
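
Two heuristics often tried for small-batch training are scaling the learning rate linearly with the batch size and ramping it up with a linear warmup. Neither is confirmed in this thread as the authors' recipe for PMTD. The sketch below only illustrates the arithmetic; the function names and the defaults `warmup_iters=500` and `warmup_factor=1/3` are assumptions chosen for illustration.

```python
def scaled_lr(ref_lr: float, ref_batch: int, new_batch: int) -> float:
    """Linear scaling heuristic: LR proportional to batch size."""
    return ref_lr * new_batch / ref_batch


def warmup_lr(base_lr: float, iteration: int,
              warmup_iters: int = 500, warmup_factor: float = 1.0 / 3.0) -> float:
    """Linear warmup from warmup_factor * base_lr up to base_lr."""
    if iteration >= warmup_iters:
        return base_lr
    alpha = iteration / warmup_iters  # goes from 0 to 1 during warmup
    return base_lr * (warmup_factor * (1.0 - alpha) + alpha)


# Example: take the setting from this issue (batch 36, LR 0.04) as the
# reference and halve the batch size.
lr = scaled_lr(0.04, ref_batch=36, new_batch=18)
for it in (0, 250, 500, 1000):
    print(it, round(warmup_lr(lr, it), 6))
```

If NaN losses persist even with warmup, the cause may lie elsewhere (for example in the data or the loss), which this sketch does not address.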

Train error

@JingChaoLiu Hello, I get an error when I run PMTD_demo.py with --method="PlaneClustering", but with --method="HardThreshold" it works. I don't know why. I am using a model that I trained myself.

Traceback (most recent call last):
  File "/home/donglin/projects/PMTD-inference/demo/PMTD_demo.py", line 104, in <module>
    main()
  File "/home/donglin/projects/PMTD-inference/demo/PMTD_demo.py", line 84, in main
    predictions = pmtd_demo.run_on_opencv_image(image)
  File "/home/donglin/projects/PMTD-inference/demo/predictor.py", line 175, in run_on_opencv_image
    predictions = self.compute_prediction(image)
  File "/home/donglin/projects/PMTD-inference/demo/predictor.py", line 223, in compute_prediction
    masks = self.masker.forward_single_image(masks, prediction)
  File "/home/donglin/projects/PMTD-inference/demo/inference.py", line 27, in forward_single_image
    for mask, box in zip(masks, boxes.bbox)
  File "/home/donglin/projects/PMTD-inference/demo/inference.py", line 27, in <listcomp>
    for mask, box in zip(masks, boxes.bbox)
  File "/home/donglin/projects/PMTD-inference/demo/inference.py", line 44, in reg_pyramid_in_image
    planes = plane_clustering(pos_points, planes)
  File "/home/donglin/projects/PMTD-inference/demo/inference.py", line 87, in plane_clustering
    A = torch.gels(B, X)[0][:3]
RuntimeError: Lapack Error in gels : The 1-th diagonal element of the triangular factor of A is zero at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/TH/generic/THTensorLapack.cpp:165
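
A zero on the diagonal of the triangular factor generally means the least-squares system is rank-deficient, for example when the points passed to the solver are too few or all lie on one line. Whether that is what happens with a self-trained model here is an assumption, not something confirmed in this thread. The sketch below fits a plane z = a*x + b*y + c in plain Python and returns `None` for degenerate input instead of raising; `fit_plane`, `det3`, and the `eps` tolerance are illustrative names and values, not part of the PMTD code.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points: List[Point], eps: float = 1e-9) -> Optional[Tuple[float, float, float]]:
    """Least-squares fit of z = a*x + b*y + c; None if the fit is degenerate."""
    n = len(points)
    if n < 3:
        return None
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations M @ (a, b, c) = rhs.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    d = det3(m)
    if abs(d) < eps:
        return None  # (x, y) positions are collinear or coincident
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column with rhs
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    return (coeffs[0], coeffs[1], coeffs[2])


print(fit_plane([(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6)]))  # points on z = 2x + 3y + 1
print(fit_plane([(0, 0, 0), (1, 1, 1), (2, 2, 2)]))             # collinear input
```

The absolute `eps` test is crude and depends on the coordinate scale; a production version would more likely use a rank-revealing solver. The point is only that degenerate point sets can be detected and skipped before solving.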

detectron2

@liuxuebo0 @JingChaoLiu
Have you considered upgrading to detectron2?
Would there be big improvements?

Can anybody share the training config file?

Thanks in advance!

Can this be used commercially?

May I ask whether this can be used commercially?
https://github.com/jjprincess/PMTD

Is the repo empty?

Why can't I see the code in the repo? Is something wrong?

How to implement PMTD for ICDAR 2017 MLT?

I want to implement PMTD, but I could not find a guide for training the PMTD model. Can anyone help me with this?

Where is INSTALL.md? Where is the code?

I can't find the code or INSTALL.md, only the README.

OHEM implementation?

Hello,
After reading your paper, I have some questions about your OHEM implementation.
Do you mean that OHEM is used at the RPN stage? Is it used only on the RPN?
As I understand it, you randomly sample N proposals from the RPN output, compute the loss for all N proposals, sort them by loss, and choose the top 512 to update the network.
I don't know whether my understanding is right. Thanks for your help.
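
The selection step described in this question (compute a loss per sampled proposal, sort, keep the top 512) can be sketched as follows. This mirrors the asker's understanding only; it is not confirmed as the authors' OHEM implementation, and `select_hard_examples` is a hypothetical name.

```python
from typing import List


def select_hard_examples(losses: List[float], k: int = 512) -> List[int]:
    """Return indices of the k proposals with the largest loss, hardest first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]


# Toy example with 4 per-proposal losses, keeping the 2 hardest.
per_proposal_loss = [0.1, 0.9, 0.5, 0.7]
hard = select_hard_examples(per_proposal_loss, k=2)
print(hard)
# Only the selected proposals would contribute to the loss that is backpropagated.
print(sum(per_proposal_loss[i] for i in hard) / len(hard))
```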

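Error when running test_net.py

Hello author, I set up the environment following your README, and then running test_net.py reports the following error. Do I need to compile LAPACK?

Did the previously released code include training code?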

@JingChaoLiu

Question about threshold of mask in baseline

I reviewed the code history and found the commit postprocess Mask by HardThreshold.

As far as I understand, this is supposed to be the baseline described in the paper, though I'm not quite sure.

One thing I find confusing is that the threshold for the mask head (i.e. for Masker) is set to 0.01 here. Shouldn't it be 0.5 after applying sigmoid()?

I've noticed that you moved sigmoid() from post-processing to the predictor. However, I suppose that won't change the values fed into Masker, right? Also, I'd like to know why moving sigmoid() was necessary.

Looking forward to your reply! @JingChaoLiu @liuxuebo0
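
As background for the threshold question above: because sigmoid is monotonic, thresholding the sigmoid output at t is the same as thresholding the raw logit at log(t / (1 - t)). A threshold of 0.5 corresponds to a logit of 0, and 0.01 to a logit of about -4.6. The sketch below checks this numerically; it says nothing about why 0.01 was chosen in the repository, and the helper names are illustrative.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of sigmoid for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def mask_from_probs(logits: List[float], prob_threshold: float) -> List[bool]:
    """Threshold after applying sigmoid."""
    return [sigmoid(v) > prob_threshold for v in logits]


def mask_from_logits(logits: List[float], prob_threshold: float) -> List[bool]:
    """Equivalent threshold applied directly to the logits."""
    cut = logit(prob_threshold)
    return [v > cut for v in logits]


values = [-6.0, -4.6, -3.0, 0.1, 2.0]
for t in (0.5, 0.01):
    print(t, round(logit(t), 4), mask_from_probs(values, t), mask_from_logits(values, t))
```

So where sigmoid() is applied does not change which pixels pass a given probability threshold, as long as the threshold is interpreted in the same space as the values it is compared against.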

Question about crop step in Data augmentation

Because the ground-truth area is not pure text, I get many wrong regions when I randomly crop the resized image. Are there any tricks for this step?

where is the code?

RuntimeError: copy_if failed to synchronize: device-side assert triggered

@JingChaoLiu @liuxuebo0 Hello, I keep running into the problem below and I don't know the reason. Someone said the learning rate is too large, but what learning rate is OK? Could you give me a solution?

Traceback (most recent call last):
  File "tools/train_net.py", line 186, in <module>
    main()
  File "tools/train_net.py", line 179, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 85, in train
    arguments,
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/engine/trainer.py", line 75, in do_train
    loss_dict = model(images, targets)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 204, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 207, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 223, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 115, in forward_for_single_feature_map
    boxlist = remove_small_boxes(boxlist, self.min_size)
  File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
    (ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: device-side assert triggered
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
Traceback (most recent call last):
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 238, in <module>
    main()
  File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 234, in main
    cmd=process.args)
