stvir / pmtd
Pyramid Mask Text Detector designed by SenseTime Video Intelligence Research team.
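Hello author, how should I go about training? Does the training data need to be converted with the provided generate_icdar2017.py script? Apart from that, is it enough to run train.py directly? I look forward to your reply, and apologies for any disturbance.
I used the code to train on my own data, but I get 0 boxes when I run PMTD_demo.py. Can you tell me what to do?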
Q1: When I set the score threshold to 0.05, the maskrcnn default, the precision was very low. When I set the score threshold to 0.5, the F-measure matches the reported score (88.20% on the ICDAR 2015 test set), but the recall and precision do not match the scores in the paper.
| Method | Precision (%) | Recall (%) | F-Measure (%) |
|---|---|---|---|
| Baseline of PMTD | 85.84 | 90.55 | 88.14 |
| Our Baseline | 92.50 | 84.20 | 88.20 |
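For readers comparing the rows above, here is a minimal sketch of how an F-measure is usually derived from precision and recall (the harmonic mean). The function name `f_measure` is just an illustration. Evaluation scripts may round or aggregate differently, so small deviations from a published table are possible.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values taken from the table in the question.
    for name, p, r in [("Baseline of PMTD", 85.84, 90.55),
                       ("Our Baseline", 92.50, 84.20)]:
        print(name, round(f_measure(p, r), 2))
```

Raising the score threshold typically trades recall for precision, so two runs can land on a similar F-measure with quite different precision and recall.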
Q2: Have you done an ablation study on Data Augmentation, RPN Anchor, and OHEM? In my experiments, Data Augmentation and OHEM improve the performance, but the modification to the RPN Anchor does not help.
First, thank you for your kind paper and github page.
Your work is super useful for studying text detection using mask-rcnn baseline.
I am reproducing the results of PMTD, but my results are a little worse (Mask R-CNN baseline: 60% F-measure on the MLT dataset).
So I'm trying to figure out what is wrong with my configuration.
It would be very helpful if the config file (.yaml) were provided, or if you could let me know the RPN.ANCHOR_STRIDE setting (currently I'm using (4, 8, 16, 32, 64)).
Thanks!
On executing tools/test_net.py, I am getting a runtime error. I am using the default configurations with the pretrained model. When I increase the value of IMS_PER_BATCH, the error vanishes, however, the predictions that I obtain after this are highly incomplete, with most of the words not being detected.
File "tools/test_net.py", line 131, in <module>
main()
File "tools/test_net.py", line 116, in main
output_folder=output_folder,
File "/home/pranav/PMTD/maskrcnn_benchmark/engine/inference.py", line 82, in inference
predictions = compute_on_dataset(model, data_loader, device, inference_timer)
File "/home/pranav/PMTD/maskrcnn_benchmark/engine/inference.py", line 28, in compute_on_dataset
output = model(images)
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/pranav/PMTD/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 52, in forward
x, result, detector_losses = self.roi_heads(features, proposals, targets)
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/pranav/PMTD/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 39, in forward
x, detections, loss_mask = self.mask(mask_features, detections, targets)
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/pranav/PMTD/maskrcnn_benchmark/modeling/roi_heads/mask_head/mask_head.py", line 71, in forward
mask_logits = self.predictor(x)
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/pranav/PMTD/maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_predictors.py", line 33, in forward
x = F.relu(self.conv5_mask(x))
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/container.py", line 97, in forward
input = module(input)
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/upsampling.py", line 134, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners)
File "/tmp/yes/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/functional.py", line 2523, in interpolate
return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: invalid argument 2: non-empty 4D input tensor expected but got: [0 x 256 x 14 x 14] at /opt/conda/conda-bld/pytorch-nightly_1553749764730/work/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:21
Q1. Why does Algorithm 1 take the segmentation result of the whole image (H*W points) as input, while its output is only a single text bounding box (4 planes)?
Q2. What are the details of the INITPLANES function? What are the parameters (A, B, D) after calling it? I cannot tell from the paper.
Thanks !
Dear author,
Do you have a recommended configuration for a smaller batch size? I got NaN with batch_size=36 and LR=0.04, even when I use 1*binary_cross_entropy loss. When I reduce the LR to 0.004 or 0.001, the model does not seem to converge well. I even tried the AMSGrad optimizer with different LRs.
By the way, I calculate the cropped text area via cv2.findContours(). Is it OK?
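For the smaller-batch question above, one commonly used heuristic (not a recommendation from the PMTD authors) is to scale the learning rate linearly with the batch size and ramp it up with a warmup phase. The sketch below assumes a reference configuration you already trust. The values `0.02` and `16` are illustrative placeholders, and `scaled_lr` and `warmup_lr` are hypothetical helper names.

```python
def scaled_lr(reference_lr, reference_batch, new_batch):
    """Linear scaling rule: keep lr / batch_size constant."""
    return reference_lr * new_batch / reference_batch


def warmup_lr(target_lr, iteration, warmup_iters=500, warmup_factor=1.0 / 3):
    """Linear warmup from target_lr * warmup_factor up to target_lr."""
    if iteration >= warmup_iters:
        return target_lr
    alpha = iteration / warmup_iters
    return target_lr * (warmup_factor * (1.0 - alpha) + alpha)


if __name__ == "__main__":
    # Placeholder reference setting: lr 0.02 at 16 images per batch.
    lr = scaled_lr(0.02, 16, 4)
    for it in (0, 250, 500):
        print(it, warmup_lr(lr, it))
```

If NaN still appears with a scaled rate, a longer warmup or gradient clipping are other things people commonly try. Whether they help here is not confirmed by the thread.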
@JingChaoLiu Hello, I get an error when I run PMTD_demo.py with --method="PlaneClustering", but when I change it to --method="HardThreshold" it works. I don't know why. I am using a model that I trained myself.
Traceback (most recent call last):
File "/home/donglin/projects/PMTD-inference/demo/PMTD_demo.py", line 104, in <module>
main()
File "/home/donglin/projects/PMTD-inference/demo/PMTD_demo.py", line 84, in main
predictions = pmtd_demo.run_on_opencv_image(image)
File "/home/donglin/projects/PMTD-inference/demo/predictor.py", line 175, in run_on_opencv_image
predictions = self.compute_prediction(image)
File "/home/donglin/projects/PMTD-inference/demo/predictor.py", line 223, in compute_prediction
masks = self.masker.forward_single_image(masks, prediction)
File "/home/donglin/projects/PMTD-inference/demo/inference.py", line 27, in forward_single_image
for mask, box in zip(masks, boxes.bbox)
File "/home/donglin/projects/PMTD-inference/demo/inference.py", line 27, in
for mask, box in zip(masks, boxes.bbox)
File "/home/donglin/projects/PMTD-inference/demo/inference.py", line 44, in reg_pyramid_in_image
planes = plane_clustering(pos_points, planes)
File "/home/donglin/projects/PMTD-inference/demo/inference.py", line 87, in plane_clustering
A = torch.gels(B, X)[0][:3]
RuntimeError: Lapack Error in gels : The 1-th diagonal element of the triangular factor of A is zero at /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/TH/generic/THTensorLapack.cpp:165
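The Lapack message in this traceback indicates that the least-squares system was rank deficient. One plausible cause (an assumption, not confirmed in the thread) is a mask whose positive points are too few or lie on a line, so that no unique plane fits them. The sketch below is a plain-Python stand-in for the least-squares solve, not the repository's code. It fits z = a*x + b*y + c and returns `None` for degenerate input instead of raising. `fit_plane` and `det3` are hypothetical names.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points, eps=1e-9):
    """Least-squares fit of z = a*x + b*y + c over (x, y, z) points.

    Returns (a, b, c), or None when the points do not determine a plane.
    """
    n = len(points)
    if n < 3:
        return None
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations M @ [a, b, c] = rhs.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    d = det3(m)
    if abs(d) <= eps * max(1.0, sxx * syy * n):
        return None  # collinear or otherwise degenerate points
    solution = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return tuple(solution)
```

A caller could skip the plane update for a mask when `fit_plane` returns `None`, which avoids the crash at the cost of leaving that plane unchanged.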
@liuxuebo0 @JingChaoLiu
Have you considered upgrading to detectron2?
Would there be big improvements?
Thanks in advance!
May I ask whether this can be used commercially?
https://github.com/jjprincess/PMTD
I want to implement PMTD, but I could not find a guide for training the PMTD model. Can anyone help me solve this problem?
I just can't find the code or INSTALL.md, only the README.
Hello,
After reading your paper, I have some questions about your OHEM implementation.
Do you mean that OHEM is used in the RPN stage? Is it used only in the RPN?
My understanding is that you randomly sample N proposals from the RPN output, compute the loss for all N proposals, sort them by loss, and choose the top 512 to update the network.
I don't know whether my understanding is right. Could you help? Thanks.
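The selection step described in this question can be sketched as follows. This only illustrates the asker's understanding of OHEM (keep the highest-loss samples). It is not taken from the PMTD code, and `select_hard_examples` is a hypothetical name.

```python
def select_hard_examples(losses, keep=512):
    """Return indices of the `keep` largest per-sample losses, hardest first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:keep]


def ohem_loss(losses, keep=512):
    """Average the loss over the selected hard examples only."""
    chosen = select_hard_examples(losses, keep)
    if not chosen:
        return 0.0
    return sum(losses[i] for i in chosen) / len(chosen)


if __name__ == "__main__":
    per_proposal = [0.1, 2.0, 0.5, 1.5]
    print(select_hard_examples(per_proposal, keep=2))
    print(ohem_loss(per_proposal, keep=2))
```

In a real training loop only the selected samples would contribute gradients. Which stage this is applied to is exactly what the question asks, and the thread does not answer it.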
I reviewed the code history and found the commit postprocess Mask by HardThreshold.
As far as I understand, this is supposed to be the baseline described in the paper, though I'm not quite sure.
One thing I find a bit confusing is that the threshold for the mask head (i.e. for Masker) is set to 0.01 here. Shouldn't it be 0.5 after applying sigmoid()?
I've noticed that you moved sigmoid() from the post-processing step to the predictor. However, I suppose that won't change the values fed into Masker, right? Also, I'd like to know why moving sigmoid() was necessary.
Looking forward to your reply! @JingChaoLiu @liuxuebo0
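As background for the threshold question above: sigmoid is monotonic, so a threshold on probabilities always has an equivalent threshold on logits, and moving sigmoid() between modules does not change the binarized result as long as it is applied exactly once. The sketch below illustrates only that general point. It does not settle which threshold the repository intends, and the helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of sigmoid: the logit value that maps to probability p."""
    return math.log(p / (1.0 - p))


def binarize(values, threshold, values_are_logits):
    """Threshold mask values, applying sigmoid first if they are raw logits."""
    if values_are_logits:
        values = [sigmoid(v) for v in values]
    return [v > threshold for v in values]


if __name__ == "__main__":
    logits = [-1.0, 0.2, 3.0]
    print(binarize(logits, 0.5, values_are_logits=True))
    # A probability threshold of 0.01 corresponds to a strongly negative logit.
    print(logit(0.01))
```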
Because the ground-truth area is not pure text, I get many wrong regions when I randomly crop the resized image. Are there any tricks for this step?
@JingChaoLiu @liuxuebo0 Hello, I keep running into the problem below and I don't know the reason. Some people say the learning rate is too large, but what learning rate would be OK? Could you give me a solution?
Traceback (most recent call last):
File "tools/train_net.py", line 186, in <module>
main()
File "tools/train_net.py", line 179, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 85, in train
arguments,
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/engine/trainer.py", line 75, in do_train
loss_dict = model(images, targets)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 367, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 204, in new_fwd
**applier(kwargs, input_caster))
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
proposals, proposal_losses = self.rpn(images, features, targets)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 207, in forward
return self._forward_train(anchors, objectness, rpn_box_regression, targets)
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/rpn.py", line 223, in _forward_train
anchors, objectness, rpn_box_regression, targets
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/modeling/rpn/inference.py", line 115, in forward_for_single_feature_map
boxlist = remove_small_boxes(boxlist, self.min_size)
File "/home/donglin/INSTALL_DIR/PMTD-inference/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
(ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: device-side assert triggered
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
terminate called without an active exception
Traceback (most recent call last):
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 238, in <module>
main()
File "/home/donglin/anaconda2/envs/maskrcnn/lib/python3.7/site-packages/torch/distributed/launch.py", line 234, in main
cmd=process.args)
Thank you for your great work. As the title asks, how can I calculate the five dedicated aspect ratios for my own dataset? @JingChaoLiu
When will you release the code? Thanks.
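One possible way to derive five anchor aspect ratios from a dataset is sketched below. It is an assumption for illustration, not necessarily how the paper obtained its ratios. It sorts the log aspect ratios of the ground-truth boxes, splits them into five equally populated bins, and takes the median of each bin. Ratios are computed as height/width here, so check the convention your framework expects. `aspect_ratio_anchors` is a hypothetical name.

```python
import math


def aspect_ratio_anchors(boxes, k=5):
    """boxes: iterable of (width, height). Returns k ratios (h / w), ascending."""
    logs = sorted(math.log(h / w) for w, h in boxes if w > 0 and h > 0)
    if len(logs) < k:
        raise ValueError("need at least k valid boxes")
    ratios = []
    for i in range(k):
        # Equal-population bin i, represented by its median log ratio.
        lo = i * len(logs) // k
        hi = (i + 1) * len(logs) // k
        chunk = logs[lo:hi]
        ratios.append(math.exp(chunk[len(chunk) // 2]))
    return ratios


if __name__ == "__main__":
    # Hypothetical ground-truth box sizes (width, height) in pixels.
    sample = [(100, 10), (80, 8), (40, 10), (44, 11), (20, 20),
              (30, 30), (10, 40), (12, 48), (5, 50), (6, 60)]
    print([round(r, 2) for r in aspect_ratio_anchors(sample)])
```

Working in log space treats wide and tall boxes symmetrically. A 1-D k-means over the same log ratios would be another reasonable choice.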
Hi, I want to use this model on my own dataset. I am using Colab and have successfully installed all the requirements,
but I don't know how to do the rest. Can anyone help me?
Hello, @JingChaoLiu
I ran into an 'EOFError' while training with train.py. Training runs without any error for a long time (such as 24 hours), but after that the following problem occurs:
Surprisingly, after the problem occurs and I interrupt the run, I can start python train_net.py again and it trains for a while, but then the error comes back. This repeats in a cycle.
I cannot find the code to create the soft mask label, can you give me some suggestions?
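As a starting point for the soft mask label, here is a minimal sketch based on one reading of the paper's pyramid description: the label is 1 at the center of the text quadrilateral, 0 on its boundary, and linear in between. The choice of the vertex mean as the center and the name `pyramid_label` are assumptions, and this is not the authors' code.

```python
def pyramid_label(point, quad, tol=1e-9):
    """Soft label in [0, 1] for `point` given a convex quad of 4 (x, y) vertices."""
    cx = sum(x for x, _ in quad) / 4.0
    cy = sum(y for _, y in quad) / 4.0
    px, py = point[0] - cx, point[1] - cy
    for i in range(4):
        # Triangle formed by the center O and the edge (A, B).
        ax, ay = quad[i][0] - cx, quad[i][1] - cy
        bx, by = quad[(i + 1) % 4][0] - cx, quad[(i + 1) % 4][1] - cy
        det = ax * by - ay * bx
        if abs(det) < 1e-12:
            continue  # degenerate triangle
        # Solve OP = alpha * OA + beta * OB.
        alpha = (px * by - py * bx) / det
        beta = (ax * py - ay * px) / det
        if alpha >= -tol and beta >= -tol:
            return max(1.0 - (alpha + beta), 0.0)
    return 0.0


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (4, 4), (0, 4)]
    for p in [(2, 2), (2, 1), (2, 0), (2, -1)]:
        print(p, pyramid_label(p, square))
```

Evaluating this function at every pixel inside a box gives a dense label map. A vectorized version would be needed for training speed.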
How can I get good performance on four 11 GB GPUs?
@JingChaoLiu @liuxuebo0 Can you share documentation with installation instructions?