hotaekhan / sstdnet


83.0 83.0 17.0 37 KB

Implementation of 'Single Shot Text Detector with Regional Attention' (ICCV 2017 Spotlight)

Python 100.00%

sstdnet's People


duke24k boragocode mingchen62 airyym zgsxwsdxg yuckfu ht-alchera 10183308 klqulei xiaoyubing aniketgurav chunyu-lin-bjtu jjdblast shlpu fireae ustczhouyu peternara

sstdnet's Issues

Focal loss

Hi, I have a question about the code in loss.py. Why do you exclude the background and use only the object labels when computing the focal loss?
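One possible reading, assuming loss.py follows the RetinaNet-style sigmoid focal loss (not confirmed on this page): the background is not really excluded. Class targets are one-hot encoded without a background column, so a background anchor gets an all-zero target row and still contributes a loss term for every object class, pushing all of its sigmoid scores toward 0. A minimal pure-Python sketch of that convention; the function names are hypothetical, and alpha=0.25, gamma=2 are the RetinaNet paper defaults, not values read from this repo:

```python
import math

def one_hot_without_background(label, num_classes):
    """Label 0 (background) becomes an all-zero row; label k > 0 sets column k - 1."""
    row = [0.0] * num_classes
    if label > 0:
        row[label - 1] = 1.0
    return row

def sigmoid_focal_loss(logits, label, num_classes, alpha=0.25, gamma=2.0):
    """Sum of per-class sigmoid focal loss terms for a single anchor."""
    target = one_hot_without_background(label, num_classes)
    loss = 0.0
    for x, t in zip(logits, target):
        p = 1.0 / (1.0 + math.exp(-x))            # independent sigmoid per class
        p_t = p if t == 1.0 else 1.0 - p           # probability of the correct outcome
        alpha_t = alpha if t == 1.0 else 1.0 - alpha
        p_t = max(p_t, 1e-12)                      # avoid log(0)
        loss += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return loss

# A background anchor scored confidently as text is penalized:
print(sigmoid_focal_loss([3.0], label=0, num_classes=1))
print(sigmoid_focal_loss([-3.0], label=0, num_classes=1))
```

Because each class uses its own sigmoid rather than a softmax, no explicit background logit is needed: "background" simply means every class score is low.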

Questions about SSTD net details

Hi, HotaekHan, thanks for sharing the code.

I have some questions about the details of SSTD net, and I'm really looking forward to your reply :)

(1) In the deconvolution part, I see that you use groups=64 to upsample. Generally groups=1 seems more reasonable, so I guess it's to reduce the computational cost? Or is there another reason?

(2) The original paper uses a 3×3 deconvolution and a 1×1 convolution to establish the attention map. I see that you're using a 16×16 deconvolution and two 3×3 convolutions instead. Does this mean your implementation works better than the one in the original paper?

It's very nice code, and I'd really appreciate your comments!

Thanks
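Both points in this issue can be checked with standard arithmetic. A PyTorch ConvTranspose2d stores in_channels × (out_channels / groups) × k × k weights, so groups=64 cuts parameters and multiply-adds by a factor of 64. Its spatial output size is (H − 1)·stride − 2·padding + dilation·(k − 1) + output_padding + 1, so a 16×16 kernel with stride 8 and padding 4 upsamples by exactly 8×. The channel count (512), stride, and padding below are hypothetical values for illustration, not read from the repo:

```python
def deconv_weight_count(in_ch, out_ch, k, groups=1):
    """Weights in a ConvTranspose2d layer (bias excluded): in_ch * (out_ch // groups) * k * k."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("in_ch and out_ch must be divisible by groups")
    return in_ch * (out_ch // groups) * k * k

def deconv_output_size(h_in, k, stride, padding=0, output_padding=0, dilation=1):
    """Output height (or width) of ConvTranspose2d, following the PyTorch formula."""
    return (h_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1

# Hypothetical 512 -> 512 channel, 16x16 deconvolution:
print(deconv_weight_count(512, 512, 16))             # groups=1
print(deconv_weight_count(512, 512, 16, groups=64))  # groups=64, 64x fewer weights
print(deconv_output_size(32, 16, stride=8, padding=4))  # 32 -> 256, i.e. 8x upsampling
```

With groups equal to the channel count, each output channel is upsampled only from its own group, which behaves much like a learned per-channel (bilinear-style) upsampling rather than a full channel-mixing layer.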

How to generate training data?

How do I prepare the training data? When I run python3 datagen.py, I get the following error:

Traceback (most recent call last):
  File "datagen.py", line 540, in <module>
    test()
  File "datagen.py", line 531, in test
    for images, loc_targets, cls_targets, mask_targets in dataloader:
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 310, in __iter__
    return DataLoaderIter(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 180, in __init__
    self._put_indices()
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 219, in _put_indices
    indices = next(self.sample_iter, None)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2184

Thanks!
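The last frame calls torch.randperm(len(self.data_source)), and "must be strictly positive" indicates the length was 0: the dataset found no samples, typically because the annotation list path is wrong or the file is empty. A defensive sketch that fails early with a clearer message; the one-line-per-image list format and the load_sample_list name are hypothetical, not datagen.py's actual API:

```python
import os

def load_sample_list(list_file):
    """Read non-empty lines from an annotation list file; fail early if none exist."""
    if not os.path.isfile(list_file):
        raise FileNotFoundError("annotation list not found: %s" % list_file)
    with open(list_file) as f:
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        # An empty list would give the DataLoader a dataset of length 0.
        raise ValueError("annotation list %s contains no samples" % list_file)
    return lines
```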

Why are all the losses NaN?

Doesn't this code work for text detection?

training label

Thanks for sharing the code here.
I have a question: text bounding boxes in an image may be inclined, and (xmin, ymin, xmax, ymax) is not enough to determine an inclined bounding box; for example, we may need three points to determine one. Why do you use only (xmin, ymin, xmax, ymax) for the training labels?
Thanks!
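For context, the README states that this code targets general object detection rather than (oriented) text detection, so its labels are axis-aligned boxes. If your annotations are 4-point quadrilaterals (for example, 8 values x1, y1, ..., x4, y4, an assumed format), one lossy way to feed them to an axis-aligned detector is to take the enclosing rectangle. A minimal sketch with a hypothetical helper name:

```python
def quad_to_aabb(coords):
    """Enclosing axis-aligned box (xmin, ymin, xmax, ymax) of a 4-point polygon
    given as [x1, y1, x2, y2, x3, y3, x4, y4]."""
    if len(coords) != 8:
        raise ValueError("expected 8 values (4 points)")
    xs, ys = coords[0::2], coords[1::2]
    return (min(xs), min(ys), max(xs), max(ys))

print(quad_to_aabb([10, 20, 50, 10, 60, 40, 20, 50]))
```

This discards the orientation: the enclosing box of a strongly tilted word also covers a lot of background, which is why oriented text detectors regress rotated boxes or quadrilaterals instead.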

How to use this?

Could you upload your training directory and your trained model?

thanks

Decoding is very slow

I tested your code with an image size of 512, and decoding takes a very long time:

Elapsed time of pred : 91.725ms
Decoding..
Elapsed time of decode : 114360.36300000001ms
Avg. elapsed time of pred : 153.09623809523805ms
Avg. elapsed time of decode : 65703.0309047619ms

I've learned that the NMS function runs slowly on images with many objects. How can I improve its performance?
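A frequent cause of slow decoding is an NMS written as a Python loop over box pairs (or per-element tensor indexing). A standard remedy is to compute the IoU of the current best box against all remaining boxes in one vectorized step. A minimal greedy NMS in NumPy, assuming boxes as an (N, 4) array of (xmin, ymin, xmax, ymax); this is a generic sketch, not the encoder code from this repo:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns kept indices, highest score first."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU of box i against every remaining box at once
        ix1 = np.maximum(x1[i], x1[rest])
        iy1 = np.maximum(y1[i], y1[rest])
        ix2 = np.minimum(x2[i], x2[rest])
        iy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)
        order = rest[iou <= iou_threshold]  # drop boxes overlapping box i too much
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
print(nms(boxes, [0.9, 0.8, 0.7]))
```

Discarding low-score candidates (a score threshold or top-k) before NMS usually helps even more, since the cost grows with the number of candidates. Recent torchvision releases also provide torchvision.ops.nms, which runs on the GPU.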

Prepare dataset

Hi, I've downloaded a public dataset with annotations and followed the instructions in the README, but I'm not sure whether I can just proceed like that.
I see there is a resize function in datagen.py; does that mean I can include images of different sizes, or rectangular images? Also, if images are resized, will the annotations be affected? Should I convert them to relative values instead?

Thanks in advance!
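Whenever an image is resized, pixel box coordinates must be scaled by the same factors along each axis, or the labels no longer match the image; relative [0, 1] coordinates are unaffected by resizing. Whether datagen.py already rescales boxes is not confirmed here. A sketch assuming absolute pixel (xmin, ymin, xmax, ymax) boxes; the helper names are hypothetical:

```python
def resize_boxes(boxes, orig_size, new_size):
    """Scale pixel boxes (xmin, ymin, xmax, ymax) from orig_size=(w, h) to new_size=(w, h)."""
    sx = new_size[0] / float(orig_size[0])
    sy = new_size[1] / float(orig_size[1])
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]

def to_relative(boxes, size):
    """Convert pixel boxes to [0, 1] coordinates, which stay valid under any resize."""
    w, h = float(size[0]), float(size[1])
    return [(x1 / w, y1 / h, x2 / w, y2 / h) for x1, y1, x2, y2 in boxes]

# An 800x400 image squeezed to 400x400: x is halved, y is unchanged.
print(resize_boxes([(100, 50, 300, 150)], (800, 400), (400, 400)))
```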

Error while training

Traceback (most recent call last):
  File "train.py", line 192, in <module>
    train(epoch)
  File "train.py", line 133, in train
    loss = ((loc_loss + cls_loss) / num_matched_anchors) + mask_loss
RuntimeError: invalid argument 3: divide by zero at /pytorch/torch/lib/THC/generic/THCTensorMathPairwise.cu:88

The error occurs while training the model. How should I solve it?
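The division fails when num_matched_anchors is 0, i.e. no anchor in the batch was matched to a ground-truth box (for instance, images without boxes, or boxes that fit no anchor). The reported error likely comes from an integer tensor division; with floating-point values the same case would silently produce inf or NaN, which may be one cause of NaN losses. A common guard, used in RetinaNet-style training, is to clamp the normalizer to at least 1. A plain-Python sketch with a hypothetical function name:

```python
def total_loss(loc_loss, cls_loss, mask_loss, num_matched_anchors):
    """Normalize detection losses by matched anchors, guarding against zero matches."""
    normalizer = max(float(num_matched_anchors), 1.0)  # never divide by zero
    return (loc_loss + cls_loss) / normalizer + mask_loss

print(total_loss(2.0, 4.0, 0.5, 0))  # no matched anchors: no crash
print(total_loss(2.0, 4.0, 0.5, 3))
```

With tensors, the equivalent is dividing by num_matched_anchors.clamp(min=1).float().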

Type Error

Epoch: 0
Traceback (most recent call last):
  File "train.py", line 194, in <module>
    train(epoch)
  File "train.py", line 118, in train
    for batch_idx, (inputs, loc_targets, cls_targets, mask_targets) in enumerate(trainloader):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/xendity/SSTDNet/datagen.py", line 492, in collate_fn
    loc_target, cls_target = self.data_encoder.encode(boxes[i], labels[i], input_size=(max_w,max_h))
  File "/home/xendity/SSTDNet/encoder.py", line 92, in encode
    anchor_boxes = self._get_anchor_boxes(input_size)
  File "/home/xendity/SSTDNet/encoder.py", line 66, in _get_anchor_boxes
    xy = (xy*grid_size).view(fm_h,fm_w,1,2).expand(fm_h,fm_w,9,2)
RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'other'

Hi, I ran train.py and got two or three type errors like this. How should I modify the code?

anchor areas

I have a question about anchor_areas. In encoder.py, anchor_areas is [16*16., 32*32., ..., 256*256.], and I'd like to know why you chose these values. I think they are correlated with the feature maps, but I can't work out the explicit relation.
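In RetinaNet-style encoders, which this encoder.py appears to resemble (9 anchors per location, per the expand(fm_h,fm_w,9,2) call in the Type Error traceback), each entry of anchor_areas belongs to one feature-map level, with larger areas on coarser maps, so the anchor side grows in proportion to that level's stride. The actual strides are not stated on this page; the sketch assumes five levels with strides 8 to 128 and anchor side = 2 × stride, plus RetinaNet's 3 aspect ratios × 3 scales. It also casts the integer grid to float before scaling, the NumPy analogue of the .float() cast that likely resolves the LongTensor/FloatTensor error in that traceback:

```python
import numpy as np

def anchor_wh(area, aspect_ratios=(0.5, 1.0, 2.0),
              scales=(1.0, 2 ** (1 / 3.0), 2 ** (2 / 3.0))):
    """(w, h) of the 9 anchors (3 ratios x 3 scales) for one base area; ratio = h / w."""
    out = []
    for ar in aspect_ratios:
        h = np.sqrt(area * ar)
        w = area / h  # keeps w * h == area at scale 1
        for s in scales:
            out.append((w * s, h * s))
    return np.array(out)  # shape (9, 2)

def level_anchors(fm_h, fm_w, stride, area):
    """Anchors as (cx, cy, w, h) rows for one feature map of size fm_h x fm_w."""
    ys, xs = np.meshgrid(np.arange(fm_h), np.arange(fm_w), indexing="ij")
    # Integer grid indices are cast to float before scaling by the stride,
    # mirroring xy.float() * grid_size in PyTorch.
    grid = np.stack([xs, ys], axis=-1).astype(np.float64)
    centers = (grid + 0.5) * stride                  # (fm_h, fm_w, 2)
    wh = anchor_wh(area)                             # (9, 2)
    n = wh.shape[0]
    centers = np.broadcast_to(centers[:, :, None, :], (fm_h, fm_w, n, 2))
    sizes = np.broadcast_to(wh[None, None, :, :], (fm_h, fm_w, n, 2))
    return np.concatenate([centers, sizes], axis=-1).reshape(-1, 4)

# Assumed configuration: five levels, anchor side = 2 * stride.
input_size = 512
strides = [8, 16, 32, 64, 128]
areas = [(2 * s) ** 2 for s in strides]  # 16*16, 32*32, ..., 256*256
anchors = np.concatenate([
    level_anchors(input_size // s, input_size // s, s, a)
    for s, a in zip(strides, areas)
])
print(anchors.shape)
```

Under this assumption each level's base anchor spans about two cells of its feature map, so small text is matched on fine maps and large text on coarse ones.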

About text detection

The original paper addresses text detection, so why does this repo say “This code is work for general object detection problem. not for (oriented) text detection problem”?