hotaekhan / sstdnet
Implementation of 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'
Hi, I have a question about the code in loss.py. Why do you exclude the background and use only the object labels when computing the focal loss?
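For context, a standard RetinaNet-style sigmoid focal loss does keep background anchors in the loss, encoded as all-zero one-hot rows. This is a minimal sketch of that baseline (not the repo's actual loss.py), so the question is about deviating from it by masking background anchors out entirely:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, num_classes, alpha=0.25, gamma=2.0):
    # targets: anchor labels in [0, num_classes], where 0 = background.
    # Background anchors become all-zero rows here, so they are still
    # trained as negatives; dropping them (as the question describes)
    # would instead filter with `targets > 0` before this point.
    t = F.one_hot(targets, num_classes + 1)[:, 1:].float()
    p = torch.sigmoid(logits)
    pt = p * t + (1 - p) * (1 - t)          # probability of the true class
    w = alpha * t + (1 - alpha) * (1 - t)   # class-balance weight
    w = w * (1 - pt).pow(gamma)             # focal down-weighting
    return F.binary_cross_entropy_with_logits(
        logits, t, weight=w, reduction="sum")
```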
Hi, HotaekHan, thanks for sharing the code.
I have a question concerning the details of SSTD net, and I'm really looking forward to seeing your reply :)
(1) In the deconvolution part, I see that you use groups=64 to upsample, but generally groups=1 would be more reasonable, so I guess it's for saving computational cost? Or is there another reason?
(2) The original paper uses a 3×3 deconv and a 1×1 conv to establish the attention map, while you use a 16×16 deconv and two 3×3 convs. Does that mean this implementation is better than the one in the original paper?
It's very nice code and I'd really appreciate your comment!
Thanks
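For reference, the cost difference behind question (1) is easy to check directly. The channel count 64 follows the question; the kernel size and stride below are made-up examples, not the repo's actual layer shapes:

```python
import torch.nn as nn

# groups=1: every output channel mixes all 64 input channels.
dense = nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2,
                           padding=1, groups=1)
# groups=64: each channel is upsampled independently (depthwise),
# cutting the weight count by a factor of 64.
grouped = nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2,
                             padding=1, groups=64)

n_dense = sum(p.numel() for p in dense.parameters())
n_grouped = sum(p.numel() for p in grouped.parameters())
```

With these shapes the dense layer has 64·64·4·4 + 64 = 65,600 parameters versus 64·1·4·4 + 64 = 1,088 for the grouped one, which supports the "saving computation" guess.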
How do I prepare the training data? After I run python3 datagen.py, this error happens:
Traceback (most recent call last):
File "datagen.py", line 540, in <module>
test()
File "datagen.py", line 531, in test
for images, loc_targets, cls_targets, mask_targets in dataloader:
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 310, in __iter__
return DataLoaderIter(self)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 180, in __init__
self._put_indices()
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 219, in _put_indices
indices = next(self.sample_iter, None)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 119, in __iter__
for idx in self.sampler:
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 50, in __iter__
return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2184
Thanks!
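For what it's worth, that randperm "must be strictly positive" error usually means the dataset reports length 0, i.e. no image/annotation pairs were found at the configured paths. A hedged sketch of an early check (the wrapper name is made up; it is not part of the repo):

```python
from torch.utils.data import DataLoader

def safe_loader(dataset, **kwargs):
    # torch.randperm(0) raises "must be strictly positive" on older
    # PyTorch, so fail early with a clearer message when the dataset
    # found no samples (typically a wrong root or list-file path).
    if len(dataset) == 0:
        raise RuntimeError("dataset is empty: check image/annotation paths")
    return DataLoader(dataset, **kwargs)
```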
So nice of you to share the code here.
I have a question: the text bounding box may be inclined in an image, so (xmin, ymin, xmax, ymax) is not enough to determine an inclined bounding box; for example, we may need three points to determine one. Why do you only use (xmin, ymin, xmax, ymax) for the training labels?
Thanks!
thanks
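For context on what the axis-aligned format loses: an inclined text region is usually annotated as four corner points (or as center/size/angle), and (xmin, ymin, xmax, ymax) only recovers the bounding rectangle of those corners. A small illustrative helper (not from the repo):

```python
def quad_to_aabb(quad):
    """Collapse 4 (x, y) corner points of an inclined box to the
    axis-aligned (xmin, ymin, xmax, ymax) that this repo trains on;
    the rotation information is lost in the process."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)
```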
I tested your code with image size 512, and it takes a lot of time to decode.
Elapsed time of pred : 91.725ms
Decoding..
Elapsed time of decode : 114360.36300000001ms
Avg. elapsed time of pred : 153.09623809523805ms
Avg. elapsed time of decode : 65703.0309047619ms
I learned that the NMS function runs slowly on images with many objects. How can I improve its performance?
Hi, I've downloaded a public dataset with annotations, and I've followed the instructions in the README, but I'm not sure whether I can just proceed like that.
I see there is a resize function in datagen.py; does it mean I can include images of different sizes / rectangular images? Also, if there is a resize function, will the annotations be affected? Should I change them to relative values instead?
Thanks in advance!
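For illustration, if annotations are stored in absolute pixel coordinates, resizing the image just means scaling the boxes by the same width/height factors; whether datagen.py already does this internally is exactly the question being asked. This helper is hypothetical, not the repo's code:

```python
def scale_boxes(boxes, orig_size, new_size):
    """Scale absolute (xmin, ymin, xmax, ymax) boxes when an image is
    resized from orig_size=(w, h) to new_size=(w, h). If annotations
    were stored as relative (0..1) coordinates instead, no rescaling
    would be needed at all."""
    (ow, oh), (nw, nh) = orig_size, new_size
    sx, sy = nw / ow, nh / oh
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy)
            for x1, y1, x2, y2 in boxes]
```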
Traceback (most recent call last):
File "train.py", line 192, in
train(epoch)
File "train.py", line 133, in train
loss = ((loc_loss + cls_loss) / num_matched_anchors) + mask_loss
RuntimeError: invalid argument 3: divide by zero at /pytorch/torch/lib/THC/generic/THCTensorMathPairwise.cu:88
The error occurs while training the model... how should I solve it?
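A common workaround, sketched here with dummy loss values standing in for the real ones (so this is an assumption about the fix, not necessarily what the author intends): the divide-by-zero means num_matched_anchors was 0, i.e. no anchor matched any ground-truth box in the batch, so clamp the denominator:

```python
import torch

loc_loss = torch.tensor(0.0)   # dummy values; in train.py these come
cls_loss = torch.tensor(0.0)   # from the criterion
mask_loss = torch.tensor(0.3)
num_matched_anchors = 0        # the failing case: no positives matched

# Clamp the denominator so such batches don't crash; with zero matches
# the loc/cls losses are zero anyway, leaving only the mask loss.
loss = ((loc_loss + cls_loss) / max(num_matched_anchors, 1)) + mask_loss
```

It may also be worth checking why nothing matched (e.g. anchor scales vs. ground-truth box sizes), since frequent zero-match batches suggest a data or anchor-design problem.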
Epoch: 0
Traceback (most recent call last):
File "train.py", line 194, in
train(epoch)
File "train.py", line 118, in train
for batch_idx, (inputs, loc_targets, cls_targets, mask_targets) in enumerate(trainloader):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 336, in __next__
return self._process_next_batch(batch)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/xendity/SSTDNet/datagen.py", line 492, in collate_fn
loc_target, cls_target = self.data_encoder.encode(boxes[i], labels[i], input_size=(max_w,max_h))
File "/home/xendity/SSTDNet/encoder.py", line 92, in encode
anchor_boxes = self._get_anchor_boxes(input_size)
File "/home/xendity/SSTDNet/encoder.py", line 66, in _get_anchor_boxes
xy = (xy*grid_size).view(fm_h,fm_w,1,2).expand(fm_h,fm_w,9,2)
RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'other'
Hi, I ran train.py and got two or three type errors like this. How should I modify the code?
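One hedged guess at a fix for this particular error: the grid-coordinate tensor built in encoder.py is a LongTensor while grid_size is a float, and newer PyTorch versions refuse the implicit cast, so casting explicitly resolves it. A self-contained sketch with made-up feature-map sizes (the repo's meshgrid helper is rebuilt here for brevity):

```python
import torch

fm_w, fm_h, grid_size = 4, 4, 8.0   # made-up feature-map size and stride

xs = torch.arange(fm_w).repeat(fm_h)             # 0,1,2,3,0,1,2,3,...
ys = torch.arange(fm_h).repeat_interleave(fm_w)  # 0,0,0,0,1,1,1,1,...
xy = torch.stack([xs, ys], dim=1).float()        # cast Long -> Float here
xy = (xy * grid_size).view(fm_h, fm_w, 1, 2).expand(fm_h, fm_w, 9, 2)
```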
I have a question about anchor_areas: in encoder.py of your code they are set to [16*16., 32*32., ..., 256*256.], and I want to know why you chose those values. I think they are correlated with the feature maps, but I can't work out the explicit relation.
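One plausible reading (an assumption, not confirmed by the author): each area corresponds to one feature-map level, with the base size doubling per level like the strides, as in FPN/RetinaNet-style anchor design:

```python
# 5 pyramid levels, base anchor size doubling each level:
# 16^2, 32^2, 64^2, 128^2, 256^2
anchor_areas = [(16. * 2 ** i) ** 2 for i in range(5)]
```

Under that reading, the anchor at each level is sized so objects roughly one stride-multiple across are matched at that level's resolution.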
The original paper works on text detection, but why does this repo say "This code is work for general object detection problem. not for (oriented) text detection problem"?