sstdnet's People

Contributors

hotaekhan

sstdnet's Issues

Focal loss

Hi, I have a question about the code in loss.py. Why do you exclude the background and use only the object labels when computing the focal loss?
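
For context, here is a minimal sketch of the binary focal loss as defined in the RetinaNet paper, in plain Python. It is not the code in loss.py; the `focal_loss` helper and the `alpha` and `gamma` defaults are assumptions for illustration. In that formulation background anchors do contribute, as negatives with target 0, and the modulating factor shrinks the loss of the easy ones.

```python
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p and a 0/1 target.

    Hypothetical helper for illustration, not taken from loss.py.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if target == 1 else 1.0 - p
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    # alpha_t balances positives (alpha) against negatives (1 - alpha).
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A background anchor the model already scores near zero costs almost nothing,
# while a background anchor it confidently calls "object" is penalized heavily.
easy_bg = focal_loss(0.01, target=0)
hard_bg = focal_loss(0.90, target=0)
```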

SSTD net details problem

Hi, HotaekHan, thanks for sharing the code.

I have some questions about the details of SSTD net, and I'm really looking forward to your reply :)

(1) In the deconvolution part, I see that you use groups=64 to upsample. Generally groups=1 might be more reasonable, so I guess this is to reduce computational complexity? Or are there other reasons?

(2) The original paper uses deconv 3x3 and conv 1x1 to establish the attention map. I see that you're using deconv 16x16 and two conv 3x3 instead. Does that mean this implementation works better than the one in the original paper?

It's very nice code and I would really appreciate your comments!

Thanks
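
On question (1), the effect of `groups` on model size can be checked with simple arithmetic. The sketch below uses the weight shape of PyTorch's ConvTranspose2d, (in_channels, out_channels / groups, kH, kW). The 64 input and output channels and the 16x16 kernel are assumptions for illustration, not values confirmed from the repo, and `conv_transpose2d_weights` is a hypothetical helper.

```python
def conv_transpose2d_weights(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a transposed convolution (bias excluded)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    # Each input channel connects only to out_channels / groups output channels.
    return in_channels * (out_channels // groups) * kernel_size * kernel_size

# Assumed sizes: 64 channels in and out, 16x16 kernel.
dense = conv_transpose2d_weights(64, 64, 16, groups=1)
grouped = conv_transpose2d_weights(64, 64, 16, groups=64)
ratio = dense // grouped  # how many times fewer weights the grouped layer has
```

With groups equal to the channel count, every channel is upsampled independently, so under these assumed sizes the weight count drops by a factor equal to the number of groups.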

how to gen train data?

How do I prepare the training data? After I run python3 datagen.py, the following error occurs:

Traceback (most recent call last):
  File "datagen.py", line 540, in <module>
    test()
  File "datagen.py", line 531, in test
    for images, loc_targets, cls_targets, mask_targets in dataloader:
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 310, in __iter__
    return DataLoaderIter(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 180, in __init__
    self._put_indices()
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 219, in _put_indices
    indices = next(self.sample_iter, None)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 119, in __iter__
    for idx in self.sampler:
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /pytorch/torch/lib/TH/generic/THTensorMath.c:2184

Thanks!
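
The last frame of the traceback calls torch.randperm(len(self.data_source)), so one plausible cause is a dataset of length 0, for example a missing or empty annotation list. A minimal sketch of a check to run before building the DataLoader, assuming one sample per non-blank line; the file layout and the helper names are hypothetical and not taken from datagen.py.

```python
import os

def count_samples(list_file):
    """Count the non-blank lines of an annotation list file."""
    if not os.path.isfile(list_file):
        raise FileNotFoundError("annotation list not found: " + list_file)
    with open(list_file) as f:
        return sum(1 for line in f if line.strip())

def check_not_empty(list_file):
    """Fail early with a readable message instead of inside the sampler."""
    n = count_samples(list_file)
    if n == 0:
        raise ValueError("no samples listed in " + list_file)
    return n
```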

training label

Thanks for sharing the code here.
I have a question: a text bounding box may be inclined in an image, so (xmin, ymin, xmax, ymax) is not enough to describe it; for example, we may need three points to determine an inclined bounding box. Why do you use only (xmin, ymin, xmax, ymax) for the training labels?
Thanks!
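
For reference, an axis-aligned (xmin, ymin, xmax, ymax) label can always be derived from an inclined box by taking the extremes of its corner points, at the cost of discarding the angle. A minimal sketch; `quad_to_aabb` and the sample corners are hypothetical and not part of the repo.

```python
def quad_to_aabb(points):
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) enclosing the corners."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Four corners of an inclined text box (made-up coordinates).
quad = [(10, 20), (50, 10), (55, 30), (15, 40)]
box = quad_to_aabb(quad)
```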

Decoding is very slow

I tested your code with image size 512, and it takes a lot of time to decode.

Elapsed time of pred : 91.725ms
Decoding..
Elapsed time of decode : 114360.36300000001ms
Avg. elapsed time of pred : 153.09623809523805ms
Avg. elapsed time of decode : 65703.0309047619ms

I learned that the NMS function runs slowly on images with many objects. How can I improve its performance?
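
One common way to cut non-maximum suppression time is to discard low-score candidates and cap the number of boxes before any pairwise IoU is computed. The sketch below is a plain-Python greedy NMS written without knowledge of the repo's decode function; the function names and the threshold defaults are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5, score_thresh=0.05, top_k=1000):
    """Greedy NMS with score filtering and a top-k cap applied first."""
    # Pre-filter: most candidates are low-score and never need an IoU test.
    order = [i for i in range(len(scores)) if scores[i] > score_thresh]
    order.sort(key=lambda i: scores[i], reverse=True)
    order = order[:top_k]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.01]
keep = nms(boxes, scores)
```

The pairwise step is quadratic in the number of surviving boxes, so the pre-filter matters most when the network emits many candidates.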

Prepare dataset

Hi, I've downloaded a public dataset with annotations and followed the instructions in the README, but I'm not sure whether I can just proceed like that.
I see there is a resize function in datagen.py. Does that mean I can include images of different sizes, including rectangular images? Also, if images are resized, will the annotations be affected? Should I change them to relative values instead?

Thanks in advance!
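
When an image is resized, boxes given in absolute pixels have to be scaled by the same horizontal and vertical factors. A minimal sketch of that arithmetic; whether datagen.py already does this internally is not confirmed here, and `resize_boxes` is a hypothetical helper.

```python
def resize_boxes(boxes, orig_size, new_size):
    """Scale (xmin, ymin, xmax, ymax) pixel boxes from orig_size to new_size.

    Both sizes are (width, height) tuples.
    """
    sx = new_size[0] / orig_size[0]
    sy = new_size[1] / orig_size[1]
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]

# A 1000x500 image squeezed to 512x512: x shrinks, y stretches slightly.
scaled = resize_boxes([(100, 50, 300, 150)], (1000, 500), (512, 512))
```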

Error while training

Traceback (most recent call last):
File "train.py", line 192, in
train(epoch)
File "train.py", line 133, in train
loss = ((loc_loss + cls_loss) / num_matched_anchors) + mask_loss
RuntimeError: invalid argument 3: divide by zero at /pytorch/torch/lib/THC/generic/THCTensorMathPairwise.cu:88

This error occurs while training the model. How should I solve it?
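
The failing line divides by num_matched_anchors, so the error is consistent with a batch in which no anchor matched any ground-truth box. A common guard is to clamp the divisor to at least 1. The sketch below uses plain floats for brevity and assumes that zero matches is indeed the cause; `normalize_loss` is a hypothetical helper, not code from train.py.

```python
def normalize_loss(loc_loss, cls_loss, mask_loss, num_matched_anchors):
    """Combine the losses as in the failing line, guarding against zero matches."""
    denom = max(1, num_matched_anchors)  # never divide by zero
    return (loc_loss + cls_loss) / denom + mask_loss

no_match = normalize_loss(2.0, 4.0, 0.5, num_matched_anchors=0)
three_matches = normalize_loss(2.0, 4.0, 0.5, num_matched_anchors=3)
```

If batches with zero matches are frequent, it may also be worth checking the annotations and anchor sizes, since the guard only hides the symptom.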

Type Error

Epoch: 0
Traceback (most recent call last):
File "train.py", line 194, in
train(epoch)
File "train.py", line 118, in train
for batch_idx, (inputs, loc_targets, cls_targets, mask_targets) in enumerate(trainloader):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 336, in next
return self._process_next_batch(batch)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/xendity/SSTDNet/datagen.py", line 492, in collate_fn
loc_target, cls_target = self.data_encoder.encode(boxes[i], labels[i], input_size=(max_w,max_h))
File "/home/xendity/SSTDNet/encoder.py", line 92, in encode
anchor_boxes = self._get_anchor_boxes(input_size)
File "/home/xendity/SSTDNet/encoder.py", line 66, in _get_anchor_boxes
xy = (xy*grid_size).view(fm_h,fm_w,1,2).expand(fm_h,fm_w,9,2)
RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'other'

Hi, I ran train.py and got two or three type errors like this. How should I modify the code?

anchor areas

I have a question about anchor_areas. In encoder.py, anchor_areas is set to [16*16., 32*32., ..., 256*256.], and I would like to know why these values were chosen. I think they are related to the feature maps, but I can't work out the exact relation.
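
As background, the RetinaNet paper assigns one anchor area to each pyramid level (32x32 to 512x512 on levels with strides 8 to 128, so the anchor side is 4 times the stride) and varies each area over 3 aspect ratios and 3 scales to get 9 anchors per location. The sketch below shows that arithmetic for one area. The ratio and scale defaults are the RetinaNet values and are an assumption here; how this repo maps its areas to feature maps is not confirmed.

```python
import math

def anchor_sizes(area,
                 aspect_ratios=(0.5, 1.0, 2.0),
                 scale_ratios=(1.0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Return (width, height) pairs for every ratio/scale combination of one area."""
    sizes = []
    for ar in aspect_ratios:           # ar = width / height
        h = math.sqrt(area / ar)       # chosen so that w * h == area
        w = ar * h
        for sr in scale_ratios:        # enlarge the base anchor by each scale
            sizes.append((w * sr, h * sr))
    return sizes

sizes_16 = anchor_sizes(16 * 16.)
```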

about text detection

The original paper is about text detection, so why does this repo say “This code is work for general object detection problem. not for (oriented) text detection problem”?
