
nerminsamet / houghnet


[ECCV-20] Official PyTorch implementation of HoughNet, a voting-based object detector.

Shell 1.88% Python 74.71% C++ 12.74% Cuda 8.77% C 0.81% Makefile 0.01% Cython 1.07%
object-detection deep-learning pytorch voting voting-classifier bottom-up-model hough-transform hough hough-transformation instance-segmentation

houghnet's Introduction

HoughNet: Integrating near and long-range evidence for bottom-up object detection

Official PyTorch implementation of HoughNet.

HoughNet: Integrating near and long-range evidence for bottom-up object detection,
Nermin Samet, Samet Hicsonmez, Emre Akbas,
ECCV 2020. (arXiv pre-print)

Extended HoughNet with new tasks.

HoughNet: Integrating near and long-range evidence for visual detection,
Nermin Samet, Samet Hicsonmez, Emre Akbas,
TPAMI 2022. (arXiv pre-print)

Updates

(August, 2022) Our extended paper has been accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

(April, 2021) We extended HoughNet with other visual detection tasks: video object detection, instance segmentation, keypoint detection and 3D object detection.

  • We extended the voting idea to the temporal domain by developing a new video object detection method. Code is available at the HoughNet-VID repo.
  • Inspired by BlendMask, we extended HoughNet for instance segmentation. More details regarding training and network architecture are in the paper and supplementary material.
  • We showed the effectiveness of HoughNet for keypoint detection and 3D object detection.
  • We improved the source code of HoughNet by increasing its modularity and training speed.

More details can be found in the arXiv pre-print.

Summary

Object detection methods typically rely on only local evidence. For example, to detect the mouse in the image below, only the features extracted at/around the mouse are used. In contrast, HoughNet is able to utilize long-range (i.e. far away) evidence, too. Below, on the right, the votes that support the detection of the mouse are shown: in addition to the local evidence, far away but semantically relevant objects, the two keyboards, vote for the mouse.

HoughNet is a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet achieves 46.4 AP (and 65.1 AP50), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of HoughNet in another task, namely "labels to photo" image generation, by integrating the voting module into two different GAN models and showing that the accuracy is significantly improved in both cases.
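The voting step described above can be illustrated with a small pure-Python sketch for a single class. It is only an illustration under stated assumptions, not the repository's implementation: the function names `region_index` and `accumulate_votes`, the ring radii, and the number of angle bins are all hypothetical choices, and the nested loops favor readability over speed.

```python
import math


def region_index(dy, dx, ring_radii, num_angle_bins):
    """Map an offset (dy, dx) to a log-polar region id.

    Region 0 is the voter's own cell. Ring k covers radii in
    (ring_radii[k-1], ring_radii[k]] and is split into num_angle_bins
    angular sectors. Returns None if the offset lies outside the field.
    """
    if dy == 0 and dx == 0:
        return 0
    radius = math.hypot(dy, dx)
    angle = math.atan2(dy, dx) % (2 * math.pi)
    angle_bin = min(int(angle / (2 * math.pi) * num_angle_bins),
                    num_angle_bins - 1)
    inner = 0.0
    for ring, outer in enumerate(ring_radii):
        if inner < radius <= outer:
            return 1 + ring * num_angle_bins + angle_bin
        inner = outer
    return None


def accumulate_votes(evidence, ring_radii, num_angle_bins):
    """Sum the votes cast on every location.

    evidence[r][y][x] is the score that the voter at (y, x) casts onto
    every location inside region r of the vote field centered on it.
    """
    height, width = len(evidence[0]), len(evidence[0][0])
    reach = math.floor(ring_radii[-1])
    presence = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy in range(-reach, reach + 1):
                for dx in range(-reach, reach + 1):
                    ty, tx = y + dy, x + dx
                    if not (0 <= ty < height and 0 <= tx < width):
                        continue
                    r = region_index(dy, dx, ring_radii, num_angle_bins)
                    if r is not None:
                        presence[ty][tx] += evidence[r][y][x]
    return presence


# Toy example (assumed values): a 5x5 map, two rings, four angle bins,
# hence 1 + 2 * 4 = 9 regions. Two distant voters support the center.
ring_radii = [1.5, 3.0]
num_angle_bins = 4
evidence = [[[0.0] * 5 for _ in range(5)] for _ in range(9)]
evidence[region_index(0, 2, ring_radii, num_angle_bins)][2][0] = 1.0   # left voter
evidence[region_index(0, -2, ring_radii, num_angle_bins)][2][4] = 0.5  # right voter
presence = accumulate_votes(evidence, ring_radii, num_angle_bins)
print(presence[2][2])  # 1.5
```

In the toy example the center location receives no local evidence at all, yet it ends up with a presence score of 1.5 because two voters two cells away both cast votes into the region that contains it. This mirrors the mouse-and-keyboards illustration in the summary.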

Highlights

  • The Hough voting idea is applied through a log-polar vote field to utilize short and long-range evidence in a deep learning model for generic object detection.
  • Our best single model achieves 46.4 AP on COCO test-dev.
  • HoughNet is effective for small objects (+2.5 AP points over the baseline).
  • We provide Hough voting as a module that can be used in other works.
  • We provide COCO minitrain as a mini training set for COCO. It is useful for hyperparameter tuning and reducing the cost of ablation experiments. The performance of a model trained on minitrain is strongly positively correlated with the performance of the same model trained on train2017. For experiments, object instance statistics and download, please refer to COCO minitrain.

A step-by-step animation of the voting process is provided here.

Object Detection Results on COCO val2017

| Backbone | AP / AP50 | Multi-scale AP / AP50 |
|---|---|---|
| Hourglass-104 | 43.0 / 62.2 | 46.1 / 64.6 |
| ResNet-101 w DCN | 37.2 / 56.5 | 41.5 / 61.5 |
| ResNet-101 | 36.0 / 55.2 | 40.7 / 60.6 |

Instance Segmentation Results on COCO val2017

| Model | AP / AP50 | Box AP / AP50 |
|---|---|---|
| Baseline | 27.2 / 46.4 | 33.9 / 51.3 |
| HoughNet | 28.4 / 48.0 | 35.0 / 52.9 |

2D Keypoint Detection Results on COCO val2017

| Model | AP / AP50 | Box AP / AP50 |
|---|---|---|
| Voting for Person Class. | 56.9 / 81.6 | 50.1 / 71.4 |
| Voting for Keypoint Est. | 56.8 / 81.5 | 50.2 / 70.9 |
| Voting for Both | 56.9 / 81.6 | 50.4 / 71.7 |

All models can be found in the Model zoo.

Installation

Please refer to INSTALL.md for installation instructions.

Evaluation and Training

For evaluation and training details please refer to GETTING_STARTED.md.

Acknowledgement

This work was supported by the AWS Cloud Credits for Research program and by the Scientific and Technological Research Council of Turkey (TUBITAK) through the project titled "Object Detection in Videos with Deep Neural Networks" (grant number 117E054). The numerical calculations reported in this paper were partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources). We also thank the authors of CenterNet for their clean code and inspiring work.

License

HoughNet is released under the MIT License (refer to the LICENSE file for details). We developed HoughNet on top of CenterNet. Please refer to the license of CenterNet for more details.

Citation

If you find HoughNet useful for your research, please cite our paper as follows.

N. Samet, S. Hicsonmez, E. Akbas, "HoughNet: Integrating near and long-range evidence for bottom-up object detection", In European Conference on Computer Vision (ECCV), 2020.

N. Samet, S. Hicsonmez, E. Akbas, "HoughNet: Integrating near and long-range evidence for visual detection", arXiv, 2021.

BibTeX entry:

@inproceedings{HoughNet,
  author = {Nermin Samet and Samet Hicsonmez and Emre Akbas},
  title = {HoughNet: Integrating near and long-range evidence for bottom-up object detection},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020},
}
@misc{HoughNet2021,
  title = {HoughNet: Integrating near and long-range evidence for visual detection},
  author = {Nermin Samet and Samet Hicsonmez and Emre Akbas},
  year = {2021},
}


houghnet's Issues

coco minitrain code

Hi,
Can you release the code that generates the coco-minitrain dataset? I want to test my model on different proportions of COCO, but coco-minitrain only contains 20% of the COCO images.
Thanks!
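
For anyone who needs other proportions in the meantime, a rough workaround is to draw a uniform random subset of images from a COCO-style annotation file. The sketch below assumes the standard COCO layout with `images` and `annotations` lists; the helper name `subsample_coco` is hypothetical, and this plain random sampling makes no attempt to reproduce how COCO minitrain itself was built.

```python
import random


def subsample_coco(coco, fraction, seed=0):
    """Return a shallow copy of a COCO-style dict with a random subset of images.

    Keeps round(len(images) * fraction) images (at least one) and only the
    annotations whose image_id belongs to a kept image. Other keys, such as
    "categories", are carried over unchanged.
    """
    rng = random.Random(seed)
    images = coco["images"]
    keep_count = max(1, round(len(images) * fraction))
    kept_images = rng.sample(images, keep_count)
    kept_ids = {image["id"] for image in kept_images}
    subset = dict(coco)
    subset["images"] = kept_images
    subset["annotations"] = [
        ann for ann in coco["annotations"] if ann["image_id"] in kept_ids
    ]
    return subset


# Toy annotation dict standing in for a real file loaded with json.load().
toy = {
    "images": [{"id": i} for i in range(10)],
    "annotations": [{"id": 100 + i, "image_id": i % 10} for i in range(30)],
    "categories": [{"id": 1, "name": "mouse"}],
}
subset = subsample_coco(toy, fraction=0.2)
print(len(subset["images"]), len(subset["annotations"]))  # 2 6
```

Fixing `seed` keeps the subset reproducible across runs, which matters when comparing models trained on the same proportion.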
