Git Product home page Git Product logo

sku110k_cvpr19's Introduction

๐Ÿ†• CVPR 2020 challenge ๐Ÿ†•

SKU-110K

Eran Goldman* , Roei Herzig* , Aviv Eisenschtat*, Jacob Goldberger, Tal Hassner

Dataset and Codebase for CVPR2019 "Precise Detection in Densely Packed Scenes" [Paper link]

A typical image in our SKU-110K, showing densely packed objects. (a) Detection results for the state-of-the-art RetinaNet[2], showing incorrect and overlapping detections, especially for the dark objects at the bottom which are harder to separate. (b) Our results showing far fewer misdetections and better fitting bounding boxes. (c) Zoomed-in views for RetinaNet[2] and (d) our method.

Our novel contributions are:

  1. Soft-IoU layer, added to an object detector to estimate the Jaccard index between the detected box and the (unknown) ground truth box.
  2. EM-Merger unit, which converts detections and Soft-IoU scores into a MoG (Mixture of Gaussians), and resolves overlapping detections in packed scenes.
  3. A new dataset and benchmark, the store keeping unit, 110k categories (SKU-110K), for item detection in store shelf images from around the world.

Introduction

In our SKU-110K paper[1] we focus on detection in densely packed scenes, where images contain many objects, often looking similar or even identical, positioned in close proximity. These scenes are typically man-made, with examples including retail shelf displays, traffic, and urban landscape images. Despite the abundance of such environments, they are under-represented in existing object detection benchmarks, therefore, it is unsurprising that state-of-the-art object detectors are challenged by such images.

Method

We propose learning the Jaccard index with a soft Intersection over Union (Soft-IoU) network layer. This measure provides valuable information on the quality of detection boxes. Those detections can be represented as a Mixture of Gaussians (MoG), reflecting their locations and their Soft-IoU scores. Then, an Expectation-Maximization (EM) based method is then used to cluster these Gaussians into groups, resolving detection overlap conflicts.

System diagram: (a) Input image. (b) A base network, with bounding box (BB) and objectness (Obj.) heads, along with our novel Soft-IoU layer. (c) Our EM-Merger converts Soft-IoU to Gaussian heat-map representing (d) objects captured by multiple, overlapping bounding boxes. (e) It then analyzes these box clusters, producing a single detection per object

Dataset

We compare between key properties for related benchmarks. #Img.: Number of images. #Obj./img.: Average items per image. #Cls.: Number of object classes (more implies a harder detection problem due to greater appearance variations). #Cls./img.: Average classes per image. Dense: Are objects typically densely packed together, raising potential overlapping detection problems?. Idnt: Do images contain multiple identical objects or hard to separate object sub-regions?. BB: Bounding box labels available for measuring detection accuracy?.

The dataset is provided for the exclusive use by the recipient and solely for academic and non-commercial purposes.

CVPR 2020 Challenge

The detection challenge will be held in CVPR 2020 Retail-Vision workshop. Please visit our workshop page for more information. The data and evaluation code are available in the challenge page.

Qualitative Results

Qualitative detection results on SKU-110K.

Notes

Please note that the main part of the code has been released, though we are still testing it to fix possible glitches. Thank you.

This implementation is built on top of https://github.com/fizyr/keras-retinanet. The SKU110K dataset is provided in csv format compatible with the code CSV parser.

Dependencies include: keras, keras-resnet, six, scipy. Pillow, pandas, tensorflow-gpu, tqdm This repository requires Keras 2.2.4 or higher, and was tested using Python 3.6.5, Python 2.7.6 and OpenCV 3.1.

The output files will be saved under "$HOME"/Documents/SKU110K and have the same structure as in https://github.com/fizyr/keras-retinanet: The weight h5 files will are saved in the "snapshot" folder and the tensorboard log files are saved in the "logs" folder.

Note that we have made several upgrades to the baseline detector since the beginning of this research, so the latest version can actually achieve even higher results than the ones originally reported.

The EM-merger provided here is the stable version (not time-optimized). Some of the changes required for optimization are mentioned in the TO-DO comments.

Contributions to this project are welcome.

Usage

Move the unzipped SKU110K folder to "$HOME"/Documents

Set $PYTHONPATH to the repository root

e.g. from this repository: export PYTHONPATH=$(pwd)

train:

(1) Train the base model: python -u object_detector_retinanet/keras_retinanet/bin/train.py csv

(2) Train the IoU layer:

python -u object_detector_retinanet/keras_retinanet/bin/train_iou.py --weights WEIGHT_FILE csv where WEIGHT_FILE is the full path to the h5 file from step (1)

e.g.: python -u object_detector_retinanet/keras_retinanet/bin/train_iou.py --gpu 0 --weights "/home/ubuntu/Documents/SKU110K/snapshot/Thu_May__2_17:07:11_2019/resnet50_csv_10.h5" csv | tee train_iou_sku110k.log

(3) predict:

python -u object_detector_retinanet/keras_retinanet/bin/predict.py csv WEIGHT_FILE [--hard_score_rate=RATE] where WEIGHT_FILE is the full path to the h5 file from step (2), and 0<=RATE<=1 computes the confidence as a weighted average between soft and hard scores.

e.g: nohup env PYTHONPATH="/home/ubuntu/dev/SKU110K" python -u object_detector_retinanet/keras_retinanet/bin/predict.py --gpu 3 csv "/home/ubuntu/Documents/SKU110K/snapshot/Thu_May__2_17:10:30_2019/iou_resnet50_csv_07.h5" --hard_score_rate=0.5 | tee predict_sku110k.log

The results are saved in CSV format in the "results" folder and drawn in "res_images_iou" folder.

References

[1] Eran Goldman*, Roei Herzig*, Aviv Eisenschtat*, Jacob Goldberger, Tal Hassner, Precise Detection in Densely Packed Scenes, 2019.

[2] Tsung-Yi Lin, Priyal Goyal, Ross Girshick, Kaiming He, Piotr Dollar, Focal loss for dense object detection, 2018.

Citation

@inproceedings{goldman2019dense,
 author    = {Eran Goldman and Roei Herzig and Aviv Eisenschtat and Jacob Goldberger and Tal Hassner},
 title     = {Precise Detection in Densely Packed Scenes},
 booktitle = {Proc. Conf. Comput. Vision Pattern Recognition (CVPR)},
 year      = {2019}
}

sku110k_cvpr19's People

Contributors

roeiherz avatar eg4000 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.