Contributor: jl749

YOLO architecture

YOLO = a single regression problem: one network pass over the image predicts class probabilities + bounding boxes

Pros

  • Fast and simple, with higher accuracy than the other real-time detection systems of its time
  • Network learns generalized object representations (e.g., trained on natural images --> inference also works on artwork)
  • Low background FP (False Positive: predicted Pos when actual is Neg), because the network sees the whole image when predicting

Cons

  • Lower accuracy than slower state-of-the-art detectors, with localization errors especially on small objects
  • Cannot predict small objects clustered together
    (YOLO imposes strong spatial constraints on bounding box predictions since each grid cell only predicts two boxes and can only have one class. This spatial constraint limits the number of nearby objects that the model can predict. The model struggles with small objects that appear in groups, such as flocks of birds.)
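
The spatial constraint can be made concrete with a small sketch. It assumes each object is assigned to the grid cell that contains its box center, and counts how many centers land in the same cell; the function names `cell_of` and `crowded_cells` are hypothetical, not part of any library.

```python
from collections import Counter

S = 7  # grid size used in these notes


def cell_of(x, y, s=S):
    """Return the (row, col) of the grid cell containing the point (x, y), both in 0 ~ 1."""
    col = min(int(x * s), s - 1)  # clamp so that x == 1.0 stays inside the grid
    row = min(int(y * s), s - 1)
    return row, col


def crowded_cells(centers, s=S):
    """Map each cell holding more than one object center to its object count."""
    counts = Counter(cell_of(x, y, s) for x, y in centers)
    return {cell: n for cell, n in counts.items() if n > 1}


# Three small objects close together (e.g. birds in a flock) plus one isolated object.
centers = [(0.50, 0.50), (0.52, 0.51), (0.55, 0.53), (0.10, 0.10)]
print(crowded_cells(centers))
```

Every cell reported here holds several objects but has only one set of class probabilities and B boxes, so some of those objects cannot be predicted.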

Architecture

image
C = 20 (classes), S = 7 (grid size), B = 2 (bounding boxes per grid cell)

Process

  1. Divide the input image into an S*S grid
  2. Each grid cell predicts B bounding boxes, and each bounding box carries a confidence score (0 in case of no obj)
    confidence = P(obj) * IOU(expected, pred), IOU = 0 ~ 1
    1 Bounding Box = (confidence, x, y, w, h); x, y = relative to the grid cell (0 ~ 1); w, h = relative to the whole img (0 ~ 1)
    image
  3. Each grid cell also predicts C conditional class probabilities
    P(class_i | obj)
  4. Output shape: (BATCH_SIZE, S, S, B*5+C), i.e. (BATCH_SIZE, 7, 7, 30) for S = 7, B = 2, C = 20
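
A minimal sketch of the output layout described in the steps above. The ordering inside a cell's vector (B boxes first, then C class probabilities) is an assumption, since the notes only give the total size B*5+C; the helper names `output_shape`, `split_cell`, and `cell_box_to_image` are hypothetical.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (values from these notes)


def output_shape(batch_size, s=S, b=B, c=C):
    """Shape of the prediction tensor: (BATCH_SIZE, S, S, B*5+C)."""
    return (batch_size, s, s, b * 5 + c)


def split_cell(vector, b=B, c=C):
    """Split one cell's vector into B boxes and C class probabilities.

    ASSUMED layout: B boxes first, each (confidence, x, y, w, h),
    followed by the C conditional class probabilities.
    """
    if len(vector) != b * 5 + c:
        raise ValueError(f"expected {b * 5 + c} values, got {len(vector)}")
    boxes = [tuple(vector[i * 5:(i + 1) * 5]) for i in range(b)]
    class_probs = list(vector[b * 5:])
    return boxes, class_probs


def cell_box_to_image(row, col, x, y, w, h, s=S):
    """Convert a cell-relative (x, y) to image-relative coordinates (0 ~ 1).

    w and h are already relative to the whole image, so they pass through.
    """
    return ((col + x) / s, (row + y) / s, w, h)


print(output_shape(16))
print(cell_box_to_image(3, 1, 0.5, 0.5, 0.2, 0.3))
```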

Testing --> class specific confidence score
image

Inference

For each predicted bounding box --> confidence score * conditional class probability
P(obj) * IOU(expected, pred) * P(class_i | obj) = P(class_i) * IOU(expected, pred)
image
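
As a rough illustration of the product above, the sketch below computes the class-specific confidence scores for one grid cell: one row per bounding box, one column per class. The function name `class_specific_scores` and the toy numbers are made up for the example.

```python
def class_specific_scores(box_confidences, class_probs):
    """Return a B x C table: box confidence * conditional class probability.

    box_confidences: the B confidence scores of one cell, P(obj) * IOU
    class_probs:     the C conditional class probabilities, P(class_i | obj)
    """
    return [[conf * p for p in class_probs] for conf in box_confidences]


# Toy cell with B = 2 boxes and C = 2 classes (illustrative values only).
scores = class_specific_scores([0.8, 0.1], [0.5, 0.25])
for box_scores in scores:
    print(box_scores)
```

Both boxes of a cell share the same class probabilities, so they differ only through their own confidence score.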

Apply non-max suppression to the class-specific confidence scores of all 98 bounding boxes (S*S*B = 7*7*2)
image
--> gives the final object class and bounding box location of each detection
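
A minimal sketch of greedy per-class non-max suppression, assuming boxes are given as (x_center, y_center, w, h) in image-relative units. The thresholds `iou_threshold=0.5` and `score_threshold=0.2` are illustrative defaults, not values taken from these notes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_center, y_center, w, h) boxes."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5, score_threshold=0.2):
    """Keep the best-scoring box among overlapping boxes of the same class.

    detections: list of (class_id, score, x, y, w, h) tuples.
    """
    # Drop low-scoring boxes, then visit the rest from best to worst.
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Suppress det if a kept box of the same class overlaps it too much.
        if all(det[0] != k[0] or iou(det[2:], k[2:]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


detections = [
    (14, 0.9, 0.50, 0.5, 0.4, 0.4),
    (14, 0.6, 0.52, 0.5, 0.4, 0.4),  # near-duplicate of the first box
    (11, 0.7, 0.50, 0.5, 0.4, 0.4),  # same location, different class
    (14, 0.1, 0.10, 0.1, 0.1, 0.1),  # below the score threshold
]
print(non_max_suppression(detections))
```

Boxes of different classes never suppress each other in this sketch, which is why the class 11 box survives even though it coincides with a class 14 box.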

Dataset

Uses the Pascal VOC dataset

Example label txt file, one object per line:
[class_label, x, y, w, h]

11 0.27232142857142855 0.623 0.5386904761904762 0.306
14 0.8125 0.518 0.375 0.892
14 0.3214285714285714 0.447 0.369047619047619 0.5660000000000001
14 0.22767857142857142 0.6 0.3898809523809524 0.47600000000000003

x, y, w, h = position and size scaled to 0 ~ 1 relative to the whole image
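
A sketch of how one label line could be parsed and mapped onto the S*S grid. It assumes x, y are the box center (the usual convention for this label format, but not stated in these notes); `parse_label_line` and `to_grid_cell` are hypothetical helper names.

```python
def parse_label_line(line):
    """Parse 'class_label x y w h' into (int, float, float, float, float)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    class_label = int(fields[0])
    x, y, w, h = (float(v) for v in fields[1:])
    return class_label, x, y, w, h


def to_grid_cell(x, y, s=7):
    """Return (row, col, x_cell, y_cell) for an image-relative point.

    ASSUMPTION: (x, y) is the box center. x_cell and y_cell are the
    offsets inside the cell, each in 0 ~ 1.
    """
    col = min(int(x * s), s - 1)  # clamp so that x == 1.0 stays inside the grid
    row = min(int(y * s), s - 1)
    return row, col, x * s - col, y * s - row


label = parse_label_line("14 0.8125 0.518 0.375 0.892")
print(label)
print(to_grid_cell(label[1], label[2]))
```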

train.csv, test.csv (each row: image file, label file)

000005.jpg,000005.txt
000007.jpg,000007.txt
000009.jpg,000009.txt
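
The image/label pairs can be read with the standard `csv` module. This is a sketch that assumes the files have no header row, as in the sample rows above; `read_pairs` is a hypothetical helper, and an in-memory string stands in for train.csv.

```python
import csv
import io


def read_pairs(fileobj):
    """Read (image_file, label_file) pairs from an open CSV file object."""
    pairs = []
    for row in csv.reader(fileobj):
        if not row:  # skip blank lines
            continue
        image_file, label_file = row[0].strip(), row[1].strip()
        pairs.append((image_file, label_file))
    return pairs


# In-memory stand-in for train.csv, using the rows shown above.
sample = io.StringIO("000005.jpg,000005.txt\n000007.jpg,000007.txt\n000009.jpg,000009.txt\n")
pairs = read_pairs(sample)
print(pairs)
```

With a real file, the same function can be called as `read_pairs(open("train.csv", newline=""))`.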
