yolov3's People

Contributors

jl749

yolov3's Issues

MeanAveragePrecision (mAP)

YOLOv3 architecture


YOLOv3 makes predictions across 3 different scales (13x13, 26x26, 52x52 in the case of a 416x416 input).

The detection layers make predictions on feature maps of three different sizes, with strides 32, 16, and 8:

416/32 = 13
416/16 = 26
416/8 = 52

In total YOLOv3 predicts ((52 x 52) + (26 x 26) + (13 x 13)) x 3 = 10647 bounding boxes.
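As a quick sanity check, that arithmetic can be reproduced directly:

```python
# total predictions = (sum of grid cells over the three scales) * 3 anchors per cell
scales = [52, 26, 13]
total_boxes = sum(s * s for s in scales) * 3
print(total_boxes)  # 10647
```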

Detection is done by applying a 1x1 convolution kernel to the feature maps.

YOLOv3 uses independent logistic classifiers in place of the softmax function to determine the classes present in an image, and it replaces the mean squared error with binary cross-entropy loss. In simpler terms, both the objectness probability and the class predictions are made with logistic regression.
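A minimal sketch of such a detection head (the 1024-channel input and 20 VOC classes are assumptions here, not taken from the repo): a 1x1 convolution produces 3 x (5 + num_classes) channels per cell, and sigmoids give independent objectness/class probabilities instead of a softmax.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 20  # assumption: PASCAL VOC
head = nn.Conv2d(1024, 3 * (5 + NUM_CLASSES), kernel_size=1)  # hypothetical 1x1 detection head

feat = torch.randn(1, 1024, 13, 13)                   # coarsest feature map (stride 32)
out = head(feat).view(1, 3, 5 + NUM_CLASSES, 13, 13)  # split the 3 anchors out of the channels
out = out.permute(0, 1, 3, 4, 2)                      # (1, 3, 13, 13, 5 + NUM_CLASSES)
obj = torch.sigmoid(out[..., 0])                      # logistic objectness, no softmax
cls = torch.sigmoid(out[..., 5:])                     # independent per-class logistic classifiers
```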

Further reading

YOLO limitation

Each grid cell predicts only one object.

assert targets[0][torch.where(targets[0][..., 0] == 1)].shape[0] == len(bboxes) # make sure (3, 13, 13, 6) contains target labels
assert targets[1][torch.where(targets[1][..., 0] == 1)].shape[0] == len(bboxes) # make sure (3, 26, 26, 6) contains target labels
assert targets[2][torch.where(targets[2][..., 0] == 1)].shape[0] == len(bboxes) # make sure (3, 52, 52, 6) contains target labels

These 3 lines of code will throw an AssertionError.

If a grid cell (at 13x13, 26x26, or 52x52) is already reserved for an object, a new object cannot be assigned to it.
This is one reason why YOLO makes predictions at 3 scales: even if object A is missed at one scale, another scale can still cover it.
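A tiny sketch of the collision with two made-up box centers: at S=13 they fall into the same cell, while the finer S=52 grid separates them.

```python
def cell_of(x, y, s):
    """Return the (row, col) grid cell that owns a normalized box center at scale s."""
    return int(s * y), int(s * x)

box_a = (0.50, 0.50)  # hypothetical object centers
box_b = (0.52, 0.51)
print(cell_of(*box_a, 13), cell_of(*box_b, 13))  # (6, 6) (6, 6) -- collision
print(cell_of(*box_a, 52), cell_of(*box_b, 52))  # (26, 26) (26, 27) -- separated
```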

YOLO dataset

pre-defined anchors (common object width/height ratios found via K-means clustering)

anchors = tensor([[0.2800, 0.2200],  # pre defined
                  [0.3800, 0.4800],
                  [0.9000, 0.7800],
                  [0.0700, 0.1500],
                  [0.1500, 0.1100],
                  [0.1400, 0.2900],
                  [0.0200, 0.0300],
                  [0.0400, 0.0700],
                  [0.0800, 0.0600]])

label txt file

# class, x, y, w, h
8 0.764 0.6069277108433735 0.23600000000000002 0.3042168674698795
8 0.594 0.6159638554216867 0.188 0.29819277108433734
14 0.229 0.6445783132530121 0.166 0.45180722891566266
14 0.39 0.6430722891566265 0.168 0.4307228915662651
14 0.5650000000000001 0.5918674698795181 0.154 0.41867469879518077
14 0.787 0.5963855421686747 0.166 0.3855421686746988

all x, y, w, h values are normalized to the 0~1 range
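A hypothetical parser (not from the repo) for one such label line:

```python
def parse_label_line(line):
    """Parse 'class x y w h' into an int class id and four normalized floats."""
    cls, *coords = line.split()
    return int(cls), [float(v) for v in coords]

cls_id, (x, y, w, h) = parse_label_line("8 0.764 0.6069277108433735 0.236 0.3042168674698795")
print(cls_id, x, y, w, h)
```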

VOCDataset

VOCDataset (a torch.utils.data.Dataset) overrides the __getitem__ method to convert the label.txt x, y, w, h values into each scale's cell-relative coordinates (e.g. [0.9320, 0.4223, 3.0680, 2.6239] is the coordinate relative to the first anchor box in scale 0)

RETURN --> img: (C, H, W) && expected_bbox_info: ( (3, 13, 13, 6), (3, 26, 26, 6), (3, 52, 52, 6) )

loop expected bboxes (txt file):
    coor_from_txt = [0.764, 0.6069277108433735, 0.23600000000000002, 0.3042168674698795]
    IoU_wh(coor_from_txt[2:4], anchors)  # calculate IoU with width and height
    IoU_arg_sorted = [0, 5, 1, 4, 3, 2, 8, 7, 6]  # coor_from_txt most likely to match the first anchor box in scale0
    anchor_indices = IoU_arg_sorted

    highest IoU anchor box ratio = (0.28, 0.22) <-- from index 0
    index 0 means the target belongs to the first prediction scale (3, 13, 13, 6), using the first of its three anchor boxes
    
    now RESCALE...
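The width/height IoU used above only compares box shapes (it assumes the two boxes share a center). A pure-Python sketch, using the nine pre-defined anchors from the dataset section, reproduces the ordering from the walkthrough, followed by the RESCALE step for x and w:

```python
def iou_wh(box_wh, anchors):
    """IoU between one (w, h) pair and each anchor, assuming shared centers."""
    w, h = box_wh
    ious = []
    for aw, ah in anchors:
        inter = min(w, aw) * min(h, ah)   # overlap of co-centered boxes
        union = w * h + aw * ah - inter
        ious.append(inter / union)
    return ious

ANCHORS = [(0.28, 0.22), (0.38, 0.48), (0.90, 0.78),
           (0.07, 0.15), (0.15, 0.11), (0.14, 0.29),
           (0.02, 0.03), (0.04, 0.07), (0.08, 0.06)]

ious = iou_wh((0.236, 0.3042168674698795), ANCHORS)  # w, h from the first label line
order = sorted(range(len(ious)), key=lambda i: -ious[i])
print(order)  # [0, 5, 1, 4, 3, 2, 8, 7, 6] -- IoU_arg_sorted from the walkthrough

# RESCALE for scale 0 (S=13): cell index plus cell-relative coordinates
S, x, w = 13, 0.764, 0.236
j = int(S * x)        # column index 9
x_cell = S * x - j    # ~0.932
w_cell = S * w        # ~3.068, can exceed 1 since it is cell-relative
```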

there are 3 scales S = (13, 26, 52)
scale0 = first prediction (3, 13, 13, 6), 3 anchor boxes each with (obj_prob, x, y, w, h, class)
scale1 = second prediction (3, 26, 26, 6)
scale2 = third prediction (3, 52, 52, 6)

for box in bboxes:  # loop over expected bboxes
    iou_anchors = iou(torch.tensor(box[2:4]), self.anchors)  # IoU between the label box and all anchor box candidates
    anchor_indices = iou_anchors.argsort(descending=True, dim=0)
    x, y, width, height, class_label = box
    has_anchor = [False] * 3  # each scale should get one anchor
    for anchor_idx in anchor_indices:  # highest IoU to lowest
        scale_idx = anchor_idx // self.num_anchors_per_scale  # idx // 3 --> which scale (small, medium, big)
        anchor_on_scale = anchor_idx % self.num_anchors_per_scale  # which anchor within that scale?
        S = self.S[scale_idx]  # 13, 26, 52
        i, j = int(S * y), int(S * x)  # which cell
        anchor_taken = targets[scale_idx][anchor_on_scale, i, j, 0]  # object prob
        if not anchor_taken and not has_anchor[scale_idx]:  # obj prob == 0 && has_anchor[scale_idx] == False
            targets[scale_idx][anchor_on_scale, i, j, 0] = 1  # set object prob to 1
            x_cell, y_cell = S * x - j, S * y - i  # cell-relative x, y coordinates [0~1]
            w_cell, h_cell = width * S, height * S  # can be greater than 1 since they are relative to the cell
            box_coordinates = torch.tensor([x_cell, y_cell, w_cell, h_cell])
            targets[scale_idx][anchor_on_scale, i, j, 1:5] = box_coordinates  # x, y in [0~1]; w, h in [0~S]
            targets[scale_idx][anchor_on_scale, i, j, 5] = int(class_label)
            has_anchor[scale_idx] = True  # best anchor marked in this scale; move on to the next scale
        elif not anchor_taken and iou_anchors[anchor_idx] > self.ignore_iou_thresh:  # slot free, but IoU above the ignore threshold
            targets[scale_idx][anchor_on_scale, i, j, 0] = -1  # ignore this prediction in the loss
return image, tuple(targets)  # img, ( (3, 13, 13, 6), (3, 26, 26, 6), (3, 52, 52, 6) )

FIX mAP calculation bug

_device = pred_boxes[0].device
mAPs_per_class = [0] * num_classes
recalls_per_class = [0] * num_classes
precisions_per_class = [0] * num_classes
if all(tensor.nelement() == 0 for tensor in pred_boxes):  # no predictions at all
    return mAPs_per_class, recalls_per_class, precisions_per_class
pred_boxes = [torch.cat([torch.tensor(i, device=_device).repeat(p.shape[0], 1), p], dim=1)
              for i, p in enumerate(pred_boxes)]  # prepend the image index
detections = torch.stack(list(chain.from_iterable(pred_boxes)))
detections = detections[torch.argsort(detections[:, 2], descending=True)]  # sort by confidence (descending)
ground_truths = torch.stack(list(chain.from_iterable(true_boxes)))
for c in range(num_classes):
    # filter by class
    detections_c = detections[detections[:, 1] == c]
    ground_truths_c = ground_truths[ground_truths[:, -1] == c]
    _label_counts_per_img = [torch.zeros(tb.shape[0], dtype=torch.bool) for tb in true_boxes]
    TP = torch.zeros((len(detections_c)), dtype=torch.bool)
    for i, pred in enumerate(detections_c):  # one bbox at a time (high conf --> low conf)
        _img_idx = pred[0].long()
        labels = true_boxes[_img_idx].to(_device)  # compare labels and detections from the same img
        if labels.shape[0] == 0:  # no labels in this image
            continue
        # find the best-matching GT label from iou_matrix
        iou_matrix = torchvision.ops.box_iou(boxes1=labels[:, 0:4], boxes2=pred[3:7].unsqueeze(0))
        max_overlap, max_idx = torch.max(iou_matrix, dim=0)  # (label_count, pred_count) --> (pred_count,), pred_count is always 1
        if max_overlap.gt(iou_threshold) and not _label_counts_per_img[_img_idx][max_idx.item()].is_nonzero():
            TP[i] = True
            _label_counts_per_img[_img_idx][max_idx.item()] = True  # this GT label has been consumed
    TP_cumsum = torch.cumsum(TP, dim=0)
    FP_cumsum = torch.cumsum(~TP, dim=0)
    _recalls = TP_cumsum / ground_truths_c.shape[0]  # TP_cumsum / total_GT_boxes
    _precisions = TP_cumsum / (TP_cumsum + FP_cumsum)  # TP_cumsum / (TP + FP)
    _precisions = torch.cat([torch.tensor([1]), _precisions])  # start the PR curve at precision 1
    _recalls = torch.cat([torch.tensor([0]), _recalls])  # ... and recall 0
    # DEBUGGING ====================================================================================================
    # import matplotlib.pyplot as plt
    # plt.plot(_recalls, _precisions, c="blue")
    # plt.xlabel("recall")
    # plt.ylabel("precision")
    # plt.title(f"mAP_{iou_threshold}")
    # plt.grid(color="gray")
    # plt.show()
    # plt.close()
    # ==============================================================================================================
    mAP = torch.trapz(y=_precisions, x=_recalls)
    mAPs_per_class[c] = mAP.item()
    # TODO: make sure
    precisions_per_class[c] = TP.sum().item() / TP.shape[0]  # TP / (TP + FP)
    recalls_per_class[c] = TP.sum().item() / ground_truths_c.shape[0]  # TP / (TP + FN)
return mAPs_per_class, recalls_per_class, precisions_per_class

Bug: the labels = true_boxes[_img_idx] lookup should also filter the labels by class c before matching, just as the detections are.
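Independent of that fix, the trapezoidal AP integration itself can be sanity-checked with a pure-Python version (the TP flags below are made up):

```python
def average_precision(tp_flags, n_gt):
    """Trapezoidal area under the precision-recall curve, mirroring the torch.trapz call above."""
    precisions, recalls = [1.0], [0.0]  # PR curve starts at precision 1, recall 0
    tp_cum = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp_cum += is_tp
        precisions.append(tp_cum / rank)  # TP / (TP + FP) among the top `rank` detections
        recalls.append(tp_cum / n_gt)     # TP / total GT boxes
    return sum((recalls[k] - recalls[k - 1]) * (precisions[k] + precisions[k - 1]) / 2
               for k in range(1, len(recalls)))

ap = average_precision([True, True, False, True, False], n_gt=4)
print(ap)  # 65/96 ~= 0.677
```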

train

train.csv contains 16551 samples | test.csv contains 4952 samples

BATCH_SIZE = 32
16551/32 ≈ 518 loops per epoch
4952/32 ≈ 155 loops per epoch
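The loop counts assume the final partial batch is kept (drop_last=False), so they round up:

```python
import math

BATCH_SIZE = 32
train_loops = math.ceil(16551 / BATCH_SIZE)  # partial last batch still counts as a loop
test_loops = math.ceil(4952 / BATCH_SIZE)
print(train_loops, test_loops)  # 518 155
```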
