yolov3's People

Contributors

jl749

yolov3's Issues

MeanAveragePrecision (mAP)

YOLOv3 architecture


YOLOv3 makes predictions across 3 different scales (13x13, 26x26, 52x52 in the case of a 416x416 input).

The detection layers make predictions on feature maps of three different sizes, with strides 32, 16, and 8:

416/32 = 13
416/16 = 26
416/8 = 52

In total YOLOv3 predicts ((52 x 52) + (26 x 26) + (13 x 13)) x 3 = 10647 bounding boxes.
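As a quick sanity check, that arithmetic can be reproduced directly:

```python
# total predictions = (sum of grid cells over the three scales) * 3 anchors per cell
scales = [52, 26, 13]
total_boxes = sum(s * s for s in scales) * 3
print(total_boxes)  # 10647
```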

Detection is done by applying a 1x1 convolution kernel to the feature maps.

YOLOv3 uses independent logistic classifiers in place of the softmax function to determine the classes present in an image, and it replaces the mean squared error with binary cross-entropy loss. In simpler terms, both the objectness probability and the class predictions are made with logistic regression.
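A minimal sketch of such a detection head (the 1024-channel input and 20 VOC classes are assumptions here, not taken from the repo): a 1x1 convolution produces 3 x (5 + num_classes) channels per cell, and sigmoids give independent objectness/class probabilities instead of a softmax.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 20  # assumption: PASCAL VOC
head = nn.Conv2d(1024, 3 * (5 + NUM_CLASSES), kernel_size=1)  # hypothetical 1x1 detection head

feat = torch.randn(1, 1024, 13, 13)                   # coarsest feature map (stride 32)
out = head(feat).view(1, 3, 5 + NUM_CLASSES, 13, 13)  # split the 3 anchors out of the channels
out = out.permute(0, 1, 3, 4, 2)                      # (1, 3, 13, 13, 5 + NUM_CLASSES)
obj = torch.sigmoid(out[..., 0])                      # logistic objectness, no softmax
cls = torch.sigmoid(out[..., 5:])                     # independent per-class logistic classifiers
```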

Further reading

YOLO limitation

Each grid cell predicts only one object.

assert targets[0][torch.where(targets[0][..., 0] == 1)].shape[0] == len(bboxes) # make sure (3, 13, 13, 6) contains target labels
assert targets[1][torch.where(targets[1][..., 0] == 1)].shape[0] == len(bboxes) # make sure (3, 26, 26, 6) contains target labels
assert targets[2][torch.where(targets[2][..., 0] == 1)].shape[0] == len(bboxes) # make sure (3, 52, 52, 6) contains target labels

These 3 lines of code will throw an AssertionError.

If a grid cell (at 13x13, 26x26, or 52x52) is already reserved for an object, a new object cannot be assigned to it.
This is one reason why YOLO makes predictions at 3 scales: even if object A is missed at one scale, another scale can still cover it.
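A tiny sketch of the collision with two made-up box centers: at S=13 they fall into the same cell, while the finer S=52 grid separates them.

```python
def cell_of(x, y, s):
    """Return the (row, col) grid cell that owns a normalized box center at scale s."""
    return int(s * y), int(s * x)

box_a = (0.50, 0.50)  # hypothetical object centers
box_b = (0.52, 0.51)
print(cell_of(*box_a, 13), cell_of(*box_b, 13))  # (6, 6) (6, 6) -- collision
print(cell_of(*box_a, 52), cell_of(*box_b, 52))  # (26, 26) (26, 27) -- separated
```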

YOLO dataset

pre-defined anchors (common object width/height ratios found via K-means clustering)

anchors = tensor([[0.2800, 0.2200],  # pre defined
                  [0.3800, 0.4800],
                  [0.9000, 0.7800],
                  [0.0700, 0.1500],
                  [0.1500, 0.1100],
                  [0.1400, 0.2900],
                  [0.0200, 0.0300],
                  [0.0400, 0.0700],
                  [0.0800, 0.0600]])

label txt file

# class, x, y, w, h
8 0.764 0.6069277108433735 0.23600000000000002 0.3042168674698795
8 0.594 0.6159638554216867 0.188 0.29819277108433734
14 0.229 0.6445783132530121 0.166 0.45180722891566266
14 0.39 0.6430722891566265 0.168 0.4307228915662651
14 0.5650000000000001 0.5918674698795181 0.154 0.41867469879518077
14 0.787 0.5963855421686747 0.166 0.3855421686746988

all x, y, w, h values are normalized to the 0~1 range
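A hypothetical parser (not from the repo) for one such label line:

```python
def parse_label_line(line):
    """Parse 'class x y w h' into an int class id and four normalized floats."""
    cls, *coords = line.split()
    return int(cls), [float(v) for v in coords]

cls_id, (x, y, w, h) = parse_label_line("8 0.764 0.6069277108433735 0.236 0.3042168674698795")
print(cls_id, x, y, w, h)
```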

VOCDataset

VOCDataset (a torch.utils.data.Dataset) overrides the __getitem__ method to convert the label.txt x, y, w, h values into each scale's cell-relative coordinates (e.g. [0.9320, 0.4223, 3.0680, 2.6239] is the coordinate relative to the first anchor box in scale 0)

RETURN --> img: (C, H, W) && expected_bbox_info: ( (3, 13, 13, 6), (3, 26, 26, 6), (3, 52, 52, 6) )

loop expected bboxes (txt file):
    coor_from_txt = [0.764, 0.6069277108433735, 0.23600000000000002, 0.3042168674698795]
    IoU_wh(coor_from_txt[2:4], anchors)  # calculate IoU with width and height
    IoU_arg_sorted = [0, 5, 1, 4, 3, 2, 8, 7, 6]  # coor_from_txt most likely to match the first anchor box in scale0
    anchor_indices = IoU_arg_sorted

    highest IoU anchor box ratio = (0.28, 0.22) <-- from index 0
    index 0 means the target belongs to the first prediction scale (3, 13, 13, 6), using the first of its three anchor boxes
    
    now RESCALE...
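The width/height IoU used above only compares box shapes (it assumes the two boxes share a center). A pure-Python sketch, using the nine pre-defined anchors from the dataset section, reproduces the ordering from the walkthrough, followed by the RESCALE step for x and w:

```python
def iou_wh(box_wh, anchors):
    """IoU between one (w, h) pair and each anchor, assuming shared centers."""
    w, h = box_wh
    ious = []
    for aw, ah in anchors:
        inter = min(w, aw) * min(h, ah)   # overlap of co-centered boxes
        union = w * h + aw * ah - inter
        ious.append(inter / union)
    return ious

ANCHORS = [(0.28, 0.22), (0.38, 0.48), (0.90, 0.78),
           (0.07, 0.15), (0.15, 0.11), (0.14, 0.29),
           (0.02, 0.03), (0.04, 0.07), (0.08, 0.06)]

ious = iou_wh((0.236, 0.3042168674698795), ANCHORS)  # w, h from the first label line
order = sorted(range(len(ious)), key=lambda i: -ious[i])
print(order)  # [0, 5, 1, 4, 3, 2, 8, 7, 6] -- IoU_arg_sorted from the walkthrough

# RESCALE for scale 0 (S=13): cell index plus cell-relative coordinates
S, x, w = 13, 0.764, 0.236
j = int(S * x)        # column index 9
x_cell = S * x - j    # ~0.932
w_cell = S * w        # ~3.068, can exceed 1 since it is cell-relative
```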

there are 3 scales S = (13, 26, 52)
scale0 = first prediction (3, 13, 13, 6), 3 anchor boxes each with (obj_prob, x, y, w, h, class)
scale1 = second prediction (3, 26, 26, 6)
scale2 = third prediction (3, 52, 52, 6)

for box in bboxes:  # loop over expected bboxes
    iou_anchors = iou(torch.tensor(box[2:4]), self.anchors)  # IoU between the label box and all anchor box candidates
    anchor_indices = iou_anchors.argsort(descending=True, dim=0)
    x, y, width, height, class_label = box
    has_anchor = [False] * 3  # each scale should get one anchor
    for anchor_idx in anchor_indices:  # highest IoU to lowest
        scale_idx = anchor_idx // self.num_anchors_per_scale  # idx // 3 --> which scale (small, medium, big)
        anchor_on_scale = anchor_idx % self.num_anchors_per_scale  # which anchor within that scale?
        S = self.S[scale_idx]  # 13, 26, 52
        i, j = int(S * y), int(S * x)  # which cell
        anchor_taken = targets[scale_idx][anchor_on_scale, i, j, 0]  # object prob
        if not anchor_taken and not has_anchor[scale_idx]:  # obj prob == 0 && has_anchor[scale_idx] == False
            targets[scale_idx][anchor_on_scale, i, j, 0] = 1  # set object prob to 1
            x_cell, y_cell = S * x - j, S * y - i  # cell-relative x, y coordinates [0~1]
            w_cell, h_cell = width * S, height * S  # can be greater than 1 since they are relative to the cell
            box_coordinates = torch.tensor([x_cell, y_cell, w_cell, h_cell])
            targets[scale_idx][anchor_on_scale, i, j, 1:5] = box_coordinates  # x, y in [0~1]; w, h in [0~S]
            targets[scale_idx][anchor_on_scale, i, j, 5] = int(class_label)
            has_anchor[scale_idx] = True  # best anchor marked in this scale; move on to the next scale
        elif not anchor_taken and iou_anchors[anchor_idx] > self.ignore_iou_thresh:  # slot free, but IoU above the ignore threshold
            targets[scale_idx][anchor_on_scale, i, j, 0] = -1  # ignore this prediction in the loss
return image, tuple(targets)  # img, ( (3, 13, 13, 6), (3, 26, 26, 6), (3, 52, 52, 6) )

FIX mAP calculation bug

_device = pred_boxes[0].device
mAPs_per_class = [0] * num_classes
recalls_per_class = [0] * num_classes
precisions_per_class = [0] * num_classes
if all(tensor.nelement() == 0 for tensor in pred_boxes):  # no predictions at all
    return mAPs_per_class, recalls_per_class, precisions_per_class
pred_boxes = [torch.cat([torch.tensor(i, device=_device).repeat(p.shape[0], 1), p], dim=1)
              for i, p in enumerate(pred_boxes)]  # prepend the image index
detections = torch.stack(list(chain.from_iterable(pred_boxes)))
detections = detections[torch.argsort(detections[:, 2], descending=True)]  # sort by confidence (descending)
ground_truths = torch.stack(list(chain.from_iterable(true_boxes)))
for c in range(num_classes):
    # filter by class
    detections_c = detections[detections[:, 1] == c]
    ground_truths_c = ground_truths[ground_truths[:, -1] == c]
    _label_counts_per_img = [torch.zeros(tb.shape[0], dtype=torch.bool) for tb in true_boxes]
    TP = torch.zeros((len(detections_c)), dtype=torch.bool)
    for i, pred in enumerate(detections_c):  # one bbox at a time (high conf --> low conf)
        _img_idx = pred[0].long()
        labels = true_boxes[_img_idx].to(_device)  # compare labels and detections from the same img
        if labels.shape[0] == 0:  # no labels in this image
            continue
        # find the best-matching GT label from iou_matrix
        iou_matrix = torchvision.ops.box_iou(boxes1=labels[:, 0:4], boxes2=pred[3:7].unsqueeze(0))
        max_overlap, max_idx = torch.max(iou_matrix, dim=0)  # (label_count, pred_count) --> (pred_count,), pred_count is always 1
        if max_overlap.gt(iou_threshold) and not _label_counts_per_img[_img_idx][max_idx.item()].is_nonzero():
            TP[i] = True
            _label_counts_per_img[_img_idx][max_idx.item()] = True  # this GT label has been consumed
    TP_cumsum = torch.cumsum(TP, dim=0)
    FP_cumsum = torch.cumsum(~TP, dim=0)
    _recalls = TP_cumsum / ground_truths_c.shape[0]  # TP_cumsum / total_GT_boxes
    _precisions = TP_cumsum / (TP_cumsum + FP_cumsum)  # TP_cumsum / (TP + FP)
    _precisions = torch.cat([torch.tensor([1]), _precisions])  # start the PR curve at precision 1
    _recalls = torch.cat([torch.tensor([0]), _recalls])  # ... and recall 0
    # DEBUGGING ====================================================================================================
    # import matplotlib.pyplot as plt
    # plt.plot(_recalls, _precisions, c="blue")
    # plt.xlabel("recall")
    # plt.ylabel("precision")
    # plt.title(f"mAP_{iou_threshold}")
    # plt.grid(color="gray")
    # plt.show()
    # plt.close()
    # ==============================================================================================================
    mAP = torch.trapz(y=_precisions, x=_recalls)
    mAPs_per_class[c] = mAP.item()
    # TODO: make sure
    precisions_per_class[c] = TP.sum().item() / TP.shape[0]  # TP / (TP + FP)
    recalls_per_class[c] = TP.sum().item() / ground_truths_c.shape[0]  # TP / (TP + FN)
return mAPs_per_class, recalls_per_class, precisions_per_class

Bug: the labels = true_boxes[_img_idx] lookup should also filter the labels by class c before matching, just as the detections are.
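Independent of that fix, the trapezoidal AP integration itself can be sanity-checked with a pure-Python version (the TP flags below are made up):

```python
def average_precision(tp_flags, n_gt):
    """Trapezoidal area under the precision-recall curve, mirroring the torch.trapz call above."""
    precisions, recalls = [1.0], [0.0]  # PR curve starts at precision 1, recall 0
    tp_cum = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp_cum += is_tp
        precisions.append(tp_cum / rank)  # TP / (TP + FP) among the top `rank` detections
        recalls.append(tp_cum / n_gt)     # TP / total GT boxes
    return sum((recalls[k] - recalls[k - 1]) * (precisions[k] + precisions[k - 1]) / 2
               for k in range(1, len(recalls)))

ap = average_precision([True, True, False, True, False], n_gt=4)
print(ap)  # 65/96 ~= 0.677
```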

train

train.csv contains 16551 samples | test.csv contains 4952 samples

BATCH_SIZE = 32
16551/32 ≈ 518 loops per epoch
4952/32 ≈ 155 loops per epoch
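The loop counts assume the final partial batch is kept (drop_last=False), so they round up:

```python
import math

BATCH_SIZE = 32
train_loops = math.ceil(16551 / BATCH_SIZE)  # partial last batch still counts as a loop
test_loops = math.ceil(4952 / BATCH_SIZE)
print(train_loops, test_loops)  # 518 155
```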
