strongwolf / dw


136.0 4.0 17.0 48 KB

A Dual Weighting Label Assignment Scheme for Object Detection

License: Apache License 2.0

Python 98.86% Shell 1.14%

object-detection label-assignment dw

dw's Introduction

A Dual Weighting Label Assignment Scheme for Object Detection

This repo hosts the code implementing DW, as presented in our CVPR 2022 paper.

Introduction

Label assignment (LA), which aims to assign each training sample a positive (pos) and a negative (neg) loss weight, plays an important role in object detection. Existing LA methods mostly focus on the design of the pos weighting function, while the neg weight is directly derived from the pos weight. Such a mechanism limits the learning capacity of detectors. In this paper, we explore a new weighting paradigm, termed dual weighting (DW), to specify pos and neg weights separately. We first identify the key influential factors of pos/neg weights by analyzing the evaluation metrics in object detection, and then design the pos and neg weighting functions based on them. Specifically, the pos weight of a sample is determined by the consistency between its classification and localization scores, while the neg weight is decomposed into two terms: the probability that it is a neg sample and its importance conditioned on being a neg sample. This weighting strategy offers greater flexibility to distinguish between important and less important samples, resulting in a more effective object detector. Equipped with the proposed DW method, a single FCOS-ResNet-50 detector reaches 41.5 mAP on COCO under the 1x schedule, outperforming other existing LA methods. It consistently improves the baselines on COCO by a large margin across various backbones, without bells and whistles.

Installation

This DW implementation is based on MMDetection, so installation is the same as for the original MMDetection.
Please check get_started.md for installation instructions. Make sure your MMDetection version is later than 2.18.0.

Results and Models

For your convenience, we provide the following trained models. These models are trained with a mini-batch size of 16 images on 8 NVIDIA RTX 3090 GPUs (2 images per GPU).

Backbone	Style	DCN	MS train	Box refine	Lr schd	box AP (val)	Download
R-50	pytorch	N	N	N	1x	41.5	model | log
R-50	pytorch	N	N	Y	1x	42.1	model | log
R-50	pytorch	N	Y	Y	2x	44.8	model | log
R-50	pytorch	Y	Y	Y	2x	47.9	model | log
R-101	pytorch	N	Y	N	2x	46.1	model | log

Notes:

For multi-scale (MS) training, the image scale is sampled from the range 1333x[480:960] (range mode); the inference scale is fixed at 1333x800.
DCN means DCNv2 is used in both the backbone and the head.
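For reference, a multi-scale setting like this is usually written in an MMDetection 2.x pipeline with the `Resize` transform in `range` mode. The fragment below is a hedged sketch of such a pipeline, not a copy of the repo's configs. The normalization values and the surrounding transforms are assumptions taken from common MMDetection defaults.

```python
# Sketch of MMDetection 2.x pipeline fragments (assumed; not the repo's exact config).
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    # 'range' mode samples a target short side in [480, 960];
    # keep_ratio=True keeps the long side within 1333.
    dict(type='Resize', img_scale=[(1333, 480), (1333, 960)],
         multiscale_mode='range', keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

# Single-scale inference at 1333x800.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False,
         transforms=[
             dict(type='Resize', keep_ratio=True),
             dict(type='RandomFlip'),
             dict(type='Normalize', **img_norm_cfg),
             dict(type='Pad', size_divisor=32),
             dict(type='ImageToTensor', keys=['img']),
             dict(type='Collect', keys=['img']),
         ]),
]
```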

Inference

Assuming you have put the COCO dataset in data/coco/ and downloaded the models into weights/, you can evaluate the models on the COCO val2017 split:

bash dist_test.sh configs/dw_r50_fpn_1x_coco.py weights/r50_1x.pth 8 --eval bbox

Training

The following command line will train dw_r50_fpn_1x_coco on 8 GPUs:

bash dist_train.sh configs/dw_r50_fpn_1x_coco.py 8 --work-dir weights/r50_1x

Citation

@inproceedings{shuai2022DW,
  title={A Dual Weighting Label Assignment Scheme for Object Detection},
  author={Li, Shuai and He, Chenhang and Li, Ruihuang and Zhang, Lei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

dw's People

Contributors

Stargazers

Watchers

Forkers

scott-mao liyangfan0 cv-ip jie311 jt623 serissa chisyliu icecream-blue-sky kyle-fang jshilong xduwsk yuangela dingyi-yao wcfhf20220924 gjtz jewelc92 llw111

dw's Issues

The question about that the pos weight of each anchor for each instance is normalized by the sum of all pos weights within the candidate bag.

Hello, may I ask a question?
The pos weight of each anchor for each instance is normalized by the sum of all pos weights within the candidate bag:

p_pos_weight = (torch.exp(5*p_pos) * p_pos * center_prior_weights) / (torch.exp(3*p_pos) * p_pos * center_prior_weights).sum(0, keepdim=True).clamp(min=EPS)

In this code, why does the numerator use exp(5*p_pos) while the denominator uses exp(3*p_pos)?
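To see what the mismatched exponents do, the sketch below re-implements the quoted line in NumPy for a single instance (one column of candidates). `p_pos` and `center_prior_weights` are made-up example values. `EPS` is assumed to be a small constant. The sketch only illustrates the arithmetic, not the authors' rationale.

```python
import numpy as np

EPS = 1e-12  # assumed small constant; the repo defines its own EPS

def pos_weight(p_pos, center_prior, mu_num=5.0, mu_den=3.0):
    """Mirror of the quoted p_pos_weight line for a (num_candidates, num_gts) array."""
    num = np.exp(mu_num * p_pos) * p_pos * center_prior
    den = (np.exp(mu_den * p_pos) * p_pos * center_prior).sum(axis=0, keepdims=True)
    return num / np.clip(den, EPS, None)

p_pos = np.array([[0.9], [0.5], [0.1]])   # hypothetical consistency scores
prior = np.array([[1.0], [1.0], [1.0]])   # hypothetical center-prior weights

w_same = pos_weight(p_pos, prior, 5.0, 5.0)  # matched exponents: weights sum to 1
w_dw = pos_weight(p_pos, prior, 5.0, 3.0)    # exponents as in the quoted line
print(w_same.sum(axis=0))
print(w_dw.sum(axis=0))  # larger than 1, since exp(5p) >= exp(3p) for p >= 0
```

In this sketch, `w_dw / w_same` is the same for every candidate of an instance, because only the denominator changes. So the mismatch does not alter how the weight is distributed among candidates. It rescales each instance's total pos weight by a factor that grows with its scores. Whether that is the intended effect is for the authors to confirm.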

Hoping DW can bring more gains to YOLOv6!

Hi, we are interested in your work, and you are welcome to add DW to our YOLOv6 for even greater gains.
We actually tried it on YOLOv6n using the DW code you open-sourced for FCOS, but the results were not ideal: with the box-refine branch, mAP dropped by 1.1, and without it, mAP dropped by 1.6. This may be because your design is specific to the FCOS network, or because YOLOv6n is too lightweight. So I hope you can introduce a version of DW targeted at YOLOv6 to improve the results.

CUDA error: device-side assert triggered

When I train DW on COCO, I run into an unexpected problem:
CUDA error: device-side assert triggered. It happens after training for some epochs.
I think it happens in
loc_loss = F.binary_cross_entropy and cls_loss = F.binary_cross_entropy, but I can't figure out why.
Can you help me?
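A common cause of this assert with `F.binary_cross_entropy` on CUDA is an input outside [0, 1], and NaN counts as outside (for example, after a loss blow-up). The hypothetical helper below is written with NumPy so it runs without a GPU. It shows the kind of check one could run on the probabilities fed to the loss. It is a debugging aid under that assumption, not a fix taken from the repo.

```python
import numpy as np

def find_bad_bce_inputs(p):
    """Return indices of entries that binary cross-entropy cannot accept.

    BCE expects probabilities in [0, 1]; NaN and +/-inf also fail this test
    because every comparison with NaN is False.
    """
    p = np.asarray(p, dtype=np.float64)
    ok = (p >= 0.0) & (p <= 1.0)
    return np.flatnonzero(~ok)

probs = np.array([0.2, 1.0, np.nan, 1.0000001, -0.0])
print(find_bad_bce_inputs(probs))  # flags the NaN and the value slightly above 1
```

Running the training script with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes CUDA errors surface at the call that actually failed, which helps confirm which loss is responsible.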

About `neg_metrics` -> `p_neg_weight`

Hi @strongwolf,

I have one more question about the code. It looks like this line uses a non-one-to-one mapping between gt_labels and num_classes. So for an ambiguous anchor, some values in neg_metrics are simply ignored. Stranger still, which value gets ignored depends on the order of labels in gt_labels. For example, here the value 2 is dropped from the resulting tensor:

>>> t = torch.tensor([[1]])
>>> t[[0, 0], [0, 0]] = torch.tensor([2, 3])
>>> t
tensor([[3]])

Do I understand it correctly?
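The behaviour described above can be reproduced with NumPy. Like PyTorch, NumPy gives no guarantee about which duplicate write wins in an advanced-indexing assignment, although in practice the last one does. If an order-independent result is wanted, an unbuffered ufunc such as `np.maximum.at` combines all writes to the same slot. Taking the max here, rather than some other reduction, is an assumption for illustration. In PyTorch the analogous tools would be `index_put_(..., accumulate=True)` (sum) or `scatter_reduce_` (for example with `reduce="amax"`).

```python
import numpy as np

rows = np.array([0, 0])
cols = np.array([0, 0])

# Plain assignment with duplicate indices: only one of the values survives.
t = np.array([[1]])
t[rows, cols] = np.array([2, 3])
print(t)  # in practice the last write wins, so 2 is silently dropped

def scatter_max(target, rows, cols, values):
    """Order-independent alternative: keep the maximum of all writes to a slot."""
    out = target.copy()
    np.maximum.at(out, (rows, cols), values)
    return out

a = scatter_max(np.array([[1]]), rows, cols, np.array([2, 3]))
b = scatter_max(np.array([[1]]), rows, cols, np.array([3, 2]))  # reversed order
print(a, b)  # same result regardless of order
```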

Porting DW to visual object tracking

Hello, I want to port DW to the tracking algorithm SiamCAR, but I have a problem with the gt_labels argument of the loss function: I don't know how to define it or what its shape should be. In SiamCAR, gt_labels is just the cls variable, as shown in the figure. It is now used in the CenterPrior class, as shown in the image.

Here is the error and where it occurs:

OOM issue

Hello, during training the GPU memory usage keeps increasing until it runs out of memory (OOM). How can this be solved?

About focal loss

Hi,

Thanks for your interesting research. I have a question about the focal loss in formula (12) of the paper. It looks like the code simply uses F.binary_cross_entropy instead of focal loss. Am I right? And is there a specific reason for this?
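For comparison, the standard sigmoid focal loss (Lin et al., RetinaNet) is BCE multiplied by a modulating factor (1 - p_t)^γ and an optional α balance. With γ = 0 and no α, it reduces to plain BCE. The NumPy sketch below shows that relationship. It does not assert what formula (12) of the DW paper contains or why the code omits the modulating factor.

```python
import numpy as np

def bce(p, y, eps=1e-12):
    """Element-wise binary cross-entropy for probabilities p and targets y in {0, 1}."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def sigmoid_focal(p, y, gamma=2.0, alpha=0.25):
    """Standard focal loss: alpha_t * (1 - p_t)**gamma * BCE."""
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = 1.0 if alpha is None else np.where(y == 1, alpha, 1 - alpha)
    return alpha_t * (1 - p_t) ** gamma * bce(p, y)

p = np.array([0.9, 0.6, 0.1])   # hypothetical predicted probabilities
y = np.array([1, 1, 0])         # hypothetical targets
print(bce(p, y))
print(sigmoid_focal(p, y))                         # easy examples are down-weighted
print(sigmoid_focal(p, y, gamma=0.0, alpha=None))  # identical to bce(p, y)
```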

During training, loss_cls_neg does not decrease, but the results look good

Hi, I trained DW on my dataset and found that loss_cls_neg does not decrease; it increases. However, the test results seem good.
Is this normal?
How does this loss work?
I would like to try removing this loss and checking the results again.

About bbox refinement

bbox_pred = self.deform_sampling(decoded_bbox_preds.contiguous(), reg_offset.contiguous())

bbox_pred = F.relu(bbox2distance(points, bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)).reshape(b, h, w, 4).permute(0, 3, 1, 2).contiguous())

In the first line you sample from the coarse bbox prediction, which I think is the final prediction. Why do you convert it again in the second line?
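One possible reading of these two lines: `deform_sampling` gathers box predictions from neighbouring locations, and a box predicted at another location is only meaningful in absolute (x1, y1, x2, y2) coordinates. So the coarse distances are first decoded to boxes and sampled. They are then re-encoded as (l, t, r, b) distances relative to the current point, which is what the rest of the head expects. The NumPy helpers below are simplified stand-ins for MMDetection's `distance2bbox` / `bbox2distance`, without `max_shape` clipping or `max_dis` handling. They show the round trip.

```python
import numpy as np

def distance2bbox(points, distance):
    """(x, y) points + (l, t, r, b) distances -> (x1, y1, x2, y2) boxes."""
    x, y = points[:, 0], points[:, 1]
    l, t, r, b = distance.T
    return np.stack([x - l, y - t, x + r, y + b], axis=1)

def bbox2distance(points, bbox):
    """(x, y) points + (x1, y1, x2, y2) boxes -> (l, t, r, b) distances."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x - bbox[:, 0], y - bbox[:, 1],
                     bbox[:, 2] - x, bbox[:, 3] - y], axis=1)

points = np.array([[16.0, 16.0], [24.0, 16.0]])   # two grid points (stride 8)
dist = np.array([[4.0, 6.0, 10.0, 2.0], [3.0, 3.0, 3.0, 3.0]])
boxes = distance2bbox(points, dist)

# Pretend the sampling step hands point 0's box to point 1: re-encoding it
# relative to point 1 gives new distances (negative ones would be clipped
# by the F.relu in the quoted code).
moved = bbox2distance(points[1:], boxes[:1])
print(moved)  # [[12.  6.  2.  2.]]
```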

Are mean and sigma in CenterPrior learnable?

In the class CenterPrior, there are definitions as follows:

       self.mean = nn.Parameter(torch.zeros(num_classes, 2), requires_grad=False)
       self.sigma = nn.Parameter(torch.ones(num_classes, 2)+0.11, requires_grad=False)

So in DW, these two parameters are not learned? Is this different from AutoAssign?
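For reference, an AutoAssign-style Gaussian center prior weights each location by its distance from the object center. The distance is measured in units of the stride, with a per-class mean offset and spread. The NumPy sketch below assumes that form. With `requires_grad=False`, `mean` and `sigma` simply stay at their initial values (0 and 1.11) instead of being learned. The sketch illustrates the idea and is not a transcription of the repo's `CenterPrior`.

```python
import numpy as np

def center_prior(points, stride, gt_centers, gt_labels, mean, sigma):
    """Gaussian center prior (assumed AutoAssign-style form), shape (num_points, num_gts).

    points:     (P, 2) location coordinates (x, y)
    stride:     scalar FPN stride of these locations
    gt_centers: (G, 2) box centers
    gt_labels:  (G,)   class indices selecting per-class mean/sigma
    mean/sigma: (num_classes, 2) offsets and spreads, in stride units
    """
    d = (points[:, None, :] - gt_centers[None, :, :]) / stride  # (P, G, 2)
    mu = mean[gt_labels][None]    # (1, G, 2)
    sg = sigma[gt_labels][None]   # (1, G, 2)
    g = np.exp(-((d - mu) ** 2) / (2 * sg ** 2))
    return g.prod(axis=-1)        # multiply the x and y terms

num_classes = 3
mean = np.zeros((num_classes, 2))          # fixed, as with requires_grad=False
sigma = np.ones((num_classes, 2)) + 0.11
points = np.array([[50.0, 50.0], [58.0, 50.0], [82.0, 50.0]])
w = center_prior(points, 8.0, np.array([[50.0, 50.0]]), np.array([1]), mean, sigma)
print(w.ravel())  # 1.0 at the center, decaying with distance
```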

weight design

Did I misunderstand the paper? Why is the loss function in the code so different from the one in the paper?
In the code, cls_loss is weighted only by p_pos_weight, but this is not the case in the paper.

At inference, a neg prediction in the ranking list will not affect the recall but decrease the precision.

Isn't it the opposite? From the recall and precision formulas, it is the neg prediction that affects recall.
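A small worked example may help with this point. Walking down a score-ranked list, recall is TP / (number of ground truths) and precision is TP / (number of predictions so far). Inserting a negative prediction (a false positive) leaves TP unchanged, so recall stays the same while precision drops. The lists below are made up for illustration.

```python
def precision_recall_curve(is_tp, num_gt):
    """Cumulative (precision, recall) along a score-ranked list of predictions."""
    curve, tp = [], 0
    for k, hit in enumerate(is_tp, start=1):
        tp += int(hit)
        curve.append((tp / k, tp / num_gt))
    return curve

clean = [True, True, True]            # three correct detections, 3 ground truths
with_fp = [True, False, True, True]   # same list with a false positive at rank 2

print(precision_recall_curve(clean, 3)[-1])    # (1.0, 1.0)
print(precision_recall_curve(with_fp, 3)[-1])  # (0.75, 1.0)
```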

Error when running

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

I have a question about w_neg = 1 - w_pos

Why use neg_avg_factor = (1 - p_neg_weight).sum() ?

In loss function:

neg_avg_factor = (1 - p_neg_weight).sum()
...
cls_neg_loss = sum(cls_neg_loss_list) / neg_avg_factor

Why not use neg_avg_factor = (p_neg_weight).sum() ?

Why not limit the regression range for each location of each FPN level?

In FCOS, each FPN level is responsible for predicting objects of different sizes. But in DW, each object is predicted by all FPN levels, why?

Can DW be applied to a softmax classifier?

In the code, DW uses a sigmoid classifier by default. I applied it to logo detection (352 categories). However, I find a lot of FPs (same position, different category). I wonder if this is due to the sigmoid classifier. How could I change DW to use a softmax classifier?
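The difference behind this question: a sigmoid head scores every class independently, so two classes can both exceed the score threshold at the same location. A softmax over classes makes them compete, because the probabilities sum to 1. The NumPy sketch below uses made-up logits to show this. It does not show how DW's weighting would need to change for a softmax head, which would also need a background class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.8, -3.0])   # hypothetical scores for 3 classes at one location
thr = 0.5
print(sigmoid(logits) > thr)   # [ True  True False]: two detections at the same place
print(softmax(logits) > thr)   # [ True False False]: classes compete (no background class here)
```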

Why choose different exponents when calculating the pos weight?

DW/dw_head.py, line 178 (commit 27f50e5):

 p_pos_weight = (torch.exp(5*p_pos) * p_pos * center_prior_weights) / (torch.exp(3*p_pos) * p_pos * center_prior_weights).sum(0, keepdim=True).clamp(min=EPS) 

Why are the exponents in this formula 5 and 3, respectively?

I thought they would be the same.

Is there any explanation?

The difference between AutoAssign and DW

Thank you for your great work! When reading your paper and code, I found that the idea and some implementation details are similar to AutoAssign, but the paper doesn't compare them thoroughly. Could you explain the difference between them? Thanks! @strongwolf

box refine

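How are the coordinates of the orange points derived? For example, for the orange point on the left, with the center point at (i, j), why are its coordinates (j+△yl, i-△l+△xl) rather than (i-△l+△xl, j+△yl)?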

Where is the code for the probability of being a negative sample when the IoU is between 0.5 and 0.95?

t = lambda x: 1/(0.5**alpha-1)*x**alpha - 1/(0.5**alpha-1)
This looks like it, but it doesn't quite seem to match.
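Assuming the `**` operators were lost from the quoted lambda, the intended mapping appears to be the one below. It equals 1 at IoU 0.5 and 0 at IoU 1.0, and for alpha > 0 it decreases monotonically in between. That matches the shape of a probability of being negative. The value of `ALPHA` here is hypothetical. Whether the paper's 0.95 upper bound is handled by a separate clamp elsewhere is not visible in the snippet.

```python
ALPHA = 2.0  # hypothetical value for illustration

def p_neg_from_iou(iou, alpha=ALPHA):
    """(iou**alpha - 1) / (0.5**alpha - 1): 1 at IoU 0.5, 0 at IoU 1.0."""
    return 1 / (0.5 ** alpha - 1) * iou ** alpha - 1 / (0.5 ** alpha - 1)

for iou in (0.5, 0.75, 0.95, 1.0):
    print(iou, round(p_neg_from_iou(iou), 4))
```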

Turns out the author is a "基粉" (ji fan)

The multi-scale setting

Thanks for your wonderful work.
I have a question.
Are all multi-scale results of DW in the paper with 1333x[480:960] setting?

Why the bbox_pred need to multiply with the stride?

Hi @strongwolf,
I have some questions about this part of the code.
If I understand correctly, the FCOS regressor predicts the distances from a grid point to the four sides of the box, measured on the original image. You do not introduce stride information elsewhere, including when decoding boxes from these distances in the loss. But in the line `bbox_pred *= stride` you multiply the regressor output by the stride, and I don't understand why. In that case, is decoded_bbox_preds correct? Or does deform_sampling need to change accordingly?

```python
def forward_single(self, x, scale, stride):
    b, c, h, w = x.shape
    cls_score, bbox_pred, cls_feat, reg_feat = super().forward_single(x)
    centerness = self.conv_centerness(reg_feat)
    bbox_pred = scale(bbox_pred).float()
    bbox_pred = F.relu(bbox_pred)
    bbox_pred *= stride
    if self.with_reg_refine:
        reg_dist = bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)
        points = self.prior_generator.single_level_grid_priors(
            (h, w), self.strides.index(stride), dtype=x.dtype, device=x.device)
        points = points.repeat(b, 1)
        decoded_bbox_preds = distance2bbox(points, reg_dist).reshape(b, h, w, 4).permute(0, 3, 1, 2)
        reg_offset = self.reg_offset(reg_feat)
        bbox_pred_d = bbox_pred / stride
        reg_offset = torch.stack([reg_offset[:, 0], reg_offset[:, 1] - bbox_pred_d[:, 0],
                                  reg_offset[:, 2] - bbox_pred_d[:, 1], reg_offset[:, 3],
                                  reg_offset[:, 4], reg_offset[:, 5] + bbox_pred_d[:, 2],
                                  reg_offset[:, 6] + bbox_pred_d[:, 3], reg_offset[:, 7]], 1)
        bbox_pred = self.deform_sampling(decoded_bbox_preds.contiguous(), reg_offset.contiguous())
        bbox_pred = F.relu(bbox2distance(points, bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4))
                           .reshape(b, h, w, 4).permute(0, 3, 1, 2).contiguous())
    return cls_score, bbox_pred, centerness
```
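The `torch.stack` call above can be read as building offsets for four sampling points: the left, top, right and bottom edges. This reading assumes the usual deformable-convolution layout, in which each point contributes a `(dy, dx)` pair, y first, in units of feature-map cells. Under that assumption the edge distances must be divided by the stride before being added, which would explain `bbox_pred_d = bbox_pred / stride`. The NumPy sketch below reproduces only that bookkeeping for one location. The learned offsets are hypothetical (zero by default).

```python
import numpy as np

def edge_offsets(ltrb, stride, learned=None):
    """Offsets (dy, dx) for the left, top, right and bottom sampling points.

    ltrb:    (4,) distances l, t, r, b in pixels
    stride:  FPN stride, converts pixels to feature-map cells
    learned: (8,) extra learned offsets, laid out as (dy, dx) per point (assumed layout)
    """
    l, t, r, b = np.asarray(ltrb, dtype=float) / stride
    learned = np.zeros(8) if learned is None else np.asarray(learned, dtype=float)
    base = np.array([0.0, -l,   # left point:   move left by l
                     -t, 0.0,   # top point:    move up by t
                     0.0, r,    # right point:  move right by r
                     b, 0.0])   # bottom point: move down by b
    return (base + learned).reshape(4, 2)  # rows: (dy, dx) per point

offs = edge_offsets([16.0, 8.0, 24.0, 32.0], stride=8.0)
print(offs)
```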

how to understand this

if all the training samples are equally treated, there will be a misalignment between the two heads: the location with the highest category score is usually not the best position for regressing the object boundary.

some issues about code

Hi, @strongwolf , there are some questions:

In this line, p_pos_weight is normalized with different µ values, 5 and 3. Is there some insight behind this?
As illustrated in the code, the IoU score is represented as an exponential function of the reg loss. What puzzles me most is that loc_loss is further computed with a binary_cross_entropy loss, even though reg_loss is already obtained in this line. Could you explain this further?
Thanks a lot.

What is the role of CenterPrior？

I'm a little confused about the role of CenterPrior. Could you give me a few more details? I'm very interested in this paper.

label assignment in overlap situation

Hi, I am really enlightened by your excellent work. Here is my question:
How should an anchor be handled when it appears in the anchor bags of multiple gt boxes?
I have not checked the code yet and will do so as soon as possible, but I look forward to your insights.
Thanks a lot.

Objectness

Sorry to bother you, but I can't understand the meaning of the variable "objectness". Could you explain it? I would really appreciate an answer!

about reg_loss

loc_loss = F.binary_cross_entropy(p_loc, torch.ones_like(p_loc), reduction='none') is actually equivalent to 5 * reg_loss.
But in the paper, loc_loss should use the GIoU loss, and reg_loss already uses the GIoU loss, so what does the 5 stand for?
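The equivalence in the question follows if `p_loc` is defined as `exp(-5 * reg_loss)`. This is an assumption inferred from the question and not checked against the repo. BCE against a target of 1 is `-log(p_loc)`, which undoes the exponential. The plain-Python snippet below checks this numerically. PyTorch's `binary_cross_entropy` clamps log values at -100, so the identity only holds while `5 * reg_loss` stays below 100.

```python
import math

ALPHA = 5.0  # the factor discussed in the question

def bce_target_one(p, log_floor=-100.0):
    """-log(p) with the log clamped at log_floor, mimicking PyTorch's BCE for target 1."""
    if p <= 0.0:
        return -log_floor
    return -max(math.log(p), log_floor)

for reg_loss in (0.1, 0.3, 1.2):   # e.g. GIoU loss values
    p_loc = math.exp(-ALPHA * reg_loss)
    print(reg_loss, bce_target_one(p_loc), ALPHA * reg_loss)
```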

Top-k method for selecting candidate bags

You mention the Top-k method for selecting candidate bags in your paper, but this repo seems to implement only the soft center prior method.

How can I change this repo to use the Top-k method?

Also, if I use the Top-k method to select candidate bags, the computation will be much smaller, won't it? Since I only need to compute the weights of the bboxes inside the bags, the number of weights would drop from num_points * num_gts to num_points. Is that right?
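Below is a rough sketch of a Top-k alternative. It assumes candidates are picked per ground truth by the distance between each location and the box center; the paper's exact Top-k criterion may differ, for example per FPN level or by score. The sketch also illustrates the size argument. After selection, only `k` weights per ground truth need computing, that is `k * num_gts` rather than `num_points * num_gts`. So the cost drops, but it still scales with the number of objects rather than falling to `num_points`.

```python
import numpy as np

def topk_candidates(points, gt_centers, k):
    """Indices of the k points closest to each GT center, shape (num_gts, k)."""
    d2 = ((points[None, :, :] - gt_centers[:, None, :]) ** 2).sum(-1)  # (G, P)
    return np.argsort(d2, axis=1, kind="stable")[:, :k]

# Hypothetical 10x10 grid of locations with stride 8 (centers at 4, 12, 20, ...).
ys, xs = np.mgrid[0:10, 0:10]
points = np.stack([xs.ravel() * 8 + 4, ys.ravel() * 8 + 4], axis=1).astype(float)
gt_centers = np.array([[20.0, 20.0], [60.0, 44.0]])

idx = topk_candidates(points, gt_centers, k=9)
print(idx.shape)  # (2, 9): k * num_gts weights instead of num_points * num_gts
```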

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

	OpenMMLab 1.0 branch	OpenMMLab 2.0 branch
MMEngine		0.x
MMCV	1.x	2.x
MMDetection	0.x, 1.x, 2.x	3.x
MMAction2	0.x	1.x
MMClassification	0.x	1.x
MMSegmentation	0.x	1.x
MMDetection3D	0.x	1.x
MMEditing	0.x	1.x
MMPose	0.x	1.x
MMDeploy	0.x	1.x
MMTracking	0.x	1.x
MMOCR	0.x	1.x
MMRazor	0.x	1.x
MMSelfSup	0.x	1.x
MMRotate	1.x	1.x
MMYOLO		0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

About the version of mmcv

I used mmcv==2.0.0 but it showed cannot import name 'Config' from 'mmcv'
Later I used mmcv==0.2.16 but it showed cannot import name 'DictAction' from 'mmcv'
Which version of mmcv should I use?

Can't see the train loss in the log, the code directly do validation process

Training in Custom Dataset

When training on my custom dataset, I get: - mmdet - INFO - Epoch [24][1350/1388] lr: 1.000e-04, eta: 0:00:57, time: 1.529, data_time: 0.023, memory: 7116, loss_cls_pos: 0.0950, loss_loc: 0.2806, loss_cls_neg: 0.1053, loss: 0.4809
loss_cls_pos seems smaller than loss_cls_neg. How can I enlarge only loss_cls_pos?
