dw's Introduction

A Dual Weighting Label Assignment Scheme for Object Detection

This repo hosts the code implementing DW, as presented in our CVPR 2022 paper.

Introduction

Label assignment (LA), which aims to assign each training sample a positive (pos) and a negative (neg) loss weight, plays an important role in object detection. Existing LA methods mostly focus on the design of pos weighting function, while the neg weight is directly derived from the pos weight. Such a mechanism limits the learning capacity of detectors. In this paper, we explore a new weighting paradigm, termed dual weighting (DW), to specify pos and neg weights separately. We first identify the key influential factors of pos/neg weights by analyzing the evaluation metrics in object detection, and then design the pos and neg weighting functions based on them. Specifically, the pos weight of a sample is determined by the consistency degree between its classification and localization scores, while the neg weight is decomposed into two terms: the probability that it is a neg sample and its importance conditioned on being a neg sample. Such a weighting strategy offers greater flexibility to distinguish between important and less important samples, resulting in a more effective object detector. Equipped with the proposed DW method, a single FCOS-ResNet-50 detector can reach 41.5 mAP on COCO under 1x schedule, outperforming other existing LA methods. It consistently improves the baselines on COCO by a large margin under various backbones without bells and whistles.

Installation

  • This DW implementation is based on MMDetection, so installation is the same as for the original MMDetection.

  • Please check get_started.md for installation. Make sure the version of MMDetection is newer than 2.18.0.

Results and Models

For your convenience, we provide the following trained models. These models are trained with a mini-batch size of 16 images on 8 Nvidia RTX 3090 GPUs (2 images per GPU).

Backbone  Style    DCN  MS train  Box refine  Lr schd  box AP (val)  Download
R-50      pytorch  N    N         N           1x       41.5          model | log
R-50      pytorch  N    N         Y           1x       42.1          model | log
R-50      pytorch  N    Y         Y           2x       44.8          model | log
R-50      pytorch  Y    Y         Y           2x       47.9          model | log
R-101     pytorch  N    Y         N           2x       46.1          model | log

Notes:

  • MS train uses a scale range of 1333x[480:960] (range mode); the inference scale stays at 1333x800.
  • DCN means using DCNv2 in both backbone and head.
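As a rough illustration of the MS-train note, the fragment below shows how a range-mode resize step is commonly written in an MMDetection 2.x config. It is a sketch, not copied from this repo's configs: the variable names `ms_train_resize` and `inference_scale` are made up for the example, and the rest of the pipeline is omitted.

```python
# Sketch of a range-mode multi-scale resize step (MMDetection 2.x style).
# The names ms_train_resize and inference_scale are illustrative only.
ms_train_resize = dict(
    type='Resize',
    # Two endpoints: the short side is sampled between 480 and 960,
    # while the long side is capped at 1333.
    img_scale=[(1333, 480), (1333, 960)],
    multiscale_mode='range',
    keep_ratio=True,
)

# Single scale used at inference time, as stated in the note above.
inference_scale = (1333, 800)
```

With `multiscale_mode='range'` a new scale is drawn per image, so the two tuples act as bounds rather than as a list of discrete choices.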

Inference

Assuming you have put the COCO dataset into data/coco/ and downloaded the models into weights/, you can now evaluate the models on the COCO val2017 split:

bash dist_test.sh configs/dw_r50_fpn_1x_coco.py weights/r50_1x.pth 8 --eval bbox

Training

The following command line will train dw_r50_fpn_1x_coco on 8 GPUs:

bash dist_train.sh configs/dw_r50_fpn_1x_coco.py 8 --work-dir weights/r50_1x

Citation

@inproceedings{shuai2022DW,
  title={A Dual Weighting Label Assignment Scheme for Object Detection},
  author={Li, Shuai and He, Chenhang and Li, Ruihuang and Zhang, Lei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

dw's People

Contributors

strongwolf

dw's Issues

Question about normalizing the pos weight of each anchor by the sum of all pos weights within the candidate bag

Hello, I take the liberty to ask.
The pos weight of each anchor for each instance is normalized by the sum of all pos weights within the candidate bag.

p_pos_weight = (torch.exp(5*p_pos) * p_pos * center_prior_weights) / (torch.exp(3*p_pos) * p_pos * center_prior_weights).sum(0, keepdim=True).clamp(min=EPS)

In this code, why is the numerator 5*p_pos but the denominator 3*p_pos?
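To make the question concrete, here is a small pure-Python re-statement of the quoted line for a single instance. It is only a sketch: the function name `pos_weights` and the sample scores are made up, and the real code operates on tensors. It shows that with exponents 5 and 3 the weights no longer sum to 1, whereas with the same exponent in numerator and denominator they do.

```python
import math


def pos_weights(p_pos, center_prior, mu_num=5.0, mu_den=3.0, eps=1e-12):
    """Per-anchor pos weights for one instance (lists of floats)."""
    num = [math.exp(mu_num * p) * p * c for p, c in zip(p_pos, center_prior)]
    den = sum(math.exp(mu_den * p) * p * c for p, c in zip(p_pos, center_prior))
    den = max(den, eps)  # plays the role of .clamp(min=EPS)
    return [n / den for n in num]


scores = [0.2, 0.5, 0.9]   # hypothetical p_pos values
priors = [1.0, 1.0, 1.0]   # hypothetical center prior weights

mixed = pos_weights(scores, priors)               # exponents 5 and 3, as quoted
matched = pos_weights(scores, priors, 5.0, 5.0)   # same exponent in both terms

print(sum(mixed), sum(matched))
```

Because exp(5p) is at least exp(3p) for non-negative p, the mixed version is a rescaling rather than a strict normalization; anchors with high p_pos receive proportionally more weight.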

Expecting DW to bring more gains in YOLOv6!

Hi, we are interested in your work, and you are welcome to add DW to our YOLOv6 for even greater gains: https://github.com/meituan/YOLOv6
We actually tried the DW code you open-sourced for FCOS on YOLOv6n, but the results were not ideal: mAP drops by 1.1 when the box-refine branch is used and by 1.6 when it is not. This may be because DW is specially designed for the FCOS network, or because the YOLOv6n network is too lightweight. So I hope you can introduce a more targeted DW for YOLOv6 to improve the results.

CUDA error: device-side assert triggered

When I train DW on COCO, I run into an unexpected problem:
CUDA error: device-side assert triggered. It happens after training for some epochs.
I think it happens in
loc_loss = F.binary_cross_entropy and cls_loss = F.binary_cross_entropy, but I can't figure out why.
Can you give me some help?

About `neg_metrics` -> `p_neg_weight`

Hi @strongwolf,

I have one more question about the code. It looks like in this line the correspondence between gt_labels and num_classes is not one-to-one. So in the case of an ambiguous anchor, some values in neg_metrics will simply be ignored. What is stranger, the ignored value depends on the order of labels in gt_labels. For example, here the value 2 is ignored in the resulting tensor:

>>> t = torch.tensor([[1]])
>>> t[[0, 0], [0, 0]] = torch.tensor([2, 3])
>>> t
tensor([[3]])

Do I understand it correctly?
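The effect in the snippet above can be mimicked in plain Python. This is only a sketch with made-up helper names (`scatter_last_wins`, `scatter_reduce`); it mirrors the last-write-wins result shown above, although as far as I know PyTorch does not guarantee which value survives when indices are duplicated. The second helper shows one possible way to make the result independent of label order by reducing collisions explicitly.

```python
def scatter_last_wins(size, indices, values, fill=0):
    """Write values one by one; a later write to the same slot overwrites."""
    out = [fill] * size
    for i, v in zip(indices, values):
        out[i] = v
    return out


def scatter_reduce(size, indices, values, reduce=max, fill=0):
    """Combine colliding values with an explicit, order-independent rule."""
    out = [fill] * size
    seen = set()
    for i, v in zip(indices, values):
        out[i] = reduce(out[i], v) if i in seen else v
        seen.add(i)
    return out


print(scatter_last_wins(1, [0, 0], [2, 3]))  # [3]
print(scatter_last_wins(1, [0, 0], [3, 2]))  # [2]
print(scatter_reduce(1, [0, 0], [2, 3]))     # [3]
print(scatter_reduce(1, [0, 0], [3, 2]))     # [3]
```

Whether max, min, or some other rule is the right reduction for neg_metrics is a design decision for the authors; the sketch only shows that an explicit rule removes the dependence on ordering.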

Porting DW to target tracking

Hello, I want to port DW to the target tracking algorithm SiamCAR. There is a problem with the gt_labels parameter in the loss function: I don't know how to define it or what its shape is. In SiamCAR, gt_labels is just the cls variable, as shown in the figure. It is now passed to the CenterPrior class, as shown in the image:
[image]
Here is the error and the position where it occurs:
[image]

[image]

OOM problem

Hello, during training the GPU memory usage keeps increasing until OOM. How can this be solved?

About focal loss

Hi,

Thanks for your interesting research. I have a question about focal loss in formula (12) of the paper. Looks like in the code you simply use F.binary_cross_entropy instead of focal loss. Am I right? And do you have any specific reasons for it?

About bbox refinement

bbox_pred = self.deform_sampling(decoded_bbox_preds.contiguous(), reg_offset.contiguous())

bbox_pred = F.relu(bbox2distance(points, bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)).reshape(b, h, w, 4).permute(0, 3, 1, 2).contiguous())

In the first line you sample from the coarse bbox prediction, which I think is already the final prediction. Why do you decode it again in the second line?

Are mean and sigma in CenterPrior learnable?

In the class CenterPrior, there are definitions as follows:

       self.mean = nn.Parameter(torch.zeros(num_classes, 2), requires_grad=False)
       self.sigma = nn.Parameter(torch.ones(num_classes, 2)+0.11, requires_grad=False)

So in DW, these two parameters do not need to be learned? Is this different from AutoAssign?
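For context, a Gaussian center prior of the kind used in AutoAssign-style heads can be sketched as below. This is an assumption about how mean and sigma enter the computation, not code from this repo: the function name `center_prior_weight` and the example coordinates are hypothetical, and the defaults simply reuse the initial values quoted above (mean 0, sigma 1 + 0.11).

```python
import math


def center_prior_weight(point, gt_center, stride,
                        mean=(0.0, 0.0), sigma=(1.11, 1.11)):
    """Gaussian weight of one anchor point relative to one gt box center.

    Distances are normalized by the stride of the feature level, shifted
    by `mean`, and scaled by `sigma`, per axis.
    """
    weight = 1.0
    for p, c, m, s in zip(point, gt_center, mean, sigma):
        d = (p - c) / stride - m
        weight *= math.exp(-d * d / (2 * s * s))
    return weight


at_center = center_prior_weight((64.0, 64.0), (64.0, 64.0), stride=8)
one_cell = center_prior_weight((72.0, 64.0), (64.0, 64.0), stride=8)
two_cells = center_prior_weight((80.0, 64.0), (64.0, 64.0), stride=8)
print(at_center, one_cell, two_cells)
```

With requires_grad=False, as in the quoted definitions, such a prior stays a fixed Gaussian around the box center for the whole of training.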

weight design

Did I misunderstand the paper? Why is the loss function in the code so different from the one in the paper?
In the code, cls_loss is weighted only by p_pos_weight, but this is not the case in the paper.

Error at runtime

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation:

Can DW be applied to a softmax classifier?

In the code, DW uses a sigmoid classifier by default. I applied it to logo detection (352 categories). However, I find that there are a lot of FPs (same position, different category). I wonder if this is due to the use of the sigmoid classifier. How could I change DW to use a softmax classifier?
alfaromeo5

box refine

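[image]
How are the coordinates of the orange points obtained? For example, for the orange point on the left, with the center point at (i, j), why are its coordinates (j+△yl, i-△l+△xl) rather than (i-△l+△xl, j+△yl)?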

The multi-scale setting

Thanks for your wonderful work.
I have a question.
Are all multi-scale results of DW in the paper with 1333x[480:960] setting?

Why does bbox_pred need to be multiplied by the stride?

Hi @strongwolf,
I have some problems with this part of the code.
If I understand correctly, the regressor of the FCOS network predicts the distance from the center grid point to the four sides, and this distance is measured on the original image. You did not introduce stride information elsewhere, including when using the distance to decode the box in the loss. But you multiply the regressor output by the stride on line 7. So I don't understand why the regressor output is multiplied by the stride here. In this case, is decoded_bbox_pred correct? Or does deform_sampling need to change accordingly?

`def forward_single(self, x, scale, stride):
    b, c, h, w = x.shape
    cls_score, bbox_pred, cls_feat, reg_feat = super().forward_single(x)
    centerness = self.conv_centerness(reg_feat)
    bbox_pred = scale(bbox_pred).float()
    bbox_pred = F.relu(bbox_pred)
    bbox_pred *= stride
    if self.with_reg_refine:
        reg_dist = bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)
        points = self.prior_generator.single_level_grid_priors((h,w), self.strides.index(stride), dtype=x.dtype, device=x.device)
        points = points.repeat(b, 1)
        decoded_bbox_preds = distance2bbox(points, reg_dist).reshape(b, h, w, 4).permute(0, 3, 1, 2)
        reg_offset = self.reg_offset(reg_feat)
        bbox_pred_d = bbox_pred / stride
        reg_offset = torch.stack([reg_offset[:,0], reg_offset[:,1] - bbox_pred_d[:, 0],
                                  reg_offset[:,2] - bbox_pred_d[:, 1], reg_offset[:,3],
                                  reg_offset[:,4], reg_offset[:,5] + bbox_pred_d[:, 2],
                                  reg_offset[:,6] + bbox_pred_d[:, 3], reg_offset[:,7],], 1)
        bbox_pred = self.deform_sampling(decoded_bbox_preds.contiguous(), reg_offset.contiguous())
        bbox_pred = F.relu(bbox2distance(points, bbox_pred.permute(0, 2, 3, 1).reshape(-1, 4)).reshape(b, h, w, 4).permute(0, 3, 1, 2).contiguous())

    return cls_score, bbox_pred, centerness`
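One possible reading of the stride multiplication, offered as an assumption rather than the author's answer: the head predicts distances in units of the feature-map stride, and multiplying by the stride converts them to image pixels before the box is decoded. The pure-Python sketch below illustrates that reading; `distance_to_bbox` and `decode` are simplified stand-ins written for this example, not the MMDetection functions.

```python
def distance_to_bbox(point, distances):
    """Turn (left, top, right, bottom) distances into (x1, y1, x2, y2)."""
    x, y = point
    left, top, right, bottom = distances
    return (x - left, y - top, x + right, y + bottom)


def decode(point, raw_pred, stride):
    """Assume raw_pred is in units of stride; convert to pixels, then decode."""
    pixel_dist = [max(d, 0.0) * stride for d in raw_pred]  # relu, then scale
    return distance_to_bbox(point, pixel_dist)


# A point at (100, 60) on the stride-8 level with a hypothetical raw prediction.
box = decode((100.0, 60.0), (1.5, 2.0, 2.5, 1.0), stride=8)
print(box)  # (88.0, 44.0, 120.0, 68.0)
```

Under this reading, dividing by the stride again (bbox_pred_d in the snippet above) would bring the distances back to feature-map units, which is the scale on which sampling offsets are usually expressed.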

how to understand this

if all the training samples are equally treated, there will be a misalignment between the two heads: the location with the highest category score is usually not the best position for regressing the object boundary.

some issues about code

Hi, @strongwolf , there are some questions:

  1. In this line, p_pos_weight is normalized with different values of µ, namely 5 and 3. I wonder if there is some insight behind this?
  2. As illustrated in the code, the IoU score is represented as an exponential function of the reg loss. What is strangest to me is that loc_loss is further computed by a binary_cross_entropy loss even though we already have reg_loss in this line. Could you explain this further?
    Thanks a lot.

What is the role of CenterPrior?

I'm a little confused. I want to know what the role of CenterPrior is. Could you give me a few more details? I'm very interested in this paper.

label assignment in overlap situation

Hi, I am really enlightened by your excellent work. Here is my question:
How to deal with the situation when an anchor appears in anchor bag of multiple gt boxes?
I have not checked the code yet and I will check it as soon as possible but I am looking forward to your insights.
Thanks a lot.

Objectness

Sorry to bother you. I can't understand the meaning of the variable "objectness". Could you explain it? I would appreciate it very much if you could answer!

about reg_loss

loc_loss = F.binary_cross_entropy(p_loc, torch.ones_like(p_loc), reduction='none') is actually equivalent to 5 * reg_loss.
But in the paper, loc_loss should use giou_loss, and reg_loss has already used giou_loss, so what does the 5 stand for?
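The equivalence claimed above can be checked numerically. The sketch assumes p_loc is defined as exp(-5 * reg_loss), which is how the issues on this page describe it (an exponential function of the reg loss); the helper name and the sample value of reg_loss are made up.

```python
import math


def bce_with_target_one(p):
    """Binary cross-entropy of probability p against a target of 1."""
    return -math.log(p)


reg_loss = 0.3                     # hypothetical GIoU loss value
p_loc = math.exp(-5 * reg_loss)    # assumed definition of p_loc
loc_loss = bce_with_target_one(p_loc)

print(loc_loss, 5 * reg_loss)
```

Since -log(exp(-5 * reg_loss)) = 5 * reg_loss, the BCE call with an all-ones target reduces to a rescaled regression loss under this assumption.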

Top-k method for selecting candidate bags

You've mentioned the Top-k method for selecting candidate bags in your paper, but it seems that there is only soft center prior method in this repo.

How can I change this repo to use the Top-k method ?

And if I use the Top-k method to select the candidate bag, the computation will be much smaller, won't it? Since I only need to compute the weights of the bboxes inside the bags, the number of weights can be reduced from num_points * num_gts to num_points. Is that so?
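As a starting point for experiments, a generic Top-k candidate selection can be sketched as below: for each gt box, keep the k anchor points closest to the box center. This is an illustrative assumption, not the selection rule from the paper or this repo; `topk_candidates` and the toy grid are hypothetical.

```python
def topk_candidates(points, gt_center, k):
    """Return indices of the k points nearest to gt_center (squared L2)."""
    cx, cy = gt_center
    order = sorted(
        range(len(points)),
        key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2,
    )
    return order[:k]


# A toy 4x4 grid of anchor points with stride 8, in row-major order.
points = [(x, y) for y in range(0, 32, 8) for x in range(0, 32, 8)]

bag = topk_candidates(points, gt_center=(9, 9), k=3)
print(bag)  # [5, 6, 9]
```

In this sketch each gt box keeps exactly k candidates, so the number of weights to compute is k * num_gts rather than num_points * num_gts.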

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repo branches:

Repo              OpenMMLab 1.0 branch  OpenMMLab 2.0 branch
MMEngine                                0.x
MMCV              1.x                   2.x
MMDetection       0.x, 1.x, 2.x         3.x
MMAction2         0.x                   1.x
MMClassification  0.x                   1.x
MMSegmentation    0.x                   1.x
MMDetection3D     0.x                   1.x
MMEditing         0.x                   1.x
MMPose            0.x                   1.x
MMDeploy          0.x                   1.x
MMTracking        0.x                   1.x
MMOCR             0.x                   1.x
MMRazor           0.x                   1.x
MMSelfSup         0.x                   1.x
MMRotate          1.x                   1.x
MMYOLO                                  0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

About the version of mmcv

I used mmcv==2.0.0 but it showed cannot import name 'Config' from 'mmcv'
Later I used mmcv==0.2.16 but it showed cannot import name 'DictAction' from 'mmcv'
Which version of mmcv should I use?

Training in Custom Dataset

When I train on my custom dataset, I get: - mmdet - INFO - Epoch [24][1350/1388] lr: 1.000e-04, eta: 0:00:57, time: 1.529, data_time: 0.023, memory: 7116, loss_cls_pos: 0.0950, loss_loc: 0.2806, loss_cls_neg: 0.1053, loss: 0.4809
loss_cls_pos seems smaller than loss_cls_neg. How can I enlarge only loss_cls_pos?

'DWHead is not in the models registry'

KeyError: 'DWHead is not in the models registry'. How do I fix this error? Should I copy dw_head.py to mmdet/models/dense_heads/ and add from .dw_head import DWHead to __init__.py? I have tried this, and it works.
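An alternative to copying the file into the mmdet package is the `custom_imports` field that MMCV-style config files support, which imports the listed modules (and so runs their registry decorators) before the model is built. The fragment below is a sketch: the module path `dw_head` is a placeholder and must be replaced by wherever dw_head.py is importable from in your setup.

```python
# Config fragment (sketch). 'dw_head' is a placeholder module path; it must be
# importable, e.g. by running from the directory that contains dw_head.py or
# by adding that directory to PYTHONPATH.
custom_imports = dict(
    imports=['dw_head'],
    allow_failed_imports=False,  # fail loudly if the module cannot be imported
)
```

Either approach has the same goal: the module defining DWHead has to be imported so the class is registered before the config refers to it by name.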
