jshilong / sepc

Scale-equalizing Pyramid Convolution for object detection (CVPR 2020)

License: Apache License 2.0
I use PConv in Faster R-CNN to train on my custom dataset and the mAP is only about 20, but I can get 90 mAP using FPN alone. In Table 4 of the paper, you ran a comparative experiment on Faster R-CNN. Is there any error in my code? Looking forward to your response, thank you! The config is as follows:
```python
neck=[
    dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        num_outs=5),
    dict(
        type='SEPC',
        out_channels=256,
        Pconv_num=4,
        pconv_deform=False,
        lcconv_deform=False,
        iBN=False,  # when open, please set imgs/gpu >= 4
    )
]
```
I changed the SEPC code to fit Faster R-CNN as follows:
```python
@NECKS.register_module
class SEPC(nn.Module):

    def __init__(self,
                 in_channels=[256] * 5,
                 out_channels=256,
                 num_outs=5,
                 pconv_deform=False,
                 lcconv_deform=False,
                 iBN=False,
                 Pconv_num=4):
        super(SEPC, self).__init__()
        assert isinstance(in_channels, list)
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.num_ins = len(in_channels)
        self.num_outs = num_outs
        assert num_outs == 5
        self.fp16_enabled = False
        self.iBN = iBN
        self.Pconvs = nn.ModuleList()
        for i in range(Pconv_num):
            self.Pconvs.append(
                PConvModule(in_channels[i], out_channels, iBN=self.iBN,
                            part_deform=pconv_deform))
        self.lconv = sepc_conv(256, 256, kernel_size=3, dilation=1,
                               part_deform=lcconv_deform)
        # self.cconv = sepc_conv(256, 256, kernel_size=3, dilation=1, part_deform=lcconv_deform)
        self.relu = nn.ReLU()
        if self.iBN:
            self.lbn = nn.BatchNorm2d(256)
            # self.cbn = nn.BatchNorm2d(256)
        self.init_weights()

    # default init_weights for conv (msra) and norm in ConvModule
    def init_weights(self):
        # for branch in ["l", "c"]:
        for branch in ["l"]:
            m = getattr(self, branch + "conv")
            init.normal_(m.weight.data, 0, 0.01)
            if m.bias is not None:
                m.bias.data.zero_()

    @auto_fp16()
    def forward(self, inputs):
        assert len(inputs) == len(self.in_channels)
        x = inputs
        for pconv in self.Pconvs:
            x = pconv(x)
        # cls = [self.cconv(level, item) for level, item in enumerate(x)]
        loc = [self.lconv(level, item) for level, item in enumerate(x)]
        if self.iBN:
            # cls = iBN(cls, self.cbn)
            loc = iBN(loc, self.lbn)
        # outs = [[self.relu(s), self.relu(l)] for s, l in zip(cls, loc)]
        outs = [self.relu(s) for s in loc]
        return tuple(outs)
```
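A side note on how a two-entry `neck` list like the one above gets consumed (hedged: this matches the mmdetection 1.x builder this repo appears to be based on): a list config is built into an `nn.Sequential`, so the FPN's output tuple is passed straight into SEPC's `forward`. A minimal stand-in to illustrate the calling convention (the real `PConvModule`/`sepc_conv` are not reproduced here):

```python
import torch
import torch.nn as nn

class DummyNeck(nn.Module):
    """Stand-in with SEPC's interface: takes a 5-level tuple, returns a tuple."""
    def forward(self, inputs):
        assert len(inputs) == 5
        return tuple(inputs)

# Five FPN levels for a 512x512 input with start_level=1 (strides 8..128).
feats = tuple(torch.randn(2, 256, 512 // s, 512 // s) for s in (8, 16, 32, 64, 128))
chained = nn.Sequential(DummyNeck(), DummyNeck())  # FPN -> SEPC, schematically
outs = chained(feats)
print([tuple(o.shape) for o in outs])
```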
I want to get a feature map with the same size as the original image. Can your pyramid network output that? Is there any code to extract and save the feature maps? Thank you.
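A minimal sketch of one way to do this, assuming an mmdetection-style detector where the pyramid is exposed as `model.neck`: register a forward hook on the neck, save the captured levels, and upsample the finest level to the input resolution (the pyramid levels themselves are strided by 8–128 with the configs above, never image-sized). The `nn.Identity` below is only a stand-in so the snippet runs on its own:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

saved = {}

def save_feats(module, inputs, outputs):
    # works for necks that return a tuple of per-level tensors
    saved['feats'] = [o.detach().cpu() for o in outputs]

neck = nn.Identity()  # stand-in; with this repo, hook model.neck instead
handle = neck.register_forward_hook(save_feats)
feats = tuple(torch.randn(1, 256, 512 // s, 512 // s) for s in (8, 16, 32, 64, 128))
_ = neck(feats)  # with a real detector, just run inference as usual
handle.remove()

for i, f in enumerate(saved['feats']):
    torch.save(f, f'level_{i}.pt')  # one file per pyramid level

# Recover an image-sized map by upsampling the finest level:
full = F.interpolate(saved['feats'][0], size=(512, 512), mode='bilinear',
                     align_corners=False)
print(full.shape)  # torch.Size([1, 256, 512, 512])
```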
I saw this line in many config files: `iBN=True,  # when open, please set imgs/gpu >= 4`
Is this related to model performance? E.g., if I set imgs/gpu = 2, will there be a noticeable drop in performance?
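For context, a sketch of what integrated BN (iBN) does, paraphrased from the idea described in the paper rather than copied from the repo: one shared BatchNorm is applied across all pyramid levels at once, so its statistics are pooled over the batch and every level. Since the statistics still depend on the batch dimension, a very small imgs/gpu plausibly makes them noisy, which would explain the comment's request for >= 4:

```python
import torch
import torch.nn as nn

def integrated_bn(fms, bn):
    """Shared BN across pyramid levels (a sketch, not the repo's verbatim code)."""
    sizes = [p.shape[-2:] for p in fms]
    n, c = fms[0].shape[:2]
    # flatten each level's spatial grid and concatenate along the last dim
    flat = torch.cat([p.reshape(n, c, 1, -1) for p in fms], dim=-1)
    flat = bn(flat)  # statistics pooled over the batch AND all levels
    parts = torch.split(flat, [h * w for h, w in sizes], dim=-1)
    return [p.reshape(n, c, h, w) for p, (h, w) in zip(parts, sizes)]

bn = nn.BatchNorm2d(256)
feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16, 8, 4)]
print([tuple(o.shape) for o in integrated_bn(feats, bn)])
```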
Hi @jshilong,
I am impressed by your comparison table. Thanks for publishing this repository.
I searched for your paper title but cannot find it. How can I read your paper?
Thanks for your work! But I cannot find SelfRetinaHead, only SepcRetinaHead. Where is SelfRetinaHead?
I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.
Here are the OpenMMLab 2.0 repo branches:

| | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
| --- | --- | --- |
| MMEngine | | 0.x |
| MMCV | 1.x | 2.x |
| MMDetection | 0.x, 1.x, 2.x | 3.x |
| MMAction2 | 0.x | 1.x |
| MMClassification | 0.x | 1.x |
| MMSegmentation | 0.x | 1.x |
| MMDetection3D | 0.x | 1.x |
| MMEditing | 0.x | 1.x |
| MMPose | 0.x | 1.x |
| MMDeploy | 0.x | 1.x |
| MMTracking | 0.x | 1.x |
| MMOCR | 0.x | 1.x |
| MMRazor | 0.x | 1.x |
| MMSelfSup | 0.x | 1.x |
| MMRotate | 1.x | 1.x |
| MMYOLO | | 0.x |
Attention: please create a new virtual environment for OpenMMLab 2.0.
@jshilong Thanks for your amazing work!
When I run the code, I can't find SelfRetinaHead. Could you tell me the path? Thank you very much!
Great work!
Hello, do you have a SEPC module based on PyTorch?
When I add SEPC to FCOS, I get the error "TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not list". Have you encountered this error? Can you tell me how to solve it? Thank you!
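A hedged guess at the cause: the stock SEPC neck returns a `[cls_feat, loc_feat]` pair per level (visible as the commented-out `outs` line in the SEPC snippet earlier on this page), while a plain conv-based head like FCOS's expects a single tensor per level, so `conv2d()` ends up receiving a list. If that is what's happening, one option is to split the pairs before the head, e.g.:

```python
def unpack_sepc_outputs(sepc_outs):
    """Split SEPC's per-level [cls_feat, loc_feat] pairs into two tuples,
    one for the classification branch and one for the localization branch."""
    cls_feats = tuple(pair[0] for pair in sepc_outs)
    loc_feats = tuple(pair[1] for pair in sepc_outs)
    return cls_feats, loc_feats
```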
Thanks for your great work and code.
I'm curious how four stacked PConvs with shared parameters would perform, to further decrease computation and parameters. Have you ever tried that?
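For reference, a toy sketch of the weight-sharing variant the question proposes, with a plain 3x3 conv standing in for `PConvModule` (hypothetical stand-in; the real module also mixes adjacent levels). Note that reusing one module shrinks the parameter count but leaves the per-forward FLOPs unchanged:

```python
import torch
import torch.nn as nn

shared = nn.Conv2d(256, 256, 3, padding=1)  # one weight set, applied 4 times
stacked = nn.ModuleList(nn.Conv2d(256, 256, 3, padding=1) for _ in range(4))

def forward_shared(feats, n=4):
    """Reuse the SAME module n times: parameters are shared across the stack."""
    for _ in range(n):
        feats = [shared(f) for f in feats]
    return feats

n_shared = sum(p.numel() for p in shared.parameters())
n_stacked = sum(p.numel() for m in stacked for p in m.parameters())
print(n_shared, n_stacked)  # 4x fewer parameters, same compute per forward
```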
When I use only PConv to fuse three levels of features (the top and bottom levels fuse two levels each), the AP on small objects improves but the AP on large objects decreases. Do you have any suggestions? (PConv only, without iBN or deformable convolution.)
When I train my own model using your config file and then test it, this error occurs:

```
SEPC/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
TypeError: forward() missing 1 required positional argument: 'img_metas'
```

How can I deal with this problem?
Many thanks.
Can sepc_freeanchor be trained with multiple GPUs?
When I run `./dist_train.sh /home/ps/D/mmdetection/SEPC/sepc/exp/freeanchor/sepc_freeanchor.py 8`, an exception is thrown: `KeyError: 'SEPC is not in the neck registry'`.
How can this problem be solved?
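A hedged pointer for this kind of registry error in mmdetection-style codebases: the `@NECKS.register_module` decorator only runs when the file defining `SEPC` is imported, so the class must be reachable from the installed package. Assuming `sepc.py` sits under `mmdet/models/necks/` (adjust to where it actually lives), exposing it in that package's `__init__.py` would look like:

```python
# mmdet/models/necks/__init__.py (hypothetical layout)
from .fpn import FPN
from .sepc import SEPC  # importing the module triggers @NECKS.register_module

__all__ = ['FPN', 'SEPC']
```

Another common cause is launching against a stock mmdetection install instead of the copy bundled with this repository.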
Hi~
Thanks for your great work!
I wonder what RetinaNet baseline you used in your experiments. In Table 1 of the paper, for R-50-1x, the RetinaNet baseline is AP = 35.7, similar to "Focal Loss for Dense Object Detection", but in the mmdet toolbox the box AP for RetinaNet has been 36+ for a long time.
So I want to confirm which RetinaNet baseline you used in your experiments.
Thanks!
Thank you for the great work!
Did you try adding an ATSS head in your experiments? How does it compare with FreeAnchor+SEPC (SelfFreeAnchorRetinaHead)?
Thanks for the great work and code!
I want to understand the details of SEPC, so I checked the code for the details of the new module. I found that in line 37 of sepc.py the code is
`self.lconv = sepc_conv(256, 256, kernel_size=3, dilation=1, part_deform=lcconv_deform)`
The padding value does not seem to be specified here. So my question is: is the padding zero in lconv and cconv? If it is, is padding=0 important for SEPC?
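Whether `sepc_conv` pads internally depends on its definition, which isn't shown here. For reference, this is what the padding value changes for a 3x3 convolution, independent of SEPC: with `padding=0` every application shrinks the map by 2 pixels in each dimension, while `padding=1` preserves the size:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 32, 32)
print(nn.Conv2d(256, 256, 3, padding=0)(x).shape)  # torch.Size([1, 256, 30, 30])
print(nn.Conv2d(256, 256, 3, padding=1)(x).shape)  # torch.Size([1, 256, 32, 32])
```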
I used the pretrained model "mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth" to test, but it reports "ERROR: Found no prediction for ground truth /media/wxy/000F8E4B0002F751/SEPC/mmdetection/data/cityscapes/gtFine/test/berlin/berlin_000000_000019_gtFine_instanceIds.png".
The following text is copied from my Linux terminal:

```
python tools/test.py /media/wxy/000F8E4B0002F751/SEPC/mmdetection/configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes.py /media/wxy/000F8E4B0002F751/SEPC/mmdetection/configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth --eval=cityscapes
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
[>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1081/1081, 3.3 task/s, elapsed: 328s, ETA: 0s
Evaluating in Cityscapes style
[>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1081/1081, 3.2 task/s, elapsed: 340s, ETA: 0s
Evaluating results under /tmp/tmprpi2chka/results ...
ERROR: Found no prediction for ground truth /media/wxy/000F8E4B0002F751/SEPC/mmdetection/data/cityscapes/gtFine/test/berlin/berlin_000000_000019_gtFine_instanceIds.png
```
`evaluation = dict(interval=1, metric='mAP')`
Does interval=1 mean that each epoch is evaluated? Why was it not evaluated during training?
Thanks.
Hi there, thanks for sharing the code, the work is very inspiring!
I have a problem when using PConv in FPN. When I train at a single scale, the result is quite normal, but when I train at multiple scales, for example with the following HTC config,

```python
dict(
    type='Resize',
    img_scale=[(1600, 400), (1600, 1400)],
    multiscale_mode='range',
    keep_ratio=True),
```

the evaluation result is 0. Would you please share the multi-scale training config?
Hello, thank you for sharing. I want to ask whether you have tested adding PConv to the Faster R-CNN configuration file, as shown below.
```python
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        style='pytorch'),
    # neck=dict(
    #     type='FPN',
    #     in_channels=[256, 512, 1024, 2048],
    #     out_channels=256,
    #     num_outs=5),
    neck=[
        dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            start_level=1,
            add_extra_convs=True,
            num_outs=5),
        dict(
            type='SEPC',
            out_channels=256,
            Pconv_num=4,
            pconv_deform=False,
            lcconv_deform=False,
            iBN=False,
        )
    ],
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=81,
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)))
```
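One thing worth double-checking in this config (a hedged observation, not a confirmed fix): with `start_level=1`, `add_extra_convs=True`, and `num_outs=5`, the FPN emits features at strides 8, 16, 32, 64, 128, yet `anchor_strides=[4, 8, 16, 32, 64]` and `featmap_strides=[4, 8, 16, 32]` still describe a pyramid starting at stride 4, so anchors and RoI extraction would be mapped onto the wrong levels. Two self-consistent variants, both hypothetical:

```python
# Option A: keep start_level=1 (SEPC asserts num_outs == 5) and describe the
# strides the FPN actually produces.
anchor_strides = [8, 16, 32, 64, 128]   # for rpn_head
featmap_strides = [8, 16, 32, 64]       # for bbox_roi_extractor

# Option B: keep the stride-4 level by starting the FPN at level 0, matching
# anchor_strides=[4, 8, 16, 32, 64] as written.
fpn_option_b = dict(
    type='FPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    start_level=0,
    add_extra_convs=True,
    num_outs=5)
```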
Hi~
Thanks for the inspiring work.
From the released code, PConv is employed after the FPN is constructed, i.e., after P3~P7 are generated, so PConv is only used in the RetinaNet head. I'm curious how using PConv during the FPN construction would work. Have you ever tried that?
Hi,
Doesn't the open-source code contain the plain pyramid convolution? I only found the scale-equalizing pyramid convolution.
When I read your paper, SEPC is only used to replace the head; why not use it to restructure the FPN (in the paper, pyramid convolution is used to restructure the FPN)?
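To make the distinction concrete, here is a toy sketch of the plain pyramid convolution as the paper describes it (my simplification, not the repo's code): each output level sums a stride-2 convolution of the finer level, a convolution of its own level, and an upsampled convolution of the coarser level. Restructuring the FPN with PConv, as the two questions above suggest, would mean inserting an operation like this into the FPN's fusion stage rather than after it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPConv(nn.Module):
    """Toy 3-kernel pyramid convolution (simplified; no DCN, no iBN)."""

    def __init__(self, channels=256):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.same = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.up = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, feats):  # feats[0] is the finest level
        outs = []
        for i, x in enumerate(feats):
            y = self.same(x)
            if i > 0:  # finer neighbour, brought down by a stride-2 conv
                y = y + self.down(feats[i - 1])
            if i < len(feats) - 1:  # coarser neighbour, conv then upsample
                y = y + F.interpolate(self.up(feats[i + 1]),
                                      size=x.shape[-2:], mode='nearest')
            outs.append(y)
        return outs

feats = [torch.randn(1, 256, 64 // 2 ** i, 64 // 2 ** i) for i in range(5)]
print([tuple(o.shape) for o in ToyPConv()(feats)])
```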