wxinlong / solo
SOLO and SOLOv2 for instance segmentation, ECCV 2020 & NeurIPS 2020.
License: Other
Hello,
If my understanding is correct, SOLOv2 uses only a single combined feature map (Fig. 3 in the paper), unlike SOLOv1, which uses multi-scale FPN features.
Is the proposed dynamic head applied to this single combined feature map?
Thanks
As the title says.
I trained decoupled_solo_light_dcn_release_r50_fpn_8gpu_3x.py on my own data with the default config (I changed only the data path and the number of classes). When I ran the model on video, it averaged 8-9 FPS instead of the 20 FPS reported in the paper. What is the reason?
To calculate the average FPS, I just ran inference_detector() on every frame of the video without any post-processing.
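When comparing a measured frame rate with a published one, it can help to rule out measurement effects first. The following is a minimal sketch of a timing loop that skips warm-up iterations. The names `average_fps`, `infer` and `warmup` are hypothetical and not part of the repository; `infer` stands in for whatever per-frame call is being timed.

```python
import time


def average_fps(infer, frames, warmup=5):
    """Time `infer(frame)` over `frames`, skipping the first `warmup` calls.

    `infer` is a hypothetical stand-in for the per-frame call being measured,
    for example a small wrapper around the detector's inference function.
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")

    # Warm-up: the first calls often include one-off costs (context creation,
    # autotuning, lazy allocations) that distort the average.
    for frame in frames[:warmup]:
        infer(frame)

    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    # Assumption: `infer` returns only once its result is ready. With
    # asynchronous GPU execution, a device synchronization would be needed
    # here before reading the clock.
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed if elapsed > 0 else float("inf")
```

Video decoding and image resizing inside the loop also count toward the measured time, so the result is only comparable to a reported number when the input resolution and hardware match.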
I ran detection on a single image with the Decoupled_SOLO_Light_R50_3x model and the default config. Inference takes too long, about 0.6 s. Why?
Hi Xinlong, I'm wondering whether all results in Table 1 of the SOLOv2 paper were trained with the 6x schedule, or only Mask R-CNN*?
Hi Xinlong, could you please share how long the LVIS SOLOv2 model was trained? I suppose it is 6x, right? Thanks!
I don't understand how to put "two or more layers of CoordConv" into your network in Table 4,
i.e. I want to know where the additional CoordConv layers are placed in the network.
I don't know where the CoordConv layers were added in the Table 4 experiments. How can more than one CoordConv be added?
I use --eval bbox, but the result has no boxes. Am I missing something?
Hi Xinlong, I noted in #33 that point NMS is only used to reduce memory usage, but when I remove it, the performance drops to near 0. Could you please give some advice?
An OSError occurred when I ran "python inference_demo.py":
Traceback (most recent call last):
File "inference_demo.py", line 12, in <module>
model = init_detector(config_file, checkpoint_file, device='cuda:0')
File "/home/adt/Documents/alg/big-xing/SOLO/mmdet/apis/inference.py", line 38, in init_detector
checkpoint = load_checkpoint(model, checkpoint)
File "/home/adt/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 168, in load_checkpoint
raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ../checkpoints/DECOUPLED_SOLO_R50_3x.pth is not a checkpoint file
When I run the command: python demo/webcam_demo.py configs/solo/solo_r50_fpn_8gpu_3x.py checkpoints/SOLO_R50_3x.pth
I get this error:
Press "Esc", "q" or "Q" to exit.
Traceback (most recent call last):
File "demo/webcam_demo.py", line 47, in <module>
main()
File "demo/webcam_demo.py", line 43, in main
show_result(img, result, model.CLASSES, score_thr=args.score_thr, wait_time=1)
File "/home/haomeng/PycharmProjects/SOLO/mmdet/apis/inference.py", line 155, in show_result
for i, bbox in enumerate(bbox_result)
File "/home/haomeng/PycharmProjects/SOLO/mmdet/apis/inference.py", line 155, in <listcomp>
for i, bbox in enumerate(bbox_result)
AttributeError: 'tuple' object has no attribute 'shape'
Environment
just follow the install.md : pip install -v -e .[all]
Hello,
I get the following runtime error.
The error can be reproduced by the following steps:
RuntimeError: cuda runtime error (98) : unrecognized error code at mmdet/ops/sigmoid_focal_loss/src/sigmoid_focal_loss_cuda.cu:128
loss_cate = self.loss_cate(flatten_cate_preds, flatten_cate_labels, avg_factor=num_ins + 1)
File "/home/user/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/ssd2/solo_pano/mmdet/models/losses/focal_loss.py", line 79, in forward
avg_factor=avg_factor)
File "/home/user/ssd2/solo_pano/mmdet/models/losses/focal_loss.py", line 37, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred, target, gamma, alpha)
File "/home/user/ssd2/solo_pano/mmdet/ops/sigmoid_focal_loss/sigmoid_focal_loss.py", line 19, in forward
gamma, alpha)
What might be the problem?
#######################################################
local 1 environment
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.105
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.2.16
MMDetection: 1.0.0+56db9d2
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.1
########################################################
local 2 environment
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-10.0
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1,2,3,4,5,6,7: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.2.16
MMDetection: 1.0.0+2c951b9
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 10.0
I see that the CUDA version differs between local 1 and local 2.
Could that be the reason?
Hi Xinlong, does the code support validation during training, e.g. by setting workflow = [('train', 1),('val', 1)] in config.py? Thanks!
Hi, I am reading the SOLO source code and I am not sure what strides is in the following code and what role it plays. I'd appreciate a hint.
# process.
inds = (cate_preds > cfg.score_thr)
# category scores.
cate_scores = cate_preds[inds]
if len(cate_scores) == 0:
    return None
# category labels.
inds = inds.nonzero()
cate_labels = inds[:, 1]
# strides.
size_trans = cate_labels.new_tensor(self.seg_num_grids).pow(2).cumsum(0)
strides = cate_scores.new_ones(size_trans[-1])
n_stage = len(self.seg_num_grids)
strides[:size_trans[0]] *= self.strides[0]
for ind_ in range(1, n_stage):
    strides[size_trans[ind_ - 1]:size_trans[ind_]] *= self.strides[ind_]
strides = strides[inds[:, 0]]
# masks.
seg_preds = seg_preds[inds[:, 0]]
seg_masks = seg_preds > cfg.mask_thr
sum_masks = seg_masks.sum((1, 2)).float()
# filter.
keep = sum_masks > strides
if keep.sum() == 0:
    return None
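One way to read the quoted snippet: it builds a lookup with one stride value per grid cell across all levels, picks the stride for each surviving candidate by its flat cell index, and then drops candidates whose mask area does not exceed that stride. The plain-Python sketch below mirrors that reading without PyTorch. The helper names and the grid sizes and strides are illustrative assumptions, not values taken from the post.

```python
def per_cell_strides(num_grids, level_strides):
    """Expand one stride per level into one stride per grid cell.

    Level i contributes num_grids[i] ** 2 cells, laid out back to back,
    which mirrors the cumulative-sum slicing in the quoted code.
    """
    cell_strides = []
    for grid, stride in zip(num_grids, level_strides):
        cell_strides.extend([stride] * (grid * grid))
    return cell_strides


def keep_large_enough(mask_areas, cell_indices, cell_strides):
    """Keep a candidate only if its mask area exceeds its level's stride."""
    return [area > cell_strides[idx]
            for area, idx in zip(mask_areas, cell_indices)]


# Illustrative values (assumed, not taken from the post).
num_grids = [40, 36, 24, 16, 12]
level_strides = [8, 8, 16, 32, 32]
cell_strides = per_cell_strides(num_grids, level_strides)

# Two candidates with the same mask area: one predicted by a cell of the
# first level, one by a cell of the last level.
keep = keep_large_enough(mask_areas=[20.0, 20.0],
                         cell_indices=[0, len(cell_strides) - 1],
                         cell_strides=cell_strides)
print(keep)
```

Under this reading, strides acts as a level-dependent minimum mask area: the same small mask is kept when it comes from a fine level and discarded when it comes from a coarse one.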
top = max(top_box, coord_h-1)
down = min(down_box, coord_h+1)
left = max(coord_w-1, left_box)
right = min(right_box, coord_w+1)
Why not directly set "top" to "coord_h-1" and "down" to "coord_h+1"? There are still few positive samples.
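To make the effect of the clamp concrete, here is a small sketch that evaluates the four expressions for two cases. It assumes that top_box, down_box, left_box and right_box are the grid-cell bounds of the object's center region and that (coord_h, coord_w) is the cell containing the mask center. That interpretation and the function name `positive_cell_range` are assumptions made for illustration.

```python
def positive_cell_range(coord_h, coord_w, top_box, down_box, left_box, right_box):
    """Intersect the center-region cell bounds with the 3x3 block of cells
    around (coord_h, coord_w)."""
    top = max(top_box, coord_h - 1)
    down = min(down_box, coord_h + 1)
    left = max(coord_w - 1, left_box)
    right = min(right_box, coord_w + 1)
    return top, down, left, right


# Small object: the center region covers a single cell, so only that cell
# is positive even though the 3x3 block would allow more.
small = positive_cell_range(5, 5, top_box=5, down_box=5, left_box=5, right_box=5)

# Large object: the center region spans cells 3..8, but the positive range
# is capped at the 3x3 block around the center cell.
large = positive_cell_range(5, 5, top_box=3, down_box=8, left_box=3, right_box=8)

print(small, large)
```

Setting top to coord_h-1 and down to coord_h+1 unconditionally would give both objects the same 3x3 block, so cells outside a small object's center region would also be labeled positive.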
I have a question about how the code of this repo is forwarding a batch.
If I understand correctly, it seems to take the instances belonging to each single image out of the batch, compute the loss per image, and gather the losses at the end.
Would training take a bit longer than forwarding the whole batch at once?
Would performance differ if the batch were forwarded as a whole with some zero padding?
Hi Xinlong, it seems the README does not cover multi-GPU testing?
When I run the training script, I find that the input image size keeps changing.
Step 12/12 : RUN pip install --no-cache-dir -e .
---> Running in 10680a3a2218
Obtaining file:///SOLO
ERROR: Command errored out with exit status 1:
command: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SOLO/setup.py'"'"'; __file__='"'"'/SOLO/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
cwd: /SOLO/
Complete output (8 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/SOLO/setup.py", line 251, in <module>
sources=['src/compiling_info.cpp']),
File "/SOLO/setup.py", line 103, in make_cuda_ext
raise EnvironmentError('CUDA is required to compile MMDetection!')
OSError: CUDA is required to compile MMDetection!
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Hi Xinlong, could you share which random seed was used for the 35.8 AP ResNet-50 model?
Hi Xinlong, there are added lines of code for additional initialization of the convo
SOLO/mmdet/models/anchor_heads/solo_head.py
Lines 107 to 116 in d5398a0
I'm curious whether the performance would drop a lot if that initialization were removed.
I train with 4 GPUs using the command
./tools/dist_train.sh configs/solo/solo_r50_fpn_8gpu_1x.py 4
But the process cannot proceed after reaching
2020-05-02 15:26:02,261 - mmdet - INFO - load model from: torchvision://resnet50
However, when I train with 1 GPU on another machine, the resnet50 can be loaded properly.
Could you help me out?
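What do the 1x and 2x suffixes at the end of the model names mean?
While training on my own data I ran into some errors, and while debugging I found that I don't understand the principle behind this.
Could you briefly explain it?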
2020-05-05 00:42:16,806 - mmdet - INFO - Start running, host: huangzhipeng@k8s-deploy-rod9ow-1567512049745-7b4474f8b7-b8k8k, work_dir: /nfs/project/huangzhipeng/tools/opensorce/SOLO/work_dirs/decoupled_solo_release_r50_fpn_8gpu_3x
2020-05-05 00:42:16,808 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
File "./tools/train.py", line 125, in <module>
main()
File "./tools/train.py", line 121, in main
timestamp=timestamp)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 103, in train_detector
timestamp=timestamp)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 250, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
epoch_runner(data_loaders[i], **kwargs)
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/mmcv/runner/runner.py", line 268, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 79, in batch_processor
loss, log_vars = parse_losses(losses)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 56, in parse_losses
dist.all_reduce(loss_value.div(dist.get_world_size()))
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 902, in all_reduce
work = _default_pg.allreduce([tensor], opts)
RuntimeError: Socket Timeout
I got the training on the coco dataset working as expected.
However, when I try to set the --validate flag, as recommended in the documentation, the training fails as soon as it starts to do the first validation step.
The command
./tools/dist_train.sh configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py 2
works while
./tools/dist_train.sh configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py 2 --validate
produces the following error
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 58/58, 40.1 task/s, elapsed: 1s, ETA: 0s
Traceback (most recent call last):
File "./tools/train.py", line 125, in <module>
main()
File "./tools/train.py", line 121, in main
timestamp=timestamp)
File "/home/klauskofler/SOLO/mmdet/apis/train.py", line 103, in train_detector
timestamp=timestamp)
File "/home/klauskofler/SOLO/mmdet/apis/train.py", line 250, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 278, in train
self.call_hook('after_train_epoch')
File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 231, in call_hook
getattr(hook, fn_name)(self)
File "/home/klauskofler/SOLO/mmdet/core/evaluation/eval_hooks.py", line 64, in after_train_epoch
self.evaluate(runner, results)
File "/home/klauskofler/SOLO/mmdet/core/evaluation/eval_hooks.py", line 124, in evaluate
result_files = results2json(self.dataset, results, tmp_file)
File "/home/klauskofler/SOLO/mmdet/core/evaluation/coco_utils.py", line 224, in results2json
json_results = det2json(dataset, results)
File "/home/klauskofler/SOLO/mmdet/core/evaluation/coco_utils.py", line 153, in det2json
for i in range(bboxes.shape[0]):
AttributeError: 'NoneType' object has no attribute 'shape'
Traceback (most recent call last):
File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/klauskofler/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py', '--launcher', 'pytorch', '--validate']' returned non-zero exit status 1.
It seems to me, that the validation wants to calculate KPIs based on the bounding boxes produced by the network, while the network does not produce any bounding boxes. Is this behavior expected or am I doing something wrong?
Thanks for your great work! In Table 3 of the SOLO paper, why do the scale ranges of the FPN levels overlap with each other? And why do you rescale P2 to the size of P3? Thanks!
Thanks for your great work! I have a question about SOLO. Can SOLO handle the case where more than one object falls into the same grid cell?
Hey @WXinlong, well done and thanks for sharing. I have a question about SOLOv1, since SOLOv2 is not published yet.
Which of the pretrained models is suitable for reaching a high frame rate (around 30 FPS) on 1080p videos? Is it possible to reach ~30 FPS with the lightweight models?
Thanks!
torch 1.5 build error.
Why is the mask branch 2H * 2W instead of H*W?
Hello, thanks for the paper.
I read the two papers from your lab, SOLOv2 and CondInst, but I can barely see the difference between the two methods, except for Matrix NMS.
If I understand correctly, the output of the mask branch no longer encodes the cell-location categories as in SOLOv1 (HxWxS^2), so it no longer inherits the key idea of SOLO.
Could you point out their differences and compare their performance in terms of speed?
Thank you.
For example, what are the differences between "Decoupled_SOLO_R50_3x" and "DECOUPLED_SOLO_LIGHT_R50_3x"?
It seems SOLO cannot run on my edge device with 2 GB of memory. Is there any way to reduce the memory usage?
Dear @WXinlong, I'm working on reimplementing SOLO using maskrcnn-benchmark and ran into some problems. I hope you can give me some advice. I found that most parts of your code can still be used without any big modifications, apart from some small errors (for instance, maskrcnn-benchmark uses a class, SegmentationMask, to represent the mask image, while seg_mask is represented as an image in SOLO).
In your implementation, you calculate the coordinates of the mask's center of mass by calling an existing function:
center_h, center_w = ndimage.measurements.center_of_mass(seg_mask)
How can I calculate these coordinates if I have a SegmentationMask object rather than an image? Thank you in advance.
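For reference, the center of mass of a binary mask is just the mean row index and mean column index of its foreground pixels, so it can be computed by hand once the mask is available as a dense 2-D array. The sketch below assumes that conversion has already been done; how to obtain a dense array from a SegmentationMask object is not shown here. It is a plain-Python illustration, not the repository's code.

```python
def center_of_mass(mask):
    """Return (center_h, center_w) of a 2-D mask given as rows of 0/1 values.

    The result is the value-weighted mean of the row and column indices.
    """
    total = 0.0
    sum_h = 0.0
    sum_w = 0.0
    for h, row in enumerate(mask):
        for w, value in enumerate(row):
            total += value
            sum_h += h * value
            sum_w += w * value
    if total == 0:
        raise ValueError("mask is empty, center of mass is undefined")
    return sum_h / total, sum_w / total


# A 2x2 foreground block occupying rows 1-2 and columns 1-2.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
center_h, center_w = center_of_mass(mask)
print(center_h, center_w)
```

With array libraries the same quantity is usually computed in vectorized form. The loop version is kept here only to make the definition explicit.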
Sorry to bother you, but I'm not familiar with mmdetection. What if I want to run inference on a single image instead of the COCO test set?
Thanks in advance~
I trained SOLO on the Cityscapes dataset without modifying any settings except the dataset part. Here is my result. Clearly, SOLO performs very badly on Cityscapes. Maybe I need to tune some hyperparameters. Do you have any suggestions?
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.075
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.157
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.068
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.063
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.166
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.096
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.158
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.168
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.142
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.335
Hi, I am confused about some params in test config.
test_cfg = dict(
    nms_pre=500,
    score_thr=0.1,
    mask_thr=0.5,
    update_thr=0.05,
    kernel='gaussian',  # gaussian/linear
    sigma=2.0,
    max_per_img=100)
What do update_thr, kernel, and sigma mean here?
Can't I adjust an NMS threshold in the NMS operation, as in Mask R-CNN?
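These keys look like parameters of a score-decay style NMS such as the Matrix NMS described in the SOLOv2 paper, where overlapping masks are not removed by a hard IoU threshold. Instead, each score is multiplied by a decay factor that depends on its overlap with higher-scored masks. The sketch below is a plain-Python illustration of that idea, not the repository's implementation. The function name is hypothetical, and treating update_thr as a threshold applied to the decayed scores is an assumption.

```python
import math


def matrix_nms_scores(scores, iou, kernel="gaussian", sigma=2.0):
    """Decay scores by overlap with higher-scored masks.

    `scores` must be sorted in descending order; `iou[i][j]` is the mask IoU
    between candidates i and j for i < j (other entries are ignored).
    """
    n = len(scores)

    def f(x):
        # Decay kernel: larger overlap gives a smaller value.
        if kernel == "gaussian":
            return math.exp(-sigma * x * x)
        return 1.0 - x  # linear kernel

    # Largest overlap each mask has with any higher-scored mask.
    max_iou = [max((iou[k][i] for k in range(i)), default=0.0) for i in range(n)]

    decayed = []
    for j in range(n):
        # Each higher-scored mask i suppresses j according to their overlap,
        # compensated by how strongly i itself is likely suppressed.
        decay = min((f(iou[i][j]) / f(max_iou[i]) for i in range(j)), default=1.0)
        decayed.append(scores[j] * decay)
    return decayed


scores = [0.9, 0.8, 0.7]
iou = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
decayed = matrix_nms_scores(scores, iou, kernel="gaussian", sigma=2.0)

# Assumed role of update_thr: drop candidates whose decayed score is too low.
update_thr = 0.05
kept = [i for i, s in enumerate(decayed) if s >= update_thr]
print(decayed, kept)
```

In this reading, sigma controls how sharply the gaussian kernel penalizes overlap, kernel selects between the gaussian and linear decay, and the combination of decay and a score threshold plays the role that a single IoU threshold plays in hard NMS.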
When I try to train this model, the following error occurs.
../mmdet/models/anchor_heads/solo_head.py", line 203, in loss
num_ins = flatten_ins_ind_labels.sum()
RuntimeError: "sum_cuda" not implemented for 'Bool'
I have run 'python setup.py build_ext --inplace'.
Has anyone tried the SOLO model on thin, dotted lines like this?
For the above image with one object (a line), assume that I have many small polygons for that line.
I observed that the SOLO model did not perform well (the scores are very low).
Any suggestions on improving the data, or modifying the hyper-parameters?
Thanks in advance!
Hi,
Could you please explain what points_nms does during evaluation? Unfortunately, I did not find it in the paper. Thank you very much!
Hi Xinlong, I'm wondering whether the inference speed is the xxx task/s value reported after evaluation is done:
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 6.5 task/s, elapsed: 767s, ETA: 0s
I ran test_ins.py, so is the inference speed here 6.5 FPS?
Hello, I successfully installed SOLO based on mmdet v1.0.0, and it compiled fine.
I can train the model with 1 GPU.
However, a runtime error occurs when I try to train the model with multiple GPUs.
The error looks like this:
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
return comm.broadcast_coalesced(tensors, devices)
The script I am using is: ./tools/dist_train.sh configs/solo/solo_r50_fpn_8gpu_1x.py 8
The dataset I used is COCO 2017.
Environment
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
GCC: gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.4.2
MMDetection: 1.0.0+unknown
MMDetection Compiler: GCC 8.2
MMDetection CUDA Compiler: 10.1
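Hello, what else needs to be changed to train on my own dataset? Training ran without errors, but testing raised an error. Also, does it matter that I have not changed the class names and the number of classes?
Looking forward to your reply, thanks!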
Thanks for sharing your nice code.
I get the following error when I try to train SOLO on multiple GPUs.
(i.e., tools/dist_train.sh configs/solo/decoupled_solo_r50_fpn_8gpu_3x.py 8)
Environment
sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.5.0
OpenCV: 4.1.0
MMCV: 0.4.2
MMDetection: 1.0.0+925cc7c
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.0
Error traceback
Traceback (most recent call last):
File "tools/train.py", line 125, in <module>
main()
File "tools/train.py", line 121, in main
timestamp=timestamp)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 103, in train_detector
timestamp=timestamp)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 253, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/ubuntu/SOLO/mmcv/mmcv/runner/runner.py", line 359, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/ubuntu/SOLO/mmcv/mmcv/runner/runner.py", line 263, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 78, in batch_processor
losses = model(**data)
File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 464, in forward
self.reducer.prepare_for_backward([])
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /opt/conda/conda-bld/pytorch_1579040055865/work/torch/csrc/distributed/c10d/reducer.cpp:514)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f1a30bf9627 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::prepare_for_backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x7b7 (0x7f1a672cc557 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: + 0xa39bd1 (0x7f1a672b7bd1 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: + 0x28ba06 (0x7f1a66b09a06 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: _PyMethodDef_RawFastCallKeywords + 0x264 (0x56045afec114 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #5: _PyCFunction_FastCallKeywords + 0x21 (0x56045afec231 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #6: _PyEval_EvalFrameDefault + 0x52cf (0x56045b050e8f in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #7: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #8: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #9: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #10: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #13: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #14: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #15: + 0x17512a (0x56045b00012a in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #16: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #18: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #19: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #21: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #22: _PyFunction_FastCallDict + 0x1d5 (0x56045afa6805 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #23: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #24: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #25: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #26: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #27: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x6a0 (0x56045b04c260 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #29: _PyEval_EvalCodeWithName + 0xc30 (0x56045afa6030 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #30: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x14e6 (0x56045b04d0a6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #32: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #33: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #34: _PyEval_EvalFrameDefault + 0x14e6 (0x56045b04d0a6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #35: _PyFunction_FastCallKeywords + 0xfb (0x56045afeb68b in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x416 (0x56045b04bfd6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #37: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #38: PyEval_EvalCodeEx + 0x44 (0x56045afa65f4 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #39: PyEval_EvalCode + 0x1c (0x56045afa661c in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #40: + 0x21c974 (0x56045b0a7974 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #41: PyRun_FileExFlags + 0xa1 (0x56045b0b1cf1 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #42: PyRun_SimpleFileExFlags + 0x1c3 (0x56045b0b1ee3 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #43: + 0x227f95 (0x56045b0b2f95 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #44: _Py_UnixMain + 0x3c (0x56045b0b30bc in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #45: __libc_start_main + 0xf0 (0x7f1a6c753830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #46: + 0x1d0990 (0x56045b05b990 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)