wxinlong / solo

SOLO and SOLOv2 for instance segmentation, ECCV 2020 & NeurIPS 2020.

License: Other

Python 92.93% Dockerfile 0.03% C++ 2.34% Cuda 4.44% Shell 0.07% Cython 0.19%

solo instance-segmentation object-detection pytorch solov2

solo's Issues

Question on SOLOv2

Hello,

If my understanding is correct, I see the SOLOv2 only uses a single combined feature map (Fig3 in the paper), which is different from SOLOv1 that uses multi-scale FPN features.

Is the proposed dynamic head applied to this single combined feature map?

Thanks

9 fps of decoupled_solo_light_dcn_release_r50_fpn_8gpu_3 on GTX 1080 Ti

I trained decoupled_solo_light_dcn_release_r50_fpn_8gpu_3x.py on my own data with default config (only changed path to data and number of classes). When I tried to run the model on video, it gave 8-9 fps on average instead of 20 that was in the paper. What is the reason?
To calculate avg FPS I just ran inference_detector() for every frame in the video without any post-processing.
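A note for anyone comparing their numbers with the paper: measured FPS depends heavily on how the timing is done, as well as on the GPU model and input resolution, which may differ from the paper's setup. Below is a minimal timing sketch, assuming `infer` is any callable that processes one frame (for example a lambda wrapping inference_detector). The helper name `measure_fps` and the `sync` hook are hypothetical; `sync` could be something like torch.cuda.synchronize so that queued GPU work is included in the measurement.

```python
import time


def measure_fps(infer, frames, warmup=5, sync=None):
    """Return the average frames per second of `infer` over `frames`.

    `infer` is any callable taking one frame. `sync` is an optional,
    hypothetical hook that blocks until pending GPU work has finished.
    """
    frames = list(frames)
    # Warm-up runs are excluded: the first calls often include one-off setup cost.
    for frame in frames[:warmup]:
        infer(frame)
    if sync is not None:
        sync()
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    if sync is not None:
        sync()  # make sure queued GPU work is counted before stopping the clock
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed
```

Video decoding and frame resizing should be kept outside `infer` if only the model's speed is of interest.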

Inference takes too much time?

I run detection on a single picture using the Decoupled_SOLO_Light_R50_3x model with the default config. Inference takes about 0.6 s. Why?

questions about training schedule

Hi Xinlong, I'm wondering whether all results in Table 1 of the SOLOv2 paper were trained with the 6x schedule, or only Mask R-CNN*?

LVIS training schedule

Hi Xinlong, could you please share how long the LVIS SOLOv2 model was trained? I suppose it is 6x, right? Thanks!

Question about CoordConv

I don't understand how to put "two or more layers of CoordConv" in your network in Table 4,
i.e., I want to know where the additional CoordConv layers are placed in the network.
I don't know where the CoordConv layers were added in the Table 4 experiments. Why is it possible to add more than one CoordConv?

how to eval bbox

I use --eval bbox, but the result has no boxes, am I missing something ?

removing point_nms leads to near 0 AP

Hi Xinlong, I noted in #33 that point NMS is only used to reduce memory usage, but when I try to remove it, the performance drops to near 0. Could you please give some advice?

OSError: ../checkpoints/DECOUPLED_SOLO_R50_3x.pth is not a checkpoint file

An OSError occurs when I run "python inference_demo.py":
Traceback (most recent call last):
File "inference_demo.py", line 12, in <module>
model = init_detector(config_file, checkpoint_file, device='cuda:0')
File "/home/adt/Documents/alg/big-xing/SOLO/mmdet/apis/inference.py", line 38, in init_detector
checkpoint = load_checkpoint(model, checkpoint)
File "/home/adt/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 168, in load_checkpoint
raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ../checkpoints/DECOUPLED_SOLO_R50_3x.pth is not a checkpoint file
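As far as I can tell, mmcv's load_checkpoint raises this message when the given path is not an existing file, so the usual suspects are a relative path resolved against a different working directory, or a download that never completed. A minimal sketch for checking this, assuming only the standard library; the helper name `describe_checkpoint_path` is hypothetical and not part of mmcv or mmdet:

```python
import os


def describe_checkpoint_path(path, base_dir=None):
    """Resolve `path` against `base_dir` and report whether it is a regular file.

    `base_dir` defaults to the current working directory, which is what a
    relative path such as '../checkpoints/...' is resolved against at run time.
    """
    base_dir = os.getcwd() if base_dir is None else base_dir
    resolved = os.path.abspath(os.path.join(base_dir, path))
    return resolved, os.path.isfile(resolved)
```

Printing the resolved path right before calling init_detector shows where the file is actually being looked for.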

AttributeError: 'tuple' object has no attribute 'shape'

When I try to run the command: python demo/webcam_demo.py configs/solo/solo_r50_fpn_8gpu_3x.py checkpoints/SOLO_R50_3x.pth

I get this error:
Press "Esc", "q" or "Q" to exit.
Traceback (most recent call last):
File "demo/webcam_demo.py", line 47, in <module>
main()
File "demo/webcam_demo.py", line 43, in main
show_result(img, result, model.CLASSES, score_thr=args.score_thr, wait_time=1)
File "/home/haomeng/PycharmProjects/SOLO/mmdet/apis/inference.py", line 155, in show_result
for i, bbox in enumerate(bbox_result)
File "/home/haomeng/PycharmProjects/SOLO/mmdet/apis/inference.py", line 155, in <listcomp>
for i, bbox in enumerate(bbox_result)
AttributeError: 'tuple' object has no attribute 'shape'

Environment
just follow the install.md : pip install -v -e .[all]

RuntimeError occurs after the code modification

Hello,

I get the following runtime error.

The error can be reproduced by the following steps:

1. Pull the code from the original SOLO repo (let's say we do this at local1).
2. Build the code (i.e., python setup.py develop).
3. Code modifying/executing works well at this point. I then push to my own git.
4. Pull the code from my own git to another local (let's say we do this at local2).

At this point, if I build the code and attempt to execute the training script, the following runtime error occurs.

RuntimeError: cuda runtime error (98) : unrecognized error code at mmdet/ops/sigmoid_focal_loss/src/sigmoid_focal_loss_cuda.cu:128
loss_cate = self.loss_cate(flatten_cate_preds, flatten_cate_labels, avg_factor=num_ins + 1)
File "/home/user/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/ssd2/solo_pano/mmdet/models/losses/focal_loss.py", line 79, in forward
avg_factor=avg_factor)
File "/home/user/ssd2/solo_pano/mmdet/models/losses/focal_loss.py", line 37, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred, target, gamma, alpha)
File "/home/user/ssd2/solo_pano/mmdet/ops/sigmoid_focal_loss/sigmoid_focal_loss.py", line 19, in forward
gamma, alpha)

What might be the problem?

#######################################################
local 1 environment

sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.105
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

GCC 7.3
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.1
Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.2.16
MMDetection: 1.0.0+56db9d2
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.1

########################################################
local 2 environment

sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-10.0
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1,2,3,4,5,6,7: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

GCC 7.3
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CUDA Runtime 10.0
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.1
Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.2.16
MMDetection: 1.0.0+2c951b9
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 10.0

I see the Cuda version is different between local 1 and local 2.
Can it be the reason?

Validation during training?

Hi Xinlong, does the code support validation during training? E.g., setting workflow = [('train', 1),('val', 1)] in config.py to enable validation during training? Thanks!

questions about get_seg_single

Hi, I am reading the source code of SOLO and I am not sure what strides is in the following code and what role it plays. I'd appreciate it if you could give me a hint.

    # process.
    inds = (cate_preds > cfg.score_thr)
    # category scores.
    cate_scores = cate_preds[inds]
    if len(cate_scores) == 0:
        return None
    # category labels.
    inds = inds.nonzero()
    cate_labels = inds[:, 1]

    # strides.
    size_trans = cate_labels.new_tensor(self.seg_num_grids).pow(2).cumsum(0)
    strides = cate_scores.new_ones(size_trans[-1])
    n_stage = len(self.seg_num_grids)
    strides[:size_trans[0]] *= self.strides[0]
    for ind_ in range(1, n_stage):
        strides[size_trans[ind_ - 1]:size_trans[ind_]] *= self.strides[ind_]
    strides = strides[inds[:, 0]]

    # masks.
    seg_preds = seg_preds[inds[:, 0]]
    seg_masks = seg_preds > cfg.mask_thr
    sum_masks = seg_masks.sum((1, 2)).float()

    # filter.
    keep = sum_masks > strides
    if keep.sum() == 0:
        return None
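Reading the snippet above, `strides` seems to end up as one stride value per flattened grid cell across all levels; `strides[inds[:, 0]]` then picks the stride for each kept candidate, and `sum_masks > strides` drops masks whose foreground pixel count does not exceed the stride of the level they came from. A plain-Python sketch of that expansion follows. The grid sizes and strides are illustrative assumptions, not values read from a specific config, and `per_cell_strides` is a hypothetical helper name.

```python
def per_cell_strides(num_grids, strides):
    """Expand per-level strides into one value per flattened grid cell."""
    expanded = []
    for grid, stride in zip(num_grids, strides):
        # A level with an S x S grid contributes S * S cells.
        expanded.extend([stride] * (grid * grid))
    return expanded


# Illustrative values only (assumed for this example).
num_grids = [40, 36, 24, 16, 12]
strides = [8, 8, 16, 32, 32]
cell_strides = per_cell_strides(num_grids, strides)
```

This mirrors the cumsum-and-slice construction in the original code: `size_trans` holds the running totals of S * S, and each slice between consecutive totals is filled with that level's stride.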

question about center region code

    top = max(top_box, coord_h-1)
    down = min(down_box, coord_h+1)
    left = max(coord_w-1, left_box)
    right = min(right_box, coord_w+1)

Why not directly set "top" to "coord_h-1" and "down" to "coord_h+1"? There are still few positive samples.
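One way to read these four lines: the result is the intersection of the range derived from the box (top_box, down_box, left_box, right_box) and the 3x3 neighbourhood around the center cell (coord_h, coord_w). Setting top to coord_h-1 directly would mark neighbouring cells as positive even when the box-derived range does not reach them. A small sketch of that behaviour, with the hypothetical function name `center_region` and made-up numbers:

```python
def center_region(top_box, down_box, left_box, right_box, coord_h, coord_w):
    """Clamp the box-derived grid range to the 3x3 cells around the center cell."""
    top = max(top_box, coord_h - 1)
    down = min(down_box, coord_h + 1)
    left = max(coord_w - 1, left_box)
    right = min(right_box, coord_w + 1)
    return top, down, left, right


# Small object: the box range covers only the center cell, so one cell stays positive.
small = center_region(5, 5, 7, 7, coord_h=5, coord_w=7)
# Large object: the box range is wide, so the result is capped at 3x3 cells.
large = center_region(2, 9, 3, 12, coord_h=5, coord_w=7)
```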

Question about GPU batch computation

I have a question about how the code of this repo is forwarding a batch.

If I understand correctly, it takes the instances belonging to a single image out of the batch, computes on them, and gathers the loss at the end.

Would training take a bit longer than forwarding the whole batch at once?
Is there a difference in performance if the batch is forwarded as a whole by adding some zero padding?

multi gpu test

Hi Xinlong, it seems the README does not cover multi-GPU testing?

How can I turn off data augmentation when training a model?

When I run the train script, I found that the input image size keeps changing.

build docker image error

Step 12/12 : RUN pip install --no-cache-dir -e .
 ---> Running in 10680a3a2218
Obtaining file:///SOLO
    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SOLO/setup.py'"'"'; __file__='"'"'/SOLO/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
         cwd: /SOLO/
    Complete output (8 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/SOLO/setup.py", line 251, in <module>
        sources=['src/compiling_info.cpp']),
      File "/SOLO/setup.py", line 103, in make_cuda_ext
        raise EnvironmentError('CUDA is required to compile MMDetection!')
    OSError: CUDA is required to compile MMDetection!
    No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

about random seed

Hi Xinlong, I'm wondering if you could share what random seed is used for the 35.8AP Resnet50 model?

How important is the added initialization for the convs?

Hi Xinlong, there are added code lines for additional initialization of the convs:

SOLO/mmdet/models/anchor_heads/solo_head.py

Lines 107 to 116 in d5398a0

    def init_weights(self):
        for m in self.ins_convs:
            normal_init(m.conv, std=0.01)
        for m in self.cate_convs:
            normal_init(m.conv, std=0.01)
        bias_ins = bias_init_with_prob(0.01)
        for m in self.solo_ins_list:
            normal_init(m, std=0.01, bias=bias_ins)
        bias_cate = bias_init_with_prob(0.01)
        normal_init(self.solo_cate, std=0.01, bias=bias_cate)

I'm curious whether the performance would drop a lot if this initialization were removed?
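For context, as far as I understand it, bias_init_with_prob follows the prior-probability initialization popularized with focal loss: the bias is chosen so that the sigmoid output starts near the given prior (0.01 here), which keeps the loss from being dominated by the huge number of negative cells early in training. A minimal sketch of that arithmetic; `bias_for_prior` is a hypothetical stand-in written for this example, not the library function itself:

```python
import math


def bias_for_prior(prior_prob):
    """Return the bias b such that sigmoid(b) equals prior_prob."""
    return -math.log((1.0 - prior_prob) / prior_prob)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


bias = bias_for_prior(0.01)      # roughly -4.595
initial_prob = sigmoid(bias)     # roughly 0.01
```

With a zero bias instead, every cell would start at probability 0.5, which is the situation this initialization is meant to avoid.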

Get stuck on loading resnet50

I train with 4 GPUs by the command
./tools/dist_train.sh configs/solo/solo_r50_fpn_8gpu_1x.py 4
But the process cannot proceed after reaching
2020-05-02 15:26:02,261 - mmdet - INFO - load model from: torchvision://resnet50

However, when I train with 1 GPU on another machine, the resnet50 can be loaded properly.
Could you help me out?

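Meaning of 1x, 2x in model names

What do the 1x, 2x suffixes at the end of the model names mean?

Could you briefly explain how gt_areas and hit_indices are computed?

I ran into some errors while training on my own data, and while debugging I found that I don't understand the principle behind this.
Could you briefly explain it?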

train my own data error

2020-05-05 00:42:16,806 - mmdet - INFO - Start running, host: huangzhipeng@k8s-deploy-rod9ow-1567512049745-7b4474f8b7-b8k8k, work_dir: /nfs/project/huangzhipeng/tools/opensorce/SOLO/work_dirs/decoupled_solo_release_r50_fpn_8gpu_3x
2020-05-05 00:42:16,808 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
File "./tools/train.py", line 125, in <module>
main()
File "./tools/train.py", line 121, in main
timestamp=timestamp)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 103, in train_detector
timestamp=timestamp)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 250, in dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
epoch_runner(data_loaders[i], **kwargs)
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/mmcv/runner/runner.py", line 268, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 79, in batch_processor
loss, log_vars = parse_losses(losses)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 56, in parse_losses
dist.all_reduce(loss_value.div(dist.get_world_size()))
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 902, in all_reduce
work = _default_pg.allreduce([tensor], opts)
RuntimeError: Socket Timeout

Training fails if the --validate flag is set

I got the training on the coco dataset working as expected.
However, when I try to set the --validate flag, as recommended in the documentation, the training fails as soon as it starts to do the first validation step.

The command

./tools/dist_train.sh configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py 2

works while

./tools/dist_train.sh configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py 2 --validate

produces the following error

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 58/58, 40.1 task/s, elapsed: 1s, ETA:     0s

Traceback (most recent call last):
  File "./tools/train.py", line 125, in <module>
    main()
  File "./tools/train.py", line 121, in main
    timestamp=timestamp)
  File "/home/klauskofler/SOLO/mmdet/apis/train.py", line 103, in train_detector
    timestamp=timestamp)
  File "/home/klauskofler/SOLO/mmdet/apis/train.py", line 250, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 278, in train
    self.call_hook('after_train_epoch')
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 231, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/eval_hooks.py", line 64, in after_train_epoch
    self.evaluate(runner, results)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/eval_hooks.py", line 124, in evaluate
    result_files = results2json(self.dataset, results, tmp_file)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/coco_utils.py", line 224, in results2json
    json_results = det2json(dataset, results)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/coco_utils.py", line 153, in det2json
    for i in range(bboxes.shape[0]):
AttributeError: 'NoneType' object has no attribute 'shape'
Traceback (most recent call last):
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/klauskofler/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py', '--launcher', 'pytorch', '--validate']' returned non-zero exit status 1.

It seems to me, that the validation wants to calculate KPIs based on the bounding boxes produced by the network, while the network does not produce any bounding boxes. Is this behavior expected or am I doing something wrong?

Scale setting of FPN assignment

Thanks for your great work! In Table 3 of the SOLO paper, why do the scale ranges of the FPN levels overlap with each other? And why do you rescale P2 to the size of P3? Thanks!

more than one object falls into the same grid

Thanks for your great job! I have a question about SOLO. Can SOLO handle the situation where more than one object falls into the same grid cell?

Fps on 1080 video

Hey @WXinlong, well done and thanks for sharing. I have a question about SoloV1 since SoloV2 is not published yet.
-- which one of the pretrained models is suitable to get a high Fps ( around ~30Fps) on 1080p videos? Is it possible to reach to ~30 fps with the lightweight models?

Thanks!

doesn't support torch 1.5

torch 1.5 build error.

about mask branch

Why is the mask branch 2H x 2W instead of H x W?

What the difference between SoLo V2 and CondInst (https://arxiv.org/pdf/2003.05664.pdf)

Hello, thanks for the paper.
I read the two papers from your lab, SOLOv2 and CondInst, but I can barely see the difference between the two methods, except for Matrix NMS.

If I understand correctly, the output of the mask branch is no longer the cell location categories as in SOLOv1 (HxWxS^2), so it no longer inherits the key idea of SOLO.

Could you point out the differences and compare their performance in terms of speed?
Thank you.

What do you mean by "Light-weight models"?

For example, what are the differences between "Decoupled_SOLO_R50_3x" and "DECOUPLED_SOLO_LIGHT_R50_3x"?

Minimal GPU memory it need?

It seems SOLO cannot run on my edge device with 2 GB of memory. Is there any way to reduce the memory usage?

some problems when implementing on maskrcnn-benchmark

Dear @WXinlong, I'm working on reimplementing SOLO using maskrcnn-benchmark and ran into some problems. I hope you could give me some advice. I found that most parts of your code can still be used without any big modifications, apart from some small errors (for instance, maskrcnn-benchmark uses a class, SegmentationMask, to represent the mask, while seg_mask is represented as an image in SOLO).

In your implementation, you calculate the coordinates of the mask's center of mass by calling an existing function.
center_h, center_w = ndimage.measurements.center_of_mass(seg_mask)
How can I calculate these coordinates if I have a SegmentationMask object rather than an image? Thank you in advance.
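If the SegmentationMask can be converted to a dense 2D binary mask (how to do that in maskrcnn-benchmark is not covered here and is assumed to be available), the center of mass itself is just a weighted average of the row and column indices. A dependency-free sketch of that computation; `center_of_mass` below is a hypothetical helper written for illustration, not the SciPy function:

```python
def center_of_mass(mask):
    """Return the (row, col) centroid of a 2D mask given as nested sequences."""
    total = 0.0
    sum_h = 0.0
    sum_w = 0.0
    for h, row in enumerate(mask):
        for w, value in enumerate(row):
            total += value
            sum_h += h * value
            sum_w += w * value
    if total == 0:
        # This sketch raises on an empty mask instead of returning NaN.
        raise ValueError("mask is empty")
    return sum_h / total, sum_w / total


center_h, center_w = center_of_mass([[0, 1],
                                     [0, 1]])
```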

How to run inference on a single image?

Sorry to bother you, but I'm not familiar with mmdetection. What should I do if I want to run inference on a single image instead of the COCO test set?
Thanks in advance!

Bad results on the Cityscapes dataset

I have trained SOLO on the Cityscapes dataset without modifying any settings except for the dataset part. Here is my result. Obviously, SOLO performs very badly on Cityscapes. Maybe I need to tune some hyper-parameters. Do you have any hints?

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.075
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.157
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.068
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.166
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.096
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.158
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.168
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.142
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.335

What is the kernel and sigma in test_cfg?

Hi, I am confused about some params in test config.

test_cfg = dict(
    nms_pre=500,
    score_thr=0.1,
    mask_thr=0.5,
    update_thr=0.05,
    kernel='gaussian',  # gaussian/linear
    sigma=2.0,
    max_per_img=100)

What do update_thr, kernel, and sigma mean here?
Can't I adjust an NMS threshold in the NMS operation, as in Mask R-CNN?
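My reading is that these parameters belong to a Matrix-NMS-style step: instead of discarding masks above a hard IoU threshold, each mask's score is decayed according to its overlap with higher-scoring masks, `kernel` and `sigma` choose the decay function, and masks whose decayed score falls below `update_thr` are dropped. The sketch below is an assumption-laden illustration in plain Python, not the repository's implementation: it assumes masks are already sorted by descending score, that `iou[i][j]` holds the mask IoU for i < j, and that the Gaussian kernel has the form exp(-sigma * iou^2). The function name `matrix_nms_scores` is hypothetical.

```python
import math


def matrix_nms_scores(iou, scores, kernel="gaussian", sigma=2.0):
    """Return decayed scores for masks sorted by descending score.

    iou[i][j] (i < j) is the IoU between mask i and the lower-scoring mask j.
    """
    n = len(scores)

    def f(x):
        if kernel == "gaussian":
            return math.exp(-sigma * x * x)  # assumed Gaussian form
        return 1.0 - x  # linear kernel; assumes IoUs strictly below 1

    # compensate[i]: largest IoU between mask i and any higher-scoring mask.
    compensate = [max((iou[k][i] for k in range(i)), default=0.0) for i in range(n)]
    updated = []
    for j in range(n):
        # The strongest suppression from any higher-scoring mask wins.
        decay = min((f(iou[i][j]) / f(compensate[i]) for i in range(j)), default=1.0)
        updated.append(scores[j] * decay)
    return updated


# Mask 1 overlaps heavily with mask 0; mask 2 overlaps with nothing.
iou = [[0.0, 0.9, 0.0],
       [0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
new_scores = matrix_nms_scores(iou, [0.9, 0.8, 0.7])
kept = [s for s in new_scores if s >= 0.05]  # 0.05 plays the role of update_thr
```

Under this reading there is no single IoU threshold to tune; raising `update_thr` or changing `sigma` is the closest equivalent.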

Question about training a model

When I try to train this model, this error happens.

../mmdet/models/anchor_heads/solo_head.py", line 203, in loss
num_ins = flatten_ins_ind_labels.sum()
RuntimeError: "sum_cuda" not implemented for 'Bool'

i have run 'python setup.py build_ext --inplace'

Has anyone tried the SOLO model on thin, dotted lines?

Has anyone tried the SOLO model on a thin, dotted line like this?

For the above image with one object (a line), assume that I have many small polygons for that line.
I observed that the SOLO model did not perform well (the scores are very low).

Any suggestions on improving the data, or modifying the hyper-parameters?
Thanks in advance!

Question about "points_nms" in solo_head.py

Hi,

Could you please explain what points_nms does during evaluation? Unfortunately, I did not find it in the paper. Thank you very much!

About inference speed

Hi Xinlong, I'm wondering if the inference speed is reported as the xxx task/s after evaluation is done:
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 6.5 task/s, elapsed: 767s, ETA: 0s

I run the test_ins.py file, so here, the inference speed is 6.5 fps?

RuntimeError: all tensors must be on devices[0] when training with multiple GPUs

Hello, I successfully installed SOLO based on mmdet v1.0.0 and it compiled well.
I can train the model with 1 GPU.
However, the runtime error occurs when I try to train the model with multiple GPUs.

The error looks like:
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
return comm.broadcast_coalesced(tensors, devices)

The script I am using is : ./tools/dist_train.sh configs/solo/solo_r50_fpn_8gpu_1x.py 8
The dataset I used is coco2017

Environment
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
GCC: gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

GCC 7.3
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.4.2
MMDetection: 1.0.0+unknown
MMDetection Compiler: GCC 8.2
MMDetection CUDA Compiler: 10.1

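Training on my own dataset

Hello, what else needs to be changed to train on my own dataset? Training ran without errors on my dataset, but testing raised an error. Also, is it a problem that I did not change the class names and the number of classes?
Looking forward to your reply, thanks!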

RuntimeError with multi GPU training

Thanks for sharing your nice code.

I get the following error when I try to train SOLO using multiple GPUs.
(i.e., tools/dist_train.sh configs/solo/decoupled_solo_r50_fpn_8gpu_3x.py 8)

Environment
sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

GCC 7.3
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CUDA Runtime 10.0
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.1
Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.1.0
MMCV: 0.4.2
MMDetection: 1.0.0+925cc7c
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.0

Error traceback
Traceback (most recent call last):
File "tools/train.py", line 125, in <module>
main()
File "tools/train.py", line 121, in main
timestamp=timestamp)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 103, in train_detector
timestamp=timestamp)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 253, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/ubuntu/SOLO/mmcv/mmcv/runner/runner.py", line 359, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/ubuntu/SOLO/mmcv/mmcv/runner/runner.py", line 263, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 78, in batch_processor
losses = model(**data)
File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 464, in forward
self.reducer.prepare_for_backward([])
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /opt/conda/conda-bld/pytorch_1579040055865/work/torch/csrc/distributed/c10d/reducer.cpp:514)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f1a30bf9627 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::prepare_for_backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x7b7 (0x7f1a672cc557 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0xa39bd1 (0x7f1a672b7bd1 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x28ba06 (0x7f1a66b09a06 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: _PyMethodDef_RawFastCallKeywords + 0x264 (0x56045afec114 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #5: _PyCFunction_FastCallKeywords + 0x21 (0x56045afec231 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #6: _PyEval_EvalFrameDefault + 0x52cf (0x56045b050e8f in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #7: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #8: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #9: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #10: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #13: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #14: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #15: <unknown function> + 0x17512a (0x56045b00012a in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #16: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #18: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #19: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #21: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #22: _PyFunction_FastCallDict + 0x1d5 (0x56045afa6805 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #23: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #24: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #25: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #26: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #27: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x6a0 (0x56045b04c260 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #29: _PyEval_EvalCodeWithName + 0xc30 (0x56045afa6030 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #30: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x14e6 (0x56045b04d0a6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #32: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #33: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #34: _PyEval_EvalFrameDefault + 0x14e6 (0x56045b04d0a6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #35: _PyFunction_FastCallKeywords + 0xfb (0x56045afeb68b in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x416 (0x56045b04bfd6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #37: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #38: PyEval_EvalCodeEx + 0x44 (0x56045afa65f4 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #39: PyEval_EvalCode + 0x1c (0x56045afa661c in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #40: <unknown function> + 0x21c974 (0x56045b0a7974 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #41: PyRun_FileExFlags + 0xa1 (0x56045b0b1cf1 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #42: PyRun_SimpleFileExFlags + 0x1c3 (0x56045b0b1ee3 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #43: <unknown function> + 0x227f95 (0x56045b0b2f95 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #44: _Py_UnixMain + 0x3c (0x56045b0b30bc in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #45: __libc_start_main + 0xf0 (0x7f1a6c753830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #46: <unknown function> + 0x1d0990 (0x56045b05b990 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
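
The RuntimeError above says that some parameters did not take part in producing the loss. One way to narrow this down, sketched here under assumptions, is to run a single forward/backward pass without DistributedDataParallel and list the parameters whose `.grad` is still `None`. `TwoBranchHead` and `find_unused_parameter_names` are hypothetical names made up for this illustration; they are not part of SOLO, mmdet, or PyTorch.

```python
import torch
import torch.nn as nn


class TwoBranchHead(nn.Module):
    """Hypothetical toy head: `unused_conv` never contributes to the loss."""

    def __init__(self):
        super().__init__()
        self.used_conv = nn.Conv2d(3, 4, 3, padding=1)
        self.unused_conv = nn.Conv2d(3, 4, 3, padding=1)

    def forward(self, x):
        # Only `used_conv` is on the path to the returned scalar.
        return self.used_conv(x).mean()


def find_unused_parameter_names(model, loss):
    """Return names of trainable parameters that got no gradient from `loss`."""
    model.zero_grad()
    loss.backward()
    return [name for name, p in model.named_parameters()
            if p.requires_grad and p.grad is None]


model = TwoBranchHead()
loss = model(torch.randn(2, 3, 8, 8))
unused = find_unused_parameter_names(model, loss)
print(unused)  # ['unused_conv.weight', 'unused_conv.bias']
```

If the listed parameters are expected to be unused, the error message itself points at passing `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; otherwise the listed names indicate which layers are not connected to the loss.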

    def init_weights(self):
        for m in self.ins_convs:
            normal_init(m.conv, std=0.01)
        for m in self.cate_convs:
            normal_init(m.conv, std=0.01)
        bias_ins = bias_init_with_prob(0.01)
        for m in self.solo_ins_list:
            normal_init(m, std=0.01, bias=bias_ins)
        bias_cate = bias_init_with_prob(0.01)
        normal_init(self.solo_cate, std=0.01, bias=bias_cate)
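
For reference, here is a small sketch of what `bias_init_with_prob(0.01)` is understood to compute: the bias `b` for which `sigmoid(b)` equals the given prior probability, so the final layers start out predicting roughly 1% foreground. The formula is written out with the standard library only; `prior_prob_to_bias` and `sigmoid` are hypothetical helper names for this illustration and not the library function itself.

```python
import math


def prior_prob_to_bias(prior_prob):
    """Bias b such that sigmoid(b) == prior_prob (assumed equivalent formula)."""
    return -math.log((1.0 - prior_prob) / prior_prob)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


bias = prior_prob_to_bias(0.01)
print(round(bias, 4))           # -4.5951
print(round(sigmoid(bias), 4))  # 0.01
```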