solo's People

Contributors

borda, donnyyou, dxist, erotemic, eugenelawrence, gfjiangly, gt-zhangacer, hellock, innerlee, korabelnikov, lindahua, liushuchun, mattdawkins, melikovk, michaelisc, myownskyw7, oceanpang, patrick-llgc, sovrasov, taokong, ternaus, thangvubk, tyomj, wondervictor, wswday, wxinlong, yhcao6, youkaichao, zhihuagao, zwwwayne

solo's Issues

Question on SOLOv2

Hello,

If my understanding is correct, SOLOv2 uses only a single combined feature map (Fig. 3 in the paper), unlike SOLOv1, which uses multi-scale FPN features.

Is the proposed dynamic head applied to this single combined feature map?

Thanks

9 fps of decoupled_solo_light_dcn_release_r50_fpn_8gpu_3 on GTX 1080 Ti

I trained decoupled_solo_light_dcn_release_r50_fpn_8gpu_3x.py on my own data with the default config (I only changed the data path and the number of classes). When I ran the model on a video, it averaged 8-9 fps instead of the 20 fps reported in the paper. What is the reason?
To calculate the average FPS I just ran inference_detector() on every frame of the video, without any post-processing.
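
When averaging FPS over video frames, a few details can change the number noticeably: warm-up iterations and, on a GPU, waiting for asynchronous work to finish before reading the clock. A minimal timing sketch, assuming a hypothetical `infer` callable that would wrap `inference_detector(model, frame)` and an optional `sync` callable (for example `torch.cuda.synchronize`). Neither the helper name nor its parameters come from the SOLO code base:

```python
import time


def measure_fps(infer, frames, warmup=5, sync=None):
    """Return the average frames per second of `infer` over `frames`.

    `infer` and `sync` are hypothetical stand-ins: `infer(frame)` would wrap
    the model call, `sync()` would block until queued GPU work has finished.
    """
    frames = list(frames)
    # Warm-up runs are excluded from timing (lazy initialisation, caches).
    for frame in frames[:warmup]:
        infer(frame)
    if sync is not None:
        sync()

    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")

    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    if sync is not None:
        sync()  # make sure all queued work is included in the measurement
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


if __name__ == "__main__":
    # Dummy workload standing in for a detector call.
    fps = measure_fps(lambda frame: sum(range(10000)), range(50))
    print(f"{fps:.1f} frames per second")
```

Video decoding and any drawing of results happen outside `infer` here, so a number measured this way is comparable to a paper figure only if the paper timed the same portion of the pipeline.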

Inference time is too long?

I run detection on a single picture with the Decoupled_SOLO_Light_R50_3x model and the default config. Inference takes about 0.6 s, which seems too long. Why?

LVIS training schedule

Hi Xinlong, could you please share how long the LVIS SOLOv2 model was trained? I suppose it is 6x, right? Thanks!

Question about CoordConv

I don't understand how "two or more layers of CoordConv" are placed in your network in Table 4.
I.e., I want to know the position of the additional CoordConv layers in the network.
I don't know where the CoordConv layers were added in the Table 4 experiments. Why can more than one CoordConv be added?

how to eval bbox

I use --eval bbox, but the result contains no boxes. Am I missing something?

removing point_nms leads to near 0 AP

Hi Xinlong, I noted in #33 that point NMS is only used to reduce memory usage, but when I remove it, the performance drops to near 0. Could you please give some advice?

OSError: ../checkpoints/DECOUPLED_SOLO_R50_3x.pth is not a checkpoint file

An OSError occurred when I ran "python inference_demo.py":
Traceback (most recent call last):
  File "inference_demo.py", line 12, in <module>
    model = init_detector(config_file, checkpoint_file, device='cuda:0')
  File "/home/adt/Documents/alg/big-xing/SOLO/mmdet/apis/inference.py", line 38, in init_detector
    checkpoint = load_checkpoint(model, checkpoint)
  File "/home/adt/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/checkpoint.py", line 168, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ../checkpoints/DECOUPLED_SOLO_R50_3x.pth is not a checkpoint file
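
The path in the message is relative (`../checkpoints/...`), so whether it resolves depends on the directory the script is started from. A small diagnostic sketch using only the standard library, on the assumption (worth verifying against the installed mmcv) that this error is raised when the path does not point to an existing file. The helper name is hypothetical:

```python
import os


def describe_checkpoint_path(path):
    """Return a short diagnosis of why `path` may not load as a checkpoint."""
    # Relative paths are resolved against the current working directory.
    resolved = os.path.abspath(path)
    if os.path.isfile(resolved):
        size = os.path.getsize(resolved)
        return f"found {resolved} ({size} bytes)"
    if os.path.isdir(resolved):
        return f"{resolved} is a directory, not a file"
    return f"{resolved} does not exist (cwd is {os.getcwd()})"


if __name__ == "__main__":
    print(describe_checkpoint_path("../checkpoints/DECOUPLED_SOLO_R50_3x.pth"))
```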

AttributeError: 'tuple' object has no attribute 'shape'

When I run the command: python demo/webcam_demo.py configs/solo/solo_r50_fpn_8gpu_3x.py checkpoints/SOLO_R50_3x.pth

I get this error:
Press "Esc", "q" or "Q" to exit.
Traceback (most recent call last):
  File "demo/webcam_demo.py", line 47, in <module>
    main()
  File "demo/webcam_demo.py", line 43, in main
    show_result(img, result, model.CLASSES, score_thr=args.score_thr, wait_time=1)
  File "/home/haomeng/PycharmProjects/SOLO/mmdet/apis/inference.py", line 155, in show_result
    for i, bbox in enumerate(bbox_result)
  File "/home/haomeng/PycharmProjects/SOLO/mmdet/apis/inference.py", line 155, in <listcomp>
    for i, bbox in enumerate(bbox_result)
AttributeError: 'tuple' object has no attribute 'shape'

Environment
I just followed install.md: pip install -v -e .[all]

RuntimeError occurs after the code modification

Hello,

I get the following runtime error.

The error can be reproduced by the following steps:

  1. pull the code from the original SOLO repo (let's say we do this at local1)
  2. build the code (i.e., python setup.py develop)
  3. code modifying/executing works well at this point... I then push to my own git
  4. pull the code from my own git to another local (let's say we do this at local2)
  5. At this point, if I build the code and attempt to execute the training script, the following runtime error occurs.

RuntimeError: cuda runtime error (98) : unrecognized error code at mmdet/ops/sigmoid_focal_loss/src/sigmoid_focal_loss_cuda.cu:128
loss_cate = self.loss_cate(flatten_cate_preds, flatten_cate_labels, avg_factor=num_ins + 1)
File "/home/user/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/ssd2/solo_pano/mmdet/models/losses/focal_loss.py", line 79, in forward
avg_factor=avg_factor)
File "/home/user/ssd2/solo_pano/mmdet/models/losses/focal_loss.py", line 37, in sigmoid_focal_loss
loss = _sigmoid_focal_loss(pred, target, gamma, alpha)
File "/home/user/ssd2/solo_pano/mmdet/ops/sigmoid_focal_loss/sigmoid_focal_loss.py", line 19, in forward
gamma, alpha)

What might be the problem?

#######################################################
local 1 environment

sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.105
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.2.16
MMDetection: 1.0.0+56db9d2
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.1

########################################################
local 2 environment

sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-10.0
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1,2,3,4,5,6,7: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.2.16
MMDetection: 1.0.0+2c951b9
MMDetection Compiler: GCC 7.5
MMDetection CUDA Compiler: 10.0

I see that the CUDA version differs between local 1 and local 2 (10.1 vs. 10.0).
Could that be the reason?
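
The two reports differ in more than the CUDA version, and a quick way to list every difference is to diff them key by key. A small sketch with hypothetical helper names, assuming each report line has the form `key: value`. The sample lines are copied from the two environment dumps above:

```python
def parse_env(report):
    """Parse 'key: value' lines of an environment report into a dict."""
    env = {}
    for line in report.splitlines():
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            env[key.strip()] = value.strip()
    return env


def diff_env(report_a, report_b):
    """Return {key: (value_a, value_b)} for keys whose values differ."""
    a, b = parse_env(report_a), parse_env(report_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    local1 = """CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.105
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
PyTorch: 1.4.0
MMDetection CUDA Compiler: 10.1"""
    local2 = """CUDA_HOME: /usr/local/cuda-10.0
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1,2,3,4,5,6,7: GeForce GTX 1080 Ti
PyTorch: 1.4.0
MMDetection CUDA Compiler: 10.0"""
    for key, (v1, v2) in diff_env(local1, local2).items():
        print(f"{key}: {v1!r} != {v2!r}")
```

On these samples the diff lists the CUDA toolkit, the GPU model, and the CUDA compiler used for MMDetection. The diff alone does not show which of these, if any, causes the runtime error.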

Validation in training?

Hi Xinlong, does the code support validation during training, e.g. by setting workflow = [('train', 1), ('val', 1)] in config.py? Thanks!

questions about get_seg_single

Hi, I am reading the source code of SOLO and I am not sure what strides is in the following code and what role it plays. I'd appreciate it if you could give me a hint.

    # process.
    inds = (cate_preds > cfg.score_thr)
    # category scores.
    cate_scores = cate_preds[inds]
    if len(cate_scores) == 0:
        return None
    # category labels.
    inds = inds.nonzero()
    cate_labels = inds[:, 1]

    # strides.
    size_trans = cate_labels.new_tensor(self.seg_num_grids).pow(2).cumsum(0)
    strides = cate_scores.new_ones(size_trans[-1])
    n_stage = len(self.seg_num_grids)
    strides[:size_trans[0]] *= self.strides[0]
    for ind_ in range(1, n_stage):
        strides[size_trans[ind_ - 1]:size_trans[ind_]] *= self.strides[ind_]
    strides = strides[inds[:, 0]]

    # masks.
    seg_preds = seg_preds[inds[:, 0]]
    seg_masks = seg_preds > cfg.mask_thr
    sum_masks = seg_masks.sum((1, 2)).float()

    # filter.
    keep = sum_masks > strides
    if keep.sum() == 0:
        return None
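
One reading of the `# strides.` block: the category predictions of all FPN levels are flattened into one list of grid cells, so the code builds a vector that holds, for each cell, the stride of the level it came from. The `keep = sum_masks > strides` test then drops masks whose pixel count is not larger than that stride. A plain-Python sketch of the same indexing, assuming `seg_num_grids = [40, 36, 24, 16, 12]` and per-level strides `[8, 8, 16, 32, 32]` (illustrative values, check your own config):

```python
from itertools import accumulate


def per_cell_strides(seg_num_grids, level_strides):
    """Expand one stride per FPN level to one stride per grid cell.

    Level i contributes seg_num_grids[i] ** 2 cells, all sharing
    level_strides[i], in level order.
    """
    cells = []
    for num_grid, stride in zip(seg_num_grids, level_strides):
        cells.extend([stride] * (num_grid * num_grid))
    return cells


# Assumed example values; they are not taken from this thread.
seg_num_grids = [40, 36, 24, 16, 12]
level_strides = [8, 8, 16, 32, 32]

# Cumulative cell counts, the counterpart of `size_trans` in the snippet above.
size_trans = list(accumulate(g * g for g in seg_num_grids))
cell_strides = per_cell_strides(seg_num_grids, level_strides)

# Picking the stride for a few flattened cell indices, like strides[inds[:, 0]].
selected_cells = [0, 2895, 2896, 3871]
print(size_trans)                               # [1600, 2896, 3472, 3728, 3872]
print([cell_strides[i] for i in selected_cells])  # [8, 8, 16, 32]
```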

question about center region code

    top = max(top_box, coord_h-1)
    down = min(down_box, coord_h+1)
    left = max(coord_w-1, left_box)
    right = min(right_box, coord_w+1)

Why not directly set "top" to "coord_h-1" and "down" to "coord_h+1"? There are still few positive samples.
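
The effect of the four `max`/`min` calls can be seen with made-up grid indices. In this sketch `top_box`, `down_box`, `left_box` and `right_box` are assumed to be the grid-cell limits of the (scaled) ground-truth region, which is a reading of the variable names rather than something stated in this thread:

```python
def center_region(coord_h, coord_w, top_box, down_box, left_box, right_box):
    """Clamp the 3x3 neighbourhood of the centre cell to the box limits."""
    top = max(top_box, coord_h - 1)
    down = min(down_box, coord_h + 1)
    left = max(coord_w - 1, left_box)
    right = min(right_box, coord_w + 1)
    return top, down, left, right


# Large object: the box limits span many cells, so the 3x3 window is kept.
print(center_region(10, 10, 6, 14, 6, 14))    # (9, 11, 9, 11)
# Small object: the box limits coincide with the centre cell, so one cell remains.
print(center_region(10, 10, 10, 10, 10, 10))  # (10, 10, 10, 10)
```

Under that assumption, dropping the clamp would mark up to nine cells as positive even for objects that cover only one cell.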

Question about GPU batch computation

I have a question about how the code of this repo is forwarding a batch.

If I understand correctly, it takes the instances belonging to a single image out of the batch, computes on them, and gathers the loss at the end.

Would training take a bit longer than forwarding the whole batch at once?
Is there a difference in performance if the batch is forwarded as a whole with some zero padding?

multi gpu test

Hi Xinlong, it seems the README does not cover multi-GPU testing?

build docker image error

Step 12/12 : RUN pip install --no-cache-dir -e .
 ---> Running in 10680a3a2218
Obtaining file:///SOLO
    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/SOLO/setup.py'"'"'; __file__='"'"'/SOLO/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
         cwd: /SOLO/
    Complete output (8 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/SOLO/setup.py", line 251, in <module>
        sources=['src/compiling_info.cpp']),
      File "/SOLO/setup.py", line 103, in make_cuda_ext
        raise EnvironmentError('CUDA is required to compile MMDetection!')
    OSError: CUDA is required to compile MMDetection!
    No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

about random seed

Hi Xinlong, I'm wondering if you could share what random seed is used for the 35.8AP Resnet50 model?

How important is the added initialization for convs

Hi Xinlong, there are added code lines for additional initialization of the convs:

    def init_weights(self):
        for m in self.ins_convs:
            normal_init(m.conv, std=0.01)
        for m in self.cate_convs:
            normal_init(m.conv, std=0.01)
        bias_ins = bias_init_with_prob(0.01)
        for m in self.solo_ins_list:
            normal_init(m, std=0.01, bias=bias_ins)
        bias_cate = bias_init_with_prob(0.01)
        normal_init(self.solo_cate, std=0.01, bias=bias_cate)

I'm curious whether the performance would drop a lot if this initialization were removed.
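
For context, `bias_init_with_prob(0.01)` is commonly implemented as the prior-probability initialisation used with focal loss: the bias is chosen so that the initial sigmoid output equals the given probability. A standalone sketch of that formula, assuming this is what the mmcv helper computes (the name `bias_for_prior` is hypothetical):

```python
import math


def bias_for_prior(prior_prob):
    """Bias b such that sigmoid(b) == prior_prob."""
    return -math.log((1.0 - prior_prob) / prior_prob)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


bias = bias_for_prior(0.01)
print(round(bias, 3))           # -4.595
print(round(sigmoid(bias), 3))  # 0.01
```

With this bias, every output starts out predicting roughly 1% foreground. A zero bias would instead start every output at 50%.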

Get stuck on loading resnet50

I train with 4 GPUs using the command
./tools/dist_train.sh configs/solo/solo_r50_fpn_8gpu_1x.py 4
But the process does not proceed after reaching
2020-05-02 15:26:02,261 - mmdet - INFO - load model from: torchvision://resnet50

However, when I train with 1 GPU on another machine, the resnet50 can be loaded properly.
Could you help me out?

train my own data error

2020-05-05 00:42:16,806 - mmdet - INFO - Start running, host: huangzhipeng@k8s-deploy-rod9ow-1567512049745-7b4474f8b7-b8k8k, work_dir: /nfs/project/huangzhipeng/tools/opensorce/SOLO/work_dirs/decoupled_solo_release_r50_fpn_8gpu_3x
2020-05-05 00:42:16,808 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
File "./tools/train.py", line 125, in <module>
main()
File "./tools/train.py", line 121, in main
timestamp=timestamp)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 103, in train_detector
timestamp=timestamp)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 250, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
epoch_runner(data_loaders[i], **kwargs)
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/mmcv/runner/runner.py", line 268, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 79, in batch_processor
loss, log_vars = parse_losses(losses)
File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 56, in parse_losses
dist.all_reduce(loss_value.div_(dist.get_world_size()))
File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 902, in all_reduce
work = _default_pg.allreduce([tensor], opts)
RuntimeError: Socket Timeout

Training fails if the --validate flag is set

I got the training on the coco dataset working as expected.
However, when I try to set the --validate flag, as recommended in the documentation, the training fails as soon as it starts to do the first validation step.

The command

./tools/dist_train.sh configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py 2

works while

./tools/dist_train.sh configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py 2 --validate

produces the following error

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 58/58, 40.1 task/s, elapsed: 1s, ETA:     0s

Traceback (most recent call last):
  File "./tools/train.py", line 125, in <module>
    main()
  File "./tools/train.py", line 121, in main
    timestamp=timestamp)
  File "/home/klauskofler/SOLO/mmdet/apis/train.py", line 103, in train_detector
    timestamp=timestamp)
  File "/home/klauskofler/SOLO/mmdet/apis/train.py", line 250, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 278, in train
    self.call_hook('after_train_epoch')
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 231, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/eval_hooks.py", line 64, in after_train_epoch
    self.evaluate(runner, results)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/eval_hooks.py", line 124, in evaluate
    result_files = results2json(self.dataset, results, tmp_file)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/coco_utils.py", line 224, in results2json
    json_results = det2json(dataset, results)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/coco_utils.py", line 153, in det2json
    for i in range(bboxes.shape[0]):
AttributeError: 'NoneType' object has no attribute 'shape'
Traceback (most recent call last):
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/klauskofler/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py', '--launcher', 'pytorch', '--validate']' returned non-zero exit status 1.

It seems to me, that the validation wants to calculate KPIs based on the bounding boxes produced by the network, while the network does not produce any bounding boxes. Is this behavior expected or am I doing something wrong?

Scale setting of FPN assignment

Thanks for your great work! In Table 3 of the SOLO paper, why do the scale ranges of the FPN levels overlap with each other? And why do you rescale P2 to the size of P3? Thanks!

Fps on 1080 video

Hey @WXinlong, well done and thanks for sharing. I have a question about SOLOv1, since SOLOv2 is not published yet.
Which of the pretrained models is suitable for reaching a high frame rate (around 30 fps) on 1080p videos? Is it possible to reach ~30 fps with the lightweight models?

Thanks!

What is the difference between SOLOv2 and CondInst (https://arxiv.org/pdf/2003.05664.pdf)

Hello, thanks for the paper.
I read the two papers from your lab, SOLOv2 and CondInst, but I can barely see the difference between the two methods, except for Matrix NMS.

If I understand correctly, the output of the mask branch is no longer the cell location categories as in SOLOv1 (HxWxS^2), so it no longer inherits the key idea of SOLO.

Could you point out the differences and compare their performance in terms of speed?
Thank you.

Minimum GPU memory needed?

It seems SOLO cannot run on my edge device with 2 GB of memory. Is there any way to reduce the memory usage?

some problems when implementing on maskrcnn-benchmark

Dear @WXinlong, I'm working on reimplementing SOLO using maskrcnn-benchmark and ran into some problems. I hope you can give me some advice. I found that most parts of your code can still be used without any big modifications, apart from some small errors (for instance, maskrcnn-benchmark uses a class (SegmentationMask) to represent the mask, while seg_mask is represented as an image in SOLO).

In your implementation, you calculate the coordinates of the mask's center of mass by calling an existing function:
center_h, center_w = ndimage.measurements.center_of_mass(seg_mask)
How can I calculate these coordinates if I have a SegmentationMask object rather than an image? Thank you in advance.
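
The centre of mass itself is just the value-weighted mean of the pixel coordinates, so it can be computed from any binary mask once the `SegmentationMask` has been converted to a 2-D array (how to do that conversion in maskrcnn-benchmark is not covered here). A plain-Python sketch of the computation, with a hypothetical helper name:

```python
def center_of_mass(mask):
    """Return (center_h, center_w) of a 2-D mask given as rows of numbers.

    Each pixel is weighted by its value, so a binary mask yields the mean
    row and column index of its foreground pixels.
    """
    total = weighted_h = weighted_w = 0.0
    for h, row in enumerate(mask):
        for w, value in enumerate(row):
            total += value
            weighted_h += h * value
            weighted_w += w * value
    if total == 0:
        raise ValueError("mask is empty")
    return weighted_h / total, weighted_w / total


mask = [
    [0, 1, 1],
    [0, 1, 1],
]
print(center_of_mass(mask))  # (0.5, 1.5)
```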

How to run inference on a single image?

Sorry to bother you, but I'm not familiar with mmdetection. How do I run inference on a single image instead of the COCO test set?
Thanks in advance~

Bad results on the Cityscapes dataset

I have trained SOLO on the Cityscapes dataset without modifying any settings except for the dataset part. Here is my result. SOLO clearly performs very badly on Cityscapes. Maybe I need to tune some hyper-parameters. Do you have any hints?

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.075
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.157
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.068
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.166
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.096
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.158
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.168
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.008
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.142
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.335

What is the kernel and sigma in test_cfg?

Hi, I am confused about some params in test config.

test_cfg = dict(
    nms_pre=500,
    score_thr=0.1,
    mask_thr=0.5,
    update_thr=0.05,
    kernel='gaussian',  # gaussian/linear
    sigma=2.0,
    max_per_img=100)

What do update_thr, kernel, and sigma mean here?
Can't I adjust an NMS threshold in the NMS operation like in Mask R-CNN?
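
These three parameters appear to belong to Matrix NMS, described in the SOLOv2 paper. Instead of removing masks above a fixed IoU threshold, each score is decayed according to the mask's overlap with higher-scored masks of the same class. `kernel` selects the decay function, `sigma` controls the Gaussian decay, and `update_thr` is the cut-off applied to the decayed score. A plain-Python sketch under those assumptions: the function name, the Gaussian form `exp(-sigma * iou**2)` and the `>=` comparison are assumptions, not a copy of the repository code.

```python
import math


def matrix_nms_scores(iou, labels, scores, kernel="gaussian", sigma=2.0):
    """Decay scores in the style of Matrix NMS.

    `iou[i][j]` is the mask IoU of candidates i and j; all inputs must be
    sorted by descending score. Returns the decayed scores. The linear
    kernel assumes no same-class pair has an IoU of exactly 1.
    """
    n = len(scores)

    def overlap(i, j):
        # Only a higher-scored mask i of the same class can suppress mask j.
        return iou[i][j] if i < j and labels[i] == labels[j] else 0.0

    # compensate[i]: largest overlap of mask i with any higher-scored mask.
    compensate = [max([overlap(k, i) for k in range(i)], default=0.0)
                  for i in range(n)]

    def f(x):
        if kernel == "gaussian":
            return math.exp(-sigma * x * x)
        if kernel == "linear":
            return 1.0 - x
        raise ValueError(f"unknown kernel: {kernel}")

    decayed = []
    for j in range(n):
        # The strongest suppression over all higher-scored masks wins.
        decay = min([f(overlap(i, j)) / f(compensate[i]) for i in range(j)],
                    default=1.0)
        decayed.append(scores[j] * decay)
    return decayed


iou = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
labels = [1, 1, 2]
scores = [0.9, 0.8, 0.7]
update_thr = 0.05

decayed = matrix_nms_scores(iou, labels, scores)
keep = [s >= update_thr for s in decayed]
print([round(s, 3) for s in decayed])  # [0.9, 0.485, 0.7]
print(keep)                            # [True, True, True]
```

In this reading there is no single IoU threshold to tune. Raising `update_thr` or `sigma` makes suppression more aggressive.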

Question about training a model

When I try to train this model, this error happens.

../mmdet/models/anchor_heads/solo_head.py", line 203, in loss
num_ins = flatten_ins_ind_labels.sum()
RuntimeError: "sum_cuda" not implemented for 'Bool'

I have run 'python setup.py build_ext --inplace'.

Has anyone tried the SOLO model on thin, dotted lines?

Has anyone tried the SOLO model on thin, dotted lines like this?

images

For the above image with one object (line), assume that I have many small polygons for that line.
I observed that the SOLO model did not perform well (the scores are very low).

Any suggestions on improving the data, or modifying the hyper-parameters?
Thanks in advance!

About inference speed

Hi Xinlong, I'm wondering whether the inference speed is the xxx task/s reported after evaluation is done:
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 6.5 task/s, elapsed: 767s, ETA: 0s

I ran the test_ins.py file, so is the inference speed here 6.5 fps?

RuntimeError: all tensors must be on devices[0] when training with multiple GPUs

Hello, I successfully installed SOLO based on mmdet v1.0.0 and it compiled well.
I can train the model with 1 GPU.
However, the runtime error occurs when I try to train the model with multiple GPUs.

The error looks like:
return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
return comm.broadcast_coalesced(tensors, devices)

The script I am using is : ./tools/dist_train.sh configs/solo/solo_r50_fpn_8gpu_1x.py 8
The dataset I used is coco2017

Environment
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
GCC: gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.2.0
MMCV: 0.4.2
MMDetection: 1.0.0+unknown
MMDetection Compiler: GCC 8.2
MMDetection CUDA Compiler: 10.1

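Training on my own dataset

Hello, what else needs to be changed to train on my own dataset? Training on my dataset ran without errors, but testing raised an error. Also, is it a problem that I did not change the class names and the number of classes?
Looking forward to your reply, thanks!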

RuntimeError with multi GPU training

Thanks for sharing your nice code.

I get the following error when I attempt to train SOLO using multiple GPUs.
(i.e., tools/dist_train.sh configs/solo/decoupled_solo_r50_fpn_8gpu_3x.py 8)

Environment
sys.platform: linux
Python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.5.0
OpenCV: 4.1.0
MMCV: 0.4.2
MMDetection: 1.0.0+925cc7c
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 10.0

Error traceback
Traceback (most recent call last):
File "tools/train.py", line 125, in <module>
main()
File "tools/train.py", line 121, in main
timestamp=timestamp)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 103, in train_detector
timestamp=timestamp)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 253, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/home/ubuntu/SOLO/mmcv/mmcv/runner/runner.py", line 359, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/ubuntu/SOLO/mmcv/mmcv/runner/runner.py", line 263, in train
self.model, data_batch, train_mode=True, **kwargs)
File "/home/ubuntu/SOLO/mmdet/apis/train.py", line 78, in batch_processor
losses = model(**data)
File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 464, in forward
self.reducer.prepare_for_backward([])
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /opt/conda/conda-bld/pytorch_1579040055865/work/torch/csrc/distributed/c10d/reducer.cpp:514)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f1a30bf9627 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::prepare_for_backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x7b7 (0x7f1a672cc557 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0xa39bd1 (0x7f1a672b7bd1 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x28ba06 (0x7f1a66b09a06 in /home/ubuntu/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: _PyMethodDef_RawFastCallKeywords + 0x264 (0x56045afec114 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #5: _PyCFunction_FastCallKeywords + 0x21 (0x56045afec231 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #6: _PyEval_EvalFrameDefault + 0x52cf (0x56045b050e8f in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #7: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #8: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #9: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #10: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #13: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #14: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #15: <unknown function> + 0x17512a (0x56045b00012a in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #16: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #18: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #19: _PyFunction_FastCallDict + 0x400 (0x56045afa6a30 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #20: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #21: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #22: _PyFunction_FastCallDict + 0x1d5 (0x56045afa6805 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #23: _PyObject_Call_Prepend + 0x63 (0x56045afc1943 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #24: PyObject_Call + 0x6e (0x56045afb4b9e in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #25: _PyEval_EvalFrameDefault + 0x1e35 (0x56045b04d9f5 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #26: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #27: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x6a0 (0x56045b04c260 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #29: _PyEval_EvalCodeWithName + 0xc30 (0x56045afa6030 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #30: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x14e6 (0x56045b04d0a6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #32: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #33: _PyFunction_FastCallKeywords + 0x387 (0x56045afeb917 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #34: _PyEval_EvalFrameDefault + 0x14e6 (0x56045b04d0a6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #35: _PyFunction_FastCallKeywords + 0xfb (0x56045afeb68b in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x416 (0x56045b04bfd6 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #37: _PyEval_EvalCodeWithName + 0x2f9 (0x56045afa56f9 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #38: PyEval_EvalCodeEx + 0x44 (0x56045afa65f4 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #39: PyEval_EvalCode + 0x1c (0x56045afa661c in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #40: <unknown function> + 0x21c974 (0x56045b0a7974 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #41: PyRun_FileExFlags + 0xa1 (0x56045b0b1cf1 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #42: PyRun_SimpleFileExFlags + 0x1c3 (0x56045b0b1ee3 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #43: <unknown function> + 0x227f95 (0x56045b0b2f95 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #44: _Py_UnixMain + 0x3c (0x56045b0b30bc in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
frame #45: __libc_start_main + 0xf0 (0x7f1a6c753830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #46: <unknown function> + 0x1d0990 (0x56045b05b990 in /home/ubuntu/anaconda3/envs/mmdet/bin/python)
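
For anyone debugging the same RuntimeError: the message says some module parameters did not take part in producing the loss. One way to find out which ones is to run a single forward/backward pass outside of DistributedDataParallel and list the parameters whose `.grad` is still `None`. The snippet below is only a minimal sketch of that check in plain PyTorch. `TwoBranchNet` and `list_params_without_grad` are hypothetical names made up for illustration; they are not part of SOLO, MMDetection, or MMCV, and the toy model stands in for the real detector.

```python
import torch
import torch.nn as nn


class TwoBranchNet(nn.Module):
    """Hypothetical toy model: one branch never contributes to the output."""

    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)
        self.unused = nn.Linear(4, 2)  # registered, but never called in forward

    def forward(self, x):
        return self.used(x)


def list_params_without_grad(model, loss):
    """Backpropagate `loss` and return names of parameters that got no gradient."""
    model.zero_grad()
    loss.backward()
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]


model = TwoBranchNet()
loss = model(torch.randn(3, 4)).sum()
unused_names = list_params_without_grad(model, loss)
print(unused_names)  # ['unused.weight', 'unused.bias']
```

Once the offending parameters are known, the two options named in the error message apply: make those outputs participate in the loss, or pass `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`. Where that wrapper is constructed in this codebase is not shown in the traceback beyond `_dist_train` in `mmdet/apis/train.py`, so treat that location as a starting point rather than a confirmed fix.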
