fishyuli / balancedgroupsoftmax Goto Github PK



CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.

License: Apache License 2.0

Python 89.69% C++ 3.46% Cuda 6.77% Shell 0.07%


balancedgroupsoftmax's Issues

Evaluate AP for each bin

Thanks for your work.
I have a question about evaluating mAP. How can I evaluate the AP of each group?
Looking forward to your reply.
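
One possible approach, as a minimal sketch: given per-category AP values (for example from a per-category evaluation run) and the number of training instances per category, average the APs inside each instance-count bin. The bin edges (10, 100, 1000) and all names below (`per_group_ap`, `ap_per_category`, `instances_per_category`) are assumptions for illustration, not the repository's evaluation code.

```python
from bisect import bisect_right
from collections import defaultdict

def per_group_ap(ap_per_category, instances_per_category, edges=(10, 100, 1000)):
    """Average per-category AP inside each training-instance-count bin.

    Bin i holds categories whose instance count n satisfies
    edges[i-1] <= n < edges[i]; the first and last bins are open-ended.
    """
    buckets = defaultdict(list)
    for cat, ap in ap_per_category.items():
        bin_idx = bisect_right(edges, instances_per_category[cat])
        buckets[bin_idx].append(ap)
    return {i: sum(v) / len(v) for i, v in sorted(buckets.items())}

# Hypothetical numbers, only to show the call.
ap = {"cat": 0.40, "dog": 0.30, "okapi": 0.10, "axolotl": 0.20}
counts = {"cat": 5000, "dog": 1500, "okapi": 8, "axolotl": 50}
group_ap = per_group_ap(ap, counts)
print(group_ap)
```

Bins with no categories are simply absent from the result, which avoids dividing by zero.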

Is there a Balanced Group Softmax implementation that works with detectron2

If not, how can I go about implementing Balanced Group Softmax for detectron2?

OSError: ./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth is not a checkpoint file

OSError: ./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth is not a checkpoint file
Hi, I would like to know how to solve this problem.
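
As far as I can tell, mmcv releases from that period raise this message when the given path is not an existing file, so the first thing to check is whether the checkpoint was actually downloaded to that location. A minimal standard-library sketch; the helper name `describe_checkpoint_path` is hypothetical:

```python
import os

def describe_checkpoint_path(path):
    """Return a short diagnosis of why a checkpoint path may fail to load."""
    if not os.path.exists(path):
        return "missing: nothing exists at this path (relative to the current directory)"
    if os.path.isdir(path):
        return "directory: the path points to a folder, not a .pth file"
    if os.path.getsize(path) == 0:
        return "empty: the file exists but has zero bytes (interrupted download?)"
    return "ok: a non-empty file exists"

# Path copied from the error message; run this from the repository root.
ckpt = "./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth"
print(os.path.abspath(ckpt), "->", describe_checkpoint_path(ckpt))
```

Printing the absolute path helps catch the case where the script is launched from a different working directory than expected.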

A simple question regarding design choice of BAGS

Hi
Thanks for sharing your wonderful project.

I have a simple question regarding design choice of BAGS. It would be appreciated if you would answer.

Q) Why didn't you try normalizing the weight norms of the classifier instead of grouping the categories in the paper? Have you tested this?

one stage

Thank you for sharing your work. I would like to ask: the experiments in your paper are all implemented on two-stage detectors. Can the method also be adapted to one-stage detectors?

I tried to train your model with 2 Titan Xp GPUs, but I got an error.
It was okay to train your model with a single GPU with train.py.
I just modified the pretrained model directory in the config file.

With Python 3.6.9
torch 1.3.1
mmcv 0.2.14
torchvision 0.4.2.
mmdet 1.0.rc0

This is my error message

Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/wogns98/BalancedGroupSoftmax/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/wogns98/BalancedGroupSoftmax/mmdet/apis/train.py", line 205, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/runner.py", line 260, in train
    for i, data_batch in enumerate(data_loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 682, in __init__
    w.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', './tools/train.py', '--local_rank=1', 'configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

Please give me some advice. Thank you.

About pre-trained models

Hello, I can't download the ./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth file right now. May I ask whether you have a downloaded copy?

Which file includes GroupSoftmax?

Hi, I found this work very interesting. However, I still cannot find the loss function that includes GroupSoftmax.
Could you please tell me which file I should read?
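
For orientation while searching the code, here is a simplified sketch of the idea as the paper describes it: categories are split into groups by training-instance count, each group gets an extra "others" slot, and softmax cross-entropy is computed inside each group separately. This is a plain-Python illustration, not the repository's implementation; the function names and group layout are assumptions, and details such as the background group and the sampling of "others" examples are omitted.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def group_softmax_loss(logits, groups, target):
    """Sum of per-group cross-entropy losses for one proposal.

    logits : dict mapping group index -> list of logits, where slot 0 of
             every group is that group's "others" entry.
    groups : dict mapping category id -> (group index, slot index >= 1).
    target : ground-truth category id.
    """
    target_group, target_slot = groups[target]
    loss = 0.0
    for g, group_logits in logits.items():
        probs = softmax(group_logits)
        # In the target's own group the label is the category slot;
        # in every other group the label is "others" (slot 0).
        label = target_slot if g == target_group else 0
        loss += -math.log(probs[label])
    return loss

# Hypothetical layout: group 0 holds rare categories, group 1 frequent ones.
groups = {"okapi": (0, 1), "axolotl": (0, 2), "cat": (1, 1), "dog": (1, 2)}
logits = {0: [0.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0]}
print(group_softmax_loss(logits, groups, "cat"))
```

Because each group is normalized on its own, logits of frequent categories never compete directly with logits of rare ones.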

TTA hurts performance

I've noticed that both the paper and the published code use a single resolution for testing. After fine-tuning the RoI head with BAGS (with the backbone, FPN, and RPN frozen), I tested with scales [(800, 3333), (1000, 3333), (1200, 3333)] and flip set to True. The result is worse than testing with (800, 1333): BBox AP drops by 0.4, and although Mask AP increases by 0.3, it is still worse than the model trained without BAGS fine-tuning.

OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file

I followed the steps and tried to train with 2 GPUs and got this error.
./tools/dist_train.sh ./configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py 2

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

[07/09 11:22:20] root WARNING: The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
    runner.load_checkpoint(cfg.load_from)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
    self.logger)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
    runner.load_checkpoint(cfg.load_from)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
    self.logger)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
Traceback (most recent call last):
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/sy/anaconda3/envs/mmdet/bin/python', '-u', './tools/train.py', '--local_rank=1', './configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

Missing implementation of GroupSoftmax

This is a great project for long-tail visual recognition.
Is the current repo complete?
The implementation of GroupSoftmax seems to be missing.

Thanks.

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1091)>
Which parts of the code need to be modified to fix this bug?
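
This error usually originates in the certificate store of the Python installation or the network (for example a proxy that re-signs HTTPS traffic), not in the repository code. A minimal standard-library sketch, assuming you have a CA bundle file containing the certificate your network presents; `build_https_opener` and its `cafile` argument are illustrative names, and the repository's own download code may use a different mechanism:

```python
import ssl
import urllib.request

def build_https_opener(cafile=None):
    """Return (opener, context) that verify HTTPS against `cafile`.

    With cafile=None the system's default trusted certificates are used.
    """
    context = ssl.create_default_context(cafile=cafile)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context

opener, context = build_https_opener()
# Verification stays enabled; only the set of trusted CAs changes.
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
# opener.open(url) would then perform the download with this context.
```

Keeping verification on and extending the trusted CAs is safer than disabling certificate checks. Downloading the file manually and placing it at the expected path is another option.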

Why can't I find the correct versions of mmcv 0.2.14 and mmdetection 1.0.rc0?

mmcv 0.2.14, which is referred to in the README, and mmdetection 1.0.rc0, which is referred to in version.py, cannot be found on the official website.

batchsize or image_size problem

my config:
imgs_per_gpu=8,
workflow: [('train', 1), ('val', 1)]

error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0.

Weight Norm

Where is the definition of weight norm in your paper? How did you calculate weight norm of each category?
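
A common reading, stated here as an assumption rather than as the authors' definition: the weight norm of a category is the L2 norm of that category's row in the weight matrix of the final classification layer. A minimal sketch with a made-up weight matrix of 3 categories and 2 features:

```python
import math

def per_category_weight_norm(weight):
    """L2 norm of each row; row j is assumed to hold category j's classifier weights."""
    return [math.sqrt(sum(w * w for w in row)) for row in weight]

# Hypothetical classifier weights: 3 categories x 2 features.
fc_cls_weight = [
    [3.0, 4.0],
    [0.6, 0.8],
    [0.0, 0.0],
]
norms = per_category_weight_norm(fc_cls_weight)
print(norms)
```

With a trained detector, the same computation would be applied to the classification layer's weight tensor, one row per category.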

How to train my own dataset?

Thank you for your great project. But how to train my own dataset?

Error when training process ModuleNotFoundError: No module named 'mmdet'

(mmdet) D:\BalancedGroupSoftmax>python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py
Traceback (most recent call last):
  File "tools/train.py", line 8, in <module>
    from mmdet import __version__
ModuleNotFoundError: No module named 'mmdet'

I got this error when I started training. Is something wrong? I am looking forward to your reply.
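
This error generally means that Python cannot find the `mmdet` package from the repository, for instance because the install step was skipped or a different environment is active. A small standard-library check (the helper name `is_importable` is illustrative):

```python
import importlib.util
import sys

def is_importable(module_name):
    """True if `module_name` can be located on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None

print("python executable:", sys.executable)
print("mmdet importable:", is_importable("mmdet"))
```

If this prints False, the usual remedies are installing the repository into the active environment in development mode or adding the repository root to PYTHONPATH.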

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repo branches:

Repo               OpenMMLab 1.0 branch   OpenMMLab 2.0 branch
MMEngine           -                      0.x
MMCV               1.x                    2.x
MMDetection        0.x, 1.x, 2.x          3.x
MMAction2          0.x                    1.x
MMClassification   0.x                    1.x
MMSegmentation     0.x                    1.x
MMDetection3D      0.x                    1.x
MMEditing          0.x                    1.x
MMPose             0.x                    1.x
MMDeploy           0.x                    1.x
MMTracking         0.x                    1.x
MMOCR              0.x                    1.x
MMRazor            0.x                    1.x
MMSelfSup          0.x                    1.x
MMRotate           1.x                    1.x
MMYOLO             -                      0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

The problem of batch size

Thanks for sharing your code. When I run your code with one GPU, I set the batch size to 1. However, I find the performance decreases. Could you give me some advice? Thank you very much.
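
One thing worth checking is the learning rate: the mmdetection documentation recommends scaling the learning rate linearly with the total batch size (its default configs assume 8 GPUs with 2 images per GPU). A sketch of that rule; the base values below are assumptions and should be replaced with the ones from the config actually in use:

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: learning rate proportional to total batch size."""
    return base_lr * new_batch_size / base_batch_size

# Assumed reference setting: lr=0.02 for 8 GPUs x 2 images = 16 images.
new_lr = scale_lr(base_lr=0.02, base_batch_size=16, new_batch_size=1)
print(new_lr)
```

Even with a scaled learning rate, a batch size of 1 can train less stably, so some drop in performance may remain.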

ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory

Traceback (most recent call last):
  File "tools/train.py", line 9, in <module>
    from mmdet.apis import (get_root_logger, init_dist, set_random_seed,
  File "/home/qth/BalancedGroupSoftmax/mmdet/apis/__init__.py", line 2, in <module>
    from .inference import (inference_detector, init_detector, show_result,
  File "/home/qth/BalancedGroupSoftmax/mmdet/apis/inference.py", line 11, in <module>
    from mmdet.core import get_classes
  File "/home/qth/BalancedGroupSoftmax/mmdet/core/__init__.py", line 6, in <module>
    from .post_processing import *  # noqa: F401, F403
  File "/home/qth/BalancedGroupSoftmax/mmdet/core/post_processing/__init__.py", line 1, in <module>
    from .bbox_nms import multiclass_nms
  File "/home/qth/BalancedGroupSoftmax/mmdet/core/post_processing/bbox_nms.py", line 3, in <module>
    from mmdet.ops.nms import nms_wrapper
  File "/home/qth/BalancedGroupSoftmax/mmdet/ops/__init__.py", line 7, in <module>
    from .nms import nms, soft_nms
  File "/home/qth/BalancedGroupSoftmax/mmdet/ops/nms/__init__.py", line 1, in <module>
    from .nms_wrapper import nms, soft_nms
  File "/home/qth/BalancedGroupSoftmax/mmdet/ops/nms/nms_wrapper.py", line 4, in <module>
    from . import nms_cpu, nms_cuda
ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory

When I follow the steps in README.md and run "CUDA_VISIBLE_DEVICES=1 python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py" to train the model, I get the error above. Can you tell me how to solve it?

pull request porting to detectron2

Train problem

First of all, thanks for your great work. When I train on my own data, I find that LoadAnnotations._load_bboxes returns None. As a result, images keep being read without stopping, but training never starts.
Looking forward to your reply. Thanks.

How to get access to COCO-LT?

I hope we can get access to COCO-LT to shorten the time needed to validate methods.
Could you please upload the code used to sift out COCO-LT from the COCO dataset?

Question about the sample out in RFS

Hi,
I have a question about an implementation detail of RFS. In the class DistributedGroupSampler_addrepeat_sampleout,
defined in sampler.py, there is a hard-coded self.num_to_sample_out = [6000, 17000]. From reading the code, it seems that you randomly exclude some of the images that do not need to be repeated more than once, and that the numbers excluded are 6000 and 17000 for the two group flags respectively.

If so, could you explain why this exclusion is needed and how the exclusion numbers were chosen? Thanks a lot.
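
For context on what the sampler builds on: repeat factor sampling, as introduced with the LVIS dataset, gives each category c a repeat factor r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of training images containing c and t is a threshold, and each image takes the largest factor among its categories. The sketch below illustrates only that rule. It does not reproduce the `num_to_sample_out` logic asked about, and the function name and threshold values are assumptions.

```python
import math
from collections import Counter

def image_repeat_factors(image_categories, threshold=0.001):
    """Per-image repeat factor: max over the image's categories of
    max(1, sqrt(threshold / category_image_frequency))."""
    num_images = len(image_categories)
    # Count in how many images each category appears at least once.
    images_with_cat = Counter(c for cats in image_categories for c in set(cats))
    cat_factor = {
        c: max(1.0, math.sqrt(threshold / (n / num_images)))
        for c, n in images_with_cat.items()
    }
    return [max((cat_factor[c] for c in cats), default=1.0) for cats in image_categories]

# Hypothetical toy dataset: 4 images, "okapi" appears in only one of them.
images = [["cat"], ["cat", "dog"], ["cat", "okapi"], ["dog"]]
factors = image_repeat_factors(images, threshold=0.5)
print(factors)
```

Images containing only frequent categories keep a factor of 1, which matches the question's description of images that "need not be repeated more than once".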

Problem with dist_train

I have run into the same problem as #2, but I don't know how to deal with it.
(Please explain in detail! I am almost going out of my mind.)
