fishyuli / balancedgroupsoftmax
CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax.
License: Apache License 2.0
Thanks for your work.
I have a question about evaluating mAP. How can I evaluate the AP of each group?
Looking forward to your reply.
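One way to get a per-group AP is to bin the per-category APs by training instance count and average inside each bin. The sketch below assumes you already have a per-category AP dict from your evaluator. The function name `group_ap` is hypothetical, and the default bin edges (10, 100, 1000) are my assumption about the four instance-count bins used in the paper, not something taken from the repo's evaluation code.

```python
from bisect import bisect_right


def group_ap(per_class_ap, instance_counts, edges=(10, 100, 1000)):
    """Average per-category AP inside instance-count bins.

    per_class_ap:    {category_id: AP}, e.g. parsed from your evaluator
    instance_counts: {category_id: number of training instances}
    edges:           assumed bin boundaries; bins are [0, 10), [10, 100), ...
    """
    sums = [0.0] * (len(edges) + 1)
    nums = [0] * (len(edges) + 1)
    for cat_id, ap in per_class_ap.items():
        b = bisect_right(edges, instance_counts[cat_id])
        sums[b] += ap
        nums[b] += 1
    # None marks a bin that contains no categories.
    return [s / n if n else None for s, n in zip(sums, nums)]


aps = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.9}
counts = {1: 5, 2: 9, 3: 10, 4: 5000}
print(group_ap(aps, counts))
```

Passing different `edges` lets you reuse the same function for other groupings, such as the rare/common/frequent split.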
If not, how can I go about implementing Balanced Group Softmax for detectron2?
OSError: ./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth is not a checkpoint file
Hi, I want to know how to solve this problem
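As far as I can tell, mmcv raises this error when the path given to `load_checkpoint` does not point to an existing file, so the usual cause is that the pretrained weights were never downloaded or sit in a different directory. A minimal pre-flight check using only the standard library (`check_checkpoint` is a hypothetical helper, not part of mmcv or this repo):

```python
import os


def check_checkpoint(path):
    """Return a short diagnosis for a checkpoint path."""
    if os.path.isfile(path):
        size_mb = os.path.getsize(path) / 1e6
        return "ok: {} ({:.1f} MB)".format(path, size_mb)
    if os.path.isdir(path):
        return "path is a directory, not a file: {}".format(path)
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        return "parent directory is missing: {}".format(parent)
    return "file is missing: {}".format(path)


print(check_checkpoint(
    "./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth"))
```

Relative paths are resolved against the current working directory, so run the check from the same directory you launch training from.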
Hi
Thanks for sharing your wonderful project.
I have a simple question regarding a design choice in BAGS. I would appreciate an answer.
Q) Why didn't you try normalizing the weight norms of the classifier instead of grouping the categories as in the paper? Have you ever tested it?
Thank you for sharing your work. I would like to ask: the experiments in your paper are all implemented on two-stage detectors. Can the method also be adapted to one-stage detectors?
I tried to train your model with 2 Titan Xp GPUS, but I got an error.
It was okay to train your model with a single GPU with train.py.
I just modified the pretrained model directory in the config file.
With Python 3.6.9
torch 1.3.1
mmcv 0.2.14
torchvision 0.4.2.
mmdet 1.0.rc0
This is my error message
Traceback (most recent call last):
File "./tools/train.py", line 169, in <module>
main()
File "./tools/train.py", line 165, in main
logger=logger)
File "/home/wogns98/BalancedGroupSoftmax/mmdet/apis/train.py", line 58, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/home/wogns98/BalancedGroupSoftmax/mmdet/apis/train.py", line 205, in _dist_train
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/runner.py", line 358, in run
epoch_runner(data_loaders[i], **kwargs)
File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/runner.py", line 260, in train
for i, data_batch in enumerate(data_loader):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 278, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 682, in __init__
w.start()
File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
main()
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', './tools/train.py', '--local_rank=1', 'configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
Please give me some advice. Thank you.
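The traceback goes through `popen_spawn_posix`, so the data loader workers are being started with the `spawn` method, which pickles whatever is handed to the worker process. Something in that object graph holds a lock, for example a logger or an open handle. A small sketch for narrowing down which attribute is responsible; `find_unpicklable` is a hypothetical helper and `FakeDataset` only stands in for the real dataset object:

```python
import pickle
import threading


def find_unpicklable(obj):
    """Return the names of attributes of `obj` that fail to pickle."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(name)
    return bad


class FakeDataset:
    def __init__(self):
        self.ann_file = "train.json"
        self.lock = threading.RLock()  # locks cannot be pickled


print(find_unpicklable(FakeDataset()))
```

Running the helper on the dataset (and on its nested members) before the data loader is built should point to the attribute that needs to be dropped or recreated inside the worker.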
Hello, I can't download the ./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth file right now. May I ask whether you have downloaded it?
Hi, I found this work very interesting. However, I still cannot find the loss function that includes GroupSoftmax.
Could you please tell me which file I should read?
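While waiting for a pointer to the exact file, here is how I understand the loss from the paper: categories are split into groups, every group gets one extra "others" slot, softmax is applied inside each group, and the cross-entropy terms of all groups are summed. The sketch below is a plain-Python illustration for a single proposal under that reading. It is not the repo's code, the function names are hypothetical, and it leaves out training details such as sampling of "others" instances.

```python
import math


def log_softmax(xs):
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def group_softmax_loss(logits, groups, label):
    """Sum of per-group cross-entropy losses for one proposal.

    logits: one list of scores per group; index 0 of every group is its
            extra "others" slot, the remaining slots follow groups[g]
    groups: groups[g] lists the category ids assigned to group g
    label:  ground-truth category id of the proposal
    """
    loss = 0.0
    for g_logits, cats in zip(logits, groups):
        # Target is the category's own slot if it belongs to this group,
        # otherwise the group's "others" slot.
        target = cats.index(label) + 1 if label in cats else 0
        loss -= log_softmax(g_logits)[target]
    return loss


# Group 0 holds background (id 0); its "others" slot then means foreground.
groups = [[0], [1, 2], [3]]
logits = [[0.2, 1.0], [0.1, 2.0, -1.0], [0.5, 0.3]]
print(group_softmax_loss(logits, groups, label=1))
```

Because each softmax only normalizes over the categories of one group, scores of rare categories are not suppressed by the much more frequent ones in other groups.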
I've noticed that in both the paper and the published code, the authors test at a single resolution. However, after fine-tuning RoIHead with BAGS (with the Backbone, FPN, and RPN frozen) and testing with [(800, 3333), (1000, 3333), (1200, 3333)] (flip set to True), the result is worse than testing with (800, 1333). More specifically, BBox AP drops by 0.4 and Mask AP increases by 0.3, but the result is still worse than the model trained without BAGS fine-tuning.
I followed the steps and tried to train with 2 GPUs and got this error.
./tools/dist_train.sh ./configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py 2
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[07/09 11:22:20] root WARNING: The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
File "./tools/train.py", line 169, in <module>
main()
File "./tools/train.py", line 165, in main
logger=logger)
File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
runner.load_checkpoint(cfg.load_from)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
self.logger)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
File "./tools/train.py", line 169, in <module>
main()
File "./tools/train.py", line 165, in main
logger=logger)
File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
runner.load_checkpoint(cfg.load_from)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
self.logger)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
Traceback (most recent call last):
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 253, in <module>
main()
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 249, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/sy/anaconda3/envs/mmdet/bin/python', '-u', './tools/train.py', '--local_rank=1', './configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
This is a great project for long-tail visual recognition.
I wonder whether the current repo is complete.
The implementation of GroupSoftmax seems to be missing.
Thanks.
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1091)>
What parts of the code need to be modified to fix this bug?
Version 0.2.14 of mmcv, which the README refers to, and version 1.0.rc0 of mmdetection, which version.py refers to, cannot be found on the official website.
my config:
imgs_per_gpu=8,
workflow: [('train', 1), ('val', 1)]
error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0.
Where is the definition of weight norm in your paper? How did you calculate weight norm of each category?
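My reading, which the authors would need to confirm, is that the weight norm of a category is the L2 norm of that category's weight vector in the final classification layer, i.e. one row of the `fc_cls` weight matrix. A minimal sketch under that assumption, with a nested list standing in for the weight tensor and a hypothetical function name:

```python
import math


def class_weight_norms(weight):
    """L2 norm of every row; row j is assumed to hold category j's weights."""
    return [math.sqrt(sum(w * w for w in row)) for row in weight]


weight = [[3.0, 4.0], [0.6, 0.8], [0.0, 0.0]]
print(class_weight_norms(weight))
```

With PyTorch, the same quantity can be computed with `weight.norm(dim=1)` on the layer's weight tensor.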
Thank you for your great project. But how to train my own dataset?
(mmdet) D:\BalancedGroupSoftmax>python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py
Traceback (most recent call last):
File "tools/train.py", line 8, in <module>
from mmdet import __version__
ModuleNotFoundError: No module named 'mmdet'
I got this error when I started training. Is something wrong? I am looking forward to your reply.
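When a script is started as `python tools/train.py`, Python puts the `tools/` directory on the import path, not the repository root, so the local `mmdet` package is only found if it has been installed or the root has been added to the path. Assuming the package simply was not installed, a stopgap is to add the repository root yourself. Both helpers below are hypothetical:

```python
import sys
from pathlib import Path


def repo_root_for(script_path):
    """Repository root, assuming the script lives one level below it (tools/)."""
    return Path(script_path).resolve().parent.parent


def ensure_on_path(root):
    """Put `root` at the front of sys.path if it is not there yet."""
    root = str(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# At the top of tools/train.py, before `from mmdet import ...`:
# ensure_on_path(repo_root_for(__file__))
```

This only fixes the import lookup. The package also contains compiled ops, so following the installation steps in the README remains the proper fix.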
I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.
Here are the OpenMMLab 2.0 repos branches:
Repository | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
---|---|---|
MMEngine | | 0.x |
MMCV | 1.x | 2.x |
MMDetection | 0.x, 1.x, 2.x | 3.x |
MMAction2 | 0.x | 1.x |
MMClassification | 0.x | 1.x |
MMSegmentation | 0.x | 1.x |
MMDetection3D | 0.x | 1.x |
MMEditing | 0.x | 1.x |
MMPose | 0.x | 1.x |
MMDeploy | 0.x | 1.x |
MMTracking | 0.x | 1.x |
MMOCR | 0.x | 1.x |
MMRazor | 0.x | 1.x |
MMSelfSup | 0.x | 1.x |
MMRotate | 1.x | 1.x |
MMYOLO | | 0.x |
Attention: please create a new virtual environment for OpenMMLab 2.0.
Thanks for sharing your code. When I run your code with one GPU, I set the batch size to 1. However, I find the performance decreases. Could you give me some advice? Thank you very much.
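A common reason for this is that the learning rate in the config was tuned for a larger total batch. The linear scaling rule, which the mmdetection documentation also recommends, keeps the ratio of learning rate to total batch size constant. A sketch with illustrative numbers: the base values below (0.02 for 8 GPUs with 2 images each) are the usual mmdetection defaults and an assumption about this repo's configs, so check the values in your own config file.

```python
def scaled_lr(base_lr, base_gpus, base_imgs_per_gpu, gpus, imgs_per_gpu):
    """Scale the learning rate linearly with the total batch size."""
    base_batch = base_gpus * base_imgs_per_gpu
    return base_lr * (gpus * imgs_per_gpu) / base_batch


# Assumed reference setting: lr 0.02 for 8 GPUs x 2 images per GPU.
print(scaled_lr(0.02, 8, 2, gpus=1, imgs_per_gpu=1))
```

Even with a rescaled learning rate, a batch size of 1 gives noisier gradients, so a small drop compared with the reported numbers would not be surprising.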
Traceback (most recent call last):
File "tools/train.py", line 9, in <module>
from mmdet.apis import (get_root_logger, init_dist, set_random_seed,
File "/home/qth/BalancedGroupSoftmax/mmdet/apis/__init__.py", line 2, in <module>
from .inference import (inference_detector, init_detector, show_result,
File "/home/qth/BalancedGroupSoftmax/mmdet/apis/inference.py", line 11, in <module>
from mmdet.core import get_classes
File "/home/qth/BalancedGroupSoftmax/mmdet/core/__init__.py", line 6, in <module>
from .post_processing import * # noqa: F401, F403
File "/home/qth/BalancedGroupSoftmax/mmdet/core/post_processing/__init__.py", line 1, in <module>
from .bbox_nms import multiclass_nms
File "/home/qth/BalancedGroupSoftmax/mmdet/core/post_processing/bbox_nms.py", line 3, in <module>
from mmdet.ops.nms import nms_wrapper
File "/home/qth/BalancedGroupSoftmax/mmdet/ops/__init__.py", line 7, in <module>
from .nms import nms, soft_nms
File "/home/qth/BalancedGroupSoftmax/mmdet/ops/nms/__init__.py", line 1, in <module>
from .nms_wrapper import nms, soft_nms
File "/home/qth/BalancedGroupSoftmax/mmdet/ops/nms/nms_wrapper.py", line 4, in <module>
from . import nms_cpu, nms_cuda
ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory
When I follow the steps in README.md and use "CUDA_VISIBLE_DEVICES=1 python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py" to train the model, I get the error above. Can you tell me how to solve it?
First, thanks for your great work. When I train on my own data, I find that LoadAnnotations._load_bboxes returns None. As a result, the program keeps reading images without stopping but never trains.
Looking forward to your reply. Thanks.
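If I remember the mmdet 1.0 loader correctly, `_load_bboxes` returns `None` when an image ends up with no usable ground-truth boxes, and the dataset then draws another image instead. When most images are affected, that would look like endless image reading. A quick way to check a COCO-style annotation file for such images is sketched below. The helper name and the `known_cat_ids` argument are my own, and category ids that do not match the dataset's class list are only one possible cause with custom data.

```python
from collections import defaultdict


def images_without_boxes(coco, known_cat_ids):
    """Ids of images with no annotation of a known category and positive box size."""
    boxes = defaultdict(int)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        if ann["category_id"] in known_cat_ids and w > 0 and h > 0:
            boxes[ann["image_id"]] += 1
    return [img["id"] for img in coco["images"] if boxes[img["id"]] == 0]


# Usage (the path is a placeholder):
# import json
# with open("path/to/your_train.json") as f:
#     coco = json.load(f)
# cat_ids = {c["id"] for c in coco["categories"]}
# print(len(images_without_boxes(coco, cat_ids)), "images have no usable boxes")
```

If the list covers most of the dataset, the annotation file or the class mapping is the place to look before touching the training code.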
I hope we can get access to COCO-LT to shorten the time needed to validate methods.
Could you please upload the code that samples COCO-LT from the COCO dataset?
Hi,
I have a question about an implementation detail of RFS. In the class DistributedGroupSampler_addrepeat_sampleout
defined in sampler.py, there is a hard-coded self.num_to_sample_out = [6000, 17000]. From reading the code, it seems that you randomly exclude some images that do not need to be repeated more than once, and that the numbers of excluded images are 6000 and 17000 for the different group flags respectively.
If so, can you explain why this exclusion is needed and how the exclusion numbers were decided? Thanks a lot.
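For context, repeat factor sampling as described in the LVIS paper gives each category a repeat factor r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of training images that contain c, and each image takes the maximum over its categories. The sketch below shows only that standard rule, with t = 0.001 as in the LVIS paper. It is not the repo's sampler, the function name is hypothetical, and it does not explain the hard-coded 6000 and 17000.

```python
import math
from collections import Counter


def repeat_factors(image_cats, t=0.001):
    """image_cats: one set of category ids per image. Returns one factor per image."""
    n = len(image_cats)
    # Number of images in which each category appears.
    img_count = Counter(c for cats in image_cats for c in set(cats))
    cat_rf = {c: max(1.0, math.sqrt(t / (k / n))) for c, k in img_count.items()}
    # An image is repeated according to its rarest category.
    return [max((cat_rf[c] for c in cats), default=1.0) for cats in image_cats]


print(repeat_factors([{1}, {1}, {1}, {1, 2}], t=0.5))
```

Images whose factor stays at 1.0 contain only frequent categories, which seems to be the set from which the sampler in question removes images.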
I have the same problem as #2, but I don't know how to deal with it.
(Please tell me in detail! I am almost out of my mind.)