
balancedgroupsoftmax's People

Contributors

fishyuli


balancedgroupsoftmax's Issues

Evaluate each group's AP

Thanks for your work.
I have a question about the mAP evaluation: how can I compute the AP of each group separately?
Looking forward to your reply.
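
A minimal sketch of one way to do this, assuming you already have per-category APs from the LVIS evaluator and per-category training-instance counts; the bin edges below are assumptions and should be set to the same thresholds as the grouping used in the config:

import numpy as np

def group_ap(per_class_ap, per_class_count, bin_edges=(0, 10, 100, 1000, float('inf'))):
    # Average the AP of all categories whose training-instance count falls in each bin.
    per_class_ap = np.asarray(per_class_ap, dtype=float)
    per_class_count = np.asarray(per_class_count)
    group_means = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (per_class_count >= lo) & (per_class_count < hi)
        group_means.append(per_class_ap[mask].mean() if mask.any() else float('nan'))
    return group_means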

A simple question regarding the design choice of BAGS

Hi,
Thanks for sharing your wonderful project.

I have a simple question regarding the design choice of BAGS. It would be appreciated if you could answer.

Q) Why didn't you try normalizing the weight norms of the classifier instead of grouping the categories in the paper? Have you tested that before?
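
For clarity, the alternative the question refers to, rescaling each category's classifier weight toward equal norm, could be sketched roughly as below. This is only an illustration of the idea being asked about, not part of the BAGS code:

import torch

def normalize_classifier(fc_cls: torch.nn.Linear, tau: float = 1.0) -> None:
    # Rescale each class weight vector w to w / ||w||^tau (tau=1 gives unit norm).
    with torch.no_grad():
        norms = fc_cls.weight.norm(p=2, dim=1, keepdim=True).clamp_min(1e-12)
        fc_cls.weight.div_(norms.pow(tau))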

One-stage detectors

Thank you for sharing your work. I want to ask: the methods in your paper are all implemented on two-stage detectors; can they also be adapted to one-stage detectors?

dist_train not working

I tried to train your model with 2 Titan Xp GPUs, but I got an error.
Training with a single GPU via train.py works fine.
I only modified the pretrained model directory in the config file.

Environment: Python 3.6.9, torch 1.3.1, torchvision 0.4.2, mmcv 0.2.14, mmdet 1.0rc0.

This is my error message:

Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/wogns98/BalancedGroupSoftmax/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/wogns98/BalancedGroupSoftmax/mmdet/apis/train.py", line 205, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/runner.py", line 260, in train
    for i, data_batch in enumerate(data_loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 682, in __init__
    w.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', './tools/train.py', '--local_rank=1', 'configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

Please give me some advice. Thank you.
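
One hedged way to narrow this down: the failure happens while pickling the dataset for the DataLoader worker processes, so temporarily disabling workers in the data section of the config shows whether a non-picklable object (such as a logger or lock) held by the dataset is the cause. The keys below are standard mmdet 1.x config fields; the values are examples only:

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=0,  # 0 loads data in the main process, so nothing has to be pickled for workers
    # ... keep the rest of the data settings from configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py
)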

About pre-trained models

Hello, I can't download the ./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth file right now. May I ask whether you have a downloaded copy you could share?

Which file includes GroupSoftmax?

Hi, I find this work very interesting. However, I still cannot locate the loss function that implements GroupSoftmax.
Could you please tell me which file I should read?

TTA hurts performance

I've noticed that both the paper and the released code test at a single resolution. However, after fine-tuning the RoI head with BAGS (with the backbone, FPN, and RPN frozen) and testing with scales [(800, 3333), (1000, 3333), (1200, 3333)] and flip set to True, the result is worse than testing with (800, 1333): bbox AP drops by 0.4 while mask AP increases by 0.3, but it is still worse than the model trained without BAGS fine-tuning.
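
For context, a hedged sketch of the multi-scale flip test setting described above, written in mmdet 1.x test-pipeline syntax. The scale tuples follow the notation used in this report, and the normalization values and transform list are the usual defaults, which may differ from the repo's own config:

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=[(800, 3333), (1000, 3333), (1200, 3333)],  # scales reported above
        flip=True,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]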

OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file

I followed the steps and tried to train with 2 GPUs, and got the error below when running:
./tools/dist_train.sh ./configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py 2


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


[07/09 11:22:20] root WARNING: The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
    runner.load_checkpoint(cfg.load_from)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
    self.logger)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
    runner.load_checkpoint(cfg.load_from)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
    self.logger)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
Traceback (most recent call last):
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/sy/anaconda3/envs/mmdet/bin/python', '-u', './tools/train.py', '--local_rank=1', './configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
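
For what it's worth, mmcv raises this exact error when the path in cfg.load_from does not point to an existing file, so a quick check before launching dist_train can confirm whether the step-1 checkpoint is actually in place:

import os
import torch

ckpt_path = './data/weneed/mask_r50/epoch_12.pth'   # the path reported in the error above
print(os.path.isfile(ckpt_path))                    # mmcv raises "is not a checkpoint file" when this is False
if os.path.isfile(ckpt_path):
    state = torch.load(ckpt_path, map_location='cpu')
    print(list(state.keys()))                       # a valid mmcv checkpoint usually contains 'state_dict' and 'meta'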

Missing implementation of GroupSoftmax

This is a great project for long-tailed visual recognition.
I wonder whether the current repo is complete: the implementation of GroupSoftmax seems to be missing.

Thanks.

Batch size or image size problem

my config:
imgs_per_gpu=8,
workflow: [('train', 1), ('val', 1)]

error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0.

Weight Norm

Where is the definition of weight norm in your paper? How did you calculate the weight norm of each category?
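
For reference, the weight norm of a category is usually taken to be the L2 norm of that category's row in the classification layer's weight matrix. A minimal sketch computing it from a checkpoint; the path is hypothetical and the key name assumes a standard mmdet 1.x box head whose classifier is stored as bbox_head.fc_cls.weight:

import torch

ckpt = torch.load('path/to/checkpoint.pth', map_location='cpu')   # hypothetical path
state_dict = ckpt.get('state_dict', ckpt)
cls_weight = state_dict['bbox_head.fc_cls.weight']   # shape: (num_classes + 1, feat_dim); key is an assumption
weight_norms = cls_weight.norm(p=2, dim=1)           # L2 norm of each category's weight vector
print(weight_norms)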

Error when training: ModuleNotFoundError: No module named 'mmdet'

(mmdet) D:\BalancedGroupSoftmax>python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py
Traceback (most recent call last):
  File "tools/train.py", line 8, in <module>
    from mmdet import version
ModuleNotFoundError: No module named 'mmdet'

I got this error when I started training. Is there something wrong with my setup? I am looking forward to your reply.
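
A hedged note: this usually means the repo's mmdet package is not on the Python path, for example because it was not installed in editable mode (the original mmdetection instructions use pip install -v -e . from the repo root). As a quick check, the repo root can be put on sys.path before the mmdet imports; the snippet below assumes it is run as a script inside tools/:

import os
import sys

# When executed as tools/train.py, the repo root is one directory above this file.
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from mmdet import __version__
print(__version__)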

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

Project            OpenMMLab 1.0 branch    OpenMMLab 2.0 branch
MMEngine                                   0.x
MMCV               1.x                     2.x
MMDetection        0.x, 1.x, 2.x           3.x
MMAction2          0.x                     1.x
MMClassification   0.x                     1.x
MMSegmentation     0.x                     1.x
MMDetection3D      0.x                     1.x
MMEditing          0.x                     1.x
MMPose             0.x                     1.x
MMDeploy           0.x                     1.x
MMTracking         0.x                     1.x
MMOCR              0.x                     1.x
MMRazor            0.x                     1.x
MMSelfSup          0.x                     1.x
MMRotate           1.x                     1.x
MMYOLO                                     0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

The problem of batch size

Thanks for sharing your code. When I run it on a single GPU, I set the batch size to 1; however, I find that the performance decreases. Could you give me some advice? Thank you very much.
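
A hedged note, not from the authors: detection configs like these are usually tuned for a much larger total batch (typically 8 GPUs x 2 images), and the common practice when shrinking the batch is to scale the learning rate linearly with it. A sketch of the adjustment, with assumed default values:

base_total_batch = 16   # assumption: the released configs' default of 8 GPUs x 2 imgs/GPU
my_total_batch = 1      # 1 GPU x imgs_per_gpu=1
base_lr = 0.02          # a typical default; check the lr in your own config

optimizer = dict(
    type='SGD',
    lr=base_lr * my_total_batch / base_total_batch,  # linear scaling rule
    momentum=0.9,
    weight_decay=0.0001)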

ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory

Traceback (most recent call last):
  File "tools/train.py", line 9, in <module>
    from mmdet.apis import (get_root_logger, init_dist, set_random_seed,
  File "/home/qth/BalancedGroupSoftmax/mmdet/apis/__init__.py", line 2, in <module>
    from .inference import (inference_detector, init_detector, show_result,
  File "/home/qth/BalancedGroupSoftmax/mmdet/apis/inference.py", line 11, in <module>
    from mmdet.core import get_classes
  File "/home/qth/BalancedGroupSoftmax/mmdet/core/__init__.py", line 6, in <module>
    from .post_processing import *  # noqa: F401, F403
  File "/home/qth/BalancedGroupSoftmax/mmdet/core/post_processing/__init__.py", line 1, in <module>
    from .bbox_nms import multiclass_nms
  File "/home/qth/BalancedGroupSoftmax/mmdet/core/post_processing/bbox_nms.py", line 3, in <module>
    from mmdet.ops.nms import nms_wrapper
  File "/home/qth/BalancedGroupSoftmax/mmdet/ops/__init__.py", line 7, in <module>
    from .nms import nms, soft_nms
  File "/home/qth/BalancedGroupSoftmax/mmdet/ops/nms/__init__.py", line 1, in <module>
    from .nms_wrapper import nms, soft_nms
  File "/home/qth/BalancedGroupSoftmax/mmdet/ops/nms/nms_wrapper.py", line 4, in <module>
    from . import nms_cpu, nms_cuda
ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory

When I follow the steps in README.md and run "CUDA_VISIBLE_DEVICES=1 python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py" to train the model, I get the above error. Can you tell me how to solve it?
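
A hedged note: this ImportError typically appears when the compiled mmdet ops (nms_cpu/nms_cuda) were built against a different PyTorch than the one currently installed, so the extension links against a libtorch that no longer exists. Printing the current build details helps confirm a mismatch before recompiling the ops:

import torch

print(torch.__version__)   # compare with the version that was installed when mmdet's ops were compiled
print(torch.version.cuda)  # CUDA toolkit of the current PyTorch build (None for CPU-only builds)
print(torch.__file__)      # the current torch install; its lib/ directory holds the libtorch shared objects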

Train problem

Firstly, thanks for your great work. When I train on my own data, I find that LoadAnnotations._load_bboxes returns None; as a result, the loader just keeps reading images and training never actually starts.
Looking forward to your reply. Thanks.
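
A hedged sanity check for custom data: this kind of stall can happen when the COCO-style annotation file contains missing or zero-sized boxes, so inspecting it before training may help. The path below is hypothetical:

import json

ann_file = 'data/my_dataset/annotations/train.json'   # hypothetical path; use your own annotation file
with open(ann_file) as f:
    ann = json.load(f)

print('images:', len(ann['images']), 'annotations:', len(ann['annotations']))
bad = [a for a in ann['annotations']
       if not a.get('bbox') or a['bbox'][2] <= 0 or a['bbox'][3] <= 0]
print('annotations with missing or zero-size bbox:', len(bad))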

How to get access to COCO-LT?

I hope we can access COCO-LT to shorten the time needed to validate some methods.
Could you please upload the code that builds COCO-LT from the COCO dataset?

Question about the sample-out in RFS

hi,
Hi,
I have a question about an implementation detail of RFS. In the class DistributedGroupSampler_addrepeat_sampleout defined in sampler.py, there is a hard-coded self.num_to_sample_out = [6000, 17000]. Reading the code, it seems that you randomly exclude some images that do not need to be resampled more than once, and the exclusion counts are 6000 and 17000 for the different group flags.

If so, can you explain why this exclusion is needed and how the exclusion counts were decided? Thanks a lot.
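
To make the described behaviour concrete, here is a minimal sketch (an assumption about the intent, not the repo's code) of randomly excluding a fixed number of images from the set that would not be repeated:

import random

def sample_out(indices, num_to_exclude):
    # Randomly drop num_to_exclude entries from the list of candidate image indices.
    keep = list(indices)
    random.shuffle(keep)
    return keep[:max(len(keep) - num_to_exclude, 0)]

# e.g. sample_out(range(100000), 6000) keeps all but 6000 randomly chosen indices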

Problem with dist_train

I met the same problem as #2, but I don't know how to deal with it.
(Please explain in detail! I am almost running out of my mind.)
