oicr's People

Contributors

ppengtang


oicr's Issues

Check failed: result == ncclSuccess (2 vs. 0) unhandled system error

Hi Peng Tang,
I tried running your multi-GPU PCL code and got the error:

F0118 23:24:48.788218 37264 parallel.cpp:135] Check failed: result == ncclSuccess (2 vs. 0) unhandled system error. *** Check failure stack trace: ***

I tried CUDA 9 and CUDA 8 and got the same problem. Have you encountered this? Do you have any suggestions?

Thanks in advance.
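For what it's worth, NCCL's "unhandled system error" usually hides a more specific failure that NCCL will print once its debug logging is on. A minimal sketch (NCCL_DEBUG and NCCL_SHM_DISABLE are real NCCL environment variables; that shared memory is the culprit here is only an assumption):

```python
import os

# NCCL_DEBUG=INFO makes NCCL log which underlying call failed before
# Caffe's parallel.cpp surfaces "unhandled system error (2 vs. 0)".
os.environ["NCCL_DEBUG"] = "INFO"

# Error 2 is often a shared-memory (/dev/shm) limit, e.g. inside containers;
# disabling the SHM transport is a common first test. (assumption)
os.environ["NCCL_SHM_DISABLE"] = "1"

# Both must be set before the process that launches training initializes NCCL,
# e.g. exported in the shell that runs the multi-GPU training script.
```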

Out of memory in training VGG16-based models

Hello.
I'd like to ask just a quick question.

When I set up all the requirements and run the oicr training command below on a single Titan X, a Check failed: error == cudaSuccess (2 vs. 0) out of memory error occurs.

$ ./tools/train_net.py --gpu 0 --solver models/VGG16/solver.prototxt --weights data/imagenet_models/VGG16.v2.caffemodel --iters 70000

It seems that this network consumes almost 25 GB of memory and cannot be trained on a single Titan X.
Is that correct?
Thanks in advance!
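One workaround reported elsewhere on this page (the avg-loss issue below) is to drop the largest training scale and cap MAX_SIZE at 1024 to fit in less memory. A sketch of that change using a plain dict as a stand-in for the fast-rcnn-style config; the default fifth scale (1200) and MAX_SIZE (2000) shown here are assumptions, so verify them against lib/fast_rcnn/config.py:

```python
# Stand-in for the fast-rcnn-style cfg; the real keys live in
# lib/fast_rcnn/config.py. The defaults below are assumed, not verified.
cfg = {"TRAIN": {"SCALES": (480, 576, 688, 864, 1200), "MAX_SIZE": 2000}}

def shrink_for_12gb(cfg):
    # Drop the largest multiscale factor and cap the image's long side;
    # peak activation memory scales roughly with the largest input area.
    train = dict(cfg["TRAIN"])
    train["SCALES"] = tuple(s for s in train["SCALES"] if s <= 864)
    train["MAX_SIZE"] = 1024
    return {"TRAIN": train}

small = shrink_for_12gb(cfg)
```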

out of memory problem

Hi Peng Tang,

Thanks for your great work. When I run the training code using:

./tools/train_net.py --gpu 1 --solver models/VGG16/solver.prototxt \
	--weights data/imagenet_models/$VGG16_model_name --iters 70000

I got the following error:

[screenshot: training error]

I'm using a TITAN Xp with 12 GB of memory, and I noticed that right before the error the GPU memory was almost full:

[screenshot: GPU memory usage]

Even if I reduce the effective batch size by decreasing iter_size in solver.prototxt from 2 to 1, the problem still exists.

[screenshot: second training error]

I would appreciate it if you could help me figure out the problem. Thanks!

Evaluating on VOC 2012 detection

How did you evaluate results on VOC 2012? Is the evaluation server still open for submission? I created a login but haven't submitted anything so far. It also says that only 2 submissions were possible (at competition time). Is that still the case now?

Matlab version and custom data training

Hello! Thank you very much for sharing this nice work.
First, could you please tell me which Matlab version you use? (A recent version may not work well with Caffe.)
Second, I have a custom dataset (also in VOC label format) and I want to train PCL on it.
I guess I then need to prepare the proposal-box file generated by the selective search algorithm. Could you let me know which code you used? Does the Fast/er R-CNN repo include that source, and do I need Matlab for it?

Please give me some hints.

I'm considering merging your work with mine to improve my network.

Thank you in advance!

what is the final avg loss value?

I'm trying to modify your code and retrain it, first on VOC 2007 and later on VOC 2012 and COCO.
Currently I'm using 2 GPUs (8 GB each, so I changed the 5 multiscale factors to 4 (480, 576, 688, 864) and MAX_SIZE to 1024). From around iteration 24000 the loss dropped to around 0.1-0.2 (it starts at around 4-5).
I wonder whether this result is correct or not.

Can anyone share the training loss results?

Thank you~!

10:43.547068 Iteration 4260, lr = 0.0001
11:03.822432 32388 solver.cpp:219] Iteration 4280 (0.986387 iter/s, 20.276s/20 iters), loss = 0.171675
11:03.822456 Train net output #0: loss_midn = 0.243829 (* 1 = 0.243829 loss)
11:03.822463 Train net output #1: loss_refine = 2.31813e-19 (* 1 = 2.31813e-19 loss)
11:03.822466 Train net output #2: loss_refine1 = 0.00681272 (* 1 = 0.00681272 loss)
11:03.822470 Train net output #3: loss_refine2 = 0.00233187 (* 1 = 0.00233187 loss)
11:03.822859 Iteration 4280, lr = 0.0001
11:22.718451 32388 solver.cpp:219] Iteration 4300 (1.05841 iter/s, 18.8962s/20 iters), loss = 0.17244
11:22.718477 Train net output #0: loss_midn = 0.0021711 (* 1 = 0.0021711 loss)
11:22.718483 Train net output #1: loss_refine = 0.00475247 (* 1 = 0.00475247 loss)
11:22.718487 Train net output #2: loss_refine1 = 0.0212138 (* 1 = 0.0212138 loss)
11:22.718490 Train net output #3: loss_refine2 = 0.0162532 (* 1 = 0.0162532 loss)
11:22.829041 Iteration 4300, lr = 0.0001
11:41.827881 32388 solver.cpp:219] Iteration 4320 (1.04659 iter/s, 19.1096s/20 iters), loss = 0.196953
11:41.827906 Train net output #0: loss_midn = 0.0358219 (* 1 = 0.0358219 loss)
11:41.827913 Train net output #1: loss_refine = 2.13161e-16 (* 1 = 2.13161e-16 loss)
11:41.827916 Train net output #2: loss_refine1 = 0.0165497 (* 1 = 0.0165497 loss)
11:41.827919 Train net output #3: loss_refine2 = 0.0209345 (* 1 = 0.0209345 loss)
11:41.828313 Iteration 4320, lr = 0.0001
12:01.396237 32388 solver.cpp:219] Iteration 4340 (1.02205 iter/s, 19.5686s/20 iters), loss = 0.215037
12:01.396260 Train net output #0: loss_midn = 0.0575677 (* 1 = 0.0575677 loss)
12:01.396266 Train net output #1: loss_refine = 0.00299502 (* 1 = 0.00299502 loss)
12:01.396270 Train net output #2: loss_refine1 = 0.00881063 (* 1 = 0.00881063 loss)
12:01.396273 Train net output #3: loss_refine2 = 0.0100217 (* 1 = 0.0100217 loss)
12:01.396672 Iteration 4340, lr = 0.0001
12:20.180405 32388 solver.cpp:219] Iteration 4360 (1.06471 iter/s, 18.7844s/20 iters), loss = 0.181949
12:20.180433 Train net output #0: loss_midn = 0.142086 (* 1 = 0.142086 loss)
12:20.180439 Train net output #1: loss_refine = 0.00437473 (* 1 = 0.00437473 loss)
12:20.180444 Train net output #2: loss_refine1 = 0.0119189 (* 1 = 0.0119189 loss)
12:20.180447 Train net output #3: loss_refine2 = 0.0133803 (* 1 = 0.0133803 loss)
12:20.532896 Iteration 4360, lr = 0.0001
12:41.684222 32388 solver.cpp:219] Iteration 4380 (0.930057 iter/s, 21.5041s/20 iters), loss = 0.215015
12:41.684250 Train net output #0: loss_midn = 0.00031576 (* 1 = 0.00031576 loss)
12:41.684257 Train net output #1: loss_refine = 0.00440917 (* 1 = 0.00440917 loss)
12:41.684259 Train net output #2: loss_refine1 = 0.00945 (* 1 = 0.00945 loss)
12:41.684263 Train net output #3: loss_refine2 = 0.00943446 (* 1 = 0.00943446 loss)
12:41.684662 Iteration 4380, lr = 0.0001
13:00.646263 32388 solver.cpp:219] Iteration 4400 (1.05473 iter/s, 18.9622s/20 iters), loss = 0.236493
13:00.646291 Train net output #0: loss_midn = 0.152448 (* 1 = 0.152448 loss)
13:00.646297 Train net output #1: loss_refine = 4.01682e-14 (* 1 = 4.01682e-14 loss)
13:00.646301 Train net output #2: loss_refine1 = 0.00930832 (* 1 = 0.00930832 loss)
13:00.646306 Train net output #3: loss_refine2 = 0.0112133 (* 1 = 0.0112133 loss)
13:00.646688 Iteration 4400, lr = 0.0001
13:19.318856 32388 solver.cpp:219] Iteration 4420 (1.07108 iter/s, 18.6728s/20 iters), loss = 0.200657
13:19.318886 Train net output #0: loss_midn = 0.00179278 (* 1 = 0.00179278 loss)
13:19.318891 Train net output #1: loss_refine = 0.00438066 (* 1 = 0.00438066 loss)
13:19.318894 Train net output #2: loss_refine1 = 0.0138651 (* 1 = 0.0138651 loss)
13:19.318898 Train net output #3: loss_refine2 = 0.0102132 (* 1 = 0.0102132 loss)
13:19.532181 Iteration 4420, lr = 0.0001
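For anyone comparing their own training against the numbers above, a small self-contained sketch that pulls the smoothed total loss out of Caffe solver log lines and averages it (the regex only assumes the loss = <float> pattern visible in the log):

```python
import re

LOSS_RE = re.compile(r"\bloss = ([0-9.]+(?:e[+-]?\d+)?)")

def mean_loss(log_lines):
    # Collect the smoothed total loss reported on each solver.cpp
    # "Iteration N (...)" line, skipping per-term "Train net output" lines.
    losses = [float(m.group(1))
              for line in log_lines
              if "solver.cpp" in line and (m := LOSS_RE.search(line))]
    return sum(losses) / len(losses) if losses else float("nan")

# Two lines copied from the log above (timestamps abbreviated):
sample = [
    "11:03.822432 32388 solver.cpp:219] Iteration 4280 (...), loss = 0.171675",
    "11:22.718451 32388 solver.cpp:219] Iteration 4300 (...), loss = 0.17244",
]
```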

how to get OICR_ENS

Hi,

Thanks for releasing your code. Would you also provide details about how to get OICR_ENS? In the paper you mention that you simply sum up the scores produced by the VGG_M and VGG16 models. What do you do with the combined score when you train a Fast R-CNN model? The combined scores may be larger than 1.

Thanks!
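On the score-range question: if the two models' scores are summed, a simple rescale keeps the combined supervision in [0, 1]. A sketch for illustration only; whether the released code rescales, averages, or uses raw sums is exactly what this issue is asking, so averaging here is an assumption:

```python
def ensemble_scores(scores_vggm, scores_vgg16):
    # Element-wise combination of two proposal-score matrices
    # (rows = proposals, cols = classes). Dividing the sum by 2 keeps
    # values in [0, 1] when both inputs are probabilities. (assumption)
    return [[(a + b) / 2.0 for a, b in zip(r1, r2)]
            for r1, r2 in zip(scores_vggm, scores_vgg16)]

combined = ensemble_scores([[0.9, 0.2]], [[0.7, 0.4]])  # roughly [[0.8, 0.3]]
```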

no results

Hi Peng Tang,

After running your training code successfully, I obtained the trained model at 70000 iterations. However, when I try to run testing with

./tools/test_net.py --gpu 1 --def models/VGG16/test.prototxt \
    --net output/default/voc_2007_trainval/vgg16_oicr_iter_70000.caffemodel \
    --imdb voc_2007_trainval

I got the following error:

[screenshot: test error]

I also checked the detections.pkl file and found each component of the output is
array([], shape=(0, 5), dtype=float32)

It seems that the learned model produces no output at all. I would appreciate it if you could help me check the error. Thanks!
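A quick way to confirm the all-empty result is to count the entries in detections.pkl directly. A sketch assuming the py-faster-rcnn convention that the pickle holds a list indexed [class][image] of (N, 5) box arrays; verify the layout against the repo's test code:

```python
import pickle

def count_detections(path):
    # all_boxes[cls][img] is assumed to be an (N, 5) array of
    # x1, y1, x2, y2, score, as in py-faster-rcnn-style code.
    with open(path, "rb") as f:
        all_boxes = pickle.load(f)
    return sum(len(per_image)
               for per_class in all_boxes
               for per_image in per_class)
```

A total of 0 here matches the array([], shape=(0, 5)) entries described above and confirms the problem is upstream of the evaluation step.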

How is the pretrained VGG loaded?

The pre-trained VGG contains parameters for fc6 and fc7. I want to know whether they are loaded before OICR training.

If so, how was this VGG pre-trained? The input size of fc6 differs from the original VGG.

how to test CorLoc, could you show me the code?

I want to evaluate CorLoc; could you give me the test code?
Also, I found training time-consuming: 70000 iterations on VGG16 with a Titan X took me 3 days to finish. Is that expected?
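While waiting for the official script: CorLoc is commonly defined as the fraction of positive images in which the top-scoring box for a class overlaps some ground-truth box of that class with IoU > 0.5. A self-contained sketch of that definition (not the repository's evaluation code):

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2); standard intersection-over-union.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def corloc(images):
    # images: list of (top_scoring_box, gt_boxes) pairs for one class,
    # one entry per image that actually contains the class.
    hits = sum(any(iou(top, gt) > 0.5 for gt in gts) for top, gts in images)
    return hits / len(images)
```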

A question for you

I've just started working on weak supervision. In your paper, only image-level labels of the VOC dataset are used. Does that mean the annotations contain bounding boxes without class labels, or nothing at all except a single image-level label?

How critical is the heap-based evaluation?

I'm trying to re-implement your model under Detectron2 to build on for my own research (I want it on an up-to-date version of PyTorch, and this seemed like the easiest route). So far the VGG16-based model peaks at ~30% mAP rather than the reported 41.2%. I have tried a couple of different sets of weights exported from Caffe, as well as the PyTorch pretrained weights, and they all land at the same point (although the Caffe weights are prone to turning to NaN and crashing). I have also tried re-initialising the dilated convs while keeping the other pretrained weights, and I have still been unable to reproduce the results.

I haven't implemented the heap-based heuristics since they aren't in the paper, so I figured they were not essential. Am I mistaken?

Segmentation fault (core dumped)

Caffe compiled successfully, but when I ran training I got the error:
Segmentation fault (core dumped)
Do you have any suggestions?

Also, can you tell me the version of the key dependencies?
protobuf opencv numpy cython scikit-image easydict...

Also, which python version?

Minor issues during repro

  1. Had to rename lib/utils/bbox.pyx to lib/utils/cython_bbox.pyx (and fix setup.py) in order to match the extension names in lib/setup.py. Otherwise I was getting the import error reported at https://stackoverflow.com/questions/8024805/cython-compiled-c-extension-importerror-dynamic-module-does-not-define-init-fu.

  2. Had to add import google.protobuf.text_format to lib/fast-rcnn/train.py in order to avoid the error reported in rbgirshick/py-faster-rcnn#198.
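For issue 1, the ImportError happens because a compiled module's file name must match the module name baked into the Extension. A minimal sketch of the naming constraint (the module and file names here are illustrative, not the repo's exact targets):

```python
from setuptools import Extension

# Python imports a compiled extension by the last component of Extension.name;
# if the built .so is named bbox but the Extension says cython_bbox, the
# loader cannot find the init function and raises
# "dynamic module does not define init function".
ext = Extension("utils.cython_bbox", sources=["utils/cython_bbox.pyx"])

module_name = ext.name.rsplit(".", 1)[-1]                      # "cython_bbox"
source_stem = ext.sources[0].rsplit("/", 1)[-1].rsplit(".", 1)[0]
```

Renaming either the .pyx or the Extension so the two stems agree is the whole fix.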

Question about multiple losses

Hi!

Thanks for making the code available! I have a few minor questions:

  1. Do I understand correctly that the iterative retraining procedure is not performed, and that refinement is instead implemented by combining L_b (loss_midn) and L_r (loss_oicr)?

  2. Could you please comment on the differences and purpose of the loss_oicr, loss_oicr1, loss_oicr2 loss combination? Do they represent iterative refinement in some way? Are the results of using only loss_oicr the ones shown in Figure 4 at the "1 refinement time" tick?

Thanks a lot!
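For reference, the solver log earlier on this page shows all four terms entering the total with weight 1 (the (* 1 = ...) factors). A sketch of that combination; the term names follow the training log, and note that the log's reported loss is a smoothed running average, so it will not equal the instantaneous sum for a single iteration:

```python
def total_loss(loss_midn, refine_losses, weights=None):
    # Total objective = MIDN loss plus K refinement-branch losses;
    # the training log shows each term multiplied by weight 1.
    weights = weights or [1.0] * len(refine_losses)
    return loss_midn + sum(w * l for w, l in zip(weights, refine_losses))

# Instantaneous per-term values taken from the iteration-4300 log lines above:
t = total_loss(0.0021711, [0.00475247, 0.0212138, 0.0162532])
```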
