
pytorch_melm's Introduction

pytorch_MELM

News: this repo now supports PyTorch 1.0 and higher! I borrowed code and some implementation ideas from mmdetection.

This is a simplified PyTorch version of MELM with context, for the paper "Min-Entropy Latent Model for Weakly Supervised Object Detection", which was accepted at CVPR 2018 and in TPAMI.

This implementation is based on Winfrand's official version, written in Torch7 and Lua, and on ruotianluo's pytorch-faster-rcnn.

Trained on PASCAL VOC 2007 trainval and tested on PASCAL VOC 2007 test with a VGG16 backbone, I got 47.98 mAP, a little better than the paper's result.

If you find MELM useful and use this code, please cite our paper:

@inproceedings{wan2018min,
  title={Min-Entropy Latent Model for Weakly Supervised Object Detection},
  author={Wan, Fang and Wei, Pengxu and Jiao, Jianbin and Han, Zhenjun and Ye, Qixiang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={1297--1306},
  year={2018}
}
@article{wan2019Pami,
  author  = {Fang Wan and Pengxu Wei and Jianbin Jiao and Zhenjun Han and Qixiang Ye},
  title   = {Min-Entropy Latent Model for Weakly Supervised Object Detection},
  journal = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  doi     = {10.1109/TPAMI.2019.2898858},
  year    = {2019}
}

Prerequisites

  • Nvidia GPU 1080Ti
  • Ubuntu 16.04 LTS
  • python 3.6
  • PyTorch 0.4 is required for this branch. For PyTorch 1.0 or higher, please use the pytorch1.0 branch.
  • tensorflow, tensorboard and tensorboardX for visualizing the training and validation curves.

Installation

  1. Clone the repository
git clone https://github.com/vasgaowei/pytorch_MELM.git
  2. Compile the modules (nms, roi_pooling, roi_ring_pooling and roi_align)
cd pytorch_MELM/lib
bash make.sh

Setup the data

  1. Download the training, validation, test data and the VOCdevkit
cd pytorch_MELM/
mkdir data
cd data/
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
  2. Extract all of these tars into one directory named VOCdevkit
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
  3. Create symlinks for the PASCAL VOC dataset, or just rename VOCdevkit to VOCdevkit2007
cd pytorch_MELM/data
ln -s VOCdevkit VOCdevkit2007
  4. It should have this basic structure
$VOCdevkit2007/                     # development kit
$VOCdevkit2007/VOCcode/             # VOC utility code
$VOCdevkit2007/VOC2007/             # image sets, annotations, etc.

For PASCAL VOC 2010 and PASCAL VOC 2012, just follow the same steps.

Download the pre-trained ImageNet models

Download the pre-trained ImageNet models from https://drive.google.com/drive/folders/0B1_fAEgxdnvJSmF3YUlZcHFqWTQ or from https://drive.google.com/drive/folders/1FV6ZOHOxLMQjE4ujTNOObI7lN8USH0v_?usp=sharing, put them in data/imagenet_weights, and rename the VGG16 weights file vgg16.pth. The folder has the following form:

$ data/imagenet_weights/vgg16.pth
$ data/imagenet_weights/res50.pth

Download the Selective Search proposals for PASCAL VOC 2007

Download them from https://dl.dropboxusercontent.com/s/orrt7o6bp6ae0tc/selective_search_data.tgz and unpack; the final folder has the following form:

$ data/selective_search_data/voc_2007_train.mat
$ data/selective_search_data/voc_2007_test.mat
$ data/selective_search_data/voc_2007_trainval.mat
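
To sanity-check the download, the .mat files can be inspected from Python. A minimal sketch, assuming the common selective_search_data layout with 'images' and 'boxes' cell arrays (verify the keys on your own copy first):

import scipy.io

# Load the proposal file and inspect its keys; the 'boxes' key name is an
# assumption based on the usual selective_search_data format.
mat = scipy.io.loadmat('data/selective_search_data/voc_2007_trainval.mat')
print(mat.keys())

boxes = mat['boxes'][0]            # one array of proposal boxes per image
print(len(boxes), boxes[0].shape)  # number of images, boxes for the first image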

Train your own model

For the vgg16 backbone, we can train the model using the following command:

./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16

And for testing, we can use the following command:

./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16

Visualizing some detection results

I have pretrained a MELM model on PASCAL VOC 2007 with the vgg16 backbone. You can download it from https://drive.google.com/drive/folders/1FV6ZOHOxLMQjE4ujTNOObI7lN8USH0v_?usp=sharing, put it at output/vgg16/voc_2007_trainval/default/vgg16_MELM.pth, and run the following commands:

cd pytorch_MELM
python ./tools/demo.py --net vgg16 --dataset pascal_voc

You can also visualize the training and validation curves:

tensorboard --logdir tensorboard/vgg16/voc_2007_trainval/


pytorch_melm's Issues

pytorch 1.0 doesn't work

Good work!
Can you add a README for PyTorch 1.0?
There is no layer_utils in lib, so I can't run bash make.sh.

cross_entropy=0, loss_box=0

I followed your steps to train a VGG16 WSOD model, but got cross_entropy=0 and loss_box=0 in tensorboard.
Furthermore, the detection mAP I got was only 40.25%:

AP for aeroplane = 0.5127
AP for bicycle = 0.5598
AP for bird = 0.3524
AP for boat = 0.2071
AP for bottle = 0.1507
AP for bus = 0.6139
AP for car = 0.6589
AP for cat = 0.3212
AP for chair = 0.1487
AP for cow = 0.4756
AP for diningtable = 0.2165
AP for dog = 0.4216
AP for horse = 0.3439
AP for motorbike = 0.6621
AP for person = 0.0943
AP for pottedplant = 0.1811
AP for sheep = 0.4907
AP for sofa = 0.4062
AP for train = 0.6274
AP for tvmonitor = 0.6053
Mean AP = 0.4025

The classification AP is as follows:

AP for aeroplane = 0.9760
AP for bicycle = 0.9668
AP for bird = 0.9578
AP for boat = 0.9417
AP for bottle = 0.7712
AP for bus = 0.9181
AP for car = 0.9706
AP for cat = 0.9628
AP for chair = 0.7361
AP for cow = 0.9004
AP for diningtable = 0.8104
AP for dog = 0.9484
AP for horse = 0.9580
AP for motorbike = 0.9435
AP for person = 0.9875
AP for pottedplant = 0.7991
AP for sheep = 0.9106
AP for sofa = 0.7590
AP for train = 0.9622
AP for tvmonitor = 0.9072

Did I do something wrong?
Could you give me your advice?
Thanks.

File "./MELM-master/tools/../lib/nets/network.py", line 384, in get_refine_supervision roi_weights[:, 0] = max_box_score[gt_assignment, 0] ValueError: could not broadcast input array from shape (761) into shape (500)

Hi, I followed the README, but it just doesn't work and I don't know how to fix it. The problem is:

voc_2007_trainval ss roidb loaded from /data3/CV_WN/MELM-master/data/cache/voc_2007_trainval_selective_search_roidb.pkl
done
Preparing training data...
done
10022 roidb entries
Output will be saved to /data3/CV_WN/MELM-master/output/vgg16/voc_2007_trainval/default
TensorFlow summaries will be saved to /data3/CV_WN/MELM-master/tensorboard/vgg16/voc_2007_trainval/default
Loaded dataset voc_2007_test for training
Set proposal method: selective_search
Preparing training data...
voc_2007_test ss roidb loaded from /data3/CV_WN/MELM-master/data/cache/voc_2007_test_selective_search_roidb.pkl
done
4952 validation roidb entries
Filtered 0 roidb entries: 10022 -> 10022
Filtered 0 roidb entries: 4952 -> 4952
Solving...
Loading initial model weights from data/imagenet_weights/vgg16.pth
Loaded.
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 130, in <module>
    max_iters=args.max_iters)
  File "/data3/CV_WN/MELM-master/tools/../lib/model/train_val.py", line 357, in train_net
    sw.train_model(max_iters)
  File "/data3/CV_WN/MELM-master/tools/../lib/model/train_val.py", line 265, in train_model
    self.net.train_step_with_summary(blobs, self.optimizer)
  File "/data3/CV_WN/MELM-master/tools/../lib/nets/network.py", line 727, in train_step_with_summary
    self.forward(blobs['data'], blobs['image_level_labels'], blobs['im_info'], blobs['gt_boxes'], blobs['ss_boxes'])
  File "/data3/CV_WN/MELM-master/tools/../lib/nets/network.py", line 633, in forward
    self._add_losses()  # compute losses
  File "/data3/CV_WN/MELM-master/tools/../lib/nets/network.py", line 250, in _add_losses
    self._image_gt_summaries['image_level_label'])
  File "/data3/CV_WN/MELM-master/tools/../lib/nets/network.py", line 384, in get_refine_supervision
    roi_weights[:, 0] = max_box_score[gt_assignment, 0]
ValueError: could not broadcast input array from shape (761) into shape (500)
Thanks for all you have done; I look forward to your reply on this problem.
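
For readers hitting the same traceback: the failure itself is a plain NumPy shape mismatch, where the left-hand column has room for 500 values while the fancy indexing on the right produces 761. A minimal standalone reproduction with illustrative shapes (the names mirror the traceback; the sizes are arbitrary):

import numpy as np

roi_weights = np.zeros((500, 1), dtype=np.float32)   # one weight slot per kept RoI
gt_assignment = np.zeros(761, dtype=np.int64)        # one assignment per proposal
max_box_score = np.ones((1, 1), dtype=np.float32)    # top score per image-level class

# Fancy indexing yields 761 values, which cannot fill a 500-slot column:
roi_weights[:, 0] = max_box_score[gt_assignment, 0]
# ValueError: could not broadcast input array from shape (761) into shape (500)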

Pretrained weights give worse performance

I tried very hard to get MELM working. I reinstalled my CUDA and PyTorch and solved tons of bugs, since this code never worked for me on PyTorch 1.0 even though I read the tutorial and the issues many times. Finally, the code ran successfully on PyTorch 0.4.0. However, the pretrained weights gave really poor performance. Since this repository borrows a lot of code from another Fast R-CNN repository, it's really hard to understand. If someone has successfully reproduced the mAP of 0.45 on VOC data, I'll appreciate it if you would share some experience. If the author can make this code easier to understand, I'll appreciate that deeply too.

Reproducibility results

Hello, thank you for your hard work. I would like to ask: what is the mAP of your experimental results? Is it the same as the result in the paper?

RoIRingPoolFunction

Why are you using RoIRingPoolFunction instead of RoIAlign?

Also, in lib/nets/network.py:
pool5_roi = self._roi_ring_pool_layer(net_conv, rois, 0., 1.0)
pool5_context = self._roi_ring_pool_layer(net_conv, rois, 1.0, 1.8)
pool5_frame = self._roi_ring_pool_layer(net_conv, rois, scale_inner = 1.0 / 1.8, scale_outer = 1.0)

What is the difference between these three RoI features?

Thank you!

Main parts where the code was rewritten

Could you point out where you rewrote the code?
It's hard to track which version of mmdetection the implementation is based on.
It would help me a lot.
Thank you for your great work!

pool5_roi / pool5_context / pool5_frame scale parameters in network.py

pool5_roi = self._roi_ring_pool_layer(net_conv, rois, 0., 1.0)
pool5_context = self._roi_ring_pool_layer(net_conv, rois, 1.0, 1.8)
pool5_frame = self._roi_ring_pool_layer(net_conv, rois, scale_inner = 1.0 / 1.8, scale_outer = 1.0)

These lines are from the network.py file, rows 584 to 586. I guess they are used to generate three different outputs by passing different scale parameters to the RoIRingPoolFunction structure, but I have no idea how it works or why you set the parameters to 0, 1.0, 1.8 and 1.0/1.8. Thank you a lot.
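
One plausible reading of these parameters, inferred from the argument names rather than from the CUDA kernel itself: scale_inner and scale_outer rescale the RoI around its center, and the layer pools features from the region between the two scaled boxes, i.e. a ring. Under that reading, (0, 1.0) covers the RoI itself, (1.0, 1.8) is a context ring outside the RoI, and (1.0/1.8, 1.0) is a frame just inside the boundary, which matches the "MELM with context" description in the README. A sketch of the assumed box geometry:

def ring_boxes(x1, y1, x2, y2, scale_inner, scale_outer):
    """Return (inner_box, outer_box) rescaled around the RoI center.

    The pooled region is assumed to be outer_box minus inner_box (a ring).
    Illustrative only; not the repo's CUDA implementation.
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1

    def scaled(s):
        return (cx - s * w / 2.0, cy - s * h / 2.0,
                cx + s * w / 2.0, cy + s * h / 2.0)

    return scaled(scale_inner), scaled(scale_outer)

roi = (100, 100, 200, 180)
print(ring_boxes(*roi, 0.0, 1.0))        # pool5_roi: the whole RoI
print(ring_boxes(*roi, 1.0, 1.8))        # pool5_context: ring outside the RoI
print(ring_boxes(*roi, 1.0 / 1.8, 1.0))  # pool5_frame: ring inside the boundary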

AssertionError: size of input tensor and input format are different

After debugging a series of file-index mismatch problems in your "train_faster_rcnn.sh", I successfully got to the following running process:

python ./tools/trainval_net.py --weight data/imagenet_weights/vgg16.pth --imdb voc_2007_trainval --imdbval voc_2007_test --iters 100000 --cfg experiments/cfgs/vgg16.yml --net vgg16 --set ANCHOR_SCALES '[8,16,32]' ANCHOR_RATIOS '[0.5,1,2]' TRAIN.STEPSIZE '[50000]'

But a new problem comes up:

(omitted)
Solving...
Loading initial model weights from data/imagenet_weights/vgg16.pth
Loaded.
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 130, in <module>
    max_iters=args.max_iters)
  File "/data/pytorch_MELM/tools/../lib/model/train_val.py", line 356, in train_net
    sw.train_model(max_iters)
  File "/data/pytorch_MELM/tools/../lib/model/train_val.py", line 264, in train_model
    self.net.train_step_with_summary(blobs, self.optimizer)
  File "/data/pytorch_MELM/tools/../lib/nets/network.py", line 724, in train_step_with_summary
    summary = self._run_summary_op()
  File "/data/pytorch_MELM/tools/../lib/nets/network.py", line 534, in _run_summary_op
    summaries.append(self._add_gt_image_summary())
  File "/data/pytorch_MELM/tools/../lib/nets/network.py", line 67, in _add_gt_image_summary
    return tb.summary.image('GROUND_TRUTH', image[0].astype('float32').swapaxes(1,0).swapaxes(2,0)/255.0)
  File "/usr/local/lib/python3.6/dist-packages/tensorboardX/summary.py", line 211, in image
    tensor = convert_to_HWC(tensor, dataformats)
  File "/usr/local/lib/python3.6/dist-packages/tensorboardX/utils.py", line 98, in convert_to_HWC
    assert(len(tensor.shape) == len(input_format)), "size of input tensor and input format are different"
AssertionError: size of input tensor and input format are different

I don't know if it is because I changed the contents of "train_faster_rcnn.sh". Whatever the cause, it just doesn't work and I don't know how to fix it.
Thanks for all you have done; I look forward to your reply on this problem.

'pool5_roi' referenced before assignment

if cfg.POOLING_MODE == 'crop':
  pool5 = self._crop_pool_layer(net_conv, rois)
else:
  pool5_roi = self._roi_ring_pool_layer(net_conv, rois, 0., 1.0)
  pool5_context = self._roi_ring_pool_layer(net_conv, rois, 1.0, 1.8)
  pool5_frame = self._roi_ring_pool_layer(net_conv, rois, scale_inner = 1.0 / 1.8, scale_outer = 1.0)

if self._mode == 'TRAIN':
  torch.backends.cudnn.benchmark = True # benchmark because now the input size are fixed
#print('pool5 ', pool5.shape)
fc7_roi = self._head_to_tail(pool5_roi)
fc7_context = self._head_to_tail(pool5_context)
fc7_frame = self._head_to_tail(pool5_frame)

Hello, I don't quite follow your code here: in the 'crop' branch, pool5_roi never gets an assignment, so how can it be used below?
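
The report looks correct: in the 'crop' branch only pool5 is assigned, so the later fc7_roi = self._head_to_tail(pool5_roi) raises an UnboundLocalError. A minimal guard, sketched as a drop-in replacement for the quoted branch under the assumption that the ring-pooling path is the only one MELM actually uses:

if cfg.POOLING_MODE == 'crop':
    # The crop path never produces the roi/context/frame triple that MELM
    # needs, so fail fast instead of hitting UnboundLocalError further down.
    raise NotImplementedError(
        "POOLING_MODE 'crop' does not produce pool5_roi/context/frame; "
        "use the roi_ring_pool path instead.")
pool5_roi = self._roi_ring_pool_layer(net_conv, rois, 0., 1.0)
pool5_context = self._roi_ring_pool_layer(net_conv, rois, 1.0, 1.8)
pool5_frame = self._roi_ring_pool_layer(net_conv, rois, scale_inner=1.0 / 1.8, scale_outer=1.0)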

Compile error

make.sh: line 27: cd: layer_utils/roi_align/src/cuda: No such file or directory
Compiling crop_and_resize kernels by nvcc...
gcc: error: crop_and_resize_kernel.cu: No such file or directory
gcc: warning: ‘-x c++’ after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
python: can't open file 'build.py': [Errno 2] No such file or directory
make.sh: line 35: cd: nms/src/cuda: No such file or directory
Compiling nms kernels by nvcc...
gcc: error: nms_kernel.cu: No such file or directory
gcc: warning: ‘-x c++’ after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
python: can't open file 'build.py': [Errno 2] No such file or directory

About CorLoc

Hi! Thanks for your nice code. Could you please provide the code for calculating CorLoc?

So many issues on PyTorch 1.0

I just cloned the project and switched to the pytorch1.0 branch. While compiling nms, lots of errors occurred; what astonished me is that nms needs libcudart.so.9.0... Well, we are on CUDA 10.0, bro.

After I'd solved all the errors, it came to [1] 122287 segmentation fault (core dumped) python ./tools/demo.py --net vgg16 --dataset pascal_voc... I quit, bro.

What should I change in your 'network' if I want to test the performance of your model on another dataset?

I looked through your network.forward() part; the code is as follows:

def forward(self, image, image_level_label, im_info, gt_boxes=None, ss_boxes=None, mode='TRAIN'):
    # print('forward ss_boxes ', ss_boxes.shape)
    self._image_gt_summaries['image'] = image
    self._image_gt_summaries['image_level_label'] = image_level_label
    self._image_gt_summaries['gt_boxes'] = gt_boxes
    self._image_gt_summaries['im_info'] = im_info
    self._image_gt_summaries['ss_boxes'] = ss_boxes

    self._image = torch.from_numpy(image.transpose([0, 3, 1, 2]).copy()).to(self._device)
    self._image_level_label = torch.from_numpy(image_level_label) if image_level_label is not None else None
    self._im_info = im_info  # No need to change; actually it can be a list
    self._gt_boxes = torch.from_numpy(gt_boxes).to(self._device) if gt_boxes is not None else None

    self._mode = mode

    rois, cls_prob, det_prob, bbox_pred, cls_det_prob_product, det_cls_prob = self._predict(ss_boxes)

    bbox_pred = bbox_pred[:, :80]

    if mode == 'TEST':
        stds = bbox_pred.data.new(cfg.TRAIN.BBOX_NORMALIZE_STDS).repeat(self._num_classes).unsqueeze(0).expand_as(bbox_pred)
        means = bbox_pred.data.new(cfg.TRAIN.BBOX_NORMALIZE_MEANS).repeat(self._num_classes).unsqueeze(0).expand_as(bbox_pred)
        self._predictions["bbox_pred"] = bbox_pred.mul(stds).add(means)
    else:
        self._add_losses()  # compute losses

And I want to know the meaning of the line "bbox_pred = bbox_pred[:,:80]".
What is the meaning of the number 80 here?
Does it have a relationship with "num_classes"?
For example, your model is tested on the dataset PASCAL VOC 2007, and there are 20 object classes (without the 'background' class) in this dataset.
What if my dataset has only 8 object classes without 'background'?

And except for this part, is there any other module in your code I need to change if I want to test the model on another dataset?

Looking forward to your early reply.
Thanks for your patience.
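
On the 80-column question above: in the Fast R-CNN-style head this repo borrows from, bbox_pred carries 4 box deltas per class, so 80 corresponds to VOC's 20 classes × 4 coordinates. Assuming that layout holds here (worth verifying against self._num_classes in network.py), an 8-class dataset would keep the first 32 columns:

import torch

num_classes = 8                    # your dataset's classes, without background
bbox_pred = torch.zeros(128, 84)   # example shape: (num_rois, 21 * 4) incl. background
# 4 deltas (dx, dy, dw, dh) per class: VOC keeps 20 * 4 = 80, an 8-class set 8 * 4 = 32.
bbox_pred = bbox_pred[:, :num_classes * 4]
print(bbox_pred.shape)             # torch.Size([128, 32])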

Did you use the 'gt_bbox' annotations in your model?

I looked all through your code.
I found that during the generation of the 'roidb', you used the 'gt_bbox' annotations saved in pytorch_MELM/data/VOCdevkit2007/VOC2007/Annotations.
But the original model is supposed to be a weakly supervised detection model, which I believe should use only the image-level labels, without 'gt_bbox'.

Compile error

Compiling crop_and_resize kernels by nvcc...
gcc: error: crop_and_resize_kernel.cu: No such file or directory
gcc: warning: ‘-x c++’ after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
python: can't open file 'build.py': [Errno 2] No such file or directory
make.sh: line 35: cd: nms/src/cuda: No such file or directory
Compiling nms kernels by nvcc...
gcc: error: nms_kernel.cu: No such file or directory
gcc: warning: ‘-x c++’ after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
python: can't open file 'build.py': [Errno 2] No such file or directory

How to get the loss in local min-entropy?

I saw your code in lib/nets/network.py, in self._add_losses:

roi_weights = torch.tensor(roi_weights).cuda()
roi_labels = torch.tensor(roi_labels, dtype=roi_weights.dtype).cuda()
#roi_labels = torch.mul(roi_labels, roi_weights)
refine_loss_1 = - torch.sum(torch.mul(roi_labels, torch.log(refine_prob_1[keep_inds]))) / roi_labels.shape[0]

But I saw the paper says the loss is:
[equation image from the paper; not preserved in this scrape]

Is there something wrong?
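
As quoted, refine_loss_1 is an average negative log-likelihood over pseudo-labels, with the per-RoI weighting line commented out; re-enabling torch.mul(roi_labels, roi_weights) would give the weighted form. A standalone sketch of both variants, with assumed shapes (refine_prob an (N, C) softmax output, roi_labels an (N, C) one-hot matrix, roi_weights an (N, 1) confidence):

import torch

N, C = 4, 21
refine_prob = torch.softmax(torch.randn(N, C), dim=1)  # per-RoI class posteriors
roi_labels = torch.eye(C)[torch.randint(0, C, (N,))]   # one-hot pseudo-labels
roi_weights = torch.rand(N, 1)                         # per-RoI confidence

# Unweighted form, as in the quoted code:
loss_unweighted = -torch.sum(roi_labels * torch.log(refine_prob)) / N

# Weighted form, which the commented-out line would enable:
loss_weighted = -torch.sum(roi_labels * roi_weights * torch.log(refine_prob)) / N
print(loss_unweighted.item(), loss_weighted.item())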

Are refine_loss_1 and refine_loss_2 defined according to Accumulated Recurrent Learning?

I read Fang Wan's paper and your code, and in your code:

loss = cls_det_loss / 20 + refine_loss_1*0.1 + refine_loss_2*0.1

I think cls_det_loss / 20 is the global entropy model, the two refine losses are local entropy models, and 0.1 is the regularization weight. refine_loss_1 and refine_loss_2 belong to different object localization branches, which corresponds to the "Accumulated Recurrent Learning" in the paper. Is this right?

By the way, I also want to know whether bbox_pred is used in TRAIN mode. I see you explain the meaning of "bbox_pred = bbox_pred[:,:80]" in #9, but I'm still a little confused, because I printed bbox_pred during training and found the values are all 0. So is bbox_pred only used in TEST mode?

Looking forward to your reply, thank you.

My own dataset applied to MELM

I applied my own dataset to MELM. After finishing training, when I tested, it showed an error:

FileNotFoundError: [Errno 2] No such file or directory: '/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/MELM/data/VOCdevkit2007/results/VOC2007/Main/comp4_954d8c91-a804-4b43-af6b-edfb7a2fd43a_det_test_normal bolt.txt'

The CorLoc of this code

Hello, I am sorry to bother you again. When I used this code, I found I can't evaluate the CorLoc metric; could you tell me about this? Thank you very much.
