jiwoon-ahn / irn

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR 2019 (Oral)

License: MIT License

Python 100.00%

cvpr2019 pytorch deep-learning

irn's Issues

How to calculate loss?

Hi, I have a question about how the loss is calculated. All labels are generated according to centres (Figure 3 in the paper), so how are these centres determined in the images?

Implementation details for the AffinityNet comparison in your paper

Hi, your paper includes this passage:
Comparison to AffinityNet: For a fair comparison, we modified AffinityNet [1] by replacing its backbone with ResNet50 as in our IRNet. Then we compare IRNet with the modified AffinityNet in terms of the accuracy of pseudo segmentation labels (Table 2) and performance of DeepLab [5] trained with these pseudo labels (Table 4).

Could you provide more implementation details for this part using this repo? Looking forward to your reply.

How many epochs for IRN?

I trained CAM for 5 epochs and got 48.3 mIoU, but the mAP and mIoU for ins_seg and sem_seg are low when I train IRN for 3 epochs.

Nothing was changed except line 40 of train_irn.py, which I modified to:
model = torch.nn.DataParallel(model).cuda()

Help For the CAMs

I'm sorry to bother you, @jiwoon-ahn, but I have a little trouble with the code you shared here.
In the first step, generating class attention maps, I converted the *.npy files and got pictures like fig. 1.
What should I do to get results like those in your CVPR 2018 paper, as in fig. 2?
The classification network was trained for only about 5 epochs, and I don't know whether that is enough. (I really don't know the reason and would sincerely appreciate help.)
I'm looking forward to your reply. Thanks a lot.

How to get the initial displacement field?

Thanks for your attention! I am confused about the initial displacement field.

In Figure 5 of your paper, the "center" image is an initial displacement field. What do the different colors in it mean? How is it obtained? Does it have any relationship with the CAM of the corresponding image?

Looking forward to your reply.

Comparison with AffinityNet

Hi,
In the train_irn step, I removed the displacement loss and kept only the boundary loss.
I noticed that the boundary loss is similar to AffinityNet, which you published at CVPR 2018, even though some details differ. However, the semantic mIoU is only 37+%, which is even worse than the CAM result (50%) and far below the AffinityNet result (59%).
I am confused about the reason for such a gap given the same idea and a similar loss. Do you have any suggestions? Thanks.

How to apply CRF postprocessing at final stage, after making sem_seg_labels?

Pre-trained models

Hi, first of all, thanks for the amazing work

I was wondering if you intend to provide pretrained models, mainly the CAM ResNet and IRNet.

thanks

Log files for training

Hi,
Can you share the log files for your training? I am unable to reproduce the performance of IRN reported in the paper using the default hyper-parameters (also mentioned here [Link]).

For instance segmentation, instead of 37.7 AP at IoU 0.5, I am getting the following:

step.eval_ins_seg: Wed Aug 14 09:55:44 2019
0.5iou: {'ap': array([0.0402722 , 0.        , 0.04831983, 0.02532846, 0.01264213,
       0.21497569, 0.13079764, 0.06767052, 0.00229753, 0.08129419,
       0.01570647, 0.05994737, 0.03092302, 0.26370536, 0.02019956,
       0.02099569, 0.0646912 , 0.16558015, 0.23535844, 0.1566734 ]), 'map': 0.08286894241843508}

and for semantic segmentation, instead of 66.5 mIOU, I am getting:

step.eval_sem_seg: Wed Aug 14 10:15:06 2019
0.12114407058527121 0.08625727491374735
0.2459830480445712 0.30624211370783205
{'iou': array([0.79259865, 0.43975817, 0.27018399, 0.42519734, 0.34189571,
       0.43639392, 0.57453956, 0.48851971, 0.41510347, 0.26892431,
       0.54274295, 0.37697739, 0.40495999, 0.47331797, 0.5605337 ,
       0.51401678, 0.39511615, 0.63538235, 0.40350322, 0.50775112,
       0.48067896]), 'miou': 0.4641950199739483}

Thanks.

About the function of the "Instance Map"

I think it would be fine to use only the "CAM" and the "Pairwise Affinities" to obtain instance segmentation masks, because the purpose of the "Instance Map" is to distinguish instances and the "Pairwise Affinities" can do this as well.
Using only these two modules would also make the algorithm simpler. Can you tell me why the "Instance Map" can't be left out? Thank you for your reply!

As shown in the figure below.

CAM_to_irlabel and train_irn

Hello, I would like to ask how to set the parameters for cam_to_irlabel, train_irn, and make_seg_labels. After using these methods, the performance improvement has been minimal. I have tried many parameters, but found that the performance does not change much.

deeplab-v2 and CRF

Hi Jiwoon,
the original deeplab-v2 with VGG16 and ResNet-101 have somewhat different architectures (e.g. design of ASPP module). I was wondering, in your implementation with ResNet-50, did you use ResNet-101 as the reference, or the VGG-based one? Also, from A.2 it seems that you used CRF to compute the upper bound. Did you also use CRF after fully-supervised training on the pseudo labels?
Thanks in advance,
Nikita

get AssertionError when eval_ins_seg.py

Traceback (most recent call last):
  File "run_sample.py", line 119, in <module>
    step.eval_ins_seg.run(args)
  File "/home/maskrcnn-benchmark/irn/step/eval_ins_seg.py", line 10, in run
    gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
  File "/home/irn/step/eval_ins_seg.py", line 10, in <listcomp>
    gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/chainer_experimental/datasets/sliceable/getter_dataset.py", line 89, in get_example_by_keys
    cache[getter_index] = self._getters[getter_index](index)
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_instance_segmentation_dataset.py", line 66, in _get_annotations
    label_img, inst_img)
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_utils.py", line 55, in image_wise_to_instance_wise
    assert lbl != -1
AssertionError

about the paper

Congratulations! May I ask when you will upload the paper to arXiv? I've been keeping track of weakly supervised learning.

Training on own dataset

Thanks for the great work. I wish to train the network on the Berkeley DeepDrive dataset, where we have 2D bounding boxes in JSON files.
What would the steps be? Is there a data converter available? I am trying to get the dataset into PASCAL VOC 2012 format.

thanks

About train_aug.txt

Congratulations! This is really good work!

While running your code, I found that the train_aug.txt file is used to train CAM. Where does this file come from? And why not directly use the VOC2012 trainval set?

Thanks a lot!

Performance is poor after re-training Mask R-CNN

Hi,
I took the instance-level pseudo labels generated by running `make_ins_seg_labels.py` and kept the instance masks whose score is higher than 0.
Then I converted these labels from *.npy to COCO-style JSON annotations and trained the standard Mask R-CNN with ResNet-50-FPN.
However, the performance I get is:

Specifically, the box AP50 is 45.8 and the segmentation AP50 is 22.6.
I noticed that the number of instances in the pseudo labels is about 2/3 of the number of ground-truth instances for the `train_aug` set.
Did I miss something needed to reproduce the Mask R-CNN performance with pseudo labels?

Thanks a lot!

Inter-pixel relation mining. Point neighborhood.

Hi! Thanks for the great work!
Why do you take only half of the circle in the get_search_paths_dst method of the PathIndex class:

        for x in range(1, max_radius):
            search_dirs.append((0, x))

        for y in range(1, max_radius):
            for x in range(-max_radius + 1, max_radius):
                if x * x + y * y < max_radius ** 2:
                    search_dirs.append((y, x))

Maybe I am missing something? Thanks for the explanation! :)
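
One possible reading, offered as an assumption rather than the author's confirmed answer: if a pixel pair is treated as unordered, so that the relation between (p, q) is the same as between (q, p), then offsets from a half disc already visit every pair once. The sketch below rebuilds the offsets with the loop quoted above and checks that mirroring them recovers the full disc. The helper names `half_disc_offsets` and `full_disc_offsets` are hypothetical.

```python
def half_disc_offsets(max_radius):
    """Offsets (y, x) built with the same loops as the snippet quoted above."""
    search_dirs = []
    for x in range(1, max_radius):
        search_dirs.append((0, x))
    for y in range(1, max_radius):
        for x in range(-max_radius + 1, max_radius):
            if x * x + y * y < max_radius ** 2:
                search_dirs.append((y, x))
    return search_dirs


def full_disc_offsets(max_radius):
    """Every non-zero offset strictly inside the disc of radius max_radius."""
    r = max_radius
    return {
        (y, x)
        for y in range(-r + 1, r)
        for x in range(-r + 1, r)
        if (y, x) != (0, 0) and x * x + y * y < r * r
    }


half = half_disc_offsets(5)
# Mirroring each offset through the origin gives the "other half" of the disc.
mirrored = {(-y, -x) for (y, x) in half}
```

The half set and its mirror image do not overlap and together equal the full disc, so an unordered pair (p, p + offset) is never enumerated twice.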

using own dataset

I am trying to adjust the code to my own dataset. However, I am really struggling since I am not a pro at python.

How can I generate cls_labels.npy for a different dataset? The script make_cls_labels.py does not work. Plus, it makes use of .xml files. Is there an easier way to generate a dictionary with image level labels?

cls_labels_dict = np.load('voc12/cls_labels.npy', allow_pickle=True).item()
print(cls_labels_dict) # 2011003271: array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)}

Also, my images don't share the same naming conventions as VOC12, so this part of the code creates a ton of problems:
def decode_int_filename(int_filename):
    s = str(int(int_filename))
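
A minimal sketch of building such a dictionary without .xml files, assuming the image-level labels are already available as a plain mapping from an image key to a list of class names. The class list, the image keys, and the helper `build_cls_labels` are all hypothetical; the only thing taken from the snippet above is the output shape, a dict of multi-hot float32 vectors that can be read back with np.load(..., allow_pickle=True).item(). Because the quoted loader decodes integer file names, string keys like the ones below would also require changing the file-name handling.

```python
import os
import tempfile

import numpy as np

CLASS_NAMES = ["cat", "dog", "car"]  # hypothetical class list


def build_cls_labels(image_to_classes, class_names):
    """Return {image_key: multi-hot float32 vector} from image-level tags."""
    index = {name: i for i, name in enumerate(class_names)}
    labels = {}
    for key, names in image_to_classes.items():
        vec = np.zeros(len(class_names), dtype=np.float32)
        for name in names:
            vec[index[name]] = 1.0
        labels[key] = vec
    return labels


# Hypothetical image-level annotations.
annotations = {"img_001": ["cat"], "img_002": ["dog", "car"]}
cls_labels = build_cls_labels(annotations, CLASS_NAMES)

# Save, then reload the same way the snippet above reads voc12/cls_labels.npy.
out_path = os.path.join(tempfile.mkdtemp(), "cls_labels.npy")
np.save(out_path, cls_labels)
reloaded = np.load(out_path, allow_pickle=True).item()
```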

int32 error

Change it to the following:

    def load_img_name_list(dataset_path):
        img_name_list = np.loadtxt(dataset_path, dtype=str)
        img_name_list = np.array(img_name_list, dtype=float)
        return img_name_list

Training is so slow after first epoch

Hello,

We were using a custom dataset for this repo. Training CAM is too slow. After the first epoch, it shows an estimated finish time of 2.5 days later.

Our training dataset has 8960 images. The batch size is 4.

Have you ever faced this problem? Thank you.

Time cost of generating one pseudo instance mask

Hi,

After testing the IRNet, I found it takes about 3 seconds to generate one pseudo instance mask on my machine.
I searched around and found no one mentioned the efficiency here, or even in the WSIS community.
Or maybe I missed some paper/post.

I understand for the final goal the inference time matters, not the time of generating one pseudo instance mask.
But is there any way to make it faster? Why don't people care about this?

Thanks

When will the code be released?

Hi Jiwoon Ahn, congratulations! I'm really interested in your code and can't wait to try it out. So when are you going to release the code? Thank you! The paper was great!

run_samples.py ValueError

load_img_name_list function does not work well on the voc2012 dataset.

How do you get the result image?

Thanks for open-sourcing your implementation!

I want to know how to save a visualization image like https://github.com/jiwoon-ahn/irn/blob/master/outline.jpg

thanks.

About the visualization of the edge map

Could you release the code of visualization in the paper?

COCO training code

"Training code for MS-COCO" is on the TODOs.
Are there any plans to release this code soon, so that it can be included in ECCV 2020 experiments?

RuntimeError: CUDA out of memory.

I don't know why it runs out of memory (OOM). I am on Windows.

How to process test data?

Hi,
For train/val data, the CAMs are first filtered by the ground-truth classification labels, and the final segmentation is obtained by an argmax over the remaining CAMs after normalization.
But how should test data be handled? Should I predict classification labels for the test images and apply a similar filter, or multiply each class probability with the corresponding CAM?
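
Both options mentioned in the question can be sketched in NumPy. This is only an illustration under assumptions: `cams` is a hypothetical array of shape (num_classes, H, W), the background is modelled as a constant score (0.15 is an arbitrary choice here), and the function `seg_from_cams` is not taken from the repository.

```python
import numpy as np


def seg_from_cams(cams, class_weights, bg_thres=0.15):
    """Normalise each CAM to [0, 1], weight it, then argmax against a background score.

    cams:          (C, H, W) non-negative activation maps.
    class_weights: (C,) values; a 0/1 vector reproduces label filtering,
                   classifier probabilities give the soft variant.
    Returns an (H, W) map where 0 is background and k + 1 is class k.
    """
    peak = cams.reshape(len(cams), -1).max(axis=1)
    normed = cams / (peak[:, None, None] + 1e-5)
    scored = normed * class_weights[:, None, None]
    bg = np.full((1,) + cams.shape[1:], bg_thres)
    return np.argmax(np.concatenate([bg, scored], axis=0), axis=0)


rng = np.random.default_rng(0)
cams = rng.random((3, 4, 4))  # hypothetical CAMs for three classes

# Option 1: hard filter with (predicted or given) image-level labels.
hard = seg_from_cams(cams, np.array([1.0, 0.0, 1.0]))
# Option 2: soft weighting with hypothetical classifier probabilities.
soft = seg_from_cams(cams, np.array([0.9, 0.05, 0.6]))
```

In both variants the second class can never win the argmax: its score is either zeroed out or scaled below the background score.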

[ OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).

In step.make_cam, the following error occurred:

[ OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.

Has anyone else seen the same issue?

Please kindly advise on how to fix this.

Thanks a lot

BTW, I was running sampy.py with NVIDIA-SMI 410.79, driver version 410.79, CUDA version 10.0.

About 'x = x[0] + x[1].flip(-1)' in resnet50_cam.py.

What is the meaning or function of 'x = x[0] + x[1].flip(-1)' ?
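
A likely reading, stated as an assumption: the network receives a batch of two inputs, the image and its horizontally mirrored copy, so `x[0]` is the response to the original and `x[1].flip(-1)` mirrors the second response back before the two are summed (flip test-time augmentation). The NumPy sketch below uses a hypothetical stand-in, `toy_response`, in place of the network.

```python
import numpy as np


def toy_response(img):
    """Hypothetical stand-in for a CAM network: the response depends on column position."""
    width = img.shape[-1]
    return img * np.arange(1, width + 1)


img = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# Batch of two inputs: the image and its horizontally mirrored copy.
batch = np.stack([img, np.flip(img, axis=-1)])
x = np.stack([toy_response(b) for b in batch])

# NumPy equivalent of `x[0] + x[1].flip(-1)`: un-mirror the second map, then sum,
# so each output pixel combines both views of the same input pixel.
merged = x[0] + np.flip(x[1], axis=-1)
```

Without the flip, the second map would be added in mirrored coordinates and the two responses would not line up.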

About the L_fg^D loss

Thank you for the very good work!
I have a question about the L_fg^D loss, in section 4.3, why the difference of (i, j) in D denotes D(x_i)-D(x_j) rather than D(x_j)-D(x_i)?
I'm very confused about this point, looking forward to your reply.
Thank you very much
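
One way to see the sign, assuming (as a reading of the paper, not a quotation from it) that $\mathcal{D}(x)$ is the offset from pixel $x$ to the centroid $c$ of its instance. For two pixels $x_i$ and $x_j$ of the same instance:

```latex
\begin{aligned}
x_i + \mathcal{D}(x_i) &= c = x_j + \mathcal{D}(x_j) \\
\Rightarrow\quad \mathcal{D}(x_i) - \mathcal{D}(x_j) &= x_j - x_i
\end{aligned}
```

Under that assumption, $\mathcal{D}(x_i) - \mathcal{D}(x_j)$ is the quantity that pairs with the coordinate difference $x_j - x_i$, which is known without any centroid annotation. The reversed order $\mathcal{D}(x_j) - \mathcal{D}(x_i)$ would pair with $x_i - x_j$, so the choice is a matter of keeping the two sides consistent.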

I ran your code, but the results are not as good as yours.

I ran your code, but the results are not as good as yours. Are there any special tricks for running the code? Thanks.

Instance segmentation + training dataset (0.5AP): mine 35.7, yours 37.7;
Semantic segmentation + training dataset (miou): mine 66.0, yours 66.5;

When training Deeplabv2, did you use any pre-trained model?

When you trained Deeplabv2 using the pseudo labels produced by your method, had Deeplabv2 been pretrained beforehand, or was it trained from scratch? Thanks.

Do you mind sharing deeplabv2 training code directly to us? Thanks a lot!

I tried training deeplabv2 with the pseudo labels but got significantly lower performance than reported number... It would be really helpful if you're willing to make the deeplabv2 training code public! Would you do that? Thanks a lot!

About the results being unstable on every run

In run_sample.py, add a seed. The code is as follows:

import argparse
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
from misc import pyutils
import torch
import numpy as np
import random


def setup_seed(seed):
    # Seed every random number generator used by the pipeline.
    print("random seed is set to", seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True


if __name__ == '__main__':

    parser = argparse.ArgumentParser()

    # Environment
    parser.add_argument("--num_workers", default=os.cpu_count()//2, type=int)
    parser.add_argument("--voc12_root", default="/disk4/xxx/2022-02-08-wang-peak/irn-master/VOC2012", type=str,
                        help="Path to VOC 2012 Devkit, must contain ./JPEGImages as subdirectory.")

    # Dataset
    parser.add_argument("--train_list", default="voc12/train_aug.txt", type=str)
    parser.add_argument("--val_list", default="voc12/val.txt", type=str)
    parser.add_argument("--infer_list", default="voc12/train.txt", type=str,
                        help="voc12/train_aug.txt to train a fully supervised model, "
                             "voc12/train.txt or voc12/val.txt to quickly check the quality of the labels.")
    parser.add_argument("--chainer_eval_set", default="train", type=str)
    parser.add_argument("--seed", default=15, type=int)

    # Class Activation Map
    parser.add_argument("--cam_network", default="net.resnet50_cam", type=str)
    parser.add_argument("--cam_crop_size", default=512, type=int)
    parser.add_argument("--cam_batch_size", default=16, type=int)
    parser.add_argument("--cam_num_epoches", default=5, type=int)
    parser.add_argument("--cam_learning_rate", default=0.1, type=float)
    parser.add_argument("--cam_weight_decay", default=1e-4, type=float)
    parser.add_argument("--cam_eval_thres", default=0.15, type=float)
    parser.add_argument("--cam_scales", default=(1.0, 0.5, 1.5, 2.0),
                        help="Multi-scale inferences")

    # Mining Inter-pixel Relations
    parser.add_argument("--conf_fg_thres", default=0.30, type=float)
    parser.add_argument("--conf_bg_thres", default=0.05, type=float)

    # Inter-pixel Relation Network (IRNet)
    parser.add_argument("--irn_network", default="net.resnet50_irn", type=str)
    parser.add_argument("--irn_crop_size", default=512, type=int)
    parser.add_argument("--irn_batch_size", default=32, type=int)
    parser.add_argument("--irn_num_epoches", default=3, type=int)
    parser.add_argument("--irn_learning_rate", default=0.1, type=float)
    parser.add_argument("--irn_weight_decay", default=1e-4, type=float)

    # Random Walk Params
    parser.add_argument("--beta", default=10)
    parser.add_argument("--exp_times", default=8,
                        help="Hyper-parameter that controls the number of random walk iterations,"
                             "The random walk is performed 2^{exp_times}.")
    parser.add_argument("--ins_seg_bg_thres", default=0.25)
    parser.add_argument("--sem_seg_bg_thres", default=0.25)

    # Output Path
    parser.add_argument("--log_name", default="sample_train_eval", type=str)
    parser.add_argument("--cam_weights_name", default="sess/res50_cam.pth", type=str)
    parser.add_argument("--irn_weights_name", default="sess/res50_irn.pth", type=str)
    parser.add_argument("--cam_out_dir", default="result/cam", type=str)
    parser.add_argument("--ir_label_out_dir", default="result/ir_label", type=str)
    parser.add_argument("--sem_seg_out_dir", default="result/sem_seg", type=str)
    parser.add_argument("--ins_seg_out_dir", default="result/ins_seg", type=str)

    # Step
    parser.add_argument("--train_cam_pass", default=True)
    parser.add_argument("--make_cam_pass", default=True)
    parser.add_argument("--eval_cam_pass", default=True)
    parser.add_argument("--cam_to_ir_label_pass", default=False)
    parser.add_argument("--train_irn_pass", default=False)
    parser.add_argument("--make_ins_seg_pass", default=False)
    parser.add_argument("--eval_ins_seg_pass", default=False)
    parser.add_argument("--make_sem_seg_pass", default=False)
    parser.add_argument("--eval_sem_seg_pass", default=False)

    args = parser.parse_args()
    # Fix the seed before any step creates models or data loaders.
    setup_seed(args.seed)
    os.makedirs("sess", exist_ok=True)
    os.makedirs(args.cam_out_dir, exist_ok=True)
    os.makedirs(args.ir_label_out_dir, exist_ok=True)
    os.makedirs(args.sem_seg_out_dir, exist_ok=True)
    os.makedirs(args.ins_seg_out_dir, exist_ok=True)

    pyutils.Logger(args.log_name + '.log')
    print(vars(args))

    if args.train_cam_pass is True:
        import step.train_cam

        timer = pyutils.Timer('step.train_cam:')
        step.train_cam.run(args)

    if args.make_cam_pass is True:
        import step.make_cam

        timer = pyutils.Timer('step.make_cam:')
        step.make_cam.run(args)

    if args.eval_cam_pass is True:
        import step.eval_cam

        timer = pyutils.Timer('step.eval_cam:')
        step.eval_cam.run(args)

    if args.cam_to_ir_label_pass is True:
        import step.cam_to_ir_label

        timer = pyutils.Timer('step.cam_to_ir_label:')
        step.cam_to_ir_label.run(args)

    if args.train_irn_pass is True:
        import step.train_irn

        timer = pyutils.Timer('step.train_irn:')
        step.train_irn.run(args)

    if args.make_ins_seg_pass is True:
        import step.make_ins_seg_labels

        timer = pyutils.Timer('step.make_ins_seg_labels:')
        step.make_ins_seg_labels.run(args)

    if args.eval_ins_seg_pass is True:
        import step.eval_ins_seg

        timer = pyutils.Timer('step.eval_ins_seg:')
        step.eval_ins_seg.run(args)

    if args.make_sem_seg_pass is True:
        import step.make_sem_seg_labels

        timer = pyutils.Timer('step.make_sem_seg_labels:')
        step.make_sem_seg_labels.run(args)

    if args.eval_sem_seg_pass is True:
        import step.eval_sem_seg

        timer = pyutils.Timer('step.eval_sem_seg:')
        step.eval_sem_seg.run(args)

Asking about the Mask-Rcnn training strategy

Hi, Jiwoon Ahn
After converting the pseudo labels to COCO-style annotations, I trained Mask R-CNN with ResNet-50-FPN.

But the performance I got is slightly lower than reported: mAP50 is 45.0.

I'd like to ask about your Mask R-CNN training strategy and what kind of data augmentation you adopted.

Thank you !

Why does displacement work for instances?

How does the displacement branch optimize for more than a single instance? The number of instances is missing from the groundtruth.

sqrt in CAM

jiwoon-ahn/psa#30

Would you please explain the reason why sqrt is used in generating CAM?

cam_to_ir_label

@jiwoon-ahn What does the cam_to_ir_label.py do exactly? Does it create a binary mask?

How to adjust the values of 'conf_fg_thres', 'conf_bg_thres', 'beta' and 'exp_times'

In my model, the CAM quality is best when 'cam_eval_thres' is set to 0.35. How should I set the values of the other parameters? Looking forward to your reply, thanks!

about the search indices

    for x in range(1, max_radius):
        search_dirs.append((0, x))

    for y in range(1, max_radius):
        for x in range(-max_radius + 1, max_radius):
            if x * x + y * y < max_radius ** 2:
                search_dirs.append((y, x))

Thanks for sharing the work. I think search_dirs covers a half circle instead of a full circle. I am not sure whether I understand it correctly.
Look forward to your reply.

Would you share the weights of IRNet for generating the pseudo label?

Hi!

Thanks for this amazing work on weakly supervised instance segmentation. I am wondering whether you could share the weights file (.pth) for the IRNet model, since I get poor results when generating the boundary map. Thanks so much!

path index

Hi, Jiwoon Ahn,

I would like to know what the path index in the code is. Which part of the paper does it correspond to?

Additionally, when will the training details be released? Looking forward to following your work.

Thanks.

On the number of convolutional filters in IRNet

I noticed that the numbers of convolutional filters in IRNet (in both the class boundary part and the displacement part) differ from the settings in your paper. May I ask which setting generally worked better in your earlier experiments? Best wishes.

Performance Gap and Hyper-parameter Settings

Hi Jiwoon Ahn,
Your paper is very good and I'm really interested in it. I've already tried your code, but I cannot achieve the same performance as in the paper. Would you please help me figure out where the problem is?

In my experiments, the learning rates of both CAM and IRN are set to 0.1, while the other hyper-parameters follow the default settings in run_sample.py. My results are as follows:

model	task	my exp.	reported
CAM	semantic segmentation	48.1	48.3
IRN	semantic segmentation	64.9	66.5
IRN	instance segmentation	32.4	37.7

The CAM models have similar performance, but there are performance gaps between the IRN models in both tasks.

There may be two possible reasons for the gap.

1. I notice that the hyper-parameter settings in the paper and the code are not exactly the same. exp_times is set to 8 in the code, while in the paper it is set to 256 (which also does not work in my case).
2. Another possible problem is that multi-scale testing is used only in CAM, but not in IRN.
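
The two numbers may describe the same setting: the help text of `--exp_times` quoted elsewhere on this page says the random walk is performed 2^{exp_times} times, and 2^8 = 256. A minimal sketch of that reading, assuming the walk is realised by repeatedly squaring a transition matrix. The matrix below is a made-up three-state example, not an affinity matrix from the repository, and `walk_by_squaring` is a hypothetical helper.

```python
import numpy as np


def walk_by_squaring(trans, exp_times):
    """Square a transition matrix exp_times times, giving trans ** (2 ** exp_times)."""
    result = trans.copy()
    for _ in range(exp_times):
        result = result @ result
    return result


# Hypothetical row-stochastic transition matrix over three states.
trans = np.array([[0.8, 0.2, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.2, 0.8]])

squared = walk_by_squaring(trans, exp_times=8)    # 8 squarings
stepwise = np.linalg.matrix_power(trans, 2 ** 8)  # 256 single steps
```

Eight squarings and 256 single steps give the same matrix, so setting exp_times to 256 would request 2^256 steps rather than 256.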

Would you please point out the differences between my experiments and yours that may result in the gap? Thank you!

Tuning GN using inference data?

Dear Jiwoon, in the file 'train_irn.py', I noticed that in the latest commit GN is tuned using the inference data (location). Is this right in the weakly supervised instance segmentation setting? I think the validation set should not be touched except for evaluation, and should not be used for training or tuning parameters. I'm also curious what this affects. Does it improve the mAP? Thanks