jiwoon-ahn / irn
Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR 2019 (Oral)
License: MIT License
Hi, I have some questions about how the loss is calculated. Since all labels are generated according to centres (Figure 3 in the paper), how are these centres determined in the images?
Hi, in your paper, you have such part:
Comparison to AffinityNet: For a fair comparison, we
modified AffinityNet [1] by replacing its backbone with
ResNet50 as in our IRNet. Then we compare IRNet with
the modified AffinityNet in terms of the accuracy of
pseudo segmentation labels (Table 2) and performance of
DeepLab [5] trained with these pseudo labels (Table 4).
Could you provide more implementation details for this part using this repo? Looking forward to your reply.
I trained CAM for 5 epochs and got 48.3 mIoU,
but mAP and mIoU are low for sem_seg and ins_seg when I train IRNet for 3 epochs.
Nothing was changed except line 40 of train_irn.py, which I modified to:
model = torch.nn.DataParallel(model).cuda()
I'm so sorry to bother you, @jiwoon-ahn. I'm having a little trouble with the code you shared here.
In the first step, generating class attention maps, I converted the *.npy file and got a picture like fig. 1.
What should I do to get results like those in your CVPR 2018 paper, as in fig. 2?
Also, the classification network was trained for only about 5 epochs; I don't know whether that is enough. (I really don't know the reason and would sincerely appreciate help.)
I'm looking forward to your reply. Thanks a lot.
Thanks for your attention! I am confused about the initial displacement field.
In Figure 5 of your paper, the "center" image is an initial displacement field. What do the different colors in it mean? How do we obtain it? Does it have any relationship with the CAM of the corresponding image?
Looking forward to your reply.
Hi,
In the train_irn step, I removed the displacement loss and kept only the boundary loss.
I notice that the boundary loss is similar to that of AffinityNet, which you published at CVPR 2018, even though some details differ. But the semantic mIoU is only 37+%, which is even worse than the CAM result (50%), compared to the AffinityNet result (59%).
So I am confused about the reason for such a gap given the same idea and a similar loss. Do you have any suggestions? Thanks.
Hi, first of all, thanks for the amazing work.
I was wondering if you intend to provide pretrained models, mainly the CAM ResNet and IRNet.
Thanks
Hi,
Can you share the log files for your training? I am unable to reproduce the performance of IRN reported in the paper using the default hyper-parameters (also mentioned here [Link]).
For instance segmentation, instead of 37.7 mAP at an IoU threshold of 0.5, I am getting the following:
step.eval_ins_seg: Wed Aug 14 09:55:44 2019
0.5iou: {'ap': array([0.0402722 , 0. , 0.04831983, 0.02532846, 0.01264213,
0.21497569, 0.13079764, 0.06767052, 0.00229753, 0.08129419,
0.01570647, 0.05994737, 0.03092302, 0.26370536, 0.02019956,
0.02099569, 0.0646912 , 0.16558015, 0.23535844, 0.1566734 ]), 'map': 0.08286894241843508}
and for semantic segmentation, instead of 66.5 mIoU, I am getting:
step.eval_sem_seg: Wed Aug 14 10:15:06 2019
0.12114407058527121 0.08625727491374735
0.2459830480445712 0.30624211370783205
{'iou': array([0.79259865, 0.43975817, 0.27018399, 0.42519734, 0.34189571,
0.43639392, 0.57453956, 0.48851971, 0.41510347, 0.26892431,
0.54274295, 0.37697739, 0.40495999, 0.47331797, 0.5605337 ,
0.51401678, 0.39511615, 0.63538235, 0.40350322, 0.50775112,
0.48067896]), 'miou': 0.4641950199739483}
Thanks.
I think it should be fine to use only "CAM" and "Pairwise Affinities" to capture instance segmentation masks,
because the purpose of the "Instance Map" is to distinguish instances, and "Pairwise Affinities" can also do that.
Using only these two modules would also make the algorithm simpler. Can you tell me why the "Instance Map" can't be omitted? Thank you for your reply!
Hello, I would like to ask how to set the parameters for cam_to_irlabel, train_irn, and make_seg_labels. After using these methods, the performance improvement has been minimal. I have tried many parameters, but found that the performance does not change much.
Hi Jiwoon,
The original DeepLab-v2 models with VGG16 and ResNet-101 have somewhat different architectures (e.g. the design of the ASPP module). I was wondering, in your implementation with ResNet-50, did you use the ResNet-101 version as the reference, or the VGG-based one? Also, from A.2 it seems that you used CRF to compute the upper bound. Did you also use CRF after fully-supervised training on the pseudo labels?
Thanks in advance,
Nikita
Traceback (most recent call last):
  File "run_sample.py", line 119, in <module>
    step.eval_ins_seg.run(args)
  File "/home/maskrcnn-benchmark/irn/step/eval_ins_seg.py", line 10, in run
    gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
  File "/home/irn/step/eval_ins_seg.py", line 10, in <listcomp>
    gt_masks = [dataset.get_example_by_keys(i, (1,))[0] for i in range(len(dataset))]
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/chainer_experimental/datasets/sliceable/getter_dataset.py", line 89, in get_example_by_keys
    cache[getter_index] = self._getters[getter_index](index)
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_instance_segmentation_dataset.py", line 66, in _get_annotations
    label_img, inst_img)
  File "/home/anaconda3/envs/deeplab/lib/python3.6/site-packages/chainercv/datasets/voc/voc_utils.py", line 55, in image_wise_to_instance_wise
    assert lbl != -1
AssertionError
Congratulations! May I ask when you will upload the paper to arXiv? I have been following weakly supervised learning closely.
Thanks for the great work. I wish to train the network on the Berkeley DeepDrive dataset, where we have 2D bounding boxes in JSON files.
What would the steps be? Is there a data converter available? I am trying to get the dataset into PASCAL VOC 2012 format.
Thanks
Congratulations! This is really good work!
While running your code, I found that the train_aug.txt file is used to train CAM. Where does this file come from? And why not use the VOC2012 trainval set directly?
Thanks a lot!
Hi,
I took the instance-level pseudo labels generated by running `make_ins_seg_labels.py` and kept the instance masks whose score is higher than 0.
Then I converted these labels from *.npy to COCO-style JSON annotations and trained a standard Mask R-CNN with ResNet-50-FPN.
However, the performance I got is as follows:
box AP50 is 45.8 and segmentation AP50 is 22.6.
I noticed that the number of instances in the pseudo labels is about 2/3 of the ground-truth instance count for the `train_aug` set.
Did I miss something in reproducing the performance of Mask R-CNN with pseudo labels?
Thanks a lot!
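The conversion step described above might look roughly like the sketch below. The dictionary keys `score`, `mask`, and `class` are assumptions about how the saved predictions are laid out, not something confirmed in this thread; `to_coco_annotations` and `mask_to_xywh` are hypothetical helpers, and the `segmentation` field (polygon or RLE) is left out for brevity.

```python
import numpy as np

def mask_to_xywh(mask):
    """Return the tight bounding box of a binary mask as [x, y, width, height]."""
    ys, xs = np.nonzero(mask)
    x0, y0 = int(xs.min()), int(ys.min())
    return [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1]

def to_coco_annotations(pred, image_id, score_thres=0.0, first_ann_id=1):
    """Keep instances whose score exceeds score_thres and emit COCO-style dicts."""
    anns = []
    for score, mask, cls in zip(pred["score"], pred["mask"], pred["class"]):
        if score <= score_thres or not mask.any():
            continue  # drop zero-score or empty instances
        anns.append({
            "id": first_ann_id + len(anns),
            "image_id": image_id,
            # Assumption: class indices are 0-based, category ids 1-based.
            "category_id": int(cls) + 1,
            "bbox": mask_to_xywh(mask),
            "area": int(mask.sum()),
            "iscrowd": 0,
        })
    return anns

# Toy prediction with two instances; the second has score 0 and is dropped.
mask_a = np.zeros((6, 8), dtype=bool)
mask_a[1:4, 2:7] = True
mask_b = np.zeros((6, 8), dtype=bool)
mask_b[0:2, 0:2] = True
pred = {"score": np.array([0.9, 0.0]),
        "mask": np.stack([mask_a, mask_b]),
        "class": np.array([14, 7])}
annotations = to_coco_annotations(pred, image_id=1)
```

Counting how many instances survive the score filter per image is one quick way to check the "about 2/3 of the ground truth" observation.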
Hi! Thanks for the great work!
Why do you take only half of a circle in the get_search_paths_dst method of the PathIndex class?
for x in range(1, max_radius):
    search_dirs.append((0, x))
for y in range(1, max_radius):
    for x in range(-max_radius + 1, max_radius):
        if x * x + y * y < max_radius ** 2:
            search_dirs.append((y, x))
Maybe I miss something? Thanks for the explanation! :)
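One possible reading, offered as an assumption rather than the author's confirmed answer: if a pixel pair (i, j) is treated the same as (j, i), then each offset and its negation describe the same pair, so enumerating half of the disc is enough. The sketch below only checks the geometric part of that claim, namely that the quoted loop plus its point reflection covers the full disc exactly once. The function names are hypothetical.

```python
def half_disc_offsets(max_radius):
    """Offsets (y, x) in the same order as the loop quoted above."""
    dirs = []
    for x in range(1, max_radius):
        dirs.append((0, x))
    for y in range(1, max_radius):
        for x in range(-max_radius + 1, max_radius):
            if x * x + y * y < max_radius ** 2:
                dirs.append((y, x))
    return dirs

def full_disc_offsets(max_radius):
    """All non-zero integer offsets strictly inside the circle of given radius."""
    rng = range(-max_radius + 1, max_radius)
    return {(y, x) for y in rng for x in rng
            if (y, x) != (0, 0) and x * x + y * y < max_radius ** 2}

half = set(half_disc_offsets(5))
mirrored = {(-y, -x) for (y, x) in half}  # the same pairs seen from the other end
full = full_disc_offsets(5)
```

Here `half` and `mirrored` are disjoint and their union equals `full`, so no pair is visited twice.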
I am trying to adjust the code to my own dataset. However, I am really struggling since I am not a pro at python.
How can I generate cls_labels.npy for a different dataset? The script make_cls_labels.py does not work. Plus, it makes use of .xml files. Is there an easier way to generate a dictionary with image level labels?
cls_labels_dict = np.load('voc12/cls_labels.npy', allow_pickle=True).item()
print(cls_labels_dict) # 2011003271: array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)}
Also, my images don't share the same naming conventions as VOC12, so this part of the code creates a ton of problems:
def decode_int_filename(int_filename):
    s = str(int(int_filename))
Change it to the following:
`def load_img_name_list(dataset_path):
    img_name_list = np.loadtxt(dataset_path, dtype=str)
    img_name_list = np.array(img_name_list, dtype=float)
    return img_name_list`
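For the first part of the question, a label dictionary can be built without any .xml files. The sketch below assumes only what the printed example above shows: a dictionary that maps an image key to a multi-hot float32 vector, saved with NumPy and reloaded with `allow_pickle=True`. The `annotations` mapping, the string keys, and `build_cls_labels` are hypothetical; whether the rest of the pipeline accepts non-integer keys is not confirmed here.

```python
import os
import tempfile

import numpy as np

def build_cls_labels(image_to_classes, num_classes):
    """Map each image key to a multi-hot float32 vector of length num_classes."""
    labels = {}
    for key, class_ids in image_to_classes.items():
        vec = np.zeros(num_classes, dtype=np.float32)
        vec[list(class_ids)] = 1.0  # mark every class present in the image
        labels[key] = vec
    return labels

# Hypothetical image-level annotations: image key -> class indices.
annotations = {"img_0001": [4], "img_0002": [0, 7]}
cls_labels = build_cls_labels(annotations, num_classes=20)

# Save and reload the same way the snippet above loads voc12/cls_labels.npy.
out_path = os.path.join(tempfile.mkdtemp(), "cls_labels.npy")
np.save(out_path, cls_labels)
reloaded = np.load(out_path, allow_pickle=True).item()
```

The image-level labels can come from any source (a CSV file, folder names, a JSON file) as long as they end up in a mapping like `annotations`.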
Hello,
We were using a custom dataset for this repo. Training CAM is too slow. After the first epoch, it shows an estimated finish time of 2.5 days later.
Our training dataset has 8960 images. The batch size is 4.
Have you ever faced this problem? Thank you.
Hi,
After testing the IRNet, I found it takes about 3 seconds to generate one pseudo instance mask on my machine.
I searched around and found no one mentioned the efficiency here, or even in the WSIS community.
Or maybe I missed some paper/post.
I understand for the final goal the inference time matters, not the time of generating one pseudo instance mask.
But is there any way I can make it faster? Why don't people care about this?
Thanks
Hi Jiwoon Ahn, congratulations! I'm really interested in your code and can't wait to try it out. So when are you going to release the code? Thank you! The paper was great!
Thanks for opening up your implementation!
I want to know how to save the visualization image like https://github.com/jiwoon-ahn/irn/blob/master/outline.jpg
thanks.
Could you release the code of visualization in the paper?
"Training code for MS-COCO" is on the TODOs.
Any plans to release this code soonish so as to include in ECCV2020 experiments ?
I don't know why it runs out of memory (OOM) on the Windows platform.
Hi,
For train/val data, the CAMs are first filtered by the GT classification labels, and the final segmentation is then obtained by taking the argmax over the remaining CAMs after normalization.
But how should test data be handled? Should I predict classification labels for the test images and apply a similar filter, or multiply each class probability with the corresponding CAM?
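The filter, normalize, and argmax procedure described above can be sketched as follows. This is an illustration under assumptions, not the repository's exact code: the constant background plane `bg_score` and the helper name `cams_to_label_map` are hypothetical. For test images, `present` could in principle be replaced by thresholded classifier scores, which is one of the two options raised in the question.

```python
import numpy as np

def cams_to_label_map(cams, present, bg_score):
    """cams: (C, H, W) non-negative maps; present: (C,) 0/1 class indicator.

    Returns an (H, W) label map where 0 is background and k + 1 is class k.
    """
    cams = cams * present[:, None, None]         # drop classes not in the image
    peak = cams.max(axis=(1, 2), keepdims=True)  # per-class maximum
    cams = cams / np.maximum(peak, 1e-5)         # normalize each class to [0, 1]
    bg = np.full((1,) + cams.shape[1:], bg_score)
    return np.concatenate([bg, cams], axis=0).argmax(axis=0)

cams = np.array([[[4.0, 1.0, 0.0]],   # class 0
                 [[9.0, 9.0, 9.0]],   # class 1, strong response but not present
                 [[0.0, 1.0, 2.0]]])  # class 2
present = np.array([1.0, 0.0, 1.0])
labels_low_bg = cams_to_label_map(cams, present, bg_score=0.3)
labels_high_bg = cams_to_label_map(cams, present, bg_score=0.6)
```

Raising `bg_score` turns weakly activated pixels into background, which is why the middle pixel changes between the two calls.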
In step.make_cam, the following error occurred:
OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Has anyone else seen the same issue?
Please kindly advise on how to fix this one.
Thanks a lot
BTW, I was running run_sample.py with NVIDIA-SMI 410.79, Driver Version: 410.79, CUDA version 10.0
What is the meaning or function of 'x = x[0] + x[1].flip(-1)' ?
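A likely reading of that line, stated as an assumption rather than the author's answer: the batch holds two outputs, one for the image and one for its horizontally mirrored copy, and `flip(-1)` mirrors the second output back before the two are summed. The NumPy sketch below mimics this with `[..., ::-1]`, which reverses the last axis just as `flip(-1)` does on a tensor. `toy_response` is a hypothetical stand-in for the network.

```python
import numpy as np

def merge_flip_pair(x):
    """x: array of shape (2, ...); x[1] was computed from the mirrored input."""
    return x[0] + x[1][..., ::-1]  # un-mirror the second output, then sum

def toy_response(img):
    """Hypothetical stand-in for a network; any flip-equivariant map works."""
    return img * 2.0

img = np.arange(6, dtype=np.float64).reshape(1, 2, 3)
pair = np.stack([toy_response(img), toy_response(img[..., ::-1])])
merged = merge_flip_pair(pair)
```

With a perfectly flip-equivariant model both terms agree, so the sum is just twice the single-view response; with a real network the two views differ slightly, and summing them acts as test-time augmentation.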
Thank you for the very good work!
I have a question about the L_fg^D loss, in section 4.3, why the difference of (i, j) in D denotes D(x_i)-D(x_j) rather than D(x_j)-D(x_i)?
I'm very confused about this point, looking forward to your reply.
Thank you very much
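One way to see the sign convention, sketched under two assumptions about Section 4.3 that are not confirmed in this thread: that D(x) is the offset from pixel x to the centroid of its instance, and that the target the difference is compared against is x_j - x_i. Two pixels of the same instance then point to the same centroid:

```latex
\begin{aligned}
x_i + D(x_i) &= x_j + D(x_j) && \text{(same instance, same centroid)} \\
\Longrightarrow\quad D(x_i) - D(x_j) &= x_j - x_i
\end{aligned}
```

Under that reading, D(x_i) - D(x_j) pairs with x_j - x_i, while the opposite order D(x_j) - D(x_i) would pair with x_i - x_j. Either convention gives the same loss as long as the prediction and the target use the same ordering.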
I have run your code, but my results are not as good as yours. Are there any special tricks for running the code? Thanks.
Instance segmentation + training dataset (0.5AP): mine 35.7, yours 37.7;
Semantic segmentation + training dataset (miou): mine 66.0, yours 66.5;
When you trained DeepLabv2 using the pseudo labels produced by your method, had DeepLabv2 been pretrained beforehand, or was it trained from scratch? Thanks.
I tried training deeplabv2 with the pseudo labels but got significantly lower performance than reported number... It would be really helpful if you're willing to make the deeplabv2 training code public! Would you do that? Thanks a lot!
In run_sample.py, add a seed; the specific code is as follows:
import argparse
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
from misc import pyutils
import torch
import numpy as np
import random

def setup_seed(seed):
    # Seed every random number generator used during training.
    print("random seed is set to", seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Environment
    parser.add_argument("--num_workers", default=os.cpu_count()//2, type=int)
    parser.add_argument("--voc12_root", default="/disk4/xxx/2022-02-08-wang-peak/irn-master/VOC2012", type=str,
                        help="Path to VOC 2012 Devkit, must contain ./JPEGImages as subdirectory.")

    # Dataset
    parser.add_argument("--train_list", default="voc12/train_aug.txt", type=str)
    parser.add_argument("--val_list", default="voc12/val.txt", type=str)
    parser.add_argument("--infer_list", default="voc12/train.txt", type=str,
                        help="voc12/train_aug.txt to train a fully supervised model, "
                             "voc12/train.txt or voc12/val.txt to quickly check the quality of the labels.")
    parser.add_argument("--chainer_eval_set", default="train", type=str)
    parser.add_argument("--seed", default=15, type=int)

    # Class Activation Map
    parser.add_argument("--cam_network", default="net.resnet50_cam", type=str)
    parser.add_argument("--cam_crop_size", default=512, type=int)
    parser.add_argument("--cam_batch_size", default=16, type=int)
    parser.add_argument("--cam_num_epoches", default=5, type=int)
    parser.add_argument("--cam_learning_rate", default=0.1, type=float)
    parser.add_argument("--cam_weight_decay", default=1e-4, type=float)
    parser.add_argument("--cam_eval_thres", default=0.15, type=float)
    parser.add_argument("--cam_scales", default=(1.0, 0.5, 1.5, 2.0),
                        help="Multi-scale inferences")

    # Mining Inter-pixel Relations
    parser.add_argument("--conf_fg_thres", default=0.30, type=float)
    parser.add_argument("--conf_bg_thres", default=0.05, type=float)

    # Inter-pixel Relation Network (IRNet)
    parser.add_argument("--irn_network", default="net.resnet50_irn", type=str)
    parser.add_argument("--irn_crop_size", default=512, type=int)
    parser.add_argument("--irn_batch_size", default=32, type=int)
    parser.add_argument("--irn_num_epoches", default=3, type=int)
    parser.add_argument("--irn_learning_rate", default=0.1, type=float)
    parser.add_argument("--irn_weight_decay", default=1e-4, type=float)

    # Random Walk Params
    parser.add_argument("--beta", default=10)
    parser.add_argument("--exp_times", default=8,
                        help="Hyper-parameter that controls the number of random walk iterations,"
                             "The random walk is performed 2^{exp_times}.")
    parser.add_argument("--ins_seg_bg_thres", default=0.25)
    parser.add_argument("--sem_seg_bg_thres", default=0.25)

    # Output Path
    parser.add_argument("--log_name", default="sample_train_eval", type=str)
    parser.add_argument("--cam_weights_name", default="sess/res50_cam.pth", type=str)
    parser.add_argument("--irn_weights_name", default="sess/res50_irn.pth", type=str)
    parser.add_argument("--cam_out_dir", default="result/cam", type=str)
    parser.add_argument("--ir_label_out_dir", default="result/ir_label", type=str)
    parser.add_argument("--sem_seg_out_dir", default="result/sem_seg", type=str)
    parser.add_argument("--ins_seg_out_dir", default="result/ins_seg", type=str)

    # Step
    parser.add_argument("--train_cam_pass", default=True)
    parser.add_argument("--make_cam_pass", default=True)
    parser.add_argument("--eval_cam_pass", default=True)
    parser.add_argument("--cam_to_ir_label_pass", default=False)
    parser.add_argument("--train_irn_pass", default=False)
    parser.add_argument("--make_ins_seg_pass", default=False)
    parser.add_argument("--eval_ins_seg_pass", default=False)
    parser.add_argument("--make_sem_seg_pass", default=False)
    parser.add_argument("--eval_sem_seg_pass", default=False)

    args = parser.parse_args()
    # Seed before any step runs so that every step sees the same RNG state.
    setup_seed(args.seed)

    os.makedirs("sess", exist_ok=True)
    os.makedirs(args.cam_out_dir, exist_ok=True)
    os.makedirs(args.ir_label_out_dir, exist_ok=True)
    os.makedirs(args.sem_seg_out_dir, exist_ok=True)
    os.makedirs(args.ins_seg_out_dir, exist_ok=True)

    pyutils.Logger(args.log_name + '.log')
    print(vars(args))

    if args.train_cam_pass is True:
        import step.train_cam
        timer = pyutils.Timer('step.train_cam:')
        step.train_cam.run(args)

    if args.make_cam_pass is True:
        import step.make_cam
        timer = pyutils.Timer('step.make_cam:')
        step.make_cam.run(args)

    if args.eval_cam_pass is True:
        import step.eval_cam
        timer = pyutils.Timer('step.eval_cam:')
        step.eval_cam.run(args)

    if args.cam_to_ir_label_pass is True:
        import step.cam_to_ir_label
        timer = pyutils.Timer('step.cam_to_ir_label:')
        step.cam_to_ir_label.run(args)

    if args.train_irn_pass is True:
        import step.train_irn
        timer = pyutils.Timer('step.train_irn:')
        step.train_irn.run(args)

    if args.make_ins_seg_pass is True:
        import step.make_ins_seg_labels
        timer = pyutils.Timer('step.make_ins_seg_labels:')
        step.make_ins_seg_labels.run(args)

    if args.eval_ins_seg_pass is True:
        import step.eval_ins_seg
        timer = pyutils.Timer('step.eval_ins_seg:')
        step.eval_ins_seg.run(args)

    if args.make_sem_seg_pass is True:
        import step.make_sem_seg_labels
        timer = pyutils.Timer('step.make_sem_seg_labels:')
        step.make_sem_seg_labels.run(args)

    if args.eval_sem_seg_pass is True:
        import step.eval_sem_seg
        timer = pyutils.Timer('step.eval_sem_seg:')
        step.eval_sem_seg.run(args)
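The help text of `--exp_times` in the script above says the random walk is performed 2^{exp_times} times. One way this can be realized, assumed here for illustration and not confirmed in this thread, is to square the transition matrix `exp_times` times, which yields the matrix for 2^exp_times steps using only `exp_times` multiplications. `random_walk_power` and the toy matrix `T` are hypothetical.

```python
import numpy as np

def random_walk_power(trans_mat, exp_times):
    """Square the transition matrix exp_times times: T -> T^(2**exp_times)."""
    for _ in range(exp_times):
        trans_mat = trans_mat @ trans_mat
    return trans_mat

# Toy row-stochastic transition matrix over three nodes.
T = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])
walked = random_walk_power(T, exp_times=3)  # equivalent to 2**3 = 8 steps
```

With the default `exp_times` of 8 this would correspond to 256 steps, which is why small changes to the parameter have a large effect on how far activations propagate.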
Hi, Jiwoon Ahn
After converting the pseudo labels to COCO-style annotations, I trained Mask R-CNN with ResNet-50-FPN.
But the performance I got is slightly lower than reported: mAP50 is 45.0.
I'd like to ask about your Mask R-CNN training strategy and what kind of data augmentation you adopted.
Thank you !
How does the displacement branch optimize for more than a single instance? The number of instances is missing from the groundtruth.
Would you please explain the reason why sqrt is used in generating CAM?
@jiwoon-ahn What does cam_to_ir_label.py do exactly? Does it create a binary mask?
In my model, the quality of CAM is best when 'cam_eval_thres' is set to 0.35. So I want to know how to set the values of the other parameters. Looking forward to your reply, thanks!
`for x in range(1, max_radius):
    search_dirs.append((0, x))
for y in range(1, max_radius):
    for x in range(-max_radius + 1, max_radius):
        if x * x + y * y < max_radius ** 2:
            search_dirs.append((y, x))`
Thanks for sharing the work. It seems to me that search_dirs covers a half circle instead of a full circle. I am not sure whether I understand it correctly.
Looking forward to your reply.
Hi!
Thanks for this amazing work on weakly-supervised instance segmentation. I am wondering whether you could share the weights file (.pth) for the IRNet model, since I get poor results when generating a boundary map. Thanks so much!
Hi, Jiwoon Ahn,
I would like to know what the path index in the code is. Which part of the paper should I refer to for it?
Additionally, when will the training details be released? Looking forward to following your work.
Thanks.
I noticed that the numbers of convolutional filters in IRNet (in both the class boundary branch and the displacement branch) differ from the settings in your original paper. May I ask which setting generally worked better in your earlier experiments? Best wishes.
Hi Jiwoon Ahn,
Your paper is very good and I'm really interested in it. I've already tried your code, but I cannot achieve the same performance as in the paper. Would you please help me figure out where the problem is?
In my experiments, the learning rates of both CAM and IRN are set to 0.1, while the other hyper-parameters follow the default settings in run_sample.py. My results are as follows:
model | task | my exp. | reported |
---|---|---|---|
CAM | semantic segmentation | 48.1 | 48.3 |
IRN | semantic segmentation | 64.9 | 66.5 |
IRN | instance segmentation | 32.4 | 37.7 |
The CAM models have similar performance, but there are performance gaps between the IRN models in both tasks.
There may be two possible reasons for the gap.
Would you please point out the differences between my experiments and yours that may result in the gap? Thank you!
Dear Jiwoon, in the file 'train_irn.py', I noticed that in the latest commit GN is tuned using the inference data (location). Is this right in the weakly supervised instance segmentation setting? I think the validation set should be used only for evaluation, not for training or tuning parameters. I'm also curious what this affects. Does it improve the mAP? Thanks