hkchengrex / mask-propagation

[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

Home Page: https://hkchengrex.github.io/MiVOS/

License: MIT License

Python 100.00%

computer-vision cvpr2021 deep-learning pytorch segmentation video-object-segmentation video-segmentation


mask-propagation's Issues

Do I need to manually merge BL30K_a, b, and c into the BL30K directory after downloading BL30K?

I see that your code reads 'JPEGImages' and 'Annotations' directly.
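The layout of the extracted archives is not documented here. Assume each part extracts to its own folder containing a nested BL30K/JPEGImages and BL30K/Annotations tree. Under that assumption, a short script like the one below could merge the parts into one root. merge_bl30k_parts is a hypothetical helper, not part of the repository. It moves folders rather than copying them, so disk usage does not double.

```python
import shutil
from pathlib import Path


def merge_bl30k_parts(part_dirs, target_dir):
    """Move every sequence folder from each extracted part into one tree.

    Assumes (hypothetically) that each part contains JPEGImages/<seq>/ and
    Annotations/<seq>/ somewhere beneath it, e.g. BL30K_a/BL30K/JPEGImages.
    Returns the number of sequence folders moved.
    """
    target = Path(target_dir)
    moved = 0
    for part in map(Path, part_dirs):
        for sub in ("JPEGImages", "Annotations"):
            # Collect matches first so moving folders does not disturb the search.
            for src_root in list(part.rglob(sub)):
                if not src_root.is_dir():
                    continue
                dst_root = target / sub
                dst_root.mkdir(parents=True, exist_ok=True)
                for seq in sorted(src_root.iterdir()):
                    dst = dst_root / seq.name
                    if dst.exists():
                        raise FileExistsError(f"duplicate sequence: {seq.name}")
                    shutil.move(str(seq), str(dst))
                    moved += 1
    return moved
```

After merging, the target folder holds a single JPEGImages and a single Annotations directory. This matches the 'JPEGImages' and 'Annotations' layout the question says the code reads.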

How to install thinplate?

Hi, thanks for your great work. How to install thinplate?
/Mask-Propagation-main/dataset/tps.py", line 4, in <module>
    import thinplate as tps
ModuleNotFoundError: No module named 'thinplate'

Did you use Kernel Memory (KM) in your tests?

@hkchengrex Did you use Kernel Memory (KM) in your tests? Which line is it?

Which model should we choose for the next step of training?

Seven models were saved after pre-training with static images. Which model should we choose for the next step of training?

Metric results on the test dataset

After running eval_davis_2016.py, I only get mask files in the output folder. How can I get metric values such as J and J&F? And how can I evaluate the model on my own dataset to get these metrics after using interactive_gui.py?
Thanks for your suggestions
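Below is a rough sketch of the two DAVIS metrics for a single frame. J is the intersection-over-union of the predicted and ground-truth masks. F is an F-measure between their boundaries. The function names are hypothetical. The boundary tolerance here is a fixed pixel radius with square dilation, a simplification of the official DAVIS evaluation, so the numbers will not match official results exactly.

```python
import numpy as np


def region_similarity(pred, gt):
    """J: intersection-over-union of two masks (1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union


def _boundary(mask):
    """Pixels of a boolean mask that have at least one 4-neighbour outside it."""
    padded = np.pad(mask, 1, constant_values=False)
    eroded = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
              & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~eroded


def _dilate(mask, radius):
    """Square dilation by `radius` pixels, built from shifted copies."""
    h, w = mask.shape
    padded = np.pad(mask, radius, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def boundary_f(pred, gt, tolerance=2):
    """F: harmonic mean of boundary precision and recall within `tolerance` px."""
    pb, gb = _boundary(pred.astype(bool)), _boundary(gt.astype(bool))
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0
    if pb.sum() == 0 or gb.sum() == 0:
        return 0.0
    precision = (pb & _dilate(gb, tolerance)).sum() / pb.sum()
    recall = (gb & _dilate(pb, tolerance)).sum() / gb.sum()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The same two functions work on any pair of label PNGs once each object ID is turned into a boolean mask. That makes them usable on a personal dataset that has ground-truth masks.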

Does --benchmark speed up training?

My command line works with --benchmark. Does --benchmark speed up training?

Does the kernelized memory need training?

I used kernelized memory when evaluating STCN but did not use it during training, and the results dropped slightly. Without it, the J&F-Mean is 0.916 on DAVIS 2016 val, 0.853 on DAVIS 2017 val, and 0.755 on DAVIS 2017 test-dev. With kernelized memory, the J&F-Mean is 0.913, 0.852, and 0.750, respectively. Is this because kernelized memory needs training? And why would it need training, given that it has no trainable parameters?
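For reference, here is a sketch of the kernelized memory read described in the KMN paper. For each memory pixel, it finds the best-matching query pixel and centers a 2D Gaussian there. It then multiplies that Gaussian into the exponentiated affinities before normalizing over memory. This NumPy version illustrates the idea under assumed shapes; it is not the repository's implementation. sigma is the kernel's only hyperparameter, and nothing in it is learned.

```python
import numpy as np


def kernelized_readout_weights(affinity, query_hw, sigma=7.0):
    """Turn raw memory-query affinities into KMN-style readout weights.

    affinity: (N_mem, H*W) similarity between each memory pixel and each
              query pixel (assumed layout).
    query_hw: (H, W) spatial size of the query feature map.
    Returns weights of shape (N_mem, H*W) that sum to 1 over the memory axis.
    """
    h, w = query_hw
    ys, xs = np.divmod(np.arange(h * w), w)       # (row, col) of every query pixel
    best = affinity.argmax(axis=1)                # best query pixel per memory pixel
    by, bx = ys[best], xs[best]                   # its coordinates, shape (N_mem,)
    # Gaussian centred on each memory pixel's best match; no learned parameters.
    dist2 = (ys[None, :] - by[:, None]) ** 2 + (xs[None, :] - bx[:, None]) ** 2
    kernel = np.exp(-dist2 / (2 * sigma ** 2))
    # Softmax over memory pixels with exp(affinity) re-weighted by the kernel;
    # subtracting the per-column max leaves the result unchanged.
    shifted = np.exp(affinity - affinity.max(axis=0, keepdims=True))
    weighted = kernel * shifted
    return weighted / weighted.sum(axis=0, keepdims=True)
```

With a very large sigma the kernel is close to 1 everywhere, and the weights reduce to an ordinary softmax. So even without trainable parameters, the kernel changes the readout distribution relative to what the network saw in training.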

How do I get the penultimate frame?

How do I get the penultimate frame? Am I doing it right?

J&F performance on BL30K

Hi, I am doing BL30K training for DAVIS 2017 val (stage 0 and stage 1). What J&F should I expect on DAVIS 2017 val after finishing BL30K training? That way I can check whether my training is correct. I don't think this is included in the README.

RuntimeError: Error(s) in loading state_dict for PropagationNetwork

Hello! I want to train the PropagationNetwork on my own image dataset, so I used the training command CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id retrain_s01 --load_network ./saves/propagation_model.pth --stage 0 (based on the pretrained model S012). It threw a runtime error.

The training command works fine without the --load_network parameter. Could you give me some suggestions?

Which YouTubeVOS do you use to train?

2018 or 2019?

about BL30K

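Hello, could you upload the BL30K dataset to Baidu Netdisk? BL30K is very large, Google Drive has download quota limits, and the download is unstable and keeps getting interrupted.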

YouTubeVOS dataset cannot be downloaded

I tried to download it from the link in the code, but the link shows a download error.

How to run two copies of your code at the same time?

I duplicated your code into two copies and made small changes to each. While one is training, the other cannot be trained. Which parameters need to be changed to train both at the same time? One of my computers has four 2080 Ti GPUs, so there is enough memory.
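One likely cause is that both copies start torch.distributed.launch with the same --master_port (9842 in the commands quoted in other issues). Two jobs cannot bind the same rendezvous port. Assuming that is the problem, the sketch below asks the OS for an unused port. find_free_port is a hypothetical helper. Each copy should also get its own GPUs through CUDA_VISIBLE_DEVICES.

```python
import socket


def find_free_port():
    """Ask the OS for a TCP port that is currently unused on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))          # port 0 lets the OS pick a free port
        return s.getsockname()[1]
```

For example, the second copy could be launched with CUDA_VISIBLE_DEVICES=2,3 and --master_port set to the value returned by find_free_port().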

thin-plate-spline question

Hey,

I'm the author of https://github.com/cheind/py-thin-plate-spline and discovered your dependency here. I've tracked its usage down to

Mask-Propagation/dataset/tps.py

Line 16 in b5d8e61

theta = tps.tps_theta_from_points(c_src, c_dst, reduced=True)

I'm curious: what are you using it for? It seems you augment the training data by warping it with a thin-plate-spline model. Was the library useful in this respect, i.e., did it improve performance measurably? If you don't mind, I'd like to add a link to your paper/model on the project page.
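A thin-plate spline maps a set of source control points exactly onto target points while minimizing bending energy. Randomly jittering the targets therefore yields a smooth random warp for augmentation. The sketch below is the textbook 2D formulation with U(r) = r^2 log r^2, written in NumPy. It does not reproduce py-thin-plate-spline's API (for example, the reduced=True option of tps_theta_from_points), and fit_tps/apply_tps are hypothetical names.

```python
import numpy as np


def _tps_kernel(r2):
    """Radial basis U(r) = r^2 log(r^2), with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r2 == 0, 0.0, r2 * np.log(r2))


def fit_tps(src, dst):
    """Solve for a 2D thin-plate spline mapping src control points onto dst.

    src, dst: (N, 2) arrays. Returns (weights, affine) with shapes (N, 2), (3, 2).
    """
    n = len(src)
    r2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = _tps_kernel(r2)
    P = np.hstack([np.ones((n, 1)), src])          # affine part [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]


def apply_tps(points, src, weights, affine):
    """Warp arbitrary (M, 2) points with a fitted spline."""
    r2 = ((points[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((len(points), 1)), points])
    return P @ affine + _tps_kernel(r2) @ weights
```

The fitted spline reproduces the control points exactly. When the target is a pure affine transform, the bending weights come out as zero.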

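utf-8''BL30K_c.zip contains a utf-8''BL30K_c.tar file

When downloading the BL30K dataset, I found that utf-8''BL30K_c.zip contains a utf-8''BL30K_c.tar file, so it cannot be extracted, while the other files are all in tar format. Have you encountered this problem?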
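If the .zip simply wraps the .tar, extracting in two steps should work. The sketch below uses Python's standard zipfile and tarfile modules. extract_nested is a hypothetical helper, and it assumes the archives are trusted, because tarfile extraction writes whatever paths the archive contains.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_nested(archive, out_dir):
    """Extract a .zip, then extract any .tar files it contained into out_dir.

    Returns the sorted relative paths of all extracted files.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(out)
    for name in names:
        inner = out / name
        if inner.is_file() and tarfile.is_tarfile(str(inner)):
            with tarfile.open(str(inner)) as tf:
                tf.extractall(out)       # only do this for trusted archives
            inner.unlink()               # drop the intermediate .tar
    return sorted(p.relative_to(out).as_posix() for p in out.rglob("*") if p.is_file())
```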

Why does DAVIS 2016 val not use kernelized memory?

How to skip pre-training on static images and BL30K and go directly to VOS training?

How can I skip pre-training on static images and BL30K and go directly to VOS training? What changes need to be made to the code and the command line?

Pre-training on the BL30K dataset after pre-training on static images

As far as I can see, in the pre-training stage on static images, "single_object" in PropagationNetwork is True, so MaskRGBEncoderSO is used.
When I try to load the model pre-trained in that stage for BL30K pre-training or main training, "single_object" is now False and the model uses MaskRGBEncoder instead. As a result, the checkpoint cannot be loaded.
Here is the error:

Traceback (most recent call last):
  File "train.py", line 68, in <module>
    total_iter = model.load_model(para['load_model'])
  File "/content/Mask-Propagation/model/model.py", line 180, in load_model
    self.PNet.module.load_state_dict(network)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PropagationNetwork:
    size mismatch for mask_rgb_encoder.conv1.weight: copying a param with shape torch.Size([64, 4, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 5, 7, 7]).

Can you explain how to fix it?
Thank you so much.
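The traceback shows a first convolution with 4 input channels in the checkpoint and 5 in the current model. One common workaround when a layer gains input channels is to keep the pretrained weights and zero-initialize the new channel. The layer then initially behaves as it did before. The sketch below shows this on a NumPy array. expand_input_channels is hypothetical, and this is not necessarily how the repository's own loader handles the mismatch.

```python
import numpy as np


def expand_input_channels(weight, new_in_channels):
    """Grow a conv weight (out, in, kh, kw) so it accepts extra input channels.

    Existing channels keep their pretrained values; the added ones start at
    zero, so the layer initially ignores the new input.
    """
    out_c, in_c, kh, kw = weight.shape
    if new_in_channels < in_c:
        raise ValueError("cannot shrink the number of input channels")
    extra = np.zeros((out_c, new_in_channels - in_c, kh, kw), dtype=weight.dtype)
    return np.concatenate([weight, extra], axis=1)
```

With a PyTorch state dict, the same concatenation would be applied to the tensor stored under mask_rgb_encoder.conv1.weight before calling load_state_dict.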

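Some questions about the code

Hello, I have a question about your code. Suppose I save two key feature maps and two value feature maps for each frame, as follows:

Should self.PNet(Fs[:,0], Ms[:,0]) in PropagationModel then also produce two key feature maps and two value feature maps?

Should the inputs to segment in PropagationNetwork correspondingly also be two key feature maps and two value feature maps?
Besides that, what other changes are needed?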

Does the full model mean the s012 model?

About training stages

Sorry to bother you. I plan to train with my own image dataset and video dataset. Can I skip stage 1, i.e., train stage 0 with the image dataset and then stage 2 with the video dataset (which I have converted to the DAVIS format)?

Has anyone experienced CUDA out of memory?

Hi! Thank you for sharing this great code. I was wondering what machine you used for training and inference. When running inference on my own data, some of the bigger video sequences run out of CUDA memory (V100, 16 GB).
I also tried loading the model in fp16, but I suspect accuracy was compromised, because some folders had nothing segmented.

Please let me know if there's anything you'd suggest trying. Thank you very much!

How do I get quantitative results after testing on the DAVIS 2016 dataset?

Hi, how do I get quantitative results after testing on the DAVIS 2016 dataset?
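Assume per-frame J and F scores are already available for each sequence. The summary numbers are then averages: J-Mean and F-Mean average over sequences, and J&F-Mean is the mean of those two. A hypothetical helper is sketched below. It ignores details of the official DAVIS evaluation, such as per-object bookkeeping and which frames are excluded, so prefer the official evaluation code for reported numbers.

```python
import numpy as np


def summarize(per_sequence):
    """Aggregate per-sequence J and F scores into DAVIS-style means.

    per_sequence: {name: (list_of_J_per_frame, list_of_F_per_frame)}
    """
    j_means = [np.mean(j) for j, _ in per_sequence.values()]
    f_means = [np.mean(f) for _, f in per_sequence.values()]
    j_mean, f_mean = float(np.mean(j_means)), float(np.mean(f_means))
    return {"J-Mean": j_mean, "F-Mean": f_mean, "J&F-Mean": (j_mean + f_mean) / 2}
```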

Question about result on DAVIS16

Why do I get this result with your pre-trained model? Thanks!

About BL30K

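Hello, I downloaded all six BL30K archives and extracted all of them. But during the second pre-training stage, I get an error that 'data/dangjisheng/BL30K/a/BL30K/Annotations/kea03423/00020.png' cannot be found, and I don't know why. I downloaded all six archives and extracted them into one directory, so why does it report a missing file? Looking forward to your reply.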
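The path in the error contains BL30K/a/BL30K/, which may indicate an extra directory level from extraction, though that is only a guess. To see which frames are actually missing, a short check like the one below can help. It assumes a DAVIS-style layout with .jpg frames under JPEGImages and .png masks under Annotations. find_missing_annotations is a hypothetical helper.

```python
from pathlib import Path


def find_missing_annotations(root):
    """List the annotation files that are missing for frames under JPEGImages.

    Assumes root/JPEGImages/<seq>/<frame>.jpg and root/Annotations/<seq>/<frame>.png.
    Returns the missing annotation paths relative to root.
    """
    root = Path(root)
    missing = []
    for img in sorted((root / "JPEGImages").glob("*/*.jpg")):
        ann = root / "Annotations" / img.parent.name / (img.stem + ".png")
        if not ann.exists():
            missing.append(ann.relative_to(root).as_posix())
    return missing
```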

For the current frame, how do I search for adjacent frames?

For the current frame, how do I find the next K frames? Or, how does your code randomly sample multiple frames at once?
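Below is a minimal sketch of both options, assuming frames are indexed 0..num_frames-1. next_k_frames returns the K following indices. sample_frames draws a sorted set of distinct indices whose consecutive gaps are at most max_skip, a common way to sample training clips for STM-like models. Both helpers are hypothetical, and the repository's dataset code may sample differently.

```python
import random


def next_k_frames(t, k, num_frames):
    """Indices of up to k frames after frame t, clipped to the video length."""
    return list(range(t + 1, min(t + 1 + k, num_frames)))


def sample_frames(num_frames, num_samples=3, max_skip=5, rng=random):
    """Pick sorted, distinct frame indices with consecutive gaps of 1..max_skip."""
    if num_frames < num_samples:
        raise ValueError("video is shorter than the number of samples")
    # Leave room after the start for the remaining samples.
    idx = [rng.randrange(num_frames - num_samples + 1)]
    for k in range(1, num_samples):
        remaining = num_samples - k - 1          # samples still needed after this one
        hi = min(idx[-1] + max_skip, num_frames - 1 - remaining)
        idx.append(rng.randint(idx[-1] + 1, hi))
    return idx
```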

license file missing

Does the repo/project have any usage restrictions? Is it also available for commercial use?

Confusion about Fig.6

I apologize for bothering you.
Is Figure 6 generated by compiling relevant data from multiple video sequences, calculating their mean and interquartile range, and then plotting them? If so, could you please specify which dataset's video sequences you used?
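The procedure described in the question (mean plus interquartile range across sequences) can be computed as shown below, assuming one curve per sequence stored as the rows of an array. This illustrates the computation only. It does not confirm how Figure 6 was produced or which dataset was used.

```python
import numpy as np


def mean_and_iqr(curves):
    """Per-step mean and interquartile range across sequences.

    curves: (num_sequences, num_steps) array, e.g. one mIoU curve per video.
    Returns (mean, q25, q75), each of shape (num_steps,).
    """
    curves = np.asarray(curves, dtype=float)
    q25, q75 = np.percentile(curves, [25, 75], axis=0)
    return curves.mean(axis=0), q25, q75
```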

only mask propagation?

Hi, your method consists of three core components: interaction-to-mask, mask propagation, and difference-aware fusion. Does the semi-supervised video object segmentation code released in this project include only mask propagation?

How to save the feature maps of many memory frames?

There is a part of your code that I don't understand. Should each memory frame be stored separately, or should the key-value feature maps and the content feature maps of the memory frames be concatenated and saved together? Which line saves the memory frames?
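In STM-style memory networks, a common way to store memory frames is to compute each frame's key and value feature maps and concatenate them along a time axis as frames are added. The class below is a hypothetical sketch with assumed shapes (C_k, T, H, W) and (C_v, T, H, W). The repository may organize its memory differently.

```python
import numpy as np


class MemoryBank:
    """Stores per-frame key/value feature maps, concatenated along time.

    Keys are (C_k, T, H, W) and values (C_v, T, H, W); each added memory
    frame contributes one slice along the T axis.
    """

    def __init__(self):
        self.keys = None
        self.values = None

    def add(self, key, value):
        # key: (C_k, H, W), value: (C_v, H, W) for one frame -> add a T axis
        key, value = key[:, None], value[:, None]
        if self.keys is None:
            self.keys, self.values = key, value
        else:
            self.keys = np.concatenate([self.keys, key], axis=1)
            self.values = np.concatenate([self.values, value], axis=1)

    def __len__(self):
        return 0 if self.keys is None else self.keys.shape[1]
```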

The server remained unresponsive for a long time when I tried to train your model.

When I ran this line of code on our server, the server did not respond for a long time. Do you know why?

CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=1 train.py --id retrain_s0 --stage 0 --batch_size 4

About Fig.6 again

Do you include the background when calculating mIoU?
The results I calculated are somewhat unusual: the decrease in mIoU (-0.07 to -0.17) is not as large as shown in Figure 6 (-0.1 to -0.4).

I used the original data you provided, 81.5 and 83.1.

About STCN, I used the original data you provided, 83.3(computed with official model without top-k) and 85.3.

Here is my code:

import numpy as np
from PIL import Image


def calculate_iou(pred_mask, true_mask, class_id):
    # Binary masks for one class in the predicted and ground-truth label maps
    pred_class = (pred_mask == class_id)
    true_class = (true_mask == class_id)

    intersection = np.logical_and(pred_class, true_class).sum()
    union = np.logical_or(pred_class, true_class).sum()

    # A class absent from both masks counts as a perfect match (IoU = 1),
    # which is what the original 1e-6 terms produced
    if union == 0:
        return 1.0
    return intersection / union


def calculate_miou(pred_paths, gt_paths):
    # pred_paths and gt_paths are aligned lists of label-map PNG paths
    pred_masks = [np.array(Image.open(path)) for path in pred_paths]
    gt_masks = [np.array(Image.open(path)) for path in gt_paths]

    # Assuming class labels start from 0; take the highest label over all
    # ground-truth frames rather than only the first predicted frame
    num_classes = int(max(mask.max() for mask in gt_masks)) + 1

    class_ious = np.zeros(num_classes)

    for class_id in range(num_classes):
        ious = [calculate_iou(pred, gt, class_id)
                for pred, gt in zip(pred_masks, gt_masks)]
        class_ious[class_id] = np.mean(ious)

    # Class 0 (background) is included in this mean
    mean_iou = np.mean(class_ious)

    return mean_iou, class_ious

What if I didn't use BL30K?

I did not use your pretrained model and trained your network from scratch. My pre-training used only static images, without BL30K. After training, the J&F mean on the DAVIS 2017 dataset was only 70.6%. Is this result normal? What would a normal result be without BL30K?

subprocess.CalledProcessError

Hi, thanks for your great work! When I try to run CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id retrain_s0 --stage 0, I run into this problem. Can you help me?

File "/home/longma/anaconda2/envs/p3torchstm/lib/python3.6/site-packages/torch/distributed/launch.py", line 242, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/longma/anaconda2/envs/p3torchstm/bin/python', '-u', 'train.py', '--local_rank=1', '--id', 'retrain_s0', '--stage', '0']' returned non-zero exit status 1.

About top-k filtering

I am sorry to bother you.
When I use top-k, I find that my input values can sometimes be large.
After 'values.exp_()', there are INF values.
To avoid overflow, I changed the computation as follows:

Is it reasonable to make this change?
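The change itself is not preserved above, but the standard fix is to subtract each column's maximum before exponentiating. Every exponent then becomes at most 0, so exp() cannot overflow, and the shift cancels when normalizing. The NumPy sketch below applies that trick to a top-k softmax over the memory axis. topk_softmax and the (N_memory, N_query) layout are assumptions.

```python
import numpy as np


def topk_softmax(affinity, k):
    """Softmax over the memory axis (0), keeping only the top-k entries per column.

    Subtracting each column's maximum before exp() keeps every exponent <= 0,
    so exp() cannot overflow to inf; the shift cancels in the normalisation.
    """
    k = min(k, affinity.shape[0])
    idx = np.argpartition(-affinity, k - 1, axis=0)[:k]   # top-k row indices per column
    values = np.take_along_axis(affinity, idx, axis=0)     # (k, N_query)
    values = np.exp(values - values.max(axis=0, keepdims=True))
    values /= values.sum(axis=0, keepdims=True)
    out = np.zeros_like(affinity, dtype=float)
    np.put_along_axis(out, idx, values, axis=0)            # scatter back; others stay 0
    return out
```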

RuntimeError: CUDA error: out of memory

How many GPUs do you need for testing on DAVIS and YouTube-VOS? I keep getting out-of-memory errors during testing. I used the model trained on static images directly for VOS training, skipping BL30K pre-training. Is that OK?

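How to take the previous frame from all memory frames?

Hello, I have a question I'd like to ask.

This is the memory read between all memory frames and the current frame. If I want to match using only the frame before the current frame, how do I take the last frame kept in memory? In other words, how do I get the previous frame, and how is this implemented in the code? Thanks.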
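Below is a minimal sketch of a space-time memory read that can optionally restrict matching to the most recent memory frame, by slicing the last index on the time axis. It assumes memory keys/values shaped (C, T, H, W), with frames appended in temporal order. It uses a scaled dot-product affinity as in STM. The last memory frame equals the previous video frame only if every frame is added to memory. memory_read is hypothetical, and the repository's affinity may differ.

```python
import numpy as np


def memory_read(mem_keys, mem_values, query_key, last_only=False):
    """Softmax over memory affinities, then a weighted sum of memory values.

    mem_keys:   (C_k, T, H, W)   mem_values: (C_v, T, H, W)
    query_key:  (C_k, H, W)
    If last_only, only the most recent memory frame (index -1 on T) is used.
    Returns the read-out features, shape (C_v, H, W).
    """
    if last_only:
        mem_keys, mem_values = mem_keys[:, -1:], mem_values[:, -1:]
    ck, _, h, w = mem_keys.shape
    mk = mem_keys.reshape(ck, -1)                       # (C_k, T*H*W)
    qk = query_key.reshape(ck, -1)                      # (C_k, H*W)
    affinity = mk.T @ qk / np.sqrt(ck)                  # (T*H*W, H*W)
    affinity -= affinity.max(axis=0, keepdims=True)     # numerical stability
    weights = np.exp(affinity)
    weights /= weights.sum(axis=0, keepdims=True)       # softmax over memory positions
    mv = mem_values.reshape(mem_values.shape[0], -1)    # (C_v, T*H*W)
    return (mv @ weights).reshape(-1, h, w)
```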

When will you release the code for the YouTube dataset?

Thank you very much for your great work. When will you release the code for the YouTube dataset? I'm looking forward to it!

Is it possible to further improve accuracy by pre-training with other static images and synthetic videos?

Is it possible to further improve accuracy by pre-training with other static images and synthetic videos, such as the COCO dataset?

Out of memory reported with batch_size=1 when testing the model

Testing runs out of memory even with batch_size=1. Which parameters should be changed to reduce the memory footprint?

about batch size

The default value of your batch_size is 1. If we increase the batch size, will the accuracy be improved? Have you done any relevant experiments?

About BL30K

How to install thinplate manually?

(mivos2) dangjisheng@ubuntui:/data/dangjisheng$ pip install git+git://github.com/cheind/py-thin-plate-spline
Collecting git+git://github.com/cheind/py-thin-plate-spline
Cloning git://github.com/cheind/py-thin-plate-spline to /tmp/pip-g3zq_wu4-build
error: Couldn't set refs/heads/master
fatal: update_ref failed for ref 'HEAD':

How can I install thinplate manually? I switched to a different computer, and I couldn't install thinplate with the commands you provided.
Thank you very much for your help. I would like to reproduce your code as soon as possible and start some work based on it.

Does the backbone use ResNet-50?

Does the backbone use ResNet-50? Would accuracy improve if the backbone were replaced with ResNet-101?

Why don't you use top_k and km during the training phase?

Looking at your code, I was a little confused about why you don't use top_k and KM during training.
But top_k and KM are used during evaluation, right? Is it bad to use top_k and KM in training?
