Comments (16)

ZHKKKe commented on August 29, 2024
NOTE: This Answer Does Not Match the Question!

Hi! Thanks for your good question!

I agree that using the whole image to calculate the metric makes more sense, so I have checked this problem in detail.

Our dataloader is adapted from the repository https://github.com/jfzhang95/pytorch-deeplab-xception, which provides an excellent implementation of the PascalVOC dataloader. The FixScaleCrop operation used in our code is the same as theirs (check here).

The problem is that they use an input image of size 513x513, while we follow the original DeepLabV2 paper and use an input image of size 321x321. Such a difference is vital for the FixScaleCrop operation during validation, since most samples in PascalVOC are smaller than 513x513. I have tried to train and validate our code with an input image of size 513x513, and the mIOU improves to 77.36% (averaged over two runs).
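
For reference, FixScaleCrop does roughly the following (a sketch written from memory, not a verbatim copy of their code): it resizes the short side of the image to crop_size and then takes a crop_size x crop_size center crop, so pixels outside the crop never contribute to the metric.

from PIL import Image

class FixScaleCrop(object):
    def __init__(self, crop_size):
        self.crop_size = crop_size

    def __call__(self, sample):
        img, mask = sample['image'], sample['label']
        # resize so that the SHORT side equals crop_size, keeping the aspect ratio
        w, h = img.size
        if w > h:
            oh = self.crop_size
            ow = int(1.0 * w * oh / h)
        else:
            ow = self.crop_size
            oh = int(1.0 * h * ow / w)
        img = img.resize((ow, oh), Image.BILINEAR)
        mask = mask.resize((ow, oh), Image.NEAREST)
        # center crop to crop_size x crop_size; the discarded border
        # is never evaluated
        w, h = img.size
        x1 = int(round((w - self.crop_size) / 2.))
        y1 = int(round((h - self.crop_size) / 2.))
        img = img.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
        mask = mask.crop((x1, y1, x1 + self.crop_size, y1 + self.crop_size))
        return {'image': img, 'label': mask}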

Besides, we follow the previous SSL work AdvSemiSeg and disable multi-scale inputs with max fusion to save memory. As reported in the original DeepLabV2 paper, with an input image of size 321x321 this trick can improve mIOU by ~2.55% (please refer to TABLE 3 in their paper).
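
For context, the multi-scale input fusion that we disable roughly works as follows: run the network on several scaled copies of the input and take the element-wise max of the upsampled score maps. A minimal sketch (model and the scale set are placeholders, not PixelSSL code):

import torch
import torch.nn.functional as F

def multi_scale_max_fusion(model, image, scales=(0.5, 0.75, 1.0)):
    # image: 1 x 3 x H x W tensor; returns fused class scores at size H x W
    _, _, h, w = image.shape
    fused = None
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode='bilinear', align_corners=False)
        logits = model(scaled)
        logits = F.interpolate(logits, size=(h, w), mode='bilinear', align_corners=False)
        fused = logits if fused is None else torch.max(fused, logits)
    return fused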

I believe that the reported performance gap does not have a significant impact on a fair comparison of SSL algorithms, since all algorithms perform the same operation. I will try to modify the _val_prehandle function to calculate the metric on the whole image of size 321x321 and share my results with you after I finish it.

BTW, since we adapted the DeepLabV2 code from the repository https://github.com/hfslyc/AdvSemiSeg, we introduced a bug from their code (check here). In PixelSSL, this bug appears here, as reported by @tianzhuotao. I have fixed this bug and found that its impact on performance is negligible. The latest code and pretrained models will be uploaded soon.

If you have any other questions, please feel free to contact me. :)

charlesCXK commented on August 29, 2024

Dear author:
Thanks for your reply. I want to point out that you shouldn't calculate the metrics on the resized image (even if it is directly resized from the raw input). Specifically, if the raw image has a resolution of (200, 600), we should calculate the metrics at the resolution of (200, 600), not (321, 321). So I recommend modifying the class FixScaleCrop as:

class FixScaleCrop(object):
    def __init__(self, crop_size):
        self.crop_size = crop_size

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']

        # do not resize or crop here: keep the original resolution so that
        # the metrics are computed on the whole image
        return {'image': img, 'label': mask}

Accordingly, the batch size for inference should be set to 1, since different images have different resolutions and cannot be combined into the same batch.
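
A minimal sketch of the corresponding evaluation loop (val_dataset, model, and update_confusion_matrix are placeholders, not actual PixelSSL names):

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

# batch_size = 1 because each sample keeps its own resolution
val_loader = DataLoader(val_dataset, batch_size=1, shuffle=False)

model.eval()
with torch.no_grad():
    for sample in val_loader:
        img, gt = sample['image'], sample['label']   # gt: 1 x H x W
        logits = model(img)                          # 1 x C x h x w
        # upsample the prediction to the ground-truth resolution
        logits = F.interpolate(logits, size=gt.shape[-2:], mode='bilinear', align_corners=False)
        pred = logits.argmax(dim=1)                  # 1 x H x W
        update_confusion_matrix(pred, gt)            # hypothetical helper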

ZHKKKe commented on August 29, 2024

@charlesCXK

I have modified the code and tested it. The results I got are the same as yours.

I am not sure why this operation brings such a huge performance degradation. Perhaps it is because more edge information needs to be predicted? (I am not an expert in semantic segmentation.)

I am sorry that I did not find this problem when using the third-party code.
I have pointed out this problem in the documentation of the semantic segmentation demo task (check here) and will try to solve it.

Big thanks for your correction. If you come up with any solution, please let me know. :)

charlesCXK commented on August 29, 2024

Dear author,
I find another issue in the evaluation code.

self.task_func.metrics(activated_pred, gt, inp, self.meters, id_str=mid)

During evaluation, the left and the right models calculate the metrics of the same image twice, so we actually traverse the whole dataset twice. A possible solution for a standard evaluation of the dual-branch network is to average the predictions (before the softmax operation) of the two branches. However, I think the raw evaluation method in the code may not influence the results significantly.
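
A minimal sketch of that averaging (model_l and model_r stand for the two branches; the names are placeholders, not the code actually used in PixelSSL):

import torch

with torch.no_grad():
    logits_l = model_l(img)                # left branch output, before softmax
    logits_r = model_r(img)                # right branch output, before softmax
    fused = (logits_l + logits_r) / 2.0    # average the raw predictions
    prob = torch.softmax(fused, dim=1)     # single softmax after fusion
    pred = prob.argmax(dim=1)              # one prediction, so each image is evaluated once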

ZHKKKe commented on August 29, 2024

Hi, big thanks for your report! It is a bug in the current release.

I fixed it as:

meters.update('{0}_confusion_matrix'.format(id_str), confusion_matrix)

In this way, each task model calculates its own confusion_matrix independently.
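
Conceptually, the per-model accumulation works like this (a simplified sketch of the idea; pred_l, pred_r, gt, meters, and num_classes are placeholders, not the exact PixelSSL code):

import numpy as np

def confusion_matrix(pred, gt, num_classes):
    # pred, gt: flattened integer label arrays of the same length
    valid = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

# 'l' and 'r' identify the two task models; each key is accumulated independently
for id_str, pred in (('l', pred_l), ('r', pred_r)):
    meters.update('{0}_confusion_matrix'.format(id_str),
                  confusion_matrix(pred.flatten(), gt.flatten(), num_classes))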

ZHKKKe commented on August 29, 2024

@charlesCXK
Hi. I have investigated the implementation details of DeepLabV2. The DeepLabV2 used in our code disables both the MSC and CRF tricks, where MSC is a multi-scale input fusion trick and CRF is a post-processing pipeline. Without these two techniques, the reported mIOU in the original paper is about 73.8% (refer to Table 4 in the original paper).

I modified the data loader for validation so that the input image is inferred without cropping. Therefore, mIOU is calculated on the whole input image, and the mIOU of the fully supervised baseline is 73.63%. Note that the short side of the input image is resized to 321 and the long side is scaled by the same ratio; without this operation, a poor result is obtained. Resizing the image back to its original size to calculate mIOU does not affect the final result.
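
In other words, the validation transform now only rescales and never crops; a minimal sketch of such a transform (FixScaleResize is a name I use here for illustration, not the name in the repository):

from PIL import Image

class FixScaleResize(object):
    def __init__(self, base_size=321):
        self.base_size = base_size

    def __call__(self, sample):
        img, mask = sample['image'], sample['label']
        # resize the SHORT side to base_size and scale the long side by the
        # same ratio; no crop, so the whole image is evaluated
        w, h = img.size
        if w > h:
            oh = self.base_size
            ow = int(1.0 * w * oh / h)
        else:
            ow = self.base_size
            oh = int(1.0 * h * ow / w)
        img = img.resize((ow, oh), Image.BILINEAR)
        mask = mask.resize((ow, oh), Image.NEAREST)
        return {'image': img, 'label': mask}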

The latest results of all supported SSL algorithms have been updated in this doc.

Thanks for your help again. :)

charlesCXK commented on August 29, 2024

Dear author:
I think these two tricks (MSC and CRF) are not the point, because we usually do not use them in the segmentation task.

Besides, I want to point out two findings:
(1) The performance of the supervised baseline should be higher. Taking the 1/8-labeled setting as an example, the supervised baseline trains on the labeled set for 40 epochs, while GCT trains on the labeled set for 20*7 = 140 epochs. If you train the supervised baseline for 140 epochs, you can obtain about a 2% mIoU gain.
(2) Whether the GCT algorithm really helps. GCT essentially constrains the consistency of the two branches. We could design the following experiment: use the same 2-branch architecture (two DeepLabV2 models) and the same dataloader (labeled + unlabeled), but drop everything specific to GCT, including the Flaw Detector, the FC loss, and the DC loss. The only thing left is to constrain the consistency of the outputs (class confidence maps after softmax) of the two branches; you will find that this simple method obtains performance similar to GCT (see the sketch below).
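
For clarity, the consistency term in (2) could be as simple as the following sketch (model_l/model_r and the MSE choice are my assumptions, not necessarily the exact loss that was run):

import torch
import torch.nn.functional as F

def consistency_loss(model_l, model_r, unlabeled_img):
    # class confidence maps after softmax, one per branch
    prob_l = torch.softmax(model_l(unlabeled_img), dim=1)
    prob_r = torch.softmax(model_r(unlabeled_img), dim=1)
    # penalize disagreement between the two branches on unlabeled data
    return F.mse_loss(prob_l, prob_r)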

ZHKKKe commented on August 29, 2024

(1) I found that training the 1/8-labeled supervised baseline for longer makes the model overfit to the domain of the PascalVOC dataset.

(2) Have you tried it? Based on my experiments in the paper Dual Student (DS), which is a similar dual-branch SSL method for image classification, the consistency constraint between two independent models does not improve results much. One possible reason is that DeepLabV2 is already pretrained on both ImageNet and COCO, so it converges very quickly. I will validate it.

charlesCXK commented on August 29, 2024

Dear author,
(1) "Overfitting" means we obtain higher accuracy on the training set but lower accuracy on the validation/test set in the meanwhile. Obviously, our case is not "overfitting". It is an "under-fitting" situation for the initial baseline.
(2) I have tried the 2-branch consistency and observed its gain. I think this experiment is one of the essential ablation studies that should have been included in the paper.

ZHKKKe commented on August 29, 2024

@charlesCXK

(1) "overfit to the domain of the PascalVOC dataset" means the poor performance in other datasets, i.e., it has bad domain adaptation performance.

(2) I have validated the 2-branch consistency setup in a number of cases. In the following, I refer to the 2-branch consistency as 2-BC. I would like to use the following experiments to show that GCT is more effective than 2-BC in semi-supervised semantic segmentation:

  • On PascalVOC, when using 1/8 labeled data to train the DeepLabV2 pretrained on COCO, 2-BC reaches a mIoU of 70.13%, which is close to the 70.57% achieved by GCT (a gap of 0.44%). However, this result of 2-BC relies on the COCO pretraining; in many cases, such a good pretraining is unavailable.
  • When COCO pretraining is disabled, with 1/8 labeled data, the mIoU of 2-BC is 68.89%, which is 1.14% lower than GCT (70.03%).
  • Even with COCO pretraining, the advantages of GCT become clear when the amount of labeled data is reduced. Under 1/16 labeled data, the mIoU of 2-BC is 63.55%, while the mIoU of GCT is 65.18%, a gap of 1.63%.

Moreover, in the portrait matting task with 100 labeled samples, the PSNR of GCT is 1.22 higher than that of 2-BC.

The experiments show that 2-BC does help models learn from the unlabeled data; it even outperforms some previous semi-supervised learning methods. However, the flaw detector and the loss functions in GCT still have clear advantages in most cases. I apologize that we forgot to include this ablation experiment in the paper. Thanks.

charlesCXK commented on August 29, 2024

Dear author:
You shouldn't compare 2-BC with GCT like that.
(1) There are still some hyper-parameters in 2-BC, for example, the weight of the consistency loss. I know you have tried many hyper-parameters for GCT and reported the best one in the paper. It is not clear which hyper-parameters you used for 2-BC; I mean, did you try only once? That is not fair. In my case, changing the weight of the consistency loss of 2-BC brings quite different results (I ran experiments with COCO pretraining, and the situation is even more pronounced when there is no pretraining), and I could obtain performance on par with GCT, not 0.44% lower.
(2) Most of us know that if we do not use pretraining, there are two problems we should pay attention to.
First, the hyper-parameters should be changed, especially the learning rate. I don't know whether you kept the learning rate the same as before, but doing so would be obviously wrong. Without a pretrained model, segmentation performance suffers from more randomness, which makes the hyper-parameters very important.
Second, you said "in many cases, such a good pretraining is unavailable", but ResNet parameters pretrained on ImageNet are widely used in many tasks. You shouldn't avoid the problem like that. Besides, the most important point is that we are discussing your paper accepted at ECCV 2020, and we should focus on the setting you used in the paper. Please remember that the reviewers decided to accept your paper because they thought the numbers reported in it were reasonable, not other settings or numbers you report on GitHub. In the paper you submitted to ECCV, the gap between the supervised baseline and GCT is very large and attractive, for example, nearly 4% mIoU gain with the 1/8 labeled set. However, we know that is not the case. Besides, if you really think the new setting is fine, you should carefully train the new baseline and conduct ablation studies on it, then submit another new paper, not this one. I think that as PhDs we should be responsible for our papers, and I suggest you send an email to the ECCV PCs and make it clear. Hoping for your reply, and thank you.

ZHKKKe commented on August 29, 2024

The code of “2-BC” is available in the branch ‘develop_dct’. You can try the experiments under the 1/16 labeled data and the experiments without the COCO pretraining if you want. I have tried several hyper-parameters of 2-BC.

The difference between the results of our code and our paper comes from the aforementioned ‘CenterCrop’ operation during inference. It is not related to the hyper-parameters or the training process, and I am sorry for this issue. The gap between the baseline and GCT is consistent with the paper; hence, the conclusion in our paper is consistent with our code. I have stated the details of our experimental setups in our paper.

I think you sometimes miss my point. Please read the comments carefully. For example, “such a good pretraining” means the COCO pretraining, not the ImageNet pretraining; the backbone is still pretrained on ImageNet. Previous works used COCO pretraining and we followed them. In fact, this setup may not totally make sense, since COCO is a dataset as powerful as PascalVOC.

Thanks.

charlesCXK commented on August 29, 2024

(1) I don't think I missed anything such as ImageNet pretraining. You could review your response and check whether it appeared there. What you said was "When COCO pretraining is disabled". Did you mention ImageNet pretraining? Do not be angry and lose your mind.
(2) You may carefully read my response: do you dare to state publicly that your GCT is only on par with the simple 2-BC under the setting in your paper? Again, please focus on the setting you used in your paper. Since you have tuned it, that is what we should discuss.
(3) I haven't tested the 1/16 subset, but I have tested the 1/4 subset before, and 2-BC was even slightly higher than GCT. I think you may have tested it too, but you didn't mention it.

ZHKKKe commented on August 29, 2024

I think this discussion can end here, but the above content will serve as a reference for future researchers.

2-BC is not an existing SSL method. If you think it is a simple and effective SSL method, maybe you can conduct elaborate experiments and publish a new paper to point out the problems of GCT. I agree that research is a progressive process.

PixelSSL will continue to be developed for future research.
Thanks again.

charlesCXK commented on August 29, 2024

Research is not about showing reviewers what they want to see, but about creating something that really helps. For a PhD, academic reputation is more important than a single paper.

ZHKKKe commented on August 29, 2024

I have published the code, and I welcome others to try and verify it. Please watch your words, thank you.
