
vida's Introduction

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation

Jiaming Liu*, Senqiao Yang*, Peidong Jia, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang

Overview

Since real-world machine systems operate in non-stationary and continually changing environments, the Continual Test-Time Adaptation (CTTA) task has been proposed to adapt a pre-trained model to continually changing target domains. Existing methods mainly focus on model-based adaptation, which leverages self-training to extract target-domain knowledge. However, pseudo labels can be noisy and the updated model parameters are uncertain under dynamic data distributions, leading to error accumulation and catastrophic forgetting during continual adaptation. To tackle these challenges and maintain model plasticity, we design a Visual Domain Adapter (ViDA) for CTTA that explicitly handles both domain-specific and domain-agnostic knowledge. Specifically, we first comprehensively explore the different domain representations of adapters with trainable high-rank and low-rank embedding spaces. We then inject ViDAs into the pre-trained model, leveraging high-rank and low-rank prototypes to adapt to the current domain distribution and to maintain continual domain-shared knowledge, respectively. To adapt to the varying distribution shift of each sample in the target domains, we further propose a Homeostatic Knowledge Allotment (HKA) strategy, which adaptively merges the knowledge from ViDAs with different rank prototypes. Extensive experiments on four widely used benchmarks demonstrate that our method achieves state-of-the-art performance on both classification and segmentation CTTA tasks. In addition, our method can be regarded as a novel transfer paradigm and shows promising results in zero-shot adaptation of foundation models to continual downstream tasks and distributions.
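For intuition only, here is a minimal, hypothetical PyTorch sketch of the idea described above: a frozen linear layer augmented with a low-rank and a high-rank adapter branch whose outputs are fused with per-sample weights. All names, ranks, and the fusion rule are illustrative assumptions, not the repository's actual implementation (in the paper, the HKA strategy derives the fusion weights from prediction uncertainty).

import torch
import torch.nn as nn

class ViDALinear(nn.Module):
    """Illustrative sketch: frozen Linear plus low-rank and high-rank adapter branches."""
    def __init__(self, base: nn.Linear, low_rank: int = 4, high_rank: int = 128):
        super().__init__()
        self.base = base                      # pre-trained weights, kept frozen
        for p in self.base.parameters():
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        # low-rank branch: continual domain-shared (domain-agnostic) knowledge
        self.low_down = nn.Linear(d_in, low_rank, bias=False)
        self.low_up = nn.Linear(low_rank, d_out, bias=False)
        # high-rank branch: current domain-specific knowledge
        self.high_down = nn.Linear(d_in, high_rank, bias=False)
        self.high_up = nn.Linear(high_rank, d_out, bias=False)

    def forward(self, x, lam_low: float = 1.0, lam_high: float = 1.0):
        # lam_low / lam_high stand in for the per-sample fusion weights
        # that HKA would assign (e.g. from prediction uncertainty).
        return (self.base(x)
                + lam_low * self.low_up(self.low_down(x))
                + lam_high * self.high_up(self.high_down(x)))

# hypothetical usage
layer = ViDALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768), lam_low=0.7, lam_high=0.3)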

Installation

Please create and activate the following conda environment.

# It may take several minutes for conda to solve the environment
conda update conda
conda env create -f environment.yml
conda activate vida 

Classification Experiments

ImageNet-to-ImageNetC task

We release the code for the baseline methods with both CNN and ViT backbones.

  • CNN as the backbone
cd imagenet
bash ./bash/source_cnn.sh # Source model directly test on target domain
bash ./bash/tent_cnn.sh # Tent 
bash ./bash/cotta_cnn.sh # CoTTA
  • ViT as the backbone

Our source model is from timm; you can download it directly through the code.
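As a rough illustration (the concrete architecture string used by the scripts is an assumption here; see the configs under ./bash), loading a ViT source model from timm looks like this:

import timm

# Hypothetical example: the exact model name used by the repo's scripts may differ.
model = timm.create_model("vit_base_patch16_224", pretrained=True)
model.eval()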

And our source ViDA model is here

cd imagenet
bash ./bash/source_vit.sh # Source model directly test on target domain
bash ./bash/tent_vit.sh # Tent 
bash ./bash/cotta_vit.sh # CoTTA
bash ./bash/vida_vit.sh # ViDA

Cifar10-to-Cifar10C task

Please load the source model from here

And our source ViDA model is here

cd cifar
bash ./bash/cifar10/source_vit.sh # Source model directly test on target domain
bash ./bash/cifar10/tent_vit.sh # Tent 
bash ./bash/cifar10/cotta_vit.sh # CoTTA
bash ./bash/cifar10/vida_vit.sh # ViDA

Cifar100-to-Cifar100C task

Please load the source model from here

And our source ViDA model is here

cd cifar
bash ./bash/cifar100/source_vit.sh # Source model directly test on target domain
bash ./bash/cifar100/tent_vit.sh # Tent 
bash ./bash/cifar100/cotta_vit.sh # CoTTA
bash ./bash/cifar100/vida_vit.sh # ViDA

For the segmentation code, please refer to CoTTA and SVDP. As for the source model, you can directly use SegFormer trained on Cityscapes.
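As a rough, hypothetical illustration of obtaining such a source model (the actual segmentation experiments build on the CoTTA/SVDP codebases, which use mmsegmentation; the Hugging Face checkpoint name below is an assumption):

from transformers import SegformerForSemanticSegmentation

# Illustrative only: a SegFormer (MiT-B5) model fine-tuned on Cityscapes.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
)
model.eval()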

Citation

Please cite our work if you find it useful.

@article{liu2023vida,
  title={ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation},
  author={Liu, Jiaming and Yang, Senqiao and Jia, Peidong and Lu, Ming and Guo, Yandong and Xue, Wei and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2306.04344},
  year={2023}
}

Acknowledgement

Data links


vida's Issues

Adapter Code

When are you planning to release your adapter code?
I'd appreciate it if you could let me know :)

Source VIDA Model vs. Source Model

Hi,

First off, thank you for the interesting work!

I'm conducting experiments with CIFAR-10 and noticed references to both a "source VIDA model" and a "source model." Could you kindly explain the difference between these two?

Additionally, in order to accurately replicate the VIDA model, could you confirm whether the correct command is: python cifar10c_vit.py --cfg cfgs/cifar10/vida.yaml --checkpoint vit_1_128_vida.t7 --data_dir [data_dir] (utilizing the source VIDA model), or should it be python cifar10c_vit.py --cfg cfgs/cifar10/vida.yaml --data_dir [data_dir] (utilizing the source model)?

Thank you very much for your time and help.

Thanks!

Pretrained ViDA Model and Code for ResNet-50

Hi,

I would like to request the pretrained ViDA model and the corresponding code for ResNet-50. If possible, please provide a download link for the model and the code.

Thank you.

About segmentation codes

Hi, thanks for your impressive work! :)
I was really looking forward to this code being released, so thank you for sharing!

By the way, may I ask whether you can share the segmentation code as well?

ViDA's results on ResNet50+ImageNet-C

Dear Yangsenqiao! Thanks for your work.
Since you did not provide the configuration file for ViDA on ResNet50, we are unable to reproduce the results for that part. Could you please provide the relevant configuration file?

Unfair comparison with CoTTA in segmentation tasks with lower learning rate

Hi, I want to first congratulate you on your excellent work!

However, I have experimented with CoTTA at a learning rate of 3e-4, and its performance is impressive even without test-time augmentation:
Revisiting 0: Fog = 70.17 Night = 44.49 Rain = 65.23 Snow = 62.74

Compared with yours in Table 4:
Revisiting 0: Fog = 71.6 Night = 43.2 Rain = 66.0 Snow = 63.4

It seems that not only in this paper but also in VDP and SVDP, the segmentation comparison is unfair to the baselines, which are run with a learning rate of 7.5e-6. I kindly hope you can double-check this.

Regards.

About Gradient Problem and Resnet Model

In your paper, only the adapter is updated. However, although your ImageNet code only explicitly enables gradients for the adapter, the model is initialized with all parameters requiring gradients.

def inject_trainable_vida(...):
    # model is already initialized with all parameters requiring gradients.
    for _module in model.modules():
        if _module.__class__.__name__ in target_replace_module:
            for name, _child_module in _module.named_modules():
                if _child_module.__class__.__name__ == "Linear":
                    # ... inject the adapter
                    _module._modules[name].vida_up.weight.requires_grad = True
                    _module._modules[name].vida_down.weight.requires_grad = True

                    require_grad_params.extend(
                        list(_module._modules[name].vida_up2.parameters())
                    )
                    require_grad_params.extend(
                        list(_module._modules[name].vida_down2.parameters())
                    )
                    _module._modules[name].vida_up2.weight.requires_grad = True
                    _module._modules[name].vida_down2.weight.requires_grad = True
                    names.append(name)

    print([name for name, param in model.named_parameters() if param.requires_grad])
    # will contain all modules of the model (backbone + vida)

This means that when the code actually runs, all parts are updated.
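For context, disabling backbone gradients would look roughly like this minimal sketch (assuming, as in the snippet above, that the adapter parameter names contain "vida"):

# Hypothetical sketch: freeze everything except the injected ViDA parameters.
for name, param in model.named_parameters():
    param.requires_grad = "vida" in name

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable tensors, e.g. {trainable[:4]}")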

When I tried disabling gradients for the backbone, I couldn't reach the expected performance; the results below are almost 1 percent higher (in mean error) than the result reported in the paper.
(Adapter_LR: 2e-7, EMA_MT: 0.8)

Metric Gaussian Shot Impulse Defocus Glass Motion Zoom Snow Frost Fog Brightness Contrast ElasticTransform Pixelate JPEG Mean
Error 48.68 42.72 42.80 52.56 59.10 44.78 49.74 39.22 42.36 40.28 24.34 58.50 50.64 33.64 32.96 44.15

Do you have any idea why this happens? That would be a great help to us.

Also, could you please provide the code for the ResNet part and the warm-up model, or just the warm-up model? That would greatly help our research.

question about adapter initialization for segmentation vida

Hi, I have a question about the initialization of the adapters in the ViDA segmentation model.

According to the ViDA OpenReview discussion for NeurIPS '23 at this link, I noticed that you tried three versions of adapter initialization for the Cityscapes-to-ACDC experiments: scratch, ImageNet-pretrained, and source-pretrained.
(I think I read this table somewhere in a paper, maybe in the supplementary material for ICLR '24(?), but I cannot find it right now.)

Since the experiments section of the paper clearly states that you used the SegFormer Mix Transformer as the backbone of your segmentation model, I am curious how you pretrained the Mix Transformer encoder on the ImageNet dataset, since that dataset is specifically designed for image classification.

It seems possible to extract image features with the Mix Transformer encoder and then feed them into an MLP head for image classification, but I wanted to make sure.
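For what it's worth, here is a rough, hypothetical sketch of that idea using the Hugging Face port of the Mix Transformer encoder with a classification head (the checkpoint name is an assumption; the authors' actual pretraining pipeline may differ):

import torch
from transformers import SegformerForImageClassification

# Illustrative only: MiT-B0 encoder with an ImageNet-1k classification head.
model = SegformerForImageClassification.from_pretrained("nvidia/mit-b0")
model.eval()

dummy = torch.randn(1, 3, 224, 224)   # a fake image batch
with torch.no_grad():
    logits = model(pixel_values=dummy).logits
print(logits.shape)                   # (1, 1000) class logits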

Thank you!
