paranioar / unipt

[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"

License: Apache License 2.0

Topics: cross-modal, parameter-efficient-learning, parameter-efficient-tuning, transfer-learning, memory-efficient-learning, memory-efficient-tuning, parameter-efficient-fine-tuning


UniPT

PyTorch implementation for CVPR2024 paper of “UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory”.

It is built on top of VSE-infty, CLIP-ViL, CLIP4Clip, MDETR, LST, and Awesome_Pretraining_Transfering.

If you run into any problems, please contact me at [email protected] ([email protected] is deprecated).

Introduction

The framework of UniPT:

Overview of the framework with (a) parallel interaction ($\varphi$) and (b) confidence aggregation ($\theta$) layers. The former extracts more discriminative features at each layer independently, guided by the most powerful (final) output features, while the latter learns a dynamic, optimal combination strategy over the blended features of each layer for the final domain adaptation.
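
For intuition, here is a minimal PyTorch sketch of the two layer types. The dot-product attention form, the mean-pooled confidence scoring, and all dimension choices are illustrative assumptions rather than the official implementation; see the task-specific codebases below for the real definitions.

```python
import torch
import torch.nn as nn

class ParallelInteraction(nn.Module):
    """Sketch of the parallel interaction layer (phi): one intermediate
    layer's features are refined under the guidance of the backbone's
    final output. The attention form here is an assumption."""
    def __init__(self, dim: int, reduced_dim: int):
        super().__init__()
        self.proj_guide = nn.Linear(dim, reduced_dim)  # projects the final output
        self.proj_layer = nn.Linear(dim, reduced_dim)  # projects intermediate features

    def forward(self, guide: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # guide: (B, N, D) final backbone output; feats: (B, N, D) one layer's features
        q = self.proj_guide(guide)                     # (B, N, d)
        v = self.proj_layer(feats)                     # (B, N, d)
        attn = torch.softmax(q @ v.transpose(-2, -1) / v.shape[-1] ** 0.5, dim=-1)
        return attn @ v                                # guided, reduced features

class ConfidenceAggregation(nn.Module):
    """Sketch of the confidence aggregation layer (theta): predicts one
    confidence score per layer and mixes the per-layer features with
    the softmax-normalized weights."""
    def __init__(self, reduced_dim: int):
        super().__init__()
        self.scorer = nn.Linear(reduced_dim, 1)

    def forward(self, layer_feats: list) -> torch.Tensor:
        stacked = torch.stack(layer_feats, dim=0)            # (L, B, N, d)
        pooled = stacked.mean(dim=2)                         # (L, B, d)
        weights = torch.softmax(self.scorer(pooled), dim=0)  # (L, B, 1), over layers
        return (weights.unsqueeze(2) * stacked).sum(dim=0)   # (B, N, d)
```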

Task & Model Details

Image-Text Retrieval: VSE-infty with the strongest combination of a BERT-base model and a ResNeXt-101(32×8d) backbone pre-trained on Instagram (WSL).

Video-Text Retrieval: CLIP4Clip with the pre-trained CLIP network using Text Transformer and ViT-B/32 models.

Question Answering: CLIP-ViL, which uses the CLIP image backbone, encodes the text into a word-embedding sequence, and fuses the two modalities with a cross-modal Transformer.

Visual Grounding: MDETR with a pre-trained ResNet-101 vision encoder, a RoBERTa-base text encoder, and a query-based encoder-decoder Transformer.

Please refer to their respective README.md files for the detailed settings.

Guidance for Applications

We summarize where UniPT is defined and invoked in each codebase as follows (a hypothetical integration sketch follows the list).
We hope these pointers help you quickly realize your own ideas on top of UniPT.

  1. CLIP-ViL: UniPT is defined and called at class LXRTEncoder(nn.Module) from CLIP-ViL/src/lxrt/modeling.py.

  2. CLIP4Clip: UniPT is defined at CLIP4Clip/modules/module_adapter.py, and called at Lines 251-261 of CLIP4Clip/modules/modeling.py.

  3. VSE-infty: UniPT is defined at VSE-infty/lib/adapter_for_cnn.py and VSE-infty/lib/adapter_for_transformer.py, and called at VSE-infty/lib/encoders.py.

  4. MDETR: UniPT is defined and called at class Transformer(nn.Module) from MDETR/models/transformer.py.
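
Beyond those four codebases, the following hypothetical sketch shows the shared integration pattern: freeze the backbone, run it without storing activations, collect per-layer features, and train only the lightweight side modules, which is where the parameter and memory savings come from. The `blocks` attribute and the adapter classes (from the sketch in the Introduction) are assumptions for illustration, not an API of this repository.

```python
import torch
import torch.nn as nn

class UniPTWrapper(nn.Module):
    """Hypothetical wrapper hooking UniPT-style side modules onto a
    frozen backbone that exposes an iterable `blocks` attribute."""
    def __init__(self, backbone: nn.Module, dim: int, reduced_dim: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze: no backbone gradients
            p.requires_grad_(False)
        self.interactions = nn.ModuleList(
            ParallelInteraction(dim, reduced_dim) for _ in backbone.blocks
        )
        self.aggregate = ConfidenceAggregation(reduced_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        with torch.no_grad():                  # backbone stores no activations
            h = x
            for blk in self.backbone.blocks:
                h = blk(h)
                feats.append(h)
        guide = feats[-1]                      # final output guides every layer
        side = [phi(guide, f) for phi, f in zip(self.interactions, feats)]
        return self.aggregate(side)            # (B, N, reduced_dim)
```

Only the side modules (`interactions` and `aggregate`) carry gradients, so passing `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer trains just the UniPT parameters.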

Reference

If UniPT is useful for your research, please cite the following paper:

  @article{Diao2023UniPT,
      title={UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory},
      author={Diao, Haiwen and Wan, Bo and Zhang, Ying and Jia, Xu and Lu, Huchuan and Chen, Long},
      journal={arXiv preprint arXiv:2308.14316},
      year={2023}
  }

License

Apache License 2.0.


unipt's Issues

roberta-base

@Paranioar

Nice work!

I am trying to reproduce the results on the RefCOCO dataset. When the following lines are executed, this error occurs: ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

    self.tokenizer = RobertaTokenizerFast.from_pretrained(text_encoder_type)
    self.text_encoder = RobertaModel.from_pretrained(text_encoder_type)

However, my Internet connection is working fine. Could you upload the roberta-base model you used and put it in the corresponding folder, like pretrained_resnet101_checkpoint.pth in the pretrained_weights folder?
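
(A likely workaround, offered as a suggestion rather than a tested fix: `from_pretrained` also accepts a local directory, so roberta-base can be downloaded once on a machine with network access and then loaded from disk. The path below is hypothetical.)

```python
from transformers import RobertaModel, RobertaTokenizerFast

# Hypothetical local copy, populated beforehand on a connected machine, e.g. via
#   git clone https://huggingface.co/roberta-base pretrained_weights/roberta-base
text_encoder_type = "pretrained_weights/roberta-base"

tokenizer = RobertaTokenizerFast.from_pretrained(text_encoder_type)
text_encoder = RobertaModel.from_pretrained(text_encoder_type)
```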

Request for a simple example

Could you provide simple example code for a CNN and a Transformer with UniPT applied on top, so that one can learn from the example and apply the method in other domains? Many thanks!
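
(Until an official toy example is provided, the sketches from the Introduction and the Guidance section above can be combined as a starting point, shown here with a toy Transformer standing in for a real pre-trained backbone; everything below is an illustrative assumption.)

```python
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Toy stand-in for a pre-trained backbone: a stack of Transformer
    encoder layers exposed through a `blocks` attribute, as the
    hypothetical UniPTWrapper above expects."""
    def __init__(self, dim: int = 256, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(depth)
        )

backbone = ToyBackbone()
model = UniPTWrapper(backbone, dim=256, reduced_dim=64)
out = model(torch.randn(2, 16, 256))  # batch of 2, 16 tokens, dim 256
print(out.shape)                      # torch.Size([2, 16, 64])
```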
