
Asymmetry in Low-Rank Adapters of Foundation Models

🌟 ArXiv Preprint

This repo hosts the code for the paper "Asymmetry in Low-Rank Adapters of Foundation Models". We discover and analyze the asymmetry between the roles of the LoRA adapter matrices B and A.


Install Requirements

Step 1: Make sure you have PyTorch installed:

pip3 install torch==1.13.0 torchvision

Step 2: Install the rest of the required packages:

cd AsymmetryLoRA
pip install -r requirement.txt
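
To confirm the setup, a quick sanity check like the following should run without errors (a minimal sketch; the exact versions reported depend on your environment):

# Verify that the core dependencies import correctly.
import torch
import transformers

print(torch.__version__)         # expect roughly 1.13.0, as pinned above
print(transformers.__version__)  # version depends on requirement.txt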

Usage

Our LoRASYM module follows the structure of the peft package. Specifically, it provides a flexible interface for controlling how matrices A and B are initialized:

  • V and U: Right and left singular matrices of the original weight matrix.
  • random: Initializes with a random orthonormal matrix.
  • he: Initializes with torch.nn.init.kaiming_uniform_, the He (Kaiming) uniform initialization commonly used for neural network layers.
  • zero: Initializes with an all-zeros matrix.

You can customize matrices A and B with these options.

Matrix | Options           | Example | Explanation
A      | V, rand, he, zero | A_rand  | A is initialized as a random orthonormal matrix and frozen during training.
B      | U, rand, he, zero | hB_zero | B is initialized to zero and updated during training.

Explanation: the setting A_rand_hB_zero means A is initialized as a random orthonormal matrix and kept frozen, while B starts at zero and is updated during training.
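
For intuition, here is a minimal plain-PyTorch sketch of what the A_rand_hB_zero setting corresponds to (the names and shapes are illustrative assumptions, not the module's internals): a random orthonormal A obtained from a QR decomposition and frozen, and a trainable B starting at zero.

import torch

d_out, d_in, r = 1024, 1024, 16

# Random orthonormal A: QR of a Gaussian matrix gives a (d_in, r) matrix Q
# with orthonormal columns; its transpose has orthonormal rows.
Q, _ = torch.linalg.qr(torch.randn(d_in, r))
A = Q.T                        # shape (r, d_in); frozen (requires_grad is False)

# B starts at zero and is the only trained factor.
B = torch.zeros(d_out, r, requires_grad=True)

delta_W = B @ A                # adapter update, all zeros at initialization

Because B starts at zero, the adapter update B @ A is zero at initialization, so fine-tuning starts exactly from the pretrained weights.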

We provide a wrapper that is compatible with models from Hugging Face's transformers library. The following is a usage example:

from transformers import AutoModelForSequenceClassification
from LoRASYM_peft.local_peft_model_all import PeftModelForCausalLM_local
from LoRASYM_peft.local_lorasym_all import LoRASYMConfig

model = AutoModelForSequenceClassification.from_pretrained(
        model_args.model_name_or_path,
    )

update_rule_dict = {"update_A": False, "update_B": True,
                    "A_init": "rand", "B_init": "zero"}

lorasym_config = LoRASYMConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    modules_to_save=["classifier"],
    update_rule=update_rule_dict,
    task_type="SEQ_CLS",
)

lora_model = PeftModelForCausalLM_local(model, lorasym_config)
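
As a quick sanity check on the configuration above, you can count the trainable parameters; with update_A set to False, only the B matrices (plus the modules_to_save, here the classifier head) should require gradients. A minimal sketch:

# With update_A=False and update_B=True, only B matrices and the
# classifier head should be trainable.
trainable = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora_model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")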

GLUE benchmark

Use the following command to fine-tune the RoBERTa-large model on tasks from the GLUE benchmark.

cd GPT_experiments

python -m run_glue_origin_ft --model_name_or_path roberta-large \
    --task_name rte \
    --ft_method LoRASYM \
    --bf16 True \
    --tf32 True \
    --do_train \
    --do_eval \
    --learning_rate 4e-4 \
    --num_train_epochs 20 \
    --input_seed 7 \
    --lora_svd_method A_rand_hB_zero \
    --lora_rank 8 \
    --lora_alpha 16 \
    --overwrite_output_dir 

Bugs or Questions?

If you have any questions about the code or the paper, feel free to email Jiacheng Zhu ([email protected]), or open an issue if you encounter any problems using the code.

Citation

Please cite our paper if you find the repo helpful in your work:

@article{zhu2024asymmetry,
      title={Asymmetry in Low-Rank Adapters of Foundation Models}, 
      author={Jiacheng Zhu and Kristjan Greenewald and Kimia Nadjahi and Haitz Sáez de Ocáriz Borde and Rickard Brüel Gabrielsson and Leshem Choshen and Marzyeh Ghassemi and Mikhail Yurochkin and Justin Solomon},
      year={2024},
}

Issues

Paper, B.2. Proof of Lemma 4.1 - σ

In the proof, I don't see the definition of σ. Is it defined in the lemma, or in the derivation of L(Q, B*)?

  1. Could you explain how the von Neumann trace inequality is used in Theorem 4.3? I think it is an equality rather than an inequality there, because the inserted Q^T Q is the identity (Q is orthogonal), so the whole expression should be equal.

  2. Why does E[III_B] contain the factor (r/d)? Since the trace is cyclically invariant, I think we can eliminate UU^T, and then there is no (r/d) factor.
