Comments (4)
Thanks for reporting. I dug a bit deeper. The offending line, at least in my setup, is:
peft/src/peft/utils/loftq_utils.py, line 140 (commit 32f3878)
With the incoming weight having a shape of (3072, 3072), we have:
weight_divabs => (147456, 64, 1)
L_reshaped    => (1, 256)
abs_diff      => (147456, 64, 256)

So abs_diff tries to allocate ~9 GB of memory (with float32). I wonder if we can avoid such a huge shape. Pinging @yxli2123.
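Just for illustration, here is a minimal back-of-the-envelope sketch of where those 9 GB come from. The shapes are the ones quoted above; the broadcasting step is my own approximation of what the quantization code does, not the actual line from loftq_utils.py:

import torch

# Shapes reported above for a (3072, 3072) weight:
# weight_divabs has shape (147456, 64, 1), L_reshaped has shape (1, 256)
rows, block, levels = 147456, 64, 256

# Broadcasting (rows, block, 1) against (1, levels) yields (rows, block, levels)
out_shape = torch.broadcast_shapes((rows, block, 1), (1, levels))
print(out_shape)  # torch.Size([147456, 64, 256])

# Materializing that tensor in float32 costs 4 bytes per element
num_bytes = rows * block * levels * 4
print(f"{num_bytes / 1024**3:.1f} GiB")  # ~9.0 GiB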
What you could try right now is to use the replace_lora_weights_loftq function. This allows you to load the model with bnb quantized weights, i.e. with lower memory requirement, and apply LoftQ on the fly with relatively little overhead. I tried this on my machine and memory was consistently < 5 GB:
import time
import gc

import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, replace_lora_weights_loftq
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

checkpoint_path = "microsoft/Phi-3-mini-4k-instruct"
model_kwargs = dict(
    use_cache=False,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",
    modules_to_save=None,
    use_rslora=True,
)

model = AutoModelForCausalLM.from_pretrained(checkpoint_path, **model_kwargs)
model = prepare_model_for_kbit_training(model)
# model = model.to("cpu")
torch.cuda.empty_cache()
gc.collect()

model = get_peft_model(model, peft_config)
replace_lora_weights_loftq(model)  # takes a couple of minutes
Note that using this approach is more memory efficient, but it might not perform as well, at least not without making use of the callback feature described in this LoftQ init notebook.
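For reference, here is a rough sketch of what such a callback could look like, continuing from the snippet above and replacing the plain replace_lora_weights_loftq(model) call. The (model, module_name) -> bool contract of the callback keyword follows the PEFT documentation, but the calibration input, the MSE criterion, and the idea of comparing against reference logits are illustrative assumptions here, not the notebook's exact code:

import torch
import torch.nn.functional as F
from peft import replace_lora_weights_loftq
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
# Hypothetical calibration batch; a few representative prompts work better in practice
inputs = tokenizer("Paris is the capital of", return_tensors="pt").to(model.device)

# Reference logits; ideally these come from the full-precision model so that a
# replacement is only kept when it reduces the quantization error
with torch.inference_mode():
    logits_base = model(**inputs).logits

current_mse = float("inf")

def loftq_callback(model, module_name):
    """Return True to keep the LoftQ replacement for module_name, False to roll it back."""
    global current_mse
    with torch.inference_mode():
        mse = F.mse_loss(model(**inputs).logits, logits_base).item()
    if mse < current_mse:
        current_mse = mse
        return True
    return False

replace_lora_weights_loftq(model, callback=loftq_callback)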
@BenjaminBossan Thank you for the advice, your method works! So the issue was that the weight matrix (3072, 3072) was being quantized all at once and there wasn't enough memory available for the necessary computations.
Can you clarify, however, what replace_lora_weights_loftq does? From the source code it seems to assume that the LoRA adapter weights are already quantized, but there's no mention of quantization in your LoraConfig. Are the LoRA weights initialized as quantized because the model weights are quantized?
The LoRA weights are never quantized, regardless of whether the base model is quantized or not. This is necessary because quantized weights cannot be trained, and we want the LoRA weights to be trained. But since the total number of parameters of the LoRA weights is typically small, this should still result in less memory being used than full fine-tuning.
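To make this concrete, here is a small sketch (continuing from the model built above) that tallies the parameter dtypes of the quantized base weights versus the LoRA adapters; the uint8 storage for the bnb 4-bit weights is how bitsandbytes packs them, but this check is illustrative and not something from the thread:

import collections

# Count parameters by origin (base model vs. LoRA adapters) and storage dtype
dtype_counts = collections.Counter()
for name, param in model.named_parameters():
    kind = "lora" if "lora_" in name else "base"
    dtype_counts[(kind, str(param.dtype))] += param.numel()

for (kind, dtype), numel in sorted(dtype_counts.items()):
    print(f"{kind:5s} {dtype:15s} {numel:,} parameters")

# Expected pattern: the base weights show up as packed uint8 (bnb 4-bit storage)
# and are frozen, while the lora_A/lora_B matrices are small float tensors with
# requires_grad=True -- they are never quantized.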
You're right, I forgot that quantization can be used only during inference. Thank you very much.