Comments (23)

BenjaminBossan commented on June 2, 2024

Thanks for sharing. This looks correct so far: When saving the adapter with PEFT, the adapter name is being removed from the key, so e.g. when the adapter name is "default" (which is the default), foo.layers.0.self_attn.q_proj.lora_A.default.weight would become foo.layers.0.self_attn.q_proj.lora_A.weight. I'm not 100% sure why it's removed -- probably it's so that we can load the adapter with a different adapter name later, but whatever the reason, that's what happens. In the key names you showed, there is no adapter name, so this is correct.

Later, when we load the adapter, we have to inject the adapter name back into the key, which is happening in the code snippet cited above. Looking through the code, I don't see what could go wrong for the adapter name to be injected twice, so that we end up with base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.default.weight. I thought that maybe the adapter name was not properly removed in the stored adapter file, but as you've shown, that's not the case. Ideally, if you can somehow create a dummy adapter that causes this issue, without any weights trained on your data, and share it as a safetensors file, I could do further debugging.
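
To illustrate that round trip, here is a simplified sketch of the key handling (not the exact PEFT code; the key name is just an example):

adapter_name = "default"

# saving: the adapter name is stripped from the key
key_in_model = "foo.layers.0.self_attn.q_proj.lora_A.default.weight"
key_on_disk = key_in_model.replace(f".{adapter_name}.weight", ".weight")
print(key_on_disk)   # foo.layers.0.self_attn.q_proj.lora_A.weight

# loading: the adapter name is injected back before ".weight"
key_restored = key_on_disk.replace(".weight", f".{adapter_name}.weight")
print(key_restored)  # foo.layers.0.self_attn.q_proj.lora_A.default.weight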

I think that I should re-train the base model with the LoRA config, re-convert the LoRA adapter to safetensors, and re-load the adapter and re-merge it with the base model.

If that's not too much effort for you, this could certainly be a solution. I would start with very little data and ensure that loading the model works this time around, before spending too much time on training.

Alternatively, you could try modifying the PEFT code a little so that the double adapter name is removed. E.g. in this line, add the following snippet:

peft_model_state_dict = {k.replace("default.default", "default"): v for k, v in peft_model_state_dict.items()}

It's very blunt, but it would be interesting to see if it solves the problem.

shjunn commented on June 2, 2024

I appreciate your kind explanation and helpful solution!
I'll try; those jobs won't take long (with few epochs and a small training dataset).
I probably made a mistake in the previous training. haha

I hope that I will be able to show you the reason later. Thanks!

BenjaminBossan commented on June 2, 2024

PEFT should normally handle the prefix correctly. What kind of adapter are you using? Can you share it (ideally as a safetensors file) so that we can try to reproduce? I assume that if you load the saved model, it's not working as expected because the adapter is missing?

afalf commented on June 2, 2024

PEFT should normally handle the prefix correctly. What kind of adapter are you using? Can you share it (ideally as a safetensors file) so that we can try to reproduce? I assume that if you load the saved model, it's not working as expected because the adapter is missing?

Yes, the saved model's weights are the same as the original model's. You can download my adapter from the URL below; I used the LoRA method and saved it as adapter_model.bin.
https://drive.google.com/file/d/15tWQGR9Imrk5lKTaYRMKU70yl4_VdBNF/view?usp=drive_link

BenjaminBossan commented on June 2, 2024

Thanks for the link. I converted the file to safetensors and then loaded it. For me, it worked correctly:

>>> model = AutoPeftModelForCausalLM.from_pretrained(<path>, device_map="cpu")
>>> print(model)
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151646, 2048)
        (layers): ModuleList(
          (0-23): 24 x Qwen2DecoderLayer(
            (self_attn): Qwen2SdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=2048, out_features=2048, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2048, out_features=16, bias=False)  # <= LoRA adapter
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=2048, bias=False)  # <= LoRA adapter
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
etc.
>>> len([m for m in model.modules() if isinstance(m, peft.tuners.lora.LoraLayer)])
168

How exactly did you determine that the model was the same as previously?

afalf commented on June 2, 2024

Thanks for the link. I converted the file to safetensors and then loaded it. For me, it worked correctly:

>>> model = AutoPeftModelForCausalLM.from_pretrained(<path>, device_map="cpu")
>>> print(model)
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151646, 2048)
        (layers): ModuleList(
          (0-23): 24 x Qwen2DecoderLayer(
            (self_attn): Qwen2SdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=2048, out_features=2048, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2048, out_features=16, bias=False)  # <= LoRA adapter
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=2048, bias=False)  # <= LoRA adapter
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
etc.
>>> len([m for m in model.modules() if isinstance(m, peft.tuners.lora.LoraLayer)])
168

How exactly did you determine that the model was the same as previously?

Yes, this adapter can be loaded correctly, but after I run "model.merge_and_unload()", the merged model's weights are the same as before. I checked by printing the values of each layer's weights.
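
For reference, one way to check this (a sketch based on the module structure printed above; q_proj of layer 0 is used as an example) is to compare a base weight before and after merging:

import torch

# clone one base weight before merging (merge_and_unload modifies weights in place)
w_before = model.base_model.model.model.layers[0].self_attn.q_proj.base_layer.weight.detach().clone()

merged = model.merge_and_unload()
w_after = merged.model.layers[0].self_attn.q_proj.weight.detach()

# True here means merging did not change the weight at all
print(torch.allclose(w_before, w_after))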

BenjaminBossan commented on June 2, 2024

Yes, the reason appears to be that the LoRA adapter was not trained. All the lora_B weights are 0, therefore LoRA is a no-op and doesn't change the weights:

>>> all((module.weight == 0.0).all() for name, module in model.named_modules() if "lora_B.default" in name)
True

You should train the LoRA adapter correctly first.
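
For context, merging adds the LoRA delta (lora_B @ lora_A, times a scaling factor) to the base weight, so a zero lora_B makes that delta vanish. A minimal numerical sketch (shapes and scaling are illustrative):

import torch

d, r = 2048, 16
scaling = 2.0  # lora_alpha / r; the exact value does not matter here

lora_A = torch.randn(r, d)
lora_B = torch.zeros(d, r)  # what the uploaded adapter effectively contains

delta_W = scaling * (lora_B @ lora_A)  # this is what merging adds to the base weight
print(delta_W.abs().max())  # tensor(0.) -> merging is a no-op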

afalf commented on June 2, 2024

Yes, the reason appears to be that the LoRA adapter was not trained. All the lora_B weights are 0, therefore LoRA is a no-op and doesn't change the weights:

>>> all((module.weight == 0.0).all() for name, module in model.named_modules() if "lora_B.default" in name)
True

You should train the LoRA adapter correctly first.

Emmm, but the LoRA adapter has been trained, and I have checked that the weights are not zero in the adapter.bin file. Is there some error in converting it to safetensors?
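
For completeness, a quick way to inspect the .bin file directly would be something like this (a sketch; adjust the path to your checkpoint):

import torch

state_dict = torch.load("adapter_model.bin", map_location="cpu")
# print the first lora_B tensor to confirm it is not all zeros
for key, tensor in state_dict.items():
    if "lora_B" in key:
        print(key, tensor.float().abs().max())
        break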

BenjaminBossan commented on June 2, 2024

Oh, very strange. I used this space to convert it but maybe I did something wrong. Could you please upload a safetensors file, so that I can check it out? If you do save_pretrained with a recent version of PEFT, it should default to safetensors automatically.
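
For reference, a local conversion can also be done along these lines (a sketch, not necessarily what the space does; safetensors needs contiguous tensors):

import torch
from safetensors.torch import save_file

state_dict = torch.load("adapter_model.bin", map_location="cpu")
# safetensors requires contiguous tensors without shared storage
state_dict = {k: v.contiguous() for k, v in state_dict.items()}
save_file(state_dict, "adapter_model.safetensors")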

afalf commented on June 2, 2024

Oh, very strange. I used this space to convert it but maybe I did something wrong. Could you please upload a safetensors file, so that I can check it out? If you do save_pretrained with a recent version of PEFT, it should default to safetensors automatically.

Here:
https://drive.google.com/file/d/1CQI7UCV-zBTNBK4asHl-UUR1Lqz72JAd/view?usp=sharing

BenjaminBossan commented on June 2, 2024

Thanks for uploading a safetensors version. Your zip file seems to contain the same checkpoint twice, but they appear to be identical, I tried both. I still found that lora_B is all zeros:

>>> all((module.weight == 0.0).all() for name, module in model.named_modules() if "lora_B.default" in name)
True
>>> model.base_model.model.model.layers[0].self_attn.q_proj.lora_B["default"].weight
Parameter containing:
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

Could you please verify if you find the same?

afalf commented on June 2, 2024

Thanks for uploading a safetensors version. Your zip file seems to contain the same checkpoint twice, but they appear to be identical, I tried both. I still found that lora_B is all zeros:

>>> all((module.weight == 0.0).all() for name, module in model.named_modules() if "lora_B.default" in name)
True
>>> model.base_model.model.model.layers[0].self_attn.q_proj.lora_B["default"].weight
Parameter containing:
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

Could you please verify if you find the same?

I tried using safetensors to load adapter_model.safetensors and found it is not all zeros. Here is my code:

from safetensors import safe_open
with safe_open('adapter_model.safetensors', framework="pt", device='cpu') as f:
    tensors = {}
    for k in f.keys():
        tensors[k] = f.get_tensor(k)
print(tensors['base_model.model.layers.0.self_attn.q_proj.lora_B.weight'])

tensor([[-0.0747, -0.0737, -0.0747,  ...,  0.0762, -0.0742, -0.0737],
        [-0.0391, -0.0400, -0.0400,  ...,  0.0415, -0.0396, -0.0371],
        [-0.0544, -0.0535, -0.0557,  ...,  0.0571, -0.0549, -0.0520],
        ...,
        [-0.0034, -0.0036, -0.0023,  ...,  0.0069, -0.0031, -0.0054],
        [-0.0042, -0.0039, -0.0055,  ...,  0.0023, -0.0058, -0.0034],
        [-0.0114, -0.0098, -0.0100,  ...,  0.0110, -0.0115, -0.0120]],
       dtype=torch.bfloat16)

Besides that, I have confirmed that this PEFT adapter works correctly on the downstream task when it is loaded using the AutoModel.load_adapter() method. However, an issue arises when attempting to use the merged model. So I believe the error comes from the merging process.

BenjaminBossan commented on June 2, 2024

Okay, so the reason seems to be that there's a mismatch between the keys found in the adapter and the keys expected by the model. When I jump into this line and check the keys, I can see that they all mismatch:

>>> keys_found = sorted(adapters_weights.keys())
>>> keys_expected = sorted(self.state_dict())
>>> s0 = set(keys_expected)
>>> s1 = set(keys_found)
>>> len(s0), len(s1)
(627, 448)
>>> len(s0 - s1)
627
>>> len(s1 - s0)
448
>>> pp keys_found[:10]
['base_model.model.layers.0.mlp.down_proj.lora_A.weight',
 'base_model.model.layers.0.mlp.down_proj.lora_B.weight',
 'base_model.model.layers.0.mlp.gate_proj.lora_A.weight',
 'base_model.model.layers.0.mlp.gate_proj.lora_B.weight',
 'base_model.model.layers.0.mlp.up_proj.lora_A.weight',
 'base_model.model.layers.0.mlp.up_proj.lora_B.weight',
 'base_model.model.layers.0.self_attn.k_proj.lora_A.weight',
 'base_model.model.layers.0.self_attn.k_proj.lora_B.weight',
 'base_model.model.layers.0.self_attn.o_proj.lora_A.weight',
 'base_model.model.layers.0.self_attn.o_proj.lora_B.weight']
>>> pp keys_expected[:10]
['base_model.model.lm_head.weight',
 'base_model.model.model.embed_tokens.weight',
 'base_model.model.model.layers.0.input_layernorm.weight',
 'base_model.model.model.layers.0.mlp.down_proj.base_layer.weight',
 'base_model.model.model.layers.0.mlp.down_proj.lora_A.default.weight',
 'base_model.model.model.layers.0.mlp.down_proj.lora_B.default.weight',
 'base_model.model.model.layers.0.mlp.gate_proj.base_layer.weight',
 'base_model.model.model.layers.0.mlp.gate_proj.lora_A.default.weight',
 'base_model.model.model.layers.0.mlp.gate_proj.lora_B.default.weight',
 'base_model.model.model.layers.0.mlp.up_proj.base_layer.weight']

When creating a fresh model, I can confirm that the latter is the correct format for this model. My adapter_model.safetensors also only contained 336 entries, not 448 as in your file.

I'm not sure exactly what happened to cause your adapter to use a different format; maybe there was a change in the version of PEFT or transformers between creating the adapter and loading it?

To ensure that there is no bug in PEFT, I confirmed that it's possible to save and load an adapter with qwen:

>>> from transformers import AutoModelForCausalLM
>>> from peft import get_peft_model, LoraConfig, PeftModel

>>> base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B")
>>> peft_model = get_peft_model(base_model, LoraConfig(target_modules=["up_proj", "q_proj", "down_proj", "k_proj", "o_proj", "gate_proj", "v_proj"], init_lora_weights=False))

>>> peft_model.base_model.model.model.layers[0].self_attn.q_proj.lora_A["default"].weight
Parameter containing:
tensor([[ 0.0061,  0.0040, -0.0037,  ...,  0.0030, -0.0103, -0.0040],
        [-0.0190,  0.0183,  0.0137,  ..., -0.0065,  0.0063, -0.0156],
        [ 0.0170,  0.0203,  0.0184,  ..., -0.0191,  0.0132, -0.0176],
        ...,
        [-0.0091,  0.0197, -0.0063,  ..., -0.0170, -0.0003,  0.0013],
        [ 0.0135,  0.0209, -0.0040,  ..., -0.0119,  0.0159,  0.0164],
        [ 0.0003,  0.0220, -0.0092,  ...,  0.0070,  0.0012,  0.0212]],
       requires_grad=True)

>>> peft_model.base_model.model.model.layers[0].self_attn.q_proj.lora_B["default"].weight
Parameter containing:
tensor([[-0.3462, -0.1964,  0.0248,  ...,  0.1943, -0.1583, -0.2640],
        [-0.0430,  0.3114,  0.1676,  ..., -0.0210,  0.1741,  0.2173],
        [ 0.0789,  0.2819, -0.1108,  ..., -0.1683,  0.1381, -0.3278],
        ...,
        [ 0.1441, -0.0852,  0.2126,  ..., -0.0384, -0.1946,  0.3313],
        [-0.2722,  0.2995,  0.2065,  ...,  0.0393, -0.2830,  0.3083],
        [ 0.0508,  0.2045,  0.0730,  ...,  0.1732,  0.3274,  0.0733]],
       requires_grad=True)

>>> peft_model.save_pretrained("/tmp/peft/qwen")
>>> del base_model
>>> del peft_model

>>> base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B")
>>> peft_model = PeftModel.from_pretrained(base_model, "/tmp/peft/qwen")
>>> peft_model.base_model.model.model.layers[0].self_attn.q_proj.lora_A["default"].weight
Parameter containing:
tensor([[ 0.0061,  0.0040, -0.0037,  ...,  0.0030, -0.0103, -0.0040],
        [-0.0190,  0.0183,  0.0137,  ..., -0.0065,  0.0063, -0.0156],
        [ 0.0170,  0.0203,  0.0184,  ..., -0.0191,  0.0132, -0.0176],
        ...,
        [-0.0091,  0.0197, -0.0063,  ..., -0.0170, -0.0003,  0.0013],
        [ 0.0135,  0.0209, -0.0040,  ..., -0.0119,  0.0159,  0.0164],
        [ 0.0003,  0.0220, -0.0092,  ...,  0.0070,  0.0012,  0.0212]])

>>> peft_model.base_model.model.model.layers[0].self_attn.q_proj.lora_B["default"].weight
Parameter containing:
tensor([[-0.3462, -0.1964,  0.0248,  ...,  0.1943, -0.1583, -0.2640],
        [-0.0430,  0.3114,  0.1676,  ..., -0.0210,  0.1741,  0.2173],
        [ 0.0789,  0.2819, -0.1108,  ..., -0.1683,  0.1381, -0.3278],
        ...,
        [ 0.1441, -0.0852,  0.2126,  ..., -0.0384, -0.1946,  0.3313],
        [-0.2722,  0.2995,  0.2065,  ...,  0.0393, -0.2830,  0.3083],
        [ 0.0508,  0.2045,  0.0730,  ...,  0.1732,  0.3274,  0.0733]])

It would probably be possible to salvage this adapter by remapping the keys from its state_dict to what's actually expected.
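
A rough sketch of such a remapping, based on the two key formats printed above (untested against the actual file, and relying on PEFT injecting the adapter name itself at load time):

from safetensors.torch import load_file, save_file

weights = load_file("adapter_model.safetensors")

remapped = {}
for key, tensor in weights.items():
    # the expected keys contain an extra "model." segment:
    # "base_model.model.layers.0..." -> "base_model.model.model.layers.0..."
    new_key = key.replace("base_model.model.layers.", "base_model.model.model.layers.", 1)
    remapped[new_key] = tensor

save_file(remapped, "adapter_model_remapped.safetensors")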

shjunn commented on June 2, 2024

Thanks for uploading a safetensors version. Your zip file seems to contain the same checkpoint twice, but they appear to be identical, I tried both. I still found that lora_B is all zeros:

>>> all((module.weight == 0.0).all() for name, module in model.named_modules() if "lora_B.default" in name)
True
>>> model.base_model.model.model.layers[0].self_attn.q_proj.lora_B["default"].weight
Parameter containing:
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

Could you please verify if you find the same?

System Info.
peft==0.10.0
transformers==4.37.2

Comment
I had the same issue as above. (I trained a custom LoRA adapter model, and it has lora_A and lora_B layers with non-zero weights.)
I tried a different way, namely using the method PeftModel.from_pretrained().
For that method's arguments, model=my_model_path and model_id=my_adapter_model_path.

The code below is what I used.

model = AutoModelForCausalLM.from_pretrained("my_model")
lora_model = PeftModel.from_pretrained(model=model, model_id="my_adapter_model_folder_path")

I also debugged for a day; there is some mismatch while running set_peft_model_state_dict() in src/peft/utils/save_and_load.py, and I found something.

k = k.replace(suffix_to_replace, f"{adapter_name}.{suffix_to_replace}")

With this original code line, k became base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.default.weight.
But I need k to be base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight.
The problem is the duplicated default in k.
So I changed line 234 to k = k.replace(suffix_to_replace, f"{suffix_to_replace}"), which leaves k unchanged.
In my case it works, and merging my custom model with my custom LoRA adapter model succeeded.
@BenjaminBossan I want to know why the suffix-replacing rule is needed and whether it could be fixed. I only tried a LoRA adapter with LlamaModel as the base model, so my solution may cause other problems with other models.

Thanks for reading all this!

BenjaminBossan commented on June 2, 2024

The problem is the duplicated default in k.

Yes, this is very strange and should definitely not happen. Would it be possible for you to share the code how you created and saved the adapters (training code should not be necessary here), as well as how you load the adapter? I need to see that in order to figure out how this duplication could have occurred.

shjunn commented on June 2, 2024

Yes, this is very strange and should definitely not happen. Would it be possible for you to share the code how you created and saved the adapters (training code should not be necessary here), as well as how you load the adapter? I need to see that in order to figure out how this duplication could have occurred.

Sure, the way I created my custom adapter is simple: set up a LoraConfig with r, alpha, and dropout, wrap the model with get_peft_model(), and then train it with DeepSpeed stage 3. (I followed the PEFT quick tour on the Hugging Face webpage.)
The way I load my custom adapter is this (the same two code lines as above):

model = AutoModelForCausalLM.from_pretrained("my_model")
lora_model = PeftModel.from_pretrained(model=model, model_id="my_adapter_model_folder_path")

In more detail, the duplicated default happens in this loop:

for k, v in state_dict.items():
    if parameter_prefix in k:
        suffix = k.split(parameter_prefix)[1]
        if "." in suffix:
            suffix_to_replace = ".".join(suffix.split(".")[1:])
            k = k.replace(suffix_to_replace, f"{adapter_name}.{suffix_to_replace}")
        else:
            k = f"{k}.{adapter_name}"
        peft_model_state_dict[k] = v

with adapter_name set to "default".

I confirmed that the LoRA layer names saved in my_adapter_model are identical to those in adapters_weights, which is loaded when the code line below is executed.

adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)

  1. Example of a LoRA layer name saved in the adapter and loaded from load_peft_weights():
     base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight
  2. Example of a LoRA layer name while the for loop is being executed:
     base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.default.weight

If my way of loading the adapter is wrong, please let me know. Thanks!
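
As a checkpoint-side workaround, one could strip the adapter name from the saved keys so that PEFT's re-injection does not duplicate it (a sketch; file names are illustrative, and it assumes the keys look like the first example above):

from safetensors.torch import load_file, save_file

adapter_name = "default"
weights = load_file("adapter_model.safetensors")

cleaned = {
    # "...lora_A.default.weight" -> "...lora_A.weight", i.e. the format save_pretrained writes
    k.replace(f".{adapter_name}.weight", ".weight"): v
    for k, v in weights.items()
}
save_file(cleaned, "adapter_model_cleaned.safetensors")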

BenjaminBossan commented on June 2, 2024

Could you additionally describe how you saved the adapter? Moreover, if you can share the adapter, that would help.

shjunn commented on June 2, 2024

I used transformers.Trainer.train(), Trainer.save_state(), and, in the end, Trainer.save_model().
DeepSpeed saved several state files in the checkpoints, and I used zero_to_fp32.py (which is provided by DeepSpeed) to save a single adapter model as safetensors.

Sorry, I cannot share the adapter because it was trained on in-house data.

BenjaminBossan commented on June 2, 2024

I see, that makes sense. Since you cannot share the adapter, could you share its general structure? I.e.:

from safetensors.torch import load_file
weights = load_file("<PATH>/adapter_model.safetensors")  # path to adapter weights saved by your training script
print([(k, v.shape) for k, v in weights.items()][:30])  # print 30 keys and the weight shapes

shjunn commented on June 2, 2024

Okay. I'll try tomorrow at the office.
By the way, is adapter_weights in the third line right? There is weights in the second line.

BenjaminBossan commented on June 2, 2024

Okay. I'll try tomorrow at the office.

Thanks a lot.

By the way, is adapter_weights in the third line right? There is weights in the second line.

Yes, sorry, I made some changes to the snippet but missed that line; the edited snippet should be correct now.

shjunn commented on June 2, 2024

Its general structure is below.

base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight, torch.Size([8, 8192])
base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight, torch.Size([8192, 8])

I think that I should re-train the base model with the LoRA config, re-convert the LoRA adapter to safetensors, and re-load the adapter and re-merge it with the base model.
Maybe re-doing that process will give me something.
It would take a few days. If you find a solution, would you please let me know?

github-actions commented on June 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
