Comments (11)

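elesun2018 commented on July 3, 2024

Do "wte" and "lm_head" refer to the model's embedding and output-layer parameters?
From:
Note: if you use the pretrained model for LoRA fine-tuning rather than the chat model, the model's embedding and output-layer parameters are set as trainable. This is because the pretrained model has not learned the special tokens of the ChatML format, so these parameters must be trainable for the model to learn to understand and predict those tokens. It also means that if your training introduces new special tokens, you need to make these parameters trainable through modules_to_save in the code. If you want to save GPU memory, consider using the chat model for LoRA fine-tuning; memory usage will drop substantially.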
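
As a rough illustration of the rule quoted above, the choice of modules_to_save could be sketched as follows. The helper name build_lora_kwargs, its arguments, and the target_modules list are hypothetical and chosen only for illustration; only "wte", "lm_head", and modules_to_save come from the quoted text. The returned dict is meant to be passed on to a LoRA configuration such as peft.LoraConfig, which accepts both keys.

```python
from typing import Dict, List, Optional


def build_lora_kwargs(
    base_is_chat_model: bool,
    adds_new_special_tokens: bool = False,
) -> Dict[str, Optional[List[str]]]:
    """Return keyword arguments for a LoRA config (hypothetical helper)."""
    # Assumption: illustrative LoRA target modules, not taken from the thread.
    target_modules = ["c_attn", "w1", "w2"]

    # The embedding ("wte") and output layer ("lm_head") need to be trainable
    # when the model has not yet learned the special tokens used in training:
    # either it is the pretrained (non-chat) model, or new tokens are added.
    needs_full_embeddings = (not base_is_chat_model) or adds_new_special_tokens
    modules_to_save = ["wte", "lm_head"] if needs_full_embeddings else None

    return {"target_modules": target_modules, "modules_to_save": modules_to_save}


if __name__ == "__main__":
    print(build_lora_kwargs(base_is_chat_model=False))
    print(build_lora_kwargs(base_is_chat_model=True))
```

With the chat model and no new special tokens, modules_to_save stays None, which corresponds to the lower-memory case described in the quote.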

from qwen-vl.

elesun2018 commented on July 3, 2024

[image]
The error occurs here, in site-packages/peft/auto.py:
_target_peft_class.from_pretrained(
base_model,
pretrained_model_name_or_path
What is the cause? Thanks.

jweihe commented on July 3, 2024

Did you change the vocabulary size during LoRA fine-tuning?

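elesun2018 commented on July 3, 2024

No.
Qwen-VL raises the error,
so I switched to a different pretrained model, Qwen-VL-Chat.

If the "wte" and "lm_head" parameters were saved, how do I apply them to the merged model? Does peft merge these parameters automatically?
Also, after LoRA training, how can I tell whether the trained parameters have actually been merged into the new full model? (My current problem: the LoRA training loss drops markedly and then stabilizes, but at inference the output is almost the same as without training.)
Thanks.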

elesun2018 commented on July 3, 2024

Is model.transformer.ln_f the adapter layer (cross-attention) described in the paper?

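littlepan0413 commented on July 3, 2024

I have the same problem. If I use the model in the location it was originally downloaded to, it works; after moving it to another location, the error is raised. I don't know why.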

elesun2018 commented on July 3, 2024

Could someone help answer this? Thanks.

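FuHTong commented on July 3, 2024

I ran into the same problem. My current understanding is that it is caused by vocab_size not being aligned. The model reports self.tokenizer.n_vocab as 151860, which is the 151643 entries in qwen.tiktoken plus 217 special tokens, whereas the model's config file has "vocab_size": 151936. As a result, Qwen-VL cannot be aligned after LoRA fine-tuning: 76 tokens are still missing. I don't know what those other 76 tokens are, so I took the liberty of modifying line 45 of tokenization_qwen.py in the Qwen-VL folder:
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(281)))
This extends EXTRAS from the original 205 entries to 281, filling in the 76 missing tokens. I then continued training, and with this change the model merge now completes for me.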
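
The arithmetic behind this workaround can be checked with a short sketch. It only restates the numbers given in the comment; the function name extras_needed and the constant names are hypothetical, and whether padding EXTRAS is the right fix is not confirmed anywhere in this thread.

```python
# Numbers quoted in the comment above.
BASE_VOCAB = 151643         # entries in qwen.tiktoken
SPECIAL_TOKENS = 217        # special tokens counted by the tokenizer
CONFIG_VOCAB_SIZE = 151936  # "vocab_size" in the model config
ORIGINAL_EXTRAS = 205       # original length of EXTRAS


def extras_needed(base, special, config_vocab, original_extras):
    """Return (tokenizer vocab size, missing tokens, padded EXTRAS length)."""
    tokenizer_vocab = base + special
    gap = config_vocab - tokenizer_vocab
    return tokenizer_vocab, gap, original_extras + gap


tokenizer_vocab, gap, new_extras = extras_needed(
    BASE_VOCAB, SPECIAL_TOKENS, CONFIG_VOCAB_SIZE, ORIGINAL_EXTRAS
)

# Same construction as the modified line in tokenization_qwen.py.
EXTRAS = tuple(f"<|extra_{i}|>" for i in range(new_extras))

print(tokenizer_vocab, gap, new_extras, len(EXTRAS))  # 151860 76 281 281
```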
