Comments (11)
Do "wte" and "lm_head" refer to the model's embedding and output-layer parameters?
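From:
Note that if you run LoRA fine-tuning on the pretrained (base) model rather than the chat model, the model's embedding and output-layer parameters are set as trainable. The pretrained model has never seen the special tokens of the ChatML format, so these parameters must be trainable for the model to learn to understand and predict those tokens. This also means that if your training introduces new special tokens, you need to mark these parameters as trainable through modules_to_save in the code. If you want to reduce GPU memory usage, consider running LoRA fine-tuning on the chat model instead; memory usage drops substantially.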
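The note above can be sketched as a small helper that decides whether "wte" and "lm_head" go into modules_to_save. This is only an illustration: the helper name build_lora_kwargs and the values for r, lora_alpha, lora_dropout and target_modules are assumptions, not settings quoted in this thread. Only the use of modules_to_save with "wte" and "lm_head" comes from the discussion.

```python
def build_lora_kwargs(use_base_model: bool) -> dict:
    """Return keyword arguments meant for a LoRA config (hypothetical helper)."""
    kwargs = {
        # Illustrative hyperparameters; assumptions, not values from the thread.
        "r": 64,
        "lora_alpha": 16,
        "lora_dropout": 0.05,
        "target_modules": ["c_attn", "w1", "w2"],
        "task_type": "CAUSAL_LM",
    }
    if use_base_model:
        # The base model has not learned the ChatML special tokens, so the
        # embedding ("wte") and output layer ("lm_head") are trained in full
        # and saved alongside the LoRA weights.
        kwargs["modules_to_save"] = ["wte", "lm_head"]
    return kwargs


base_kwargs = build_lora_kwargs(use_base_model=True)
chat_kwargs = build_lora_kwargs(use_base_model=False)
print(base_kwargs.get("modules_to_save"))
print(chat_kwargs.get("modules_to_save"))
```

With peft installed, a dictionary like this would typically be unpacked into the config object, for example LoraConfig(**base_kwargs).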
from qwen-vl.
The error is raised here, in site-packages/peft/auto.py:
_target_peft_class.from_pretrained(
base_model,
pretrained_model_name_or_path
What could be causing this? Thanks.
from qwen-vl.
Did you change the vocabulary size during LoRA fine-tuning?
from qwen-vl.
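No.
Qwen-VL raises the error,
so I switched to a different pretrained model, Qwen-VL-Chat.
If the "wte" and "lm_head" parameters were saved, how are they applied to the merged model? Does peft merge these parameters automatically?
Also, after LoRA training, how can I tell whether the trained parameters have actually been merged into the new full model? (My current problem is that the LoRA training loss drops noticeably and then levels off, but at inference the output is almost the same as without training.)
Thanks.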
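One rough way to check whether a merge actually changed anything is to compare parameter values before and after it. The sketch below uses toy dictionaries of plain floats in place of real state dicts; the helper changed_parameters, the parameter names, and the numbers are all made up for illustration. With real models, the same comparison would be done on the tensors returned by state_dict().

```python
from typing import Mapping, Sequence


def changed_parameters(
    before: Mapping[str, Sequence[float]],
    after: Mapping[str, Sequence[float]],
    atol: float = 1e-8,
) -> list:
    """List the parameter names whose values differ between two snapshots."""
    changed = []
    for name, old in before.items():
        new = after.get(name)
        # A missing entry or a different length also counts as a change.
        if new is None or len(new) != len(old):
            changed.append(name)
            continue
        if any(abs(a - b) > atol for a, b in zip(old, new)):
            changed.append(name)
    return sorted(changed)


# Toy stand-ins for the base model and the merged model (hypothetical values).
base = {
    "attn.c_attn.weight": [0.10, -0.20, 0.30],
    "lm_head.weight": [0.50, 0.60],
    "ln_f.weight": [1.0, 1.0],
}
merged = {
    "attn.c_attn.weight": [0.12, -0.21, 0.30],
    "lm_head.weight": [0.55, 0.60],
    "ln_f.weight": [1.0, 1.0],
}

changed = changed_parameters(base, merged)
print(changed)
```

If the list comes back empty for the LoRA target modules, or for "wte" and "lm_head" when they were saved, the inference model is probably still the untrained one.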
from qwen-vl.
Is model.transformer.ln_f the adapter layer (cross-attention) described in the paper?
from qwen-vl.
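I have the same problem. If I use the model in the location it was originally downloaded to, it works; after moving it somewhere else, it raises the error. I don't know why.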
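One thing that may be worth checking when the error appears only after the model is moved: the adapter_config.json saved with a LoRA checkpoint records the base model location in its base_model_name_or_path field. The sketch below only inspects that field. The helper names are hypothetical, and this is a guess at one possible cause, not a confirmed diagnosis of the error in this thread.

```python
import json
import os
import tempfile


def recorded_base_model(adapter_dir: str) -> str:
    """Read base_model_name_or_path from an adapter's adapter_config.json."""
    config_path = os.path.join(adapter_dir, "adapter_config.json")
    with open(config_path, encoding="utf-8") as f:
        return json.load(f).get("base_model_name_or_path", "")


def base_path_looks_stale(adapter_dir: str) -> bool:
    """True if the recorded base model is a local path that no longer exists."""
    path = recorded_base_model(adapter_dir)
    looks_local = os.path.isabs(path) or path.startswith(".")
    return looks_local and not os.path.isdir(path)


# Demo with a throwaway adapter directory whose base model path is gone.
with tempfile.TemporaryDirectory() as adapter_dir:
    config = {"base_model_name_or_path": os.path.join(adapter_dir, "moved-away-model")}
    with open(os.path.join(adapter_dir, "adapter_config.json"), "w", encoding="utf-8") as f:
        json.dump(config, f)
    stale = base_path_looks_stale(adapter_dir)

print(stale)
```

If the check reports a stale path, pointing the field at the new model location (or loading the base model explicitly and attaching the adapter to it) would be the next thing to try.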
from qwen-vl.
Could someone help answer this? Thanks.
from qwen-vl.
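I ran into the same problem. My current understanding is that it is caused by a vocab_size mismatch. The tokenizer reports self.tokenizer.n_vocab as 151860, which is the 151643 entries in qwen.tiktoken plus 217 special tokens, while the model's config file has "vocab_size": 151936. As a result, Qwen-VL cannot be aligned after LoRA fine-tuning: 76 tokens are still missing. I don't know what those 76 tokens are supposed to be, so I took the liberty of editing line 45 of tokenization_qwen.py in the Qwen-VL folder:
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(281)))
This raises the number of extras from the original 205 to 281, filling in the 76 missing tokens. I then continued training, and for now this workaround lets me complete the model merge.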
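The arithmetic behind this workaround can be checked in a few lines. All numbers are the ones quoted in the comment above; the constant names are made up for the sketch.

```python
TIKTOKEN_ENTRIES = 151643    # entries in qwen.tiktoken, as quoted above
SPECIAL_TOKENS = 217         # special tokens counted by the tokenizer
CONFIG_VOCAB_SIZE = 151936   # "vocab_size" in the model config
ORIGINAL_EXTRAS = 205        # original number of <|extra_i|> tokens

n_vocab = TIKTOKEN_ENTRIES + SPECIAL_TOKENS   # what tokenizer.n_vocab reports
missing = CONFIG_VOCAB_SIZE - n_vocab         # gap between config and tokenizer
new_extras = ORIGINAL_EXTRAS + missing        # extras needed to close the gap

# Same construction as the modified line in tokenization_qwen.py.
extras = tuple(f"<|extra_{i}|>" for i in range(new_extras))

print(n_vocab, missing, new_extras, len(extras))  # 151860 76 281 281
```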
from qwen-vl.