Comments (11)
我感觉可能是RLHF的时候有些过拟合了,导致模型变得过于helpful,一般表现为在回复的答案前后加过多额外的内容,没法严格遵循指令。
以及翻译名字变成书生浦语应该也是过拟合导致的,训练时候身份认知数据加太多导致“我的名字是”这几个token后面出现“书生浦语”的概率变得太高了。
chat模型实在纠正不过来的话,要不考虑换成没有rl过的chat-sft模型试试。不过我也不确定会不会变好。
from internlm.
如果避免出现空格,似乎可以改善现象
from internlm.
重复说translator_system_prompt的问题改用这种方式试试呢?system prompt放到system的role里面,另外再强化一下指令的要求:
prompts = [[
{
'role': 'system',
'content': '把下列文字翻译成中文,只返回给我翻译结果,不要输出任何额外内容'
},
{
'role': 'user',
'content': '待翻译的文本'
},]
response = self.model(prompts, gen_config)
from internlm.
重复说translator_system_prompt的问题改用这种方式试试呢?system prompt放到system的role里面,另外再强化一下指令的要求:
prompts = [[ { 'role': 'system', 'content': '把下列文字翻译成中文,只返回给我翻译结果,不要输出任何额外内容' }, { 'role': 'user', 'content': '待翻译的文本' },] response = self.model(prompts, gen_config)
仍然未改善 哭泣,还是有类似现象
from internlm.
from internlm.
我感觉可能是RLHF的时候有些过拟合了,导致模型变得过于helpful,一般表现为在回复的答案前后加过多额外的内容,没法严格遵循指令。 以及翻译名字变成书生浦语应该也是过拟合导致的,训练时候身份认知数据加太多导致“我的名字是”这几个token后面出现“书生浦语”的概率变得太高了。 chat模型实在纠正不过来的话,要不考虑换成没有rl过的chat-sft模型试试。不过我也不确定会不会变好。
感觉 ,得等下一版本?
from internlm.
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 7 days if the stale label is not removed or if there is no further response.
from internlm.
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 7 days if the stale label is not removed or if there is no further response.
from internlm.
This issue is closed because it has been stale for 7 days. Please open a new issue if you have similar issues or you have any new updates now.
from internlm.
请问你解决了么?是哪里有问题呀 ?我也出现同样的问题了
from internlm.
hi, @hotmengmeng 请问用的是 lmdeploy 哪个版本?
from internlm.
Related Issues (20)
- [QA] 对于internlm2系列模型中的SFT、RHLF模型的细节问题 HOT 2
- [QA] internlm2.5的function call的template的来源 HOT 6
- [Feature] 关于模型部署后,模型调用报错 core dumped HOT 1
- [Bug] 在将internlm转换为llama之后,使用转换后的llama tokenizer对prompt做embedding会超出词表范围 HOT 3
- [Bug] internlm2_5-20b-chat量化报错 HOT 3
- [QA] sft 训练数据格式 HOT 2
- [QA] 多轮意图理解 HOT 2
- [QA] What is the size of the native Context window? HOT 3
- [Bug] Unable to run inference using internlm/internlm2_5-7b-chat-4bit HOT 4
- [QA] 推理时遇到报错? HOT 2
- [QA] 我按照官方文档的步骤,尝试调用书生浦语的API来回答问题,但是报错:模型已下架 HOT 3
- [QA] Why an OpenAI account is needed for long context demo? HOT 3
- [QA] 请问reward模型的template是什么形式呢?有例子可以参考吗 HOT 2
- [QA] internlm2 function call的模板文档和lmdeploy不一致 HOT 2
- [QA] 多轮对话微调的长度限制 HOT 2
- [Bug] internlm2_5-7b-chat-4bit 无法使用vllm加速推理 HOT 6
- [Feature] Ollama vision support HOT 1
- [Feature] 是否有计划支持json输出 HOT 11
- [QA] Is there a 4 bit awq model for internlm 2_5-20b-chat ? HOT 3
- [Feature] internlm2.5的reward model?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from internlm.