Comments (11)

RangiLyu commented on September 25, 2024

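My guess is that the model overfit somewhat during RLHF, which made it overly helpful. This typically shows up as too much extra content before and after the actual answer, so the model can't follow instructions strictly.
Names being translated as 书生浦语 (InternLM's Chinese name) is probably also caused by overfitting: too much self-identity data during training made the probability of "书生浦语" following the tokens "我的名字是" ("my name is") far too high.
If the chat model really can't be corrected, consider trying the chat-sft model, which has not been through RL. I'm not sure whether that will be any better, though.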

from internlm.

sanbuphy commented on September 25, 2024

If spaces are avoided, the behavior seems to improve.
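
The comment doesn't say which spaces are meant. One possible reading, and this is only an assumption, is stray leading/trailing whitespace and runs of blanks in the text sent to the model. A minimal sketch under that assumption (`normalize_whitespace` is a hypothetical helper, not part of any library):

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse each run of whitespace into one space and trim both ends.

    Assumption: the "spaces" in question are redundant whitespace in the
    input text. Spaces between words are kept.
    """
    return re.sub(r'\s+', ' ', text).strip()


cleaned = normalize_whitespace('  Hello,   world. \n')
print(repr(cleaned))
```

Note that this also flattens newlines, so it would not suit input where line breaks carry meaning.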

RangiLyu commented on September 25, 2024

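For the problem of the model repeating translator_system_prompt, how about trying it this way? Put the system prompt in the system role, and also strengthen the wording of the instruction: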

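# One conversation per batch entry: the instruction goes in the system role,
# and the user role carries only the text to translate.
prompts = [[
    {
        'role': 'system',
        # "Translate the following text into Chinese; return only the
        # translation, with no extra content."
        'content': '把下列文字翻译成中文,只返回给我翻译结果,不要输出任何额外内容'
    },
    {
        'role': 'user',
        'content': '待翻译的文本'  # placeholder: the text to translate
    },
]]
response = self.model(prompts, gen_config)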
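
As a rough sketch of the same layout for more than one input, the helper below builds one conversation per text, with the instruction in the `system` role and only the text to translate in the `user` role. The names `build_translation_prompts` and `SYSTEM_PROMPT` are made up for illustration. Passing the result to the model is assumed to work the same way as the `self.model(prompts, gen_config)` call above.

```python
from typing import Dict, List

# Hypothetical constant holding the strengthened instruction from the comment:
# "Translate the following text into Chinese; return only the translation,
# with no extra content."
SYSTEM_PROMPT = '把下列文字翻译成中文,只返回给我翻译结果,不要输出任何额外内容'

Message = Dict[str, str]


def build_translation_prompts(
    texts: List[str],
    system_prompt: str = SYSTEM_PROMPT,
) -> List[List[Message]]:
    """Return one [system, user] conversation per input text."""
    return [
        [
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': text},
        ]
        for text in texts
    ]


prompts = build_translation_prompts(['Hello, world.', 'My name is Tom.'])
print(len(prompts), prompts[0][0]['role'], prompts[0][1]['role'])
```

Keeping the instruction out of the `user` message means the user turn contains nothing but the source text, which makes it easier to see whether the model is echoing the instruction back.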

sanbuphy commented on September 25, 2024

Still no improvement, sadly. I'm still seeing similar behavior.

image

sanbuphy commented on September 25, 2024

image
Sometimes this kind of problem shows up as well.

sanbuphy commented on September 25, 2024

So it sounds like we'll have to wait for the next release?

github-actions commented on September 25, 2024

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 7 days if the stale label is not removed or if there is no further response.

github-actions commented on September 25, 2024

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 7 days if the stale label is not removed or if there is no further response.

github-actions commented on September 25, 2024

This issue is closed because it has been stale for 7 days. Please open a new issue if you have similar issues or you have any new updates now.

hotmengmeng commented on September 25, 2024

Did you manage to solve this? What was the problem? I'm running into the same issue.

lvhan028 commented on September 25, 2024

Hi @hotmengmeng, which version of lmdeploy are you using?
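
One way to look up the installed version, sketched with the standard library only. It assumes the package was installed under the distribution name `lmdeploy`; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = 'lmdeploy') -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


lmdeploy_version = installed_version()
print(lmdeploy_version or 'lmdeploy is not installed in this environment')
```

Running `pip show lmdeploy` in the same environment should report the same version.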
