Deion Hi here! 🤗 I was wondering what's t

It looks like this change was made in <a class="commit-link" href="https://github.com/

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

About DPO formatting before fine-tuning about alignment-handbook HOT 4 CLOSED

alvarobartt commented on June 9, 2024

About DPO formatting before fine-tuning

from alignment-handbook.

Comments (4)

eryk-mazus commented on June 9, 2024

I noticed the same thing today, while studying their code. Whether that's deliberate or not, I don't think it will hurt the performance if we won't add response tokens to the prompt

from alignment-handbook.

dctanner commented on June 9, 2024

It looks like this change was made in f0ffa0d#diff-0668e2e3ee795fdc034f50182f4719a5f8574357831f2e4705fa730ed2db5831L76 by @lewtun but I can't spot an explanation. It looks delivery, so it's probably safe to assume it doesn't affect performance.

from alignment-handbook.

alvarobartt commented on June 9, 2024

Hi @lewtun, friendly pinging you here!

Did you see any performance issue when adding the generation_prompt as part of the chosen and rejected pairs instead of keeping it within the prompt itself just as the former version? I'll be comparing both approaches, but just wondering whether there's an explanation backing the change, or simply because that worked better during the experiments you ran.

Thanks in advance 🤗

from alignment-handbook.

alvarobartt commented on June 9, 2024

Ok I've already run two full fine-tunes using DPO (similarly to the HuggingFaceH4/zephyr-7b-gemma-v0.1 recipe) and both approaches work similarly, so I guess there are no issues on adding the generation prompt as part of the chosen and rejected pairs, see the wandb screenshot below:

16bit is the full DPO fine-tune where the add_generation_prompt=True and then it's stripped from both chosen and rejected; while 16bit-no-gen-prompt is the full DPO fine-tune where add_generation_prompt=False and the chosen and rejected are tokenized normally.

from alignment-handbook.

Recommend Projects

About DPO formatting before fine-tuning about alignment-handbook HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent