Very impressive work. I would like to ask a question. The paper says that IFD is ineff

How to filter code SFT data? about cherry_llm HOT 2 CLOSED

tianyi-lab commented on July 28, 2024

How to filter code SFT data?

from cherry_llm.

Comments (2)

MingLiiii commented on July 28, 2024

Thank you very much for your interest!
In the paper, we claim that IFD is ineffective for code data because very few code-related samples have a relatively high IFD score, so few of them get selected. If you specifically want code-related data, you can increase the number of code samples chosen. For example, you can calculate the IFD scores directly on the code cluster and select from that cluster, which guarantees the number of code samples you get.
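
A minimal sketch of that suggestion, assuming the two losses per sample (answer loss conditioned on the instruction, and answer loss alone) have already been computed. The IFD score is their ratio, as in the paper. The field names `cond_loss`, `direct_loss`, and `cluster`, the `"code"` label, the helper `select_by_ifd`, and the `max_ifd` cutoff are all hypothetical and not part of the cherry_llm repository.

```python
from typing import Dict, List


def ifd_score(cond_loss: float, direct_loss: float) -> float:
    """IFD = loss of the answer given the instruction / loss of the answer alone."""
    return cond_loss / direct_loss


def select_by_ifd(samples: List[Dict], cluster: str, k: int,
                  max_ifd: float = 1.0) -> List[Dict]:
    """Rank the samples of one cluster by IFD and keep the top k.

    Ranking inside the cluster, rather than over the whole dataset,
    means up to k samples from that cluster are always selected.
    """
    scored = []
    for s in samples:
        # Skip other clusters and samples whose ratio is undefined.
        if s["cluster"] != cluster or s["direct_loss"] <= 0:
            continue
        score = ifd_score(s["cond_loss"], s["direct_loss"])
        if score < max_ifd:  # assumed cutoff; adjust or disable as needed
            scored.append({**s, "ifd": score})
    scored.sort(key=lambda s: s["ifd"], reverse=True)
    return scored[:k]


# Toy data with made-up losses, for illustration only.
samples = [
    {"id": 0, "cluster": "code", "cond_loss": 0.5, "direct_loss": 2.0},
    {"id": 1, "cluster": "code", "cond_loss": 1.5, "direct_loss": 2.0},
    {"id": 2, "cluster": "chat", "cond_loss": 1.8, "direct_loss": 2.0},
    {"id": 3, "cluster": "code", "cond_loss": 1.0, "direct_loss": 2.0},
]
top_code = select_by_ifd(samples, cluster="code", k=2)
```

Here the `chat` sample has the highest ratio overall, but it is never considered because the ranking is restricted to the code cluster.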

wyjksyjs commented on July 28, 2024

Thank you for your reply. Following your suggestion, my understanding is that I should train an initial model separately on a subset of the code data, and then use it to evaluate the IFD scores of the full code data.
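
The workflow described in this reading (which the thread does not confirm) could be sketched roughly as follows. The function `score_full_set` and its `train_fn` and `loss_fn` arguments are hypothetical placeholders for whatever training and loss-computation code is actually used; they are not functions from the cherry_llm repository.

```python
import random
from typing import Callable, Dict, List, Tuple


def score_full_set(
    code_data: List[Dict],
    subset_size: int,
    train_fn: Callable[[List[Dict]], object],
    loss_fn: Callable[[object, Dict], Tuple[float, float]],
    seed: int = 0,
) -> List[Dict]:
    """Train on a random subset, then attach an IFD score to every sample."""
    rng = random.Random(seed)
    subset = rng.sample(code_data, min(subset_size, len(code_data)))
    model = train_fn(subset)  # hypothetical: brief fine-tuning on the subset
    scored = []
    for sample in code_data:
        # loss_fn is assumed to return (conditioned loss, direct loss),
        # with a direct loss greater than zero.
        cond_loss, direct_loss = loss_fn(model, sample)
        scored.append({**sample, "ifd": cond_loss / direct_loss})
    return scored


# Stub usage with made-up losses; the stubs stand in for a real model.
data = [
    {"id": 0, "cond": 1.0, "direct": 4.0},
    {"id": 1, "cond": 3.0, "direct": 4.0},
]
scored = score_full_set(
    data,
    subset_size=1,
    train_fn=lambda subset: len(subset),
    loss_fn=lambda model, s: (s["cond"], s["direct"]),
)
```

Every sample in the full set receives a score, including those in the training subset.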
