Comments (5)
Thanks for your interest in our work!
The direct answer to your Q1 is YES. We found that the best way to train a pre-experienced model is to take diversity into account, so we compute embeddings for all the data and select a subset by diversity.
However:
1. If your base model is already quite powerful, you can skip the pre-experienced model and run the cherry_analysis directly on the base model.
2. You can also randomly sample some data to train the pre-experienced model. Though not as good as selecting by diversity, it still works.
3. You can also use other quick methods to account for diversity, for example, Sentence-BERT embeddings + K-means clustering.
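A minimal sketch of option 3, diversity selection via clustering. To keep it self-contained, a plain NumPy k-means stands in for scikit-learn's, and toy 2-D vectors stand in for real Sentence-BERT embeddings; the idea is simply "cluster all embeddings, keep one representative per cluster":

```python
import numpy as np

def select_diverse(embeddings, k, n_iter=50, seed=0):
    """Run k-means on the embeddings and return the index of the
    sample nearest to each centroid, i.e. k diverse representatives."""
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=float)
    # initialise centroids with k distinct random samples
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign every sample to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=-1)
        labels = dists.argmin(axis=1)
        # move each non-empty centroid to the mean of its cluster
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    # representative = sample nearest to each final centroid
    dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=-1)
    return sorted(set(dists.argmin(axis=0)))

# toy 2-D "embeddings" forming three well-separated groups
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 9.0]])
print(select_diverse(emb, k=3))  # one representative index per cluster
```

In practice you would replace `emb` with the embeddings of your full SFT dataset and set `k` to the small training-set size you want for the pre-experienced model.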
For the second question, I don't know which base model and which SFT data you are using, so I cannot give a definite answer. But in most situations, you don't need to modify it.
from cherry_llm.
Thank you for your reply!
Because the text earlier describes "Learning from Brief Experience" as selecting only a small amount of data, I'm not sure it's right to put all the data into it for training.
In addition, training on the full data takes a long time.
I'll try it. Thank you.
Ah, I am not sure if there is still a misunderstanding.
The pre-experienced model indeed only needs a small amount of data. The "pre_experience_analysis.sh" script you were asking about does not "put all the data into it for training"; it just selects a suitable small subset of the data for training the pre-experienced model.
Thank you for your reply.
Maybe I didn't phrase my question accurately.
The "pre_experience_analysis.sh" script does not itself perform training. It computes embeddings for all the SFT data (via "get_perplexity_and_embedding_whole_text"), and then the "pre_experience_selection.sh" script performs the clustering.
Is my understanding correct?
Thank you again
Yes, I think you are correct~