
Comments (4)

MingLiiii commented on July 28, 2024

Thanks for your interest!

Here are a few questions to help me understand what is going on:

  1. Which codebase are you using for SFT?
  2. Can you let me know your exact training settings?
  3. How large exactly is the performance gap?
  4. I don't think we showcase the performance of llama2 fine-tuning on cherry_data_v1, so which set of results are you comparing your models against?
  5. Previously there was indeed another user running incorrect scripts. Can you send me the scripts you used in lm-evaluation-harness for evaluation?

As for "the data splits in cherry_data_v1 are not exactly 5%, 10%, or 15%": yes, that is correct. It is caused by our previous filtering mechanism. That is why the paper says "approximately 5% data" and so on, and reports the exact sample counts.

from cherry_llm.

Cheungki commented on July 28, 2024

Thx for your quick reply.

  1. I used llama-factory for SFT.
  2. I trained the three models under the same settings as yours: batch_size 128, learning_rate 2e-5, num_train_epochs 3, warmup_ratio 0.03, max_length 2048. Due to hardware limitations, I used fp16 rather than bf16 for training.
  3. As for the performance gap, please check the attached files below.
    cherry_5percent.json
    cherry_10percent.json
    cherry_15percent.json
  4. I'm comparing mine with the results you report here.
  5. I used the following command in lm-evaluation-harness for evaluation:
     CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --main_process_port 29999 -m lm_eval --model hf --model_args pretrained={sft_model_path} --tasks mmlu,ai2_arc,hellaswag,truthfulqa --batch_size 1 --output_path {log_path}
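As a sanity check on the batch_size 128 setting above, the effective batch size decomposes as per_device_batch × num_gpus × gradient_accumulation_steps. A minimal sketch, where the per-device batch size of 8 is an assumed example value (not stated anywhere in this thread):

```python
# Sanity check: how many gradient-accumulation steps reproduce an
# effective batch size of 128 on 4 GPUs. The per-device batch size of 8
# is an assumed example value, not taken from the thread.
def grad_accum_steps(target_batch, per_device_batch, num_gpus):
    # The target must be divisible by the per-step batch across devices.
    assert target_batch % (per_device_batch * num_gpus) == 0
    return target_batch // (per_device_batch * num_gpus)

print(grad_accum_steps(128, 8, 4))  # 4 accumulation steps
```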


MingLiiii commented on July 28, 2024

Thanks for your reply~

  1. Since you are using a different codebase, there is a good chance that different prompts lead to different performance in lm-evaluation-harness, as it does not support customized prompts. As mentioned, for the llama2 models we used the Vicuna prompt for training.
  2. I think the settings are the same as ours.
  3. N/A
  4. We are sorry for the misunderstanding. The results in that table do not use cherry_data_v1 (whose IFD scores were calculated with llama1); instead, their IFD scores were calculated with llama2-7b or llama2-13b. So there will be gaps if you use the cherry_data_v1 data.
    The data with IFD scores calculated on llama2-7b or llama2-13b was also released recently; please check: Alpaca llama2 7b, Alpaca llama2 13b, WizardLM70k llama2 7b, WizardLM70k llama2 13b.
    You might need to sort the data by IFD score yourself.
  5. It seems that all of your evaluation runs are zero-shot; however, according to the open_llm_leaderboard site, most of these tasks are evaluated few-shot. I think this is the main reason for the gap.
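Sorting the released data by IFD score and keeping a top fraction can be sketched as follows. Note that the field name `ifd_score` and the list-of-dicts layout are assumptions for illustration, not the released files' actual schema — adjust to match the real data:

```python
# Hypothetical sketch: keep the top ~5% of samples by IFD score.
# The "ifd_score" field and the JSON layout are assumed, not the
# released files' actual schema.
def top_percent_by_ifd(samples, percent=5.0):
    """Sort descending by IFD score and keep the top `percent` of samples."""
    ranked = sorted(samples, key=lambda s: s["ifd_score"], reverse=True)
    k = max(1, int(len(ranked) * percent / 100))
    return ranked[:k]

# Toy usage: 100 samples with scores 0.00 .. 0.99.
samples = [{"instruction": f"q{i}", "ifd_score": i / 100} for i in range(100)]
cherry = top_percent_by_ifd(samples, percent=5.0)
print(len(cherry))  # 5
```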

To conclude, the main reason is that you are not using the few-shot settings specified by the open_llm_leaderboard. Besides, it is better to use the data whose IFD scores were calculated on llama2-7b or llama2-13b.
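For reference, building per-task few-shot commands could look like the sketch below. The shot counts reflect my understanding of the v1 open_llm_leaderboard settings and should be double-checked against its About page; the command template is a simplified single-process variant of the one quoted earlier in the thread:

```python
# Open LLM Leaderboard (v1) shot counts, as I understand them:
# ARC 25-shot, HellaSwag 10-shot, MMLU 5-shot, TruthfulQA 0-shot.
# Double-check against the leaderboard's "About" page.
FEWSHOT = {"ai2_arc": 25, "hellaswag": 10, "mmlu": 5, "truthfulqa": 0}

def eval_command(task, model_path, num_fewshot):
    # lm-evaluation-harness sets the shot count via --num_fewshot,
    # so each task needs its own run when the counts differ.
    return (
        "lm_eval --model hf "
        f"--model_args pretrained={model_path} "
        f"--tasks {task} --num_fewshot {num_fewshot} --batch_size 1"
    )

for task, shots in FEWSHOT.items():
    print(eval_command(task, "{sft_model_path}", shots))
```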


Cheungki commented on July 28, 2024

I see, I'll try this later. Thanks again.

