
Dromedary

NeurIPS 2023 (Spotlight)

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

Code License Data License

Introduction

Dromedary is an open-source self-aligned language model trained with minimal human supervision. For comprehensive details and insights, we kindly direct you to our project page and paper.

Dromedary Pipeline

Update: Dromedary-2 (SFT)

The new SELF-ALIGN process in Dromedary-2 involves only two stages. We replace the first stage with diverse user prompts from ShareGPT, Dolly-15k, OpenAssistant, and OpenOrca, and create an improved prompt with one additional exemplar that encourages the LLM AI assistant to generate responses in a general-specific-general style, i.e., initiate with an overview, delve into specifics, and wrap up with a summary. Specifically, we directly take the one-shot exemplar from FastChat as this additional exemplar.

By utilizing the new principle-driven self-alignment prompt, we found that the LLaMA-2 base model with the improved ICL exemplars can achieve enhanced performance even without the verbose cloning phase or inference-time few-shot examples. Therefore, we also drop the last stage of the original SELF-ALIGN process.
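The two-stage process above can be sketched as plain prompt assembly. This is an illustrative sketch, not the repo's actual code: the principle text and exemplar strings below are placeholders (the real prompts live in the repo's `prompts/` directory), and the FastChat one-shot exemplar is stood in for by a dummy string.

```python
# Illustrative sketch of the improved self-align ICL prompt: principles,
# then in-context exemplars (including the extra general-specific-general
# one), then the new user query. All strings here are placeholders.

PRINCIPLES = "1 (ethical). ...\n2 (informative). ..."  # 16 principles in the paper

GSG_EXEMPLAR = (
    "User: What is photosynthesis?\n"
    "Assistant: Here is an overview ... now the specifics ... in summary ..."
)  # stands in for the one-shot exemplar taken from FastChat

def build_self_align_prompt(exemplars, user_query):
    """Concatenate principles, ICL exemplars, and the new query."""
    parts = [PRINCIPLES, *exemplars, f"User: {user_query}\nAssistant:"]
    return "\n\n".join(parts)

prompt = build_self_align_prompt([GSG_EXEMPLAR], "Explain how vaccines work.")
```

The base model's completion of this prompt becomes the synthetic response used for fine-tuning in the second stage.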

Dromedary-2 (RLAIF)

The SALMON (Self-ALignMent with principle-fOllowiNg reward models) training pipeline of Dromedary-2 can be found in the IBM/SALMON repository.

Original Dromedary

The repo for the original Dromedary release is in the dromedary_v1 branch.

Setup

To train your own self-aligned model with the LLaMA base language model, or to perform inference on a number of GPUs other than 1, 2, 4, 8, or 16 (i.e., a GPU count that is not a power of two), you should install our customized llama_dromedary package.

In a conda env with PyTorch / CUDA available, run:

cd llama_dromedary
pip install -r requirements.txt
pip install -e .
cd ..

Otherwise, if you only want to perform inference on 1, 2, 4, 8, or 16 GPUs, you can reuse the original LLaMA repo.

git clone https://github.com/facebookresearch/llama.git
cd llama
pip install -r requirements.txt
pip install -e .
cd ..

In addition, you should at least install the packages required for inference:

cd inference
pip install -r requirements.txt

Model Weights

We release Dromedary weights as delta weights to comply with the LLaMA model license. You can add our delta to the original LLaMA weights to obtain the Dromedary weights. Instructions:

  1. Get the original LLaMA weights in the Hugging Face format by following the instructions here.
  2. Download the LoRA delta weights from our Hugging Face model hub.
  3. Follow our inference guide to see how to deploy Dromedary/LLaMA on your own machine with model parallel (which should be significantly faster than Hugging Face's default pipeline parallel when using multiple GPUs).
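What "adding our delta to the original LLaMA weights" means can be illustrated with a minimal sketch. This is not the repo's conversion script: real usage operates on Hugging Face checkpoints (and the released deltas are LoRA adapters merged by the repo's tooling); the sketch below only shows the underlying arithmetic on toy lists standing in for parameter tensors.

```python
# Minimal sketch: each merged parameter is the base parameter plus the
# released delta. Toy lists stand in for tensors; the repo's scripts do
# this on actual Hugging Face checkpoints.

def apply_delta(base_state, delta_state):
    """Return a new state dict with base + delta for every parameter."""
    assert base_state.keys() == delta_state.keys()
    return {
        name: [b + d for b, d in zip(base_state[name], delta_state[name])]
        for name in base_state
    }

base = {"w": [0.5, -0.25], "b": [0.0]}
delta = {"w": [0.25, 0.25], "b": [0.125]}
merged = apply_delta(base, delta)  # {"w": [0.75, 0.0], "b": [0.125]}
```

Releasing only the delta (rather than the merged weights) is what keeps the distribution compliant with the LLaMA model license.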

Synthetic Data for Self-Align

We release the synthetic data used to train Dromedary-65b (final) in Hugging Face Datasets Hub here.

The instructions are generated by the base LLaMA model with the (Topic-Guided Red-Teaming) Self-Instruct framework, while the responses are generated by the Dromedary (non-verbose) model prompted with the verbose prompt.

Update: We also release the synthetic data used to train Dromedary-2-70b (SFT) in Hugging Face Datasets Hub here.

Inference

We provide a chatbot demo for Dromedary.

Training

We provide the full training pipeline of Dromedary for reproduction.

Prompts

All the human annotations used in this project can be found here.

Citation

Please cite the following paper if you use the data or code in this repo.

@inproceedings{sun2023principle,
      title = {Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision},
      author = {Sun, Zhiqing and Shen, Yikang and Zhou, Qinhong and Zhang, Hongxin and Chen, Zhenfang and Cox, David and Yang, Yiming and Gan, Chuang},
      booktitle = {Thirty-seventh Conference on Neural Information Processing Systems},
      year = {2023},
      url = {https://openreview.net/forum?id=p40XRfBX96},
}

Acknowledgements

We thank Yizhong Wang for providing the code for the parse analysis plot. We also thank the Meta LLaMA team, Stanford Alpaca team, Vicuna team, Alpaca-LoRA, QLoRA team, and Hugging Face PEFT for their open-source efforts in democratizing large language models.

Contributors

codeaimcts, dchichkov, edward-sun, eltociear

Issues

Red team instructions without red-team chain of thought?

Thank you for the super interesting paper! I really appreciate the idea of using challenging synthetic data to improve alignment.

One question: I see in https://github.com/IBM/Dromedary/blob/main/training/step1_topic_guided_red_teaming_self_instruct/merge_all_synthetic_inputs.py that all the synthetic data is merged into one file, and the ICL step does not use any hints from the red-teamed data to prompt chain of thought. Can you say a bit about the reasoning behind this decision?

An alternative could be to use information about the kind of red-teamed prompt to guide the chain-of-thought reasoning in the self-align prompt; e.g., questions that require knowledge of future events could trigger your rule about (dated knowledge)?
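The alternative proposed above could be sketched as a simple routing table. This is hypothetical code illustrating the question's suggestion, not anything in the repo: the type strings and hint texts are made up for illustration.

```python
# Hypothetical sketch of the suggestion: map each TGRT instruction type
# to a hint naming the relevant principle, and prepend that hint to the
# self-align prompt to steer the chain-of-thought.

TYPE_TO_HINT = {
    "Instructions that require knowledge of future events":
        "Remember your rule about dated knowledge when reasoning.",
    "Instructions that require legal expertise":
        "Remember your rule about recommending professional help.",
}

def hint_for(instruction_type):
    """Return a principle hint for a red-teamed type, or '' if none applies."""
    return TYPE_TO_HINT.get(instruction_type, "")
```

Vanilla (non-red-teamed) instructions would fall through to the empty hint and use the unmodified prompt.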

About the way to generate 99,121 synthetic prompts from TGRT Self-Instruct

Hello, thanks for your awesome work and code!

However, I encountered some confusion while trying to understand how you generated the TGRT Self-Instruct data. You mention in the paper that you first handwrote 20 instruction types and then generated topics from these types. Finally, instructions were generated from each "instruction type - topic" pair.

Therefore, my first question is:
How many topics did you generate for each instruction type? I see in Appendix G that your prompt generates 10 topics per instruction type.

My second question is:
How many instructions are generated for each "instruction type - topic" pair? Since you end up with 99,121 synthetic prompts from TGRT Self-Instruct, if every "instruction type - topic" pair generates only one instruction, does that mean you generated at least 99,121 topics?

Thanks a lot for your help!

adapter_name problem

Hi! I encountered an issue when doing Step 3 (SFT).

The function get_accelerate_model in qlora_model.py sets adapter_name="lora_default". This results in an error where the number of trainable parameters is 0.0 rather than the expected 1.6% of the full parameters:

from argparse import Namespace
from typing import Optional

def get_accelerate_model(
    args: Namespace,
    checkpoint_dir: Optional[str] = None,
    adapter_name="lora_default",
    is_trainable=True,
    reuse_base_model=False,
):

I fixed this by setting adapter_name="default". I am fine-tuning a llama-2-7b-hf model, and I wonder whether this is a bug or an issue caused by the different fine-tuned model size (7B vs. 70B).
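A quick way to diagnose this symptom is to compute the trainable-parameter fraction after loading. The helper below is a generic sketch (not from the repo); it takes plain `(numel, requires_grad)` pairs so it can be fed `[(p.numel(), p.requires_grad) for p in model.parameters()]` from any PyTorch model.

```python
# Sketch of a diagnostic for the adapter_name issue: the fraction of
# parameters that are trainable. With adapter_name="lora_default" the
# LoRA parameters reportedly never become trainable (fraction 0.0);
# with "default" it should be roughly 0.016 (1.6%) for the 7B model.

def trainable_fraction(params):
    """params: iterable of (numel, requires_grad) pairs."""
    total = trainable = 0
    for numel, requires_grad in params:
        total += numel
        if requires_grad:
            trainable += numel
    return trainable / total if total else 0.0
```

Printing this fraction right after `get_accelerate_model` returns makes the misconfiguration obvious before any training time is spent.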

About vicuna_dummy_data.json lack 'example_id'

Hi! I encountered a bug when doing Step 3 (Principle Engraving). I used self_align_merged.json, which is created from "self_align_32shards_*.jsonl" and "vicuna_dummy_data.json", to fine-tune the base model.

However, I found that the items in vicuna_dummy_data.json do not have 'example_id' labels. This causes a bug when executing the function extract_dromedary_dataset:

def extract_dromedary_dataset(example, meta_prompts):
    assert "example_id" in example
    total_meta_prompt = len(meta_prompts)
    meta_prompt = meta_prompts[int(example["example_id"]) % total_meta_prompt]
    if example.get("input", "") != "":
        prompt_format = DROMEDARY_PROMPT_DICT["prompt_input"]
    else:
        prompt_format = DROMEDARY_PROMPT_DICT["prompt_no_input"]
    return {
        "input": prompt_format.format(meta_prompt=meta_prompt, **example),
        "output": "\n" + example["output"],
    }

The vicuna_dummy_data items are all labeled "example_id" = None, which results in an int() error.

Therefore, I wonder how to deal with this issue and correctly obtain the example_ids for vicuna_dummy_data. Thanks a lot for your reply!
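One possible workaround (an assumption on my part, not the official fix): assign sequential integer `example_id` values to the dummy records before merging, so `int(example["example_id"])` in `extract_dromedary_dataset` succeeds and the modulo over `meta_prompts` still cycles through all prompts.

```python
# Hypothetical workaround: backfill missing/None example_id fields with
# sequential integers before merging vicuna_dummy_data.json into the
# fine-tuning set.

def fill_example_ids(records, start=0):
    """Replace missing or None example_id fields with sequential ints."""
    for offset, record in enumerate(records):
        if record.get("example_id") is None:
            record["example_id"] = start + offset
    return records

dummy = [{"example_id": None, "output": "hi"}, {"output": "ok"}]
fill_example_ids(dummy, start=100)
```

Starting the counter above the largest id in self_align_merged.json keeps the dummy ids distinct, though only the value modulo the number of meta prompts actually matters.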

Topic-guided red-teaming: issues with prompt/examples

  1. Topic-guided red-teaming

I am trying to understand how this step works, and some differences between what the paper says and what the prompt implies.

The paper seems to imply this step is mainly for guiding instruction generation along topics, including difficult-to-answer ones, but the prompt explicitly asks for all instructions to be of the type

a machine learning model can't answer, or will answer with the wrong facts.

However, the examples are a mix, with e.g. the following ones being quite reasonable to expect a model to be able to answer nowadays.

type: Instructions that require historical knowledge, topic: Battle of Waterloo, instruction: What was the significance of the Battle of Waterloo in European history?
type: Instructions that require technology knowledge, topic: Storage, instruction: What is the difference between a solid-state drive (SSD) and a hard disk drive (HDD)?

Could you give me some background on what led to the prompt being based on all "impossible" questions?

  2. Balance of vanilla vs. topic-based
    How many vanilla and how many topic-based samples did you generate, and are these datasets available anywhere?

  3. Count in prompt
    This is fairly minor, but the prompt has

* 20 Hints:
* 20 Instructions:

But you run this with 10 items, correct?

Thank you for any clarifications.

What should a human annotator do?

Maybe a stupid question, but I am not sure which prompt messages I should edit as a human annotator.

https://github.com/IBM/Dromedary/tree/main/prompts

It seems I should give customized hints and instructions in the tgrt_self_instruct_question_generation_prompt.txt file?

What about the seed tasks? Can they be customized to specific areas?

I am not sure I understand what I should touch and what I should not touch.

How many GPUs were used for training?

Thank you very much for all the info provided.

But the one thing I couldn't find is how many GPUs (and which GPUs) you used to run finetune.py.

It'd be great to see the deepspeed/torchrun command you executed to do the training.

evaluation code

Hi, thanks for your excellent work.

Could you please share your evaluation code for TruthfulQA and BBH?

Time & Cost for this training

Hi,

What are the approximate time and cost of training a Dromedary-65B model?

You have set a time limit of 6 hours for each training step, and you are using a 64 * 6 V100 GPU cluster. So the training cost would be approximately one day of renting a 64 * 6 V100 cluster?
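The question's arithmetic can be made explicit. Note the 6-hour limit and the 64 * 6 V100 figure come from the question itself, not from a confirmed training config, so this is just a back-of-the-envelope sketch.

```python
# Back-of-the-envelope GPU-hour estimate using the figures quoted in
# the question (64 * 6 = 384 V100s, 6-hour limit per training step).

def gpu_hours(num_gpus, hours):
    """Total GPU-hours consumed by num_gpus running for `hours` hours."""
    return num_gpus * hours

per_step = gpu_hours(64 * 6, 6)  # 384 GPUs * 6 h = 2304 GPU-hours per step
```

Multiplying `per_step` by the number of training steps and a cloud provider's per-GPU-hour rate would give the rough rental cost the question is asking about.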

I am trying to replicate a 7B LLaMA model with the Dromedary pipeline.

Also, looking forward to seeing you share the evaluation code mentioned in the TODO list soon 👍
