
Multi-task LLaMA

This repository provides training and evaluation code for Multi-task LLaMA (MT-LLaMA), a LLaMA variant fine-tuned on multi-task natural language prompted datasets.

# Sentiment Analysis:
Human: a film made with as little wit, interest, and professionalism as artistically possible for a slummy hollywood caper flick. Based on this review, would the user recommend this product? No or Yes?
MT-LLaMA: no

# Multi-Choice Question Answering:
Human: There is most likely going to be fog around: Which is the correct answer? - (A) a marsh - (B) a tundra - (C) the plains - (D) a desert
MT-LLaMA: A

# Sentence Selection:
Human: The flame on the candle went out. This happened because... Help me pick the more plausible option: - I blew on the wick.. - I put a match to the wick.
MT-LLaMA: I blew on the wick..

# Textual Entailment:
Human: The new rights are nice enough Question: Does this imply that Everyone really likes the newest benefits? Please respond with 'Yes', 'No', or 'Maybe'.
MT-LLaMA: Yes, everyone likes the newest benefits.

Model Weights

Our MT-LLaMA is available on the Hugging Face Model Hub. Because access to the original LLaMA weights must be requested separately, we release only the weight delta against the original LLaMA weights.

  • Get the original LLaMA weights in the huggingface format by following the instructions here.
  • Download the weight delta from here and obtain the weights of MT-LLaMA by adding delta weights to the original ones.
python3 -m mtllama.model.apply_delta \
    --base-model-path /path/to/llama-7b \
    --target-model-path /output/path/to/mt-llama-7b \
    --delta-path /path/to/mt-llama-7b-delta
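The merge itself is just elementwise addition of the delta to the base weights, parameter by parameter. A minimal sketch of that idea, using plain Python lists in place of torch tensors (the real script operates on state dicts of tensors, but the arithmetic is the same):

```python
def apply_delta(base, delta):
    """Merge delta weights into base weights, key by key (toy sketch)."""
    assert base.keys() == delta.keys(), "parameter names must match"
    return {name: [b + d for b, d in zip(base[name], delta[name])]
            for name in base}

# Hypothetical two-parameter "model" for illustration:
base = {"layer.weight": [0.10, -0.20], "layer.bias": [0.00]}
delta = {"layer.weight": [0.05, 0.30], "layer.bias": [0.01]}
merged = apply_delta(base, delta)
```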

Prerequisite

# install pytorch
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

# install flash-attention
pip install flash-attn==0.2.8

# install other dependencies
pip install -r requirement.txt

Train Models

We fine-tune LLMs with human language instructions from the massive task collection P3 (i.e., the T0 training mixture). Concretely, the datasets used during training, grouped by task, are listed below:

  • Multi-choice QA: CommonsenseQA, Cosmos QA, DREAM, QuAIL, QuaRTz, QASC, QuaRel, SciQ, Social IQA, Wiki Hop, WiQA
  • Extractive QA: Adversarial QA, DuoRC, Quoref, ROPES
  • Closed-Book QA: Hotpot QA, Wiki QA
  • Sentiment Classification: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp
  • Topic Classification: AG News, DBPedia, TREC
  • Structure-to-Text Generation: Common Gen, Wiki Bio
  • Text Summarization: CNN Daily Mail, Gigaword, MultiNews, SamSum, XSum
  • Paraphrase Identification: MRPC, PAWS, QQP

Sampling Strategy

We exactly follow the downsampling strategy of BigScience T0 to build the multi-task training mixture. Specifically, any dataset with more than 500,000 examples is treated as having 500,000 / num_templates examples, where num_templates is the number of prompt templates created by BigScience T0 for that dataset. We then sample in proportion to these (capped) dataset sizes. In total, we sample 20,000,000 examples for training.
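A small sketch of this capping-and-normalization rule, with hypothetical dataset sizes and template counts chosen for illustration:

```python
CAP = 500_000
TOTAL_SAMPLES = 20_000_000

def effective_size(n_examples, num_templates):
    # Datasets with over 500,000 examples are treated as having
    # 500,000 / num_templates examples; smaller datasets keep their size.
    if n_examples > CAP:
        return CAP / num_templates
    return n_examples

# Hypothetical mixture: one very large dataset, one small one.
datasets = {"giga_corpus": (3_000_000, 10), "small_qa": (50_000, 5)}
sizes = {name: effective_size(n, t) for name, (n, t) in datasets.items()}
total = sum(sizes.values())

# Per-dataset sampling quota, proportional to normalized effective size.
quota = {name: round(TOTAL_SAMPLES * s / total) for name, s in sizes.items()}
```

Here the capped large dataset (3M examples, 10 templates) ends up with the same effective size as the small one, so the two split the 20M sample budget evenly.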

Hyperparameter Settings

| Effective Batch Size | Learning Rate | Steps | Max Length | Weight Decay |
|---|---|---|---|---|
| 16 | 5e-6 | 300K | 2048 | 0 |

Zero-shot Evaluation

We primarily follow the protocols of BigScience T0 to assess the generalization capability of our Multi-task LLaMA to: (1) Unseen Datasets (i.e., datasets from seen tasks); (2) Unseen Tasks.

Prompt Format

Extractive QA:

  1. XQuAD, TyDiQA, MLQA, SQuAD
     Input: Answer the question according to the context. Question: ${question}. Context: ${context}. Answer:
     Output: ${answer}
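The `${...}` placeholders in these templates map directly onto Python's `string.Template` substitution, so rendering a prompt is a one-liner. A sketch with a hypothetical question/context pair:

```python
from string import Template

# The documented Extractive QA template, expressed as a string.Template.
extractive_qa = Template(
    "Answer the question according to the context. "
    "Question: ${question}. Context: ${context}. Answer:"
)

# Hypothetical inputs for illustration:
prompt = extractive_qa.substitute(
    question="What forms over marshes at dawn",
    context="Fog typically forms over low-lying marshes at dawn",
)
```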
    

Sentiment:

  1. SST-2
    Input: ${sentence} Based on this review, would the user recommend this product? No or Yes?
    Output: Yes / No
    

Multiple-Choice QA:

  1. OpenbookQA
    Input: ${question} Which is the correct answer? - (A) ${choiceA} - (B) ${choiceB} - (C) ${choiceC} - (D) ${choiceD}
    Output: ${choiceA} / ${choiceB} / ${choiceC} / ${choiceD}
    

Sentence Completion:

  1. COPA
    Input: ${premise} {% if question == "cause" %} This happened because... {% else %} As a consequence... {% endif %} Help me pick the more plausible option: - ${text1} - ${text2}
    Output: ${text1} / ${text2}
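The COPA template branches on the question type: "cause" asks why the premise happened, while any other value ("effect") asks what follows from it. A plain-Python sketch of that conditional, without a Jinja dependency:

```python
def copa_prompt(premise, question, text1, text2):
    # Mirrors the {% if question == "cause" %} branch in the template.
    connective = ("This happened because..." if question == "cause"
                  else "As a consequence...")
    return (f"{premise} {connective} Help me pick the more plausible option: "
            f"- {text1} - {text2}")
```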
    

Coreference Resolution:

  1. Winogrande:
    Input: ${sentence} In the previous sentence, does _ refer to ${option1} or ${option2}?
    Output: ${option1} / ${option2}
    

Word Sense Disambiguation:

  1. WiC
    Input: Does the word "${word}" have the same meaning in these two sentences? Yes, No? ${sentence1} ${sentence2}
    Output: Yes / No
    

Natural Language Inference:

  1. MNLI:
    Input: ${premise} Question: Does this imply that ${hypothesis}? Please respond with 'Yes', 'No', or 'Maybe'.
    Output: Yes / No / Maybe
    
  2. RTE
    Input: Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?
    Output: Yes / no
    

Results on Unseen Datasets

| Model | XQuAD-en (F1/EM) | TyDiQA-en (F1/EM) | MLQA-en (F1/EM) | SQuAD (F1/EM) | SST-2 (Acc.) | OpenbookQA (Acc.) |
|---|---|---|---|---|---|---|
| LLaMA-7b | 9.5 / 2.0 | 14.3 / 2.6 | 13.4 / 3.3 | 29.4 / 11.5 | 50.5 | 32.4 |
| MT-LLaMA-7b | 42.3 / 31.1 | 38.9 / 26.9 | 45.4 / 31.5 | 85.9 / 77.6 | 92.6 | 38.2 |
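The F1/EM columns are the SQuAD-style extractive-QA metrics: exact match of the normalized answer string, and token-overlap F1. A sketch of how these are conventionally computed (full SQuAD normalization also strips articles and punctuation, which is omitted here):

```python
from collections import Counter

def exact_match(pred, gold):
    # 1 if the normalized strings are identical, else 0.
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    # Token-overlap F1 between prediction and reference.
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

For example, "a marsh" against the reference "the marsh" shares one token out of two on each side, giving an F1 of 0.5 with an exact match of 0.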

Results on Unseen Tasks

| Model | COPA (Acc.) | Winogrande (Acc.) | WiC (Acc.) | MNLI (Acc.) | RTE (Acc.) |
|---|---|---|---|---|---|
| LLaMA-7b | 56.0 | 49.3 | 51.7 | 30.2 | 52.7 |
| MT-LLaMA-7b | 88.0 | 54.9 | 52.2 | 49.6 | 79.1 |

Acknowledgement

  • Our training code is largely borrowed from FastChat
  • We are also grateful for the efforts behind LLaMA (from FAIR) and T0 (from BigScience), which serve as the foundation of our work

If you find this resource useful, please cite the repo as follows:

@software{damonlpsg2023mtllama,
  author = {Xu, Weiwen and Li, Xin and Bing, Lidong},
  title = {Multi-task Instruction-tuned LLaMA},
  year = 2023,
  url = {https://github.com/DAMO-NLP-SG/MT-LLaMA}
}

