
flan-alpaca-lora's Introduction

๐Ÿฎ๐Ÿฆ™๐ŸคFlan-Alpaca-LoRA: Instruction Tuning from Humans and Machines with Low-Rank Adaptation

This repo trains google/flan-t5 on the Alpaca dataset with low-rank adaptation (LoRA), which reduces the GPU memory required and speeds up training.
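
To get a feel for how little of the model is actually trained, the sketch below attaches a LoRA adapter to Flan-T5 with peft and prints the trainable parameter count. The hyperparameters (r, alpha, dropout) are illustrative assumptions, not necessarily the exact settings in this repo's train.py.

import transformers
from peft import LoraConfig, TaskType, get_peft_model

# Load the frozen base model; only the small LoRA adapter weights are trained.
base_model = transformers.AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Example LoRA hyperparameters (assumed values); peft picks default target
# modules ("q", "v") for T5 when target_modules is not set.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
)
peft_model = get_peft_model(base_model, lora_config)

# Reports on the order of 0.9M trainable parameters for flan-t5-base,
# consistent with the adapter params column in the table below.
peft_model.print_trainable_parameters()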

Jun 17, 2023: Added a notebook. You can now try flan-alpaca-lora via the Open In Colab notebook.

May 3, 2023: Trained flan-t5-xl on the alpaca-gpt4 dataset.

Apr 13, 2023: Trained flan-t5-xl on the GPTeacher dataset (Instruct and Roleplay), which seems to perform well.

Apr 5, 2023: Trained flan-t5-xxl using 8-bit quantization, so the model fits on a single RTX 3090 GPU (see the 8-bit loading sketch after the table below). All of the models can be found on Hugging Face.

model                     adapter params  data            GPU   training time
flan-alpaca-lora-base     0.9M            alpaca cleaned  3090  20 mins
flan-alpaca-lora-large    2.4M            alpaca cleaned  3090  50 mins
flan-alpaca-lora-xl       4.7M            alpaca cleaned  3090  2.5 hrs
flan-alpaca-lora-xxl      9.4M            alpaca cleaned  3090  10 hrs
flan-gpteacher-lora-xl    4.7M            GPTeacher       3090  80 mins
flan-alpaca-gpt4-lora-xl  4.7M            alpaca-gpt4     3090  3.25 hrs
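
For the xxl model, the base weights are too large for a 24 GB card in full precision, so 8-bit loading is the practical route at inference time as well. Below is a minimal sketch, assuming bitsandbytes and accelerate are installed; the adapter id is assumed by analogy with the -large example in the usage section, so check the Hugging Face Hub for the exact name.

import transformers
from peft import PeftModel

model_name = "google/flan-t5-xxl"
# Adapter id assumed by analogy with reasonwang/flan-alpaca-lora-large below.
peft_model_id = "reasonwang/flan-alpaca-lora-xxl"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
# load_in_8bit quantizes the frozen base weights with bitsandbytes so the
# ~11B-parameter model fits on a single 24 GB GPU; device_map places it automatically.
base_model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    model_name, load_in_8bit=True, device_map="auto"
)
peft_model = PeftModel.from_pretrained(base_model, peft_model_id)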

Dependencies

torch==1.13.1
transformers==4.29.1
peft==0.3.0
bitsandbytes==0.38.1
accelerate==0.19.0

The newest versions of these packages should also work fine.

Training

The following command fine-tunes Flan-T5-base in only about 20 minutes on a single RTX 3090 GPU:

python train.py \
    --model_name_or_path google/flan-t5-base \
    --data_path ./alpaca_data_cleaned.json \
    --bf16 True \
    --output_dir ./ckpts/ \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "no" \
    --learning_rate 5e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 50 \
    --tf32 True
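
Under the hood, each Alpaca record (instruction, optional input, output) has to be rendered into an encoder prompt and decoder labels. The exact prompt template lives in the repo's train.py; the sketch below shows one common way to do it with the datasets library (the template wording, max lengths, and use of datasets here are assumptions, not the repo's exact choices).

import transformers
from datasets import load_dataset

tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-base")
dataset = load_dataset("json", data_files="./alpaca_data_cleaned.json")["train"]

def preprocess(example):
    # Assumed prompt template; train.py may word this differently.
    prompt = example["instruction"]
    if example.get("input"):
        prompt += "\n" + example["input"]
    model_inputs = tokenizer(prompt, truncation=True, max_length=512)
    # For seq2seq models the target text is tokenized separately as labels.
    labels = tokenizer(example["output"], truncation=True, max_length=512)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)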

Example usage:

import transformers
from peft import PeftModel

# peft_model_id can be a local save directory or a Hugging Face model id.
model_name = "google/flan-t5-large"
peft_model_id = "reasonwang/flan-alpaca-lora-large"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
base_model = transformers.AutoModelForSeq2SeqLM.from_pretrained(model_name)
peft_model = PeftModel.from_pretrained(base_model, peft_model_id)

# Input an instruction or any other question.
inputs = tokenizer("List a few tips to get good scores in math.", return_tensors="pt")
outputs = peft_model.generate(**inputs, max_length=128, do_sample=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
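
The snippet above runs generation on the CPU. If a GPU is available, moving the model and inputs over is usually worthwhile; a minimal addition, assuming a CUDA device:

# Optional: run generation on the GPU instead of the CPU.
peft_model = peft_model.to("cuda")
inputs = tokenizer("List a few tips to get good scores in math.", return_tensors="pt").to("cuda")
outputs = peft_model.generate(**inputs, max_length=128, do_sample=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))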

flan-alpaca-lora's Issues

NameError: name 'bnb' is not defined

Getting the following error (traceback abridged):

<cell line: 8>:8
/usr/local/lib/python3.10/dist-packages/peft/peft_model.py:143 in from_pretrained
    model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
/usr/local/lib/python3.10/dist-packages/peft/peft_model.py:642 in __init__
    super().__init__(model, peft_config)
/usr/local/lib/python3.10/dist-packages/peft/peft_model.py:79 in __init__
    self.base_model = LoraModel(peft_config, model)
/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py:118 in __init__
    self._find_and_replace()
/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py:148 in _find_and_replace
    if loaded_in_8bit and isinstance(target, bnb.nn.Linear8bitLt):
NameError: name 'bnb' is not defined

Training script takes more than 2 hours to finish

Hi. Thanks for your nice work!

I've tried to run your training script on an RTX 3090 with the exact dependencies you suggested, but it took more than 2 hours to finish instead of 20 minutes. I also tried training flan-t5-large, and it took more than 4 hours. What could be the reasons for this?

Question about training loss

Hi, I'm very interested in your project, but during training I found that the training loss is very large (more than 30). Is that normal?
