
GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ

QLoRA with AutoGPTQ for quantization

License and Intended Use

I release the resources associated with GPTQLoRA finetuning in this repository under MIT license.

Installation

conda create -n gptqlora python=3.8
conda activate gptqlora
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
git clone -b peft_integration https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install .[triton]
cd ..
git clone https://github.com/timdettmers/bitsandbytes.git
cd bitsandbytes
# CUDA_VERSIONS in {110, 111, 112, 113, 114, 115, 116, 117, 118, 120}
# make argument in {cuda110, cuda11x, cuda12x}
# if you do not know what CUDA you have, try looking at the output of: python -m bitsandbytes
CUDA_VERSION=117 make cuda11x
python setup.py install
cd ..
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/huggingface/peft.git
pip install git+https://github.com/huggingface/accelerate.git
pip install -r requirements.txt
pip install protobuf==3.20.*
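The comments in the bitsandbytes step describe how the CUDA version selects a `make` target. A minimal Python sketch of that mapping follows. It assumes that 110 maps to `cuda110`, the other 11.x versions (111 and up, below 120) map to `cuda11x`, and 120 and above map to `cuda12x`; this is inferred from the comments and the example command, not read from the bitsandbytes Makefile. The helper names are hypothetical. `torch.version.cuda` reports the CUDA version PyTorch was built against (for example "11.7"), which usually, but not always, matches the toolkit installed on the machine.

```python
from typing import Optional


def cuda_tag(version: str) -> str:
    """Turn a CUDA version string such as "11.7" into the "117" form used by CUDA_VERSION."""
    major, minor = version.split(".")[:2]
    return f"{int(major)}{int(minor)}"


def make_target(cuda_version: str) -> str:
    """Pick the bitsandbytes make target for a CUDA_VERSION like "117" (assumed mapping)."""
    value = int(cuda_version)
    if value == 110:
        return "cuda110"
    if 111 <= value < 120:
        return "cuda11x"
    if value >= 120:
        return "cuda12x"
    raise ValueError(f"unsupported CUDA_VERSION: {cuda_version}")


def build_command(cuda_version: str) -> str:
    """Return the shell command in the same form as the installation steps."""
    return f"CUDA_VERSION={cuda_version} make {make_target(cuda_version)}"


def detect_cuda_version() -> Optional[str]:
    """Read the CUDA version PyTorch was built with, if PyTorch is installed."""
    try:
        import torch
    except ImportError:
        return None
    if torch.version.cuda is None:  # CPU-only PyTorch build
        return None
    return cuda_tag(torch.version.cuda)


if __name__ == "__main__":
    detected = detect_cuda_version()
    if detected is None:
        print("PyTorch with CUDA not found; set CUDA_VERSION by hand.")
    else:
        print(build_command(detected))
```

With the cu117 PyTorch wheel from the first step, this prints the same `CUDA_VERSION=117 make cuda11x` command used above.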

Getting Started

The gptqlora.py script is a starting point for finetuning and inference on various datasets. The basic command for finetuning a baseline model on the Alpaca dataset is:

python gptqlora.py --model_path <path>

For models larger than 13B, we recommend adjusting the learning rate:

python gptqlora.py --learning_rate 0.0001 --model_path <path>
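If you launch runs from Python, the two commands above can be assembled programmatically. This is a minimal sketch: `build_finetune_args` and its `model_size_b` parameter (model size in billions of parameters) are hypothetical, and only the `--model_path` and `--learning_rate` flags come from the commands above. No learning rate is passed for models of 13B or smaller, so the script's own default applies.

```python
import subprocess
import sys
from typing import List


def build_finetune_args(model_path: str, model_size_b: float) -> List[str]:
    """Build the gptqlora.py command line for a model of model_size_b billion parameters."""
    args = [sys.executable, "gptqlora.py", "--model_path", model_path]
    # For models larger than 13B, use the lower learning rate recommended above.
    if model_size_b > 13:
        args += ["--learning_rate", "0.0001"]
    return args


if __name__ == "__main__":
    # Hypothetical path to a 33B GPTQ checkpoint.
    cmd = build_finetune_args("./llama-33b", 33)
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)
```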

The file structure of the model checkpoint is as follows:

(bnb) root@/root/qlora-main# ls llama-7b/
config.json             gptq_model-4bit-128g.bin  special_tokens_map.json  tokenizer_config.json
generation_config.json  quantize_config.json      tokenizer.model
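Before finetuning, it can help to confirm that a checkpoint directory contains the files listed above. A minimal sketch, assuming all seven files from the llama-7b listing are required (other models may differ) and that the weights file name encodes the bit width and group size as in `gptq_model-4bit-128g.bin`. The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Non-weight file names taken from the llama-7b listing above.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "quantize_config.json",
    "special_tokens_map.json",
    "tokenizer.model",
    "tokenizer_config.json",
]
WEIGHTS_PATTERN = re.compile(r"gptq_model-(\d+)bit-(\d+)g\.bin$")
WEIGHTS_PLACEHOLDER = "gptq_model-*bit-*g.bin"


def find_weights(root: Path) -> Optional[Path]:
    """Return the first file whose name matches the GPTQ weights pattern."""
    for path in sorted(root.iterdir()):
        if WEIGHTS_PATTERN.search(path.name):
            return path
    return None


def parse_weights_name(name: str) -> Tuple[int, int]:
    """Extract (bits, group_size) from a name like gptq_model-4bit-128g.bin."""
    match = WEIGHTS_PATTERN.search(name)
    if match is None:
        raise ValueError(f"not a GPTQ weights file name: {name}")
    return int(match.group(1)), int(match.group(2))


def missing_files(checkpoint_dir: str) -> List[str]:
    """List the expected files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        # Nothing can be checked, so report every expected file as missing.
        return EXPECTED_FILES + [WEIGHTS_PLACEHOLDER]
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if find_weights(root) is None:
        missing.append(WEIGHTS_PLACEHOLDER)
    return missing


if __name__ == "__main__":
    import sys

    directory = sys.argv[1] if len(sys.argv) > 1 else "llama-7b"
    print("missing:", missing_files(directory) or "none")
```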

Quantization

Quantization is handled by AutoGPTQ. To run the code, you first need a model that has already been converted to GPTQ format.
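If you do not already have a GPTQ checkpoint, the basic AutoGPTQ workflow looks roughly like the sketch below. It assumes the `AutoGPTQForCausalLM` and `BaseQuantizeConfig` classes as shown in AutoGPTQ's own README; exact arguments may differ between AutoGPTQ versions, including the `peft_integration` branch installed above. `bits=4` and `group_size=128` are chosen to match `gptq_model-4bit-128g.bin` in the listing above. The paths and the single calibration sentence are placeholders; a real run needs many more representative samples. `checkpoint_basename` is a hypothetical helper that assumes AutoGPTQ's default weights naming.

```python
from typing import List


def checkpoint_basename(bits: int, group_size: int) -> str:
    """Weights file stem in the style of gptq_model-4bit-128g.bin (assumed default naming)."""
    return f"gptq_model-{bits}bit-{group_size}g"


def quantize_to_gptq(
    pretrained_dir: str,
    output_dir: str,
    calibration_texts: List[str],
    bits: int = 4,
    group_size: int = 128,
) -> None:
    """Quantize a Hugging Face causal LM with AutoGPTQ and save it for gptqlora.py.

    Heavy imports live inside the function so this module imports without a GPU.
    """
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, use_fast=True)
    # Each calibration example holds input_ids and attention_mask.
    examples = [tokenizer(text) for text in calibration_texts]

    quantize_config = BaseQuantizeConfig(bits=bits, group_size=group_size, desc_act=False)
    model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config)
    model.quantize(examples)

    # Save the quantized weights and quantize_config.json, then the tokenizer
    # files, so the directory resembles the checkpoint listing above.
    model.save_quantized(output_dir)
    tokenizer.save_pretrained(output_dir)


if __name__ == "__main__":
    quantize_to_gptq(
        "path/to/llama-7b-hf",  # placeholder: unquantized Hugging Face model
        "llama-7b",             # placeholder: output directory
        ["auto-gptq is an easy-to-use model quantization library."],
    )
    print("expected weights file:", checkpoint_basename(4, 128) + ".bin")
```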

Paged Optimizer

You can enable the paged optimizer with the argument --optim paged_adamw_32bit.
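`paged_adamw_32bit` is one of the optimizer names accepted by the `optim` field of Hugging Face `TrainingArguments`. A minimal sketch, assuming gptqlora.py (like QLoRA) forwards `--optim` to `TrainingArguments`. `paged_training_args` and `cli_flags` are hypothetical helpers. The paged optimizer itself is implemented in bitsandbytes, so the bitsandbytes build from the installation steps must be importable when training starts.

```python
from typing import Any, List

PAGED_OPTIM = "paged_adamw_32bit"


def paged_training_args(output_dir: str, **overrides: Any):
    """Build TrainingArguments that select the paged 32-bit AdamW optimizer."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(output_dir=output_dir, optim=PAGED_OPTIM, **overrides)


def cli_flags() -> List[str]:
    """The equivalent command-line flags for gptqlora.py."""
    return ["--optim", PAGED_OPTIM]


if __name__ == "__main__":
    args = paged_training_args("./output", learning_rate=1e-4)
    print(args.optim)
```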

Known Issues and Limitations

Here is a list of known issues and bugs. If your issue is not reported here, please open a new issue and describe the problem.

  1. Resuming a LoRA training run with the Trainer currently runs into an error.
  2. Make sure that tokenizer.bos_token_id = 1 to avoid generation issues.
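For the second issue, a small guard can force the id before generation. This is a sketch: `ensure_bos_token_id` is a hypothetical helper that works on any object with a `bos_token_id` attribute, including a loaded Hugging Face tokenizer. The value 1 comes from the note above; in the LLaMA vocabulary it is the id of `<s>`.

```python
from types import SimpleNamespace

EXPECTED_BOS_TOKEN_ID = 1


def ensure_bos_token_id(tokenizer, expected: int = EXPECTED_BOS_TOKEN_ID) -> bool:
    """Set tokenizer.bos_token_id to `expected` if needed; return True if it was changed."""
    if getattr(tokenizer, "bos_token_id", None) == expected:
        return False
    tokenizer.bos_token_id = expected
    return True


if __name__ == "__main__":
    # Stand-in object; with transformers, pass the loaded tokenizer instead.
    tok = SimpleNamespace(bos_token_id=None)
    changed = ensure_bos_token_id(tok)
    print(changed, tok.bos_token_id)
```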

Acknowledgements

This code is based on QLoRA.

This repo builds on the Stanford Alpaca and LMSYS FastChat repos.

Contributors

artidoro, qwopqwop200, timdettmers, birch-san
