
Train Llama2-7B on one GPU

This code example uses Hugging Face Transformers to finetune Llama2-7B on a single GPU with 80 GB of memory. The following trainer arguments are important:

  • fp16=False: setting this to True keeps two copies of the model in memory (the original weights plus an fp16 copy), which you don't want because it wastes memory.
  • optim="adafactor": switching to something like AdamW uses roughly twice as many optimizer bytes per parameter (see the rough arithmetic after this list).
  • per_device_train_batch_size=1: important, since a larger batch size requires more GPU memory.
  • gradient_accumulation_steps=2 or 4: lets you train with an effective batch size greater than 1; with a per-device batch size of 1 and 2 (or 4) accumulation steps, the effective batch size is 2 (or 4).
  • gradient_checkpointing=True: trades extra compute for memory by recomputing activations during the backward pass instead of storing them all.
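
To see why these choices matter, here is a rough back-of-envelope estimate of the fixed memory cost for a 7B-parameter model. The bytes-per-parameter figures below are the usual approximations (fp32 weights, fp32 gradients, an extra fp16 weight copy, and AdamW's two fp32 moment tensors), not numbers taken from this repository:

n_params = 7e9  # Llama2-7B

def gib(n_bytes):
    return n_bytes / 2**30

weights_fp32 = n_params * 4   # model weights kept in fp32
fp16_copy    = n_params * 2   # extra half-precision copy created when fp16=True
grads        = n_params * 4   # gradients in fp32
adamw_state  = n_params * 8   # AdamW: two fp32 moment tensors per parameter

print(f"weights     : {gib(weights_fp32):5.1f} GiB")
print(f"fp16 copy   : {gib(fp16_copy):5.1f} GiB  (avoided with fp16=False)")
print(f"gradients   : {gib(grads):5.1f} GiB")
print(f"AdamW state : {gib(adamw_state):5.1f} GiB  (largely avoided with optim='adafactor')")

With fp16=False and Adafactor, the fixed cost is dominated by the weights and gradients (roughly 52 GiB in this estimate), which is what makes an 80 GB card workable once gradient checkpointing keeps activation memory small.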

In particular, you could set the training arguments like this:

 from transformers import TrainingArguments

 training_args = TrainingArguments(output_dir="finetuned_models",
                                   seed=0,
                                   fp16=False,
                                   gradient_accumulation_steps=4,
                                   gradient_checkpointing=True,
                                   per_device_train_batch_size=args.batch_size,
                                   learning_rate=args.learning_rate,
                                   num_train_epochs=args.n_epochs,
                                   optim="adafactor")
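
Below is a minimal sketch of how these arguments could be wired into a complete Trainer run on SST-2. This is not the repository's training script: the checkpoint id, the tokenization choices, and the fixed hyperparameter values are assumptions made only to keep the example self-contained.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama2 ships without a pad token

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

# SST-2 from the GLUE benchmark: single sentences with binary sentiment labels.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(output_dir="finetuned_models",
                                  seed=0,
                                  fp16=False,
                                  gradient_accumulation_steps=4,
                                  gradient_checkpointing=True,
                                  per_device_train_batch_size=1,
                                  learning_rate=2e-5,
                                  num_train_epochs=1,
                                  optim="adafactor")

trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"])
trainer.train()

In the repository's script, the batch size, learning rate, and number of epochs come from command-line arguments (args.batch_size, args.learning_rate, args.n_epochs); fixed values are used here only for illustration.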

1. Create a conda environment and install the requirements

conda create -n p310 python=3.10 
conda activate p310
# Install the correct torch version depending on CUDA version from https://pytorch.org/
pip install -r requirements.txt

Next, finetune the Llama model on the SST-2 dataset.

2. Run models

For example, to finetune the 7B Llama2 model on the SST-2 dataset, submit the provided Slurm batch script:

sbatch run_sst2_ubs1_llama7b.sbatch
