Comments (2)
Thanks for your question and sorry for my delayed response.
Regarding Q1, PEFTSeq2SeqTrainer inherits from the Trainer class, which has a train method.
Regarding Q2, I am not sure about this.
Regarding Q3, does the following script work for you?
MODEL=llama-2-7b
MAX_LENGTH=64
MAX_STEPS=50000
PREFIX_LENGTH=40
R=60
for TASK_NAME in sst2; do
  for LORA_LR in 5e-3 3e-1 5e-4; do
    for lr in 3e-1 4e-1; do
      python train.py \
        --peft_type PROMPT_TUNING_LORA \
        --lora_embedding_lr ${LORA_LR} \
        --learning_rate ${lr} \
        --prefix_length ${PREFIX_LENGTH} \
        --r ${R} \
        --task_name ${TASK_NAME} \
        --dataset_config_name en \
        --model_name_or_path your_path/${MODEL} \
        --do_train \
        --do_eval \
        --do_predict \
        --per_device_train_batch_size 32 \
        --per_device_eval_batch_size 32 \
        --max_seq_length ${MAX_LENGTH} \
        --save_strategy steps \
        --evaluation_strategy steps \
        --max_steps ${MAX_STEPS} \
        --eval_steps 1000 \
        --save_steps 1000 \
        --warmup_steps 500 \
        --weight_decay 1e-5 \
        --load_best_model_at_end \
        --save_total_limit 1 \
        --output_dir saved_${MODEL}/${TASK_NAME}_lr${lr}_loralr${LORA_LR}_pl${PREFIX_LENGTH}_r${R}_st${MAX_STEPS}
    done
  done
done
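Since the loops above sweep three lora_embedding_lr values against two learning_rate values, the script launches six training runs. If it helps, a dry run along the lines of the sketch below prints the output directory of each run without starting any training. The list_runs function is a hypothetical helper for illustration only and is not part of the repository; the variable values are copied from the script above.

```shell
#!/bin/sh
# Dry run: print the output directory of every run in the sweep.
# Nothing here calls train.py, so no training is started.
MODEL=llama-2-7b
MAX_STEPS=50000
PREFIX_LENGTH=40
R=60

# list_runs is a hypothetical helper (an assumption, not from the repository).
# It repeats the same three nested loops as the training script.
list_runs() {
  for TASK_NAME in sst2; do
    for LORA_LR in 5e-3 3e-1 5e-4; do
      for lr in 3e-1 4e-1; do
        echo "saved_${MODEL}/${TASK_NAME}_lr${lr}_loralr${LORA_LR}_pl${PREFIX_LENGTH}_r${R}_st${MAX_STEPS}"
      done
    done
  done
}

list_runs
```

Each of the six runs goes up to MAX_STEPS steps, so it may be worth checking the list before launching the full sweep.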
from dept.
Thank you for the help. I will try it and let you know.
from dept.