Comments (4)
Thanks for your interest in our work!
This error is caused by DeepSpeed ZeRO-3, which does not support backward passes in evaluation mode. We need eval mode because MiniLLM training is similar to RL training, where dropout is usually not applied.
If you use ZeRO-3 because of insufficient GPU memory, you can instead try ZeRO-Offload by setting the DeepSpeed config to the corresponding path.
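For reference, a minimal sketch of a ZeRO-2 config with optimizer offload, written as a Python dict (field names follow the DeepSpeed documentation; the batch sizes and the output filename are illustrative placeholders, not the repo's actual values):

```python
import json

# Sketch of a ZeRO-2 config with optimizer state offloaded to CPU.
# Batch-size values here are placeholders, not the repo's settings.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

# Save as JSON and point the training script's DeepSpeed config path at it.
with open("ds_config_zero2_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```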
If you insist on using ZeRO-3, you can set the dropout values to 0 in the model config files before running the script and add self.model.train() before every step (for example, at this line). In this way, you avoid dropout while still running backward in training mode.
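A minimal sketch of that workaround, assuming a hypothetical training loop (`model`, `batch`, and `optimizer` stand in for the repo's actual objects):

```python
def training_step(model, batch, optimizer):
    # ZeRO-3 only supports backward in training mode, so switch to
    # train() right before each step. With every dropout probability
    # set to 0 in the model config, train() and eval() produce the
    # same forward pass, so the RL-style no-dropout behavior is kept.
    model.train()
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```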
I was using your DeepSpeed config file for ZeRO-2 offload, which I changed to ZeRO-3 with parameter offload added. If I understand you correctly, optimizer offload with ZeRO-2 works, but parameter offload with ZeRO-3 is not yet implemented?
Which model config file exactly are you referring to that contains the dropout value?
I managed to run the llama7B_13B MiniLLM setup with ZeRO-2 on a single H100 GPU by reducing the max sequence length to 256. In your opinion, will this degrade performance significantly?
Thank you for your help!
> I was using your DeepSpeed config file for ZeRO-2 offload, which I changed to ZeRO-3 with parameter offload added. If I understand you correctly, optimizer offload with ZeRO-2 works, but parameter offload with ZeRO-3 is not yet implemented?
Yes.
The "model config file" refers to the huggingface config files in the checkpoint directory you load from.
I think reducing the max sequence length to 256 will mainly affect performance on long instructions/responses; on short instructions, performance will not be affected much.
Alright, thank you for your help!