Comments (6)
Hi! It's normal to get NaN for some batches when the sampled batch contains no data for a specific domain, usually because the sampling ratio for that domain is low.
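Mechanically, this makes sense: the per-domain cross entropy is a mean over that domain's tokens in the batch, and a mean over zero tokens is 0/0. A minimal pure-Python sketch of the failure mode (illustrative names only, not the repo's actual metric code):

```python
import math

def per_domain_mean_loss(token_losses, token_domains, domain):
    """Average loss over tokens belonging to one domain.

    If the batch contains no tokens for `domain`, a vectorized
    implementation computes 0/0, which is NaN.
    """
    selected = [l for l, d in zip(token_losses, token_domains) if d == domain]
    if not selected:
        return float("nan")  # mirrors 0/0 in the vectorized metric
    return sum(selected) / len(selected)
```

So a NaN in one domain's metric only says that domain was absent from that batch, not that training diverged.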
from llm-shearing.
It's odd to me that this happens. Have you tried the original setup with all 7 domains? Does it cause problems there? Meanwhile, I will try out the 2-domain setup once I have some compute ready.
Hi @xiamengzhou,
I also encountered this issue with the original dynamic loading setup in pruning.sh:
set_names=[cc,github,book,stackexchange,wiki,arxiv,c4]
proportion=[0.67,0.045,0.045,0.02,0.045,0.025,0.15]
NaN occurs in the first batch when computing metric/train/stackexchange_LanguageCrossEntropy.
My environment is the same as yours, except that flash-attn is 2.3.6.
The sample data for pruning is 0.1B.
Could you try the processed data I have here: https://drive.google.com/drive/folders/1WPIRx2NGkNBDswqZZh-hwI1h-QiKVCuN
and see if the same issue occurs?
@PengWenChen @YanxiZSQ
Hi @xiamengzhou! Thanks for your reply.
However, I cannot access Google Drive from where I am working :(
Could you please upload the processed data to this repository?
It would really help a lot!
Hi @xiamengzhou!
The proportion update fails because of a NaN loss on the evaluation data, which in turn is caused by missing data for some sub-datasets.
I solved this issue by increasing the number of evaluation sequences to 3500!
However, during normal training (updating L_prune), NaN still occurs for the same reason (missing data for some sub-datasets), but L_prune can still be updated.
I would like to confirm the correctness of this part: is it normal to get NaN in metric/train/xx_LanguageCrossEntropy?
Thank you.
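For reference, a common general technique to keep an absent domain from poisoning an aggregate objective is to mask NaN per-domain losses out of the weighted sum and renormalize over the domains that are present. This is a hedged sketch of that general pattern with illustrative names, not the repo's actual update code:

```python
import math

def weighted_loss_ignoring_missing(domain_losses, weights):
    """Weighted average of per-domain losses, skipping domains whose
    loss is NaN because the batch contained no data for them.

    domain_losses, weights: dicts keyed by domain name.
    """
    total, weight_sum = 0.0, 0.0
    for name, loss in domain_losses.items():
        if math.isnan(loss):
            continue  # domain absent from this batch; exclude it
        total += weights[name] * loss
        weight_sum += weights[name]
    # Renormalize over the domains actually present.
    return total / weight_sum if weight_sum else float("nan")
```

If the training loss is aggregated this way, a NaN in an individual domain's metric need not corrupt the overall objective, which would be consistent with L_prune still updating.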
Related Issues (20)
- Could you provide tokenized continue-pretraining dataset for reproduction? HOT 2
- Mismatched shape
- Training starts but nothing continues HOT 6
- TypeError: buffer is too small for requested array
- Pruning fine-tuned model HOT 2
- Problem when saving the model HOT 1
- Instruction tuning dataset HOT 2
- If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups? HOT 5
- Is there a way to run pruning without Slurm?
- Training starts but only outputs config information HOT 3
- The Project is not implemented for 70B llama? HOT 7
- LlamaRMSNorm() layer differs from original llama HOT 1
- Problem converting Composer model to Pythia
- The dtype of tokenized data should be uint32 HOT 1
- Why the rope params are ignored while converting hf checkpoint to composer checkpoint? HOT 3
- about shearing params config HOT 1
- Can LLM-Shearing be used on ViT models? HOT 1
- Support for Llama-3 / GQA? HOT 1
- Open source the pruning mask. HOT 2