Comments (6)
Hi! It's normal to get NaN for some batches when the sampled batch contains no data for a specific domain, usually because the sampling ratio for that domain is low.
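Mechanically, this makes sense: the per-domain cross entropy is a mean over that domain's tokens in the batch, and a mean over zero tokens is 0/0. A minimal pure-Python sketch of the failure mode (illustrative names only, not the repo's actual metric code):

```python
import math

def per_domain_mean_loss(token_losses, token_domains, domain):
    """Average loss over tokens belonging to one domain.

    If the batch contains no tokens for `domain`, a vectorized
    implementation computes 0/0, which is NaN.
    """
    selected = [l for l, d in zip(token_losses, token_domains) if d == domain]
    if not selected:
        return float("nan")  # mirrors 0/0 in the vectorized metric
    return sum(selected) / len(selected)
```

So a NaN in one domain's metric only says that domain was absent from that batch, not that training diverged.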
from llm-shearing.
It's odd to me that this happens. Have you tried the original setup with all 7 domains? Does it cause problems there? Meanwhile, I will try out the 2-domain setup once I have some compute ready.
Hi @xiamengzhou,
I also encountered this issue with the original dynamic loading setup in pruning.sh:
set_names=[cc,github,book,stackexchange,wiki,arxiv,c4]
proportion=[0.67,0.045,0.045,0.02,0.045,0.025,0.15]
NaN occurs in the first batch when computing metric/train/stackexchange_LanguageCrossEntropy.
My environment is the same as yours, except that flash-attn is 2.3.6.
The sample data for pruning is 0.1B.
Could you try the processed data I have here: https://drive.google.com/drive/folders/1WPIRx2NGkNBDswqZZh-hwI1h-QiKVCuN
and see if the same issue occurs?
@PengWenChen @YanxiZSQ
Hi @xiamengzhou! Thanks for your reply.
However, I cannot access Google Drive from where I am working :(
Could you please upload the processed data to this repository?
It would really help a lot!
Hi @xiamengzhou!
The proportion update fails because of a NaN loss on the evaluation data, which in turn is caused by missing data for some sub-datasets.
I solved this issue by increasing the number of evaluation sequences to 3500!
However, during normal training (updating L_prune), NaN still occurs for the same reason (missing data for some sub-datasets), but L_prune can still be updated.
I would like to confirm the correctness of this part: is it normal to get NaN in metric/train/xx_LanguageCrossEntropy?
Thank you.
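For reference, a common general technique to keep an absent domain from poisoning an aggregate objective is to mask NaN per-domain losses out of the weighted sum and renormalize over the domains that are present. This is a hedged sketch of that general pattern with illustrative names, not the repo's actual update code:

```python
import math

def weighted_loss_ignoring_missing(domain_losses, weights):
    """Weighted average of per-domain losses, skipping domains whose
    loss is NaN because the batch contained no data for them.

    domain_losses, weights: dicts keyed by domain name.
    """
    total, weight_sum = 0.0, 0.0
    for name, loss in domain_losses.items():
        if math.isnan(loss):
            continue  # domain absent from this batch; exclude it
        total += weights[name] * loss
        weight_sum += weights[name]
    # Renormalize over the domains actually present.
    return total / weight_sum if weight_sum else float("nan")
```

If the training loss is aggregated this way, a NaN in an individual domain's metric need not corrupt the overall objective, which would be consistent with L_prune still updating.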
Related Issues (20)
- Could you provide tokenized continue-pretraining dataset for reproduction? HOT 2
- Mismatched shape
- Training starts but nothing continues HOT 6
- TypeError: buffer is too small for requested array
- Pruning fine-tuned model HOT 2
- Problem when saving the model HOT 1
- Instruction tuning dataset HOT 2
- If I can't configure Slurm on a cluster, does that mean I can't use multi-node multi-GPU setups? HOT 5
- Is there a way to run pruning without Slurm?
- Training starts but only outputs config information HOT 3
- The Project is not implemented for 70B llama? HOT 7
- LlamaRMSNorm() layer differs from original llama HOT 1
- Problem converting Composer model to Pythia
- The dtype of tokenized data should be uint32 HOT 1
- Why the rope params are ignored while converting hf checkpoint to composer checkpoint? HOT 3
- about shearing params config HOT 1
- Can LLM-Shearing be used on ViT models? HOT 1
- Support for Llama-3 / GQA? HOT 1
- Open source the pruning mask. HOT 2