

Longyichen commented on June 16, 2024

Hi mengzhou,
Thank you very much for your quick fix; I will test your new code again tomorrow.
It is normal for code to have problems, and fortunately I have now basically run through the whole framework pipeline. Although the process was a bit difficult, your work is very interesting and meaningful, so I am happy to take part in getting it working.
Another piece of good news is that I have found a solution to the problem that bothered me before, where trainer.fit would hang the multi-GPU processes without reporting any error. I raised an issue in the composer library and found the relevant solution with their help. When the code in this repository is interrupted, shared memory created by the streaming library can be left behind due to a bug, and this stale ("zombie") shared memory needs to be cleaned up promptly to avoid the hang. For details, see the following two issues:
mosaicml/llm-foundry#436 (comment)
mosaicml/composer#2733
I wrote a script for the cleanup; if you need it, I will create a new branch and merge it in.
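A minimal sketch of what such a cleanup script can look like, assuming a recent mosaicml-streaming release that provides clean_stale_shared_memory (the helper discussed in the linked issues); the /dev/shm fallback for older releases is my own assumption:

```python
"""Clean up stale ("zombie") shared memory left behind by an interrupted run.

Run this on each node before relaunching training, with no jobs running.
"""
import glob
import os

try:
    # Provided by recent mosaicml-streaming releases (see the linked issues).
    from streaming.base.util import clean_stale_shared_memory
    clean_stale_shared_memory()
    print("Cleaned stale streaming shared memory.")
except ImportError:
    # Fallback for older releases (assumption): drop leftover /dev/shm segments.
    # WARNING: this removes everything in /dev/shm; only run it when nothing
    # else on the node is using POSIX shared memory.
    for path in glob.glob("/dev/shm/*"):
        try:
            os.remove(path)
            print(f"Removed {path}")
        except OSError as exc:
            print(f"Skipped {path}: {exc}")
```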


xiamengzhou commented on June 16, 2024

Thanks for catching this! Scripts are updated.



Longyichen commented on June 16, 2024

Hi mengzhou, I changed the code and it now prints that it loads the weights from my path, so that part is fine.

However, a new problem appeared: the loss stays exactly the same at every step.

It seems the gradients are not being computed and the model is not actually training. Also, the loss (~10) at warm start after loading is much higher than the loss (~2) of the model I pruned before, on the same dataset. Is this normal?
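For reference, a cross-entropy of 10.375 together with the logged perplexity of ~32048 is what roughly uniform predictions over a 32k-token vocabulary produce (LLaMA's vocabulary has 32000 tokens), which is consistent with the weights never actually being loaded; a quick check:

```python
import math

loss = 10.3750           # the stuck loss/train/total value from the log below
print(math.exp(loss))    # ~32048.6, matching the logged Perplexity
print(math.log(32000))   # ~10.3735, cross-entropy of uniform guessing over a 32k vocab
```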

[batch=366/48000]:
         Train time/batch: 365
         Train time/sample: 93440
         Train time/batch_in_epoch: 365
         Train time/sample_in_epoch: 93440
         Train time/token: 382730240
         Train time/token_in_epoch: 382730240
         Train metrics/train/cc_weight: 0.2192
         Train metrics/train/github_weight: 0.0002
         Train metrics/train/book_weight: 0.0791
         Train metrics/train/stackexchange_weight: 0.0064
         Train metrics/train/wiki_weight: 0.0096
         Train metrics/train/arxiv_weight: 0.0010
         Train metrics/train/c4-rp_weight: 0.6845
         Train memory/current_allocated_mem: 9.7173
         Train memory/current_active_mem: 9.7173
         Train memory/current_inactive_mem: 0.6447
         Train memory/current_reserved_mem: 51.3280
         Train memory/peak_allocated_mem: 44.6420
         Train memory/peak_active_mem: 44.8020
         Train memory/peak_inactive_mem: 17.7940
         Train memory/peak_reserved_mem: 51.3280
         Train memory/alloc_retries: 0
         Train trainer/device_train_microbatch_size: 16
         Train loss/train/total: 10.3750
         Train loss/train/ce_loss: 10.3750
         Train metrics/train/LanguageCrossEntropy: 10.3750
         Train metrics/train/Perplexity: 32048.3164
         Train metrics/train/cc_LanguageCrossEntropy: 10.3750
         Train metrics/train/cc_count: 746
         Train metrics/train/github_LanguageCrossEntropy: nan
         Train metrics/train/github_count: 0
         Train metrics/train/book_LanguageCrossEntropy: 10.3750
         Train metrics/train/book_count: 250
         Train metrics/train/stackexchange_LanguageCrossEntropy: 10.3750
         Train metrics/train/stackexchange_count: 18
         Train metrics/train/wiki_LanguageCrossEntropy: 10.3750
         Train metrics/train/wiki_count: 33
         Train metrics/train/arxiv_LanguageCrossEntropy: nan
         Train metrics/train/arxiv_count: 2
         Train metrics/train/c4-rp_LanguageCrossEntropy: 10.3750
         Train metrics/train/c4-rp_count: 2279
         Train throughput/batches_per_sec: 0.0414
         Train throughput/samples_per_sec: 10.5889
         Train throughput/device/batches_per_sec: 0.0052
         Train throughput/device/samples_per_sec: 1.3236
         Train throughput/tokens_per_sec: 43372.1727
         Train throughput/device/tokens_per_sec: 5421.5216
         Train throughput/flops_per_sec: 877674723199506.0000
         Train throughput/device/flops_per_sec: 109709340399938.2500
         Train throughput/device/mfu: 0.3516
         Train time/train: 2.4628
         Train time/val: 0.0000
         Train time/total: 2.4628
[batch=367/48000]:
         Train time/batch: 366
         Train time/sample: 93696
         Train time/batch_in_epoch: 366
         Train time/sample_in_epoch: 93696
         Train time/token: 383778816
         Train time/token_in_epoch: 383778816
         Train metrics/train/cc_weight: 0.2192
         Train metrics/train/github_weight: 0.0002
         Train metrics/train/book_weight: 0.0791
         Train metrics/train/stackexchange_weight: 0.0064
         Train metrics/train/wiki_weight: 0.0096
         Train metrics/train/arxiv_weight: 0.0010
         Train metrics/train/c4-rp_weight: 0.6845
         Train memory/current_allocated_mem: 9.7173
         Train memory/current_active_mem: 9.7173
         Train memory/current_inactive_mem: 0.6447
         Train memory/current_reserved_mem: 51.3280
         Train memory/peak_allocated_mem: 44.6420
         Train memory/peak_active_mem: 44.8020
         Train memory/peak_inactive_mem: 17.7940
         Train memory/peak_reserved_mem: 51.3280
         Train memory/alloc_retries: 0
         Train trainer/device_train_microbatch_size: 16
         Train loss/train/total: 10.3750
         Train loss/train/ce_loss: 10.3750
         Train metrics/train/LanguageCrossEntropy: 10.3750
         Train metrics/train/Perplexity: 32048.3164
         Train metrics/train/cc_LanguageCrossEntropy: 10.3750
         Train metrics/train/cc_count: 803
         Train metrics/train/github_LanguageCrossEntropy: nan
         Train metrics/train/github_count: 0
         Train metrics/train/book_LanguageCrossEntropy: 10.3750
         Train metrics/train/book_count: 273
         Train metrics/train/stackexchange_LanguageCrossEntropy: 10.3750
         Train metrics/train/stackexchange_count: 20
         Train metrics/train/wiki_LanguageCrossEntropy: 10.3750
         Train metrics/train/wiki_count: 35
         Train metrics/train/arxiv_LanguageCrossEntropy: 10.3750
         Train metrics/train/arxiv_count: 3
         Train metrics/train/c4-rp_LanguageCrossEntropy: 10.3750
         Train metrics/train/c4-rp_count: 2450
         Train throughput/batches_per_sec: 0.0413
         Train throughput/samples_per_sec: 10.5839
         Train throughput/device/batches_per_sec: 0.0052
         Train throughput/device/samples_per_sec: 1.3230
         Train throughput/tokens_per_sec: 43351.5384
         Train throughput/device/tokens_per_sec: 5418.9423
         Train throughput/flops_per_sec: 877257170181446.8750
         Train throughput/device/flops_per_sec: 109657146272680.8594
         Train throughput/device/mfu: 0.3515
         Train time/train: 2.4695
         Train time/val: 0.0000
         Train time/total: 2.4695
[batch=368/48000]:
         Train time/batch: 367
         Train time/sample: 93952
         Train time/batch_in_epoch: 367
         Train time/sample_in_epoch: 93952
         Train time/token: 384827392
         Train time/token_in_epoch: 384827392
         Train metrics/train/cc_weight: 0.2192
         Train metrics/train/github_weight: 0.0002
         Train metrics/train/book_weight: 0.0791
         Train metrics/train/stackexchange_weight: 0.0064
         Train metrics/train/wiki_weight: 0.0096
         Train metrics/train/arxiv_weight: 0.0010
         Train metrics/train/c4-rp_weight: 0.6845
         Train memory/current_allocated_mem: 9.7173
         Train memory/current_active_mem: 9.7173
         Train memory/current_inactive_mem: 0.6447
         Train memory/current_reserved_mem: 51.3280
         Train memory/peak_allocated_mem: 44.6430
         Train memory/peak_active_mem: 44.8020
         Train memory/peak_inactive_mem: 17.7940
         Train memory/peak_reserved_mem: 51.3280
         Train memory/alloc_retries: 0
         Train trainer/device_train_microbatch_size: 16
         Train loss/train/total: 10.3750
         Train loss/train/ce_loss: 10.3750
         Train metrics/train/LanguageCrossEntropy: 10.3750
         Train metrics/train/Perplexity: 32048.3164
         Train metrics/train/cc_LanguageCrossEntropy: 10.3750
         Train metrics/train/cc_count: 861
         Train metrics/train/github_LanguageCrossEntropy: nan
         Train metrics/train/github_count: 0
         Train metrics/train/book_LanguageCrossEntropy: 10.3750
         Train metrics/train/book_count: 293
         Train metrics/train/stackexchange_LanguageCrossEntropy: 10.3750
         Train metrics/train/stackexchange_count: 22
         Train metrics/train/wiki_LanguageCrossEntropy: 10.3750
         Train metrics/train/wiki_count: 40
         Train metrics/train/arxiv_LanguageCrossEntropy: 10.3750
         Train metrics/train/arxiv_count: 4
         Train metrics/train/c4-rp_LanguageCrossEntropy: 10.3750
         Train metrics/train/c4-rp_count: 2620
         Train throughput/batches_per_sec: 0.0412
         Train throughput/samples_per_sec: 10.5394
         Train throughput/device/batches_per_sec: 0.0051
         Train throughput/device/samples_per_sec: 1.3174
         Train throughput/tokens_per_sec: 43169.4561
         Train throughput/device/tokens_per_sec: 5396.1820
         Train throughput/flops_per_sec: 873572571549509.1250
         Train throughput/device/flops_per_sec: 109196571443688.6406
         Train throughput/device/mfu: 0.3500
         Train time/train: 2.4766
         Train time/val: 0.0000
         Train time/total: 2.4766

I trained on a single GPU and it runs normally, so there still seems to be a compatibility issue with the composer code on multiple GPUs; there may be problems with distributed model loading and data sharding. Would it be possible to run this code with the deepspeed framework?


xiamengzhou commented on June 16, 2024

Hey, I think there is a bug here -- not sure what it is. I am working on it now.


xiamengzhou commented on June 16, 2024

Hi, the issue is resolved! It stemmed from the init_device setting in the yaml files. Originally it was set to meta, but that caused unexpected issues when loading the model; it is now switched to cpu. Ideally we would like to support meta initialization, since it is faster, but I am not sure yet how to integrate it with the current codebase. Thanks for spotting this!
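For anyone hitting the same symptom, the change is one line in the model section of the training yaml (a sketch; exact surrounding keys depend on your config):

```yaml
model:
  init_device: cpu   # was `meta`, which left the pretrained weights unloaded
```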

PS: The codebase has changed a lot since my runs for the paper (mostly to adapt it to the up-to-date composer package), so there could be issues here and there, as it is not fully tested. Thanks for your work on this!


xiamengzhou commented on June 16, 2024

Hi! Awesome :) Feel free to start a PR on it!


argitrage commented on June 16, 2024

Hey @Longyichen, I am also facing an issue where pruning gets stuck after 'Starting Training'.

Could you guide me through the changes you made to solve this?


Longyichen commented on June 16, 2024

> Hey @Longyichen, I am also facing an issue where pruning gets stuck after 'Starting Training'.
>
> Could you guide me through the changes you made to solve this?

@argitrage see #30

