Comments (12)

Ryoo72 commented on May 23, 2024

Same question.

from monkey.

samaritan1998 commented on May 23, 2024

Same question.

Have you solved it?

Ryoo72 commented on May 23, 2024

No. I haven't solved it yet. Actually, I gave up🥲 It's difficult to use the translator. Please speak in English...

samaritan1998 commented on May 23, 2024

[Image]
Memory usage attached.

Ryoo72 commented on May 23, 2024

The A100 comes in two variants: 40GB and 80GB. The author seems to have trained on the 80GB version.

zc-zhao commented on May 23, 2024

With ZeRO-3 the loss is higher than with ZeRO-2, which suggests the model was not initialized correctly. Has anyone else run into this with ZeRO-3?

[Image: 微信图片_20240314172641]

[Image: 微信图片_20240314172647]

The first picture shows the loss with ZeRO-3 and the second shows the loss with ZeRO-2. There is also a parameter mismatch issue during model initialization, as follows:

[Image: b0c10984d8765f94751df92bee2bc56]

samaritan1998 commented on May 23, 2024

Training has been stuck at loading the base model, and I haven't been able to train successfully yet.

luohao123 commented on May 23, 2024

Are you training from scratch or fine-tuning from the TextMonkey weights?

samaritan1998 commented on May 23, 2024

> Are you training from scratch or fine-tuning from the TextMonkey weights?

Fine-tuning from the Monkey weights.

luohao123 commented on May 23, 2024

That is not ideal; we need to train from scratch.

MelosY commented on May 23, 2024

Could you please give more details about your training? Our model uses Qwen-VL as the pretrained model, and it works well on 8x A800 80G with ZeRO2.
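
The gap between 40GB and 80GB cards can be sanity-checked with the per-GPU model-state estimate from the ZeRO paper: 2 bytes of 16-bit weights, 2 bytes of 16-bit gradients, and 12 bytes of fp32 Adam state per parameter, partitioned progressively by stage. The sketch below is a rough estimate only. The 10-billion parameter count is an assumed round figure not stated in this thread, the function name is hypothetical, every parameter is assumed trainable, and activations and temporary buffers are ignored, so real usage is higher.

```python
def model_state_gib(num_params: float, num_gpus: int, stage: int) -> float:
    """Estimate per-GPU memory (GiB) for weights, gradients and Adam states.

    Assumes mixed-precision training with Adam and all parameters trainable.
    """
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if stage >= 1:   # ZeRO-1 partitions optimizer states across GPUs
        optim /= num_gpus
    if stage >= 2:   # ZeRO-2 also partitions gradients
        grads /= num_gpus
    if stage >= 3:   # ZeRO-3 also partitions the weights themselves
        weights /= num_gpus
    return num_params * (weights + grads + optim) / 2**30


ASSUMED_PARAMS = 10e9  # assumption: round figure, not taken from the thread
NUM_GPUS = 8           # matches the 8-GPU setups mentioned in the thread

for stage in (0, 1, 2, 3):
    gib = model_state_gib(ASSUMED_PARAMS, NUM_GPUS, stage)
    print(f"ZeRO-{stage}: about {gib:.1f} GiB of model states per GPU")
```

If the ZeRO-2 figure already sits close to 40 GiB before activations are counted, an 80GB card has room to spare where a 40GB card may not.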

samaritan1998 commented on May 23, 2024

> Could you please give more details about your training? Our model uses Qwen-VL as the pretrained model, and it works well on 8x A800 80G with ZeRO2.

8x A100 40G.
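
For 40GB cards, one option that is commonly tried is ZeRO-2 with the optimizer state offloaded to CPU memory, which trades training speed for GPU headroom. The following is a minimal sketch of such a DeepSpeed config, not the configuration used by the Monkey authors. The function name, batch size, and accumulation steps are illustrative assumptions, and whether this is enough to fit the model on 40GB is untested here.

```python
import json


def make_zero2_offload_config(micro_batch_size: int = 1,
                              grad_accum_steps: int = 8) -> dict:
    """Build a DeepSpeed config dict: ZeRO-2 plus CPU optimizer offload.

    All values are illustrative assumptions, not settings from the thread.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,
            # Keep Adam states in host RAM instead of GPU memory.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
    }


# Print the config as JSON so it can be saved to a file and passed to the
# training launcher.
print(json.dumps(make_zero2_offload_config(), indent=2))
```

Lowering `train_micro_batch_size_per_gpu` and raising `gradient_accumulation_steps` keeps the effective batch size constant while reducing activation memory per step.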
