Comments (11)

xqmmy commented on July 26, 2024

I've tried lots of things, but nothing has really worked.

from chinese-vicuna.

Facico commented on July 26, 2024

Do your CUDA and PyTorch versions actually match? You can check on the PyTorch website.

xqmmy commented on July 26, 2024

> Do your CUDA and PyTorch versions actually match? You can check on the PyTorch website.

Yes, they match.

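Facico commented on July 26, 2024

Please follow the issue guidelines when asking questions. In a case like this, you should tell me your PyTorch and CUDA versions; from a single error message alone, it's hard for me to guess your exact setup.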

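xqmmy commented on July 26, 2024

> Please follow the issue guidelines when asking questions. In a case like this, you should tell me your PyTorch and CUDA versions; from a single error message alone, it's hard for me to guess your exact setup.

OK, sorry I didn't describe it clearly.
torch 1.13.1, CUDA 11.7
I'm using the full merged dataset and the original multi-GPU finetune script.
Hardware: 7 × A4000 GPUs.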

Facico commented on July 26, 2024

For your configuration, the matching PyTorch install command should be "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117". Check whether you accidentally installed the CPU build.
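Beyond re-reading the pip command, a quick way to confirm which build is actually installed is to ask PyTorch itself. The following is a minimal Python sketch, not part of chinese-vicuna; the helper names `cuda_tag` and `looks_like_cpu_build` are hypothetical. It relies on `torch.__version__`, `torch.version.cuda` (which is `None` on CPU-only builds), and `torch.cuda.is_available()`.

```python
"""Check whether the installed PyTorch wheel is a CUDA build (sketch)."""
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Return the local build tag of a PyTorch version string.

    "1.13.1+cu117" -> "cu117", "1.13.1+cpu" -> "cpu", "1.13.1" -> None
    (wheels from PyPI carry no local "+..." tag).
    """
    _, sep, local = torch_version.partition("+")
    return local if sep else None


def looks_like_cpu_build(torch_version: str, cuda_version: Optional[str]) -> bool:
    """True if the wheel looks CPU-only.

    `cuda_version` mirrors torch.version.cuda, which is None on CPU builds.
    """
    return cuda_tag(torch_version) == "cpu" or cuda_version is None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("torch:", torch.__version__)
        print("built with CUDA:", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
        print("visible GPUs:", torch.cuda.device_count())
        if looks_like_cpu_build(str(torch.__version__), torch.version.cuda):
            print("This looks like a CPU-only build; reinstall the +cu117 wheels.")
```

On a correct install for this thread's setup, "built with CUDA" should report 11.7 and "visible GPUs" should match the number of cards.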

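Does it work when you use a single GPU? For multi-GPU, try configurations such as two or four GPUs and see whether the problem still occurs. There's a similar issue here.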
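To narrow down whether the failure depends on the number of GPUs, the same job can be relaunched on 1, 2, 4 and then all 7 cards by restricting `CUDA_VISIBLE_DEVICES`. A rough Python sketch follows; it assumes the job is launched with `torchrun`, and `finetune.py` is a placeholder for whatever the multi-GPU finetune script actually runs.

```python
"""Build launch commands that rerun a job on progressively more GPUs (sketch)."""
import os
import shlex
from typing import Dict, List, Tuple

# Hypothetical placeholder: substitute the real training script and its arguments.
TRAIN_CMD = ["finetune.py"]


def gpu_subsets(total: int, sizes: Tuple[int, ...] = (1, 2, 4)) -> List[str]:
    """Return CUDA_VISIBLE_DEVICES values for each subset size, plus all GPUs."""
    counts = sorted({n for n in sizes if n <= total} | {total})
    return [",".join(str(i) for i in range(n)) for n in counts]


def launch_command(devices: str) -> Tuple[Dict[str, str], List[str]]:
    """Environment and torchrun argv for one run restricted to `devices`."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    nproc = len(devices.split(","))
    argv = ["torchrun", f"--nproc_per_node={nproc}", *TRAIN_CMD]
    return env, argv


if __name__ == "__main__":
    for devices in gpu_subsets(7):
        env, argv = launch_command(devices)
        print(f"CUDA_VISIBLE_DEVICES={devices}", shlex.join(argv))
        # To actually run each configuration (needs `import subprocess`):
        # subprocess.run(argv, env=env, check=True)
```

If one GPU works but two do not, the problem is very likely in GPU-to-GPU communication rather than in the model or data.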

Try adding export NCCL_DEBUG=INFO when running to see whether you get more detailed error output.
Or see whether this helps.
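A small all_reduce test can separate NCCL problems from the finetuning code itself: if it also hangs or times out on several GPUs, the issue lies in inter-GPU communication rather than in the training script. Below is a minimal sketch assuming a `torchrun` launch; the file name and the helper `expected_sum` are illustrative, and `NCCL_DEBUG=INFO` is set from Python only if it was not already exported in the shell.

```python
"""Minimal NCCL all_reduce smoke test with verbose NCCL logging (sketch).

Launch with, e.g.:  torchrun --nproc_per_node=2 nccl_smoke_test.py
(the file name is a placeholder).
"""
import os

# Must be set before NCCL communicators are created; setdefault keeps any
# value already exported in the shell.
os.environ.setdefault("NCCL_DEBUG", "INFO")


def expected_sum(world_size: int) -> float:
    """Each rank contributes rank + 1, so the reduced value is 1 + 2 + ... + N."""
    return world_size * (world_size + 1) / 2


def main() -> None:
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # env:// init from torchrun's variables

    x = torch.tensor([float(dist.get_rank() + 1)], device="cuda")
    dist.all_reduce(x)  # default op is SUM; a hang here points at GPU-to-GPU transport
    torch.cuda.synchronize()
    assert x.item() == expected_sum(dist.get_world_size())
    if dist.get_rank() == 0:
        print("all_reduce OK:", x.item())
    dist.destroy_process_group()


# Only run the distributed part when launched by torchrun.
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```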

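zhoujx4 commented on July 26, 2024

I had the same problem as you: a timeout on a single machine with multiple GPUs.
It was solved after I changed the BIOS settings to disable ACS. You might want to try that.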
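To see whether ACS is active before touching the BIOS, one commonly cited check (it appears in NVIDIA's NCCL troubleshooting notes) is to look for `SrcValid+` on the `ACSCtl` lines of `lspci -vvv`. The sketch below just automates that check; `acs_enabled_devices` is a hypothetical helper, lspci usually needs root to show these capabilities, and the exact BIOS option for disabling ACS varies by motherboard.

```python
"""Flag PCIe devices whose ACS controls look enabled, from `lspci -vvv` output (sketch)."""
import re
import subprocess
from typing import List, Optional

# A device header line starts with a bus address such as "00:01.0" or "0000:00:01.0".
DEVICE_RE = re.compile(r"^([0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7] ")


def acs_enabled_devices(lspci_text: str) -> List[str]:
    """Return bus addresses whose ACSCtl line shows SrcValid+."""
    enabled: List[str] = []
    current: Optional[str] = None
    for line in lspci_text.splitlines():
        if DEVICE_RE.match(line):
            current = line.split()[0]
        elif "ACSCtl:" in line and "SrcValid+" in line and current:
            enabled.append(current)
    return enabled


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["lspci", "-vvv"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run lspci:", exc)
    else:
        print("ACS appears enabled on:", acs_enabled_devices(out) or "none found")
```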

zhoujx4 commented on July 26, 2024

For details, see: https://www.modb.pro/db/617940

xqmmy commented on July 26, 2024

After switching to 3090s, the problem went away. No idea why.

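xqmmy commented on July 26, 2024

> I had the same problem as you: a timeout on a single machine with multiple GPUs. It was solved after I changed the BIOS settings to disable ACS. You might want to try that.

Thanks for the answer, I'll try again.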

thelongestusernameofall commented on July 26, 2024

The solution in NVIDIA/nccl#426 works.

export NCCL_IB_GID_INDEX=3 solved my problem.
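The same variable can also be set from Python, as long as that happens before `torch.distributed` creates its NCCL communicators. A minimal sketch; `apply_nccl_env` is a hypothetical helper, and `3` is simply the value that worked in this thread. The correct GID index depends on the RDMA (RoCE/InfiniBand) setup, so it may differ on other machines.

```python
"""Apply the NCCL setting from this thread before torch.distributed starts (sketch)."""
import os
from typing import Dict, MutableMapping

# Value reported to work in this thread; other RDMA setups may need another index.
NCCL_WORKAROUND: Dict[str, str] = {"NCCL_IB_GID_INDEX": "3"}


def apply_nccl_env(env: MutableMapping[str, str], settings: Dict[str, str]) -> Dict[str, str]:
    """Set each variable unless it is already defined; return what was applied."""
    applied: Dict[str, str] = {}
    for key, value in settings.items():
        if key not in env:
            env[key] = value
            applied[key] = value
    return applied


if __name__ == "__main__":
    # Call this before dist.init_process_group(backend="nccl").
    print("applied:", apply_nccl_env(os.environ, NCCL_WORKAROUND))
```

Leaving an already-exported value untouched means a shell-level `export NCCL_IB_GID_INDEX=...` still takes precedence.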