Comments (4)

yangapku commented on May 18, 2024

Here is where the all-gather operation happens in the code:

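import torch
import torch.distributed as dist

# image_features / text_features: this rank's local batch of embeddings.
# Assumption: world_size is the size of the default process group.
world_size = dist.get_world_size()

# We gather tensors from all GPUs to get more negatives to contrast with.
gathered_image_features = [
    torch.zeros_like(image_features) for _ in range(world_size)
]
gathered_text_features = [
    torch.zeros_like(text_features) for _ in range(world_size)
]
# Afterwards, element r of each list holds rank r's tensor (no autograd history).
dist.all_gather(gathered_image_features, image_features)
dist.all_gather(gathered_text_features, text_features)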

from chinese-clip.
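A single-process sketch of what the gathered features are used for, assuming a CLIP-style symmetric InfoNCE loss. The loss form, `logit_scale=100.0`, and the helper names `simulated_all_gather` and `clip_loss_on_rank` are illustrative assumptions, not chinese-clip's code; NumPy stands in for PyTorch and a list of per-rank arrays stands in for the process group. The key detail is the label offset: local row `i`'s positive sits at column `rank * local_batch + i` of the global batch, and every other column is a negative. In real PyTorch training the tensors filled in by `torch.distributed.all_gather` carry no autograd history; this sketch does not model gradients.

```python
import numpy as np


def simulated_all_gather(per_rank_tensors):
    """Hypothetical stand-in for dist.all_gather: returns copies of every
    rank's tensor, in rank order, as each rank would see them."""
    return [t.copy() for t in per_rank_tensors]


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits):
    # Subtract the row max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def clip_loss_on_rank(rank, image_per_rank, text_per_rank, logit_scale=100.0):
    """Symmetric InfoNCE loss for one rank's local batch, contrasted
    against the global batch gathered from all ranks (illustrative)."""
    all_image = np.concatenate(simulated_all_gather(image_per_rank))  # (W*B, D)
    all_text = np.concatenate(simulated_all_gather(text_per_rank))    # (W*B, D)
    local_image = image_per_rank[rank]                                 # (B, D)
    local_text = text_per_rank[rank]                                   # (B, D)
    b = local_image.shape[0]

    logits_i2t = logit_scale * local_image @ all_text.T  # (B, W*B)
    logits_t2i = logit_scale * local_text @ all_image.T  # (B, W*B)

    # Local row i's positive pair sits at column rank*B + i of the global batch.
    rows = np.arange(b)
    labels = rank * b + rows
    loss_i2t = -log_softmax(logits_i2t)[rows, labels].mean()
    loss_t2i = -log_softmax(logits_t2i)[rows, labels].mean()
    return (loss_i2t + loss_t2i) / 2, logits_i2t.shape


rng = np.random.default_rng(0)
world_size, local_batch, dim = 4, 8, 16
images = [l2_normalize(rng.normal(size=(local_batch, dim))) for _ in range(world_size)]
# Text embeddings are near-copies of their paired image embeddings.
texts = [l2_normalize(img + 0.01 * rng.normal(size=img.shape)) for img in images]

loss, logits_shape = clip_loss_on_rank(2, images, texts)
print(logits_shape)  # (8, 32): per row, 1 positive and 31 negatives
print(float(loss))
```

Without aggregation, each row would only see the 7 other samples of its own GPU as negatives; with 4 ranks it sees 31.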

yangapku commented on May 18, 2024

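Hello. Our multi-GPU training enables aggregate by default (controlled by the --skip-aggregate option in the training config), so an all-gather communication across machines is performed on the model-output side: the local batch on each GPU is gathered into a global batch, so that the contrastive loss can be computed with more negative samples. This operation consumes a fair amount of communication bandwidth, so training speed may be limited by the inter-machine network. The best setup is an inter-machine network that supports RDMA; with it, the training speedup will be much better.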
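To make the bandwidth point concrete, here is a back-of-envelope count of the feature payload one rank receives per step from the all-gather, assuming one image tensor and one text tensor of the same shape are gathered once each in the forward pass. The function name and the example values (`embed_dim=512`, fp16, local batch 256) are illustrative assumptions, not chinese-clip's configuration, and the count ignores DDP's gradient all-reduce and whichever collective algorithm NCCL picks.

```python
def all_gather_bytes_received_per_rank(world_size, local_batch, embed_dim,
                                       bytes_per_elem=2, num_tensors=2):
    """Hypothetical helper: bytes one rank receives from the other
    world_size - 1 ranks when all-gathering num_tensors feature tensors
    of shape (local_batch, embed_dim)."""
    per_rank_tensor_bytes = local_batch * embed_dim * bytes_per_elem
    return (world_size - 1) * per_rank_tensor_bytes * num_tensors


# 1 machine x 8 GPUs vs. 4 machines x 8 GPUs, same local batch per GPU.
for world_size in (8, 32):
    received = all_gather_bytes_received_per_rank(world_size, local_batch=256, embed_dim=512)
    print(world_size, received)  # 8 -> 3670016, 32 -> 16252928
```

At a fixed local batch the payload per rank grows linearly with the number of GPUs, and once training spans several machines part of it must cross the inter-machine link, which is where RDMA helps.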

mantianlong commented on May 18, 2024

Got it, nice 👍🏻, thanks!

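Xujianzhong commented on May 18, 2024

Piggybacking on this thread to ask a question: I'm currently using PyTorch 2.0, and as the number of machine nodes increases, GPU memory consumption gets more and more severe.

For example: with 1 machine of 8 V100s, the batch size can reach 2048; with 4 machines of 8 V100s each, the batch size only reaches 1024.
@yangapku Have you run into this problem, and is there any way to optimize it?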
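One way to reason about this report is to separate the per-GPU buffers that exist only because of aggregation from everything else. The sketch below is a rough lower bound under stated assumptions: it counts only the gathered image and text features plus the two `local_batch x global_batch` logit matrices in fp32, and ignores the activations autograd keeps for the softmax and any NCCL or DDP buffers. The function name and example sizes are hypothetical.

```python
def aggregation_buffer_bytes(world_size, local_batch, embed_dim, bytes_per_elem=4):
    """Hypothetical estimate of per-rank memory added by aggregation:
    gathered image + text features and the image->text and text->image
    logit matrices, each of shape (local_batch, global_batch)."""
    global_batch = world_size * local_batch
    gathered_features = 2 * global_batch * embed_dim * bytes_per_elem
    logits = 2 * local_batch * global_batch * bytes_per_elem
    return gathered_features + logits


for world_size in (8, 32):
    mib = aggregation_buffer_bytes(world_size, local_batch=256, embed_dim=512) / 2**20
    print(world_size, mib)  # 8 -> 12.0, 32 -> 48.0
```

These buffers grow linearly with the number of GPUs at a fixed local batch, while most activation memory depends on the local batch and the model, so comparing such an estimate with the observed per-GPU memory is one way to check how much of the difference aggregation could account for.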
