Comments (2)
I do not think FSDP supports this currently. From my high-level understanding, is the flexibility introduced in AMSP mainly useful when doing microbatching / gradient accumulation?
My understanding is that the flexibility comes from the new idea that the sharding strategies for parameters, gradients, and optimizer states can each be different. This by nature yields many sharding strategies, including DDP, ZeRO-1, ZeRO-2, ZeRO-3, HSDP, MiCS, and many more. For a given cluster and a given model, we may therefore find a better sharding strategy, as in Table IV of the paper (also copied below).
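To make the decoupling concrete, here is a minimal sketch, assuming a 64-GPU cluster; the `ShardSpec` type and all names are hypothetical, not AMSP's or FSDP's actual API:

```python
from dataclasses import dataclass

@dataclass
class ShardSpec:
    params: int      # sharding degree for parameters (1 = fully replicated)
    grads: int       # sharding degree for gradients
    opt_states: int  # sharding degree for optimizer states

WORLD_SIZE = 64  # assumed cluster size, for illustration only

# Familiar strategies fall out as particular assignments of the three degrees:
STRATEGIES = {
    "DDP":    ShardSpec(params=1,          grads=1,          opt_states=1),
    "ZeRO-1": ShardSpec(params=1,          grads=1,          opt_states=WORLD_SIZE),
    "ZeRO-2": ShardSpec(params=1,          grads=WORLD_SIZE, opt_states=WORLD_SIZE),
    "ZeRO-3": ShardSpec(params=WORLD_SIZE, grads=WORLD_SIZE, opt_states=WORLD_SIZE),
}
```

Any other assignment of the three degrees is also a valid point in the search space, which is where a better-than-standard strategy for a particular cluster and model can come from.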
Another point is that a sharding strategy is represented in two dimensions, one for the number of nodes and one for the number of GPUs within a node, which makes it clearer.
It is general because all of these sharding strategies can be obtained just by changing the configuration values. We could even relax some of the constraints in the paper while keeping its key idea, if possible.
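As a rough sketch of that two-dimensional view (the pair encoding and the names below are my assumptions, not the paper's notation), writing each degree as (node count, GPUs per node) makes it explicit whether a state's collectives must cross the slower inter-node network:

```python
NODES, GPUS_PER_NODE = 8, 8  # assumed 8 x 8 = 64-GPU cluster, for illustration

# Each sharding degree as a (node_dim, gpu_dim) pair; total = node_dim * gpu_dim.
replicate        = (1, 1)                  # degree 1: full replication (DDP-style)
intra_node_shard = (1, GPUS_PER_NODE)      # degree 8: shard inside a node, replicate
                                           # across nodes (MiCS/HSDP-style)
full_shard       = (NODES, GPUS_PER_NODE)  # degree 64: shard across the whole
                                           # cluster (ZeRO-3-style)

def total_degree(spec):
    node_dim, gpu_dim = spec
    return node_dim * gpu_dim

assert total_degree(intra_node_shard) == 8
assert total_degree(full_shard) == 64
```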
Related Issues (20)
- Increase (4x+) in build times
- DISABLED test_fallback_to_eager_if_recompiling_too_many_times (__main__.CudaGraphTreeTests)
- [dynamo] map in dynamo doesn't return an iterable
- [dtensor] write aten.split_tensor using op strategy
- DISABLED test_cpp_extension_recommends_custom_ops_inline_inbuilt_nn_modules (__main__.InlineInbuiltNNModulesMiscTests)
- Migrate gradcheck with check_batched_grad=True to use the new torch.vmap
- [Inductor] Add an API to register external callable candidates for inductor's Matmul/Conv tuning choices
- API To Make Custom Post Grad Passes Hashable
- [inline-inbuilt-nn-modules] Unlimited recompiles because the nn.Module is going out of scope
- [CUDA] Remove footgun related to non-blocking copies
- Training on M1 MBP: Placeholder storage has not been allocated on MPS device
- An additional dimension appears in the return when using torch.compile on torch.nn.LSTM models
- `with torch.device()` modifies the `as_tensor` intended behavior
- xpu: huggingface generation pipelines are missing xpu aten ops causing perf impact
- torch2.2.0+cuda12.1 image cant run on cuda12.2
- [rocm] F.embedding reports invalid configuration argument
- Generalizing cross product to hodge star operator
- RuntimeError in _multi_tensor_adamw: Output Shape Mismatch During torch.foreach_lerp Operation
- [Break XPU] The newly added test_add_kernel use requires_gpu, but hard code cuda.
- [inductor] build failed with gcc8.3