Comments (6)
My latest thinking is that we don't need any new args to the Stage() ctor, as long as we continue to assume no skip connections.
If we want to enable skip connections later, we could add the args I proposed in the RFC with one adjustment: instead of 'args_rank' it would be 'args_stage', so it would tell you e.g. 'send arg 0 to stage 3, send arg 1 to stage 4', but the Stage would not yet know which rank owns those stages.
The change I would propose for now: when you ask the stage for get_*_send_ops, it takes an optional stage-mapping argument. If None, it assumes a linear modulo-pp_size mapping; if not None, it uses the mapping to determine which pp_rank a stage is on.
Think this through and see if it makes sense; it's just off the top of my head, so it could have issues.
from pytorch.
One annoying thing is that which stage counts as the 'next stage' depends on the schedule in use. It would be ideal if we could late-bind that information when the stages and schedule are used together.
Maybe the stage can have a stage-id to rank map given to it by the schedule, either during schedule init or during each call to get_*_ops?
cc @H-Huang
Yeah I agree, there will need to be a stage-id to rank mapping for the correct comm ops. There are currently a few assumptions baked into the code that need to be updated:
Assumption 1) The stage-id to rank mapping in looped cases is always `stage_ids = range(rank, total_num_stages, local_num_stages)`. We can fix this by adding a stage-id to rank mapping.
Assumption 2) You always receive from `stage_id - 1` and send to `stage_id + 1`. We can fix this with the optional arguments mentioned above.
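The two baked-in defaults could be sketched like this; the function names are hypothetical, and this just mirrors the assumptions as stated above rather than the actual PyTorch code:

```python
def default_stage_ids(rank, total_num_stages, local_num_stages):
    # Assumption 1: looped stage placement, as currently baked in.
    return list(range(rank, total_num_stages, local_num_stages))

def default_neighbors(stage_id, total_num_stages):
    # Assumption 2: recv from stage_id - 1, send to stage_id + 1
    # (None at the ends of the pipeline).
    prev_stage = stage_id - 1 if stage_id > 0 else None
    next_stage = stage_id + 1 if stage_id < total_num_stages - 1 else None
    return prev_stage, next_stage
```

A schedule-provided mapping would replace both defaults rather than hard-coding them in the Stage.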
@H-Huang one more design consideration is how we should deal with the communication between the two stages at the bottom of the 'V' that are on the same physical rank.
e.g. say stage 3 needs to send outputs to 4 and recv grads from 4.
- can we use NCCL for this use case today, until we decide to optimize it? (Does NCCL support a rank sending/recving to itself?)
- if we want to avoid doing a comm op, how can we cleanly let stage3 know about stage4 and share a tensor? perhaps the schedule code itself needs to do this by passing the output tensor from 3 as an input to 4? (and skip generating send/recv ops).
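A toy sketch of the second option, where the schedule itself short-circuits the comm for co-located stages; `forward_hop` and `rank_of` are made-up names, and tuples stand in for real P2P ops:

```python
def forward_hop(send_stage_id, recv_stage_id, output, rank_of):
    # rank_of: stage-id -> physical rank mapping (assumed given).
    if rank_of[send_stage_id] == rank_of[recv_stage_id]:
        # Same physical rank (bottom of the 'V'): hand the output
        # tensor over directly and generate no send/recv ops.
        return output, []
    # Different ranks: a real implementation would build isend/irecv
    # P2P ops here; a tuple is a placeholder.
    return None, [("isend", send_stage_id, recv_stage_id)]
```

The schedule would feed the returned tensor straight in as the next stage's input whenever the op list comes back empty.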
@wconstab I'm not sure about (1); I can test it out. But I think a clean way of doing it is to just check a condition in `get_*_send_ops` for whether the rank you are sending to is yourself; if so, then just automatically update the respective recv buffers (much like what a `get_*_recv_op` would update). I think all of the changes can remain in the `Stage` class (the stage would just somehow need to know the other stages) without any changes to the Schedule implementation. The send/recv ops will just return empty lists in this case (thus `batch_isend_irecv` will be a no-op).
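Roughly what that condition could look like as a standalone sketch; the argument names are hypothetical and tuples stand in for real `dist.P2POp` objects:

```python
def get_fwd_send_ops(self_rank, dest_rank, output,
                     peer_recv_buffers, dest_stage_id):
    if dest_rank == self_rank:
        # Local hand-off: write into the destination stage's recv
        # buffer directly and emit no comm ops; the caller can then
        # skip the batched P2P launch entirely for an empty list.
        peer_recv_buffers[dest_stage_id] = output
        return []
    # Remote destination: placeholder for building a real isend op.
    return [("isend", output, dest_rank)]
```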
> the stage would just somehow need to know the other stages
what's your proposal for how to let stages know about other stages?
- during Stage init we could not easily pass all other stages, so let's rule this out
- (a) a new method on Stage to 'register peer stages' could be called by the schedule at init time, for all stages on the same rank
- (b) passing the recv 'Stage' object to Stage.get_fwd_send_ops(recv_stage) might be another way, during schedule step()
I guess (a) is pretty clean if we can do it in a schedule base class. And we should define the fallback too. If this registration is not performed, what will happen?
- ranks will fall back to using nccl to send/recv to local same rank?
- or will this error?
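Option (a) might look roughly like this; the class and method names are hypothetical, and here the no-registration fallback is an explicit error rather than a NCCL self-send:

```python
class Stage:
    def __init__(self, stage_id, rank):
        self.stage_id = stage_id
        self.rank = rank
        self.peers = {}  # stage_id -> Stage, for stages on this rank

    def register_peer_stages(self, stages):
        # Called by the schedule base class at init time, passing all
        # stages; only those sharing this rank are retained.
        for s in stages:
            if s.rank == self.rank and s.stage_id != self.stage_id:
                self.peers[s.stage_id] = s

    def local_peer(self, stage_id):
        # Fallback question from the thread: error, or fall back to
        # NCCL send/recv to self? This sketch surfaces an error.
        if stage_id not in self.peers:
            raise RuntimeError(
                f"stage {stage_id} not registered as a local peer; "
                "was register_peer_stages() called by the schedule?")
        return self.peers[stage_id]
```

An erroring fallback makes the registration contract explicit; silently falling back to NCCL self-send would hide a mis-wired schedule.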