Hi, thank you for making this book available before publishing. It is a treasure for l


A question about "building a distributed Ray Training" section in Chapter 3

maxpumperla commented on May 18, 2024


from learning_ray.

Comments (3)

maxpumperla commented on May 18, 2024

@sagebei sorry for the late reply, I sometimes get lost in GitHub notifications. You're right that, in principle, you can have race conditions there. And yes, in practice you will have to take care of that (there are different ways of doing so; acquiring a lock is one of them). The worst thing that can happen above is that some updates simply get lost.
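A minimal sketch of the "lost update" problem and the lock-based fix mentioned above. It uses plain Python threads rather than Ray, purely as an analogy; the `Policy` class, its methods, and the `run` helper are hypothetical names invented for this illustration and are not part of the book's code.

```python
import threading
import time


class Policy:
    """Hypothetical stand-in for a shared policy; holds a single counter."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def unsafe_update(self):
        current = self.value       # read
        time.sleep(0.001)          # widen the window for other threads to interleave
        self.value = current + 1   # write back, possibly overwriting another update

    def locked_update(self):
        with self.lock:            # only one thread may read-modify-write at a time
            current = self.value
            time.sleep(0.001)
            self.value = current + 1


def run(update_name, num_threads=8):
    """Apply the named update once from each of `num_threads` threads."""
    policy = Policy()
    threads = [
        threading.Thread(target=getattr(policy, update_name))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return policy.value


unsafe_total = run("unsafe_update")   # may be smaller than 8: some updates are lost
locked_total = run("locked_update")   # every update is applied exactly once
print(unsafe_total, locked_total)
```

Without the lock, several threads can read the same old value and overwrite each other's writes, which is the "updates simply get lost" failure mode. With the lock, the read-modify-write sequence is serialized.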


sagebei commented on May 18, 2024

@maxpumperla Thank you very much for your reply! I closed the issue because I found there is no problem in the code. My understanding (which might not be correct) is that although update_policy_task is invoked num_episodes times in the for-loop and runs in parallel, the update_policy_task calls are "chained" together, because the policy_ref passed as a parameter is the policy_ref returned by the prior call. Inside update_policy_task, Ray does two extra things under the hood for us, ray.get() and ray.put(), as shown in the comments below.

@ray.remote
def update_policy_task(policy_ref, experiences_list):
    # policy = ray.get(policy_ref)   policy_ref: ObjectRef(7df446e0be2f9350ffffffffffffffffffffffff0100000001000000)
    [update_policy(policy_ref, ray.get(xp)) for xp in experiences_list]
    # policy_ref = ray.put(policy)   policy_ref: ObjectRef(80f450872c2ccadaffffffffffffffffffffffff0100000001000000)
    return policy_ref

Since ray.get is a blocking call, each call must wait until the execution of the prior call has finished. I have been fiddling with the code for a while and still cannot be sure that I understand it correctly. Please correct me if my understanding is wrong. Much appreciated!


maxpumperla commented on May 18, 2024

@sagebei apologies for the long turnaround. We've now updated the example (https://github.com/maxpumperla/learning_ray/blob/main/notebooks/ch_03_core_app.ipynb) to only do rollouts in parallel, not the actual update step, as this was both confusing (e.g. the race conditions you mentioned) and unnecessary. Also note that this pattern (distributed rollouts, central updates to a policy on the driver) is how RLlib currently does things as well.

Hope that helps!
