
Comments (2)

crleblanc commented on August 13, 2024

Thanks for the info Jim, those sound like excellent suggestions.

We're making heavy use of PySpark at the moment so I managed to solve my problem by switching to a pandas_udf. They're simple but fairly restrictive. We may look at Dask again if pandas_udfs don't give us enough flexibility.

from dask-yarn.

jcrist commented on August 13, 2024

Apologies for the slow response here. The issue is that scale is a non-blocking operation. It schedules two new workers to be added to the cluster, but doesn't wait for them to be up. If you add:

client.wait_for_workers(2)

before calling upload_file, that would work. The downside of this approach is that the files won't be added to workers that join later (e.g. if you scale up your cluster afterwards).
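To make the ordering concrete, here is a minimal sketch of the scale, wait, upload sequence. The helper name scale_then_upload and its arguments are hypothetical, chosen for illustration; it assumes cluster is a dask-yarn cluster object and client is a distributed Client from a version that already has wait_for_workers.

```python
def scale_then_upload(cluster, client, n_workers, path):
    """Hypothetical helper: request workers, wait for them, then ship a file."""
    # scale() only requests the workers; it returns before they are running.
    cluster.scale(n_workers)
    # Block until at least n_workers have connected to the scheduler.
    client.wait_for_workers(n_workers)
    # Only workers connected at this point receive the file.
    client.upload_file(path)
```

Workers that join after this call still won't have the file, which is the limitation described above.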

A more complicated, but also more robust approach would be to use a worker plugin (http://distributed.dask.org/en/latest/api.html#distributed.Client.register_worker_callbacks) to download code on worker startup. These are run for all existing and future workers, but would require you to write a function like:

def download_code_and_add_to_python_path():
    ...
client.register_worker_callbacks(setup=download_code_and_add_to_python_path)
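As a rough illustration of what such a setup function might contain, the sketch below only covers the "add to Python path" half. The directory CODE_DIR and the function name are assumptions made up for this example, and the download step is left as a placeholder because how the code is fetched depends on your environment.

```python
import sys

# Hypothetical location for the downloaded code; not from the original thread.
CODE_DIR = "/tmp/my_project_code"

def add_code_dir_to_python_path(code_dir=CODE_DIR):
    """Setup function intended to run once on each worker at startup."""
    # Placeholder: fetch or unpack your code into code_dir here.
    # Make the directory importable, without adding it twice.
    if code_dir not in sys.path:
        sys.path.insert(0, code_dir)
    return code_dir
```

It would then be registered the same way, with client.register_worker_callbacks(setup=add_code_dir_to_python_path).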

Note that both of these features are only on the master branch of distributed, and haven't been released yet. If you want to try them out, you can install like:

pip install git+https://github.com/dask/distributed.git

Alternatively, you could do a less efficient but still sufficient wait using the current release:

import time

def wait_for_workers(client, n):
    # scheduler_info()['workers'] is a dict keyed by worker address
    while len(client.scheduler_info()['workers']) < n:
        time.sleep(0.25)
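One possible refinement of this polling approach, in case the workers never arrive (for example because YARN can't allocate the containers), is to give up after a deadline. The sketch below is an assumption-laden variant, not part of distributed: the name wait_for_workers_with_timeout and the timeout and poll_interval parameters are made up for illustration, and it relies only on client.scheduler_info()['workers'] being a mapping with one entry per connected worker.

```python
import time

def wait_for_workers_with_timeout(client, n, timeout=60.0, poll_interval=0.25):
    """Poll the scheduler until n workers are connected or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        # One entry per connected worker, keyed by worker address.
        n_connected = len(client.scheduler_info()["workers"])
        if n_connected >= n:
            return n_connected
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {n_connected} of {n} workers connected after {timeout}s"
            )
        time.sleep(poll_interval)
```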

