Comments (14)

jakirkham commented on May 24, 2024

Happy to give this a go. We use LSF at work. So this should be pretty useful.

from dask-jobqueue.

lesteve commented on May 24, 2024

> If someone has interest and time, they should feel free to go ahead.

Yep, that was what I had in mind by specifying the steps in more detail.

lesteve commented on May 24, 2024

@lzamparo I'll ping you when #78 is merged, and it would be great if you could give it a go on your LSF cluster!

lzamparo commented on May 24, 2024

@jakirkham if it helps, I tried making a dask-distributed LSF script a while back. I encountered an error I couldn't fix, but maybe you can? Here's a gist.

lesteve commented on May 24, 2024

Just to split the task into smaller chunks:

  • add an LSFCluster class that inherits from dask_jobqueue.core.JobQueueCluster
  • define LSFCluster.submit_command (probably bsub?) and LSFCluster.cancel_command (probably bkill?)
  • implement LSFCluster._job_id_from_submit_output, which takes a string (the stdout output from submit_command) and turns it into a job identifier.

Once implemented, it would be great to test it on an LSF cluster that you have at your disposal.

Looking at dask_jobqueue/slurm.py, dask_jobqueue/pbs.py, dask_jobqueue/sge.py is a good way to get started too.
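
As a rough illustration of the third step above, here is a minimal sketch of the parsing logic. It assumes bsub prints a line such as "Job <12345> is submitted to queue <normal>." on success. The standalone function name and the sample string are hypothetical, and this is not necessarily how the final LSFCluster does it.

```python
import re

# Assumed shape of bsub's stdout: "Job <12345> is submitted to queue <normal>."
_JOB_ID_PATTERN = re.compile(r"Job <(\d+)>")


def job_id_from_submit_output(out):
    """Return the job id found in bsub's stdout, as a string."""
    match = _JOB_ID_PATTERN.search(out)
    if match is None:
        # Fail loudly rather than return a bogus identifier.
        raise ValueError("Could not find a job id in: %r" % out)
    return match.group(1)


# Hypothetical sample output, for illustration only.
print(job_id_from_submit_output("Job <12345> is submitted to queue <normal>."))
```

Raising on unexpected output, rather than returning None, should make a misconfigured submit command easier to spot.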

jakirkham commented on May 24, 2024

Sorry, I have been busy with other things lately. If someone has interest and time, they should feel free to go ahead.

lzamparo commented on May 24, 2024

@lesteve I can give this a go; thanks for the direction. Is there a specific fixture-based test implemented for other cluster methods, or am I free to try some of my own tasks?

lesteve commented on May 24, 2024

It'd be great if you could give this a go! For the first iteration, I think you can try to put together and run a small snippet, e.g. something along these lines, and get it to work on your local LSF cluster:

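from dask.distributed import Client  # Client lives in dask.distributed, not dask_jobqueue
from dask_jobqueue import LSFCluster

cluster = LSFCluster(...)  # use some arguments that make sense on your local cluster
cluster.scale(2)  # request a couple of workers; with no workers the tasks never run
client = Client(cluster)
futures = client.map(lambda x: x + 1, range(10))
print(client.gather(futures))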

For tests, you could probably take some inspiration from the existing tests in test_slurm.py, test_pbs.py, etc.
Personally I would be in favour of doing that in a separate PR.

lesteve commented on May 24, 2024

@lzamparo just so you know and to avoid duplicating work, there is an ongoing PR at #78. It would be great if you can give it a go (nowish or once the PR is merged, your call really) on your local LSF cluster and tell us whether that works for you!

raybellwaves commented on May 24, 2024

Still working through my PR and trying to get it to work. Subsequent testing (once it is working) on other LSF clusters would be great.
Leaving this link here: https://slurm.schedmd.com/rosetta.pdf which is helping me adapt the Slurm and PBS code.

lzamparo commented on May 24, 2024

@lesteve Thanks for the heads up. Looks like @raybellwaves is making good headway, so I'll step aside.

raybellwaves commented on May 24, 2024

While #78 is almost finished, I stumbled across https://github.com/IBMSpectrumComputing/lsf-python-api. It may be of interest for the future, e.g. for implementing something like bjobs to check whether the dask workers are running or pending. (#11)
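
To make the bjobs idea a bit more concrete, here is a small sketch that only parses status text and does not call LSF or the lsf-python-api. It assumes the text was obtained with something like bjobs -noheader -o "jobid stat", which depends on the LSF version installed. The function name and the sample output are made up for illustration.

```python
def parse_bjobs_status(output):
    """Map job id -> status from lines of the form '<jobid> <stat>'."""
    statuses = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        job_id, stat = fields
        statuses[job_id] = stat
    return statuses


# Made-up sample output; RUN and PEND are the LSF states for running and pending jobs.
sample = "101 RUN\n102 PEND\n103 RUN\n"
statuses = parse_bjobs_status(sample)
pending = [job_id for job_id, stat in statuses.items() if stat == "PEND"]
print(pending)
```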

raybellwaves commented on May 24, 2024

This has been merged now. Happy to know if it works on other LSF schedulers, not just UM Pegasus.
One thing which cropped up for me again was the psutil issue that I showed in my first comment on the PR. If this comes up for others, we could add psutil to the dependencies, as installing psutil made it go away. But if it's just something unique to Pegasus, I'll just have to remember to install it.

lesteve commented on May 24, 2024

I am going to close this one since the associated PR has been merged.

@raybellwaves note you can use "Fix #issueNumber" in your PR description, this way the associated issue gets closed automatically when the PR is merged. For more details, look at this.

About the psutil problem that you bumped into, I think it's quite hard to guess what the root cause is, but I would bet that it was a problem with your environment (somehow you ended up with a broken psutil) rather than with dask-jobqueue.

Note psutil is already in the dependencies because dask-jobqueue depends on distributed which depends on psutil.
