Comments (14)
Happy to give this a go. We use LSF at work. So this should be pretty useful.
from dask-jobqueue.
If someone has interest and time, they should feel free to go ahead.
Yep, that was what I had in mind by specifying the steps in more detail.
@lzamparo I'll ping you when #78 is merged and it will be great if you could give it a go on your LSF cluster!
@jakirkham if it helps, I tried making a dask-distributed LSF script a while back. I encountered an error I couldn't fix, but maybe you can? Here's a gist.
Just to split the task into smaller chunks:
- add an LSFCluster that inherits from dask_jobqueue.core.JobQueueCluster
- define LSFCluster.submit_command (probably bsub?) and LSFCluster.cancel_command (probably bkill?)
- implement LSFCluster._job_id_from_submit_output that takes a string (stdout output from the submit_cmd) and turns it into a job identifier.
Once implemented, it would be great to test it on an LSF cluster that you have at your disposal.
Looking at dask_jobqueue/slurm.py, dask_jobqueue/pbs.py, dask_jobqueue/sge.py is a good way to get started too.
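The three steps above can be sketched roughly as follows. This is only an illustration, not the code from the eventual PR: the class name LSFClusterSketch is hypothetical and is a plain stand-in (a real implementation would inherit from dask_jobqueue.core.JobQueueCluster), and the sample string assumes bsub prints its usual "Job <id> is submitted to ... queue <name>." message.

```python
import re


class LSFClusterSketch:
    """Hypothetical stand-in; a real class would subclass
    dask_jobqueue.core.JobQueueCluster instead of object."""

    # Commands assumed from the issue discussion ("probably bsub / bkill").
    submit_command = "bsub"
    cancel_command = "bkill"

    def _job_id_from_submit_output(self, out):
        # Assumes bsub reports the id between angle brackets after "Job".
        match = re.search(r"Job <(\d+)>", out)
        if match is None:
            raise ValueError(f"Could not find a job id in bsub output: {out!r}")
        return match.group(1)


# Sample stdout (assumed format, not captured from a real cluster).
sample = "Job <12345> is submitted to default queue <normal>."
job_id = LSFClusterSketch()._job_id_from_submit_output(sample)
print(job_id)
```

Raising on unexpected output, rather than returning None, makes a changed or localized bsub message fail loudly at submission time.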
Sorry have been busy with other things lately. If someone has interest and time, they should feel free to go ahead.
@lesteve I can give this a go; thanks for the direction. Is there a specific fixture-based test implemented for other cluster methods, or am I free to try some of my own tasks?
It'd be great if you could give this a go! For the first iteration, I think you can try to put together and run a small snippet, e.g. something along these lines, and get it to work on your local LSF cluster:
from dask.distributed import Client  # Client lives in distributed, not dask_jobqueue
from dask_jobqueue import LSFCluster
cluster = LSFCluster(...) # use some arguments that make sense on your local cluster
cluster.scale(1) # request one worker job; without workers the tasks stay pending
client = Client(cluster)
result = client.map(lambda x: x + 1, range(10))
client.gather(result)
For tests, you could probably take some inspiration from the existing tests in test_slurm.py, test_pbs.py, etc.
Personally I would be in favour of doing that in a separate PR.
@lzamparo just so you know and to avoid duplicating work, there is an ongoing PR at #78. It would be great if you can give it a go (nowish or once the PR is merged, your call really) on your local LSF cluster and tell us whether that works for you!
Still working through my PR and trying to get it to work. Subsequent testing (once working) on other LSF clusters will be great.
Leaving this link here: https://slurm.schedmd.com/rosetta.pdf which is helping me adapt the Slurm and PBS codes.
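As a rough illustration of the kind of mapping that rosetta sheet provides, the core commands can be put side by side. The dictionary name SCHEDULER_ROSETTA and the translate helper are hypothetical and only cover the basic submit/cancel/status commands and script directive prefixes, not the option flags.

```python
# Hypothetical lookup table of equivalent scheduler commands.
SCHEDULER_ROSETTA = {
    "slurm": {"submit": "sbatch", "cancel": "scancel", "status": "squeue", "directive": "#SBATCH"},
    "pbs": {"submit": "qsub", "cancel": "qdel", "status": "qstat", "directive": "#PBS"},
    "lsf": {"submit": "bsub", "cancel": "bkill", "status": "bjobs", "directive": "#BSUB"},
}


def translate(command, source, target):
    """Return the target scheduler's equivalent of a source scheduler command."""
    for role, name in SCHEDULER_ROSETTA[source].items():
        if name == command:
            return SCHEDULER_ROSETTA[target][role]
    raise KeyError(f"{command!r} is not a known {source} command")


lsf_submit = translate("sbatch", "slurm", "lsf")
lsf_cancel = translate("qdel", "pbs", "lsf")
print(lsf_submit, lsf_cancel)
```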
@lesteve Thanks for the heads up. Looks like @raybellwaves is making good headway, I'll step aside.
While #78 is almost finished, I stumbled across https://github.com/IBMSpectrumComputing/lsf-python-api. It may be of interest for the future, e.g. for implementing something like bjobs to check whether the dask workers are running or pending. (#11)
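Without the Python API, one possible approach is to parse the text that bjobs prints. The sketch below is an assumption-heavy illustration: parse_bjobs_states is a hypothetical helper, it assumes the default bjobs table layout (JOBID first, STAT third), it ignores job arrays, and the sample text is made up rather than captured from a real cluster.

```python
def parse_bjobs_states(bjobs_output):
    """Map job id -> state (e.g. RUN, PEND) from default-format bjobs text."""
    states = {}
    for line in bjobs_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that does not start with a numeric id.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        states[fields[0]] = fields[2]
    return states


# Made-up sample in the assumed default layout.
sample_output = """\
JOBID   USER    STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME     SUBMIT_TIME
101     alice   RUN   normal  login1     node07     dask-worker  Jun  1 10:00
102     alice   PEND  normal  login1                dask-worker  Jun  1 10:01
"""

states = parse_bjobs_states(sample_output)
pending = [job for job, state in states.items() if state == "PEND"]
print(states, pending)
```

On a real cluster the text would come from running bjobs, for example through subprocess, instead of from a hard-coded string.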
This has been merged now. Happy to know if it works on other LSF schedulers, not just UM Pegasus.
One thing which cropped up for me again was the psutil issue that showed up in my first comment in the PR. If this comes up for others we could add psutil to the dependencies, as installing psutil made it go away. But if it's just something unique to Pegasus I'll just have to remember to install it.
I am going to close this one since the associated PR has been merged.
@raybellwaves note you can use "Fix #issueNumber" in your PR description, this way the associated issue gets closed automatically when the PR is merged. For more details, look at this.
About the psutil problem that you bumped into, I think it's quite hard to guess what the root cause is, but I would bet that it was a problem with your environment (somehow you ended up with a broken psutil) rather than with dask-jobqueue.
Note psutil is already in the dependencies because dask-jobqueue depends on distributed, which depends on psutil.