dask-jobqueue's People

Contributors

basnijholt, costrouc, d-v-b, ericmjl, guillaumeeb, jacobtomlinson, jakirkham, jgerardsimcock, josephhardinee, jrbourbeau, kaelancotter, khusmann, leej3, lesteve, lnaden, louisabraham, lpsinger, matyasselmeci, mrocklin, msimonin, rblcoder, rsignell-usgs, sdtaylor, spencerkclark, stuarteberg, tomaugspurger, twoertwein, wgustafson, willirath, zonca

dask-jobqueue's Issues

Define the user story

I think defining what usage looks like for the intended users will help us get on the same page. It is still evolving at this stage, but it will be useful to document it here.

Create a simple text format that can represent a workflow and be converted to a Dask DAG

We need a parser that can take as its input a text file of bash commands and create a Dask HighLevelGraph.

Jobs are normally grouped together by analysis type (QC, filter, preprocess, align) and/or compute resources.

Simple example with job-only dependencies. In this example, all tasks in job_1 must complete before any task in job_2 can start.

#HPC jobname=job_1
#HPC walltime=10:00:00
#HPC mem=32GB
#HPC cpus_per_task=1
job_1_task_1
job_1_task_2
job_1_task_3
job_1_task_4

#HPC jobname=job_2
#HPC walltime=10:00:00
#HPC mem=32GB
#HPC cpus_per_task=1
#HPC deps=job_1
job_2_task_1
job_2_task_2
job_2_task_3
job_2_task_4
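A minimal parser sketch for the format above (assuming `#HPC key=value` directives and one shell command per line; the names `JobSpec` and `parse_jobs` are illustrative, not an existing dask-jobqueue API):

```python
from dataclasses import dataclass, field

@dataclass
class JobSpec:
    name: str
    directives: dict = field(default_factory=dict)  # walltime, mem, ...
    deps: list = field(default_factory=list)        # names of prerequisite jobs
    tasks: list = field(default_factory=list)       # shell commands, in order

def parse_jobs(text):
    """Split a '#HPC'-annotated script into JobSpec objects."""
    jobs, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#HPC"):
            key, _, value = line[4:].strip().partition("=")
            if key == "jobname":            # a new job block begins here
                current = JobSpec(name=value)
                jobs.append(current)
            elif key == "deps":
                current.deps.extend(value.split(","))
            else:
                current.directives[key] = value
        else:
            current.tasks.append(line)      # a plain line is one task command
    return jobs
```

Each `JobSpec` could then become one layer of a HighLevelGraph, with `deps` turned into inter-layer edges.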

Example with Task Dependencies

I think Dharhas would have some really good insight into how this should be done.

When submitting something like this on SLURM I would use job arrays with dependencies. Essentially, instead of submitting a single job N times, you submit one job that is itself a loop executing N times.

Jobs can depend on jobs, and tasks within the job can depend upon tasks in another job.

I don't think that this needs to be translated specifically into job arrays, but can be handled by the Dask Scheduler itself.
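The ordering the scheduler has to enforce is just a topological sort over the `deps=` edges. A sketch using the standard library's `graphlib` to show the job-level case (the Dask version would map each task onto `dask.delayed` or a HighLevelGraph layer instead):

```python
from graphlib import TopologicalSorter

# job -> set of jobs it depends on, taken from the '#HPC deps=' lines
job_deps = {
    "job_1": set(),
    "job_2": {"job_1"},
}

# static_order() yields jobs in an order that respects every dependency,
# which is exactly the constraint the Dask scheduler would enforce at runtime
order = list(TopologicalSorter(job_deps).static_order())
```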

#HPC jobname=job_1
#HPC walltime=10:00:00
#HPC mem=32GB
#HPC cpus_per_task=1

#HPC task_tags=task_1
job_1_task_1
#HPC task_tags=task_2
job_1_task_2
#HPC task_tags=task_3
job_1_task_3
#HPC task_tags=task_4
job_1_task_4

#HPC jobname=job_2
#HPC deps=job_1
#HPC walltime=10:00:00
#HPC mem=32GB
#HPC cpus_per_task=1

#HPC task_tags=task_1
job_2_task_1
#HPC task_tags=task_2
job_2_task_2
#HPC task_tags=task_3
job_2_task_3
#HPC task_tags=task_4
job_2_task_4

#HPC jobname=job_3
#HPC deps=job_2
#HPC walltime=10:00:00
#HPC mem=32GB
#HPC cpus_per_task=1

job_3_task_1
job_3_task_2
job_3_task_3
job_3_task_4

Will Dask Developers want the BashWrapper in JobQueue?

I'm new to open-source development, so this issue is partly for me to understand the process, but I have a few questions.

  • Do the developers of Dask know that we want to add the BashWrapper to JobQueue, and do they agree? The alternative would be to make a separate package that depends on dask-jobqueue.
  • If not, should we raise an issue and get some feedback first?

Create an AWS Batch JobQueue Runner

This is fairly self-explanatory. We need to extend the base scheduler to implement an AWSBatch runner, and it should look very similar to the SLURM one. (Except for Batch! ;-) )

If you need an environment to test on, see the login credentials in Slack. There is a fully functioning AWS Batch cluster.

Here is a tutorial that includes submitting a job to Batch. The backend of Batch is ECS, but it's supposed to act like an HPC cluster.

One potential gotcha: does Batch supply routable IP addresses that the Dask workers can use?
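For the submission side, Batch already has first-class job dependencies via `dependsOn`, which maps naturally onto the `#HPC deps=` lines. A sketch that builds the keyword arguments for boto3's `batch.submit_job` call (the queue/definition names are placeholders; `batch_submit_kwargs` is an illustrative helper, not dask-jobqueue API):

```python
def batch_submit_kwargs(name, queue, definition, command, dep_job_ids=()):
    """Build the kwargs for boto3's batch.submit_job, wiring job-level
    dependencies the way '#HPC deps=' lines would."""
    kwargs = {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "containerOverrides": {"command": list(command)},
    }
    if dep_job_ids:
        # Batch holds this job until every listed job ID succeeds
        kwargs["dependsOn"] = [{"jobId": j} for j in dep_job_ids]
    return kwargs

# Actual submission (needs AWS credentials and the cluster from Slack):
# import boto3
# client = boto3.client("batch")
# resp = client.submit_job(**batch_submit_kwargs(
#     "job_2", "my-queue", "my-jobdef",
#     ["bash", "-c", "job_2_task_1"], dep_job_ids=["<job_1-id>"]))
```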
