jor

jor (JOb Runner) keeps track of jobs. Jobs are specified in a simple format and can then be run collectively with a simple command, mindful of which outputs already exist. Jobs can optionally be run in a conda environment or in a Singularity container, and can be run either locally or submitted to a compute node with slurm.

jor is particularly useful

  • if a large number of different jobs needs to be run
  • for parameter sweeps
  • in an HPC environment: jor can submit jobs with slurm
  • for reproducible research results: jor can run jobs in a Singularity container

Usage

Using jor requires 3 preparatory steps:

  1. Write a simple Python wrapper around the actual computations, specifying the job's inputs and outputs
  2. Collect all the jobs in a todo-list
  3. Write up some runtime-arguments in a config file

These steps are described in detail below.

Once these preparations are done, all jobs can be run with

jor run

and the status of the jobs specified in the todo-list can be inspected with

jor status

Preparatory steps

1. The Job-wrapper

This is the most complex of the 3 steps. It requires implementing 4-5 functions that specify

  1. the parameters for each job
  2. the output folder for each job
  3. the name of the output file for each job
  4. the actual computation for each job
  5. if applicable, how to concatenate all outputs of this job into a single file

Note

All these steps would be necessary whether or not jor is used to perform the jobs' computations. The only difference is that, to use jor, they need to be specified in a particular way. Thus jor adds little code overhead.

Here is a template:

import pathlib
import jor


class Jobs(jor.JobsBase):

    # slurm options
    name = 'job'
    time = '0-23:59:59'
    mem = '5G'
    cpus_per_task = 1

    def __init__(self, n=3, path_prefix='.'):

        # init base class
        super().__init__(path_prefix=path_prefix)

        # store job-specific keyword arguments
        self.n = int(n)

        # assemble a list of jobs
        self._mk_jobs()

    def _mk_jobs(self):
        """Generates the attribute ``self._jobs``

        ``self._jobs`` is a list of dictionaries, containing one dictionary
        for each job to be run. Each of these dictionaries specifies the
        parameters of one individual job.
        """
        self._jobs = [
            dict(index=i)
            for i in range(self.n)
        ]

    def _get_output_folder(self):
        """Return output folder for jobs

        Output folder is the ``path_prefix``, followed by the name of a subfolder
        encoding the arguments given in the constructor.
        """
        output_folder = pathlib.Path(self.path_prefix) / f'example{self.n}'
        return str(output_folder)

    def _get_output_fname(self, index):
        """Return output file name for a given job

        The particular job is specified by the arguments to this function,
        which match the keys of the dictionaries in ``self._jobs`` (cf.
        :func:`self._mk_jobs`).
        """
        outfname = f'ind{index}.txt'
        return outfname

    def execute(self, i):
        """This function performs the actual work

        Parameters
        ----------
        i : int between 0 and number of jobs (``len(self._jobs)``)
            indicating the index in ``self._jobs`` from which to take
            the dictionary with this job's parameter values
        """
        myargs = self._jobs[i]
        output_path = self._get_output_path(**myargs)

        # do the work and write outcomes to file ``output_path``
        with open(output_path, 'wt') as f:
            f.write(str(myargs) + '\n')

    def collect(self):
        """Concatenates all outputs into a common output

        This function is optional and can be implemented if desired for
        the particular job. It is called by running ``jor collect`` on the
        command line.
        """
        pass

Note

  1. The wrapper needs to be a Python file containing a class Jobs, derived from jor.JobsBase
  2. The indicated slurm options are defaults inherited from jor.JobsBase, i.e. they only need to be specified if a different value is desired

To adapt this to a specific application:

  1. Adapt the constructor to take job-specific arguments
  2. Reimplement _mk_jobs
  3. Reimplement _get_output_folder
  4. Reimplement _get_output_fname
  5. Reimplement execute
  6. If applicable, reimplement collect
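
For a parameter sweep, the adaptation mostly amounts to generating one parameter dictionary per combination and encoding every parameter in the output file name. The following is a minimal sketch under stated assumptions: the parameters `alpha` and `seed` and the helper names `make_sweep_jobs` and `sweep_output_fname` are hypothetical, and the helpers are written as plain functions (rather than as the `_mk_jobs` and `_get_output_fname` methods) so that the snippet runs without jor installed.

```python
import itertools


def make_sweep_jobs(alphas, seeds):
    """Return one parameter dictionary per (alpha, seed) combination.

    In a wrapper class, this list would be assigned to ``self._jobs``
    inside ``_mk_jobs``.
    """
    return [
        dict(alpha=alpha, seed=seed)
        for alpha, seed in itertools.product(alphas, seeds)
    ]


def sweep_output_fname(alpha, seed):
    """Return a file name that encodes all job parameters.

    The argument names match the keys of the job dictionaries, so the
    function can be called as ``sweep_output_fname(**job)``.
    """
    return f'alpha{alpha}_seed{seed}.txt'


jobs = make_sweep_jobs(alphas=[0.1, 0.5], seeds=[0, 1, 2])
fnames = [sweep_output_fname(**job) for job in jobs]
```

Because each file name is derived from the full set of parameters, two different jobs cannot write to the same output file, which matters when existing outputs are used to decide which jobs still need to run.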

2. The todo-list

The file containing the todo-list is a YAML file named todo.yaml by default. It has the following format:

jobs:
- jobmodule: ./jobs_example.py
  jobargs: n=3
- jobmodule: ./jobs_example.py
  jobargs: n=4

There can be an arbitrary number of jobs specified in this file.
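
When many entries differ only in their arguments, the todo-list can also be generated rather than written by hand. The sketch below builds the text of a todo-list in the format shown above using plain string formatting; the helper name `make_todo` is hypothetical and not part of jor.

```python
def make_todo(jobmodule, n_values):
    """Return the text of a todo-list with one entry per value of ``n``."""
    lines = ['jobs:']
    for n in n_values:
        lines.append(f'- jobmodule: {jobmodule}')
        lines.append(f'  jobargs: n={n}')
    return '\n'.join(lines) + '\n'


todo_text = make_todo('./jobs_example.py', [3, 4])
print(todo_text)

# To store the result, one could write it to the default file name:
# pathlib.Path('todo.yaml').write_text(todo_text)
```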

Note

  1. The file needs to start with jobs:
  2. Each job is specified by 2 lines: one starting with - jobmodule: and the other with an indented jobargs:.
  3. The argument to jobmodule: is the name of (or path to) the Python file containing the wrapper code.
  4. The argument to jobargs: is a comma-separated list of keyword-arguments for the constructor of the Jobs class in the wrapper file. It needs to be valid Python and can be empty if no keyword arguments are necessary.
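
To illustrate what "valid Python" means for jobargs:, the sketch below turns such a string into a dictionary of keyword arguments using only the standard library. This is an illustration of the format, not jor's actual parsing code; the function name `parse_jobargs` is hypothetical, and the sketch assumes that all argument values are Python literals.

```python
import ast


def parse_jobargs(jobargs):
    """Turn a string like "n=3, label='a'" into a dict of keyword arguments.

    An empty or missing string yields an empty dict. Values must be
    Python literals (numbers, strings, lists, ...).
    """
    if not jobargs or not jobargs.strip():
        return {}
    # Parse the string as the argument list of a call, without executing it
    call = ast.parse(f'dict({jobargs})', mode='eval').body
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


kwargs = parse_jobargs('n=3')
```

The resulting dictionary is what would be passed to the constructor, as in `Jobs(**kwargs)`. A string that is not a valid argument list, such as `n 3`, raises a `SyntaxError` in this sketch.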

3. The config file

The config file needs to be named jor.cfg and needs to reside in the working directory from which jor is called. It has the following format:

[global]
path-prefix = output
overwrite-output = False

[run]
todo-list = todo.yaml

[submit]
scheduler = local
partition = day
sif =
condaenv =

[collect]
missing-output = ignore

The configuration options have the following meaning:

Configuration options

| Keyword | Allowed values | Meaning |
| --- | --- | --- |
| path-prefix | file-system paths | The job-wrapper should receive path_prefix as a keyword argument in the Jobs constructor and should prefix all internally generated output paths with the value of path-prefix. |
| overwrite-output | True or False | If False, jor checks which outputs already exist and only runs the jobs that produce the remaining outputs. |
| todo-list | a filename | Name of the file containing the todo-list. The default is todo.yaml, and there is probably no reason to change it. |
| scheduler | local or slurm | If local, jobs are run locally, in order. If slurm, one job-array per entry in the todo-list is submitted via slurm's sbatch command. |
| partition | a valid slurm partition (queue) name | Ignored if scheduler = local. |
| sif | either empty or path to a Singularity container | If not empty, all jobs are run in this container. |
| condaenv | either empty or name of a conda environment | If not empty, this conda environment is activated before running each job. |
| missing-output | ignore or raise | If the job-wrapper implements a collect method to concatenate outputs, this specifies how missing files are handled: if ignore, missing outputs are skipped; if raise, missing outputs cause the concatenation to abort. |
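
The config file uses the common INI layout, so its values can be inspected with Python's standard configparser module, for example when debugging a setup. The sketch below parses the example configuration from a string; it only shows how the file format reads and makes no claim about how jor itself loads the file.

```python
import configparser

# The example configuration from above, embedded as a string
CFG_TEXT = """
[global]
path-prefix = output
overwrite-output = False

[run]
todo-list = todo.yaml

[submit]
scheduler = local
partition = day
sif =
condaenv =

[collect]
missing-output = ignore
"""

cfg = configparser.ConfigParser()
cfg.read_string(CFG_TEXT)  # for a file on disk: cfg.read('jor.cfg')

overwrite = cfg.getboolean('global', 'overwrite-output')  # parsed as a bool
scheduler = cfg.get('submit', 'scheduler')
sif = cfg.get('submit', 'sif')  # empty options are read as empty strings
missing = cfg.get('collect', 'missing-output')
```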

Note

All configuration options in jor.cfg can be overridden on the command line when calling jor.

Example

An example is provided in the examples subfolder. The file jobs_example.py contains the code shown above. Likewise the todo.yaml and jor.cfg files from above can be found there. Calling

jor run

returns

[jor] Submitting job: ./jobs_example.py
[jor] Submitting job: ./jobs_example.py

and inspecting the output folder

ls -R output

shows that all output files are present:

example3 example4

output/example3:
ind0.txt ind1.txt ind2.txt

output/example4:
ind0.txt ind1.txt ind2.txt ind3.txt
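
The template leaves collect empty. As a rough idea of what a concatenation step with the two missing-output behaviours could look like, here is a standalone sketch. The helper `concatenate_outputs` and the target file name `all.txt` are hypothetical, and the code is not taken from jor; the demo recreates part of the output/example3 layout in a temporary folder, with one file deliberately absent.

```python
import pathlib
import tempfile


def concatenate_outputs(paths, target, missing_output='ignore'):
    """Concatenate the files in ``paths`` into ``target``.

    With ``missing_output='ignore'`` absent files are skipped, with
    ``'raise'`` the first absent file aborts the concatenation.
    Returns the number of files that were concatenated.
    """
    if missing_output not in ('ignore', 'raise'):
        raise ValueError(f'invalid missing_output: {missing_output!r}')
    chunks = []
    for path in map(pathlib.Path, paths):
        if not path.exists():
            if missing_output == 'raise':
                raise FileNotFoundError(f'missing output: {path}')
            continue
        chunks.append(path.read_text())
    pathlib.Path(target).write_text(''.join(chunks))
    return len(chunks)


with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp) / 'example3'
    folder.mkdir()
    for i in (0, 2):  # ind1.txt is deliberately missing
        (folder / f'ind{i}.txt').write_text(f"{{'index': {i}}}\n")
    paths = [folder / f'ind{i}.txt' for i in range(3)]
    n_found = concatenate_outputs(paths, folder / 'all.txt')
    combined = (folder / 'all.txt').read_text()
```

In a wrapper class, the list of paths would typically be built from `self._jobs` and the output-path helpers, and the mode would come from the missing-output option.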

Contributors

mdhelmer
