cromwell-tools's Introduction

Cromwell-tools

Overview

This repo contains a cromwell_tools Python package, accessory scripts and IPython notebooks.

The cromwell_tools Python package is designed to be a Python API and command-line tool for interacting with Cromwell, with the following features:
  • Python 3 compatible. (Starting from release v2.0.0, cromwell-tools no longer supports Python 2.7.)
  • Consistent authentication when working with Cromwell.
  • Consistent API and CLI interfaces.
  • Sufficient test cases.
  • Documentation on Read The Docs.
The accessory scripts and IPython notebooks are useful to:
  • Monitor the resource usages of workflows running in Cromwell.
  • Visualize the workflows benchmarking metrics.

Installation

  1. (Optional, but highly recommended) Create a Python 3 virtual environment locally and activate it, e.g. virtualenv -p python3 myenv && source myenv/bin/activate
  2. Install (or upgrade) Cromwell-tools from PyPI:
pip install -U cromwell-tools
  3. Verify the installation with:
cromwell-tools --version

Usage

Python API

In Python, you can import the package with:

import cromwell_tools.api as cwt
cwt.submit(*args)

assuming args is a list of the required arguments.

For more details, please check the tutorial on Read the Docs.

Commandline Interface

This package also installs a command line interface that mirrors the API and is used as follows:

$> cromwell-tools -h
usage: cromwell-tools [-h]
                  {submit,wait,status,abort,release_hold,query,health}
                  ...

positional arguments:
  {submit,wait,status,abort,release_hold,query,health}
                        sub-command help
    submit              submit help
    wait                wait help
    status              status help
    abort               abort help
    release_hold        release_hold help
    query               query help
    health              health help

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit

This CLI exposes sub-commands to submit, query, and abort workflows, release on-hold workflows, wait for workflow completion, and check the status of jobs.

For more details, please check the tutorial on Read the Docs.

Testing

To run tests:

Run Tests with Docker

Running the tests within a Docker image is the recommended way. To do this, you need to have the Docker daemon installed in your environment. From the root of the cromwell-tools repo:

cd cromwell_tools/tests && bash test.sh

Run Tests with local Python environment

  • If you have to run the tests in your local Python environment, we highly recommend creating and activating a virtualenv with the requirements installed before you run the tests:
virtualenv test-env
source test-env/bin/activate
pip install -r requirements.txt -r requirements-test.txt
  • Finally, from the root of the cromwell-tools repo, run the tests with:
python -m pytest --cov=cromwell_tools cromwell_tools/tests

Note

The Python version used to run the tests depends on the virtualenv. You can use virtualenv -p to choose which Python version the virtual environment is created with.

Development

Code Style

The cromwell-tools code base complies with PEP 8 and uses Black to format the code. This avoids "nitpicky" comments during code review, so we can spend more time discussing logic rather than code style.

To enable auto-formatting during development, spend a few seconds setting up pre-commit the first time you clone the repo:

  1. Install pre-commit by running: pip install pre-commit (or simply run pip install -r requirements.txt).
  2. Run pre-commit install to install the git hook.

Once you have installed the pre-commit hook in this repo, the Black linter/formatter will run automatically on each commit. Please make sure you followed the above steps; otherwise your commits might fail the linting test!

If you want to trigger the linters and formatters manually, make sure Black and flake8 are installed in your Python environment, then run flake8 DIR1 DIR2 and black DIR1 DIR2 --skip-string-normalization respectively.

Dependencies

When upgrading the dependencies of cromwell-tools, please make sure requirements.txt, requirements-test.txt and setup.py are consistent!

Documentation

To edit the documentation and rebuild it locally, make sure you have Sphinx installed. You might also want to install the dependencies for building the docs: pip install -r requirements-docs.txt. Finally, from within the root directory, run:

sphinx-build -b html docs/ docs/_build/

You can then preview the built documentation by opening docs/_build/index.html in your web browser.

Publish on PyPI

To publish a new version of Cromwell-tools on PyPI:

  1. Make sure you have an empty dist folder locally.
  2. Make sure you have twine installed: pip install twine.
  3. Build the package: python setup.py sdist bdist_wheel
  4. Upload and publish on PyPI: twine upload dist/* --verbose. Note that you will need the username and password of the development account to finish this step.

Contribute

Coming soon... For now, feel free to submit issues and open PRs; we will try our best to address them.

cromwell-tools's People

Contributors

ambrosejcarr, benjamincarlin, danxmoran, jeremyhofer, jsotobroad, kgalens, lbergelson, rexwangcc, samanehsan

cromwell-tools's Issues

Convert unittest classes to pytest for parameterization

  • Testing could be significantly simplified by parametrization across, e.g. secrets files vs username + password + url.
  • unittest.TestCase objects are not compatible with pytest parametrization.
  • This change would require replacing setUp and tearDown with their pytest cognates

[Testing] Add tests for multiple versions of Cromwell

At this moment, there are only unit test cases for this package. As a user, I'd be more confident using it against my Cromwell if it had been tested against multiple versions of Cromwell (e.g. v34-v41). It would be great to provide a list of "fully supported" versions of Cromwell on the main page.

RFC: Drop python 2.7 compatibility

  • Python 2.7 compatibility prevents the use of some helpful tools (e.g. shutil.which for path checking)
  • Python 2.7 is the past

Can we drop support for 2.7 in cromwell-tools?

[Discussion] Whether to support storing credentials locally

Right now, each time you run a cromwell-tools command you must pass in the auth params, such as --username, --password and --service-account-key. This is a burden on users: it is too much to type when you just want to abort a workflow but have to run:

cromwell-tools abort \
--url xxx \
--username xxx \
--password xxx \
UUID

Some other CLI tools, such as aws, gsutil or kubectl, support caching credentials under ~/.aws or ~/.config, so that you can use xxx auth select or xxx config use-credentials to switch between stored credentials/roles. This would make the tool somewhat less secure, but would greatly improve user-friendliness. This issue serves as a place to discuss whether we should add that feature to cromwell-tools, which would cover commands like:

  • cromwell-tools auth add
  • cromwell-tools auth select
  • cromwell-tools auth remove
  • cromwell-tools auth list

Starter being able to start non-subworkflow'd wdls

It came up that the starter currently expects the WDL you are trying to start to have imports/subworkflows rather than being a single WDL. @dshiga said it doesn't work for a single WDL file. We should update this so it works for more people.

[CI/CD] Migrate CI/CD to GitHub Actions

GitHub Actions runs much faster than Travis CI nowadays and provides more granular GitHub event monitoring than the latter. It seems like a good idea to migrate before setting up more tests for cromwell-tools.

Error when running with no input arguments

When I try to run cromwell-tools without specifying a command I get the following error:

๐Ÿ‹  ~ 1.8 cromwell-tools
Traceback (most recent call last):
  File "/usr/local/bin/cromwell-tools", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/cromwell_tools/cli.py", line 258, in main
    command, args = parser(arguments)
  File "/usr/local/lib/python3.7/site-packages/cromwell_tools/cli.py", line 250, in parser
    command = getattr(CromwellAPI, args['command'])
TypeError: getattr(): attribute name must be string

I would expect you'd get usage info instead of a crash.
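
The traceback suggests that args['command'] is None when no sub-command is given, which getattr() rejects. A minimal sketch of one way to fail with usage info instead, using only argparse; the parser below is a hypothetical, trimmed-down stand-in and not the actual cromwell_tools.cli code:

```python
import argparse


def build_parser():
    # Hypothetical, trimmed-down parser; the real CLI defines more
    # sub-commands and options than shown here.
    parser = argparse.ArgumentParser(prog="cromwell-tools")
    subparsers = parser.add_subparsers(dest="command", help="sub-command help")
    subparsers.add_parser("status", help="status help")
    subparsers.add_parser("health", help="health help")
    return parser


def parse(argv):
    parser = build_parser()
    args = vars(parser.parse_args(argv))
    if args["command"] is None:
        # No sub-command was given: print usage and exit with status 2
        # rather than passing None on to getattr().
        parser.error("a sub-command is required")
    return args["command"], args
```

With this guard, parse([]) prints the usage line and exits, while parse(["status"]) returns the command name as before.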

Automate the publish process

As a developer, I want to automate the process of publishing this package on PyPI. Ideally, I could publish the package just by cutting a new release on GitHub.

Add utilities to inject & collect monitoring log info

Cromwell-on-GCP exposes a runtime option to inject a monitoring script into every task in a workflow, but (as far as I can tell) doesn't expose a way to collect the generated log paths via its API. For users who want to make use of monitoring, I think adding a new collect-monitoring-logs command to cromwell-tools would be useful. The command would:

  1. Take a workflow ID,
  2. Scrape the corresponding Cromwell execution directory for files named monitoring.log,
  3. Output a dict of task ID -> raw monitoring log output.

Users could then bolt their own interpreters onto this command to parse their specific log outputs.
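
As a rough illustration of steps 2 and 3, here is a sketch that walks a local directory tree. It assumes the execution directory is available on the local filesystem (on GCP it would live in a bucket and need a storage client instead), and it uses each log's parent directory as a stand-in for the task ID; the function name is hypothetical:

```python
from pathlib import Path


def collect_monitoring_logs(execution_dir):
    """Map each directory holding a monitoring.log to that log's raw text."""
    root = Path(execution_dir)
    logs = {}
    for log_path in sorted(root.rglob("monitoring.log")):
        # Key: the log's parent directory relative to the execution root.
        # This is an assumption standing in for a real task ID.
        task_key = log_path.parent.relative_to(root).as_posix()
        logs[task_key] = log_path.read_text()
    return logs
```

Mapping a workflow ID to its execution directory is left out, since that depends on how the Cromwell backend is configured.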

@benjamincarlin has been working on code to do this, and we thought it'd be good to contribute back here instead of setting up yet-another-repo. Would a PR be welcome?

Make authentication optional?

Hi, would it be possible to make authentication optional? We run Cromwell in a walled environment and would prefer not to deal with authentication. Currently, if no form of authentication is passed to start_workflow, this exception is raised from harmonize_credentials:

 File "envs/pipeflow/src/cromwell-tools/cromwell_tools/cromwell_tools.py", line 184, in start_workflow
    auth, headers = _get_auth_credentials(cromwell_user=user, cromwell_password=password, caas_key=caas_key)
  File "envs/pipeflow/src/cromwell-tools/cromwell_tools/cromwell_tools.py", line 59, in _get_auth_credentials
    cromwell_user, cromwell_password = harmonize_credentials(secrets_file, cromwell_user, cromwell_password)
  File "envs/pipeflow/src/cromwell-tools/cromwell_tools/cromwell_tools.py", line 43, in harmonize_credentials
    raise ValueError('One form of cromwell authentication must be provided, please pass '
ValueError: One form of cromwell authentication must be provided, please pass either cromwell_user and cromwell_password or a secrets_file.

cromwell-tools does not support python3.5

Problem:
I'm trying to use cromwell-tools in a project that is built to support python3.5 (currently the minimum python3 version available on the hosts I'm targeting). When trying to execute there are syntax errors within the cromwell-tools distribution due to the use of f strings, which are a python3.6 feature.

Looking at closed issue tickets, as well as the PR for refactoring to remove support for python2, it looks like python3.5 was mentioned as a minimum supported version. Links to these items here:
#26
#43

Proposed Solutions:

  1. Update code to support python3.5 (and possibly other earlier versions) by removing the use of f strings.
  2. Update documentation to specify that python3.6 is the minimum supported version as opposed to simply python3. Additionally, include python_requires='>=3.6' in the setup call in setup.py.

[Bug] jes_gcs_root is always default

If using a service account key credentials file, the CromwellAPI.submit function ignores the existing jes_gcs_root attribute in the user-supplied options file. When compose_oauth_options_for_jes_backend_cromwell is called here, no execution_bucket is passed in to override the default of None.

def compose_oauth_options_for_jes_backend_cromwell(
    auth: CromwellAuth,
    cromwell_options_file: io.BytesIO = None,
    execution_bucket: str = None,
) -> io.BytesIO:

jes_gcs_root then automatically defaults to the gs://%s-cromwell-execution/caas-cromwell-executions bucket each time (as long as you're using a service account key):

options_json.update(
    {
        'jes_gcs_root': execution_bucket
        or 'gs://%s-cromwell-execution/caas-cromwell-executions' % google_project,
        'google_project': google_project,
        'user_service_account_json': json.dumps(auth.service_key_content),
        'google_compute_service_account': auth.service_key_content['client_email'],
    }
)
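
One possible fix is to consult the user's options before falling back to the default. The sketch below isolates only that decision; the helper name and the precedence order (explicit execution_bucket, then the user's jes_gcs_root, then the default) are assumptions, not the library's actual behavior:

```python
def choose_jes_gcs_root(user_options, google_project, execution_bucket=None):
    """Pick jes_gcs_root without discarding a value the user already set."""
    default_root = (
        'gs://%s-cromwell-execution/caas-cromwell-executions' % google_project
    )
    # Assumed precedence: explicit argument, then the value from the
    # user-supplied options file, then the project default.
    return execution_bucket or user_options.get('jes_gcs_root') or default_root
```

The result could then be used as the 'jes_gcs_root' value in the update call shown above.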

[Feature Request] A diagnostic/analysis toolkit for workflows development

It would be great if cromwell-tools could provide a diagnostic/analysis toolkit for workflow development. Some use cases are:

  • I want to know the 5 statistics (avg, max, median...) and the average workflow runtime, given a workflow identifier or during a time window, so that I can have a better understanding of my workflow performance.
  • I want to know the 5 statistics (avg, max, median...) and the average of individual tasks and sub-workflows of a workflow, so that I can easily find the bottleneck of my workflow.
  • I want to get some serialized/well-parsed formats of the workflow failures in order to debug my workflows.
  • I want to have some sort of methods to predict the runtime, resource requirements of my workflow(s), such as Linear Regression, SVM, NNs, etc. based on my historical workflow data, in order to optimize and productionize my workflows.

...

HTTPError Raised without including requests.Response object

When handling requests.exceptions.HTTPError raised by cromwell_tools.api, it is useful to check the status code of the HTTPError. While this is included in the exception as part of a string, access would be easier if the response object was included on the HTTPError itself as opposed to trying to parse the status code from the string.

The HTTPError inherits from RequestException which accepts a 'response' keyword argument. See source: https://github.com/psf/requests/blob/master/requests/exceptions.py

Incoming pull request with suggested change.

Add a "validate" command?

On Green (and soon on Monster), CI for WDL development follows the pattern:

  1. Run a WDL against some collection of input JSONs
  2. Wait for all the runs to complete
  3. Feed the outputs of each run into a "validation WDL", along with a corresponding set of "truth" inputs
  4. Wait for all the validation runs to complete

The test runner has typically had special knowledge about each pipeline, but it feels like it should be possible to generate the core functionality.

I'm not sure if cromwell-tools would be the best place for that functionality to live (maybe there could be a new cromwell-test-tools package?), but here's an idea for what the command might look like:

$> cromwell-tools validate -h
usage: cromwell-tools validate [-h] [-c CROMWELL_URL] [-u USERNAME] [-p PASSWORD]
                               [-s SECRETS_FILE]
                               --wdl-file WDL_FILE
                               --validation-wdl-file WDL_FILE
                               [--dependencies-json DEPENDENCIES_JSON]
                               --inputs-batch INPUTS_JSON1:TRUTH_OUTPUTS1
                               [--inputs-batch INPUTS_JSON2:TRUTH_OUTPUTS2 ...]
                               --inputs-json INPUTS_JSON
                               [--inputs2-json INPUTS2_JSON]
                               [--options-file OPTIONS_FILE]

The command would do something like:

  1. Parse the inputs-jsons out of each inputs-batch arg
  2. Submit each input along with wdl-file, dependencies-json, inputs2-json, and options-file using the existing submit logic
  3. Wait for each workflow to finish using the existing wait logic
  4. Query the outputs of each workflow, getting something like:
     {
       "MyMainWorkflow.out1": "gs://....",
       "MyMainWorkflow.out2": 10
     }
  5. Strip the main workflow's name from the outputs, and combine with the truth outputs to build an input JSON for the validation WDL:
    {
      "ValidateMyMainWorkflow.test": {
        "out1": "gs://....",
        "out2": 10
      },
      "ValidateMyMainWorkflow.truth": {
        "out1": "gs://....",
        "out2": 20
      }
    }
  6. Run the validation WDL on each of these constructed inputs using the existing submit logic
  7. Wait for each validation run to finish using the existing wait logic
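
Step 5 is plain dictionary manipulation and can be sketched as below. The function name is hypothetical, and the values in the usage note are made up for illustration:

```python
def build_validation_inputs(workflow_outputs, truth_outputs, validation_workflow):
    """Build the validation WDL's inputs from one run's outputs and its truth set."""
    # Drop the "<WorkflowName>." prefix from each output key.
    test_outputs = {
        key.split(".", 1)[-1]: value for key, value in workflow_outputs.items()
    }
    return {
        validation_workflow + ".test": test_outputs,
        validation_workflow + ".truth": truth_outputs,
    }
```

For example, calling it with {"MyMainWorkflow.out2": 10}, the truth outputs {"out2": 20} and the name "ValidateMyMainWorkflow" yields a "ValidateMyMainWorkflow.test" entry of {"out2": 10} next to a "ValidateMyMainWorkflow.truth" entry of {"out2": 20}.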

Does this sound like logic that'd make sense to live in cromwell-tools? I'd be happy to put a PR together for it if so.

Exception when trying to use "submit" command post-refactor

The stack trace is:

Traceback (most recent call last):
 File "/usr/local/bin/cromwell-tools", line 11, in <module>
   sys.exit(main())
 File "/usr/local/lib/python2.7/site-packages/cromwell_tools/cli.py", line 104, in main
   command, args = parser(arguments)
 File "/usr/local/lib/python2.7/site-packages/cromwell_tools/cli.py", line 95, in parser
   auth = CromwellAuth.harmonize_credentials(**args)
TypeError: harmonize_credentials() got an unexpected keyword argument 'inputs_file'

As far as I can tell from the code, I think this will be a problem for all commands that ask for auth, because all the args are getting fed into harmonize_credentials.
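
One way to avoid this class of error is to pass along only the arguments the target function declares. A minimal sketch using inspect.signature; the harmonize_credentials below is a hypothetical stand-in with an assumed signature, not the real CromwellAuth method:

```python
import inspect


def select_kwargs(func, all_args):
    """Keep only the entries of all_args that func declares as parameters."""
    accepted = inspect.signature(func).parameters
    return {name: value for name, value in all_args.items() if name in accepted}


# Hypothetical stand-in; the real method's parameters may differ.
def harmonize_credentials(username=None, password=None, secrets_file=None, url=None):
    return {
        "username": username,
        "password": password,
        "secrets_file": secrets_file,
        "url": url,
    }


cli_args = {
    "url": "http://localhost:8000",
    "username": "user",
    "password": "pw",
    "inputs_file": "inputs.json",  # not an auth argument
}
auth = harmonize_credentials(**select_kwargs(harmonize_credentials, cli_args))
```

Here inputs_file is dropped before the call, so the function never sees a keyword it does not accept.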

There needs to be "retry" around `wait`

Sometimes, right after you submit a workflow, the /status endpoint, which is consumed by the wait command, returns 404 instead of the status of the actual workflow, and this breaks the wait. We need to add retry logic around the wait command!
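
A minimal sketch of such retry logic, assuming a hypothetical fetch_status callable that returns a (status_code, body) pair; the retry count and delay are illustrative defaults, not values taken from cromwell-tools:

```python
import time


def get_status_with_retry(fetch_status, retries=3, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch_status() again while it answers 404, up to `retries` extra times."""
    for attempt in range(retries + 1):
        status_code, body = fetch_status()
        if status_code != 404:
            return status_code, body
        if attempt < retries:
            # The workflow may not be registered yet; wait before asking again.
            sleep(delay_seconds)
    # Still 404 after all attempts: hand the last response back to the caller.
    return status_code, body
```

The sleep parameter is injectable so the delay can be stubbed out in tests.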

[Feature Request] Add support for gs path

At the moment, the submit command supports local paths and HTTP(S) paths for WDL and JSON files. It would be much more helpful if it could also support reading from gs:// paths!

Add contribute sections to the readme

Adding a contributors section like this would encourage more (external and internal) people to contribute to this tool.

There should also be a "How-to contribute" section along with this.

Duplicate logging when using CromwellAuth.from_no_authentication

Using cromwell-tools 1.1.2 (Python 3.7.0), I received duplicate log messages after importing the CromwellAuth class and calling its from_no_authentication method. This seems to happen because warning was called on the logging module (instead of on a Logger instance). See below for how to replicate the issue:

import logging

logger = logging.getLogger("my_module")
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter(
        "[%(asctime)s %(name)s] [%(levelname)s] " +
        "[%(filename)s:%(lineno)s - %(funcName)s()] %(message)s"
)

stream = logging.StreamHandler()
stream.setFormatter(formatter)
stream.setLevel(logging.DEBUG)
logger.addHandler(stream)

logger.debug("test1")

from cromwell_tools.cromwell_auth import CromwellAuth
auth = CromwellAuth.from_no_authentication("http://localhost")

logger.debug("test2")

Output:
[2019-04-09 11:27:48,260 my_module] [DEBUG] [test-logger.py:16 - <module>()] test1
WARNING:root:You are not using any authentication with Cromwell. For security purposes, please consider adding authentication in front of your Cromwell instance!
[2019-04-09 11:27:48,547 my_module] [DEBUG] [test-logger.py:21 - <module>()] test2
DEBUG:my_module:test2
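
For context, the module-level logging.warning() calls logging.basicConfig() when the root logger has no handlers, which attaches a stream handler to the root logger; records from other loggers then propagate to it and are printed a second time. A minimal sketch of the usual remedy, logging through a named logger instead; the logger name and function are illustrative, not the library's code:

```python
import logging

# Module-level logger, as a library would typically define it.
logger = logging.getLogger("cromwell_tools_sketch")


def warn_no_authentication():
    # A named logger does not trigger logging.basicConfig(), so no handler
    # is silently attached to the root logger.
    logger.warning("You are not using any authentication with Cromwell.")


handlers_before = len(logging.getLogger().handlers)
warn_no_authentication()
handlers_after = len(logging.getLogger().handlers)
```

The number of root handlers is the same before and after the call, so the application's own logging configuration is left untouched.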

TypeError: harmonize_credentials() got an unexpected keyword argument 'caas_key'

I get the error TypeError: harmonize_credentials() got an unexpected keyword argument 'caas_key' when I attempt to run cromwell-tools on Python 3.6:

$ cromwell-tools status --uuid 305cafa1-111c-4ad1-bbae-a4a63c33a018 --url localhost:8000
Traceback (most recent call last):
  File "/usr/local/bin/cromwell-tools", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/cromwell_tools/cli.py", line 108, in main
    command, args = parser(arguments)
  File "/usr/local/lib/python3.6/dist-packages/cromwell_tools/cli.py", line 96, in parser
    auth = CromwellAuth.harmonize_credentials(**auth_arg_dict)
TypeError: harmonize_credentials() got an unexpected keyword argument 'caas_key'
