
cloudos-cli's Issues

As a user, I want the jobs command to accept parameters value with `=` in the string

There are use cases where `=` needs to be present in the parameter value. For example: --parameter expression='GT="AA"'

cloudos job run \
    --cloudos-url $CLOUDOS \
    --apikey $MY_API_KEY \
    --workspace-id $WORKSPACE_ID \
    --project-name "$PROJECT_NAME" \
    --workflow-name $WORKFLOW_NAME \
    --parameter reads=s3://lifebit-featured-datasets/pipelines/rnatoy-data \
    --parameter genome=s3://lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.Ggal71.500bpflank.fa \
    --parameter annot=s3://lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.bed.gff \
    --parameter expression='GT="AA"' \
    --resumable \
    --spot

Currently, the CLI is not able to parse the last parameter string and gives an error:

raise ValueError('Please, specify -p / --parameter using a single \'=\' ' +
ValueError: Please, specify -p / --parameter using a single '=' as spacer. E.g: input=value
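A minimal sketch of the requested behaviour: split each --parameter value on the first `=` only, so values containing `=` survive intact (`parse_parameter` is a hypothetical helper, not the package's actual function):

```python
# Hypothetical sketch: split 'name=value' on the FIRST '=' only, so the
# value part may itself contain '=' characters.
def parse_parameter(param_string):
    """Return (name, value) from a 'name=value' string; value may contain '='."""
    name, sep, value = param_string.partition('=')
    if not sep or not name:
        raise ValueError("Please, specify -p / --parameter using at least "
                         "one '=' as spacer. E.g: input=value")
    return name, value
```

With this, `expression='GT="AA"'` parses to the name `expression` and the full value `GT="AA"`.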

As a user, I would like to be able to find existing workflows, based on a provided GitHub repo URL

Currently, we support listing the workflows present in CloudOS via cloudos workflow list

But it would be great if a user could find/filter the existing pipelines based on a GitHub URL, something like:

cloudos workflow find --github-url https://github.com/lifebit-ai/germline-parabricks

I think this can be done based on the raw JSON output from the API call, which contains these GitHub URLs.

Benefits -

  • This will save users from creating duplicate workflows with different names
  • It allows automating the capture of the CloudOS workflow name from a CI test
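A minimal sketch of the proposed find behaviour, assuming the raw workflow-list JSON exposes each repository URL (the `repository.url` field name is an assumption about the API response, and `find_workflows_by_url` is a hypothetical helper):

```python
# Hypothetical sketch: filter the raw workflow-list JSON by repository URL.
# The 'repository'/'url' field names are assumptions about the API response.
def find_workflows_by_url(workflows, github_url):
    """Return the names of workflows whose repository URL matches github_url."""
    target = github_url.rstrip('/')
    return [wf['name'] for wf in workflows
            if wf.get('repository', {}).get('url', '').rstrip('/') == target]

workflows = [
    {'name': 'germline-parabricks',
     'repository': {'url': 'https://github.com/lifebit-ai/germline-parabricks'}},
    {'name': 'rnatoy',
     'repository': {'url': 'https://github.com/nextflow-io/rnatoy'}},
]
print(find_workflows_by_url(
    workflows, 'https://github.com/lifebit-ai/germline-parabricks'))
# ['germline-parabricks']
```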

Provide machine readable API response json

When launching jobs or performing other operations, the cloudos-cli provides human-readable messaging, which is great for the end-user experience.

In cases where the user would like to parse a machine-readable file, e.g. JSON, the cloudos-cli currently doesn't provide one.

Currently we provide a message after successful job submission similar to:

CloudOS python package: a package for interacting with CloudOS.

Version: 0.1.2

CloudOS job functionality: run and check jobs in CloudOS.

Executing run...
        Job successfully launched to CloudOS, please check the following link: https://cloudos.lifebit.ai/app/jobs/xxxxxxxxxxx
        Your assigned job id is: xxxxxxxxxxx
        Your current job status is: initializing
        To further check your job status you can either go to https://cloudos.lifebit.ai/app/jobs/62ddaa41f11b7301476dd768 or use the following command:
        cloudos job status \
                --apikey $MY_API_KEY \
                --cloudos-url https://cloudos.lifebit.ai \
                --job-id xxxxxxxxxxx

Add a response JSON file as well, possibly only when the user specifies --output-format json.
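A minimal sketch of what such an output mode could write (field names are illustrative, not the actual API schema; `write_job_response` is a hypothetical helper):

```python
# Hypothetical sketch of an --output-format json mode: alongside the human-
# readable message, also dump the submission response to a machine-readable
# file. The field names below are illustrative, not the actual API schema.
import json

def write_job_response(job_id, status, cloudos_url, outfile='job_response.json'):
    payload = {
        'job_id': job_id,
        'status': status,
        'job_url': f'{cloudos_url}/app/jobs/{job_id}',
    }
    with open(outfile, 'w') as fh:
        json.dump(payload, fh, indent=2)
    return payload
```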

As a user, I would like to have an option to print the raw API call

Request summary

As a user, I would like the option to see the curl command that was formed, so that I can run it outside of the Python client if I need to.

Suggestion

Implement an option named --print-api-call to also print this to the user.

This could be combined with a --dry-run parameter, if the user wants to only see the command and not send the API call.
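A minimal sketch of rendering a request as a curl command (`as_curl` is a hypothetical helper; the real implementation would build this from the prepared HTTP request before optionally sending it):

```python
# Hypothetical sketch of --print-api-call: render an HTTP request as an
# equivalent curl command string before (optionally) sending it.
import json

def as_curl(method, url, headers=None, json_body=None):
    """Build a copy-pasteable curl command for the given request."""
    parts = [f'curl -X {method}']
    for key, value in (headers or {}).items():
        parts.append(f"-H '{key}: {value}'")
    if json_body is not None:
        parts.append("-H 'Content-Type: application/json'")
        parts.append(f"-d '{json.dumps(json_body)}'")
    parts.append(f"'{url}'")
    return ' \\\n    '.join(parts)

print(as_curl('POST', 'https://cloudos.lifebit.ai/api/v1/jobs',
              headers={'apikey': 'XXX'}, json_body={'workflow': 'abc'}))
```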

Improve handling of SSL certificate verification for environments with specific network constraints

Overview of the issues

  1. The package doesn't work out of the box in all versions for all of our end users' environments.
  2. The edge version of the package, created for one of our end users' organisations, logs a lot of warnings that confuse the end user, since warnings from a third-party library are emitted together with our library's messages.

Steps to reproduce

This is observed in the edge version quay.io/lifebitaiorg/cloudos-py:v0.0.8bi when executed in the respective environment.

Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings

When TLS/SSL verification is set to False, the user sees a lot of logging recommending that verification be enforced, and it is difficult to tell whether the command executed successfully or there was an error.

Command used

cloudos job run     \
   --cloudos-url $CLOUDOS  \
   --apikey $MY_API_KEY     \
   --workspace-id $WORKSPACE_ID     \
   --project-name "$PROJECT_NAME"    \
   --job-name $JOB_NAME     \
   --workflow-name $WORKFLOW_NAME     \
   --nextflow-profile $NEXTFLOW_PROFILE   \
   --resumable   \
   --spot     \
   --batch

Command output

CloudOS python package: a package for interacting with CloudOS.

CloudOS job functionality: run and check jobs in CloudOS.

Executing run...
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is
 being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://ur
llib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is
 being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://ur
llib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is
 being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://ur
llib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
        Job successfully launched to CloudOS, please check the following link: https://prod.cloudos.aws.com/app/
jobs/630781f26d388a0149ecffb2
        Your assigned job id is: 630781f26d388a0149ecffb2
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is
 being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
        Your current job status is: initializing
        To further check your job status you can either go to https://prod.cloudos.aws.com/app/jobs/630781f26d388a0149ecffb2 or use the following command:
cloudos job status \
    --apikey $MY_API_KEY \
    --cloudos-url https://prod.cloudos.aws.com \
    --job-id 630781f26d388a0149ecffb2

There are warnings that the user wouldn't need to view in a non-verbose mode:

/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is
 being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://ur
llib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is
 being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://ur
llib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is
 being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://ur
llib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(

Suggestions for handling this case

a. Add a global flag to switch off SSL verification, e.g. cloudos job run --verify false

We should consider adding SSL verification as a package-level metaflag, and also introduce a verbosity-level flag so the user can choose whether to see all the logs from dependencies or keep the output to cloudos-cli messages only, e.g.

cloudos job run --verify-certificates false --verbosity warn
cloudos job run --verify-certificates false --verbosity error

This could be achieved by using a context manager that monkey-patches requests so that verify=False becomes the default and the warning is suppressed (not recommended, but possible).

Source: https://stackoverflow.com/questions/15445981/how-do-i-disable-the-security-certificate-check-in-python-requests

b. Add global switch off flag for warnings from third party libraries

Resources:
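A minimal sketch combining suggestions (a) and (b), using only the stdlib warnings module to silence the urllib3 warning by message (`request_kwargs` is a hypothetical helper; urllib3 also ships urllib3.disable_warnings() for the same purpose):

```python
# Hypothetical sketch: map a --verify-certificates style flag onto the
# `verify` argument accepted by requests, and silence the matching
# InsecureRequestWarning (by message text, stdlib-only) when it is off.
# Not recommended outside constrained test environments.
import warnings

def request_kwargs(verify_certificates=True):
    """Return keyword arguments to pass through to requests calls."""
    if not verify_certificates:
        warnings.filterwarnings('ignore', message='Unverified HTTPS request')
    return {'verify': verify_certificates}
```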

c. Add try/except handling to retry with no verification upon failure when verify=True (❌ vulnerable to MitM attacks)

A try/except approach would ensure the package works out of the box in all our end users' environments.


Note: this is bad practice, since it leaves the library vulnerable to man-in-the-middle (MitM) attacks. Not recommended.

  #: SSL Verification default.
  #: Defaults to `True`, requiring requests to verify the TLS certificate at the
  #: remote end.
  #: If verify is set to `False`, requests will accept any TLS certificate
  #: presented by the server, and will ignore hostname mismatches and/or
  #: expired certificates, which will make your application vulnerable to
  #: man-in-the-middle (MitM) attacks.
  #: Only set this to `False` for testing.
  self.verify = True

d. Allow the user to provide a path to the SSL certificates (better practice ✅)

Source: https://www.geeksforgeeks.org/ssl-certificate-verification-python-requests/

Allow the user to provide the location of the SSL certificates (a very common use case among our end users in elevated-security environments).
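requests accepts a filesystem path in its `verify` argument, so a CLI flag could map onto it directly; a sketch with a hypothetical `resolve_verify` helper:

```python
# Hypothetical sketch of suggestion (d): map CLI options onto the `verify`
# argument that requests (and urllib3) accept, which may be True/False or a
# path to a CA bundle.
import os

def resolve_verify(ssl_cert=None, disable_verification=False):
    """Return the value to pass as requests' `verify` argument."""
    if disable_verification:
        return False
    if ssl_cert is not None:
        if not os.path.isfile(ssl_cert):
            raise FileNotFoundError(f'CA bundle not found: {ssl_cert}')
        return ssl_cert
    return True
```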

Screenshot 2022-08-26 at 15 01 26

e. Set these configurations persistently to reduce duplication


Cohort class should use CohortBrowser methods instead of redefining them

Methods from CohortBrowser such as CohortBrowser.get_phenotype_metadata() and CohortBrowser.list_cohorts() will be useful within the Cohort class. To import these without causing circular imports, one of two solutions should be used:

Option 1

  • remove the import of the Cohort submodule at the top of cohort_browser.py and instead import it inside the two relevant methods of the CohortBrowser class:
    def load_cohort(self, cohort_id=None, cohort_name=None):
        from cloudos.cohorts import Cohort
        return Cohort.load(self.apikey, self.cloudos_url, self.workspace_id,
                           cohort_id=cohort_id, cohort_name=cohort_name)
  • Add import of CohortBrowser submodule at the top of cohort.py. Within methods of the Cohort class, a CohortBrowser object can be constructed when needed to access its methods.

Option 2

  • import the CohortBrowser submodule inside the __init__ function of the Cohort class and create a private variable to hold a CohortBrowser object:
    def __init__(...):
        ...
        from cloudos.cohorts import CohortBrowser
        self.__cb = CohortBrowser(apikey=self.apikey,
                                  cloudos_url=self.cloudos_url,
                                  workspace_id=self.workspace_id)
        ...
  • Within methods of the Cohort class, the private variable can be used to access the CohortBrowser methods

Convert all flag only parameters to flag-value pairs (--resumable, --spot)

Consistency will make usage in other workflow management systems (Nextflow, WDL) or wrappers (GitHub Actions workflow_dispatch syntax) easier.

The current implementation requires extra handling in Nextflow, for example:

spot = params.spot ? "--spot" : ""


process this {
   input:
   
   output:
   
   script:
   """
   cloudos job run .. ${params.spot}
   """
}
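A minimal framework-agnostic sketch of accepting explicit true/false values (click, which the CLI already uses, offers click.BOOL for the same purpose; `parse_bool_option` is a hypothetical helper):

```python
# Hypothetical sketch: accept an explicit true/false value instead of a bare
# flag, so wrappers (Nextflow, WDL, GitHub Actions) can always emit a
# flag-value pair such as `--spot false`.
def parse_bool_option(value):
    """Map common true/false spellings to a Python bool."""
    lowered = value.strip().lower()
    if lowered in ('true', 'yes', '1'):
        return True
    if lowered in ('false', 'no', '0'):
        return False
    raise ValueError(f'Expected true/false, got: {value!r}')
```

With this, the Nextflow snippet above reduces to `cloudos job run .. --spot ${params.spot}` with no ternary needed.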

Empty lines are not being ignored when parsing json with --job-params

Bug Description

When using the cloudOS package I ran into the following error:

ValueError: Please, specify your parameters in test.config using the '=' char as spacer. E.g: name = my_name

The test.config contained an empty line, which caused the package to break.

params {
    csv = "s3://lifebit-featured-datasets/pipelines/pcgr/testdata/test_1/testdata.csv"
    metadata = "$baseDir/testdata/metadata.csv"
    genome = 'grch38'

    max_cpus = 2
    max_memory = 4.GB
}

After removing this line, the problem was solved.

Expected behaviour

I would expect empty lines in the config file to be ignored rather than breaking the program.
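A minimal sketch of the fix, skipping blank (and brace-only) lines before insisting on the name = value shape (`parse_params_block` is a hypothetical helper, not the package's actual parser):

```python
# Hypothetical sketch: skip empty lines and the params { ... } braces
# before requiring each remaining line to be 'name = value'.
def parse_params_block(text):
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ('params {', '}'):
            continue  # ignore blank lines and block delimiters
        if '=' not in line:
            raise ValueError("Please, specify your parameters using the "
                             "'=' char as spacer. E.g: name = my_name")
        name, value = line.split('=', 1)
        params[name.strip()] = value.strip()
    return params
```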

Version tested

This was tested on a local installation of the package.

Use explicit if clause to check workflow type

This is implicit (aka a bit anti-pythonic 😄): it says "when workflow is not WDL", but unless we read the if clause, we don't know what that means.

Additionally, in the future we will support a third type of workflow, docker, for Docker Batch jobs in CloudOS.
eg https://cloudos.lifebit.ai/public/jobs/61793a0888e0c901db2e3603.

In that case you would need to be explicit and reimplement it as:

                if workflow_type == 'nextflow':

This would be extended for all the job types we will support in the future.

Yoda says:
https://speakerdeck.com/jennybc/code-smells-and-feels?slide=45

Originally posted by @cgpu in #73 (comment)

Rename --jobs-params to --pipeline-config

Currently the implementation only allows the use of a config file, not params.

This is a great first implementation, as the config format is interoperable with Nextflow and the files can be used as Nextflow configs as well. However, the flag name is a bit deceiving for the user, as they are not allowed to pass --job-params directly.

I recommend renaming it for the time being, to be clearer and more explicit for the user.

Collect all available jobs

The response to a job list request made using the cloudos job list command is paginated, meaning that only a certain number (30) of the most recent jobs is actually returned.

Create an option to retrieve all the stored jobs.
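A minimal sketch of such an option, assuming a paginated endpoint; `fetch_page` stands in for the real API call, and the page/limit semantics are assumptions rather than the documented CloudOS contract:

```python
# Hypothetical sketch of a "fetch everything" loop over a paginated
# endpoint. fetch_page(page, limit) stands in for the real API call.
def list_all_jobs(fetch_page, page_size=30):
    """Collect jobs across pages until a short (or empty) page is returned."""
    jobs, page = [], 1
    while True:
        batch = fetch_page(page, page_size)
        jobs.extend(batch)
        if len(batch) < page_size:
            return jobs
        page += 1
```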

As a user, I would like to have an option to create new projects in CloudOS

The current behaviour is that you have to provide an existing and valid project name via the --project-name parameter in order to run a job. That project has to already be present in CloudOS.

As a user, I would like the package to first check whether the project already exists (linked to #30) and then create a new project if it does not.

As a user, I would like to get intuitive error messages if one of the resources I am using does not exist (workflow, project)

When the project or workflow doesn't exist, the user doesn't get any intuitive feedback.

Some ideas below; we can discuss what is better for the users:

Add if clauses early to check against workflow list or project list (note: we need to implement the latter as well), and if the resources don't exist:

  1. fail early and give the user an intuitive message that nudges them to first create them and then submit the job
  2. create the project silently to abstract this from the user (the workflow could be defined as the repo link)

Multiple CloudOS environments support via local configuration

Support for multiple CloudOS environments via local configuration would be helpful when working with multiple CloudOS instances.

Example: a config file on the local system at ~/.cloudos/envs

[prod]
cloudos_base_url="https://cloudos.lifebit.ai/"
cloudos_workspace=xxxx
cloudos_token=xxxx
[stg]
cloudos_base_url="https://staging.lifebit.ai/"
cloudos_workspace=xxxx
cloudos_token=xxxx

Usability in the pipeline -

cloudos job list --cloudos-profile prod

This also needs to align with - https://github.com/lifebit-ai/cloudos#configure-cloudos
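The proposed envs file is plain INI, so a sketch using Python's stdlib configparser could look like this (`load_profile` and the returned key names are hypothetical):

```python
# Hypothetical sketch: parse the proposed ~/.cloudos/envs INI file and
# select one profile's settings for use by the CLI.
import configparser

def load_profile(text, profile):
    """Return the settings for one named CloudOS profile."""
    config = configparser.ConfigParser()
    config.read_string(text)
    if profile not in config:
        raise KeyError(f'Unknown CloudOS profile: {profile}')
    section = config[profile]
    return {
        'cloudos_url': section.get('cloudos_base_url', '').strip('"'),
        'workspace_id': section.get('cloudos_workspace'),
        'apikey': section.get('cloudos_token'),
    }

envs = """\
[prod]
cloudos_base_url="https://cloudos.lifebit.ai/"
cloudos_workspace=xxxx
cloudos_token=xxxx
"""
```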

Cohort class has no __repr__ method

The Cohort class has no .__repr__() method. We should add one to make interactive work nicer. Make sure to display key info:

  • cohort_id
  • cohort_name
  • cohort_desc
  • num_participants
  • query
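A minimal sketch of such a .__repr__(), assuming attribute names that mirror the bullet list above (the constructor shown is illustrative, not the real Cohort signature):

```python
# Hypothetical sketch: a __repr__ that surfaces the cohort's key info.
class Cohort:
    def __init__(self, cohort_id, cohort_name, cohort_desc,
                 num_participants, query):
        self.cohort_id = cohort_id
        self.cohort_name = cohort_name
        self.cohort_desc = cohort_desc
        self.num_participants = num_participants
        self.query = query

    def __repr__(self):
        return (f'Cohort(cohort_id={self.cohort_id!r}, '
                f'cohort_name={self.cohort_name!r}, '
                f'cohort_desc={self.cohort_desc!r}, '
                f'num_participants={self.num_participants}, '
                f'query={self.query!r})')
```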

--cloudos-url failing when trailing slash is present

The following command is working as expected

cloudos job run --nextflow-profile vcf_ldak_gencor --cloudos-url https://staging.lifebit.ai --apikey XXXXXXXXXX --workspace-id 62569d97ab755c0136140579 --workflow-name bi-traits-nf --project-name downstream-benchmarking --resumable --spot

Resulting run -> https://staging.lifebit.ai/public/jobs/62fb8b85c0baa20147b021ba

Unfortunately, if the --cloudos-url has a trailing slash it fails with the following error:

cloudos job run --nextflow-profile vcf_ldak_gencor --cloudos-url https://staging.lifebit.ai/ --apikey XXXXXXXX --workspace-id 62569d97ab755c0136140579 --workflow-name bi-traits-nf --project-name downstream-benchmarking --resumable --spot
CloudOS python package: a package for interacting with CloudOS.

Version: 1.0.0

CloudOS job functionality: run and check jobs in CloudOS.

Executing run...
Traceback (most recent call last):
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/bin/cloudos", line 8, in <module>
    sys.exit(run_cloudos_cli())
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/cloudos/__main__.py", line 169, in run
    workflow_type = cl.detect_workflow(workflow_name, workspace_id)
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/cloudos/clos.py", line 258, in detect_workflow
    my_workflows = self.process_workflow_list(my_workflows_r)
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/cloudos/clos.py", line 234, in process_workflow_list
    my_workflows = json.loads(r.content)
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
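A minimal sketch of a fix: normalise the URL once at entry, so both forms behave identically (`normalise_cloudos_url` is a hypothetical helper):

```python
# Hypothetical sketch: strip any trailing slash from --cloudos-url before
# building API endpoints, so 'https://staging.lifebit.ai/' and
# 'https://staging.lifebit.ai' are equivalent.
def normalise_cloudos_url(url):
    return url.rstrip('/')
```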

Implement wdl workflow selection by checking name, importsFile and platform

camelCase -> snake_case consistently

                self.main_file,
                self.imports_file,
                self.platform

In cases where the same repo has been imported more than once with the same name, we use the first entry from the workflows list that matches the name without checking the importsFile, and this can create issues. The stack trace is not very informative at the moment, so let's ensure that before assigning the workflow we check, apart from the name, also the importsFile and the platform (which I wish was named git_provider 😄 to be more explicit, but let's keep consistency with the API).

Originally posted by @cgpu in #53 (comment)

As a user, I would like to be able to define one or more profiles with cloudos job run

Here is the request payload from CloudOS to indicate the syntax where profiles should be defined.

relevant snippet:

{
   "workflow":"6256e159ab755c013614606d",
   "project":"62569f8fab755c01361408d0",
   "parameters":[{ "name":"config", "prefix":"--", "parameterKind":"textValue", "textValue":"conf/binary_gcta_gc.config"}],
   "executionPlatform":"aws",
   "storageMode":"regular",
   "name":"binary_gcta_gc | 2 profiles combo (standard for --config, awsbatch)",
   "saveProcessLogs":true,
   "revision":{ "commit":"9096ca04ea6baf7058be86c4db893eaca6b824fb","tag":"","branch":""},
+   "profile":"standard,awsbatch",
   "execution":{ "computeCostLimit":30, "optim":"test"},
   "spotInstances":null,
   "masterInstance":{ "requestedInstance":{"type":"c5.xlarge","asSpot":false}},
   "instanceType":"c5.xlarge"
}

formatted json (same content):

{
   "workflow":"6256e159ab755c013614606d",
   "project":"62569f8fab755c01361408d0",
   "parameters":[
      {
         "name":"config",
         "prefix":"--",
         "parameterKind":"textValue",
         "textValue":"conf/binary_gcta_gc.config"
      }
   ],
   "executionPlatform":"aws",
   "storageMode":"regular",
   "name":"binary_gcta_gc | 2 profiles combo (standard for --config, awsbatch)",
   "saveProcessLogs":true,
   "revision":{
      "commit":"9096ca04ea6baf7058be86c4db893eaca6b824fb",
      "tag":"",
      "branch":""
   },
   "profile":"standard,awsbatch",
   "execution":{
      "computeCostLimit":30,
      "optim":"test"
   },
   "spotInstances":null,
   "masterInstance":{
      "requestedInstance":{
         "type":"c5.xlarge",
         "asSpot":false
      }
   },
   "instanceType":"c5.xlarge"
}
