lifebit-ai / cloudos-cli
Python library and Command Line Interface for interacting with Lifebit Applications
CodeFactor flags a method as complex, possibly because of the deeply nested if/elif streak.
Refactor to use only if statements (to the extent possible):
https://github.com/lifebit-ai/cloudos-py/blob/dev/cloudos/__main__.py#L123-L255
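As a sketch of the direction (illustrative names, not the actual __main__.py logic), guard clauses and a dict dispatch can flatten an if/elif streak:

# Illustrative only: flatten nested if/elif with guard clauses and dict dispatch.
def pick_queue(executor, spot):
    if executor not in ('batch', 'ignite'):
        # Guard clause: fail fast instead of nesting an else branch.
        raise ValueError(f'Unknown executor: {executor}')
    # Dict dispatch replaces an if/elif chain over known values.
    base = {'batch': 'aws-batch-queue', 'ignite': 'ignite-cluster'}[executor]
    return f'{base}-spot' if spot else base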
There are use cases where = needs to be present in the parameter string. For example: --parameter expression='GT="AA"'
cloudos job run \
--cloudos-url $CLOUDOS \
--apikey $MY_API_KEY \
--workspace-id $WORKSPACE_ID \
--project-name "$PROJECT_NAME" \
--workflow-name $WORKFLOW_NAME \
--parameter reads=s3://lifebit-featured-datasets/pipelines/rnatoy-data \
--parameter genome=s3://lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.Ggal71.500bpflank.fa \
--parameter annot=s3://lifebit-featured-datasets/pipelines/rnatoy-data/ggal_1_48850000_49020000.bed.gff \
--parameter expression='GT="AA"' \
--resumable \
--spot
Currently, the CLI is not able to parse the last parameter string and gives an error:
raise ValueError('Please, specify -p / --parameter using a single \'=\' ' +
ValueError: Please, specify -p / --parameter using a single '=' as spacer. E.g: input=value
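A possible fix (a sketch, not the current implementation) is to split only on the first =, so the value itself may contain further = characters:

# Split on the first '=' only; everything after it, including more '='
# characters or quotes, is kept as part of the value.
param = 'expression=GT="AA"'
name, sep, value = param.partition('=')
if not sep or not name:
    raise ValueError("Please, specify -p / --parameter as name=value. E.g: input=value")
print(name, value)  # expression GT="AA"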
Currently, we support listing the workflows present in CloudOS via cloudos workflow list, but it would be great if a user could find/filter the existing pipelines based on a GitHub URL, something like:

cloudos workflow find --github-url https://github.com/lifebit-ai/germline-parabricks

I think this can be done based on the raw JSON output from the API call, which contains these GitHub URLs.
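A minimal sketch of what the filtering could look like; the exact key holding the GitHub URL (here assumed to be 'repository' -> 'url') would need to be confirmed against the raw API JSON:

import json

def find_workflows_by_url(raw_json, github_url):
    # Assumed response shape: a list of workflow objects with the
    # repository URL nested under 'repository' -> 'url'.
    workflows = json.loads(raw_json)
    return [wf for wf in workflows
            if wf.get('repository', {}).get('url') == github_url]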
Benefits -
Currently, when using cloudos job list, the user only gets the first page of jobs and not all of the jobs in the workspace.
Implement an option that allows the user to retrieve all the available information, up to N jobs.
When launching jobs or performing other operations, the cloudos-cli provides human-readable messaging, which is great for the end-user experience. In the case that the user would like to parse a machine-readable file, e.g. JSON, the cloudos-cli currently doesn't provide one.
Currently we provide a message after successful job submission similar to:
CloudOS python package: a package for interacting with CloudOS.
Version: 0.1.2
CloudOS job functionality: run and check jobs in CloudOS.
Executing run...
Job successfully launched to CloudOS, please check the following link: https://cloudos.lifebit.ai/app/jobs/xxxxxxxxxxx
Your assigned job id is: xxxxxxxxxxx
Your current job status is: initializing
To further check your job status you can either go to https://cloudos.lifebit.ai/app/jobs/62ddaa41f11b7301476dd768 or use the following command:
cloudos job status \
--apikey $MY_API_KEY \
--cloudos-url https://cloudos.lifebit.ai \
--job-id xxxxxxxxxxx
Add a response JSON file as well, possibly optional, when the user specifies --output-format json.
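A sketch of what the optional JSON output could look like (the field names and the output file name are assumptions, not current cloudos-cli behaviour):

import json

def report_submission(response, output_format='human'):
    if output_format == 'json':
        # Dump the raw API response so wrappers can parse the job id/status.
        with open('job_submission.json', 'w') as f:
            json.dump(response, f, indent=4)
    else:
        print(f"Your assigned job id is: {response.get('_id')}")
        print(f"Your current job status is: {response.get('status')}")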
As a user, I would like to have the option to do cloudos job list --from dd-mm-yy --to dd-mm-yy and have the jobs returned that have been launched within this date range.
As a user, I would like the option to see the curl command that was formed, so I can run it outside of the Python client if I need to.
Implement an option named --print-api-call to print this to the user.
This can go in combination with a --dry-run parameter, if the user wants to only see the command and not send the API call.
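A sketch of how the curl string could be derived from the prepared request (a hypothetical helper, not an existing cloudos-cli function):

import requests

def as_curl(prepared):
    # Rebuild an equivalent curl invocation from a requests.PreparedRequest.
    headers = ' '.join(f"-H '{k}: {v}'" for k, v in prepared.headers.items())
    data = f" -d '{prepared.body}'" if prepared.body else ''
    return f"curl -X {prepared.method} {headers}{data} '{prepared.url}'"

# With --dry-run, the CLI would stop here and print instead of sending.
req = requests.Request('GET', 'https://cloudos.lifebit.ai/api/v1/jobs').prepare()
print(as_curl(req))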
Add a warning here for the user: "Check if your CloudOS workspace has AWS Batch enabled before using this option".
+ Add this in the docs.
Q: What happens if the user chooses this and lustre is not available in their workspace? @cgpu to test and report
This is observed in the edge version quay.io/lifebitaiorg/cloudos-py:v0.0.8bi when executed in the respective environment.
Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
When TLS/SSL verification is set to False, the user experiences a lot of logging related to the recommendation of enforcing verification, and it is difficult to understand whether the command was executed successfully or there was an error.
cloudos job run \
--cloudos-url $CLOUDOS \
--apikey $MY_API_KEY \
--workspace-id $WORKSPACE_ID \
--project-name "$PROJECT_NAME" \
--job-name $JOB_NAME \
--workflow-name $WORKFLOW_NAME \
--nextflow-profile $NEXTFLOW_PROFILE \
--resumable \
--spot \
--batch
CloudOS python package: a package for interacting with CloudOS.
CloudOS job functionality: run and check jobs in CloudOS.
Executing run...
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Job successfully launched to CloudOS, please check the following link: https://prod.cloudos.aws.com/app/jobs/630781f26d388a0149ecffb2
Your assigned job id is: 630781f26d388a0149ecffb2
/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Your current job status is: initializing
To further check your job status you can either go to https://prod.cloudos.aws.com/app/jobs/630781f26d388a0149ecffb2 or use the following command:
cloudos job status \
--apikey $MY_API_KEY \
--cloudos-url https://prod.cloudos.aws.com \
--job-id 630781f26d388a0149ecffb2
There are warnings that the user wouldn't need to view in a non-verbose mode:

/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'prod.cloudos.aws.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
cloudos job run --verify false

We should consider adding verify SSL as a package-level meta-flag, and also introduce a verbosity-level flag to allow the user to choose whether they would like to see all the logs from dependencies or keep the output clean, with cloudos-cli logs only, e.g.:

cloudos job run --verify-certificates false --verbosity warn
cloudos job run --verify-certificates false --verbosity error

This could be achieved by using a context manager that monkey-patches requests so that verify=False becomes the default and the warning is suppressed (not recommended, but possible).
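For reference, a sketch of such a context manager (again: not recommended, shown only to illustrate the idea):

import contextlib
import warnings

import requests
from urllib3.exceptions import InsecureRequestWarning

@contextlib.contextmanager
def no_ssl_verification():
    # Monkey-patch Session.request so verify defaults to False, and
    # silence urllib3's InsecureRequestWarning while the patch is active.
    original = requests.Session.request

    def patched(self, method, url, **kwargs):
        kwargs.setdefault('verify', False)
        return original(self, method, url, **kwargs)

    requests.Session.request = patched
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', InsecureRequestWarning)
        try:
            yield
        finally:
            requests.Session.request = original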
Resources:
- urllib3: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
- verify=True (⚠️ MitM attacks vulnerability)
- verify=True fails: https://stackoverflow.com/a/65363043/10312089

Try/except would be a good approach to ensure the package works out of the box in all our end users' environments.
Note: this is bad practice since it leaves the library vulnerable to man-in-the-middle (MitM) attacks. Not recommended.
#: SSL Verification default.
#: Defaults to `True`, requiring requests to verify the TLS certificate at the
#: remote end.
#: If verify is set to `False`, requests will accept any TLS certificate
#: presented by the server, and will ignore hostname mismatches and/or
#: expired certificates, which will make your application vulnerable to
#: man-in-the-middle (MitM) attacks.
#: Only set this to `False` for testing.
self.verify = True
Source: https://www.geeksforgeeks.org/ssl-certificate-verification-python-requests/
Allow the user to provide the location of the SSL certificates (a very common use case with our end users in elevated-security environments).
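requests already supports this via its verify argument, which accepts a path to a CA bundle, so a hypothetical --ssl-cert option could be forwarded directly (the path below is illustrative):

import requests

# verify can be True/False or a path to a custom CA bundle.
r = requests.get('https://cloudos.lifebit.ai/api/v1/jobs',
                 verify='/etc/ssl/certs/my-org-ca-bundle.pem')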
cloudos job list \
  --cloudos-url $CLOUDOS \
  --apikey $MY_API_KEY \
  --workspace-id $WORKSPACE_ID \
  --output-format csv \
  --all-fields \
  --owner "cgpu"
Methods from CohortBrowser, such as CohortBrowser.get_phenotype_metadata() and CohortBrowser.list_cohorts(), will be useful within the Cohort class. To import these without causing circular imports, one of two solutions should be used:

1. Avoid importing the Cohort submodule at the top of cohort_browser.py and instead import the Cohort submodule inside the two relevant methods of the CohortBrowser class:

def load_cohort(self, cohort_id=None, cohort_name=None):
    from cloudos.cohorts import Cohort
    return Cohort.load(self.apikey, self.cloudos_url, self.workspace_id,
                       cohort_id=cohort_id, cohort_name=cohort_name)

2. Avoid importing the CohortBrowser submodule at the top of cohort.py. Within methods of the Cohort class, a CohortBrowser object can be constructed when needed to access its methods: import the CohortBrowser submodule during the init function of the Cohort class and create a private variable to hold a CohortBrowser object:

def __init__(...):
    ...
    from cloudos.cohorts import CohortBrowser
    self.__cb = CohortBrowser(apikey=self.apikey,
                              cloudos_url=self.cloudos_url,
                              workspace_id=self.workspace_id)
    ...

Within methods of the Cohort class, the private variable can be used to access the CohortBrowser methods.

Consistency here will ease use in other workflow management systems (Nextflow, WDL) or wrappers (GitHub Actions workflow_dispatch syntax).
The current implementation requires idiomatic handling from Nextflow; example below:
spot = params.spot ? "--spot" : ""

process this {
    input:
    output:
    script:
    """
    cloudos job run .. ${spot}
    """
}
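One possible direction (a sketch, not a committed interface) is to make the flag value-taking; click's BOOL type accepts true/false, so a Nextflow wrapper could pass the value straight through without a ternary:

import click

@click.command()
# A value-taking boolean, e.g. `cloudos job run --spot true`, composes
# better from wrappers than a bare on/off switch.
@click.option('--spot', type=bool, default=False,
              help='Use spot instances: --spot true|false.')
def run(spot):
    click.echo(f'spot={spot}')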
Right now --wdl-importsfile is mandatory from the pipeline side, but the UI doesn't need it to be specified.
This will help parse the output of the API response, parse the extra flags and, in general, will serve as a utility that allows using the same container for cloudos-cli and action-cloudos-cli.
This will help us support the new GitHub Action repository https://github.com/lifebit-ai/action-cloudos-cli/.
This is an auxiliary change that would allow the use of the short and sweet -p for something more repetitive, as described in #63.
cloudos-cli/cloudos/__main__.py
Line 66 in f8f703d
When using the cloudOS package I ran into the following error:
ValueError: Please, specify your parameters in test.config using the '=' char as spacer. E.g: name = my_name
The test.config contained an empty line, which caused the package to break:
params {
    csv = "s3://lifebit-featured-datasets/pipelines/pcgr/testdata/test_1/testdata.csv"
    metadata = "$baseDir/testdata/metadata.csv"
    genome = 'grch38'
    max_cpus = 2
    max_memory = 4.GB
}
After removing this line, the problem was solved. I would expect empty lines in the config file to be ignored rather than breaking the program.
This was tested on a local installation of the following commit id
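A sketch of a more tolerant parser (illustrative, not the actual cloudos-cli code): skip blank lines before splitting on =:

def parse_config_params(path):
    params = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Ignore empty lines and the params { ... } wrapper lines.
            if not line or line in ('params {', '}'):
                continue
            key, sep, value = line.partition('=')
            if not sep:
                raise ValueError("Please, specify your parameters using the "
                                 "'=' char as spacer. E.g: name = my_name")
            params[key.strip()] = value.strip()
    return params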
This is implicit (aka a bit anti-pythonic): it says "when workflow is not WDL", but unless we read the if clause, we don't know what that means.
Additionally, in the future we will support a third type of workflow, docker, for Docker Batch jobs in CloudOS, e.g. https://cloudos.lifebit.ai/public/jobs/61793a0888e0c901db2e3603.
In that case you would need to be explicit and reimplement as:

if workflow_type == 'nextflow':

This would be extended to all the job types we will support in the future.
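For illustration, the explicit dispatch could look like this once a third type exists (the build_* helpers are hypothetical):

# Explicit beats implicit: name every supported type and fail loudly
# on anything else.
if workflow_type == 'nextflow':
    payload = build_nextflow_payload()
elif workflow_type == 'wdl':
    payload = build_wdl_payload()
elif workflow_type == 'docker':
    payload = build_docker_payload()
else:
    raise ValueError(f'Unsupported workflow type: {workflow_type}')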
Yoda says:
https://speakerdeck.com/jennybc/code-smells-and-feels?slide=45
Originally posted by @cgpu in #73 (comment)
nextflow run --input s3://lifebit/this.txt --count 5
cloudos job run -p "input=s3://lifebit/this.txt" -p "count=5"
Reference implementation in nteract/papermill#cli.py#L50-L52
@click.option(
'--parameters', '-p', nargs=2, multiple=True, help='Parameters to pass to the parameters cell.'
)
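Adapted to the name=value form already used by cloudos-cli, a minimal click sketch could be:

import click

@click.command()
@click.option('--parameter', '-p', multiple=True,
              help='Workflow parameter as name=value; may be repeated.')
def run(parameter):
    # partition('=') keeps any further '=' characters inside the value.
    params = dict(p.partition('=')[::2] for p in parameter)
    click.echo(params)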
This would be handy for WDL workflow users, to quickly get the --wdl-mainfile and --wdl-importsfile information, as this is something that can make or break the WDL job submission command.
cloudos workflow list --workflow-name 'wdl-tests'
From this slack thread: https://lifebit-biotech.slack.com/archives/C0171HB4PBR/p1652864137066489
Currently the implementation only allows the use of a config file, not individual params.
This is a great first implementation, as the format of the configs is interoperable with Nextflow and they can be used as Nextflow configs as well. However, the flag name is a bit misleading for the user, as they are not allowed to add --job-params directly.
I recommend renaming it for the time being, to be clearer and more explicit for the user.
The user chooses the EC2 instance and not the AMI; we will have to correct the phrasing in the docs and docstrings.
The response to a job list request made using the cloudos job list command is paginated, meaning that only a certain number (30) of the most recent jobs are actually returned.
Create an option to retrieve all the stored jobs.
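A sketch of exhausting the paginated endpoint; the endpoint path and the page/limit parameter names are assumptions, not the documented CloudOS API:

import requests

def get_all_jobs(cloudos_url, apikey, workspace_id, page_size=30):
    jobs, page = [], 1
    while True:
        r = requests.get(f'{cloudos_url}/api/v1/jobs',
                         headers={'apikey': apikey},
                         params={'teamId': workspace_id,
                                 'page': page, 'limit': page_size})
        r.raise_for_status()
        batch = r.json().get('jobs', [])
        if not batch:
            return jobs
        jobs.extend(batch)
        page += 1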
+ resumable,
+ batch {enabled: batch_boolean_flag}

This addition will enable the user to select whether the job will run with ignite (batch {enabled: false}) or AWS Batch (batch {enabled: true}).
The current behaviour is that you have to provide an existing and valid project name in order to run a job using the --project-name parameter. That project has to be already present in cloudOS.
As a user, I would like the package to first check if the project already exists (linked to #30) and then create a new project if it does not exist.
When the project or workflow doesn't exist, the user doesn't get any intuitive feedback.
Some ideas below; we can discuss what is better for the users:
- Add if clauses early, to catch missing entries with workflow list or project list (note: we need to implement this as well), and report clearly if they don't exist.
- Create the project silently, to abstract this from the user (the workflow could be defined as the repo link).

Multiple CloudOS environments support via local configuration will be helpful for using multiple CloudOS instances.
Example: a config in the local system, ~/.cloudos/envs:
[prod]
cloudos_base_url="https://cloudos.lifebit.ai/"
cloudos_workspace=xxxx
cloudos_token=xxxx
[stg]
cloudos_base_url="https://staging.lifebit.ai/"
cloudos_workspace=xxxx
cloudos_token=xxxx
Usability in the pipeline:

cloudos job list --cloudos-profile prod
This also needs to align with - https://github.com/lifebit-ai/cloudos#configure-cloudos
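The INI-style file above could be read with the standard library's configparser; the profile name would come from the proposed --cloudos-profile flag:

import configparser
from pathlib import Path

def load_profile(name):
    config = configparser.ConfigParser()
    config.read(Path.home() / '.cloudos' / 'envs')
    section = config[name]  # e.g. 'prod' or 'stg'
    return (section['cloudos_base_url'].strip('"'),
            section['cloudos_workspace'],
            section['cloudos_token'])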
Currently, the error the user gets looks like this. It would be a nice UX addition to help users who are not familiar with Python or with reading a Python stack trace.
Similar to #54: there are no helpful error messages when the error is due to a workflow or project not being present in a workspace when submitting a job.
It would be helpful to check them beforehand, and also to allow the user to list them.
More info to be added here by @sk-sahu
The Cohort class has no .__repr__() method. We should add one to make interactive work nicer. Make sure to display key info:
cohort_id
cohort_name
cohort_desc
num_participants
query
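A minimal sketch of such a __repr__, assuming the listed fields exist as attributes:

def __repr__(self):
    return (f'Cohort(cohort_id={self.cohort_id!r}, '
            f'cohort_name={self.cohort_name!r}, '
            f'cohort_desc={self.cohort_desc!r}, '
            f'num_participants={self.num_participants!r}, '
            f'query={self.query!r})')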
The following command is working as expected
cloudos job run --nextflow-profile vcf_ldak_gencor --cloudos-url https://staging.lifebit.ai --apikey XXXXXXXXXX --workspace-id 62569d97ab755c0136140579 --workflow-name bi-traits-nf --project-name downstream-benchmarking --resumable --spot
Resulting run -> https://staging.lifebit.ai/public/jobs/62fb8b85c0baa20147b021ba
Unfortunately, if the --cloudos-url has a trailing slash, it fails with the following error:
cloudos job run --nextflow-profile vcf_ldak_gencor --cloudos-url https://staging.lifebit.ai/ --apikey XXXXXXXX --workspace-id 62569d97ab755c0136140579 --workflow-name bi-traits-nf --project-name downstream-benchmarking --resumable --spot
CloudOS python package: a package for interacting with CloudOS.
Version: 1.0.0
CloudOS job functionality: run and check jobs in CloudOS.
Executing run...
Traceback (most recent call last):
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/bin/cloudos", line 8, in <module>
sys.exit(run_cloudos_cli())
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/cloudos/__main__.py", line 169, in run
workflow_type = cl.detect_workflow(workflow_name, workspace_id)
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/cloudos/clos.py", line 258, in detect_workflow
my_workflows = self.process_workflow_list(my_workflows_r)
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/site-packages/cloudos/clos.py", line 234, in process_workflow_list
my_workflows = json.loads(r.content)
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/inesmendes/opt/anaconda3/envs/cloudos-api/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
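Normalising the base URL once, when it is stored, would avoid the malformed API path that produces the non-JSON response, e.g.:

# Strip any trailing slash before composing API paths.
cloudos_url = cloudos_url.rstrip('/')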
camelCase -> snake_case, consistently:
self.main_file,
self.imports_file,
self.platform
In cases when the same repo has been imported twice with the same name, this can create issues, because we use the first entry from the workflows list that matches the name, without checking the importsFile. The stack trace is not very informative at the moment, so let's ensure that before assigning the workflow we check, apart from the name, also the importsFile and the platform (which I wish was named git_provider, to be more explicit, but let's keep the consistency with the API).
Originally posted by @cgpu in #53 (comment)
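A sketch of a stricter match (field names follow the API's camelCase, as the issue suggests):

def match_workflow(workflows, name, imports_file=None, platform=None):
    # Require importsFile and platform to agree as well, instead of
    # taking the first name match blindly.
    matches = [wf for wf in workflows
               if wf.get('name') == name
               and (imports_file is None or wf.get('importsFile') == imports_file)
               and (platform is None or wf.get('platform') == platform)]
    if len(matches) != 1:
        raise ValueError(f'Expected exactly one workflow named {name!r}, '
                         f'found {len(matches)}')
    return matches[0]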
Here is the request payload from CloudOS, to indicate the syntax where profiles should be defined.
Relevant snippet:
{
"workflow":"6256e159ab755c013614606d",
"project":"62569f8fab755c01361408d0",
"parameters":[{ "name":"config", "prefix":"--", "parameterKind":"textValue", "textValue":"conf/binary_gcta_gc.config"}],
"executionPlatform":"aws",
"storageMode":"regular",
"name":"binary_gcta_gc | 2 profiles combo (standard for --config, awsbatch)",
"saveProcessLogs":true,
"revision":{ "commit":"9096ca04ea6baf7058be86c4db893eaca6b824fb","tag":"","branch":""},
+ "profile":"standard,awsbatch",
"execution":{ "computeCostLimit":30, "optim":"test"},
"spotInstances":null,
"masterInstance":{ "requestedInstance":{"type":"c5.xlarge","asSpot":false}},
"instanceType":"c5.xlarge"
}
formatted json (same content):
{
"workflow":"6256e159ab755c013614606d",
"project":"62569f8fab755c01361408d0",
"parameters":[
{
"name":"config",
"prefix":"--",
"parameterKind":"textValue",
"textValue":"conf/binary_gcta_gc.config"
}
],
"executionPlatform":"aws",
"storageMode":"regular",
"name":"binary_gcta_gc | 2 profiles combo (standard for --config, awsbatch)",
"saveProcessLogs":true,
"revision":{
"commit":"9096ca04ea6baf7058be86c4db893eaca6b824fb",
"tag":"",
"branch":""
},
"profile":"standard,awsbatch",
"execution":{
"computeCostLimit":30,
"optim":"test"
},
"spotInstances":null,
"masterInstance":{
"requestedInstance":{
"type":"c5.xlarge",
"asSpot":false
}
},
"instanceType":"c5.xlarge"
}
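On the cloudos-cli side, forwarding the profiles would then be a one-field addition to the payload builder (a sketch; the variable names are illustrative):

# Value of --nextflow-profile, e.g. 'standard,awsbatch', forwarded as-is.
payload['profile'] = nextflow_profile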