lifebit-ai / cloudos
cloudos: An R package for the Lifebit CloudOS Platform
Home Page: http://lifebit-ai.github.io/cloudos/
License: Other
@sk-sahu thanks for being on the Jenny Bryan side of life!
We can update R functions by replacing else with an explicit if in the following places:
As an example, the following snippet, when if-ified, would look like this:
Lines 37 to 46 in 1e1ec00
if (r$status_code != 200) {
  stop("Something went wrong. Not able to create a cohort")
}
if (r$status_code == 200) {
  message("Cohort named ", cohort_name, " created successfully. Below are the details")
  res <- httr::content(r)
  # bind the parsed fields into a one-column matrix, one row per field
  res_df <- do.call(rbind, res)
  colnames(res_df) <- "details"
  return(res_df)
}
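Because stop() exits the function, the second if in the snippet above is never reached on failure and could be dropped altogether. A minimal sketch of that guard-clause style follows; cohort_details_df() is a hypothetical helper that takes the status code and the already-parsed content directly, so it runs without an httr response object.

```r
# Hypothetical helper (not part of cloudos): guard-clause version of the
# snippet above. The failure branch exits via stop(), so the success path
# needs neither an else nor a second if.
cohort_details_df <- function(status_code, details, cohort_name) {
  if (status_code != 200) {
    stop("Something went wrong. Not able to create a cohort")
  }
  message("Cohort named ", cohort_name, " created successfully. Below are the details")
  # bind the named fields into a one-column matrix, one row per field
  res_df <- do.call(rbind, details)
  colnames(res_df) <- "details"
  res_df
}

# Usage with already-parsed content
out <- cohort_details_df(200, list(id = "abc123", name = "my-cohort"), "my-cohort")
```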
Originally posted by @cgpu in #5 (comment)
cloudos_whoami() should output the location of the config file if there is one.
Documentation stating where the config file is located also needs to be changed.
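A minimal sketch of the kind of check cloudos_whoami() could perform. describe_config_location() is hypothetical, and the config path is passed in as an argument rather than assumed, since the package's actual config location is not stated here.

```r
# Hypothetical helper (not part of cloudos): report where the config file is,
# or that none was found. The path is an argument because the package's real
# config location is not given here.
describe_config_location <- function(config_path) {
  if (file.exists(config_path)) {
    path <- normalizePath(config_path)
    message("Config file: ", path)
    return(invisible(path))
  }
  message("No config file found at: ", config_path)
  invisible(NA_character_)
}
```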
At the moment, cb_search_phenotypic_filters() only works for a few terms:
cb_search_phenotypic_filters(cloudos = my_cloudos, term = "cancer")
term -> phenotype (take a vote on this)
Error in GEL
Error coming from (lines 126 to 127 in 9c4e581):
res <- httr::content(r)
Error reason: one of the columns ("Date Of Death") is completely empty throughout the returned JSON, so it is not parsed properly. As a result, one column is missing when the data frame is formed, which is the reason behind the mismatch between the number of column names and the number of columns.
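One way to avoid the column mismatch, sketched under the assumption that the parsed content is a list of per-participant records: build every column from a fixed list of expected column names and fill missing or NULL values with NA. records_to_df() is a hypothetical helper, not part of cloudos.

```r
# Hypothetical helper: build a data frame from parsed JSON records using a
# fixed list of expected columns, so an all-empty column becomes NA instead
# of silently disappearing.
records_to_df <- function(records, columns) {
  cols <- lapply(columns, function(col) {
    vapply(records, function(rec) {
      val <- rec[[col]]  # NULL when the field is absent or null
      if (is.null(val) || length(val) == 0) {
        NA_character_
      } else {
        # multi-valued fields are joined into one string
        paste(as.character(unlist(val)), collapse = ", ")
      }
    }, character(1))
  })
  df <- as.data.frame(cols, stringsAsFactors = FALSE)
  names(df) <- columns  # keep names such as "Date Of Death" unmodified
  df
}

recs <- list(list(id = "p1", "Date Of Death" = NULL), list(id = "p2"))
df <- records_to_df(recs, c("id", "Date Of Death"))
```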
Currently "filter" and "query" are used somewhat randomly to mean different things. We should rename functions and parameters to mirror the Web-UI more consistently, i.e.:
list("id" = 4, "value" = "Cancer")
should be called a phenotype.
Defining CB queries in R via lists is not very user friendly, especially when there are multiple conditions. For example, even a relatively simple query with three conditions and two AND operators requires a nested list:
adv_query <- list(
"operator" = "AND",
"queries" = list(
list( "id" = 13, "value" = list("from"="2016-01-21", "to"="2017-02-13")),
list(
"operator" = "AND",
"queries" = list(
list("id" = 4, "value" = "Cancer"),
list("id" = 21, "value" = "Consenting")
)
)
)
)
To simplify, we could include functions to define and combine individual phenotype definitions in a more modular fashion. As a starting point for discussion, the testing-new_query_syntax branch adds two new functions: new_phenotype_cont to define continuous variable phenotypes and new_phenotype_cat to define categorical variable phenotypes. They can be combined using the overloaded &, | and ! operators. For example:
Get the package from the PR branch:
> git clone 'https://github.com/lifebit-ai/cloudos.git'
> cd cloudos
> git checkout testing-new_query_syntax
In the cloudos directory, start an R session (or use RStudio), install and load the package, and configure it:
> devtools::install(".")
> library(cloudos)
> cloudos_configure(base_url = "http://cohort-browser-dev-110043291.eu-west-1.elb.amazonaws.com/cohort-browser/",
token = "...api token...",
team_id = "5f7c8696d6ea46288645a89f")
Try building new queries:
A <- new_phenotype_cont(13, "2016-01-21", "2017-02-13")
B <- new_phenotype_cat(4, "Cancer")
C <- new_phenotype_cont(13, "2016-01-21", "2017-02-13")
D <- new_phenotype_cat(4, "Cancer")
########## test 1
AB <- A & B
cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")
cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)
########## test 2
AB <- A & !B
cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")
cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)
########## test 3
AB <- A | !B
cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")
cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)
########## test 4
AB <- (A | B) & (D | C)
cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")
cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)
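To illustrate how overloaded operators can build the nested adv_query list, here is a minimal S3 sketch. The class cb_query_sketch and the constructors below are hypothetical stand-ins, not the actual new_phenotype_cat / new_phenotype_cont from the branch; ! is left out because the query format for negation is not shown here.

```r
# Hypothetical constructors: each returns a classed list in the adv_query format.
phenotype_cat_sketch <- function(id, value) {
  structure(list(id = id, value = value), class = "cb_query_sketch")
}

phenotype_cont_sketch <- function(id, from, to) {
  structure(list(id = id, value = list(from = from, to = to)),
            class = "cb_query_sketch")
}

# Wrap two operands in an operator node; unclass() strips the S3 class so the
# nested result is a plain list, like the unclass(AB) calls in the tests above.
combine_sketch <- function(operator, e1, e2) {
  structure(list(operator = operator,
                 queries = list(unclass(e1), unclass(e2))),
            class = "cb_query_sketch")
}

# S3 methods for the Ops group members & and |
"&.cb_query_sketch" <- function(e1, e2) combine_sketch("AND", e1, e2)
"|.cb_query_sketch" <- function(e1, e2) combine_sketch("OR", e1, e2)

A <- phenotype_cont_sketch(13, "2016-01-21", "2017-02-13")
B <- phenotype_cat_sketch(4, "Cancer")
C <- phenotype_cat_sketch(21, "Consenting")
query <- unclass(A & (B & C))  # same nesting as adv_query above
```

Each & or | produces exactly one operator node with two children, so longer chains nest rather than flatten.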
It appears that creating a query in which an AND or OR operator has only a single subquery (anywhere except the top node) breaks the web UI.
Is this something to fix in the web UI, or do we need to ensure that no operator in a query has a single child node?
Example that breaks the UI:
{
"name": "test-apply-07",
"description": "",
"columns": [ ... ],
"query": {
"operator": "AND",
"queries": [
{
"operator": "AND",
"queries": [
{
"field": 13,
"instance": [
"0"
],
"value": {
"from": "2016-01-21",
"to": "2017-02-13"
}
}
]
},
{
"operator": "AND",
"queries": [
{
"field": 56,
"instance": [
"0"
],
"value": [
"Adult C1"
]
},
{
"field": 4,
"instance": [
"0"
],
"value": [
"Cancer"
]
}
]
}
]
}
}
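If the fix belongs on the package side, one option is to collapse any non-top operator node that has a single subquery into that subquery, which is logically equivalent for both AND and OR. A sketch, with collapse_single_nodes() as a hypothetical helper operating on the R list form of the query:

```r
# Hypothetical helper: remove AND/OR nodes that wrap a single subquery,
# except at the top level. Leaves (phenotypes) have no "queries" element.
collapse_single_nodes <- function(query, is_top = TRUE) {
  subqueries <- query[["queries"]]
  if (is.null(subqueries)) {
    return(query)  # a leaf phenotype: nothing to collapse
  }
  # collapse the children first, so chains of single nodes are flattened
  children <- lapply(subqueries, collapse_single_nodes, is_top = FALSE)
  if (!is_top && length(children) == 1) {
    return(children[[1]])  # an operator over one subquery is just that subquery
  }
  query[["queries"]] <- children
  query
}

# R version of the "query" element from the example above
query <- list(
  operator = "AND",
  queries = list(
    list(operator = "AND", queries = list(
      list(field = 13, instance = list("0"),
           value = list(from = "2016-01-21", to = "2017-02-13")))),
    list(operator = "AND", queries = list(
      list(field = 56, instance = list("0"), value = list("Adult C1")),
      list(field = 4, instance = list("0"), value = list("Cancer"))))
  )
)
fixed <- collapse_single_nodes(query)
```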
This means a freshly created CBv1 cohort will have a cohort object with no columns. This has knock-on effects for cb_apply_filter() when the user wants to add a new column while keeping the default columns by using keep_existing_columns = TRUE.
(and if the Rmd is updated, the md will also be updated here: https://github.com/lifebit-ai/cloudos/blob/190ae48b967ca3229ad38b25d80bf181e0d4ed1a/README.md)
Take a cohort as input to this function and, based on it, return the genotypic table.
Lines 15 to 18 in 33e6294
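When testing on the http://dev-gel.lifebit.ai/ instance, each filter is split over several rows, and each row for the same filter contains only one of the possible values in the possibleValues column (filter 57 spans rows 1 to 3, filter 52 rows 4 to 8 and filter 113 rows 9 to 14):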
> library(cloudos)
> cloudos_configure(base_url = "http://cohort-browser-dev-110043291.eu-west-1.elb.amazonaws.com/cohort-browser/",
token = "...api token...",
team_id = "5f7c8696d6ea46288645a89f")
> cb_search_phenotypic_filters("genetic", cb_version="v1")
Total number of phenotypic filters found - 3
display isMandatory possibleValues recruiterDescription group
1 dropdown FALSE Somatic, Somatic Genetic Test
2 dropdown FALSE Germline, Germline Genetic Test
3 dropdown FALSE cfDNA, cfDNA Genetic Test
4 checkBox FALSE Not known, Not known, 0
5 checkBox FALSE Li-Fraumeni syndrome, Li-Fraumeni syndrome, 1
6 checkBox FALSE Familial adenomatous polyposis, Familial adenomatous polyposis, 2
7 checkBox FALSE Beckwith-Wiedemann syndrome, Beckwith-Wiedemann syndrome, 3
8 checkBox FALSE Other, Other, 4
9 checkBox FALSE Chromosomes/FISH, Chromosomes/FISH, 0 NULL clinicalFeatures
10 checkBox FALSE Array CGH, Array CGH, 1 NULL clinicalFeatures
11 checkBox FALSE Fragile X Syndrome, Fragile X Syndrome, 2 NULL clinicalFeatures
12 checkBox FALSE Single gene/panel testing, Single gene/panel testing, 3 NULL clinicalFeatures
13 checkBox FALSE Exome/genome testing, Exome/genome testing, 4 NULL clinicalFeatures
14 checkBox FALSE Newborn screening testing, Newborn screening testing, 5 NULL clinicalFeatures
clinicalForm bucket500 bucket1000 bucket2500 bucket5000 bucket300 bucket10000 categoryPathLevel1
1 cancerForm FALSE FALSE FALSE FALSE FALSE FALSE Cancer diagnosis
2 cancerForm FALSE FALSE FALSE FALSE FALSE FALSE Cancer diagnosis
3 cancerForm FALSE FALSE FALSE FALSE FALSE FALSE Cancer diagnosis
4 cancerForm FALSE FALSE FALSE FALSE FALSE FALSE Cancer diagnosis
5 cancerForm FALSE FALSE FALSE FALSE FALSE FALSE Cancer diagnosis
6 cancerForm FALSE FALSE FALSE FALSE FALSE FALSE Cancer diagnosis
7 cancerForm FALSE FALSE FALSE FALSE FALSE FALSE Cancer diagnosis
8 cancerForm FALSE FALSE FALSE FALSE FALSE FALSE Cancer diagnosis
9 udForm FALSE FALSE FALSE FALSE FALSE FALSE Undiagnosed disease form
10 udForm FALSE FALSE FALSE FALSE FALSE FALSE Undiagnosed disease form
11 udForm FALSE FALSE FALSE FALSE FALSE FALSE Undiagnosed disease form
12 udForm FALSE FALSE FALSE FALSE FALSE FALSE Undiagnosed disease form
13 udForm FALSE FALSE FALSE FALSE FALSE FALSE Undiagnosed disease form
14 udForm FALSE FALSE FALSE FALSE FALSE FALSE Undiagnosed disease form
categoryPathLevel2 id instances name type Sorting valueType units coding
1 Genetic findings 57 1 Genetic Test bars Categorical single
2 Genetic findings 57 1 Genetic Test bars Categorical single
3 Genetic findings 57 1 Genetic Test bars Categorical single
4 Genetic syndromes 52 1 Genetic syndromes bars Categorical single
5 Genetic syndromes 52 1 Genetic syndromes bars Categorical single
6 Genetic syndromes 52 1 Genetic syndromes bars Categorical single
7 Genetic syndromes 52 1 Genetic syndromes bars Categorical single
8 Genetic syndromes 52 1 Genetic syndromes bars Categorical single
9 Previous Investigation results 113 1 Previous genetic and genomic testing bars Categorical multiple
10 Previous Investigation results 113 1 Previous genetic and genomic testing bars Categorical multiple
11 Previous Investigation results 113 1 Previous genetic and genomic testing bars Categorical multiple
12 Previous Investigation results 113 1 Previous genetic and genomic testing bars Categorical multiple
13 Previous Investigation results 113 1 Previous genetic and genomic testing bars Categorical multiple
14 Previous Investigation results 113 1 Previous genetic and genomic testing bars Categorical multiple
description descriptionParticipantsNo link array descriptionStability descriptionCategoryID descriptionItemType
1 Genetic Test Not provided 20
2 Genetic Test Not provided 20
3 Genetic Test Not provided 20
4 ] Not provided 5
5 ] Not provided 5
6 ] Not provided 5
7 ] Not provided 5
8 ] Not provided 5
9 Not provided 20
10 Not provided 20
11 Not provided 20
12 Not provided 20
13 Not provided 20
14 Not provided 20
(columns descriptionStrata, descriptionSexed, orderPhenotype and instance0Name to instance16Name are blank in all 14 rows)
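Assuming the desired output is one row per filter, the repeated rows could be folded back together by grouping on id. A base R sketch with a hypothetical collapse_possible_values() helper, treating possibleValues as a plain character column for simplicity:

```r
# Hypothetical helper: fold rows that share a filter id into one row,
# joining their possibleValues into a single string.
collapse_possible_values <- function(df) {
  groups <- split(df, df$id)  # one data frame per filter id, row order kept
  rows <- lapply(groups, function(g) {
    data.frame(id = g$id[1],
               name = g$name[1],
               possibleValues = paste(g$possibleValues, collapse = "; "),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Toy input modelled on rows 1 to 5 of the output above
filters <- data.frame(
  id = c(57, 57, 57, 52, 52),
  name = c(rep("Genetic Test", 3), rep("Genetic syndromes", 2)),
  possibleValues = c("Somatic", "Germline", "cfDNA", "Not known", "Li-Fraumeni syndrome"),
  stringsAsFactors = FALSE
)
res <- collapse_possible_values(filters)
```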
We had a vote and brainstorming session with the bioinfo team, taking into account best practices for naming R packages. The name cloudos is the one we voted for. In the future we can have a Python library named pycloudos, etc.
Rename the repo from cloudos-R to cloudos.
Slack link with relevant convo:
https://lifebit-biotech.slack.com/archives/C013UHE9MQQ/p1596185485021800
PR comment with best practices suggestions:
#1 (review)
First release:
usethis::use_cran_comments()
Proofread Title: and Description:
Check that all exported functions have @returns and @examples
Check that Authors@R: includes a copyright holder (role 'cph')
Prepare for release:
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
Submit to CRAN:
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
@eBsowka we were discussing the newly added functionality with @hms1.
The main struggle with assertions when using a demo environment is the volatility of the app and its collections.
In the past we have successfully connected to a MongoDB instance created on the fly in CI from mongo dumps. In this scenario we would need to take it a step further, with an instance of CB running that we have full control of, and for the test we would
This would reassure us that we are not breaking anything in the package, without having to account for unpredictable changes in the CB app.
This is low priority for now, but we are keeping it because we have committed to ensuring reproducibility for mongo-related CI tests.
Currently looks like this:
> cb_create_cohort("ilya-r-test1", cb_version="v1")
Error in .cb_create_cohort_v1(cohort_name = cohort_name, cohort_desc = cohort_desc, :
Bad Request (HTTP 400). Failed to create a cohort.
> cb_create_cohort("ilya-r-test2", cb_version="v2")
Error in .cb_create_cohort_v2(cohort_name = cohort_name, cohort_desc = cohort_desc, :
Conflict (HTTP 409). Failed to create a cohort.
This can be added as a separate function or part of cb_get_samples_table()
There is no clear way to find out which CB version a cohort was created in:
cb_create_cohort("test-400-desc", cohort_desc = "for testing", cb_version="v1")
Cohort created successfully.
Cohort ID: 60ec0c2f9dd71b22256fcfe0
Cohort Name: test-400-desc
Cohort Description: for testing
Number of filters applied: 0
When a CBv1 cohort is used in cb_apply_filter(), the function prints a message with the number of participants in the newly updated cohort. With a CBv2 cohort, no such message is given.
The CBv2 API does not return the updated number of participants in its response to PUT /cohort-browser/v2/cohort/{cohort_id}. Instead, cb_participant_count() can be used to get this information.
Additionally, it would be useful to provide a web address for the cohort in the output of this function (and of other functions such as cb_create_cohort() and cb_load_cohort()).
While reviewing #80 I noticed a potential bug in cb_get_participants_table. Using the relevant branch:
library(cloudos)
cohortv2 <- cb_load_cohort("61f1bd4b3de81e52ec46d0ea")
cb_get_participants_table(cohortv2)
Error: All elements must be size one, use `list()` to wrap.
x Element `f4i0a0` is of size 0.
Run `rlang::last_error()` to see where the error occurred.
It looks like the error is happening here:
Lines 324 to 336 in a8b8767
I am not familiar with this code, so someone with a better understanding should look at it properly, but it appears it can be fixed with something like:
for (dta in c(list(emptyrow), res$data)) {
  # drop all NULL elements entirely (vapply, unlike sapply, also copes with an empty record)
  dta[vapply(dta, is.null, logical(1))] <- NULL
  # change types according to col_types
  for (name in names(dta)) {
    type_func <- col_types[[name]]
    dta[[name]] <- type_func(dta[[name]])
  }
  dta <- tibble::as_tibble_row(dta)
  df_list <- c(df_list, list(dta))
}
Store credentials in a file ~/.cloudos/credential.json and convert them into environment variables, something like this (Ref):
github_pat <- function() {
pat <- Sys.getenv('GITHUB_PAT')
if (identical(pat, "")) {
stop("Please set env var GITHUB_PAT to your github personal access token",
call. = FALSE)
}
pat
}
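Following the same pattern, a sketch of the credential-file idea. The JSON field names (for example token and team_id) and the CLOUDOS_* environment variable names are assumptions, and both functions are hypothetical; reading the file relies on jsonlite::read_json().

```r
# Hypothetical: map each credential field to an upper-cased CLOUDOS_* variable.
set_credential_env <- function(creds) {
  stopifnot(is.list(creds), !is.null(names(creds)))
  vars <- vapply(creds, as.character, character(1))  # one string per field
  names(vars) <- paste0("CLOUDOS_", toupper(names(creds)))
  do.call(Sys.setenv, as.list(vars))
  invisible(names(vars))
}

# Hypothetical: read the JSON credential file and export its fields.
load_cloudos_credentials <- function(path = "~/.cloudos/credential.json") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("Credential file not found: ", path, call. = FALSE)
  }
  creds <- jsonlite::read_json(path)
  set_credential_env(creds)
}
```

Functions such as cloudos_configure() could then read the values back with Sys.getenv(), as github_pat() does above.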