lifebit-ai / cloudos



cloudos: An R package for the Lifebit CloudOS Platform

Home Page: http://lifebit-ai.github.io/cloudos/

License: Other

R 42.04% CSS 0.07% Makefile 0.06% Jupyter Notebook 57.82%

api-client cloudos lifebit r-package

cloudos's People

abrahamlifebit

cloudos's Issues

Replace else with explicit if statement in R functions

@sk-sahu thanks for being on the Jenny Bryan side of life 😄 !

We can update the R functions in the following places, replacing else with an explicit if statement:

For example, the following snippet, once if-ified, would look like this:

cloudos/R/create_cohort.R

Lines 37 to 46 in 1e1ec00

  if (!r$status_code == 200) {
    stop("Something went wrong. Not able to create a cohort")
  } else {
    message("Cohort named ", cohort_name, " created successfully. Bellow are the details")
    res <- httr::content(r)
    # into a dataframe
    res_df <- do.call(rbind, res)
    colnames(res_df) <- "details"
    return(res_df)
  }

if (r$status_code != 200) {
  stop("Something went wrong. Not able to create a cohort")
}
if (r$status_code == 200) {
  message("Cohort named ", cohort_name, " created successfully. Below are the details")
  res <- httr::content(r)
  # convert into a dataframe
  res_df <- do.call(rbind, res)
  colnames(res_df) <- "details"
  return(res_df)
}

Originally posted by @cgpu in #5 (comment)

cloudos_whoami() should give info about the config file

cloudos_whoami() should output the location of the config file if there is one.

Documentation stating where the config file is located also needs to be changed.
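A minimal sketch of the check cloudos_whoami() could perform. The default path ~/.cloudos/config and the helper name report_config_location() are hypothetical; the package's actual config location may differ.

```r
# Hypothetical helper: report where the config file is, if it exists.
# The default path is an assumption, not the package's documented location.
report_config_location <- function(config_path = file.path("~", ".cloudos", "config")) {
  full_path <- path.expand(config_path)
  if (file.exists(full_path)) {
    message("Config file found at: ", full_path)
    return(invisible(full_path))
  }
  message("No config file found at: ", full_path)
  invisible(NA_character_)
}
```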

Add a function to list out all the available phenotypic filters

At the moment cb_search_phenotypic_filters() only works for a few terms, e.g.:

cb_search_phenotypic_filters(cloudos = my_cloudos, term = "cancer")

Change the arg name from term -> phenotype (take a vote on this)
Make a function to list all the available phenotypic filters, so that users do not need to search and can pass the phenotypic filter id directly to the next step.

Add more information to cohort object

Number of filters applied bc747d9
Last updated info

rename the dataframe with column names fails

Error in GEL

Error coming from

cloudos/R/cb_extract_cohort.R

Lines 126 to 127 in 9c4e581

  # rename the dataframe with column names
  colnames(res_df_new) <- columns_names

Error reason

One of the columns ("Date Of Death") is completely empty throughout the returned JSON.

res <- httr::content(r) is not able to parse it properly, so that column is missing when the dataframe is formed. This is why the number of column names does not match the number of columns.
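One way to make the column-name assignment robust is to build each row from the expected column names, so that a field absent from every record still produces an NA column. A base-R sketch, assuming each record returned by httr::content(r) is a named list of scalar values; records_to_df() is a hypothetical helper name, not part of the package.

```r
# Build a data frame from a list of JSON records, guaranteeing one column
# per expected name, in the expected order.
records_to_df <- function(records, columns_names) {
  rows <- lapply(records, function(rec) {
    # Missing, NULL or empty fields become NA; otherwise take the scalar value
    vals <- lapply(columns_names, function(col) {
      v <- rec[[col]]
      if (is.null(v) || length(v) == 0) NA else v[[1]]
    })
    names(vals) <- columns_names
    as.data.frame(vals, stringsAsFactors = FALSE, check.names = FALSE)
  })
  do.call(rbind, rows)
}
```

With this approach the separate colnames(res_df_new) <- columns_names step is no longer needed. An empty records list returns NULL and would need its own check.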

Nomenclature for filters/queries is confusing

Currently "filter" and "query" are used somewhat randomly to mean different things.

We should rename functions and parameters to mirror the Web-UI more consistently:

i.e.:

A criterion based on a particular phenotype, e.g. list("id" = 4, "value" = "Cancer"), should be called a phenotype.
A set of phenotypes combined with logical operators to narrow down the dataset to make a cohort is a query.

Add proper error message

At the moment, although a failed API request does stop and give an error message, the message is not intuitive, for example -

Check the error message coming from the request. (If some end-points are missing an error message from the backend, ask for it.)
Add error handling in a few situations to respond to the error.

Improve query definitions in R

Defining CB queries in R via lists is not very user friendly, especially when there are multiple conditions. For example, even a relatively simple query with three conditions and two AND operators:

adv_query <- list(
  "operator" = "AND",
  "queries" = list(
    list( "id" = 13, "value" = list("from"="2016-01-21", "to"="2017-02-13")),
    list(
      "operator" = "AND",
      "queries" = list(
        list("id" = 4, "value" = "Cancer"),
        list("id" = 21, "value" = "Consenting")
        )
    )
  )
)

To simplify, we could include functions to define and combine individual phenotype definitions in a more modular fashion. As a starting point for discussion, the testing-new_query_syntax branch adds two new functions: new_phenotype_cont to define continuous variable phenotypes and new_phenotype_cat to define categorical variable phenotypes. They can be combined using overloaded &, | and ! operators. For example:

Get the package from the PR branch:

> git clone 'https://github.com/lifebit-ai/cloudos.git'
> cd cloudos
> git checkout testing-new_query_syntax

In the cloudos directory enter an R session (or do so in Rstudio) and load the package + config:

> devtools::install(".")
> library(cloudos)
> cloudos_configure(base_url = "http://cohort-browser-dev-110043291.eu-west-1.elb.amazonaws.com/cohort-browser/", 
token = "...api token...",
team_id = "5f7c8696d6ea46288645a89f")

Try building new queries:

A <- new_phenotype_cont(13, "2016-01-21", "2017-02-13")
B <- new_phenotype_cat(4, "Cancer")
C <- new_phenotype_cont(13, "2016-01-21", "2017-02-13")
D <- new_phenotype_cat(4, "Cancer")

########## test 1
AB <- A & B

cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")

cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)

########## test 2
AB <- A & !B

cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")

cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)

########## test 3
AB <- A | !B

cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")

cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)

########## test 4
AB <- (A | B) & (D | C)

cohort <- cloudos::cb_load_cohort("60f96a97f2395b7f16a93c3a")

cloudos::cb_apply_filter(cohort, adv_query = unclass(AB), keep_existing_filter = FALSE)

Applying CBv2 query sometimes breaks the web-UI

It appears that creating a query with an AND or OR operator that has only a single subquery within it (unless it is the top node) breaks the web-UI.

Is this something to fix in the web-UI, or do we need to ensure that no query has an operator with only a single subquery under it?

Example that breaks the UI:

PUT 'v2/cohort/{cohort_id}' with request body:

{
  "name": "test-apply-07",
  "description": "",
  "columns": [ ... ],
  "query": {
    "operator": "AND",
    "queries": [
      {
        "operator": "AND",
        "queries": [
          {
            "field": 13,
            "instance": [
              "0"
            ],
            "value": {
              "from": "2016-01-21",
              "to": "2017-02-13"
            }
          }
        ]
      },
      {
        "operator": "AND",
        "queries": [
          {
            "field": 56,
            "instance": [
              "0"
            ],
            "value": [
              "Adult C1"
            ]
          },
          {
            "field": 4,
            "instance": [
              "0"
            ],
            "value": [
              "Cancer"
            ]
          }
        ]
      }
    ]
  }
}
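If the fix is made on the R side, the query could be sanitised before the PUT request. A sketch, assuming the query is held as the nested R list that is serialised into the JSON above (operator nodes carry "operator" and "queries", leaf criteria do not); collapse_single_nodes() is a hypothetical name. Only AND and OR nodes are collapsed, because a single-child AND or OR is logically equivalent to its child; a negation node, if CBv2 represents negation as an operator node, must be kept.

```r
# Recursively replace any non-root AND/OR node that wraps exactly one
# sub-query with that sub-query.
collapse_single_nodes <- function(query, is_root = TRUE) {
  op <- query[["operator"]]
  # Leaf criterion (no operator): return unchanged
  if (is.null(op)) return(query)
  # Simplify the children first
  query[["queries"]] <- lapply(query[["queries"]], collapse_single_nodes,
                               is_root = FALSE)
  # A non-root AND/OR with a single child is equivalent to that child
  if (!is_root && op %in% c("AND", "OR") && length(query[["queries"]]) == 1) {
    return(query[["queries"]][[1]])
  }
  query
}
```

Applied to the example above, the first inner AND is replaced by its date-range criterion, while the second inner AND (two criteria) is left unchanged.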

CB v1 api returns blank column info when only default columns are selected

This means a freshly created CBv1 cohort will have a cohort object with no columns. This has knock-on effects for cb_apply_filter() when the user wants to add a new column while keeping the default columns by using keep_existing_columns = TRUE.

Require raw JSON output instead of text as the API call response

Unable to apply certain phenotypic filters

For filter id 72 (SARS-CoV-2 positive), the filter cannot be applied.

This is because the way the filter-value JSON is created differs from the current implementation.

It is requesting -

"value":["Positive"]

But it should have -

Need to investigate why there is a 1 at the end.

Remove occurrences of the URL

cloudos/DESCRIPTION

Line 21 in 01e4e5d

CloudOS <https://cloudos.lifebit.ai/> in the R environment for analysis.
https://github.com/lifebit-ai/cloudos/blob/master/README.Rmd#L25

(and if Rmd is updated the md will also be updated here https://github.com/lifebit-ai/cloudos/blob/190ae48b967ca3229ad38b25d80bf181e0d4ed1a/README.md)

Get genotypic table based on cohort

Take a cohort as input to this function and, based on it, return the genotypic table.

cloudos/R/cb_cohort_extract.R

Lines 15 to 18 in 33e6294

  cb_get_genotypic_table <- function(cloudos,
                                     page_number = 0,
                                     page_size = 10,
                                     filters = "") {

cb_search_phenotypic_filters produces a table with multiple rows for each filter

When testing on the http://dev-gel.lifebit.ai/ instance, each row for the same filter contains one of its possible values in the possibleValues column:

> library(cloudos)
> cloudos_configure(base_url = "http://cohort-browser-dev-110043291.eu-west-1.elb.amazonaws.com/cohort-browser/", 
token = "...api token...",
team_id = "5f7c8696d6ea46288645a89f")


> cb_search_phenotypic_filters("genetic", cb_version="v1")
Total number of phenotypic filters found - 3
    display isMandatory                                                    possibleValues recruiterDescription            group
1  dropdown       FALSE                                                  Somatic, Somatic         Genetic Test                 
2  dropdown       FALSE                                                Germline, Germline         Genetic Test                 
3  dropdown       FALSE                                                      cfDNA, cfDNA         Genetic Test                 
4  checkBox       FALSE                                           Not known, Not known, 0                                      
5  checkBox       FALSE                     Li-Fraumeni syndrome, Li-Fraumeni syndrome, 1                                      
6  checkBox       FALSE Familial adenomatous polyposis, Familial adenomatous polyposis, 2                                      
7  checkBox       FALSE       Beckwith-Wiedemann syndrome, Beckwith-Wiedemann syndrome, 3                                      
8  checkBox       FALSE                                                   Other, Other, 4                                      
9  checkBox       FALSE                             Chromosomes/FISH, Chromosomes/FISH, 0                 NULL clinicalFeatures
10 checkBox       FALSE                                           Array CGH, Array CGH, 1                 NULL clinicalFeatures
11 checkBox       FALSE                         Fragile X Syndrome, Fragile X Syndrome, 2                 NULL clinicalFeatures
12 checkBox       FALSE           Single gene/panel testing, Single gene/panel testing, 3                 NULL clinicalFeatures
13 checkBox       FALSE                     Exome/genome testing, Exome/genome testing, 4                 NULL clinicalFeatures
14 checkBox       FALSE           Newborn screening testing, Newborn screening testing, 5                 NULL clinicalFeatures
   clinicalForm bucket500 bucket1000 bucket2500 bucket5000 bucket300 bucket10000       categoryPathLevel1
1    cancerForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE         Cancer diagnosis
2    cancerForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE         Cancer diagnosis
3    cancerForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE         Cancer diagnosis
4    cancerForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE         Cancer diagnosis
5    cancerForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE         Cancer diagnosis
6    cancerForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE         Cancer diagnosis
7    cancerForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE         Cancer diagnosis
8    cancerForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE         Cancer diagnosis
9        udForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE Undiagnosed disease form
10       udForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE Undiagnosed disease form
11       udForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE Undiagnosed disease form
12       udForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE Undiagnosed disease form
13       udForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE Undiagnosed disease form
14       udForm     FALSE      FALSE      FALSE      FALSE     FALSE       FALSE Undiagnosed disease form
               categoryPathLevel2  id instances                                 name type Sorting            valueType units coding
1                Genetic findings  57         1                         Genetic Test bars           Categorical single             
2                Genetic findings  57         1                         Genetic Test bars           Categorical single             
3                Genetic findings  57         1                         Genetic Test bars           Categorical single             
4               Genetic syndromes  52         1                    Genetic syndromes bars           Categorical single             
5               Genetic syndromes  52         1                    Genetic syndromes bars           Categorical single             
6               Genetic syndromes  52         1                    Genetic syndromes bars           Categorical single             
7               Genetic syndromes  52         1                    Genetic syndromes bars           Categorical single             
8               Genetic syndromes  52         1                    Genetic syndromes bars           Categorical single             
9  Previous Investigation results 113         1 Previous genetic and genomic testing bars         Categorical multiple             
10 Previous Investigation results 113         1 Previous genetic and genomic testing bars         Categorical multiple             
11 Previous Investigation results 113         1 Previous genetic and genomic testing bars         Categorical multiple             
12 Previous Investigation results 113         1 Previous genetic and genomic testing bars         Categorical multiple             
13 Previous Investigation results 113         1 Previous genetic and genomic testing bars         Categorical multiple             
14 Previous Investigation results 113         1 Previous genetic and genomic testing bars         Categorical multiple             
    description descriptionParticipantsNo link array descriptionStability descriptionCategoryID descriptionItemType
1  Genetic Test              Not provided         20                                                               
2  Genetic Test              Not provided         20                                                               
3  Genetic Test              Not provided         20                                                               
4             ]              Not provided          5                                                               
5             ]              Not provided          5                                                               
6             ]              Not provided          5                                                               
7             ]              Not provided          5                                                               
8             ]              Not provided          5                                                               
9                            Not provided         20                                                               
10                           Not provided         20                                                               
11                           Not provided         20                                                               
12                           Not provided         20                                                               
13                           Not provided         20                                                               
14                           Not provided         20                                                               
   descriptionStrata descriptionSexed orderPhenotype instance0Name instance1Name instance2Name instance3Name instance4Name
1                                                                                                                         
2                                                                                                                         
3                                                                                                                         
4                                                                                                                         
5                                                                                                                         
6                                                                                                                         
7                                                                                                                         
8                                                                                                                         
9                                                                                                                         
10                                                                                                                        
11                                                                                                                        
12                                                                                                                        
13                                                                                                                        
14                                                                                                                        
   instance5Name instance6Name instance7Name instance8Name instance9Name instance10Name instance11Name instance12Name
1                                                                                                                    
2                                                                                                                    
3                                                                                                                    
4                                                                                                                    
5                                                                                                                    
6                                                                                                                    
7                                                                                                                    
8                                                                                                                    
9                                                                                                                    
10                                                                                                                   
11                                                                                                                   
12                                                                                                                   
13                                                                                                                   
14                                                                                                                   
   instance13Name instance14Name instance15Name instance16Name
1                                                             
2                                                             
3                                                             
4                                                             
5                                                             
6                                                             
7                                                             
8                                                             
9                                                             
10                                                            
11                                                            
12                                                            
13                                                            
14
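Until the API response is reshaped, the rows could be collapsed to one per filter in R. A base-R sketch on a simplified, hypothetical data frame that keeps only the id, name and possibleValues columns from the output above, with possibleValues reduced to plain labels; collapse_possible_values() is a hypothetical name.

```r
# Hypothetical, simplified version of the table returned above
filters <- data.frame(
  id = c(57, 57, 57, 52, 52),
  name = c(rep("Genetic Test", 3), rep("Genetic syndromes", 2)),
  possibleValues = c("Somatic", "Germline", "cfDNA", "Not known", "Li-Fraumeni syndrome"),
  stringsAsFactors = FALSE
)

# One row per filter, with its possible values joined into one string
collapse_possible_values <- function(filters) {
  keys <- unique(filters[, c("id", "name")])
  keys$possibleValues <- vapply(keys$id, function(i) {
    paste(unique(filters$possibleValues[filters$id == i]), collapse = "; ")
  }, character(1))
  rownames(keys) <- NULL
  keys
}

collapse_possible_values(filters)
```

Keeping possibleValues as a list column instead of a pasted string would preserve the individual values for later steps; the string form is simply easier to print.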

Rename repo to cloudos

Overview

We had a vote and brainstorming session with the bioinfo team, taking into account best practices for naming R packages.
The name cloudos is the one we voted for.
In the future we could have a Python library named pycloudos, etc.

What needs to be done

Rename the repo from cloudos-R to cloudos

Why and more context:

tl;dr

R package names should not have dashes -
R package names should not use a mix of CAPS and lowercase letters A,a

Unfortunately, this means you can’t use either hyphens or underscores, i.e., '-' or '_', in your package name

Avoid using both upper and lower case letters: doing so makes the package name hard to type and even harder to remember. For example, I can never remember if it’s Rgtk2 or RGTK2 or RGtk2.
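The hard naming rule from the "Writing R Extensions" manual can be checked with a regular expression: only ASCII letters, digits and '.', at least two characters, starting with a letter and not ending in '.'. The mixed-case advice above is a style guideline and is not enforced here. A sketch with a hypothetical helper name:

```r
# TRUE if `name` satisfies the syntactic rule for R package names
is_valid_pkg_name <- function(name) {
  # perl = TRUE keeps the [A-Za-z] ranges strictly ASCII
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

is_valid_pkg_name(c("cloudos", "cloudos-R", "cloudos_R", "pycloudos", "x"))
# returns TRUE FALSE FALSE TRUE FALSE
```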

Slack link with relevant convo:
https://lifebit-biotech.slack.com/archives/C013UHE9MQQ/p1596185485021800

PR comment with best practices suggestions:
#1 (review)

Release cloudos 0.2.0

First release:

usethis::use_cran_comments()
Proof read Title: and Description:
Check that all exported functions have @returns and @examples
Check that Authors@R: includes a copyright holder (role 'cph')
Check licensing of included files
Review https://github.com/DavisVaughan/extrachecks

Prepare for release:

devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()

Submit to CRAN:

devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version()
Update install instructions in README

Create a mongo dump for use in CI for predictable assertions and testing

@eBsowka we were discussing the newly added functionality with @hms1.
The main struggle with assertions when using a demo environment is the volatility of the app and its collections.

In the past we have successfully connected to a MongoDB instance created on the fly in CI from mongo dumps. In this scenario we would need to go a step further and have an instance of CB running that we fully control; for each test we would

purge all available collections
restore the database from the saved collection dumps

This would reassure us that we are not breaking anything in the package without having to account for CB app unpredictable changes.

This is low priority for now, but keeping it as we have committed to ensure reproducibility for mongo related CI tests.

Apply column name filtering

cloudos/R/cb_extract_cohort.R

Line 92 in e240e8d

#"columns" = columns, # TODO

Based on

cloudos/R/cb_json.R

Line 79 in e240e8d

 # TODO work on column - At this point NO end-point that returns this information, there are cards 

Plot for phenotypic filters

As discussed earlier with Pablo, we need a plotting function (based on ggplot2) that shows all the filters in a single plot.

Something like -

plot_phenotypic_filters(cohort_object)

Output -

Output a better error when trying to create a cohort that already exists

Currently it looks like this:

> cb_create_cohort("ilya-r-test1", cb_version="v1")
Error in .cb_create_cohort_v1(cohort_name = cohort_name, cohort_desc = cohort_desc,  : 
  Bad Request (HTTP 400). Failed to create a cohort.
> cb_create_cohort("ilya-r-test2", cb_version="v2")
Error in .cb_create_cohort_v2(cohort_name = cohort_name, cohort_desc = cohort_desc,  : 
  Conflict (HTTP 409). Failed to create a cohort.
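A sketch of a friendlier message, assuming (from the outputs above) that CBv1 answers with HTTP 400 and CBv2 with HTTP 409 when the cohort name is already taken. cohort_create_error() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: map the HTTP status of a failed cohort-creation
# request to an actionable message. Status meanings are assumptions.
cohort_create_error <- function(status_code, cohort_name) {
  reason <- switch(as.character(status_code),
    "400" = "the request was rejected (a cohort with this name may already exist)",
    "409" = "a cohort with this name already exists",
    paste0("the server returned HTTP ", status_code)
  )
  sprintf("Failed to create cohort '%s': %s.", cohort_name, reason)
}
```

For a failed response it could be raised as stop(cohort_create_error(httr::status_code(r), cohort_name), call. = FALSE).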

Add options to add and remove columns from samples table

This can be added as a separate function or part of cb_get_samples_table()

Clearly report which CB version a cohort is created in

There is currently no clear way to find out which CB version a cohort was created in:

cb_create_cohort("test-400-desc", cohort_desc = "for testing", cb_version="v1")
Cohort created successfully.
Cohort ID:  60ec0c2f9dd71b22256fcfe0 
Cohort Name:  test-400-desc 
Cohort Description:  for testing 
Number of filters applied:  0

cb_apply_filter should give more useful message on success

When a CBv1 cohort is used in cb_apply_filter() the function messages with the number of participants in the newly updated cohort. With a CBv2 cohort the number of participants is not given in a message.

The CBv2 API does not provide a response with the updated number of participants for PUT /cohort-browser/v2/cohort/{cohort_id}. Instead cb_participant_count() can be used to get this info.

Additionally, it would be useful to provide a web address for the cohort in the output of this function (and of other functions like cb_create_cohort() and cb_load_cohort()).

potential bug in `cb_get_participants_table`?

While reviewing #80 I noticed a potential bug in cb_get_participants_table. Using the relevant branch:

library(cloudos)

cohortv2 <- cb_load_cohort("61f1bd4b3de81e52ec46d0ea")
cb_get_participants_table(cohortv2)

 Error: All elements must be size one, use `list()` to wrap.
x Element `f4i0a0` is of size 0.
Run `rlang::last_error()` to see where the error occurred.

It looks like the error is happening here:

cloudos/R/cb_cohort_extract.R

Lines 324 to 336 in a8b8767

  for (n in c(list(emptyrow), res$data)) {
    # important to change NULL to NA using .null_to_na_nested
    dta <- .null_to_na_nested(n)
    # change types within lists according to col_type
    for (name in names(dta)) {
      if (is.list(dta[[name]])) {
        type_func <- col_types[[name]]
        dta[[name]] <- list(type_func(dta[[name]]))
      }
    }
    dta <- tibble::as_tibble_row(dta)
    df_list <- c(df_list, list(dta))
  }

I am not familiar with this code, so someone with a better understanding should look properly, but it appears it can be fixed using something like:

  for (dta in c(list(emptyrow), res$data)) {
    # drop all NULL elements entirely (vapply also handles an empty record)
    dta[vapply(dta, is.null, logical(1))] <- NULL
    # change types according to col_type
    for (name in names(dta)) {
      type_func <- col_types[[name]]
      dta[[name]] <- type_func(dta[[name]])
    }
    dta <- tibble::as_tibble_row(dta)
    df_list <- c(df_list, list(dta))
  }
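The error points at a zero-length element (f4i0a0 is of size 0), which dropping NULLs alone may not cover if the value arrives as an empty list. An alternative sketch, assuming each record is a named list: replace any NULL or zero-length field with NA, so the column is kept and every field has size one. fill_empty_fields() is a hypothetical name.

```r
# Replace NULL and zero-length fields with NA so that every field has
# size one before the record is turned into a row.
fill_empty_fields <- function(record) {
  lapply(record, function(v) if (length(v) == 0) NA else v)
}
```

Inside the loop, dta <- fill_empty_fields(dta) would take the place of the NULL-dropping line; multi-valued fields still need wrapping in list(), as the original code does.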

Change the option name from `term` to `filter_name` in `cb_search_phenotypic_filters` function

Currently, the option name term is somewhat misleading. To make it clearer, we can rename it to filter_name to align with the UI -
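To avoid breaking existing code, the old argument could be accepted for a while with a deprecation warning. A minimal base-R sketch; search_filters_sketch() is a hypothetical stand-in for cb_search_phenotypic_filters(), whose real body is omitted. The lifecycle package's deprecate_warn() is an alternative for the warning itself.

```r
# Accept the old `term` argument, warn, and forward it to `filter_name`.
search_filters_sketch <- function(filter_name, term) {
  if (!missing(term)) {
    warning("`term` is deprecated; please use `filter_name` instead.", call. = FALSE)
    if (missing(filter_name)) filter_name <- term
  }
  filter_name  # the real function would query the API with this value
}
```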

Use credential from environment variable

Read credentials from a file ~/.cloudos/credential.json and convert them into environment variables.

Something like - Ref

github_pat <- function() {
  pat <- Sys.getenv('GITHUB_PAT')
  if (identical(pat, "")) {
    stop("Please set env var GITHUB_PAT to your github personal access token",
      call. = FALSE)
  }

  pat
}
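A sketch of the loading step, assuming credential.json is a flat JSON object whose keys are the environment variable names to set (e.g. a hypothetical CLOUDOS_TOKEN) and that the jsonlite package is installed; load_cloudos_credentials() is a hypothetical name.

```r
# Read a flat JSON object of credentials and export each key as an
# environment variable. Key names and file layout are assumptions.
load_cloudos_credentials <- function(path = "~/.cloudos/credential.json") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("Credential file not found: ", path, call. = FALSE)
  }
  creds <- jsonlite::read_json(path, simplifyVector = TRUE)
  # Coerce every value to a single string, keeping the key names
  creds <- vapply(creds, as.character, character(1))
  do.call(Sys.setenv, as.list(creds))
  invisible(names(creds))
}
```

A getter in the style of github_pat() could then read the value with Sys.getenv("CLOUDOS_TOKEN") and stop with a helpful message if it is empty.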

nested list phenotypes break cb_get_phenotype_statistics()

No else block

#9 (comment)

  if (!r$status_code == 200) {
    stop("Something went wrong. Not able to create a cohort")
  } else {
    message("Cohort named ", cohort_name, " created successfully. Bellow are the details")
    res <- httr::content(r)
    # into a dataframe
    res_df <- do.call(rbind, res)
    colnames(res_df) <- "details"
    return(res_df)
  }
