scone's Introduction

SCONE

Single-Cell Overview of Normalized Expression data

SCONE (Single-Cell Overview of Normalized Expression) is a package for single-cell RNA-seq data quality control (QC) and normalization. This data-driven framework uses summaries of expression data to assess the efficacy of normalization workflows.

Install from Bioconductor

We recommend installing the package via Bioconductor.

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("scone")

Install from GitHub

This is usually not recommended. To install the development version of the package, use:

BiocManager::install("YosefLab/scone")

Install for R 3.3

You can download the latest release of SCONE for R 3.3 here. This is useful only for reproducing old results.

scone's People

Contributors

allonw, ckmah, drisso, henrikbengtsson, hpages, jdblischak, jwokaty, kayla-morrell, link-ny, lshep, mbcole, nturaga, vobencha

scone's Issues

Apply weights to scoring.

Weighted PCA should be applied in evaluation step. One weighting scheme should be applied to evaluate all methods.

Update scone output

evaluation should be called metrics
metrics * scale factors should be called scores (greater = better)
and the mean of the scores is the way we sort the methods
Update manual page to reflect this.
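
The naming proposed above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the method names, metric names, metric values, and the convention of +1 / -1 scale factors are all hypothetical and are not taken from the package.

```r
# Hypothetical metrics: rows are normalization methods, columns are metrics.
metrics <- matrix(
  c(0.30, 0.10,
    0.50, 0.40,
    0.20, 0.05),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("none", "fq", "tmm"),
                  c("higher_is_better", "lower_is_better"))
)

# Assumed scale factors: +1 when a larger metric is better, -1 otherwise.
scale_factors <- c(higher_is_better = 1, lower_is_better = -1)

# scores = metrics * scale factors, so that greater is always better.
scores <- sweep(metrics, 2, scale_factors[colnames(metrics)], `*`)

# Methods are sorted by the mean of their scores, best first.
mean_score <- rowMeans(scores)
ranking <- names(sort(mean_score, decreasing = TRUE))
```

Keeping `metrics` and `scores` as separate objects lets the raw values be reported unchanged while the ranking uses only the sign-adjusted scores.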

do not compare bio and batch

I am not familiar with R but I think even if nlevels are equal, you still cannot compare bio and batch.

    if(nlevels(bio)==nlevels(batch)) {
      if(all(bio==batch)) {
        stop("Biological conditions and batches are confounded. They cannot both be included in the model, please set at least one of 'adjust_bio' and 'adjust_batch' to 'no.'")
      }
    }
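
A small sketch of why the element-wise comparison is problematic, and one possible alternative. In base R, comparing two factors with `==` is an error when their level sets differ, which is the usual case for biological conditions versus batch labels. The helper `is_confounded` below is hypothetical and only illustrates a label-independent check based on a cross-tabulation.

```r
# Two factors with the same number of levels but different labels.
bio   <- factor(c("ctrl", "ctrl", "treat", "treat"))
batch <- factor(c("b1", "b1", "b2", "b2"))

# Direct comparison fails because the level sets differ;
# capture the error message instead of stopping.
cmp <- tryCatch(all(bio == batch), error = function(e) conditionMessage(e))

# Hypothetical helper: the two factors are confounded when every bio level
# occurs in exactly one batch and every batch contains exactly one bio level.
is_confounded <- function(bio, batch) {
  tab <- table(bio, batch) > 0
  all(rowSums(tab) == 1) && all(colSums(tab) == 1)
}

confounded <- is_confounded(bio, batch)

# A crossed design, where each condition appears in both batches.
crossed <- is_confounded(bio, factor(c("b1", "b2", "b1", "b2")))
```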

Zero Likelihood in ZINB

Zero-Inflated Negative Binomial fails at initialization for Pollen et al. Data Set: (for yoseflab) /data/yosef2/Published_Data/Fluidigm/filt_expr.Rdat

fnr_obj = estimate_zinb(verbose = TRUE,Y = e)
Error in while (abs((ll_old - ll_new)/ll_old) > 1e-04 & iter < maxiter) { : 
  missing value where TRUE/FALSE needed

Investigating this error shows that ll_old = -Inf.
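
The error message follows from how R treats infinite values in the loop condition. The sketch below reproduces the arithmetic with a made-up `ll_new` and shows one possible guard. The helper `has_converged` is hypothetical, and it does not address why the initial likelihood is zero in the first place.

```r
ll_old <- -Inf
ll_new <- -1234.5  # made-up value for illustration

# (-Inf - x) / -Inf is -Inf / -Inf, which is NaN.
rel_change <- abs((ll_old - ll_new) / ll_old)

# NaN > 1e-04 is NA, and while (NA) raises
# "missing value where TRUE/FALSE needed".
cond <- rel_change > 1e-04 & TRUE

# Hypothetical guard: a non-finite log-likelihood counts as "not converged"
# instead of propagating NA into the loop condition.
has_converged <- function(ll_old, ll_new, tol = 1e-04) {
  if (!is.finite(ll_old) || !is.finite(ll_new)) return(FALSE)
  isTRUE(abs((ll_old - ll_new) / ll_old) <= tol)
}
```

With such a guard the loop condition would become `!has_converged(ll_old, ll_new) && iter < maxiter`, which is always a single TRUE or FALSE.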

Better use of params

If params is passed as an argument, the user should be able to just pass param instead of passing the correct parameters again.

Complains about confounding even though adjust_batch="no" and adjust_bio="no"

  if(!is.null(bio) & !is.null(batch)) {
    tab <- table(bio, batch)
    if(all(colSums(tab>0)==1)){
      if(nlevels(bio) == nlevels(batch)) {
        stop("Biological conditions and batches are confounded. They cannot both be included in the model, please set at least one of 'adjust_bio' and 'adjust_batch' to 'no.'")
      } else {
        nested <- TRUE
      }
    }
  }
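
One way to avoid the spurious error is to run the confounding check only when both terms would actually enter the model. The sketch below wraps the check above in a hypothetical function `check_design`. It assumes `adjust_bio` and `adjust_batch` are character flags where "no" means the term is excluded, and it shortens the error message.

```r
# Hypothetical wrapper around the confounding check shown above.
check_design <- function(bio, batch, adjust_bio = "no", adjust_batch = "no") {
  nested <- FALSE
  # Only inspect the design when both bio and batch are adjusted for.
  if (adjust_bio != "no" && adjust_batch != "no" &&
      !is.null(bio) && !is.null(batch)) {
    tab <- table(bio, batch)
    if (all(colSums(tab > 0) == 1)) {
      if (nlevels(bio) == nlevels(batch)) {
        stop("Biological conditions and batches are confounded.")
      }
      nested <- TRUE
    }
  }
  nested
}

bio   <- factor(c("ctrl", "ctrl", "treat", "treat"))
batch <- factor(c("b1", "b1", "b2", "b2"))

# No adjustment requested: the confounded design is not an error.
ok <- check_design(bio, batch, adjust_bio = "no", adjust_batch = "no")

# Both adjustments requested: the same design is rejected.
msg <- tryCatch(
  check_design(bio, batch, adjust_bio = "yes", adjust_batch = "yes"),
  error = function(e) conditionMessage(e)
)
```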

Custom adjustment

Add a way for the user to use their adjustment function rather than our linear model (e.g., ComBat).

R CMD check gives warning

* checking for missing documentation entries ... WARNING
Undocumented data sets:
  ‘cell_cycle_Tirosh’ ‘house_keeping_mouse_TitleCase’ ‘macklis_markers’

scone() documentation error - bio argument

In the scone function documentation it says under the bio argument: "Ignored, if adjust_bio=0."

However, setting adjust_bio=0 gives the following error: "Error in match.arg(adjust_bio) : 'arg' must be NULL or a character vector"
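
The error comes from `match.arg`, which accepts only character values. A minimal reproduction is sketched below. The function `f` is hypothetical, and its choice vector `c("no", "yes", "force")` is an assumption made for illustration rather than the documented signature of `scone()`.

```r
# Hypothetical function with a character-valued option resolved by match.arg.
f <- function(adjust_bio = c("no", "yes", "force")) {
  match.arg(adjust_bio)
}

res_default <- f()      # first choice is used when the argument is missing
res_no      <- f("no")  # character input is accepted

# Numeric input is rejected; capture the message instead of stopping.
res_zero <- tryCatch(f(0), error = function(e) conditionMessage(e))
```

Under this assumption, the documentation would need to say "Ignored, if adjust_bio='no'" rather than "adjust_bio=0".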

Stabilize PAM param interface

The user currently selects a range of k for pam clustering (passed to fpc::pamk) but the resulting PAM_SIL score is based on complex considerations of that range. A few options:

  • Have the user select one value of eval_kclust.
  • (Optional) automatic eval_kclust selection.
  • Permit a range of eval_kclust, but limit options and/or wrap pamk more effectively.

vignette

  • input: counts + QC matrix
  • gene + sample filtering
  • scone
  • exploration / visualization : heatmap + bipolar
  • extract normalized matrix

error when eval=TRUE and only one method

If there is only one normalization method to "compare", scone won't work because it tries to apply(., 1, .) to a vector (line 325 of scone_main.R in develop).

I don't see any obvious reason why one should evaluate only one normalization, but it should be fixed.
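
This kind of failure typically appears when a one-row matrix is subset and silently dropped to a vector. The sketch below shows the generic pattern with made-up data and the usual `drop = FALSE` remedy. It is not a copy of the code at the line mentioned above.

```r
# Made-up evaluation matrix with a single normalization method (one row).
scores <- matrix(c(0.2, 0.7, 0.1), nrow = 1,
                 dimnames = list("fq", c("m1", "m2", "m3")))

dropped <- scores[1, ]                # dimensions are dropped: plain vector
kept    <- scores[1, , drop = FALSE]  # still a 1 x 3 matrix

# apply() needs a dim attribute, so it fails on the vector.
err <- tryCatch(apply(dropped, 1, mean),
                error = function(e) conditionMessage(e))

# With drop = FALSE the same call works for a single method.
ok <- apply(kept, 1, mean)
```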

Handling of ties in FQ_FN

Add an option to have ties=TRUE (perhaps it should be the default).

Alternatively, it could be a different function FQ_T_FN, to make it easier to add both to scone comparison.

Any preference @mbcole ?
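
For discussion, here is a rough sketch of what a `ties` option could mean for full-quantile normalization. The function `fq_sketch` is hypothetical and is not the package's FQ_FN. It assumes a genes-by-cells matrix with at least two rows and no missing values. With `ties = TRUE`, tied values within a cell receive the same normalized value, obtained by interpolating the reference distribution at their average rank.

```r
fq_sketch <- function(x, ties = TRUE) {
  # Reference distribution: mean of the sorted columns, rank by rank.
  ref <- rowMeans(apply(x, 2, sort))
  idx <- seq_along(ref)
  out <- apply(x, 2, function(col) {
    if (ties) {
      # Tied values share an average rank; interpolate the reference there.
      r <- rank(col, ties.method = "average")
      approx(idx, ref, xout = r)$y
    } else {
      # Ties are broken by order of appearance.
      ref[rank(col, ties.method = "first")]
    }
  })
  dimnames(out) <- dimnames(x)
  out
}

# Cell "a" has two tied zeros.
x <- cbind(a = c(0, 0, 3), b = c(1, 2, 6))
with_ties    <- fq_sketch(x, ties = TRUE)
without_ties <- fq_sketch(x, ties = FALSE)
```

With `ties = FALSE` the two zeros in cell "a" end up with different normalized values, which matters for zero-heavy single-cell data.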

Adding RLE metric to evaluation

Copying from an email conversation with @mbcole (quotes are by @mbcole)

On a different topic, I was wondering if we should add one more evaluation metric to scone. I was helping Sandrine run scone on Russell's data, and as in the Fluidigm data, no scaling often ranks higher than FQ, TMM and DESeq.

I think this is not what we want because when you look at box plots / RLE plots, you clearly see that without a scaling step the distributions are far from aligned. What do you think of a metric that compares the medians of the count distributions of each cell and penalizes methods for which the medians are very far from each other?

This definitely sounds like a good thing to keep track of - did you get a sense of why scale-free methods were scoring higher - which scores were inflating their approval? I have only 2 concerns with adding this one in: 1) This score is tailored for normalization by the median 2) Is the median always non-zero? I know that in some of the data sets we’ve had median zero across many samples.

I think these are not big issues because 1) we usually don't consider median normalization in the comparison; 2) we can use the median of the RLE distribution rather than that of the counts.

I will implement this and see if it's useful at all. If not, we can get rid of it later.
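
A minimal sketch of the idea, using the median of the RLE distribution as suggested above. The function names, the pseudocount, and the choice of the mean squared median as the penalty are assumptions for illustration. The input is assumed to be a genes-by-cells matrix of non-negative expression values.

```r
# Per-cell median of the relative log expression (RLE).
rle_medians <- function(expr, pseudocount = 1) {
  log_expr <- log(expr + pseudocount)
  # RLE: subtract each gene's median log expression across cells.
  gene_med <- apply(log_expr, 1, median)
  rle <- sweep(log_expr, 1, gene_med, "-")
  apply(rle, 2, median)
}

# Hypothetical penalty: large when per-cell RLE medians are far from zero.
rle_penalty <- function(expr) mean(rle_medians(expr)^2)

# Three identical cells are perfectly aligned.
aligned <- matrix(rep(c(10, 20, 30), 3), nrow = 3)

# Same data, but the third cell is scaled by 10 (e.g. deeper sequencing).
unscaled <- aligned
unscaled[, 3] <- unscaled[, 3] * 10

p_aligned  <- rle_penalty(aligned)
p_unscaled <- rle_penalty(unscaled)
```

Because the RLE is centered per gene, the penalty stays defined even when many raw counts are zero, which addresses the concern about zero medians.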

develop branch: R CMD check fails

R CMD check on the develop branch fails with an error:

[HB-X201]{hb}: R CMD check scone_0.0.2.tar.gz
* using log directory 'C:/Users/hb/braju.com.R/_GITHUB_forks/scone.Rcheck'
* using R version 3.2.4 Patched (2016-03-10 r70306)
* using platform: x86_64-w64-mingw32 (64-bit)
* using session charset: ISO8859-1
* checking for file 'scone/DESCRIPTION' ... OK
* this is package 'scone' version '0.0.2'
* checking CRAN incoming feasibility ... NOTE
Maintainer: 'Michael Cole <[email protected]>'
New submission

Strong dependencies not in mainstream repositories:
  scde

The Title field should be in title case, current version then in title case:
'Single Cell Overview of Normalized Expression data'
'Single Cell Overview of Normalized Expression Data'

The Description field should not start with the package name,
  'This package' or similar.

The Date field is not in ISO 8601 yyyy-mm-dd format.
* checking package namespace information ... OK
* checking package dependencies ... ERROR
Packages required but not available:
  'EDASeq' 'RUVSeq' 'diptest' 'fpc' 'mixtools' 'scde'

Namespace dependency not required: 'clusterCells'

See section 'The DESCRIPTION file' in the 'Writing R Extensions' manual.
* DONE

Status: 1 ERROR, 1 NOTE

PS. I'd like to suggest that the package version of the develop branch use the suffix -9000 (e.g. 0.0.2-9000) so that it is clear from the version itself that develop is being used. This style has seen a fair bit of use recently (Hadley-verse, of course).

Prepare data

Cf. scRNAseq package

  • data processing
  • metadata for running scone and evaluating performance

bug in scone_eval when batch is NULL

if( !is.null(batch) | !any(!is.na(batch)) ){
  KNN_BATCH = mean(attributes(knn(train = proj[!is.na(batch),],test = proj[!is.na(batch),],cl = batch[!is.na(batch)], k = eval_knn,prob = TRUE))$prob)
}else{
  KNN_BATCH = NA
}

fails when batch=NULL because !any(!is.na(NULL)) is TRUE
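
The logic of the condition can be checked in isolation. The sketch below evaluates the original test for `batch = NULL` and proposes one possible replacement. The helper name `has_batch` is hypothetical.

```r
batch <- NULL

# Original condition: is.na(NULL) is logical(0), any(logical(0)) is FALSE,
# so the right-hand side is TRUE and the KNN branch is entered.
original <- !is.null(batch) | !any(!is.na(batch))

# Possible fix: require batch to exist AND contain a non-missing label.
has_batch <- function(batch) {
  !is.null(batch) && any(!is.na(batch))
}

fixed_null   <- has_batch(NULL)
fixed_all_na <- has_batch(c(NA, NA))
fixed_ok     <- has_batch(factor(c("b1", NA, "b2")))
```

With this helper the branch would read `if (has_batch(batch)) { ... } else { KNN_BATCH = NA }`.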

PAM Stability error

Line 144 of R/scone_eval.R is:

submat = subsampleClustering(proj, k=k)

It should be

submat = subsampleClustering(t(proj), k=k)

because the subsampleClustering documentation says it needs samples in columns.

R version documentation

It seems that R >= 3.3 is required for the package. If this is true, it would be helpful to state that in the installation instructions.

Have a great day!
