baderlab / scclustviz

Explore and share your scRNAseq clustering results

Home Page: https://baderlab.github.io/scClustViz/

License: MIT License

R 100.00% HTML 0.01%

r ui scrna-seq scrnaseq scrna-seq-analysis single-cell shiny differential-expression clustering-evaluation

scclustviz's Introduction

scClustViz

An interactive R Shiny tool for visualizing single-cell RNAseq clustering results from common analysis pipelines. It has two main goals: (A) to help select a biologically appropriate resolution or K from clustering results by assessing differential expression between the resulting clusters; and (B) to help annotate cell types and identify marker genes.
See our paper for details!

Example Output

Before installing a package it's always nice to see what it is. See how we share our published single-cell RNAseq datasets online using scClustViz here

Usage

Installation

Install scClustViz using devtools:

# install devtools
install.packages("devtools")

# install scClustViz
devtools::install_github("BaderLab/scClustViz")

# install presto for 1000x faster differential expression testing (optional)
devtools::install_github("immunogenomics/presto")

Common installation challenges

If you're on Linux and getting errors running devtools::install_github, make sure RCurl is working - you might need to install libcurl4-openssl-dev.
If you're on a Mac and having trouble installing presto, make sure you have the Xcode developer tools installed, since it requires Rcpp to compile the C++ backend.
If you're trying to save figures as .pdf or .eps and running into problems, your computer is probably missing the cairo graphics library. You can check this by running capabilities("cairo").
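
As a small sketch of that last check, the figure format could be chosen based on whether cairo is available. The variable names `has_cairo` and `img_type` are illustrative and not part of scClustViz, and the fallback assumes PNG output works without cairo, which may not hold on every system.

```r
# capabilities("cairo") returns a named logical: TRUE if this R build
# has cairo graphics support.
has_cairo <- isTRUE(unname(capabilities("cairo")))

# Illustrative variable (an assumption, not a scClustViz setting):
# fall back to PNG when cairo is missing.
img_type <- if (has_cairo) "pdf" else "png"
print(img_type)
```

The result could then be passed as the imageFileType argument of runShiny.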

Basic Usage

Following normalization, dimensionality reduction (including a 2D cell embedding), and clustering using a workflow of your choice, scClustViz can be used to do differential expression testing (using the Wilcoxon rank-sum test) to both assess different clustering solutions and explore your results. First, run the DE testing as follows:

library(scClustViz)

# if using Seurat, this regex can grab 
# the metadata columns representing cluster results:
your_cluster_columns <- grepl("res[.0-9]+$",
                              names(getMD(your_Seurat_object)))
your_cluster_results <- getMD(your_Seurat_object)[,your_cluster_columns]


sCVdata_list <- CalcAllSCV(
  inD=your_scRNAseq_data_object,
  clusterDF=your_cluster_results,
  assayType=NULL, #specify assay slot of data
  DRforClust="pca",#reduced dimensions for silhouette calc
  exponent=exp(1), #log base of normalized data
  pseudocount=1,
  DRthresh=0.1, #gene filter - minimum detection rate
  testAll=F, #stop testing clusterings when no DE between clusters
  FDRthresh=0.05,
  calcSil=T, #use cluster::silhouette to calc silhouette widths
  calcDEvsRest=T,
  calcDEcombn=T
)

save(your_scRNAseq_data_object,sCVdata_list,
     file="for_scClustViz.RData")
# This file can now be shared so anyone 
# can view your results with the Shiny app!
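
To see what the cluster-column regex in the setup code selects, here is a minimal sketch on a toy metadata table. The data frame `toy_md` and its column names are made up for illustration (they only mimic Seurat-style metadata), and converting the selected columns to factors is our assumption about a sensible precaution, since CalcSCV is documented above as taking a factor of cluster assignments.

```r
# Toy stand-in for getMD(your_Seurat_object): two clustering columns
# and two non-cluster columns (all names here are hypothetical).
toy_md <- data.frame(
  orig.ident = factor(c("a", "a", "b", "b")),
  nCount_RNA = c(1200, 950, 3100, 2800),
  RNA_snn_res.0.2 = c(0, 0, 1, 1),
  RNA_snn_res.0.4 = c(0, 1, 2, 2),
  check.names = FALSE
)

# Same regex as above: names ending in "res" followed by digits/dots.
cluster_cols <- grepl("res[.0-9]+$", names(toy_md))

# drop = FALSE keeps a data frame even if only one column matches.
toy_clusters <- toy_md[, cluster_cols, drop = FALSE]

# Make sure every selected column is a factor, not a numeric vector.
toy_clusters[] <- lapply(toy_clusters, factor)

print(names(toy_clusters))
print(sapply(toy_clusters, is.factor))
```

Passing the whole metadata table instead of only the matched columns would also hand non-cluster columns such as nCount_RNA to the DE testing, so it is worth printing the selected names before running CalcAllSCV.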

Once the setup step above has been run and its output saved, you can explore the data in the interactive Shiny interface by simply pointing it to the saved file:

# Let's assume this is data from an embryonic mouse cerebral cortex:
# (This is the call wrapped by MouseCortex::viewMouseCortex("e13"))
runShiny(
  filePath="for_scClustViz.RData",
  
  outPath="./",
  # Save any further analysis performed in the app to the
  # working directory rather than library directory.
  
  annotationDB="org.Mm.eg.db",
  # This is an optional argument, but will add annotations.
  
  cellMarkers=list("Cortical precursors"=c("Mki67","Sox2","Pax6",
                                           "Pcna","Nes","Cux1","Cux2"),
                   "Interneurons"=c("Gad1","Gad2","Npy","Sst","Lhx6",
                                    "Tubb3","Rbfox3","Dcx"),
                   "Cajal-Retzius neurons"="Reln",
                   "Intermediate progenitors"="Eomes",
                   "Projection neurons"=c("Tbr1","Satb2","Fezf2",
                                          "Bcl11b","Tle4","Nes",
                                          "Cux1","Cux2","Tubb3",
                                          "Rbfox3","Dcx")
  ),
  # This is a list of canonical marker genes per expected cell type.
  # The app uses this list to automatically annotate clusters.
  
  imageFileType="png"
  #Set the file format of any saved figures from the app.
)

Use Your Own Cluster Names

scClustViz has a very basic cluster annotation method built into runShiny implemented by the labelCellTypes function. It uses a user-defined list of marker genes per expected cell type to assign labels to each cluster. The median gene expression for each set of marker genes is calculated for each cluster, and clusters are assigned the label of the highest-ranking marker gene set. This is provided as a convenience function, as there are many more sophisticated cluster annotation methods in the literature, and expert curation is probably still the gold standard. With that in mind, you can assign your own labels to clusters for any cluster solution (the same labels can be assigned to multiple clusters).

levels(Clusters(sCVdata_list$chosen_cluster_solution))
# Your cluster labels should be in the same order as the existing cluster levels

your_cluster_names <- c("Cell type zero",
                        "Cell type one",
                        "Third cell type",
                        "Cell type 3 (thanks Seurat)",
                        "Last cell type (4,5,who knows?)")
ClusterNames(sCVdata_list$chosen_cluster_solution) <- your_cluster_names

save(your_scRNAseq_data_object,sCVdata_list,
     file="for_scClustViz.RData") 
# ^ Don't forget to save!
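
The marker-based labelling described at the start of this section can be illustrated with a short base-R sketch. This is not the labelCellTypes implementation, whose details (such as how ties are handled) may differ. The matrix `clust_expr` and its values are invented, and it is assumed to hold one summary expression value per gene per cluster.

```r
# Hypothetical per-cluster expression summary (genes x clusters).
clust_expr <- matrix(
  c(5, 0, 0,   # Sox2
    4, 1, 0,   # Pax6
    0, 6, 1,   # Gad1
    1, 5, 0,   # Gad2
    0, 0, 7),  # Reln
  ncol = 3, byrow = TRUE,
  dimnames = list(c("Sox2", "Pax6", "Gad1", "Gad2", "Reln"),
                  c("0", "1", "2"))
)

markers <- list("Cortical precursors" = c("Sox2", "Pax6"),
                "Interneurons" = c("Gad1", "Gad2"),
                "Cajal-Retzius neurons" = "Reln")

# Median expression of each marker set in each cluster
# (rows = marker sets, columns = clusters).
set_score <- sapply(colnames(clust_expr), function(cl) {
  sapply(markers, function(genes) median(clust_expr[genes, cl]))
})

# Label each cluster with its top-scoring marker set.
toy_labels <- rownames(set_score)[apply(set_score, 2, which.max)]
names(toy_labels) <- colnames(set_score)
print(toy_labels)
```

A vector like `toy_labels`, ordered by cluster level, has the same shape as the `your_cluster_names` vector assigned with ClusterNames above.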

Iterative Clustering With scClustViz

Incorporating the scClustViz cluster assessment metric into your analysis pipeline is simply a matter of running the differential expression testing after every clustering run, instead of post-hoc. This allows you to systematically increase the resolution or K parameter of the clustering algorithm until statistically significant differential expression between nearest neighbour clusters is lost. An example using the Seurat(v2) clustering method is shown here.

DE_bw_clust <- TRUE
seurat_resolution <- 0
sCVdata_list <- list()

while(DE_bw_clust) {
  seurat_resolution <- seurat_resolution + 0.2
  # ^ Iteratively incrementing resolution parameter

  your_seurat_obj <- Seurat::FindClusters(your_seurat_obj,
                                          resolution=seurat_resolution)
  # ^ Calculate clusters using method of choice.
  
  if (length(levels(your_seurat_obj@ident)) <= 1) { next } 
  # ^ Only one cluster was found, need to bump up the resolution!
  
  if (length(sCVdata_list) >= 1) {
    temp_cl <- length(levels(Clusters(sCVdata_list[[length(sCVdata_list)]])))
    if (temp_cl == length(levels(your_seurat_obj@ident))) { 
      temp_cli <- length(levels(interaction(
        Clusters(sCVdata_list[[length(sCVdata_list)]]),
        your_seurat_obj@ident,
        drop=T
      )))
      if (temp_cli == length(levels(your_seurat_obj@ident))) { 
        next 
      }
    }
  }
  # ^ if clustering results are identical to previous, move on.

  curr_sCVdata <- CalcSCV(
    inD=your_seurat_obj,
    cl=your_seurat_obj@ident, #factor containing cluster assignments
    assayType=NULL, #specify assay slot of data
    DRforClust="pca", #reduced dimensions for silhouette calc
    exponent=exp(1), #log base of normalized data
    pseudocount=1,
    DRthresh=0.1, #gene filter - minimum detection rate
    calcSil=T, #use cluster::silhouette to calc silhouette widths
    calcDEvsRest=T,
    calcDEcombn=T
  )

  DE_bw_NN <- sapply(DEneighb(curr_sCVdata,0.05),nrow)
  # ^ counts # of DE genes between neighbouring clusters at 5% FDR

  if (min(DE_bw_NN) < 1) { DE_bw_clust <- FALSE }
  # ^ If no DE genes between nearest neighbours, don't loop again.

  sCVdata_list[[paste0("res.",seurat_resolution)]] <- curr_sCVdata
  # Add sCVdata object to list with an appropriate name.
}

save(your_seurat_obj,sCVdata_list,
     file="for_scClustViz.RData")

runShiny(filePath="for_scClustViz.RData")
# ^ see ?runShiny for detailed argument list
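
The "identical to previous" check inside the loop is the least obvious step, so here is a self-contained sketch of the same logic in base R. The function name `same_partition` and the toy factors are ours, not part of scClustViz. The idea is that two clusterings with the same number of clusters are the same partition exactly when the number of observed (previous, new) label pairs equals that number of clusters.

```r
# Returns TRUE when two clusterings of the same cells define the same
# partition, even if the cluster labels differ.
same_partition <- function(prev_cl, new_cl) {
  prev_cl <- factor(prev_cl)   # factor() also drops unused levels
  new_cl <- factor(new_cl)
  if (nlevels(prev_cl) != nlevels(new_cl)) return(FALSE)
  # Count observed (prev, new) label pairs; this equals the cluster
  # count only when each previous cluster maps onto one new cluster.
  n_pairs <- nlevels(interaction(prev_cl, new_cl, drop = TRUE))
  n_pairs == nlevels(new_cl)
}

prev <- factor(c(0, 0, 1, 1, 2, 2))
relabelled <- factor(c("b", "b", "a", "a", "c", "c"))
changed <- factor(c(0, 1, 1, 1, 2, 2))

print(same_partition(prev, relabelled))
print(same_partition(prev, changed))
```

In the loop, the two arguments correspond to the Clusters() of the last stored sCVdata object and the current cluster assignments of the Seurat object.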

Use Your Own Differential Expression Results

scClustViz uses the Wilcoxon rank-sum test for its differential expression testing. You can provide your own DE results from a testing method of your choice instead, skipping sCV's testing steps. In both CalcAllSCV and CalcSCV there are arguments calcDEvsRest and calcDEcombn, which can be set to FALSE to skip those differential expression calculations. You can then use DEvsRest(your_sCVdata_object) <- your_DE_dataframe_list and DEcombn(your_sCVdata_object) <- your_DE_dataframe_list to pass your results into the sCVdata objects. DEvsRest represents differential expression tests between each cluster and the remaining cells, and should be a named list of data frames where each name refers to the tested cluster (see ?CalcDEvsRest for details). DEcombn represents differential expression tests between all pairwise combinations of clusters, and should be a named list of data frames where each name refers to the cluster pair, with cluster names separated by "-" (see ?CalcDEcombn for details). In both cases, data frames must contain variables logGER (an effect size measure: gene expression ratio in log space, often referred to as logFC) and FDR (significance measure: false discovery rate). For DEcombn, they must also contain dDR (an effect size measure: difference in detection rate). An example using Seurat(v2) is shown here:

# One vs all testing ----
MAST_oneVSall <- FindAllMarkers(your_seurat_obj,
                                logfc.threshold=0,
                                min.pct=0.1,
                                test.use="MAST",
                                latent.vars="nUMI")
# ^ FindAllMarkers and CalcDEvsRest do equivalent comparisons 

names(MAST_oneVSall)[names(MAST_oneVSall) == "avg_logFC"] <- "logGER"
# ^ Effect size variable must be named 'logGER'
names(MAST_oneVSall)[names(MAST_oneVSall) == "p_val_adj"] <- "FDR"
# ^ Significance variable must be named 'FDR'

MAST_oneVSall_list <- sapply(levels(MAST_oneVSall$cluster),
                             function(X) {
                               temp <- MAST_oneVSall[MAST_oneVSall$cluster == X,]
                               rownames(temp) <- temp$gene
                               # ^ Rownames must be gene names.
                               return(temp)
                             },simplify=F)
# ^ Dataframe converted to list of dataframes per cluster

DEvsRest(your_sCV_obj) <- MAST_oneVSall_list
# ^ Slot MAST results into sCVdata object


# Pairwise testing ----
MAST_pw <- apply(combn(levels(your_seurat_obj@ident),2),2,
                 function(X) {
                   FindMarkers(your_seurat_obj,
                               ident.1=X[1],
                               ident.2=X[2],
                               logfc.threshold=0,
                               min.pct=0.1,
                               test.use="MAST",
                               latent.vars="nUMI")
                 })
# ^ Test DE between every pairwise combination of clusters
# equivalent to testing performed by CalcDEcombn
names(MAST_pw) <- apply(combn(levels(your_seurat_obj@ident),2),2,
                        function(X) paste(X,collapse="-"))
# ^ Names must be in "X-Y" format

for (i in names(MAST_pw)) {
  MAST_pw[[i]]$dDR <- MAST_pw[[i]]$pct.1 - MAST_pw[[i]]$pct.2
  # ^ Diff in detect rate (dDR) must be a variable in each dataframe
  names(MAST_pw[[i]])[names(MAST_pw[[i]]) == "avg_logFC"] <- "logGER"
  # ^ Effect size variable must be named 'logGER'
  names(MAST_pw[[i]])[names(MAST_pw[[i]]) == "p_val_adj"] <- "FDR"
  # ^ Significance variable must be named 'FDR'
  # Note: rownames of each dataframe must be gene names, 
  # but FindMarkers should already do this.
}
DEcombn(your_sCV_obj) <- MAST_pw
# ^ Slot MAST results into sCVdata object
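
Before slotting your own results into an sCVdata object, it can help to confirm they have the shape described above. The following is a minimal sketch of such a check. `check_DE_list` is a hypothetical helper, not part of scClustViz, and it only tests list names and column names. Whether scClustViz also requires a particular order of list elements is not checked here.

```r
# Hypothetical helper (not part of scClustViz): checks that a DE result
# list has the names and columns described in this section.
check_DE_list <- function(de_list, clusters, pairwise = FALSE) {
  required <- c("logGER", "FDR", if (pairwise) "dDR")
  expected <- if (pairwise) {
    # Cluster pairs named in "X-Y" format.
    apply(combn(clusters, 2), 2, paste, collapse = "-")
  } else {
    clusters
  }
  names_ok <- setequal(names(de_list), expected)
  cols_ok <- all(vapply(de_list,
                        function(df) all(required %in% names(df)),
                        logical(1)))
  names_ok && cols_ok
}

cl <- c("0", "1", "2")
toy_df <- data.frame(logGER = c(1.2, -0.4), FDR = c(0.01, 0.5),
                     dDR = c(0.3, -0.1), row.names = c("Sox2", "Gad1"))
toy_pw <- list("0-1" = toy_df, "0-2" = toy_df, "1-2" = toy_df)

print(check_DE_list(toy_pw, cl, pairwise = TRUE))       # all pairs present
print(check_DE_list(toy_pw[1:2], cl, pairwise = TRUE))  # one pair missing
```

With real data, `clusters` would be the cluster levels of the clustering being tested, and `de_list` would be a list such as MAST_oneVSall_list or MAST_pw from the example above.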

Data Packages

The following data packages can be used to explore the features of scClustViz. You can also follow the vignette below to build your own data package to easily share your analysed scRNAseq data with collaborators and the public.

Embryonic Mouse Cerebral Cortex

The data from the 2017 Cell Reports paper Developmental Emergence of Adult Neural Stem Cells as Revealed by Single-Cell Transcriptional Profiling by Yuzwa et al. are available to explore at our website or by installing the R package MouseCortex. These are DropSeq data from timepoints spanning neurogenesis and filtered for cortically-derived cells, processed on an earlier version of the pipeline outlined below (using scran for normalization and Seurat for clustering) and imported into scClustViz using the steps outlined above.

Install MouseCortex using devtools as follows:

# install devtools
install.packages("devtools")

# install MouseCortex (demo data from Yuzwa et al, Cell Reports 2017)
devtools::install_github("BaderLab/MouseCortex") 
# this takes a minute or two

# install mouse gene annotations from Bioconductor (optional)
# (biocLite has been replaced by BiocManager in current Bioconductor releases)
install.packages("BiocManager")
BiocManager::install("org.Mm.eg.db")

Then run the scClustViz Shiny app to view your dataset of choice! There's a wrapper function in the MouseCortex package that handles the call to scClustViz, so it's nice and simple. If you're interested, ?runShiny has example code showing the function call used by the wrapper function.

library(MouseCortex)
viewMouseCortex("e13")

Human Liver Atlas

The data from the 2018 Nature Communications paper Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations by MacParland et al. are available to explore at our website or by installing the R package HumanLiver. These are 10X Chromium data from the livers of 5 human donors, processed on the pipeline outlined below (using scran for normalization and Seurat for clustering) and imported into scClustViz using the steps outlined above.

Install HumanLiver using devtools as follows:

# install devtools
install.packages("devtools")

# install HumanLiver 
# (R data package for MacParland et al., Nat Commun 2018)
devtools::install_github("BaderLab/HumanLiver") 
# this takes a minute or two

# install human gene annotations from Bioconductor (optional)
# (biocLite has been replaced by BiocManager in current Bioconductor releases)
install.packages("BiocManager")
BiocManager::install("org.Hs.eg.db")

library(HumanLiver)
viewHumanLiver()

Make Your Own Data Package!

Building an R package is a relatively easy task thanks to RStudio and the roxygen2 and devtools packages. The following vignette will show you how to take your saved output from the scClustViz setup and share it as an R package on github as seen in the data packages above. It is entirely based on the invaluable book R packages by Hadley Wickham.
First, you must have generated your input file for the runShiny command in scClustViz by following the steps in the usage guide above.
Then, create a new project in RStudio, selecting "New directory" -> "R package" and making sure to check "Create a git repository". If you haven't already set up git/github in RStudio, check out this blogpost for an explanation. If you only want to make a package to share with colleagues, you can skip github and simply send them the bundled package when you're done.
Once you've opened your new package in RStudio, make sure to have both "Use devtools package functions" and "Generate documentation with Roxygen" selected under "Project Options" -> "Build Tools". Also, delete the existing NAMESPACE file, since Roxygen will create a new one when you build the package.
You're now ready to build your package. First, make a folder in the package directory called "inst", and put your input file for runShiny there. All files in "inst" become part of the root directory of the package after installation, so it's best to store your data in a folder within inst.

dir.create("inst/packageData/",recursive=T)
save(data_for_scClustViz,DE_for_scClustViz,
     file="inst/packageData/MyDataTitle.RData")

If you'd like a default resolution to load when the user views your data in scClustViz, now's the time to save that.

runShiny("inst/packageData/MyDataTitle.RData")

Save your selected cluster resolution as default in the app. It will be saved as inst/packageData/MyDataTitle_savedRes.RData. You will also see a file called inst/packageData/MyDataTitle_intro.md. This is a markdown file that stores the text displayed at the top of the scClustViz GUI. You can edit it to say what you want (perhaps a link to the paper the data is from, and maybe the abstract?).
Now all you need to do is write the wrapper function to call runShiny. Here is an example R script (overwrite R/HelloWorld.R) to save in the "R" directory of the package.

#' View MyData data in the scClustViz Shiny app
#'
#' A wrapper function to view the \code{MyData} dataset in the
#' \code{scClustViz} Shiny app.
#'
#' @param outPath Default = "./" (the working directory). Specify the 
#'   directory used to save/load any analysis files you generate while 
#'   exploring the \code{MyData} data.
#' @param imageFileType Default = "pdf". The file format of any figures
#'   saved from the app.
#'
#' @return The function causes the scClustViz Shiny GUI app to open in a
#'   separate window.
#'
#' @examples
#'   viewMyData()
#'
#' @seealso \url{https://baderlab.github.io/scClustViz} for information on
#'   \code{scClustViz}.
#'
#' @export

viewMyData <- function(outPath="./",imageFileType="pdf") {
  filePath <- system.file("packageData/MyDataTitle.RData",
                          package="MyDataPackage")
  cellMarkers <- list()
  # If you have a list of cell-type marker genes for your data,
  # add them here!
  
  # Change "org.Hs.eg.db" to the appropriate AnnotationDbi object for your 
  # data. This way if your user has the library installed, it will be used, 
  # otherwise it will be skipped without causing any errors.
  if (require("org.Hs.eg.db",quietly=T)) {
    annotationDB <- org.Hs.eg.db
    scClustViz::runShiny(filePath=filePath,
                         outPath=outPath,
                         cellMarkers=cellMarkers,
                         annotationDB=annotationDB,
                         imageFileType=imageFileType)

  } else {
    scClustViz::runShiny(filePath=filePath,
                         outPath=outPath,
                         cellMarkers=cellMarkers,
                         imageFileType=imageFileType)
  }
}

Now that you have a wrapper function, all that's left to do is fix up the DESCRIPTION file. The most important entries for functionality in the file are the following:

Suggests: org.Hs.eg.db
Imports: scClustViz
Remotes: BaderLab/scClustViz

Change "org.Hs.eg.db" to the appropriate AnnotationDbi library. This lets the user know that they would benefit from having it installed. More importantly, Imports: scClustViz tells R devtools to install scClustViz when installing your package. Since scClustViz isn't in CRAN, the line Remotes: BaderLab/scClustViz lets devtools know where to find it.
Now that everything's ready, use the "Install and Restart" button in RStudio or hit Ctrl+Shift+B to build and install the package locally. You should now be able to use the wrapper command to open scClustViz with your data. If you're happy with everything, it's time to push to github!
First you must create a new repository on github for your package. Then it's as simple as pushing your first commit (commands here are in the bash shell):

# Set the remote to the github account:
git remote add origin https://github.com/YourGithubAccount/MyDataPackage.git 

# Stage your directory
git add .

# Make your first commit
git commit -m "MyData is now an R package!"

# Push your first commit to github 
# (could be slow, since you're uploading data files)
git push -u origin master

Now all you need to do is edit the README file to tell the world how to install and run your package:

devtools::install_github("YourGithubAccount/MyDataPackage")
library(MyDataPackage)
viewMyData()

Citation

Innes BT and Bader GD. scClustViz – Single-cell RNAseq cluster assessment and visualization [version 2; peer review: 2 approved]. F1000Research 2019, 7:1522 (doi: 10.12688/f1000research.16198.2)

Contact

You can contact me for questions about this repo. For general scRNAseq questions, do what I do and ask the Toronto single-cell RNAseq working group on Slack!

scclustviz's Issues

DE by GUI fails - unselected set to NA, breaks indexing of sparse matrices

Error in intI: 'NA' indices are not (yet?) supported for sparse Matrices

Capitalization for embeddings needs fixing

If embedding names are capitalized, getEmb fails.

sce <- SingleCellExperiment(
  assays=list(counts=matrix(nrow=10,ncol=10),
              logcounts=matrix(nrow=10,ncol=10)),
  reducedDims=list(PCA=matrix(nrow=10,ncol=10),
                   TSNE=matrix(nrow=10,ncol=10))
)
seur <- as.Seurat(sce)

getEmb(seur,"TSNE") #fails
getEmb(seur,"tsne") #fails

spreadLabels2 for cell projection

Expand functionality to include situations like cell projection where repulsion from initial point isn't necessary.

Colour MA plot by FDR

runShiny not working

I feel like I have followed the example code for using your shiny app to a tee, yet runShiny is not working. I keep getting the following error when I call runShiny():

Error in FUN(X[[i]], ...) :
Unexpected input object. Missing single-cell data object.

Could you please help? I will attach my R script below. 'norm' is a Seurat v3 object. Louvain clustering has been done and the sc RNA assay data is normalized and scaled. Attached is the script and the file for 'norm'.

scClustViz.R.zip
norm.h5seurat.zip

Support visualizing other DR methods (UMAP, PCA, etc)

Include an argument in runShiny to designate the dimensions to use for 2D projection of cells. Maybe a pulldown menu?

Add testing and coverage reporting

See https://jef.works/blog/2019/02/17/automate-testing-of-your-R-package/

Memory doubling?

Concerned that when selDE inputs are present, the objects used in the function are assigned to their own memory slots which would functionally double (or more) the memory footprint. Need to take a look at this.

A question on using scClustViz on integrated data (Seurat)

Hi!

Satijalab(Seurat) recommend using the RNA assay when exploring markers and differential expression. But in their vignettes that do integration between datasets, or comparison between control and treatment, they use the "integrated" dataset assay for pca, neighbors, and clustering.

So as far as I understood this, scClustViz would run best with its iterative clustering using the integrated assay but for the marker analysis (which I find very useful in the shiny app) it should run on the RNA assay. How can I solve this?

I'll include the code I use for iterative clustering below. Perhaps there is a way to have assaytype set to integrated for clustering and then somehow setting marker analysis to use RNA-counts, but I'm not sure how to do this.

Thanks for all the great tools that you develop!

max_seurat_resolution <- 1.5 # Stop at the max
output_filename <- paste0("./data/our_data/clustviz_clustering/",currentjob,"_.RData")
FDRthresh <- 0.01 # FDR threshold for statistical tests
min_num_DE <- 1
seurat_resolution <- 0.15 # Starting resolution is this plus the jump value below.
seurat_resolution_jump <- 0.05

sCVdata_list <- list()
DE_bw_clust <- TRUE

while(DE_bw_clust) {
  if (seurat_resolution >= max_seurat_resolution) { break }
  seurat_resolution <- seurat_resolution + seurat_resolution_jump 
  # ^ iteratively incrementing resolution parameter 
  pan_seurat <- FindClusters(pan_seurat,resolution=seurat_resolution,verbose=F)
  message(" ")
  message("------------------------------------------------------")
  message(paste0("--------  res.",seurat_resolution," with ",
                 length(levels(Idents(pan_seurat)))," clusters --------"))
  message("------------------------------------------------------")
  if (length(levels(Idents(pan_seurat))) <= 1) { 
    message("Only one cluster found, skipping analysis.")
    next 
  } 
  # ^ Only one cluster was found, need to bump up the resolution!
  
  if (length(sCVdata_list) >= 1) {
    temp_cl <- length(levels(Clusters(sCVdata_list[[length(sCVdata_list)]])))
    if (temp_cl == length(levels(Idents(pan_seurat)))) { 
      temp_cli <- length(levels(interaction(
        Clusters(sCVdata_list[[length(sCVdata_list)]]),
        Idents(pan_seurat),
        drop=T
      )))
      if (temp_cli == length(levels(Idents(pan_seurat)))) { 
        message("Clusters unchanged from previous, skipping analysis.")
        next 
      }
    }
  }
  curr_sCVdata <- CalcSCV(
    inD=pan_seurat,
    assayType="RNA",
    assaySlot = "counts",
    cl=Idents(pan_seurat), 
    # ^ your most recent clustering results get stored in the Seurat "ident" slot
    exponent=NA, 
    pseudocount=NA,
    DRthresh=0.1,
    DRforClust="umap",
    calcSil=T,
    calcDEvsRest=T,
    calcDEcombn=T
  )
  
  DE_bw_NN <- sapply(DEneighb(curr_sCVdata,FDRthresh),nrow)
  # ^ counts # of DE genes between neighbouring clusters at your selected FDR threshold
  message(paste("Number of DE genes between nearest neighbours:",min(DE_bw_NN)))
  
  if (min(DE_bw_NN) < min_num_DE) { DE_bw_clust <- FALSE }
  # ^ If no DE genes between nearest neighbours, don't loop again.
  
  sCVdata_list[[paste0("res.",seurat_resolution)]] <- curr_sCVdata
}

save(sCVdata_list,pan_seurat,file=output_filename)

Is there a size limit?

Hi,
I'd like to know if there is a limit with the size of the single cell data matrix that can be installed.
I am asking as previously I tried a similar package that can install datasets as "packages", and load using the lazydb load R function:
satijalab/seurat-data#7

Briefly, there is a limitation for that method: a single-cell data matrix greater than 70K cells will not work, because when the data is bigger than 30k genes * 70k cells, it exceeds 2^31 and hits an error on the lazyload step (trying to build the index).

Thank you,
Shuoguo

Advice to adapt Iterative Clustering Workflow with Seurat v4.x (throws error)

Hi, I really like scClustViz' principle of using biologically relevant parameters to help select 'best' resolution - many thanks for developing and publishing clustViz!

I'm trying to adapt the Iterative Clustering workflow code (from README.md) to an object created with Seurat v4.4
It seems there are some differences in the (default) meta.data column names in the Seurat v4 object vs the Seurat version used in the published workflow (v2?) that create problems

Changing all occurrences of your_seurat_obj@ident to [email protected] works fine for the first iteration of the while{} loop. However the following code block (not executed on the first iteration) returns an error on the second iteration:

if (length(sCVdata_list) >= 1) {                                                                  
    temp_cl <- length(levels(Clusters(sCVdata_list[[length(sCVdata_list)]])))
    if (temp_cl == length(levels([email protected]))) {    ## throws ERROR
      temp_cli <- length(levels(interaction(
        Clusters(sCVdata_list[[length(sCVdata_list)]]),
        [email protected],
        drop=T
        )))
      if (temp_cli == length(levels([email protected]))) { 
        next 
        }
    }

At the point the error is thrown, the Seurat object's metadata slot contains:
SCT_snn_res.0.2 (resolution=0.2, created in the 1st iteration: 14 clusters in my example)
SCT_snn_res.0.4 (resolution=0.4, created in the 2nd iteration: 17 clusters in my example)

In the second line of the 'offending' codeblock (copied above), I do not understand what [email protected] refers to (its original version is seurat_resolution@ident, which is similarly unclear to me)

Would you be so kind as to explain the logic of the two if conditions in the code block so that I can try to find the appropriate data from the Seurat object? Or ideally, let us know which Seurat v4 data should be used ;-)

Many thanks in advance!

Embrace Shiny modular design

See https://shiny.rstudio.com/articles/modules.html

Incorporate cluster specificity measures not based on differential expression

https://constantamateur.github.io/2020-04-10-scDE/ and https://github.com/mahmoudibrahim/genesorteR describe methods to measure cluster specificity of gene expression outside of typical DE tests. Thought this might be relevant for scclustviz

ERROR: dependency ‘TeachingDemos’

Hi,
Thanks for the excellent work and data availability.
I'm getting the following error regarding a TeachingDemos dependency.
Can you please clarify where this package is located or an alternate solution.

Thanks,
Alex

devtools::install_github("BaderLab/scClustViz")
Downloading GitHub repo BaderLab/scClustViz@master
from URL https://api.github.com/repos/BaderLab/scClustViz/zipball/master
Installing scClustViz
ERROR: dependency ‘TeachingDemos’ is not available for package ‘scClustViz’
Installation failed: Command failed (1)

Better file names for GOI overlay plots

Hi Brendan,

Throwing this issue here as a friendly reminder that it would be greatly appreciated! When you have a moment could you update the filename for goi downloads to name the gene selected since generally I'm downloading 10+ of these files in quick succession!

e.g: name this downloaded file as CD3E.pdf (sample-name-CD3E.pdf) instead of goi1.pdf

Silhouette plot cluster labels

Silhouette plot labels clusters by integers from 1 - # of clusters. If clusters are named, or numbered differently (Seurat counts clusters from zero because apparently they forgot they weren't Python?), this creates a great deal of confusion.

Dot painting in cell selection plot occasionally goes wacky.

For example:
MouseCortex::viewMouseCortex("e17")
There's two cells overlapping on the tSNE - when selecting one of them, two dots appear where no cells are.
Select cluster 5 (at default res) as a filter.
Circle all the cells in the tSNE plot to the left of cluster 5 - there's one cell in clust5 hiding between clusters 1 and 4 (ATCACGCAATAA). The other cell from cluster 1 overlapping it is TTCGAACCCGAG
Add that cell to one of the sets.
It isn't highlighted, but two new dots (ostensibly the highlighted selected cell) appear near (1,-5)ish.

seurat v3 concerns again

Hey again,

I'm still having issues with Seurat v3 as in #27

I'm emailing you a copy of the minimal test dataset and rmarkdown. There seems to be an issue accessing embeddings? The errors I'm getting:

## Error in CalcSCV(seu, clusters, exponent = exp(1)): Input data object must be one of: seurat, SingleCellExperiment.

## Error in slot(eb1S@reductions[[tolower(DRtype)]], "cell.embeddings"): object 'eb1S' not found

Reduce number of warnings returned to console by Shiny

Functions dependent on unfinished calculations are returning errors that Shiny returns as warnings to the console until the calculations they depend on are completed. These warnings aren't indicative of errors, just mistimed function evaluations, but they are spamming the console and if there are important warnings/errors, they're being buried in the noise.

Volcano plots

Add option to plot volcano plots instead of MA plots.

Compatibility with Seurat v3?

Hi,

I'm trying to load data from a Seurat object using v3 of that software. readfromSeurat fails. I'm attempting to use readfromManual as in #25, but am struggling a bit.

I need to use v3 because I am combining datasets as shown in https://doi.org/10.1101/460147, though I know there are changes in the structure of Seurat objects between v2 and v3 (https://satijalab.org/seurat/essential_commands.html).

Do you have any advice for configuring readfromManual for seurat v3?

Colour cells by silhouette width?

It would be nice to see which cells are the problem according to the silhouette plot.

Error in get(lfc[X])[[1]] : subscript out of bounds

Hi,

I'm getting an error on the merged dataset; however, runShiny is working fine for individual runs.
The error I'm getting after following the scClustViz vignette:

**Error in get(lfc[X])[[1]] : subscript out of bounds**
5. is(get(lfc[X])[[1]])
4. FUN(X[[i]], ...)
3. lapply(X = X, FUN = FUN, ...)
2. sapply(seq_along(lfc), function(X) { if (is(get(lfc[X]))[1] %in% findMethodSignatures(getExpr)) { return("inD") } ...
1. runShiny(filePath = "Merged_runsClustViz.RData", outPath = "./", annotationDB = "org.Hs.eg.db", imageFileType = "png")

Highly appreciate any pointers to rectify the error!

Thanks in Advance

Error in validObject(.Object) while running CalcAllSCV

Hi,
I am trying to run scClustViz. I have an object with my normalized, dimension reduced and clustered dataset (using Seurat). I first used function grep and then getMD and that gave me metadata with very low (0-3) values for nCount_RNA and nFeature_RNA. I am not sure if this is expected and has nothing to do with my error.
Then, when I try to use function "CalcAllSCV", I get the following error:
"--------------------------------------

Processing cluster solution: orig.ident

-- Calculating cluster-wise gene statistics --
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=53s
-- Calculating differential expression cluster vs rest effect size --
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=11s
-- Testing differential expression cluster vs rest --
-- Testing differential expression between clusters --
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=09s

Processing cluster solution: nCount_RNA

Error in validObject(.Object) :
invalid class “sCVdata” object: invalid object for slot "Clusters" in class "sCVdata": got class "numeric", should be or extend class "factor""

Thank you
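The log above shows the run stepping through orig.ident and then nCount_RNA, which suggests the whole metadata table was passed as cluster solutions, and the error says the Clusters slot expects a factor. A minimal base-R sketch of narrowing the metadata to cluster columns and coercing them to factors follows. The toy data frame, its values, and the column names are assumptions for illustration, not output from a real Seurat object, and the pattern is the one quoted elsewhere in this thread.

```r
# Toy metadata resembling a Seurat meta.data table (values invented for illustration).
md <- data.frame(
  orig.ident = factor(c("a", "a", "b", "b")),
  nCount_RNA = c(1200, 950, 1800, 1430),
  nFeature_RNA = c(400, 380, 610, 520),
  RNA_snn_res.0.4 = c("0", "0", "1", "1"),
  RNA_snn_res.0.8 = c("0", "1", "2", "2"),
  stringsAsFactors = FALSE
)

# Keep only columns whose names end in a resolution suffix such as "res.0.8".
is_cluster_col <- grepl("res[.0-9]+$", names(md))
cluster_df <- md[, is_cluster_col, drop = FALSE]

# Coerce every remaining column to a factor, since a numeric or character
# column is what triggers the "should be or extend class factor" complaint.
cluster_df[] <- lapply(cluster_df, factor)

str(cluster_df)
```

Only the two resolution columns survive the filter here; numeric QC columns such as nCount_RNA are dropped before anything is treated as a cluster assignment.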

GUI for subsetting cells for reclustering?

Add a function to subset the input data based on the cell selection tool? For people that want to recluster a subset of the data. Should be do-able in the upcoming paradigm (v1) where the input data object is used instead of generating a sCV-specific data object.

Dotplot Labels

When the dots shrink down in size from increasing the # of genes per cluster to show, the gene names don't shrink down so there is more than one dot per gene name.

Open default res automatically

Skip "show clusts at res" button push when savedRes exists.

Cluster numbers ordered correctly

Converting integer cluster assignments from character (commonly output by Seurat) to factor results in the levels being ordered as characters (0,1,10,11,...) instead of numerically.
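The ordering problem described above can be reproduced and worked around in base R. This is a minimal sketch with invented cluster labels, not the fix used in the package: it sets the factor levels explicitly from the numerically sorted unique values.

```r
# Cluster assignments stored as character strings (toy values).
clusters_chr <- as.character(c(0, 1, 2, 10, 11, 2, 10))

# Default conversion sorts the levels as strings.
naive <- factor(clusters_chr)
levels(naive)

# Sort the unique labels numerically, then use them as explicit levels.
numeric_levels <- sort(unique(as.integer(clusters_chr)))
fixed <- factor(clusters_chr, levels = as.character(numeric_levels))
levels(fixed)  # "0" "1" "2" "10" "11"
```

This assumes every label parses as an integer; labels such as "2a" would need a different ordering rule.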

Clicking non-existent plot results in crash.

Accidentally clicking on a plot that accepts clicks when the plot hasn't been drawn yet will still trigger R Shiny to attempt to interpret the click. Since the plot and the data behind it don't exist yet, this will cause a crash.

unable to find an inherited method for function ‘getMD’ for signature ‘"data.frame

Hey! I am trying to implement the basic usage but when I run :
your_cluster_columns <- grepl("res[.0-9]+$",
names(getMD([email protected])))
)

I get the following error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘getMD’ for signature ‘"data.frame"’

I am unsure what the correct input object format is.

My meta.data df has the following columns:

Thank you for your time.
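The error message says that no getMD method exists for a data.frame, which is what S4 dispatch reports when an accessor is handed a piece of an object rather than the object itself. The sketch below reproduces that behaviour with a toy class and a toy generic; ToyCells and getMD_toy are hypothetical names invented for illustration and are not part of scClustViz or Seurat.

```r
library(methods)

# Hypothetical stand-in for a single-cell container class (not Seurat itself).
setClass("ToyCells", representation(meta.data = "data.frame"))

# Hypothetical accessor generic with a method defined only for the container.
setGeneric("getMD_toy", function(x) standardGeneric("getMD_toy"))
setMethod("getMD_toy", "ToyCells", function(x) x@meta.data)

obj <- new("ToyCells",
           meta.data = data.frame(res.0.4 = factor(c(0, 1)), nGene = c(10, 20)))

# Passing the whole object dispatches to the ToyCells method.
md <- getMD_toy(obj)

# Passing the metadata data frame finds no method and raises an error.
err <- tryCatch(getMD_toy(obj@meta.data),
                error = function(e) conditionMessage(e))
print(err)

# With the metadata retrieved through the accessor, the column filter works.
cluster_cols <- grepl("res[.0-9]+$", names(md))
print(cluster_cols)
```

If the real accessor behaves the same way, the likely fix is to pass the data object itself to the accessor rather than its metadata slot, but that is an inference from the error text rather than something confirmed in this thread.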

Problem accessing DR embeddings in Seurat object

Trying to run scClustViz on a Seurat object (v4.1.3) I'm getting the following error in the Shiny app on launch:

DRtype 'pca_sct_integrated.CC' not found.
  The following cell embeddings are available in this object:
  pca, pca_integrated, umap_integrated, pca_sct_integrated, umap_sct_integrated, pca_sct_integrated.CC, umap_sct_integrated.CC

This results in no dimensional reduction (DR) being plotted, and the selection widget for choice of DR is also missing.

As you can see from the message I've added multiple DRs to the object, and not all DR names are lowercase. In dataAccess.R (haven't checked elsewhere) there's the assumption that the DR names will be all lowercase, so in this case the DR coordinates aren't found.

if (tolower(DRtype) %in% names(slot(x,"reductions"))) {
      return(slot(x@reductions[[tolower(DRtype)]],"cell.embeddings"))
    } else {
      stop(paste(paste0("DRtype '",DRtype,"' not found."),
                 "The following cell embeddings are available in this object:",
                 paste0(names(slot(x,"reductions")),collapse=", "),sep="\n  "))
    }

This assumption is presumably safe for DR names generated by Seurat, but can fail for user-supplied DR names. I can see the obvious workaround for my case (change DR name to lower), but a more generic solution might be better (e.g. only tolower() if the supplied DR name is some capitalisation of PCA, tSNE, or UMAP).

Thanks,

Chris
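One way to implement the more generic lookup suggested in this report is to try an exact match first and fall back to a case-insensitive match only when it is unambiguous. The sketch below works on a plain character vector of reduction names. match_dr_name is a hypothetical helper, not a function in scClustViz, and the fallback rule is an alternative to the "only lowercase PCA/tSNE/UMAP" idea rather than the maintainers' chosen fix.

```r
# Hypothetical helper: resolve a requested DR name against the available names.
match_dr_name <- function(DRtype, available) {
  # An exact match always wins, so mixed-case user-supplied names are found.
  if (DRtype %in% available) {
    return(DRtype)
  }
  # Otherwise accept a single case-insensitive match (e.g. "PCA" -> "pca").
  hits <- available[tolower(available) == tolower(DRtype)]
  if (length(hits) == 1) {
    return(hits)
  }
  stop(paste(paste0("DRtype '", DRtype, "' not found."),
             "The following cell embeddings are available in this object:",
             paste0(available, collapse = ", "), sep = "\n  "))
}

available <- c("pca", "umap_integrated", "pca_sct_integrated.CC")
match_dr_name("pca_sct_integrated.CC", available)  # exact match
match_dr_name("PCA", available)                    # case-insensitive fallback
```

The returned name could then be used to index the reductions list directly, instead of indexing with the lowercased request.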

Weird Shiny bugs when using seurat v1 input files.

One of the gene-of-interest input boxes only shows up after swapping radio buttons, and volcano plots with significant genes throw an error until swapping radio buttons for gene labels.

Bugs only occur with seurat version 1 input data. Specifically output of BorrettNSC/forSCV.R

Will hunt this down later, as it's only a minor inconvenience.

Parameters for sctransform based data

Would you have suggestions for how to best parametrize when using the Seurat/sctransform workflow?
Thanks!

End Shiny session when browser window is closed

See https://stackoverflow.com/questions/35306295/how-to-stop-running-shiny-app-by-closing-the-browser-window
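A common pattern for this is to register a callback on the Shiny session that stops the app when the session ends. The sketch below shows only that callback inside a bare server function; it assumes the shiny package is installed and is not taken from the scClustViz source.

```r
# Server function that stops the running app once the browser session ends.
server <- function(input, output, session) {
  session$onSessionEnded(function() {
    # stopApp() ends the R process's runApp() call, returning control to the console.
    shiny::stopApp()
  })
}

# To try it interactively (requires shiny):
# shiny::shinyApp(ui = shiny::fluidPage("Close this tab to stop the app"), server = server)
```

With several users connected to one R process, stopping the app on the first closed session would disconnect everyone, so this suits locally launched, single-user sessions.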

Labels missing from MA plot

Installation error: syntax issue in deTest.R line 180

When installing I get the following error: "Error in parse(outFile) :
/private/var/folders/bz/2k93vv6d30n3fkfbwjf0nyjsgw_94g/T/RtmpSEFALX/R.INSTALL126c958b36b76/scClustViz/R/deTest.R:180:16: unexpected symbol
179: "The decision to stop testing depends on the results of caclDEcombn."
180: sep"

I looked at lines 179 and 180 and it looks like there is a missing comma before the "sep" argument.

Fx to pass your own DE results

With a well-defined S4 class for sCVdata, this shouldn't be hard.

clustWiseDEtest fails when < 3 clusters in a resolution?

Updated Ubuntu & R, unable to runShiny()

I recently had to update Ubuntu and R on our lab's bioinformatics computer. My PI was previously able to use the runShiny command from scClustViz to open up and look at scRNASeq data. Files that could be opened previously (and any new files we've tried to make through the pipeline) can no longer be opened.

R Version: 4.3.1

library(Seurat)
library(scClustViz)

runShiny(filePath = "/home/bret/Madeline/MadelineRNAseq/RMS_8_MET_Feb2023/RMS_8_MET_Feb28.RData") #loading already existing pipeline output

And the error is:

Error in if (!grepl("^|<html", lines, ignore.case = TRUE)) { :
argument is of length zero

It looks like the command partially executes, as there are some objects in the workspace, but no interactive page pops up, just the above error.

I'm not a programmer but I've some basic knowledge. Not sure if I'm missing something that needs to be installed on Linux or a package. Any help is appreciated!

Thanks,

Alex

Save as PDF buttons

Move Save as PDF button for geneTest boxplot to right side.

Global option for saving: PDF, EPS, PNG, TIFF

Automated SCE reading

Add a readFromSingleCellExperiment function?
Note that there's a gene symbol slot in the rowData.

Todo list for v0.30

Log colour scale breaks when zeroes exist (metadata overlays)
Remove log-axis option when that axis is a boxplot (metadata scatterplot)
Log axis with boxplots breaks when zeroes exist
Label angle in MA plot shifted closer to horizontal
Colour bar of cluster colours for dotplot rows
selDE metadata legend goes outside margins
Default scatterplot inputs should be first and second entries of metadata
Legend needed for dotplot!

Biaxial plot of genes' expression per cell?

Sonya's immunology buddies asked if genes could be viewed on a biaxial plot, a la FACS analysis. Given that each cell library is an incomplete (~10%) sample of that cell's transcriptome, this probably isn't the greatest idea, since most cells will pile up on one/both axes since the genes of interest will be missing. However, it could be done, and colour each cell by cluster, as well as including a meta-dot (with whiskers for IQR/SE/SD?) for cluster-wise average gene expression. This might be a way to move toward the "cell-type signature" idea of having multiple genes uniquely representing a cluster, rather than just single marker genes.
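The cluster-wise "meta-dot with whiskers" idea can be sketched in base R by computing a mean and interquartile range per cluster for each of the two genes. Everything below is a toy: the expression values are simulated, and summarise_gene and plot_biaxial are hypothetical helper names, not functions in scClustViz.

```r
set.seed(1)

# Simulated sparse expression for two genes across three clusters (toy data).
clusters <- factor(rep(c("1", "2", "3"), each = 50))
gene_a <- rpois(150, lambda = rep(c(0.2, 2.0, 0.1), each = 50))
gene_b <- rpois(150, lambda = rep(c(1.5, 0.3, 0.1), each = 50))

# Per-cluster mean and quartiles for one gene; rows are clusters.
summarise_gene <- function(expr, clusters) {
  t(sapply(split(expr, clusters), function(v) {
    c(mean = mean(v),
      q25 = unname(quantile(v, 0.25)),
      q75 = unname(quantile(v, 0.75)))
  }))
}

sum_a <- summarise_gene(gene_a, clusters)
sum_b <- summarise_gene(gene_b, clusters)

# Biaxial plot: jittered cells coloured by cluster, plus a meta-dot per cluster
# with IQR whiskers along both axes.
plot_biaxial <- function() {
  plot(jitter(gene_a), jitter(gene_b), col = as.integer(clusters), pch = 20,
       xlab = "gene A", ylab = "gene B")
  segments(sum_a[, "q25"], sum_b[, "mean"], sum_a[, "q75"], sum_b[, "mean"], lwd = 2)
  segments(sum_a[, "mean"], sum_b[, "q25"], sum_a[, "mean"], sum_b[, "q75"], lwd = 2)
  points(sum_a[, "mean"], sum_b[, "mean"], pch = 21, cex = 2,
         bg = seq_len(nlevels(clusters)))
}

if (interactive()) plot_biaxial()
```

The jitter is there because, as noted above, most cells pile up on the axes when counts are sparse.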

Fx to pass custom deVS and generate deMarker/deNeighb?

Add a function that bypasses clustWiseDEtest, allowing user to pass DE results using their method of choice as a list formatted like deVS, which would generate deMarker, deDist, and deNeighb.
Somehow handle generating CGS as well?

Select cells for selDE using metadata

Metadata + clusters

Are GOI plots ordered by expression?

Need to check on this. If not, here's the code to fix it.

  plot_tsne(getEmb(seur,"umap")[order(getExpr(seur,"RNA")[GOI,]),],
            md=getExpr(seur,"RNA")[GOI,][order(getExpr(seur,"RNA")[GOI,])],
            md_title=GOI)
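The same ordering idea, independent of any scClustViz or Seurat accessors, can be sketched in base R: sort the cells by expression so that expressing cells are drawn last and end up on top. The embedding and expression values below are simulated for illustration only.

```r
set.seed(42)

# Simulated 2D embedding and sparse expression of one gene (toy data).
emb <- cbind(UMAP_1 = rnorm(200), UMAP_2 = rnorm(200))
expr <- rpois(200, lambda = 0.5)

# Order cells from lowest to highest expression and apply it to both objects,
# so coordinates and values stay paired.
draw_order <- order(expr)
emb_sorted <- emb[draw_order, ]
expr_sorted <- expr[draw_order]

# One colour per expression level, from grey (zero) to red (highest).
pal <- colorRampPalette(c("grey85", "red"))(max(expr) + 1)
cols <- pal[expr_sorted + 1]

# Points are drawn in row order, so high-expression cells overplot the rest.
if (interactive()) {
  plot(emb_sorted, col = cols, pch = 20, main = "toy gene")
}
```

Applying one ordering vector to both the coordinates and the values is the important part; ordering only one of them would scramble which cell gets which colour.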

Marker combinations?

Think about sets of genes that uniquely mark clusters (e.g. CD4+ CD8-), rather than single gene markers.

error reading Seurat object

Hello

I am getting the following error when running readFromSeurat:

Error in readFromSeurat(immune.combined) :
trying to get slot "cell.embeddings" from an object of a basic class ("NULL") with no slots

This seurat object was generated using canonical correlation analysis (RunCCA).

thanks