
genomicsupersignature's Issues

Allow configurable number of PCs in `validate`

.loadingCor <- function(dataset, avgLoading, method = "pearson", scale = FALSE) {
    if (any(class(dataset) == "ExpressionSet")) {
        dat <- Biobase::exprs(dataset)
    } else if (any(class(dataset) %in% c("SummarizedExperiment",
                                         "RangedSummarizedExperiment"))) {
        dat <- SummarizedExperiment::assay(dataset)
    } else if (any(class(dataset) == "matrix")) {
        dat <- dataset
    } else {
        stop("'dataset' should be one of the following objects: ExpressionSet,
             SummarizedExperiment, RangedSummarizedExperiment, and matrix.")
    }
    if (isTRUE(scale)) {dat <- rowNorm(dat)} # row normalization
    dat <- dat[apply(dat, 1, function(x) {!any(is.na(x) | (x == Inf) | (x == -Inf))}), ]
    gene_common <- intersect(rownames(avgLoading), rownames(dat))
    prcomRes <- stats::prcomp(t(dat[gene_common, ])) # centered, but not scaled by default
    loadings <- prcomRes$rotation[, 1:8]
    loading_cor <- abs(stats::cor(avgLoading[gene_common, ], loadings[gene_common, ],
                                  use = "pairwise.complete.obs",
                                  method = method))
    return(loading_cor)
}
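One way to address this request is to replace the hard-coded `1:8` with a parameter. The sketch below is illustrative only (the parameter name `npcs` and the simplified matrix-only input are assumptions, not the package API), and it guards against datasets with fewer components than requested:

```r
# Sketch: configurable number of PCs instead of the hard-coded 1:8.
# 'npcs' is a hypothetical argument name; input is assumed to be a
# plain genes-by-samples matrix for brevity.
.loadingCorN <- function(dataset, avgLoading, method = "pearson",
                         scale = FALSE, npcs = 8) {
    dat <- dataset
    if (isTRUE(scale)) dat <- t(scale(t(dat)))       # per-gene z-score
    gene_common <- intersect(rownames(avgLoading), rownames(dat))
    prcomRes <- stats::prcomp(t(dat[gene_common, ]))
    npcs <- min(npcs, ncol(prcomRes$rotation))       # guard small datasets
    loadings <- prcomRes$rotation[, seq_len(npcs), drop = FALSE]
    abs(stats::cor(avgLoading[gene_common, ], loadings,
                   use = "pairwise.complete.obs", method = method))
}
```

With `npcs = 3` the returned correlation matrix has three columns (one per PC) instead of the fixed eight.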

How to create new models?

Hi,

Is it possible to create new models based on GenomicSuperSignature? For example, using GTEx as input and different prior knowledge.
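For context, the idea behind the published model (as described in the GenomicSuperSignature paper) is: run PCA per dataset, pool the top loading vectors across datasets, cluster them, and average loadings within each cluster to form RAVs. The toy sketch below illustrates that idea in generic R only; it is not the authors' pipeline (which also handles sign-flipping, filtering, and annotation), and `buildToyRAVs` is a made-up name:

```r
# Illustrative sketch of RAV construction, NOT the package's pipeline:
# PCA per dataset -> pool top loadings -> cluster -> average per cluster.
buildToyRAVs <- function(datasets, nPCs = 3, k = 4) {
    genes <- Reduce(intersect, lapply(datasets, rownames))
    loadings <- do.call(cbind, lapply(datasets, function(d) {
        stats::prcomp(t(d[genes, ]))$rotation[, seq_len(nPCs)]
    }))
    # cluster loading vectors by (1 - |correlation|) distance
    d <- stats::as.dist(1 - abs(stats::cor(loadings)))
    cl <- stats::cutree(stats::hclust(d), k = k)
    # naive average within each cluster (real method aligns signs first)
    sapply(split(seq_len(ncol(loadings)), cl),
           function(idx) rowMeans(loadings[, idx, drop = FALSE]))
}
```

Running this on a list of genes-by-samples matrices yields a genes-by-k matrix of averaged loadings, i.e. toy "RAVs".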

Questions about R version in this research

Hi, I tried to run the code related to RAV model building and ran into package installation errors. My R version is 4.1.3.
For the GF package, I cannot install it with my R version. Could you please share the versions of the software you used? Thanks a lot.

Allow pluggable similarity measure to `validate`.

  • Spearman correlation
  • other methods from `dist`
  • an arbitrary user-supplied function

.loadingCor <- function(dataset, avgLoading, method = "pearson", scale = FALSE) {
    if (any(class(dataset) == "ExpressionSet")) {
        dat <- Biobase::exprs(dataset)
    } else if (any(class(dataset) %in% c("SummarizedExperiment",
                                         "RangedSummarizedExperiment"))) {
        dat <- SummarizedExperiment::assay(dataset)
    } else if (any(class(dataset) == "matrix")) {
        dat <- dataset
    } else {
        stop("'dataset' should be one of the following objects: ExpressionSet,
             SummarizedExperiment, RangedSummarizedExperiment, and matrix.")
    }
    if (isTRUE(scale)) {dat <- rowNorm(dat)} # row normalization
    dat <- dat[apply(dat, 1, function(x) {!any(is.na(x) | (x == Inf) | (x == -Inf))}), ]
    gene_common <- intersect(rownames(avgLoading), rownames(dat))
    prcomRes <- stats::prcomp(t(dat[gene_common, ])) # centered, but not scaled by default
    loadings <- prcomRes$rotation[, 1:8]
    loading_cor <- abs(stats::cor(avgLoading[gene_common, ], loadings[gene_common, ],
                                  use = "pairwise.complete.obs",
                                  method = method))
    return(loading_cor)
}
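A minimal way to make the similarity measure pluggable is to accept either a character string (forwarded to `stats::cor()`) or a function taking the two loading matrices. The names below (`.similarity`, `spearman_sim`) are illustrative, not the package API:

```r
# Sketch: 'method' may be a character understood by stats::cor(), or a
# user-supplied function(a, b) returning a similarity matrix.
.similarity <- function(avgLoading, loadings, method = "pearson") {
    if (is.function(method)) {
        abs(method(avgLoading, loadings))
    } else {
        abs(stats::cor(avgLoading, loadings,
                       use = "pairwise.complete.obs", method = method))
    }
}

# Example of a pluggable measure: Spearman correlation as a function.
spearman_sim <- function(a, b) stats::cor(a, b, method = "spearman")
```

With this shape, `validate()` could pass `method = "spearman"`, `method = spearman_sim`, or any custom distance-derived similarity without further changes to the dispatch code.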

'droplist' for `drawWordcloud()` function

Add a 'droplist' argument to the drawWordcloud() function that removes the most and least common MeSH terms in the universe, so that 'outlier' terms do not skew the word cloud.

e.g. droplist = c("Human", "RNA sequencing", ..., "Publication", "Utah")
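One possible shape for this, assuming the word cloud is built from a named numeric vector of MeSH term weights (the helper name `filterTerms` is hypothetical):

```r
# Sketch: filter the term-frequency vector before drawing the cloud.
# 'termFreq' is a named numeric vector of MeSH term weights.
filterTerms <- function(termFreq, droplist = NULL) {
    if (!is.null(droplist)) {
        termFreq <- termFreq[!names(termFreq) %in% droplist]
    }
    termFreq
}
```

For example, `filterTerms(c(Human = 90, Liver = 12, Utah = 1), droplist = c("Human", "Utah"))` keeps only `Liver`. Frequency-based cutoffs (dropping terms above or below given quantiles) could be layered on the same hook.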

Questions about heatmap table function

Hi, I found that this step took a pretty long time to run:
I waited for more than one hour on a 30 GB GPU node of an HPC cluster. Is this normal? Thanks a lot.

Simplify browsing of studies

Currently, looking up which studies contributed to a RAV requires something like the following:

> ravc2 <- GenomicSuperSignature::getModel(prior = "C2")
> colData(ravc2)[colData(ravc2)$RAV == "RAV272", "studies"] 
$Cl4764_272
[1] "ERP020977" "SRP039361" "SRP045352"

And then searching for the accession numbers, e.g. in the European Nucleotide Archive (ENA) browser. It is even more difficult to find the PMIDs of the studies contributing to the RAV.

It would be very convenient to have a function that outputs these directly, either with a message suggesting to search for them in the ENA browser or with a direct link (e.g. https://www.ebi.ac.uk/ena/browser/view/ERP020977 for the first ERP above, although the ENA browser annoyingly takes you down to the "reads" section of the page). Being able to browse the PMIDs of the contributing studies would also be very useful.
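The URL part at least is mechanical: ENA study pages follow the pattern `https://www.ebi.ac.uk/ena/browser/view/<accession>`. A minimal helper (the name `ravStudyLinks` is hypothetical) could take the accession vector from the `colData` lookup shown above and return clickable links:

```r
# Sketch: build ENA browser URLs from study accession numbers.
ravStudyLinks <- function(accessions) {
    paste0("https://www.ebi.ac.uk/ena/browser/view/", accessions)
}
```

For example, `ravStudyLinks(c("ERP020977", "SRP039361", "SRP045352"))` returns the three browsable URLs for RAV272. A fuller version could wrap the `colData(model)[colData(model)$RAV == ..., "studies"]` lookup itself and also resolve PMIDs.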

Documentation on normalization is unclear

It's not clear from ?validate whether users should normalize their data in advance. As I understand the code, no normalization is applied except a per-gene z-score when scale = TRUE, so should users apply a log(x + 1) transformation themselves?
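If the answer is that `validate()` expects log-scale expression values (an assumption this issue is asking the maintainers to confirm), the user-side preprocessing would be a one-liner applied before calling `validate()`:

```r
# Sketch: log2(x + 1) transform of a raw counts matrix, to be applied
# BEFORE validate() if (and only if) the maintainers confirm the model
# expects log-scale input. Per-gene z-scoring is left to scale = TRUE.
logTransform <- function(counts) log2(counts + 1)
```

For example, `logTransform(c(0, 1, 3))` maps raw values to `c(0, 1, 2)` on the log2 scale; clarifying this expectation in `?validate` would remove the guesswork.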
