randel / mind

Using Bulk Gene Expression to Estimate Cell-Type-Specific Gene Expression via Deconvolution

Home Page: https://randel.github.io/MIND/

R 2.81% HTML 97.19%

bulk-gene-expression scrna-seq-data

mind's Issues

Definition of "Measures" in MIND

I am interested in applying this tool to other datasets. However, the "measure" required by this tool is still unclear to me.
For example, with triplicate samples in 2 groups (6 samples in total), should I format the samples into the matrices below, or differently?
"1"
Gene * (group1_1, group2_1)
"2"
Gene * (group1_2, group2_2)
"3"
Gene * (group1_3, group2_3)

In addition, are these triplicates interchangeable? In a typical bulk analysis, information about which samples were measured together is not available. Can I assign measure labels at random in that case? Thanks for your response!
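
The layout asked about above can be pictured as a three-dimensional array with genes on the first axis, the two groups on the second, and the three replicates ("measures") on the third. The sketch below only illustrates that arrangement in base R with simulated values. Whether replicates may be treated as measures, and whether this is the axis order mind() expects, are assumptions to check against the package documentation.

```r
set.seed(1)
genes    <- paste0("Gene_", 1:5)
groups   <- c("group1", "group2")
measures <- c("1", "2", "3")

# One gene x group matrix per measure (simulated expression values).
per_measure <- lapply(measures, function(m) {
  matrix(rnorm(length(genes) * length(groups)),
         nrow = length(genes),
         dimnames = list(genes, groups))
})

# Stack the matrices along a third axis: gene x group x measure
# (this axis order is an assumption, not confirmed by the package).
bulk_array <- array(unlist(per_measure),
                    dim = c(length(genes), length(groups), length(measures)),
                    dimnames = list(genes, groups, measures))

dim(bulk_array)        # 5 2 3
bulk_array[, , "2"]    # the gene x group slice for measure "2"
```

Each slice along the third axis is one of the "1", "2", "3" matrices written out in the question.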

Problems with 3D input arrays

Hello,

I find this package awesome. I would like your opinion on its potential application to our specific question, and I also want to report some difficulties I am having setting up the input data.

I am testing this approach to identify protein expression signatures from cancer xenografts using mass-spec proteomics. We want to identify signature proteins from tumors or stroma and potentially identify expression patterns within tumors or stroma.

We have bulk mass-spec data from the whole tumor+stromal region, but we can distinguish both by identifying proteins that are either specific to human (tumor) or mouse (stroma). Therefore we have two expression matrices that can each be associated with a specific tissue region.

We don't have single-cell data, but I managed to generate a signature matrix by mining the Human Protein Atlas for single-cell-specific expression patterns of cells that I consider likely to be found in stroma or tumor tissue.

With this, I can generate cell fraction matrices using est_frac for both human(tumor) and mouse(stroma).

Then I am generating two 3D arrays: bulk input and frac input.

Bulk input:

bulk_array <- abind::abind(data_log2_med_normimpaft_mat, 
                           data_log2_med_normimpaft_mat_hs, 
                           along = 3)

Frac input:

frac_array <- abind::abind(cell_fraction_mouse,
                           cell_fraction_human, 
                           along = 3)

I get an error when executing bMIND2, which I understand is related to the way I set up my arrays:

deconv_bayes <- bMIND2(bulk_array, frac_array)

## [1] "1470 errors"
## List of 1
##  $ :List of 2
##   ..$ message: chr "incorrect number of dimensions"
##   ..$ call   : language X[j, ]
##   ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
## NULL
## Error in `rownames<-`(`*tmp*`, value = names(res)): attempt to set 'rownames' on an object with no dimensions

I am comparing my arrays with the example arrays, and it seems evident that mine need some tuning, but I am now having problems setting them up.

Two questions:

Could you share a little bit more on how to generate the 3D arrays (in R) that you use in your example in the context of your biological question that you very nicely address in the publication(s)?
Would you have any comments on the approach I am describing in which I'll use your tool? I would like to corroborate that I am understanding its application correctly and that it would actually be helpful for my current analysis.

Many thanks for taking the time to read!

Best wishes,
Miguel
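
A note on the message itself: "incorrect number of dimensions" with the call X[j, ] is what R reports when a 3D array is indexed with only two subscripts. The base R sketch below reproduces that message on simulated data and shows a few shape checks. It does not establish which input shape bMIND2 expects, and all object names are hypothetical.

```r
# Simulated stand-ins for the two species-specific matrices (protein x sample).
set.seed(1)
proteins <- paste0("P", 1:4)
samples  <- paste0("S", 1:3)
mouse <- matrix(rnorm(12), 4, 3, dimnames = list(proteins, samples))
human <- matrix(rnorm(12), 4, 3, dimnames = list(proteins, samples))

# Base R equivalent of stacking two matrices along a third axis.
bulk_array <- array(c(mouse, human), dim = c(4, 3, 2),
                    dimnames = list(proteins, samples, c("mouse", "human")))

# Two-subscript indexing on a 3D array raises the reported message.
msg <- tryCatch(bulk_array[1, ], error = function(e) conditionMessage(e))
msg

# Shape and name checks worth running before any deconvolution call.
dim(bulk_array)
identical(rownames(mouse), rownames(human))   # same features in both slices?
identical(colnames(mouse), colnames(human))   # same samples in both slices?
```

If the function indexes its input as a 2D matrix, the 3D arrays would have to be reshaped first, but that is a guess from the error text alone.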

R version

I have tried several versions of R, but all of them fail to install the MIND package from GitHub. Which version of R can install this package successfully?

extract brain tissue from gtex gene count

Hello,
I have downloaded GTEx rna-seq data from this site:
wget https://storage.googleapis.com/gtex_analysis_v8/rna_seq_data/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_reads.gct.gz
I want to extract only brain tissue gene expression data from this file.
I checked the yarn package. Does it have a function that can do this?
Can you please provide me some guidance on how to proceed with this?
Thank you
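
One possible base R approach is to read the GCT file, whose first two lines are a version tag and the table dimensions, and keep only the columns whose sample IDs are annotated as brain. The sketch below runs on a tiny made-up GCT file. The SAMPID and SMTS column names are assumed to match the GTEx sample attributes annotation file, and the sample IDs and counts are invented.

```r
# Toy GCT-formatted text: two header lines, then a tab-delimited table.
gct_lines <- c(
  "#1.2",
  "2\t4",
  "Name\tDescription\tS-1\tS-2\tS-3\tS-4",
  "ENSG1\tA\t10\t0\t5\t7",
  "ENSG2\tB\t3\t8\t0\t1"
)
gct_file <- tempfile(fileext = ".gct")
writeLines(gct_lines, gct_file)

# Toy sample annotation (column names assumed to follow the GTEx file).
annot <- data.frame(
  SAMPID = c("S-1", "S-2", "S-3", "S-4"),
  SMTS   = c("Brain", "Blood", "Brain", "Lung"),
  stringsAsFactors = FALSE
)

# Skip the two GCT header lines; keep sample IDs exactly as written.
gct <- read.delim(gct_file, skip = 2, check.names = FALSE,
                  stringsAsFactors = FALSE)

# Keep only brain samples that are present in the expression table.
brain_ids    <- intersect(annot$SAMPID[annot$SMTS == "Brain"], colnames(gct))
brain_counts <- as.matrix(gct[, brain_ids, drop = FALSE])
rownames(brain_counts) <- gct$Name
brain_counts
```

The real file is large, so reading it this way may be slow. The filtering step is the same whichever reader is used.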

Error running bmind_de

Hi, thank you for this package! Similar to other posters I also have only a few samples in the single cell dataset (10 total, 4 and 6 in each group) and get a non-positive definite matrix when I call get_prior. I used the new version of get_prior and set filter_pd = F and then it works. I then use the results of get_prior and get an error when I call bmind_de though, below. Could you help me figure out what's going wrong? Thanks, Fatima

"56564 errors"
List of 1
$ :List of 2
..$ message: chr "'data' must be of a vector type, was 'NULL'"
..$ call : language matrix(prior$B$mu, length(prior$B$mu), 1)
..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
NULL
Error in `rownames<-`(`*tmp*`, value = names(res)) :
attempt to set 'rownames' on an object with no dimensions

interpreting negative CTS expression estimates from mind() function

Hello,

Firstly, thank you for a very interesting paper, method and package! I have a quick question about the units of the output estimates of cell-type-specific (CTS) gene expression from the mind() function. For example, in Fig. 4C of your paper (https://academic.oup.com/bioinformatics/article/36/3/782/5545976) some of the CTS estimates are negative. How should we interpret this?

Thanks very much

Error in if (!is.symmetric.matrix(x)) stop("argument x is not a symmetric matrix") : missing value where TRUE/FALSE needed

Hi! I ran into trouble using bMIND. How can I solve this?

suppressPackageStartupMessages(library(MIND))
library(data.table)
library(tidytable)
sce <- fread("ref_tpm_ndLV_GSE121893.txt", sep = '\t', data.table= F) %>% data.frame()
rownames(sce) <- sce$GeneSymbol
ref <- as.matrix(apply(sce[-1,-1],2, as.numeric))
rownames(ref) <- sce$GeneSymbol[-1]


metadata <- fread("ref_meta_GSE121893.txt", sep = '\t', data.table= F) %>% select.(ID,cell_type) %>% data.frame(); rownames(metadata) <- metadata$ID; colnames(metadata)[1] <- "sample"
ref_meta <- metadata

#table(ref_meta$cell_type, ref_meta$sample)

be <- fread('wp40_tpm.txt', sep= '\t', data.table = F)%>% data.frame(); rownames(be) <- be$GeneSymbol
bulk <- as.matrix(be[,-1])
dim(bulk);dim(ref);dim(ref_meta)
#[1] 20643    42
#[1] 19948  1151
#[1] 1151   10
all(colnames(ref) == rownames(ref_meta))
#[1] TRUE

#Check for the overlapping genesymbol
length(intersect(rownames(sce), rownames(be)))
#[1] 15017

is.numeric(bulk);is.numeric(ref)
#[1] TRUE
#[1] TRUE

prior = get_prior(sc = ref, meta_sc = ref_meta)
Error in if (!is.symmetric.matrix(x)) stop("argument x is not a symmetric matrix") :
  missing value where TRUE/FALSE needed

> head(ref_meta)
                         sample cell_type
SC_100355_0_19   SC_100355_0_19        FB
SC_100355_0_44   SC_100355_0_44        FB
SC_100355_0_45   SC_100355_0_45        FB
SC_100355_10_35 SC_100355_10_35        CM
SC_100355_10_59 SC_100355_10_59        CM
SC_100355_10_6   SC_100355_10_6        CM
> head(ref)[,1:4]
         SC_100355_0_19 SC_100355_0_44 SC_100355_0_45 SC_100355_10_35
TSPAN6                0         0.0000         0.0000               0
DPM1                  0         0.0000         0.0000               0
SCYL3                 0         0.0000         0.0000               0
C1orf112              0         0.0000         0.0000               0
FGR                   0         0.0000         0.0000               0
CFH                   0       278.8155       204.3904               0

Best,
Qin
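
Two checks that may help narrow this down, sketched in base R on toy data. The first assumes, without confirmation from the package, that the prior covariance is estimated across samples, so each sample would need to contribute more than one cell type. In the metadata printed above, the sample column holds one ID per cell. The second check looks for NA values introduced by numeric coercion.

```r
# Toy metadata with the same layout as head(ref_meta) above:
# the 'sample' column holds one ID per cell.
ref_meta <- data.frame(
  sample    = c("SC_1", "SC_2", "SC_3", "SC_4"),
  cell_type = c("FB", "FB", "CM", "CM"),
  stringsAsFactors = FALSE
)

# Cross-tabulate samples against cell types.
tab <- table(ref_meta$sample, ref_meta$cell_type)

# How many cell types does each sample contribute?
types_per_sample <- rowSums(tab > 0)
types_per_sample

# Assumption: a covariance between two cell types needs samples that
# contain both. With one cell per sample there are none.
shared <- sum(tab[, "FB"] > 0 & tab[, "CM"] > 0)
shared

# Coercing text to numeric can also introduce NA silently.
ref_toy <- suppressWarnings(as.numeric(c("0", "278.8155", "n/a")))
anyNA(ref_toy)
```

On the real objects, the same checks would be table(ref_meta$sample, ref_meta$cell_type) and anyNA(ref).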

Visualizing the results

Once I have a list with four elements, deconv1, how should I visualize specific genes? Could you share some code?
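
A minimal base R sketch of one way to plot a single gene, assuming the list contains an array of CTS estimates laid out as gene x cell type x sample. Both the layout and the element name A are assumptions, and the array here is simulated with hypothetical cell type labels.

```r
set.seed(1)
genes      <- paste0("Gene_", 1:3)
cell_types <- c("Astro", "Neuron", "Micro")   # hypothetical labels
samples    <- paste0("S", 1:10)

# Simulated stand-in for the estimates: gene x cell type x sample.
A <- array(rnorm(3 * 3 * 10), dim = c(3, 3, 10),
           dimnames = list(genes, cell_types, samples))

# Pull one gene: a cell type x sample matrix.
gene_mat <- A["Gene_2", , ]

# Long format: one row per (cell type, sample) pair.
long <- data.frame(
  cell_type  = rep(rownames(gene_mat), times = ncol(gene_mat)),
  sample     = rep(colnames(gene_mat), each = nrow(gene_mat)),
  expression = as.vector(gene_mat)
)

# Distribution of the estimates across samples, one box per cell type.
boxplot(expression ~ cell_type, data = long,
        main = "Gene_2", xlab = "Cell type", ylab = "Estimated expression")
```

With real output, A would be replaced by the corresponding element of deconv1 and "Gene_2" by the gene of interest.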

How to use MIND package.

Hi,
I am really excited to use this tool. However, I am a bit confused about the parameters and the kind of input required.
hence I kindly request your comments on the following. A vignette with examples of input data would be much appreciated.

What I have:-

Single Cell Reference Data (as a Seurat Object) coming from 10 patients. (Combined with case and control, I can split them easily though if required to run MIND and construct separate gene expression matrices)
Bulk RNA seq expression Matrix (Gene X Sample matrix) from 50 patients.

What I have to do:-

Use the get_prior function to generate a CTS prior profile matrix from the combined (case and control) single-cell data.
Run bMIND, where I intend to use Bisque for deconvolution.

Arguments that are not clear:-

Sample_ID (the colnames of my bulk RNA-seq expression matrix?)
profile (the output of the first step, get_prior?)
covariance (also from the output of get_prior?)
profile_co, covariance_co, etc. for controls and cases separately: what can I do here? My single-cell data is combined. If necessary, I can split it into two matrices of case/control. Once I have a separate gene expression matrix for each condition, should I run get_prior on each of them and use the output here?
y: how should I deal with this argument? Does it refer to the samples of the bulk data (whether they are case or control)? If so, how is it different from case_bulk at the end?
covariate-related arguments: can you please provide some examples of commonly used covariates, or show what a covariate matrix looks like?
noRE (can I use this argument to get a single deconvolved matrix for the whole bulk data, if that is what I would prefer?)
np (what is a non-informative prior?)
max_samp (is there a default value that I can use, or should I try different values?)

Thanks and Kind regards,
Saeed

bMIND error using prior: "V is not positive definite for some prior$G/prior$R elements"

I am trying to use bMIND with single cell data as prior:
Here is the command
deconv = bMIND(bulk = bulk, frac =frac, profile =prior$profile, covariance = prior$covariance, y = y, ncore = 20)

The tool works fine without a prior, and it also works when I use only "profile = prior$profile" without "covariance = prior$covariance". But when I use the prior covariance, I get the following error:

[1] "13680 errors"
List of 1
$ :List of 2
..$ message: chr "V is not positive definite for some prior$G/prior$R elements"
..$ call : language priorformat(if (NOpriorG) { NULL ...
..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
NULL
Error in mind1[[1]] : subscript out of bounds

Please let me know if I am doing anything wrong and how to fix this error. Also, if I use prior$profile without prior$covariance, how much will that affect the results, and are they still interpretable?
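
To see which genes trigger the message, each gene's covariance matrix can be tested for positive definiteness before the call. The base R sketch below assumes the prior covariance is an array laid out as gene x cell type x cell type. That layout is an assumption, the data are simulated, and is_pos_def is a hypothetical helper, not part of the package.

```r
# Returns TRUE when a symmetric matrix has strictly positive eigenvalues.
is_pos_def <- function(m, tol = 1e-8) {
  if (anyNA(m) || !isSymmetric(unname(m))) return(FALSE)
  all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > tol)
}

# Simulated stand-in for a prior covariance: gene x cell type x cell type.
K <- 2
cov_array <- array(0, dim = c(3, K, K),
                   dimnames = list(paste0("Gene_", 1:3), NULL, NULL))
cov_array[1, , ] <- diag(K)                      # positive definite
cov_array[2, , ] <- matrix(c(1, 1, 1, 1), K, K)  # singular: eigenvalue 0
cov_array[3, , ] <- matrix(c(1, 2, 2, 1), K, K)  # indefinite: eigenvalue -1

# Test every gene's K x K slice.
pd <- apply(cov_array, 1, is_pos_def)
pd
names(pd)[!pd]   # genes whose covariance would need attention
```

What to do with the flagged genes (drop them, or fall back to a default covariance) is a separate decision that this check does not settle.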

This package cannot work normally!

This package is quite useful and I really like the paper. However, it no longer works after the recent update. When I run the bmind_de example, it shows the following error. I hope it can be fixed. Thanks!

[1] "12 errors"
List of 1
$ :List of 2
..$ message: chr "could not find function "bmind1_y""
..$ call : language bmind1_y(X[j, ], W, y, max_samp = max_samp, np = np, nu = nu, noRE = noRE, covariate = covariate, covariate_bulk | truncated
..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
NULL
Error in apply(pval, 2, function(x) p.adjust(x, method = "fdr")) :
dim(X) must have a positive length

Empty object returned by MIND::get_prior()

Hi there,

I'm very interested in this method and managed to get the bMIND() method running using some TCGA bulk RNA-seq data and our own predicted cell type proportions based on our scRNA-seq data.
The results didn't make too much sense, so I was hoping to improve the accuracy by supplying a scRNA-seq prior.

I couldn't find any examples in your tutorial/paper of how exactly the priors you used for the brain datasets were generated - if these exist, could you please point me to them? When I try to create a prior using the get_prior() function, the object returned contains only NULL values. I have put a reproducible example below using random data (2 samples, 4 cell types, 100 cells, 100 genes).

sc_matrix <- matrix(sample(1:10, 10000, T), ncol = 100, nrow = 100)
colnames(sc_matrix) <- as.character(1:100)
rownames(sc_matrix) <- paste0("Gene_", c(1:100))
sc_dataframe <- data.frame(sample = sample(1:2, 100, T), cell_type = rep(c("Endothelial", "Epithelial", "Myeloid", "Lymphoid"), 25))
prior <- MIND::get_prior(sc_matrix, sc_dataframe)

The "prior" object contains the expected cell type names, but rather than having numeric values for the profile and covariance slots it contains NULL. Am I doing something incorrectly?

Thank you,
Daniel

Error in the bmind_de function

Hi, I'm testing the bMIND pipeline, but I have run into a problem. When I run the bmind_de function, I get the error "all connections are in use". I checked the rownames and colnames of the frac and bulk files and revised them following your suggestions, but the problem persists. Could you give some suggestions? Thank you.

Error running bMind2

Dear Randel:

I tried to use "bMIND2" to deconvolute a bulk tumor expression data. I created reference GEP from scRNA-Seq data, and was successful in estimating prior and fractions for different cell types. The deconvolution step worked for a small number of genes e.g. n = 100. However, I ran into an error when trying to deconvolute the complete GEP with ~3,000 genes (see below).

> deconv = bMIND2(bulk = log2(1+bulk[rownames(prior$profile),]), 
+                 frac = frac, profile = prior$profile, 
+                 covariance = prior$covariance, noRE = F, ncore = 20);
Error in deconv1_A[i, , ] <- mind1[[i]]$A : 
  number of items to replace is not a multiple of replacement length
In addition: Warning message:
In rownames(X) : NaNs produced

Could you please help me out? I would really appreciate it.
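
The "NaNs produced" warning suggests checking the transformed input before the call. The base R sketch below, on simulated data with hypothetical gene names, looks for genes in the prior that are absent from the bulk matrix and for values that become non-finite under log2(1 + x). Whether either condition explains the error above is not established.

```r
# Simulated bulk matrix (gene x sample) with one problematic value.
bulk <- matrix(c(5, 0, 12, 3, -2, 8), nrow = 3,
               dimnames = list(c("G1", "G2", "G3"), c("S1", "S2")))
prior_genes <- c("G1", "G2", "G4")   # hypothetical row names of the prior profile

# Genes requested from the prior that the bulk matrix does not contain.
missing_genes <- setdiff(prior_genes, rownames(bulk))
missing_genes

# log2(1 + x) is NaN for x < -1 and -Inf for x == -1.
shared    <- intersect(prior_genes, rownames(bulk))
log_bulk  <- suppressWarnings(log2(1 + bulk[shared, , drop = FALSE]))
bad_genes <- rownames(log_bulk)[rowSums(!is.finite(log_bulk)) > 0]
bad_genes
```

Any gene listed in bad_genes has at least one sample whose value cannot be log-transformed this way.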

Can bMIND be used for microarray datasets?

Many of the datasets I have studied are microarray datasets. Can bMIND be used for microarray datasets? Thank you very much.

bMIND

Hi, Randel:

I ran the command deconv = bMIND2(bulk, frac) but got the following error. What might be the reason?
The frac I used comes from single-cell-based deconvolution of the bulk samples.

Thanks for your time.

[1] "1310 errors"
List of 1
$ :List of 2
..$ message: chr "object 'T.CD8' not found"
..$ call : language FUN(X[[i]], ...)
..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
NULL
Error in `rownames<-`(`*tmp*`, value = names(res)): attempt to set 'rownames' on an object with no dimensions
Traceback:

bMIND2(bulk1, frac1)
bmind_all(X = bulk, W = frac, sample_id = sample_id, ncore = ncore,
. mu = profile, var_fe = covariance, covariate = covariate,
. covariate_bulk = covariate_bulk, covariate_cts = covariate_cts,
. np = np, noRE = noRE, nu = nu, nitt = nitt, burnin = burnin,
. thin = thin)
`rownames<-`(`*tmp*`, value = names(res))
stop("attempt to set 'rownames' on an object with no dimensions")
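
One thing that may be worth ruling out is a cell type label that is not a syntactically valid R name (for example one containing a space or a hyphen), since R may rewrite such labels in some steps and not others. This is a guess from the message "object 'T.CD8' not found", not a confirmed cause. The base R sketch below uses a simulated fraction matrix with hypothetical labels.

```r
# Simulated fraction matrix (sample x cell type) with hypothetical labels.
frac <- matrix(c(0.6, 0.5, 0.4, 0.5), nrow = 2,
               dimnames = list(c("S1", "S2"), c("T CD8", "B")))

# Names that R would alter to make them syntactically valid.
fixed   <- make.names(colnames(frac), unique = TRUE)
changed <- colnames(frac) != fixed
colnames(frac)[changed]

# Option: rename up front so every downstream step sees the same labels.
colnames(frac) <- fixed
colnames(frac)
```

If nothing is flagged on the real frac matrix, the labels are already valid names and the cause lies elsewhere.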