xzhoulab / idea

Differential expression (DE); gene set Enrichment Analysis (GSEA); single cell RNAseq studies (scRNAseq)

License: GNU General Public License v3.0

C++ 52.36% R 47.64%

differential-gene-expression gsea rna-seq single-cell

idea's People

sqsun yingma0107 biostatpzeng xiaoqiwang19 hegu2692 nailouzhang pcarbo apollowuu seninfobio fanyue322 saharns

idea's Issues

No significant pathways potentially due to p_values of 0?

Hi, I've performed DE analysis in Seurat on a specific cluster across two conditions and used the DE summary table in iDEA to identify significant pathways with the built-in mouse gene sets.

After fitting the model with the R data, my results had no significant pathways. I'm worried this is because my DE data contained p-values of 0, which makes the beta_var in the summary data 0 as well.

There were quite a lot of DE genes in my summary data, so finding no significant or even near-significant pathways is unexpected.
Do you know whether that causes problems with the analysis?
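
For reference, a minimal R sketch of why a p-value of exactly 0 gives a beta_var of 0 under the p-value-to-variance conversion quoted in other issues on this page (z from qnorm, then se = |beta / z|). The toy vectors and the floor `p_floor` are assumptions for illustration; flooring is a common workaround, not something the iDEA authors are confirmed to recommend.

```r
# Toy DE results (hypothetical values); the first p-value is exactly 0.
pvalue <- c(0, 1e-12, 0.01, 0.5)
beta   <- c(1.2, -0.8, 0.4, 0.1)

# Conversion quoted on this page: p-value -> z-score -> variance.
zscore   <- qnorm(pvalue / 2, lower.tail = FALSE)  # Inf when pvalue == 0
beta_var <- (beta / zscore)^2                      # 0 when zscore is Inf

# Assumed workaround: floor the p-values at a tiny positive number first.
p_floor        <- 1e-300
pvalue_fixed   <- pmax(pvalue, p_floor)
zscore_fixed   <- qnorm(pvalue_fixed / 2, lower.tail = FALSE)
beta_var_fixed <- (beta / zscore_fixed)^2

print(data.frame(pvalue, beta_var, beta_var_fixed))
```

Genes with a nonzero p-value are unaffected by the floor; only the zero p-values change, from a variance of 0 to a very small positive variance.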

Installation error on macOS Catalina

I'm having a problem installing the package.
When using the devtools command

devtools::install_github('xzhoulab/iDEA', dependencies = T)

it returns an error:

clang: error: unsupported option '-fopenmp'
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘iDEA’

I guess it's a problem with mac and the clang compiler?

Could there be an easy fix in the package? Or do you have a suggestion how I can solve this?

Thanks in advance,
Anne

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] rstudioapi_0.11 magrittr_1.5 usethis_1.6.0 devtools_2.3.0 pkgload_1.0.2 R6_2.4.1
[7] rlang_0.4.6 fansi_0.4.1 tools_3.6.3 pkgbuild_1.0.8 packrat_0.5.0 sessioninfo_1.1.1
[13] cli_2.0.2 withr_2.2.0 ellipsis_0.3.0 remotes_2.1.1 assertthat_0.2.1 digest_0.6.25
[19] rprojroot_1.3-2 crayon_1.3.4 processx_3.4.2 callr_3.4.3 fs_1.4.1 ps_1.3.2
[25] curl_4.3 testthat_2.3.2 memoise_1.1.0 glue_1.4.1 compiler_3.6.3 desc_1.2.0
[31] backports_1.1.6 prettyunits_1.1.1

Getting error when creating an idea object

I followed the instructions in the tutorial to analyze my data. After running the CreateiDEAObject command, I got the following output.

  > idea <- iDEA.fit(idea,
+                  fit_noGS=FALSE,
+                  init_beta=NULL, 
+                  init_tau=c(-2,0.5),
+                  min_degene=5,
+                  em_iter=15,
+                  mcmc_iter=1000, 
+                  fit.tol=1e-5,
+                  modelVariant = F,
+                  verbose=TRUE)
## ===== iDEA INPUT SUMMARY ==== ##
## number of annotations:  100 
## number of genes:  1384 
## number of cores:  1 
## fitting the model with gene sets information...

However, looking at the result in idea@gsea, all 100 annotations have p-values > 0.5. How can I test more annotations and find the most significant ones for my gene set?

Thank you!

iDEA on SEURAT object

Hi, can anyone help with how to use iDEA with a Seurat object?
Thank you
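
A minimal sketch of one way to go from a Seurat marker table to the two-column summary table (beta, beta_var) that CreateiDEAObject is given in other issues on this page. The helper name `markers_to_summary` is hypothetical, and the mock data frame only stands in for real FindMarkers() or FindAllMarkers() output; the column names `p_val` and `avg_log2FC` are taken from the code quoted elsewhere on this page and may differ between Seurat versions.

```r
# Hypothetical helper: turn a Seurat-style marker table into an iDEA summary table.
# Assumes columns `p_val` and `avg_log2FC`.
markers_to_summary <- function(markers, genes = rownames(markers)) {
  zscore  <- qnorm(markers$p_val / 2, lower.tail = FALSE)  # p-value -> z-score
  se_beta <- abs(markers$avg_log2FC / zscore)              # approximate standard error
  out <- data.frame(beta = markers$avg_log2FC, beta_var = se_beta^2)
  rownames(out) <- genes
  # Drop genes whose variance is not a finite positive number (p-values of 0 or 1).
  out[is.finite(out$beta_var) & out$beta_var > 0, , drop = FALSE]
}

# Mock table standing in for FindMarkers() output (values are made up).
markers <- data.frame(
  p_val      = c(1e-8, 0.03, 1, 0),
  avg_log2FC = c(1.5, -0.4, 0.0, 2.0),
  row.names  = c("GeneA", "GeneB", "GeneC", "GeneD")
)
summary_data <- markers_to_summary(markers)
print(summary_data)
```

The resulting `summary_data` would then be passed to CreateiDEAObject together with a gene-by-gene-set matrix whose row names use the same gene identifiers.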

Change mouse gene names using BiomaRt

I am running iDEA on a mouse dataset. The gene names in mouseGeneSets are MGI IDs. How can I convert them to gene symbols such as ‘Egfr’ and ‘Zzz3’? You mentioned using biomaRt, but it is not clear to me how to do it. Can you help me with this issue?
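
A sketch of the renaming step, assuming a two-column mapping table from MGI ID to gene symbol is available. The helper `rename_rows` and the toy IDs are hypothetical. The commented biomaRt call shows roughly how such a table could be fetched; the attribute and filter names `mgi_id` and `mgi_symbol` are assumptions that should be checked against listAttributes(mart) and listFilters(mart).

```r
# Hypothetical helper: rename the rows of a gene-by-gene-set matrix using an
# ID -> symbol mapping table with columns `mgi_id` and `mgi_symbol`.
rename_rows <- function(mat, map) {
  symbol <- map$mgi_symbol[match(rownames(mat), map$mgi_id)]
  # Keep only rows with a unique, non-empty symbol.
  keep <- !is.na(symbol) & symbol != "" & !duplicated(symbol)
  mat  <- mat[keep, , drop = FALSE]
  rownames(mat) <- symbol[keep]
  mat
}

# Toy stand-ins (made-up IDs and symbols, not real MGI records).
toy_sets <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 3,
                   dimnames = list(c("MGI:0000001", "MGI:0000002", "MGI:0000003"),
                                   c("SET_A", "SET_B")))
toy_map <- data.frame(mgi_id     = c("MGI:0000001", "MGI:0000002"),
                      mgi_symbol = c("GeneA", "GeneB"),
                      stringsAsFactors = FALSE)
renamed <- rename_rows(toy_sets, toy_map)
print(renamed)

# With biomaRt (needs network access), the mapping could be fetched roughly as:
# library(biomaRt)
# mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# map  <- getBM(attributes = c("mgi_id", "mgi_symbol"), filters = "mgi_id",
#               values = rownames(mouseGeneSets), mart = mart)
# mouseGeneSets_symbols <- rename_rows(mouseGeneSets, map)
```

Rows without a mapped symbol are dropped rather than kept under their MGI ID, so that the row names stay in a single naming scheme.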

Using GSEA Gene Set Annotations

Hi, I am able to run the program without problems, but I have a question about its usage.

When creating an iDEA object, one needs to supply the gene annotations. Say I'd like to run three gene set collections from msigdbr (hallmark, immune, and all GO terms): should I run them separately or all together? If I run them together, one dataset takes 12 hours on 20 cores, and the results seem to differ from those I get when running the hallmark set alone.

Thanks,

Output explanations

Hello! Thank you for this package. I am wondering if you have any details or information about how to interpret the outputs from iDEA@gsea. More specifically, what are the annot_coef, annot_var, annot_var_louis, and sigma2_b? This information is probably in the manuscript but I think it would be helpful to have it in the documentation for the package.

| | annot_id | annot_coef | annot_var | annot_var_louis | sigma2_b | pvalue_louis | pvalue |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 1 | REACTOME_FATTY_ACID_METABOLISM | -1.750754 | 0.09029776 | 0.1001064 | 238.8785 | 3.140330e-08 | 5.669802e-09 |
| 2 | REACTOME_GLYCOLYSIS | -1.135788 | 0.13735320 | 0.1483048 | 239.1861 | 3.184886e-03 | 2.179394e-03 |
| 3 | REACTOME_INTERFERON_SIGNALING | 1.775729 | 0.07376514 | 0.0796045 | 239.3457 | 3.099353e-10 | 6.230865e-11 |

Thanks,
Joselynn
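
As a partial answer from the numbers alone: the three rows above appear consistent with pvalue being a two-sided normal (Wald-type) test of annot_coef using annot_var, and pvalue_louis being the same test using annot_var_louis. The sketch below recomputes both from the posted values; it is a reading of the table, not a statement of how iDEA computes them internally, and it says nothing about sigma2_b.

```r
# Values copied from the table in this issue.
gsea <- data.frame(
  annot_id        = c("REACTOME_FATTY_ACID_METABOLISM", "REACTOME_GLYCOLYSIS",
                      "REACTOME_INTERFERON_SIGNALING"),
  annot_coef      = c(-1.750754, -1.135788, 1.775729),
  annot_var       = c(0.09029776, 0.13735320, 0.07376514),
  annot_var_louis = c(0.1001064, 0.1483048, 0.0796045),
  pvalue_louis    = c(3.140330e-08, 3.184886e-03, 3.099353e-10),
  pvalue          = c(5.669802e-09, 2.179394e-03, 6.230865e-11),
  stringsAsFactors = FALSE
)

# Two-sided p-value for coef / sqrt(var) under a standard normal reference.
wald_p <- function(coef, var) 2 * pnorm(-abs(coef / sqrt(var)))

gsea$p_check       <- wald_p(gsea$annot_coef, gsea$annot_var)
gsea$p_louis_check <- wald_p(gsea$annot_coef, gsea$annot_var_louis)

print(gsea[, c("annot_id", "pvalue", "p_check", "pvalue_louis", "p_louis_check")])
```

Under this reading, the sign of annot_coef gives the direction of the association, and the larger annot_var_louis gives the slightly more conservative pvalue_louis.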

Error in EMMCMCStepSummary

Hi, I encountered an error when running iDEA.fit.
Here is my code:

markers = FindAllMarkers(seurat_obj, test.use = 'DESeq2')
x = subset(markers, cluster == 4)
pvalue <- x$p_val#### the pvalue column
zscore <- qnorm(pvalue/2.0, lower.tail=FALSE) #### convert the pvalue to z-score
beta <- x$avg_log2FC## effect size
se_beta <- abs(beta/zscore) ## to approximate the standard error of beta
beta_var = se_beta^2  ### square
summary = data.frame(beta = beta,beta_var = beta_var)
rownames(summary) = x$gene ### or the gene id column in the res_DE results
library(iDEA)
data(humanGeneSets)
dim(humanGeneSets)
x = read.gmt('/mnt/Data/weiyu/ref/GSEA/c2.all.v7.5.1.symbols.gmt')
Gene_set = zwy_matrix(unique(unlist(x)), names(x))
Gene_set[is.na(Gene_set)] = 0
for (i in names(x)) {
    Gene_set[x[[i]], i] = 1
}
idea <- CreateiDEAObject(summary, Gene_set, max_var_beta = 100, min_precent_annot = 0.0025, num_core=10)
idea <- iDEA.fit(idea, fit_noGS=FALSE, init_beta=NULL, init_tau=c(-2,0.5), min_degene=5, em_iter=15, mcmc_iter=1000, fit.tol=1e-5, modelVariant = F, verbose=TRUE)

And the error is:

Error in EMMCMCStepSummary(object@summary[, 1], object@summary[, 2], as.matrix(Annot),  :
  pinv(): svd failed

Thanks!
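
The `zwy_matrix` helper in the code above is not defined in the post. As a stand-in, here is a base-R sketch of building the same kind of genes-by-gene-sets 0/1 matrix from a named list of gene sets. The toy gene sets are hypothetical, and the sketch is not claimed to resolve the `pinv(): svd failed` error.

```r
# Hypothetical gene sets as a named list of gene symbols (toy data).
gene_sets <- list(
  SET_A = c("GeneA", "GeneB"),
  SET_B = c("GeneB", "GeneC"),
  SET_C = character(0)  # an empty set, to show the filtering step
)

genes <- sort(unique(unlist(gene_sets)))

# One row per gene, one column per gene set; 1 means the gene is in the set.
annot <- vapply(gene_sets,
                function(s) as.integer(genes %in% s),
                integer(length(genes)))
rownames(annot) <- genes

# Drop gene sets that contain none of the genes.
annot <- annot[, colSums(annot) > 0, drop = FALSE]
print(annot)
```

Building the matrix this way avoids the fill-with-NA-then-overwrite loop and leaves no NA entries to clean up.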

Time to run extremely high

I'm attempting to run your tutorial on a human dataset with a gene set of only ~3500 genes on an HPC, with a fresh R installation and a fresh iDEA installation. Any idea why I'm getting an ETA of multiple days for iDEA.fit? I am following your tutorial to the letter.

Order of running iDEA.BMA() and iDEA.fit(modelVariant = T) after running iDEA.fit(modelVariant = F)?

Hi there,

Thank you very much for the tool.

I was hoping to get clarification regarding the order in which one should run iDEA.BMA() and the variant iDEA.fit model.

After running iDEA.fit (modelVariant = F) followed by iDEA.louis(), do I need to create a new idea object to run iDEA.BMA()?

Similarly, do I need a new object if I want to compare the "original" model and the variant model by running iDEA.fit(modelVariant = F)?

Thank you for your help!

pvalue_louis from idea@gsea

I have a question about pvalue_louis, shown below. When pvalue = 2.12E-28, pvalue_louis = 7.20E-26 (second row from the bottom), which makes sense to me. But when pvalue is 0, pvalue_louis is 0.99999XXX, which seems odd. Do you think this is right? Any comments will be greatly appreciated.
Best regards,
Tingfen

annot_id	pvalue_louis	pvalue
KEGG_CITRATE_CYCLE_TCA_CYCLE	0.99999838	0
GO_ER_TO_GOLGI_TRANSPORT_VESICLE_MEMBRANE	0.99999838	0
GO_NEGATIVE_REGULATION_OF_MUSCLE_CELL_DIFFERENTIATION	0.99999839	0
MARIADASON_REGULATED_BY_HISTONE_ACETYLATION_DN	0.9999984	0
NIKOLSKY_BREAST_CANCER_12Q13_Q21_AMPLICON	0.9999984	0
FONTAINE_THYROID_TUMOR_UNCERTAIN_MALIGNANCY_UP	0.9999984	0
GO_SMALL_SUBUNIT_PROCESSOME	0.99999841	0
DUTTA_APOPTOSIS_VIA_NFKB	0.99999842	0
GO_RESPONSE_TO_ZINC_ION	7.20E-26	2.12E-28
GO_CELLULAR_RESPONSE_TO_INORGANIC_SUBSTANCE	9.86E-21	2.63E-23
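
One possible reason (an assumption, not confirmed for iDEA) that a p-value prints as exactly 0 is floating-point underflow: a two-sided normal p-value for a very large test statistic is smaller than the smallest representable double. The sketch below shows this with made-up z-scores and how a log-scale p-value stays finite. It does not by itself explain why pvalue_louis is close to 1 in the same rows.

```r
# Made-up z-scores of increasing size.
z <- c(10, 30, 40)

# Ordinary two-sided p-value: underflows to 0 for the largest z.
p <- 2 * pnorm(-abs(z))

# The same quantity on the natural-log scale stays finite.
log_p <- log(2) + pnorm(-abs(z), log.p = TRUE)

print(data.frame(z, p, log10_p = log_p / log(10)))
```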

Does not appear to be an R package (no DESCRIPTION)

When I ran

devtools::install_github('xzhoulab/iDEA')

it reported:

Downloading GitHub repo xzhoulab/iDEA@HEAD
xzhoulab-iDEA-8233443/docs/MethodPipline.png: truncated gzip input
tar.exe: Error exit delayed from previous errors.
Error: Failed to install 'iDEA' from GitHub:
Does not appear to be an R package (no DESCRIPTION)
In addition: Warning messages:
1: In utils::untar(tarfile, ...) :
‘tar.exe -xf "C:\Users\favid\AppData\Local\Temp\RtmpKGW9I7\file11fc3b125dd5.tar.gz" -C "C:/Users/favid/AppData/Local/Temp/RtmpKGW9I7/remotes11fc3d524f2e"’ returned error code 1
2: In system(cmd, intern = TRUE) :
running command 'tar.exe -tf "C:\Users\favid\AppData\Local\Temp\RtmpKGW9I7\file11fc3b125dd5.tar.gz"' had status 1

Could you help me?
Best wishes

EMMCMCStepSummary error

Hi there, I'm encountering the following error when executing the iDEA.fit.

Error in EMMCMCStepSummary(object@summary[, 1], object@summary[, 2], as.matrix(Annot), : pinv(): svd failed

Here is my code.

DEG <- read.table("my_data.csv")
pvalue <- DEG$p_val
zscore <- qnorm(pvalue/2.0, lower.tail=FALSE) #### convert the pvalue to z-score
log2FC <- DEG$avg_log2FC ## effect size
se_beta <- abs(log2FC/zscore)
lfcSE2 = se_beta^2
summary = data.frame(beta = log2FC,beta_var = lfcSE2)
rownames(summary) = rownames(DEG)
> head(summary)
                 beta    beta_var
MGI:1918911 0.2594995 0.008204673
MGI:1913300 0.6050112 0.014540803
MGI:1915609 0.4179491 0.013050508
MGI:1916013 1.0654890 0.053041579
MGI:2152337 0.2732631 0.007393953
MGI:1913456 0.3679529 0.013129402
> nrow(summary)
[1] 2091


data(mouseGeneSets)
> mouseGeneSets[1:3,1:3]
           GO:0000002 GO:0000003 GO:0000009
MGI:101757          0          0          0
MGI:101758          0          0          0
MGI:101759          0          0          0

idea <- CreateiDEAObject(summary, mouseGeneSets, max_var_beta = 100, min_precent_annot = 0.0025, num_core=10)

idea <- iDEA.fit(idea,
                 fit_noGS=FALSE,
	         init_beta=NULL, 
	         init_tau=c(-2,0.5),
	         min_degene=5,
	         em_iter=15,
	         mcmc_iter=1000, 
	         fit.tol=1e-5,
                 modelVariant = F,
	         verbose=TRUE)

However, when I subset to only 4 gene sets when creating the iDEA object, there is no error.
(I referenced this page. [https://github.com//issues/21])

idea <- CreateiDEAObject(summary, mouseGeneSets[,c(1,2,13,14)], max_var_beta = 100, min_precent_annot = 0.0025, num_core = 10)
> head(idea@annotation[[1]])
[1]  426  549  552  556  865 1342
> head(idea@annotation[[2]])
[1] 28 33 37 38 57 58
> head(idea@annotation[[3]])
[1]   26   95  191  548  930 1037
> head(idea@annotation[[4]])
[1]  85  90 264 411 586 623


idea <- iDEA.fit(idea,
                 fit_noGS=FALSE,
	         init_beta=NULL, 
	         init_tau=c(-2,0.5),
	         min_degene=5,
	         em_iter=15,
	         mcmc_iter=1000, 
	         fit.tol=1e-5,
                 modelVariant = F,
	         verbose=TRUE)

## ===== iDEA INPUT SUMMARY ==== ##
## number of annotations:  4 
## number of genes:  2091 
## number of cores:  10 
## fitting the model with gene sets information... 
  |======================================================================================================================================================================| 100%, Elapsed 00:11

For information, here is my sessionInfo.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/Soryung/.conda/envs/ENS/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.7    biomaRt_2.50.3 iDEA_1.0.1    

loaded via a namespace (and not attached):
 [1] KEGGREST_1.34.0        progress_1.2.2         tidyselect_1.1.1      
 [4] purrr_0.3.4            generics_0.1.1         vctrs_0.3.8           
 [7] doSNOW_1.0.20          snow_0.4-4             stats4_4.1.1          
[10] BiocFileCache_2.2.1    utf8_1.2.2             blob_1.2.2            
[13] XML_3.99-0.8           rlang_0.4.12           pillar_1.6.4          
[16] withr_2.4.2            glue_1.5.0             DBI_1.1.1             
[19] rappdirs_0.3.3         BiocGenerics_0.40.0    bit64_4.0.5           
[22] dbplyr_2.1.1           GenomeInfoDbData_1.2.7 foreach_1.5.2         
[25] lifecycle_1.0.1        stringr_1.4.0          zlibbioc_1.40.0       
[28] Biostrings_2.62.0      codetools_0.2-18       memoise_2.0.0         
[31] Biobase_2.54.0         IRanges_2.28.0         fastmap_1.1.0         
[34] doParallel_1.0.17      GenomeInfoDb_1.30.0    curl_4.3.2            
[37] parallel_4.1.1         fansi_0.5.0            AnnotationDbi_1.56.2  
[40] pbmcapply_1.5.1        Rcpp_1.0.8.3           filelock_1.0.2        
[43] cachem_1.0.6           S4Vectors_0.32.2       XVector_0.34.0        
[46] bit_4.0.4              hms_1.1.1              png_0.1-7             
[49] digest_0.6.28          stringi_1.7.5          tools_4.1.1           
[52] bitops_1.0-7           magrittr_2.0.1         tibble_3.1.6          
[55] RCurl_1.98-1.5         RSQLite_2.2.10         crayon_1.4.2          
[58] pkgconfig_2.0.3        ellipsis_0.3.2         xml2_1.3.2            
[61] prettyunits_1.1.1      assertthat_0.2.1       httr_1.4.2            
[64] iterators_1.0.14       R6_2.5.1               compiler_4.1.1

Any idea what's causing this error?
Thanks!
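
A hedged diagnostic sketch for inputs like the ones above. The cause of `pinv(): svd failed` is not established in this thread, so the checks below are only plausible things to rule out before fitting: non-finite or non-positive variances, and gene sets that are empty or identical once restricted to the genes in the summary table. The helper name `check_idea_inputs` and the toy data are hypothetical.

```r
# Hypothetical diagnostic helper. `summary_data` has columns beta and beta_var;
# `annot` is a genes x gene-sets 0/1 matrix; both use gene IDs as row names.
check_idea_inputs <- function(summary_data, annot) {
  shared <- intersect(rownames(summary_data), rownames(annot))
  a <- annot[shared, , drop = FALSE]
  list(
    n_shared_genes   = length(shared),
    n_bad_beta       = sum(!is.finite(summary_data$beta)),
    n_bad_beta_var   = sum(!is.finite(summary_data$beta_var) |
                             summary_data$beta_var <= 0),
    n_empty_sets     = sum(colSums(a) == 0),     # sets with no shared genes
    n_duplicate_sets = sum(duplicated(t(a)))     # columns identical to an earlier one
  )
}

# Toy inputs (made up) that trigger each check.
summary_data <- data.frame(beta     = c(0.3, 0.6, 0.4),
                           beta_var = c(0.008, 0, Inf),
                           row.names = c("g1", "g2", "g3"))
annot <- matrix(c(1, 0, 1,  1, 0, 1,  0, 0, 0), nrow = 3,
                dimnames = list(c("g1", "g2", "g4"), c("S1", "S2", "S3")))

checks <- check_idea_inputs(summary_data, annot)
str(checks)
```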

Encountering Error when Correcting p-values with iDEA.louis() method

Hi there,
I'm encountering the following error when executing the Louis method p-value correction method as demonstrated in your tutorial.

Error in if (!is.na(res$annot_coef[2])) { : argument is of length zero

In trying to diagnose what's wrong, I noticed that the de slot contains annotations of type NULL.

For the summary statistics, I calculated (using the example from your tutorial) the beta and beta_var values from the p-value and log2FC returned by Seurat's FindConservedMarkers function.

> head(cluster0_DEstats)
             beta    beta_var
ABCA10  0.4089271 0.015240638
ABCA6   0.5550807 0.017470111
ABCA8   0.3439423 0.024959189
ABCA9   0.4771893 0.013693380
ABLIM1 -0.5137350 0.006145706
ACACB  -0.6458339 0.019331025

For the gene specific annotations, I selected the rows from within the humanGeneSets data included with iDEA that corresponded with the genes returned from the DE analysis.

> cluster0_annotations[1:3,1:3]
       NAKAMURA_CANCER_MICROENVIRONMENT_DN WEST_ADRENOCORTICAL_TUMOR_MARKERS_UP WINTER_HYPOXIA_UP
ABCA10                                   0                                    0                 0
ABCA6                                    0                                    0                 0
ABCA8                                    0                                    0                 0

This is the command I used to create the iDEA object:

cluster0_idea <- CreateiDEAObject(cluster0_DEstats, cluster0_annotations, max_var_beta = 100, min_precent_annot = 0.0025, num_core=1)

And this is the command I used to fit the model:

cluster0_idea <- iDEA.fit(cluster0_idea,
                 fit_noGS=FALSE,
	               init_beta=NULL, 
	               init_tau=c(-2,0.5),
	               min_degene=5,
	               em_iter=15,
	               mcmc_iter=1000, 
	               fit.tol=1e-5,
                       modelVariant = F,
	               verbose=TRUE)

For information, here is my session info:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /shared/centos7/r-project/4.0.2/lib64/R/lib/libRblas.so
LAPACK: /shared/centos7/r-project/4.0.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] devtools_2.4.3              usethis_2.1.5               forcats_0.5.1               stringr_1.4.0               purrr_0.3.4                
 [6] readr_2.1.2                 tidyr_1.2.0                 tibble_3.1.7                tidyverse_1.3.1             iDEA_1.0.1                 
[11] HGNChelper_0.8.1            ggsignif_0.6.3              metap_1.8                   DropletUtils_1.10.3         SingleCellExperiment_1.12.0
[16] SummarizedExperiment_1.20.0 Biobase_2.50.0              GenomicRanges_1.42.0        GenomeInfoDb_1.26.7         IRanges_2.28.0             
[21] S4Vectors_0.32.3            BiocGenerics_0.40.0         MatrixGenerics_1.6.0        matrixStats_0.62.0          glue_1.6.2                 
[26] patchwork_1.1.1             cowplot_1.1.1               sctransform_0.3.3           ggplot2_3.3.6               SeuratObject_4.0.4         
[31] Seurat_4.1.0                dplyr_1.0.8                

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                reticulate_1.24           R.utils_2.11.0            tidyselect_1.1.2          htmlwidgets_1.5.4        
  [6] grid_4.0.2                BiocParallel_1.24.1       Rtsne_0.15                munsell_0.5.0             codetools_0.2-16         
 [11] mutoss_0.1-12             ica_1.0-2                 future_1.24.0             miniUI_0.1.1.1            withr_2.5.0              
 [16] spatstat.random_2.1-0     colorspace_2.0-3          knitr_1.39                rstudioapi_0.13           ROCR_1.0-11              
 [21] tensor_1.5                pbmcapply_1.5.1           listenv_0.8.0             Rdpack_2.1.4              GenomeInfoDbData_1.2.4   
 [26] mnormt_2.0.2              polyclip_1.10-0           rhdf5_2.34.0              rprojroot_2.0.2           parallelly_1.30.0        
 [31] vctrs_0.4.1               generics_0.1.2            TH.data_1.1-0             xfun_0.30                 R6_2.5.1                 
 [36] doParallel_1.0.17         locfit_1.5-9.4            cachem_1.0.6              bitops_1.0-7              rhdf5filters_1.2.1       
 [41] spatstat.utils_2.3-0      DelayedArray_0.20.0       assertthat_0.2.1          promises_1.2.0.1          scales_1.2.0             
 [46] multcomp_1.4-18           gtable_0.3.0              beachmat_2.6.4            globals_0.14.0            processx_3.5.3           
 [51] goftest_1.2-3             sandwich_3.0-1            rlang_1.0.2               splines_4.0.2             lazyeval_0.2.2           
 [56] spatstat.geom_2.3-2       broom_0.8.0               modelr_0.1.8              reshape2_1.4.4            abind_1.4-5              
 [61] backports_1.4.1           httpuv_1.6.5              tools_4.0.2               ellipsis_0.3.2            spatstat.core_2.4-0      
 [66] RColorBrewer_1.1-3        sessioninfo_1.2.2         ggridges_0.5.3            TFisher_0.2.0             Rcpp_1.0.8.3             
 [71] plyr_1.8.6                sparseMatrixStats_1.6.0   zlibbioc_1.36.0           RCurl_1.98-1.6            prettyunits_1.1.1        
 [76] ps_1.7.0                  rpart_4.1-15              deldir_1.0-6              pbapply_1.5-0             zoo_1.8-9                
 [81] haven_2.5.0               ggrepel_0.9.1             cluster_2.1.0             fs_1.5.2                  magrittr_2.0.3           
 [86] data.table_1.14.2         scattermore_0.8           openxlsx_4.2.5            reprex_2.0.1              lmtest_0.9-39            
 [91] RANN_2.6.1                tmvnsim_1.0-2             mvtnorm_1.1-3             fitdistrplus_1.1-6        pkgload_1.2.4            
 [96] hms_1.1.1                 mime_0.12                 xtable_1.8-4              readxl_1.3.1              gridExtra_2.3            
[101] testthat_3.1.4            compiler_4.0.2            KernSmooth_2.23-17        crayon_1.5.1              R.oo_1.24.0              
[106] htmltools_0.5.2           tzdb_0.3.0                mgcv_1.8-31               later_1.3.0               snow_0.4-4               
[111] lubridate_1.8.0           DBI_1.1.2                 dbplyr_2.1.1              MASS_7.3-51.6             Matrix_1.4-0             
[116] brio_1.1.3                cli_3.3.0                 R.methodsS3_1.8.1         rbibutils_2.2.7           parallel_4.0.2           
[121] qqconf_1.2.1              igraph_1.2.11             pkgconfig_2.0.3           sn_2.0.1                  numDeriv_2016.8-1.1      
[126] plotly_4.10.0             scuttle_1.0.4             spatstat.sparse_2.1-0     xml2_1.3.3                foreach_1.5.2            
[131] dqrng_0.3.0               multtest_2.50.0           XVector_0.30.0            rvest_1.0.2               callr_3.7.0              
[136] digest_0.6.29             RcppAnnoy_0.0.19          spatstat.data_2.1-2       cellranger_1.1.0          leiden_0.3.9             
[141] uwot_0.1.11               edgeR_3.32.1              DelayedMatrixStats_1.16.0 shiny_1.7.1               lifecycle_1.0.1          
[146] nlme_3.1-148              jsonlite_1.8.0            Rhdf5lib_1.12.1           desc_1.4.1                viridisLite_0.4.0        
[151] limma_3.46.0              fansi_1.0.3               pillar_1.7.0              lattice_0.20-41           pkgbuild_1.3.1           
[156] fastmap_1.1.0             httr_1.4.3                plotrix_3.8-2             survival_3.1-12           remotes_2.4.2            
[161] zip_2.2.0                 png_0.1-7                 iterators_1.0.14          stringi_1.7.6             HDF5Array_1.18.1         
[166] memoise_2.0.1             doSNOW_1.0.20             mathjaxr_1.6-0            irlba_2.3.5               future.apply_1.8.1

Could I have some assistance?

The only thing I can think of that may be problematic is that I have provided a fairly small set of genes, 68 in total for this group, as that was the list returned by Seurat's FindConservedMarkers() method. The only other thing I can think of is that when I calculated beta and beta_var, my input data was log2FC, not logFC. Should I adjust the calculation accordingly to address this?

Thank you in advance, I am excited to use this tool as I analyze these data!
Best,
Christian
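
On the log2FC versus natural-log question, a small sketch of the arithmetic. Converting log2 fold changes to natural-log fold changes multiplies beta by log(2), and because se_beta is derived as |beta / z|, the variance scales by log(2)^2 while the z-statistic beta / se_beta is unchanged. Which base iDEA expects is not stated on this page; the p-values below are hypothetical.

```r
# Two beta values from the post, with hypothetical p-values.
log2fc <- c(0.4089271, -0.5137350)
pvalue <- c(1e-3, 1e-6)
z      <- qnorm(pvalue / 2, lower.tail = FALSE)

# Summary statistics on the log2 scale.
beta_log2 <- log2fc
var_log2  <- (beta_log2 / z)^2

# The same statistics on the natural-log scale.
beta_ln <- log2fc * log(2)
var_ln  <- (beta_ln / z)^2

# The standardized effect is identical on both scales.
print(cbind(z_log2 = beta_log2 / sqrt(var_log2),
            z_ln   = beta_ln   / sqrt(var_ln)))
```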

bulk RNAseq data

Hi iDEA team, thank you for the great package. Is this method applicable to bulk RNA-seq FPKM data?

Tutorial error

I'm trying to run the code provided in the tutorial and get the following error:

idea <- iDEA.louis(idea)
Error in makePSOCKcluster(names = spec, ...) :
Cluster setup failed. 1 worker of 1 failed to connect.

I've tried to change the num_core but it didn't solve the problem.

python version of iDEA

Dear authors,

Thanks for contributing this wonderful tool. I am wondering if you will develop a python version of it? Thanks

can not install iDEA

Hello,
Thank you for developing this nice tool.
I am interested in using it and tried to install it following the instructions.
But I am getting an error when installing iDEA. The errors are below.

library(devtools)
devtools::install_github('xzhoulab/iDEA')
Downloading GitHub repo xzhoulab/iDEA@HEAD
Error: Failed to install 'iDEA' from GitHub:
Could not find tools necessary to compile a package
Call pkgbuild::check_build_tools(debug = TRUE) to diagnose the problem.
pkgbuild::check_build_tools(debug = TRUE)
Error: Could not find tools necessary to compile a package
Call pkgbuild::check_build_tools(debug = TRUE) to diagnose the problem.

I would appreciate any advice you could give me. Thank you in advance.

session info
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] scales_1.1.1 sf_0.9-6 leidenbase_0.1.2 ggpubr_0.4.0
[5] ggsci_2.9 mygene_1.22.0 GenomicFeatures_1.38.2 AnnotationDbi_1.48.0
[9] garnett_0.2.16 ggplot2_3.3.2.9000 magrittr_1.5 dplyr_1.0.2
[13] pheatmap_1.0.12 reticulate_1.18 tibble_3.0.4 stringr_1.4.0
[17] Seurat_3.2.2 viridis_0.5.1 viridisLite_0.3.0 VGAM_1.1-4
[21] doSNOW_1.0.19 snow_0.4-3 doParallel_1.0.16 iterators_1.0.13
[25] foreach_1.5.1 RcppArmadillo_0.10.1.2.0 Rcpp_1.0.5 pkgconfig_2.0.3
[29] devtools_2.3.2 usethis_1.6.3 pkgbuild_1.2.0 monocle3_0.2.3.0
[33] SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1 DelayedArray_0.12.3 BiocParallel_1.20.1
[37] matrixStats_0.57.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.1 IRanges_2.20.2
[41] S4Vectors_0.24.4 Biobase_2.46.0 BiocGenerics_0.32.0

loaded via a namespace (and not attached):
[1] proto_1.0.0 tidyselect_1.1.0 RSQLite_2.2.1 htmlwidgets_1.5.2 grid_3.6.3
[6] Rtsne_0.15 munsell_0.5.0 units_0.6-7 codetools_0.2-18 ica_1.0-2
[11] chron_2.3-56 future_1.20.1 miniUI_0.1.1.1 withr_2.3.0 colorspace_1.4-1
[16] knitr_1.30 rstudioapi_0.11 ROCR_1.0-11 ggsignif_0.6.0 tensor_1.5
[21] listenv_0.8.0 GenomeInfoDbData_1.2.2 polyclip_1.10-0 bit64_4.0.5 rprojroot_1.3-2
[26] parallelly_1.21.0 vctrs_0.3.4 generics_0.1.0 xfun_0.19 BiocFileCache_1.10.2
[31] R6_2.5.0 rsvd_1.0.3 bitops_1.0-6 spatstat.utils_1.17-0 assertthat_0.2.1
[36] promises_1.1.1 nnet_7.3-14 gtable_0.3.0 globals_0.13.1 processx_3.4.4
[41] goftest_1.2-2 rlang_0.4.8 rstatix_0.6.0 rtracklayer_1.46.0 lazyeval_0.2.2
[46] broom_0.7.2 checkmate_2.0.0 reshape2_1.4.4 abind_1.4-5 backports_1.2.0
[51] httpuv_1.5.4 Hmisc_4.4-1 tools_3.6.3 ellipsis_0.3.1 RColorBrewer_1.1-2
[56] sessioninfo_1.1.1 ggridges_0.5.2 gsubfn_0.7 plyr_1.8.6 base64enc_0.1-3
[61] progress_1.2.2 zlibbioc_1.32.0 classInt_0.4-3 purrr_0.3.4 RCurl_1.98-1.2
[66] ps_1.4.0 prettyunits_1.1.1 sqldf_0.4-11 rpart_4.1-15 openssl_1.4.3
[71] deldir_0.2-3 pbapply_1.4-3 cowplot_1.1.0 zoo_1.8-8 haven_2.3.1
[76] ggrepel_0.8.2 cluster_2.1.0 fs_1.5.0 data.table_1.13.2 openxlsx_4.2.3
[81] lmtest_0.9-38 RANN_2.6.1 fitdistrplus_1.1-1 pkgload_1.1.0 hms_0.5.3
[86] patchwork_1.1.0 mime_0.9 xtable_1.8-4 XML_3.99-0.3 rio_0.5.16
[91] jpeg_0.1-8.1 readxl_1.3.1 gridExtra_2.3 testthat_3.0.0 compiler_3.6.3
[96] biomaRt_2.42.1 KernSmooth_2.23-18 crayon_1.3.4 htmltools_0.5.0 mgcv_1.8-33
[101] later_1.1.0.1 Formula_1.2-4 tidyr_1.1.2 DBI_1.1.0 dbplyr_2.0.0
[106] MASS_7.3-53 rappdirs_0.3.1 car_3.0-10 Matrix_1.2-18 cli_2.1.0
[111] igraph_1.2.6 forcats_0.5.0 GenomicAlignments_1.22.1 foreign_0.8-75 plotly_4.9.2.1
[116] XVector_0.26.0 callr_3.5.1 digest_0.6.27 sctransform_0.3.1 RcppAnnoy_0.0.16
[121] spatstat.data_1.4-3 Biostrings_2.54.0 cellranger_1.1.0 leiden_0.3.5 htmlTable_2.1.0
[126] uwot_0.1.8 curl_4.3 shiny_1.5.0 Rsamtools_2.2.3 lifecycle_0.2.0
[131] nlme_3.1-150 jsonlite_1.7.1 carData_3.0-4 desc_1.2.0 askpass_1.1
[136] fansi_0.4.1 pillar_1.4.6 lattice_0.20-41 fastmap_1.0.1 httr_1.4.2
[141] survival_3.2-7 glue_1.4.2 remotes_2.2.0 zip_2.1.1 spatstat_1.64-1
[146] png_0.1-7 bit_4.0.4 class_7.3-17 stringi_1.5.3 blob_1.2.1
[151] latticeExtra_0.6-29 memoise_1.1.0 e1071_1.7-4 irlba_2.3.3 future.apply_1.6.0

NES from GSEA

Hi,
What is the iDEA equivalent of the NES score from GSEA? How can I get the enrichment score in iDEA?
Thanks

How to get SE for LogFC

Hi,
This may not be specific to iDEA, but it is required for the input: how can I get the SE for log2FC? Packages like edgeR give only logFC and p-value. Is there any way to calculate or retrieve it for this purpose?
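
For reference, the approximation used in code quoted in other issues on this page can be written out as follows, where $p$ is the two-sided p-value, $\hat\beta$ the log fold change, and $\Phi^{-1}$ the standard normal quantile function. It assumes the test statistic is approximately normal, which is only an approximation for tools such as edgeR whose p-values come from other tests.

```latex
z = \Phi^{-1}\!\left(1 - \frac{p}{2}\right), \qquad
\widehat{\mathrm{SE}}(\hat\beta) \approx \frac{|\hat\beta|}{z}, \qquad
\widehat{\mathrm{Var}}(\hat\beta) \approx \left(\frac{\hat\beta}{z}\right)^{2}
```

In R, the first expression corresponds to qnorm(pvalue / 2, lower.tail = FALSE). The approximation breaks down at p = 0 (z is infinite) and p = 1 (z is 0).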

Not able to increase the number of cores to use

I am trying to use iDEA with my dataset, but unfortunately fitting the model is extremely slow.

When I create the iDEA object, I set the number of cores to 10:

idea <- CreateiDEAObject(iDEA_input, mouseGeneSets_reduced, max_var_beta = 100, min_precent_annot = 0.0025, num_core=10)

However, when I check inside the iDEA object, what I see is:

idea@num_core

##output

[1] 1

As a matter of fact, when I try to fit the model, what I get is:

idea <- iDEA.fit(idea,
                 fit_noGS=FALSE,
	         init_beta=NULL, 
	         init_tau=c(-2,0.5),
	         min_degene=5,
	         em_iter=15,
	         mcmc_iter=1000, 
	         fit.tol=1e-5,
                 modelVariant = F,
	         verbose=TRUE)

## ===== iDEA INPUT SUMMARY ==== ##
## number of annotations:  3767 
## number of genes:  11777 
## number of cores:  1 
## fitting the model with gene sets information...

I guess this is why the step is so slow. Any ideas how I could fix this?

Thank you very much in advance!

Unable to install iDEA

Hi, I'm having trouble installing iDEA. I'm using macOS Catalina, which seems to cause a lot of trouble; I've tried several solutions and none of them helped.
The error code is attached.
Do you have any solution for me?

devtools::install_github('xzhoulab/iDEA', dependencies = T)
Downloading GitHub repo xzhoulab/iDEA@HEAD
─ installing the package to process help pagesw54l5fj09l33__wdwmrtqg80000gn/T/Rtmpi5wAgA/remotes15edd6693abb1/xzhoulab-iDEA-8233443/DESCRIPTION’ ...
-----------------------------------
─ installing source package ‘iDEA’ ...
** using staged installation
** libs
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppArmadillo/include' -I/usr/local/include -fPIC -Wall -g -O2 -Wall -pedantic -c RcppExports.cpp -o RcppExports.o
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppArmadillo/include/RcppArmadillo.h:34:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include/Rcpp.h:57:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include/Rcpp/DataFrame.h:136:18: warning: unused variable 'data' [-Wunused-variable]
SEXP data = Parent::get__();
^
1 warning generated.
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppArmadillo/include' -I/usr/local/include -fPIC -Wall -g -O2 -Wall -pedantic -c iDEASummary.cpp -o iDEASummary.o
In file included from iDEASummary.cpp:3:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppArmadillo/include/RcppArmadillo.h:34:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include/Rcpp.h:57:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include/Rcpp/DataFrame.h:136:18: warning: unused variable 'data' [-Wunused-variable]
SEXP data = Parent::get__();
^
iDEASummary.cpp:70:12: warning: unused variable 'a_0beta' [-Wunused-variable]
double a_0beta = 3.0;
^
---------------------------------------
ERROR: package installation failed
Error: Failed to install 'iDEA' from GitHub:
System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines):
E> clang++ -mmacosx-version-min=10.13 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o iDEA.so RcppExports.o iDEASummary.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0 -L/usr/local/gfortran/lib -lgfortran -lquadmath -lm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
E> ld: warning: directory not found for option '-L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin18/8.2.0'
E> ld: warning: directory not found for option '-L/usr/local/gfortran/lib'
E> ld: library not found for -lgfortran
E> clang: error: linker command failed with exit code 1 (use -v to see in

How can we draw the result into a figure?

After fitting the iDEA model to our data, the GSEA result is stored in iDEA@gsea. Can we plot it as a figure rather than only writing it to a table? I tried other plotting functions, but the structure of the object is different. I also downloaded the code from https://github.com/xzhoulab/iDEA-Analysis, but it does not show how to draw the GSEA result as a figure. Could you give us an example of how to turn the result into a figure? Thank you! @sqsun @YingMa1993
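For illustration only, here is a minimal base-R sketch of one way such a table could be turned into a bar plot. It assumes the GSEA table behaves like a data frame with one row per gene set; the column names `annot_id` and `pvalue` and all values below are assumptions made for this toy example, so check `names(idea@gsea)` on a real object first.

```r
# Toy stand-in for the GSEA table. Column names and values are assumptions,
# not taken from a real iDEA object.
gsea <- data.frame(
  annot_id = c("GO:0000002", "GO:0000003", "GO:0000009"),
  pvalue = c(0.20, 0.001, 0.03),
  stringsAsFactors = FALSE
)

# Rank gene sets by -log10(p-value) so the strongest signal is drawn last (top bar).
gsea$neg_log10_p <- -log10(gsea$pvalue)
gsea <- gsea[order(gsea$neg_log10_p), ]

# Write a horizontal bar plot to a PDF in the session's temporary directory.
out_file <- file.path(tempdir(), "gsea_barplot.pdf")
pdf(out_file, width = 6, height = 4)
par(mar = c(4, 8, 1, 1))  # widen the left margin for the gene set labels
barplot(gsea$neg_log10_p, names.arg = gsea$annot_id,
        horiz = TRUE, las = 1, xlab = "-log10(p-value)")
dev.off()
```

With thousands of gene sets it would probably make sense to keep only the top-ranked rows before plotting.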

Unable to install Package

Hello,

I was hoping to use your tool to analyze some scSeq datasets I generated.

During installation, however, I am getting this error:

─ installing source package ‘iDEA’ ...
** using staged installation
** libs
clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppArmadillo/include" -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fopenmp -fPIC -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
clang: error: unsupported option '-fopenmp'
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘iDEA’
─ removing ‘/private/var/folders/6n/zbwc61bx7_d3dpjbx4lkn_9ctqxgjh/T/RtmpoSxrNP/Rinst8e635314891/iDEA’
-----------------------------------
ERROR: package installation failed
Error: Failed to install 'iDEA' from GitHub:
System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines):
E> * installing source package ‘iDEA’ ...
E> ** using staged installation
E> ** libs
E> clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppArmadillo/include" -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fopenmp -fPIC -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
E> clang: error: unsupported option '-fopenmp'
E> make: *** [RcppExports.o] Error 1
E> ERROR: compilation failed for package ‘iDEA’
E> * removing ‘/private/var/folders/6n/zbwc61bx7_d3dpjbx4lkn_9ctqxgjh/T/RtmpoSxrNP/Rinst8e635314891/iDEA’
E> -----------------------------------
E> ERROR: package installation failed
In addition: Warning message:
replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’

Any idea why? I have already tried reinstalling gcc through Homebrew.

Session info below
─ Session info ───────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.1 (2019-07-05)
os macOS Mojave 10.14.6
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Chicago
date 2020-10-13

─ Packages ───────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
AnnotationDbi 1.48.0 2019-10-29 [1] Bioconductor
ape 5.4 2020-06-03 [1] CRAN (R 3.6.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.8 2020-06-17 [1] CRAN (R 3.6.2)
Biobase 2.46.0 2019-10-29 [1] Bioconductor
BiocGenerics 0.32.0 2019-10-29 [1] Bioconductor
BiocParallel 1.20.1 2019-12-21 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 3.6.2)
bit64 4.0.5 2020-08-30 [1] CRAN (R 3.6.2)
blob 1.2.1 2020-01-20 [1] CRAN (R 3.6.0)
callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.2)
cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
cluster 2.1.0 2019-06-19 [1] CRAN (R 3.6.1)
codetools 0.2-16 2018-12-24 [1] CRAN (R 3.6.1)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
cowplot 1.0.0 2019-07-11 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
curl 4.3 2019-12-02 [1] CRAN (R 3.6.0)
data.table 1.12.8 2019-12-09 [1] CRAN (R 3.6.0)
DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.3.0 2020-04-10 [1] CRAN (R 3.6.2)
digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
DO.db 2.9 2020-05-05 [1] Bioconductor
DOSE * 3.10.2 2019-06-24 [1] Bioconductor
dplyr 1.0.0 2020-05-29 [1] CRAN (R 3.6.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 3.6.2)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
fastmatch 1.1-0 2017-01-28 [1] CRAN (R 3.6.0)
fgsea 1.10.1 2019-08-21 [1] Bioconductor
fitdistrplus 1.1-1 2020-05-19 [1] CRAN (R 3.6.2)
fs 1.4.2 2020-06-30 [1] CRAN (R 3.6.2)
future 1.18.0 2020-07-09 [1] CRAN (R 3.6.2)
future.apply 1.6.0 2020-07-01 [1] CRAN (R 3.6.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
ggplot2 3.3.2 2020-06-19 [1] CRAN (R 3.6.2)
ggrepel 0.8.2 2020-03-08 [1] CRAN (R 3.6.0)
ggridges 0.5.2 2020-01-12 [1] CRAN (R 3.6.0)
globals 0.12.5 2019-12-07 [1] CRAN (R 3.6.0)
glue 1.4.2 2020-08-27 [1] CRAN (R 3.6.2)
GO.db 3.8.2 2020-05-05 [1] Bioconductor
GOSemSim 2.10.0 2019-05-02 [1] Bioconductor
gridExtra 2.3 2017-09-09 [1] CRAN (R 3.6.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
htmltools 0.5.0 2020-06-16 [1] CRAN (R 3.6.2)
htmlwidgets 1.5.1 2019-10-08 [1] CRAN (R 3.6.0)
httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
ica 1.0-2 2018-05-24 [1] CRAN (R 3.6.0)
igraph 1.2.5 2020-03-19 [1] CRAN (R 3.6.0)
IRanges 2.20.2 2020-01-13 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 3.6.0)
jsonlite 1.7.0 2020-06-25 [1] CRAN (R 3.6.2)
KernSmooth 2.23-17 2020-04-26 [1] CRAN (R 3.6.2)
lattice 0.20-41 2020-04-02 [1] CRAN (R 3.6.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
leiden 0.3.3 2020-02-04 [1] CRAN (R 3.6.0)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.0)
listenv 0.8.0 2019-12-05 [1] CRAN (R 3.6.0)
lmtest 0.9-37 2019-04-30 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
MASS 7.3-51.6 2020-04-26 [1] CRAN (R 3.6.2)
Matrix 1.2-18 2019-11-27 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
nlme 3.1-148 2020-05-24 [1] CRAN (R 3.6.2)
patchwork 1.0.1 2020-06-22 [1] CRAN (R 3.6.2)
pbapply 1.4-2 2019-08-31 [1] CRAN (R 3.6.0)
pillar 1.4.6 2020-07-10 [1] CRAN (R 3.6.2)
pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 3.6.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
pkgload 1.1.0 2020-05-29 [1] CRAN (R 3.6.2)
plotly 4.9.2.1 2020-04-04 [1] CRAN (R 3.6.2)
plyr 1.8.6 2020-03-03 [1] CRAN (R 3.6.1)
png 0.1-7 2013-12-03 [1] CRAN (R 3.6.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.0)
processx 3.4.3 2020-07-05 [1] CRAN (R 3.6.2)
ps 1.3.3 2020-05-08 [1] CRAN (R 3.6.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.2)
qvalue 2.16.0 2019-05-02 [1] Bioconductor
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
RANN 2.6.1 2019-01-08 [1] CRAN (R 3.6.0)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.6.0)
Rcpp 1.0.5 2020-07-06 [1] CRAN (R 3.6.2)
RcppAnnoy 0.0.16 2020-03-08 [1] CRAN (R 3.6.0)
remotes 2.1.1 2020-02-15 [1] CRAN (R 3.6.0)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 3.6.2)
reticulate 1.16 2020-05-27 [1] CRAN (R 3.6.2)
rlang 0.4.7 2020-07-09 [1] CRAN (R 3.6.2)
ROCR 1.0-11 2020-05-02 [1] CRAN (R 3.6.2)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
RSQLite 2.2.0 2020-01-07 [1] CRAN (R 3.6.0)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.0)
rsvd 1.0.3 2020-02-17 [1] CRAN (R 3.6.0)
Rtsne 0.15 2018-11-10 [1] CRAN (R 3.6.0)
S4Vectors 0.24.4 2020-04-09 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 3.6.2)
sctransform 0.2.1 2019-12-17 [1] CRAN (R 3.6.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
Seurat 3.1.5 2020-04-16 [1] CRAN (R 3.6.2)
stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
survival 3.2-3 2020-06-13 [1] CRAN (R 3.6.2)
testthat 2.3.2 2020-03-02 [1] CRAN (R 3.6.1)
tibble 3.0.3 2020-07-10 [1] CRAN (R 3.6.2)
tidyr 1.1.0 2020-05-20 [1] CRAN (R 3.6.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.2)
tsne 0.1-3 2016-07-15 [1] CRAN (R 3.6.0)
usethis 1.6.1 2020-04-29 [1] CRAN (R 3.6.2)
uwot 0.1.8 2020-03-16 [1] CRAN (R 3.6.0)
vctrs 0.3.4 2020-08-29 [1] CRAN (R 3.6.2)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.6.0)
withr 2.2.0 2020-04-20 [1] CRAN (R 3.6.2)
zoo 1.8-8 2020-05-02 [1] CRAN (R 3.6.2)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

Fitting the iDEA model stuck at 0%

Hi,

I'm trying to use iDEA on my scRNA-Seq mouse datasets following your tutorial, but I'm facing an issue.

I got my (cluster-specific) DE (and non-DE) genes using MAST on a Seurat object and calculated summary statistics. This is what the data frame holding the summary statistics looks like:

	beta	beta_var
Dynlt1c	-0.004788767	1.532690e-04
Dynlt1f	-0.018210094	8.189469e-04
Dynlt3	0.099825669	1.633012e-03
Dyrk1a	-0.153623041	5.770188e-03
Dyrk1b	-0.047206616	7.495446e-04
Dyrk2	-0.009460417	3.289918e-05
Dyrk3	0.014333879	3.397681e-05
Dysf	0.032546407	1.065623e-03
Dzank1	-0.027355513	3.783456e-04
Dzip1	0.020114404	2.959934e-04
Dzip1l	0.002514212	7.914841e-07

I'm also using the mouseGeneSets you provide through the function data(mouseGeneSets). Here is the output of the command mouseGeneSets[1:3,1:3]:

	GO:0000002	GO:0000003	GO:0000009
Cfl1	0	0	0
Dcaf8l	0	0	0
Syt4	0	0	0

As you can see, I'm using gene names (instead of IDs) in both the mouseGeneSets and MAST results.

Subsequently, I'm creating an iDEA object using the following command:

idea <- CreateiDEAObject(summary, mouseGeneSets, max_var_beta = 100, min_precent_annot = 0.0025, num_core = 10)

After a few seconds, the iDEA object is ready.

The format of the idea@summary table looks like this:

	beta	beta_var
Fam220a	0.003484590	0.0000513549
Fam221a	-0.010439597	0.0002479317
Fam221b	0.008453979	0.0008646627
Fam222a	0.011624834	0.0005858999
Fam222b	0.014648323	0.0002980931
Fam227a	0.032733509	0.0006496645

Finally, I'm executing the following command:

idea <- iDEA.fit(idea, fit_noGS = FALSE, init_beta = NULL, init_tau = c(-2, 0.5), min_degene = 5, em_iter = 15, mcmc_iter = 1000, fit.tol = 1e-5, modelVariant = FALSE, verbose = TRUE)

and the message I get in the console is the following:

## ===== iDEA INPUT SUMMARY ==== ##
## number of annotations: 3794
## number of genes: 16307
## number of cores: 10
## fitting the model with gene sets information...
| | 0%, ETA NA

Then, I waited for a long time (more than an hour) but nothing happened. The progress is stuck at 0%.

Any idea what's causing this?

Thank you for your time!
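A quick sanity check that may be worth running before a long fit is sketched below. It is not a diagnosis of the stall; it only confirms that the gene names in the summary table overlap the row names of the annotation matrix and that no `beta_var` is zero or non-finite. The objects `summary_data` and `annotation` are toy stand-ins with made-up values.

```r
# Toy stand-ins (hypothetical values) for the summary table and the annotation matrix.
summary_data <- data.frame(
  beta = c(-0.0048, 0.0998, 0.0325),
  beta_var = c(1.5e-04, 0, 1.1e-03),
  row.names = c("Dynlt1c", "Dynlt3", "Dysf")
)
annotation <- matrix(
  0L, nrow = 3, ncol = 2,
  dimnames = list(c("Dynlt3", "Dysf", "Cfl1"), c("GO:0000002", "GO:0000003"))
)

# Genes present in both inputs, in the order they appear in the summary table.
shared <- intersect(rownames(summary_data), rownames(annotation))

# Genes whose variance is zero, negative, or non-finite.
bad_var <- rownames(summary_data)[
  !is.finite(summary_data$beta_var) | summary_data$beta_var <= 0
]

# Keep only shared genes with a usable variance.
clean <- summary_data[setdiff(shared, bad_var), , drop = FALSE]

cat("shared genes:", length(shared), "\n")
cat("genes with unusable beta_var:", length(bad_var), "\n")
```

If the overlap is very small or many variances are unusable, that would be worth resolving before looking at the fitting step itself.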

iDEA on n=1?

Dear authors,

Your iDEA package seems very useful – I am keen to try it.
I have created an integrated dataset using Seurat v4 from one control and one treatment single-cell RNA-seq sample from the public GEO database.
Does your package require more replicates (created by bootstrapping)?

Cheers, Maibritt

Question about creating a gene set database

Hi iDEA team,

Thank you for the great package. Could you please explain, or provide sample code for, how to create a gene set database (the gene-specific annotation files in the tutorial) from a GAF or GMT file, or simply from a set of gene lists taken from databases such as GO or KEGG? I want to create a zebrafish gene set database to run iDEA. I understand that the row names are genes and the column names are annotation names/gene set names. I may have missed some information, but I would be grateful if you could explain the meaning of the number (0 or 1?) in each row.
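To make the layout described above concrete, here is a minimal base-R sketch. It assumes the annotation table is a binary membership indicator (1 if the gene belongs to the gene set, 0 otherwise), which is how the `mouseGeneSets` excerpt quoted in another issue on this page appears to be laid out. The GMT lines, gene names, and set names are made up for illustration.

```r
# Two made-up GMT records: set name, description, then member genes (tab-separated).
gmt_lines <- c(
  "set_A\tdescription A\tgene1\tgene2",
  "set_B\tdescription B\tgene2\tgene3\tgene4"
)

# Parse each record into a named list of gene vectors.
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
gene_sets <- lapply(fields, function(f) f[-(1:2)])
names(gene_sets) <- vapply(fields, `[`, character(1), 1)

# Build the genes-by-gene-sets indicator table: 1 = member, 0 = not a member.
all_genes <- sort(unique(unlist(gene_sets)))
annotation <- sapply(gene_sets, function(g) as.integer(all_genes %in% g))
rownames(annotation) <- all_genes
annotation <- as.data.frame(annotation)

print(annotation)
```

For a real GMT file, `gmt_lines <- readLines(path_to_gmt)` would replace the toy vector, where `path_to_gmt` is a placeholder for your own file. The gene identifiers would also need to match the row names of the summary statistics table.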

Best,