
Comments (10)

FloWuenne commented on September 16, 2024

Working fine now! I have to say I can't really pinpoint what the problem was. I guess when people run into this error, the best approach is to make sure that the input is a matrix and that all inputTFs appear in the rownames of that matrix.

I think your examples will help many people running into this issue.
Thank you very much @s-aibar , you can close the ticket for me! 👍
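
The two checks mentioned above (the input is a matrix, and every regulator is in its rownames) can be sketched as a small pre-flight function. This is only an illustration: `check_genie3_input` is a hypothetical helper, not part of GENIE3 or SCENIC.

```r
# Hypothetical helper (not part of GENIE3): collect input problems before a long run.
check_genie3_input <- function(expr_matrix, regulators) {
  problems <- character(0)
  # GENIE3 is given a numeric matrix in this thread; a data.frame fails this test
  if (!is.matrix(expr_matrix) || !is.numeric(expr_matrix)) {
    problems <- c(problems, "expression data is not a numeric matrix")
  }
  gene_names <- rownames(expr_matrix)
  if (is.null(gene_names)) {
    problems <- c(problems, "expression data has no rownames (gene names)")
  } else {
    missing_tfs <- setdiff(regulators, gene_names)
    if (length(missing_tfs) > 0) {
      problems <- c(problems,
                    sprintf("%d regulator(s) not found in rownames", length(missing_tfs)))
    }
  }
  problems  # empty character vector means both checks passed
}

# Toy data: 4 genes x 5 cells
m <- matrix(runif(20), nrow = 4,
            dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:5)))
ok_result  <- check_genie3_input(m, c("Gene1", "Gene2"))
bad_result <- check_genie3_input(as.data.frame(m), c("Gene1", "GeneX"))
print(bad_result)
```

With the real data, the call would be `check_genie3_input(exp_matrix_filtered, inputTFs)` just before `GENIE3(...)`.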

from scenic.

FloWuenne commented on September 16, 2024

Hi Assaf,

I am having the same problem when running my data. Did you figure out what your problem was?

Thanks,

Florian


s-aibar commented on September 16, 2024

Hello,

That code is at the end of the parallel computation, so as a temporary solution you might want to use nCores=1.

In order to reproduce the error (and provide more useful help...) I would need more info... What type of system are you using? (Windows/Linux/Mac? Some of the parallel functions are not available on Windows...) Can you provide a minimal example (or part of the data that is producing the error?) and the output of sessionInfo() to try to reproduce the error?
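
As a rough sketch, the system details requested above can be collected in one go with base R and the `parallel` package. The `report` list is just an illustrative name.

```r
# Collect basic system details to paste into a bug report
report <- list(
  os_type        = .Platform$OS.type,          # "unix" or "windows"
  sysname        = Sys.info()[["sysname"]],    # e.g. Linux, Darwin, Windows
  r_version      = R.version.string,
  detected_cores = parallel::detectCores()     # may be NA on some systems
)
str(report)

# Full package versions, as requested
print(sessionInfo())
```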


FloWuenne commented on September 16, 2024

Hi there,

I am running GENIE3 on a Linux cluster using a Torque scheduler system. I am running the code on 1 node with 12 cores, so the parallel computation uses 12 cores.

My data is a matrix with normalized expression values from Drop-seq. I tried running a subset of the data for computation speed, but I also tried the full matrix, and both gave the same error. The current matrix I am trying to run is 225 cells x 7566 genes.

Here is my code snippet, mainly adopted from the "Running SCENIC" tutorial. I load my expression matrix from a precomputed seurat object and then filter out genes:

Thanks for your help in advance!

### Define expression matrix
exp_matrix <- as.matrix(expression_seurat_hqc@data)

org <- "mm9"

if(org=="hg19")
{
  library(RcisTarget.hg19.motifDatabases.20k)
  
  ### Get genes in databases:
  data(hg19_500bpUpstream_motifRanking) # or 10kbp, they should have the same genes
  genesInDatabase <- hg19_500bpUpstream_motifRanking@rankings$rn
  
  ### Get TFS in databases:
  data(hg19_direct_motifAnnotation)
  allTFs <- hg19_direct_motifAnnotation$allTFs
}

if(org=="mm9")
{
  library(RcisTarget.mm9.motifDatabases.20k)
  
  ### Get genes in databases:
  data(mm9_500bpUpstream_motifRanking) # or 10kbp, they should have the same genes
  genesInDatabase <- mm9_500bpUpstream_motifRanking@rankings$rn
  
  ### Get TFS in databases:
  data(mm9_direct_motifAnnotation)
  allTFs <- mm9_direct_motifAnnotation$allTFs
}

### Gene filter / selection
nCellsPerGene <- apply(exp_matrix, 1, function(x) sum(x>0))
nCountsPerGene <- apply(exp_matrix, 1, sum)

gene_info <- data.frame("CellperGene" = nCellsPerGene,
                        "CountsPerGene" = nCountsPerGene)

### Filter genes
gene_info_filtered <- subset(gene_info,log10(CountsPerGene) > 1)
### NOTE: exp_pData is not defined in this snippet; it presumably holds one row per cell
gene_info_filtered <- subset(gene_info_filtered,CellperGene > nrow(exp_pData)*0.01)

### Filter out genes that are not in the Rcis database
genesLeft_minCells_inDatabases <- rownames(gene_info_filtered)[which(rownames(gene_info_filtered) %in% genesInDatabase)]
length(genesLeft_minCells_inDatabases)

### Subset expression matrix for genes to use
exp_matrix_filtered <- exp_matrix[genesLeft_minCells_inDatabases,]

### Potential regulators
inputTFs <- allTFs[allTFs %in% rownames(exp_matrix_filtered)]
save(inputTFs, file="./int/1.2_inputTFs.RData")

### Run GENIE3
weightMatrix <- GENIE3(exp_matrix_filtered,regulators=inputTFs, nCores=12)


sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux release 6.3 (Carbon)

Matrix products: default
BLAS: /home/apps/Logiciels/R/3.4.0-gcc/lib64/R/lib/libRblas.so
LAPACK: /home/apps/Logiciels/R/3.4.0-gcc/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] forcats_0.2.0       stringr_1.2.0       dplyr_0.7.4        
 [4] purrr_0.2.4         readr_1.1.1         tidyr_0.7.2        
 [7] tibble_1.3.4        tidyverse_1.2.1     doParallel_1.0.11  
[10] iterators_1.0.8     foreach_1.4.3       GENIE3_1.0.0       
[13] SCENIC_0.1.7        Seurat_2.1.0        bigmemory_4.5.31   
[16] bigmemory.sri_0.1.3 Biobase_2.38.0      BiocGenerics_0.24.0
[19] Matrix_1.2-9        cowplot_0.8.0       ggplot2_2.2.1      

loaded via a namespace (and not attached):
  [1] readxl_1.0.0         backports_1.1.0      Hmisc_4.0-3         
  [4] VGAM_1.0-4           NMF_0.20.6           sn_1.5-0            
  [7] plyr_1.8.4           igraph_1.1.2         lazyeval_0.2.0      
 [10] splines_3.4.0        gridBase_0.4-7       digest_0.6.12       
 [13] htmltools_0.3.6      lars_1.2             gdata_2.18.0        
 [16] magrittr_1.5         checkmate_1.8.3      cluster_2.0.6       
 [19] mixtools_1.1.0       ROCR_1.0-7           modelr_0.1.1        
 [22] R.utils_2.6.0        colorspace_1.3-2     rvest_0.3.2         
 [25] haven_1.1.0          crayon_1.3.4         jsonlite_1.5        
 [28] lme4_1.1-13          bindr_0.1            survival_2.41-3     
 [31] ape_4.1              glue_1.1.1           registry_0.3        
 [34] gtable_0.2.0         MatrixModels_0.4-1   car_2.1-5           
 [37] kernlab_0.9-25       prabclus_2.2-6       DEoptimR_1.0-8      
 [40] SparseM_1.77         scales_0.4.1         mvtnorm_1.0-6       
 [43] rngtools_1.2.4       Rcpp_0.12.13         dtw_1.18-1          
 [46] xtable_1.8-2         htmlTable_1.9        tclust_1.2-7        
 [49] foreign_0.8-67       proxy_0.4-17         mclust_5.3          
 [52] SDMTools_1.1-221     Formula_1.2-2        stats4_3.4.0        
 [55] tsne_0.1-3           htmlwidgets_0.9      httr_1.3.1          
 [58] FNN_1.1              gplots_3.0.1         RColorBrewer_1.1-2  
 [61] fpc_2.1-10           acepack_1.4.1        modeltools_0.2-21   
 [64] ica_1.0-1            pkgconfig_2.0.1      R.methodsS3_1.7.1   
 [67] flexmix_2.3-14       nnet_7.3-12          caret_6.0-76        
 [70] rlang_0.1.4          reshape2_1.4.2       cellranger_1.1.0    
 [73] munsell_0.4.3        tools_3.4.0          cli_1.0.0           
 [76] ranger_0.8.0         broom_0.4.3          ModelMetrics_1.1.0  
 [79] knitr_1.16           robustbase_0.92-7    caTools_1.17.1      
 [82] bindrcpp_0.2         pbapply_1.3-3        nlme_3.1-131        
 [85] quantreg_5.33        R.oo_1.21.0          xml2_1.1.1          
 [88] rstudioapi_0.7       compiler_3.4.0       pbkrtest_0.4-7      
 [91] ggjoy_0.3.0          stringi_1.1.5        lattice_0.20-35     
 [94] trimcluster_0.1-2    psych_1.7.8          nloptr_1.0.4        
 [97] diffusionMap_1.1-0   data.table_1.10.4-3  bitops_1.0-6        
[100] irlba_2.2.1          AUCell_0.99.5        R6_2.2.2            
[103] latticeExtra_0.6-28  KernSmooth_2.23-15   gridExtra_2.2.1     
[106] RcisTarget_0.99.0    codetools_0.2-15     MASS_7.3-47         
[109] gtools_3.5.0         assertthat_0.2.0     pkgmaker_0.22       
[112] mnormt_1.5-5         diptest_0.75-7       mgcv_1.8-17         
[115] hms_0.3              grid_3.4.0           rpart_4.1-11        
[118] class_7.3-14         minqa_1.2.4          segmented_0.5-2.1   
[121] Rtsne_0.13           numDeriv_2016.8-1    scatterplot3d_0.3-40
[124] lubridate_1.7.1      base64enc_0.1-3


s-aibar commented on September 16, 2024

Hi again,

Thanks for the info!

I have run GENIE3 using your code and some Drop-seq data with similar characteristics, but the only way I have managed to reproduce the error is by artificially changing the dimensions of weightMatrix and weightMatrix.reg inside the function.

So, just to make sure... can you confirm that the size of the expression matrix and the row names just before entering GENIE3 are what you expect? (the matrix should contain the gene names as rownames() ...)

dim(exp_matrix_filtered)
exp_matrix_filtered[1:5,1:4]

I have added some extra checks for the next version (in case a similar error appears in the future...), but if you would like to help find out exactly what is causing your error, you can re-run GENIE3 with the same settings after running options(error = recover). This will trigger the debugger, and then you can explore the values of the variables that caused the error, which probably has something to do with inconsistencies within these values:

length(targetNames)
length(regulatorNames)
head(targetNames)
head(regulatorNames)
dim(weightMatrix.reg)
dim(weightMatrix)
weightMatrix.reg[1:5,1:4]
weightMatrix[1:5,1:4]
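
A job submitted through qsub has no interactive prompt, so a non-interactive variant of the debugging step above may be handy. The sketch below uses base R's `dump.frames()` to save the call frames to a file when a function fails. `run_with_dump` and `failing_step` are hypothetical names, and `failing_step` only stands in for the real GENIE3 call.

```r
# Hypothetical wrapper (not part of GENIE3): run `f()` and, if it fails,
# save the call frames to "<dump_name>.rda" in the working directory.
run_with_dump <- function(f, dump_name = "genie3_dump") {
  tryCatch(
    withCallingHandlers(
      f(),
      # Runs while the failing call stack is still alive
      error = function(e) dump.frames(dumpto = dump_name, to.file = TRUE)
    ),
    error = function(e) {
      message("Run failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for the failing call: indexing past the matrix dimensions
failing_step <- function() {
  weights <- matrix(0, nrow = 2, ncol = 3)
  weights[5, 1]
}

result <- run_with_dump(failing_step)

# Later, in an interactive session:
#   load("genie3_dump.rda"); debugger(genie3_dump)
```

Inside `debugger()`, the variables listed above (targetNames, regulatorNames, weightMatrix, ...) can then be inspected in the frame where the error occurred.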


FloWuenne commented on September 16, 2024

Thank you for the quick feedback. I was also troubleshooting, and I found that running with only 1 core (nCores=1) seems to work just fine. That suggests to me that there might be some issue with the parallelization when running on a remote node rather than a local machine...

Could there be an issue with the remote node missing some dependencies or similar? On our cluster we submit jobs via qsub to the Torque scheduler, which then launches a remote node that runs the job. All the required R packages will be loaded, but maybe I am missing a Linux package that is not generally loaded on our worker nodes but is present on the login node?

I will let you know whether I can get it to work with multiple cores but so far I did not have any luck...
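
One way to probe the dependency question above is to start a few worker processes from inside the job and ask each one where it runs and whether the needed packages load there. This is a sketch using the `parallel` package; `probe_worker` and `check_workers` are hypothetical helpers, and the workers GENIE3 itself starts may be set up differently.

```r
library(parallel)

# Runs on each worker: report host, R version and package availability
probe_worker <- function(pkgs) {
  data.frame(
    host      = Sys.info()[["nodename"]],
    r_version = R.version.string,
    package   = pkgs,
    available = unname(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Hypothetical helper: start `n_workers` local workers and probe each of them
check_workers <- function(pkgs, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  do.call(rbind, clusterCall(cl, probe_worker, pkgs))
}

# "stats" ships with R; inside the real job one would pass e.g. "GENIE3"
worker_report <- check_workers("stats", n_workers = 2)
print(worker_report)
```

Any row with `available = FALSE` would point to a package that the workers cannot load, even if the main session can.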


s-aibar commented on September 16, 2024

Have you checked if the basic example in GENIE3 works? (adding multiple cores, of course)
If it also crashes, then at least we know that it is something in the setup/parallelization, not depending on the data itself...

(We often run GENIE3 on a cluster with qsub as well, and we have not come across this error so far...)

## Generate fake expression matrix
exprMatrix <- matrix(sample(1:10, 100, replace=TRUE), nrow=20)
rownames(exprMatrix) <- paste("Gene", 1:20, sep="")
colnames(exprMatrix) <- paste("Sample", 1:5, sep="")

## Run GENIE3
set.seed(123) # For reproducibility of results
weightMatrix <- GENIE3(exprMatrix, regulators=paste("Gene", 1:5, sep=""), nCores=4)


FloWuenne commented on September 16, 2024

Thanks for the running example. I should've tried a simple small snippet like this, my bad.
The code you sent works, so I guess it definitely has to do with my matrix. I am using normalized values; the fact that they are not integers could not be the issue, right?

I will go over my data again and see what might cause the problem...


FloWuenne commented on September 16, 2024

So all the smaller examples I have run so far have worked, even with multiple cores on the cluster.
I am currently running the full dataset with high-quality cells using GENIE3 and will let you know whether this works so that we can close the ticket.

A quick optimization question that may help other people as well. I had not considered this before, but does using normalized data also slow down the GENIE3 run, since we are using double values instead of integers and therefore have to load a lot more data into the function?
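
The storage side of this question can be checked directly in R: integer vectors use 4 bytes per value and doubles use 8. The sketch below compares a toy integer count matrix with a normalized (double) version of it. The variable names are illustrative, and it says nothing about GENIE3's run time, which the thread does not settle.

```r
set.seed(1)

# Toy count matrix stored as integers (100 genes x 100 cells)
counts_int <- matrix(sample(0:10, 1e4, replace = TRUE), nrow = 100)

# Any normalization that produces non-integer values yields doubles
norm_dbl <- log1p(counts_int)

print(storage.mode(counts_int))
print(storage.mode(norm_dbl))

# Memory footprint of each matrix in bytes
size_int <- as.numeric(object.size(counts_int))
size_dbl <- as.numeric(object.size(norm_dbl))
print(size_dbl / size_int)
```

Note that a count matrix is not necessarily stored as integer in the first place, so `storage.mode()` on the actual input is worth checking before drawing conclusions.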


liuyifang commented on September 16, 2024

Hi, the problem may be that some parallel jobs die due to lack of memory. Perhaps moving to a cluster with more memory would help.
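
To put a number on the memory hypothesis, the size of the expression matrix alone can be estimated from its dimensions. This is a back-of-the-envelope sketch: `matrix_size_mb` is a hypothetical helper, the dimensions are the 225 cells x 7566 genes quoted earlier in the thread, and the working memory GENIE3 needs per worker is not included.

```r
# Hypothetical helper: size of a dense matrix in MiB (8 bytes per double value)
matrix_size_mb <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1024^2
}

# Dimensions quoted in the thread (genes as rows, cells as columns)
expr_mb <- matrix_size_mb(n_rows = 7566, n_cols = 225)

# Upper bound if each of the 12 workers held its own full copy
all_workers_mb <- expr_mb * 12

print(round(c(one_copy = expr_mb, twelve_copies = all_workers_mb), 1))
```

For an object already in memory, `object.size(exp_matrix_filtered)` gives the measured figure instead of an estimate.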

