Git Product home page Git Product logo

Comments (4)

guidohooiveld avatar guidohooiveld commented on July 17, 2024

If you would like to show in the emapplot the same gene sets as in the dotplot, one option is that you:

  1. explicitly assign the output (=dotplot) to an object, 2) extract from this object the gene sets that are plotted, and 3) only show these in the emapplot.
    BTW: note that the function dotplot is part of the enrichplot package, and not clusterProfiler.
> library(DOSE)
> library(clusterProfiler)
> library(enrichplot)
> data(geneList)
> 
> edo2 <- gseDO(geneList, eps=0)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> ## 1) assign dotplot to an object
> ##    note that NO graph is plotted yet!
> p <- enrichplot::dotplot(edo2, showCategory=10, split=".sign") + facet_grid(.~.sign)
> 
> ## plot the graph (dotplot)
> print(p)
> 
> ## 2) Extract the gene sets. These are the 20 sets that are in the dotplot (10 activated, 10 supressed).
> geneSets2plot <- p$data$Description
> 
> ## check
> geneSets2plot
 [1] tuberculosis                                   
 [2] primary immunodeficiency disease               
 [3] primary bacterial infectious disease           
 [4] bacterial infectious disease                   
 [5] pulmonary tuberculosis                         
 [6] combined immunodeficiency                      
 [7] human immunodeficiency virus infectious disease
 [8] embryonal cancer                               
 [9] malaria                                        
[10] embryoma                                       
[11] abdominal obesity-metabolic syndrome 1         
[12] abdominal obesity-metabolic syndrome           
[13] essential hypertension                         
[14] bone structure disease                         
[15] impotence                                      
[16] sexual dysfunction                             
[17] uterine fibroid                                
[18] polycystic ovary syndrome                      
[19] degeneration of macula and posterior pole      
[20] ovarian dysfunction                            
20 Levels: embryoma ... bone structure disease
> 
> ## 3) make a emapplot showing only the selected gene sets
> edo <- pairwise_termsim(edo2)
> p2 <- enrichplot::emapplot(edo, showCategory=geneSets2plot)
> 
> ## plot the graph (emapplot)
> print(p2)
> 
> 

dotplot:
dotplot

emapplot:
emapplot

from clusterprofiler.

Leandromvelez avatar Leandromvelez commented on July 17, 2024

Thank you very much for your response, and sorry for not explaining this issue in more detail:
The problem is the error "Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] :
subscript out of bounds" and only occurred with some of the genelists I'm working on. For some gse results, the solution you posted, works, but not for some others. It seems there is a problem with the order of the terms, but I reordered the terms to match the object used by emapplot, but I'm still getting the error. I'm uploading one of the gse objects giving me the error (hope this links works, an environment with the gse object https://drive.google.com/file/d/172HmlPfgtueBLBqCcZUcOdSrxZecNUxB/view?usp=sharing )

, and the scripts I've been using:

library(clusterProfiler)
library(enrichplot)

g2 = clusterProfiler::dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)
print (g2)
terms = as.character(g2$data$Description)
terms
p1 <- pairwise_termsim(gse)

matched_indices = p1@result$Description[p1@result$Description %in% terms]

checked the order of terms for terms and matched_indices

enrichplot::emapplot(p1 , showCategory= matched_indices)

Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] :

subscript out of bounds

Thanks in advance!

from clusterprofiler.

guidohooiveld avatar guidohooiveld commented on July 17, 2024

When posting, please format your code! Select that text, and use the <> button to format as code.

I could reproduce your problem with the provided gse object. Combined with a traceback(), it becomes clear it has to do with the similarity matrix.

> enrichplot::emapplot(p1 , showCategory= terms)
Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] : 
  subscript out of bounds
>
> 
> traceback()
6: build_emap_graph(enrichDf = y, geneSets = geneSets, color = color, 
       cex_line = cex_line, min_edge = min_edge, pair_sim = x@termsim, 
       method = x@method)
5: get_igraph(x = x, nCategory = n, color = color, cex_line = cex_line, 
       min_edge = min_edge)
4: emapplot.enrichResult(x, showCategory = showCategory, ...)
3: .local(x, ...)
2: enrichplot::emapplot(p1, showCategory = terms)
1: enrichplot::emapplot(p1, showCategory = terms)

Note that the total number of significant gene sets in your object gse is 1338:

> dim(gse)
[1] 1338   11
>

.... whereas the dimensions of the similarity matrix is (only) 200 x 200:

> dim(p1@termsim)
[1] 200 200
>

This is because for the function pairwise_termsim() the default setting is showCategory = 200. See ?pairwise_termsim; showCategory - number of enriched terms to display, default value is 200.

Combine this with fact that within gse, by default, the gene sets are ranked on significance, but in the dotplot the gene sets are ranked on GeneRatio.

It thus turns out that some of gene sets that have a large GeneRatio, are not in the top 200 of significant gene sets. As a consequence of this, sub-setting of p1 goes wrong when a set of gene sets based on GeneRatio is provided.... because the required similarity matrix is missing for these gene sets!

This can be simply solved by calculating the similarity matrix for all gene sets, and not only the top 200.

Therefore (note that I also modified [the order of] your code slightly):

> library(clusterProfiler)
> library(enrichplot)
> load("gse.RData")
> 
> ## check
> gse
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens 
#...@setType     BP 
#...@keytype     SYMBOL 
#...@geneList    Named num [1:33511] 2.57 2.48 2.19 2.19 2.12 ...
 - attr(*, "names")= chr [1:33511] "LHFPL4" "FBXO40" "OR1D3P" "DAAM2-AS1" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...1338 enriched terms found
'd
 $ ID             : chr  "GO:0006302" "GO:0006397" "GO:0018205" "GO:0007059" ...
 $ Description    : chr  "double-strand break repair" "mRNA processing" "peptidyl-lysine modification" "chromosome segregation" ...
 $ setSize        : int  303 484 372 433 468 310 434 364 326 242 ...
 $ enrichmentScore: num  -0.538 -0.475 -0.502 -0.485 -0.472 ...
 $ NES            : num  -2 -1.79 -1.88 -1.82 -1.78 ...
 $ pvalue         : num  2.01e-24 5.57e-23 3.34e-22 7.31e-22 8.73e-22 ...
 $ p.adjust       : num  1.28e-20 1.78e-19 7.10e-19 1.11e-18 1.11e-18 ...
 $ qvalue         : num  9.08e-21 1.26e-19 5.03e-19 7.90e-19 7.90e-19 ...
 $ rank           : num  8691 10625 9127 12397 9312 ...
 $ leading_edge   : chr  "tags=59%, list=26%, signal=44%" "tags=57%, list=32%, signal=40%" "tags=57%, list=27%, signal=42%" "tags=63%, list=37%, signal=40%" ...
 $ core_enrichment: chr  "SFR1/NUCKS1/MBTD1/OTUB1/ZBTB7A/YEATS4/CDCA5/RPA1/PARP2/DPF2/HELB/RPA3/PNKP/ATRIP/RIF1/AP5Z1/RAD50/WRAP53/CDC45/"| __truncated__ "SRSF8/RNPC3/SF3B5/SNRNP25/TSEN34/ZRANB2/SF3B6/APOBEC4/HNRNPR/BRDT/HNRNPK/TTF2/RBM14/HABP4/AKAP17A/KHDRBS1/PTBP1"| __truncated__ "ZBED1/SIRT5/YEATS4/FAM86B1/KAT6A/RELA/PPARGC1A/BCL11A/H1-3/KANSL1/CREBBP/PAGR1/RIF1/CBX4/ASH1L/SENP5/UBE2I/METT"| __truncated__ "SMC4/EML4/GOLGA8G/WAPL/SPAG5/ABRAXAS2/NCAPH/PPP2R1A/MACROH2A1/NTMT1/ACTR2/BCL7A/CDC42/UBE2C/PINX1/ARHGEF10/DPF3"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> ## apply 'pairwise_termsim' before making plots
> ## calculate the similarity matrix for all gene sets, and not the top 200!
> ## note that this takes some time
> gse <- pairwise_termsim(gse, showCategory = dim(gse)[1] )
> 
> 
> ## make dotplot
> g2 <- enrichplot::dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)
> print(g2)
> 
> ## extract the gene sets that are in the dotplot
> terms = as.character(g2$data$Description)
> terms
 [1] "linoleic acid metabolic process"                                  
 [2] "cysteine metabolic process"                                       
 [3] "response to food"                                                 
 [4] "ovulation"                                                        
 [5] "acetyl-CoA biosynthetic process from pyruvate"                    
 [6] "miRNA-mediated gene silencing by mRNA destabilization"            
 [7] "sequestering of triglyceride"                                     
 [8] "positive regulation of fatty acid metabolic process"              
 [9] "lipoxygenase pathway"                                             
[10] "regulation of amino acid import across plasma membrane"           
[11] "double-strand break repair"                                       
[12] "mRNA processing"                                                  
[13] "peptidyl-lysine modification"                                     
[14] "chromosome segregation"                                           
[15] "histone modification"                                             
[16] "regulation of response to DNA damage stimulus"                    
[17] "proteasome-mediated ubiquitin-dependent protein catabolic process"
[18] "vesicle organization"                                             
[19] "DNA recombination"                                                
[20] "protein polyubiquitination"                                       
> 
> 
> ## Make an emapplot showing only the selected gene sets
> g3 <- enrichplot::emapplot(gse, showCategory = terms)
> print(g3)
> 
> 
> 
>

dotplot g2:

dotplot

emapplot g3:
emapplot

from clusterprofiler.

Leandromvelez avatar Leandromvelez commented on July 17, 2024

That worked perfectly!
Thank you very much for detailed explanation!
And sorry for not posting the lines in the correct format for code.

from clusterprofiler.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.