Hello dear Clusterprofiler team! When segregating for activated and suppressed ter

Emmaplot with the top activated and supressed from dotplot? about clusterprofiler HOT 4 OPEN

Leandromvelez commented on July 17, 2024

Emmaplot with the top activated and supressed from dotplot?

from clusterprofiler.

Comments (4)

guidohooiveld commented on July 17, 2024

If you would like to show in the emapplot the same gene sets as in the dotplot, one option is that you:

explicitly assign the output (=dotplot) to an object, 2) extract from this object the gene sets that are plotted, and 3) only show these in the emapplot.
BTW: note that the function dotplot is part of the enrichplot package, and not clusterProfiler.

> library(DOSE)
> library(clusterProfiler)
> library(enrichplot)
> data(geneList)
> 
> edo2 <- gseDO(geneList, eps=0)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> ## 1) assign dotplot to an object
> ##    note that NO graph is plotted yet!
> p <- enrichplot::dotplot(edo2, showCategory=10, split=".sign") + facet_grid(.~.sign)
> 
> ## plot the graph (dotplot)
> print(p)
> 
> ## 2) Extract the gene sets. These are the 20 sets that are in the dotplot (10 activated, 10 supressed).
> geneSets2plot <- p$data$Description
> 
> ## check
> geneSets2plot
 [1] tuberculosis                                   
 [2] primary immunodeficiency disease               
 [3] primary bacterial infectious disease           
 [4] bacterial infectious disease                   
 [5] pulmonary tuberculosis                         
 [6] combined immunodeficiency                      
 [7] human immunodeficiency virus infectious disease
 [8] embryonal cancer                               
 [9] malaria                                        
[10] embryoma                                       
[11] abdominal obesity-metabolic syndrome 1         
[12] abdominal obesity-metabolic syndrome           
[13] essential hypertension                         
[14] bone structure disease                         
[15] impotence                                      
[16] sexual dysfunction                             
[17] uterine fibroid                                
[18] polycystic ovary syndrome                      
[19] degeneration of macula and posterior pole      
[20] ovarian dysfunction                            
20 Levels: embryoma ... bone structure disease
> 
> ## 3) make a emapplot showing only the selected gene sets
> edo <- pairwise_termsim(edo2)
> p2 <- enrichplot::emapplot(edo, showCategory=geneSets2plot)
> 
> ## plot the graph (emapplot)
> print(p2)
> 
>

dotplot:

emapplot:

from clusterprofiler.

Leandromvelez commented on July 17, 2024

Thank you very much for your response, and sorry for not explaining this issue in more detail:
The problem is the error "Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] :
subscript out of bounds" and only occurred with some of the genelists I'm working on. For some gse results, the solution you posted, works, but not for some others. It seems there is a problem with the order of the terms, but I reordered the terms to match the object used by emapplot, but I'm still getting the error. I'm uploading one of the gse objects giving me the error (hope this links works, an environment with the gse object https://drive.google.com/file/d/172HmlPfgtueBLBqCcZUcOdSrxZecNUxB/view?usp=sharing )

, and the scripts I've been using:

library(clusterProfiler)
library(enrichplot)

g2 = clusterProfiler::dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)
print (g2)
terms = as.character(g2$data$Description)
terms
p1 <- pairwise_termsim(gse)

matched_indices = p1@result$Description[p1@result$Description %in% terms]

checked the order of terms for terms and matched_indices

enrichplot::emapplot(p1 , showCategory= matched_indices)

Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] :

subscript out of bounds

Thanks in advance!

from clusterprofiler.

guidohooiveld commented on July 17, 2024

When posting, please format your code! Select that text, and use the <> button to format as code.

I could reproduce your problem with the provided gse object. Combined with a traceback(), it becomes clear it has to do with the similarity matrix.

> enrichplot::emapplot(p1 , showCategory= terms)
Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] : 
  subscript out of bounds
>
> 
> traceback()
6: build_emap_graph(enrichDf = y, geneSets = geneSets, color = color, 
       cex_line = cex_line, min_edge = min_edge, pair_sim = x@termsim, 
       method = x@method)
5: get_igraph(x = x, nCategory = n, color = color, cex_line = cex_line, 
       min_edge = min_edge)
4: emapplot.enrichResult(x, showCategory = showCategory, ...)
3: .local(x, ...)
2: enrichplot::emapplot(p1, showCategory = terms)
1: enrichplot::emapplot(p1, showCategory = terms)

Note that the total number of significant gene sets in your object gse is 1338:

> dim(gse)
[1] 1338   11
>

.... whereas the dimensions of the similarity matrix is (only) 200 x 200:

> dim(p1@termsim)
[1] 200 200
>

This is because for the function pairwise_termsim() the default setting is showCategory = 200. See ?pairwise_termsim; showCategory - number of enriched terms to display, default value is 200.

Combine this with fact that within gse, by default, the gene sets are ranked on significance, but in the dotplot the gene sets are ranked on GeneRatio.

It thus turns out that some of gene sets that have a large GeneRatio, are not in the top 200 of significant gene sets. As a consequence of this, sub-setting of p1 goes wrong when a set of gene sets based on GeneRatio is provided.... because the required similarity matrix is missing for these gene sets!

This can be simply solved by calculating the similarity matrix for all gene sets, and not only the top 200.

Therefore (note that I also modified [the order of] your code slightly):

> library(clusterProfiler)
> library(enrichplot)
> load("gse.RData")
> 
> ## check
> gse
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens 
#...@setType     BP 
#...@keytype     SYMBOL 
#...@geneList    Named num [1:33511] 2.57 2.48 2.19 2.19 2.12 ...
 - attr(*, "names")= chr [1:33511] "LHFPL4" "FBXO40" "OR1D3P" "DAAM2-AS1" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...1338 enriched terms found
'd
 $ ID             : chr  "GO:0006302" "GO:0006397" "GO:0018205" "GO:0007059" ...
 $ Description    : chr  "double-strand break repair" "mRNA processing" "peptidyl-lysine modification" "chromosome segregation" ...
 $ setSize        : int  303 484 372 433 468 310 434 364 326 242 ...
 $ enrichmentScore: num  -0.538 -0.475 -0.502 -0.485 -0.472 ...
 $ NES            : num  -2 -1.79 -1.88 -1.82 -1.78 ...
 $ pvalue         : num  2.01e-24 5.57e-23 3.34e-22 7.31e-22 8.73e-22 ...
 $ p.adjust       : num  1.28e-20 1.78e-19 7.10e-19 1.11e-18 1.11e-18 ...
 $ qvalue         : num  9.08e-21 1.26e-19 5.03e-19 7.90e-19 7.90e-19 ...
 $ rank           : num  8691 10625 9127 12397 9312 ...
 $ leading_edge   : chr  "tags=59%, list=26%, signal=44%" "tags=57%, list=32%, signal=40%" "tags=57%, list=27%, signal=42%" "tags=63%, list=37%, signal=40%" ...
 $ core_enrichment: chr  "SFR1/NUCKS1/MBTD1/OTUB1/ZBTB7A/YEATS4/CDCA5/RPA1/PARP2/DPF2/HELB/RPA3/PNKP/ATRIP/RIF1/AP5Z1/RAD50/WRAP53/CDC45/"| __truncated__ "SRSF8/RNPC3/SF3B5/SNRNP25/TSEN34/ZRANB2/SF3B6/APOBEC4/HNRNPR/BRDT/HNRNPK/TTF2/RBM14/HABP4/AKAP17A/KHDRBS1/PTBP1"| __truncated__ "ZBED1/SIRT5/YEATS4/FAM86B1/KAT6A/RELA/PPARGC1A/BCL11A/H1-3/KANSL1/CREBBP/PAGR1/RIF1/CBX4/ASH1L/SENP5/UBE2I/METT"| __truncated__ "SMC4/EML4/GOLGA8G/WAPL/SPAG5/ABRAXAS2/NCAPH/PPP2R1A/MACROH2A1/NTMT1/ACTR2/BCL7A/CDC42/UBE2C/PINX1/ARHGEF10/DPF3"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> ## apply 'pairwise_termsim' before making plots
> ## calculate the similarity matrix for all gene sets, and not the top 200!
> ## note that this takes some time
> gse <- pairwise_termsim(gse, showCategory = dim(gse)[1] )
> 
> 
> ## make dotplot
> g2 <- enrichplot::dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)
> print(g2)
> 
> ## extract the gene sets that are in the dotplot
> terms = as.character(g2$data$Description)
> terms
 [1] "linoleic acid metabolic process"                                  
 [2] "cysteine metabolic process"                                       
 [3] "response to food"                                                 
 [4] "ovulation"                                                        
 [5] "acetyl-CoA biosynthetic process from pyruvate"                    
 [6] "miRNA-mediated gene silencing by mRNA destabilization"            
 [7] "sequestering of triglyceride"                                     
 [8] "positive regulation of fatty acid metabolic process"              
 [9] "lipoxygenase pathway"                                             
[10] "regulation of amino acid import across plasma membrane"           
[11] "double-strand break repair"                                       
[12] "mRNA processing"                                                  
[13] "peptidyl-lysine modification"                                     
[14] "chromosome segregation"                                           
[15] "histone modification"                                             
[16] "regulation of response to DNA damage stimulus"                    
[17] "proteasome-mediated ubiquitin-dependent protein catabolic process"
[18] "vesicle organization"                                             
[19] "DNA recombination"                                                
[20] "protein polyubiquitination"                                       
> 
> 
> ## Make an emapplot showing only the selected gene sets
> g3 <- enrichplot::emapplot(gse, showCategory = terms)
> print(g3)
> 
> 
> 
>

dotplot g2:

emapplot g3:

from clusterprofiler.

Leandromvelez commented on July 17, 2024

That worked perfectly!
Thank you very much for detailed explanation!
And sorry for not posting the lines in the correct format for code.

from clusterprofiler.

Emmaplot with the top activated and supressed from dotplot? about clusterprofiler HOT 4 OPEN

Comments (4)

checked the order of terms for terms and matched_indices

Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] :

subscript out of bounds

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent