Comments (4)
If you would like to show in the emapplot
the same gene sets as in the dotplot
, one option is that you:
- explicitly assign the output (=dotplot) to an object, 2) extract from this object the gene sets that are plotted, and 3) only show these in the
emapplot
.
BTW: note that the functiondotplot
is part of theenrichplot
package, and notclusterProfiler
.
> library(DOSE)
> library(clusterProfiler)
> library(enrichplot)
> data(geneList)
>
> edo2 <- gseDO(geneList, eps=0)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
>
> ## 1) assign dotplot to an object
> ## note that NO graph is plotted yet!
> p <- enrichplot::dotplot(edo2, showCategory=10, split=".sign") + facet_grid(.~.sign)
>
> ## plot the graph (dotplot)
> print(p)
>
> ## 2) Extract the gene sets. These are the 20 sets that are in the dotplot (10 activated, 10 supressed).
> geneSets2plot <- p$data$Description
>
> ## check
> geneSets2plot
[1] tuberculosis
[2] primary immunodeficiency disease
[3] primary bacterial infectious disease
[4] bacterial infectious disease
[5] pulmonary tuberculosis
[6] combined immunodeficiency
[7] human immunodeficiency virus infectious disease
[8] embryonal cancer
[9] malaria
[10] embryoma
[11] abdominal obesity-metabolic syndrome 1
[12] abdominal obesity-metabolic syndrome
[13] essential hypertension
[14] bone structure disease
[15] impotence
[16] sexual dysfunction
[17] uterine fibroid
[18] polycystic ovary syndrome
[19] degeneration of macula and posterior pole
[20] ovarian dysfunction
20 Levels: embryoma ... bone structure disease
>
> ## 3) make a emapplot showing only the selected gene sets
> edo <- pairwise_termsim(edo2)
> p2 <- enrichplot::emapplot(edo, showCategory=geneSets2plot)
>
> ## plot the graph (emapplot)
> print(p2)
>
>
from clusterprofiler.
Thank you very much for your response, and sorry for not explaining this issue in more detail:
The problem is the error "Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] :
subscript out of bounds" and only occurred with some of the genelists I'm working on. For some gse results, the solution you posted, works, but not for some others. It seems there is a problem with the order of the terms, but I reordered the terms to match the object used by emapplot, but I'm still getting the error. I'm uploading one of the gse objects giving me the error (hope this links works, an environment with the gse object https://drive.google.com/file/d/172HmlPfgtueBLBqCcZUcOdSrxZecNUxB/view?usp=sharing )
, and the scripts I've been using:
library(clusterProfiler)
library(enrichplot)
g2 = clusterProfiler::dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)
print (g2)
terms = as.character(g2$data$Description)
terms
p1 <- pairwise_termsim(gse)
matched_indices = p1@result$Description[p1@result$Description %in% terms]
checked the order of terms for terms and matched_indices
enrichplot::emapplot(p1 , showCategory= matched_indices)
Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] :
subscript out of bounds
Thanks in advance!
from clusterprofiler.
When posting, please format your code! Select that text, and use the <>
button to format as code.
I could reproduce your problem with the provided gse
object. Combined with a traceback()
, it becomes clear it has to do with the similarity matrix.
> enrichplot::emapplot(p1 , showCategory= terms)
Error in pair_sim[as.character(enrichDf$Description), as.character(enrichDf$Description)] :
subscript out of bounds
>
>
> traceback()
6: build_emap_graph(enrichDf = y, geneSets = geneSets, color = color,
cex_line = cex_line, min_edge = min_edge, pair_sim = x@termsim,
method = x@method)
5: get_igraph(x = x, nCategory = n, color = color, cex_line = cex_line,
min_edge = min_edge)
4: emapplot.enrichResult(x, showCategory = showCategory, ...)
3: .local(x, ...)
2: enrichplot::emapplot(p1, showCategory = terms)
1: enrichplot::emapplot(p1, showCategory = terms)
Note that the total number of significant gene sets in your object gse
is 1338:
> dim(gse)
[1] 1338 11
>
.... whereas the dimensions of the similarity matrix is (only) 200 x 200:
> dim(p1@termsim)
[1] 200 200
>
This is because for the function pairwise_termsim()
the default setting is showCategory = 200
. See ?pairwise_termsim
; showCategory - number of enriched terms to display, default value is 200.
Combine this with fact that within gse
, by default, the gene sets are ranked on significance, but in the dotplot
the gene sets are ranked on GeneRatio
.
It thus turns out that some of gene sets that have a large GeneRatio
, are not in the top 200 of significant gene sets. As a consequence of this, sub-setting of p1
goes wrong when a set of gene sets based on GeneRatio
is provided.... because the required similarity matrix is missing for these gene sets!
This can be simply solved by calculating the similarity matrix for all gene sets, and not only the top 200.
Therefore (note that I also modified [the order of] your code slightly):
> library(clusterProfiler)
> library(enrichplot)
> load("gse.RData")
>
> ## check
> gse
#
# Gene Set Enrichment Analysis
#
#...@organism Homo sapiens
#...@setType BP
#...@keytype SYMBOL
#...@geneList Named num [1:33511] 2.57 2.48 2.19 2.19 2.12 ...
- attr(*, "names")= chr [1:33511] "LHFPL4" "FBXO40" "OR1D3P" "DAAM2-AS1" ...
#...nPerm
#...pvalues adjusted by 'BH' with cutoff <0.05
#...1338 enriched terms found
'd
$ ID : chr "GO:0006302" "GO:0006397" "GO:0018205" "GO:0007059" ...
$ Description : chr "double-strand break repair" "mRNA processing" "peptidyl-lysine modification" "chromosome segregation" ...
$ setSize : int 303 484 372 433 468 310 434 364 326 242 ...
$ enrichmentScore: num -0.538 -0.475 -0.502 -0.485 -0.472 ...
$ NES : num -2 -1.79 -1.88 -1.82 -1.78 ...
$ pvalue : num 2.01e-24 5.57e-23 3.34e-22 7.31e-22 8.73e-22 ...
$ p.adjust : num 1.28e-20 1.78e-19 7.10e-19 1.11e-18 1.11e-18 ...
$ qvalue : num 9.08e-21 1.26e-19 5.03e-19 7.90e-19 7.90e-19 ...
$ rank : num 8691 10625 9127 12397 9312 ...
$ leading_edge : chr "tags=59%, list=26%, signal=44%" "tags=57%, list=32%, signal=40%" "tags=57%, list=27%, signal=42%" "tags=63%, list=37%, signal=40%" ...
$ core_enrichment: chr "SFR1/NUCKS1/MBTD1/OTUB1/ZBTB7A/YEATS4/CDCA5/RPA1/PARP2/DPF2/HELB/RPA3/PNKP/ATRIP/RIF1/AP5Z1/RAD50/WRAP53/CDC45/"| __truncated__ "SRSF8/RNPC3/SF3B5/SNRNP25/TSEN34/ZRANB2/SF3B6/APOBEC4/HNRNPR/BRDT/HNRNPK/TTF2/RBM14/HABP4/AKAP17A/KHDRBS1/PTBP1"| __truncated__ "ZBED1/SIRT5/YEATS4/FAM86B1/KAT6A/RELA/PPARGC1A/BCL11A/H1-3/KANSL1/CREBBP/PAGR1/RIF1/CBX4/ASH1L/SENP5/UBE2I/METT"| __truncated__ "SMC4/EML4/GOLGA8G/WAPL/SPAG5/ABRAXAS2/NCAPH/PPP2R1A/MACROH2A1/NTMT1/ACTR2/BCL7A/CDC42/UBE2C/PINX1/ARHGEF10/DPF3"| __truncated__ ...
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
> ## apply 'pairwise_termsim' before making plots
> ## calculate the similarity matrix for all gene sets, and not the top 200!
> ## note that this takes some time
> gse <- pairwise_termsim(gse, showCategory = dim(gse)[1] )
>
>
> ## make dotplot
> g2 <- enrichplot::dotplot(gse, showCategory=10, split=".sign") + facet_grid(.~.sign)
> print(g2)
>
> ## extract the gene sets that are in the dotplot
> terms = as.character(g2$data$Description)
> terms
[1] "linoleic acid metabolic process"
[2] "cysteine metabolic process"
[3] "response to food"
[4] "ovulation"
[5] "acetyl-CoA biosynthetic process from pyruvate"
[6] "miRNA-mediated gene silencing by mRNA destabilization"
[7] "sequestering of triglyceride"
[8] "positive regulation of fatty acid metabolic process"
[9] "lipoxygenase pathway"
[10] "regulation of amino acid import across plasma membrane"
[11] "double-strand break repair"
[12] "mRNA processing"
[13] "peptidyl-lysine modification"
[14] "chromosome segregation"
[15] "histone modification"
[16] "regulation of response to DNA damage stimulus"
[17] "proteasome-mediated ubiquitin-dependent protein catabolic process"
[18] "vesicle organization"
[19] "DNA recombination"
[20] "protein polyubiquitination"
>
>
> ## Make an emapplot showing only the selected gene sets
> g3 <- enrichplot::emapplot(gse, showCategory = terms)
> print(g3)
>
>
>
>
dotplot g2:
from clusterprofiler.
That worked perfectly!
Thank you very much for detailed explanation!
And sorry for not posting the lines in the correct format for code.
from clusterprofiler.
Related Issues (20)
- No gene can be mapped HOT 1
- enrichKegg Bug HOT 8
- Change Font Size of Dot Plot's Features HOT 1
- Error in download.file(url, quiet = quiet, method = "libcurl"), libcurl cannot be change HOT 2
- compareCluster Function Support for gsonList Objects in clusterProfile?
- gseKEGG()函数运行时出错 HOT 6
- How to use compareCluster for GSEA on the Msigdb database? HOT 4
- compareCluster does not give a warning/message that numeric universe is ignored for fun = "enrichGO"
- clusterProfiler custom annotated list HOT 2
- When there's only one term ID in the dataframe passed as TERM2GENE in enricher
- GSEA long time no response HOT 20
- Cannot library(clusterProfiler) HOT 2
- Unable to load clusterProfiler (4.11.0.00) - object ‘ls2df’ not exported by 'namespace:yulab.utils'`
- Dotplot() cannot change color HOT 2
- Not able to install clusterProfiler on Colab HOT 3
- Trouble with gseKEGG on mac sonoma 14.2 HOT 1
- How to color code cnetplot nodes from enrichrResult data HOT 4
- [feature request] Support metabolite enrichment analysis HOT 2
- problem in run compareCluster HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from clusterprofiler.