Comments (20)
I notice you are using subset input for genelogfc, TERM2NAME and TERM2GENE.
Especially regarding the last two: are you sure that the inputs for TERM2NAME and TERM2GENE are each a data.frame with only 2 columns?
Thus: what is the output of class(Pathways[['KEGG']][,c('TermID','TermName')]) and head(Pathways[['KEGG']][,c('TermID','TermName')])? Likewise for the TERM2GENE input and genelogfc.
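For anyone who wants to run this kind of input check in one go, here is a minimal sketch in base R. The helper check_term_table() and the toy table are hypothetical (they are not part of clusterProfiler, and the gene and pathway values are only illustrative); the point is just that TERM2GENE and TERM2NAME should each be a two-column data.frame.

```r
# Hypothetical helper (not part of clusterProfiler): TRUE only when x is a
# data.frame with exactly two columns, as expected for TERM2GENE / TERM2NAME.
check_term_table <- function(x) {
  is.data.frame(x) && ncol(x) == 2L
}

# Toy stand-in for Pathways[['KEGG']]; the values are illustrative only.
kegg <- data.frame(
  TermID     = c("chx00010", "chx00010", "chx00020"),
  Genesnames = c("HK2", "HK3", "CS"),
  TermName   = c("Glycolysis / Gluconeogenesis",
                 "Glycolysis / Gluconeogenesis",
                 "Citrate cycle (TCA cycle)"),
  stringsAsFactors = FALSE
)

term2gene <- kegg[, c("TermID", "Genesnames")]
term2name <- unique(kegg[, c("TermID", "TermName")])

ok_gene <- check_term_table(term2gene)
ok_name <- check_term_table(term2name)

# Selecting a single column silently drops to a plain vector,
# which is no longer a data.frame and therefore fails the check.
ok_vec <- check_term_table(kegg[, "TermID"])

print(c(term2gene = ok_gene, term2name = ok_name, single_column = ok_vec))
```

Running class() and head() on the real objects, as asked above, gives the same information; the helper only bundles the two conditions.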
from clusterprofiler.
Yes, I ran a "subset" command on "genelogFC" to keep only the genes that are in the KEGG database.
The inputs for TERM2NAME and TERM2GENE are data.frames.
The class and head of each input are as follows:
40 hours ago, but the GSEA command still gives no response and no errors.
from clusterprofiler.
Mmm, such a long runtime is indeed not expected at all!
How many gene sets are you analyzing?
Also, if you would like me to double-check, feel free to upload your input files.
from clusterprofiler.
Thank you very much!
I have uploaded a zip file with an R data file, which contains two objects, genelogFC and KEGGdb (KEGGdb <- Pathways[['KEGG']]).
GSEA.zip
from clusterprofiler.
For me the analysis takes just a few seconds.
Note the warnings on ties and duplicate entries. The warning on ties means that genelogFC contains entries with the same value of the ranking metric, and the warning on duplicate entries means that at least one gene symbol is present multiple times.
Also note that I set the significance cutoff to 1 (to make sure that results are returned).
> library(clusterProfiler)
> load("GSEA.Rdata")
>
> ## check input
> head(genelogFC)
    DKK1     GPD2 CYB561A3   ACVR1B C15orf48   FRRS1L
15.38154 15.21665 14.98970 14.95542 14.72364 14.42049
>
> head(KEGGdb)
  Genesnames                        TermNAME   TermID dbType
1        HK2 KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
2        HK3 KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
3        HK1 KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
4      HKDC1 KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
5        GCK KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
6        GPI KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
                      TermName                                 curl
1 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
2 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
3 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
4 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
5 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
6 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
>
>
> ## run universal GSEA function, but do NOT apply sign. cutoff!
> ## this is to show it works.
> res <- GSEA(geneList = genelogFC,
+             minGSSize = 10,
+             maxGSSize = 500,
+             eps = 0,
+             pvalueCutoff = 1,
+             pAdjustMethod = "BH",
+             TERM2GENE = KEGGdb[, c("TermID","Genesnames")],
+             TERM2NAME = KEGGdb[, c("TermID","TermName")]
+             )
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
There are ties in the preranked stats (0.11% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
There are duplicate gene names, fgsea may produce unexpected results.
>
>
> res
#
# Gene Set Enrichment Analysis
#
#...@organism UNKNOWN
#...@setType UNKNOWN
#...@geneList Named num [1:10772] 15.4 15.2 15 15 14.7 ...
- attr(*, "names")= chr [1:10772] "DKK1" "GPD2" "CYB561A3" "ACVR1B" ...
#...nPerm
#...pvalues adjusted by 'BH' with cutoff <1
#...293 enriched terms found
'data.frame': 293 obs. of 11 variables:
$ ID : chr "chx01523" "chx04927" "chx03320" "chx00670" ...
$ Description : chr "Antifolate resistance" "Cortisol synthesis and secretion" "PPAR signaling pathway" "One carbon pool by folate" ...
$ setSize : int 19 33 46 12 25 18 65 37 52 83 ...
$ enrichmentScore: num 0.722 0.646 0.617 0.765 0.658 ...
$ NES : num 1.67 1.64 1.64 1.61 1.6 ...
$ pvalue : num 0.00215 0.00304 0.00155 0.0083 0.007 ...
$ p.adjust : num 0.297 0.297 0.297 0.37 0.37 ...
$ qvalue : num 0.293 0.293 0.293 0.364 0.364 ...
$ rank : num 1515 842 1773 1515 1516 ...
$ leading_edge : chr "tags=37%, list=14%, signal=32%" "tags=21%, list=8%, signal=20%" "tags=43%, list=16%, signal=36%" "tags=50%, list=14%, signal=43%" ...
$ core_enrichment: chr "FOLR2/GART/ABCG2/NFKB1/TYMS/ATIC/MTHFR" "STAR/PDE8A/CREB3L2/CACNA1G/NR4A1/CREB5/CACNA1D" "SCD5/CPT1C/SLC27A2/ME3/SORBS1/PDPK1/FABP3/NR1H3/HMGCS1/ANGPTL4/ACSL1/SLC27A1/ACSL3/PLIN2/ACSL5/ACSL6/UBC/ACOX1/SLC27A6/APOA2" "GART/MTFMT/TYMS/ATIC/ALDH1L2/MTHFR" ...
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
> as.data.frame(res)[1:5,]
               ID                         Description setSize enrichmentScore
chx01523 chx01523               Antifolate resistance      19       0.7218598
chx04927 chx04927    Cortisol synthesis and secretion      33       0.6458605
chx03320 chx03320              PPAR signaling pathway      46       0.6165491
chx00670 chx00670           One carbon pool by folate      12       0.7653581
chx04610 chx04610 Complement and coagulation cascades      25       0.6583045
              NES      pvalue  p.adjust    qvalue rank
chx01523 1.673002 0.002152638 0.2973276 0.2926810 1515
chx04927 1.644100 0.003044310 0.2973276 0.2926810  842
chx03320 1.642999 0.001550212 0.2973276 0.2926810 1773
chx00670 1.612934 0.008299682 0.3702470 0.3644608 1515
chx04610 1.600897 0.007001891 0.3702470 0.3644608 1516
                           leading_edge
chx01523 tags=37%, list=14%, signal=32%
chx04927  tags=21%, list=8%, signal=20%
chx03320 tags=43%, list=16%, signal=36%
chx00670 tags=50%, list=14%, signal=43%
chx04610 tags=36%, list=14%, signal=31%
core_enrichment
chx01523 FOLR2/GART/ABCG2/NFKB1/TYMS/ATIC/MTHFR
chx04927 STAR/PDE8A/CREB3L2/CACNA1G/NR4A1/CREB5/CACNA1D
chx03320 SCD5/CPT1C/SLC27A2/ME3/SORBS1/PDPK1/FABP3/NR1H3/HMGCS1/ANGPTL4/ACSL1/SLC27A1/ACSL3/PLIN2/ACSL5/ACSL6/UBC/ACOX1/SLC27A6/APOA2
chx00670 GART/MTFMT/TYMS/ATIC/ALDH1L2/MTHFR
chx04610 ITGB2/SERPINC1/F13A1/PLAUR/SERPINE1/F2/THBD/C1QB/C5AR1
>
>
>
> packageVersion("clusterProfiler")
[1] ‘4.10.0’
>
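The two warnings in the transcript above can be checked on the input before calling GSEA. Below is a minimal base R sketch on a made-up ranked vector (the gene names and values are not from the uploaded data). The tie fraction here is a simple count and is not claimed to match the percentage fgsea reports, and keeping the entry with the largest absolute value per duplicated name is only one possible policy, an assumption on my side rather than a clusterProfiler recommendation.

```r
# Toy ranked list; names and values are made up for illustration.
stats <- c(GENE_A = 2.5, GENE_B = 1.0, GENE_A = 0.7, GENE_C = 1.0, GENE_D = -1.2)

# Gene names that occur more than once.
dup_names <- unique(names(stats)[duplicated(names(stats))])

# Fraction of entries whose ranking value is shared with another entry.
tied_frac <- mean(duplicated(stats) | duplicated(stats, fromLast = TRUE))

# One possible policy (an assumption): for each duplicated name keep the
# entry with the largest absolute value, then re-sort in decreasing order.
ord   <- order(abs(stats), decreasing = TRUE)
dedup <- stats[ord]
dedup <- dedup[!duplicated(names(dedup))]
dedup <- sort(dedup, decreasing = TRUE)

print(dup_names)
print(tied_frac)
print(dedup)
```

Whether duplicates should be dropped, averaged, or resolved in some other way depends on how the ranking metric was computed, so treat this as a starting point only.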
from clusterprofiler.
I reinstalled R (version 4.3.2) and clusterProfiler (4.10.0), but the issue still exists!
I can't find out what is wrong!
from clusterprofiler.
To confirm: you copied/pasted my code above, and it takes more than a few seconds?
Also, how long does the code below take, which uses the included example data? Again, for me this is just a few seconds...
> library(clusterProfiler)
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
>
> kk <- enrichKEGG(gene = gene,
+                  organism = 'hsa',
+                  pvalueCutoff = 0.05)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> head(kk)
                                     category
hsa04110                   Cellular Processes
hsa04114                   Cellular Processes
hsa04218                   Cellular Processes
hsa04061 Environmental Information Processing
hsa03320                   Organismal Systems
hsa04814                   Cellular Processes
                                 subcategory       ID
hsa04110               Cell growth and death hsa04110
hsa04114               Cell growth and death hsa04114
hsa04218               Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320                    Endocrine system hsa03320
hsa04814                       Cell motility hsa04814
                                                           Description
hsa04110                                                    Cell cycle
hsa04114                                                Oocyte meiosis
hsa04218                                           Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320                                        PPAR signaling pathway
hsa04814                                                Motor proteins
         GeneRatio  BgRatio       pvalue     p.adjust       qvalue
hsa04110    15/106 157/8659 6.013440e-10 1.280863e-07 1.265987e-07
hsa04114    10/106 131/8659 4.141010e-06 4.410176e-04 4.358958e-04
hsa04218    10/106 156/8659 1.951667e-05 1.385683e-03 1.369591e-03
hsa04061     8/106 100/8659 2.834762e-05 1.441085e-03 1.424349e-03
hsa03320     7/106  75/8659 3.382828e-05 1.441085e-03 1.424349e-03
hsa04814    10/106 193/8659 1.192644e-04 4.233885e-03 4.184714e-03
geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114 991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218 2305/4605/9133/890/983/51806/1111/891/776/3708
hsa04061 3627/10563/6373/4283/6362/6355/9547/1524
hsa03320 4312/9415/9370/5105/2167/3158/5346
hsa04814 9493/1062/81930/3832/3833/146909/10112/24137/4629/7802
Count
hsa04110 15
hsa04114 10
hsa04218 10
hsa04061 8
hsa03320 7
hsa04814 10
>
> system.time({
+   kk <- enrichKEGG(gene = gene,
+                    organism = 'hsa',
+                    pvalueCutoff = 0.05)
+ })
   user  system elapsed
   0.24    0.00    0.24
>
>
The elapsed time is the one that is relevant.
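To illustrate why elapsed is the number to look at: user and system count CPU time only, while elapsed is wall-clock time and therefore also includes waiting, for example on a download. A minimal sketch, using Sys.sleep() as a stand-in for any step that waits without using the CPU:

```r
# Sys.sleep() waits without doing CPU work, so it mimics time spent
# waiting on something external (for example a network download).
timing <- system.time(Sys.sleep(0.5))

user_s    <- timing[["user.self"]]   # CPU time spent in this R process
elapsed_s <- timing[["elapsed"]]     # wall-clock time, includes waiting

print(timing)
```

Here elapsed should come out at roughly the sleep duration while user stays close to zero, which is the pattern to expect when a call spends most of its time waiting rather than computing.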
from clusterprofiler.
Sorry, that was for the over-representation (ORA) function, not GSEA.
For KEGG-based GSEA it takes my computer ~14 seconds.
> library(clusterProfiler)
> data(geneList, package="DOSE")
> kkk <- gseKEGG(geneList = geneList,
+                eps = 0,
+                organism = 'hsa',
+                pvalueCutoff = 0.05)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
>
> head(kkk)
               ID                  Description setSize enrichmentScore      NES
hsa04110 hsa04110                   Cell cycle     139       0.6637551 2.825343
hsa03050 hsa03050                   Proteasome      43       0.7094784 2.448400
hsa03030 hsa03030              DNA replication      33       0.7227680 2.371262
hsa04657 hsa04657      IL-17 signaling pathway      85       0.5622094 2.197537
hsa05169 hsa05169 Epstein-Barr virus infection     193       0.4335010 1.926386
hsa01230 hsa01230  Biosynthesis of amino acids      62       0.5890512 2.158711
               pvalue     p.adjust       qvalue rank
hsa04110 2.693644e-20 9.131453e-18 6.748287e-18 1155
hsa03050 8.494620e-09 1.439838e-06 1.064063e-06 2516
hsa03030 1.182146e-07 1.127374e-05 8.331474e-06 1905
hsa04657 1.662794e-07 1.127374e-05 8.331474e-06 2880
hsa05169 1.623063e-07 1.127374e-05 8.331474e-06 2820
hsa01230 1.697239e-06 9.589398e-05 7.086715e-05 2918
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa03050 tags=65%, list=20%, signal=52%
hsa03030 tags=64%, list=15%, signal=54%
hsa04657 tags=49%, list=23%, signal=38%
hsa05169 tags=39%, list=23%, signal=31%
hsa01230 tags=55%, list=23%, signal=42%
core_enrichment
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa03050 5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198/7979/5699/5714/5702/5708/5692/5704/5683/5694/5718/51371/5682
hsa03030 4174/4171/4175/4173/2237/5984/5111/10535/1763/5427/23649/4176/5982/5557/5558/4172/5424/5983/5425/54107/6119
hsa04657 4312/6280/6279/6278/3627/2921/6364/8061/4318/3576/3934/6347/727897/1051/6354/3458/6361/6374/2919/9618/5603/7128/1994/7124/3569/8772/5743/7186/3596/6356/5594/4792/9641/1147/2932/6300/5597/27190/1432/7184/64806/3326
hsa05169 3627/890/6890/9636/898/9134/6502/6772/3126/3112/4609/917/5709/1869/3654/919/915/4067/4938/864/4940/5713/5336/11047/3066/54205/1871/578/1019/637/916/3383/4939/10213/23586/4793/5603/7979/7128/6891/930/5714/3452/6850/5702/4794/7124/3569/7097/5708/2208/8772/3119/5704/7186/5971/3135/1380/958/5610/4792/10018/8819/3134/10379/9641/1147/5718/6300/3109/811/5606/2923/3108/5707/1432
hsa01230 29968/26227/875/445/5214/440/65263/6472/7086/5723/3418/7167/586/2597/5230/2023/5223/5831/6888/50/5315/3419/10993/2805/22934/3421/5634/3417/5232/221823/2027/5211/384/5832
>
>
> system.time({
+   kkk <- gseKEGG(geneList = geneList,
+                  eps = 0,
+                  organism = 'hsa',
+                  pvalueCutoff = 0.05)
+ })
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
   user  system elapsed
   1.28    0.11   14.06
>
>
from clusterprofiler.
Yes, I ran the GSEA function, and there was no result or error output for a long time (more than 1 hour).
I restarted my computer and reinstalled R (v4.3.2) and clusterProfiler (v4.10.0).
Then I ran the following code:
library(clusterProfiler)
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
kkk <- gseKEGG(geneList = geneList,eps = 0,organism = 'hsa', pvalueCutoff = 0.05)
No results or errors were output for more than 5 minutes.
from clusterprofiler.
Well, there is not much more I can come up with, other than that the issue seems to be specific to your system. Sorry.
Some last suggestions:
Also reinstall fgsea (because clusterProfiler uses this under the hood for execution of GSEA).
Check that nothing is reported when running BiocManager::valid().
from clusterprofiler.
Thanks very much for your patience.
I reinstalled fgsea, but the problem still exists!
I need to spend some more time on this issue before deciding to reinstall my operating system!
from clusterprofiler.
Maybe you could try this on another computer, e.g. from a colleague? Anyway, please report back.
from clusterprofiler.
I tried GSEA on another computer; there it took a few seconds and then returned the results.
I reinstalled the operating system, R, and the clusterProfiler package on my own computer, but the issue still exists!
What's wrong with my computer? I really don't know!
from clusterprofiler.
Your problem is strange and I can't figure out the cause. Please provide your sessionInfo() so I can see if there are any conflicts.
from clusterprofiler.
@huerqiang @guidohooiveld
Thanks for your attention!
I found the cause of this issue!
The "BiocParallel" package had not been installed successfully. Once I reinstalled 'BiocParallel' successfully, the GSEA function finished quickly!
Thanks very much!
from clusterprofiler.
Happy to hear you found the culprit!
Also thanks for reporting back since others may find this useful information when experiencing the same behavior.
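For others who run into the same behavior, a quick way to see whether the packages mentioned in this thread can actually be loaded is sketched below. The package list is an assumption about what is relevant here, not a complete dependency list of clusterProfiler, and a package that loads can still be broken in other ways, so this is only a first check.

```r
# Packages named in this thread; this list is an assumption, not the full
# set of clusterProfiler dependencies.
pkgs <- c("clusterProfiler", "fgsea", "BiocParallel")

# requireNamespace() returns FALSE instead of stopping when a package
# is missing or cannot be loaded.
loads <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(loads)

# Report the packages that failed to load, if any.
broken <- names(loads)[!loads]
if (length(broken) > 0) {
  message("Could not load: ", paste(broken, collapse = ", "),
          ". Consider reinstalling, e.g. with BiocManager::install().")
}
```

Any package reported here is a candidate for reinstalling; BiocManager::valid(), suggested earlier in the thread, is a complementary check on installed versions.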
from clusterprofiler.