
Comments (20)

guidohooiveld avatar guidohooiveld commented on August 16, 2024

I notice you are using subsetted input for genelogfc, TERM2NAME and TERM2GENE.

Especially regarding the last two: are you sure that the inputs for TERM2NAME and TERM2GENE are each a data.frame with only 2 columns?

Thus: what is the output of class(Pathways[['KEGG']][,c('TermID','TermName')]) and head(Pathways[['KEGG']][,c('TermID','TermName')])? The same question applies to the TERM2GENE input and genelogfc.
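A minimal sketch of such a check, using a small made-up table in place of Pathways[['KEGG']]. The object name `kegg` and its rows are hypothetical; the column names follow those used in this thread.

```r
# Hypothetical stand-in for Pathways[['KEGG']]; the rows are made up.
kegg <- data.frame(
  Genesnames = c("HK2", "HK3", "GCK"),
  TermID     = c("chx00010", "chx00010", "chx00010"),
  TermName   = rep("Glycolysis / Gluconeogenesis", 3),
  dbType     = "KEGG"
)

# TERM2GENE: term ID first, gene second.
# TERM2NAME: term ID first, term name second.
term2gene <- kegg[, c("TermID", "Genesnames")]
term2name <- unique(kegg[, c("TermID", "TermName")])

# Inspect the objects that will actually be passed to GSEA().
class(term2gene)
head(term2gene)

# Both should be plain data.frames with exactly two columns.
stopifnot(is.data.frame(term2gene), ncol(term2gene) == 2)
stopifnot(is.data.frame(term2name), ncol(term2name) == 2)
```

Running the same class() and head() calls on the real objects shows right away whether an extra column or an unexpected class slipped in.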

from clusterprofiler.

Xingsongli avatar Xingsongli commented on August 16, 2024

Yes, I ran a "subset" command on "genelogFC" to keep only the genes that are in the KEGG database.
The inputs for TERM2NAME and TERM2GENE are data.frames.
The class and head of each input are as follows:
image

I started the run 40 hours ago, but the GSEA command still has not responded, and there are no errors.


guidohooiveld avatar guidohooiveld commented on August 16, 2024

Mmm, such a long runtime is indeed not expected at all!

How many gene sets are you analyzing?
Also, if you would like me to double-check, feel free to upload your input files.


Xingsongli avatar Xingsongli commented on August 16, 2024

Thank you very much!
I have uploaded a zip file containing an R data file with two objects, genelogFC and KEGGdb (KEGGdb <- Pathways[['KEGG']]).
GSEA.zip


guidohooiveld avatar guidohooiveld commented on August 16, 2024

For me the analysis takes just a few seconds.

Note the warnings on ties and duplicate entries. The warning on ties means that genelogFC contains entries with the same value of the ranking metric, and the warning on duplicate entries means that at least one gene symbol is present multiple times.

Also note that I set the significance cutoff at 1 (to make sure any result is obtained).

> library(clusterProfiler)
> load("GSEA.Rdata")
> 
> ## check input
> head(genelogFC)
    DKK1     GPD2 CYB561A3   ACVR1B C15orf48   FRRS1L 
15.38154 15.21665 14.98970 14.95542 14.72364 14.42049 
> 
> head(KEGGdb)
  Genesnames                        TermNAME   TermID dbType
1        HK2 KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
2        HK3 KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
3        HK1 KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
4      HKDC1 KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
5        GCK KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
6        GPI KEGG_GLYCOLYSIS_GLUCONEOGENESIS chx00010   KEGG
                      TermName                                 curl
1 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
2 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
3 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
4 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
5 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
6 Glycolysis / Gluconeogenesis https://www.kegg.jp/pathway/chx00010
> 
> 
> ## run universal GSEA function, but do NOT apply sign. cutoff!
> ## this is to show it works.
> res <- GSEA(geneList = genelogFC,
+             minGSSize = 10,
+             maxGSSize = 500,
+             eps = 0,
+             pvalueCutoff = 1,
+             pAdjustMethod = "BH",
+             TERM2GENE = KEGGdb[, c("TermID","Genesnames")],
+             TERM2NAME = KEGGdb[, c("TermID","TermName")]
+             )
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  There are ties in the preranked stats (0.11% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  There are duplicate gene names, fgsea may produce unexpected results.
> 
> 
> res
#
# Gene Set Enrichment Analysis
#
#...@organism    UNKNOWN 
#...@setType     UNKNOWN 
#...@geneList    Named num [1:10772] 15.4 15.2 15 15 14.7 ...
 - attr(*, "names")= chr [1:10772] "DKK1" "GPD2" "CYB561A3" "ACVR1B" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <1 
#...293 enriched terms found
'data.frame':   293 obs. of  11 variables:
 $ ID             : chr  "chx01523" "chx04927" "chx03320" "chx00670" ...
 $ Description    : chr  "Antifolate resistance" "Cortisol synthesis and secretion" "PPAR signaling pathway" "One carbon pool by folate" ...
 $ setSize        : int  19 33 46 12 25 18 65 37 52 83 ...
 $ enrichmentScore: num  0.722 0.646 0.617 0.765 0.658 ...
 $ NES            : num  1.67 1.64 1.64 1.61 1.6 ...
 $ pvalue         : num  0.00215 0.00304 0.00155 0.0083 0.007 ...
 $ p.adjust       : num  0.297 0.297 0.297 0.37 0.37 ...
 $ qvalue         : num  0.293 0.293 0.293 0.364 0.364 ...
 $ rank           : num  1515 842 1773 1515 1516 ...
 $ leading_edge   : chr  "tags=37%, list=14%, signal=32%" "tags=21%, list=8%, signal=20%" "tags=43%, list=16%, signal=36%" "tags=50%, list=14%, signal=43%" ...
 $ core_enrichment: chr  "FOLR2/GART/ABCG2/NFKB1/TYMS/ATIC/MTHFR" "STAR/PDE8A/CREB3L2/CACNA1G/NR4A1/CREB5/CACNA1D" "SCD5/CPT1C/SLC27A2/ME3/SORBS1/PDPK1/FABP3/NR1H3/HMGCS1/ANGPTL4/ACSL1/SLC27A1/ACSL3/PLIN2/ACSL5/ACSL6/UBC/ACOX1/SLC27A6/APOA2" "GART/MTFMT/TYMS/ATIC/ALDH1L2/MTHFR" ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> as.data.frame(res)[1:5,]
               ID                         Description setSize enrichmentScore
chx01523 chx01523               Antifolate resistance      19       0.7218598
chx04927 chx04927    Cortisol synthesis and secretion      33       0.6458605
chx03320 chx03320              PPAR signaling pathway      46       0.6165491
chx00670 chx00670           One carbon pool by folate      12       0.7653581
chx04610 chx04610 Complement and coagulation cascades      25       0.6583045
              NES      pvalue  p.adjust    qvalue rank
chx01523 1.673002 0.002152638 0.2973276 0.2926810 1515
chx04927 1.644100 0.003044310 0.2973276 0.2926810  842
chx03320 1.642999 0.001550212 0.2973276 0.2926810 1773
chx00670 1.612934 0.008299682 0.3702470 0.3644608 1515
chx04610 1.600897 0.007001891 0.3702470 0.3644608 1516
                           leading_edge
chx01523 tags=37%, list=14%, signal=32%
chx04927  tags=21%, list=8%, signal=20%
chx03320 tags=43%, list=16%, signal=36%
chx00670 tags=50%, list=14%, signal=43%
chx04610 tags=36%, list=14%, signal=31%
                                                                                                                      core_enrichment
chx01523                                                                                       FOLR2/GART/ABCG2/NFKB1/TYMS/ATIC/MTHFR
chx04927                                                                               STAR/PDE8A/CREB3L2/CACNA1G/NR4A1/CREB5/CACNA1D
chx03320 SCD5/CPT1C/SLC27A2/ME3/SORBS1/PDPK1/FABP3/NR1H3/HMGCS1/ANGPTL4/ACSL1/SLC27A1/ACSL3/PLIN2/ACSL5/ACSL6/UBC/ACOX1/SLC27A6/APOA2
chx00670                                                                                           GART/MTFMT/TYMS/ATIC/ALDH1L2/MTHFR
chx04610                                                                       ITGB2/SERPINC1/F13A1/PLAUR/SERPINE1/F2/THBD/C1QB/C5AR1
> 
> 
> 

> packageVersion("clusterProfiler")
[1] ‘4.10.0’
> 
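The two warnings above can be checked directly on the ranked list before calling GSEA(). This is a rough sketch in base R on a made-up vector; `ranks` and its values are hypothetical, and the cleanup at the end is only one possible choice, not something clusterProfiler or fgsea prescribes.

```r
# Made-up ranked list with one tied value and one repeated gene name.
ranks <- c(DKK1 = 15.4, GPD2 = 15.2, ACVR1B = 15.2, DKK1 = 14.7, FRRS1L = 14.4)

# Gene names that occur more than once.
dup_names <- unique(names(ranks)[duplicated(names(ranks))])

# Entries whose ranking value is shared with at least one other entry.
tied <- ranks[duplicated(ranks) | duplicated(ranks, fromLast = TRUE)]

dup_names
names(tied)

# One possible cleanup (an assumption): keep the first occurrence of each
# gene name, then sort in decreasing order as GSEA() expects.
ranks_clean <- sort(ranks[!duplicated(names(ranks))], decreasing = TRUE)
ranks_clean
```

Which duplicate to keep (first, largest absolute value, mean) is an analysis decision. The remaining ties only affect the order of the tied genes.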


Xingsongli avatar Xingsongli commented on August 16, 2024

I reinstalled R (version 4.3.2) and clusterProfiler (4.10.0), but the issue still exists!
I can't figure out what is wrong!


guidohooiveld avatar guidohooiveld commented on August 16, 2024

To confirm: you copied and pasted my code above, and it takes more than a few seconds?

Also, how long does the code below, which uses the included example data, take? Again, for me this is just a few seconds...

> library(clusterProfiler)
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
> 
> kk <- enrichKEGG(gene         = gene,
+                 organism     = 'hsa',
+                 pvalueCutoff = 0.05)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> head(kk)
                                     category
hsa04110                   Cellular Processes
hsa04114                   Cellular Processes
hsa04218                   Cellular Processes
hsa04061 Environmental Information Processing
hsa03320                   Organismal Systems
hsa04814                   Cellular Processes
                                 subcategory       ID
hsa04110               Cell growth and death hsa04110
hsa04114               Cell growth and death hsa04114
hsa04218               Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320                    Endocrine system hsa03320
hsa04814                       Cell motility hsa04814
                                                           Description
hsa04110                                                    Cell cycle
hsa04114                                                Oocyte meiosis
hsa04218                                           Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320                                        PPAR signaling pathway
hsa04814                                                Motor proteins
         GeneRatio  BgRatio       pvalue     p.adjust       qvalue
hsa04110    15/106 157/8659 6.013440e-10 1.280863e-07 1.265987e-07
hsa04114    10/106 131/8659 4.141010e-06 4.410176e-04 4.358958e-04
hsa04218    10/106 156/8659 1.951667e-05 1.385683e-03 1.369591e-03
hsa04061     8/106 100/8659 2.834762e-05 1.441085e-03 1.424349e-03
hsa03320     7/106  75/8659 3.382828e-05 1.441085e-03 1.424349e-03
hsa04814    10/106 193/8659 1.192644e-04 4.233885e-03 4.184714e-03
                                                                           geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114                          991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218                           2305/4605/9133/890/983/51806/1111/891/776/3708
hsa04061                                 3627/10563/6373/4283/6362/6355/9547/1524
hsa03320                                       4312/9415/9370/5105/2167/3158/5346
hsa04814                   9493/1062/81930/3832/3833/146909/10112/24137/4629/7802
         Count
hsa04110    15
hsa04114    10
hsa04218    10
hsa04061     8
hsa03320     7
hsa04814    10
> 
> system.time({
+ kk <- enrichKEGG(gene         = gene,
+                 organism     = 'hsa',
+                 pvalueCutoff = 0.05)
+  })
   user  system elapsed 
   0.24    0.00    0.24 
> 
>

The elapsed time is the relevant one.
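To illustrate the difference between the columns: "user" and "system" are CPU time, while "elapsed" is wall-clock time, so anything that waits (for example on a download) shows up only in "elapsed". A small sketch, using Sys.sleep() as a stand-in for such waiting:

```r
# Sys.sleep() waits without using the CPU, much like waiting on a download.
timing <- system.time(Sys.sleep(0.5))

timing[["user.self"]]  # CPU time spent in R code: close to 0
timing[["elapsed"]]    # wall-clock time: roughly 0.5 seconds
```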


guidohooiveld avatar guidohooiveld commented on August 16, 2024

Sorry, that was for the overrepresentation analysis (ORA) function, not GSEA.

For KEGG-based GSEA it takes my computer ~14 seconds.


> library(clusterProfiler)
> data(geneList, package="DOSE")
> kkk <- gseKEGG(geneList = geneList,
+               eps = 0,
+               organism     = 'hsa',
+               pvalueCutoff = 0.05)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> head(kkk)
               ID                  Description setSize enrichmentScore      NES
hsa04110 hsa04110                   Cell cycle     139       0.6637551 2.825343
hsa03050 hsa03050                   Proteasome      43       0.7094784 2.448400
hsa03030 hsa03030              DNA replication      33       0.7227680 2.371262
hsa04657 hsa04657      IL-17 signaling pathway      85       0.5622094 2.197537
hsa05169 hsa05169 Epstein-Barr virus infection     193       0.4335010 1.926386
hsa01230 hsa01230  Biosynthesis of amino acids      62       0.5890512 2.158711
               pvalue     p.adjust       qvalue rank
hsa04110 2.693644e-20 9.131453e-18 6.748287e-18 1155
hsa03050 8.494620e-09 1.439838e-06 1.064063e-06 2516
hsa03030 1.182146e-07 1.127374e-05 8.331474e-06 1905
hsa04657 1.662794e-07 1.127374e-05 8.331474e-06 2880
hsa05169 1.623063e-07 1.127374e-05 8.331474e-06 2820
hsa01230 1.697239e-06 9.589398e-05 7.086715e-05 2918
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa03050 tags=65%, list=20%, signal=52%
hsa03030 tags=64%, list=15%, signal=54%
hsa04657 tags=49%, list=23%, signal=38%
hsa05169 tags=39%, list=23%, signal=31%
hsa01230 tags=55%, list=23%, signal=42%
                                                                                                                                                                                                                                                                                                                                                                               core_enrichment
hsa04110                                                                                                                                  8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa03050                                                                                                                                                                                                                                       5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198/7979/5699/5714/5702/5708/5692/5704/5683/5694/5718/51371/5682
hsa03030                                                                                                                                                                                                                                                                           4174/4171/4175/4173/2237/5984/5111/10535/1763/5427/23649/4176/5982/5557/5558/4172/5424/5983/5425/54107/6119
hsa04657                                                                                                                                                                 4312/6280/6279/6278/3627/2921/6364/8061/4318/3576/3934/6347/727897/1051/6354/3458/6361/6374/2919/9618/5603/7128/1994/7124/3569/8772/5743/7186/3596/6356/5594/4792/9641/1147/2932/6300/5597/27190/1432/7184/64806/3326
hsa05169 3627/890/6890/9636/898/9134/6502/6772/3126/3112/4609/917/5709/1869/3654/919/915/4067/4938/864/4940/5713/5336/11047/3066/54205/1871/578/1019/637/916/3383/4939/10213/23586/4793/5603/7979/7128/6891/930/5714/3452/6850/5702/4794/7124/3569/7097/5708/2208/8772/3119/5704/7186/5971/3135/1380/958/5610/4792/10018/8819/3134/10379/9641/1147/5718/6300/3109/811/5606/2923/3108/5707/1432
hsa01230                                                                                                                                                                                                             29968/26227/875/445/5214/440/65263/6472/7086/5723/3418/7167/586/2597/5230/2023/5223/5831/6888/50/5315/3419/10993/2805/22934/3421/5634/3417/5232/221823/2027/5211/384/5832
> 
> 
> system.time({
+ kkk <- gseKEGG(geneList = geneList,
+               eps = 0,
+               organism     = 'hsa',
+               pvalueCutoff = 0.05)
+  })
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
   user  system elapsed 
   1.28    0.11   14.06 
> 
> 


Xingsongli avatar Xingsongli commented on August 16, 2024

Yes, I ran the GSEA function, and it produced no result or error for a long time (more than 1 hour).
I restarted my computer and reinstalled R (v4.3.2) and clusterProfiler (v4.10.0).
Then I ran the following code:

library(clusterProfiler)
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
kkk <- gseKEGG(geneList = geneList, eps = 0, organism = 'hsa', pvalueCutoff = 0.05)

No results or errors were output for more than 5 minutes.
image


guidohooiveld avatar guidohooiveld commented on August 16, 2024

Well, there is not much more I can come up with, other than that the issue seems to be specific to your system. Sorry.

Some last suggestions:
Also reinstall fgsea (because clusterProfiler uses this under the hood for execution of GSEA).
Check nothing is reported when running BiocManager::valid().


Xingsongli avatar Xingsongli commented on August 16, 2024

Thanks very much for your patience.
I reinstalled fgsea, but the problem still exists!
I need to spend some more time on this issue before deciding to reinstall my operating system!


guidohooiveld avatar guidohooiveld commented on August 16, 2024

Maybe you could try this on another computer, e.g. from a colleague? Anyway, please report back.


Xingsongli avatar Xingsongli commented on August 16, 2024

I tried GSEA on another computer: it took a few seconds and then returned the results.
I reinstalled my computer's operating system, R, and the clusterProfiler package, but the issue still exists!
What is wrong with my computer? I really don't know!


huerqiang avatar huerqiang commented on August 16, 2024

Your question is strange and I can't figure out the reason. Please provide your sessionInfo() so I can see if there are any conflicts.


Xingsongli avatar Xingsongli commented on August 16, 2024

@huerqiang @guidohooiveld
Thanks for your attention!
I found the cause of this issue!
The "BiocParallel" package had not been installed successfully. Once I reinstalled 'BiocParallel' successfully, the GSEA function finished quickly!
Thanks very much!
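For anyone hitting the same behavior, a quick check along these lines may help narrow it down. It is only a sketch: the package list is taken from this thread, and a package that loads can still be misconfigured, so a clean result here does not rule out every installation problem.

```r
# Packages named in this thread. requireNamespace() returns FALSE instead of
# raising an error when a package is missing or cannot be loaded.
pkgs <- c("clusterProfiler", "fgsea", "BiocParallel")
loadable <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(loadable)

# Report anything that failed to load.
broken <- names(loadable)[!loadable]
if (length(broken) > 0) {
  message("Not loadable, consider reinstalling: ",
          paste(broken, collapse = ", "))
}
```

Anything reported as not loadable can be reinstalled with BiocManager::install(), and BiocManager::valid() (suggested earlier in this thread) reports out-of-date or mismatched packages.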


guidohooiveld avatar guidohooiveld commented on August 16, 2024

Happy to hear you found the culprit!
Also thanks for reporting back since others may find this useful information when experiencing the same behavior.

