
Comments (4)

guidohooiveld commented on August 16, 2024

Hi, a couple of things:

First of all, why do you generate a GSON object with all mouse pathways?

Related to this, please check the help page of the enrichKEGG function, because your call contains some mistakes. Note that the argument organism should be the KEGG abbreviation of the organism you are analyzing, which in your case is mmu (and it should NOT be the GSON object!)

The argument gene should be a (character) vector of entrezids.

It is also recommended to leave the argument use_internal_data at its default setting FALSE (so that up-to-date information is downloaded from the KEGG website).

The code below, which uses the 7 ids that you listed, will thus do what you intended to do!

> library(clusterProfiler)
> 
> id_transform <- c("240427","12705","241770","102633301","319757","116903","72309")
> class(id_transform)
[1] "character"
> 
> KEGG_enrich = enrichKEGG(gene = id_transform,
+                          organism="mmu",
+                          use_internal_data = FALSE
+  )
> 
> 
> KEGG_enrich
#
# over-representation test
#
#...@organism    mmu 
#...@ontology    KEGG 
#...@keytype     kegg 
#...@gene        chr [1:7] "240427" "12705" "241770" "102633301" "319757" "116903" "72309"
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...5 enriched terms found
'data.frame':   5 obs. of  11 variables:
 $ category   : chr  "Environmental Information Processing" "Human Diseases" "Organismal Systems" "Organismal Systems" ...
 $ subcategory: chr  "Signal transduction" "Cancer: specific types" "Circulatory system" "Development and regeneration" ...
 $ ID         : chr  "mmu04340" "mmu05217" "mmu04270" "mmu04360" ...
 $ Description: chr  "Hedgehog signaling pathway - Mus musculus (house mouse)" "Basal cell carcinoma - Mus musculus (house mouse)" "Vascular smooth muscle contraction - Mus musculus (house mouse)" "Axon guidance - Mus musculus (house mouse)" ...
 $ GeneRatio  : chr  "1/2" "1/2" "1/2" "1/2" ...
 $ BgRatio    : chr  "58/9710" "63/9710" "144/9710" "181/9710" ...
 $ pvalue     : num  0.0119 0.0129 0.0294 0.0369 0.0416
 $ p.adjust   : num  0.0388 0.0388 0.0499 0.0499 0.0499
 $ qvalue     : num  0.00681 0.00681 0.00875 0.00875 0.00875
 $ geneID     : chr  "319757" "319757" "116903" "319757" ...
 $ Count      : int  1 1 1 1 1
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> as.data.frame(KEGG_enrich)[1:3,]
                                     category            subcategory       ID
mmu04340 Environmental Information Processing    Signal transduction mmu04340
mmu05217                       Human Diseases Cancer: specific types mmu05217
mmu04270                   Organismal Systems     Circulatory system mmu04270
                                                             Description
mmu04340         Hedgehog signaling pathway - Mus musculus (house mouse)
mmu05217               Basal cell carcinoma - Mus musculus (house mouse)
mmu04270 Vascular smooth muscle contraction - Mus musculus (house mouse)
         GeneRatio  BgRatio     pvalue   p.adjust      qvalue geneID Count
mmu04340       1/2  58/9710 0.01191138 0.03880464 0.006807832 319757     1
mmu05217       1/2  63/9710 0.01293488 0.03880464 0.006807832 319757     1
mmu04270       1/2 144/9710 0.02944172 0.04989512 0.008753530 116903     1
> 
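
The GeneRatio of 1/2 in the output above suggests that only 2 of the 7 input ids are annotated to a KEGG pathway, so each enriched term rests on a single gene. As a rough sanity check, the p-value reported for mmu04340 can be reproduced in base R, assuming the over-representation test is a one-sided hypergeometric test (the usual choice for this kind of analysis). The variable names are illustrative only.

```r
# Numbers taken from the mmu04340 row of the output above.
k <- 1      # input genes found in the pathway (Count)
n <- 2      # input genes with any KEGG annotation (GeneRatio denominator)
M <- 58     # background genes in the pathway (BgRatio numerator)
N <- 9710   # all annotated background genes (BgRatio denominator)

# P(X >= k) when drawing n genes out of N, of which M belong to the pathway.
p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(signif(p, 7))
```

Under that assumption the result should agree with the pvalue of 0.01191138 listed for mmu04340.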





from clusterprofiler.

Shuixin-Li commented on August 16, 2024

Thank you for your detailed reply, and sorry for not writing in English before.
But do you know why enrichKEGG does not support a GSON object? I am confused because I saw the code below. @guidohooiveld

if (inherits(organism, "character")) {
    species <- organismMapper(organism)
    if (use_internal_data) {
        KEGG_DATA <- get_data_from_KEGG_db(species)
    } else {
        KEGG_DATA <- prepare_KEGG(species, "KEGG", keyType)
    }
} else if (inherits(organism, "GSON")) {
    KEGG_DATA <- organism
    species <- KEGG_DATA@species
    keyType <- KEGG_DATA@keytype
} else {
    stop("organism should be a species name or a GSON object")
}

I added gson_file@keytype <- 'ENTREZID' before running enrichKEGG(), and the error disappeared. But I am not sure whether the results obtained this way are correct.
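
The excerpt above shows that, when organism is a GSON object, keyType is taken directly from its keytype slot, so an empty slot is passed on unchanged. A minimal sketch of a guard that fills the slot only when it is empty follows. To stay runnable without downloading KEGG data it uses a stand-in S4 class (MockGSON, hypothetical) instead of the real GSON class; the helper ensure_keytype and its default value "kegg" are also assumptions, the latter chosen because "kegg" is the keytype reported when enrichKEGG is called with organism="mmu".

```r
library(methods)

# Hypothetical stand-in for the GSON class: only the slots needed here.
setClassUnion("characterOrNULL", c("character", "NULL"))
setClass("MockGSON",
         representation(species = "character", keytype = "characterOrNULL"))

# Set the keytype slot only if it is NULL; otherwise return the object as is.
ensure_keytype <- function(x, default = "kegg") {
  if (is.null(x@keytype)) {
    x@keytype <- default
  }
  x
}

kk_mock <- new("MockGSON", species = "mmu", keytype = NULL)
kk_mock <- ensure_keytype(kk_mock)
print(kk_mock@keytype)
```

With a real GSON object the same check would be applied to the object returned by gson_KEGG() before passing it to enrichKEGG(); whether "kegg" is the appropriate value for a given analysis is something to verify against the identifiers actually stored in the object.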


Regarding the input data, sorry for showing the wrong object: I showed id_transform before, but I actually used id_transform[,1], which is a character vector. Thank you for pointing this out.

> head(id_transform)
        id_transform
SP140         434484
SPATA32       328019
SAMD15        238333
FER1L6        631797
RERGL         632971
PHEX           18675
> head(id_transform[,1])
[1] "434484" "328019" "238333" "631797" "632971" "18675" 
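
Since the gene argument has to be a character vector, it can help to check the object before calling enrichKEGG. The sketch below rebuilds a small one-column data frame from three of the ids shown above and extracts its first column; the name gene_ids is illustrative.

```r
# One-column data frame like the one printed above (gene symbols as row names).
id_transform <- data.frame(
  id_transform = c("434484", "328019", "238333"),
  row.names = c("SP140", "SPATA32", "SAMD15"),
  stringsAsFactors = FALSE
)

# [, 1] drops the data-frame structure and returns the column as a vector.
gene_ids <- id_transform[, 1]

# as.character() guards against ids stored as numbers or factors.
gene_ids <- as.character(gene_ids)

# Fail early if the object is not what the gene argument expects.
stopifnot(is.character(gene_ids), !is.data.frame(gene_ids))
print(gene_ids)
```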


guidohooiveld commented on August 16, 2024

Sorry for my delayed reply!

Thanks for highlighting the relevant section in the source code of enrichKEGG. I now understand what you tried to achieve, and agree with you that the keytype slot of the GSON-object kk is somehow not set (it is NULL).

Indeed, when it is set manually (like you did), enrichKEGG works as expected. See the code below.

> ## load library
> library(clusterProfiler)
> 
> ## some ids
> id_transform <- c("240427","12705","241770","102633301","319757","116903","72309")
> 
> ## generate GSON-object with pathway information
> kk <- gson_KEGG('mmu')
> 
> ## use GSON as input: FAILS!
> KEGG_enrich = enrichKEGG(gene = id_transform,
+                          organism=kk,
+                          use_internal_data = FALSE)
Error in (function (cl, name, valueClass)  : 
  assignment of an object of class “NULL” is not valid for @‘keytype’ in an object of class “enrichResult”; is(value, "character") is not TRUE
> 
> 
> ## check GSON-object
> kk
>> Gene Set: KEGG
>> 9710 genes annotated by 355 gene sets.
>> Species: mmu
>> Version: Release 110.0+/04-27, Apr 24
> 
> ## note that slot keytype is NULL!
> str(kk)
Formal class 'GSON' [package "gson"] with 9 slots
  ..@ gsid2gene    :'data.frame':       38640 obs. of  2 variables:
  .. ..$ gsid: chr [1:38640] "mmu00010" "mmu00010" "mmu00010" "mmu00010" ...
  .. ..$ gene: chr [1:38640] "103988" "106557" "110695" "11522" ...
  ..@ gsid2name    :'data.frame':       355 obs. of  2 variables:
  .. ..$ gsid: chr [1:355] "mmu01100" "mmu01200" "mmu01210" "mmu01212" ...
  .. ..$ name: chr [1:355] "Metabolic pathways - Mus musculus (house mouse)" "Carbon metabolism - Mus musculus (house mouse)" "2-Oxocarboxylic acid metabolism - Mus musculus (house mouse)" "Fatty acid metabolism - Mus musculus (house mouse)" ...
  ..@ gene2name    : NULL
  ..@ species      : chr "mmu"
  ..@ gsname       : chr "KEGG"
  ..@ version      : chr "Release 110.0+/04-27, Apr 24"
  ..@ accessed_date: chr "2024-04-30"
  ..@ keytype      : NULL
  ..@ info         : NULL
> 
> ## Fix, and check
> kk@keytype="kegg"
> 
> str(kk)
Formal class 'GSON' [package "gson"] with 9 slots
  ..@ gsid2gene    :'data.frame':       38640 obs. of  2 variables:
  .. ..$ gsid: chr [1:38640] "mmu00010" "mmu00010" "mmu00010" "mmu00010" ...
  .. ..$ gene: chr [1:38640] "103988" "106557" "110695" "11522" ...
  ..@ gsid2name    :'data.frame':       355 obs. of  2 variables:
  .. ..$ gsid: chr [1:355] "mmu01100" "mmu01200" "mmu01210" "mmu01212" ...
  .. ..$ name: chr [1:355] "Metabolic pathways - Mus musculus (house mouse)" "Carbon metabolism - Mus musculus (house mouse)" "2-Oxocarboxylic acid metabolism - Mus musculus (house mouse)" "Fatty acid metabolism - Mus musculus (house mouse)" ...
  ..@ gene2name    : NULL
  ..@ species      : chr "mmu"
  ..@ gsname       : chr "KEGG"
  ..@ version      : chr "Release 110.0+/04-27, Apr 24"
  ..@ accessed_date: chr "2024-04-30"
  ..@ keytype      : chr "kegg"
  ..@ info         : NULL
> 
> 
> ## enrichKEGG now works!
> KEGG_enrich = enrichKEGG(gene = id_transform,
+                          organism=kk,
+                          use_internal_data = FALSE)
> 
> KEGG_enrich
#
# over-representation test
#
#...@organism    mmu 
#...@ontology    KEGG 
#...@keytype     kegg 
#...@gene        chr [1:7] "240427" "12705" "241770" "102633301" "319757" "116903" "72309"
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...5 enriched terms found
'data.frame':   5 obs. of  11 variables:
 $ category   : chr  "Environmental Information Processing" "Human Diseases" "Organismal Systems" "Organismal Systems" ...
 $ subcategory: chr  "Signal transduction" "Cancer: specific types" "Circulatory system" "Development and regeneration" ...
 $ ID         : chr  "mmu04340" "mmu05217" "mmu04270" "mmu04360" ...
 $ Description: chr  "Hedgehog signaling pathway - Mus musculus (house mouse)" "Basal cell carcinoma - Mus musculus (house mouse)" "Vascular smooth muscle contraction - Mus musculus (house mouse)" "Axon guidance - Mus musculus (house mouse)" ...
 $ GeneRatio  : chr  "1/2" "1/2" "1/2" "1/2" ...
 $ BgRatio    : chr  "58/9710" "63/9710" "144/9710" "181/9710" ...
 $ pvalue     : num  0.0119 0.0129 0.0294 0.0369 0.0416
 $ p.adjust   : num  0.0388 0.0388 0.0499 0.0499 0.0499
 $ qvalue     : num  0.00681 0.00681 0.00875 0.00875 0.00875
 $ geneID     : chr  "319757" "319757" "116903" "319757" ...
 $ Count      : int  1 1 1 1 1
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
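
The result printed here looks identical to the one obtained with organism="mmu". One way to confirm that programmatically is to compare the result tables from both calls. The sketch below does this with a small data frame filled with values copied from the output earlier in this thread, standing in for as.data.frame(KEGG_enrich); the names res_chr, res_gson and same_enrichment are hypothetical.

```r
# Values copied from the first three rows printed earlier in this thread.
res_chr <- data.frame(
  ID = c("mmu04340", "mmu05217", "mmu04270"),
  pvalue = c(0.01191138, 0.01293488, 0.02944172),
  geneID = c("319757", "319757", "116903"),
  stringsAsFactors = FALSE
)

# Stand-in for the table from the GSON-based call, with rows in another order.
res_gson <- res_chr[c(3, 1, 2), ]

# Compare selected columns after sorting by pathway ID, ignoring row names.
same_enrichment <- function(a, b, cols = c("ID", "pvalue", "geneID")) {
  a <- a[order(a$ID), cols]
  b <- b[order(b$ID), cols]
  rownames(a) <- NULL
  rownames(b) <- NULL
  # all.equal() tolerates tiny floating-point differences.
  isTRUE(all.equal(a, b))
}

print(same_enrichment(res_chr, res_gson))
```

With real results, res_chr and res_gson would be the data frames from the two enrichKEGG calls shown in this thread.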


guidohooiveld commented on August 16, 2024

As you will see above, I opened an issue on the GitHub repository of the gson package:
YuLab-SMU/gson#9
