
Comments (2)

guidohooiveld commented on July 17, 2024

Please note that the lack of formatting makes your post difficult to read. Please put your code in code sections: select all the code in your post, then use the <> button to reformat it (and check the preview). Moreover, your universe is not available for download, and it is unclear what exactly your input is.

Yet, based on the last part of your post, in which you perform an ORA using the function enrichGO, I believe your problem can be solved by using the universal enrichment function enricher together with the arguments TERM2GENE and TERM2NAME. enricher accepts your own annotation tables as input, whereas the more specific functions enrichGO and enrichKEGG work only with NCBI-based org.xx.eg.db annotation packages.
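
A minimal sketch of this approach, using made-up gene and GO identifiers (everything named `geneA`, `GO:0000001`, "toy process ..." is illustrative, not real Brassica data). It shows the column order that `enricher` expects: term ID first and gene ID second for TERM2GENE, term ID first and description second for TERM2NAME. The `enricher` call is only attempted if clusterProfiler is installed.

```r
# Toy annotation table in the shape returned by biomaRt::getBM()
# (all identifiers below are made up for illustration).
anno <- data.frame(
  ensembl_gene_id = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF"),
  go_id           = c("GO:0000001", "GO:0000001", "GO:0000001",
                      "GO:0000002", "GO:0000002", "GO:0000002"),
  name_1006       = c(rep("toy process one", 3), rep("toy process two", 3)),
  stringsAsFactors = FALSE
)

# TERM2GENE: column 1 = term ID, column 2 = gene ID.
term2gene <- anno[, c("go_id", "ensembl_gene_id")]
# TERM2NAME: column 1 = term ID, column 2 = term description.
term2name <- unique(anno[, c("go_id", "name_1006")])

genes_of_interest <- c("geneA", "geneB", "geneC")
background        <- unique(anno$ensembl_gene_id)

if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  res <- clusterProfiler::enricher(
    gene      = genes_of_interest,
    universe  = background,
    minGSSize = 1,   # lowered only because the toy gene sets are tiny
    TERM2GENE = term2gene,
    TERM2NAME = term2name
  )
  print(as.data.frame(res))
}
```

With real data, `anno` would be the table retrieved with `getBM`, and `minGSSize` would normally stay at its default.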

This post of mine may be helpful: #588 (comment)


Guerande29 commented on July 17, 2024

Thank you @guidohooiveld for your reply.
My apologies for entering the code incorrectly. You are right, the universal enricher solved the problem.
I used post #588 and it works very well.
I will leave my corrected code here for whoever needs it.
Thank you so much :)

library(lattice)
library(BiocFileCache)
library(biomaRt)
library(dplyr)
library(ggplot2)
library(RSQLite)
library(devtools)
library(dbplyr)

## *** PART 1: check available plant marts
listMarts( host="https://plants.ensembl.org" )

## connect to the mart database
EPgenes = useEnsembl(biomart="plants_mart",  host="https://plants.ensembl.org")

## find names of available plant data sets
dsets = listDatasets(EPgenes)
head(dsets)

## in this case, find the brassica oleracea one
dsets[grep("Brassica oleracea", dsets$description),]

## take a note of the dataset name 'boleracea_eg_gene'

## *** PART 2: check available filters and attributes
EPgenes <- biomaRt::useMart(biomart = "plants_mart",  
                            dataset = "boleracea_eg_gene", 
                            host = "https://plants.ensembl.org")

head( listFilters(EPgenes) )
head( listAttributes(EPgenes) )

go.all = getBM(attributes=c("ensembl_gene_id", "go_id", "name_1006", "namespace_1003"), mart=EPgenes) 
head(go.all)
dim(go.all)
#[1] 194991      4

## To get a feeling about the GO annotations:
## check how many of the 194991 entries have a GO annotation
sum(go.all$go_id != "")
#[1] 174260

## check how many unique genes are represented in the 194991 entries?
sum(!duplicated( go.all$ensembl_gene_id) )
#[1] 60586

## remove the genes that don't have a GO annotation.
go.all <- go.all[go.all$go_id != "", ]
dim(go.all)
#[1] 174260      4

## check how many unique genes are represented in the 174260 entries?
sum(!duplicated( go.all$ensembl_gene_id) )
#[1] 39855

### **** PART 4: perform GO overrepresentation analysis (ORA) using my input
# Use the universal enrichment function enricher together with the arguments TERM2GENE and TERM2NAME,
# because the more specific functions enrichGO and enrichKEGG work only with NCBI-based org.xx.eg.db annotation packages.
# Column order matters: TERM2GENE is (term, gene) and TERM2NAME is (term, name).

#setwd
library(clusterProfiler)    
library(forcats)            
library(enrichplot)         
library(pathview)           
library(data.table)         
library(ggplot2)            
library(GOsummaries)        
library(DOSE)     
library(tidyverse)

# to use all annotated genes as the universe, use this line:
# universe = go.all$ensembl_gene_id,
# or make your own universe, according to your interest

#### loading the universe file
universe <- read.delim("universe_Bol.txt", header = TRUE)
universe <- as.character(universe[, 1])
universe <- sort(universe, decreasing = TRUE)
head(universe)

#### loading the file with genes to analyze
gene <- read.delim("GO.txt", header = TRUE)
head(gene)

ORA <- compareCluster(
  geneClusters  = gene,
  fun           = enricher,
  pvalueCutoff  = 0.05,
  pAdjustMethod = "BH",
  universe      = universe,
  minGSSize     = 10,
  maxGSSize     = 500,
  qvalueCutoff  = 0.05,
  TERM2GENE = go.all[go.all$namespace_1003 == "biological_process", c("go_id", "ensembl_gene_id")],
  TERM2NAME = go.all[go.all$namespace_1003 == "biological_process", c("go_id", "name_1006")])

## check
as.data.frame(ORA)[1:15,]
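
A note on the `geneClusters` input: `compareCluster` is documented to accept a named list of gene vectors, one element per cluster. If a file such as `GO.txt` stores one cluster per column and the columns have unequal lengths, the shorter ones end up padded with empty strings or `NA` after `read.delim`. A possible clean-up step, assuming that one-column-per-cluster layout (the column names `up` and `down` and the gene IDs below are hypothetical):

```r
# Hypothetical stand-in for the table read from GO.txt:
# one column per cluster, shorter columns padded with "" or NA.
gene <- data.frame(
  up   = c("geneA", "geneB", "geneC"),
  down = c("geneD", "", NA),
  stringsAsFactors = FALSE
)

# Convert to a named list and drop padding values and duplicates.
gene_list <- lapply(as.list(gene), function(x) {
  x <- as.character(x)
  unique(x[!is.na(x) & x != ""])
})

str(gene_list)
lengths(gene_list)
```

`gene_list` could then be passed as `geneClusters` in place of the data frame.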
