Hello YuLab-SMU I am trying to obtain GO for genes from Brassica ole


GO enrichGO using input from g:orth and using biomaRt about clusterprofiler HOT 2 CLOSED

Guerande29 commented on July 17, 2024

GO enrichGO using input from g:orth and using biomaRt

from clusterprofiler.

Comments (2)

guidohooiveld commented on July 17, 2024

Please note that the lack of formatting makes your post difficult to read. Please put your code in code sections: select all code in your post, use the <> button to reformat it, and check the preview. Moreover, your universe is not available for download, and it is also unclear what exactly your input is.

Yet, based on the last part of your post, in which you perform an ORA using the function enrichGO, I believe your problem can be solved by using the universal enrichment function enricher together with the arguments TERM2GENE and TERM2NAME. This allows you to use your own input data, because the more specific functions enrichGO (and enrichKEGG) work only with NCBI-based org.xx.eg.db annotation packages.

This post of mine may be helpful: #588 (comment)
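For readers unfamiliar with what an ORA computes from a TERM2GENE table, here is a minimal base-R sketch of the underlying one-sided hypergeometric test. It is not clusterProfiler's implementation: the package also applies gene-set size filters, multiple-testing correction, and its own handling of the universe, none of which is reproduced here. All gene and term IDs are invented, and the helper `ora_pvalue` is a hypothetical name used only for illustration.

```r
# Toy TERM2GENE table: first column = term ID, second column = gene ID.
# All IDs below are invented for illustration only.
term2gene <- data.frame(
  go_id = c("GO:A", "GO:A", "GO:A", "GO:B", "GO:B"),
  gene  = c("g1", "g2", "g3", "g4", "g5"),
  stringsAsFactors = FALSE
)

universe <- paste0("g", 1:10)    # background: all genes that could be drawn
query    <- c("g1", "g2", "g6")  # genes of interest

# Hypothetical helper: one-sided hypergeometric p-value for a single term.
ora_pvalue <- function(term, query, universe, term2gene) {
  term_genes <- intersect(unique(term2gene$gene[term2gene$go_id == term]), universe)
  query      <- intersect(unique(query), universe)
  k <- length(intersect(query, term_genes))  # query genes annotated to the term
  M <- length(term_genes)                    # universe genes annotated to the term
  N <- length(universe)                      # universe size
  n <- length(query)                         # query size
  # P(X >= k) when drawing n genes from the universe without replacement
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

pvals <- sapply(unique(term2gene$go_id), ora_pvalue,
                query = query, universe = universe, term2gene = term2gene)
pvals
```

In this toy case two of the three query genes fall in GO:A and none in GO:B, so only GO:A gets a p-value below 1.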


Guerande29 commented on July 17, 2024

Thank you @guidohooiveld for your reply.
My apologies for entering the code incorrectly. You are right, the universal enricher solved the problem.
I used the post #588 and it works very well.
I will leave my corrected code here for whoever needs it.
Thank you so much :)

library(lattice)
library(BiocFileCache)
library(biomaRt)
library(dplyr)
library(ggplot2)
library(RSQLite)
library(devtools)
library(dbplyr)

## *** PART 1: check available plant marts
listMarts( host="https://plants.ensembl.org" )

## connect to the mart database
EPgenes = useEnsembl(biomart="plants_mart",  host="https://plants.ensembl.org")

## find names of available plant data sets
dsets = listDatasets(EPgenes)
head(dsets)

## in this case, find the brassica oleracea one
dsets[grep("Brassica oleracea", dsets$description),]

## take a note of the dataset name 'boleracea_eg_gene'

## *** PART 2: check available filters and attributes
EPgenes <- biomaRt::useMart(biomart = "plants_mart",  
                            dataset = "boleracea_eg_gene", 
                            host = "https://plants.ensembl.org")

head( listFilters(EPgenes) )
head( listAttributes(EPgenes) )

go.all = getBM(attributes=c("ensembl_gene_id", "go_id", "name_1006", "namespace_1003"), mart=EPgenes) 
head(go.all)
dim(go.all)
#[1] 194991      4

## To get a feeling about the GO annotations:
## check how many of the 194991 entries have a GO annotation
sum(go.all$go_id != "")
#[1] 174260

## check how many unique genes are represented in the 194991 entries?
sum(!duplicated( go.all$ensembl_gene_id) )
#[1] 60586

## remove the genes that don't have a GO annotation.
go.all <- go.all[go.all$go_id != "", ]
dim(go.all)
#[1] 174260      4

## check how many unique genes are represented in the 174260 entries?
sum(!duplicated( go.all$ensembl_gene_id) )
#[1] 39855

### **** PART 4: perform GO overrepresentation analysis (ORA) using my input
# Use the universal enrichment function enricher together with the arguments TERM2GENE and TERM2NAME,
# because the more specific functions enrichGO (and enrichKEGG) work only with NCBI-based org.xx.eg.db annotation packages.
# Note the use of the arguments TERM2GENE and TERM2NAME: their column
# order is important (term ID first, then gene ID or term name).

#setwd
library(clusterProfiler)
library(forcats)
library(enrichplot)
library(pathview)
library(data.table)
library(GOsummaries)
library(DOSE)
library(tidyverse)

# for the universe you can either pass all annotated genes to the call below:
#   universe = go.all$ensembl_gene_id,
# or make your own universe, according to your interest

#### uploading universe file
universe <- read.delim("universe_Bol.txt", header = TRUE)
universe <- as.character(universe[, 1])
universe <- sort(universe, decreasing = TRUE)
head(universe)

#### uploading file with genes to analyze
gene <- read.delim("GO.txt", header = TRUE)
head(gene)

## subset the annotation once; enricher expects the term ID in the first column
go.bp <- go.all[go.all$namespace_1003 == "biological_process", ]

ORA <- compareCluster(
  geneClusters = gene,
  fun = "enricher",
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  universe = universe,
  minGSSize = 10,
  maxGSSize = 500,
  qvalueCutoff = 0.05,
  TERM2GENE = go.bp[, c("go_id", "ensembl_gene_id")],
  TERM2NAME = go.bp[, c("go_id", "name_1006")])

## check the first 15 enriched terms (fewer if there are not that many)
head(as.data.frame(ORA), 15)
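The layout of GO.txt is not shown in the post, so the following is only a sketch of how the inputs could be tidied before the call, assuming one column per cluster with shorter columns padded by empty strings. The toy tables, the gene and term IDs, and the helper name `clusters_to_list` are all invented for illustration. It turns the table into a named list of gene vectors and removes duplicated rows from the term-to-gene and term-to-name tables while keeping the term ID in the first column.

```r
# Toy stand-in for the table read from GO.txt (layout is an assumption:
# one column per cluster, shorter columns padded with empty strings).
gene_toy <- data.frame(
  up   = c("gene1", "gene2", "gene3"),
  down = c("gene4", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn each column into a clean character vector.
clusters_to_list <- function(df) {
  lapply(df, function(col) {
    col <- as.character(col)
    unique(col[!is.na(col) & col != ""])
  })
}

gene_list <- clusters_to_list(gene_toy)
str(gene_list)

# Toy stand-in for go.all, with one duplicated row and one non-BP term.
go_toy <- data.frame(
  ensembl_gene_id = c("gene1", "gene1", "gene2", "gene4"),
  go_id           = c("GO:X", "GO:X", "GO:X", "GO:Y"),
  name_1006       = c("term x", "term x", "term x", "term y"),
  namespace_1003  = c("biological_process", "biological_process",
                      "biological_process", "molecular_function"),
  stringsAsFactors = FALSE
)

bp <- go_toy[go_toy$namespace_1003 == "biological_process", ]
term2gene <- unique(bp[, c("go_id", "ensembl_gene_id")])  # term first, gene second
term2name <- unique(bp[, c("go_id", "name_1006")])        # term first, name second
```

With real data, `gene_list`, `term2gene` and `term2name` would take the place of the geneClusters, TERM2GENE and TERM2NAME arguments.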
