zhilongjia / cogena

co-expressed gene-set enrichment analysis

Home Page: https://github.com/zhilongjia/cogena

R 99.85% CSS 0.15%

bioconductor bioinformatics package r

cogena's Introduction

Hi, I’m Zhilong Jia @zhilongjia
I am an Associate Professor at Chinese PLA General Hospital and hold a PhD in Biomedical Engineering.
My research interests are drug repositioning and omics.
I have developed drug repositioning methods, such as cogena, and applied them to various diseases, including cardiovascular disease, periodontal disease, mountain sickness, and COVID-19.


cogena's Issues

Error with heatmapPEI()

I am having an issue trying to run drug repositioning.
The same issue also occurs for pathway analysis using KEGG; in that case I noticed that the enrichment table is not fully produced and results in NA values.

I am currently using only the GO biological process results, which worked up to this drug repositioning stage, where I now get this error.

heatmapPEI(cmapDn100_cogena_result, "pam", "10", printGS=FALSE, orderMethod = "7", maintitle="Drug repositioning for Psoriasis")

Error: Can't subset columns that don't exist.
x Location 2 doesn't exist.
i There are only 1 column.
Run rlang::last_error() to see where the error occurred.

rlang::last_error()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Location 2 doesn't exist.
i There are only 1 column.
Backtrace:
  1. cogena::heatmapPEI(...)
  2. cogena::heatmapPEI(...)
  4. tidyr:::gather.data.frame(...)
  6. tidyselect::vars_select(tbl_vars(data), !!!quos)
  7. tidyselect:::eval_select_impl(...)
 15. tidyselect:::vars_select_eval(...)
 16. tidyselect:::loc_validate(pos, vars)
 17. vctrs::vec_as_location(pos, n = length(vars))
 19. vctrs:::stop_subscript_oob(...)
 20. vctrs:::stop_subscript(...)
Run rlang::last_trace() to see the full context.

rlang::last_trace()
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x Location 2 doesn't exist.
i There are only 1 column.
Backtrace:
     x
  1. +-cogena::heatmapPEI(...)
  2. \-cogena::heatmapPEI(...)
  3.   +-tidyr::gather(...)
  4.   \-tidyr:::gather.data.frame(...)
  5.     +-base::unname(tidyselect::vars_select(tbl_vars(data), !!!quos))
  6.     \-tidyselect::vars_select(tbl_vars(data), !!!quos)
  7.       \-tidyselect:::eval_select_impl(...)
  8.         +-tidyselect:::with_subscript_errors(...)
  9.         | +-base::tryCatch(...)
 10.         | | \-base:::tryCatchList(expr, classes, parentenv, handlers)
 11.         | |   \-base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
 12.         | |     \-base:::doTryCatch(return(expr), name, parentenv, handler)
 13.         | \-tidyselect:::instrument_base_errors(expr)
 14.         |   \-base::withCallingHandlers(...)
 15.         \-tidyselect:::vars_select_eval(...)
 16.           \-tidyselect:::loc_validate(pos, vars)
 17.             \-vctrs::vec_as_location(pos, n = length(vars))
 18.               \-(function () ...
 19.                 \-vctrs:::stop_subscript_oob(...)
 20.                   \-vctrs:::stop_subscript(...)

Any help identifying the cause of this error?

How do I build sampleLabel?

Hello! I built a sample table from your data: the first column is the sample name and the second column is the type. I read the file in with read.csv(), but when I pass the result directly to factor(), as in samplela <- factor(sample, levels=c("ct", "Psoriasis")), the subsequent steps fail. How should I build sampleLabel?

sample <- read.csv("sample.csv", header = TRUE)
head(sample)
name type
1 GSM337261 ct
2 GSM337262 ct
3 GSM337263 ct
4 GSM337264 ct
5 GSM337265 ct
6 GSM337266 ct
samplela <- factor(sample, levels=c("ct", "Psoriasis"))
samplela
name type

Levels: ct Psoriasis
Thank you very much! I am new to R and hope you can help me solve this.
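
One likely cause is that factor() receives the whole data frame rather than its type column, so no value matches the requested levels. The following is a minimal sketch of a fix, assuming cogena expects sampleLabel to be a factor whose elements are named by sample ID (the "No name for parameter sampleLabel" error reported in another issue here points that way). The sample IDs and the in-memory table are hypothetical stand-ins for sample.csv.

```r
# Hypothetical stand-in for read.csv("sample.csv", header = TRUE)
sample <- data.frame(
  name = c("S1", "S2", "S3", "S4"),
  type = c("ct", "ct", "Psoriasis", "Psoriasis"),
  stringsAsFactors = FALSE
)

# Build the factor from the type column, not from the whole data frame
sampleLabel <- factor(sample$type, levels = c("ct", "Psoriasis"))

# Name each element by its sample ID
# (assumption: these IDs match the column names of the expression matrix)
names(sampleLabel) <- sample$name

print(sampleLabel)
```

If the names do not match the expression matrix columns, the labels would be attached to the wrong samples, so it is worth checking them against colnames() of the expression data.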

About pathway enrichment tests

The package uses its own function to load GMT files. Would it be better to use existing functions, such as those from GSEABase?
Would it be possible to use the enrichment analyses from other Bioconductor packages, and to use the enrichment score instead of -log2(FDR)?

Downloaded drug signature (drugsig) data for about 500 down-regulated drugs and compressed it with xz, but clEnrich_one gives this error

cmapDn100_cogena_result <- clEnrich_one(genecl_result, "pam", "10", annofile=system.file("extdata", "down12.gmt.xz", package="cogena"), sampleLabel=sampleLabel)

Error in pei[as.character(k), ] <- cogena::PEI(genenames, annotation = annotation, :
number of items to replace is not a multiple of replacement length

heatmapPEI seems not to work within {}

Hi,

I wanted to use a for loop to create a dedicated .jpg file for each cluster produced by coExp, but wrapping the code in {} seems to prevent heatmapPEI from producing the graph.
The following is an example:
source("https://bioconductor.org/biocLite.R")
biocLite("cogena")
require(cogena)
data(Psoriasis)
clen_res <- clEnrich(coExp(DEexprs, nClust=2, clMethods="kmeans", metric="correlation", method="complete", ncore=2, verbose=TRUE),
    annofile=system.file("extdata", "c2.cp.kegg.v5.0.symbols.gmt.xz", package="cogena"), sampleLabel=sampleLabel)

jpeg(file="_Test.jpg", width=10, height=8, units="in", res=150)
heatmapPEI(clen_res, "kmeans", "2") # jpg file good
dev.off()

{
jpeg(file="_Test2.jpg", width=10, height=8, units="in", res=150)
heatmapPEI(clen_res, "kmeans", "2")
dev.off() # white jpg file
}

(Sorry for the R code not colored, I do not know how to do it here).

Thank you for the answer,
Bastien
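
A likely explanation, assuming heatmapPEI returns a ggplot2-style object that is only drawn when it is printed: at top level R auto-prints a visible result, but inside { }, a for loop, or a function, only the value of the final expression can be auto-printed, and here that is dev.off(). The sketch below reproduces the effect in base R with a hypothetical stand-in class (fake_plot) instead of cogena.

```r
# Hypothetical stand-in for a plot object that is drawn by its print method
make_plot <- function() structure(list(), class = "fake_plot")
print.fake_plot <- function(x, ...) {
  cat("drawing plot\n")
  invisible(x)
}

# Inside braces the plot object is created but never printed,
# because it is not the last expression (invisible(NULL) plays the role of dev.off())
out_without_print <- capture.output({
  make_plot()
  invisible(NULL)
})

# An explicit print() draws the plot wherever the call sits
out_with_print <- capture.output({
  print(make_plot())
  invisible(NULL)
})

length(out_without_print)  # nothing was drawn
out_with_print             # the print method ran
```

If the assumption holds, writing print(heatmapPEI(clen_res, "kmeans", "2")) inside the braces or the loop should produce a non-empty file.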

Error in annotation

Hello dear,

I am getting this error:

clen_res <- clEnrich(genecl_result, annofile=annofile, sampleLabel=sampleLabel)
Note: 716 out of 773 exist in the genes population.
Error in annotation[genenames, j] : subscript out of bounds

This happens even though I converted the Gene.IDs to symbols:
library(annotate)
library(org.Hs.eg.db)
geneNames <- getSYMBOL(tT$Gene.ID, data='org.Hs.eg')

geneNames[1:20]
 [1] "NFATC2IP" "PLAG1"    "FRMD8"    "ANXA4"    "HAUS2"    "FBXO4"
 [7] "NIF3L1"   "QRSL1"    "IFNAR1"   "P2RX5"    "TUBA1A"   "ZNF106"
[13] "CASP8"    "ACTR8"    "MSMO1"    "PRG2"     "CRB1"     "CABLES1"
[19] "DIPK2B"   "DOCK1"
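
The note "716 out of 773 exist in the genes population" suggests that some symbols are not recognised, and getSYMBOL returns NA for IDs it cannot map. The sketch below is one possible diagnostic, not cogena's own logic: it lists symbols that are NA or absent from a GMT file. The GMT content, the helper read_gmt_genes, and the gene vector are hypothetical; whether removing such genes resolves the error is an assumption to verify.

```r
# Hypothetical two-gene-set GMT file (name, description, then gene symbols)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(
  c("SET_A\tna\tCASP8\tIFNAR1\tTUBA1A",
    "SET_B\tna\tDOCK1\tCASP8"),
  gmt_path
)

# Hypothetical helper: collect every gene symbol mentioned in a GMT file
read_gmt_genes <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  unique(unlist(lapply(fields, function(f) f[-(1:2)])))
}

gmt_genes <- read_gmt_genes(gmt_path)

# Hypothetical symbols, including an unmapped ID (NA) and an unknown symbol
my_genes <- c("CASP8", "DOCK1", "NOT_A_SYMBOL", NA)

# Symbols that are NA or not present in the annotation
unmatched <- my_genes[is.na(my_genes) | !(my_genes %in% gmt_genes)]
print(unmatched)
```

Duplicated symbols are another thing worth checking, for example with anyDuplicated(), since several IDs can map to the same symbol.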

Which cluster to choose

Hello dear,

I am looking for pathways involved in viral infection. Actually all KEGG categories listed are involved in viral infection. Which one of these gene clusters would you choose for drug repositioning?

error with clEnrich parameter sampleLabel

Hello.
I have been following the tutorial and have named my variables after the ones in data(Psoriasis).
Despite using the same variable names, I had to specify clMethods ("hierarchical") to get coExp to work.
When I run clEnrich, I get the following error:
Error in clEnrich(genecl_result, annofile = annofile, sampleLabel = sampleLabel) :
No name for parameter sampleLabel.

Any ideas on what might cause this?
Unlike the example data, I am trying to differentiate between 3 groups (NL, PT, PM) as opposed to the 2 used in the example.

I have posted the code I am using as well as my input file for the expression data.
cogena_input.zip

library(cogena)
setwd("C:/Users/omics_IPA/Documents/")
data <- c("NL","NL","NL","NL","NL","PT","PT","PT","PT","PT","PT","PT","PT","PT","PT","PT","PM","PM","PM","PM","PM","PM","PM","PM","PM","PM","PM","PM","PM","PM","PM","PM","PM")
data_DEP <- read.csv("cogena_input.csv")
sampleLabel <- factor(data, levels=c("NL","PT","PM"))
DEexprs <- as.matrix.data.frame(data_DEP)
rownames(DEexprs) <- DEexprs[,1]
DEexprs <- DEexprs[,-1]

# Clustering the gene expression profiling
clMethods <- c("hierarchical","kmeans","diana","fanny","som","model","sota","pam","clara","agnes")
genecl_result <- coExp(DEexprs, nClust=5:6, clMethods="hierarchical", metric="correlation", method="complete", ncore=2, verbose=TRUE)

# Gene set used
annofile <- system.file("extdata", "c2.cp.kegg.v5.0.symbols.gmt.xz", package="cogena")

# Enrichment analysis for clusters
clen_res <- clEnrich(genecl_result, annofile=annofile, sampleLabel=sampleLabel)
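
The error message suggests that sampleLabel has no names attribute. Separately, calling as.matrix on a data frame that still contains the gene ID column yields a character matrix, which may be why other clustering methods failed. The sketch below addresses both points under those assumptions; the three-sample table is a hypothetical stand-in for cogena_input.csv.

```r
# Hypothetical stand-in for read.csv("cogena_input.csv"): first column holds gene IDs
data_DEP <- data.frame(
  gene = c("G1", "G2"),
  s1 = c(1.2, 3.4),
  s2 = c(2.1, 0.5),
  s3 = c(0.3, 1.9)
)

# Drop the ID column before converting, so the matrix stays numeric
DEexprs <- as.matrix(data_DEP[, -1])
rownames(DEexprs) <- data_DEP$gene

# One label per sample column, in column order
groups <- c("NL", "PT", "PM")
sampleLabel <- factor(groups, levels = c("NL", "PT", "PM"))

# Attach the sample names (assumption: this is what clEnrich checks for)
names(sampleLabel) <- colnames(DEexprs)

str(DEexprs)
print(sampleLabel)
```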

Error in annotation [genenames,j] : subscript out of bounds (Enrichment analysis)

While running the code below:
gds3952 <- getGEO('GDS3952')
eset <- GDS2eSet(gds3952, do.log2=TRUE)
fa <- factor(pData(eset)$disease.state)
DExprs1 <- subset(DExprs,select=c("ID","adj.P.Val","P.Value","F","Gene.symbol","Gene.title"))
DExprs1
library(cogena)
ls()
annoGMT2 <- "c5.bp.v5.0.symbols.gmt.xz"
annoGMT <- "c2.cp.kegg.v5.0.symbols.gmt.xz"
annofile <- system.file("extdata", annoGMT, package="cogena")
annofile2 <- system.file("extdata", annoGMT2, package="cogena")
nClust <- 3
sampleLabel <- factor(sampleLabel,levels=c("bba", "ec","mbc","h","presm","posm"))
ncore <- 2
clMethods <- c("hierarchical","pam")
metric <- "correlation"
method <- "complete"
genecl_result <- coExp(DExprsr1, nClust=nClust, clMethods=clMethods,
metric=metric, method=method, ncore=ncore)
summary(genecl_result)
clen_res <- clEnrich(genecl_result, annofile=annofile, sampleLabel=sampleLabel)

I get this error:
Error in annotation [genenames,j] : subscript out of bounds