montilab / hypeR
An R Package for Geneset Enrichment Workflows
Home Page: https://montilab.github.io/hypeR-docs/
License: GNU General Public License v3.0
Plots are not being displayed within Jupyter notebooks; this is due to visNetwork. It would be nice to fix this, as interactive charts are possible within notebooks (maybe see https://altair-viz.github.io/user_guide/display_frontends.html). However, if this is not possible, at least add an option to save a plot so that it can be opened separately, e.g.:
graph=visNetwork(nodes, edges, main=list(text=title, style="font-family:Helvetica")) %>%
visNodes(borderWidth=1, borderWidthSelected=0) %>%
visEdges(color="rgb(88,24,69)") %>%
visOptions(highlightNearest=TRUE) %>%
visInteraction(multiselect=TRUE, tooltipDelay=300) %>%
visIgraphLayout(layout="layout_nicely")
if (!is.null(file)) visSave(graph, file=paste0(file,'.html'),
selfcontained = TRUE, background = "white")
Hi:
First of all, thanks for developing this nice package.
Is there an easier way to work with single-cell RNA-seq data? For example, can I extract the top cluster markers from a Seurat object and run hypeR on them?
Much like your WGCNA example
$ turquoise : chr [1:1902] "CLEC3A" "KCNJ3"
would the cluster name take the place of the color name?
thanks for your time
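One way to approach the single-cell case is to turn a marker table into a named list of signatures, one per cluster, much like the WGCNA module list. The sketch below is a minimal illustration in base R. The markers data frame is a made-up stand-in shaped like the output of Seurat's FindAllMarkers() (columns cluster, gene, avg_log2FC, p_val_adj; column names can differ between Seurat versions), and the thresholds are arbitrary assumptions.

```r
# Toy stand-in for a Seurat marker table (values are invented for illustration).
markers <- data.frame(
  cluster    = c("0", "0", "0", "1", "1", "1"),
  gene       = c("CD3D", "CD3E", "IL7R", "MS4A1", "CD79A", "CD79B"),
  avg_log2FC = c(2.1, 1.8, 0.4, 2.5, 1.9, 0.3),
  p_val_adj  = c(1e-10, 1e-8, 0.2, 1e-12, 1e-9, 0.5),
  stringsAsFactors = FALSE
)

# Keep significant, up-regulated markers (assumed thresholds).
sig <- markers[markers$p_val_adj < 0.05 & markers$avg_log2FC > 1, ]

# One character vector of genes per cluster; the cluster label becomes the name.
signatures <- split(sig$gene, paste0("cluster_", sig$cluster))
str(signatures)
```

The resulting list could then be passed to hypeR() in place of a single signature, so that each cluster name plays the role of the module color.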
Great package! For someone that is new to working in R I very much appreciate the ease, documentation and tutorials.
I have some suggestions on what I would find useful additions to improve interpretation of biological data:
When drawing a dot plot with hyp_dots() for a multi_hyp object, see the significance legend.
This is likely due to a newer version of the scales package. I am unsure which version exactly, but it seems that the internal function .reverselog_trans(), which is based on scales::trans_new(), is not working well.
Solution:
https://github.com/montilab/hypeR/blob/master/R/hyp_dots.R
Line 111. Change this line to:
scale_color_continuous(high = "#114357", low = "#E53935", trans = scales::log10_trans(), guide = guide_colorbar(reverse = TRUE))
Or add this after calling hyp_dots() to override the existing color scale:
hyp_dots(...) + scale_color_continuous(...insert above...)
When RStudio tries to fetch help for the package I get this error
Error in fetch(key) :
lazy-load database '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/hypeR/help/hypeR.rdb' is corrupt
It is triggered when I press Tab inside the hypeR() function parentheses to see what options are available.
I would change the color-coding of the dots in the dotplot. Red and blue have come to be associated with up- and down-regulated. Therefore, I would use a totally different color palette (either a gray scale or a yellow-orange scale, such as the R default, or you pick one).
I'm executing hypeR with a custom geneset and "kstest", weighted by the effect size of a differential expression test. I've noticed, however, that the results I get in hyp_obj$data and the dots plot remain the same whether or not my signature has gene weights.
The only thing that changes if I use a weighted input is the "Running Enrichment Score vs. Position in Ranked List of Genes" plot.
Is this behaviour expected? Shouldn't the gene weights affect the results of the statistical test?
Hi,
I tried to run hypeR on my datasets, but I always get a value of 1 for both pval and fdr. Could you please check what the issue is here?
I came across a weird issue when trying to use hyp_to_rmd: the file_path argument can take a plain file name or an absolute path, but not a relative path:
> library(hypeR)
> library(tidyverse)
#deleted for space
> load(file.path(system.file("extdat", package="hyperworkshop"), "limma.rda"))
> genesets <- msigdb_gsets("Homo sapiens", "C2", "CP:KEGG")
> signature <- limma %>%
+ dplyr::filter(t > 0 & fdr < 0.001) %>%
+ magrittr::use_series(symbol)
> hyp_obj <- hypeR(signature, genesets, test="hypergeometric", background=50000, fdr=0.01, plotting=TRUE)
> hyp_to_rmd(hyp_obj,
+ file_path="hypeR.rmd")
processing file: hypeR.rmd
# deleted for space
Output created: hypeR.rmd.html
> getwd()
[1] "C:/Users/jenny/"
> dir.create("test")
> hyp_to_rmd(hyp_obj,
+ file_path="test/hypeR.rmd")
Error: The directory 'test' does not not exist.
> hyp_to_rmd(hyp_obj,
+ file_path="C:/Users/jenny/test/hypeR.rmd")
processing file: hypeR.rmd
# deleted for space
Output created: hypeR.rmd.html
I did some debugging and the problem occurs in rmarkdown::render(). It wants the output directory separate from the file name and does weird things like calling setwd(dirname(abs_path(input))), so you can't put relative paths in hyp_to_rmd(file_path = ). The easiest thing to do would be to add a warning in the help file, which should also say that file_path needs to end in .rmd or it won't be rendered properly:
> dir.create("test2")
> hyp_to_rmd(hyp_obj,
+ file_path="C:/Users/jenny/test2/hypeR")
"C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS hypeR.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc552854ab6fdb.html --email-obfuscation none --self-contained --standalone --section-divs --table-of-contents --toc-depth 1 --variable toc_float=1 --variable toc_selectors=h1 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template "C:\Users\jenny\R\win-library\4.0\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable "theme:united" --include-in-header "C:\Users\jenny\AppData\Local\Temp\Rtmpyuarak\rmarkdown-str55283b06aa5.html" --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --lua-filter "C:/Users/jenny/R/win-library/4.0/rmarkdown/rmd/lua/pagebreak.lua" --lua-filter "C:/Users/jenny/R/win-library/4.0/rmarkdown/rmd/lua/latex-div.lua" --variable code_folding=hide --variable code_menu=1
Output created: test2/hypeR.html
#open test2/hypeR.html and see it's not rendered correctly
One final request: could you strip off the .rmd when creating the html file name? I am used to having hypeR.rmd and the rendered hypeR.html and hypeR.rmd.html is just clunky. Thanks!
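Until this is addressed in the package, one possible workaround is to resolve the path before calling hyp_to_rmd(). The sketch below uses only base R. The helper names abs_rmd_path() and html_path() are hypothetical and are not part of hypeR or rmarkdown; the second one only illustrates the requested naming (hypeR.html rather than hypeR.rmd.html).

```r
# Hypothetical helper: turn a relative .rmd path into an absolute one.
# The parent directory must already exist.
abs_rmd_path <- function(file_path) {
  dir_abs <- normalizePath(dirname(file_path), winslash = "/", mustWork = TRUE)
  file.path(dir_abs, basename(file_path))
}

# Hypothetical helper: replace the extension instead of appending ".html".
html_path <- function(file_path) {
  paste0(tools::file_path_sans_ext(file_path), ".html")
}

out_dir <- file.path(tempdir(), "test")
dir.create(out_dir, showWarnings = FALSE)

rmd <- abs_rmd_path(file.path(out_dir, "hypeR.rmd"))
print(rmd)
print(html_path("test/hypeR.rmd"))

# hyp_to_rmd(hyp_obj, file_path = rmd)  # hyp_obj as created in the session above
```

Resolving the directory rather than the file itself matters here, because the .rmd file does not exist yet when the path is built.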
It would be worthwhile to implement a version of the score similar to the one implemented in GSVA, where the difference between the top positive peak and the top negative peak is computed. Thus, if a signature is half enriched at one end and similarly half enriched at the other end, we will get a score close to zero, rather than the largest of the two, which would be misleading.
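To make the difference between the two scores concrete, here is a minimal sketch in base R. It assumes an unweighted KS-style running sum and is not taken from hypeR or GSVA; the function name running_sum() and the toy ranking are illustrative only.

```r
# Unweighted KS-style running sum over a ranked list:
# step up on a hit, step down on a miss, so the sum ends near zero.
running_sum <- function(ranked, geneset) {
  hit <- ranked %in% geneset
  cumsum(ifelse(hit, 1 / sum(hit), -1 / sum(!hit)))
}

ranked  <- paste0("g", 1:20)
geneset <- paste0("g", c(1:3, 18:20))  # half of the set at each end of the ranking

z <- running_sum(ranked, geneset)
pos_peak <- max(z, 0)  # largest positive deviation
neg_peak <- min(z, 0)  # largest negative deviation (zero or below)

score_max_dev  <- z[which.max(abs(z))]  # single largest deviation, sign kept
score_max_diff <- pos_peak + neg_peak   # positive peak minus magnitude of negative peak

print(round(c(max_dev = score_max_dev, max_diff = score_max_diff), 3))
```

In this toy case the largest single deviation has magnitude 0.5, while the difference-based score is essentially zero, which is the behavior the request describes.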
Currently, the geneset names are reported twice (as rownames, and as "Labels")
I ran hypeR with hyp_obj <- hypeR(signature, geneset_1,fdr=0.05,plotting = T,test = 'kstest'). The signature is weighted, but the call returned: Error in apply(results[, c("score", "pval", "geneset", "overlap")], 2, : dim(X) must have a positive length. My geneset was a custom geneset.
I cannot get the plot.
Hi!
Your package is very useful, thanks.
I've used hypeR for a couple of months, but today I got a client error (400) when running:
enrichr_gsets("Jensen_DISEASES")
or enrichr_gsets("GO_Biological_Process_2018")
This happens with basically any Enrichr DB, and I get this error:
Error in enrichr_connect(.format_str("geneSetLibrary?mode=text&libraryName={1}", : Client error: (400) Bad Request
Thank you for your help.
I recently updated some packages and I don't know if that is causing conflicts.
> sessioninfo::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.0.2 (2020-06-22)
os macOS 10.16
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Mexico_City
date 2021-03-08
─ Packages ─────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.0.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2)
BiocManager 1.30.10 2019-11-16 [1] CRAN (R 4.0.0)
broom 0.7.5 2021-02-19 [1] CRAN (R 4.0.2)
cachem 1.0.4 2021-02-13 [1] CRAN (R 4.0.2)
callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.2)
car 3.0-10 2020-09-29 [1] CRAN (R 4.0.2)
carData 3.0-4 2020-05-22 [1] CRAN (R 4.0.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.0)
cli 2.3.1 2021-02-23 [1] CRAN (R 4.0.2)
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.0.2)
colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.2)
cowsay 0.8.0 2020-02-06 [1] CRAN (R 4.0.2)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
curl 4.3 2019-12-02 [1] CRAN (R 4.0.0)
data.table 1.14.0 2021-02-21 [1] CRAN (R 4.0.2)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
desc 1.3.0 2021-03-05 [1] CRAN (R 4.0.2)
devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
dplyr 1.0.5 2021-03-05 [1] CRAN (R 4.0.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.2)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2)
forcats 0.5.1 2021-01-27 [1] CRAN (R 4.0.2)
foreign 0.8-81 2020-12-22 [1] CRAN (R 4.0.2)
fortunes 1.5-4 2016-12-29 [1] CRAN (R 4.0.2)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
ggforce 0.3.3 2021-03-05 [1] CRAN (R 4.0.2)
ggplot2 3.3.3 2020-12-30 [1] CRAN (R 4.0.2)
ggpubr 0.4.0 2020-06-27 [1] CRAN (R 4.0.2)
ggsignif 0.6.1 2021-02-23 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
haven 2.3.1 2020-06-01 [1] CRAN (R 4.0.0)
here 1.0.1 2020-12-13 [1] CRAN (R 4.0.2)
hms 1.0.0 2021-01-13 [1] CRAN (R 4.0.2)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.0.2)
httpuv 1.5.5 2021-01-13 [1] CRAN (R 4.0.2)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
hypeR * 1.6.0 2020-10-27 [1] Bioconductor
igraph 1.2.6 2020-10-06 [1] CRAN (R 4.0.2)
inline 0.3.17 2020-12-01 [1] CRAN (R 4.0.2)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.2)
kableExtra 1.3.4 2021-02-20 [1] CRAN (R 4.0.2)
knitr 1.31 2021-01-27 [1] CRAN (R 4.0.2)
later 1.1.0.1 2020-06-05 [1] CRAN (R 4.0.0)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
loo 2.4.1 2020-12-09 [1] CRAN (R 4.0.2)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
MASS 7.3-53.1 2021-02-12 [1] CRAN (R 4.0.2)
matrixStats 0.58.0 2021-01-29 [1] CRAN (R 4.0.2)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.2)
mime 0.10 2021-02-13 [1] CRAN (R 4.0.2)
msigdbr 7.2.1 2020-10-02 [1] CRAN (R 4.0.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
openxlsx 4.2.3 2020-10-27 [1] CRAN (R 4.0.2)
packrat 0.5.0 2018-11-14 [1] CRAN (R 4.0.0)
pillar 1.5.1 2021-03-05 [1] CRAN (R 4.0.2)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
pkgload 1.2.0 2021-02-23 [1] CRAN (R 4.0.2)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.2)
polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.0.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
processx 3.4.5 2020-11-30 [1] CRAN (R 4.0.2)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.0.2)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.2)
RcppParallel 5.0.3 2021-02-24 [1] CRAN (R 4.0.2)
reactable 0.2.3 2020-10-04 [1] CRAN (R 4.0.2)
readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.0)
remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.0)
rio 0.5.26 2021-03-01 [1] CRAN (R 4.0.2)
rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2)
rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.2)
rmsfact 0.0.3 2016-08-04 [1] CRAN (R 4.0.2)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2)
rstan 2.21.2 2020-07-27 [1] CRAN (R 4.0.2)
rstatix 0.7.0 2021-02-13 [1] CRAN (R 4.0.2)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
rvest 0.3.6 2020-07-25 [1] CRAN (R 4.0.2)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
shiny 1.6.0 2021-01-25 [1] CRAN (R 4.0.2)
StanHeaders 2.21.0-7 2020-12-17 [1] CRAN (R 4.0.2)
stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
svglite 2.0.0 2021-02-20 [1] CRAN (R 4.0.2)
systemfonts 1.0.1 2021-02-09 [1] CRAN (R 4.0.2)
testthat 3.0.2 2021-02-14 [1] CRAN (R 4.0.2)
tibble 3.1.0 2021-02-25 [1] CRAN (R 4.0.2)
tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.0.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
tweenr 1.0.1 2018-12-14 [1] CRAN (R 4.0.2)
usethis 2.0.1 2021-02-10 [1] CRAN (R 4.0.2)
utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
V8 3.4.0 2020-11-04 [1] CRAN (R 4.0.2)
vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.2)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 4.0.0)
visNetwork 2.0.9 2019-12-06 [1] CRAN (R 4.0.0)
webshot 0.5.2 2019-11-22 [1] CRAN (R 4.0.0)
withr 2.4.1 2021-01-26 [1] CRAN (R 4.0.2)
xfun 0.21 2021-02-10 [1] CRAN (R 4.0.2)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.0)
zip 2.1.1 2020-08-27 [1] CRAN (R 4.0.2)
Hi hypeR developers,
I am currently using your hypeR package on R 4.0.0 and I am having trouble downloading the genesets. I get the following errors. I was using hypeR over a month ago and was having no issue with it at all.
Error in msigdb_download_all(species = "Homo sapiens") :
could not find function "msigdb_download_all"
Error in msigdb_fetch(msigdb_path, "C2.CP.BIOCARTA") :
could not find function "msigdb_fetch"
Please help.
Thanks
The data table returned by hypeR includes in the hits column the names of the features (genes) separated by commas. To make it more robust, these names should be quoted, as is done in the '.csv' format.
I am using hypeR to perform enrichment analysis with metabolites (based on custom-made metabolite-sets), and often these metabolites have commas in their names. E.g., "5alpha-pregnan-3beta,20alpha-diol disulfate"
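The ambiguity, and the effect of quoting, can be sketched in base R as follows. This is only an illustration of the request, not hypeR's code; the extra feature names are invented, and the quoting assumes that names contain no double quotes.

```r
hits <- c("5alpha-pregnan-3beta,20alpha-diol disulfate", "glucose", "citrate")

# Naive join: the embedded comma makes the first name look like two features.
naive <- paste(hits, collapse = ",")
n_naive <- length(strsplit(naive, ",", fixed = TRUE)[[1]])
print(n_naive)

# Quoted join, as in a .csv field.
quoted <- paste0('"', hits, '"', collapse = ",")
print(quoted)

# scan() respects the quotes when splitting on commas.
parsed <- scan(text = quoted, what = character(), sep = ",",
               quote = "\"", quiet = TRUE)
print(identical(parsed, hits))
```

With quoting, the hits column can be split back into exactly the original feature names.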
The enrichment maps open up in the viewer, and are then unable to be exported in a .tiff format, but can in a .png and .jpg format. This is more of an annoyance than anything else, as the enrichment map node labels often overlap and intersect, making it hard to determine which label goes with which node.
Hi, I hope you can help.
I have tried to perform GSEA using hypeR with the following code:
hyp_obj <- hypeR(weighted.signature, genesets, test="gsea", background=30000)
However I am met with the following error:
Error in match.arg(test) :
'arg' should be one of “hypergeometric”, “kstest”
It doesn't seem to recognise the gsea argument as laid out in the user guide. It works fine for the hypergeometric test, so I know my genes are recognised, and I have tried passing the weighted signature in the same format as the guide, to no avail.
Thank you for your time.
Since these are 'constant' across genesets, they should be reported once, perhaps on top of the table (if possible).
Hi,
I just reinstalled hypeR from its GitHub master branch and tried the following to obtain canonical pathways from MSigDB:
CP <- msigdb_gsets("Homo sapiens", "C2", "CP")
With this, I'm expecting to get 2199 gene sets including BIOCARTA, KEGG, PID, REACTOME and more listed here: http://software.broadinstitute.org/gsea/msigdb/collection_details.jsp#CP But instead it returns an object with 29 gene sets.
length(CP$genesets)
[1] 29
Any idea why this would happen? Is this something caused by msigdbr?
Thanks much!
In line 74 of hyper_enrichment.R there's a call to hyperEnrichment, which should be corrected to hyper_enrichment
The documentation for the hyp_dots function says that the sizes argument is "Size dots by geneset sizes". This is true if merge = FALSE.
But if merge = TRUE, the size of the dots actually corresponds to the significance value (pval or fdr). It took me a while to find this by looking at the code. Please update the documentation.
Also I don't see a reason not to show the size legends by default (if sizes = TRUE).
Is there a way to save interactive hyp_emap objects (as HTML?) and keep their features like zooming/collapsing etc.? Whenever I try to save the object as HTML using the RStudio interface I get the following error:
Error: pandoc document conversion failed with error 127
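One option that may sidestep the pandoc step is to save the widget programmatically without inlining its dependencies. The sketch below assumes that hyp_emap() returns a visNetwork htmlwidget (as the visNetwork snippet in the notebook report suggests) and uses a toy network in its place; whether pandoc is involved at all depends on the installed htmlwidgets version.

```r
library(visNetwork)
library(htmlwidgets)

# Toy network standing in for the object returned by hyp_emap() (assumption).
nodes <- data.frame(id = 1:3, label = c("gset1", "gset2", "gset3"))
edges <- data.frame(from = c(1, 2), to = c(2, 3))
emap <- visNetwork(nodes, edges)

# selfcontained = FALSE writes the JavaScript/CSS dependencies to a sibling
# folder instead of inlining them into a single HTML file.
out_file <- file.path(tempdir(), "emap.html")
saveWidget(emap, file = out_file, selfcontained = FALSE)
print(file.exists(out_file))
```

The saved page keeps the interactive behavior, but the HTML file and its dependency folder have to be moved together.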
Hi Team,
Please see:
https://support.bioconductor.org/p/9152979/
I guess it can be solved if you push the latest updates of dev onto the corresponding upstream branch on Bioc.
I tinkered a bit with the code on master myself, but I later saw that you had already made the same fix: a74b3b8
Hope to see this built soon on the BBS 😃
Federico
I realized that .kstest, as it is now implemented, does not match ks.test's behavior when alternative != "two.sided". In particular, the score in ks.test is defined as follows:
STATISTIC <- switch(alternative, two.sided = max(abs(z)),
greater = max(z), less = -min(z))
Thus, in .kstest, score should be computed as follows:
score <- if (absolute) max(z)-min(z) else switch(alternative,
two.sided = z[which.max(abs(z))],
greater = max(z),
less = min(z))
Also, the command
results <- suppressWarnings(ks.test(1:n.x, y, alternative="less"))
should be replaced by
results <- suppressWarnings(ks.test(y, 1:n.x, alternative="greater"))
Hi, hypeR developing team.
Thank you for your great package, hypeR.
I've used hypeR since this October without any problems in enrichment analysis. Today, when I tried to analyze another dataset, I couldn't run hyperdb_fetch() and hyperdb_info().
>gsets <- hyperdb_fetch(type = "gsets", "KEGG_2019_Mouse")
Error in readRDS(temp) : unknown input format
>hyperdb_info()
Getting available gsets ...
Error in gh("/repos/:owner/:repo/contents/:path", owner = "montilab", :
GitHub API error (404): 404 Not Found
Not Found
These are the error messages I got.
Could you please give me some tips for solving this problem?
Thank you.
R version: 3.6.1
Bioconductor version: 3.10
hypeR version: 1.2.0
When assigning an rgsets object to a new variable, the new variable is only a reference to the old object, not a copy. Here is an example:
# genesets
genesets <- list(
gset1=c("gene1","gene2","gene3"),
gset2=c("gene4","gene5","gene6"),
gset3=c("gene7","gene8","gene9"))
# nodes
nodes <- data.frame(row.names = unlist(genesets),label=unlist(genesets))
## some random edges (gset1 and gset3 are elements of the genesets list)
edges <- data.frame(
from = genesets$gset1,
to = genesets$gset3)
rgset1 <- hypeR::rgsets$new(genesets=genesets,nodes=nodes,edges=edges,name="test_geneset",version="1.0")
Now, if I assign that object (rgset1) to a new variable (rgset2):
> rgset2 <- rgset1
> rgset2$genesets
$gset1
[1] "gene1" "gene2" "gene3"
$gset2
[1] "gene4" "gene5" "gene6"
$gset3
[1] "gene7" "gene8" "gene9"
and I modify that object:
> rgset2$genesets <- rgset1$genesets[1:2]
> rgset2$genesets
$gset1
[1] "gene1" "gene2" "gene3"
$gset2
[1] "gene4" "gene5" "gene6"
This also modifies the original object.
> rgset1$genesets
$gset1
[1] "gene1" "gene2" "gene3"
$gset2
[1] "gene4" "gene5" "gene6"
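This behavior is what R6 reference semantics would produce, assuming rgsets is an R6 class (the $new() constructor suggests so, but that is an assumption here). The sketch below uses a toy class, ToyGsets, which is hypothetical and not part of hypeR, to show how $clone() yields an independent copy.

```r
library(R6)

# Hypothetical toy class; it only mimics the genesets field of rgsets.
ToyGsets <- R6Class("ToyGsets",
  public = list(
    genesets = NULL,
    initialize = function(genesets) {
      self$genesets <- genesets
    }
  )
)

a <- ToyGsets$new(list(gset1 = c("gene1", "gene2"),
                       gset2 = c("gene3", "gene4")))

b  <- a           # plain assignment: b and a are the same object
cp <- a$clone()   # clone(): an independent (shallow) copy

b$genesets <- a$genesets[1]   # modifies the shared object

print(length(a$genesets))   # changed through b
print(length(cp$genesets))  # the clone keeps both genesets
```

If a field itself holds another R6 object, clone(deep = TRUE) is needed to copy that field as well.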
When running hyp_to_excel, the following warning is output:
Note: zip::zip() is deprecated, please use zip::zipr() instead
[1] 1 1 1 1 1 1 1
Hi, very nice package, thank you for making it!
I am confused as to what the x-axis FDR represents after calling hyp_plot on a hyp object. Is this different from the color coded FDR or is it the same values and redundant info?
I would also like to know if it would be possible to output the gene overlap or gene ratio size alongside the dots.
Thank you for your time!
Ideally, when hypeR is called with the KS test, the hypeR-dot plot should color-code the dots based on whether a geneset is positively (red) or negatively (blue) skewed (w/ white as non-significant in the middle).
If this solution is not feasible, an alternative would be to do internally what is now shown in the vignette for fastGSEA, where the output is split into es>0 and es<0 and these results are reported separately. E.g., if the ranked signature sig is tested, its results should be split into two signatures, sig.up and sig.dn, and the hypeR-dot plot will have two corresponding "columns", so we'll know what's enriched up and what's enriched down.
When calling hypeR with a list of signatures, even if quiet=TRUE, the names of the signatures are printed out. This should not happen. E.g.,
> names(modsSymbols)
[1] "brown" "turquoise" "red" "blue" "yellow" "pink" "green" "salmon"
[9] "cyan" "black" "grey" "greenyellow" "purple" "tan" "magenta"
> mhypWgcna <- hypeR(modsSymbols, genesets, test="hypergeometric", background=background, plotting=FALSE,quiet=TRUE)
brown
turquoise
red
blue
yellow
pink
green
salmon
cyan
black
grey
greenyellow
purple
tan
magenta
Hi Monti Lab team,
Thanks for assembling this great toolkit! It's really streamlining my DEG pathway analysis.
I've run into a bug around retrieving the MSigDB C7 (Immunologic Signatures) gene sets, which I think is due to handling of categories without a subcategory.
Using "" as the subcategory for C7 doesn't work:
msigdb_imm <- msigdb_gsets("Homo sapiens", "C7", "")
Error in msigdbr(species, category, subcategory): unknown subcategory
Traceback:
1. msigdb_gsets("Homo sapiens", "C7", "")
2. msigdb_download(species, category, subcategory)
3. msigdbr(species, category, subcategory)
4. stop("unknown subcategory")
It looks like msigdbr may be expecting NA for this case where there's no subcategory. However, this leads to a new error:
msigdb_imm <- msigdb_gsets("Homo sapiens", "C7", NA)
Error in if (name == "Custom" & !quiet) warning("Describing genesets with a name will aid reproducibility"): missing value where TRUE/FALSE needed
Traceback:
1. msigdb_gsets("Homo sapiens", "C7", NA)
2. gsets$new(genesets, name = name, version = version, clean = clean)
3. initialize(...)
This can be fixed by adding is.na(subcategory) to the ifelse switch in the msigdb_gsets function. This version works:
msigdb_gsets2 <- function(species, category, subcategory = "", clean = FALSE)
{
genesets <- msigdb_download(species, category, subcategory)
name <- ifelse(subcategory == "" | is.na(subcategory), category, paste(category, subcategory, sep = "."))
version <- msigdb_version()
gsets$new(genesets, name = name, version = version, clean = clean)
}
msigdb_imm <- msigdb_gsets2("Homo sapiens", "C7", NA)
In case it helps, I'm running hypeR v1.8.0, with msigdbr v7.5.1.
Cheers,
-Lucas Graybuck
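The reason the NA case fails is that comparing NA with == yields NA, ifelse() then returns NA as the name, and if() cannot branch on NA. The naming step can be isolated in a small sketch; gset_name() is a hypothetical helper written for illustration and is not the package's code.

```r
# Hypothetical helper isolating the naming step.
gset_name <- function(category, subcategory = "") {
  # is.na() is checked first, so the == comparison never sees NA
  if (is.na(subcategory) || subcategory == "") {
    category
  } else {
    paste(category, subcategory, sep = ".")
  }
}

print(gset_name("C7", NA))
print(gset_name("C7", ""))
print(gset_name("C2", "CP:KEGG"))

# For comparison, the unguarded comparison propagates NA:
print(ifelse(NA == "", "C7", paste("C7", NA, sep = ".")))
```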
Here's an error when trying to generate reactable tables with hmaps included. I believe there is some edge case that is not checked for and raises an error. You can find the rds here: mhyp_obj.rds
library(hypeR)
mhyp_obj <- readRDS("mhyp_obj.rds")
## subsetting it to focus on the culprit entry
mhyp_obj1 <- mhyp_obj$mhyp
mhyp_obj1$data <- mhyp_obj1$data[10]
rtbl <- hypeR::rctbl_build(mhyp_obj$mhyp,show_hmaps=TRUE)
Success!
rtbl <- hypeR::rctbl_build(mhyp_obj$mhyp,show_hmaps=TRUE,hyp_hmap_args=list("fdr"=.5))
Error in min(x, na.rm = TRUE) : invalid 'type' (list) of argument
Hi. I am wondering what is the "background" setting? Is it the total number of genes in the genome? Thanks.
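In an over-representation (hypergeometric) test, the background is the size of the gene universe from which the signature is assumed to be drawn; which number is appropriate for a given experiment is not settled here. The sketch below uses base R's phyper() on made-up gene identifiers to show how the background size changes the p-value. It is an illustration of the general test, not hypeR's internal code, and hyper_pval() is a hypothetical helper.

```r
signature <- paste0("G", 1:50)
geneset   <- paste0("G", c(1:10, 1001:1090))  # 100 genes, 10 shared with the signature

# Hypothetical helper: upper-tail hypergeometric p-value for the overlap.
hyper_pval <- function(signature, geneset, background) {
  overlap <- length(intersect(signature, geneset))
  # P(X >= overlap) when drawing length(signature) genes from the background
  phyper(overlap - 1,
         length(geneset),
         background - length(geneset),
         length(signature),
         lower.tail = FALSE)
}

p_large <- hyper_pval(signature, geneset, background = 20000)
p_small <- hyper_pval(signature, geneset, background = 200)
print(c(background_20000 = p_large, background_200 = p_small))
```

The same overlap of 10 genes is highly significant against a background of 20000 but unremarkable against a background of 200, so the choice of background matters.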
Does hypeR have an option to filter out very large gene sets? I'd like to set a threshold like 900 so that it doesn't run enrichment on gene sets of that size and above. Currently I can post-filter the results, which is fine too, but this would be a nice improvement.
There could also be an option to filter out gene sets by a minimum overlap percentage, say, only return gene sets where the data and the gene set share more than 2% of the genes.
Thanks!
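In the meantime, both filters can be applied to the gene sets before the enrichment call. The sketch below works on a plain named list of gene sets in base R; the toy gene sets, the thresholds, and the reading of "share more than 2%" as the fraction of each gene set found in the signature are all assumptions.

```r
# Made-up gene sets and signature.
genesets <- list(
  small  = paste0("G", 1:5),
  medium = paste0("G", 1:50),
  huge   = paste0("G", 1:1000)
)
signature <- paste0("G", 1:20)

max_size <- 900   # drop gene sets of this size and above
min_frac <- 0.02  # require more than 2% of the gene set in the signature

keep_size <- lengths(genesets) < max_size
frac <- vapply(genesets,
               function(g) length(intersect(g, signature)) / length(g),
               numeric(1))

filtered <- genesets[keep_size & frac > min_frac]
print(names(filtered))
```

The filtered list can then be used wherever the original list of gene sets was used.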
Dear authors,
Thanks for developing this wonderful tool! Can I get the genes from my signature that are enriched in a specific pathway and show them in the result table? Only the number of such genes is shown in the result table.
Thanks a lot.
Hi! I saw hypeR at BioC2020 and started to play around with it by running through the vignette on Bioconductor. I installed it using devtools::install_github("montilab/hypeR"), but then I ran into an error trying to follow the vignette:
> hypdat <- readRDS(file.path(system.file("extdata", package="hypeR"), "hypdat.rds"))
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file 'C:/Users/jenny/R/win-library/4.0/hypeR/extdata/hypdat.rds', probable reason 'No such file or directory'
> system.file("extdata", package="hypeR")
[1] "C:/Users/jenny/R/win-library/4.0/hypeR/extdata"
> dir(path = system.file("extdata", package="hypeR"))
[1] "genesets.rds" "testdat.rds"
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] hypeR_1.5.1
loaded via a namespace (and not attached):
[1] httr_1.4.2 pkgload_1.1.0 msigdbr_7.1.1 jsonlite_1.7.0 viridisLite_0.3.0
[6] shiny_1.5.0 assertthat_0.2.1 remotes_2.2.0 sessioninfo_1.1.1 pillar_1.4.6
[11] backports_1.1.8 glue_1.4.1 digest_0.6.25 promises_1.1.1 polyclip_1.10-0
[16] rvest_0.3.6 colorspace_1.4-1 httpuv_1.5.4 htmltools_0.5.0 plyr_1.8.6
[21] pkgconfig_2.0.3 devtools_2.3.1 xtable_1.8-4 purrr_0.3.4 scales_1.1.1
[26] webshot_0.5.2 processx_3.4.3 tweenr_1.0.1 later_1.1.0.1 openxlsx_4.1.5
[31] ggforce_0.3.2 tibble_3.0.3 generics_0.0.2 farver_2.0.3 ggplot2_3.3.2
[36] usethis_1.6.1 ellipsis_0.3.1 withr_2.2.0 cli_2.0.2 mime_0.9
[41] magrittr_1.5 crayon_1.3.4 memoise_1.1.0 evaluate_0.14 ps_1.3.3
[46] reactable_0.2.0 fs_1.4.2 fansi_0.4.1 MASS_7.3-51.6 xml2_1.3.2
[51] pkgbuild_1.1.0 tools_4.0.2 prettyunits_1.1.1 hms_0.5.3 lifecycle_0.2.0
[56] stringr_1.4.0 munsell_0.5.0 zip_2.0.4 callr_3.4.3 kableExtra_1.1.0
[61] compiler_4.0.2 tinytex_0.25 rlang_0.4.7 grid_4.0.2 rstudioapi_0.11
[66] visNetwork_2.0.9 htmlwidgets_1.5.1 igraph_1.2.5 rmarkdown_2.3 testthat_2.3.2
[71] gtable_0.3.0 curl_4.3 reshape2_1.4.4 R6_2.4.1 knitr_1.29
[76] dplyr_1.0.1 fastmap_1.0.1 rprojroot_1.3-2 readr_1.3.1 desc_1.2.0
[81] stringi_1.4.6 Rcpp_1.0.5 vctrs_0.3.2 tidyselect_1.1.0 xfun_0.16
Also, the rendered vignette doesn't show all the packages that need to be loaded - I ran into errors with lines that had dplyr/tidyverse code. Checking the Rscript shows the necessary packages at the beginning:
## ----include=FALSE, messages=FALSE, warnings=FALSE----------------------------
knitr::opts_chunk$set(message=FALSE, fig.width=6.75)
devtools::load_all(".")
library(tidyverse)
library(magrittr)
library(dplyr)
library(reactable)
These should be shown in the rendered vignette somewhere. I'm getting some good results with my own data - thanks for the package!
Dear hypeR developers,
When trying hypeR for the first time, I set quiet = FALSE in my hypeR(...) call, and the output said Percentage of signature found across genesets: 1%, which I knew was wrong because 100% of my signature overlaps with the several genesets I use. When I checked the source code, I realised that the .check_overlap function returns the overlap as a fraction (from 0 to 1), not a percentage (from 0 to 100) (Line 37 in eaa02ed), and that it is not multiplied by 100 when it is printed as a percentage (Line 136 in eaa02ed). So 1% here actually means 100%. It would be great to fix this! :) Thank you!
rctbl_build doesn't work at present when running the KS test. It would be ideal to fix it and have it report, instead of the 'hits' (not relevant), the features in the "leading edge" (i.e., the features up to the max score).
Hi, I’m wondering if there is a way to modify the dot plot function for multiple signatures in hypeR so that, regardless of how many terms you’re plotting, it chooses the most significant ones first? Right now, it seems to make an arbitrary cutoff without pre-ranking for significance, e.g. here is the same object plotted using top = 10 and top = 20 as arguments:
hyp_dots(multhyp, top = 10, abrv=100, fdr = 0.005, pval = 1, merge=TRUE)
hyp_dots(multhyp, top = 20, abrv=100, fdr = 0.005, pval = 1, merge=TRUE)
It would be great if, when I want to plot only 10 terms, GO_CELLULAR_RESPIRATION made the list but GO_AEROBIC_RESPIRATION didn't. Thank you!
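The requested behavior amounts to ranking terms by their best significance across signatures before truncating. A minimal sketch in base R follows; the result tables, FDR values, and the helper top_terms() are invented for illustration, and this is not how hyp_dots() is implemented.

```r
# Made-up results for two signatures; only the label and fdr columns matter here.
results <- list(
  sig1 = data.frame(
    label = c("GO_CELLULAR_RESPIRATION", "GO_AEROBIC_RESPIRATION", "GO_TERM_C"),
    fdr   = c(1e-9, 1e-3, 0.2),
    stringsAsFactors = FALSE
  ),
  sig2 = data.frame(
    label = c("GO_CELLULAR_RESPIRATION", "GO_TERM_C", "GO_TERM_D"),
    fdr   = c(1e-4, 1e-6, 0.04),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: rank terms by their smallest FDR in any signature,
# then keep the first `top` of them.
top_terms <- function(results, top) {
  all_res <- do.call(rbind, results)
  best <- tapply(all_res$fdr, all_res$label, min)
  names(sort(best))[seq_len(min(top, length(best)))]
}

print(top_terms(results, top = 2))
print(top_terms(results, top = 3))
```

With this ranking, the terms shown for a smaller top are always a prefix of those shown for a larger top.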
Currently the msigdbr package includes Ensembl IDs in the output data frame of gene sets. Since there are multiple Ensembl IDs corresponding to some genes, your msigdb_download function returns duplicated genes in some gene sets. For example, check HALLMARK_APICAL_JUNCTION (Human, H category).
You just need to add distinct() %>% after the second line below in db_msig.R (lines 148-151):
mdf <- msigdbr(species, category, subcategory) %>%
dplyr::select(gs_name, gene_symbol) %>%
as.data.frame() %>%
stats::aggregate(gene_symbol ~ gs_name, data=., c)
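The effect of the de-duplication step can be shown on a toy table in base R, where unique() plays the role of dplyr::distinct(). The gene set names and rows are made up; only the column names gs_name and gene_symbol come from the snippet above.

```r
# Toy table with one gene listed twice in SET_A, as happens when a symbol
# maps to several Ensembl IDs.
mdf <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_A", "SET_B"),
  gene_symbol = c("CDH1", "CDH1", "ACTB", "TP53"),
  stringsAsFactors = FALSE
)

# Without de-duplication the repeated symbol survives.
with_dups <- split(mdf$gene_symbol, mdf$gs_name)

# Dropping repeated (gs_name, gene_symbol) rows first removes it.
mdf_unique <- unique(mdf)
genesets <- split(mdf_unique$gene_symbol, mdf_unique$gs_name)

print(lengths(with_dups))
print(lengths(genesets))
```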