montilab / hypeR


An R Package for Geneset Enrichment Workflows

Home Page: https://montilab.github.io/hypeR-docs/

License: GNU General Public License v3.0

R 89.39% Dockerfile 0.32% CSS 10.29%

geneset-enrichment-analysis bioinformatics computational-biology


hyper's Issues

Plots not displaying in Jupyter notebook

Plots are not displayed within a Jupyter notebook; this is due to visNetwork. It would be nice to fix this, as interactive charts are possible within notebooks (maybe see https://altair-viz.github.io/user_guide/display_frontends.html). However, if this is not possible, at least add an option to save a plot so that it can be opened separately, e.g.:

    graph=visNetwork(nodes, edges, main=list(text=title, style="font-family:Helvetica")) %>%
    visNodes(borderWidth=1, borderWidthSelected=0) %>%
    visEdges(color="rgb(88,24,69)") %>%
    visOptions(highlightNearest=TRUE) %>%
    visInteraction(multiselect=TRUE, tooltipDelay=300) %>%
    visIgraphLayout(layout="layout_nicely")
    if (!is.null(file)) visSave(graph, file=paste0(file,'.html'), 
                                selfcontained = TRUE, background = "white")

Issue fetching rgsets - GitHub API error (404)

Hi,
I am using hypeR version 1.2.0 installed by Bioconductor version 3.10 (R 3.6.1). The only issue I have with the package is with rgsets, it seems that available rgsets are not found. See image attached:

Is this my issue or a GitHub API one?

Thank you for your help,
Manos

work with single-cell RNA-seq data?

Hi:

First of all, thanks for developing this nice package.

Is there an easier way to work with single-cell RNA-seq data? For example, can I extract the top cluster markers from a Seurat object and run hypeR on them?

As in your WGCNA example
$ turquoise : chr [1:1902] "CLEC3A" "KCNJ3"
would the color name then be the cluster name?

thanks for your time

Some suggestions

Great package! For someone that is new to working in R I very much appreciate the ease, documentation and tutorials.
I have some suggestions on what I would find useful additions to improve interpretation of biological data:

Introduce a fold enrichment in addition to p-value and FDR. I have seen it defined as fold enrichment = (b/n) / (B/N), where N is the total number of genes, B is the total number of genes associated with a specific geneset, n is the number of genes in the top or bottom of the user's input list (for ranked signatures) or in the target set (for unranked signatures), and b is the number of genes in the intersection. With weighted signatures it would be informative to have an option for calculating fold enrichment based on the values supplied (e.g. log2 ratio).
From my experience with weighted signatures and the kstest test, enrichment is returned only for genesets enriched at the top of the signature. When comparing samples I find it informative to also know what is enriched at the bottom of the signature. Often you see shifts in geneset enrichment between samples from the top to the bottom of the list, and vice versa. An option to also output genesets enriched at the bottom of the list would be very informative to me. To separate genesets that are enriched at the top and the bottom of the signature, a minus sign could be added to the -log10(FDR/p-value). This way, genesets at the top of the list will be positive and those at the bottom negative, and they are easily separated in a dot plot or heatmap.
For large datasets with many samples I find it useful to use a heatmap to display enrichment, with positive and negative -log10(p-value/FDR) for enrichment at the top and bottom of the list. However, genesets with few genes will have larger p-values, so a similar plot with fold enrichment would be informative. But then some genesets may show fold enrichment but no significance. Alternatively, a combined score of p-value and fold enrichment could be calculated, for example by summing or multiplying the -log10(p-value/FDR) and the fold enrichment. Using the log2 of the fold enrichment would give negative values for enrichment at the bottom and positive values at the top of the signature.
In the plot from hyp_dots I take it that the dot size is based on the number of genes in the geneset? Or is this the number that overlap with the signature? In one of my enrichments there were large differences in the number of genes in the different genesets, and some genesets had a very small dot that was difficult to see. It would be good to be able to customize the dot sizes. Perhaps use a log scale based on the number of genes in the geneset? Also, the legend for the number of genes in the geneset is missing. In the legend and on the x-axis, the smallest FDR is missing for me. Could the ticks and color range be customizable? For hyp_dots it would also be good to be able to choose whether to display FDR or fold enrichment on the x-axis. Color and size could then be chosen as FDR, fold enrichment or number of genes in the geneset. Personally, I prefer a white background with grey lines. But now we are getting into a lot of customization, as in ggplot, I guess. Can you turn the hyp_dots output into a ggplot object for further customization, or make your own ggplot template and use it in hypeR?
When I get data from an omics experiment it is always with samples as columns and gene symbols as rows, and when I import it into R it will be a data frame. To me, it would be good to have an example of how to do weighted ranked analysis with multiple samples. Output would preferably be enrichment of genesets (FDR, combined FDR and fold change score) both at the top and the bottom of the signature. I would prefer a heatmap to view the results. It could be clustered to find differences in geneset usage between samples. Changing from a data frame to the format that hypeR requires is challenging for me. Do you know of any tutorial for that? Some basic usage in a further tutorial, as suggested above, would be most welcome.
When comparing two (or more) groups of samples and performing a statistical test to define two unranked lists of up- and down-regulated genes, it would be informative to be able to do the enrichment of both the up- and down-regulated lists and then visualize the results together. For example, the ratios in the comparison could be used to define what is enriched in the up- and down-regulated lists, respectively. In a dot plot, the x-axis could be the ratio (mean, median, boxplot) of the genes from the geneset and the color the FDR.
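The fold enrichment and signed significance described in these suggestions can be sketched in a few lines of base R. This is only an illustration of the arithmetic, not anything hypeR provides: the function names `fold_enrichment` and `signed_log10` are made up here, and the numbers in the example are invented.

```r
# Fold enrichment as defined in the suggestion: (b/n) / (B/N)
#   b = genes in the intersection, n = genes in the signature,
#   B = genes in the geneset,      N = total number of genes (background)
fold_enrichment <- function(b, n, B, N) {
  stopifnot(n > 0, B > 0, N > 0)
  (b / n) / (B / N)
}

# Signed significance: positive for enrichment at the top of the list,
# negative for enrichment at the bottom (hypothetical helper)
signed_log10 <- function(p, direction = c("top", "bottom")) {
  direction <- match.arg(direction)
  s <- -log10(p)
  if (direction == "bottom") -s else s
}

# Invented example: 10 of 100 signature genes hit a 200-gene set,
# with a background of 20000 genes
fe <- fold_enrichment(b = 10, n = 100, B = 200, N = 20000)
top_score <- signed_log10(0.001, "top")
bottom_score <- signed_log10(0.001, "bottom")
```

A combined score, as proposed, could then be something like `log2(fe) * abs(top_score)`, but which combination is most useful is an open question.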
Best
Henrik

multihyp dot plot significance legend is not showing properly (solution included)

This happens when drawing a dot plot with hyp_dots() for a multihyp object.
See the significance legend:

This is likely due to the newer version of the package scales. Unsure which version exactly, but it seems like internal function .reverselog_trans() which is based on scales::trans_new() is not working well.

Solution:
https://github.com/montilab/hypeR/blob/master/R/hyp_dots.R
Line 111. Change this line to:
scale_color_continuous(high = "#114357", low = "#E53935", trans = scales::log10_trans(), guide = guide_colorbar(reverse = TRUE))

Or, add this after calling hyp_dots() to override the existing color scale.
hyp_dots(...) + scale_color_continuous(...insert above...)

Example output:

Cannot fetch help through RStudio

When RStudio tries to fetch help for the package I get this error

Error in fetch(key) : 
  lazy-load database '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/hypeR/help/hypeR.rdb' is corrupt

This is triggered when I press Tab inside the parentheses of hypeR() to see which options are available.

Dots plot color-coding

I would change the color-coding of the dots in the dotplot. Red and blue have come to be associated with up- and down-regulated. Therefore, I would use a totally different color palette (either a gray scale or a yellow-orange scale, such as the R default, or you pick one).

Weighted and unweighted "kstest" giving the same results

I'm running hypeR with a custom geneset and "kstest", weighted by the effect size of a differential expression test. I've noticed, however, that the results I get in hyp_obj$data and the dots plot remain the same whether or not my signature has gene weights.

The only thing that changes if I use a weighted input is the "Running Enrichment Score vs. Position in Ranked List of Genes" plot.

Is this behaviour expected? Shouldn't the gene weights affect the results of the statistical test?

What's the issue with FDR ?

Hi,

I tried to run hypeR on my datasets, but I always get a value of 1 for both pval and fdr. Could you please check what the issue is here?

https://ibb.co/9ZtKvL1

hyp_to_rmd(file_path = ) has problems with relative paths

I came across a weird issue when trying to use hyp_to_rmd - the file_path argument can take a plain file name or an absolute path, but not a relative path:

> library(hypeR)
> library(tidyverse)
#deleted for space 
> load(file.path(system.file("extdat", package="hyperworkshop"), "limma.rda"))
> genesets <- msigdb_gsets("Homo sapiens", "C2", "CP:KEGG")
> signature <- limma %>%
+   dplyr::filter(t > 0 & fdr < 0.001) %>%
+   magrittr::use_series(symbol)
> hyp_obj <- hypeR(signature, genesets, test="hypergeometric", background=50000, fdr=0.01, plotting=TRUE)
> hyp_to_rmd(hyp_obj,
+            file_path="hypeR.rmd")
processing file: hypeR.rmd
# deleted for space
Output created: hypeR.rmd.html

> getwd()
[1] "C:/Users/jenny/"
> dir.create("test")
> hyp_to_rmd(hyp_obj,
+            file_path="test/hypeR.rmd")
Error: The directory 'test' does not not exist.

> hyp_to_rmd(hyp_obj,
+            file_path="C:/Users/jenny/test/hypeR.rmd")
processing file: hypeR.rmd
# deleted for space
Output created: hypeR.rmd.html

I did some debugging and the problem occurs in rmarkdown::render(). It wants the output directory separate from the file name and does weird things like calling setwd(dirname(abs_path(input))), so you can't put relative paths in hyp_to_rmd(file_path = ). The easiest thing to do would be to add a warning in the help file, which should also say that file_path needs to end in .rmd or it won't be rendered properly:

> dir.create("test2")
> hyp_to_rmd(hyp_obj,
+            file_path="C:/Users/jenny/test2/hypeR")
"C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS hypeR.utf8.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc552854ab6fdb.html --email-obfuscation none --self-contained --standalone --section-divs --table-of-contents --toc-depth 1 --variable toc_float=1 --variable toc_selectors=h1 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template "C:\Users\jenny\R\win-library\4.0\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable "theme:united" --include-in-header "C:\Users\jenny\AppData\Local\Temp\Rtmpyuarak\rmarkdown-str55283b06aa5.html" --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --lua-filter "C:/Users/jenny/R/win-library/4.0/rmarkdown/rmd/lua/pagebreak.lua" --lua-filter "C:/Users/jenny/R/win-library/4.0/rmarkdown/rmd/lua/latex-div.lua" --variable code_folding=hide --variable code_menu=1 

Output created: test2/hypeR.html
#open test2/hypeR.html and see it's not rendered correctly

One final request: could you strip off the .rmd when creating the html file name? I am used to having hypeR.rmd and the rendered hypeR.html and hypeR.rmd.html is just clunky. Thanks!
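Until this is addressed, one possible workaround is to resolve the relative path to an absolute one before handing it to hyp_to_rmd. The sketch below uses only base R; `abs_rmd_path` and `html_name` are hypothetical helper names, not part of hypeR, and the directory is assumed to exist already (as in the transcript above, where dir.create() was called first).

```r
# Resolve a possibly relative file path to an absolute one.
# The directory must exist; the file itself need not exist yet.
abs_rmd_path <- function(file_path) {
  dir <- normalizePath(dirname(file_path), winslash = "/", mustWork = TRUE)
  file.path(dir, basename(file_path))
}

# Derive "hypeR.html" from "hypeR.rmd" instead of "hypeR.rmd.html"
html_name <- function(file_path) {
  paste0(tools::file_path_sans_ext(file_path), ".html")
}

# Intended usage (not run here, since it needs a hyp object):
#   hyp_to_rmd(hyp_obj, file_path = abs_rmd_path("test/hypeR.rmd"))
```

Whether this avoids every path problem inside rmarkdown::render() has not been checked; it only sidesteps the relative-path case reported here.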

KS with mx.diff=TRUE (GSVA-like)

It would be worthwhile to implement a version of the score similar to the one implemented in GSVA, where the difference between the top positive peak and the top negative peak is computed. Thus, if a signature is half enriched at one end and similarly half enriched at the other end, we will get a score close to zero, rather than the largest of the two, which would be misleading.
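To make the request concrete, here is a rough base R sketch of such a score. It is a simplified, unweighted running deviation, not hypeR's or GSVA's actual implementation, and the names `running_deviation` and `max_diff_score` are invented for illustration.

```r
# Running deviation between the empirical CDF of the geneset's hit positions
# and the uniform CDF over a ranked list of n genes (unweighted KS-style walk)
running_deviation <- function(hit_positions, n) {
  hits <- seq_len(n) %in% hit_positions
  cumsum(hits) / sum(hits) - seq_len(n) / n
}

# Largest positive peak minus the magnitude of the largest negative peak
max_diff_score <- function(z) {
  pos <- max(z, 0)
  neg <- abs(min(z, 0))
  pos - neg
}

# Geneset concentrated at the top of a 100-gene list
top_only <- running_deviation(1:5, 100)
# Geneset split evenly between the top and the bottom of the list
both_ends <- running_deviation(c(1:5, 96:100), 100)

score_top  <- max_diff_score(top_only)   # large positive score
score_both <- max_diff_score(both_ends)  # peaks cancel, score near zero
score_abs  <- max(abs(both_ends))        # the "largest peak" score stays large
```

In this toy case the split geneset still has a largest absolute deviation of about 0.45, while the peak-difference score is essentially zero, which is the behavior described above.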

Set rownames=FALSE when calling rctbl_hyp

Currently, the geneset names are reported twice (as rownames and as "Labels").

error in GSEA about weighted signatures

I ran hypeR as follows: hyp_obj <- hypeR(signature, geneset_1,fdr=0.05,plotting = T,test = 'kstest'). The signature is weighted, but the call returned Error in apply(results[, c("score", "pval", "geneset", "overlap")], 2, : dim(X) must have a positive length. My geneset was a custom geneset.
As a result, I cannot get the plot.

Error in connection

Hi!
Your package is very useful, thanks.
I've used hypeR for a couple of months, but today I got a client error 404 when running
enrichr_gsets("Jensen_DISEASES") or enrichr_gsets("GO_Biological_Process_2018"), and basically any other Enrichr DB. I got this error:
Error in enrichr_connect(.format_str("geneSetLibrary?mode=text&libraryName={1}", : Client error: (400) Bad Request

Thank you for your help.

I recently updated some packages and I don't know if that is causing conflicts.

`> sessioninfo::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.0.2 (2020-06-22)
os macOS 10.16
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Mexico_City
date 2021-03-08

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.0.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2)
BiocManager 1.30.10 2019-11-16 [1] CRAN (R 4.0.0)
broom 0.7.5 2021-02-19 [1] CRAN (R 4.0.2)
cachem 1.0.4 2021-02-13 [1] CRAN (R 4.0.2)
callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.2)
car 3.0-10 2020-09-29 [1] CRAN (R 4.0.2)
carData 3.0-4 2020-05-22 [1] CRAN (R 4.0.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.0)
cli 2.3.1 2021-02-23 [1] CRAN (R 4.0.2)
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.0.2)
colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.2)
cowsay 0.8.0 2020-02-06 [1] CRAN (R 4.0.2)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
curl 4.3 2019-12-02 [1] CRAN (R 4.0.0)
data.table 1.14.0 2021-02-21 [1] CRAN (R 4.0.2)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
desc 1.3.0 2021-03-05 [1] CRAN (R 4.0.2)
devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
dplyr 1.0.5 2021-03-05 [1] CRAN (R 4.0.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.2)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2)
forcats 0.5.1 2021-01-27 [1] CRAN (R 4.0.2)
foreign 0.8-81 2020-12-22 [1] CRAN (R 4.0.2)
fortunes 1.5-4 2016-12-29 [1] CRAN (R 4.0.2)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
ggforce 0.3.3 2021-03-05 [1] CRAN (R 4.0.2)
ggplot2 3.3.3 2020-12-30 [1] CRAN (R 4.0.2)
ggpubr 0.4.0 2020-06-27 [1] CRAN (R 4.0.2)
ggsignif 0.6.1 2021-02-23 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
haven 2.3.1 2020-06-01 [1] CRAN (R 4.0.0)
here 1.0.1 2020-12-13 [1] CRAN (R 4.0.2)
hms 1.0.0 2021-01-13 [1] CRAN (R 4.0.2)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.0.2)
httpuv 1.5.5 2021-01-13 [1] CRAN (R 4.0.2)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
hypeR * 1.6.0 2020-10-27 [1] Bioconductor
igraph 1.2.6 2020-10-06 [1] CRAN (R 4.0.2)
inline 0.3.17 2020-12-01 [1] CRAN (R 4.0.2)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.2)
kableExtra 1.3.4 2021-02-20 [1] CRAN (R 4.0.2)
knitr 1.31 2021-01-27 [1] CRAN (R 4.0.2)
later 1.1.0.1 2020-06-05 [1] CRAN (R 4.0.0)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
loo 2.4.1 2020-12-09 [1] CRAN (R 4.0.2)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
MASS 7.3-53.1 2021-02-12 [1] CRAN (R 4.0.2)
matrixStats 0.58.0 2021-01-29 [1] CRAN (R 4.0.2)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.2)
mime 0.10 2021-02-13 [1] CRAN (R 4.0.2)
msigdbr 7.2.1 2020-10-02 [1] CRAN (R 4.0.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
openxlsx 4.2.3 2020-10-27 [1] CRAN (R 4.0.2)
packrat 0.5.0 2018-11-14 [1] CRAN (R 4.0.0)
pillar 1.5.1 2021-03-05 [1] CRAN (R 4.0.2)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
pkgload 1.2.0 2021-02-23 [1] CRAN (R 4.0.2)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.2)
polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.0.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
processx 3.4.5 2020-11-30 [1] CRAN (R 4.0.2)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.0.2)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.2)
RcppParallel 5.0.3 2021-02-24 [1] CRAN (R 4.0.2)
reactable 0.2.3 2020-10-04 [1] CRAN (R 4.0.2)
readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.0)
remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.0)
rio 0.5.26 2021-03-01 [1] CRAN (R 4.0.2)
rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2)
rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.2)
rmsfact 0.0.3 2016-08-04 [1] CRAN (R 4.0.2)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2)
rstan 2.21.2 2020-07-27 [1] CRAN (R 4.0.2)
rstatix 0.7.0 2021-02-13 [1] CRAN (R 4.0.2)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
rvest 0.3.6 2020-07-25 [1] CRAN (R 4.0.2)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
shiny 1.6.0 2021-01-25 [1] CRAN (R 4.0.2)
StanHeaders 2.21.0-7 2020-12-17 [1] CRAN (R 4.0.2)
stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
svglite 2.0.0 2021-02-20 [1] CRAN (R 4.0.2)
systemfonts 1.0.1 2021-02-09 [1] CRAN (R 4.0.2)
testthat 3.0.2 2021-02-14 [1] CRAN (R 4.0.2)
tibble 3.1.0 2021-02-25 [1] CRAN (R 4.0.2)
tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.0.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
tweenr 1.0.1 2018-12-14 [1] CRAN (R 4.0.2)
usethis 2.0.1 2021-02-10 [1] CRAN (R 4.0.2)
utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
V8 3.4.0 2020-11-04 [1] CRAN (R 4.0.2)
vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.2)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 4.0.0)
visNetwork 2.0.9 2019-12-06 [1] CRAN (R 4.0.0)
webshot 0.5.2 2019-11-22 [1] CRAN (R 4.0.0)
withr 2.4.1 2021-01-26 [1] CRAN (R 4.0.2)
xfun 0.21 2021-02-10 [1] CRAN (R 4.0.2)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.0)
zip 2.1.1 2020-08-27 [1] CRAN (R 4.0.2)`

Cannot download genesets

Hi hypeR developers,

I am currently using your hypeR package on R 4.0.0 and I am having trouble downloading the genesets. I get the following errors. I was using hypeR over a month ago and was having no issue with it at all.

Error in msigdb_download_all(species = "Homo sapiens") :
could not find function "msigdb_download_all"

Error in msigdb_fetch(msigdb_path, "C2.CP.BIOCARTA") :
could not find function "msigdb_fetch"

Please help.

Thanks

Hits names should be quoted (like in the csv format)

The data table returned by hypeR includes in the hits column the names of the features (genes) separated by commas. To make it more robust, these names should be quoted, as is done in the '.csv' format.
I am using hypeR to perform enrichment analysis with metabolites (based on custom-made metabolite-sets), and often these metabolites have commas in their names. E.g., "5alpha-pregnan-3beta,20alpha-diol disulfate"
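A minimal sketch of what CSV-style quoting of the hits could look like, using base R only. The helpers `quote_hits` and `parse_hits` are hypothetical names, and this says nothing about how hypeR builds its hits column internally.

```r
# Join feature names into one string, quoting each name CSV-style
# (embedded double quotes are doubled, as in the .csv convention)
quote_hits <- function(hits) {
  escaped <- gsub('"', '""', hits, fixed = TRUE)
  paste0('"', escaped, '"', collapse = ",")
}

# Split a quoted hits string back into the original names
parse_hits <- function(x) {
  scan(text = x, what = character(), sep = ",", quote = '"', quiet = TRUE)
}

hits <- c("5alpha-pregnan-3beta,20alpha-diol disulfate", "glucose")

# Naive comma-joining breaks the first metabolite into two pieces
naive <- strsplit(paste(hits, collapse = ","), ",", fixed = TRUE)[[1]]

# Quoted joining survives the round trip
joined <- quote_hits(hits)
recovered <- parse_hits(joined)
```

With unquoted joining the two metabolites come back as three fields, whereas the quoted string parses back to the original two names.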

Enrichment maps unable to export in a .tiff format

The enrichment maps open up in the viewer but cannot then be exported in .tiff format, although .png and .jpg work. This is more of an annoyance than anything else, as the enrichment map node labels often overlap and intersect, making it hard to determine which label goes with which node.

GSEA argument not recognised by hypeR

Hi, I hope you can help.
I have tried to perform GSEA using hypeR with the following code:
hyp_obj <- hypeR(weighted.signature, genesets, test="gsea", background=30000)

However I am met with the following error:
Error in match.arg(test) :
'arg' should be one of “hypergeometric”, “kstest”

It doesn't seem to recognise the gsea argument as laid out in the user guide. It works fine for the hypergeometric test, so I know my genes are recognised, and I have tried passing the weighted signature in the same format as the guide, to no avail.
Thank you for your time.

Enrichment tables should report background and signature size

Since these are 'constant' across genesets, they should be reported once, perhaps on top of the table (if possible).

msigdb_gsets doesn't give me all the CP gene sets

Hi,

I just reinstalled hypeR from its GitHub master branch and tried the following to obtain canonical pathways from MSigDB:

CP <- msigdb_gsets("Homo sapiens", "C2", "CP")

With this, I'm expecting to get 2199 gene sets including BIOCARTA, KEGG, PID, REACTOME and more listed here: http://software.broadinstitute.org/gsea/msigdb/collection_details.jsp#CP But instead it returns an object with 29 gene sets.

length(CP$genesets)
[1] 29

Any idea why this would happen? Is this something caused by msigdbr?

Thanks much!

hyper_enrichment vs. hyperEnrichment

In line 74 of hyper_enrichment.R there's a call to hyperEnrichment, which should be corrected to hyper_enrichment.

Dots plot log-normalized x-axis to show unnormalized values as labels

sizes argument in hyp_dots function

The documentation for hyp_dots function says that sizes argument is "Size dots by geneset sizes". This is true if merge = FALSE.

But if merge = TRUE the size of dots actually corresponds to the significance value (pval or fdr). It took me a while to find it by looking at the code. Please update the documentation.

Also I don't see a reason not to show the size legends by default (if sizes = TRUE).

Can I save interactive hyp_emap and keep its features?

Is there a way to save interactive hyp_emap objects (as HTML?) and keep their features like zooming, collapsing, etc.? Whenever I try to save the object as HTML using the RStudio interface I get the following error:
Error: pandoc document conversion failed with error 127

Pushing to upstream on Bioc

Hi Team,

Please see:
https://support.bioconductor.org/p/9152979/

I guess it can be solved if you push the latest updates of dev to the corresponding upstream branch on Bioc.

I tinkered a bit with the code on master myself, but I saw later that you had also made the same fix: a74b3b8

Hope to see this built soon on the BBS 😃

Federico

Show number of nested signatures in reactable tables

.kstest needs to match ks.test (modulo the sign)

I realized that .kstest, as it is now implemented, does not match ks.test's behavior when alternative != "two.sided". In particular, the score in ks.test is defined as follows:

STATISTIC <- switch(alternative, two.sided = max(abs(z)), 
        greater = max(z), less = -min(z))

Thus, in .kstest, score should be computed as follows:

score <- if (absolute) max(z)-min(z) else switch(alternative, 
         two.sided = z[which.max(abs(z))],
         greater = max(z), 
         less = min(z))

Also, the command

results <- suppressWarnings(ks.test(1:n.x, y, alternative="less"))

should be replaced by

results <- suppressWarnings(ks.test(y, 1:n.x, alternative="greater"))
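To see what the proposed scoring rule returns in each case, it can be wrapped in a small standalone function and tried on a toy deviation vector. The wrapper name `ks_score` and the vector `z` are made up for illustration; the body is the expression proposed above.

```r
# Proposed score, wrapped for experimentation.
# z: vector of running deviations between the two empirical CDFs
ks_score <- function(z,
                     alternative = c("two.sided", "greater", "less"),
                     absolute = FALSE) {
  alternative <- match.arg(alternative)
  if (absolute) return(max(z) - min(z))
  switch(alternative,
         two.sided = z[which.max(abs(z))],  # signed deviation of largest magnitude
         greater   = max(z),
         less      = min(z))
}

z <- c(0.1, 0.4, -0.6, 0.2)

s_two  <- ks_score(z, "two.sided")
s_gt   <- ks_score(z, "greater")
s_lt   <- ks_score(z, "less")
s_abs  <- ks_score(z, absolute = TRUE)
```

Note that for "less" this returns min(z), i.e. the negative of the statistic that ks.test reports, which is the "modulo the sign" part of the title.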

Can't get MSigDB

Hi, hypeR developing team.
Thank you for your great package, hypeR.

I've used hypeR since this October, and there have been no problems with enrichment analysis. However, when I tried to analyze another dataset today, I couldn't run "hyperdb_fetch()" or "hyperdb_info()".

>gsets <- hyperdb_fetch(type = "gsets", "KEGG_2019_Mouse")
Error in readRDS(temp) : unknown input format

>hyperdb_info()
Getting available gsets ... 
Error in gh("/repos/:owner/:repo/contents/:path", owner = "montilab",  : 
  GitHub API error (404): 404 Not Found
  Not Found

I got such error messages.
Would you please give some tips for solving this problem?

Thank you.

R version: 3.6.1
Bioconductor version: 3.10
hypeR version: 1.2.0

rgsets does not support proper object "duplication" by value

When assigning a rgsets object to a new variable, this variable is only a reference to the old object, not a new one. Here is an example:

# genesets
genesets <- list(
  gset1=c("gene1","gene2","gene3"),
  gset2=c("gene4","gene5","gene6"),
  gset3=c("gene7","gene8","gene9"))
# nodes
nodes <- data.frame(row.names = unlist(genesets),label=unlist(genesets))
## some random edges
edges <- data.frame(
  from = genesets$gset1,
  to = genesets$gset3)

rgset1 <- hypeR::rgsets$new(genesets=genesets,nodes=nodes,edges=edges,name="test_geneset",version="1.0")

Now, if I assign that object (rgset1) to a new variable (rgset2):

> rgset2 <- rgset1
> rgset2$genesets

$gset1
[1] "gene1" "gene2" "gene3"

$gset2
[1] "gene4" "gene5" "gene6"

$gset3
[1] "gene7" "gene8" "gene9"

And I modify that object

> rgset2$genesets <- rgset1$genesets[1:2]
> rgset2$genesets
$gset1
[1] "gene1" "gene2" "gene3"

$gset2
[1] "gene4" "gene5" "gene6"

This also modifies the original object.

> rgset1$genesets
$gset1
[1] "gene1" "gene2" "gene3"

$gset2
[1] "gene4" "gene5" "gene6"
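The behavior above is what one would expect from any environment-backed object in R, where assignment copies the reference rather than the contents. The sketch below reproduces the effect with a plain environment in base R; it does not use hypeR, and `copy_env` is a hypothetical helper. Since rgsets is created with `$new()`, it looks like an R6 class, and R6 classes are cloneable by default, so `rgset1$clone()` may give an independent copy, but that is an assumption about how rgsets is defined.

```r
# An environment stands in for a reference-semantics object
original <- new.env()
original$genesets <- list(
  gset1 = c("gene1", "gene2", "gene3"),
  gset2 = c("gene4", "gene5", "gene6"),
  gset3 = c("gene7", "gene8", "gene9"))

# Plain assignment only creates a second name for the same environment
alias <- original
alias$genesets <- alias$genesets[1:2]
n_after_alias <- length(original$genesets)  # the original changed too

# A shallow copy: new environment, same field values
copy_env <- function(e) {
  list2env(as.list(e, all.names = TRUE), envir = new.env())
}

independent <- copy_env(original)
independent$genesets <- independent$genesets[1]
n_after_copy <- length(original$genesets)   # the original is untouched
```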

hyp_to_excel: zip::zip() is deprecated

When running hyp_to_excel, the following warning is output:
Note: zip::zip() is deprecated, please use zip::zipr() instead
[1] 1 1 1 1 1 1 1

Add a label to mark -log(0.05) on visualization plot

X-axis and Outputting Gene Overlap Size

Hi, very nice package, thank you for making it!

I am confused as to what the x-axis FDR represents after calling hyp_plot on a hyp object. Is this different from the color coded FDR or is it the same values and redundant info?

I would also like to know if it would be possible to output the gene overlap or gene ratio size alongside the dots.

Thank you for your time!

hypeR-dot plots with KS test should incorporate directionality information

Ideally, when hypeR is called with the KS test, the hypeR-dot plot should color-code the dots based on whether a geneset is positively (red) or negatively (blue) skewed (w/ white as non-significant in the middle).

If this solution is not feasible, an alternative would be to do internally what is now shown in the vignette for fastGSEA, where the output is split into es>0 and es<0 and these results are reported separately. E.g., if the ranked signature sig is tested, its results should be split into two signatures, sig.up and sig.dn, and the hypeR-dot plot will have two corresponding "columns", so we'll know what's enriched up and what's enriched down.
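As a rough illustration of the second option, a results table could be split by the sign of the enrichment score, and a signed significance computed for color-coding. The data frame below is invented, and the column names `es` and `fdr` are assumptions for this sketch rather than hypeR's actual output columns.

```r
# Invented KS-style results: one row per geneset
results <- data.frame(
  label = c("A", "B", "C", "D"),
  es    = c(0.6, -0.4, 0.2, -0.7),
  fdr   = c(0.01, 0.03, 0.2, 0.001),
  stringsAsFactors = FALSE)

# Split into positively and negatively skewed genesets (sig.up / sig.dn)
split_by_direction <- function(results) {
  list(up = results[results$es > 0, , drop = FALSE],
       dn = results[results$es < 0, , drop = FALSE])
}

parts <- split_by_direction(results)

# Signed significance for a red/white/blue scale:
# positive when skewed up, negative when skewed down
results$signed <- sign(results$es) * -log10(results$fdr)
```

The two pieces in `parts` could then be plotted as two "columns" of a dot plot, or the `signed` column could drive a diverging color scale.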

Cannot download genesets (have tried solutions from previous issues)

Hi, I’m new to R and am running into some issues fetching geneset databases while following the hypeR vignette (I looked through the closed issues but they don’t solve it for me).
I’m using R version 3.6.3 and hypeR_1.2.0. Apologies if I'm missing something basic and thanks for your help.

quiet

When calling hypeR with a list of signatures, even if quiet=TRUE, the names of the signatures will be printed out. This should not happen. E.g.,

> names(modsSymbols)
 [1] "brown"       "turquoise"   "red"         "blue"        "yellow"      "pink"        "green"       "salmon"     
 [9] "cyan"        "black"       "grey"        "greenyellow" "purple"      "tan"         "magenta"    
> mhypWgcna <- hypeR(modsSymbols, genesets, test="hypergeometric", background=background, plotting=FALSE,quiet=TRUE)
brown
turquoise
red
blue
yellow
pink
green
salmon
cyan
black
grey
greenyellow
purple
tan
magenta

MSigDB C7 retrieval fails

Hi Monti Lab team,

Thanks for assembling this great toolkit! It's really streamlining my DEG pathway analysis.

I've run into a bug around retrieving the MSigDB C7 (Immunologic Signatures) gene sets, which I think is due to handling of categories without a subcategory.

Using "" as the subcategory for C7 doesn't work:

msigdb_imm <- msigdb_gsets("Homo sapiens", "C7", "")

Error in msigdbr(species, category, subcategory): unknown subcategory
Traceback:

1. msigdb_gsets("Homo sapiens", "C7", "")
2. msigdb_download(species, category, subcategory)
3. msigdbr(species, category, subcategory)
4. stop("unknown subcategory")

It looks like msigdbr may be expecting NA for this case where there's no subcategory. However, this leads to a new error:

msigdb_imm <- msigdb_gsets("Homo sapiens", "C7", NA)

Error in if (name == "Custom" & !quiet) warning("Describing genesets with a name will aid reproducibility"): missing value where TRUE/FALSE needed
Traceback:

1. msigdb_gsets("Homo sapiens", "C7", NA)
2. gsets$new(genesets, name = name, version = version, clean = clean)
3. initialize(...)

This can be fixed by adding is.na(subcategory) to the ifelse switch in the msigdb_gsets function. This version works:

msigdb_gsets2 <- function(species, category, subcategory = "", clean = FALSE) 
{
    genesets <- msigdb_download(species, category, subcategory)
    name <- ifelse(subcategory == "" | is.na(subcategory), category, paste(category, subcategory, sep = "."))
    version <- msigdb_version()
    gsets$new(genesets, name = name, version = version, clean = clean)
}

msigdb_imm <- msigdb_gsets2("Homo sapiens", "C7", NA)
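The underlying problem is that comparing NA with "" yields NA rather than FALSE, so the name itself ends up as NA and the later `if (name == "Custom" ...)` check fails. The naming logic can be tried in isolation with base R; `gset_name` is a hypothetical stand-in for that one line of msigdb_gsets, written for scalar inputs only.

```r
# NA compared with anything is NA, which if() cannot handle
na_comparison <- NA == ""

# Name a collection, treating "", NA and NULL as "no subcategory"
gset_name <- function(category, subcategory = "") {
  no_sub <- is.null(subcategory) || is.na(subcategory) || subcategory == ""
  if (no_sub) category else paste(category, subcategory, sep = ".")
}

name_na    <- gset_name("C7", NA)
name_empty <- gset_name("C7", "")
name_null  <- gset_name("C7", NULL)
name_sub   <- gset_name("C2", "CP:KEGG")
```

The `||` checks short-circuit, so the `subcategory == ""` comparison is never reached when the value is NULL or NA.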

In case it helps, I'm running hypeR v1.8.0 with msigdbr v7.5.1.

Cheers,
-Lucas Graybuck

Error in hyp_hmap

Here's an error when trying to generate reactable tables w/ hmaps included. I believe there is some limit case that is not checked for and gives error. You can find the rds here: mhyp_obj.rds

library(hypeR)

mhyp_obj <- readRDS("mhyp_obj.rds")

## subsetting it to focus on the culprit entry
mhyp_obj1 <- mhyp_obj$mhyp
mhyp_obj1$data <- mhyp_obj1$data[10]

rtbl <- hypeR::rctbl_build(mhyp_obj$mhyp,show_hmaps=TRUE)

Success!

rtbl <- hypeR::rctbl_build(mhyp_obj$mhyp,show_hmaps=TRUE,hyp_hmap_args=list("fdr"=.5))

Error in min(x, na.rm = TRUE) : invalid 'type' (list) of argument

Background setting

Hi. I am wondering what the "background" setting is. Is it the total number of genes in the genome? Thanks.

Run enrichment on gene sets below some size and above some percentage

Does hypeR have an option to filter out very large gene sets? I'd like to set a threshold such as 900 so that it doesn't run enrichment on gene sets of that size and above. Currently I can post-filter the results, which is fine too, but this would be a nice improvement.

There could also be an option to filter out gene sets below a minimum overlap percentage. Say, only return gene sets where the data and the gene set share more than 2% of the genes.
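In the meantime, pre-filtering a plain named list of gene sets before calling hypeR is straightforward in base R. The sketch below is one possible reading of the request: `filter_genesets` is a hypothetical helper, and the overlap fraction is taken relative to the gene set size, which is an assumption since the request does not say which denominator is meant.

```r
# Keep gene sets smaller than max_size that share more than
# min_overlap_frac of their genes with the signature
filter_genesets <- function(genesets, signature,
                            max_size = 900, min_overlap_frac = 0.02) {
  keep <- vapply(genesets, function(g) {
    length(g) > 0 &&
      length(g) < max_size &&
      length(intersect(g, signature)) / length(g) > min_overlap_frac
  }, logical(1))
  genesets[keep]
}

# Toy data
signature <- c("a", "b")
genesets <- list(
  small  = c("a", "b", "c", "d"),       # kept: 50% overlap
  big    = paste0("g", 1:1000),         # dropped: too large
  sparse = c("a", paste0("x", 1:99)))   # dropped: only 1% overlap

filtered <- filter_genesets(genesets, signature)
```

The filtered list can then be passed on as the genesets argument in the usual way.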

Thanks!

reactable should be "wider"

When generating a reactable for mhyp objects, the size of the nested tables is such that often the 'hits' column is cropped. See, e.g., here:

Ideally, the table should be wider so as to allow for easier inspection of overlap hits.

How to get signatures enriched in a pathway?

Dear authors,

Thanks for developing this wonderful tool! Can I get the gene signatures that are enriched in a specific pathway and show them in the result table? Only the number of signatures is shown in the result table.

Thanks a lot.

install_github("montilab/hypeR") missing hypdat.rds and vignette .html missing packages

Hi! I saw hypeR at BioC2020 and I started to play around with it by running through the vignette on Bioconductor. I installed using devtools::install_github("montilab/hypeR") but then I ran into an error trying to follow the vignette:

> hypdat <- readRDS(file.path(system.file("extdata", package="hypeR"), "hypdat.rds"))
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file 'C:/Users/jenny/R/win-library/4.0/hypeR/extdata/hypdat.rds', probable reason 'No such file or directory'
> system.file("extdata", package="hypeR")
[1] "C:/Users/jenny/R/win-library/4.0/hypeR/extdata"
> dir(path = system.file("extdata", package="hypeR"))
[1] "genesets.rds" "testdat.rds"
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] hypeR_1.5.1

loaded via a namespace (and not attached):
 [1] httr_1.4.2        pkgload_1.1.0     msigdbr_7.1.1     jsonlite_1.7.0    viridisLite_0.3.0
 [6] shiny_1.5.0       assertthat_0.2.1  remotes_2.2.0     sessioninfo_1.1.1 pillar_1.4.6     
[11] backports_1.1.8   glue_1.4.1        digest_0.6.25     promises_1.1.1    polyclip_1.10-0  
[16] rvest_0.3.6       colorspace_1.4-1  httpuv_1.5.4      htmltools_0.5.0   plyr_1.8.6       
[21] pkgconfig_2.0.3   devtools_2.3.1    xtable_1.8-4      purrr_0.3.4       scales_1.1.1     
[26] webshot_0.5.2     processx_3.4.3    tweenr_1.0.1      later_1.1.0.1     openxlsx_4.1.5   
[31] ggforce_0.3.2     tibble_3.0.3      generics_0.0.2    farver_2.0.3      ggplot2_3.3.2    
[36] usethis_1.6.1     ellipsis_0.3.1    withr_2.2.0       cli_2.0.2         mime_0.9         
[41] magrittr_1.5      crayon_1.3.4      memoise_1.1.0     evaluate_0.14     ps_1.3.3         
[46] reactable_0.2.0   fs_1.4.2          fansi_0.4.1       MASS_7.3-51.6     xml2_1.3.2       
[51] pkgbuild_1.1.0    tools_4.0.2       prettyunits_1.1.1 hms_0.5.3         lifecycle_0.2.0  
[56] stringr_1.4.0     munsell_0.5.0     zip_2.0.4         callr_3.4.3       kableExtra_1.1.0 
[61] compiler_4.0.2    tinytex_0.25      rlang_0.4.7       grid_4.0.2        rstudioapi_0.11  
[66] visNetwork_2.0.9  htmlwidgets_1.5.1 igraph_1.2.5      rmarkdown_2.3     testthat_2.3.2   
[71] gtable_0.3.0      curl_4.3          reshape2_1.4.4    R6_2.4.1          knitr_1.29       
[76] dplyr_1.0.1       fastmap_1.0.1     rprojroot_1.3-2   readr_1.3.1       desc_1.2.0       
[81] stringi_1.4.6     Rcpp_1.0.5        vctrs_0.3.2       tidyselect_1.1.0  xfun_0.16

Also, the rendered vignette doesn't show all the packages that need to be loaded - I ran into errors with lines that used dplyr/tidyverse code. Checking the R script shows the necessary packages at the beginning:

## ----include=FALSE, messages=FALSE, warnings=FALSE----------------------------
knitr::opts_chunk$set(message=FALSE, fig.width=6.75)
devtools::load_all(".")
library(tidyverse)
library(magrittr)
library(dplyr)
library(reactable)

These should be shown in the rendered vignette somewhere. I'm getting some good results with my own data - thanks for the package!

.check_overlap returns a fraction, not a percentage

Dear hypeR developers,

When trying hypeR for the first time, I set quiet = FALSE in my hypeR(...) call, and the output said "Percentage of signature found across genesets: 1%", which I knew was wrong because 100% of my signature overlaps with the gene sets I use. When I checked the source code, I realised that the .check_overlap function returns the overlap as a fraction (from 0 to 1), not a percentage (from 0 to 100):

hypeR/R/utils.R

Line 37 in eaa02ed

 overlap <- signif(length(intersect(signature, unique(unlist(genesets)))) / length(signature), 2) 

and this fraction is not multiplied by 100 when it is printed as a percentage in

hypeR/R/hype.R

Line 136 in eaa02ed

 cat(.format_str("Percentage of signature found across genesets: {1}% \n", overlap)) 

So 1% here actually means 100%. It would be great to fix this! :) Thank you!
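
One possible fix is to keep the fraction as computed and scale it only when printing. The sketch below reuses the overlap expression quoted above on toy data; it uses base `sprintf` rather than the package's internal `.format_str`, so it illustrates the idea and is not a patch against the actual source.

```r
# Toy inputs: every signature gene appears in some gene set.
signature <- c("A", "B", "C", "D")
genesets <- list(GS1 = c("A", "B"), GS2 = c("C", "D", "E"))

# Fraction in [0, 1], same expression as the quoted line from R/utils.R.
overlap <- signif(length(intersect(signature, unique(unlist(genesets)))) / length(signature), 2)

# Multiply by 100 at print time so the label matches the value.
msg <- sprintf("Percentage of signature found across genesets: %s%%", overlap * 100)
cat(msg, "\n")
```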

rctbl_build does not work with method="ks"

rctbl_build currently doesn't work when running the KS test (method="ks"). It would be ideal to fix it and have it report, instead of the 'hits' (not relevant here), the features in the "leading edge" (i.e., the features up to the max score).
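
To make "leading edge" concrete, here is a minimal sketch using an unweighted running score over a ranked signature. It is an assumption that this matches the idea intended here; hypeR's own KS implementation may weight or orient the score differently, and `leading_edge` is a hypothetical helper, not a package function.

```r
# Toy ranked signature (most significant first) and one gene set.
ranked  <- c("g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8")
geneset <- c("g1", "g3", "g4", "g8")

# Hypothetical helper: gene-set members ranked at or before the
# position where the running score peaks.
leading_edge <- function(ranked, geneset) {
  hit    <- ranked %in% geneset
  n_hit  <- sum(hit)
  n_miss <- sum(!hit)
  # Step up on hits, step down on misses (unweighted running sum).
  score <- cumsum(ifelse(hit, 1 / n_hit, -1 / n_miss))
  peak  <- which.max(score)
  ranked[seq_len(peak)][hit[seq_len(peak)]]
}

leading_edge(ranked, geneset)
```

Here the score peaks at the fourth position, so the leading edge is the set members among the first four ranked features.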

Order in which hits are displayed with dot plot function

Hi, I’m wondering if there is a way to modify the dot plot function for multiple signatures in hypeR so that, regardless of how many terms you’re plotting, it chooses the most significant ones first. Right now, it seems to apply an arbitrary cutoff without pre-ranking by significance; e.g., here is the same object plotted using top = 10 and top = 20 as arguments:

hyp_dots(multhyp, top = 10, abrv=100, fdr = 0.005, pval = 1, merge=TRUE)

hyp_dots(multhyp, top = 20, abrv=100, fdr = 0.005, pval = 1, merge=TRUE)

It would be great if, when I plot only 10 terms, GO_CELLULAR_RESPIRATION made the list but GO_AEROBIC_RESPIRATION didn't. Thank you!
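
The requested behaviour amounts to ranking terms by their best significance across signatures before applying the top-n cut. The following base R sketch shows that ordering on toy data; the per-signature tables with `label` and `fdr` columns are an assumed layout, and `top_labels` is a hypothetical helper rather than anything in hypeR.

```r
# Toy results: one table of terms and FDR values per signature (assumed layout).
res <- list(
  sig1 = data.frame(label = c("GO_A", "GO_B", "GO_C"),
                    fdr = c(0.004, 1e-6, 0.2), stringsAsFactors = FALSE),
  sig2 = data.frame(label = c("GO_A", "GO_B", "GO_D"),
                    fdr = c(1e-4, 0.5, 0.003), stringsAsFactors = FALSE)
)

# Hypothetical helper: rank terms by their smallest FDR in any signature,
# drop those above the cutoff, then keep the first `top`.
top_labels <- function(res, top = 10, fdr_cutoff = 0.005) {
  all  <- do.call(rbind, res)
  best <- vapply(split(all$fdr, all$label), min, numeric(1))
  best <- sort(best[best <= fdr_cutoff])
  names(best)[seq_len(min(top, length(best)))]
}

top_labels(res, top = 2)
```

With this ordering, raising `top` only appends less significant terms, so a term shown for top = 10 is always also shown for top = 20.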

Collapsible nodes in hierarchy map should turn a distinct color

Duplicated gene symbols in the output of msigdb_download

Currently the msigdbr package includes Ensembl IDs in the output data frame of gene sets. Since there are multiple Ensembl IDs corresponding to some genes, your msigdb_download function returns duplicated genes in some gene sets. For example, check HALLMARK_APICAL_JUNCTION (Human, H category).

You just need to add distinct() %>% after the second line below in db_msig.R (lines 148-151):

    mdf <- msigdbr(species, category, subcategory) %>%
           dplyr::select(gs_name, gene_symbol) %>%
           as.data.frame() %>%
           stats::aggregate(gene_symbol ~ gs_name, data=., c)
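
The effect of the proposed deduplication can be seen on a toy table without downloading anything. This sketch uses base R `unique()` and `split()` as stand-ins for `dplyr::distinct()` and the aggregate step, so it illustrates the idea rather than reproducing the package code; the gene set names and symbols are made up.

```r
# Toy table mimicking the selected columns, where one symbol appears twice
# in the same gene set (as happens when a symbol maps to several Ensembl IDs).
mdf <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_A", "SET_B"),
  gene_symbol = c("CDH1", "CDH1", "ACTB", "TP53"),
  stringsAsFactors = FALSE
)

# Drop repeated (gs_name, gene_symbol) pairs before grouping.
mdf <- unique(mdf[, c("gs_name", "gene_symbol")])

# Group symbols by gene set into a named list of character vectors.
gsets <- split(mdf$gene_symbol, mdf$gs_name)
gsets$SET_A
```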