ganglilab / genekitr

🧬 Gene analysis toolkit based on R

Home Page: https://www.genekitr.fun

License: GNU General Public License v3.0

R 100.00%

gene enrichment-analysis id-converter plotting

genekitr's Issues

transId from alias to symbol no longer works

I'm encountering the following error when using transId() to convert gene aliases to symbols, even though the same script worked a week ago.

Here's the output using the example in the documentation:

> transId(c("BCC7", "TP53", "PD1", "PDL1", "TET2"), "sym")
Maybe your "trans_to" argument is wrong, please check again...
Error in tbl_vars_dispatch(x) : object 'res' not found

Issue with ORA and importCP

Hi,
Firstly, kudos for the development of genekitr. It's a great tool and your reasons for its creation resonate so much with my experiences so far.

I'm currently working with the organism Yarrowia lipolytica and have noted some challenges:

The geneset for Yarrowia lipolytica is available in the geneset package (GO and KEGG), but there is no organism value attached to it when running getGO:

> mf <- getGO(org = "Yarrowia lipolytica", ont = "mf")
> head(mf$geneset)
          mf          gene
1 GO:0000030 YALI0_C04004g
2 GO:0000030 YALI0_D10549g
3 GO:0000030 YALI0_B01672g
4 GO:0000030 YALI0_E02222g
5 GO:0000030 YALI0_A20922g
6 GO:0000030 YALI0_A13585g
> head(mf$geneset_name)
          id                           name
1 GO:0000030   mannosyltransferase activity
2 GO:0000049                   tRNA binding
3 GO:0000149                  SNARE binding
4 GO:0000166             nucleotide binding
5 GO:0000175 3'-5'-RNA exonuclease activity
6 GO:0000287          magnesium ion binding
> mf$organism
[1] NA

However, I ran into issues with follow-up functions, specifically the genORA function. It suggests there's no short name for the organism. This is perplexing given the initial inclusion of Yarrowia lipolytica in the geneset. I also tried to add the organism value, but the function still does not work.

>   gs <- genORA(de.genes$ensembl_gene_id, mf$geneset,padj_method = "BH",
+                p_cutoff = 0.05,)
Error in if (organism == "hg" | organism == "human" | organism == "hsa" |  : 
  argument is of length zero
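
One plausible reading of this error, not confirmed against the genekitr source: the call passes `mf$geneset` (a plain data frame) instead of the whole `mf` object, so a lookup of an `organism` element would return NULL, and comparing NULL with a string yields a zero-length condition. The base R sketch below reproduces that mechanism with made-up data; `check_organism` is a hypothetical guard, not a genekitr function.

```r
# Hypothetical stand-in for the object returned by getGO(); values are made up.
gs <- list(
  geneset = data.frame(mf = "GO:0000030", gene = "YALI0_C04004g"),
  organism = NA
)

# `$` on a data frame that has no such column returns NULL ...
org_from_df <- gs$geneset$organism
is.null(org_from_df)

# ... and NULL == "hg" is logical(0), which if() cannot evaluate.
cond <- org_from_df == "hg"
length(cond)

msg <- tryCatch(
  if (org_from_df == "hg") "human" else "other",
  error = function(e) conditionMessage(e)
)
msg

# Hypothetical guard that fails early with a clearer message.
check_organism <- function(organism) {
  if (length(organism) != 1 || is.na(organism)) {
    stop("`organism` must be a single non-missing string", call. = FALSE)
  }
  organism
}
```

If this reading is right, passing the full object (`geneset = mf`), as the Arabidopsis report further down this page does, may be worth trying, although `mf$organism` being NA could still trip a later check.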

I also tried a different route, performing the ORA with ClusterProfiler and then importing the results to genekitr. But this too resulted in an error.

>   ora_go <- clusterProfiler::enrichGO(gene = de.genes,
+                         OrgDb = org.Ylipolytica.eg.db,
+                         universe = filtered_data$entrez,
+                         keyType = "ENTREZID",  
+                         ont = "ALL",  # all three ontologies (BP, MF, CC)
+                         pAdjustMethod = "BH",  # adjust method
+                         pvalueCutoff = 0.05,
+                         minGSSize = 5,
+                         maxGSSize = 500,
+                         readable = FALSE)
>   go_easy <- importCP(ora_go, type = "go")
Error in mapEnsOrg(object@organism) : 
Check the latin_short_name in `genekitr::ensOrg_name`

I'd appreciate any insights or suggestions you might have regarding these issues. Is there a workaround or am I possibly missing a step?
Thanks!

I'm trying to plot a "gotangram" for GO BP enrichment results from A. thaliana. Other plot types, such as bar, upset, and network, work fine, but gotangram raises an error: "Error: Bioconductor orgdb for org.At.eg.db not found. You should install first.". This looks like a bug, since the OrgDb package for A. thaliana is org.At.tair.db.

test <- c('AT1G12610', 'AT5G47600', 'AT1G33760') # just to make it easy to reproduce
gs <- getGO(org="Arabidopsis thaliana", ont="bp")
go_bp <- genORA(test,geneset=gs) 
plotEnrich(go_bp, plot_type = "gotangram", sim_method = "Rel", org='Arabidopsis thaliana')

ps
Many thanks for the package. I do love it.
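
The error message suggests the package name is built from the pattern org.<Abbr>.eg.db. A minimal sketch of a lookup that handles the known exceptions might look like the following; `orgdb_name` is a hypothetical helper written for illustration and is not part of genekitr.

```r
# Illustrative lookup only; `orgdb_name` is a hypothetical helper.
orgdb_name <- function(abbr) {
  # Bioconductor OrgDb packages that do not follow the org.<Abbr>.eg.db pattern.
  exceptions <- c(At = "org.At.tair.db", Sc = "org.Sc.sgd.db")
  if (abbr %in% names(exceptions)) {
    return(unname(exceptions[abbr]))
  }
  paste0("org.", abbr, ".eg.db")
}

orgdb_name("Hs")
orgdb_name("At")
```

The exception table is deliberately short; other organisms with non-standard package names would need their own entries.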

Plotting for ShinyGO ORA results

Describe the bug
Hello, I am trying to visualize the GO terms from my ShinyGO ORA results with plotEnrich. Some GO terms are not recognized by plotEnrich. Is there a way to skip them entirely? Thank you.

Could not resolve host: genekitr-china.oss-accelerate.aliyuncs.com

Hi,

When I attempt to run

"gse <- genGSEA(genelist = ranks, geneset = gs)"

I receive the following error:

"Error in function (type, msg, asError = TRUE) :
Could not resolve host: genekitr-china.oss-accelerate.aliyuncs.com"

What is the reason for this error?

transId loading issue

Describe the bug
transId not working

To Reproduce
Steps to reproduce the behavior:
Just run the example below.

Screenshots

Desktop (please complete the following information):

OS: Windows
Version: 11
Browser: Brave

transId keeping unique ids issue

Hello,

Some genes are not changed to their new symbols (the new symbol is BABAM2):

Also, the information for the gene is incomplete:

plotGSEA with max.overlap parameter

Hi there,

Thanks for developing this fantastic package. I have used it a lot, and I want to raise an issue about the visualization of GSEA results.

In the 'classic' mode, if gene labels overlap, only some of the genes are shown. Could you add a "max.overlap" parameter to control how many labels are shown in the GSEA plot?

Best,
Logan

transId() mouse symbols

Hello,

I compared transId() with biomaRt and found that biomaRt returns more symbols than transId() for the same Ensembl IDs. They are official MGI symbols. What could be the reason?

labels overlay bars in plotEnrichAdv when left x-axis limit is less than the right limit

Hi!
When I create a figure with plotEnrichAdv on simplified data and the left x-axis limit is less than the right x-axis limit, the labels overlay the bars of the graph.

Let up_go_bp_sim and down_go_bp_sim be the data frames returned by genORA for the up- and downregulated DEGs, respectively.
Then:

Left limit is less than the right limit (labels overlay the bars):

plotEnrichAdv(up_go_bp_sim, down_go_bp_sim,
              plot_type = "one",
              term_metric = "FoldEnrich",
              stats_metric = "p.adjust",
              xlim_left = 15, xlim_right = 20) +
    theme(legend.position = c(0.8, 0.5))

Left limit is greater than the right limit (as in the example in the documentation):
Everything is OK.

plotEnrichAdv(up_go_bp_sim, down_go_bp_sim,
              plot_type = "one",
              term_metric = "FoldEnrich",
              stats_metric = "p.adjust",
              xlim_left = 20.1, xlim_right = 20) +  # now left border is greater than the right one
    theme(legend.position = c(0.8, 0.5))

ps:
It would also be great to add more parameters, such as cutoff, to the simGO function.

pps:
Thanks again for the package!

transId() does not return all input symbols

Hi--

I use transId():

transId(id = IDs, transTo= "symbol", org = "mouse", keepNA = TRUE, unique = TRUE)

which should return all the input symbols; however, it returns fewer records, and there is always a row consisting entirely of NA.

Please use the same symbols file to verify.

new version is memory hungry

Hello--

In the previous version, I could convert 30k gene symbols in one command on a machine with 32 GB of memory without any problem. Now, when I run the same command (transId) on the same symbols, even a machine with 128 GB kills the process for lack of memory.

load older versions of the package from CRAN

Hello, is there any way to load older versions of the genekitr package? Thank you.

plotGSEA classic type for non-model species

Describe the bug
Hello, I was planning to coerce an fgsea (preranked GSEA) result for plotting with plotEnrich, after first calculating the gene count and GeneRatio, mapping geneID_symbol, and renaming columns so that the data frame looked identical to the one returned by genGSEA (which I find less flexible than fgsea).

However, when I attempted to plot the results for a single category, which I checked was present in gsea_df, I received an error:

Error in `$<-.data.frame`(`*tmp*`, "gene", value = c("BnaA04g22070D", : 
replacement has 64949 rows, data has 65732

With the following traceback:

8.
stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
"replacement has %d rows, data has %d"), N, nrows), domain = NA)
7.
`$<-.data.frame`(`*tmp*`, "gene", value = c("BnaA04g22070D", 
NA, NA, NA, NA, "BnaC01g43250D", NA, NA, NA, NA, "BnaC07g39370D", 
"BnaA03g47170D", NA, NA, "BnaCnng19060D", NA, NA, "BnaC01g19310D", 
NA, NA, "BnaC07g50360D", NA, NA, NA, "BnaA05g03390D", NA, NA, ...
6.
`$<-`(`*tmp*`, "gene", value = c("BnaA04g22070D", NA, NA, NA, 
NA, "BnaC01g43250D", NA, NA, NA, NA, "BnaC07g39370D", "BnaA03g47170D", 
NA, NA, "BnaCnng19060D", NA, NA, "BnaC01g19310D", NA, NA, "BnaC07g50360D", 
NA, NA, NA, "BnaA05g03390D", NA, NA, NA, NA, NA, NA, NA, NA, ...
5.
calcScore(geneset, genelist, x, exponent, fortify = TRUE, org)
4.
FUN(X[[i]], ...)
3.
lapply(show_pathway, function(x) {
calcScore(geneset, genelist, x, exponent, fortify = TRUE, 
org)
})
2.
do.call(rbind, lapply(show_pathway, function(x) {
calcScore(geneset, genelist, x, exponent, fortify = TRUE, 
org)
}))
1.
genekitr::plotGSEA(BP_HDAC_list, plot_type = "classic", show_pathway = "GO:0040029")
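
The error class itself is easy to reproduce in base R: assigning a vector to a data frame column fails whenever the vector length differs from the number of rows. The sketch below uses made-up IDs and does not reflect how calcScore is implemented; it only shows the mechanism and one way (`match()`) to build a column whose length always equals the row count.

```r
# Toy data; the IDs are placeholders, not taken from the reporter's file.
genelist <- data.frame(
  id = c("g1", "g2", "g3", "g4"),
  score = c(2.1, 0.4, -0.3, -1.8)
)
in_set <- c("g1", "g3", "g9")  # "g9" does not occur in genelist

# Assigning a vector of the wrong length reproduces the same kind of error.
err <- tryCatch(
  {
    genelist$gene <- in_set
    "no error"
  },
  error = function(e) conditionMessage(e)
)
err

# IDs present in the set but absent from the ranked list.
setdiff(in_set, genelist$id)

# match() yields one position per row of genelist, so the lengths agree.
genelist$gene <- in_set[match(genelist$id, in_set)]
genelist
```

Comparing `length(unique(...))` of the ID columns in the gene list and the gene set is a quick way to check whether duplicated or unmatched identifiers account for a row-count gap like the one reported.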

To Reproduce
reprex_plotGSEA_filtered.xlsx

This is my Excel file containing the data frames I used after preprocessing (named "gsea_df", "genelist", "geneset", "exponent" and "org"). I am working with Brassica napus external_gene_name ENA identifiers.

I filtered the gsea_result down to 21 rows to preserve the confidentiality of my results, but it still contains the identifier I want to create a GSEA plot for, GO:0040029. If this is a problem for test generation, please let me know.

transId() updating symbols weird behaviour

Hello--

I am updating old gene symbols with keepNA = FALSE, unique = FALSE.

I am getting some strange data (please see below).
Row 138 (Gm553): this is an official symbol, but it is returned as NA.
Rows 149-151: the original symbols are Ankrd44 and 4930444A19Rik.
Rows 156 and 157 (Mob4): it is returned once as Mob4 and once as NA.

symbols with Excel misidentified gene names

Hi--

I wanted to report that some official symbols are not returned by transId().
Also, the tool does not fix gene names that Excel has converted to dates.
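
Excel is known to turn symbols such as SEPT2 or MARCH1 into dates like "2-Sep" and "1-Mar". As a rough pre-check before calling transId(), such entries can at least be flagged. The helper below is a hypothetical sketch in base R; it only detects the day-month pattern and does not attempt to restore the original symbol, because that mapping can be ambiguous.

```r
# Hypothetical helper: flags strings that look like Excel-converted dates ("1-Mar").
looks_like_excel_date <- function(x) {
  months <- "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
  grepl(paste0("^[0-9]{1,2}-(", months, ")$"), x, ignore.case = TRUE)
}

# Example input mixing ordinary symbols with date-like entries.
ids <- c("Gm553", "1-Mar", "2-Sep", "Mob4", "10-Sep")
flagged <- ids[looks_like_excel_date(ids)]
flagged
```

Flagged entries are best corrected against the original, pre-Excel source of the gene list.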

ORA result plotting error because of duplicated terms

Describe the bug
I am trying to create bar plots of my ORA results but keep getting an error in dplyr::mutate()

To Reproduce
Steps to reproduce the behavior:
using attached testfile 'testgenelist.csv', the following code should reproduce the error

library(genekitr)
library(geneset)
gs3 <- getReactome(org = "human")
testgenes <- read.csv(file = "data/testgenelist.csv", header = TRUE, sep = ",")
## ORA Analysis
id <- testgenes$GeneID
test_ego <- genORA(id,
                   geneset = gs3,
                   p_cutoff = 0.05,
                   q_cutoff = 0.10)

#plot
plotEnrich(test_ego, plot_type = "bar")

See error
The following error was raised (screenshot included):

plotEnrich(test_ego, plot_type = "bar")
Error in dplyr::mutate():
ℹ In argument: Description = factor(.$Description, levels = .$Description, ordered = T).
Caused by error in levels<-:
! factor level [20] is duplicated
Run rlang::last_trace() to see where the error occurred.

When rlang last trace is run:
Error in `dplyr::mutate()`:
ℹ In argument: `Description = factor(.$Description, levels = .$Description, ordered = T)`.
Caused by error in `levels<-`:
! factor level [20] is duplicated

Backtrace:
▆

├─genekitr::plotEnrich(test_ego, plot_type = "bar")
│ └─... %>% ...
├─dplyr::mutate(...)
├─dplyr:::mutate.data.frame(...)
│ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
│ ├─base::withCallingHandlers(...)
│ └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
│ └─mask$eval_all_mutate(quo)
│ └─dplyr (local) eval()
├─base::factor(.$Description, levels = .$Description, ordered = T)
└─base::.handleSimpleError(...)
└─dplyr (local) h(simpleError(msg, call))

└─rlang::abort(message, class = error_class, parent = parent, call = error_call)

Expected behavior

I expected the barplot to be generated as normal. I haven't had this issue with any other datasets I have analyzed. Inspection of the test_ego result doesn't seem to be impacted either. Dataframe of ORA result (test_ego) screenshot included.
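
The message "factor level [20] is duplicated" indicates that the Description column contains the same text more than once, which can happen when two distinct term IDs share a description. The base R sketch below reproduces the failure with made-up terms and shows one possible workaround, making the descriptions unique before plotting. It is an illustration under that assumption, not a patch to plotEnrich.

```r
# Made-up terms; two distinct IDs share one description.
ora <- data.frame(
  ID = c("R-1", "R-2", "R-3"),
  Description = c("Signaling", "Metabolism", "Signaling")
)

# Same pattern as in the error message: every value is also used as a level.
err <- tryCatch(
  factor(ora$Description, levels = ora$Description, ordered = TRUE),
  error = function(e) conditionMessage(e)
)
err

# Which descriptions occur more than once?
dups <- unique(ora$Description[duplicated(ora$Description)])
dups

# Workaround sketch: disambiguate repeated descriptions, then build the factor.
ora$Description <- make.unique(ora$Description, sep = " #")
f <- factor(ora$Description, levels = ora$Description, ordered = TRUE)
levels(f)
```

Applied to the real result, the same `duplicated()` check on `test_ego$Description` would show which terms collide; dropping one of each pair is an alternative to renaming.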

Screenshots
testgenelist.csv

Desktop (please complete the following information):

OS: macOS
Version 12.6.5
Browser Chrome

Additional context

Problem about transId

Hi,

Thank you for your fantastic work and the great convenience you've brought to us.

However, when I recently converted a column of IDs with both the 'keepNA' and 'unique' parameters set to TRUE, the returned data did not match the input in length. More peculiar still, when I pass the initially missed IDs to the function again, they are all returned, although some may be None. I am using genekitr version 1.2.5. Details are as mentioned above. I'm looking forward to your response, and once again, thank you for your awesome work.
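
To narrow down a length mismatch like this, it can help to check which inputs have no row in the result and whether the input itself contains repeats. The sketch below uses made-up data; the column names `input_id` and `symbol` are assumptions about the shape of the result, and nothing here calls genekitr.

```r
# Made-up input and conversion result; column names are assumed.
ids <- c("A", "B", "B", "C", "D")
res <- data.frame(
  input_id = c("A", "B", "D"),
  symbol = c("SymA", "SymB", NA)
)

missing_ids <- setdiff(ids, res$input_id)  # inputs with no row at all
dup_ids <- unique(ids[duplicated(ids)])    # inputs supplied more than once

# Re-align the result to the input: exactly one row per input element.
aligned <- data.frame(
  input_id = ids,
  symbol = res$symbol[match(ids, res$input_id)]
)
nrow(aligned) == length(ids)
```

Reporting `missing_ids` and `dup_ids` alongside the issue would make the behaviour easier to reproduce.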