cafferychen777 / ggpicrust2

Stars: 97, Watchers: 97, Forks: 11, Size: 21.71 MB

Make Picrust2 Output Analysis and Visualization Easier

Home Page: https://cafferychen777.github.io/ggpicrust2/

License: Other

R 100.00%

16s-rrna functional-data metagenomic-analysis metagenomics microbiome microbiota picrust2 tax4fun2 visualization

ggpicrust2's Introduction

📊 Statistics PhD Student | 🧬 Multi-Omics Analyst | 🖥 R Package Developer

Hello! I'm Caffery Yang👋

I am thrilled to be joining the Statistics PhD program at Texas A&M University in Fall 2024. As a recent graduate from Southern Medical University, where I majored in Biostatistics and achieved a 3.95/4.00 GPA (top 2% of my class), I am currently visiting Jian Yang's Lab at Westlake University to further enrich my research experience before starting my doctoral studies.

🎖 Achievements

Received the National Scholarship from the Ministry of Education of China.
Secured First Prize in the National College Student Mathematics Competition.
Honored with Meritorious Winner (M Award) in the Mathematical Contest in Modeling (MCM) in the USA.

🧪 Research Interests

Under the guidance of Prof. Jun Chen at Mayo Clinic and Prof. Liangliang Zhang at Case Western Reserve University, my research primarily revolves around microbiome data analysis.

Developed the MicrobiomeStat R package, writing over 17,000 lines of R code. Comprehensive documentation is available on the package wiki, which contains in-depth guidance and tutorials exceeding 20,000 words. Additionally, an interactive MicrobiomeStat Shiny application has been created to provide a user-friendly platform for longitudinal statistical analysis and visualization of microbiome data.
Pioneered the MicrobiomeGallery collaborative platform.
Independently developed the ggpicrust2 R package, which amassed 85 stars on GitHub and over 11,000 downloads on CRAN.

My future vision embraces a multi-omics perspective, synthesizing insights from genomics, proteomics, transcriptomics, and more.

📚 Publications

Sole first author on a paper accepted by Bioinformatics.
Preparing to submit another paper as sole first author to Microbiome.

🤝 Collaboration

I am extremely open to collaborations. If you have any projects related to microbiomics or multi-omics and think we could work together, please don't hesitate to reach out. In addition to being open to new projects, I am particularly enthusiastic about assisting others with their microbiome/multi-omics data analysis. I'm also keen on collaborating with researchers and authors who have utilized my pipelines, such as ggpicrust2 or MicrobiomeStat, in their work. If you are considering or have already used these tools in your paper, I would be thrilled to explore the possibility of co-authorship. Whether it's contributing to data analysis, interpretation, or manuscript preparation, I am ready to bring my expertise to your research. Let's connect and see how we can advance the field of microbiomics together!

🛠 Projects

1. Application of LLM in Microbiome Taxonomy

Claude's innovative approach using the entire NCBI database with the LLM offers direct taxonomic insights from FASTQ sequences. A potential game-changer when compared with traditional methods.

2. Performance of Microbiome DA Methods in Other Omics

Exploring how microbiome DA methods, which are designed to handle the compositional nature of microbiome data, perform on other omics data.

3. Generative AI in Microbiome Data Analysis and Accuracy Assessment

Assessing how well an LLM performs as a microbiome data analysis expert compared with real-world experts.

4. PI_find

A tool designed to assist graduate school aspirants in discovering PI names from specific professional journals.

5. Fine-Tuning Platform for Bioinformatics Repositories

Aiming to develop a platform to fine-tune ChatGPT with bioinformatics repositories, enhancing query results for newer tools.

📫 Connect with me

ggpicrust2's People

Forkers

yuhe-kan lvelosuarez luffyluffy liuchao-nj ahderojas jahernayeem liaochenlanruo aristotle1992 sirusb bioinfo-truth lixinyu1033

ggpicrust2's Issues

My pred_metagenome_unstrat file has EC:1.1.1.1 and not KO format

My pred_metagenome_unstrat file has EC annotations (e.g. EC:1.1.1.1) rather than KO annotations, and to the best of my knowledge this file should be my abundance file. I had to take a long route to convert the EC annotations to KO annotations. One problem with that is that not every EC number has a corresponding KO, so I had to drop some ECs during the conversion. Is it possible to use ggpicrust2 to analyze an EC-annotated pred_metagenome_unstrat file, and if so, can I use the following script:

library(readr)
library(ggpicrust2)

ko_abundance_file <- "path/to/your/pred_metagenome_unstrat.tsv"
kegg_abundance <- ko2kegg_abundance(ko_abundance_file) # Or use data(kegg_abundance)

metadata <- read_delim("path/to/your/metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)

# The default DAA method is "ALDEx2"

# Please change group to "your_group_column" if you are not using the example dataset
daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "Environment", daa_method = "LinDA", select = NULL, p.adjust = "BH", reference = NULL)

# If you have more than 3 group levels and want to use the LinDA, limma voom, or Maaslin2 methods, you should provide a reference.
metadata <- read_delim("path/to/your/metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)

# Please change group to "your_group_column" if you are not using the example dataset
daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "Group", daa_method = "LinDA", select = NULL, p.adjust = "BH", reference = "Harvard BRI")

daa_annotated_results_df <- pathway_annotation(pathway = "KO", daa_results_df = daa_results_df, ko_to_kegg = TRUE)

# Please change Group to metadata$your_group_column if you are not using the example dataset
p <- pathway_errorbar(abundance = kegg_abundance,
                      daa_results_df = daa_annotated_results_df,
                      Group = metadata$Environment,
                      ko_to_kegg = TRUE,
                      p_values_threshold = 0.05,
                      order = "pathway_class",
                      select = NULL,
                      p_value_bar = TRUE,
                      colors = NULL,
                      x_lab = "pathway_name")

Or should I use pred_metagenome_unstrat_descrip as my KO abundance file, since it has a description column? I would appreciate clarification on this. Thanks

This image is a snapshot of the pred_metagenome_unstrat.tsv file.
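On the EC question: another issue on this page runs an EC table straight through pathway_daa() and then pathway_annotation(pathway = "EC", ..., ko_to_kegg = FALSE). That suggests an EC-level analysis does not need a KO conversion, though this is our reading of that workflow and not a confirmed answer. That workflow first moves the function-ID column into the row names. The base-R sketch below shows only that loading step. read_unstrat() is a hypothetical helper, not a ggpicrust2 function.

```r
# Hypothetical helper (not part of ggpicrust2): read a PICRUSt2 *_unstrat.tsv
# table into a numeric matrix with the function IDs as row names.
read_unstrat <- function(path) {
  # check.names = FALSE keeps sample headers such as "24" instead of "X24"
  tab <- read.delim(path, check.names = FALSE)
  mat <- as.matrix(tab[, -1, drop = FALSE])  # drop the ID column
  rownames(mat) <- tab[[1]]                  # e.g. "EC:1.1.1.1"
  mat
}

# Self-contained demo on a small temporary EC table
tmp <- tempfile(fileext = ".tsv")
writeLines(c("function\t24\t26",
             "EC:1.1.1.1\t12.5\t3",
             "EC:1.1.1.2\t0\t7"), tmp)
ec_abundance <- read_unstrat(tmp)
print(ec_abundance)
```

Keeping `check.names = FALSE` avoids the `X24`-style renaming that another reporter below had to undo by hand.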

ggpicrust2 no plot!


Hello,
I run the code below:

results <- ggpicrust2(file = ko_abundance,
                      metadata = metadata,
                      group = "Vaccine",
                      pathway = "KO",
                      daa_method = "LinDA",
                      order = "pathway_class",
                      ko_to_kegg = TRUE,
                      x_lab = "pathway_name",
                      p.adjust = "BH",
                      select = NULL,
                      reference = NULL)

and I get this error: Error in switch(file_format, .txt = abundance <- readr::read_delim(file, :
EXPR must be a length 1 vector

What does it mean, and what do I do to correct it?
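One plausible reading of this error assumes that `ko_abundance` is a data frame already loaded in R, not a file path. The message comes from base R's switch(), which needs a single string. The error text suggests ggpicrust2() derives a file extension from `file` and switches on it. If so, anything other than one path string becomes a vector and switch() fails. The sketch reproduces the base-R behaviour with a hypothetical helper, file_extension_of(). It is not ggpicrust2's code.

```r
# Hypothetical helper mimicking an extension-based reader dispatch.
# This is NOT ggpicrust2's code; it only illustrates the switch() requirement.
file_extension_of <- function(file) {
  if (!is.character(file) || length(file) != 1) {
    stop("`file` must be a single path string, not a ", class(file)[1])
  }
  ext <- tolower(tools::file_ext(file))
  switch(ext,
         tsv = ,                      # falls through to the txt branch
         txt = "tab-delimited",
         csv = "comma-delimited",
         stop("Unsupported extension: ", ext))
}

# base R's switch() rejects a character vector of length > 1, which is what a
# data frame becomes when it is coerced to character column by column
msg <- tryCatch(switch(c(".tsv", ".txt"), .tsv = 1, .txt = 2),
                error = function(e) conditionMessage(e))
print(msg)
print(file_extension_of("pred_metagenome_unstrat.tsv"))  # "tab-delimited"
```

If this is the cause, passing the path string should avoid the error. The tutorial call in another issue on this page does exactly that with `file = "pred_metagenome_unstrat.tsv"`.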

pathway_daa(): Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

Hi, I have another question, now about the function pathway_daa

I am running the following codes. While the first works normally, I receive the following error in the second:

daa_results <- pathway_daa(kegg_abundance, meta_table, 'Day', daa_method = 'ALDEx2', select = NULL,
                           p.adjust = 'bonferroni', reference = NULL)

daa_results2 <- pathway_daa(ko_abundance, meta_table, 'Day', daa_method = 'ALDEx2', select = NULL,
                            p.adjust = 'bonferroni', reference = NULL)

ALDEx2 takes a long time to complete the calculation, please wait patiently.
operating in serial mode
computing center with all features
operating in serial mode
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

I am confused about this error, since the only difference between the codes is the first dataframe. Can you please help me with this too?
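Base R's model-matrix code raises this message whenever a factor used in a model has only one level. One thing worth checking is whether `Day` still has at least two levels among the samples that actually appear as columns of `ko_abundance`. This assumes the second call fails because of which samples overlap with `meta_table`, which we have not confirmed. check_group_levels() is a hypothetical helper. The `sample_name` column is an assumption about the metadata layout.

```r
# Hypothetical pre-flight check; not part of ggpicrust2.
check_group_levels <- function(abundance, metadata, group,
                               sample_col = "sample_name") {
  # keep only metadata rows whose sample appears as an abundance column
  shared <- metadata[metadata[[sample_col]] %in% colnames(abundance), , drop = FALSE]
  lv <- levels(droplevels(factor(shared[[group]])))
  if (length(lv) < 2) {
    stop("Group '", group, "' has ", length(lv),
         " level(s) among matched samples; at least 2 are needed.")
  }
  lv
}

# Base R reproduces the original message with a one-level factor:
one_level <- data.frame(y = c(1, 2, 3), g = factor(c("a", "a", "a")))
err <- tryCatch(lm(y ~ g, data = one_level),
                error = function(e) conditionMessage(e))
print(err)

abund <- matrix(1:6, nrow = 2,
                dimnames = list(c("K00001", "K00002"), c("s1", "s2", "s3")))
meta <- data.frame(sample_name = c("s1", "s2", "s3", "s4"),
                   Day = c("D0", "D0", "D7", "D7"))
print(check_group_levels(abund, meta, "Day"))  # "D0" "D7"
```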

Bot test

@all-contributors please init

ggpicrust2(): Error in metadata_mat[, group] : subscript out of bounds

Hi Caffery
This is Frank from Penn State University. Your development of ggpicrust2 is really impressive! I feel blessed to have a package that can help visualize the PICRUSt2 output.
I encountered a problem while running the package and hope you can help.
I ran PICRUSt2 following the pipeline at https://github.com/picrust/picrust2/wiki/PICRUSt2-Tutorial-%28v2.5.0%29, then used the output of this pipeline as the input to ggpicrust2. Then I ran the ggpicrust2 script you wrote on the website:

library(readr)
library(ggpicrust2)

metadata <-
  read_delim(
    "metadata.tsv",
    delim = "\t",
    escape_double = FALSE,
    trim_ws = TRUE
  )

group <- "Environment"

daa_results_list <-
  ggpicrust2(
    file = "pred_metagenome_unstrat.tsv",
    metadata = metadata,
    group = "Environment",
    pathway = "KO",
    daa_method = "LinDA",
    p_values_bar = TRUE,
    p.adjust = "BH",
    ko_to_kegg = TRUE,
    order = "pathway_class",
    select = NULL,
    reference = NULL # If your metadata[,group] has more than two levels, please specify a reference.
  )

It gives the following error message: Error in metadata_mat[, group] : subscript out of bounds
Do you have any suggestions on how to fix this error? Maybe I need to change the group parameter? Thank you.
Best
Frank
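R prints this message when a matrix is indexed by a column name it does not contain. That suggests `group = "Environment"`, the column in the tutorial's example metadata, may not be a column of this metadata.tsv. Below is a minimal pre-check, using check_group_column() as a hypothetical helper name.

```r
# Hypothetical helper (not part of ggpicrust2): fail early with a readable message.
check_group_column <- function(metadata, group) {
  if (!group %in% colnames(metadata)) {
    stop("Column '", group, "' not found. Available columns: ",
         paste(colnames(metadata), collapse = ", "))
  }
  invisible(TRUE)
}

metadata <- data.frame(sample_name = c("s1", "s2"), Site = c("A", "B"))

# Indexing a matrix by a missing column name gives the same base-R error:
metadata_mat <- as.matrix(metadata)
print(tryCatch(metadata_mat[, "Environment"],
               error = function(e) conditionMessage(e)))

check_group_column(metadata, "Site")  # passes silently
```

If the check fails, pass the name of the column that actually holds the grouping as `group`.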

Question: how to extract mapping data in `pathway_errorbar`

Dear cafferychen777,

Thank you for this beautiful tool!

But I have a small question and would like your help.

My question is how to extract the data used in the pathway_errorbar() step.

In particular, how can I obtain the relative abundance data for each group in the above figure?

Many thanks for your help!
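As far as we can tell from the plot, the left panel shows a per-group mean relative abundance for each feature. Assuming that is what is wanted, it can be recomputed directly from the abundance table in base R, without going through the plot object. The sketch assumes features in rows, samples in columns, and a group vector in the same order as the columns. group_relative_abundance() is a hypothetical name, and this is not pathway_errorbar()'s internal code.

```r
# Hypothetical helper (not pathway_errorbar() internals): per-group mean and
# standard deviation of relative abundance for every feature.
group_relative_abundance <- function(abundance, group) {
  stopifnot(ncol(abundance) == length(group))
  # counts -> proportions within each sample (column)
  rel <- sweep(abundance, 2, colSums(abundance), "/")
  means <- t(apply(rel, 1, function(x) tapply(x, group, mean)))
  sds   <- t(apply(rel, 1, function(x) tapply(x, group, sd)))
  list(mean = means, sd = sds)
}

abundance <- matrix(c(10, 30, 20, 20, 5, 15, 30, 10), nrow = 2,
                    dimnames = list(c("ko00010", "ko00020"),
                                    c("s1", "s2", "s3", "s4")))
group <- c("A", "A", "B", "B")  # must follow the column order of abundance
res <- group_relative_abundance(abundance, group)
print(res$mean)

# export the table if needed
write.csv(res$mean, file.path(tempdir(), "group_mean_relative_abundance.csv"))
```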

A bug in the function pathway_errorbar()

Hi,

Thank you for developing this package.

As I ran through the workflow, I got an error from the function pathway_errorbar():
Error in daa_results_df$pathway_name$description : $ operator is invalid for atomic vectors

The problem comes from this line of code in the function:

      if (is.null(daa_results_df$pathway_name)&is.null(daa_results_df$pathway_name$description)){
        message("Please use pathway_annotation to annotate the daa_results_df")
      }

precisely: is.null(daa_results_df$pathway_name$description)

The output of daa_results_df has a description column but not under pathway_name.

This will work if I change it to is.null(daa_results_df$description).

Hope you can fix this.

Thanks,
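The reporter's diagnosis can be reproduced in base R. Using `$` on a character column fails, and `&` evaluates both sides, so the check errors whenever `pathway_name` exists. Below is a sketch of a corrected condition. It assumes the annotation adds either a `pathway_name` or a `description` column, as in the outputs shown in these issues. needs_annotation() is a hypothetical name, not the package's fix.

```r
# Hypothetical version of the check: test column existence, and use && so the
# right-hand side is only evaluated when the left-hand side is TRUE.
needs_annotation <- function(daa_results_df) {
  is.null(daa_results_df$pathway_name) && is.null(daa_results_df$description)
}

annotated <- data.frame(feature = "ko00010", description = "Glycolysis")
raw       <- data.frame(feature = "ko00010", p_adjust = 0.01)
with_name <- data.frame(feature = "ko00010", pathway_name = "Glycolysis")

# The original expression fails once pathway_name is an atomic column:
err <- tryCatch(with_name$pathway_name$description,
                error = function(e) conditionMessage(e))
print(err)

print(needs_annotation(annotated))  # FALSE
print(needs_annotation(raw))        # TRUE
```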

Error in differential abundance analysis

Hi,
I tried to use your package for my dataset and only got it to work partly. The requirements for the input format for the functions are quite rigid, and then also not the same for all the functions. It would be nice if you could harmonise this.
Until then, for everyone with the same problem, here's the code that worked for me:

library(ggpicrust2)
library(phyloseq)  # sample_data()
library(tibble)    # tibble()
library(ggplot2)   # theme(), element_text()
# I used the picrust2_pipeline.py script, so these are the path and name automatically given to the file.
abundance_file <- "picrust2_out_pipeline/KO_metagenome_out/pred_metagenome_unstrat.tsv"
abundance <- ko2kegg_abundance(abundance_file)

# using metadata from a phyloseq object, so first converting it to a normal data frame
meta <- data.frame(sample_data(physeq))
meta$sample_name <- rownames(meta) # important to name the column sample_name
meta <- tibble(meta)

# I'm comparing different water bodies
group <- "water_body"

# this is the only command that needs the abundance table as a matrix and the metadata as a data frame.
# Additionally, the sample names need to be in a column called "sample_name", although this is nowhere stated in the documentation.
pathway_heatmap(as.matrix(abundance), data.frame(meta), group) +
  theme(axis.text.x = element_text(size = 8, color = "black")) # to add sample names

pathway_pca(abundance = abundance, metadata = meta, group = group)

daa <- pathway_daa(abundance = abundance, metadata = meta, group = group,
                   daa_method = "ALDEx2")
daa_kegg <- pathway_annotation(daa_results_df = daa, ko_to_kegg = TRUE)

This is how far I got.
pathway_errorbar() needs the package ggprism to be installed, maybe this should be added as a dependency?
But it still doesn't work, here's the code I tried:

pathway_errorbar(abundance = abundance,
                 daa_results_df = daa_kegg[1:20,],  #subsetting to have less than 30
                 Group = meta$water_body,
                 ko_to_kegg = TRUE,
                 p_values_threshold = 0.05,
                 order = "group",
                 select = NULL,
                 p_value_bar = TRUE,
                 colors = NULL,
                 x_lab = NULL)

The error message is: Insufficient values in manual scale. 3 needed but only 2 provided.
Doesn't work either when I provide colors manually by setting colors = c("blue", "green", "red"). Any idea how to fix this?
In addition, I couldn't figure out how to use the select parameter, tried select = c("ko05340", "ko00562") but that didn't work, so I resorted to subsetting the input dataframe.
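On the manual-scale error: ggplot2 raises "Insufficient values in manual scale" when a manual colour scale has fewer values than the data has groups. The group1/group2 check quoted in another issue on this page suggests pathway_errorbar() compared exactly two groups at the time. If that is the limitation here, one workaround is to plot one pair of water bodies at a time. Subset the metadata and the abundance columns consistently first. subset_two_groups() is a hypothetical helper. `sample_name` is the column this reporter's code creates.

```r
# Hypothetical helper: keep only samples from two chosen group levels, in the
# same order in the metadata, the abundance columns and the Group vector.
subset_two_groups <- function(abundance, metadata, group, levels_keep,
                              sample_col = "sample_name") {
  stopifnot(length(levels_keep) == 2)
  meta_sub <- metadata[metadata[[group]] %in% levels_keep, , drop = FALSE]
  meta_sub <- meta_sub[meta_sub[[sample_col]] %in% colnames(abundance), , drop = FALSE]
  list(abundance = abundance[, meta_sub[[sample_col]], drop = FALSE],
       metadata  = meta_sub,
       Group     = meta_sub[[group]])
}

abundance <- matrix(1:12, nrow = 2,
                    dimnames = list(c("ko05340", "ko00562"), paste0("s", 1:6)))
meta <- data.frame(sample_name = paste0("s", 1:6),
                   water_body = c("lake", "lake", "river", "river", "sea", "sea"),
                   stringsAsFactors = FALSE)
pair <- subset_two_groups(abundance, meta, "water_body", c("lake", "sea"))
print(colnames(pair$abundance))  # "s1" "s2" "s5" "s6"
```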

Other remarks:
It would be cool if you could select to plot PC3 and PC4 in the PCA, not only PC1 and PC2.
How to add the group labels in the heatmap as you did in the paper preprint?
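On the PCA remark: until the plotting function exposes other components, base R's prcomp() can compute the scores, and any pair of components can then be plotted. The sketch assumes a features × samples abundance matrix (simulated here). It is not how pathway_pca() works internally. The log1p transform and the choice not to scale are ours, for illustration only.

```r
# PCA on samples: prcomp() expects observations (samples) in rows.
set.seed(1)
abundance <- matrix(rpois(8 * 6, lambda = 20), nrow = 8,
                    dimnames = list(paste0("ko", 1:8), paste0("s", 1:6)))
pca <- prcomp(t(log1p(abundance)), center = TRUE, scale. = FALSE)

# proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# pick any pair of components, here PC3 vs PC4
scores <- pca$x[, c("PC3", "PC4")]
plot(scores,
     xlab = sprintf("PC3 (%.1f%%)", 100 * var_explained[3]),
     ylab = sprintf("PC4 (%.1f%%)", 100 * var_explained[4]))
```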

Thanks for creating this package!

Error in `ggplot_add()`

why?

Tiny bug in ggpicrust2 function?

Hi there,
thanks a lot for providing this nice package.
I think I found a tiny bug in the ggpicrust2 function:

In line 80 (of ggpicrust2.R) it reads
if (select == "NULL"){
which gives the error "Error in if (select == "NULL") { : argument is of length zero"
shouldn't it be
if (is.null(select)){ ?
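The reported behaviour is easy to confirm in base R. `NULL == "NULL"` returns a zero-length logical, and if() refuses it. Below is a sketch of the suggested is.null() form. choose_features() is a hypothetical stand-in for the surrounding package logic, not ggpicrust2's code.

```r
# Reproduce the reported check with the default select = NULL
select <- NULL
msg <- tryCatch(if (select == "NULL") "all features" else "subset",
                error = function(e) conditionMessage(e))
print(msg)  # "argument is of length zero"

# Hypothetical helper using the suggested fix: test for NULL explicitly
choose_features <- function(features, select = NULL) {
  if (is.null(select)) features else intersect(features, select)
}
print(choose_features(c("ko00010", "ko00020")))
print(choose_features(c("ko00010", "ko00020"), select = "ko00020"))
```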

pathway_errorbar(): "Error in guide_train.prism_offset_minor" when ggprism loaded

Hi again, different issue this time!

Because the KEGG database annotation was taking so long for me, I decided to run through the workflow with the EC annotation instead. Much faster! But I am still running into new errors at the pathway_errorbar() stage. Below are my code and data so that you can hopefully reproduce the issue.
But first, my environment is as follows:
package      loadedversion
ape          5.7-1
dplyr        1.1.2
forcats      1.0.0
genefilter   1.80.3
ggpicrust2   1.6.2
ggplot2      3.4.2
ggprism      1.0.4
ggpubr       0.6.0
HTSSIP       1.4.1
lattice      0.21-8
lubridate    1.9.2
patchwork    1.1.2
permute      0.9-7
phyloseq     1.42.0
purrr        1.0.1
readr        2.1.4
stringr      1.5.0
tibble       3.2.1
tidyr        1.3.0
tidyverse    2.0.0
vegan        2.6-4

Now my code:

library(phyloseq)
library(ggplot2)
library(ape)
library(vegan)
library(ggpubr)
library(tidyverse)
library(genefilter)
library(HTSSIP)
library(ggprism)
library(patchwork)
library(ggpicrust2)


metadata <- read_table("../Input files/ADWMBAT Combined Metadata.txt")

metadata <- metadata %>%
  filter(sex == "F" & all_data == "Y") %>%
  select(!X8) %>%
  mutate(genotype = as.factor(genotype))
# read in sample meta data as a tibble, and set genotype as a factor


ko_abundance <-
  read.delim(
    "../Data/For PICRUSt/picrust2_out_pipeline/EC_metagenome_out/pred_metagenome_unstrat.tsv"
  )
# load the EC count data in

rownames(ko_abundance) <- ko_abundance$function.
ko_abundance <- ko_abundance[, -1]
# remove the first column of function names by setting them as the row names. This is done because it's the format the next step, pathway_daa(), expects to see.

ko_abundance <- ko_abundance %>%
  rename('24' = X24,
         '26' = X26)
# rename some column names to match metadata file


daa_results_df <-
  pathway_daa(
    abundance = ko_abundance,
    metadata = metadata,
    group = "genotype",
    daa_method = "LinDA",
    select = NULL,
    reference = "WT",
    p.adjust = "none"
  )
# run the differential abundance calculation step using LinDA and no p-value adjustment

daa_annotated_sub_method_results_df <-
  pathway_annotation(pathway = "EC",
                     daa_results_df = daa_results_df,
                     ko_to_kegg = FALSE)

# select the top 15 differentially abundant features
# this is done because there were too many "significant" features without FDR correction
daa_annotated_sub_method_results_df_filtered <- daa_annotated_sub_method_results_df %>%
  arrange(p_adjust) %>%
  slice_head(n = 15)

ko_abundance_plot <- ko_abundance %>%
  filter(rownames(ko_abundance) %in% daa_annotated_sub_method_results_df_filtered$feature)
# above was attempted to see if the ko_abundance object needed to be the same dimensions as the daa_results. didn't help.

daa_results_list <-
  pathway_errorbar(
    abundance = ko_abundance,
    daa_results_df = daa_annotated_sub_method_results_df_filtered,
    Group = "genotype",
    p_values_threshold = 0.05,
    order = "group",
    select = NULL,
    ko_to_kegg = FALSE,
    p_value_bar = TRUE,
    colors = NULL,
    x_lab = "description"
  )

print(daa_results_list)

For me executing all this code produces this error:
"Error in guide_train.prism_offset_minor(guide, panel_params[[aesthetic]]) :
No minor breaks exist, guide_prism_offset_minor needs minor breaks to work
In addition: Warning messages:
1: Removed 15 rows containing missing values (geom_bar()).
2: Removed 15 rows containing missing values (geom_stripped_cols()). "

Trying to view() the daa_results_list object issues the following error:
"Error: Index out of bounds"

Here are my files:
pred_metagenome_unstrat.tsv.gz

ADWMBAT Combined Metadata.txt

Thanks for any help you can provide!
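This call differs from the tutorial calls elsewhere on this page in one way. They pass `Group = metadata$Environment`, a vector with one entry per sample, while this call passes the string "genotype". Assuming pathway_errorbar() expects such a vector in the same order as the abundance columns, it can be built by matching sample IDs. align_group() is a hypothetical helper. `sample` stands in for whichever metadata column holds the IDs that match the abundance column names.

```r
# Hypothetical helper (not part of ggpicrust2): one group label per abundance
# column, matched by sample ID so the order is guaranteed to agree.
align_group <- function(abundance, metadata, group, sample_col) {
  idx <- match(colnames(abundance), as.character(metadata[[sample_col]]))
  if (anyNA(idx)) {
    stop("No metadata row for: ",
         paste(colnames(abundance)[is.na(idx)], collapse = ", "))
  }
  metadata[[group]][idx]
}

abundance <- matrix(1:6, nrow = 2,
                    dimnames = list(c("EC:1.1.1.1", "EC:1.1.1.2"),
                                    c("26", "24", "25")))
metadata <- data.frame(sample = c(24, 25, 26),
                       genotype = factor(c("WT", "Mut", "WT")))
print(align_group(abundance, metadata, "genotype", "sample"))
```

The result could then be passed as `Group` in place of the string.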

ggpicrust2(): Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

Hi there,

I'm trying out ggpicrust2 on a small subset of my data (eight samples) to iron out any problems before using the entire dataset. It wouldn't read my .tsv, so I created the metadata tibble by hand. Could you give me more information on the following error?

Calculation may take a long time, please be patient.
The kegg pathway with zero abundance in all the different samples has been removed.
Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

# If you want to analyze KEGG pathway abundance instead of KO abundance within the pathway, set ko_to_kegg to TRUE.
# KEGG pathways typically have more interpretable descriptions.
library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)
metadata <- tibble(
  sample_name = c("lessQ1", "lessQ2", "lessQ3", "lessQ4", "moreQ1", "moreQ2", "moreQ3", "moreQ4"),
  group = c("less", "less", "less", "less", "more", "more", "more", "more")
)

daa_results_list <-
  ggpicrust2(
    file = "h:/picrust2_out/KO_metagenome_out/pred_metagenome_unstrat.tsv",
    metadata = metadata,
    group = "group",
    pathway = "KO",
    daa_method = "LinDA",
    p_values_bar = TRUE,
    p.adjust = "BH",
    ko_to_kegg = TRUE,
    order = "pathway_class",
    select = NULL,
    reference = NULL # If your metadata[,group] has more than two levels, please specify a reference.
  )
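An "invalid 'row.names' length" error usually means two objects that should line up have different lengths. One thing worth ruling out, as an assumption and not a confirmed diagnosis, is a mismatch between the hand-typed `sample_name` values and the column headers of pred_metagenome_unstrat.tsv. The base-R sketch below reads only the file's header line. compare_sample_ids() is a hypothetical helper.

```r
# Hypothetical check (not part of ggpicrust2): compare the sample columns of a
# PICRUSt2 abundance file with the sample IDs given in the metadata.
compare_sample_ids <- function(abundance_file, sample_names) {
  header <- strsplit(readLines(abundance_file, n = 1), "\t", fixed = TRUE)[[1]]
  file_samples <- header[-1]  # first column holds the function IDs
  list(missing_from_metadata = setdiff(file_samples, sample_names),
       missing_from_file     = setdiff(sample_names, file_samples))
}

# Self-contained demo with a temporary file
tmp <- tempfile(fileext = ".tsv")
writeLines(c("function\tlessQ1\tlessQ2\tmoreQ1",
             "K00001\t10\t0\t5"), tmp)
ids <- compare_sample_ids(tmp, c("lessQ1", "lessQ2", "moreQ1", "moreQ2"))
print(ids)
```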

Questions about KEGG pathways

Hi again!

I have two questions about converting KOs to KEGG pathways.

Regardless of whether I am analyzing KOs or KEGG pathways, I am using "pred_metagenome_unstrat.tsv" as my input (please let me know if this is correct). Now I want to extract the description of each KO/KEGG pathway together with its abundance. This was easy to do for the KOs, but I am not sure how to do it for the pathways. Can you please help me with that?

I am also struggling with the pathway_annotation for a different project (pathway_annotation(pathway = 'KO', daa_results_df = daa_results_kegg_DNA2_1, ko_to_kegg = TRUE)). Due to the size, I split my dataframe in 3 before using pathway_annotation (df1, df2, and df3). Somehow, df2 and df3 are promptly annotated but not df1. I even left my computer overnight and nothing happened. I have no idea why this is happening since the dataframes are quite similar. Do you have any thoughts on this?

Thanks,
Andressa
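For the first question, one approach is to merge the annotated results onto the abundance table by pathway ID. This assumes the output of pathway_annotation(..., ko_to_kegg = TRUE) carries `feature` and `pathway_name` columns; the tutorial code in other issues on this page uses both. attach_descriptions() is a hypothetical helper, and the two-pathway table is made up for the demo.

```r
# Hypothetical helper: abundance (rows = pathway IDs) joined to descriptions.
attach_descriptions <- function(abundance, annotated, label_col = "pathway_name") {
  abund_df <- data.frame(feature = rownames(abundance), abundance,
                         check.names = FALSE)
  labels <- unique(annotated[, c("feature", label_col)])
  # all.y = TRUE keeps pathways that have no description (label becomes NA)
  merge(labels, abund_df, by = "feature", all.y = TRUE)
}

kegg_abundance <- matrix(c(5, 1, 7, 3), nrow = 2,
                         dimnames = list(c("ko00010", "ko00020"), c("s1", "s2")))
annotated <- data.frame(feature = c("ko00010", "ko00020"),
                        pathway_name = c("Glycolysis / Gluconeogenesis",
                                         "Citrate cycle (TCA cycle)"))
res <- attach_descriptions(kegg_abundance, annotated)
print(res)
```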

Question: how to annotate pathway information using the output file from PICRUSt2?

Dear developers,

Thank you for this tool!

I was wondering if there is an example file for using the pathway_annotation function in the following case:

# Use case 1: Annotating pathway information using the output file from PICRUSt2
result1 <- pathway_annotation(file = "path/to/picrust2/export/file.txt",
                              pathway = "KO",
                              daa_results_df = NULL,
                              ko_to_kegg = FALSE)

I tried using the default output from PICRUSt2 that contains the pathway descriptions (i.e. path_abun_unstrat_descrip.tsv) as follows, but it didn't work:

pathways <-
  read_delim(
    "path_abun_unstrat_descrip.tsv",
    delim = "\t",
    escape_double = TRUE,
    trim_ws = TRUE
  )

daa_annotated_sub_method_results_df <- pathway_annotation(file = pathways,
                                                          pathway = "KO",
                                                          daa_results_df = NULL,
                                                          ko_to_kegg = FALSE)

Any suggestions?

Thank you in advance!

Cheers,

Lia

pathway_annotation(): Error in `$<-.data.frame`(`*tmp*`, "pathway_description"......", : replacement has 3 rows, data has 1

Hi Caffery,

Thanks a lot for this useful package!
I am analyzing some data and keep getting the following error:

Do you have any suggestions to fix it?

Thanks for your help!

Best wishes,

Eliana
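The message itself comes from base R: it appears when a length-3 vector is assigned to a column of a one-row data frame. Our guess at the cause, not checked against ggpicrust2's source, is that one ID matched several annotation entries. If so, collapsing the matches into one string per ID avoids the mismatch. collapse_matches() is a hypothetical helper.

```r
one_row <- data.frame(feature = "ko00010")

# Reproduce the base-R error: 3 values for a 1-row data frame
err <- tryCatch({
  one_row$pathway_description <- c("a", "b", "c")
  NULL
}, error = function(e) conditionMessage(e))
print(err)  # "replacement has 3 rows, data has 1"

# Hypothetical helper: one value per ID, multiple matches joined with sep
collapse_matches <- function(ids, lookup_id, lookup_value, sep = "; ") {
  vapply(ids, function(id) {
    hits <- unique(lookup_value[lookup_id == id])
    if (length(hits) == 0) NA_character_ else paste(hits, collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

one_row$pathway_description <- collapse_matches(
  one_row$feature,
  lookup_id    = c("ko00010", "ko00010", "ko00020"),
  lookup_value = c("desc A", "desc B", "desc C"))
print(one_row)
```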

pathway_errorbar(): "The 'group1' or 'group2' column in the 'daa_results_df' data frame contains more than one group. Please filter each to contain only one group."

Hello!

I am having an issue when running ggpicrust2 function and, when I tried to explore the possible source of error, I found out that the error is happening with the pathway_errorbar function. The title of the issue is the error message I am getting... no details...

Then I looked into the pathway_errorbar function and saw that an error can occur if my target variable has more than 2 categories (I have 4, and used Group 1 as the reference for LinDA):

if (nlevels(factor(daa_results_df$group1)) != 1 || nlevels(factor(daa_results_df$group2)) != 1) {
  message(
    "The 'group1' or 'group2' column in the 'daa_results_df' data frame contains more than one group. Please filter each to contain only one group."
  )
}

I believe this is my error, but I am not getting this error message as output.

Do you have any suggestions (a tutorial?) on how to perform this analysis, since I have 4 categories in my target attribute?

Amazing package! Thanks in advance for your attention!
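The check quoted above suggests one workaround for a four-level group: plot one comparison at a time. Keep only the rows of daa_results_df whose `group1` is the reference and whose `group2` is a single other level. Then call pathway_errorbar() once per level, with the abundance and Group restricted to the same two groups. split_by_comparison() is a hypothetical helper, and the group labels below are placeholders.

```r
# Hypothetical helper: split multi-group DAA results into one data frame per
# reference-vs-level comparison, so each has a single group1 and group2.
split_by_comparison <- function(daa_results_df, reference) {
  ref_rows <- daa_results_df[daa_results_df$group1 == reference, , drop = FALSE]
  split(ref_rows, ref_rows$group2)
}

daa_results_df <- data.frame(
  feature  = rep(c("ko00010", "ko00020"), times = 3),
  group1   = "G1",
  group2   = rep(c("G2", "G3", "G4"), each = 2),
  p_adjust = c(0.01, 0.20, 0.03, 0.04, 0.50, 0.02)
)
per_comparison <- split_by_comparison(daa_results_df, reference = "G1")
print(names(per_comparison))   # "G2" "G3" "G4"
print(per_comparison[["G3"]])
```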

pathway_errorbar(): Error: Index out of bounds

Hello @cafferychen777, I hope you are doing well. I have been trying ggpicrust2, which I find amazing. I started with the provided tutorials, and plotting the results worked correctly. However, I ran into a problem when analyzing my own results. Whenever I try to plot, using either the suggested mode or the step-by-step approach, I get the error "Error: Index out of bounds" and no plot is generated. I looked through past reports of the same error, but the solutions provided didn't work for me. I'm attaching the files used for the analysis: https://drive.google.com/drive/folders/1G48A4LPkMy6Vko7zLx3egtDtPVOdBEm-?usp=sharing

library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)

data(metadata)
metadata <- read_delim("analisis/metadata4.tsv", delim = "\t", escape_double = FALSE, trim_ws = TRUE)

kegg_abundance <- ko2kegg_abundance("analisis/pred_metagenome_unstrat.tsv") 

daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "Grupos", daa_method = "LinDA", select = NULL, reference = NULL) 

daa_sub_method_results_df <- daa_results_df[daa_results_df$method == "LinDA", ]

daa_annotated_sub_method_results_df <- pathway_annotation(pathway = "KO", daa_results_df = daa_sub_method_results_df, ko_to_kegg = TRUE)
daa_annotated_sub_method_results_df <- daa_annotated_sub_method_results_df[!is.na(daa_annotated_sub_method_results_df$pathway_name),]

daa_annotated_sub_method_results_df$p_adjust <- round(daa_annotated_sub_method_results_df$p_adjust,5)

low_p_feature <- daa_annotated_sub_method_results_df[order(daa_annotated_sub_method_results_df$p_adjust), ]$feature[1:20]

p <- ggpicrust2::pathway_errorbar(
  abundance = kegg_abundance,
  daa_results_df = daa_annotated_sub_method_results_df,
  Group = metadata$Grupos,
  p_values_threshold = 0.05,
  order = "pathway_class",
  select = low_p_feature,
  ko_to_kegg = TRUE,
  p_value_bar = FALSE,
  colors = NULL,
  x_lab = "pathway_name"
)

#Error result

> p <- ggpicrust2::pathway_errorbar(
+   abundance = kegg_abundance,
+   daa_results_df = daa_annotated_sub_method_results_df,
+   Group = metadata$Grupos,
+   p_values_threshold = 0.05,
+   order = "pathway_class",
+   select = low_p_feature,
+   ko_to_kegg = TRUE,
+   p_value_bar = FALSE,
+   colors = NULL,
+   x_lab = "pathway_name"
+ )
Error: Index out of bounds
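One thing worth ruling out here, as an assumption and not a confirmed cause: `$feature[1:20]` pads the result with NA when fewer than 20 features remain after the `!is.na(pathway_name)` filter. In that case `select` contains NA values. head() avoids this. top_features() is a hypothetical helper.

```r
# Hypothetical helper: the n smallest-p_adjust features, without NA padding.
top_features <- function(daa_df, n = 20) {
  ordered <- daa_df[order(daa_df$p_adjust), , drop = FALSE]
  head(ordered$feature, n)  # returns fewer than n if fewer exist
}

daa_df <- data.frame(feature = c("ko00010", "ko00020", "ko00030"),
                     p_adjust = c(0.04, 0.001, 0.2))
print(daa_df[order(daa_df$p_adjust), ]$feature[1:5])  # padded with NA
print(top_features(daa_df, 5))                         # no NA
```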

pathway_heatmap not returning as expected

I ran the following code

pathway_heatmap(t(abundance_matrix), metadata, "Txt")

with these files.

metadata.txt
abundance_matrix.csv

I then got something that looks like this:

Even ignoring the crazy size, it looks like all the cells of the heatmap are approximately the same color, so there's no range represented. I know that there are significant differences between my VEH and MOR level of the Txt group (see below).

What is going on?
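Two things are worth checking; both are assumptions about the cause. First, the matrix passed in should have features as rows and samples as columns, and the `t()` call may flip it the wrong way. Second, values should be compared per feature, not on the raw scale, where a few highly abundant features can dominate the colour range. The base-R sketch below covers both. orient_and_scale() is hypothetical, and row-wise z-scores are our choice, not necessarily what pathway_heatmap() does.

```r
# Hypothetical helper: put samples in columns, then z-score each feature.
orient_and_scale <- function(mat, sample_ids) {
  # transpose if the sample IDs sit in the row names instead of the columns
  if (all(sample_ids %in% rownames(mat)) && !all(sample_ids %in% colnames(mat))) {
    mat <- t(mat)
  }
  mat <- mat[, sample_ids, drop = FALSE]
  # row-wise z-scores: each feature centred and scaled across samples
  z <- t(scale(t(mat)))
  z[is.nan(z)] <- 0  # constant features have zero variance
  z
}

samples_by_features <- matrix(c(1, 2, 3, 100, 200, 300), nrow = 3,
                              dimnames = list(c("VEH1", "VEH2", "MOR1"),
                                              c("PWY-1", "PWY-2")))
z <- orient_and_scale(samples_by_features, c("VEH1", "VEH2", "MOR1"))
print(round(z, 2))
```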

compare_daa_results(): Maaslin2 method changed the names of the metacyc pathways, replacing a hyphen (-) with a dot (.)

I had an issue with this function with my own data, but also with the sample dataset:

library(ggpicrust2)
library(tidyverse)
data("metacyc_abundance")
data("metadata")

methods <- c("ALDEx2", "DESeq2", "edgeR")
daa_results_list <- lapply(methods, function(method) {
  pathway_daa(abundance = metacyc_abundance %>% column_to_rownames("pathway"), metadata = metadata, group = "Environment", daa_method = method)
})

method_names <- c("ALDEx2_Welch's t test","ALDEx2_Wilcoxon rank test","DESeq2", "edgeR")

comparison_results <- compare_daa_results(daa_results_list = daa_results_list, method_names = method_names)

daa_results_list contains three elements, but method_names has four. As a result, compare_daa_results() returns a table in which the method names do not correspond to the methods actually used. The two ALDEx2 variants ("ALDEx2_Welch's t test" and "ALDEx2_Wilcoxon rank test") come from the same pathway_daa() result. The comparison table, however, treats them as separate entries, so the names end up assigned to the wrong method results.
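The labels can be kept in step with the results by splitting each result by its `method` column and using the resulting names, instead of writing method_names by hand. This assumes each pathway_daa() result carries a `method` column, as the filtering code in other issues here relies on. flatten_by_method() is a hypothetical helper. We have not confirmed that compare_daa_results() accepts a list built this way. The toy results below are made up.

```r
# Hypothetical helper: one list element per method label, named by that label.
flatten_by_method <- function(daa_results_list) {
  do.call(c, lapply(daa_results_list, function(df) split(df, df$method)))
}

daa_results_list <- list(
  data.frame(feature = c("PWY-1", "PWY-1"),
             method = c("ALDEx2_Welch's t test", "ALDEx2_Wilcoxon rank test"),
             p_adjust = c(0.01, 0.02)),
  data.frame(feature = "PWY-1", method = "DESeq2", p_adjust = 0.03),
  data.frame(feature = "PWY-1", method = "edgeR", p_adjust = 0.04)
)
flat <- flatten_by_method(daa_results_list)
method_names <- names(flat)
print(method_names)
```

Because `method_names` is derived from the split, its length matches the number of results by construction.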

Error after pathway_errorbar

Hi (and thanks for creating this tool!)

I am running ggpicrust2 one function at a time but struggling with the pathway_errorbar function.

My code
kegg_abundance <- ko2kegg_abundance(abundance_file)

daa_results <- pathway_daa(kegg_abundance, meta_table, 'Day', daa_method = 'ALDEx2', select = NULL,
                           p.adjust = 'bonferroni', reference = NULL)

daa_results_method <- daa_results[daa_results$method == 'ALDEx2_Kruskal-Wallace test', ]

daa_results_method1 <- daa_results_method[1:100,]
daa_results_method2 <- daa_results_method[101:200,]
daa_results_method3 <- daa_results_method[201:253,]

pa_results1 <- pathway_annotation(pathway = 'KO', daa_results_df = daa_results_method1, ko_to_kegg = TRUE)
pa_results2 <- pathway_annotation(pathway = 'KO', daa_results_df = daa_results_method2, ko_to_kegg = TRUE)
pa_results3 <- pathway_annotation(pathway = 'KO', daa_results_df = daa_results_method3, ko_to_kegg = TRUE)

pa_results <- rbind(pa_results1, pa_results2, pa_results3)

pa_results_padj <- subset(pa_results, p_adjust < 0.001)

Group <- meta_table$Day

daa_results_list <- pathway_errorbar(abundance = kegg_abundance,
                                     daa_results_df = pa_results,
                                     Group = Group,
                                     ko_to_kegg = TRUE,
                                     p_values_threshold = 0.05,
                                     order = "pathway_class",
                                     select = 'NULL',
                                     p_value_bar = TRUE,
                                     colors = NULL,
                                     x_lab = 'description')

Output of pathway_errorbar
The feature with statistically significance is zero, pathway_errorbar can't do the visualization.
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'physeq' in selecting a method for function 'taxa_are_rows': invalid class “otu_table” object:
OTU abundance data must have non-zero dimensions.

Can you please help me with this error? Everything else worked fine! My pa_results table doesn't have a column called x_lab. Is this the problem?
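Separately from the error, the manual three-way split before pathway_annotation() can be written as a loop over fixed-size chunks. This keeps working when the number of rows changes. annotate_in_chunks() is hypothetical. In practice `annotate_fun` would be something like function(d) pathway_annotation(pathway = 'KO', daa_results_df = d, ko_to_kegg = TRUE). A stand-in function is used here so the sketch runs offline.

```r
# Hypothetical helper: apply an annotation function chunk by chunk, then rbind.
annotate_in_chunks <- function(df, annotate_fun, chunk_size = 100) {
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)  # 1,1,...,2,2,...,3
  pieces <- lapply(split(df, chunk_id), annotate_fun)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Offline stand-in for the real annotation call
daa_results_method <- data.frame(feature = sprintf("ko%05d", 1:253),
                                 p_adjust = seq(0, 1, length.out = 253))
fake_annotate <- function(d) transform(d, pathway_name = paste("name of", feature))
pa_results <- annotate_in_chunks(daa_results_method, fake_annotate)
```

Note also that the plotting call above passes `pa_results`, not the filtered `pa_results_padj`.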

pathway_daa(): object 'p_values_df' not found

Hello,

I am new to picrust2 and ggpicrust2.

I have obtained the output files from picrust2 and wanted to analyse using ggpicrust2, however, keep receiving an error using the ggpicrust2 function.

Using this command obtained from the tutorial:

results_file_input <- ggpicrust2(file = abundance_file,
                                 metadata = metadata,
                                 group = "Disease",
                                 pathway = "KO",
                                 daa_method = "Maaslin2",
                                 reference = "Healthy",
                                 ko_to_kegg = TRUE,
                                 p.adjust = "BH",
                                 order = "pathway_class",
                                 p_values_bar = FALSE,
                                 x_lab = "pathway_name")

I receive an error with every daa_method I try.

For LinDA: "Error in ggpicrust2(file = abundance_file, metadata = metadata, group = "Disease", : There are no statistically significant biomarkers". I know this is not an actual error but the statistical result that there are no significant biomarkers.

For Maaslin2: "Error in p.adjust(p_values_df$p_values, method = "none") : object 'p_values_df' not found". Even when I set p.adjust = "none", I still receive this error.

For DESeq2 and metagenomeSeq: "Error in if (sum(as.numeric(daa_results_df$p_adjust <= 0.05)) == 0) { : missing value where TRUE/FALSE needed".

Ironically, the only DAA method that works is "limma voom", even though I am not using RNA-seq data.

Are there any suggestions you can offer?

Thank you,
Carmen
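The DESeq2/metagenomeSeq message is what base R's if() reports when its condition is NA. That happens when at least one adjusted p-value is NA and the count is taken without `na.rm`. Below is a sketch of an NA-tolerant count. count_significant() is hypothetical. We have not confirmed where the NA values come from; features filtered out by the method are one possibility.

```r
p_adjust <- c(0.01, NA, 0.20)

# Reproduce the reported condition: one NA makes the whole sum NA
msg <- tryCatch(
  if (sum(as.numeric(p_adjust <= 0.05)) == 0) "none" else "some",
  error = function(e) conditionMessage(e))
print(msg)  # "missing value where TRUE/FALSE needed"

# Hypothetical NA-tolerant version of the count
count_significant <- function(p_adjust, threshold = 0.05) {
  sum(p_adjust <= threshold, na.rm = TRUE)
}
print(count_significant(p_adjust))  # 1
```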

pathway_daa(): There are no statistically significant biomarkers

What could this error mean? My metadata file seems to be okay, but this is the result I get when I run the ggpicrust2() function.

pathway_annotation(): Error in curl::curl_fetch_memory(url, handle = handle) : Failure when receiving data from the peer

Dear Caffery,

Thank you for developing the ggpicrust2 package!
I followed the code in the tutorial with my own dataset and ran pathway_annotation():
daa_annotated_sub_method_results_df <- pathway_annotation(pathway = "KO", daa_results_df = daa_sub_method_results_df[1:50,], ko_to_kegg = TRUE)
but encountered the following error:
Error in curl::curl_fetch_memory(url, handle = handle) : Failure when receiving data from the peer
May I have some advice on how to solve it, thank you!

Best regards,
Ivan
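This error comes from curl: the connection to the remote server dropped mid-transfer. The last issue title on this page notes that pathway_annotation() connects to the KEGG database, so the failure may be transient. Assuming it is, one generic mitigation is to retry the call a few times with a pause, keeping chunks small. with_retries() is a hypothetical helper. In practice `f` would wrap the pathway_annotation() call, and the demo uses a function that fails twice.

```r
# Hypothetical helper: call f() up to `attempts` times, pausing between tries.
with_retries <- function(f, attempts = 3, wait_seconds = 5) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait_seconds)
  }
  stop(result)  # re-signal the last error
}

# Demo: a stand-in that fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Failure when receiving data from the peer")
  "annotated"
}
out <- with_retries(flaky, attempts = 3, wait_seconds = 0)
print(out)
```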

pathway_errorbar error in evaluating the argument 'physeq'

pathway_errorbar(): color was not mapped on the road when saving

Hello @cafferychen777, I used the ggsave function to save the pathway_errorbar plot in PDF format and found that the colors were not mapped onto the road. When I saved it in PNG format, the colors were mapped normally. What's the problem?
Code:
library(ggpicrust2)
library(tidyverse)
library(GGally)
library(ggprism)
library(patchwork)
library(ggh4x)
data("ko_abundance")
data("metadata")
kegg_abundance <- ko2kegg_abundance(data = ko_abundance)
daa_results_df <- pathway_daa(kegg_abundance, metadata = metadata,
                              group = "Environment", daa_method = "LinDA")
daa_annotated_results_df <- pathway_annotation(pathway = "KO",
                                               daa_results_df = daa_results_df, ko_to_kegg = TRUE)
pathway_errorbar(abundance = kegg_abundance,
                 daa_results_df = daa_annotated_results_df,
                 Group = metadata$Environment,
                 ko_to_kegg = TRUE,
                 p_values_threshold = 0.05,
                 order = "pathway_class",
                 select = NULL,
                 p_value_bar = TRUE,
                 colors = NULL,
                 x_lab = "pathway_name")
ggsave(filename = "pathway_errobar1.pdf", width = 35, height = 15, units = "cm", dpi = 600)
ggsave(filename = "pathway_errobar1.png", width = 35, height = 15, units = "cm", dpi = 600)

pathway_errobar1.pdf

Error in pathway_errorbar(abundance = metcyc_abundance_data %>% column_to_rownames("pathway"), : The feature with statistically significance is zero, pathway_errorbar can't do the visualization.

Sorry for constantly posting issues. I was able to make PCA plots of my MetaCyc data, but I cannot make error bar plots of the pathways. Here is my code:

metcyc_abundance_file <- "path_abun_unstratroots.tsv"
metcyc_abundance_data <- read_delim(metcyc_abundance_file, delim = "\t", col_names = TRUE, trim_ws = TRUE)
view(metcyc_abundance_data)
metadataM <- read_delim(
"~/Desktop/microbiome2/MetadataROOTS.txt",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE)
view(metcyc_abundance_data)
view(metadataM)

# This is the process of getting an error bar, heat map and hopefully PCA of MetaCyc abundance

# Perform differential abundance analysis

#START HERE
metacyc_daa_results_df <- pathway_daa(abundance = metcyc_abundance_data %>% column_to_rownames("pathway"), metadata = metadataM, group = "ENV_C", daa_method = "ALDEx2")

# Annotate the results

metacyc_daa_annotated_results_df <- pathway_annotation(pathway = "MetaCyc", daa_results_df = metacyc_daa_results_df, ko_to_kegg = FALSE)

view(meta)
metacyc_daa_annotated_results_df <- metacyc_daa_annotated_results_df[!is.na(metacyc_daa_annotated_results_df$description),]
metacyc_daa_annotated_results_df$p_adjust <- round(metacyc_daa_annotated_results_df$p_adjust,5)
low_p_feature <- metacyc_daa_annotated_results_df[order(metacyc_daa_annotated_results_df$p_adjust), ]$description[1:20]

pathway_errorbar(abundance = metcyc_abundance_data %>% column_to_rownames("pathway"), daa_results_df = metacyc_daa_annotated_results_df, Group = metadataM$ENV_C, ko_to_kegg = FALSE, p_values_threshold = 0.05, order = "group", select = low_p_feature, p_value_bar = TRUE, colors = NULL, x_lab = "description")

I get the error Error in pathway_errorbar(abundance = metcyc_abundance_data %>% column_to_rownames("pathway"), :
The feature with statistically significance is zero, pathway_errorbar can't do the visualization.
Does that mean that I cannot visualize these error bars at all? I was able to make error bars with the KEGG pathways, and for my project having the descriptions of the Metacyc pathways would be good as well.

Thank you for your quick and helpful responses to my errors.
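The other examples on this page fill `select` from the `feature` column (for example `dplyr::select(feature)`), while the code above passes pathway descriptions. Assuming `pathway_errorbar()` matches `select` against feature IDs, no descriptions would match and zero features would remain. The sketch below, on a small made-up results table, picks the lowest-p_adjust feature IDs instead; `head()` also avoids the NA entries that `[1:20]` produces when fewer than 20 rows remain.

```r
# Made-up stand-in for metacyc_daa_annotated_results_df.
results <- data.frame(
  feature = c("PWY-101", "PWY-102", "PWY-103", "PWY-104"),
  description = c("pathway A", "pathway B", NA, "pathway D"),
  p_adjust = c(0.20, 0.01, 0.03, 0.04),
  stringsAsFactors = FALSE
)

# Drop unannotated rows, then sort by adjusted p-value.
annotated <- results[!is.na(results$description), ]
annotated <- annotated[order(annotated$p_adjust), ]

# Keep feature IDs (not descriptions) for the `select` argument.
top_n <- 20
low_p_feature <- head(annotated$feature, top_n)

# How many of the selected features would pass p_values_threshold = 0.05?
n_significant <- sum(annotated$p_adjust[annotated$feature %in% low_p_feature] < 0.05)
```

If `n_significant` is zero for your real table, the plot would still be empty regardless of `select`.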

pathway_annotation() long runtime, connecting to KEGG database

Question: Is there a, general, expected runtime for the KEGG database connection step of the pathway_annotation() command?

This is my first time using the package, and I am passing (what I think is) a relatively small number of features, 228, to the annotation step - yet it's been running at the "We are connecting to the KEGG database to get the latest results, please wait patiently." step for ~6 hours.

If this runtime is expected, would it be possible to download the annotations ourselves and pass them to pathway_annotation() locally?

Thanks for any help you can provide!
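Whatever the expected runtime, a slow lookup only has to happen once if the annotated table is saved and reloaded afterwards. A minimal sketch using base R `saveRDS()`/`readRDS()`; `cached()` and the cache file name are hypothetical, and the first run is just as slow as before.

```r
# Hypothetical helper: run compute() once and reuse the result from disk.
cached <- function(cache_file, compute) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  result <- compute()
  saveRDS(result, cache_file)
  result
}

# Usage (the pathway_annotation() call needs network access to KEGG):
# annotated <- cached("ko_annotation_cache.rds", function() {
#   pathway_annotation(pathway = "KO", daa_results_df = daa_results_df,
#                      ko_to_kegg = TRUE)
# })
```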

pathway_annotation(): Error in $<-.data.frame(*tmp*, "pathway_description"

Hi- thanks a ton for developing ggpicrust2.

I encounter the following error both when I use the full pipeline command, ggpicrust2(), and in the manual step-by-step mode suggested in case of errors. The issue appears to be a line break in the KEGG description for ko04910.

https://www.genome.jp/dbget-bin/www_bget?ko04910

Is there a straightforward fix I can make locally? I'm fairly proficient in python but have very little experience with R.

We are connecting to the KEGG database to get the latest results, please wait patiently.
Error in $<-.data.frame(*tmp*, "pathway_description", value = c("Insulin binding to its receptor results in the tyrosine phosphorylation of insulin receptor substrates (IRS) by the insulin receptor tyrosine kinase (INSR). This allows association of IRSs with the regulatory subunit of phosphoinositide 3-kinase (PI3K). PI3K activates 3-phosphoinositide-dependent protein kinase 1 (PDK1), which activates Akt, a serine kinase. Akt in turn deactivates glycogen synthase kinase 3 (GSK-3), leading to activation of glycogen synthase (GYS) and thus glycogen synthesis. Activation of Akt also results in the translocation of GLUT4 vesicles from their intracellular pool to the plasma membrane, where they allow uptake of glucose into the cell. Akt also leads to mTOR-mediated activation of protein synthesis by eIF4 and p70S6K. The translocation of GLUT4 protein is also elicited through the CAP/Cbl/TC10 pathway, once Cbl is phosphorylated by INSR.", :
replacement has 2 rows, data has 1
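Base R raises `replacement has 2 rows, data has 1` when a two-element vector is assigned into a single data-frame cell, which fits the idea that the KEGG DESCRIPTION for ko04910 comes back as two strings. A sketch of a local workaround under that assumption: collapse the field into one string before assigning it. The record below is made up and only mimics that structure.

```r
# Made-up record mimicking a KEGG entry whose DESCRIPTION has two elements.
record <- list(DESCRIPTION = c("First paragraph of the description.",
                               "Second paragraph of the description."))

pathway_table <- data.frame(row.names = "ko04910")

# Assigning the raw two-element vector is what triggers
# "replacement has 2 rows, data has 1"; collapse it to one string first.
description <- record$DESCRIPTION
if (!is.null(description)) {
  pathway_table["ko04910", "pathway_description"] <- paste(description, collapse = " ")
}
```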

pathway_errorbar(): `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [3] is duplicated

Creating pathway error bar plots...
Error in `levels<-`(`*tmp*`, value = as.character(levels)) : 
  factor level [3] is duplicated

when running

results_file <- ggpicrust2(file = abundance_file,
                           metadata = metadata,
                           group = "treatment",
                           pathway = "KO",
                           daa_method = "LinDA",
                           ko_to_kegg = TRUE,
                           order = "pathway_class",
                           p_values_bar = TRUE,
                           x_lab = "pathway_name",
                           reference = "C")

My metadata has a column "treatment" with 6 different levels, but levels(metadata$treatment) returns NULL. The same error occurs after converting it to a factor with metadata$treatment <- factor(metadata$treatment, levels = c("A", "B", "C", "D", "E", "F")).
I can run ggpicrust2 with reference "A" and "B" without any issues and I can run it with another dataset with similar metadata structure without any issues.
Could you please help me fix this error?
Thank you!
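`factor level [3] is duplicated` comes from base R `factor()`/`levels<-` when the requested levels contain the same value twice. Which variable inside the plotting code ends up with duplicate levels is not shown here; one hedged guess is that two selected features share the same label used for `x_lab` or `order`. The sketch, on made-up labels, reproduces the error and lists the duplicated values so you can look for them in your annotated results.

```r
# Made-up labels; imagine these are the pathway names used on the x axis.
pathway_names <- c("Glycolysis", "TCA cycle", "Glycolysis", "Lysine degradation")

# Using these values directly as factor levels reproduces the error.
msg <- tryCatch(factor(pathway_names, levels = pathway_names),
                error = function(e) conditionMessage(e))

# List the values that appear more than once.
dup_values <- unique(pathway_names[duplicated(pathway_names)])

# Building the levels from unique() values works.
ok <- factor(pathway_names, levels = unique(pathway_names))
```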

finding KO's that contributed to KEGG pathway, KO_to_kegg=FALSE, and Implementing LEfSe

I have four different questions - I hope it's not too much to post them here all at once! :)

I successfully used ggpicrust2 to run a DESeq2 analysis on my data (code below):

results_DESeq2_BH <- ggpicrust2(file = ko_abundance, metadata = metadata, group = "case_control_status", pathway = "KO", daa_method = "DESeq2", reference = "Control", p.adjust = "BH", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")

This gives me output with plots; however, I also get this printed out:

Performing pathway differential abundance analysis... DESeq2 is only suitable for comparison between two groups. converting counts to integer mode it appears that the last variable in the design formula, 'Group_group_nonsense', has a factor level, 'Control', which is not the reference level. we recommend to use factor(...,levels=...) or relevel() to set this as the reference level before proceeding. for more information, please see the 'Note on factor levels' in vignette('DESeq2').

Control is, in fact, a level of "case_control_status", so I'm not sure how to fix this.
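The reference level of a factor is set in base R with `relevel()`, sketched below on made-up metadata. Because the message names an internal variable ('Group_group_nonsense'), ggpicrust2 may rebuild the grouping factor itself, so this step is not guaranteed to silence the message; it only shows how to put "Control" first.

```r
# Made-up metadata with the grouping column from this issue.
metadata <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  case_control_status = c("Case", "Control", "Case", "Control"),
  stringsAsFactors = FALSE
)

# Convert to a factor and make "Control" the first (reference) level.
metadata$case_control_status <- relevel(factor(metadata$case_control_status),
                                        ref = "Control")
levels(metadata$case_control_status)
```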

I get 3 KEGG pathways that are significantly differentially abundant from the above code, which is great. I would like to know which KOs contributed to each KEGG pathway; is there a way to find this out? For example, the pathway "Epithelial cell invasion" is upregulated in my treatment. How do I find out which KOs specifically contributed to this "Epithelial cell invasion" pathway?
When I change ko_to_kegg to FALSE (as shown below), I get this error:
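Finding the contributing KOs needs a pathway-to-KO mapping plus KO-level results (for example from a run with ko_to_kegg = FALSE). How you obtain the mapping is outside this sketch; the IDs below are placeholders, not real KEGG content. It only shows the intersection step.

```r
# Hypothetical pathway-to-KO mapping (placeholder IDs, not real KEGG content).
pathway_to_ko <- data.frame(
  pathway = c("path_X", "path_X", "path_X", "path_Y"),
  ko = c("KO_1", "KO_2", "KO_3", "KO_4"),
  stringsAsFactors = FALSE
)

# Hypothetical KO-level differential abundance results.
ko_results <- data.frame(
  feature = c("KO_1", "KO_2", "KO_3", "KO_4"),
  p_adjust = c(0.01, 0.30, 0.04, 0.02),
  stringsAsFactors = FALSE
)

# KOs that belong to the pathway of interest and are themselves significant.
kos_in_pathway <- pathway_to_ko$ko[pathway_to_ko$pathway == "path_X"]
contributing <- ko_results[ko_results$feature %in% kos_in_pathway &
                             ko_results$p_adjust < 0.05, ]
contributing
```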

results_DESeq2_BH <- ggpicrust2(file = ko_abundance, metadata = metadata, group = "case_control_status", pathway = "KO", daa_method = "DESeq2", reference = "Control", p.adjust = "BH", ko_to_kegg = FALSE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")

Performing pathway differential abundance analysis... DESeq2 is only suitable for comparison between two groups. converting counts to integer mode it appears that the last variable in the design formula, 'Group_group_nonsense', has a factor level, 'Control', which is not the reference level. we recommend to use factor(...,levels=...) or relevel() to set this as the reference level before proceeding. for more information, please see the 'Note on factor levels' in vignette('DESeq2'). using pre-existing size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing -- replacing outliers and refitting for 1699 genes -- DESeq argument 'minReplicatesForReplace' = 7 -- original counts are preserved in counts(dds) estimating dispersions fitting model and testing Annotating pathways... Creating pathway error bar plots... Error in [.data.frame(daa_results_df, , x_lab) : undefined columns selected

I suspect I have the wrong label for the x-axis, but I don't know what the appropriate one is here.

I would like to implement LEfSe to analyze this data as well, but when I put it in as a method the code won't run. Is there something I am doing wrong on input?

results_Lefse_BH <- ggpicrust2(file = ko_abundance, metadata = metadata, group = "case_control_status", pathway = "KO", daa_method = "Lefse", reference = "Control", p.adjust = "BH", ko_to_kegg = TRUE, order = "pathway_class", p_values_bar = TRUE, x_lab = "pathway_name")

Performing pathway differential abundance analysis... Error in p.adjust(p_values_df$p_values, method = "BH") : object 'p_values_df' not found

Thank you so much!

Error: could not find function "memory.limit" Execution halted Warning in install.packages : installation of package 'ggpicrust2' had non-zero exit status

Hi, when I install ggpicrust2 I get an error. Does anyone know how to fix it? I am using a Mac with an M1 chip. I can't find any information related to this issue on the internet; it would be great if somebody could help me. Thanks so much!

pathway_errorbar(): Can't add `e2` to a <ggplot> object.

I ran the following lines of code and got an error while running the pathway_errorbar() code:

# If you want to analyze KEGG pathway abundance instead of KO within the pathway, turn ko_to_kegg to TRUE.

# KEGG pathways typically have more explainable descriptions.

# Load metadata as a tibble

data(metadata)

metadata <- read_delim("C:/Microbiome/picrust/Rhizo/check/rhizo_metadata.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)

# Load KEGG pathway abundance

data(kegg_abundance)

kegg_abundance <- ko2kegg_abundance("C:/Microbiome/picrust/Rhizo/check/pred_metagenome_unstrat.tsv")

# Perform pathway differential abundance analysis (DAA) using ALDEx2 method

# Please change group to "your_group_column" if you are not using example dataset

daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "GDD", daa_method = "ALDEx2", select = NULL, reference = NULL)

# Filter results for ALDEx2_Welch's t test method

# Please check the unique(daa_results_df$method) and choose one

daa_sub_method_results_df <- daa_results_df[daa_results_df$method == "ALDEx2_Wilcoxon rank test", ]

# Annotate pathway results using KO to KEGG conversion

daa_annotated_sub_method_results_df <- pathway_annotation(pathway = "KO", daa_results_df = daa_sub_method_results_df, ko_to_kegg = TRUE)

# Generate pathway error bar plot

# Please change Group to metadata$your_group_column if you are not using example dataset

p <- pathway_errorbar(abundance = kegg_abundance, daa_results_df = daa_annotated_sub_method_results_df, Group = metadata$GDD,
p_values_threshold = 0.05, order = "pathway_class", select = daa_annotated_sub_method_results_df %>% arrange(p_adjust) %>% slice(1:20) %>% dplyr::select(feature) %>% pull(),
ko_to_kegg = TRUE, p_value_bar = TRUE, colors = NULL, x_lab = "pathway_name")

I got the following error while trying to generate the error bar plot:

The following pathways are missing annotations and have been excluded: ko00281, ko00471, ko00472, ko00473
You can use the 'pathway_annotation' function to add annotations for these pathways.
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
Error in ggplot_add():
! Can't add `e2` to a <ggplot> object.
Run rlang::last_trace() to see where the error occurred.

How can I resolve this?

ggpicrust Install Issues

Hello! I really like the idea of ggpicrust. It looks like an efficient way to quickly analyze PICRUSt output. Are there any plans to dockerize the package? I'm having an extremely hard time installing it because of endless dependency problems. I've tried installing in plain R, in Conda, in R within Conda, and via renv::restore() (renv.lock), all to no avail. Each time I install the missing dependency and try again, a different dependency problem appears. I'm at the point of giving up and writing something myself to analyze PICRUSt data. I've read the other two issue posts on dependency problems, and none of them solved my problem. If this package were dockerized, I think it would help not only me but also future users of this package.

pathway_errorbar(): Error: Index out of bounds

Group <-metadata1$CST # column which you are interested in metadata

daa_results_list <-
pathway_errorbar(
abundance = kegg_abundance,
daa_results_df = daa_results_pt3,
Group = Group,
p_values_threshold = 0.05,
order = "pathway_class",
select = NULL,
ko_to_kegg = FALSE,
p_value_bar = TRUE,
colors = NULL,
x_lab = "pathway_name")
Error: Index out of bounds
daa_miscpt3.CSV
[metadataU.txt](https://github.com/cafferychen777/ggpicrust2/files/11314101/metadataU.txt)
kegg_abundance.csv

Cannot visualize any data

@all-contributors please init

Error in curl::curl_fetch_memory

Thank you to the author for releasing such a useful tool. There is a bug; could you please take a look:

heat map issues

Here is my metadata:
metadata2.csv

and my KO abundances:
HPabun.csv

Code for the heatmap:
heatmap_plot <- pathway_heatmap(t(SAUN_matrix), metadata, "CST")
print(heatmap_plot)

What I get:

It doesn't show the CST grouping and isn't normalized.

What it looks like when I normalize with ggplot:

I'm trying to figure out what is happening.
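If "normalized" here means a per-pathway z-score (each row centred and scaled across samples), that can be computed in base R before plotting, as sketched below on a made-up matrix. It assumes pathways are rows and samples are columns; since the call above transposes `SAUN_matrix` with `t()`, it is worth checking which orientation `pathway_heatmap()` actually receives.

```r
# Made-up abundance matrix: rows = pathways, columns = samples.
abundance <- matrix(c(10, 20, 30,
                       1,  2,  3),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("pathway1", "pathway2"),
                                    c("s1", "s2", "s3")))

# scale() works column-wise, so transpose, scale, and transpose back
# to z-score each pathway (row) across samples.
z_scores <- t(scale(t(abundance)))
round(z_scores, 3)
```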

pathway_daa(): Error in Math.factor(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, : ‘round’ not meaningful for factors

Hello @cafferychen777,

I'm very new to picrust2 and ggpicrust2 is really interesting! I tried to run the pathway_daa() and got the error message:

Error in Math.factor(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, :
‘round’ not meaningful for factors

How can I fix that?
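The message means base R's `round()` was applied to a factor, so some input is a factor where numbers are expected. The values shown (only 1 and 2) look like the codes of a two-level factor, so it is also worth checking that a grouping column has not ended up in the abundance table. A base-R sketch, on made-up data, of finding factor columns and converting them back to numbers:

```r
# Made-up abundance table whose count columns were read as factors.
abundance <- data.frame(
  sample1 = factor(c("10", "0", "25")),
  sample2 = factor(c("3", "7", "0")),
  row.names = c("K00001", "K00002", "K00003")
)

# Which columns are factors?
sapply(abundance, is.factor)

# as.numeric() on a factor returns the level codes, not the values,
# so convert through as.character() first.
abundance[] <- lapply(abundance, function(col) {
  if (is.factor(col)) as.numeric(as.character(col)) else col
})
```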

pathway_pca(): Error in prcomp.default(t(abundance), center = TRUE, scale = TRUE) : cannot rescale a constant/zero column to unit variance

Hello,
I had posted in the past about a previous issue, but it has now been resolved.
I now have a new issue making MetaCyc PCA plots.
I originally was able to make PCA plots with all my data, but there was a certain set of samples I wanted to focus my analysis on, so I modified my data files and have been doing the analysis with them. I can make error bars and heatmaps, but the PCA is giving me trouble.
Here is my code:

setwd("/Users/mayagabitzsch/Desktop/microbiome2")

library(ggpicrust2)
library(tidyverse)
library(ggh4x)
metcyc_abundance_file <- "path_abun_unstratroots.tsv"
metcyc_abundance_data <- read_delim(metcyc_abundance_file, delim = "\t", col_names = TRUE, trim_ws = TRUE)
view(metcyc_abundance_data)
metadataM <- read_delim(
"~/Desktop/microbiome2/MetadataROOTS.txt",
delim = "\t",
escape_double = FALSE,
trim_ws = TRUE)
view(metcyc_abundance_data)
view(metadataM)

#START HERE
metacyc_daa_results_df <- pathway_daa(abundance = metcyc_abundance_data %>% column_to_rownames("pathway"), metadata = metadataM, group = "ENV_ALT", daa_method = "ALDEx2")

# Annotate the results
# MetaCyc features are annotated directly, so the KO-to-KEGG conversion is turned off
metacyc_daa_annotated_results_df <- pathway_annotation(pathway = "MetaCyc", daa_results_df = metacyc_daa_results_df, ko_to_kegg = FALSE)

feature_with_p_0.05 <- metacyc_daa_results_df %>% filter(p_adjust < 0.05)
pathway_heatmap(abundance = metcyc_abundance_data %>% filter(pathway %in% feature_with_p_0.05$feature) %>% column_to_rownames("pathway"), metadata = metadataM, group = "ENV_C")

pathway_pca(abundance = metcyc_abundance_data %>% column_to_rownames("pathway"), metadata = metadataM, group = "ENV_ALT")
Before I removed the samples I did not want to include in the analysis, I could make the PCA plots just fine, but now I get this error:

Error in prcomp.default(t(abundance), center = TRUE, scale = TRUE) :
cannot rescale a constant/zero column to unit variance
I have tried multiple different groupings and get the same error each time. I have also attached the metadata and MetaCyc file here:
path_abun_unstratroots copy.txt
MetadataROOTS copy.txt
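`prcomp()` with scaling divides each feature by its standard deviation, so any pathway that is constant across the remaining samples (for example all zeros once some samples are dropped) triggers this error. A base-R sketch that removes such rows first, on a made-up matrix with pathways as rows and samples as columns (matching `t(abundance)` in the message):

```r
# Made-up abundance: rows = pathways, columns = samples.
abundance <- matrix(c(5, 8, 2, 9,
                      0, 0, 0, 0,   # all zeros after subsetting samples
                      4, 4, 4, 4,   # constant across samples
                      1, 3, 6, 2),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("pathway", 1:4), paste0("s", 1:4)))

# Keep only pathways whose values vary across samples.
row_variance <- apply(abundance, 1, var)
filtered <- abundance[row_variance > 0, , drop = FALSE]

# Same call as in the error message, now without constant columns.
pca <- prcomp(t(filtered), center = TRUE, scale. = TRUE)
summary(pca)
```

For the table in this issue, the same filter can be applied to `metcyc_abundance_data %>% column_to_rownames("pathway")` after it has been restricted to the retained samples.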

Also, one quick question about the annotation of KO abundance.
My samples are taken from plant roots, yet in my error bars the Human Diseases class appears on one graph.
fixedA.pdf

How do these annotations work? It seems a strange class to appear given the type of samples, and it only occurs for the ENV_A grouping.

ggpicrust2 installation dependency error

I tried to install "ggpicrust2" with "install.packages("ggpicrust2")".
ERROR: unable to collate and parse R files for package ‘ggfun’

removing ‘/home/xuan/R/x86_64-pc-linux-gnu-library/4.1/ggfun’
ERROR: dependency ‘ggfun’ is not available for package ‘aplot’
removing ‘/home/xuan/R/x86_64-pc-linux-gnu-library/4.1/aplot’
ERROR: dependency ‘aplot’ is not available for package ‘ggpicrust2’
removing ‘/home/xuan/R/x86_64-pc-linux-gnu-library/4.1/ggpicrust2’

Warning messages:
1: In install.packages("ggpicrust2") :
installation of package ‘ggfun’ had non-zero exit status
2: In install.packages("ggpicrust2") :
installation of package ‘aplot’ had non-zero exit status
3: In install.packages("ggpicrust2") :
installation of package ‘ggpicrust2’ had non-zero exit status

How do I fix it?

What is the statistical test applied by metagenomeSeq?

Hello everyone,

I would like to know which statistical test metagenomeSeq applies, because the pathway_daa() output only lists "metagenomeSeq" as the method.

Thank you,
Matteo

"FuncDiv" for functional diversity visualization using ggpicrust2?

Many thanks for this tool.
Is there any tutorial on using ggpicrust2 to visualize alpha and beta contributional diversity (or functional diversity)? https://github.com/gavinmdouglas/FuncDiv
Alpha and beta contributional diversity are new terms introduced by one of the developers of PICRUSt2.
Thanks again

pathway_annotation issue

Thank you for the nice package!
However, when I ran pathway_annotation(), data loss occurred.
I split my 242 KO rows into three subsets of about 80 rows each and ran pathway_annotation() on each.

My data is in the link below.
mydata

f1 <- ko_daa_sub_method_results_df[1:80, ]    # 80 rows
f2 <- ko_daa_sub_method_results_df[81:161, ]  # 81 rows
f3 <- ko_daa_sub_method_results_df[162:242, ] # 81 rows

d1 <-pathway_annotation(pathway = "KO",daa_results_df = f1, ko_to_kegg = TRUE) # 53 × 11
d2 <-pathway_annotation(pathway = "KO",daa_results_df = f2, ko_to_kegg = TRUE) # 58 × 11
d3 <-pathway_annotation(pathway = "KO",daa_results_df = f3, ko_to_kegg = TRUE) # 50 × 11
annotation <- rbind(d1, d2, d3)
dim(annotation) # 161

Only 161 of the 242 KO rows remained.
The output is here: result

I thought it was a problem with keggGet(), so I took the contents inside the pathway_annotation()and ran it separately.


ko_id <- ko_daa_sub_method_results_df$feature # 242
pathway_table <-  data.frame(row.names = ko_id)


KeggGet_results <- list()
for ( i in ko_id) {
  KeggGet_results[[i]] <- tryCatch(keggGet(i), error=function(e) NULL) # https://www.biostars.org/p/366463/
}

ko_result_id <- KeggGet_results %>% names # 236

for (i in ko_result_id){
        pathway_table[i, "Level1"]         <- strsplit( KeggGet_results[[i]][[1]]$CLASS, "; ")[[1]][1]
        pathway_table[i, "Level2"]         <- strsplit( KeggGet_results[[i]][[1]]$CLASS, "; ")[[1]][2]
        pathway_table[i, "Level3"]         <-  KeggGet_results[[i]][[1]]$PATHWAY_MAP
        if ( !is.null(KeggGet_results[[i]][[1]]$DESCRIPTION)) {
          pathway_table[i, "pathway_description"] <-  KeggGet_results[[i]][[1]]$DESCRIPTION
          }
}

The following message appeared.

Error in [<-.data.frame(*tmp*, i, "pathway_description", value = c("Type I diabetes mellitus is a disease that results from autoimmune destruction of the insulin-producing beta-cells. Certain beta-cell proteins act as autoantigens after being processed by antigen-presenting cell (APC), such as macrophages and dendritic cells, and presented in a complex with MHC-II molecules on the surface of the APC. Then immunogenic signals from APC activate CD4+ T cells, predominantly of the Th1 subset. Antigen-activated Th1 cells produce IL-2 and IFNgamma. They activate macrophages and cytotoxic CD8+ T cells, and these effector cells may kill islet beta-cells by one or both of two types of mechanisms: (1) direct interactions of antigen-specific cytotoxic T cells with a beta-cell autoantigen-MHC-I complex on the beta-cell, and (2) non-specific inflammatory mediators, such as free radicals/oxidants and cytokines (IL-1, TNFalpha, TNFbeta, IFNgamma).", :
replacement has 2 rows, data has 1

When I excluded the DESCRIPTION field, the error no longer appeared:

for (i in ko_result_id){
        pathway_table[i, "Level1"]         <- strsplit( KeggGet_results[[i]][[1]]$CLASS, "; ")[[1]][1]
        pathway_table[i, "Level2"]         <- strsplit( KeggGet_results[[i]][[1]]$CLASS, "; ")[[1]][2]
        pathway_table[i, "Level3"]         <-  KeggGet_results[[i]][[1]]$PATHWAY_MAP
        # if ( !is.null(KeggGet_results[[i]][[1]]$DESCRIPTION)) {
        #   pathway_table[i, "pathway_description"] <-  KeggGet_results[[i]][[1]]$DESCRIPTION
        #   }
}

Is there any way to avoid this data loss in ggpicrust2 as well?
I would be grateful for an answer.

loading issue

library(ggpicrust2)
Error: package or namespace load failed for ‘ggpicrust2’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘dplyr’ 1.0.8 is already loaded, but >= 1.0.10 is required

add a parameter to select top significant pathways

Hi Caffery,

Thank you for fixing the bug.

I have a request for pathway_errorbar(): would it be possible to add a parameter to select the top significant pathways, or the top pathways by other criteria? I ask because I got a different error when using the ggpicrust2() wrapper:

Error in pathway_errorbar(abundance = abundance, daa_results_df = daa_sub_method_results_df,  : 
  The feature with statistically significance are more than 30, the visualization will be terrible.
 Please use select to reduce the number.

I could go step by step using each function, but doing so makes the wrapper lose its convenience and its function in preliminary data exploration, don't you think?

Also, would it be possible to add parameters to the wrapper to choose which of the existing visualization methods to run, or to turn each of them on or off?

Best regards,

ko2kegg_abundance(): KO2Kegg abundance returning empty output.

I am running the following:

kegg_abundance <- ko2kegg_abundance("Dropbox/CSIRO_Picrustrun/picrust2_out_pipeline_CSIRO2/KO_predicted.tsv")
Rows: 7130 Columns: 10544
── Column specification ───────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): sequence
dbl (10543): K00001, K00002, K00003, K00004, K00005, K00006, K00007, K00008, K00009, K00010, K00011, K00012...

ℹ Use spec() to retrieve the full column specification for this data.
ℹ Specify the column types or set show_col_types = FALSE to quiet this message.
Calculation may take a long time, please be patient.
The kegg pathway with zero abundance in all the different samples has been removed.

# Perform pathway differential abundance analysis (DAA) using ALDEx2 method

# Please change group to "your_group_column" if you are not using example dataset

daa_results_df2 <- pathway_daa(abundance = kegg_abundance,
                               metadata = metdata.tab,
                               group = "Treatment",
                               daa_method = "ALDEx2",
                               select = NULL,
                               reference = NULL)

Error in metadata[, matching_columns]:
! Can't subset columns with `matching_columns`.
✖ Subscript `matching_columns` can't contain missing values.
✖ It has a missing value at location 1.

I have run the MetaCyc analysis without problems, but I'd like to understand why this happens.
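The column specification printed above lists one text column (`sequence`) and 10,543 numeric columns named K00001, K00002, and so on, i.e. one row per sequence and one column per KO. In PICRUSt2 output, KO_predicted.tsv holds per-sequence predictions, whereas the per-sample table used elsewhere on this page (pred_metagenome_unstrat.tsv) has a `function` column followed by sample columns. A hedged heuristic for telling the two layouts apart, on made-up data:

```r
# Made-up tables illustrating the two layouts.
per_sequence <- data.frame(sequence = c("ASV1", "ASV2"),
                           K00001 = c(1, 0), K00002 = c(2, 1))
per_sample <- data.frame(`function` = c("K00001", "K00002"),
                         sample_A = c(10, 3), sample_B = c(0, 7),
                         check.names = FALSE)

# Heuristic: KO identifiers look like "K" followed by five digits.
is_ko_id <- function(x) grepl("^K[0-9]{5}$", x)

# TRUE when KO IDs fill the first column (features x samples layout).
kos_in_rows <- function(df) all(is_ko_id(df[[1]]))

# TRUE when KO IDs are the column names (sequences x KOs layout).
kos_in_columns <- function(df) all(is_ko_id(names(df)[-1]))
```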

pathway_daa(): Error in metadata[, matching_columns]: ! Can't subset columns with `matching_columns`. ✖ Subscript `matching_columns` can't contain missing values. ✖ It has a missing value at location 1. Run `rlang::last_trace()`

hello,
I am trying to follow the example code with my data. I used the file path_abun_unstrat.tsv and a file titled Metadata.txt.
When I run ggpicrust2() with the input file path, I get an error stating that the subscript `matching_columns` can't contain missing values (it has a missing value at location 1). Here is what the code looks like:

and the error that is given is
Calculation may take a long time, please be patient. The kegg pathway with zero abundance in all the different samples has been removed. Performing pathway differential abundance analysis... Error in metadata[, matching_columns]: ! Can't subset columns with `matching_columns`. ✖ Subscript `matching_columns` can't contain missing values. ✖ It has a missing value at location 1. Run `rlang::last_trace()` to see where the error occurred.

I went through the step by step approach in the read me and this is the command where the error is popping up
daa_results_df <- pathway_daa(abundance = kegg_abundance, metadata = metadata, group = "ENVIRONMENT", daa_method = "ALDEx2", select = NULL, reference = NULL)

I found another post with the same error and tried to follow the advice there, but the issue is still not resolved; I am not sure I am understanding the solutions correctly. I transposed my metadata file so that the sample names became the columns, matching kegg_abundance. The first column in my metadata is sample_name, followed by the groups I am putting my data in. That column does not appear among the kegg_abundance columns, and I think that might be the issue, but if I remove it I lose my sample groupings. I am unsure how to make them match.
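The message suggests `pathway_daa()` found no metadata column whose values match the sample names used as abundance column headers (an interpretation of the error, not confirmed here). If so, the metadata should keep one row per sample, with a column such as sample_name holding exactly those names, rather than being transposed. A base-R check on made-up data:

```r
# Made-up abundance (pathways x samples) and metadata (one row per sample).
abundance <- data.frame(S1 = c(5, 0), S2 = c(3, 2), S3 = c(1, 4),
                        row.names = c("ko00010", "ko00020"))
metadata <- data.frame(sample_name = c("S1", "S2", "S3"),
                       ENVIRONMENT = c("soil", "soil", "water"),
                       stringsAsFactors = FALSE)

# For each metadata column: does it contain every abundance sample name?
has_all_samples <- vapply(metadata,
                          function(col) all(colnames(abundance) %in% col),
                          logical(1))
has_all_samples

# Sample names missing from the ID column (useful for spotting typos).
setdiff(colnames(abundance), metadata$sample_name)
```

If only some samples fail to match, compare the names carefully: `read.csv()` and `read.table()` rewrite headers such as `142-Pre` to `X142.Pre` unless `check.names = FALSE` is set.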

ggpicrust2 installation issue - which R version should I use?

Hi,

Thank you very much for such a great tool for visualizing PICRUSt2 results. It is just what I was looking for!

May I kindly ask how to install it properly?

My most recent attempt was to install it in a conda environment with R inside.
I have tried R 3.6 and 4.2; in both cases I got the following error at the end of installation:

✔ checking for file ‘/tmp/RtmpXY3yEs/remotes601e27a97bf1/cafferychen777-ggpicrust2-7671bf4/DESCRIPTION’ ...
─ preparing ‘ggpicrust2’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
Removed empty directory ‘ggpicrust2/data’
─ looking to see if a ‘data/datalist’ file should be added
NB: this package now depends on R (>= 3.5.0)
WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R. File(s) containing such objects: 'ggpicrust2/inst/extdata/EC_reference.RData'
WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R. File(s) containing such objects: 'ggpicrust2/inst/extdata/KO_reference.RData'
WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R. File(s) containing such objects: 'ggpicrust2/inst/extdata/MetaCyc_reference.RData'
WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R. File(s) containing such objects: 'ggpicrust2/inst/extdata/kegg_reference.RData'
─ building 'ggpicrust2_1.4.7.9000.tar.gz'

ERROR: dependencies ‘ALDEx2’, ‘DESeq2’, ‘edgeR’, ‘lefser’, ‘limma’, ‘Maaslin2’, ‘metagenomeSeq’, ‘MicrobiomeStat’, ‘SummarizedExperiment’ are not available for package ‘ggpicrust2’

removing ‘/home/jang/anaconda3/envs/mamba/envs/ggpirust2_Renv/lib/R/library/ggpicrust2’
Warning messages:
1: In i.p(...) :
installation of package ‘MicrobiomeStat’ had non-zero exit status
2: In i.p(...) :
installation of package ‘/tmp/RtmpXY3yEs/file601e777feae1/ggpicrust2_1.4.7.9000.tar.gz’ had non-zero exit status

Any hints?

Jan

I end up with 0 obs. for my kegg_abundance

What I am running:

ko_abundance_file <- '/Users/rachaelkramp/Desktop/Selection_paper/Picrust2_analysis/pred_metagenome_unstrat.tsv'

# Run ko2kegg_abundance function

kegg_abundance <- ko2kegg_abundance(file = ko_abundance_file)

R console feedback:
> kegg_abundance <- ko2kegg_abundance(file = ko_abundance_file)
Rows: 2286 Columns: 51
── Column specification ────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): function
dbl (50): 142-Pre, 143-Pre, 147-Pre, 162-Pre, 165-Pre, 176P, 180-Pre, 181-Pre, 196-Pre, 197P, M1...

Question:

Why do I end up with zero observations in the abundance output?


🛠 Projects

1. Application of LLM in Microbiome Taxonomy

2. Performance of Microbiome DA Methods in Other Omics

3. Generative AI in Microbiome Data Analysis and Accuracy Assessment

4. PI_find

5. Fine-Tuning Platform for Bioinformatics Repositories

The default DAA method is "ALDEx2"

If you have more than 3 group levels and want to use the LinDA, limma voom, or Maaslin2 methods, you should provide a reference.
