ggloor / aldex2_dev

ALDEx tool to examine compositional high-throughput sequence data with Welch's t-test

License: GNU Affero General Public License v3.0

R 56.06% TeX 43.94%

aldex2_dev's People

ruthgrace mlaforet jrwu tankmermaid christianbrinch wnq13579

aldex2_dev's Issues

Aldex2 aldex.glm results

Hi! I am trying out aldex.glm with a model matrix on my data. Everything runs fine with aldex.clr and aldex.glm. However, when I inspect the resulting data frame, one of the variables in my model matrix is missing. When I run aldex.glm.effect, all variables are included in the GLM effect data frame. I am comparing 8 groups/conditions in my data.

Bug in aldex.clr when using glm model matrix

Greetings. I am trying to use a GLM for a two-factor experimental design and I am getting this error:

Error in coerce.data(reads, conds) :
mismatch between number of samples and condition vector

I also get this error when I use the example data provided in the vignette:

library(ALDEx2)
data(selex)
selex.sub <- selex[1:400,]
covariates <- data.frame("A" = sample(0:1, 14, replace = TRUE),
                         "B" = c(rep(0, 7), rep(1, 7)))
mm <- model.matrix(~ A + B, covariates)
x <- aldex.clr(selex.sub, mm, mc.samples=8, denom="all")
Error in coerce.data(reads, conds) :
mismatch between number of samples and condition vector

I have looked over my data thoroughly: the count table and the conditions have the same number of samples and the same sample names.

Thank you
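A quick base-R check of the shapes involved in the vignette example may help narrow this down. A model matrix has one row per sample, so `nrow(mm)`, not `length(mm)`, is what should equal the number of count-table columns. The idea that an older ALDEx2 release compares a length-based count against the sample number (and therefore rejects a matrix) is an assumption, not something confirmed in this thread; the toy count table and the helper `check_conds()` are hypothetical.

```r
# Sketch: compare the sample dimension of a model matrix with a count table.
set.seed(42)
n_samples <- 14

# Toy count table standing in for selex.sub (features x samples)
counts <- matrix(rpois(400 * n_samples, lambda = 20), nrow = 400)

covariates <- data.frame(A = sample(0:1, n_samples, replace = TRUE),
                         B = c(rep(0, 7), rep(1, 7)))
mm <- model.matrix(~ A + B, covariates)

# Hypothetical helper: TRUE when the conditions line up with the samples
check_conds <- function(counts, conds) {
  n_cond <- if (is.matrix(conds)) nrow(conds) else length(conds)
  n_cond == ncol(counts)
}

dim(mm)                   # 14 rows (samples) x 3 columns (intercept, A, B)
length(mm)                # 42: counts every cell, not samples
check_conds(counts, mm)   # TRUE
```

If a check like this passes but aldex.clr still reports a mismatch, the installed ALDEx2 version is a likely suspect, since model-matrix input was not supported in every release.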

Cannot install on CentOS 8.6

Installing on CentOS 8.6 gives the error shown below. I also tried, unsuccessfully, installing from source using both the git and .tar.gz versions. The R version is 4.2.1.

Thanks.

BiocManager::install("ALDEx2")

...many hundreds of lines, then:

*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path

DONE (DelayedArray)
ERROR: dependency 'RcppZiggurat' is not available for package 'Rfast'
removing '/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/Rfast'
installing source package 'GenomicRanges' ...
** using staged installation
** libs
icc -I"/apps/R/4.2.1/lib64/R/include" -DNDEBUG -I'/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/S4Vectors/include' -I'/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/IRanges/include' -I/usr/local/include -fpic -g -O2 -c IRanges_stubs.c -o IRanges_stubs.o
icc -I"/apps/R/4.2.1/lib64/R/include" -DNDEBUG -I'/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/S4Vectors/include' -I'/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/IRanges/include' -I/usr/local/include -fpic -g -O2 -c R_init_GenomicRanges.c -o R_init_GenomicRanges.o
icc -I"/apps/R/4.2.1/lib64/R/include" -DNDEBUG -I'/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/S4Vectors/include' -I'/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/IRanges/include' -I/usr/local/include -fpic -g -O2 -c S4Vectors_stubs.c -o S4Vectors_stubs.o
icc -I"/apps/R/4.2.1/lib64/R/include" -DNDEBUG -I'/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/S4Vectors/include' -I'/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/IRanges/include' -I/usr/local/include -fpic -g -O2 -c transcript_utils.c -o transcript_utils.o
icc -shared -L/apps/R/4.2.1/lib64/R/lib -L/usr/local/lib64 -o GenomicRanges.so IRanges_stubs.o R_init_GenomicRanges.o S4Vectors_stubs.o transcript_utils.o -L/apps/R/4.2.1/lib64/R/lib -lR
installing to /home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-GenomicRanges/00new/GenomicRanges/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
DONE (GenomicRanges)
installing source package 'SummarizedExperiment' ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
DONE (SummarizedExperiment)
ERROR: dependency 'Rfast' is not available for package 'ALDEx2'
removing '/home/554/ta0341/R/x86_64-pc-linux-gnu-library/4.2/ALDEx2'

The downloaded source packages are in
'/scratch/nm31/ta0341/tmp/RtmpIEuhnQ/downloaded_packages'
Installation paths not writeable, unable to update packages
path: /apps/R/4.2.1/lib64/R/library
packages:
MASS, nlme, survival
Warning messages:
1: In install.packages(...) :
installation of package 'RcppGSL' had non-zero exit status
2: In install.packages(...) :
installation of package 'RcppZiggurat' had non-zero exit status
3: In install.packages(...) :
installation of package 'Rfast' had non-zero exit status
4: In install.packages(...) :
installation of package 'ALDEx2' had non-zero exit status

Unexpected effect directions

This issue is similar to #21

However, this feature does not appear to have been implemented yet. Currently, coercion of the conditions character vector to a factor puts the levels in alphabetical order, which may produce effect sizes in an unexpected direction.

One possible solution is for the conditions argument to accept a factor, so that the user can relevel it and control which group is the reference.

code for obtaining all pairwise comparisons under glm module

Hi,

I'm using the glm functionality to analyze a microbiome dataset which has 3 factors: fed/unfed (2 levels), time (9 levels), and location within the GI tract (4 levels). I would like to implement a model similar to a 3-way ANOVA, and create plots which show the number of differentially abundant taxa for each factor.

Here are summaries of my asv and metadata dataframes:

str(asv_trans)
'data.frame': 651 obs. of 367 variables:
$ AC1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ AC10 : int 0 0 0 0 0 0 0 0 0 803 ...
$ AC109: int 0 0 0 0 0 0 0 0 0 0 ...
$ AC11 : int 0 0 25 0 0 0 0 0 0 0 ...
$ AC110: int 0 0 0 0 0 0 0 0 0 0 ...

str(meta)
'data.frame': 367 obs. of 4 variables:
$ sample : chr "AC1" "AC10" "AC109" "AC11" ...
$ days : chr "d01" "d01" "d01" "d02" ...
$ treatment: chr "NF" "F" "NF" "NF" ...
$ tissue : chr "CE_C" "CE_C" "IL_C" "CE_C" ...

So far I ran:

mm<-model.matrix(~meta$treatment + meta$tissue + meta$days)
x <- aldex.clr(asv_trans, mm, mc.samples=128, denom="all", verbose=T)
glm.test <- aldex.glm(x, mm)
glm.effect <- aldex.glm.effect(x)

I believe results contain metrics for each model term coefficient (which are also pairwise comparisons for all levels of a treatment against the first treatment level). However, I am interested in:

Creating a plot of taxa which are differentially abundant across each factor (3 plots total: time, fed/unfed, gut location)
Obtaining results from all pairwise comparisons, not just those against the reference level. I saw in a prior post that a complete set of results may be achievable by changing how the model is specified, but I am unsure how to code this using glm; I have only ever done a similar analysis using the anova functions in R. Could you please provide an example of how to run this in ALDEx2?

Many thanks!
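With R's default treatment coding, each non-intercept column of `model.matrix()` contrasts one level with the reference level, so a single fit only yields comparisons against that reference. One way to cover every pair, sketched here under that assumption, is to rebuild the design with each level in turn as the reference. Only `CE_C` and `IL_C` come from the data above; `T3` and `T4` are hypothetical placeholders for the other two tissue levels, and only the base-R design step is shown (each matrix would then go to `aldex.clr()` as in the code above).

```r
# Sketch: enumerate all pairwise contrasts for one factor by relevelling.
tissue <- factor(c("CE_C", "IL_C", "T3", "T4",
                   "CE_C", "IL_C", "T3", "T4"))   # T3/T4 are placeholders

# All unordered pairs of levels: choose(4, 2) = 6 comparisons
pairs <- combn(levels(tissue), 2)

# One design matrix per reference level; each coefficient column
# contrasts the named level against that reference.
designs <- lapply(levels(tissue), function(ref) {
  tis <- relevel(tissue, ref = ref)
  model.matrix(~ tis)
})
names(designs) <- levels(tissue)

ncol(pairs)                  # 6
colnames(designs[["IL_C"]])  # "(Intercept)" "tisCE_C" "tisT3" "tisT4"
```

Across the four designs every pair appears (twice, with opposite signs), so the union of coefficients covers all six comparisons; the other factors (treatment, days) could be added to the formula unchanged.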

Several options are giving errors

Several options are giving errors:

x.all <- aldex(otus, group1, mc.samples=128,test="t",verbose=TRUE,include.sample.summary=FALSE)

Error in aldex(otus, group1, mc.samples = 128, test = "t", verbose = TRUE, :
unused arguments (test = "t", include.sample.summary = FALSE)

If I use only the command below, it works, but I cannot specify any other options.
x.all <- aldex(otus, group1, mc.samples=128, verbose=TRUE)

Feature Request - comparisons between multiple groups

Hi,

Can ALDEx2 be expanded to do post-hoc testing and report results of comparisons between multiple groups, like it does between two groups? For example, my project has 16 different treatments, and I would like to be able to see whether features (OTUs) are significantly more abundant in certain treatments over others, identified by a Tukey's HSD significance category letter or something similar. It would also be great to be able to extract mean CLRs from each treatment as well as CLRs associated with features (OTUs) from individual samples.

Thanks,
Noah Strom

Cannot get to install

I have tried installing on a Mac with R version 3.0.2 and got the following error:

install.packages("ALDEx2_0.99.1.tar.gz")
Warning message:
package ‘ALDEx2_0.99.1.tar.gz’ is not available (for R version 3.0.2)

I also tried on Windows 7 running R version 3.1.0 and received the same error message.

ALDEx2_1.16.0 output of aldex.corr may have a mistake

Hello

The output of the aldex.corr function has a column r. However, this column contains the t statistic from cor.test, when it should contain the correlation estimate r.

So the aldex.corr code should be changed from
r <- sapply(cors, getElement, "statistic")
to
r <- sapply(cors, getElement, "estimate")

daichi
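A base-R illustration of the distinction: `cor.test()` returns an `htest` object whose `statistic` element is the t statistic and whose `estimate` element is the correlation coefficient. Only the latter is bounded by [-1, 1]. The toy vectors below are made up for illustration.

```r
# Sketch: the two elements of a cor.test() result contrasted in this issue.
set.seed(1)
x <- rnorm(14)
y <- x + rnorm(14)

ct <- cor.test(x, y, method = "pearson")
ct$statistic   # t statistic (named "t"); not bounded by [-1, 1]
ct$estimate    # correlation coefficient r (named "cor"); in [-1, 1]

# For Pearson, t = r * sqrt(n - 2) / sqrt(1 - r^2), here with n = 14
r_hat <- unname(ct$estimate)
t_from_r <- r_hat * sqrt(12) / sqrt(1 - r_hat^2)

# Applied over a list of tests, as in the proposed fix
cors <- list(ct, cor.test(x, -y))
r <- sapply(cors, getElement, "estimate")
```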

updates to aldex.glm and results interpretation

Hello!

I'm trying to use aldex.glm() to explore differential abundance for a complex design involving both Genotype (B6 vs. TCR) and stress condition (Stress vs. NoStress). My code is as follows (dat_pr is my filtered phyloseq object):

library(ALDEx2)
library(tibble)  # provides rownames_to_column()
#create model matrix
meta = data.frame(dat_pr_OBIT_f@sam_data)
mm = with(meta, model.matrix(~GENOTYPE + Stress + GENOTYPE:Stress))
#extract OTU table
otus = data.frame(dat_pr_OBIT_f@otu_table)
# check alignment
all(meta$SAMPLE_ID == colnames(otus))

#CLR transform values with model matrix
glm <- aldex.clr(otus, mm, mc.samples=128, denom="all")

glm.test <- aldex.glm(glm)
glm.test = rownames_to_column(glm.test, "ASV")

As mentioned in a previous issue, originally it seems that the glm function had a "drop1" argument, which based on my knowledge means that differentially abundant taxa for the first factor would only be tested within the baseline group of the second factor. For example, differentially abundant taxa B6 vs. TCR (model.GENOTYPETCR) would only be tested in the "NoStress" condition and vice versa. Is this still the case? If so, how can I construct a model that tests for the global effects of genotype on taxon abundance while still accounting/controlling for stress, without removing sample groups, similar to something like DESeq2 or corncob?

Thanks so much in advance!

Miranda

aldex.clr with model.matrix

Hello,
First of all, thanks for the nice package ALDEx2. It's a great idea and a very, very useful tool! Thanks!
I am having a problem running what has been called a 'complex design' in this forum. In this regard, I would like to ask two questions:

Every time I run the analysis with a model.matrix, I get the error message "Error in coerce.data(reads, conds) : mismatch between number of samples and condition vector". To me, the dimensions seem to match (checked with dim()). I also ran the example from the vignette, with the same error:
covariates <- data.frame("A" = sample(0:1, 14, replace = TRUE),
                         "B" = c(rep(0, 7), rep(1, 7)))
mm <- model.matrix(~ A + B, covariates)
x <- aldex.clr(selex.sub, mm, mc.samples=8, denom="all")
What am I doing wrong? I would like to get this to work, as I will eventually have several factors.
Can this be a version problem? I am running the analysis with ALDEx2 1.14.1. I saw that ALDEx2 1.20 is available; however, I cannot seem to update, and all my attempts to update the package have failed. A colleague said she was using 1.18.0. However, for her it worked only with the model.matrix, and her attempts to run aldex.clr with a character vector failed.
Did I understand correctly that I only have to use the model.matrix approach if I am using more than one factor, independent of the number of levels? In other words, is the model.matrix approach meant for designs with more than one factor variable, while for a single factor I can simply pass it as 'conds'?

Thank you very much in advance for your kind help!
Yours faithfully,
Maraike

Interpreting ALDEx2 results

Hello,

I have been using ALDEx2 extensively to find differential features (taxa) between case and control saliva microbiome samples. Using a q-value cutoff of 0.05, I have found some puzzling results where features with similar distributions and median values are still called differentially abundant. An example of this can be seen in this boxplot:

ALDEX results:

effect: 0.2778,
wi.eBH=0.0168

Similar plots are also found for other features that were called.
I'm trying to figure out why these features are called differentially abundant when they appear to have very similar relative abundances (as shown by the boxplot). Is this due to the large class imbalance in my experimental design (1:5, case:control), or do I also need to apply an effect size cutoff (although it is hard to know what a "good" cutoff would be)?
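One common way to handle this is to require both a small BH-adjusted p-value and a minimum absolute effect size. The sketch below applies such a dual filter to a small data frame; only the first row uses the values reported above, the other two rows are hypothetical, and the |effect| cutoff of 1 is an assumption to tune, not a default prescribed in this thread.

```r
# Sketch: call features on both significance and effect size.
res <- data.frame(
  feature = c("reported_feature", "taxon_B", "taxon_C"),  # last two hypothetical
  effect  = c(0.2778, 1.45, -1.20),
  wi.eBH  = c(0.0168, 0.0040, 0.0300),
  stringsAsFactors = FALSE
)

q_cutoff      <- 0.05
effect_cutoff <- 1    # assumed threshold on |effect|

called <- subset(res, wi.eBH < q_cutoff & abs(effect) > effect_cutoff)
called$feature        # "taxon_B" "taxon_C"
```

With a filter like this, the reported feature (effect 0.2778) would no longer be called even though its q-value is below 0.05.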

Feature Request: Specify a "reference" condition

It would be useful to be able to specify the reference condition so we can control the direction of diff.btw. For example, if we were comparing treatment vs. control, we could specify reference='control' and the resulting statistics would use control as the reference.

If there was something like this:

rab.win.treatment  = 2
rab.win.control  = 5

Then diff.btw would be -3 since it would be using control as the reference. I think right now it's in alphabetical order.

The usage could be like this:

aldex(X, y, reference='control')
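Until such an argument exists, the direction can be controlled in base R by fixing the factor level order, since `factor()` sorts character levels alphabetically by default. The function `diff_btw()` below is hypothetical and not part of ALDEx2; as a simplification it takes the difference of group medians (non-reference minus reference), which reproduces the -3 from the example above.

```r
# Sketch: make the reference level explicit instead of alphabetical.
conds <- c("treatment", "treatment", "control", "control")
levels(factor(conds))                              # "control" "treatment"
levels(relevel(factor(conds), ref = "treatment"))  # "treatment" "control"

# Hypothetical helper: difference of group medians relative to a reference
diff_btw <- function(values, conds, reference) {
  f <- relevel(factor(conds), ref = reference)
  other <- levels(f)[2]
  median(values[f == other]) - median(values[f == reference])
}

clr_values <- c(2, 2, 5, 5)   # treatment ~ 2, control ~ 5, as in the example
diff_btw(clr_values, conds, reference = "control")    # -3
diff_btw(clr_values, conds, reference = "treatment")  #  3
```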

Feature request: confidence interval parameter for aldex()

Hello @ggloor

I would like to use aldex() to get both p-values and CIs. However when I pass CI=TRUE to aldex() no CIs are output.

It seems the only way to do this is to call aldex.clr() then aldex.effect() and aldex.ttest() and combine the results.

Pron

Paired analysis by variable, not order

Hello,
thanks for making this great package. I have a paired dataset to which I would like to apply the ALDEx2 paired test. However, my data are not organized by sample order; instead, a column (host_id) identifies which samples are paired. Is there a way around this, other than transforming the data into wide format?

Thanks!
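If, as the question implies, the paired test matches samples by their position within each condition, the columns can be reordered by `host_id` instead of reshaping to wide format. A base-R sketch under that assumption; the `meta` and `counts` objects and the `timepoint` column are hypothetical.

```r
# Sketch: order samples so the i-th sample of each group is the same host.
meta <- data.frame(
  sample    = paste0("s", 1:6),
  host_id   = c("h2", "h1", "h3", "h1", "h3", "h2"),
  timepoint = c("pre", "pre", "pre", "post", "post", "post"),
  stringsAsFactors = FALSE
)
counts <- matrix(rpois(5 * 6, 10), nrow = 5,
                 dimnames = list(NULL, meta$sample))

# Every host must appear exactly once per timepoint for a valid pairing
stopifnot(all(table(meta$host_id, meta$timepoint) == 1))

# Sort by condition first, then by host within each condition
ord        <- order(meta$timepoint, meta$host_id)
meta_ord   <- meta[ord, ]
counts_ord <- counts[, meta_ord$sample]
conds      <- meta_ord$timepoint   # would be passed to aldex.clr()
```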

Use aldex.glm to plot with aldex.plot

Hello,
I would like to know if it is possible to take aldex.glm results to plot them in the same way you do with a ttest and aldex.effect outputs.

As far as I can tell, aldex.effect won't run with a multi-level glm model, in my case:

mm <- model.matrix(~ NLACTA + DIASLE + CH4_Rank, metadataset)

where NLACTA and DIASLE are numeric variables and CH4_Rank is a 4-level factor (HIGH, H_MID, L_MID, LOW).

I am interested in only one pairwise contrast, HIGH vs LOW, but with the clr transformation computed from the whole dataset, not only from the HIGH-LOW subcomposition. So I tried aldex.glm, but its output is not at all similar to that of aldex.ttest + aldex.effect, so I wonder how I can plot it in a similar way.

Abnormal raw p-value distribution from aldex.ttest

Hello,

I have been looking at some diagnostic plots of test results and noticed a bimodal raw p-value histogram.
aldex_example_pval_distr_gg.pdf

For comparison, here is a p-value histogram from a limma analysis, which produces an anti-conservative histogram.
limma_example_pval_distr_gg.pdf

http://varianceexplained.org/statistics/interpreting-pvalue-histogram/

I am wondering whether such a distribution might be expected given the testing strategy. I tried an IQLR denominator as well as iterate=TRUE, and I see similar results with different parameters.

Thanks for any help you might be able to provide!

feature request - controlling for additional variables

Hi,

Thank you for Aldex2, I have just started using it and love it; I also love the clear manual and vignette.

I was wondering if there are any plans to account for additional variables besides the main explanatory variable ("conditions"); e.g. I might want to control for the gender of subjects, the day of DNA extraction, or the day of sequencing.

Thank you for your kind attention,
Max

plotting of aldex2 (& picrust2) results?

Hello. Could you please suggest a way to plot something like this?
I used ALDEx2 after PICRUSt2, as recommended (reproducing this tutorial: https://ycl6.github.io/16S-Demo/4_picrust2_tutorial.html).

Many thanks

how to calculate the fold-change in ALDEx2?

I would like to know how to calculate the fold change in ALDEx2.
Can the output diff.btw value tell me the fold change between two groups?
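A worked base-R example of what a clr difference means. Assuming the clr values are base-2 logarithms (worth confirming against the installed ALDEx2 documentation), a difference `d` in clr values between two compositions corresponds to a fold change of `2^d`, but relative to each composition's geometric mean rather than an absolute fold change. The counts and the helpers `clr2()` and `gm()` are made up for illustration.

```r
# Sketch: clr with log2 and the fold change implied by a clr difference.
clr2 <- function(x) log2(x) - mean(log2(x))   # centred log-ratio, base 2
gm   <- function(x) exp(mean(log(x)))         # geometric mean

a <- c(10, 20, 40, 80)   # toy counts, group A
b <- c(30, 20, 40, 40)   # toy counts, group B

d <- clr2(b) - clr2(a)   # per-feature clr difference (log2 scale)

# Fold change of each feature, relative to the change in geometric mean
fc_relative <- (b / a) / (gm(b) / gm(a))

2^d                      # equals fc_relative
```

So `2^diff.btw` can be read as a fold change only in this relative sense; it says nothing about absolute abundance unless the geometric mean is assumed unchanged.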

Confidence interval

Hi !
Maybe it's me misunderstanding the confidence interval feature, but I feel like the explanation given in Bioconductor is really unclear.
When I run aldex.effect(x, CI=TRUE), I get two more columns, "effect.high" and "effect.low", neither of which is explained in the manual. In that same manual, you show plots where features with a CI > 95% are circled in blue, but you never show what data are used to circle these features.
What is effect.high and effect.low? How do they relate to the confidence interval?
Also, just to be sure, is the q-value the corrected p-value ?
I would really like to use this package, but I strongly feel the manual could contain better explanations and more representative examples (as well as including the code used to generate all graphs).
Thank you for your help !
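On the q-value question: a Benjamini-Hochberg (BH) adjusted p-value can be computed with `stats::p.adjust()`. The sketch shows one way to obtain an "expected" BH value across Monte Carlo instances, adjusting within each instance and then averaging per feature. That ALDEx2 computes `wi.eBH` exactly this way is an assumption here, and the p-value matrix is random.

```r
# Sketch: BH adjustment per Monte Carlo instance, then averaged per feature.
set.seed(7)
n_features <- 50
n_mc       <- 16

# Hypothetical raw p-values: rows = features, columns = Monte Carlo instances
p_raw <- matrix(runif(n_features * n_mc), nrow = n_features)

p_bh <- apply(p_raw, 2, p.adjust, method = "BH")  # adjust within each instance

expected_p  <- rowMeans(p_raw)   # expected raw p-value per feature
expected_bh <- rowMeans(p_bh)    # expected BH-adjusted value per feature
```

Because a BH-adjusted value is never smaller than its raw p-value, the averaged adjusted value is also never smaller than the averaged raw one.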

Installation issue

Hi,
ALDEx2 worked well before I upgraded to Ubuntu 22.04, but the following error appeared when I tried to install it:

Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/home/[]/4.2/RcppZiggurat/libs/RcppZiggurat.so':
libgsl.so.23: cannot open shared object file: No such file or directory
Calls: ... asNamespace -> loadNamespace -> library.dynam -> dyn.load
Execution halted

I am using R 4.2.1 and tried reinstalling Rstudio, but the error persists. Would really appreciate your advice, thanks!

Error running aldex.ttest

Hello Dr. Gloor, I'm a desperate graduate student with very little coding experience trying to work out this issue. I am trying to run the code shown below:

count <- 3
aldex.controls <- data.frame(Controls[which(apply(Controls, 1, function(x){sum(x)}) > count),],
check.names=F)
strain <- c("S","Ctl")

x.controls <- aldex.clr(aldex.controls, strain, mc.samples=2, denom="zero", verbose=TRUE)
x.controls.tt <- aldex.ttest(x.controls, conds, paired.test=FALSE)

x.controls.effect <- aldex.effect(x.controls, conds, include.sample.summary=FALSE, verbose=TRUE)
x.controls.all <- data.frame(x.controls.tt, x.controls.effect, stringsAsFactors=FALSE)

When I do, I get the following error at the aldex.ttest step:
Error in apply(data[, grp1], 1, sd) : dim(X) must have a positive length

I know this has something to do with the formatting of clr but I'm really lost as to how to fix it. Any and all help is appreciated!
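The message can be reproduced in base R. If the count table has only two samples, `strain <- c("S","Ctl")` matches it, but each group then holds a single sample: selecting one column drops the matrix to a plain vector, and `apply()` fails because its input has no dimensions. Whether this is the cause here is an assumption (note also that `conds` is passed to `aldex.ttest` but is not defined in the snippet shown); the toy table is hypothetical.

```r
# Sketch: a group containing a single sample reproduces the reported error.
counts <- matrix(rpois(4 * 2, 10), nrow = 4,
                 dimnames = list(NULL, c("S_1", "Ctl_1")))
conds <- c("S", "Ctl")          # one label per sample: lengths match

grp1 <- which(conds == "S")     # only one column in this group
msg <- tryCatch(apply(counts[, grp1], 1, sd), error = conditionMessage)
msg   # "dim(X) must have a positive length"

# A per-feature standard deviation needs at least two samples per group
all(table(conds) >= 2)          # FALSE here
```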

Problem in calculating the differential OTU abundance based on ALDEx2

I am trying to calculate differential OTU abundance with the code below. I have also attached my files:
ENV_aldex.txt
OTU_table_final.txt

Script:
OTU_FL <- OTU[ , ENV$Water_type == "Radon" ]
OTU_FL <- OTU_FL[ rowSums(OTU_FL) > 0, ]
relOTU_FL <- prop.table(as.matrix(OTU_FL), 2) * 100
ENV_FL <- ENV[ colnames(OTU_FL), ]
OTU_FL.sub <- OTU_FL[apply(OTU_FL, 1, function(x) { max(table(ENV_FL$Type[x > 0])) >=3 } ) &
apply(relOTU_FL, 1, function(x) { max(x) >= 1 } ), ]
clrOTU_FL_mc <- aldex.clr(OTU_FL.sub, mc.samples = 128)
clrOTU_FL <- t(sapply(getMonteCarloInstances(clrOTU_FL_mc), function(x) {apply(x, 1, median)}))
glmOTU_FL <- aldex.glm(clrOTU_FL_mc, conditions = ENV_FL$Type)

but I got this error:
Error in lr2glm(mci_lr, conditions, ...) :
Please define the aldex.clr object for a model.matrix 'conditions'.
In addition: Warning message:
In if (class(conditions) != "matrix" & !("assign" %in% names(attributes(conditions)))) { :
the condition has length > 1 and only the first element will be used
Please help me to solve this error.
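The error text asks for the `aldex.clr` object to be built from a model matrix, and the warning shows a check for an `"assign"` attribute, which `model.matrix()` sets. A hedged base-R sketch of building such a matrix from a grouping column; the `ENV_FL` data frame and its `Type` levels are hypothetical stand-ins, and the ALDEx2 call is left as a comment because signatures differ between releases.

```r
# Sketch: build a model matrix from a sample-level grouping column.
ENV_FL <- data.frame(Type = c("A", "A", "B", "B", "C", "C"),  # hypothetical
                     row.names = paste0("sample", 1:6))

mm <- model.matrix(~ Type, data = ENV_FL)
dim(mm)                          # one row per sample, one column per coefficient
colnames(mm)                     # "(Intercept)" "TypeB" "TypeC"
"assign" %in% names(attributes(mm))

# The matrix, not the character vector, would then define the clr object, e.g.
# clrOTU_FL_mc <- aldex.clr(OTU_FL.sub, mm, mc.samples = 128, denom = "all")
```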

Problem in calculating the differential OTU abundance based on glm model ALDEx2 version 1.20.0

Hello,

I am trying to calculate differential OTU abundance using this code:

OTU_table=read.csv("PS118_physeq.csv", h=T)

OTU_St1_St4_2_3=OTU_table[,c(38,44,50,98,104,110)]

# Remove rows having all zeros

OTU_St1_St4_2_3=OTU_St1_St4_2_3[rowSums(OTU_St1_St4_2_3[])>0,]

Conds<-c(rep("St4",3),rep("St1",3))

# create an aldex object, which includes the clr transformation (so the input is the OTU table with raw counts) and the statistical test

St1St4_2_3=aldex(OTU_St1_St4_2_3, Conds, mc.samples=128, test="glm", effect=FALSE, include.sample.summary=FALSE, verbose=FALSE)

I got this error:

aldex.clr: generating Monte-Carlo instances and clr values
operating in serial mode
computing center with all features
aldex.glm: doing glm test based on a model matrix
Error in lr2glm(mci_lr, conditions, ...) :
Please define the aldex.clr object for a model.matrix 'conditions'.

Can't use aldex.effect() on aldex.kw() object

We can't use aldex.effect() on an aldex.kw() object, so how are we meant to estimate the effect and plot it with the package?

x = aldex.clr(profile_nitrogen, conds = metadata$Timepoint, mc.samples = 32, denom = "all", verbose = F)

x.kw = aldex.kw(x)

x.effect = aldex.effect(x, CI=T, verbose=FALSE)

Error in aldex.effect(x, CI = T, verbose = FALSE) : 
  only two condition levels are currently supported

Generalized Linear model type and Random effect

I've been searching through the documentation, and I cannot find what the GLM model is (family, link function). Is it possible to specify this within the glm function, and is it also possible to specify random effects within the model?

Does ALDEx2 support confounding covariate adjustment?

Hi,
Does ALDEx2 support covariate adjustment? I see there is a function aldex.glm; is it the correct function to achieve this goal? If so, which column holds the correct p-value to extract? Thanks!

(Intercept) Estimate (Intercept) Std. Error (Intercept) t value (Intercept) Pr(>|t|) model.A Estimate
S:D:A:D 2.32201503 1.6048552 1.44686884 0.17582287 -0.8645702
S:D:A:E 2.34043611 0.9274932 2.52339970 0.02830383 -0.6441754
S:E:A:D 1.03699020 0.9955797 1.04159433 0.31995195 1.5643757
S:E:A:E 2.67510968 1.0422060 2.56677633 0.02620043 -1.1223356
S:D:C:D -0.01981533 0.7282679 -0.02720884 0.97878052 1.0472986
S:D:C:E 1.02661028 0.7654643 1.34116026 0.20690355 -0.3351250
model.A Std. Error model.A t value model.A Pr(>|t|) model.B Estimate model.B Std. Error model.B t value
S:D:A:D 1.8223933 -0.4744147 0.6444825 0.32802184 1.8037016 0.1818604
S:D:A:E 1.0532149 -0.6116277 0.5532148 2.91606412 1.0424124 2.7974189
S:E:A:D 1.1305305 1.3837537 0.1938683 3.78552532 1.1189350 3.3831503
S:E:A:E 1.1834770 -0.9483375 0.3633269 5.48445133 1.1713384 4.6822090
S:D:C:D 0.8269846 1.2664064 0.2315279 -0.09481651 0.8185025 -0.1158414
S:D:C:E 0.8692229 -0.3855455 0.7071849 0.06993148 0.8603076 0.0812866
model.B Pr(>|t|) (Intercept) Pr(>|t|).BH model.A Pr(>|t|).BH model.B Pr(>|t|).BH
S:D:A:D 0.8590002535 1 1 1.0000000
S:D:A:E 0.0173547079 1 1 1.0000000
S:E:A:D 0.0061084312 1 1 1.0000000
S:E:A:E 0.0006691559 1 1 0.2542792
S:D:C:D 0.9098659327 1 1 1.0000000
S:D:C:E 0.9366741895 1 1 1.0000000
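In output laid out like the table above, the per-term adjusted p-values are the columns ending in `Pr(>|t|).BH`, one per model term. A base-R sketch of selecting those columns by name; the data frame reuses two rows of the values printed above, with column names copied from the printed header.

```r
# Sketch: extract per-term BH-adjusted p-values by column name.
glm_res <- data.frame(
  "model.A Estimate"    = c(-0.8645702, -0.6441754),
  "model.A Pr(>|t|)"    = c(0.6444825, 0.5532148),
  "model.B Pr(>|t|)"    = c(0.8590002535, 0.0173547079),
  "model.A Pr(>|t|).BH" = c(1, 1),
  "model.B Pr(>|t|).BH" = c(1, 1),
  row.names   = c("S:D:A:D", "S:D:A:E"),
  check.names = FALSE
)

bh_cols <- grep("Pr(>|t|).BH", colnames(glm_res), fixed = TRUE, value = TRUE)
bh_cols               # "model.A Pr(>|t|).BH" "model.B Pr(>|t|).BH"
glm_res[, bh_cols]    # adjusted p-values for each term
```

In a model such as `~ A + B`, the `model.B` column tests B while adjusting for A, which is the usual reading of covariate adjustment in a linear model.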

aldex.corr having unexpected behavior

Hi,

Thanks for the development of this very interesting tool. I am trying to use it for assessing correlation between 16S rRNA microbiota composition and a numeric continuous variable and I am facing issues with the aldex.corr function.

Here is my session info:

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ALDEx2_1.14.1     ggplot2_3.1.0     phyloseq_1.26.1   readr_1.3.1       tidyr_0.8.2       tibble_2.0.1     
 [7] data.table_1.12.0 dplyr_0.8.0.1     vegan_2.5-4       lattice_0.20-38   permute_0

Then, following the aldex.corr description in the manual, I understand that I must first run aldex.clr and then pass its output to aldex.corr:

# Toy values
values <- round(runif(14,1,5), 2)
data(selex)
selex <- selex[1201:1600,]

## Correlation
x <- aldex.clr(selex, values)

But here is what I obtain:

Error in validObject(.Object) :
invalid class “aldex.clr” object: invalid object for slot "conds" in class "aldex.clr": got class "integer", should be or extend class "character"

Which can be fixed by

## Correlation
x <- aldex.clr(selex, as.character(values))

Yet this looks like a bad idea...

And while I'm at it, the following example from the manual does not work either; it throws the same error:

data(selex)
#subset for efficiency
selex <- selex[1201:1600,]
x <- aldex.clr(selex, 1:14)
corr.test <- aldex.corr(x)

Error in validObject(.Object) :
invalid class “aldex.clr” object: invalid object for slot "conds" in class "aldex.clr": got class "integer", should be or extend class "character"

Could you provide some guidance here? Thank you!

Valentin

confusion regarding "mc.samples" argument

Hello!

First off, thanks so much for putting together this incredibly useful package.

I was just hoping to get some clarification regarding the "mc.samples" argument in the initial clr transform function. I think I understand its significance, i.e. the number of Monte Carlo instances drawn from a Dirichlet distribution to estimate the underlying technical variation of the sample. However, I'm having difficulty seeing how to determine the proper amount of sampling for my data and the desired tests. The documentation says that "128 is sufficient for most cases", but that you only need 16 for ANOVA or up to 1000 for rigorous effect sizes. I've also found some literature that achieves good results with only 8 Monte Carlo instances, and shows no differences in results when comparing 8 vs. 128 instances.

With such a range of possibilities I hope you understand my confusion! I want to compare 2 groups, each with size 12, using a KW test. What would be the minimal number of Monte Carlo instances for such a dataset, assuming I want to optimize runtime while achieving accurate results?

Thanks so much!
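One empirical way to choose `mc.samples` is to check how much a summary statistic moves between independent runs at a given number of instances. The base-R sketch draws Dirichlet instances via normalized gamma variates with a 0.5 pseudocount (an assumption about the prior), converts them to clr values, and compares the run-to-run spread of per-feature medians at 8 versus 128 instances. The counts and helper names are made up; this illustrates the idea and is not ALDEx2's implementation.

```r
# Sketch: Monte Carlo clr instances from a Dirichlet posterior for one sample.
mc_clr <- function(counts, n_mc, prior = 0.5) {
  k <- length(counts)
  # Dirichlet draws via independent gamma variates, normalized per instance
  g <- matrix(rgamma(k * n_mc, shape = counts + prior), nrow = k)
  p <- sweep(g, 2, colSums(g), "/")
  # clr per instance: log2 proportions centred on their mean
  sweep(log2(p), 2, colMeans(log2(p)), "-")
}

set.seed(123)
counts <- c(0, 1, 3, 10, 50, 200, 1000)   # toy counts for one sample

# Spread of the per-feature median clr across repeated independent runs
median_spread <- function(n_mc, reps = 20) {
  meds <- replicate(reps, apply(mc_clr(counts, n_mc), 1, median))
  apply(meds, 1, sd)
}

round(median_spread(8), 3)
round(median_spread(128), 3)
```

If the spread at a candidate value is small relative to the effect sizes of interest, more instances mainly add runtime; low-count features typically show the largest spread.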

aldex.corr() giving correlation values outside -1 to 1

I previously used aldex.corr() with 1.14.0 (R3.6.2) to generate correlation values for 16S data v metabolomic data using the following code. See https://www.biorxiv.org/content/10.1101/866459v2 for the outputs.

sample.data <- t(otu_table(ps3)) #count data needed not relative abundances
wanted <- sample.data[,grep('FMT aged|FMT adult', mapping$Group)]
wanted <- wanted[-which(rowSums(wanted) == 0),] #remove zero sum rows (71 taxa, 11 samples)

conds <- as.character(mapping$Group[grep('FMT aged|FMT adult', mapping$Group)])

metab <- read.table('Autumn_metabolomics.txt', sep='\t', header=T, row.names=1) #71 metabolites, 11 samples
metab <- metab[-which(rowSums(metab) == 0), ]

x <- aldex.clr(wanted, conds, mc.samples=1000, denom="iqlr", verbose=TRUE) #Monte Carlo simulation

out <- list()
for (i in 1:nrow(metab)) {
print(i)
y <- aldex.corr(x, as.numeric(metab[i,]))
out[[i]] <- y
}

pearson.cor <- lapply(out, '[[', 1)
pear.cor <- Reduce(rbind, pearson.cor)
colnames(pear.cor) <- rownames(wanted)
rownames(pear.cor) <- rownames(metab)

pearson.BH <- lapply(out, '[[', 3)
pear.BH <- Reduce(rbind, pearson.BH)
pear.BH[pear.BH >= 0.1] <- NA
pear.BH[pear.BH < 0.1] <- '+'

The work was done on my work computer, which I don't have access to for obvious reasons. I've tried running new analyses with v1.18.10 and v1.20.0, but the correlations I'm getting look odd: the r values fall outside [-1, 1] for Pearson, and are even stranger for Spearman. I can't work out why. I've rerun the analyses shown above and the results look nothing like those I got previously. The only change I can see is that the output now contains only r, p and BH columns, rather than the previous columns (from colnames(out[[1]])):
##[1] "pearson.ecor" "pearson.ep" "pearson.eBH" "spearman.erho" "spearman.ep" "spearman.eBH"

Thanks
Lesley

could not find function "aldex.plot"

Hi,

using the latest release, ALDEx2_2.0.6.2, I get this error:

x.all <- aldex(otus, group1, mc.samples=128,test="t",verbose=TRUE)

aldex.plot(x.all, type="MA", test="welch", xlab="Log-ratio abundance",ylab="Difference")

Error in aldex.plot(x.all, type = "MA", test = "welch", xlab = "Log-ratio abundance", :
could not find function "aldex.plot"

Thanks.

ALDEx2 clr transformation

Hi,

Is it possible to take the data frame from the aldex.clr function and use it in other downstream analyses?
My concern is that, to do so, I removed the conds argument, since I do not want to introduce any metadata, and that raises two warnings:
no conditions provided: forcing denom = 'all'
no conditions provided: forcing conds = 'NA'

I think it is totally fine if I accept that it can just use all as denom.

Thanks and great tool!!

JMB

no "conditions" flag option in aldex.effect or aldex.ttest- update directions?

Hi, I am trying to use your package for the first time and following directions from this vignette (http://127.0.0.1:15937/library/ALDEx2/doc/ALDEx2_vignette.pdf) which I found via this command: browseVignettes("ALDEx2")

I don't see a way to add a conds = argument to supply a conditions vector to either aldex.ttest or aldex.effect. It looks like this is because the conditions vector is maintained as an element of the clr object after being supplied in aldex.clr.

If I'm interpreting this correctly, then it'd be helpful to update the vignette. For example, the directions for aldex.effect say that " There are four inputs: the aldex object
from aldex.clr, the vector of conditions ..."

Thanks! I'm looking forward to getting this package working for me :)

Feature request: enable modeling random effects

Hi, thanks for developing and maintaining this amazing tool for differential analysis of microbiome data. I've read in several papers that ALDEx2 is a good choice for this kind of analysis. I tried to use it, but my study design contains nested random effects, which prevents me from using ALDEx2. ALDEx2 would be even more useful if it could handle designs with random effects.

ALDEx2 for multiple (3) groups

Hi,

I am new to ALDEx2 and have looked at the tutorial. I was wondering whether I can perform an ALDEx2 analysis on three groups, or whether this tool is designed only to compare two groups at a time?

Thank you :)

aldex.glm with more complicated glm models

Hi,
I was wondering whether you are planning on including the option to work with more complicated glm models in aldex.glm, e.g. 2 factors, interaction terms, numerical factors, etc.?
I have tried to find my own solution for this issue using anova() with the option test = "F" instead of drop1 to get the p-values for each term in the model and adjusting the p-values per term over all genes. Do you think this approach is viable?

Thanks a lot!

Cheers,
Christiane
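The approach described, a per-feature `anova(..., test = "F")` followed by BH correction within each term across features, can be sketched in base R for a single set of clr values. Everything here is hypothetical (random clr values, two made-up factors A and B with an interaction); within ALDEx2 this would presumably be repeated for each Monte Carlo instance and the results averaged, which is not shown.

```r
# Sketch: per-term F-test p-values for each feature, BH-adjusted per term.
set.seed(11)
meta <- data.frame(A = factor(rep(c("a1", "a2"), each = 6)),
                   B = factor(rep(c("b1", "b2"), times = 6)))
clr <- matrix(rnorm(5 * 12), nrow = 5,
              dimnames = list(paste0("feature", 1:5), NULL))

# Rows = features, columns = model terms (A, B, A:B)
term_p <- t(apply(clr, 1, function(y) {
  fit <- glm(y ~ A * B, data = meta, family = gaussian())
  tab <- anova(fit, test = "F")                  # sequential F tests
  setNames(tab[-1, "Pr(>F)"], rownames(tab)[-1]) # drop the NULL row
}))

# Adjust each term's p-values across all features
term_bh <- apply(term_p, 2, p.adjust, method = "BH")
```

Note that `anova()` on a glm gives sequential (type I) tests, so with unbalanced designs the p-value for a term depends on its position in the formula.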

Error in data.t[grp1, ] - data.t[grp2, ] : non-conformable arrays

Hi,
I'm trying to understand the nature of this error, which occurs after running x.tt <- aldex.ttest(clr, paired.test=T, hist.plot=T). It only appears with paired.test=T.

I'm not sure what info I can provide you with, but the groups (study vs placebo) for which I test the condition are not of equal size. Also, observations are not independent, as we have two saliva samples per participant (at two points in time).

Does that mean a paired test is impossible to perform?
Thanks in advance!
