
Comments (7)

ggloor commented on August 31, 2024

Christiane

We do plan on adding more complex designs to ALDEx2, but are constrained by manpower. If you have a particular analysis in mind, I could see if we could incorporate it into the codebase. We just need to get the appropriate data out of the aldex.clr object.

g

On Oct 26, 2015, at 2:06 PM, Christiane Hassenrück [email protected] wrote:

Hi,
I was wondering whether you are planning on including the option to work with more complicated glm models in aldex.glm, e.g. 2 factors, interaction terms, numerical factors, etc.?
I have tried to find my own solution for this issue using anova() with the option test = "F" instead of drop1 to get the p-values for each term in the model and adjusting the p-values per term over all genes. Do you think this approach is viable?

Thanks a lot!

Cheers,
Christiane



from aldex2_dev.

chassenr commented on August 31, 2024

Hi,
I was using the same basic structure that you already set up for aldex.glm.
Below is my solution to the problem. I was working on a dataset where I had a 2-factorial design.

Cheers,
Christiane

```r
# input:
#   data.clr   - output from aldex.clr
#   formulaGLM - glm formula to be used, of the form e.g. "~ var1 + var2" or "~ var1 * var2"
#   env        - data.frame containing the input variables for formulaGLM (samples are rows)
# output:
#   list with 2 data.frames containing p-values and BH-adjusted p-values for
#   each term in the model formula (p.adjust run separately for each term)

aldex.glm2 <- function(data.clr, formulaGLM, env) {
  mc.instances <- numMCInstances(data.clr)
  feature.number <- numFeatures(data.clr)
  vars <- attributes(terms(as.formula(formulaGLM)))$term.labels
  nvars <- length(vars)
  feature.names <- getFeatureNames(data.clr)

  # one row per feature, one column per model term
  output <- list(p.glm = data.frame(matrix(NA, feature.number, nvars)),
                 BH.glm = data.frame(matrix(NA, feature.number, nvars)))
  colnames(output$p.glm) <- vars
  colnames(output$BH.glm) <- vars
  rownames(output$p.glm) <- feature.names
  rownames(output$BH.glm) <- feature.names

  # per-term p-value matrices: features x Monte Carlo instances
  output.mci <- vector("list", length = nvars)
  names(output.mci) <- vars
  for (i in 1:nvars) {
    output.mci[[i]] <- matrix(NA, feature.number, mc.instances)
  }

  for (mc.i in 1:mc.instances) {
    print(mc.i)  # progress
    # features x samples matrix for this Monte Carlo instance
    t.input <- sapply(getMonteCarloInstances(data.clr), function(y) { y[, mc.i] })
    for (i in 1:nrow(t.input)) {
      Y <- t.input[i, ]
      F1 <- as.formula(paste("Y", formulaGLM))
      GLM <- glm(F1, data = env)
      AOV <- anova(GLM, test = "F")
      # row 1 of the anova table is the NULL model, so start at row 2
      for (j in 2:nrow(AOV)) {
        output.mci[[rownames(AOV)[j]]][i, mc.i] <- AOV$"Pr(>F)"[j]
      }
    }
  }

  # average p-values over Monte Carlo instances, then BH-adjust per term
  for (i in names(output.mci)) {
    output$p.glm[, i] <- apply(output.mci[[i]], 1, mean)
    output$BH.glm[, i] <- p.adjust(output$p.glm[, i], method = "BH")
  }
  return(output)
}
```
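To make the per-term output concrete, here is a hypothetical call for a 2×2 design. The data frame and variable names (`var1`, `var2`) are illustrative, and `x` is assumed to be the result of `aldex.clr`:

```r
# Illustrative predictor table for a two-factor design (samples are rows)
env <- data.frame(var1 = factor(rep(c("A", "B"), each = 4)),
                  var2 = factor(rep(c("low", "high"), times = 4)))

# The term labels of the formula are the columns of the output data.frames
vars <- attributes(terms(as.formula("~ var1 * var2")))$term.labels
# vars is c("var1", "var2", "var1:var2")

# Hypothetical call, assuming 'x' is an aldex.clr object:
# res <- aldex.glm2(x, "~ var1 * var2", env)
# res$p.glm and res$BH.glm then each have one column per term in 'vars'
```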


chassenr commented on August 31, 2024

Sorry for the weird formatting... I am new to GitHub. I just copy-pasted the R script (including comments).


ggloor commented on August 31, 2024

No worries, I'm learning as well. So I'm guessing your code did not give you the result you needed?


chassenr commented on August 31, 2024

The code is working and the results seem to make sense. What I am worried about is whether the way I implemented the glm function (and p value adjustment) is correct. Was there a specific (statistical) reason why you only allowed for one categorical variable in aldex.glm and used drop1 to get the p value? Do you think that anova.glm with test = "F" will also give reliable p values? Is it valid to adjust the p values for each term in the model formula separately? I want to make sure that the genes (or in my case rather OTUs) that are detected as differentially abundant are not biased by false positives and that the test is robust.

Thanks!
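For context, the statistical difference between the two approaches under discussion can be sketched on simulated data (variable names are hypothetical; this is not ALDEx2 code): `anova()` performs sequential (Type I) tests, where each term is tested after the terms listed before it, so p-values can depend on term order; `drop1()` tests each term against the full model, and with an interaction present it only tests terms that are eligible to be dropped.

```r
set.seed(1)
env <- data.frame(var1 = gl(2, 10), var2 = runif(20))
Y <- rnorm(20)
fit <- glm(Y ~ var1 * var2, data = env)

# Sequential (Type I) F tests: row order follows the formula,
# first row is the NULL model
a <- anova(fit, test = "F")

# Marginal tests via single-term deletions: only the interaction
# var1:var2 can be dropped while its main effects remain
d <- drop1(fit, test = "F")
```

With balanced designs and orthogonal terms the two approaches agree more closely; the question of which is appropriate for unbalanced multi-factor data is exactly the one raised above.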


ggloor commented on August 31, 2024

Christiane

Ah, now I see your question.

Great. There is no (statistical) reason why we implemented the glm with only one categorical variable. But, I’m not sure why drop1 was used. Arianne Albert ([email protected]) wrote the glm function.

What I can say is that estimating technical variation with Dirichlet Monte Carlo instances, and converting all values to log-ratios, does three things.

First, normalization of read counts is taken care of by the use of a prior and the Dirichlet instances. Second, you are weeding out components that have a large stochastic contribution. Third, the log-ratio makes the differences between the components linear. Together these make the analyses much more robust than point estimates.
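As a minimal illustration of the prior and the log-ratio steps, one Dirichlet Monte Carlo instance with a centred log-ratio (CLR) transform can be sketched as follows. The counts and prior value are illustrative, and ALDEx2's own implementation differs in detail:

```r
# Read counts for one sample (illustrative); the zero is handled by the prior
counts <- c(geneA = 120, geneB = 0, geneC = 35, geneD = 500)
prior <- 0.5
alpha <- counts + prior

# One Dirichlet draw via independent gamma variates, normalized to sum to 1
p <- rgamma(length(alpha), shape = alpha)
p <- p / sum(p)

# Centred log-ratio: log abundance relative to the per-sample geometric mean
clr <- log2(p) - mean(log2(p))
```

Repeating the draw many times gives the Monte Carlo instances over which ALDEx2-style analyses average, so components whose values swing wildly between draws (large stochastic contribution) are down-weighted in the final result.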

g


chassenr commented on August 31, 2024

Hi again,
I tried emailing Arianne, but I received a message that my email couldn't be delivered because her inbox was full. Do you have another way of contacting her?

Thanks!

Cheers,
Christiane

