
Comments (7)

ggloor commented on August 31, 2024

Christiane

We do plan on adding more complex designs to ALDEx2, but are constrained by manpower. If you have a particular analysis in mind, I could see if we could incorporate it into the codebase. We just need to get the appropriate data out of the aldex.clr object.

g

On Oct 26, 2015, at 2:06 PM, Christiane Hassenrück [email protected] wrote:

Hi,
I was wondering whether you are planning on including the option to work with more complicated glm models in aldex.glm, e.g. 2 factors, interaction terms, numerical factors, etc.?
I have tried to find my own solution for this issue using anova() with the option test = "F" instead of drop1 to get the p-values for each term in the model and adjusting the p-values per term over all genes. Do you think this approach is viable?

Thanks a lot!

Cheers,
Christiane



from aldex2_dev.

chassenr commented on August 31, 2024

Hi,
I was using the same basic structure that you already set up for aldex.glm.
Below is my solution to the problem. I was working on a dataset where I had a 2-factorial design.

Cheers,
Christiane

```r
# input:
#   data.clr   - output from aldex.clr
#   formulaGLM - glm formula to be used, of the form e.g. "~ var1 + var2" or "~ var1 * var2"
#   env        - data.frame containing the input variables for formulaGLM (samples are rows)
# output:
#   list with 2 data.frames containing p-values and BH-adjusted p-values for
#   each term in the model formula (p.adjust run separately for each term)

aldex.glm2 <- function(data.clr, formulaGLM, env) {
  mc.instances <- numMCInstances(data.clr)
  feature.number <- numFeatures(data.clr)
  vars <- attributes(terms(as.formula(formulaGLM)))$term.labels
  nvars <- length(vars)
  feature.names <- getFeatureNames(data.clr)

  # one row per feature, one column per model term
  output <- list(p.glm = data.frame(matrix(NA, feature.number, nvars)),
                 BH.glm = data.frame(matrix(NA, feature.number, nvars)))
  colnames(output$p.glm) <- vars
  colnames(output$BH.glm) <- vars
  rownames(output$p.glm) <- feature.names
  rownames(output$BH.glm) <- feature.names

  # per-term p-value matrices: features x Monte Carlo instances
  output.mci <- vector("list", length = nvars)
  names(output.mci) <- vars
  for (i in 1:nvars) {
    output.mci[[i]] <- matrix(NA, feature.number, mc.instances)
  }

  for (mc.i in 1:mc.instances) {
    print(mc.i)  # progress
    # features x samples matrix for this Monte Carlo instance
    t.input <- sapply(getMonteCarloInstances(data.clr), function(y) { y[, mc.i] })
    for (i in 1:nrow(t.input)) {
      Y <- t.input[i, ]
      F1 <- as.formula(paste("Y", formulaGLM))
      GLM <- glm(F1, data = env)
      AOV <- anova(GLM, test = "F")
      # row 1 of the anova table is the NULL model, so start at row 2
      for (j in 2:nrow(AOV)) {
        output.mci[[rownames(AOV)[j]]][i, mc.i] <- AOV$"Pr(>F)"[j]
      }
    }
  }

  # average p-values over Monte Carlo instances, then BH-adjust per term
  for (i in names(output.mci)) {
    output$p.glm[, i] <- apply(output.mci[[i]], 1, mean)
    output$BH.glm[, i] <- p.adjust(output$p.glm[, i], method = "BH")
  }
  return(output)
}
```
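To make the per-term output concrete, here is a hypothetical call for a 2×2 design. The data frame and variable names (`var1`, `var2`) are illustrative, and `x` is assumed to be the result of `aldex.clr`:

```r
# Illustrative predictor table for a two-factor design (samples are rows)
env <- data.frame(var1 = factor(rep(c("A", "B"), each = 4)),
                  var2 = factor(rep(c("low", "high"), times = 4)))

# The term labels of the formula are the columns of the output data.frames
vars <- attributes(terms(as.formula("~ var1 * var2")))$term.labels
# vars is c("var1", "var2", "var1:var2")

# Hypothetical call, assuming 'x' is an aldex.clr object:
# res <- aldex.glm2(x, "~ var1 * var2", env)
# res$p.glm and res$BH.glm then each have one column per term in 'vars'
```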


chassenr commented on August 31, 2024

Sorry for the weird formatting... I am new to GitHub. I just copy-pasted the R script (including comments).


ggloor commented on August 31, 2024

No worries, I'm learning as well. So I'm guessing your code did not give you the result you needed?


chassenr commented on August 31, 2024

The code is working and the results seem to make sense. What I am worried about is whether the way I implemented the glm function (and p value adjustment) is correct. Was there a specific (statistical) reason why you only allowed for one categorical variable in aldex.glm and used drop1 to get the p value? Do you think that anova.glm with test = "F" will also give reliable p values? Is it valid to adjust the p values for each term in the model formula separately? I want to make sure that the genes (or in my case rather OTUs) that are detected as differentially abundant are not biased by false positives and that the test is robust.

Thanks!
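For context, the statistical difference between the two approaches under discussion can be sketched on simulated data (variable names are hypothetical; this is not ALDEx2 code): `anova()` performs sequential (Type I) tests, where each term is tested after the terms listed before it, so p-values can depend on term order; `drop1()` tests each term against the full model, and with an interaction present it only tests terms that are eligible to be dropped.

```r
set.seed(1)
env <- data.frame(var1 = gl(2, 10), var2 = runif(20))
Y <- rnorm(20)
fit <- glm(Y ~ var1 * var2, data = env)

# Sequential (Type I) F tests: row order follows the formula,
# first row is the NULL model
a <- anova(fit, test = "F")

# Marginal tests via single-term deletions: only the interaction
# var1:var2 can be dropped while its main effects remain
d <- drop1(fit, test = "F")
```

With balanced designs and orthogonal terms the two approaches agree more closely; the question of which is appropriate for unbalanced multi-factor data is exactly the one raised above.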


ggloor commented on August 31, 2024

Christiane

Ah, now I see your question.

Great. There is no (statistical) reason why we implemented the glm with only one categorical variable. But, I’m not sure why drop1 was used. Arianne Albert ([email protected]) wrote the glm function.

What I can say is that estimating technical variation with Dirichlet Monte Carlo instances, and converting all values to log-ratios, does three things.

First, normalization of read counts is taken care of by the use of a prior and the Dirichlet instances. Second, you are weeding out components that have a large stochastic contribution. Third, the log-ratio makes the differences between the components linear. Together these make the analyses much more robust than point estimates.
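As a minimal illustration of the prior and the log-ratio steps, one Dirichlet Monte Carlo instance with a centred log-ratio (CLR) transform can be sketched as follows. The counts and prior value are illustrative, and ALDEx2's own implementation differs in detail:

```r
# Read counts for one sample (illustrative); the zero is handled by the prior
counts <- c(geneA = 120, geneB = 0, geneC = 35, geneD = 500)
prior <- 0.5
alpha <- counts + prior

# One Dirichlet draw via independent gamma variates, normalized to sum to 1
p <- rgamma(length(alpha), shape = alpha)
p <- p / sum(p)

# Centred log-ratio: log abundance relative to the per-sample geometric mean
clr <- log2(p) - mean(log2(p))
```

Repeating the draw many times gives the Monte Carlo instances over which ALDEx2-style analyses average, so components whose values swing wildly between draws (large stochastic contribution) are down-weighted in the final result.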

g


chassenr commented on August 31, 2024

Hi again,
I tried emailing Arianne, but I received a message that my email couldn't be delivered because her inbox was full. Do you have another way of contacting her?

Thanks!

Cheers,
Christiane

