Git Product home page Git Product logo

divnet's People

Contributors

adw96 avatar ailurophilia avatar astrobiomike avatar bryandmartin avatar mariaavc avatar mooreryan avatar paulinetrinh avatar samhunter avatar svteichman avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

divnet's Issues

doSNOW not a requirement

Hi,

Just started playing with DivNet - looks great by the way! One small issue - the function MCmat has a requirement for doSNOW to be installed but doSNOW is only in suggests and so the code will not run in parallel unless it's installed. Not a big issue but it wasn't immediately apparent why the parallel code didn't run (and it makes a difference!)

Thanks for the great package!

Why is BW1 so different in the following 2 plots?

plot(divnet_phylum_name, color = "red") +
ylim(0,2) + ylab("Shannon estimate with DivNet")

lee_phylum %>%
otu_table %>%
apply(2, shannon) %>%
data.frame("shannon" = .) %>%
rownames_to_column %>%
ggplot(aes(x = rowname, y = shannon)) +
geom_point() +
theme_bw() +
ylim(0,2) + ylab("Naive Shannon estimate") + xlab("samples")

DivNet/betta post-hoc testing

I had a question about interpreting the output as well....say for this example I wanted to know if there was a significant difference between Lung and Liver, or Kidney and Lung. Is there an easy way to do some post-hoc testing with the betta output?

Originally posted by @mooreryan in #34 (comment)

simplifyBeta not generalisable

simplifyBeta(divnet_phylum_char, lee_phylum, "bray-curtis", "char")
throws an error.

Fix is to replace

 names(vars) <- physeq %>% 
    sample_data %>% 
    get_variable(c("X.SampleID")) 

with

names(vars) <- physeq %>% 
    sample_names

DivNet error bars

I had a question regarding how to describe error bars in DivNet figures.

Here are the relevant lines of code:

Variance is the var function:

  output_list[["shannon-variance"]]  <- list_of_fitted_models %>% 
    lapply(function(x) sapply(x$shannon, function(x) x$estimate)) %>%
    simplify2array %>%
    apply(1, var, na.rm = TRUE)

error is square root of variance, (so standard deviation)

output_list$shannon[[i]]$error <- sqrt(variance_estimates$`shannon-variance`)[i]

and the interval is twice the SD in both directions:

output_list$shannon[[i]]$interval <- c(output_list$shannon[[i]]$estimate - 2*output_list$shannon[[i]]$error,

So if I were to describe the error bars in a manuscript, what would be the best way? Something like: "error bars represent four standard deviations" or "error bars represent two standard deviations from the mean" or something else?

feature request: allow random effects in testDiversity

Hi,

I just read the recent works from your group on estimating microbial diversity and relative abundance with measurement errors and decided to use the methods proposed. I was not aware of the pitfalls on using the observed diversity metrics as if they were the real ones until I read the papers. Great works that you have done!

I want to compare the alpha diversity among groups with random effects. While I can use the breakaway::betta_random to compare the species richness , similar functions for comparing the Shannon and Simpson index are missing in the Divnet package. It'd be great to also allow modelling random effects in the testDiversity as well.

Meanwhile, can I use betta_random to do hypothesis testing for Shannon and Simpon index?

Yanxian

Understanding the use of "X" in divnet() function

Hi Amy,

I'm in Rob Knight's lab and excited to see (and use) your DivNet package — congrats on the in press acceptance! I have a question regarding the use of the covariate matrix ("X") in the divnet() function. The example given in your tutorial uses the Lee dataset, picking "char" (basalts of different characteristics) as the covariate. I understand that this is useful when sampling from different environments, but I'm trying to figure out if it would include samples from the same environment that are affected by external factors. A few examples:

  • Plots of soil sampled in winter versus plots of soil sampled in summer in the same region (X = season)
  • A series of gut microbiome samples from patients before and after the influence of antibiotics (X = antibiotic status)
  • Liver tissue samples of tumors that originate in the liver versus those that metastasized to the liver among several patients (X = tumor site of origin) (cf. Bullman et al., 2017. Science.)

Is it permissible to use these differences as covariates ("X") for the divnet() function (e.g. antibiotic vs. non-antibiotic gut samples) even though they're technically sampled from the same environment?

Thanks a ton!

set seed for vignette for reproducibility

Not all users are going to get the same results when they compile the vignette because we don't set a seed. Fix this and make a note to the user about why setting seeds is important.

Paired samples using betta_random/testDiversity

Hi,

I'm still diving deeper into your wonderful packages for estimating alpha diversity of microbiome data.

I have paired data from a time course experiment (three time points) testing three different products (skin swabs). For each individual, all three products were samples at all three time points.

So far, I estimated Shannon diversity using divnet() and now I would like to test whether there are differences between time points for each product. Usually, I used non-parametric Wilcox tests to perform all comparisons needed (with p-value adjustment for multiple testing). I'm wondering whether I can use betta_random as well to do the statistical testing.

Here is my piece of code but I'm not sure whether the more traditional way (Wilcox test) is more suitable in this case:

dv_ps_cyp <- x %>% divnet(X = "Time", base = "ASV1", ncores = 10)

ds <- dv_testing_asv1$shannon %>%
summary %>%
add_column("SampleNames" = x %>% otu_table %>% sample_names,
"Time" = x %>% sample_data %>% .$Time,
"Product" = x %>% sample_data %>% .$Product) %>%
arrange(Time, Product)

betta_random(chats = ds$estimate,
ses = ds$error,
X = model.matrix(~Time*Product, data = ds),
groups=ds$SampleNames) %>% .$table

Thanks in advance for any advice!

Best,
Axel

Converting to Exponential Shannon

Hi,

Apologies in advance for the mathematically-ignorant question.

To convert to exponential Shannon estimates, can I just do exp(estimate) and exp(error)? Or does exponential Shannon need to be estimated from the beginning to properly estimate error?

Thanks,
Ian

'glue::collapse' is deprecated. Use 'glue_collapse' instead.

When running the following;
simplifyBeta(dv_ps_locB, ps, "bray-curtis", "Sample_Type")

I get the following warning:
Error: Unknown columns 282, 283, 284, 285, 286 and ...
In addition: Warning message:
'glue::collapse' is deprecated.
Use 'glue_collapse' instead.
See help("Deprecated") and help("glue-deprecated").

While trying to sort out the "Error: Unknown columns 282, 283, 284, 285, 286 and ..." part of the error, which I think is a separate issue, I discovered that the glue package has a deprecated function. Version 1.2.0 of glue uses "collapse" and glue 1.3.0 uses "glue_collapse".

Don't know that its an urgent issue at the moment as the code still ran, but just FYI.

Problems with lams parameters

### Amy's script for testing DivNet

library(devtools)
library(phyloseq)
library(ggplot2)
library(magrittr)
library(breakaway); packageVersion("breakaway") # 4.6.16
library(tidyverse)
library(speedyseq)
library(DivNet); packageVersion("DivNet") # 0.3.3

data(Lee)

lee_phylum <- speedyseq::tax_glom(Lee, taxrank="Phylum") 


library(glasso); packageVersion("glasso") # 1.11
library(pulsar); packageVersion("pulsar") # 0.3.6
library(SpiecEasi); packageVersion("SpiecEasi") # 1.0.7
library(huge); packageVersion("huge") # 1.3.4

DivNet:::stars 
dv_stars <- lee_phylum %>% 
  divnet(X="type", ncores = 4, network="stars", variance="none")
dv_stars

"X" vs formula confusion

Dear Amy,

First of all - many thanks for producing great tools to You and Your team! : ]

As in the title, Im having problem with understanding X and formula parameters:

  1. In the divnet function description You write that X parameter is "The covariate matrix, with samples as rows and variables as columns". Yet, in the vignette, the actual value passed into the parameter is a string (in phyloseq object - a column name in sample data). I understand that some sub-setting is being done here, but still...

  2. From both vignette and Your answer to this issue: #53 it seemed to me as if "X" parameter was being used in the same way as formula (i.e. You provide examplary X value: X = season + plot). Yet, formula is a separate parameter...

  3. Reading the divnet function code it seems to me, that when using formula, the X value is ignored. This is because at the beginning of the function, You check:

  if (!is.null(formula)) {
    if ("phyloseq" %in% class(W)) {
      X <- data.frame(phyloseq::sample_data(W))
      X <- stats::model.matrix(object = formula, data = X)
    }

Which I think means, that if formula is provided, X is produced based only on W and formula. Are these two parameters exclusive?

Perhaps clarification of these concepts would be beneficial for users such as myself - with poor statistical knowledge and mediocre coding skills : ]

Thank You and best wishes,
Adrian

feature request: Plot Diversity Estimates per Factor

Hi,
I recently started to use DivNet to estimate Shannon for microbiome samples.
I'm wondering whether it is possible to plot only the diversity estimate (e.g. Shannon) just for the factors (and not all individuals). Basically, I could restrict the plot myself by selecting just one sample per group and do the plot based on the estimate and error but I think it would by have such an option in the plot.diversityEstimates() command already implemented.

Best regards,
Axel

Covariate Modeling

Quick question,

is it possible include multiple covariates in DivNet?
Ex. I have five location but i have a sample from pond and a swamp from each location and want to compare the diversity between ponds and swamps controlling for the location.

Thank you!

Verifying divnet

Question regarding output, if I'm getting a p-value of 0, I'm assuming that's implying that the p-value is just extremely small and the software just concatenated it to 0?

Example
`

Hypothesis testing:
p-value for global test: 0
Estimates Standard Errors p-values
(Intercept) 1.8138791 0.04773023 0
predictorsnonIBD 0.6719023 0.11433915 0

Relationship between Bray-Curtis distance and Shannon div estimates

Hi Amy,

I hope this is an appropriate place to raise a question regarding the vignette.

In the DivNet vignette, it said:

For glassy and water samples, the estimated diversity is in the middle of the individual diversity estimates, while for altered samples the estimated diversity is lower. That's because the water and glassy samples are much more similar to each other in terms of relative abundances than the altered samples, as we can see from the distribution of Bray-Curtis distances between the samples

However, in the estimates of Bray-Curtis distance plot below, glassy.water interaction showed the most distance, while altered.altered showed the least distance. Doesn't that indicate exactly the opposite to the statement above i.e. altered samples are quite similar to each other while glassy vs. water are quite different? Unless you meant glassy.glassy and water.water are quite similar, but those comparisons were not shown.

If possible, could you please clarify the relationship between similarity between samples (Bray-Curtis distance) and the resulting estimated Shannon diversity?

Thank you for all the hard work into DivNet and I am eager to implement it into my future projects!

Kevin

Tracking accept values in the EM loop

While working on #45 I found that the acceptance values were being stored in accept_list but that they weren't actually being used anywhere (as far as I can tell).

It's created here:

accept_list <- matrix(0, nrow = EMiter, ncol = N)

And updated here during the EM iterations:

accept_list[em,] <- accepts

I can't seem to find that being used anywhere....is it necessary to track them like this?

Release of the package

Hi! Many thanks for providing these methods in such an easy way as this package does!.

Is there any plan to release them in any repository like Bioconductor or CRAN?
I've seen that breakaway was on CRAN for some years but was later removed. This would make possible for other packages on CRAN and Bioconductor to use these methods.
This would also imply that all the dependencies would need to be on CRAN. Not sure if there is room for such move.

SVD failed; sigma is error

hello it's me again breaking things.

I was doing a test run on my data with the base taxon set to the taxa with the highest abundance and this is the error I received

SVD failed; sigma is
followed by a long string of numbers of this form

            [,1]        [,2]        [,3]         [,4]         [,5]         [,6]        [,7]         [,8]        [,9]      [,10]       [,11]
  [1,] 17.27093822 -0.68898413 -0.19269662 -4.162210948 -3.091394414  1.462428498  3.92078759 -1.856796600  1.20663115  6.2004287  2.45530317

and then

Error in default_network(sigma) :

Help? I'm guessing the decomposition broke down somewhere

Shannon returned all the same value

Hallo, I'm trying out the DivNet to calculate shannon diversity.
I put directly the raw counts (without using phyloseq).
I got the exact same estimation of diversity for all the samples (including error) which I find suspicious. Any Idea.

alpha_estimate class

adw96/breakaway now has this badass S3 class alpha_estimate, with sweet printing functions and plotting functions. Convert the output of DivNet into this class.

Boxplot alpha diversity?

Hi,

Sorry for my silly question, but how would you boxplot the output of alpha diversity (ie. Shannon) from DivNet?

Thanks!

Partial divnet() and Hypothesis Testing with a Random Effect (and Potentially an Interaction)

Hi,

First of all, a big thank you and a bow for maintaining such amazing packages as breakaway and DivNet.

I have a two-part question, however. Firstly, if I am only interested in estimating Shannon's and Simpson's diversities, is there a way to do that with the divnet() function? Currently, it is taking me too long to calculate all the estimates and it mostly errors due to memory overload.

Secondly, if I wanted to carry out some hypothesis testing with these estimates, what would be the function command for that? I want to test the estimate with an interaction (is that even possible?) and a random effect, i.e. Shannon ~ Organ*Health.Status + (1|Individual.ID).

Thanks so much for your time!

Liisa

DivNet runtime

Do you have any info on the runtime/memory usage of the divnet function? In the vignettes it says it takes a while with lots of taxa, but I was wondering if you had any more specific info.

For example, I was trying to run it on data ~300 samples by ~10,000 taxa/ASVs with covariates, but it kept crashing. Is there a limit on taxa or samples that we should keep in mind?

No taxa observed in all samples? error

Hello Dr. Willis,

I was fiddling around with Divnet and my phyloseq dataset and attempted to run it at the ASV level as well as the Genus level. I both cases I obtained the following error

Error in pick_base(W) : Yikes! No taxa observed in all samples!
Pick which taxon is to be the base

First, how would one go about picking a taxon to be the base and second, why do I need one?

Best,
Nur Shahir

nonparametric variance issue

set.seed(1)
my_counts <- matrix(rpois(30, lambda=10), nrow = 6)
my_counts
my_covariate <- cbind(1, rep(c(0,1), each = 3), rep(c(0,1), 3))
my_covariate

roxygen2::roxygenise(package_directory)
build(package_directory)
install(package_directory)
library(DivNet)

divnet(my_counts, my_covariate, variance="nonparametric")

Verifying DivNet getting started vignette

I am currently working through the DivNet getting started vignette and seem to be getting different p-values for hypothesis testing than those in the tutorial.

estimates <- divnet_phylum_char$shannon %>% summary %$% estimate`
ses <- sqrt(divnet_phylum_char$`shannon-variance`)
X <- breakaway::make_design_matrix(lee_phylum, "char")
betta(estimates, ses, X)$table

Output:

                     Estimates Standard Errors p-values
(Intercept)             1.34699686      0.02031768    0.000
predictorsbiofilm      -0.91477148      0.08798129    0.000
predictorscarbonate    -0.06179152      0.11506678    0.591
predictorsglassy       -0.21181413      0.04741169    0.000
predictorswater         0.11174642      0.10319277    0.279

Based on the vignette there should be no significant differences between altered and glassy (p-value of 0.53) but I am getting a p-value of 0. Not sure where I may be going wrong.

Many thanks!

Lara

Understanding the use of "X" in divnet() function

Hi Amy,

I'm in Rob Knight's lab and excited to see (and use) your DivNet package — congrats on the in press acceptance! I have a question regarding the use of the covariate matrix ("X") in the divnet() function. The example given in your tutorial uses the Lee dataset, picking "char" (basalts of different characteristics) as the covariate. I understand that this is useful when sampling from different environments, but I'm not entirely if it would include samples from the same environment that are affected by external factors. A few examples:

  • A plot of soil sampled in winter versus that same soil sampled in summer
  • The gut microbiome under the influence of antibiotics versus a gut microbiome without antibiotics
  • Liver tissue samples of tumors that originate in the liver versus those that metastasize to the liver (e.g. Bullman et al., 2017. Science.)

Is it permissible to use these differences as covariates ("X") for the divnet() function (e.g. antibiotic vs. non-antibiotic gut samples) even though they're technically sampled from the same environment?

Thanks a ton!

nonparametric variance wrong size

divnet(my_counts, my_covariate, variance="nonparametric", nsub=7)
divnet(my_counts, my_covariate, variance="nonparametric", nsub=3)

fix link to tutorial

From README.md: "See here for a full tutorial, and the paper for more details." has broken links. FIX!

Speeding up run time on wide datasets

Moved over from twitter.

I'm trying to run divnet on ASVs with a dataset of 44 samples and 19,921 ASVs. No ASVs appear in all samples so I've chosen a reference ASV that is present in 42 of the 44 indicated by ref_otu. I'm also leaving X = NULL with no design matrix so I'm just trying to estimate diversity and confidence intervals for each sample. physeq is my phyloseq object. If I run this on a cluster with 28 cores and 128 GB of memory, I don't see any progress after ~30 minutes. Running locally on my 4 core, 16 GB machine it crashes, I think because it runs out of memory. Function call below:

asv_div <- divnet(physeq, ncores = 28, base = ref_otu)

Thank you for the help on this!

R warning

I get the following warning when trying to run DivNet, even with the example data

Warning message in if (class(test) == "try-error") {:
“the condition has length > 1 and only the first element will be used”
Warning message in if (class(outcome) == "try-error") {:
“the condition has length > 1 and only the first element will be used”
Warning message in if (class(outcome) == "try-error") {:
“the condition has length > 1 and only the first element will be used”
Warning message in if (class(outcome) == "try-error") {:
“the condition has length > 1 and only the first element will be used”
Warning message in if (class(outcome) == "try-error") {:
“the condition has length > 1 and only the first element will be used”

Numbered samples cause error

Hi Amy,

When running this command:
simplifyBeta(dv_ps_st, ps, "bray-curtis", "Sample_Type")

I get this error:
Error: Unknown columns 282, 283, 284, 285, 286 and ...

I think its coming from lines 27 & 31 in the beta_diversity_helper.R file. If you have numbered samples it puts an X in front of the number sample when it makes a dataframe and then it can't match names anymore.

dv_ps_locB[[measure]] %>%
data.frame

     X282      X283      X284      X285      X286      X287      X288

282 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000

If you change lines 27 & 31 from just data.frame to data.frame(check.names =FALSE) it won't put the X in front of the number and the script runs on my end. Happy to send my phyloseq object over to make sure it works on your end.

packageVersion("DivNet")
[1] ‘0.3.0’

2 taxa

W <- matrix(rpois(16, 10), nrow=4)
W
divnet(W)
w2 <- W
w2[,4] <- 0
w2
divnet(w2)
w2[,3] <- 0
w2
divnet(w2)

2 taxa should be possible -- overdispersed binomial is just as legitimate. TODO

default X is specified differently for phyloseq W and matrix W

I noticed a discrepancy in the results of divnet when I use a phyloseq object vs just a plain matrix.

Using the data from your vignette:

library(phyloseq)
library(breakaway)
library(DivNet)
library(magrittr)

data(Lee)
lee_phylum <- Lee %>%
  tax_glom(taxrank = "Phylum") %>%
  subset_samples(temp != "<NA>")

So, lee_phylum is a phyloseq object. If we use that as input to divnet with X = NULL (default):

divnet_phyloseq <- divnet(W= lee_phylum, base = 1)
summary(divnet_phyloseq$shannon)
# A tibble: 14 x 7
   estimate   error lower upper sample_names name   model    
      <dbl>   <dbl> <dbl> <dbl> <chr>        <chr>  <chr>    
 1    1.82  0.0177  1.79  1.86  BW1          DivNet Aitchison
 2    1.08  0.0328  1.02  1.15  BW2          DivNet Aitchison
 3    1.12  0.0235  1.07  1.16  R10          DivNet Aitchison
 4    1.01  0.0145  0.978 1.04  R11          DivNet Aitchison
 5    0.435 0.0233  0.389 0.482 R11BF        DivNet Aitchison
 6    1.41  0.0171  1.38  1.44  R1A          DivNet Aitchison
 7    1.38  0.0142  1.35  1.41  R1B          DivNet Aitchison
 8    1.29  0.00983 1.27  1.31  R2           DivNet Aitchison
 9    1.29  0.00654 1.27  1.30  R3           DivNet Aitchison
10    1.37  0.0124  1.34  1.39  R4           DivNet Aitchison
11    1.32  0.00981 1.31  1.34  R5           DivNet Aitchison
12    1.43  0.00805 1.41  1.45  R6           DivNet Aitchison
13    1.43  0.0152  1.40  1.46  R8           DivNet Aitchison
14    1.11  0.0155  1.08  1.14  R9           DivNet Aitchison

However, if we use the same otu table as a matrix:

mat <- t(otu_table(lee_phylum)@.Data)
divnet_matrix <- divnet(W=mat, base = 1)
summary(divnet_matrix$shannon)
# A tibble: 14 x 7
   estimate  error lower upper sample_names name   model    
      <dbl>  <dbl> <dbl> <dbl> <chr>        <chr>  <chr>    
 1     1.07 0.0761 0.913  1.22 BW1          DivNet Aitchison
 2     1.07 0.0761 0.913  1.22 BW2          DivNet Aitchison
 3     1.07 0.0761 0.913  1.22 R10          DivNet Aitchison
 4     1.07 0.0761 0.913  1.22 R11          DivNet Aitchison
 5     1.07 0.0761 0.913  1.22 R11BF        DivNet Aitchison
 6     1.07 0.0761 0.913  1.22 R1A          DivNet Aitchison
 7     1.07 0.0761 0.913  1.22 R1B          DivNet Aitchison
 8     1.07 0.0761 0.913  1.22 R2           DivNet Aitchison
 9     1.07 0.0761 0.913  1.22 R3           DivNet Aitchison
10     1.07 0.0761 0.913  1.22 R4           DivNet Aitchison
11     1.07 0.0761 0.913  1.22 R5           DivNet Aitchison
12     1.07 0.0761 0.913  1.22 R6           DivNet Aitchison
13     1.07 0.0761 0.913  1.22 R8           DivNet Aitchison
14     1.07 0.0761 0.913  1.22 R9           DivNet Aitchison

^ This result appears to be incorrect, as all the estimates are the same (?). I traced the discrepancy back to how X is specified differently for a phyloseq object: https://github.com/adw96/DivNet/blob/master/R/divnet_main.R#L72
and how it is specified for a generic matrix: https://github.com/adw96/DivNet/blob/master/R/divnet_main.R#L91

If I use the phyloseq specification for X with the matrix, then I get the same result as with a phyloseq object:

xx <- row.names(mat)
X <- model.matrix(~xx)
divnet_matrix_X <- divnet(W=mat, X = X, base = 1)
summary(divnet_matrix_X$shannon)
# A tibble: 14 x 7
   estimate   error lower upper sample_names name   model    
      <dbl>   <dbl> <dbl> <dbl> <chr>        <chr>  <chr>    
 1    1.82  0.0170  1.79  1.86  BW1          DivNet Aitchison
 2    1.08  0.0466  0.988 1.17  BW2          DivNet Aitchison
 3    1.12  0.00693 1.10  1.13  R10          DivNet Aitchison
 4    1.01  0.0152  0.977 1.04  R11          DivNet Aitchison
 5    0.435 0.00909 0.417 0.453 R11BF        DivNet Aitchison
 6    1.41  0.0197  1.37  1.45  R1A          DivNet Aitchison
 7    1.38  0.0147  1.35  1.41  R1B          DivNet Aitchison
 8    1.29  0.00968 1.27  1.31  R2           DivNet Aitchison
 9    1.29  0.00927 1.27  1.31  R3           DivNet Aitchison
10    1.37  0.00703 1.36  1.38  R4           DivNet Aitchison
11    1.32  0.0103  1.30  1.35  R5           DivNet Aitchison
12    1.43  0.0205  1.39  1.47  R6           DivNet Aitchison
13    1.43  0.0129  1.40  1.45  R8           DivNet Aitchison
14    1.11  0.0202  1.07  1.15  R9           DivNet Aitchison

I assume, in practice, we would want to use the latter, and the current implementation is just a bug?

Thanks very much for your very useful research and R packages.

full phyloseq object and otu_table give different results

Probably something to do with how otu_table are transposed by default

library(devtools)
library(tidyverse)
library(phyloseq)
install_github("adw96/DivNet")
library(DivNet)
packageVersion("DivNet") # 0.3.2

data("GlobalPatterns")
# since this is just an illustration, just get the most abundant 5 taxa
top_five_taxa = names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE)[1:5])

# don't specify covariates
dv <- GlobalPatterns %>% prune_taxa(top_five_taxa, .) %>% divnet 
dv$fitted_z 

# this explicitly strips the sample_data
dv2 <- GlobalPatterns %>% prune_taxa(top_five_taxa, .) %>% otu_table %>% divnet 
dv2$fitted_z 

rownames as default covariate

Make rownames the default X variable

`xx <- sample_data(lee_phylum) %>% rownames
yy <- model.matrix(~xx)

divnet_phylum_name <- lee_phylum %>%
divnet(X = yy, ncores = 4)`

Bray-curtis statistical test

Hello there!

I wanted to know if there is a way to test with betta changes in the bray-curtis dissimilarity measures across an environmental parameter. I have tried this:

lee_phylum <- tax_glom(Lee, taxrank="Phylum")
divnet_phylum <- lee_phylum %>%
  divnet(ncores = 4)

estimates <- simplifyBeta(divnet_phylum,
                          lee_phylum,
                          "bray-curtis", "char") %>% pull(beta_est)
ses <- sqrt(simplifyBeta(divnet_phylum,
                         lee_phylum,
                         "bray-curtis", "char") %>% pull(beta_var))

bt.bray <- betta(estimates, ses, divnet_phylum$X)$table

I guess that beta diversity is a dissimilarity measurement between two samples and not a measurement of one sample and therefore you cannot predict a change with the betta approach. What would be the best way to statistically test that there is more differences between different conditions?

Additionally, when I compared the bray curtis dissimilarities in my data, I observe different values between the comparison winter.autumn and the comparison autumn.winter. Does this makes sense? I would expect an identical violin plot for these cases.

image

My intention is to know which are the most dissimilar seasons and obtain an estimate of the general dissimilarity for each case.

Thanks a ton for everything!

Error in default_network(sigma) :

I'm getting this "Error in default_network(sigma)" that I was trying to figure out but figured it'd probably be a lot faster to ask one of you @adw96 or @bryandmartin.

I'm using fit_aitchison on mydata.txt which is a 73x1271 data frame (73 samples, 1271 ASV's) of relative abundances.

my_x.txt
mydata.txt

this code should produce the error
my_w <- read.table("mydata.txt", sep="\t")
my_x <- read.table("my_x.txt", sep="\t")
divnet_estimates <- DivNet::fit_aitchison(my_w, base = 1040, X = my_x)

what am I doing wrong? :(

Interpretation of betta() output

Thank you so much, Amy, this has been really helpful!

For other users, my data set is 350 samples * 7100 ASVs. Still struggling to get divnet() to work due to memory overload, but attempting to run this on a server.

The betta_random() works like a charm, according to your shared vignette. Just one question about the output.

results <- betta_random(chats = estimated_richness$estimate,
ses = estimated_richness$error,
X = model.matrix(~Organ*Health.Status, data = estimated_richness),
groups=estimated_richness$Individual.ID)

For clarification:
Organ: heart, liver, lung, kidney
Health.Status: healthy, symptomatic

Example

Are the first estimates for healthy organs and then the Health.Status:Symptomatic is for heart*symptomatic?

Thanks again!

Originally posted by @liisaveerus in #31 (comment)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.