ax3man / phylopath

Perform phylogenetic path analysis in R.

Home Page: https://ax3man.github.io/phylopath/

R 100.00%

analysis comparative-methods path phylogenetics r rstats

phylopath's Issues

Remove import of guide_train from ggraph

This package imports and reexports guide_train.edge_colourbar from ggraph. As guide_train is now a proper exported S3 generic in ggplot2 the specific methods are no longer exported and can thus not be imported.

Can I get you to fix this and submit to CRAN?

phylopath/R/internal.R

Line 265 in b260035

ggraph::guide_train.edge_colourbar

Error in number of dimensions with vignette example

Hi, I am running the example described here: https://cran.r-project.org/web/packages/phylopath/vignettes/intro_to_phylopath.html. When I run results <- phylo_path I get this error message:
Error in s[nrow(s), ncol(s)] : incorrect number of dimensions

Any thoughts?
(I first tried with my own data and had the same problem, so I decided to try the example. I restarted RStudio and the problem persists.)

Integration of sensiPhy functions

A future enhancement to Phylopath - if it was technically possible - would be the integration of the functions in the sensiPhy package to provide information on the sensitivity of path models to sample, data, and phylogenetic uncertainty. Like Phylopath, sensiPhy uses the phylolm package, which might make integration easier. A challenge would be providing summary information on sensitivity for a whole path model, though even generating a table that presents the sensitivity information for each individual regression in the model in one place would be helpful.

Issues comparing models with phylo_path

Hi,

I'm using the phylo_path function to compare models (using the logistic_IG10 method). I'm getting errors saying that it can't fit a model with two particular variables in it, but I've checked and checked again and there are no models in the set that contain both these variables together, even via indirect links. I think I must have misunderstood what phylo_path is doing - I thought it was running each model independently, but is each model also using the other models in the set?

Many thanks

When gls2 fails, an informative error should be returned.
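
One way to make a failed fit informative is to catch the error and re-raise it together with the formula that was being fitted. The sketch below uses base R only; `fit_with_context` is a hypothetical helper, and `stats::lm` stands in for the real fitting function, so this is an illustration of the idea rather than phylopath's internals.

```r
# Hypothetical helper: wrap a fitting function so that any error it raises
# is reported together with the model formula that triggered it.
fit_with_context <- function(formula, data, fit_fun = stats::lm) {
  tryCatch(
    fit_fun(formula, data = data),
    error = function(e) {
      stop(
        "Fitting the following model:\n    ", deparse(formula),
        "\nproduced this error:\n    ", conditionMessage(e),
        call. = FALSE
      )
    }
  )
}

# Toy data; `missing_var` is deliberately absent so the fit fails.
d <- data.frame(y = c(1, 2, 3), x = c(2, 4, 7))
msg <- tryCatch(fit_with_context(y ~ missing_var, d), error = conditionMessage)
cat(msg, "\n")
```

The message names the offending regression, which makes it easier to see which part of a model set is responsible.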

automating model generation

Hello again,

I've been trying to automate the model generation procedure so that all possible causal models, except for those that are either cyclic or fully connected, can be generated for some given number of variables without having to enter each model manually. Using the rhino data set to illustrate the point (and with three variables only), the main code looks like this:

library(phylopath)
library(gtools)

Nvar <- 3  #number of variables
#detach("package:ggm") # Disable ggm package because it masks DAG() function in 'phylopath::DAG'
dagobj <-  DAG(BM ~ LS, LS ~ DD) # generate a DAG with the three variables of interest
dagobj[1:Nvar,1:Nvar] <- matrix(0,Nvar,Nvar) # preset DAG matrix entries to 0

# --- Generate all possible binary strings, in order to allocate to DAG matrices
zerosones <- permutations(2,Nvar^2,v=c(0,1),repeats.allowed=TRUE)

# --- Remove models with reflexive edges (these are not DAGs)
cycles <- matrix(0,dim(zerosones)[1],1)
for (i in 1:Nvar){ cycles[which(zerosones[,Nvar*(i-1)+i] == 1)] <- 1 }
zerosones_2 <-  zerosones[-which(cycles==1),]

# --- Allocate binary strings to copies of the DAG object
models_1 <- rep(list(dagobj),dim(zerosones_2)[1])
for (i in 1:dim(zerosones_2)[1]){ models_1[[i]][,] <- zerosones_2[i,] }

# --- Further remove models, this time ones that have bidirectional edges
cycles_2 <- matrix(0,length(models_1),1)
for (k in 1:length(models_1)){ cycles_2[k] <- length(intersect( which(models_1[[k]] == t(models_1[[k]])), which(models_1[[k]]==1))) >0}   
models_2 <- models_1[which(cycles_2==0)]

# --- then do an additional check for cyclic models using ggm::isAcyclic()
#library(ggm)
cycles_3 <- matrix(0,length(models_2),1)
for (i in 1:length(models_2)){ cycles_3[i] <- ggm::isAcyclic(models_2[[i]])==0 }
models_3 <- models_2[which(cycles_3==0)]

# --- remove models for which d-separation cannot be evaluated due to full connectedness, using ggm::basiSet()
cycles_basisnull <- matrix(0,length(models_3),1)
for (i in 1:length(models_3)){ cycles_basisnull[i] <- is.null(ggm::basiSet(models_3[[i]])) } # ggm::basiSet
models_auto <- models_3[which(cycles_basisnull==0)]

After running the above code, the list models_auto should contain 19 models. And it should be basically identical to the following manually constructed list:

models_manual <- list(
  m1=DAG(BM~BM,LS~LS,DD~DD), m2=DAG(BM~LS,DD~DD), m3=DAG(BM~DD,LS~LS), m4=DAG(BM~DD,BM~LS), m5=DAG(LS~BM,DD~DD), m6=DAG(LS~BM,BM~DD), 
m7=DAG(LS~DD,BM~BM), m8=DAG(LS~DD,BM~LS),  m9=DAG(LS~DD,BM~DD), m10=DAG(LS~DD,LS~BM), m11=DAG(DD~BM,LS~LS), m12=DAG(DD~BM,BM~LS), 
m13=DAG(DD~BM,LS~BM), m14=DAG(DD~BM,LS~DD), m15=DAG(DD~LS,BM~BM), m16=DAG(DD~LS,BM~LS), m17=DAG(DD~LS,BM~DD), m18=DAG(DD~LS,LS~BM), m19=DAG(DD~LS,DD~BM)
)

However, whereas phylo_path() works for models_manual, it doesn't work for models_auto. Specifically, phylo_path(models_auto, rhino, rhino_tree)
gives the following error:

Error in mutate_impl(.data, dots) :  Evaluation error: comparison (6) is possible only for atomic and list types.

Do you have any idea why phylo_path() works for one but not the other? Is the apparent identity between models_manual and models_auto actually false? I need to automate model generation in this way because I'm going to scale up from 3 to 6 or 7 variables (a huge increase in the number of possible models), and I also want to be able to approach the analysis in a theory-free manner.

Any advice would be appreciated.
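
As a cross-check on the expected count of 19, the enumeration can also be sketched in base R without gtools or ggm. This is an independent illustration, not the code from the post: the helpers `is_acyclic` and `is_fully_connected` are hypothetical names, and the matrices are plain named adjacency matrices (rows point to columns) rather than phylopath DAG objects.

```r
vars <- c("BM", "LS", "DD")
n <- length(vars)

# Acyclicity check: repeatedly strip nodes that have no incoming edges.
# If no such node exists while nodes remain, the graph contains a cycle.
is_acyclic <- function(adj) {
  remaining <- seq_len(nrow(adj))
  while (length(remaining) > 0) {
    sub <- adj[remaining, remaining, drop = FALSE]
    sources <- which(colSums(sub) == 0)
    if (length(sources) == 0) return(FALSE)
    remaining <- remaining[-sources]
  }
  TRUE
}

# A DAG is fully connected when every pair of variables shares an edge.
is_fully_connected <- function(adj) {
  all((adj + t(adj))[upper.tri(adj)] == 1)
}

# Enumerate every 0/1 assignment of the off-diagonal cells (no self-loops).
off_diag <- which(row(diag(n)) != col(diag(n)))
grid <- as.matrix(expand.grid(rep(list(0:1), length(off_diag))))
candidates <- lapply(seq_len(nrow(grid)), function(k) {
  adj <- matrix(0, n, n, dimnames = list(vars, vars))
  adj[off_diag] <- grid[k, ]
  adj
})

dags <- Filter(is_acyclic, candidates)
testable <- Filter(Negate(is_fully_connected), dags)
names(testable) <- paste0("m", seq_along(testable))

length(dags)      # all DAGs on three labelled variables
length(testable)  # DAGs that are not fully connected
```

Leaving the diagonal at zero from the start avoids generating self-loops at all, which cuts the search from 2^(n^2) to 2^(n(n-1)) candidates. That still grows very quickly, so for 6 or 7 variables a smarter generator would be needed.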

Implement standard errors and confidence intervals for estimated DAGs.

Return SE, CI and p-values for best model (should be straightforward).
Return SE and CI for average model (much more complicated).

Change parallel implementation to future

Also see #16.

phylo_path fails with data.table objects.

I should probably just coerce all to data.frame
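
Until that happens, a user-side workaround could look like the sketch below. `as_plain_df` is a hypothetical helper, not part of phylopath. It relies on `as.data.frame()` returning a plain data.frame, and it assumes the species names live in a column, since data.table objects do not carry row names and the data has to be matched to the tree tips by name.

```r
# Hypothetical helper: coerce a data.frame-like object (data.table, tibble,
# ...) to a plain data.frame and move a species column into the row names.
as_plain_df <- function(x, species_col) {
  df <- as.data.frame(x, stringsAsFactors = FALSE)
  rownames(df) <- as.character(df[[species_col]])
  df[[species_col]] <- NULL
  df
}

# Toy input with made-up values; a data.table would be handled the same way.
traits <- data.frame(
  species = c("sp_a", "sp_b", "sp_c"),
  BM = c(1.2, 0.4, 2.3),
  LS = c(0.8, 0.1, 1.9)
)
plain <- as_plain_df(traits, "species")
class(plain)
rownames(plain)
```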

Unhelpful error when using a fully connected DAG.

Hello Wouter,

I have no problem running the example analysis that uses the 'rhino' data set and rhino_tree. But when I try to run the function phylo_path() using instead a tree downloaded from 10kTrees, I get errors such as:

Error in mutate_impl(.data, dots) : Evaluation error: comparison (6) is possible only for atomic and list types.

and

Error in cor_fun(par, .x): object "phy" is not of class "phylo"

10kTrees is (as you might know) a resource used by many researchers, and the downloaded tree should be a properly formatted phylo object. Even when I use the rhino data set and the models as specified in your tutorial but together with a 10kTrees primate tree (after replacing the rhino species names with a subset of the primate ones), I still get the former error listed above, so I assume the problem is the tree itself, although I'm not completely sure. I do have the most recent versions of the dependent packages.

I have no idea what I should be trying out, so any suggestions would be appreciated! If there is any other information or code that I could provide in order to clarify, please let me know.

Thanks,
Ryu

Ensure compatibility with next ggraph version

My reverse dependency check shows some problems between phylopath and the next ggraph version:

checking Rd cross-references ... WARNING

Missing link or links in documentation object 'plot.DAG.Rd':
  ‘[ggraph:create_layout.igraph]{ggraph::create_layout.igraph()}’

Missing link or links in documentation object 'plot.fitted_DAG.Rd':
  ‘[ggraph:create_layout.igraph]{ggraph::create_layout.igraph()}’

See section 'Cross-references' in the 'Writing R Extensions' manual.

This is basically because the internal network representation has changed. I suggest you link directly to create_layout instead of to one of the methods. I plan on submitting ggraph to CRAN on September 1st and hope you'll be able to have a fix on CRAN by then.

best
Thomas

Altering values for the btol argument when using phylopath

Hi Wouter,

Thanks for making this very useful package!

Is there a way to alter the value for the 'btol' argument when using the phylo_path function? Normally, I am able to make this adjustment when using phyloglm but I can't seem to make this adjustment when using phylopath.

Do you have any suggestions?

Thanks,
Louis

package `graph` dependency?

@Ax3man I need to update phylosem and phylopath is a dependency. However, the code below fails with a warning about package graph not being available. graph appears to be a dependency of ggm. Any ideas on what's happening here?

models_pp <- phylopath::define_model_set(
  one   = c(RS ~ DD),
  two   = c(DD ~ NL, RS ~ LS + DD),
  three = c(RS ~ NL),
  four  = c(RS ~ BM + NL),
  five  = c(RS ~ BM + NL + DD),
  six   = c(NL ~ RS, RS ~ BM),
  seven = c(NL ~ RS, RS ~ LS + BM),
  eight = c(NL ~ RS),
  nine  = c(NL ~ RS, RS ~ LS),
  .common = c(LS ~ BM, NL ~ BM, DD ~ NL)
)

Fix warnings generated by ggraph

ggraph has changed the curvature argument to strength.

Feature request: non-binary categorical variables

Hi @Ax3man!
Thanks for a really useful package! I was wondering if you are planning to provide support for the inclusion of non-binary categorical variables.
Thank you!
Joan

Errors in labels=c()

Hello
I am Abraão
Please can you help me?
I would like to know how to change the variable names in my final plot using:
plot(best_model, labels=c(vol="volume", ...))
because I am getting the following error:
some nodes are missing from labels

Thanks!
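
The error message suggests that the labels vector has to name every node, not only the ones being renamed. A minimal base R sketch of building such a vector follows. The node names here are made up, and reading them from the DAG object is left out because how they are stored is an assumption that should be checked on the actual object.

```r
# Node names of a hypothetical model (made up for illustration).
nodes <- c("vol", "BM", "LS")

# Only the labels that should differ from the node names.
custom <- c(vol = "volume")

# Start with every node labelled by its own name, then overwrite by name.
labels <- setNames(nodes, nodes)
labels[names(custom)] <- custom

labels
```

The resulting named vector covers all nodes and can then be passed as the labels argument, e.g. plot(best_model, labels = labels).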

phylo_path() produces error

Hi Wouter,

I am getting an error when running even the example of phylo_path()

code:

candidates <- list(A = DAG(LS ~ BM, NL ~ BM, DD ~ NL),
                   B = DAG(LS ~ BM, NL ~ LS, DD ~ NL))
p <- phylo_path(candidates, rhino, rhino_tree)

error:

Error: Fitting the following model:
    DD ~ NL + BM 
produced this error:
    Error in nlme::gls(..., correlation = cor_fun(par, .x)): 
model must be a formula of the form "resp ~ pred"

I am running R 3.4, phylopath 0.3.0 and nlme 3.1-131 on MAC OSX

Any ideas why that might be?

Thank you for your help and the great package!

Cheers,

Christoph

Formal argument "boot" matched by multiple actual arguments

Hi there,

great package, thanks for sharing and maintaining it!

I was able to define boot values in an older version of phylo_path(). When I try and define boot values in the latest version (v.1.1.3 from CRAN) of phylopath I get the following error: "formal argument "boot" matched by multiple actual arguments." R and all my other packages are up to date.

Is there a change in how the boot argument is being passed on to phylolm in the updated version of phylopath? Or am I just doing something silly here? The outcome variable is continuous, if that helps, and a sample of code is below. Happy to pass on any additional information if that would be useful.

Much appreciated,
Joseph

best_result <- phylo_path(best_models_DAG,
                          data = data_use,
                          tree = pruned_tree,
                          model = 'lambda',
                          method = "logistic_MPLE",
                          boot = 100)

Pruned tree to drop species not included in dat.
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
Error: Fitting the following model:
Unaffiliated ~ Education_Index + Life_expectancy_Index
produced this error:
formal argument "boot" matched by multiple actual arguments
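
For context, R raises this message whenever a single call ends up with the same named argument twice, which typically happens when a wrapper sets an argument itself and also forwards `...` containing that argument. The sketch below reproduces the mechanism in base R with hypothetical functions (`fit_model`, `wrapper_clash`, `wrapper_safe`). Whether this is what happens inside phylopath is an assumption, not something the report confirms.

```r
# Stand-in for a fitting function that accepts a `boot` argument.
fit_model <- function(formula, boot = 0) {
  list(formula = formula, boot = boot)
}

# A wrapper that sets `boot` itself AND forwards `...` unchanged:
# a user-supplied `boot` then reaches fit_model() twice.
wrapper_clash <- function(formula, ...) {
  fit_model(formula, boot = 0, ...)
}

# A wrapper that lets user-supplied arguments override its defaults.
wrapper_safe <- function(formula, ...) {
  args <- utils::modifyList(list(boot = 0), list(...))
  do.call(fit_model, c(list(formula), args))
}

clash_msg <- tryCatch(wrapper_clash(y ~ x, boot = 100), error = conditionMessage)
safe_fit  <- wrapper_safe(y ~ x, boot = 100)

cat(clash_msg, "\n")
safe_fit$boot
```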

Detect duplicated dsep models and only run them once.
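
A rough sketch of the idea in base R: collect the d-separation formulas of all candidate models, fit each distinct formula once, and look the results up by name afterwards. The formulas and the `fit_one` stand-in are hypothetical, and this is not phylopath's implementation.

```r
# Hypothetical d-separation formulas, listed per candidate model.
dsep_by_model <- list(
  A = c("DD ~ NL + BM", "LS ~ BM + NL"),
  B = c("DD ~ NL + BM", "NL ~ LS + BM")
)

all_formulas    <- unlist(dsep_by_model, use.names = FALSE)
unique_formulas <- unique(all_formulas)

# Stand-in for an expensive model fit.
fit_one <- function(f) nchar(f)

# Fit each distinct formula once and index the fits by formula text.
fits <- lapply(unique_formulas, fit_one)
names(fits) <- unique_formulas

# Hand every model its own fits by looking them up by name.
results <- lapply(dsep_by_model, function(fs) fits[fs])

length(all_formulas)     # fits needed without de-duplication
length(unique_formulas)  # fits needed with de-duplication
```

Matching on the formula text treats "DD ~ NL + BM" and "DD ~ BM + NL" as different models, so the terms would need to be put in a canonical order first if they can appear in either order.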

var_labels broken, at least for plot.fitted_DAG

d <- DAG(LS ~ BM, NL ~ BM, DD ~ NL + LS)
d_fitted <- est_DAG(d, rhino, ape::corBrownian, rhino_tree)

plot(d_fitted)
plot(d_fitted, labels = c(LS = 'ls', BM = 'bm', NL = 'nl', DD = 'dd'))

Should be the same ordering.

Is it possible to obtain the baseline values from models?

Hi Ax3man,

I would like to be able to convert the odds ratios obtained from phylopath models to probabilities to make results easier to interpret. For this I would need the baseline coefficients, which aren't provided. I presume it doesn't make sense to have a baseline coefficient when each phylopath model is comprised of many regression models, but thought I would ask just in case!

Many thanks
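
To illustrate why the baseline matters: an odds ratio only fixes how the odds change, so turning it into probabilities needs the intercept (the baseline log-odds) as well. The numbers below are made up and do not come from any fitted model. The sketch assumes a single logistic regression with a coefficient on the log-odds scale.

```r
# Made-up numbers for illustration only.
intercept  <- -1.0          # baseline log-odds when the predictor is 0
odds_ratio <- 2.0           # odds ratio per one-unit increase in the predictor
slope      <- log(odds_ratio)

x <- c(0, 1, 2)
p <- plogis(intercept + slope * x)   # inverse logit: log-odds -> probability

# Sanity check: successive odds ratios recover the value we started from.
odds <- p / (1 - p)
odds[-1] / odds[-length(odds)]
```

With a different intercept the same odds ratio gives different probabilities, which is why the conversion cannot be done from the odds ratio alone.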

phylo_path errors when parallel is not NULL

However, whenever I start a cluster e.g.

result <- phylo_path(all_models,
                     data = data_use,
                     tree = tree,
                     model = 'lambda',
                     method = "logistic_MPLE",
                     parallel = "FORK")

… the analysis fails with "Error: $ operator is invalid for atomic vectors".

It's easy to replicate: just set the parallel flag to either 'FORK' or 'SOCK' and it happens.