mwpennell / arbutus

Assessing the adequacy of phylogenetic models of quantitative trait evolution

arbutus's Issues

Broken links within package help files

If you click on make_unit_tree within the arbutus help file it leads to this error message:

Error in find.package(package, lib.loc, verbose = verbose) :
object 'pkg' not found

Build is broken

See https://travis-ci.org/mwpennell/arbutus#L1786-L1807

Looks like the examples are out of sync with the code.

Travis has thrown up two NOTEs, too.

Unit tree with arbitrary parameters

Issue by richfitz from Tuesday Oct 01, 2013 at 06:05 GMT
Originally opened as traitecoevo/modeladequacy#21

It should be easy enough to write a method based on the fit.mle ones that takes a vector of parameters and makes a unit tree. This is easiest for BM with one parameter, but given we have a standard list of names, we could require (or assume) those.
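
A minimal sketch of what such a method could look like for BM. It assumes the rate parameter is named sigsq and that the unit tree is obtained by multiplying every branch length by that rate. unit_tree_from_pars is a hypothetical name, not an arbutus function, and the toy tree is a plain list standing in for an ape phylo object.

```r
# Hypothetical helper (not part of arbutus): rescale branch lengths for BM
# from a named parameter vector. Assumes the rate is named "sigsq".
unit_tree_from_pars <- function(phy, pars) {
  if (is.null(names(pars)) || !"sigsq" %in% names(pars))
    stop("pars must be a named vector containing 'sigsq'")
  # Under BM, scaling each branch by sigsq gives a tree on which the trait
  # evolves at unit rate.
  phy$edge.length <- phy$edge.length * pars[["sigsq"]]
  phy
}

# Toy three-taxon tree, stored as a plain list with phylo-style fields
phy <- list(edge = matrix(c(4, 5, 5, 1, 5, 2, 4, 3), ncol = 2, byrow = TRUE),
            edge.length = c(1, 1, 1, 2),
            tip.label = c("A", "B", "C"),
            Nnode = 2L)

unit <- unit_tree_from_pars(phy, c(sigsq = 0.5))
unit$edge.length
```

Requiring names up front keeps the multi-parameter models from depending on argument order.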

Delta transform disagrees with geiger

Issue by richfitz from Friday Dec 06, 2013 at 00:41 GMT
Originally opened as traitecoevo/modeladequacy#58

See tests added by 18bbdfc (commented out for delta, as they fail). There is a note in the rescale-fitContinuous.R, model.phylo.delta():

    ## get braching times in terms of tree depth
    ## note: Geiger's delta transformation does not index the branches (like OU)
    ## need to check if this is correct

that suggests @mwpennell has an idea where the problem lies, so I've not poked about at all.

Multi-optimum OU models

Issue by richfitz from Wednesday Sep 25, 2013 at 04:21 GMT
Originally opened as traitecoevo/modeladequacy#1

An issue that I have come across: if we transform the phylogeny according to a multi-optimum OU model, will the unit tree still be ultrametric?
I am trying to figure this out because we look at the slope of contrasts ~ branching.time, which might be messed up with non-ultrametric trees, as ape's branching.times() function returns negative values in this case. This statistic should still work, but I may need to replace the branching.times function.
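
One possible replacement, sketched in base R under the assumption that node heights can be measured from the deepest tip rather than from an arbitrary tip. root_distances is a hypothetical helper, not an ape or arbutus function, and it works directly on phylo-style edge and edge.length components.

```r
# Hypothetical helper: distance of every node from the root, computed from
# phylo-style edge data. Works whether or not the tree is ultrametric.
root_distances <- function(edge, edge.length, root) {
  d <- rep(NA_real_, max(edge))
  d[root] <- 0
  remaining <- seq_len(nrow(edge))
  while (length(remaining) > 0) {
    # edges whose parent node already has a known distance
    ready <- remaining[!is.na(d[edge[remaining, 1]])]
    if (length(ready) == 0) stop("some edges are not connected to the root")
    d[edge[ready, 2]] <- d[edge[ready, 1]] + edge.length[ready]
    remaining <- setdiff(remaining, ready)
  }
  d
}

# Non-ultrametric toy tree ((A:1,B:2):1,C:1.5); tips are 1-3, root is 4
edge <- matrix(c(4, 5, 5, 1, 5, 2, 4, 3), ncol = 2, byrow = TRUE)
edge.length <- c(1, 1, 2, 1.5)

d <- root_distances(edge, edge.length, root = 4)
# Heights of the internal nodes measured from the deepest tip: never negative
heights <- max(d) - d[4:5]
```

Whether height-from-deepest-tip is the right covariate for the contrasts ~ branching.time slope on a non-ultrametric unit tree is a separate question; the sketch only shows that negative values can be avoided.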

Build is failing

Haven't looked into this yet but looks like geiger/parallel to me, which I think we've had before https://travis-ci.org/mwpennell/arbutus/builds/51186293

diversitree unit tree tests fail

I can't quite figure out what is going on here

If I fit a model with diversitree's MCMC function

library(diversitree)
library(arbutus)
data(finch)
phy <- finch$phy
states <- finch$data[,"wingL"]
lik <- make.bm(phy, states)
samples.bm <- mcmc(lik, x.init=1, w=1, nsteps=1000, print.every=0)
m <- make_unit_tree(samples.bm, sample=10)

this seems to work fine. however in the file /tests/test-unit-tree.R under the heading "Unit tree construction from BM/diversitree/MCMC with SE works", there is something like this

  fit.bm <- find.mle(lik.bm, .1)
  set.seed(1)
  samples.bm <- mcmc(lik.bm, coef(fit.bm), 100, w=0.1, print.every=0)

  ## Quick check:
  idx <- 5
  fit.bm$par <- coef(samples.bm)[idx,]
  fit.bm$lnLik <- samples.bm$p[idx]
  cmp <- make_unit_tree(fit.bm)
  expect_that(phy.unit[[idx]], is_identical_to(cmp))

which fails because the parameter names are not being returned correctly and make_unit_tree(fit.bm) cannot parse the fit.bm object. Did you change how diversitree returns parameters since you wrote the test? I don't think this is anything I did when I changed the function names.

Any ideas?
m

two versions of OU in phylolm

Issue by mwpennell from Tuesday Dec 10, 2013 at 19:48 GMT
Originally opened as traitecoevo/modeladequacy#63

this is a very low priority at the moment but there are two different options for OU models in phylolm -- depending on whether one wants to assume stationarity or not. i am just going to ignore the difference for now and rescale them in the same way. just wanted to put the issue on record

Bug in compare_pic_stat when observed test statistic is NA

There may be cases where the observed test statistic is not able to be calculated (i.e. returns NA) because the model is so poor (for example, fitting lm to the abs(contrasts) ~ node height chokes on some datasets).

Currently, this results in a p-value that goes towards 2 as more sims are added

MWE

library(arbutus)
library(diversitree)

t <- tree.bd(pars=c(1,0), max.taxa=100)
d <- sim.character(t, 1)

## build unit tree
uu <- make_unit_tree(t, data=d)

## calculate a simple test statistic (doesn't matter what it is)
foo <- function(x) mean(x$pics[,"contrasts"])

## Calculate the test statistics
stat_obs <- calculate_pic_stat(uu, stats=list(mean=foo))

## Simulate new datasets
sims <- simulate_char_unit(uu, nsim=1000)

## Calculate test statistic on simulated data
stat_sim <- calculate_pic_stat(sims, stats=list(mean=foo))

## Compare the two
compare_pic_stat(stat_obs, stat_sim)

## Everything works fine

## Make the value of the test stat equal to NA
stat_obs[1] <- NA

## Now compare
compare_pic_stat(stat_obs, stat_sim)

If the test statistic cannot even be calculated on the observed data, the data are clearly outside the bounds of what we expect if our model were adequate, i.e., the p-value should automatically be 0 when this is the case.

What is the safest way to handle this case in compare_pic_stat?
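
One option, sketched here as a standalone function, is to short-circuit on NA before any comparison is made. two_tailed_p is a hypothetical helper and is not how compare_pic_stat is actually written; it only illustrates the "NA observed statistic gives p = 0" convention proposed above, with simulated NAs dropped separately.

```r
# Hypothetical helper (not the arbutus implementation): two-tailed p-value
# for an observed statistic against a vector of simulated statistics.
two_tailed_p <- function(obs, sim) {
  # If the statistic could not be computed on the observed data, treat the
  # model as inadequate for that statistic and return 0.
  if (is.na(obs)) return(0)
  sim <- sim[!is.na(sim)]
  if (length(sim) == 0) return(NA_real_)
  lower <- mean(sim <= obs)
  upper <- mean(sim >= obs)
  min(1, 2 * min(lower, upper))
}

set.seed(1)
sim <- rnorm(1000)
two_tailed_p(0, sim)    # large p-value: observed value sits mid-distribution
two_tailed_p(NA, sim)   # 0, by the convention proposed above
```

Capping at 1 also guards against the value drifting above 1 when both tails are counted.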

plot.arbutus legends

now that plot.arbutus uses plot instead of ggplot2, I can't figure out a nice way to place the legend for an arbitrary number of test statistics. this is not a high priority but it is sort of annoying.

Models with shift points along branches

Issue by richfitz from Wednesday Sep 25, 2013 at 04:24 GMT
Originally opened as traitecoevo/modeladequacy#3

Still trying to find a nice way to represent models in which there is a shift point midbranch.
Considering creating a modified phylo object with three additional lists appended: 1) an edge list, where each edge can have n specified models; 2) a shift list, specifying the location of all n-1 shifts on each branch; 3) a list of named parameters, which will be shared across edges.
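
A rough sketch of how that layout might look as a plain R list. The component names edge.models, edge.shifts and model.pars are invented for illustration, as are the model labels and parameter values; nothing here is an existing ape or arbutus class.

```r
# Hypothetical layout (not an existing arbutus or ape class): a phylo-style
# list with three extra components describing mid-branch shifts.
phy_shift <- list(
  edge        = matrix(c(4, 5, 5, 1, 5, 2, 4, 3), ncol = 2, byrow = TRUE),
  edge.length = c(1, 1, 1, 2),
  tip.label   = c("A", "B", "C"),
  Nnode       = 2L,
  # 1) models acting on each edge, in order from parent to child
  edge.models = list("bm", c("bm", "ou"), "bm", "bm"),
  # 2) shift positions on each edge, as distance from the parent node
  edge.shifts = list(numeric(0), 0.4, numeric(0), numeric(0)),
  # 3) named parameters, shared by every edge that uses the model
  model.pars  = list(bm = c(sigsq = 0.1), ou = c(alpha = 0.5, sigsq = 0.1))
)

# Consistency checks: n models on an edge imply n - 1 shifts, and every
# shift must fall inside its branch.
n_models <- lengths(phy_shift$edge.models)
n_shifts <- lengths(phy_shift$edge.shifts)
shifts_ok <- all(n_shifts == n_models - 1)
inside_ok <- all(mapply(function(s, len) all(s > 0 & s < len),
                        phy_shift$edge.shifts, phy_shift$edge.length))
```

Keeping the three components parallel to the rows of edge means they can be reordered together with the edge matrix.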

Write model.phylo.trend()

Issue by richfitz from Friday Dec 06, 2013 at 00:58 GMT
Originally opened as traitecoevo/modeladequacy#59

Current function is placeholder only.

ordering of data

a user was using arbutus and came across the following problem

in geiger (and some other packages, e.g., bayou), the order of tip labels and data do not need to be the same. so the returned object may not have them ordered according to the tip labels.

however, we do strict checking when creating the unit.tree (function check.tree.data)

    if (!identical(names(data), phy$tip.label)) 
        stop("Trait data and species do not align")

we do this b/c pic does not check. so my question is: where should we do the reordering? when getting the info from model objects or just prior to computing the pics? each will require modifications to the tests
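
Wherever the reordering ends up, the step itself could look something like this sketch. align_data_to_tips is a hypothetical helper, not part of arbutus; it assumes a named trait vector and uses a plain list with a tip.label component in place of a real phylo object.

```r
# Hypothetical helper (not part of arbutus): reorder a named trait vector to
# match the tip labels, failing loudly if the two sets of names differ.
align_data_to_tips <- function(phy, data) {
  if (is.null(names(data)))
    stop("Trait data must be named")
  if (!setequal(names(data), phy$tip.label) ||
      length(data) != length(phy$tip.label))
    stop("Trait data and species do not align")
  data[phy$tip.label]
}

phy <- list(tip.label = c("A", "B", "C"))   # stand-in for a phylo object
data <- c(C = 3.1, A = 1.2, B = 2.7)        # same species, different order

aligned <- align_data_to_tips(phy, data)
identical(names(aligned), phy$tip.label)
```

After this step the strict identical() check in check.tree.data would pass for data that differ only in order, while mismatched species would still stop with the existing message.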

Should we return parameters for linear part of PGLS models

Issue by richfitz from Tuesday Dec 03, 2013 at 00:23 GMT
Originally opened as traitecoevo/modeladequacy#50

At present we only return the phylogenetic part (diffusion term, etc). Should we also be returning slope and intercept parameters?

testing summary stats

Issue by mwpennell from Friday Sep 27, 2013 at 02:19 GMT
Originally opened as traitecoevo/modeladequacy#15

I am not sure of the best way to evaluate the summary statistic fxn. Currently just using expected values on the geospiza dataset. This will only help if I change things, as the expected values are based on the current code. However, it is difficult to test independently because it involves some complicated fxns like lm and ks.test. not sure of the best way to do this.

fitContinuous with SE

Hello,

I have a set of comparative models that strongly rely on accounting for measurement error in the data. I've been assessing model support but I'd love to include the model adequacy options your package offers. However, I get this error:
Error in model_pars.gfit(fit) : Variable SE not yet implemented
Do you think it would be possible to have a version with SE-accounting model fits implemented any time soon?

Thank you, and congratulations on an impressive package!

plot.arbutus x axes

These are pretty tricky to interpret. Can we do something more informative than m.sig, c.var, etc.?

Extraction of ML and REML s2 estimates from gls models

Issue by richfitz from Wednesday Dec 04, 2013 at 05:39 GMT
Originally opened as traitecoevo/modeladequacy#56

I've taken a shot at this (see 79c2650), but I'm not sure that things are correct.

library(arbutus)
library(diversitree)
library(nlme)
set.seed(1)
phy <- tree.bd(pars=c(1,0), max.taxa=100)
tx <- sim.character(phy, 1)
ty <- sim.character(phy, 1) + 3
data <- data.frame(x=tx, y=ty, row.names=names(tx))

fit.gls.bm.ml   <- gls(y ~ x, data, corBrownian(1, phy), method="ML")
fit.gls.bm.reml <- gls(y ~ x, data, corBrownian(1, phy), method="REML")

# Likelihood function for the same model
lik.pgls.bm <- make.pgls(phy, y ~ x, data)

# Extract s2 from the fitted gls objects
s2.ml <- arbutus:::estimate.sigma2.gls(fit.gls.bm.ml)
s2.reml <- arbutus:::estimate.sigma2.gls(fit.gls.bm.reml)
p.ml <- c(coef(fit.gls.bm.ml), s2=s2.ml)

# Run a ML fit with diversitree
fit.pgls.bm <- find.mle(lik.pgls.bm, p.ml)

n <- fit.gls.bm.ml$dims$N # 100, or length(phy$tip.label)
k <- fit.gls.bm.ml$dims$p # 2, or length(coef(fit.gls.bm.ml))
s2 <- coef(fit.pgls.bm)[[3]] # diversitree's s2 estimate

# Not the same as that from arbutus, and half way between REML and ML estimates
s2 - c(s2.ml, s2.reml)

# This is the relationship with the REML estimate
s2 * n / (n - (k - 1)) - s2.reml

which all suggests that in arbutus:::estimate.sigma2.gls() we should use
(n - (k - 1)) / n as the scaling factor. This doesn't quite match my view of how these work, but I might be being dense.

Ideas?

(this depends on very recent versions of both arbutus and diversitree)
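
For comparison, here is the textbook relationship in the non-phylogenetic case, using ordinary least squares as a stand-in for the GLS fits (an assumption made only to keep the example self-contained). There the ML and REML variance estimates differ by n / (n - k), which is why the n / (n - (k - 1)) factor found above looks odd.

```r
# Ordinary least squares stand-in for the GLS fits above (no phylogeny):
# with n observations and k regression coefficients, the ML variance
# estimate divides the residual sum of squares by n, REML by n - k.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 3 + 0.5 * x + rnorm(n)

fit <- lm(y ~ x)
k   <- length(coef(fit))          # 2: intercept and slope
rss <- sum(residuals(fit)^2)

s2_ml   <- rss / n
s2_reml <- rss / (n - k)          # same as summary(fit)$sigma^2

# Converting between the two uses n / (n - k), not n / (n - (k - 1))
s2_ml * n / (n - k) - s2_reml
```

This does not say which estimate diversitree is returning; it only pins down what the two reference values should be.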

grid and gridExtra

Issue by mwpennell from Monday Feb 24, 2014 at 03:22 GMT
Originally opened as traitecoevo/modeladequacy#72

we currently don't depend on grid or gridExtra but these packages are required for plotting the results of the model adequacy. Luke pointed out that this is probably going to be a highly used feature so we should either depend on these or else rewrite the plotting fxns so that they don't depend on this.

More informative error messages for as.unit.tree

Issue by richfitz from Wednesday Sep 25, 2013 at 04:53 GMT
Originally opened as traitecoevo/modeladequacy#12

luke suggested (and I think he is right) that we include more meaningful error messages for as.unit.tree (and more specifically in the modelinfo fxns) such that if one provides a diversitree model fit and the program also requires a likelihood fxn, have it spit out something like:

"this is a fit.mle object from diversitree. it requires both the model fitted object and the likelihood object"

think this would be helpful and not that difficult to implement.
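
Something along these lines might be enough. check_fit_inputs is a hypothetical name and the message wording is only a suggestion; the sketch assumes the fitted object carries the fit.mle class and that the likelihood is passed as a function.

```r
# Hypothetical check (not the arbutus code): give a targeted message when a
# diversitree fit is supplied without its likelihood function.
check_fit_inputs <- function(fit, lik = NULL) {
  if (inherits(fit, "fit.mle") && !is.function(lik)) {
    stop("'fit' is a fit.mle object from diversitree; ",
         "supply the likelihood function used to fit it as well",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Stand-in object carrying only the class, to show the message
fake_fit <- structure(list(), class = "fit.mle")
msg <- tryCatch(check_fit_inputs(fake_fit),
                error = function(e) conditionMessage(e))
msg
```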

Use of diversitree::: calls in pkg

R CMD check --as-cran gets grumpy at the calls to internal diversitree functions

* checking dependencies in R code ... NOTE
Unexported objects imported by ':::' calls:
  ‘diversitree:::get.cache’ ‘diversitree:::is.constrained’
  See the note in ?`:::` about the use of this operator.

CRAN?

hey @richfitz i was thinking today that it might be nice to move arbutus to CRAN at some point. However, it will fail -- harshly -- if we send it there. the problem is that some of the functions require() diversitree, geiger, etc., etc. and use internal functions from these. and the CRAN version of geiger still launches a bunch of cores which causes its own problems...
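
For the require() part, one common pattern is to check for the package at call time with requireNamespace() and then use pkg::fun() for exported functions. need_package below is a hypothetical helper showing that check; it does nothing about the calls to internal functions, which would still need exported equivalents.

```r
# Hypothetical helper: check for an optional package at call time instead of
# attaching it with require().
need_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function; please install it",
         call. = FALSE)
  }
  invisible(TRUE)
}

need_package("stats")   # a base package, so this always succeeds
msg <- tryCatch(need_package("notARealPackage123"),
                error = function(e) conditionMessage(e))
```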

write pkg README

make a clear readme file for the package that explains how to install the package from github.

Roxygen in DESCRIPTION file

You have

Suggests:
    diversitree,
    testthat,
    roxygen2
Roxygen: list(wrap = FALSE)

in the DESCRIPTION file. R CMD check does not like the use of Roxygen as an element. What is the purpose of the list(wrap=FALSE) line here, and can we do without it (or else move it to Suggests: roxygen2)?