richfitz / forest

New Phylogenetics Data Structures in R

License: Other

R 46.58% Python 1.24% C++ 51.99% Perl 0.19%

forest's Issues

pre and post order iteration with subtrees

Be careful with post-order iteration on subtrees: it does not seem to work, and instead loops over all nodes (swapping begin_sub()/end_sub() for begin_sub_post()/end_sub_post() in subtree_at_label makes some tests fail). This could cause issues.
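For reference, a post-order walk that starts at a subtree root should visit only the nodes of that subtree and should finish at the subtree root. Below is a minimal sketch that could act as a test oracle. It uses a hypothetical `Node` type that holds a vector of children, not the treetree iterators themselves:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type for illustration: a label plus owned children.
struct Node {
  std::string label;
  std::vector<Node> children;
};

// Post-order walk restricted to the subtree rooted at `root`: each
// child's subtree is finished before its parent, `root` comes last,
// and nothing outside `root` is visited.
void post_order(const Node& root, std::vector<std::string>& out) {
  for (const Node& child : root.children) {
    post_order(child, out);
  }
  out.push_back(root.label);
}

// Pre-order counterpart: parent first, then each child's subtree.
void pre_order(const Node& root, std::vector<std::string>& out) {
  out.push_back(root.label);
  for (const Node& child : root.children) {
    pre_order(child, out);
  }
}
```

Take the tree ((a,b)x,c)r and start at x. Post-order gives a, b, x, and pre-order gives x, a, b. A working begin_sub_post()/end_sub_post() pair should reproduce the post-order sequence rather than walking the whole tree.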

Associate taxonomy with a tree

This is the hard-core way of addressing #33: is there some way of associating a taxonomic hierarchy (family, order, whatever) with the tree?

There'd be lots of fun things to do with this, especially with clade trees (#16) - collapse to a different taxonomic level, etc.

This could be something for which it's worth expanding the node type, or having a different node type that has the hierarchy. Enforcing monophyly will be hard, as it's often not a perfect fit.

Enforce label uniqueness

Looks like we're depending on labels. We could use the node's memory location as a default label, which will at least be unique.
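Here is a sketch of both ideas, assuming labels are plain strings. `default_label` is a hypothetical helper that formats a node's address. `labels_unique` checks a set of labels using a hash set. The sketch also assumes that empty labels mean "unlabelled" and can be skipped. Address-based labels stay unique only while the nodes are alive in a single session, so they will not survive serialisation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: derive a default label from a node's address.
// Distinct live objects have distinct addresses, so these labels are
// unique within one session (but not stable across save/load).
std::string default_label(const void* node) {
  std::ostringstream os;
  os << "node_" << node;
  return os.str();
}

// True when no label occurs twice; empty labels are treated as
// "unlabelled" and skipped (an assumption of this sketch).
bool labels_unique(const std::vector<std::string>& labels) {
  std::unordered_set<std::string> seen;
  for (const std::string& lab : labels) {
    if (lab.empty()) continue;
    if (!seen.insert(lab).second) return false;  // already present
  }
  return true;
}
```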

Document the underlying treetree classes better

Draw up little examples of what the various
insert/insert_above/append/prepend/splice/prune modifications actually
do. I have a start on this in the unpushed document branch.

Wrap up the interface for the lower level class and don't export

For compilation and load speed, it would be useful to conditionally hide the itree class; it's only useful for tests, so omit it from the CRAN version. We could compile with a -DWITH_EXTRA_TESTS flag to enable it, and set up the system to recompile with and without that flag, running the tests each time to make sure that we're good.
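Here is a sketch of the `-DWITH_EXTRA_TESTS` idea. The `itree` below is a stub, not the real class. In an R package, the flag would presumably be passed through `PKG_CPPFLAGS` in `src/Makevars`, and only for the test build.

```cpp
// Compile with -DWITH_EXTRA_TESTS to include the test-only pieces;
// without it (e.g. for a CRAN build) they are never compiled.
#ifdef WITH_EXTRA_TESTS
// Stub standing in for the real itree class.
struct itree {
  int value = 0;
};
#endif

// Lets the rest of the code (or the R side) ask which build this is.
constexpr bool extra_tests_enabled() {
#ifdef WITH_EXTRA_TESTS
  return true;
#else
  return false;
#endif
}
```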

(this may become redundant if we drop modules -- not sure how much I'd reimplement all of this again)

Better Newick parser

Possibly based on boost::spirit?

Current version is slow and probably brittle.
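Before reaching for boost::spirit, a hand-written recursive-descent parser is one option. The sketch below guesses at what such a parser might look like; it is not the package's parser. It handles nested parentheses, unquoted labels and `:length` suffixes. It does not handle quoted labels or `[...]` comments. It reads branch lengths in place with `strtod`, so no substrings are copied.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for the sketch.
struct Node {
  std::string label;
  double length = 0.0;      // branch length, if given
  bool has_length = false;  // false when no ":length" was present
  std::vector<Node> children;
};

// Recursive-descent parser for plain Newick.
class NewickParser {
 public:
  explicit NewickParser(std::string text) : s_(std::move(text)) {}

  Node parse() {
    Node root = subtree();
    expect(';');
    return root;
  }

 private:
  std::string s_;
  std::size_t pos_ = 0;

  static bool is_delimiter(char c) {
    return c == '(' || c == ')' || c == ',' || c == ':' || c == ';' ||
           std::isspace(static_cast<unsigned char>(c));
  }

  void skip_space() {
    while (pos_ < s_.size() &&
           std::isspace(static_cast<unsigned char>(s_[pos_]))) {
      ++pos_;
    }
  }

  // Next non-space character, or '\0' at the end of input.
  char peek() {
    skip_space();
    return pos_ < s_.size() ? s_[pos_] : '\0';
  }

  void expect(char c) {
    if (peek() != c) {
      throw std::runtime_error(std::string("expected '") + c +
                               "' at position " + std::to_string(pos_));
    }
    ++pos_;
  }

  // subtree := ( '(' subtree (',' subtree)* ')' )? label? (':' number)?
  Node subtree() {
    Node node;
    if (peek() == '(') {
      ++pos_;
      node.children.push_back(subtree());
      while (peek() == ',') {
        ++pos_;
        node.children.push_back(subtree());
      }
      expect(')');
    }
    skip_space();
    const std::size_t start = pos_;
    while (pos_ < s_.size() && !is_delimiter(s_[pos_])) ++pos_;
    node.label = s_.substr(start, pos_ - start);
    if (peek() == ':') {
      ++pos_;
      skip_space();
      const char* begin = s_.c_str() + pos_;
      char* end = nullptr;
      node.length = std::strtod(begin, &end);
      if (end == begin) {
        throw std::runtime_error("expected a branch length at position " +
                                 std::to_string(pos_));
      }
      pos_ += static_cast<std::size_t>(end - begin);
      node.has_length = true;
    }
    return node;
  }
};

// Number of tips (nodes without children) in the tree.
int count_tips(const Node& node) {
  if (node.children.empty()) return 1;
  int n = 0;
  for (const Node& child : node.children) n += count_tips(child);
  return n;
}
```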

Example script to demo plotting

Write a small example script that puts the plotting code through its paces; perhaps publish it as a GitHub page? Ideally it would run at the same time as the tests, so that there is some way of making sure that the graphical code does not fail.

As well as (or instead of) this, look at the static documentation approaches that people have worked out. Examples could shine here.

add_labels should probably be add_tree_labels

Class names for plotting objects.

I think I'm moving the 'tree' class of a treeGrob to 'tg', mostly because it won't conflict with anything else. If I stick with that name, then rename things like tree_branches to tg_branches. There is a lot of code turning up with tree_ prefixes that will be affected by this, so a bit of thinking is required.

Edge colouring

Two different modes:
* Highlight clades (done). This uses the MEDUSA algorithm.
* By taxonomy. This is harder, but more useful. What I'd like to do is specify which tips belong to which families and then work back down the tree. There are two ways of doing this:
1. Work out what the MRCA is for each group and assert monophyly. This will also be useful for making trees that display at different taxonomic levels.
2. Give the tips and have the colouring continue down. There are some intuitive arguments here about what to do (a clade within another clade should be possible to colour nicely, but I can't immediately see the algorithm).
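Here is a minimal sketch of option 2. It assumes each tip carries a group name, with an empty name meaning "no group". A post-order pass gives an internal node a group only when all of its children agree. This deliberately leaves the nested-clade case above unsolved: a node whose children disagree simply stays uncoloured. `Node` and `propagate_colours` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type: tips carry a group name ("" = no group);
// internal nodes get theirs filled in by propagate_colours().
struct Node {
  std::string group;
  std::vector<Node> children;
};

// Post-order pass: an internal node inherits a group only when every
// child has the same group; otherwise it is left empty ("uncoloured").
// Every child is always visited, so the whole subtree gets coloured.
std::string propagate_colours(Node& node) {
  if (node.children.empty()) return node.group;
  std::string common = propagate_colours(node.children[0]);
  for (std::size_t i = 1; i < node.children.size(); ++i) {
    if (propagate_colours(node.children[i]) != common) common.clear();
  }
  node.group = common;
  return common;
}
```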

Motivating examples

Look at ape::bind.tree for the sort of edge-matrix hell we get to avoid. Note that this is not a criticism of ape, but a necessary consequence of using the edge matrix to describe the tree. We have just as much complication, but it lives elsewhere in the package.

Node equality

We need to test equality of RObjects some other way than with ==, which is not really working.

Tidy up labels

By default, labels adds all labels, but it optionally takes a vector of names to plot. Perhaps add an append argument to add these to an existing container of labels.

External pointers etc don't serialise

How to deal with this for saveRDS, etc.

Graphics testing

Use both testthat and graphicsQC? There is apparently some grid support there.

Draw bar around groups of species

Again, this needs taxonomic information to be really useful (see #34). But doing it by clade is a good start, either using classify or just given a bunch of names.

Things to implement

If things are nontrivial, move them from here into their own issue. This is not a complete list from any of these packages, but things that jumped out as being needed. More to add probably. Some of the help pages for the functions below list multiple functions.

Ape functions (tree manipulation)
- bind.tree (Binds Trees)
- branching.times (Branching Times of a Phylogenetic Tree)
- collapse.singles (Collapse Single Nodes)
- drop.tip (Remove Tips in a Phylogenetic Tree)
- is.binary.tree (Test for Binary Tree)
- is.monophyletic (Is Group Monophyletic)
- is.ultrametric (Test if a Tree is Ultrametric)
- ladderize (Ladderize a Tree)
- [ ] mrca (Find Most Recent Common Ancestors Between Pairs)
- [ ] multi2di (Collapse and Resolve Multichotomies)
- node.depth (Depth and Heights of Nodes and Tips)
  - for this, it probably depends on how it will be used: it could be
    useful on a per-node basis and as a complete output. But I really
    need to know how output from the tree will be used first. The
    information is already there.
- rotate (Swapping Sister Clades) (done for binary trees only)
- [ ] rotateContr (reorder nodes to give tip order)

Support for tree comparison (like all.equal.phylo) that takes node rotation into account would also potentially be very useful.
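One way to make the comparison rotation-invariant is to build a canonical string in which each node's children are sorted, then compare the strings. This sketch assumes labels contain no `(`, `)` or `,` characters, and it ignores branch lengths. `canonical` and `equal_up_to_rotation` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node {
  std::string label;
  std::vector<Node> children;
};

// Canonical Newick-like string: children are sorted, so two trees that
// differ only by node rotations produce the same string.
std::string canonical(const Node& node) {
  if (node.children.empty()) return node.label;
  std::vector<std::string> parts;
  for (const Node& child : node.children) parts.push_back(canonical(child));
  std::sort(parts.begin(), parts.end());
  std::string out = "(";
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) out += ",";
    out += parts[i];
  }
  return out + ")" + node.label;
}

bool equal_up_to_rotation(const Node& a, const Node& b) {
  return canonical(a) == canonical(b);
}
```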

Phytools functions (tree manipulation; not sure about these)
- applyBranchLengths (?)
- bind.tip (see ape::bind.tree, and work out why this is needed)
- collapse.to.star (Collapse a subtree to a star phylogeny)
- drop.leaves (does not seem that useful)

diversitree functions (tree manipulation)
- get.descendants (tips and/or nodes)
- ancestors
- branching.heights, branching.depth

I/O
- nexus trees (to and from) (basic version done, but see #29)
- ape trees (to and from, better and faster)
  - from ape is done; to ape remains.

Tree simulation
- should I include a birth-death (bd) tree simulator to help with examples, and as an exercise in how to write one?

Starting to verge into tree inference territory
- gammaStat (Gamma-Statistic of Pybus and Harvey)
- pic (Phylogenetically Independent Contrasts)
- vcv (Phylogenetic Variance-covariance or Correlation Matrix)

Totally inference territory
- bm via pruning; simulation and inference, as a demo. (mostly done)
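As an example of how simple some of these operations become without an edge matrix, here is a sketch of the MRCA for a single pair of nodes. It assumes a hypothetical 0-based parent array, in which the root's parent is -1. It is not a port of ape::mrca, which returns a matrix covering all pairs.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical representation: parent[i] is the index of node i's
// parent, and the root has parent -1 (0-based indices throughout).
int mrca(const std::vector<int>& parent, int a, int b) {
  // Record a and all of its ancestors...
  std::unordered_set<int> ancestors;
  for (int n = a; n != -1; n = parent[n]) ancestors.insert(n);
  // ...then walk up from b until we hit one of them.
  for (int n = b; n != -1; n = parent[n]) {
    if (ancestors.count(n)) return n;
  }
  return -1;  // only reachable if a and b are in different trees
}
```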

Rcpp/gc/methods issue

Still getting issues here, though they are rarer. See if I can trigger them on source and then put a trace on base::.handleSimpleError, perhaps? I figure it's probably coming from one of the tryCatch clauses.

This will hopefully go away when we drop modules

Rotate node

Mostly only makes sense for bifurcating trees, but we can just use std::rotate here internally.

There is a second variant that will be useful for non-binary trees: permute, which takes a vector of indices. rotate would then be the same as permute with indices [n..1] or [(n-1)..0] (base 1 and base 0 respectively).
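Here is a sketch of permute, and of rotate expressed through it, using 0-based indices. Reversing the children (the [(n-1)..0] permutation) gives the same result as `std::rotate` by one position only when there are exactly two children. With three or more children, `std::rotate` is a cyclic shift instead. `permute` and `rotate_children` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Reorder `children` so that position i receives the old element
// order[i]. `order` must be a permutation of 0..n-1 (base 0).
template <typename T>
void permute(std::vector<T>& children, const std::vector<std::size_t>& order) {
  assert(order.size() == children.size());
  std::vector<T> out;
  out.reserve(children.size());
  for (std::size_t i : order) out.push_back(children[i]);
  children.swap(out);
}

// "rotate" as reversal: permute with indices (n-1)..0.
template <typename T>
void rotate_children(std::vector<T>& children) {
  std::vector<std::size_t> order(children.size());
  std::iota(order.rbegin(), order.rend(), std::size_t{0});  // n-1, ..., 0
  permute(children, order);
}
```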

Get node should exist

Like get_subtree, but returning the node (equivalent to get_subtree(...).root in C++ or get_subtree(...)$root_node in R).

Can't call tree an 'rtree'

Conflicts with ape::rtree

Andrea suggests something like "pine"

Needs resolving when we pull out modules

tips and nodes

I think that the methods tips and nodes would be better named n_tips/n_nodes or
count_tips/count_nodes, etc.

Fix up the `treeapply` function

We need an accumulate function (possibly called a fold) and an apply
function, and need to do this over the different iterator orders and
over children, etc., with the target being either a node or the node's
data, so that we can do

treeapply(tr, order="post", target="data", function(x) x$foo)

to get all the foo elements out of the tree.

The current implementation is very basic. It needs to work on subtrees, and to allow iteration at one of three levels:

- subtree (like the sub_ iterators)
- node (like the basic iterators)
- data (over just the data members, when given an rtree)

Decide how the output should be named (names=TRUE?).

Write methods for dplyr's %.%? Or too silly?
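Here is a sketch of the apply/fold split. It assumes a hypothetical `Node` with a single data member, `foo`. `apply_post` returns roughly what `treeapply(tr, order="post", target="data", function(x) x$foo)` should return. `fold_post` combines each node with the already-folded results of its children. Other iteration orders, and the subtree/node/data levels, would follow the same pattern.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical node type carrying one data member, `foo`.
struct Node {
  double foo = 0.0;
  std::vector<Node> children;
};

// Visit every node of the subtree rooted at `node` in post-order.
template <typename F>
void visit_post(const Node& node, F&& f) {
  for (const Node& child : node.children) visit_post(child, f);
  f(node);
}

// "apply": collect f(node) for every node, in post-order.
template <typename F>
auto apply_post(const Node& node, F f) {
  std::vector<std::invoke_result_t<F&, const Node&>> out;
  visit_post(node, [&](const Node& n) { out.push_back(f(n)); });
  return out;
}

// "fold"/accumulate: combine a node with the already-folded results
// of its children, e.g. to get subtree sums or heights.
template <typename R, typename F>
R fold_post(const Node& node, F f) {
  std::vector<R> from_children;
  for (const Node& child : node.children) {
    from_children.push_back(fold_post<R>(child, f));
  }
  return f(node, from_children);
}
```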

as vs to

Things like to_rtree: should it be as_rtree (like as.integer)?

tree::tip_labels and ape's phy$tip.label -- confusing.

Probably need to do a stocktake of consistency and names soon.

Handling of NA edge lengths problematic.

We can have a mix of NA and non-NA lengths. Checking for these during update_heights() probably makes sense. But at some point I want an easy, quick way to tell whether all edge lengths are non-NA. Of course, we can't store that with the tree. Look for ISNA checks floating around.
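For the quick all-non-NA check, note that R stores NA_real_ as a NaN. A plain NaN test therefore catches both NA and NaN; inside R code, R's ISNA()/ISNAN() macros tell the two apart. Here is a minimal sketch, with `kNaReal` as a stand-in for NA_real_ outside R and `all_lengths_known` as a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Stand-in for R's NA_real_ outside of R (R's NA is also a NaN).
const double kNaReal = std::numeric_limits<double>::quiet_NaN();

// True when no edge length is NA/NaN; an empty vector counts as known.
bool all_lengths_known(const std::vector<double>& lengths) {
  return std::none_of(lengths.begin(), lengths.end(),
                      [](double x) { return std::isnan(x); });
}
```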

Add bitmaps and vector graphics to trees

I have hacked together code for this, so I just need to copy it over. But for the package we need some openly licensed (CC0 or CC-BY) images to test with.

PhyloPic is an obvious choice for SVG figures (conversion to EPS and then to grImport's XML requires Inkscape and gs, respectively).

The Biodiversity Heritage Library has lots of good bitmaps.

So, find a small tree that has both bitmaps and vector graphics (or two small trees).

R CMD check

As usual, having an awful fight there.

We require BH (>= 1.51.0-4), but, in typical CRAN passive-aggressive fashion, it grumbles if you state this even though it is OK to do so.
testthat tests need to be moved into the testthat directory (?)

Spacing axis

At the moment I've been very vague about how this works, but this will need sorting out at some point. With things like brackets I'm going to need to offset on the spacing axis. But I'm not sure that anything other than pseudo-native makes sense here.

Who owns the node?

Correct attribution of the node connector is hard; at the moment it belongs to the tip of the previous node, which is fine. But the most natural way of colouring would be to associate it with the base. However, that plays very poorly with multifurcations: it's not clear which bit belongs where for a polytomy! It also requires a second pass, because we have to go back down the tree and record, for each node, the midpoint of its parent node.

The node could be owned by the tip edge (as currently done) or by the root edge (nicer in most cases). The main reason for not doing the root-edge version yet is working out how to deal with polytomies. That is going to require some odd, different ways of bridging for different node types.

rectangular: connect up to the branch that came off further away from the parent node than you
curvy: connect to the midpoint

So the book-keeping would change so that we'd track:

spacing_mid --> spacing_tip
spacing_min --> spacing_root (attachment point of the base)
spacing_max --> spacing_pass (point that we pass through)

So, given time_rootward and time_tipward, a rectangular branch passes through:

(time_rootward, spacing_root), (time_rootward, spacing_pass), (time_tipward, spacing_tip)

But with a polytomy that would still need work.

        +------
        |
--------+
        |
        +------

        +------
        |
        +-----
------- +
        +------
        |
        +------

classify() -- base 0 or base 1?

Most of the time, classify() is used as classify(...) + 1L, which is not pretty.

Export to d3?

Still seems like it could be a killer feature. It would be easier if I knew someone who actually used d3. Looks like ggvis might be a better target.

Gaussian multiplication

Note that

   -log(2 * M_PI * vv)/2 
     --> -log(2 * M_PI)/2 - log(vv)/2
     --> -M_LN_SQRT_2PI - log(vv)/2
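Here is a quick numerical check of the identity. The constant is defined locally because `M_LN_SQRT_2PI` comes from R's Rmath.h, not from standard C++. `kLnSqrt2Pi` and `normal_log_density` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// log(sqrt(2*pi)); R's Rmath.h provides this as M_LN_SQRT_2PI.
const double kLnSqrt2Pi = 0.5 * std::log(2.0 * std::acos(-1.0));

// Log-density of N(mean, vv) at x, using -log(2*pi*vv)/2
//   = -log(sqrt(2*pi)) - log(vv)/2.
double normal_log_density(double x, double mean, double vv) {
  const double d = x - mean;
  return -kLnSqrt2Pi - 0.5 * std::log(vv) - d * d / (2.0 * vv);
}
```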

style_thing becomes tree_style

Plotting with data

It should be possible to take a tree with data at the tips and plot that too. That would be sweet.

Class names

There is huge potential for namespace collisions with the plotting code, because every grob type creates an object of that class. So I think I need to prefix all the different types.

Gaussian convolutions

Mathematica code:

(* Normal density; the scale_ argument is currently unused *)
g[x_,mean_,variance_,scale_] :=
  Exp[-(x - mean)^2 / (2 variance)] / Sqrt[2 Pi variance]
obj = g[x, m, v, s]
kern = g[x, mk, vk, sk]
Integrate[kern obj, {x, -Infinity, Infinity}, Assumptions -> {v > 0, vk > 0}]

Use this to tidy up the gaussian code.
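With the sign in the exponent corrected, the integral has a standard closed form: the integral of N(x; m, v) N(x; mk, vk) over x equals N(m; mk, v + vk). In words, the variances add, and the result is a normal density evaluated at the difference of the means. Below is a sketch of that result together with a numerical check. The Mathematica `g` does not use its scale argument. If the scales were meant to multiply the densities (an assumption), the result would also be multiplied by s * sk. `dnorm` and `gaussian_overlap` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Density of N(mean, variance) at x.
double dnorm(double x, double mean, double variance) {
  const double pi = std::acos(-1.0);
  const double d = x - mean;
  return std::exp(-d * d / (2.0 * variance)) / std::sqrt(2.0 * pi * variance);
}

// Closed form of Integrate[kern obj, {x, -Infinity, Infinity}] for
// obj = N(m, v) and kern = N(mk, vk): the variances add.
double gaussian_overlap(double m, double v, double mk, double vk) {
  return dnorm(m, mk, v + vk);
}
```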

Partial labelling

Add just a few labels, with specific contents, at specific labels. Then reimplement add_labels() etc on top of this.

get subtree could force selection between internals and externals?

Clade trees

These were hugely useful to me and others in diversitree; how do we deal with them appropriately? We could have types with lists of species at the node, numbers, etc. Just for plotting? Some for analysis?

Plotting area

I can think of two ways of defining the plotting area.

1. Here is a viewport that will fit the tree, possibly with some padding around the edges; labels will overflow. So we would define a viewport that takes up most of the page and then squeeze the labels into whatever space is left over. This is going to be particularly nice if you want a bunch of plots that have the same logical size but, for example, different-length labels. Probably the easiest type to think about.
2. Here is a viewport that will fit the plot, possibly with some padding around the edges. Here the labels will fit within the plot, and we would work backwards to find out the size of the viewport. The easiest way of implementing this would probably be something that computes the appropriate size for a viewport of the first type.

Grid's "UI packing model" approach might be better here, but I believe that it is slow.

The print.tree method highlights the need for this; by default we don't leave enough space for the labels! So probably best to embed the trees within some higher level "thing". Decisions, decisions.

Pass by reference and pass by value

If we mostly interact with trees via Rcpp modules, or by wrapping the objects as R's external pointers, then that may present some surprising semantics to R users who expect pass-by-value.

Pros

Probably way faster for big trees
That's going to be needed for subtrees to make any sense and they're wicked.

Cons

Potentially surprising behaviour
Awkward semantic mismatch
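Here is a small C++ analogy of the mismatch. It uses `std::shared_ptr` as a stand-in for an external pointer, although R's own mechanism is different. Copying the handle does not copy the tree, so a function that looks pure to an R user mutates every "copy". `Tree`, `TreeHandle` and `drop_first_tip` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stand-in for a tree: just its tip labels.
struct Tree {
  std::vector<std::string> tips;
};

// A handle with reference semantics, analogous to an external pointer:
// copying the handle does not copy the tree.
using TreeHandle = std::shared_ptr<Tree>;

// Looks like a pure function to an R user, but it mutates the tree
// that every copy of the handle points at.
void drop_first_tip(const TreeHandle& tree) {
  assert(!tree->tips.empty());
  tree->tips.erase(tree->tips.begin());
}
```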

Iterators over terminal nodes

This seems to be something of an omission in the treetree data structure.
I think that we should be able to get these from the pre-order
traversal iterator by filtering. Possibly use boost::filter_iterator? There are a bunch of places in the code where this would already be useful.
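Here is a sketch of the filtering approach. It eagerly collects pointers to the nodes in pre-order and keeps those with no children. A lazy version would instead wrap the pre-order iterator in `boost::filter_iterator` with the same predicate. `Node`, `pre_order` and `terminal_nodes` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

struct Node {
  std::string label;
  std::vector<Node> children;
};

// Flatten a tree into pre-order, storing pointers to the nodes.
void pre_order(const Node& node, std::vector<const Node*>& out) {
  out.push_back(&node);
  for (const Node& child : node.children) pre_order(child, out);
}

// Terminal nodes are the pre-order nodes with no children; filtering
// keeps them in left-to-right order.
std::vector<const Node*> terminal_nodes(const Node& root) {
  std::vector<const Node*> all, tips;
  pre_order(root, all);
  std::copy_if(all.begin(), all.end(), std::back_inserter(tips),
               [](const Node* n) { return n->children.empty(); });
  return tips;
}
```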

Modules

Romain is getting rid of modules in Rcpp11:
https://twitter.com/romain_francois/status/437635948546121728
Plus I have general issues using them (very slow load times, issues with as/wrap, the gcc/Rcpp bug). Can I use pointers alone?

Filenames

There is already a util-grid, but I'm wondering about a plotting-util.R

Some headers have wrong guards

iterator/iterator_wrapper.hpp
tree/common.hpp
tree/manipulation.hpp
tree/misc.hpp

Memory corruption?

I've seen this twice, though it was fixed by a recompile. On test:

Basic tree operations: extra : Assertion failed: (!other.empty()), function const_subtree, file ../inst/include/treetree/tree.hpp, line 872.
make[1]: *** [test] Abort trap: 6
make: *** [test] Error 2
make install test  12.40s user 0.30s system 92% cpu 13.772 total

Run under valgrind on a Linux system.

Different plotting primitives

At the moment, the line segment is the only plotting primitive, as in ape. But in the past I've wondered about using filled lines (more like Mesquite). I'm not sure about the benefits of this, and the costs are nontrivial.

Another option would be things like curvy lines. Mesquite has some and they are nice. We can get these pretty easily with grid.curve, but this does involve changing how the underlying plotting is done. However, it could leave things much more flexible.

Declare an R-to-C index type?

Dealing with the base0/base1 translation automatically via wrap/as? Could be sweet. The question is how often it will get used; it's not currently that important, and the overhead may outweigh the number of times where this is actually useful.
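Here is a sketch of what such a type might look like. It assumes the type stores the 0-based value and converts only at the R boundary. Hooking it into Rcpp would presumably mean specialising `Rcpp::as`/`Rcpp::wrap`, which is not shown here. `Index`, `from_r` and `to_r` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical index type: stores a 0-based C++ index and converts to
// and from R's 1-based integers at the boundary only.
class Index {
 public:
  static Index from_r(int r_index) {
    if (r_index < 1) throw std::out_of_range("R indices start at 1");
    return Index(static_cast<std::size_t>(r_index - 1));
  }
  explicit Index(std::size_t i) : i_(i) {}
  std::size_t base0() const { return i_; }
  int to_r() const { return static_cast<int>(i_) + 1; }

 private:
  std::size_t i_;
};
```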

Someone who knows some ggplot2 should probably audit what I'm doing

It needs to be different enough from ggplot2 that it's not confusing, and similar enough that it's not confusing, but I never use ggplot2!
