richfitz / forest
New Phylogenetics Data Structures in R
License: Other
Be careful with post-order iteration on subtrees. It does not seem to work, and loops over all nodes instead: swapping begin_sub()/end_sub() for begin_sub_post()/end_sub_post() in subtree_at_label causes some tests to fail. This could cause issues.
This is the hard-core way of addressing #33: is there some way of associating a taxonomic hierarchy (family, order, whatever) with the tree?
There'd be lots of fun things to do with this, especially with clade trees (#16), e.g. collapsing to a different taxonomic level.
This could be something for which it's worth expanding the node type, or having a different node type that holds the hierarchy. Enforcing monophyly will be hard, as the taxonomy is often not a perfect fit.
Looks like we're depending on labels. We could use the memory location as a default label, which will at least be unique.
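Here is a minimal C++ sketch of the memory-location fallback. It assumes labels are std::string. default_label is a hypothetical helper and is not part of treetree. The address is only unique while the object is alive, and it changes after a copy or in a new session.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: build a fallback label from an object's address.
// Unique among live objects, but not stable across copies or sessions.
template <typename T>
std::string default_label(const T& obj) {
  std::ostringstream out;
  out << "node_" << static_cast<const void*>(&obj);
  return out.str();
}
```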
Draw up little examples of what the various insert/insert_above/append/prepend/splice/prune modifications actually do. I have a start on this in the unpushed document branch.
From a compilation and load-speed point of view, it would be useful to conditionally hide the itree class, since it's only useful for tests. So omit it for the CRAN version. We could compile with a -DWITH_EXTRA_TESTS flag to enable it cleanly. We could then set up the system to recompile with and without that flag, running the tests each time to make sure that we're good.
(This may become redundant if we drop modules; I'm not sure how much of this I'd reimplement.)
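One way the conditional compilation could look, as a sketch. The itree body is a placeholder. has_extra_tests() is a hypothetical helper that would let the test suite skip the itree tests when the class is compiled out. The flag could be passed through PKG_CPPFLAGS in src/Makevars.

```cpp
#include <vector>

// Only compiled when built with -DWITH_EXTRA_TESTS.
// The class name comes from the notes; its contents are placeholders.
#ifdef WITH_EXTRA_TESTS
struct itree {
  std::vector<int> values;  // placeholder payload
};
#endif

// Hypothetical: report whether the test-only code was compiled in.
inline bool has_extra_tests() {
#ifdef WITH_EXTRA_TESTS
  return true;
#else
  return false;
#endif
}
```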
Possibly based on boost::spirit?
The current version is slow and probably brittle.
Write a small example script that puts this through its paces; perhaps use a gh page? Ideally, run it at the same time as the tests so that there is some way of making sure that the graphical code does not fail.
As well as (or instead of) this, look at the static-docs type things that people have worked out. Examples could shine here.
I think I'm moving the 'tree' class of a treeGrob to 'tg', mostly because it won't conflict with anything else. If I stick with that name, then rename things like tree_branches to tg_branches. There is lots of code turning up with tree_ prefixes that will be affected by this, but a bit of thinking is required.
Two different modes:
* Highlight clades (done). This uses the MEDUSA algorithm
* By taxonomy. This is harder, but more useful. What I'd like to do is specify which tips belong to which families and then work back down the tree. There are two ways of doing this:
1. Work out what the MRCA is for each group and assert monophyly. This will also be useful for making trees that display at different taxonomic levels.
2. Give the tips and have the colouring continue down. There are some intuitive arguments here about what to do (a clade within another clade should be possible to colour nicely, but I can't immediately see the algorithm).
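Here is a sketch of option 1: find the MRCA of a group, then check monophyly. It assumes a plain parent-vector tree rather than forest's own node type. mrca, tips_below and is_monophyletic are hypothetical names. A group counts as monophyletic when the tips below its MRCA are exactly the group.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// parent[i] is the parent of node i; -1 marks the root (assumed layout).
using Parents = std::vector<int>;

// Nodes from `node` up to the root, deepest first.
std::vector<int> path_to_root(const Parents& parent, int node) {
  std::vector<int> path;
  for (int n = node; n != -1; n = parent[n]) path.push_back(n);
  return path;
}

// Most recent common ancestor of a non-empty set of nodes.
int mrca(const Parents& parent, const std::vector<int>& nodes) {
  std::vector<int> anc = path_to_root(parent, nodes.front());
  for (int node : nodes) {
    std::vector<int> p = path_to_root(parent, node);
    std::set<int> on_path(p.begin(), p.end());
    // Drop candidates that are not ancestors of this node; order is kept.
    anc.erase(std::remove_if(anc.begin(), anc.end(),
                             [&](int a) { return !on_path.count(a); }),
              anc.end());
  }
  return anc.front();  // deepest surviving common ancestor
}

// Tips (nodes that are nobody's parent) descending from `node`.
std::set<int> tips_below(const Parents& parent, int node) {
  std::vector<bool> is_internal(parent.size(), false);
  for (int p : parent)
    if (p != -1) is_internal[p] = true;
  std::set<int> out;
  for (int i = 0; i < static_cast<int>(parent.size()); ++i) {
    if (is_internal[i]) continue;
    for (int n = i; n != -1; n = parent[n])
      if (n == node) { out.insert(i); break; }
  }
  return out;
}

bool is_monophyletic(const Parents& parent, const std::vector<int>& tips) {
  return tips_below(parent, mrca(parent, tips)) ==
         std::set<int>(tips.begin(), tips.end());
}
```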
Look at ape::bind.tree for the sort of edge-matrix hell we get to avoid. Note that this is not a criticism of ape, but a necessary consequence of using the edge matrix to describe the tree. We have just as much complication, but it's elsewhere in the package.
Need to test equality of RObjects differently than with ==; that is not really working.
labels adds all labels by default, but optionally takes a vector of names to plot. Perhaps have an append argument to add these to an existing container of labels.
How to deal with this for saveRDS, etc.?
Use both testthat and graphicsQC? Some grid support there apparently.
Again, this needs taxonomic information to be really useful (see #34). But by clade is a good start, using classify, or just given a bunch of names.
If things are nontrivial, move them from here into their own issue. This is not a complete list from any of these packages, but things that jumped out as being needed. More to add probably. Some of the help pages for the functions below list multiple functions.
* bind.tree (Binds Trees)
* branching.times (Branching Times of a Phylogenetic Tree)
* collapse.singles (Collapse Single Nodes)
* drop.tip (Remove Tips in a Phylogenetic Tree)
* is.binary.tree (Test for Binary Tree)
* is.monophyletic (Is Group Monophyletic)
* is.ultrametric (Test if a Tree is Ultrametric)
* ladderize (Ladderize a Tree)
* mrca (Find Most Recent Common Ancestors Between Pairs)
* multi2di (Collapse and Resolve Multichotomies)
* node.depth (Depth and Heights of Nodes and Tips)
* rotate (Swapping Sister Clades) (done for binary trees only)
* rotateConstr (reorder nodes to give tip order)
Tree comparison (all.equal.phylo) type support that takes node rotation into account would also potentially be very useful.
* applyBranchLengths (?)
* bind.tip (see ape::bind.tree, and work out why this is needed)
* collapse.to.star (Collapse a subtree to a star phylogeny)
* drop.leaves (does not seem that useful)
* get.descendants (tips and/or nodes)
* ancestors
* branching.heights, branching.depth
* gammaStat (Gamma-Statistic of Pybus and Harvey)
* pic (Phylogenetically Independent Contrasts)
* vcv (Phylogenetic Variance-covariance or Correlation Matrix)
Still getting issues here, though more rarely. See if I can trigger them on source and then put a trace on base::.handleSimpleError, perhaps? I figure it's probably coming from one of the tryCatch clauses.
This will hopefully go away when we drop modules
This mostly only makes sense for bifurcating trees, but we can just use std::rotate here internally.
There is a second variant that will be useful for non-binary trees: permute, which takes a vector of indices. rotate would then be the same as permute with indices [n..1] or [(n-1)..0] (base 1 and base 0 respectively).
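Here is a sketch of permute. It assumes a node's children are held in a std::vector and that indices are base 0. Both function names are hypothetical. Rotation is then permute with the indices [(n-1)..0].

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical: out[i] = children[indices[i]], indices base 0.
template <typename T>
std::vector<T> permute(const std::vector<T>& children,
                       const std::vector<std::size_t>& indices) {
  // Reject anything that is not a permutation of 0..n-1.
  std::vector<std::size_t> sorted(indices);
  std::sort(sorted.begin(), sorted.end());
  if (sorted.size() != children.size())
    throw std::invalid_argument("need one index per child");
  for (std::size_t i = 0; i < sorted.size(); ++i)
    if (sorted[i] != i)
      throw std::invalid_argument("indices must be a permutation of 0..n-1");
  std::vector<T> out;
  out.reserve(children.size());
  for (std::size_t i : indices) out.push_back(children[i]);
  return out;
}

// Hypothetical: rotate == permute with indices (n-1), ..., 1, 0.
template <typename T>
std::vector<T> rotate_children(const std::vector<T>& children) {
  std::vector<std::size_t> idx(children.size());
  std::iota(idx.rbegin(), idx.rend(), std::size_t{0});
  return permute(children, idx);
}
```

With two children, the reversal gives the same result as std::rotate by one position, so std::rotate is enough for bifurcating trees. With three or more children, std::rotate performs a cyclic shift rather than a reversal.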
See get_subtree, but returning the node (equivalent to get_subtree(...).root in C++ or get_subtree(...)$root_node in R).
Conflicts with ape::rtree
Andrea suggests something like "pine"
Needs resolving when we pull out modules
I think that the methods tips and nodes would be better as n_tips/n_nodes or count_tips/count_nodes, etc.
We need an accumulate function (possibly called a fold) and an apply function. These need to work over the different iterator orders, over children, etc., with the target being either a node or the node data. That way we can do
treeapply(tr, order="post", target="data", function(x) x$foo)
to get all the foo elements out of the tree.
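Here is a sketch of the apply/fold pair in C++. It assumes a simple recursive node type with a single numeric payload standing in for the node data. Node, tree_apply and tree_fold are hypothetical names, and only pre- and post-order are covered.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tree node: a label, one payload, children.
struct Node {
  std::string label;
  double data;
  std::vector<Node> children;
};

enum class Order { pre, post };

// Walk the tree, calling visit on each node in pre- or post-order.
void traverse(const Node& nd, Order order,
              const std::function<void(const Node&)>& visit) {
  if (order == Order::pre) visit(nd);
  for (const Node& child : nd.children) traverse(child, order, visit);
  if (order == Order::post) visit(nd);
}

// "apply": map f over every node, returning results in traversal order.
template <typename F>
auto tree_apply(const Node& root, Order order, F f) {
  std::vector<decltype(f(root))> out;
  traverse(root, order, [&](const Node& nd) { out.push_back(f(nd)); });
  return out;
}

// "accumulate"/"fold": thread an accumulator through the nodes in order.
template <typename Acc, typename Op>
Acc tree_fold(const Node& root, Order order, Acc init, Op op) {
  traverse(root, order, [&](const Node& nd) { init = op(std::move(init), nd); });
  return init;
}
```

The R call above would roughly correspond to tree_apply(tr, Order::post, [](const Node& n) { return n.data; }).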
The current implementation is very, very basic. It needs to work on subtrees and to allow iteration at one of three levels:
sub_
iterators)rtree
)
Decide how the output should be named (names=TRUE?).
Write methods for dplyr's %.%? Or too silly?
Things like to_rtree: should it be as_rtree (like as.integer)?
We probably need to do a stocktake of consistency and names soon.
We can have a mix of NA and non-NA lengths. Checking for these during update_heights() probably makes sense. But at some point I want a quick way of knowing whether all edge lengths are non-NA. Of course, we can't store that with the tree. Look for the ISNA checks floating around.
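Here is a quick check that all edge lengths are known. It is computed on demand in O(n) rather than stored with the tree. The sketch treats any NaN as missing, because R's NA_real_ is a NaN. Inside the package, R's ISNA()/ISNAN() macros would do this job. all_lengths_known and missing_length are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Stand-in for NA_real_: any NaN counts as a missing length here.
constexpr double missing_length = std::numeric_limits<double>::quiet_NaN();

// True only if no edge length is NA/NaN (vacuously true for no edges).
bool all_lengths_known(const std::vector<double>& lengths) {
  return std::none_of(lengths.begin(), lengths.end(),
                      [](double x) { return std::isnan(x); });
}
```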
I have hacked together code for this, so I just need to copy it over. But for the package, we need some open (CC0 or CC-BY) images to test with.
Phylopic is an obvious choice for SVG figures (conversion to EPS and then to grImport's XML requires Inkscape and gs, respectively).
The Biodiversity Heritage Library has lots of good bitmaps.
So, find a small tree that has both bitmaps and vector graphics (or two small trees).
As usual, having an awful fight there.
At the moment I've been very vague about how this works, but this will need sorting out at some point. With things like brackets I'm going to need to offset on the spacing axis. But I'm not sure that anything other than pseudo-native makes sense here.
Correct attribution of the node connector is hard. At the moment it's a property of the tip of the previous node, and that's fine. But the most natural way of colouring is to associate it with the base. However, that plays very poorly with multifurcations; it's not clear which bit belongs where for a polytomy! It also requires a second pass, because we go back down the tree and record, for each node, the midpoint of its parent node.
The node could be owned by the tip edge (currently done) or the root edge (nicer in most cases). The main reason for not doing the node edge part is working out how to deal with polytomies. That is going to require building some odd different ways of bridging for different node types.
So the book-keeping would change so that we'd track:
spacing_mid --> spacing_tip
spacing_min --> spacing_root (attachment point of the base)
spacing_max --> spacing_pass (point that we pass through)
So, given time_rootward and time_tipward, a rectangular branch passes through:
(time_rootward, spacing_root), (time_rootward, spacing_pass), (time_tipward, spacing_tip)
But with a polytomy that would still need work.
        +------
        |
--------+
        |
        +------

        +------
        |
        +-----
------- +
        +------
        |
        +------
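Here is the proposed book-keeping as a small C++ sketch. The field names follow the renaming proposed for this issue, while BranchGeometry and rectangular_branch are hypothetical. It handles a single branch only and does nothing about polytomies.

```cpp
#include <array>
#include <utility>

// Hypothetical struct; field names follow the proposed renaming.
struct BranchGeometry {
  double time_rootward;  // time at the base of the branch
  double time_tipward;   // time at the tip of the branch
  double spacing_root;   // attachment point of the base
  double spacing_pass;   // point that the connector passes through
  double spacing_tip;    // spacing-axis position of the tip
};

using Point = std::pair<double, double>;  // (time, spacing)

// The three points a rectangular branch passes through, root to tip.
std::array<Point, 3> rectangular_branch(const BranchGeometry& b) {
  return {{{b.time_rootward, b.spacing_root},
           {b.time_rootward, b.spacing_pass},
           {b.time_tipward, b.spacing_tip}}};
}
```

When spacing_pass equals spacing_tip, the final segment is horizontal, which gives the usual right-angled elbow.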
Most of the time classify() is used, it's used as classify(...) + 1L, which is not pretty.
Still seems like it could be a killer feature. Would be easier if I knew someone who actually did d3. Looks like ggvis might be a better target.
Note that
-log(2 * M_PI * vv)/2
--> -log(2 * M_PI)/2 - log(vv)/2
--> -M_LN_SQRT_2PI - log(vv)/2
It should be possible to take a tree with data at the tips and plot that too. That would be sweet.
There is huge potential for namespace collisions with the plotting code, because every grob type creates an object of that class. So I need to prefix all the different types I think.
Mathematica code:
g[x_, mean_, variance_, scale_] :=
  Exp[-(x - mean)^2 / (2 variance)] / Sqrt[2 Pi variance]
obj = g[x, m, v, s]
kern = g[x, mk, vk, sk]
Integrate[kern obj, {x, -Infinity, Infinity}, Assumptions -> {v > 0, vk > 0}]
Use this to tidy up the gaussian code.
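The integral has a closed form. The product of two normal densities integrates to a normal density evaluated at one mean, centred on the other, with the variances summed: N(m; mk, v + vk). The sketch below checks this against Simpson's rule. It assumes the normalised density exp(-(x - mean)^2/(2 variance))/sqrt(2 pi variance) and ignores the scale argument, as the Mathematica g does. If scale were a multiplier, the result would pick up a factor of s*sk. The function names are hypothetical.

```cpp
#include <cmath>

const double kPi = std::acos(-1.0);

// Normal density; the scale argument of g[] is ignored here.
double gauss(double x, double mean, double variance) {
  return std::exp(-(x - mean) * (x - mean) / (2 * variance)) /
         std::sqrt(2 * kPi * variance);
}

// Closed form: integral of gauss(x, m, v) * gauss(x, mk, vk) over x.
double gauss_product_integral(double m, double v, double mk, double vk) {
  return gauss(m, mk, v + vk);
}

// Numerical check: composite Simpson's rule on [lo, hi], n even.
double gauss_product_numeric(double m, double v, double mk, double vk,
                             double lo, double hi, int n) {
  const double h = (hi - lo) / n;
  double sum = 0;
  for (int i = 0; i <= n; ++i) {
    const double x = lo + i * h;
    const double w = (i == 0 || i == n) ? 1 : (i % 2 ? 4 : 2);
    sum += w * gauss(x, m, v) * gauss(x, mk, vk);
  }
  return sum * h / 3;
}
```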
Add just a few labels, with specific contents, at specific labels. Then reimplement add_labels() etc. on top of this.
These were hugely useful to me and others in diversitree; how to deal with these appropriately? We could have types with lists of species at the node, numbers, etc. Just for plotting? Some for analysis?
I can think of two ways of defining the plotting area.
Grid's "UI packing model" approach might be better here, but I believe that it is slow.
The print.tree method highlights the need for this: by default we don't leave enough space for the labels! So it's probably best to embed the trees within some higher-level "thing". Decisions, decisions.
If we mostly interact with trees via Rcpp modules, or by wrapping the objects as R's external pointers, then that may present some surprising semantics to R users who expect pass-by-value.
Pros
Cons
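To make the surprise concrete, here is a sketch that uses std::shared_ptr as a stand-in for an external pointer. Copies of the handle share one tree, so a function that appears to take its argument by value can still modify the caller's tree. Tree, TreeRef, scale_lengths and copy_tree are hypothetical.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in tree: just its branch lengths.
struct Tree {
  std::vector<double> lengths;
};

// Reference semantics, like an external pointer: copies share one tree.
using TreeRef = std::shared_ptr<Tree>;

// Looks like pass-by-value to an R user, but modifies the caller's tree.
void scale_lengths(TreeRef t, double factor) {
  for (double& x : t->lengths) x *= factor;
}

// Explicit deep copy, giving the value semantics R users expect.
Tree copy_tree(const TreeRef& t) { return *t; }
```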
This seems to be something of an omission in the treetree data structure.
I think that we should be able to get that from the pre-order traversal iterator by filtering. Possibly use boost::filter_iterator? There are a bunch of places in the code where this would already be useful.
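This assumes the missing piece is something like a tips-only traversal. An eager version can be built by filtering a pre-order walk with std::copy_if. A lazy iterator would need boost::filter_iterator or, in C++20, std::views::filter. Node, preorder and preorder_filtered are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in node.
struct Node {
  std::string label;
  std::vector<Node> children;
  bool is_tip() const { return children.empty(); }
};

// Pre-order traversal, flattened into a vector of pointers.
void preorder(const Node& nd, std::vector<const Node*>& out) {
  out.push_back(&nd);
  for (const Node& child : nd.children) preorder(child, out);
}

// Keep only nodes satisfying pred, preserving pre-order.
template <typename Pred>
std::vector<const Node*> preorder_filtered(const Node& root, Pred pred) {
  std::vector<const Node*> all, kept;
  preorder(root, all);
  std::copy_if(all.begin(), all.end(), std::back_inserter(kept),
               [&](const Node* nd) { return pred(*nd); });
  return kept;
}
```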
Romain is getting rid of modules in Rcpp11:
https://twitter.com/romain_francois/status/437635948546121728
plus I have general issues using them (very slow load times, issues with as/wrap, the gcc/Rcpp bug). Can I use pointers alone?
There is already a util-grid, but I'm wondering about a plotting-util.R
I've seen this twice, though it was fixed by a recompile. On test:
Basic tree operations: extra : Assertion failed: (!other.empty()), function const_subtree, file ../inst/include/treetree/tree.hpp, line 872.
make[1]: *** [test] Abort trap: 6
make: *** [test] Error 2
make install test 12.40s user 0.30s system 92% cpu 13.772 total
Run under valgrind on a linux system.
At the moment, the line segment is the only plotting primitive -- as in ape. But in the past I've wondered about using filled lines (more like mesquite). Not sure about the benefits of this and the costs are nontrivial.
Another option would be things like curvy lines. Mesquite has some and they are nice. We can get these pretty easily with grid.curve, but this does involve some changing of how the underlying plotting is done. However, it could leave things much more flexible.
Dealing with the base0/base1 translation automatically via wrap/as? Could be sweet. Question is how often will it get used; not currently that important. The overhead may outweigh the number of times where this is actually useful.
Need to be different enough that it's not confusing and similar enough that it's not confusing but I never use it!