Git Product home page Git Product logo

Comments (18)

rneher avatar rneher commented on June 4, 2024 1

Along these lines, what about having augur tree label nodes in the output Newick as NODE_0000071,

that was the initial idea, but that doesn't work since rerooting and polytomy resolution changes topology/adds/removes nodes.

I had initially been reading the command as augur timetree and expecting it to do the one Unix-y task of taking a ML subs tree and outputting a timetree. I would then have augur ancestral to do ancestral state inference. This streamlines both modules and surfaces the hidden ancestral state functionality.

yes and no. timetree inference via treetime will in most cases do ancestral inference. One could do it twice, but then it would be crucial to do it after timetree inference.

node_data.json is opaque, but if augur timetree produced node_dates.json and augur ancestral produced node_nt_seqs.json

initially node_data.json was the only one. agree that the name doesn't make much sense anymore. but we also don't want a separate file for each individual attribute. if you call it dates, then branch lengths should be somewhere else and we get a massive proliferation of files.

augur tree is appropriately agnostic about the underlying program used to build the tree. It could be RAxML, FastTree, etc...

but this only works if we basically hide all method specific commands from the user. Otherwise, we get a big confusion with the hundreds of options that these different tools take. In this spirit, the current treetime module is just a wrapper around treetime that happens to produce output compatible with augur.

from augur.

tsibley avatar tsibley commented on June 4, 2024

I strongly agree with what you've proposed @trvrb. Unlike other augur commands, the treetime command is modal and bundles several things into one. It'd be good to unbundle those.

from augur.

trvrb avatar trvrb commented on June 4, 2024

Oh... and I thought of two other things that bothered me here:

  1. Calling augur treetime --timetree is just weird.

  2. augur tree is appropriately agnostic about the underlying program used to build the tree. It could be RAxML, FastTree, etc... augur tree is about going from alignment to subs tree, irrespective of method to get there. However, augur treetime breaks this and specifies program in the title. We could very well eventually want to allow LSD of some other method to infer timetrees for completeness.

from augur.

emmahodcroft avatar emmahodcroft commented on June 4, 2024

I agree with @rneher . Also, moving ancestral out separately is only going to remove 2 (3 if you count the output vcf) arguments, so may not really change the complexity of the treetime call.

I think given what treetime does it's acceptable that it has a lot of arguments, but I'd agree that they're not so straightforward at the moment and could be made clearer, possibly by renaming some, better help info, and mostly, good documentation somewhere.

I also agree that timetree/treetime is confusing. I mess this up on the regular πŸ™ƒ

from augur.

tsibley avatar tsibley commented on June 4, 2024

but this only works if we basically hide all method specific commands from the user. Otherwise, we get a big confusion with the hundreds of options that these different tools take.

I think this is a false dilemma. It's possible to expose only the most common method-specific options, if any, and also provide a general mechanism for passing any arbitrary option into specific methods without enumerating all of them.

There are several approaches we could take. Using augur tree as an example, one approach might be:

  1. Providing common method-specific options like --raxml-foo or --fasttree-bar-baz which map to --foo and --bar-baz in RAxML and FastTree. These would be "supported" method options.

  2. Providing a --method-options option which is used like --method-options=--foo. This provides people the rope they'd need to do what they wanted, but also implies that something custom is going on which requires you to know what you're doing ("you break it, you keep it").

Again, there are other ways to approach the syntax here, but the general idea is the same.

initially node_data.json was the only one. agree that the name doesn't make much sense anymore. but we also don't want a separate file for each individual attribute. if you call it dates, then branch lengths should be somewhere else and we get a massive proliferation of files.

On what order of magnitude would "massive" be?

from augur.

rneher avatar rneher commented on June 4, 2024

the number of files is ultimately not the issue (would be probably just 2-3 more). The more central issue is how to address nodes across steps.

from augur.

tsibley avatar tsibley commented on June 4, 2024

the number of files is ultimately not the issue (would be probably just 2-3 more). The more central issue is how to address nodes across steps.

Nod. It didn't seem like the number of files would actually be an issue, that's why I asked.

For the core issue of internal nodes, it seems that node annotations must happen after the topology is fixed, whichever step that may be. Is that correct?

from augur.

rneher avatar rneher commented on June 4, 2024

what we could do are simple consistency checks on load:

  • name internal nodes in augur tree (these might change later)
  • if read_node_data is called, that implies a previous augur module requiring a tree was called. We could simply pass the tree file name to that function and check for consistency of names.
  • if an augur module is called with a 'raw' tree without internal labels, name them and rewrite the tree with nothing changed but the labels (kind of violates the in/output rules, but would enable somebody to go into the pipeline with an ML tree and a metadata table and perform traits. result would be a traits.json and a tree with internal node labels, the latter would be a surprise but could be communicated in all caps...)

from augur.

tsibley avatar tsibley commented on June 4, 2024

Sounds reasonable, @rneher, except for one bit:

rewrite the tree with nothing changed but the labels (kind of violates the in/output rules…)

Overwriting an input file in-place is a huge taboo. Even with an all caps warning, it could easily be a nasty surprise. I strongly recommend not doing this, but finding another way of allowing on-the-fly node naming if a tree without them is provided. This could be, for example, augur traits automatically running an augur assign-node-names command with the "raw" tree as input and saving a new "named" tree as output and then automatically using that as the actual input to augur traits.

But also, augur traits could just require a tree with named nodes and tell you to run augur assign-node-names if you have a tree without them.

from augur.

trvrb avatar trvrb commented on June 4, 2024

@rneher: I definitely agree on finding ways to do consistency checks.

@tsibley: I agree that silently rewriting files sounds like a bad idea.

from augur.

rneher avatar rneher commented on June 4, 2024

Overwriting an input file in-place is a huge taboo. Even with an all
caps warning, it could easily be a nasty surprise.

I agree. (but hey, gzip does it...)

but finding another way of allowing on-the-fly node
naming if a tree without them is provided. This could be, for example,
automatically running an |augur assign-node-names| command with the
"raw" tree as input and saving a new "named" tree as output and then
using that as the new input.

yes, I was thinking something similar.

But also, |augur traits| could just require a tree with named nodes and
tell you to run |augur assign-node-names| if you have a tree without them.

it sort of strikes me as clunky to have such an book-keeping step explicit.

from augur.

jameshadfield avatar jameshadfield commented on June 4, 2024

re: creating an ancestral command (to be run after timetree temporal or whatever it's called):

  • πŸ‘Ž it is unnecessary compute, as it has to be done within timetree anyways
  • πŸ‘ it's clearer what's going on
  • πŸ‘ it'd be super useful to add ancestral states to a BEAST tree, where we don't want to mess with topology or branches. (This isn't meant to turn this thread into "how to deal with BEAST", but clearly it's something i've been doing and will move over to modular augur at some point)

For what it's worth, I like the requirement for "new" layers to simply write a flat JSON with the nodes object and have that end up in the tree.JSON

from augur.

trvrb avatar trvrb commented on June 4, 2024

Okay. Latest thoughts...

It's clear that we need to support at least 3 big use cases.

  1. Going from alignment to ML tree and time tree via IQ-Tree + TreeTime and then traits, etc... + export
  2. Going from alignment to ML tree via IQ-Tree and then traits, etc... + export
  3. Going from BEAST tree to ML branch lengths and then traits, etc... + export

The current setup is actually not so bad for these three aims.

  1. Is accomplished with augur tree followed by augur treetime to produce tree.nwk and node_data.json that then form the basis of downstream operations.
  2. Is accomplished with augur tree followed by augur treetime but asking treetime not to make a timetree.
  3. Is accomplished with augur treetime but asking treetime to use the input tree as timetree.

I think the core operations could be made more obvious by a simple rename of augur treetime to augur refine. This makes it clear (at least to me) that you would produce a raw tree tree_raw.nwk with augur tree than then needs to be refined by running augur refine before doing any downstream operations. This seems semantically logical to me. You would still need to "refine" a BEAST tree even if it doesn't need timing done.

In one example... there was the issue of wanting to add traits to an ML tree (option 2 above). In this case, I'd recommend to enforce augur traits to take node_data.json as input. This way it's clear that you go augur tree, augur refine, augur traits. Additionally, augur refine could handle rerooting the ML tree via --root NODENAME as it currently does. augur tree wouldn't expose rooting options of IQ-Tree / RAxML.

I know this is basically exactly as things currently exist. Would just work on the exact interface to augur treetime / augur refine.

from augur.

rneher avatar rneher commented on June 4, 2024

an argument in favor of pulling sequences out of node_data.json and putting them into a separate file:

this would align the file I/O pattern between the vcf/fasta workflows. I am fine with splitting sequence reconstruction from rerooting/timetree if this makes is more transparent. the computational overhead is small.

from augur.

rneher avatar rneher commented on June 4, 2024

392c17c checks whether names in node_data match nodes in tree if provided. We should more generally discuss how we want to handle errors and warnings.

from augur.

trvrb avatar trvrb commented on June 4, 2024

I hadn't noticed that VCF version of augur treetime produces three file outputs https://github.com/nextstrain/augur/blob/master/builds/tb/Snakefile#L68. To me, this is one output too many and would push for separating augur refine from augur ancestral. augur ancestral would produce a single output file (likeaugur traits) that's something like nt_muts.json (or I guess nt_muts.vcf).

Looking at tb/results/node_data.json I see things like "mutations": [["A", 600688, "C"], so I don't quite understand why treetime.vcf needs to exist (I see a corresponding line that reads MTB_anc 600688 . A C , but this is a separate issue and likely due to my own naivety with what's going on with the VCF pipeline.

Edit: I see... The difference is that node_data.json gets a sequence attribute. Vs VCF where this attribute is not supplied.

from augur.

rneher avatar rneher commented on June 4, 2024

I agree this needs streamlining. sequences in node_data.json was really just a temporary fix when I realized I need them in translate.

from augur.

trvrb avatar trvrb commented on June 4, 2024

I'm going to close this now given that #145 is merged. There were some issues surfaced above (like passing arguments to IQTREE and sequences in node data), but I think these exist better as their own issues.

from augur.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.