cibiv / iq-tree

Efficient phylogenomic software by maximum likelihood

Home Page: http://www.iqtree.org

License: GNU General Public License v2.0

CMake 0.42% C++ 64.15% C 30.50% Shell 0.06% Python 0.15% Makefile 0.25% SAS 0.02% CLIPS 0.05% Pascal 0.69% Ada 0.88% Assembly 1.40% C# 0.55% Batchfile 0.01% DIGITAL Command Language 0.27% HTML 0.30% Module Management System 0.02% Perl 0.04% M4 0.01% Roff 0.04% Objective-C 0.21%
maximum-likelihood phylogenomics phylogenetics mixture-model iq-tree ultrafast-bootstrap model-selection

iq-tree's People

Contributors

bqminh, bredelings, cassiusma, compphylolab, cuongbb, diepthihoang, dschrempf, fgp, heikotree, lamtung, mgilsci, michaelwoodhams, nicoladm, njoly, olgachern

iq-tree's Issues

[BUG] Character substitution in names imported from NEXUS

Substitution of unacceptable characters in sequence IDs never happens if names are imported from NEXUS (as opposed to IDs in a FASTA file).

I have a bunch of complex names, something like this:

Asterionellopsis_glacialis,_Strain_CCMP134|CAMPEP_0199907168
Phatr2|1719

They have commas and pipes and whatnot. IQ-TREE itself accepts them just fine by substituting all offending characters with underscores, but when I pass them through a NEXUS file as cluster members for likelihood mapping, it produces a lengthy log like this:

Cluster 2 "diatom" lists 341 sequences:
WARNING: sequence name "Asterionellopsis_glacialis"! Will be ignored.
WARNING: sequence name ","! Will be ignored.
WARNING: sequence name "_Strain_CCMP134|CAMPEP_0199907168"! Will be ignored.
WARNING: sequence name "Phatr2|1719"! Will be ignored.

And so on for the rest of the sequences. Of course, the analysis either doesn't happen or proceeds with incorrect clustering. Running something like `s/[,|-]/_/g` on the NEXUS file lets it proceed correctly. So the source is obvious: the NEXUS parser makes different assumptions about what a sequence ID may and may not be.
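The substitution above can also be sketched in Python, for instance to rewrite the names in the alignment and in the NEXUS cluster block with one and the same rule. This is only an illustration: the helper name `sanitize_name` is hypothetical, and the character set simply mirrors the sed expression (comma, pipe, hyphen); IQ-TREE's own list of unacceptable characters may differ.

```python
import re

# Characters replaced by the sed workaround above: comma, pipe and hyphen.
# Assumption: IQ-TREE's internal list of unacceptable characters may differ.
OFFENDING = re.compile(r"[,|-]")


def sanitize_name(name: str) -> str:
    """Replace every offending character in a sequence name with '_'."""
    return OFFENDING.sub("_", name)


names = [
    "Asterionellopsis_glacialis,_Strain_CCMP134|CAMPEP_0199907168",
    "Phatr2|1719",
]
sanitized = [sanitize_name(n) for n in names]
```

Applying the same function to both inputs keeps the two sets of names consistent, whatever the parser does afterwards.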

macOS linker cannot use --gc-sections even if the compiler is GCC

Currently the IQ-TREE build system passes --gc-sections to the linker on macOS if the compiler is GCC. This causes the build to fail.

Specifically the problem in CMakeLists.txt is here:

set(CMAKE_EXE_LINKER_FLAGS_RELEASE "${CMAKE_EXE_LINKER_FLAGS_RELEASE} -Wl,--gc-sections")

For Homebrew, we build IQ-TREE with OpenMP, so the build uses GCC, and I've worked around the build failure with the following string replacement:

    if OS.mac?
      inreplace "CMakeLists.txt",
        "${CMAKE_EXE_LINKER_FLAGS_RELEASE} -Wl,--gc-sections",
        "${CMAKE_EXE_LINKER_FLAGS_RELEASE}"
    end

See https://github.com/Homebrew/homebrew-science/pull/5377.

Constrained tree search

Allowing users to supply a constraint tree topology (-g option) to guide tree search, such that the resulting tree obeys the constraint. A naive implementation will be released with version 1.5.0.

iqtree -s alignment -g constraint_tree ...

TODOs (as of Sept 8, 2016):

  • Speed up the naive version, which can currently be slow for large trees. This mainly involves two steps: the initial tree construction and the NNI moves.

"Use -nt AUTO" even when I do use -nt AUTO

% iqtree -nt AUTO

<chooses 6 threads>

WARNING: Number of threads seems too high for short alignments. Use -nt AUTO to determine best number of threads.

Happens in 1.6 beta and 1.5

How to collapse branches?

  1. How can I collapse branches whose length is below a threshold into multifurcations?
    I want to discard nodes with extremely short branches, which I usually have to zoom in on many times to find.
  2. How can I collapse branches whose bootstrap value is below a threshold into multifurcations?
    Same as above; "-con" does not help because I want to edit the ".treefile" file that iqtree produced, which contains only one tree.

Other software can do this (such as Archaeopteryx) but is hard to use. That is why I am asking here how to do these things in iqtree.

Thanks a lot.

Ping Wu
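The two operations asked about can be sketched on an in-memory tree as follows. This is not an IQ-TREE feature, just a minimal illustration: the `Node` class and the `collapse` function are hypothetical, reading and writing Newick is left out, and adding the removed branch length to the child branches (so that root-to-tip distances are preserved) is one possible design choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: float = 0.0               # length of the branch above this node
    support: Optional[float] = None   # bootstrap support of that branch, if any
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node, min_length: float = 0.0, min_support: float = 0.0) -> Node:
    """Collapse internal branches that are too short or too weakly supported."""
    new_children = []
    for child in node.children:
        collapse(child, min_length, min_support)  # post-order: handle subtrees first
        weak = child.support is not None and child.support < min_support
        short = child.length < min_length
        if child.children and (weak or short):
            # Remove the internal branch: attach the grandchildren directly to
            # `node`, adding the removed length to keep root-to-tip distances.
            for grandchild in child.children:
                grandchild.length += child.length
                new_children.append(grandchild)
        else:
            new_children.append(child)  # leaves are always kept
    node.children = new_children
    return node


# ((A:0.1,B:0.2)95:0.00001,(C:0.1,D:0.1)40:0.05,E:0.3);
root = Node(children=[
    Node(length=0.00001, support=95, children=[Node("A", 0.1), Node("B", 0.2)]),
    Node(length=0.05, support=40, children=[Node("C", 0.1), Node("D", 0.1)]),
    Node("E", 0.3),
])
collapse(root, min_length=1e-4, min_support=50)
```

In this example the first clade is collapsed because its branch is shorter than `min_length`, and the second because its support is below `min_support`, leaving a five-way multifurcation at the root.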

Negative constraint tree

Iker Irisarri:

I was wondering whether it is possible to conduct a negative constrained tree search in IQTREE, similar to using the -g option. By negative constraint I mean the best ML tree that does not contain a given bipartition (e.g. best ML where A and B are not monophyletic). If not, do you plan to implement it in the future?

See discussion: https://groups.google.com/forum/#!topic/iqtree/8VZdfnKTwXk

Ancestral sequence reconstruction

Computing the ancestral sequences as well as the ancestral state probabilities of all internal nodes for a given tree. Although this has long been implemented in several programs, including Tal Pupko's very good FastML, they do not provide the variety of models available in IQ-TREE. Computing marginal state probabilities is straightforward and has already been implemented.

iqtree -s alignment -te user_tree -asr

TODOs (as of Sept 8, 2016):

(non-reversible) Lie Markov models

Implementation of the (non-reversible) Lie Markov models. Right now, it is in the liemarkov branch of the git repository. The code already works, e.g.:

iqtree -s example.phy -m LM5.6b
iqtree -s example.phy -m TEST -mset liemarkov #model testing

DONE items:

  • Sanity check with extensive simulations, comparison with Michael's Java code.
  • Integrate eigen decomposition technique (aware of complex eigenvalues) to speed up computation of P(t)=e^(Qt). The default mode is the scaling-squaring technique, which is slow. (see "Nineteen Dubious Ways to Compute the Exponential of a Matrix", http://epubs.siam.org/doi/abs/10.1137/S00361445024180).
  • Vectorize (SSE, AVX) the non-reversible likelihood kernel.
  • Optimize the root position given a fixed tree.
  • Work more transparently with model testing.
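To illustrate the difference between the two ways of computing P(t)=e^(Qt) mentioned in the list above, here is a small pure-Python sketch. It is not IQ-TREE code: the function names are hypothetical, and the eigendecomposition is shown only for a toy reversible 2-state matrix whose eigenvalues (0 and -2*rate) are known in closed form, so the complex eigenvalues that non-reversible Lie Markov models can produce do not appear here.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_scaling_squaring(q, t, squarings=10, terms=12):
    """Approximate exp(Q t): Taylor series on Q t / 2**s, then s squarings."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)  # undo the scaling: exp(A)^(2^s)
    return result


def expm_two_state_eigen(rate, t):
    """exp(Q t) for Q = [[-rate, rate], [rate, -rate]] from its eigenvalues."""
    decay = math.exp(-2.0 * rate * t)  # eigenvalue -2*rate; the other one is 0
    same, diff = 0.5 * (1.0 + decay), 0.5 * (1.0 - decay)
    return [[same, diff], [diff, same]]


Q = [[-1.0, 1.0], [1.0, -1.0]]
P_ss = expm_scaling_squaring(Q, t=0.3)
P_eig = expm_two_state_eigen(1.0, t=0.3)
```

Scaling and squaring needs several matrix products for every branch length, whereas with a stored eigendecomposition each new t only requires exponentiating the eigenvalues, which is where the speed-up comes from.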

TODOs (as of Mar 19, 2018):

  • Allow non-reversible models in the partition model.

IQ-TREE 1.6.1 crashes with -nt > 1 & -nt AUTO on diverse Ubuntu 16.04 LTS boxes

Hi Bui, congrats for the great work and the recognition by the SMBE!!!

1. I've been testing IQ-TREE 1.6.1 (pre-compiled 64-bit version for Linux) on my Ubuntu 16.04 desktop box and servers and found that on all of them the program crashes when -nt > 1 or -nt AUTO is used. This happens even with the simplest of commands:

$ iqtree -s regions.fst -st DNA -m HKY+ASC+G -nt AUTO -redo
IQ-TREE multicore version 1.6.1 for Linux 64-bit built Dec 26 2017
Developed by Bui Quang Minh, Nguyen Lam Tung, Olga Chernomor,
Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Host: Tenerife (SSE4.1, 7 GB RAM)
Command: iqtree -s regions.fst -st DNA -m HKY+ASC+G -nt AUTO -redo
Seed: 714051 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Wed Dec 27 19:21:21 2017
Kernel: SSE2 - auto-detect threads (4 CPU cores detected)

Reading alignment file regions.fst ... Fasta format detected
Alignment most likely contains DNA/RNA sequences
Alignment has 118 sequences with 1195 columns, 1097 distinct patterns
1037 parsimony-informative, 158 singleton sites, 0 constant sites
Gap/Ambiguity Composition p-value
1 1 0.00% passed 94.87%
...

NOTE: 111 is identical to 101 but kept for subsequent analysis
NOTE: 115 is identical to 113 but kept for subsequent analysis

Create initial parsimony tree by phylogenetic likelihood library (PLL)... 0.036 seconds
Ascertainment bias correction: 4 unobservable constant patterns

NOTE: 16 MB RAM (0 GB) is required!
Measuring multi-threading efficiency up to 4 CPU cores
Increase to 10 rounds for branch lengths
5 trees examined
Threads: 1 / Time: 4.043 sec / Speedup: 1.000 / Efficiency: 100% / LogL: -65164
*** Error in `iqtree': munmap_chunk(): invalid pointer: 0x00000000015c5110 ***
======= Backtrace: =========
[0x94c911]
[0x958fd2]
[0x5e0ba8]
[0x5a8d16]
[0x439f4c]
[0x445713]
[0x41a9d0]
[0x915f26]
[0x91611a]
[0x402b99]
======= Memory map: ========
00400000-00b57000 r-xp 00000000 08:07 1180657 /usr/local/biotools/bin/iqtree
00d56000-00d6e000 rw-p 00756000 08:07 1180657 /usr/local/biotools/bin/iqtree
00d6e000-00d86000 rw-p 00000000 00:00 0
01211000-02713000 rw-p 00000000 00:00 0 [heap]
7fd1b0000000-7fd1b0021000 rw-p 00000000 00:00 0
7fd1b0021000-7fd1b4000000 ---p 00000000 00:00 0
7fd1b7bff000-7fd1b7c00000 ---p 00000000 00:00 0
7fd1b7c00000-7fd1b8000000 rw-p 00000000 00:00 0
7fd1b8000000-7fd1b8021000 rw-p 00000000 00:00 0
7fd1b8021000-7fd1bc000000 ---p 00000000 00:00 0
7fd1bd2ef000-7fd1bd2f0000 rw-p 00000000 00:00 0
7fd1bd2f0000-7fd1bd2f1000 ---p 00000000 00:00 0
7fd1bd2f1000-7fd1bd6f1000 rw-p 00000000 00:00 0
7ffc2577e000-7ffc257a0000 rw-p 00000000 00:00 0 [stack]
7ffc257bf000-7ffc257c1000 r--p 00000000 00:00 0 [vvar]
7ffc257c1000-7ffc257c3000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: *** Log file: regions.fst.log
ERROR: *** Alignment files (if possible)
Aborted (core dumped)

2. with -nt 1 it runs smoothly

Host: Tenerife (SSE4.1, 7 GB RAM)
Command: iqtree -s regions.fst -st DNA -m HKY+G -nt 1 -redo
Seed: 505627 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Wed Dec 27 19:23:34 2017
Kernel: SSE2 - 1 threads (4 CPU cores detected)

HINT: Use -nt option to specify number of threads because your CPU has 4 cores!
HINT: -nt AUTO will automatically determine the best number of threads to use.

Reading alignment file regions.fst ... Fasta format detected
...

3. Also finishes without problems with iqtree-omp 1.5.6

IQ-TREE multicore version 1.5.6 for Linux 64-bit built Dec 4 2017
Copyright (c) 2011-2017 by Bui Quang Minh, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, and Arndt von Haeseler.

Host: Tenerife (SSE4.1, 7 GB RAM)
Command: /home/vinuesa/Software_downloads/iqtree-omp-1.5.6-Linux/bin/iqtree-omp -s regions.fst -st DNA -m HKY+G -nt AUTO -redo
Seed: 977243 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Wed Dec 27 19:28:02 2017
Kernel: SSE2 - auto-detect threads (4 CPU cores detected)

Reading alignment file regions.fst ... Fasta format detected
Alignment most likely contains DNA/RNA sequences
Alignment has 118 sequences with 1195 columns and 1097 patterns (1037 informative sites, 0 constant sites)
...
END

Would be great to get your feedback on this issue!
Keep up with the great work!
Cheers,
Pablo

PoMo IQ-TREEv.1.6

The files and line numbers refer to branch pomo_latest, commit 23aa0965
Amend gitignore. Change output, max branch length.

Enhancements and TODOs for PoMo.

  1. Testing. I changed quite a bit of core code in PoMo which has to be
    thoroughly tested. We will do this when we run simulations for the
    application note that we are planning.

  2. Output. The output needs to be improved.

     ./model/modelpomo.cpp:611:    // TODO DS: Output separation (rvbl and flux).
     ./model/modelpomo.cpp:837:    // TODO DS: Separation (rvbl and flux).
    
  3. Decomposition of the rate matrix. The function performing the
    eigendecomposition requests the rate matrix but also recomputes the rate
    matrix. This leaves room for speed improvements.
    EigenDecomposition::eigensystem_sym() expects a matrix[][] object with
    two indices. However, it is not used, because
    ModelPoMo::computeRateMatrix() is called anyways from within
    eigensystem_sym().

     ./model/modelpomo.cpp:940:        // TODO DS: This leaves room for speed improvements.
    
  4. ModelFinder. Implement model finder. I must admit I did not look into this
    much until now.

     ./main/phylotesting.cpp:1554:        // TODO DS: Implement model finder.
    
  5. Mixture models and gamma rate heterogeneity. Gamma rate heterogeneity is
    currently handled by a mixture model. Hence, mixture models and gamma rate
    heterogeneity cannot be combined at the moment, a restriction that could
    be removed.

Things to keep in mind.

  1. Maximum genetic distance. The distance measure of PoMo includes not only
    mutations (which can be compared to substitutions) but also frequency shifts.
    Hence, the branch lengths of PoMo are expected to be longer by a factor of
    N*N. This clashed with constants defined, e.g., in tools.h:

     ./utils/tools.h:436:/* TODO DS: For PoMo, this setting does not make sense.
     ./utils/tools.cpp:835:    // TODO DS: This seems inappropriate for PoMo.  It is handled in
    

    I had to manually recompute MAX_GENETIC_DIST wherever it is used, because
    branch lengths were limited during maximization of the likelihood. This
    should be working now, but it is important to keep in mind when changing
    code in these areas. A model-dependent maximum distance might make
    sense.

Further TODOs (minor priority).

  1. Do not tamper with Params. The problem is the following: when running
    IQ-TREE with PoMo, certain parameters need to be known already when reading
    in the alignment file (counts file at the moment). These are the virtual
    population size N, which affects the data structure, and the sampling method
    (sampled or weighted). At the moment, those flags are stored in ModelPoMo
    upon model creation and also in the Params class. It may be advantageous
    to refrain from using the Params class to store model parameters.

     ./alignment/alignment.cpp:2001:    // TODO DS: Do not temper with params;
    
  2. Verbosity. The output with increased verbosity is too long.

     ./utils/eigendecomposition.cpp:255:    // TODO DS: This seems a little too verbose for PoMo.
    
  3. No polymorphic data. At the moment, running PoMo on data without
    polymorphisms is not possible, or at least leads to undefined behavior.
    Either the level of polymorphism should be fixed, or a clear error message
    should be emitted.

     ./model/modelpomo.cpp:229:    // TODO DS: Check if this works.
     ./model/modelpomo.cpp:230:    // TODO DS: Move this where it belongs.
     ./model/modelpomo.cpp:234:        // TODO DS: Obsolete because mutation rates are scaled
    
  4. R and Phi matrices. IQ-TREE and PoMo are now aware of the symmetric mutation
    rate matrix R and the skew-symmetric mutation matrix Phi (refer to our
    most recent publication, Schrempf, Hobolth 2017). These matrices are
    unnecessarily dragged along during the maximization of the likelihood but
    really only need to be computed at the end when printing output.

     ./model/modelpomo.cpp:457:    // TODO DS: R and PHI are actually not needed during the
    
  5. Theta and sampling method sampled. The calculation of theta (Watterson's
    theta or level of polymorphism) is necessarily faulty, especially when N is
    low. I am not sure how to tackle this because the problem is of a
    statistical nature.

     ./model/modelpomo.cpp:791:        // TODO DS: This is wrong because Watterson's estimator is
    
  6. Support alignments with differences only. It may be good to support
    alignments that only contain sites with differences but omit sites that are
    the same in all species. I am not sure how many people will have data like that.

     ./model/modelfactory.cpp:215:            // TODO DS: This is an important feature, because then,
    

Numerical solution for codon models

Codon models have 61 states or so, so it is quite often the case that some codons are absent from the alignment, which makes the likelihood computation unstable. There are two solutions for this:

  1. Set the frequency of absent codons to a small number (e.g. 10^-4). This is the approach currently implemented in IQ-TREE.
  2. Remove absent codons from the Markov process. However, there is a concern that this disallows all substitutions from/to these absent codons, which might indeed have occurred in the past.

However, we observed numerical problems with approach 1, and a number of users reported them as well. We can switch to approach 2, which may avoid this problem. Here is a reply from @ziheng-yang about codon models in PAML:

this may be a problem for the F61 or similar models, with 60 frequency parameters, and less so for the F3x4 or similar models. if too many codons are missing in the F61 model, the states will not be commutative, which will cause problems. is this what you are talking about. if the chain is still commutative, perhaps there should be no problem.

my 2014 book has a section , 2.6, about the calculation of P(t), which also discusses how to deal with 0 frequencies. this is used in paml/codeml. i guess this sounds like your approach 2.

The above algorithm assumes that the frequency π_i > 0 for every i. If m states are missing
(with π_i = 0), the Markov chain has in effect c – m states. We can rearrange the Q matrix
so that the submatrix for the existing states Q0, of size (c – m) × (c – m), has the spectral
decomposition

but you should somehow discourage people from fitting big models to small datasets.

Thus, iqtree will switch to Approach 2 in a later 1.6.X release.
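Approach 2 can be sketched on a toy example as follows. This is only an illustration, not the planned IQ-TREE implementation: it uses 4 states instead of 61 codons, assumes a rate matrix of the form Q[i][j] = exchangeability[i][j] * frequency[j], and the function names are hypothetical.

```python
def build_rate_matrix(exch, freqs):
    """Q[i][j] = exch[i][j] * freqs[j] for i != j; diagonal makes rows sum to 0."""
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


def drop_absent_states(exch, freqs, counts):
    """Keep only the states observed in the alignment (approach 2).

    Returns the indices of the kept states and the reduced rate matrix Q0,
    built from the renormalised frequencies of the kept states.
    """
    keep = [i for i, c in enumerate(counts) if c > 0]
    total = sum(freqs[i] for i in keep)
    sub_freqs = [freqs[i] / total for i in keep]
    sub_exch = [[exch[i][j] for j in keep] for i in keep]
    return keep, build_rate_matrix(sub_exch, sub_freqs)


# Toy data: state 1 never occurs in the alignment.
counts = [10, 0, 5, 5]
freqs = [c / sum(counts) for c in counts]
exch = [[1.0] * 4 for _ in range(4)]  # equal exchangeabilities, for illustration
keep, Q0 = drop_absent_states(exch, freqs, counts)
```

The reduced matrix has size (c - m) x (c - m) and contains no zero frequencies, so its decomposition avoids the tiny pseudo-frequencies used in approach 1; the price is the concern noted above, that substitutions through the absent states are no longer possible.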

MF: wrong memory usage estimate?

Hi,
I was running iqtree for ModelFinder when it crashed with an "out of memory" error. I found this surprising because before starting to run the models, it states

NOTE: ModelFinder requires 653 MB RAM!

Is that the base memory needed to start running MF, or is it the memory it expects to use based on the input options?
This is the command I ran

iqtree-omp -st AA -s concat.archaea.209plus.2017-06-26.faa -m TESTONLY -mset LG,C60,PMSF -nt 4 -pre test_models

Below the complete MF log

NOTE: ModelFinder requires 653 MB RAM!
ModelFinder will test 24 protein models (sample size: 4200) ...
 No. Model         -LnL         df  AIC          AICc         BIC
  1  LG            1049188.283  499 2099374.566  2099509.431  2102539.643
  2  LG+I          1046542.281  500 2094084.563  2094220.005  2097255.983
  3  LG+G4         1003257.220  500 2007514.441  2007649.883  2010685.861
  4  LG+I+G4       1002863.271  501 2006728.542  2006864.562  2009906.304
  5  LG+F          1050273.488  518 2101582.975  2101729.045  2104868.566
  6  LG+F+I        1047711.775  519 2096461.551  2096608.224  2099753.484
  7  LG+F+G4       1003295.430  519 2007628.860  2007775.534  2010920.794
  8  LG+F+I+G4     1002926.773  520 2006893.546  2007040.826  2010191.823
Model C60 is alias for POISSON+G+FMIX{C60pi1:1:0.0169698865,C60pi2:1:0.0211683374,C60pi3:1:0.0276589079,C60pi4:1:0.0065675964,C60pi5:1:0.0141221416,C60pi6:1:0.0068774834,C60pi7:1:0.0146909701,C60pi8:1:0.0067225777,C60pi9:1:0.0018396660,C60pi10:1:0.0102547197,C60pi11:1:0.0230896163,C60pi12:1:0.0057941033,C60pi13:1:0.0125394534,C60pi14:1:0.0204526478,C60pi15:1:0.0070629602,C60pi16:1:0.0117982741,C60pi17:1:0.0068334668,C60pi18:1:0.0433775839,C60pi19:1:0.0318278731,C60pi20:1:0.0222546108,C60pi21:1:0.0102264969,C60pi22:1:0.0150545891,C60pi23:1:0.0134159878,C60pi24:1:0.0148552065,C60pi25:1:0.0239111516,C60pi26:1:0.0128776278,C60pi27:1:0.0222318842,C60pi28:1:0.0247444742,C60pi29:1:0.0214274810,C60pi30:1:0.0115001882,C60pi31:1:0.0076017389,C60pi32:1:0.0130258568,C60pi33:1:0.0093701965,C60pi34:1:0.0467194264,C60pi35:1:0.0441940314,C60pi36:1:0.0322263154,C60pi37:1:0.0402999891,C60pi38:1:0.0150234227,C60pi39:1:0.0104589903,C60pi40:1:0.0214742395,C60pi41:1:0.0154957836,C60pi42:1:0.0101789953,C60pi43:1:0.0227980379,C60pi44:1:0.0123204539,C60pi45:1:0.0066777583,C60pi46:1:0.0004150083,C60pi47:1:0.0344385130,C60pi48:1:0.0113663379,C60pi49:1:0.0127143049,C60pi50:1:0.0124323741,C60pi51:1:0.0262124415,C60pi52:1:0.0064994957,C60pi53:1:0.0103203293,C60pi54:1:0.0142463512,C60pi55:1:0.0215600067,C60pi56:1:0.0199150700,C60pi57:1:0.0038964200,C60pi58:1:0.0113448855,C60pi59:1:0.0128595846,C60pi60:1:0.0117656776}
STACK TRACE FOR DEBUGGING:
1   double* aligned_alloc<double>(unsigned long)
2   PhyloTree::initializeAllPartialLh(int&, int&, PhyloNode*, PhyloNode*)
3   PhyloTree::initializeAllPartialLh()
4   testModel(Params&, PhyloTree*, std::vector<ModelInfo, std::allocator<ModelInfo> >&, std::ostream&, ModelsBlock*, int, std::string, bool, std::string)
5   initializeParams(Params&, IQTree&, std::vector<ModelInfo, std::allocator<ModelInfo> >&, ModelsBlock*)
6   runTreeReconstruction(Params&, std::string&, IQTree&, std::vector<ModelInfo, std::allocator<ModelInfo> >&)
7   runPhyloAnalysis(Params&, Checkpoint*)
8   main()
9   __libc_start_main()
ERROR: Not enough memory, allocation of 40273689632 bytes failed (bad_alloc)

Thanks,
Xabi
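For scale, the failed allocation can be compared with the stated estimate. This is plain arithmetic on the two numbers quoted in the log above; whether the "MB" printed by IQ-TREE means 10^6 or 2^20 bytes is an assumption here (10^6 is used).

```python
requested_bytes = 40_273_689_632   # from the bad_alloc message in the log
estimate_mb = 653                  # from "ModelFinder requires 653 MB RAM!"

requested_gib = requested_bytes / 2**30   # size of the failed request in GiB
requested_mb = requested_bytes / 1e6      # same size in MB (assuming 10^6 bytes)
ratio = requested_mb / estimate_mb        # failed request relative to the estimate
```

The single failed request is roughly 37.5 GiB, about 60 times the 653 MB estimate, and it occurs right after the C60 mixture model is expanded.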

UFBoot2

Improving UFBoot implementation to e.g. address model violations.

Heterotachy model - fixing branch lengths

Option -blfix is not available for the heterotachy model. This means it is not possible to calculate the likelihood of the data on a specific tree. The error message received when trying to do this is:

ERROR: Fixing branch lengths is not supported under specified site rate model

iqtree-omp built with gcc/5.2.0 consistently crashes

iqtree-omp -s input.fa -spp parts.nex -m GTR+G -bo 1 -wbt -seed $RANDOM -pre boot_1 -nt 16

builds without errors, but consistently crashes at the same stage. Prebuilt binary segfaults immediately with 'FATAL: kernel too old' on RHEL 6 - 2.6.32-504.30.3.el6.x86_64 kernel.

===> START BOOTSTRAP REPLICATE NUMBER 1

Creating bootstrap alignment (seed: 14945)...

Create initial parsimony tree by phylogenetic likelihood library (PLL)... 13.961 seconds

NOTE: 11387 MB RAM is required!
STACK TRACE FOR DEBUGGING:
1 double* aligned_alloc(unsigned long)
2 PhyloSuperTreePlen::initializeAllPartialLh()
3 runTreeReconstruction(Params&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&, IQTree&, std::vector<ModelInfo, std::allocator >&)
4 runStandardBootstrap(Params&, std::cxx11::basic_string<char, std::char_traits, std::allocator >&, Alignment, IQTree)
5 runPhyloAnalysis(Params&, Checkpoint*)
6 main()
7 __libc_start_main()
ERROR: Not enough memory, allocation of 881796096 bytes failed (bad_alloc)

Converting data to a new state space

Allowing users to convert character states to a new state space. For example, the existing -st NT2AA option converts codon sequences into AA sequences. Something like -st NT2RY could convert DNA into RY (purine/pyrimidine) coding. A more general syntax might look like:

-st ABC:0,DE:1

to convert states A,B,C into state 0 and D,E into state 1.

One application is to reduce the amino-acid state space to Dayhoff groups (https://doi.org/10.1093/molbev/msm144):

-st AGPST:0,C:1,FWY:2,HRK:3,MILV:4,NDEQ:5
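How such a specification could be interpreted is sketched below. The syntax is only the proposal from this issue, not an existing IQ-TREE option; the function names are hypothetical, and keeping gaps as gaps while mapping any other unlisted character to `?` is an assumption.

```python
def parse_recoding(spec: str) -> dict:
    """Parse 'ABC:0,DE:1' into {'A': '0', 'B': '0', 'C': '0', 'D': '1', 'E': '1'}."""
    mapping = {}
    for group in spec.split(","):
        states, target = group.split(":")
        for state in states:
            if state in mapping:
                raise ValueError(f"state {state!r} is assigned twice")
            mapping[state] = target
    return mapping


def recode(sequence: str, mapping: dict, unknown: str = "?") -> str:
    """Recode a sequence; gaps stay gaps, unmapped characters become `unknown`."""
    return "".join(
        c if c == "-" else mapping.get(c.upper(), unknown) for c in sequence
    )


DAYHOFF6 = parse_recoding("AGPST:0,C:1,FWY:2,HRK:3,MILV:4,NDEQ:5")
recoded = recode("MKV-CWX", DAYHOFF6)
```

Rejecting a state that appears in two groups catches typos in the specification early, before any sequence is converted.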

Crash with MPI version

Reported by Remi Denise

https://groups.google.com/forum/#!topic/iqtree/ux9bvgnNBCk

I launched IQ-TREE 1.5.4 on my institute's cluster and it sometimes crashed with a segmentation fault during the optimization step.

I launched it with this command line:

iqtree-mpi -s FirmicutesSmall.aa.grp.aln -st AA -pre FirmicutesSmall.aa.grp -bb 1000 -wbtl -nm 6000 -m LG+R10

And here is the backtrace from the core file:

#0 0x00000000004e2d54 in errstreambuf::overflow(int) ()
#1 0x00007f8cdf8c048a in std::basic_streambuf<char, std::char_traits >::sputc (__c=10 '\n', this=) at /tmp/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/streambuf:434
#2 std::ostream::put (this=this@entry=0xa44c00 std::cerr, __c=) at /tmp/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/ostream.tcc:163
#3 0x00007f8cdf8c06bf in std::endl<char, std::char_traits > (__os=...) at /tmp/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ostream:565
#4 0x00000000004e3771 in funcAbort ()
#5
#6 0x00000000004965b1 in IQTree::saveUFBoot(Checkpoint*) ()
#7 0x000000000049c04b in IQTree::syncCurrentTree() ()
#8 0x00000000004a0ac4 in IQTree::doTreeSearch() ()
#9 0x0000000000513448 in runTreeReconstruction(Params&, std::string&, IQTree&, std::vector<ModelInfo, std::allocator >&) ()
#10 0x000000000051dfd8 in runPhyloAnalysis(Params&, Checkpoint*) (
#11 0x00000000004f3e0d in main ()

Support for polymorphic characters

This feature was requested by Steven.

There are two types of polymorphic coding in the nexus format. (0 1) indicates the character is observed as both states 0 AND 1, while {0 1} indicates the character is observed as either state 0 OR 1. Three or more states for a single character are also possible. I have attached a screenshot of how these are entered into the software Mesquite during character scoring as well as the "export > simplified nexus" file that is output by Mesquite.

MrBayes, PAUP and TNT all accept polymorphic coded characters but treat them as ambiguous/missing during analyses. RAxML requires that polymorphic characters are recoded as missing before loading the alignment. This recoding is no small task with morphological matrices that often contain several hundred or even thousands of characters. Also, there are then two matrices to keep track of: one with polymorphic scores and one that has been recoded.

If IQ-TREE could also accept polymorphic coding of morphological characters it would be a huge benefit! (which will add to the many other advantages IQ-TREE has over similar software) My initial thought is that when reading in a sequence file, if the character "(" is observed, the sequence would continue to be read until the character ")" is found and then "(...)" is simply replaced by a question mark. Also, perhaps a notice could be printed to the screen that says something like "polymorphic scoring of morphological data has been detected and these characters will be treated as ambiguous/missing". The same would be done with the "{" and "}" characters.

I hope this suggestion is helpful. You have certainly already made my research more flexible and efficient… thanks so much!
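The replacement described above can be sketched with a regular expression. This is only an illustration of the suggestion, not IQ-TREE's reader: the function name is hypothetical, and the pattern assumes that polymorphic groups are never nested.

```python
import re

# One polymorphic group: "(...)" or "{...}" without nested brackets (assumption).
POLYMORPHIC = re.compile(r"\([^()]*\)|\{[^{}]*\}")


def recode_polymorphic(row: str):
    """Replace each (...) or {...} group with '?'.

    Returns the recoded row and the number of groups that were replaced,
    so that a notice can be printed when the count is non-zero.
    """
    return POLYMORPHIC.subn("?", row)


recoded, n_replaced = recode_polymorphic("01(0 1)1{0 1 2}0?")
if n_replaced:
    print(f"NOTE: {n_replaced} polymorphic characters are treated as ambiguous/missing")
```

Because every group collapses to a single `?`, the recoded row has exactly one symbol per character column again.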

Consistent name mangling between tree reading and alignment reading

(This is low priority as there is an obvious workaround.)

I downloaded an alignment and tree from TreeBase. Annoyingly, some of
the taxon names contain spaces, and use quotes around them. IQtree reads
the alignment OK, doing some auto-substitutions to deal with space and
quote characters. The problem is then when it reads the tree file, it
doesn't do the same auto-substitutions, so it can't reconcile the
alignment taxa with the tree taxa.

./iqtree -s M39346.fasta -te T100530.treefile -redo -m 012345

Linking mixture and partition models

Allowing mixture models (or PMSF) to be combined with partitioned data analysis. One could imagine the weight parameters of the mixture being either linked across partitions or unlinked.

Message from Andrew Roger:

One thing that might be really useful is for people to be able to use partition models at the same time as these PMSF (or mixture) models. For instance they may wish to have different branchlengths and alpha shape parameters for different partitions, but at the same time use the mixture models (or PMSF) with the same weights for all the partitions (or partition-specific weights maybe).

It is relevant to recent debates between Nicolas Lartillot (phylobayes CAT model) and Ken Halanych over animal phylogeny. Halanych suggests it's more important to partition data than accommodate site-heterogeneity (i.e. through mixture models like CAT) for accurate phylogenetic inference. Nicolas Lartillot argues the opposite. I think Nicolas is mostly correct, that site-heterogeneity is usually more important to accommodate than partitioning… but ultimately having both partitions and the ability to have site-heterogeneity would lead to the most model ‘realism’ in my view.

Just to follow up on the rationale for a model that allows partitions AND mixture models (where the mixture models are ‘linked’ across partitions).

I think the issue of gene-specific ‘heterotachy’ (i.e. different genes having different branchlengths) can cause problems if ignored and branchlengths are linked across partitions. This is the general problem of heterotachy (see a paper we wrote on this in 2005). However, as I mentioned in my phyloseminar talk, I think the site-specific ‘constraints’ on evolution are probably an even more important issue in phylogenetics. Hence the need for the site profile mixture models and the PMSF models we’ve developed.

Ideally, however, it would be nice to be able to have both at the same time. In this case I really don’t think it is necessary for the partitions to have different ‘weights’ for different mixture classes (or even different gamma distribution shape parameters), so I think linking the mixture models so that the weights are the same for all sites in all partitions is fine. The main rationale for partitioning, in my view, is to allow for separate branchlengths for different partitions. In my view, for concatenated protein alignments, allowing for different exchangeabilities or different mixture weights per partition is not really important. I realize I haven’t given you a lot of literature references to back up my assertions; these are more based on my own intuition.

Improving PartitionFinder implementation

From discussions with @roblanf, several improvements made in PartitionFinder2 can help to speed things up substantially:

  1. -rcluster-max option: consider at most the top max(1000, 10*#partitions) pairs of partitions for merging.
  2. rclusterf algorithm: simultaneously merge the top 50% pairs of partitions instead of only the best pair at each step.
  3. the linked branch lengths option: first an ML tree is inferred from the superalignment assuming a single model. We then fix this tree and its branch lengths and rescale the partition rates according to this tree.
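The first two items can be sketched as follows. This is a rough illustration rather than PartitionFinder2's or IQ-TREE's code: the function names, the `similarity` callback and the `gain` values are hypothetical, and the rule that a partition takes part in at most one merge per step is an assumption.

```python
from itertools import combinations


def candidate_pairs(similarity, n_partitions):
    """Rank all partition pairs by similarity (higher = better candidate) and
    keep at most max(1000, 10 * n_partitions) of them, as with -rcluster-max."""
    pairs = sorted(combinations(range(n_partitions), 2),
                   key=lambda p: similarity(*p), reverse=True)
    limit = max(1000, 10 * n_partitions)
    return pairs[:limit]


def rclusterf_step(improving_pairs):
    """improving_pairs: [(gain, i, j), ...] for pairs whose merge improves the score.

    Merges are taken from the top 50% by gain instead of only the best pair.
    """
    ranked = sorted(improving_pairs, reverse=True)
    top_half = ranked[: max(1, len(ranked) // 2)] if ranked else []
    used, merges = set(), []
    for gain, i, j in top_half:
        if i not in used and j not in used:  # assumption: one merge per partition
            merges.append((i, j))
            used.update((i, j))
    return merges


# 50 partitions give 1225 pairs, of which only max(1000, 500) = 1000 are kept.
kept = candidate_pairs(lambda i, j: -abs(i - j), 50)
merges = rclusterf_step([(5.0, 0, 1), (4.0, 2, 3), (3.0, 1, 2), (1.0, 3, 4)])
```

Capping the candidate list bounds the number of model fits per step, and merging several pairs at once reduces the number of steps; both trade some thoroughness of the search for speed.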

Archived tarball changed when release redone

Hi,

I think the 1.6.1 release was redone. This causes packaging to fail due to hash and/or size mismatches. If there are no downsides, in such situations, could a new release such as 1.6.1.1 be made to prevent these problems?

=> Attempting to fetch https://codeload.github.com/Cibiv/IQ-TREE/tar.gz/v1.6.1?dummy=/Cibiv-IQ-TREE-v1.6.1_GH0.tar.gz
fetch: https://codeload.github.com/Cibiv/IQ-TREE/tar.gz/v1.6.1?dummy=/Cibiv-IQ-TREE-v1.6.1_GH0.tar.gz: size unknown
fetch: https://codeload.github.com/Cibiv/IQ-TREE/tar.gz/v1.6.1?dummy=/Cibiv-IQ-TREE-v1.6.1_GH0.tar.gz: size of remote file is not known
Cibiv-IQ-TREE-v1.6.1_GH0.tar.gz                          0  B    0  Bps
=> Fetched file size mismatch (expected 4481806, actual 4481835)
=> Couldn't fetch it - please try to retrieve this
=> port manually into /portdistfiles/ and try again.
*** Error code 1
Stop.
make: stopped in /usr/ports/biology/iqtree

Protein model mixed with PoMo

I am trying to launch an analysis using the beta4 with the following options:
iqtree -nt 16 -s Concat-Tc0410st-rp3.phy -m LG+C20+R4+FO -bb 1000

However, the run stops with the following error message: "ERROR: PoMo does not yet support free rate models (+R)." This implies that the protein model was mistaken for a PoMo model. Am I doing something wrong? Is there a way to fix that? Thanks!

Constrained tree search crashes with identical taxa

Hi,

I found an edge-case which makes IQTree crash. If a constraint tree is provided, but taxa in the tree are identical in the alignment, IQTree crashes unless you specify not to remove identical taxa. This could be caught and handled better. I assume the same would happen if you provided a user starting tree or fix topology tree.

Thanks,

Simon Harris

Frequent failure of optimization for free rates site model

When optimizing a +R model, the optimization reaches a hard limit of 99 iterations without having converged. For example:

./iqtree -s humanMito.fasta -te humanMito.tree -m K3P+FQ+R -pre temp -redo -v

(For this particular data set, this occurs on nearly all DNA models.)

Here is the tail end of the output. Note how the +R parameters are slowly migrating in one direction.
I've e-mailed the sample files to Minh.

  1. Current log-likelihood: -29902.7686567944
    Optimizing +R4 model parameters by 2-BFGS,EM algorithm...
    Rate parameters: A-C: 1.0000000000 A-G: 28.2498690342 A-T: 0.9694407528 C-G: 0.9694407528 C-T: 28.2498690342 G-T: 1.0000000000
    Base frequencies: A: 0.2500000000 C: 0.2500000000 G: 0.2500000000 T: 0.2500000000
    Site proportion and rates: (0.3777956238,0.0001001047) (0.3661763981,0.0001537014) (0.2515053973,1.1868637281) (0.0045225808,27.0489558811)
  2. Current log-likelihood: -29902.6767806495
    Optimizing +R4 model parameters by 2-BFGS,EM algorithm...
    Rate parameters: A-C: 1.0000000000 A-G: 28.2495458718 A-T: 0.9700523791 C-G: 0.9700523791 C-T: 28.2495458718 G-T: 1.0000000000
    Base frequencies: A: 0.2500000000 C: 0.2500000000 G: 0.2500000000 T: 0.2500000000
    Site proportion and rates: (0.3787780284,0.0001001047) (0.3671282716,0.0001322813) (0.2496048302,1.1952691030) (0.0044888698,27.1063560906)
  3. Current log-likelihood: -29902.5845485122
    Optimizing +R4 model parameters by 2-BFGS,EM algorithm...
    Rate parameters: A-C: 1.0000000000 A-G: 28.2489362374 A-T: 0.9694762182 C-G: 0.9694762182 C-T: 28.2489362374 G-T: 1.0000000000
    Base frequencies: A: 0.2500000000 C: 0.2500000000 G: 0.2500000000 T: 0.2500000000
    Site proportion and rates: (0.3797570600,0.0001001047) (0.3680770166,0.0001139548) (0.2477101348,1.2038509379) (0.0044557886,27.1340971259)
  4. Current log-likelihood: -29902.4926644145
    Optimal log-likelihood: -29902.4926644145
    Parameters optimization took 99 rounds (200.3325202390 sec)
    Best tree printed to temp.treefile
    BEST SCORE FOUND : -29902.4926644145
    Total tree length: 0.0546463944

Total number of iterations: 0
CPU time used for tree search: 0.0000070000 sec (0h:0m:0s)
Wall-clock time used for tree search: 0.0000067140 sec (0h:0m:0s)
Total CPU time used: 200.5198740000 sec (0h:3m:20s)
Total wall-clock time used: 200.4912709580 sec (0h:3m:20s)
Best tree printed to temp.treefile

Analysis results written to:
IQ-TREE report: temp.iqtree
Maximum-likelihood tree: temp.treefile
Screen log file: temp.log

Date and Time: Mon Aug 21 14:44:10 2017

humanMito.fasta.txt
humanMito.tree.txt

Selecting partition scheme is not progressing to tree construction

Hi,

I'm trying to construct a partitioned phylogeny using loci locations. I put the partitions into the file part.nex and use the following command to determine the best-fit model for each:

iqtree -nt AUTO -s alignment.fasta -spp part.nex -m TESTMERGE --rcluster-max 100

My dataset has 140 taxa and 2530 genes. I'm currently using IQ-TREE beta 1.6.4. To my understanding, once the models have been selected it should proceed to tree construction. I tried restarting the analysis, but it again ends after model selection.

Setting maximum number of CPUs with AUTO

Hi,

I am looking to run iqtree on a cluster and would like to be able to take advantage of -nt AUTO. However, by default, iqtree is detecting the total number of cores on the given compute node, which may or may not match the resource allocation requested. For example, if a node has 64 cores but my job only requested 32, I would like to be able to use AUTO while also specifying "no more than 32". Is this possible with the current version?

Thanks!
-Stephanie
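
One possible workaround, independent of whatever IQ-TREE itself supports, is to compute the thread count in a small wrapper and pass it as an explicit -nt value. The sketch below is only an illustration: the function name capped_thread_count is hypothetical, and the default environment variable SLURM_CPUS_PER_TASK is an assumption about the scheduler in use (other schedulers export different variables).

```python
import os


def capped_thread_count(cap=None, env_var="SLURM_CPUS_PER_TASK"):
    """Return a thread count that does not exceed the job's allocation.

    cap     -- optional user-supplied maximum ("no more than 32")
    env_var -- scheduler variable holding the allocated core count
               (assumed name; adjust for your cluster)
    """
    try:
        # Cores this process is actually allowed to run on (Linux only).
        visible = len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback for platforms without sched_getaffinity.
        visible = os.cpu_count() or 1

    limits = [visible]

    allocated = os.environ.get(env_var, "")
    if allocated.isdigit() and int(allocated) > 0:
        limits.append(int(allocated))

    if cap is not None:
        if cap < 1:
            raise ValueError("cap must be at least 1")
        limits.append(cap)

    return min(limits)


# Example: build a command line with an explicit thread count.
command = f"iqtree -nt {capped_thread_count(cap=32)} -s alignment.fasta"
```

Note that this passes a fixed number rather than AUTO, so it caps the thread count but does not reproduce AUTO's own choice of the best number of threads.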

Rare failure of optimization for time reversible models

cormorants.nex.txt
cormorants.tree.txt

I have several cases where optimization fails for a time reversible
model. In all cases, the model is 011110+FQ (plus some
rates-across-sites option.) Here is an example:

./iqtree -s cormorants.nex -te cormorants.tree -redo -pre temp -m
011110+FQ+I

[...]

  1. Current log-likelihood: -7012.165
    Optimal log-likelihood: -7012.165
    Rate parameters: A-C: 1.000 A-G: 89.073 A-T: 89.073 C-G: 89.073
    C-T: 89.073 G-T: 1.000
    Base frequencies: A: 0.250 C: 0.250 G: 0.250 T: 0.250
    Proportion of invariable sites: 0.616
    Parameters optimization took 99 rounds (14.088 sec)
    BEST SCORE FOUND : -7012.165

(Sorry, I tried to attach the input files for the example but couldn't make it work. I've e-mailed them to Minh, or you can request them from me.)

Note that this model is like K2P, except with a different pair of rates
being allowed to differ. (K2P would be 010010.)

The problem appears to be that the variable rate parameter leaps to near
100 on the very first iteration, and then only very slowly declines on
further iterations, until iqtree reaches a hard coded maximum of 99
iterations and gives up before converging on the optimum. I suspect the
fix is to look at that very first iteration and place some limit on how
large a leap it can make. I think it starts with rate A-C: 1.000, rate
A-G:1.000 ('-v' doesn't show this) and next step has A-G: 99.9191431513
(this is the first value shown by '-v'.) If you said "from 1 to 100 in
one go is too far, I'll never change by more than a factor of 3 in a
single step" I think the problem would be solved.
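
The step limit suggested above can be sketched as follows. This is a minimal illustration of the idea, not IQ-TREE's optimizer: the function name limit_step and the default factor of 3 are taken from the suggestion in this report, not from the code base.

```python
def limit_step(current, proposed, max_factor=3.0):
    """Clamp a proposed positive parameter value so that it differs from
    the current value by at most a factor of max_factor in either direction."""
    if current <= 0 or proposed <= 0:
        raise ValueError("rate parameters are assumed to be positive")
    if max_factor <= 1:
        raise ValueError("max_factor must be greater than 1")
    lower = current / max_factor
    upper = current * max_factor
    return min(max(proposed, lower), upper)


# The jump reported above: from 1.0 straight to about 99.9.
first_step = limit_step(1.0, 99.9191431513)
```

With such a limit, the first iteration would move the A-G rate from 1 to at most 3 instead of to roughly 100; whether this actually cures the slow convergence would still need to be tested.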

Having said all this, it is a pretty rare problem. I've been doing tests
with 72 time reversible models on about 15 data sets, and I've seen this
I think just three times, all on the 011110+FQ model.

By comparison:

./iqtree -s cormorants.nex -te cormorants.tree -redo -pre temp -m
'011110{20}+FQ+I' -optfromgiven

[...]

  1. Current log-likelihood: -6969.449
    Optimal log-likelihood: -6969.444
    Rate parameters: A-C: 1.000 A-G: 5.709 A-T: 5.709 C-G: 5.709 C-T:
    5.709 G-T: 1.000
    Base frequencies: A: 0.250 C: 0.250 G: 0.250 T: 0.250
    Proportion of invariable sites: 0.610
    Parameters optimization took 11 rounds (2.306 sec)
    BEST SCORE FOUND : -6969.444

This is also a Lie-Markov model, so the following is equivalent (but
uses a different parameterization of the model):

./iqtree -s cormorants.nex -te cormorants.tree -redo -pre temp -m MK2.2b+I

[...]

  1. Current log-likelihood: -6969.448
    Optimal log-likelihood: -6969.444
    Model parameters: -0.759
    Substitution rates:
    A-C: 0.241 A-G: 1.379 A-T: 1.379 C-A: 0.241 C-G: 1.379 C-T: 1.379
    G-A: 1.379 G-C: 1.379 G-T: 0.241 T-A: 1.379 T-C: 1.379 T-G: 0.241
    Base frequencies: A: 0.250 C: 0.250 G: 0.250 T: 0.250
    Proportion of invariable sites: 0.610
    Parameters optimization took 7 rounds (1.617 sec)
    BEST SCORE FOUND : -6969.444

Proposal: refactor to make StateFreqType into a class

StateFreqType is an enum. tools.cpp contains the following functions that apply directly to it:
StateFreqType parseStateFreqFromPlusF(string model_name)
StateFreqType parseStateFreqDigits(string digits)
bool freqsFromParams(double *freq_vec, double *params, StateFreqType freq_type)
void paramsFromFreqs(double *params, double *freq_vec, StateFreqType freq_type)
void forceFreqsConform(double *base_freq, StateFreqType freq_type)
int nFreqParams(StateFreqType freq_type)
void setBoundsForFreqType(double *lower_bound, double *upper_bound, bool *bound_check, double min_freq, StateFreqType freq_type)

In addition, I'm about to add some new code to correctly checkpoint some new StateFreqTypes.
I suggest that this is now too much functionality to hang off an enum, and it should be a class.

IQ-TREE does not compile on NetBSD

Hi,

I tried to compile the latest IQ-TREE on NetBSD, but it failed to link the iqtree binary:

[...]
[100%] Linking CXX executable iqtree
CMakeFiles/iqtree.dir/tools.cpp.o: In function `print_stacktrace(std::ostream&, unsigned int)':
/home/njoly/temp/IQ-TREE/tools.cpp:4054: undefined reference to `backtrace'
/home/njoly/temp/IQ-TREE/tools.cpp:4063: undefined reference to `backtrace_symbols'
--- iqtree ---
*** [iqtree] Error code 1

On some systems the backtrace functions are provided by libc, but on NetBSD they are located in their own library, "libexecinfo.so".

Adding a check for the "Backtrace" package in CMakeLists.txt fixes the problem.

find_package(Backtrace)
target_link_libraries(iqtree ${Backtrace_LIBRARY})

Thanks in advance.

Mapping site-specific rates into PDB structure

Given an alignment and an associated PDB structure of one sequence in the alignment, compute site-specific rates and map these onto the PDB structure file by altering the occupancy and bfactor fields.

This is useful to see how protein structure and rates are correlated. This can be then displayed by a 3D structure viewer (see demo here: http://www.iqtree.org/ModelFinder/).

This feature has currently low priority but can be changed depending on user demand.
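
A minimal sketch of the mapping step is shown below, assuming the site-specific rates have already been computed and matched to PDB residue numbers (that alignment-to-structure mapping is the harder part and is not shown). The function name map_rates_to_bfactor and the rate_by_residue dictionary are hypothetical. Only the B-factor field is rewritten here (columns 61-66 of ATOM/HETATM records in the fixed-column PDB format); the occupancy field (columns 55-60) could be handled the same way.

```python
def map_rates_to_bfactor(pdb_lines, rate_by_residue, chain_id=None):
    """Return PDB lines with the B-factor field replaced by per-residue rates.

    pdb_lines       -- iterable of lines from a PDB file
    rate_by_residue -- dict mapping residue sequence number -> rate
    chain_id        -- if given, only this chain is modified
    """
    updated = []
    for raw in pdb_lines:
        line = raw.rstrip("\n")
        newline = raw[len(line):]          # preserve the original line ending
        if line[:6] in ("ATOM  ", "HETATM"):
            padded = line.ljust(66)        # make sure the B-factor field exists
            chain = padded[21]
            try:
                resseq = int(padded[22:26])
            except ValueError:
                resseq = None              # malformed record: leave untouched
            if (resseq in rate_by_residue
                    and (chain_id is None or chain == chain_id)):
                # Values above 999.99 would overflow the 6-character field.
                bfactor = f"{rate_by_residue[resseq]:6.2f}"
                line = padded[:60] + bfactor + padded[66:]
        updated.append(line + newline)
    return updated
```

The resulting file can then be coloured by B-factor in a 3D structure viewer.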

version 1.5.0a :: cmake -DIQTREE_FLAGS="omp" leads to libmpi not found

Hello,

Compiling IQ-TREE-1.5.0a with -DIQTREE_FLAGS="omp" produces a binary that is linked against Open MPI libraries.

see

test -d /tmp/IQ-TREE || mkdir -m 2775 /tmp/IQ-TREE/
cd /tmp/IQ-TREE && source /local/gensoft2/adm/etc/profile.d/modules.sh && module purge && module load  eigen/3.1.2 gcc/4.9.0 cmake/2.8.12.2 && cmake -DIQTREE_FLAGS="omp" /local/gensoft2/src/IQ-TREE/IQ-TREE-1.5.0a 
IQ-TREE flags : omp
Builde mode   : Release
Target OS     : Unix
Compiler      : GNU Compiler (gcc)
Target binary : 64-bit
Parallel      : OpenMP/PThreads
Vectorization : SSE3
C flags    : -Wall -g  -pthread  -O3 -g
CXX flags  :  -fopenmp  -O3 -g
Using system zlib
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/IQ-TREE
cd /tmp/IQ-TREE && source /local/gensoft2/adm/etc/profile.d/modules.sh && module purge && module load  eigen/3.1.2 gcc/4.9.0 && make -j "$(grep -c ^processor /proc/cpuinfo)"
make[1]: Entering directory `/tmp/IQ-TREE'
make[2]: Entering directory `/tmp/IQ-TREE'
make[3]: Entering directory `/tmp/IQ-TREE'
make[3]: Entering directory `/tmp/IQ-TREE'
make[3]: Entering directory `/tmp/IQ-TREE'
make[3]: Entering directory `/tmp/IQ-TREE'
make[3]: Leaving directory `/tmp/IQ-TREE'
make[3]: Leaving directory `/tmp/IQ-TREE'
[  1%] Built target pllavx
make[3]: Leaving directory `/tmp/IQ-TREE'
make[3]: Entering directory `/tmp/IQ-TREE'
[  1%] make[3]: Leaving directory `/tmp/IQ-TREE'
Built target avxkernel
make[3]: Entering directory `/tmp/IQ-TREE'
make[3]: Leaving directory `/tmp/IQ-TREE'
[ 19%] [ 30%] Built target pll
Built target ncl
make[3]: Entering directory `/tmp/IQ-TREE'
make[3]: Leaving directory `/tmp/IQ-TREE'
make[3]: Entering directory `/tmp/IQ-TREE'
[ 31%] Built target lbfgsb
[ 36%] Built target whtest
make[3]: Entering directory `/tmp/IQ-TREE'
make[3]: Leaving directory `/tmp/IQ-TREE'
make[3]: Leaving directory `/tmp/IQ-TREE'
make[3]: Entering directory `/tmp/IQ-TREE'
[ 40%] [ 41%] Built target sprng
Built target vectorclass
make[3]: Leaving directory `/tmp/IQ-TREE'
[ 45%] Built target gsl
make[3]: Leaving directory `/tmp/IQ-TREE'
[ 59%] Built target model
make[3]: Entering directory `/tmp/IQ-TREE'
make[3]: Leaving directory `/tmp/IQ-TREE'
[100%] Built target iqtree
make[2]: Leaving directory `/tmp/IQ-TREE'
make[1]: Leaving directory `/tmp/IQ-TREE'

This generates the iqtree-omp binary as expected, but when running ldd on the produced binary one can see that it tries to link to some MPI libraries.

 ldd /tmp/IQ-TREE/iqtree-omp
        linux-vdso.so.1 =>  (0x00007ffdf43fe000)
        libz.so.1 => /lib64/libz.so.1 (0x0000003cb8e00000)
        libmpi_cxx.so.1 => not found
        libmpi.so.1 => not found
        libstdc++.so.6 => /local/gensoft2/exe/gcc/4.9.0/lib64/libstdc++.so.6 (0x00007fe7cebbf000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003cb8600000)
        libgomp.so.1 => /local/gensoft2/exe/gcc/4.9.0/lib64/libgomp.so.1 (0x00007fe7ce9a9000)
        libgcc_s.so.1 => /local/gensoft2/exe/gcc/4.9.0/lib64/libgcc_s.so.1 (0x00007fe7ce793000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003cb7e00000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003cb7a00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003cb7600000)
        librt.so.1 => /lib64/librt.so.1 (0x0000003cb8a00000)

regards

Eric

-optfromgiven option does not optimize site proportions in +R models

This is related to #38, as I found it when attempting to work around that problem.
The command:
$ ./iqtree -s humanMito.fasta -te humanMito.tree -m 'K3P+R{0.35,0.00001,0.35,0.0001,0.25,1.2,0.05,27}' -pre temp -redo -v -optfromgiven
should optimize the +R parameters, starting at the values specified on the command line. However, it optimizes the rates but not the proportions, in this case finishing at:

Rate parameters: A-C: 1.0000000000 A-G: 27.6510651518 A-T: 0.9658284715 C-G: 0.9658284715 C-T: 27.6510651518 G-T: 1.0000000000
Base frequencies: A: 0.2500000000 C: 0.2500000000 G: 0.2500000000 T: 0.2500000000
Site proportion and rates: (0.3500000000,0.0000060605) (0.3500000000,0.0000606046) (0.2500000000,0.7272557580) (0.0500000000,16.3632545544)
Optimal log-likelihood: -29971.6783480812

As I'm the person who added the -optfromgiven command line option, if I find the time I'll look into this one myself, but I'm logging it here so it doesn't get lost if I forget or don't get around to it.

Polymorphism-aware models

Polymorphism-aware models allow inferring species trees in the presence of incomplete lineage sorting (http://www.iqtree.org/doc/Polymorphism-Aware-Models/). Right now the code is in the PoMo branch of the git repository, and it will be merged into the master code in a near-future version 1.5.X.

TODOs (as of Sept 8, 2016):

  • Thorough testing, so that the code does not break other options.
  • More user-friendly, esp. with the input count files.
  • Integration of Gamma rate heterogeneity and other models.
  • Integration of the non-reversible PoMo models.

Implement transfer bootstrap expectation (TBE)

https://doi.org/10.1101/154542

Pers. comm. with Frederic Lemoine and Olivier Gascuel:

we have been working on a new way of computing branch supports, given a reference tree and a set of bootstrap trees.

The idea is to find, for each branch of the reference tree, the branch of each bootstrap tree that minimizes the taxa "transfer distance". The support of a ref branch will then be computed as 1-(the average distance over bootstrap trees)/(p-1), where p is the number of taxa on the light side of this ref branch.

If you are interested and if you want more information, a preprint is available here:
https://doi.org/10.1101/154542

The paper is otherwise under revision and should hopefully be out in the near future.

We were wondering if you think it could be interesting to integrate this new support computation in IQ-TREE?
As it is a different way of comparing reference branches to bootstrap trees, it is compatible with all methods that generate a set of bootstrap trees (classical bootstrap as well as Ultra-Fast bootstrap).

We have a C implementation of this computation here:
https://github.com/evolbioinfo/booster
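
The support computation described in the message above can be sketched as follows. This is an illustration under stated assumptions, not the booster implementation: each branch is represented as the set of taxa on one of its sides, each bootstrap tree is given as a list of such sets, and the function names are hypothetical. The transfer distance between two branches is taken as the smaller of the two ways of matching their sides.

```python
from typing import FrozenSet, List, Sequence

Bipartition = FrozenSet[str]  # taxa on one side of a branch


def transfer_distance(ref: Bipartition, other: Bipartition, n_taxa: int) -> int:
    """Number of taxa that must be moved to turn one bipartition into the other."""
    diff = len(ref ^ other)           # taxa placed differently for one orientation
    return min(diff, n_taxa - diff)   # the opposite orientation gives n_taxa - diff


def transfer_support(ref: Bipartition,
                     bootstrap_trees: Sequence[List[Bipartition]],
                     n_taxa: int) -> float:
    """1 - (average minimum transfer distance) / (p - 1)."""
    p = min(len(ref), n_taxa - len(ref))   # size of the light side
    if p < 2:
        raise ValueError("support is only defined for internal branches (p >= 2)")
    if not bootstrap_trees:
        raise ValueError("at least one bootstrap tree is required")
    total = 0
    for branches in bootstrap_trees:
        best = min(transfer_distance(ref, b, n_taxa) for b in branches)
        # A terminal branch always achieves p - 1, so cap at that value in
        # case the caller listed internal branches only.
        total += min(best, p - 1)
    return 1.0 - (total / len(bootstrap_trees)) / (p - 1)


# Six taxa; the reference branch separates {A, B, C} from {D, E, F}.
ref_branch = frozenset("ABC")
boot_trees = [
    [frozenset("ABC"), frozenset("AB")],   # contains the branch exactly
    [frozenset("AB"), frozenset("CD")],    # closest branch is one transfer away
]
support = transfer_support(ref_branch, boot_trees, n_taxa=6)
```

In this toy example the minimum distances are 0 and 1, so the support is 1 - 0.5/2 = 0.75, whereas the classical bootstrap proportion for the same branch would be 0.5.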

MPI parallelization

MPI parallelization allows running IQ-TREE on distributed (multi-node) computing systems. It can be used in conjunction with multicore (OpenMP) parallelization to fully utilize the system. The code is in the mpi branch of the git repository, and it will be merged into the master code in a near-future version 1.5.X. It has been tested with several data sets, showing near-linear speedups.

mpirun -np 4 iqtree-mpi -s alignment...
mpirun -np 4 iqtree-omp-mpi -nt 2 -s alignment ....  #4 processes, each with 2 cores

TODOs (as of Sept 8, 2016):

  • Thorough benchmark e.g. with all data sets used in the IQ-TREE paper.

Heterotachy model base frequency crash

On some rare occasions IQ-TREE crashes when running the heterotachy model. Somehow the base frequency vector seems to go to a boundary condition, whereby one base has frequency equal to 1. I suspect the reason is that the upper and lower bounds (1 and 0.0001) on base frequencies are such that it is possible for the sum of fA+fC+fG > 1 as in the example below. The frequency of T is constrained to 1 - (fA+fC+fG) and therefore we have fT < 0 which results in the crash. If this is the cause then a possible solution would be to reduce the upper bound on the base frequencies from 1 down to 0.9997.

taxa4_modJC_1kbp_class1Weight0.2_divQ3_fitHKY_H4.log

taxa4_modJC_1kbp_class1Weight0.2_divQ3.txt
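
The arithmetic behind this suspicion can be written out in a few lines. This is only an illustration of the constraint, not IQ-TREE code, and the function names are hypothetical. It shows how the frequency of T follows from the three free frequencies and where the value 0.9997 comes from (1 - 3 × 0.0001).

```python
def implied_last_frequency(free_freqs):
    """Frequency of the last state when the others are free parameters."""
    return 1.0 - sum(free_freqs)


def max_single_frequency(n_states, lower_bound):
    """Largest value one frequency can take while all others sit at the lower bound."""
    return 1.0 - (n_states - 1) * lower_bound


def is_feasible(free_freqs, lower_bound):
    """True if every frequency, including the implied one, respects the lower bound."""
    return (all(f >= lower_bound for f in free_freqs)
            and implied_last_frequency(free_freqs) >= lower_bound)


# Each value lies within [0.0001, 1], yet fT = 1 - (fA + fC + fG) is negative.
f_t = implied_last_frequency([0.5, 0.4, 0.2])
upper = max_single_frequency(4, 0.0001)   # approximately 0.9997
```

Per-parameter bounds alone do not stop several free frequencies from being large at the same time, so an explicit check along the lines of is_feasible may still be needed even with the reduced upper bound.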

Heterotachy model

The heterotachy model is a mixture model that allows a separate set of branch lengths per mixture class (i.e., a branch-unlinked mixture model), which generalizes the Gamma as well as the FreeRate models of rate heterogeneity across sites. It naturally accounts for heterotachous evolution (for more details see http://www.iqtree.org/doc/Heterotachy-Models/). The models will be integrated into version 1.6.0.

Right now the beta-testing implementation is available in the heterotachy branch of the git repository. A run can be invoked with, e.g.:

iqtree -s alignment -m GTR+FO+H3

where H stands for Heterotachy and H3 means 3 heterotachous mixture classes.

Items done (as of Nov 16 2016):

  • More user-friendly (e.g. NEWICK tree file without / so that it is readable by common tree viewers).
  • Better branch length optimization with multi-dimensional Newton-Raphson instead of the EM algorithm.
  • Allow it to work transparently with existing mixture models (e.g. the CAT model), so that users can specify something like -m C10+H (CAT profile mixture model, where +H will override the default Gamma model [+G]).
  • Allow a proportion of invariable sites with e.g. -m GTR+I+H3.

TODOs:

  • Thorough testing, e.g., to make sure that the code does not break other options.
  • Allow model selection to test for +H.

Label empirical state frequencies in partitioned data

This is a small cosmetic enhancement request.

My command line:
./iqtree -s peabacteria.nex -te peabacteria.tree -sp partition.nex -redo -pre temp -m 010010+FO -v

Contents of partition.nex:
#nexus
begin sets;
charset pos1 = 2-1100\3;
charset pos2 = 3-1100\3;
charset pos3 = 1-1100\3;
end;

(This alignment did not start on codon partition 1.)
In the output:
Empirical state frequencies: 0.2290836658 0.2372652251 0.4254410892 0.1082100200
Empirical state frequencies: 0.2776395303 0.2408105227 0.1847849282 0.2967650188
Empirical state frequencies: 0.0844789551 0.4743215279 0.2938955756 0.1473039413

Please could you label which partition each of these state frequency vectors belongs to? (As it turns out, they are in order (pos1, pos2, pos3) which makes sense, but later when model fitting it does things in order (pos3, pos1, pos2). )

Failure restarting from checkpoint (erroneous "does not start with opening bracket" message)

Hi there,

It looks as though the beta 1.6.4 release of IQ-tree is having some trouble restarting from its checkpoint:


| INITIALIZING CANDIDATE TREE SET |

CHECKPOINT: Candidate tree set restored, best LogL: -2991061.651
Finish initializing candidate tree set (18)
Current best tree score: -2991061.651 / CPU time: 0.000
Number of iterations: 124
Refining ufboot trees with NNI...
ERROR: Tree file does not start with an opening-bracket '(' (line 2 column 0)

This is after a crash due to hitting a memory limit set by LSF on our cluster, while running the CAT-Poisson model.

I'd appreciate any advice. The tree file definitely does begin with an opening bracket (although, indeed, not on line 2). I'd love to be able to restart and not have to recompute everything if there's a straightforward fix. If it's a real bug, I'm happy to provide more details to help debug!

Best,
Chris L

Mixed partition data and reading error

Hi All,

I'm currently trying to construct a tree with data from two files. I have run the files individually with no issue. My partition file is as follows:

begin sets;
charset part1 = file.aln: *;
charset part2 = file2.fa: *;
charpartition mine = model:part1, model:part2;
end;

The command is
iqtree -spp test.nex

I then get the following error:
Reading partition model file test.nex ...

Reading partition part1 (model=model, aln=file.aln, seq=DNA, pos= *) ...
Reading alignment file file.aln ... ERROR: Cannot read file file.aln

I have used this file in a single-data analysis (not partitioned) with no issue, so I don't know why this is happening.

Thanks

Implement ModelOMatic

Got consent from @simonwhelan to implement ModelOMatic in IQ-TREE.

ModelOMatic: Fast and Automated Model Selection between RY, Nucleotide, Amino Acid, and Codon Substitution Models.
Simon Whelan, James E. Allen, Benjamin P. Blackburne, David Talavera
Syst Biol (2015) 64 (1): 42-55. DOI: https://doi.org/10.1093/sysbio/syu062

https://github.com/simonwhelan/modelomatic

Further discussions:

@simonwhelan: I’m glad you like the method and I’m very happy for you to implement it into IQ-TREE. If you need any secondary verification of numbers for particular data sets let me know, but I think the standard output provides you with all the values you need.
I’m not sure whether IQ-TREE has RNA models, but our paper in GBE (https://academic.oup.com/gbe/article/6/1/65/664843/Assessing-the-State-of-Substitution-Models) does the same thing for the various different state-spaces in RNA models.

It should extend to the Dayhoff4 grouping easily, provided you have the original data, preferably as codons so you can do the whole Codon/AA/Dayhoff4 comparison.
The way I think about the state-space projection is from a generative perspective, so we have data generated from the aggregated process (e.g. Dayhoff4) out to the tips of the tree. We then need a projection from those aggregated states to our general space (e.g. codons). The likelihood that this model generated the original data is exactly what we’re calculating. So that means in the inference framework the two parts are nicely orthogonal since the normal pruning algorithm can be applied on the aggregated alignment for the first part of the likelihood function, whereas the projection provides the second part in terms of the probability of getting the original alignment from our aggregated alignment.

Fast ModelFinder

Currently testing +R is too slow. Some ideas to speed it up:

  • First perform standard model selection without +R. Then take only the best 2 matrices and all matrices with IC score of at most 10 points higher. Only these matrices will then be evaluated with +R.

  • Look at Smart model selection (SMS).

    • Protein: only +G and +I+G. The better of two will be further tested with/without +F. Matrices are ranked according to AA frequencies to +F and the best "decoration" (RAS+FREQ) will be tested with the 1st ranked matrix.
    • DNA: Only four models (GTR, TN93, HKY85 and K80) with four RAS options: none, +I, +G, +I+G. First, find the best RAS for GTR. Then compare GTR with TN93. If GTR is better, stop. Otherwise compare TN93 with HKY85 and stop if TN93 is better. Otherwise compare HKY85 with K80.
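
Two of the ideas above can be sketched as follows. This is a rough illustration, not ModelFinder or SMS code: the function names are hypothetical, lower IC scores are assumed to be better, "10 points higher" is interpreted as relative to the best score, ties are resolved in favour of the more complex model, and ic_score is an assumed callable that evaluates a matrix with the RAS option already chosen for GTR.

```python
def shortlist_for_free_rate(ic_scores, keep_best=2, window=10.0):
    """Pick the matrices that will additionally be evaluated with +R.

    ic_scores -- dict mapping matrix name -> IC score without +R
    Keeps the keep_best top-ranked matrices plus every matrix whose
    score is at most `window` points above the best score.
    """
    if not ic_scores:
        return []
    ranked = sorted(ic_scores, key=ic_scores.get)
    best_score = ic_scores[ranked[0]]
    return [name for rank, name in enumerate(ranked)
            if rank < keep_best or ic_scores[name] - best_score <= window]


def pick_dna_matrix(ic_score, chain=("GTR", "TN93", "HKY85", "K80")):
    """Walk from the most general to the simplest matrix, stopping as soon
    as the current matrix scores at least as well as the next simpler one."""
    current = chain[0]
    for simpler in chain[1:]:
        if ic_score(current) <= ic_score(simpler):
            return current
        current = simpler
    return current


# Toy scores, for illustration only.
protein_scores = {"LG": 100.0, "WAG": 104.0, "JTT": 109.5, "Dayhoff": 125.0}
candidates = shortlist_for_free_rate(protein_scores)

dna_scores = {"GTR": 105.0, "TN93": 100.0, "HKY85": 102.0, "K80": 90.0}
chosen = pick_dna_matrix(dna_scores.get)
```

In the toy DNA example the walk stops at TN93 and never evaluates K80, which illustrates the trade-off of the stepwise scheme: fewer likelihood evaluations, at the risk of missing a better model further down the chain.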

Implement jackknife

Jackknife = subsampling without replacement

With a new option like "-j 0.4" to jackknife 40% of the alignment sites.

Lower priority: taxon jackknife, since jackknife trees would not share the same taxon set, so consensus tree construction does not work.
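
A site jackknife of this kind can be sketched as follows. This is a minimal illustration, not IQ-TREE code: the function name is hypothetical, the alignment is assumed to be a dict of equal-length sequences, and the proportion is interpreted as the fraction of sites that are kept (the option could equally be defined as the fraction deleted).

```python
import random


def jackknife_sites(alignment, proportion, rng=None):
    """Subsample alignment columns without replacement.

    alignment  -- dict mapping taxon name -> sequence (all the same length)
    proportion -- fraction of sites to keep, 0 < proportion <= 1
    rng        -- optional random.Random instance for reproducibility
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    rng = rng or random.Random()
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    n_keep = max(1, round(proportion * n_sites))
    # sample() draws distinct indices, i.e. without replacement;
    # sorting keeps the retained sites in their original order.
    columns = sorted(rng.sample(range(n_sites), n_keep))
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


toy = {"index": "0123456789", "taxon1": "abcdefghij"}
subsample = jackknife_sites(toy, 0.4, rng=random.Random(1))
```

Unlike the bootstrap, no column appears more than once in the subsample, and every taxon is retained, so the resulting trees share one taxon set.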
