The stopping criterion would be met once a minimum number of samples (say 100) has been drawn and the p-value and its confidence interval fall entirely on the same side of the significance level.
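A minimal sketch of such a stopping rule, in Python for illustration only. It assumes the p-value is the proportion of simulated test statistics at least as extreme as the observed one, and it uses a Wilson score interval as the confidence interval. Both are assumptions, not necessarily how SOWHAT computes them, and the function names `wilson_interval` and `should_stop` are hypothetical.

```python
import math


def wilson_interval(n_extreme, n_samples, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, 95% by default)."""
    if n_samples == 0:
        return (0.0, 1.0)
    p_hat = n_extreme / n_samples
    denom = 1 + z ** 2 / n_samples
    centre = (p_hat + z ** 2 / (2 * n_samples)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / n_samples + z ** 2 / (4 * n_samples ** 2)
    ) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def should_stop(n_extreme, n_samples, alpha=0.05, min_samples=100):
    """Stop once enough samples exist and the whole interval is on one side of alpha."""
    if n_samples < min_samples:
        return False
    lower, upper = wilson_interval(n_extreme, n_samples)
    # The p-value estimate lies inside the interval, so checking the
    # interval bounds covers the p-value as well.
    return upper < alpha or lower > alpha
```

With these assumptions, an analysis whose interval still straddles the significance level keeps sampling, while one that is clearly significant or clearly non-significant stops as soon as the minimum sample size is reached.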
Only simple partitioning schemes are currently allowed.
Improve this feature to allow partitioning schemes that cover multiple non-contiguous regions of an alignment, that are listed out of order, or that divide by codon position.
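As a rough illustration of what the more flexible parsing might look like, here is a Python sketch that expands one partition line into an explicit list of sites. It assumes RAxML-style partition lines of the form `DNA, name = 1-300, 601-900` with an optional `\3` stride for codon positions. The function name `parse_partition_line` is hypothetical and this is not how SOWHAT currently reads partition files.

```python
def parse_partition_line(line):
    """Expand one RAxML-style partition line into (model, name, sorted site list)."""
    model_and_name, ranges = line.split("=", 1)
    model, name = [part.strip() for part in model_and_name.split(",", 1)]
    sites = []
    for chunk in ranges.split(","):
        chunk = chunk.strip()
        step = 1
        if "\\" in chunk:
            # A trailing "\3" means "every third site", i.e. one codon position.
            chunk, step_text = chunk.split("\\")
            step = int(step_text)
        if "-" in chunk:
            start, end = (int(value) for value in chunk.split("-"))
        else:
            start = end = int(chunk)
        sites.extend(range(start, end + 1, step))
    # Sorting makes the result independent of the order the regions were listed in.
    return model, name, sorted(sites)
```

Working from an explicit site list, rather than a single start and end, is one way to handle non-contiguous, out-of-order, and codon-position partitions with the same code path.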
"The problem with these arguments is that you have set the maximum number of reps to 2 (--reps=2) - this means that sowhat will generate only two simulated datasets and this is not enough to calculate any statistics, such as a p-value. sowhat won't start printing a sowhat.results.txt file until after the 10th bootstrap replicate to avoid choking on any mathematical errors near the start of the analysis. We strongly recommend each analysis should use 100+ bootstrap replicates, and that the number of bootstraps (the sample size) should be justified by reporting the confidence interval surrounding the p-value"
My fix was to remove the command-line switch from the system call to R in 'Statistics/R/Bridge/Linux.pm'. We need to at least warn users. Probably should write the author(s) of Statistics::R. Not sure if this affects other versions.
This message "Constraint_X is more likely than Y, X will be used as the constraint tree instead" is only printed to STDERR. This information should also go in the results file. In general there should be more details about which tree is which in the results file.
Need to add a description of sample datasets to README.md. These should explain the origins of the datasets, as well as list some basic attributes (number of taxa, number of genes, number of sites). There should also be an indication of which files pertain to which datasets.
Will require running seq-gen in aminoacid mode and up to 20 substitutions. multistate currently fails with: "expecting 2 frequencies. Multi-State only works w/binary matrix".
We should redo some of the problematic searches with the addition of the --no-bfgs raxml option. If that fixes the problem for those matrices, we should rerun the tests in the manuscript that were impacted by this. If they work, we should revise the manuscript to reflect the fix.
SOWH tests can take a long time on large datasets. Here are some ideas on how to monitor a job and how to cut a job short. These should be considered (and tested) for being added to the documentation:
Monitoring a job
The following command can be used to monitor a job (only if reps = 1, the default) if run from within the directory that was specified with the --dir option:
At present the output is simple text. This makes it difficult to parse by machine, since the text format may change in unspecified ways from version to version, and therefore difficult to wrap SOWHAT into larger workflows.
We should generate structured output as a JSON file. This should then be read and formatted as text output, similar to what we have now. It could also be parsed into an easy-to-read HTML output.
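A small Python sketch of the idea: the results are serialized to JSON first, and the human-readable text is rendered from the parsed JSON rather than written directly. The field names (`p_value`, `sample_size`, and so on) and the function names are hypothetical, and the values in the usage example are placeholders, not real results.

```python
import json


def results_to_json(results):
    """Serialize a results dictionary to a stable, machine-readable JSON string."""
    return json.dumps(results, indent=2, sort_keys=True)


def render_text_report(json_text):
    """Build the plain-text report from the JSON, so JSON stays the single source."""
    results = json.loads(json_text)
    lines = ["SOWH test results"]
    for key in sorted(results):
        lines.append(f"{key}: {results[key]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {"p_value": 0.01, "sample_size": 100, "constraint_tree": "Constraint_X"}
    print(render_text_report(results_to_json(example)))
```

Because the text report is derived from the JSON, wrappers can rely on the JSON keys even if the text layout changes between versions, and an HTML renderer could reuse the same input.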
IQtree may offer advantages over RAxML and GARLi. SOWHAT will need to read the required output from the results files (base freqs, transition rates, likelihood).