
sowhat's Issues

Possibly include stopping criterion

The stopping criterion would be met once a minimum number of samples (say 100) has been reached and both the p-value and its entire confidence interval fall on the same side of the significance level.
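A minimal Python sketch of this criterion follows. It assumes that the p-value is the fraction of simulated datasets whose test statistic is at least as large as the observed one, and that the confidence interval is a 95% Wilson score interval. Both are assumptions, not necessarily how sowhat computes them, and the function names are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def should_stop(n_extreme: int, n_samples: int,
                alpha: float = 0.05, min_samples: int = 100) -> bool:
    """Hypothetical stopping rule.

    n_extreme: simulated datasets with a test statistic >= the observed one
    n_samples: simulated datasets analysed so far
    """
    if n_samples < min_samples:
        return False  # never stop before the minimum sample size
    p_value = n_extreme / n_samples
    low, high = wilson_interval(n_extreme, n_samples)
    # Stop only if the p-value and its whole CI lie on the same side of alpha.
    below = p_value < alpha and high < alpha
    above = p_value > alpha and low > alpha
    return below or above
```

For example, with 200 samples and 12 extreme values (p = 0.06) the interval still straddles 0.05, so the run would continue. The observed proportion always lies inside the Wilson interval, so checking the interval bounds alone would suffice. The explicit p-value comparison only mirrors the wording of the criterion.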

Improve Partition Feature Functionality

Only simple partitioning schemes are currently supported. The feature should be extended to allow partitioning schemes that cover multiple non-contiguous regions of an alignment, that are listed out of order, or that divide sites by codon position.
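One way to support such schemes is to expand every partition into an explicit set of alignment sites, then check the sets for overlaps and gaps. The Python sketch below assumes RAxML-style partition lines. In that format, comma-separated ranges give non-contiguous regions and a \3 suffix selects every third site (codon positions). Whether sowhat should accept exactly this syntax is an open design question, and the function names are hypothetical.

```python
import re

# One range such as "1-300", "42", or "1-900\3" (every third site).
RANGE = re.compile(r"^(\d+)(?:-(\d+))?(?:\\(\d+))?$")


def parse_partition_line(line: str) -> tuple[str, str, list[int]]:
    """Parse 'DNA, gene1 = 1-100, 201-300' into (model, name, sorted sites)."""
    left, right = line.split("=", 1)
    model, name = (part.strip() for part in left.split(",", 1))
    sites: set[int] = set()
    for chunk in right.split(","):
        match = RANGE.match(chunk.strip())
        if match is None:
            raise ValueError(f"cannot parse range {chunk.strip()!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        step = int(match.group(3) or 1)
        sites.update(range(start, end + 1, step))
    return model, name, sorted(sites)


def check_scheme(lines: list[str], n_sites: int) -> dict[int, str]:
    """Map each site to its partition; reject overlaps and uncovered sites."""
    owner: dict[int, str] = {}
    for line in lines:
        _, name, sites = parse_partition_line(line)
        for site in sites:
            if site in owner:
                raise ValueError(f"site {site} is in both {owner[site]} and {name}")
            owner[site] = name
    missing = sorted(set(range(1, n_sites + 1)) - set(owner))
    if missing:
        raise ValueError(f"sites not assigned to any partition: {missing[:10]}")
    return owner
```

Because each partition becomes a plain set of site numbers, the order in which ranges or partitions are listed no longer matters.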

sowhat should throw an error if reps < 10

"The problem with these arguments is that you have set the maximum number of reps to 2 (--reps=2) - this means that sowhat will generate only two simulated datasets and this is not enough to calculate any statistics, such as a p-value. sowhat won't start printing a sowhat.results.txt file until after the 10th bootstrap replicate to avoid choking on any mathematical errors near the start of the analysis. We strongly recommend each analysis should use 100+ bootstrap replicates, and that the number of bootstraps (the sample size) should be justified by reporting the confidence interval surrounding the p-value"
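A minimal sketch of the requested check follows. It uses Python's argparse for illustration only, since sowhat's own option handling will differ. It rejects values below 10, the point before which, per the quoted explanation, no results file is written. The --reps name follows the issue text; the constants and the function name are hypothetical.

```python
import argparse

MIN_REPS = 10           # no results are written before the 10th replicate
RECOMMENDED_REPS = 100  # the quoted advice recommends 100+ replicates


def reps_type(value: str) -> int:
    """argparse type: reject --reps values too small to compute a p-value."""
    reps = int(value)
    if reps < MIN_REPS:
        raise argparse.ArgumentTypeError(
            f"--reps={reps} is too small; at least {MIN_REPS} are required "
            f"and {RECOMMENDED_REPS}+ are recommended"
        )
    return reps


parser = argparse.ArgumentParser(prog="sowhat")
parser.add_argument("--reps", type=reps_type)
```

With this in place, parser.parse_args(["--reps=2"]) exits with the error message above instead of starting an analysis that can never report a p-value.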

Add more version information to sowhat.results

Currently only the version of RAxML is recorded. It would be helpful to have the following versions recorded in the results:

  1. sowhat version
  2. seq-gen version
  3. PhyloBayes version (if used)
  4. GARLI version (if used)

treetwo is better

The message "Constraint_X is more likely than Y, X will be used as the constraint tree instead" is only printed to STDERR. This information should also go in the results file. More generally, the results file should state more clearly which tree is which.

Description of sample datasets

A description of the sample datasets needs to be added to README.md. It should explain the origins of the datasets and list some basic attributes (number of taxa, number of genes, number of sites). It should also indicate which files pertain to which dataset.

Failing Travis CI due to old R

The Travis CI test is failing: https://travis-ci.org/josephryan/sowhat/builds/221453597

The problem seems to be:

  • The test image is based on Ubuntu 14.04.5 LTS
  • sudo apt-get install -y r-base installs R 3.0.2
  • The ape package is unavailable for this older R, which triggers the warning "package 'ape' is not available (for R version 3.0.2)"
  • This results in the error "library(ape) : there is no package called 'ape'"

We should figure out how to install a more recent version of R on this container or use a different test image that allows installation of newer R.

Add true support for multistate

This will require running seq-gen in amino-acid mode and supporting up to 20 states. Multistate analyses currently fail with: "expecting 2 frequencies. Multi-State only works w/binary matrix".

instructions for monitoring a job and cutting it short

SOWH tests can take a long time on large datasets. Here are some ideas for monitoring a job and for cutting it short. They should be tested and considered for inclusion in the documentation:

  1. Monitoring a job
    The following command, run from within the directory specified with the --dir option, can be used to monitor a job (this only works if reps = 1, the default):

    ls -1 sowhat_scratch/ | cut -f 4 -d '.' | sort -n | tail -n1
  2. Cutting a job short
    If the job is still running, make a copy of the directory specified with the --dir option, including its contents:

    cp -R myoutputdir rerunoutputdir

    Then run exactly the same command as before, using this new directory as the --dir option and adding:

    --reps=SMALLER_NUMBER_OF_REPS --restart
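The monitoring one-liner takes the fourth dot-separated field of each file name in sowhat_scratch/ and prints the largest numeric value. A Python equivalent that can also poll periodically is sketched below. Like the one-liner, it assumes that this field holds the replicate number. Unlike cut and sort, it simply ignores names without a numeric fourth field. The function names are hypothetical.

```python
import os
import time
from typing import Optional


def latest_replicate_from_names(names: list[str]) -> Optional[int]:
    """Largest integer found in the 4th '.'-separated field of the names."""
    numbers = []
    for name in names:
        fields = name.split(".")
        if len(fields) >= 4 and fields[3].isdigit():
            numbers.append(int(fields[3]))
    return max(numbers, default=None)


def latest_replicate(scratch_dir: str = "sowhat_scratch") -> Optional[int]:
    """Apply the same extraction to the files currently in scratch_dir."""
    return latest_replicate_from_names(os.listdir(scratch_dir))


def watch(scratch_dir: str = "sowhat_scratch", interval_s: float = 60.0) -> None:
    """Print progress every interval_s seconds until interrupted (Ctrl-C)."""
    while True:
        print(time.strftime("%H:%M:%S"), latest_replicate(scratch_dir), flush=True)
        time.sleep(interval_s)
```

Calling watch() from the --dir directory prints a timestamped replicate count once a minute.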

JSON output

At present the output is plain text. Because the text format may change in unspecified ways from version to version, the output is hard to parse by machine, and it is therefore difficult to wrap sowhat into larger workflows.

We should generate structured output as a JSON file. This file would then be read and formatted as text output similar to what we have now. It could also be parsed into easy-to-read HTML output.
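A minimal Python sketch of this design follows. The results are stored as nested JSON, and the text report is generated from the JSON rather than written directly. All field names and values below are placeholders, not sowhat's actual schema or output.

```python
import json


def render_text(data: dict, indent: int = 0) -> list[str]:
    """Format parsed JSON results as indented 'key: value' lines."""
    lines = []
    pad = "  " * indent
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render_text(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Placeholder structure; every key and value here is hypothetical.
results = {
    "versions": {"sowhat": "X.Y.Z", "raxml": "X.Y.Z", "seq-gen": "X.Y.Z"},
    "trees": {"constraint": "Constraint_X", "unconstrained": "ml_tree"},
    "test": {"sample_size": 100, "p_value": 0.01, "p_value_ci95": [0.0, 0.05]},
}

json_text = json.dumps(results, indent=2)  # what would be written to disk
report = "\n".join(render_text(json.loads(json_text)))
print(report)
```

Because the text report is derived from the JSON, downstream tools can depend on the JSON keys while the human-readable layout stays free to change. An HTML renderer could walk the same structure.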

use IQ-TREE for ML searches

IQ-TREE may offer advantages over RAxML and GARLI. To use it, sowhat will need to read the required values (base frequencies, transition rates, likelihood) from IQ-TREE's results files.
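A minimal Python sketch of the parsing step follows. It assumes the values are taken from IQ-TREE's .iqtree report, and that for a DNA model this report contains lines such as "Log-likelihood of the tree: ...", "pi(A) = ..." and "A-C: ...". These patterns are assumptions and should be checked against real IQ-TREE output, and its version, before relying on them. The function name is hypothetical.

```python
import re


def parse_iqtree_report(text: str) -> dict:
    """Extract log-likelihood, base frequencies and rates from .iqtree text.

    The line formats matched here are assumptions about the report layout.
    """
    lnl = re.search(r"Log-likelihood of the tree:\s*(-?\d+(?:\.\d+)?)", text)
    if lnl is None:
        raise ValueError("no tree log-likelihood found in report")
    freqs = {base: float(val) for base, val in
             re.findall(r"pi\(([ACGT])\)\s*=\s*([0-9.]+)", text)}
    rates = {f"{a}-{b}": float(val) for a, b, val in
             re.findall(r"^\s*([ACGT])-([ACGT]):\s*([0-9.]+)", text, re.MULTILINE)}
    return {"log_likelihood": float(lnl.group(1)),
            "base_frequencies": freqs,
            "rates": rates}
```

Returning a plain dictionary keeps the IQ-TREE-specific parsing separate from the rest of sowhat, so RAxML, GARLI and IQ-TREE readers could share one downstream code path.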
