OrthoQuoll

About

This program evaluates the quality of one or more sets of orthologs that were inferred using a tree-based orthology inference program such as PhyloPyPruner. In addition to the usual metrics provided by the aforementioned orthology inference program, OrthoQuoll also calculates tree diameter statistics for all input orthologs.

Installation

OrthoQuoll has two dependencies: FastTree and MAFFT. The easiest way to install them is via Anaconda. If you don't have Anaconda installed already, follow the Anaconda installation instructions first. Once conda is available, run the following in your terminal:

$ conda install -c bioconda fasttree mafft

To run OrthoQuoll itself, clone this repository into your desired location, cd into the base directory (orthoquoll) and run the program like so:

$ ./orthoquoll.py

In some cases, you may have to add the permission to execute the program first:

$ chmod +x orthoquoll.py

Tutorial

The first thing you might want to do after you have installed the program is to print the help menu by issuing the following command:

$ ./orthoquoll.py --help
usage: orthoquoll.py [-h] [--id STRING] [--realign] [--subdirs] [--no-trees] [--no-header] [--threads COUNT] [--overwrite] [--output PATH] PATH [PATH ...]

Extract statistics from a directory of alignments.

positional arguments:
  PATH             path to a directory that contains multiple FASTA files or one or more FASTA files

optional arguments:
  -h, --help       show this help message and exit
  --id STRING      give this supermatrix a custom name (default: unknown)
  --realign        realign all alignments using MAFFT's LINSI algorithm
  --subdirs        search for files in subdirectories
  --no-trees       do not infer phylogenetic trees and do not report tree diameter statistics (much faster)
  --no-header      if provided, do not include a header in the CSV output
  --threads COUNT  number of threads used for running MAFFT and FastTree (default: all available)
  --overwrite      if provided, overwrite any existing file with the same output path
  --output PATH    path to the output file, pre-existing files are overwritten! (default: supermatrix_stats.csv)

The input to OrthoQuoll is a directory of FASTA files. The program expects each sequence header to separate the species name from the rest of the identifier using a delimiter (either | or @). For example, Drosophila@16S is a valid entry. The base directory includes an example directory called test_data. The test data is a small set of orthologs from the animal group Lophotrochozoa that were inferred using PhyloPyPruner.
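The header convention described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not OrthoQuoll's own parser: the function names split_header and otus_in_fasta are hypothetical, and the sketch assumes that the species name is everything before the first | or @.

```python
import re

# Delimiters named in the text above: "|" or "@".
DELIMITERS = re.compile(r"[|@]")


def split_header(header):
    """Split a FASTA header into (species, identifier) at the first delimiter."""
    header = header.lstrip(">").strip()
    parts = DELIMITERS.split(header, maxsplit=1)
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise ValueError(f"no '|' or '@' delimiter in header: {header!r}")
    return parts[0], parts[1]


def otus_in_fasta(lines):
    """Collect the set of species names from an iterable of FASTA lines."""
    return {split_header(line)[0] for line in lines if line.startswith(">")}


# Made-up records, used only to demonstrate the header format.
example = [">Drosophila@16S", "ACGT", ">Homo|16S", "ACGA"]
print(sorted(otus_in_fasta(example)))
```

A check like this can be useful for spotting headers that lack a delimiter before handing a directory to the program.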

In our example run, we will just utilize the base functionality of OrthoQuoll. Move into the project's base directory and type the following:

$ ./orthoquoll.py test_data
OrthoQuoll 0.1.0
author: Felix Thalen <[email protected]>

Generating trees using FastTree

Ortholog statistics:
  No. of alignments:                   10
  No. of sequences:                   819
  No. of OTUs:                         74
  Avg no. of sequences / alignment:    81
  Avg no. of OTUs / alignment:         56
  Avg sequence length (ungapped):     174
  Shortest sequence (ungapped):        52
  Longest sequence (ungapped):        232
  % missing data:                   26.60
  Concatenated alignment length:     1913
  Min tree diameter:                 1.21
  Max tree diameter:                 2.46
  Avg tree diameter:                 1.97
  Median tree diameter:              2.14

Wrote ortholog statistics to supermatrix_stats.csv

completed in 7.14 seconds

The statistics are also saved to the file supermatrix_stats.csv. You can change the name of the output file with the --output flag. Results are appended to this file unless the --overwrite flag has been set. Use --id to give your run a different name and --no-header to skip the header line in the output file. Here is an example of what the supermatrix_stats.csv output file looks like:

id;alignments;sequences;otus;meanSequences;meanOtus;meanSeqLen;shortestSeq;longestSeq;pctMissingData;catAlignmentLen;minTreeDiameter;maxTreeDiameter;meanTreeDiameter;medianTreeDiameter
lophotrochozoa;10;819;74;81;56;174;52;232;26.6;1913;1.21;2.46;1.97;2.14

This file is most easily viewed using column -t -s\; supermatrix_stats.csv or by opening it in a spreadsheet program such as Excel or LibreOffice Calc.
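If you would rather process the statistics in a script, the file can be read with Python's standard csv module. The sketch below assumes the semicolon-delimited layout shown in the example above and that the header line is present (that is, --no-header was not used). The helper name read_stats is hypothetical, and the sample text is copied from the example output so that the snippet runs without any files.

```python
import csv
import io

# Sample content copied from the example output above.
SAMPLE = (
    "id;alignments;sequences;otus;meanSequences;meanOtus;meanSeqLen;"
    "shortestSeq;longestSeq;pctMissingData;catAlignmentLen;"
    "minTreeDiameter;maxTreeDiameter;meanTreeDiameter;medianTreeDiameter\n"
    "lophotrochozoa;10;819;74;81;56;174;52;232;26.6;1913;1.21;2.46;1.97;2.14\n"
)


def read_stats(handle):
    """Return one dict per run, keyed by the column names in the header."""
    return list(csv.DictReader(handle, delimiter=";"))


rows = read_stats(io.StringIO(SAMPLE))
for row in rows:
    # All values are read as strings; convert the ones you need.
    print(row["id"], int(row["alignments"]), float(row["pctMissingData"]))
```

To read a real file, pass an open file handle instead, for example open("supermatrix_stats.csv", newline="").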

© Animal Evolution and Biodiversity 2019
