Bio::Phylo::CIPRES

Phylogenomic analysis on the CIPRES REST portal

Prerequisites

Using CIPRES requires a DEVELOPER account (not a normal user account) for the CIPRES REST API (CRA) and a registration for the app corvid19_phylogeny. With the account and app key, you can populate a YAML file cipres_appinfo.yml as follows, replacing the placeholders in angle brackets with the appropriate values:

---
URL: https://cipresrest.sdsc.edu/cipresrest/v1
KEY: <app key>
CRA_USER: <user>
PASSWORD: <pass>
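
Before submitting a job, it can help to confirm that no placeholder is left in the file. The following is a minimal sketch, assuming the flat four-key layout shown above; check_appinfo is a hypothetical helper and not part of Bio::Phylo::CIPRES:

```shell
#!/bin/sh
# Hypothetical helper: verify that a cipres_appinfo.yml-style file defines
# all four keys and that no <placeholder> value remains.
check_appinfo() {
    file=$1
    for key in URL KEY CRA_USER PASSWORD; do
        # Require "KEY: value" with a non-empty value
        if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$file"; then
            echo "missing or empty field: $key" >&2
            return 1
        fi
        # Reject values that are still wrapped in angle brackets
        if grep -Eq "^${key}:[[:space:]]*<.*>[[:space:]]*$" "$file"; then
            echo "placeholder not replaced: $key" >&2
            return 1
        fi
    done
    echo "ok"
}
```

Calling check_appinfo cipres_appinfo.yml prints ok when all four fields are filled in and returns a non-zero status otherwise.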

Additional prerequisites, which should be resolved automatically by your chosen installation procedure (conda, cpanm), are listed under the PREREQ_PM field in the file Makefile.PL.

Installation

CPANM

$ cpanm Bio::Phylo::CIPRES

Example workflow

1. Aligning sequences

To align sequences in a FASTA file with MAFFT:

cipresrun \
     -t MAFFT_XSEDE \
     -p vparam.anysymbol_=1 \
     -i <infile> \
     -y cipres_appinfo.yml \
     -o output.mafft=/path/to/outfile.fasta
  • Adding the -v (or --verbose) flag shows the XML returned by the server. In the last status check, this XML lists additional values that can be passed to -o, e.g. to retrieve STDERR and other outputs.
  • Most other parameters shown on the REST documentation page can also be used.
  • The output is written to a file with the same name as the output field (i.e. in this case a file called output.mafft), which optionally ends up in a -wd working directory.

2. Inferring trees

To infer trees from an aligned FASTA file using IQTree:

cipresrun \
    -t IQTREE_XSEDE \
    -p vparam.specify_runtype_=2 \
    -p vparam.specify_dnamodel_=HKY \
    -p vparam.bootstrap_type_=bb \
    -p vparam.use_bnni_=1 \
    -p vparam.num_bootreps_=1000 \
    -p vparam.specify_numparts_=1 \
    -i /path/to/outfile.fasta \
    -y cipres_appinfo.yml \
    -o output.contree=/path/to/tree.dnd
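
The two steps can be chained in one script. The following is a sketch under assumptions: it reuses only the flags shown above, and it assumes that the alignment is available at the path given to -o once cipresrun returns and that cipresrun exits with a non-zero status on failure (neither is confirmed here). The function names, file names, and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch: align with MAFFT, then infer a tree with IQTree, both via cipresrun.
# Hypothetical wrapper; file names and the DRY_RUN switch are illustrative.

run() {
    # Print the command when DRY_RUN=1, otherwise execute it
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

align_and_infer() {
    infile=$1
    alignment=$2
    tree=$3
    appinfo=${4:-cipres_appinfo.yml}

    # Step 1: multiple sequence alignment; stop if it fails
    run cipresrun -t MAFFT_XSEDE \
        -p vparam.anysymbol_=1 \
        -i "$infile" -y "$appinfo" \
        -o "output.mafft=$alignment" || return 1

    # Step 2: tree inference on the alignment from step 1
    run cipresrun -t IQTREE_XSEDE \
        -p vparam.specify_runtype_=2 \
        -p vparam.specify_dnamodel_=HKY \
        -p vparam.bootstrap_type_=bb \
        -p vparam.use_bnni_=1 \
        -p vparam.num_bootreps_=1000 \
        -p vparam.specify_numparts_=1 \
        -i "$alignment" -y "$appinfo" \
        -o "output.contree=$tree"
}
```

With DRY_RUN=1 exported, align_and_infer sequences.fasta aligned.fasta tree.dnd prints the two cipresrun commands instead of submitting jobs, which is a cheap way to review the parameters first.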

Contributors

ambarishk, rvosa, tomasmasson

bio-cipres's Issues

Push container to Docker hub

Once all the individual steps (e.g. #3, #4) are working and are orchestrated using CWL (#5), an essential step for redistribution is that other users do not have to rebuild the Docker container but can have the cwl-runner pull it down. For this to work, the container should be pushed to Docker Hub. This will likely result in a tag that matches the name of this repo, i.e. naturalis/covid19-phylogeny under the naturalis organization on Docker Hub.

Rerunnable workflow as CWL

The goal of the basic workflow is to be able to consume unaligned FASTA, align this (i.e. solve #3) and build a tree with it (by addressing #4). These steps are implemented with tools, scripts, and web service calls that are all provisioned inside a Docker container (whose Dockerfile is in the root of the repo, and whose tag will be the same as the repo name).

Subsequently, these steps will be chained together using CWL, most of which is already scaffolded in PR #1. The essential test is therefore that we should be able to run the whole thing on a clean computer using something like cwl-runner. We will then submit this to covid19.workflowhub.eu.

Tree construction with IQTree

IQTree is more or less the standard for building trees from viral genomes. By default it performs full model testing to decide on the right model, which is a somewhat costly step. Perhaps, as per Rambaut et al., we might just settle on HKY85+G.

For the clade assignments, bootstrap support would be good (e.g. 1000 replicates), so experiment with how costly that is (-bnni parameter, I think).
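
For timing experiments outside CIPRES, the same settings can be tried with a standalone IQ-TREE binary. This is a sketch only, assuming an IQ-TREE 1.6-style command line (-s, -m, -bb, -bnni) and a local iqtree executable; iqtree_args is a hypothetical helper and the file name is a placeholder:

```shell
#!/bin/sh
# Sketch: build IQ-TREE 1.6-style arguments for a fixed HKY+G model with
# ultrafast bootstrap. iqtree_args is a hypothetical helper.
iqtree_args() {
    alignment=$1
    reps=${2:-1000}
    # -m fixes the model (no model testing), -bb sets the number of
    # ultrafast bootstrap replicates, -bnni refines bootstrap trees by NNI
    echo "-s $alignment -m HKY+G -bb $reps -bnni"
}

# Usage (needs a local iqtree binary; left unquoted on purpose so the
# arguments are split into separate words):
#   time iqtree $(iqtree_args aligned.fasta)
```

Running the command with and without -bnni on the same alignment gives a direct measure of its cost.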

Sequence alignment with MAFFT or Muscle

The experiences detailed here (nextstrain/ncov#268) show that doing the MSA in one big run eventually becomes prohibitive. This was not a problem for the set of 400 GenBank genomes, but as those submissions increase (or when we add GISAID data) it becomes an issue.

MAFFT has the virtue of being the standard that is now being used (e.g. by Rambaut et al.) but it might be slower than Muscle (@rvosa's subjective experience)? Both can be run on the CIPRES cluster. Test and decide.
