h3abionet16S 16S rDNA analysis package

We have developed an integrated software package that combines all the steps required in a 16S analysis. It takes raw 16S rDNA reads, quality-controls them, creates OTUs, classifies the OTUs, and generates a phylogenetic tree of the OTU sequences. The output is a .biom file and a Newick .tre file that can be pulled into R for further analysis. The package is wrapped in a Nextflow pipeline, which is accompanied by a configuration file in which the read processing parameters and the classification database can be predefined. The resulting pipeline uses FastQC and MultiQC for QC reporting, usearch for read QC, merging and OTU picking, and QIIME for classification and phylogenetic tree generation. The whole workflow is packaged in Singularity containers, which makes it portable to any system that has Singularity set up.

Two workflow languages were investigated for running this pipeline: CWL and Nextflow.

To access the CWL workflow, go here (runs on Docker containers or on a local software installation).

To access the Nextflow workflow, go here (runs on Singularity containers).

The Nextflow workflow is the most up-to-date version of the pipeline and is, for now, the recommended one to use.

Todos - please let us know if you want to help with any of these.

  • Replace usearch with vsearch. This will make containerisation and distribution much easier: usearch is currently licensed and vsearch is not. We have done comparisons locally and vsearch performs just as well.
  • Work on the current Nextflow pipeline. Some steps need different resource requirements. At the moment, changing the resource requirements affects all processes, which is unnecessary.
  • Include some unit testing with e.g. Travis CI. We have test data and the resulting output data.
  • Give the option to create Singularity containers directly from a Docker repository (Quay.io). We tested this at the end of 2017 and at that time it did not work, but according to the recent Nextflow documentation it should be sorted out now.
  • Get the CWL pipeline to the same stage as the current Nextflow pipeline. Get it running with Toil so that we can parallelise jobs. Once the CWL version is updated, we do not think it would be much work to get the pipeline set up in Galaxy.
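
For the resource-requirements and Singularity-from-Docker items above, a Nextflow configuration along the following lines might be a starting point. This is only a sketch, not the pipeline's actual configuration: the process names (`otuPicking`, `runFastQC`), the resource values and the image placeholder are all hypothetical and would need to be replaced with the real ones. It relies on the `withName` process selector and the `docker://` container prefix described in the Nextflow documentation.

```groovy
// nextflow.config (sketch only; process names, values and image are hypothetical)

singularity {
    enabled    = true   // run processes inside Singularity containers
    autoMounts = true   // let Nextflow bind-mount host paths automatically
}

process {
    // Defaults that apply to every process unless overridden below
    cpus   = 1
    memory = '2 GB'
    time   = '1h'

    // With Singularity enabled, a docker:// URI asks Nextflow to build the
    // Singularity image from a Docker registry. Placeholder, not a real image.
    container = 'docker://quay.io/<organisation>/<image>:<tag>'

    // Override resources for one named process only (hypothetical name)
    withName: 'otuPicking' {
        cpus   = 8
        memory = '16 GB'
    }

    // A lighter step can keep the default memory and change only the CPU count
    withName: 'runFastQC' {
        cpus = 2
    }
}
```

With selectors like these, changing the resources of one step would no longer affect the other processes.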

References

To cite this pipeline, please use: Baichoo, S., Souilmi, Y., Panji, S., Botha, G., Meintjes, A., Hazelhurst, S., Bendou, H., Beste, E. de, et al. 2018. Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics. BMC Bioinformatics. 19(1):457. DOI: 10.1186/s12859-018-2446-1.
