Dahak

Dahak is a software suite that integrates state-of-the-art open source tools for metagenomic analyses. Tools in the dahak software suite will perform the steps of a metagenomic analysis workflow, including data pre-processing, metagenome assembly, taxonomic and functional classification, genome binning, and gene assignment. We aim to deliver the analytical framework as a robust and reliable containerized workflow system, free from the dependency, installation, and execution problems typically associated with other open-source bioinformatics solutions. This approach will maximize transparency, data provenance (i.e., the ability to trace the origins of data and its movement through the workflow), and reproducibility.

Getting Started

Analysis protocols can be found in the workflows directory. It is assumed that analysis will begin with read filtering, and instructions for Docker installation are included there.

You can run these protocols interactively using Docker or automate them using Snakemake and Singularity. See the workflows README for Docker, Snakemake, and Singularity install instructions.

The assembly, comparison, functional inference, and taxonomic classification workflows depend on the output of the read filtering workflow. You can download our data for use in the read filtering protocol from the Open Science Framework. See the Data section below and the read filtering protocol for more information.
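The dependency described above can be sketched as a small graph. This is only an illustration, not part of dahak: the workflow names come from the paragraph above, while the `DEPENDENCIES` dictionary and the `run_order` helper are hypothetical names introduced here. It uses the standard-library `graphlib` module (Python 3.9 or later) to show that read filtering has to come first.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: each workflow -> the workflows whose output it needs.
# Only the read filtering dependency is stated in this README.
DEPENDENCIES = {
    "read_filtering": set(),
    "assembly": {"read_filtering"},
    "comparison": {"read_filtering"},
    "functional_inference": {"read_filtering"},
    "taxonomic_classification": {"read_filtering"},
}


def run_order(dependencies):
    """Return the workflows in an order that respects their dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
print("Run first:", order[0])
print("Then, in any order:", sorted(order[1:]))
```

In practice Snakemake resolves this kind of ordering itself from the rules' inputs and outputs, so the sketch is only meant to make the dependency explicit.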

Prerequisites

Currently, for the sake of simplicity, it is assumed that all workflow steps will be run from Ubuntu 16.04 LTS.

dahak is not a standalone program, but rather a collection of workflows that are defined in Snakemake files and that use Bioconda and Docker to install and run software for different tasks.

See the workflows/ directory to get started.

Data

For benchmarking purposes, this project will use the following datasets:

Dataset             Description
Shakya complete     Complete metagenomic dataset from Shakya et al., 2013* containing bacterial and archaeal genomes
Shakya subset 50    50 percent of the reads from Shakya complete
Shakya subset 25    25 percent of the reads from Shakya complete
Shakya subset 10    10 percent of the reads from Shakya complete

*Shakya, M., C. Quince, J. H. Campbell, Z. K. Yang, C. W. Schadt and M. Podar (2013). "Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities." Environ Microbiol 15(6): 1882-1899.
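This README does not say how the subsets were generated, so the following is only a minimal sketch of one way a fixed fraction of reads could be drawn from a FASTQ file, assuming independent random selection with a fixed seed. The function names `read_records` and `subsample`, the seed, and the demo reads are all hypothetical.

```python
import random
from itertools import islice


def read_records(lines):
    """Group an iterable of FASTQ lines into 4-line records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample(lines, fraction, seed=0):
    """Keep each record independently with probability `fraction`.

    Assumption: random selection with a fixed seed; this is not
    necessarily how the Shakya subsets were produced.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    return [rec for rec in read_records(lines) if rng.random() < fraction]


# Made-up demo reads, not taken from the Shakya dataset.
demo = []
for i in range(1000):
    demo += [f"@read{i}", "ACGT", "+", "IIII"]

half = subsample(demo, 0.5)
print(len(half), "of 1000 records kept")
```

Because each record is kept independently, the result is close to the requested fraction rather than exactly equal to it. For paired reads, the same selection would have to be applied to both files so that mates stay together.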

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests to us.

Contributors

Phillip Brooks (1), Charles Reid (1), Bruce Budowle (2), Chris Grahlmann (3), Stephanie L. Guertin (3), F. Curtis Hewitt (3), Alexander F. Koeppel (4), Oana I. Lungu (3), Krista L. Ternus (3), Stephen D. Turner (4, 5), C. Titus Brown (1)

(1) School of Veterinary Medicine, University of California Davis, Davis, CA, United States of America

(2) Institute of Applied Genetics, Department of Molecular and Medical Genetics, University of North Texas Health Science Center, Fort Worth, Texas, United States of America

(3) Signature Science, LLC, Austin, Texas, United States of America

(4) Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States of America

(5) Bioinformatics Core, University of Virginia School of Medicine, Charlottesville, VA, United States of America

See also the list of contributors who participated in this project.

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Acknowledgments

  • Bioconda
  • Hat tip to anyone whose code was used
