Contrail
http://contrail-bio.sf.net
===================================================

The first step towards analyzing a previously unsequenced organism is 
to assemble the reads by merging similar reads into progressively 
longer sequences. New assemblers such as Velvet and Euler attempt to 
solve the assembly problem by constructing, simplifying, and traversing 
the de Bruijn graph of the read sequences. Nodes in the graph represent 
substrings of the reads, and directed edges connect consecutive substrings. 
Genome assembly is then modeled as finding an Eulerian tour through the 
graph, although repeats may lead to multiple possible tours. As such, 
assemblers primarily focus on correcting errors, reconstructing unambiguous 
regions, and resolving short repeats. These assemblers have successfully 
assembled small genomes from short reads, but have had limited success 
scaling to larger, mammalian-sized genomes, in part because they 
require constructing and manipulating graphs far larger than can fit into
memory.
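
As a rough illustration of the graph described above, the following Python
sketch builds a small de Bruijn graph from a handful of reads. It is not taken
from Velvet, Euler, or Contrail: the function name build_de_bruijn and the
choice of k are assumptions made for this example, and it ignores reverse
complements, sequencing errors, and coverage counts.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph (hypothetical helper, for illustration only).

    Nodes are (k-1)-mers; each distinct k-mer in a read contributes one
    directed edge from its (k-1)-prefix to its (k-1)-suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    for node, successors in sorted(build_de_bruijn(["ACGT", "CGTA"], 3).items()):
        print(node, "->", ", ".join(successors))
```

Even in this toy form, every distinct k-mer is held in one in-memory
dictionary, which is the structure that grows beyond available RAM for
large genomes.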


Addressing this limitation, we have developed a new assembly program, Contrail,
that uses Hadoop for de novo assembly of large genomes from short sequencing
reads. Like other leading short-read assemblers, Contrail relies on the
graph-theoretic framework of de Bruijn graphs. However, unlike these programs,
which require large amounts of RAM, Contrail relies on Hadoop to iteratively
transform an on-disk representation of the assembly graph, allowing an in-depth
analysis even for large genomes. Preliminary results show that Contrail's
contigs are of similar size and quality to those generated by Velvet when
applied to small (bacterial) genomes, but that Contrail scales far better
when applied to large genomes. We are also developing extensions to Contrail to
efficiently compute a traditional overlap-graph-based assembly of large genomes
within Hadoop, a strategy that will be especially valuable as read lengths
increase beyond 100 bp.
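
To make the map/reduce idea concrete, here is a minimal single-process Python
sketch of how graph construction could be phrased as a map step and a reduce
step. It only imitates the programming model: it does not use Hadoop, it is not
Contrail's actual code, and the names map_read, reduce_node, and run_job are
hypothetical. In a real Hadoop job the sort-and-group step would be performed
on disk by the framework's shuffle phase rather than in memory.

```python
from itertools import groupby
from operator import itemgetter


def map_read(read, k):
    """Map step (hypothetical): emit one (prefix, suffix) pair per k-mer."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        yield kmer[:-1], kmer[1:]


def reduce_node(node, successors):
    """Reduce step (hypothetical): merge all successors seen for one node."""
    return node, sorted(set(successors))


def run_job(reads, k):
    """Run map, a sort standing in for the shuffle, then reduce."""
    pairs = [pair for read in reads for pair in map_read(read, k)]
    pairs.sort(key=itemgetter(0))  # stand-in for the framework's shuffle
    return [
        reduce_node(node, [suffix for _, suffix in group])
        for node, group in groupby(pairs, key=itemgetter(0))
    ]


if __name__ == "__main__":
    for node, successors in run_job(["ACGT", "CGTA"], 3):
        print(node, "->", ", ".join(successors))
```

Because each reduce call sees only one node and its neighbors, no single
process needs to hold the whole graph, which is the property the on-disk
approach depends on.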


Contrail enables de novo assembly of large genomes from short reads by bridging
research in computational biology with research in high-performance computing.
This combination is essential in light of the large data sets involved, and has
the potential to unlock important discoveries. Whereas the
published analysis of the African and Asian human individuals used read mapping
to discover conserved regions and regions with small polymorphisms, de novo
assembly has the unique potential to also discover large-scale polymorphisms
between these individuals and the reference human genome. Mapping the
large-scale differences is an important step towards a better understanding of
our own biology, and may reveal previously unknown characteristics of the human
genome related to health or disease. Furthermore, a short-read assembler for
large genomes is also essential for sequencing the vast numbers of complex
organisms that have never been sequenced before, and will directly contribute
to new biological knowledge.



Release History
===================================================


Version 0.8.2
Oct 13, 2010
===================================================
Initial public release
