Git Product home page Git Product logo

seq2dagchainer's Introduction

Seq2DagChainer

This is a preliminary implementation of the algorithms proposed in the article ”Using Minimum Path Cover to Boost Dynamic Programming in DAGs: Co-linear Chaining Extended” by A. Kuosmanen, T. Paavilainen, T. Gagie, R. Chikhi, A. Tomescu and V. Mäkinen (presented in RECOMB2018).

ColinearSolver class is the main class implementing the algorithms. SequenceGraph models a graph where each node represents a single base. Currently SequenceGraph class supports only creation from a splicing graph. There are two approaches available for finding maximal exact matches (to be used as anchors for the colinear chain algorithm) between a sequence graph and a sequence: slow naive approach based on graph traversal and fast approach based on GCSA2. Additionally the project includes various utility classes, such as a static Range Maximum Query Tree implementation, SAM format reader and FASTA format reader.

The folder MC-MPC contains a copy of Topi Paavilainen's Master's thesis project[1], for finding the minimum path cover in a graph. The folder traphlor contains a slightly modified copy of Traphlor software[2].

The project includes a pipeline (transcriptPipeline) that takes as input short read alignments and long read sequences, and predicts transcripts from them in the following steps:

  • a splicing graph is created from the short reads
  • long reads are aligned to the splicing graph using colinear chaining
  • colinear chains are converted into subpath constraints (chains of exons)
  • Traphlor is used to predict transcripts based on both short and long reads

Branches

Branch master implements the algorithms of the article exactly. Branch heuristics is testing ground for various heuristics to improve transcript prediction accuracy.

Getting started

Prerequisites

Seq2DagChainer uses GCSA2[3], SDSL[4] and LEMON[5] libraries. Also the software VG[6] needs to be in the system path, as it is used to transform the splicing graphs into input format suitable for GCSA2 indexing.

Building

Change the variables for the library locations in Makefile to correspond to your installation locations. Then issue ”make” command in the main directory.

Note that ”divsufsort” and ”divsufsort64” libraries should have been installed when installing SDSL libraries, but they might be located in a different folder than SDSL libraries.

Running the pipeline

To run the transcript prediction pipeline, issue the command ”transcriptPipeline [options]” with the following options:

Required options:
    -s SHORT --short SHORT        Short reads alignment (SAM) file.
    -l LONG --long LONG           Long reads (FASTA) file.
    -g GENOME --genome GENOME     Genome (FASTA) file.
    -o OUTPUT --output OUTPUT     Output directory.
Other options:
    -m SEEDLEN --minimum SEEDLEN  Minimum seed length (default 5).
    -t INT --stringency INT       How strict the subpath reporting is.
                                  Higher values allow for more distant
                                  anchors to form a chain. (Range: 0-5. Default: 0)
    -d  --debug               	  Debug mode on.

Authors

The main contributors are:

  • Anna Kuosmanen (the modules in the top directory, parts of Traphlor)
  • Topi Paavilainen (the algorithm for finding the minimum path cover, MC-MPC)
  • Ahmed Sobih (parts of Traphlor)

Contact

[email protected]

Citation

If you use this project in an academic setting, please cite

Kuosmanen A., Paavilainen T., Gagie T., Chikhi R., Tomescu A., Mäkinen V. (2018) Using Minimum Path Cover to Boost Dynamic Programming on DAGs: Co-linear Chaining Extended. In: Raphael B. (eds) Research in Computational Molecular Biology. RECOMB 2018. Lecture Notes in Computer Science, vol 10812. Springer, Cham

The preprint of the paper is available here in arxiv.

References

[1] https://github.com/tobtobtob/MC-MPC

[2] https://sourceforge.net/projects/ilordmpc/

[3] https://github.com/jltsiren/gcsa2

[4] https://github.com/simongog/sdsl-lite

[5] http://lemon.cs.elte.hu/trac/lemon

[6] https://github.com/vgteam/vg

seq2dagchainer's People

Contributors

anna-kuosmanen avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.