Seq2DagChainer

This is a preliminary implementation of the algorithms proposed in the article ”Using Minimum Path Cover to Boost Dynamic Programming in DAGs: Co-linear Chaining Extended” by A. Kuosmanen, T. Paavilainen, T. Gagie, R. Chikhi, A. Tomescu and V. Mäkinen (presented in RECOMB2018).

ColinearSolver class is the main class implementing the algorithms. SequenceGraph models a graph where each node represents a single base. Currently SequenceGraph class supports only creation from a splicing graph. There are two approaches available for finding maximal exact matches (to be used as anchors for the colinear chain algorithm) between a sequence graph and a sequence: slow naive approach based on graph traversal and fast approach based on GCSA2. Additionally the project includes various utility classes, such as a static Range Maximum Query Tree implementation, SAM format reader and FASTA format reader.

The folder MC-MPC contains a copy of Topi Paavilainen's Master's thesis project[1], for finding the minimum path cover in a graph. The folder traphlor contains a slightly modified copy of Traphlor software[2].

The project includes a pipeline (transcriptPipeline) that takes as input short read alignments and long read sequences, and predicts transcripts from them in the following steps:

a splicing graph is created from the short reads
long reads are aligned to the splicing graph using colinear chaining
colinear chains are converted into subpath constraints (chains of exons)
Traphlor is used to predict transcripts based on both short and long reads

Branches

Branch master implements the algorithms of the article exactly. Branch heuristics is testing ground for various heuristics to improve transcript prediction accuracy.

Getting started

Prerequisites

Seq2DagChainer uses GCSA2[3], SDSL[4] and LEMON[5] libraries. Also the software VG[6] needs to be in the system path, as it is used to transform the splicing graphs into input format suitable for GCSA2 indexing.

Building

Change the variables for the library locations in Makefile to correspond to your installation locations. Then issue ”make” command in the main directory.

Note that ”divsufsort” and ”divsufsort64” libraries should have been installed when installing SDSL libraries, but they might be located in a different folder than SDSL libraries.

Running the pipeline

To run the transcript prediction pipeline, issue the command ”transcriptPipeline [options]” with the following options:

Required options:
    -s SHORT --short SHORT        Short reads alignment (SAM) file.
    -l LONG --long LONG           Long reads (FASTA) file.
    -g GENOME --genome GENOME     Genome (FASTA) file.
    -o OUTPUT --output OUTPUT     Output directory.
Other options:
    -m SEEDLEN --minimum SEEDLEN  Minimum seed length (default 5).
    -t INT --stringency INT       How strict the subpath reporting is.
                                  Higher values allow for more distant
                                  anchors to form a chain. (Range: 0-5. Default: 0)
    -d  --debug               	  Debug mode on.

Authors

The main contributors are:

Anna Kuosmanen (the modules in the top directory, parts of Traphlor)
Topi Paavilainen (the algorithm for finding the minimum path cover, MC-MPC)
Ahmed Sobih (parts of Traphlor)

Contact

[email protected]

Citation

If you use this project in an academic setting, please cite

Kuosmanen A., Paavilainen T., Gagie T., Chikhi R., Tomescu A., Mäkinen V. (2018) Using Minimum Path Cover to Boost Dynamic Programming on DAGs: Co-linear Chaining Extended. In: Raphael B. (eds) Research in Computational Molecular Biology. RECOMB 2018. Lecture Notes in Computer Science, vol 10812. Springer, Cham

The preprint of the paper is available here in arxiv.

References

[1] https://github.com/tobtobtob/MC-MPC

[2] https://sourceforge.net/projects/ilordmpc/

[3] https://github.com/jltsiren/gcsa2

[4] https://github.com/simongog/sdsl-lite

[5] http://lemon.cs.elte.hu/trac/lemon

[6] https://github.com/vgteam/vg

bluealin / seq2dagchainer Goto Github PK

seq2dagchainer's Introduction

Seq2DagChainer

Branches

Getting started

Prerequisites

Building

Running the pipeline

Authors

Contact

Citation

References

seq2dagchainer's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent