Big-XBWT

The code is a modification of the Big-BWT repository by Giovanni Manzini. Given a genome and a set of reads aligned to it, it builds a tree structure and computes the XBWT of that tree, which indexes both the reads and the genome. The modifications were made by Garance Gourdel with the help and direction of Giovanni Manzini and Travis Gagie, and with the participation of Jan Studený.
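As background, the XBWT (the XBW transform of Ferragina, Luccio, Manzini and Muthukrishnan) turns a labelled tree into two sequences. The Python sketch below computes it naively for a small ordered, node-labelled tree. Every node u gets a triple (last[u], label[u], pi[u]): last[u] is 1 when u is the last child of its parent, and pi[u] is the string of labels on the upward path from u's parent to the root. The triples are listed in preorder and then stably sorted by pi. The Node class and the xbwt function are hypothetical names for illustration, labels are assumed to be single characters, and the root is treated as a last child by convention. This quadratic construction is only a teaching aid. big_xbwt builds the transform with the machinery inherited from Big-BWT, and its way of turning the genome and reads into a tree is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Node of an ordered, node-labelled tree (hypothetical encoding)."""
    label: str  # assumed to be a single character
    children: list["Node"] = field(default_factory=list)


def xbwt(root: Node) -> tuple[list[int], str]:
    """Naive XBW transform of the tree rooted at `root`.

    Returns (S_last, S_alpha). Each node u contributes a triple
    (last[u], label[u], pi[u]), where pi[u] concatenates the labels on the
    upward path from u's parent to the root (empty for the root itself).
    Triples are collected in preorder, then stably sorted by pi.
    """
    triples: list[tuple[int, str, str]] = []

    def visit(node: Node, pi: str, is_last: bool) -> None:
        triples.append((int(is_last), node.label, pi))
        # Upward path seen by this node's children: its own label, then pi.
        child_pi = node.label + pi
        for i, child in enumerate(node.children):
            visit(child, child_pi, i == len(node.children) - 1)

    visit(root, "", True)  # root treated as a last child (convention)
    triples.sort(key=lambda t: t[2])  # list.sort is stable
    s_last = [t[0] for t in triples]
    s_alpha = "".join(t[1] for t in triples)
    return s_last, s_alpha
```

Consider the tree where $ is the root, A is its only child, and A has the children C (which has the child G) and T. For this tree the sketch returns S_last = 1 1 0 1 1 and S_alpha = $ACTG. The siblings C and T share pi = A$, so they end up adjacent and keep their original order.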

Dependencies

The following two libraries are required:

  • Bamtools: a C++ library for handling BAM files [Github][Wiki].
  • SDSL: a succinct data structure library for C++, used here for sparse bitvectors [Github].

After installing these libraries, you may need to update their paths in the makefile.

Usage

To compute the structure for a genome stored in FASTA format and reads stored in BAM format, run:

make all
./scan_BAM_READER.x data/ref.fasta data/reads.bam
./bwtparse.x data/ref.fasta
./pfbwt64.x data/ref.fasta

The result is written to data/ref.fasta.bwt. These steps also create numerous temporary files in the directory containing the reference. For convenience, the script run_bam.sh runs the three steps in sequence.
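The following Python sketch shows what chaining the steps amounts to. It is not the contents of run_bam.sh, which are not shown here, and the helper names pipeline_commands and run_pipeline are hypothetical. The sketch assumes the binaries have already been built with make. It picks a command sequence from the reads filename, mirroring the two sequences given in this README: scan_BAM_READER.x and pfbwt64.x for .bam reads, and scan.x and pfbwt.x for the text format described under Input format. Whether other combinations of reference and read formats work is not stated. The sketch also assumes the output is always the reference path with .bwt appended, which the README states only for the FASTA case.

```python
import subprocess
from pathlib import Path


def pipeline_commands(ref: str, reads: str) -> list[list[str]]:
    """Return the three command lines for one run (hypothetical helper).

    Reads whose filename ends in ".bam" use the BAM scanner and pfbwt64.x;
    anything else is assumed to be the plain-text read format.
    """
    if reads.endswith(".bam"):
        scan, pfbwt = "./scan_BAM_READER.x", "./pfbwt64.x"
    else:
        scan, pfbwt = "./scan.x", "./pfbwt.x"
    return [[scan, ref, reads], ["./bwtparse.x", ref], [pfbwt, ref]]


def run_pipeline(ref: str, reads: str) -> Path:
    """Run the steps in order and return the (assumed) output path."""
    for cmd in pipeline_commands(ref, reads):
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(cmd, check=True)
    return Path(ref + ".bwt")  # assumption: output is <reference>.bwt
```

Using check=True stops the run at the first failing step, so later steps never run on the output of a failed one.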

Example

./run.sh data/toy_example/ref.in data/toy_example/reads.in

This computes the XBWT of a synthetic dataset and then checks that each character occurs the same number of times in the XBWT as in the original input.
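The consistency check amounts to comparing character multisets. Below is a minimal Python sketch with the hypothetical helper same_char_counts. This README does not describe which files run.sh actually compares. It also does not say whether the XBWT output contains extra symbols such as separators or end markers. The ignore parameter is an assumption, added so that such symbols can be skipped on both sides.

```python
from collections import Counter


def same_char_counts(original: str, xbwt: str, ignore: str = "") -> bool:
    """Return True if every character occurs equally often in both strings.

    Characters listed in `ignore` (hypothetical: e.g. separators or end
    markers present on only one side) are skipped in both strings.
    """
    def counts(s: str) -> Counter:
        return Counter(c for c in s if c not in ignore)

    return counts(original) == counts(xbwt)
```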

Input format

The input format for the reference file can be either:

  • The FASTA format with a single sequence, the reference genome. The filename must end with ".fasta".
  • A text file containing only the genome sequence and no other characters (in particular, no newline \n), as in the example file data/ref.in.
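Converting a single-record FASTA file into the plain-text reference format amounts to dropping the header line and every newline. Below is a minimal Python sketch; the function name fasta_to_raw is hypothetical. It rejects input with more than one record, since only one sequence is supported, and it leaves the sequence characters themselves untouched.

```python
def fasta_to_raw(fasta_text: str) -> str:
    """Convert a single-record FASTA string to the plain-text reference format.

    The result is the sequence alone: no header line and no newline
    characters. Raises ValueError unless there is exactly one record and
    it starts on the first line.
    """
    lines = fasta_text.splitlines()  # handles both \n and \r\n
    headers = [ln for ln in lines if ln.startswith(">")]
    if len(headers) != 1 or not lines[0].startswith(">"):
        raise ValueError("expected exactly one FASTA record")
    # Join the sequence lines, dropping surrounding whitespace on each.
    return "".join(ln.strip() for ln in lines[1:])
```

When saving the result, write the string as-is, without appending a trailing newline, since this format allows no newline characters.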

Two formats are supported for the read file:

  • A text file with one read per line: first the position of the read in the reference genome, then the read sequence, as in the example file data/reads.in. With inputs in the text formats, run:

      make
      ./scan.x data/ref.in data/reads.in
      ./bwtparse.x data/ref.in
      ./pfbwt.x data/ref.in

  • The BAM format. Parsing it requires the Bamtools library, which must be installed in your home directory. The filename must end with ".bam".
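Below is a minimal Python sketch of a parser for the plain-text read format, which can be used to check a read file before running scan.x. It assumes the position and the sequence on each line are separated by whitespace and that positions are integers. This README does not state whether positions are 0- or 1-based. The Read type and parse_reads function are hypothetical names.

```python
from typing import NamedTuple


class Read(NamedTuple):
    pos: int  # position in the reference; 0- or 1-based is not stated
    seq: str  # read sequence


def parse_reads(text: str) -> list[Read]:
    """Parse one read per line: position, then sequence (whitespace-separated).

    Blank lines are skipped; any other malformed line raises ValueError.
    """
    reads: list[Read] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'position sequence', got {line!r}")
        reads.append(Read(int(fields[0]), fields[1]))  # int() also validates
    return reads
```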

Reproducibility

To reproduce our experiments you will need to install:

  • Samtools: a tool for handling SAM and BAM files.
  • BWA: a software package for mapping DNA sequences against a large reference genome.
  • ropebwt2: a tool for constructing the FM-index of a collection of DNA sequences.
  • SPRING: a compression tool for FASTQ files.

To use our experimental pipeline, first edit the paths in experiments/makefile to match your installation. Then run the pipeline on a directory containing the genome in FASTA format (here data/ecoli/ref.fasta) and the reads in FASTQ format (here data/ecoli/reads.fastq):

cd experiments
make DIR=../data/ecoli NAME=ecoli
cat ecoli.runs_summary
