UNIPAR

UNItig construction in PARallel for de novo assembly

UNIPAR is a fast assembly tool that uses De Bruijn graph based algorithms to assemble short sequencing reads into long unitigs. It runs on CPUs and GPUs in parallel and scales across multiple compute nodes in a cluster.
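For readers new to the terminology, a unitig is a maximal non-branching path in the De Bruijn graph. The sketch below illustrates that idea on a toy scale. It is not UNIPAR's implementation: it is serial, it works on the forward strand only (no reverse complements), and the names kmers_of and build_unitigs are made up for this example.

```python
def kmers_of(seq, k):
    """Return the set of all k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_unitigs(kmers):
    """Compact a set of k-mers into unitigs (maximal non-branching paths)."""
    kmers = set(kmers)
    # Node-centric De Bruijn graph: two k-mers are linked if they overlap by k-1 bases.
    succ = {km: [km[1:] + b for b in "ACGT" if km[1:] + b in kmers] for km in kmers}
    pred = {km: [b + km[:-1] for b in "ACGT" if b + km[:-1] in kmers] for km in kmers}
    visited = set()

    def extend(start):
        # Walk forward while the path has exactly one way in and one way out.
        path, cur = start, start
        visited.add(cur)
        while len(succ[cur]) == 1:
            nxt = succ[cur][0]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            path += nxt[-1]
            visited.add(nxt)
            cur = nxt
        return path

    unitigs = []
    # A unitig starts wherever a k-mer cannot be merged with a unique predecessor.
    for km in sorted(kmers):
        p = pred[km]
        if len(p) != 1 or len(succ[p[0]]) != 1:
            unitigs.append(extend(km))
    # Whatever is left belongs to isolated cycles, which have no natural start.
    for km in sorted(kmers):
        if km not in visited:
            unitigs.append(extend(km))
    return unitigs


# Two toy sequences that share the prefix ACGGT and then diverge.
print(build_unitigs(kmers_of("ACGGTCA", 4) | kmers_of("ACGGTTT", 4)))
```

The shared prefix and the two diverging suffixes come out as three separate unitigs, because the path branches at the k-mer CGGT.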

Pre-requisites:

CUDA 5 or later, with GPU compute capability 3.5 or higher

GCC 4.9 or later

MPI library

TBB library (used for parallel sort and scan on CPUs)

Build UNIPAR:

git clone https://github.com/ShuangQiuac/UNIPAR

cd <PATH_TO_UNIPAR>

mkdir build

cd build

cmake ..

make

Run UNIPAR:

Run on a single machine:

./unipar -i <input file> -r <read length> -k <kmer length> -n <number of partitions> -c <number of CPUs> -g <number of GPUs> -d <intermediate file directory> -o <unitig output directory> -t <cutoff threshold>

Run with multiple processes:

mpirun -np <number of processes> [host options] ./unipar [parameter options]

A simple example:

./unipar -i <PATH_TO_UNIPAR>/example/test.fa -r 36 -k 27

mpirun -np 2 ./unipar -i <PATH_TO_UNIPAR>/example/test.fa -r 36 -k 27
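When UNIPAR is launched from a pipeline script, it can help to assemble the command line programmatically. The following is a minimal sketch: the flags are the ones listed under "Parameter options", while the function name unipar_command and its keyword names (partitions, cpus, gpus, partition_dir, output_dir, cutoff) are hypothetical and chosen only for this example.

```python
# Hypothetical keyword names mapped to the UNIPAR flags documented in this README.
OPTIONAL_FLAGS = {
    "partitions": "-n",
    "cpus": "-c",
    "gpus": "-g",
    "partition_dir": "-d",
    "output_dir": "-o",
    "cutoff": "-t",
}


def unipar_command(input_file, read_length, kmer_length, processes=1, **options):
    """Build the argument list for a UNIPAR run, optionally wrapped in mpirun."""
    if kmer_length > read_length:
        raise ValueError("k-mer length must not exceed the read length")
    cmd = ["./unipar", "-i", str(input_file),
           "-r", str(read_length), "-k", str(kmer_length)]
    for name, value in options.items():
        cmd += [OPTIONAL_FLAGS[name], str(value)]
    if processes > 1:
        # Host options for mpirun are site specific and left out here.
        cmd = ["mpirun", "-np", str(processes)] + cmd
    return cmd


# Reproduces the two example invocations above as argument lists.
print(" ".join(unipar_command("example/test.fa", 36, 27)))
print(" ".join(unipar_command("example/test.fa", 36, 27, processes=2)))
```

The returned list can be passed to subprocess.run from the directory that contains the unipar binary.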

Parameter options:

-i [STRING]: input file, either a fasta or a fastq file

-r [INT]: read length; only the first r base pairs of each read are used

-k [INT]: k-mer length, less than or equal to the read length; an odd number is suggested

-n [INT]: [Optional] number of partitions, set to 512 by default; a power of 2 is suggested

-c [INT]: [Optional] number of CPUs to run, either 0 or 1, set to 1 by default

-g [INT]: [Optional] number of GPUs to run, either 0 or the number of GPUs detected by UNIPAR; set to the number of GPUs detected by default

-d [STRING]: [Optional] intermediate output directory for partitions, set to ./partitions by default

-o [STRING]: [Optional] unitig output directory, set to the current directory by default

-t [INT]: [Optional] the cutoff threshold for kmer coverage, set to 1 by default
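To make the roles of -r, -k, -n and -t more concrete, here is a small sketch of how such parameters typically interact in a k-mer based pipeline. It rests on several assumptions that this README does not confirm: that -t keeps k-mers seen at least t times, that partitions are chosen by hashing a minimizer, and that the minimizer length m (not a UNIPAR option) and the CRC32 hash are acceptable stand-ins. All function names are hypothetical.

```python
import zlib
from collections import Counter


def kmers(read, r, k):
    """Yield the k-mers of the first r bases of a read (the roles of -r and -k)."""
    prefix = read[:r]
    for i in range(len(prefix) - k + 1):
        yield prefix[i:i + k]


def minimizer(kmer, m):
    """Lexicographically smallest m-mer inside a k-mer (m is an assumed parameter)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def partition_of(kmer, m, n):
    """Map a k-mer to one of n partitions through its minimizer (the role of -n).

    When n is a power of 2, the modulo can be written as a cheap bit mask.
    """
    return zlib.crc32(minimizer(kmer, m).encode()) & (n - 1)


def solid_kmers(reads, r, k, t=1):
    """Count k-mers and keep those seen at least t times (assumed meaning of -t)."""
    counts = Counter(km for read in reads for km in kmers(read, r, k))
    return {km: c for km, c in counts.items() if c >= t}


reads = ["ACGTACGTAC", "CGTACGTACG"]
kept = solid_kmers(reads, r=8, k=5, t=2)
for km in sorted(kept):
    print(km, kept[km], partition_of(km, m=3, n=512))
```

Overlapping k-mers often share a minimizer and therefore land in the same partition, which is what makes minimizer-based partitioning attractive for building subgraphs independently.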

Output files:

Minimizer-based partitioning files [intermediate files]

De Bruijn subgraph files [optional]: Users can choose to output the constructed De Bruijn graph if they only need the raw graph instead of the unitigs. The number of subgraph files is a user-defined parameter, set to 512 by default. Output of subgraph files is turned off by default.

Unitig files [these are the output results of UNIPAR]: The total number of unitig files equals the total number of processors run with UNIPAR.

Format: contig_<processor id>_<process id>.fa

The unitigs in all of these files together make up the final result.
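Because the result is spread over several files, downstream tools usually need them combined. A minimal sketch of that step follows, assuming plain concatenation is sufficient and that the files match the name pattern given above; the function name merge_unitigs and the output name unitigs_all.fa are made up for this example.

```python
from pathlib import Path


def merge_unitigs(output_dir, merged_name="unitigs_all.fa"):
    """Concatenate all contig_<processor id>_<process id>.fa files into one FASTA file.

    Returns the number of files that were merged.
    """
    out_dir = Path(output_dir)
    parts = sorted(out_dir.glob("contig_*_*.fa"))
    with (out_dir / merged_name).open("w") as out:
        for part in parts:
            text = part.read_text()
            # Make sure each file ends with a newline so records do not run together.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return len(parts)
```

Calling merge_unitigs(".") after a run with the default -o setting would collect the unitigs from the current directory.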

Tested Datasets:

E. coli on SRA (SRR001665): https://www.ncbi.nlm.nih.gov/sra/?term=SRR001665

Human Chr14 on GAGE: http://gage.cbcb.umd.edu/data/Hg_chr14/

Bumblebee on GAGE: http://gage.cbcb.umd.edu/data/Rhodobacter_sphaeroides/

Whole Human Genomes on SRA (SRX016231): https://www.ncbi.nlm.nih.gov/sra?term=SRX016231

Tested Machine Configuration:

GPU: Nvidia K80, Nvidia P40

Total number of GPUs: up to 24 (2*12 K40)

Total number of computer nodes: up to 6
