pblat-cluster's Introduction

pblat-cluster - parallelized blat (cluster version)

blat with cluster parallel hybrid computing support

When the query file is in FASTA format, you can run many processes across a cluster to align it. Processes running on the same node use shared memory, so pblat-cluster works in a hybrid computing mode that minimizes memory usage per node. It can reduce run time linearly with the number of processes. The program is useful when you blat a large query file against a huge reference, such as the human whole-genome sequence.
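Because parallel processing applies to FASTA queries, it can help to check the query file before requesting cluster time. The following is a minimal sketch, not part of pblat-cluster: the function names `count_fasta_records` and `looks_like_fasta` are hypothetical helpers, and the check only looks for `>` header lines rather than validating the sequences.

```shell
#!/bin/bash
# Hypothetical helpers (not shipped with pblat-cluster).

# Count FASTA records: every record starts with a '>' header line.
# grep exits non-zero when the count is 0, so '|| true' keeps the status clean.
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Succeed when the first non-blank line of the file is a FASTA header.
looks_like_fasta() {
    case "$(awk 'NF { print; exit }' "$1")" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Demo on a throwaway two-record file.
demo=$(mktemp)
printf '>read1\nACGT\n>read2\nGGCC\n' > "$demo"
if looks_like_fasta "$demo"; then
    echo "records: $(count_fasta_records "$demo")"
fi
rm -f "$demo"
```

Knowing the record count is useful context when choosing how many processes to request, although how pblat-cluster divides the query among processes is not described here.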

The program is based on the original blat program which was written by Jim Kent.

pblat-cluster can run on Linux clusters with MPI support.


Install

To compile the source code, enter the source code directory in a terminal and run the "make" command. When compilation finishes, the pblat-cluster executable will be in the same directory. It can then be moved wherever you want.

By default the makefile uses mpicc to compile the code. You can specify another MPI compiler installed on your cluster. For example, to use the Intel MPI compiler, type "make CC=mpiicc".
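If you are unsure which MPI compiler wrapper is available on your cluster, a small script can pick one before building. This is a sketch under stated assumptions: `pick_mpi_cc` is a hypothetical helper, and the script only prints the make command so you can review it first.

```shell
#!/bin/bash
# Hypothetical helper: print the first candidate found on PATH.
# Returns non-zero when none of the candidates is installed.
pick_mpi_cc() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the default mpicc, fall back to the Intel wrapper mpiicc.
# The command is printed rather than executed.
if chosen=$(pick_mpi_cc mpicc mpiicc); then
    echo "make CC=$chosen"
else
    echo "no MPI compiler wrapper found on PATH" >&2
fi
```

On many clusters the wrapper only appears on PATH after loading an MPI module, so run the check in the same environment you will build in.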

Run

There are two ways to run pblat-cluster on a cluster:

  1. without a job scheduler
mpirun -n <N> pblat-cluster database query output.psl
  2. with a job scheduler (PBS or Slurm)

You can write a bash script and submit it with qsub (PBS) or sbatch (Slurm).

* qsub script example:

#!/bin/bash
#PBS -N pblat
#PBS -l nodes=32:ppn=4

cd workingdir

mpirun pblat-cluster genome.fa reads.fa out.psl

* sbatch script example:

#!/bin/bash
#SBATCH -J pblat
#SBATCH -N 32
#SBATCH --ntasks-per-node=4

cd workingdir

mpirun pblat-cluster genome.fa reads.fa out.psl
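The PBS example above requests 32 nodes with 4 processes per node, which is 128 MPI processes in total. The sketch below shows that arithmetic and assembles the corresponding `mpirun -n <N>` command line. The helpers `total_ranks` and `launch_cmd` are hypothetical, and the command is printed rather than executed.

```shell
#!/bin/bash
# Hypothetical helpers for sizing a run (not part of pblat-cluster).

# Total MPI processes = nodes x processes per node.
total_ranks() {
    echo $(( $1 * $2 ))
}

# Assemble the mpirun command line for a given layout.
# Arguments: nodes, processes per node, database, query, output.
launch_cmd() {
    echo "mpirun -n $(total_ranks "$1" "$2") pblat-cluster $3 $4 $5"
}

launch_cmd 32 4 genome.fa reads.fa out.psl
```

Inside a scheduler job, mpirun usually picks up the allocation by itself, which is why the job scripts above call it without `-n`.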

Licence

pblat is modified from blat, and its licence is the same as blat's. The source code and executables are freely available for academic, nonprofit and personal use. Commercial licensing information is available on the Kent Informatics website.

Cite

Wang M & Kong L. pblat: a multithread blat algorithm speeding up aligning sequences to genomes. BMC Bioinformatics 2019, 20(1).


Copyright (C) 2012 - 2020 Wang Meng

Contact me: [email protected]

pblat-cluster's Issues

Analysis doesn't finish

Hello Meng,

Great tool!! The cluster version is quick and accurate.
I have no issues running the cluster version with a genome file of around 6MB and a reads file of 800MB containing 149 read sequences. The analysis takes about 3-4 hours.
However, I now have a genome file of size around 30MB and reads file of size around 1GB (with 49 read sequences). I left it to run for a day or so and it's still running. Sometimes it just crashes. The files in question can be found in the links below:
https://www.mediafire.com/file/v5sgvajfnweeih7/genome.fa/file
https://www.mediafire.com/file/64jiwm4cgn0bbuq/reads.fa/file

Could you please advise on the ideal ratio of nodes/cores and memory to get this analyzed as fast as possible? It still doesn't finish even if I increase the cores/nodes and memory.
I use the following sbatch script:

#SBATCH -N 1
#SBATCH -n 8
#SBATCH --mem=14G
#SBATCH --time=06:00:0
module load nixpkgs/16.09  gcc/7.3.0  openmpi/3.1.4
mpirun ./pblat-cluster genome.fa reads.fa -out=pslx out.pslx

The above script works well on the 6MB genome and 800MB reads files.

Thanks,
Vijay
