NCBI blast tutorial

Short introduction to using NCBI blast tools from the command line

Using Blast from the command line

Sometimes, you may have to use blast on your own computer to query thousands of sequences against a custom database of hundreds of thousands of sequences. To do that, you will need to install Blast on your computer, format the database, and then blast the sequences.

Here is a short tutorial on how to do this.

Installing Blast+ tools

Get the compiled executables from this URL:

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Decompress the archive. For example:

tar xvfz ncbi-blast-2.9.0+-x64-linux.tar.gz

Add the bin folder from the extracted archive to your path. For example, add the following line to your ~/.bashrc file:

export PATH="/PATH/TO/ncbi-blast-2.9.0+/bin":$PATH

And change the /PATH/TO part to the path where you have put the extracted archive.
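A quick way to confirm the PATH change took effect is sketched below. The install location under $HOME is only an assumption for illustration; substitute the folder where you actually extracted the archive.

```shell
# Hypothetical install location: adjust to where you extracted the archive.
BLAST_BIN="$HOME/ncbi-blast-2.9.0+/bin"
export PATH="$BLAST_BIN:$PATH"

# Report the installed version if blastn can be found, otherwise warn.
if command -v blastn >/dev/null 2>&1; then
    blastn -version
else
    echo "blastn not found on PATH; check the folder above" >&2
fi
```

Open a new terminal (or run source ~/.bashrc) before checking, so that the updated ~/.bashrc is read.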

Example sequences to use with the tutorial

To test blast, you need test FASTA files. Use the following files that come with the tutorial:

  • sequences.fasta
  • reference.fasta
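Before going further, it can help to check that a FASTA file contains the number of records you expect. The sketch below counts header lines (lines starting with '>'). The function name count_fasta_records and the two toy sequences are made up for illustration and are not part of the tutorial files.

```shell
# Count FASTA records by counting header lines (lines starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Toy input with made-up sequences, for illustration only.
toy=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$toy"

n_records=$(count_fasta_records "$toy")
echo "records: $n_records"
rm -f "$toy"
```

With the tutorial files, the same function would be called as count_fasta_records sequences.fasta.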

Create blast database

The different blast tools require a formatted database to search against. In order to create the database, we use the makeblastdb tool:

makeblastdb -in reference.fasta -title reference -dbtype nucl -out databases/reference

This will create several files in the databases folder, all sharing the reference prefix. Together, they make up the blast database.
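The step above can be wrapped in a small check, sketched here under the assumption that reference.fasta is in the current folder. It creates the databases folder first, builds the database, then asks blastdbcmd to print a summary. For a nucleotide database, the files typically include the .nhr, .nin and .nsq extensions, although the exact set varies between blast+ versions.

```shell
# Make sure the output folder exists before building the database.
mkdir -p databases

if command -v makeblastdb >/dev/null 2>&1 && [ -f reference.fasta ]; then
    makeblastdb -in reference.fasta -title reference -dbtype nucl -out databases/reference
    # Print a summary of the new database (title, sequence count, total length).
    blastdbcmd -db databases/reference -info
    ls databases
else
    echo "makeblastdb or reference.fasta not available; skipping" >&2
fi
```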

Blast

We can now blast our sequences against the database. In this case, both our query sequences and database sequences are DNA sequences, so we use the blastn tool:

blastn -db databases/reference -query sequences.fasta -evalue 1e-3 -word_size 11 -outfmt 0 > sequences.reference

You can use different output formats with the -outfmt option:

 -outfmt <String>
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1)
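The tabular format (6) is the easiest to post-process. By default it has 12 tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below filters such output by percent identity and alignment length. The function name filter_hits, the thresholds and the two sample rows are made up for illustration.

```shell
# Keep hits with percent identity >= $1 (column 3)
# and alignment length >= $2 (column 4).
filter_hits() {
    awk -F '\t' -v min_id="$1" -v min_len="$2" '$3 >= min_id && $4 >= min_len'
}

# Made-up rows in the default 12-column layout, for illustration only.
sample=$(printf 'q1\tref1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-95\t350\nq2\tref2\t81.00\t60\t11\t1\t5\t64\t40\t99\t2e-08\t56.5\n')

kept=$(printf '%s\n' "$sample" | filter_hits 90 100)
printf '%s\n' "$kept"
```

With real data, the filter could be placed directly after the search, for example: blastn -db databases/reference -query sequences.fasta -evalue 1e-3 -word_size 11 -outfmt 6 | filter_hits 90 100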

Blast with parallel

If you need to run your blasts faster (and who doesn't?), you can maximise CPU usage with GNU parallel, which is available from the GNU project website.

Download the archive, extract it (with tar xvjf parallel-latest.tar.bz2) and install it with the following commands:

./configure
make
sudo make install

We can now use parallel to speed up blast:

time cat sequences.fasta | parallel -k --block 1k --recstart '>' --pipe 'blastn -db databases/reference -query - -evalue 1e-3 -word_size 11 -outfmt 0' > sequences.reference
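To see what the --pipe, --block and --recstart options do, the sketch below runs the same splitting on a made-up FASTA file but replaces blastn with grep -c '>', which counts the records each job receives. Because --recstart '>' makes parallel cut the input only where a record starts, no record is split across two jobs and the per-block counts should add up to the total number of sequences. The toy file and the variable names are assumptions for illustration.

```shell
# Build a toy FASTA file with 200 made-up records (about 14 kB).
toy=$(mktemp)
i=1
while [ "$i" -le 200 ]; do
    printf '>seq%d\nACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT\n' "$i"
    i=$((i + 1))
done > "$toy"

# Only run if the installed parallel is GNU parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # Each job receives a block of about 1 kB made of whole records
    # and prints how many records it got.
    per_block=$(parallel -k --block 1k --recstart '>' --pipe "grep -c '>'" < "$toy")
    total=$(printf '%s\n' "$per_block" | awk '{ s += $1 } END { print s }')
    echo "records seen across all blocks: $total"
else
    echo "GNU parallel not found; skipping" >&2
fi
rm -f "$toy"
```

The -k option keeps the output in the same order as the input, which is why the parallel blast results can be written to a single file.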

More options and getting help

To see the options and parameters accepted by blastn and the other blast+ utilities, use the -help option and pipe the output into less, for example:

blastn -help | less

NCBI blast tools cover more cases than DNA against DNA searches. For example, you can search a protein database with either DNA or protein sequences. Here is an exhaustive list of the programs that come with the blast+ distribution:

blastdb_aliastool
blastdbcheck
blastdbcmd
blast_formatter
blastn
blastp
blastx
convert2blastmask
deltablast
dustmasker
legacy_blast.pl
makeblastdb
makembindex
makeprofiledb
psiblast
rpsblast
rpstblastn
segmasker
tblastn
tblastx
update_blastdb.pl
windowmasker

References

O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.

Licence

Creative Commons Licence
NCBI blast tutorial by Eric Normandeau is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://github.com/enormandeau/ncbi_blast_tutorial.
