Git Product home page Git Product logo

calc_protein_conservation's Introduction

calc_protein_conservation

This script computes conservation score per aminoacid for a protein.

Credits

Many thanks to authors of BLAST+ suite, CD-HIT, MUSCLE and Jensen-Shannon divergence method. This script is based on ConSurf DB and ConCavity paper

Preliminaries

Installing all necessary tools

  1. Download BLAST+ suite from here.
  2. Install/unpack the BLAST+ archive and make sure your directory is PATH.
  3. Download CD-HIT from here.
  4. Unpack CD-HIT archive, run make command and make sure cd-hit is in PATH.
  5. Download MUSCLE from here.
  6. Unpack MUSCLE archive and add the directory to PATH.
  7. Download python script for Jensen-Shannon divergence method from here.
  8. Finally, unpack the pipeline archive from this repository.
  9. Copy the archive contents to the directory where Python script for Jensen-Shannon divergence (score_conservation.py) is located.

Installing databases

  1. Create a directory where you want to store the databases.
  2. Add environmental variable BLASTDB={path to the directory}
  3. Change directory to BLASTDB directory and run these commands. Each command will download and parse one database. We recommend to use SwissProt and UniRef90, but using TrEBML instead of UniRef90 is also possible. Note that it will take a lot of time.
curl {UniRef90 URI} | gunzip | makeblastdb -out uniref90 -dbtype prot -title UniRef90 -parse_seqids

curl {SwissProt URI} | gunzip | makeblastdb -out swissprot -dbtype prot -title SwissProt -parse_seqids

curl {TrEMBL URI} | gunzip | makeblastdb -out trembl -dbtype prot -title TrEMBL -parse_seqids

Running the script

  1. cd to the directory where you copied calc_conservation.sh.
  2. Run this command: ./calc_conservation.sh {path to your fasta file}

How it works

PSIBLAST is ran with your fasta input on SwissProt database (1 iteration, eval=1e-5).
The result is filtered (min 80% coverage and seq identity between 30% and 95%), then clustered with CD-HIT (default parameters).
If we are left with less then 50 hits, repeat the same for UniRef90 database.

Run MUSCLE and Jensen-Shannon divergence on the query hits.

calc_protein_conservation's People

Contributors

jendelel avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.