ptm_hotspot: prediction of PTM hotspots

Description

A method to infer PTM hotspots (currently phosphorylation only) in conserved protein domain alignments, as described in Strumillo et al. (bioRxiv).

Dependencies

ptm_hotspot requires Python (v3) as well as the following packages:

  • numpy
  • pandas
  • scipy
  • statsmodels
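
Before running the tool, it can help to confirm that the four packages above are importable. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and not part of ptm_hotspot.

```python
import importlib.util

# Packages listed in the Dependencies section of this README.
REQUIRED = ["numpy", "pandas", "scipy", "statsmodels"]


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```

Anything reported as missing needs to be installed with your usual package manager before running ptm_hotspots.py.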

How to predict hotspots

To predict hotspots across all protein domains, run:

python3 ptm_hotspots.py -o predicted_hotspots.csv

To obtain predictions for a particular domain:

python3 ptm_hotspots.py -o kinase_domain_hotspots.csv -d PF00069

To obtain hotspot residue predictions instead of hotspot ranges:

python3 ptm_hotspots.py -o hotspot_residues.csv --printSites

You can obtain more help and options by typing:

python3 ptm_hotspots.py -h

usage: ptm_hotspots.py [-h] [--dir [PATH]] [--ptmfile [PATH]] [-d [PFXXXXX]]
                       [--iter [INTEGER]] [--threshold [FLOAT]]
                       [--foreground [FLOAT]] -o PATH [--printSites]

Estimate PTM hotspots in sequence alignments

optional arguments:
  -h, --help            show this help message and exit
  --dir [PATH]          fasta alignments dir (default: db/alignments)
  --ptmfile [PATH]      file containing PTMs (default: db/all_phosps)
  -d [PFXXXXX], --domain [PFXXXXX]
                        query single domain (i.e. PF00069)
  --iter [INTEGER]      number of permutations (default: 100)
  --threshold [FLOAT]   Corrected p-value threshold (default: 0.01)
  --foreground [FLOAT]  effect-size foreground cutoff (default: 2)
  -o PATH, --out PATH   output csv file
  --printSites          print all residue predictions
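
When running several domains or parameter settings in a row, it may be convenient to assemble the command line from Python. The sketch below only builds the argument list from the flags shown in the help text above; `build_command` is a hypothetical helper, not something shipped with ptm_hotspot, and it does not run anything by itself.

```python
def build_command(out, domain=None, print_sites=False, iterations=None, threshold=None):
    """Build an argument list for ptm_hotspots.py from the documented flags."""
    cmd = ["python3", "ptm_hotspots.py", "-o", out]
    if domain is not None:
        cmd += ["-d", domain]                    # query a single Pfam domain
    if iterations is not None:
        cmd += ["--iter", str(iterations)]       # number of permutations
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]   # corrected p-value threshold
    if print_sites:
        cmd.append("--printSites")               # residue-level predictions
    return cmd


# Same invocation as the kinase domain example in this README.
print(" ".join(build_command("kinase_domain_hotspots.csv", domain="PF00069")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that the default `db/alignments` and `db/all_phosps` paths resolve.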

Note: Because the Bonferroni correction depends on the total number of predictions, the hotspots reported for a given domain may differ slightly depending on whether you run that domain alone or the full set of domains. Similarly, the permutation analysis is stochastic, so results may vary between runs.
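
The effect of the number of predictions on the corrected p-value can be illustrated with a small, self-contained sketch. This is a generic Bonferroni correction written for illustration, not the code used inside ptm_hotspots.py, and the p-values and test counts are made up.

```python
def bonferroni(p_values):
    """Bonferroni-correct p-values: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


threshold = 0.01   # default of --threshold
raw_p = 0.0005     # hypothetical raw p-value for one candidate hotspot

# The same raw p-value among 10 predictions (one domain) ...
single_domain = bonferroni([raw_p] + [0.5] * 9)
# ... and among 1000 predictions (all domains).
all_domains = bonferroni([raw_p] + [0.5] * 999)

print(single_domain[0] < threshold)  # significant when few tests are run
print(all_domains[0] < threshold)    # no longer significant with many tests
```

A candidate close to the threshold can therefore be reported in a single-domain run and dropped in a full run.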

Customizing alignments or PTM data

By default, ptm_hotspot uses a database containing precalculated domain alignments (as described in Strumillo et al.) as well as a collection of phosphorylated residues derived from public high-throughput mass spectrometry experiments. To update or replace this database, consider the following points:

Alignments

Every alignment file should be in FASTA format, and each header should end with the start and the end of the domain in alignment coordinates, each preceded by ";". For example:

>EDP05298 pep:known supercontig:v3.1:DS4 ;51;337

For full-protein predictions, use the first and last positions of the multiple sequence alignment as the start and end.
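
To check custom alignment headers before running the tool, a header in the format above can be split as in the following sketch. `parse_domain_header` is a hypothetical helper and may not match how ptm_hotspots.py parses headers internally; it only assumes that the last two ";"-separated fields are integers.

```python
def parse_domain_header(header):
    """Split '>ID description ;start;end' into (sequence_id, start, end)."""
    text = header.lstrip(">").strip()
    # The last two ';'-separated fields are the domain start and end.
    description, start, end = text.rsplit(";", 2)
    sequence_id = description.split()[0]
    return sequence_id, int(start), int(end)


# Header taken from the example in this README.
print(parse_domain_header(">EDP05298 pep:known supercontig:v3.1:DS4 ;51;337"))
```

A header without the two trailing coordinates raises `ValueError`, which makes malformed entries easy to spot.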

PTM database

The PTM database should be provided as a CSV file containing the protein id, the amino acid and the position of each phosphosite within the protein.
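
A custom PTM file can be sanity-checked with a short reader like the one below. This is a sketch under stated assumptions: comma-separated fields in the order id, amino acid, position, and no header row. The default `db/all_phosps` file may differ in these details, and the example rows are invented.

```python
import csv
import io


def read_ptm_rows(handle):
    """Yield (protein_id, amino_acid, position) from a three-column CSV."""
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        protein_id, amino_acid, position = (field.strip() for field in row)
        if len(amino_acid) != 1 or not amino_acid.isalpha():
            raise ValueError(f"line {line_no}: bad amino acid {amino_acid!r}")
        yield protein_id, amino_acid.upper(), int(position)


# Invented example rows: protein id, residue, position within the protein.
example = io.StringIO("EDP05298,S,120\nEDP05298,T,245\n")
sites = list(read_ptm_rows(example))
print(sites)
```

For a real file, pass an open file handle instead of the `io.StringIO` example.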

Citation

  • Strumillo, M. J., Oplova, M., Viéitez, C., Ochoa, D., Shahraz, M., Busby, B. P., et al. Sopko, M., Studer, R. A., Perrimon, N., Panse, V. G., Beltrao, P. (2018). Conserved phosphorylation hotspots in eukaryotic protein domain families. bioRxiv. https://doi.org/10.1101/391185
