
ppi-sites-prediction's Introduction

Prediction of protein-protein interaction sites using convolutional neural network and improved data sets.

Authors

Zengyan Xie, Xiaoya Deng, Kunxian Shu.

Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.

Description

Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which are the basis of a variety of biological processes. Experimental methods for determining PPI sites are expensive and time-consuming, which has led to the development of many prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to improve the positive samples. Our method achieves an area under the curve (AUC) of 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that the positive samples defined by the distance between residue atoms contain a considerable number of false-positive PPI sites.

Usage

If you publish pictures or models produced with our software, please cite the following paper:

Xie, Z.; Deng, X.; Shu, K. Prediction of Protein–Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int. J. Mol. Sci. 2020, 21, 467.

DEPENDENCIES

Our tools depend on the following:

  • Python 3.5

  • TensorFlow 1.10.0

  • Python modules: NumPy, Matplotlib, re, sys, os, random, sklearn (scikit-learn)

  • Tools: PSAIA, PSI-BLAST

Please install these dependencies before using our tools.

USAGE

  1. Feature extraction (see Section 4.5 of our paper for details):
  • Amino Acid Encoding

The twenty amino acids are one-hot encoded (see Table S4 in the Supplementary Material).
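As a rough illustration of one-hot encoding, the sketch below maps each residue of a sequence to a 20-dimensional indicator vector. The alphabetical residue order, the all-zero treatment of unknown residues, and the function name `one_hot_encode` are assumptions made for this example; the actual mapping is the one given in Table S4.

```python
import numpy as np

# Assumed residue order (alphabetical one-letter codes); Table S4 of the
# paper defines the mapping actually used by the authors.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an (L, 20) matrix with at most a single 1 per row."""
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(sequence.upper()):
        idx = AA_INDEX.get(aa)
        if idx is not None:  # assumption: unknown residues (e.g. 'X') stay all-zero
            encoded[pos, idx] = 1.0
    return encoded


features = one_hot_encode("ACDX")
print(features.shape)
```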

  • Profile Features

The position-specific scoring matrix (PSSM) and position-specific frequency matrix (PSFM) were computed by running three iterations of PSI-BLAST [66] against the NCBI NR database for a given protein, with the E-value threshold set to 0.001. For each residue, the PSSM and PSFM columns within a window of length 3 centered on that residue were taken to obtain a 3 x 40 matrix.
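The windowing step can be sketched as follows, assuming the PSSM and PSFM have already been parsed from the PSI-BLAST output into two arrays of shape (L, 20). The function name `window_profile` and the zero-padding at the sequence termini are assumptions of this example; the README does not say how terminal residues are handled.

```python
import numpy as np


def window_profile(pssm, psfm, window=3):
    """Stack PSSM and PSFM (each (L, 20)) and cut a window around every residue.

    Returns an array of shape (L, window, 40).
    """
    profile = np.concatenate([pssm, psfm], axis=1)  # (L, 40)
    half = window // 2
    # Assumption: pad both termini with zero rows so every residue gets a full window.
    padded = np.pad(profile, ((half, half), (0, 0)), mode="constant")
    return np.stack([padded[i:i + window] for i in range(profile.shape[0])])


# Toy input standing in for parsed PSI-BLAST output of a 5-residue protein.
pssm = np.arange(100, dtype=np.float32).reshape(5, 20)
psfm = np.ones((5, 20), dtype=np.float32)
windows = window_profile(pssm, psfm)
print(windows.shape)
```

The middle row of each window (`windows[i, 1]`) is the profile of residue `i` itself; the rows above and below it belong to its sequence neighbours.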

  • Amino Acid Physicochemical Properties

Twenty-four physicochemical properties of amino acids [67] are used in this study. For each property, the twenty amino acids are divided into three groups, and the group of an amino acid is one-hot encoded as a 3-dimensional vector. Concatenating the 24 group vectors represents each amino acid as a 72-dimensional vector (24 x 3).
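The arithmetic behind the 72 dimensions (24 properties, 3 groups each) can be sketched as below. The group table here is a PLACEHOLDER generated by a simple formula, not the real grouping from reference [67], and the names `GROUP_TABLE` and `encode_physicochemical` are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_PROPERTIES = 24
N_GROUPS = 3

# PLACEHOLDER table of shape (24, 20): entry [p, a] is the group (0, 1 or 2)
# of amino acid a under property p. Replace with the real grouping from [67].
GROUP_TABLE = np.add.outer(np.arange(N_PROPERTIES), np.arange(len(AMINO_ACIDS))) % N_GROUPS


def encode_physicochemical(aa):
    """Return a 72-dim vector: one 3-way one-hot block per property."""
    col = AMINO_ACIDS.index(aa)
    vec = np.zeros(N_PROPERTIES * N_GROUPS, dtype=np.float32)
    for prop in range(N_PROPERTIES):
        vec[prop * N_GROUPS + GROUP_TABLE[prop, col]] = 1.0
    return vec


vec = encode_physicochemical("K")
print(vec.shape)
```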

  • Structure Features

Five structure-based features (ASA, RASA, DPX, CX, Hydrophobicity) were computed using PSAIA.

You can use Python scripts for all the above steps, including data preprocessing.

  2. Training and testing are simple. Follow these steps:
  • Put the feature files of each complex in a folder.

  • Run leave_one_complex.py to obtain the AUC of each complex under leave-one-complex-out validation.

  • Run kfold.py to obtain the results of 5-fold cross-validation.
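For readers unfamiliar with the two validation schemes, the sketch below shows how the splits and the AUC could be computed with NumPy alone. It is not taken from leave_one_complex.py or kfold.py, whose internals are not described here; the function names, the complex IDs, and the choice to shuffle samples before forming folds are assumptions of this example.

```python
import numpy as np


def leave_one_complex_out(complex_ids):
    """Yield (held_out_id, train_idx, test_idx), once per distinct complex."""
    complex_ids = np.asarray(complex_ids)
    for held_out in np.unique(complex_ids):
        test_mask = complex_ids == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def k_fold(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for k shuffled, near-equal folds."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Hypothetical complex IDs, one entry per residue sample.
ids = ["c1", "c1", "c2", "c3", "c3", "c3"]
loco_splits = list(leave_one_complex_out(ids))
kfold_splits = list(k_fold(10, k=5))
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a real run, a model would be trained on `train_idx` and scored on `test_idx` inside each loop, with `auc` applied to the held-out predictions. scikit-learn's `LeaveOneGroupOut`, `KFold`, and `roc_auc_score` provide equivalent functionality.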

We tested our model on a machine with 8 Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz and an NVIDIA Corporation GP102 [TITAN Xp] (rev a1) GPU.
