Prediction of protein-protein interaction sites using convolutional neural network and improved data sets.

Authors

Zengyan Xie, Xiaoya Deng, Kunxian Shu.

Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.

Description

Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which are the basis of a variety of biological processes. Experimental methods for determining PPI sites are expensive and time-consuming, which has led to the development of many prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to improve the positive samples. Our method achieves an area under the curve (AUC) of 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that the positive samples defined by the distance between residue atoms contain a considerable number of false-positive PPI sites.

Citation

If you publish pictures or models produced with our software, please cite the following paper:

Xie, Z.; Deng, X.; Shu, K. Prediction of Protein–Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int. J. Mol. Sci. 2020, 21, 467.

DEPENDENCIES

Our tools depend on the following:

Python 3.5
TensorFlow 1.10.0
Python modules: NumPy, Matplotlib, scikit-learn (sklearn), and the standard-library modules re, sys, os, random
Tools: PSAIA, PSI-BLAST

Please install these dependencies before using our tools.

USAGE

Feature Extraction (see Section 4.5 of our paper for details):

Amino Acid Encoding

The twenty amino acids were one-hot encoded (Table S4 in the Supplementary Material).
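
As an illustration only, one-hot encoding of a sequence might look like the sketch below. The alphabetical residue ordering and the all-zero row for non-standard residues are assumptions made here; the ordering in Table S4 may differ, and the function name one_hot_encode is hypothetical, not a script from this repository.

```python
import numpy as np

# Assumed residue ordering (alphabetical by one-letter code);
# the ordering used in Table S4 of the paper may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an (L, 20) array with a single 1 per row for each standard residue."""
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(sequence):
        idx = AA_INDEX.get(aa)
        if idx is not None:  # assumption: non-standard residues stay all-zero
            encoded[pos, idx] = 1.0
    return encoded


encoded = one_hot_encode("ACX")  # "X" is a non-standard residue
print(encoded.shape)
```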

Profile Features

PSSM and PSFM were computed by running 3 iterations of PSI-BLAST [66] against the NCBI NR database for a given protein, with the E-value set to 0.001. For each residue, the PSSM and PSFM values within a window of length 3 centered on that residue were taken, giving a 3 x 40 matrix (20 PSSM columns plus 20 PSFM columns per position).
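
The windowing step can be sketched as follows, assuming the PSSM and PSFM have already been parsed into arrays with one row per residue and 20 columns each. The helper name window_features and the zero-padding at the sequence termini are assumptions for illustration; the source does not say how the first and last residues are handled. Random numbers stand in for real profile values.

```python
import numpy as np


def window_features(profile, center, half_width=1):
    """Return the rows center-half_width..center+half_width of `profile`.

    Positions that fall outside the sequence are zero-padded (an assumption).
    """
    length, n_cols = profile.shape
    window = np.zeros((2 * half_width + 1, n_cols), dtype=profile.dtype)
    for offset in range(-half_width, half_width + 1):
        pos = center + offset
        if 0 <= pos < length:
            window[offset + half_width] = profile[pos]
    return window


# Toy stand-ins for a 10-residue protein: 20 PSSM and 20 PSFM columns per residue.
rng = np.random.RandomState(0)
pssm = rng.randint(-5, 6, size=(10, 20)).astype(np.float32)
psfm = rng.rand(10, 20).astype(np.float32)
profile = np.hstack([pssm, psfm])  # shape (10, 40)

middle = window_features(profile, center=5)  # shape (3, 40)
first = window_features(profile, center=0)   # first row is padding
print(middle.shape, first.shape)
```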

Amino Acid Physicochemical Properties

Twenty-four physicochemical properties of amino acids [67] are used in this study. For each property, the twenty amino acids are divided into three groups, and the group an amino acid belongs to is one-hot encoded; each amino acid is thus represented as a 72-dimensional vector (24 properties x 3 groups).
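
A minimal sketch of how such a 72-dimensional vector could be assembled is shown below. It assumes that each of the 24 properties assigns every amino acid to exactly one of three groups. The group table here is a random PLACEHOLDER, not the real groupings from [67], and the names GROUP_TABLE and encode_physicochemical are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_PROPERTIES = 24
N_GROUPS = 3

# PLACEHOLDER group assignments (random), shape (24 properties, 20 amino acids).
# Replace with the real groupings from the paper's reference [67].
rng = np.random.RandomState(0)
GROUP_TABLE = rng.randint(0, N_GROUPS, size=(N_PROPERTIES, len(AMINO_ACIDS)))


def encode_physicochemical(aa, group_table=GROUP_TABLE):
    """Concatenate one 3-way one-hot block per property: 24 x 3 = 72 values."""
    col = AMINO_ACIDS.index(aa)
    vec = np.zeros(N_PROPERTIES * N_GROUPS, dtype=np.float32)
    for prop in range(N_PROPERTIES):
        vec[prop * N_GROUPS + group_table[prop, col]] = 1.0
    return vec


vec = encode_physicochemical("A")
print(vec.shape)
```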

Structure Features

Five structure-based features (ASA, RASA, DPX, CX, Hydrophobicity) were computed using PSAIA.

You can use Python scripts for all the above steps, including data preprocessing.

Training and testing are straightforward. Follow these steps:

Put the feature files of each complex in a folder.
Run leave_one_complex.py to obtain the AUC of each complex under leave-one-complex-out validation.
Run kfold.py to obtain the results of 5-fold cross-validation.
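
For readers unfamiliar with leave-one-complex-out validation, the idea can be sketched with scikit-learn as below. This is not the content of leave_one_complex.py: a logistic regression stands in for the convolutional neural network, the data are synthetic, and the function name leave_one_complex_out_auc is hypothetical. Each complex is held out in turn, the model is trained on the remaining complexes, and an AUC is computed on the held-out one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut


def leave_one_complex_out_auc(features, labels, complex_ids):
    """Hold out one complex at a time and return {complex_id: AUC}."""
    aucs = {}
    splitter = LeaveOneGroupOut()
    for train_idx, test_idx in splitter.split(features, labels, groups=complex_ids):
        # Stand-in classifier for illustration; the paper uses a CNN.
        model = LogisticRegression(max_iter=1000)
        model.fit(features[train_idx], labels[train_idx])
        scores = model.predict_proba(features[test_idx])[:, 1]
        held_out = int(complex_ids[test_idx][0])
        aucs[held_out] = roc_auc_score(labels[test_idx], scores)
    return aucs


# Synthetic data: 4 complexes, 50 residues each, 8 features per residue.
rng = np.random.RandomState(0)
n_complexes, n_residues, n_features = 4, 50, 8
complex_ids = np.repeat(np.arange(n_complexes), n_residues)
labels = np.tile(np.array([0, 1]), n_complexes * n_residues // 2)
features = rng.randn(labels.size, n_features) + labels[:, None]

aucs = leave_one_complex_out_auc(features, labels, complex_ids)
for cid in sorted(aucs):
    print("complex {}: AUC = {:.3f}".format(cid, aucs[cid]))
```

Splitting by complex rather than by residue keeps residues from the same complex out of both the training and test sets at once, which is the point of this validation scheme.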

We tested our model on 8 Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz and NVIDIA Corporation GP102 [TITAN Xp] (rev a1).
