General

Experimental deep learning architecture for scoring protein-protein interactions.

See the PointNet paper for the original architecture description. This implementation contains two architectures, neither of which contains the transformer networks (T-Nets) of the original, so both can be considered variants of vanilla PointNet. The first differs from vanilla PointNet merely in its dropout rate (50%), whereas the second is a novel architecture called Siamese PointNet, shown in the image below.
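The core idea of vanilla PointNet (a function shared across all points, followed by a symmetric pooling step) can be illustrated in plain Python. This is only a minimal sketch, not the repository's model: `shared_mlp`, `global_feature`, and `siamese_feature` are hypothetical names, the single linear layer with ReLU stands in for the real per-point MLP, and the Siamese variant assumes that two point clouds are encoded with the same weights and concatenated, which is the conventional meaning of "Siamese" but is not confirmed by this README.

```python
def shared_mlp(point, weights):
    """Apply one linear layer + ReLU to a single point (same weights for every point)."""
    return [max(0.0, sum(w * x for w, x in zip(row, point))) for row in weights]


def global_feature(points, weights, pool="max"):
    """Encode each point independently, then pool over the point dimension."""
    feats = [shared_mlp(p, weights) for p in points]
    columns = list(zip(*feats))  # one tuple per feature channel
    if pool == "max":
        return [max(c) for c in columns]
    return [sum(c) / len(c) for c in columns]  # average pooling


def siamese_feature(cloud_a, cloud_b, weights):
    """Assumed Siamese setup: encode two clouds with shared weights and concatenate."""
    return global_feature(cloud_a, weights) + global_feature(cloud_b, weights)


# Toy example: 2 output channels, 3 input coordinates per point.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
cloud = [(1.0, 2.0, 3.0), (-1.0, 5.0, 0.0)]

# Pooling is symmetric, so the order of the points does not matter.
same = global_feature(cloud, weights) == global_feature(cloud[::-1], weights)
```

Because the pooling step is symmetric, the resulting feature vector is independent of the order in which atoms appear in the point cloud, which is why no canonical atom ordering is needed.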

Other adaptations include cosine annealing of the learning rate, implemented to improve the accuracy and generalizability of the trained network (see SGDR: Stochastic Gradient Descent with Warm Restarts), and a custom loss function that biases learning towards higher-scoring decoys.
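For reference, the cosine annealing schedule from the SGDR paper can be written in a few lines. The sketch below is an illustration of the published formula, not the code used in train.py: the function name `cosine_annealing_lr`, the restart `period` of 5 epochs, and `min_lr = 0` are assumptions; only the base learning rate of 0.0001 is taken from the defaults listed under Usage.

```python
import math


def cosine_annealing_lr(epoch, base_lr=1e-4, min_lr=0.0, period=5):
    """Learning rate at a given epoch under cosine annealing with warm restarts.

    Within each period the rate decays from base_lr towards min_lr along half
    a cosine wave; at the start of the next period it jumps back to base_lr.
    """
    t_cur = epoch % period  # epochs since the last restart
    cosine = 1.0 + math.cos(math.pi * t_cur / period)
    return min_lr + 0.5 * (base_lr - min_lr) * cosine


# Schedule for the default 15 training epochs (three restarts of 5 epochs each).
schedule = [cosine_annealing_lr(e) for e in range(15)]
```

The periodic jump back to the base rate is what the paper calls a warm restart; the full SGDR scheme also allows the period to grow after each restart, which is omitted here.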

Dependencies

Python 3.x
H5Py for fast data retrieval
PyTorch <0.4 and its dependencies
Data conversion uses DeepRank and its dependencies
Seaborn for plotting

Usage

python train.py

  --batch_size BATCH_SIZE   Input batch size (default = 256)
  --num_points NUM_POINTS   Points per point cloud used (default = 1024)
  --num_epoch NUM_EPOCH     Number of epochs to train for (default = 15)
  --CUDA                    Train on GPU
  --out_folder OUT_FOLDER   Model output folder
  --model MODEL             Model input path
  --data_path DATA_PATH     Path to HDF5 file
  --lr LR                   Learning rate (default = 0.0001)
  --optimizer OPTIMIZER     What optimizer to use. Options: Adam, SGD, SGD_cos
  --avg_pool                Use average pooling for feature pooling (instead of default max pooling)
  --dual                    Use Siamese PointNet architecture
  --metric METRIC           Metric to be used. Options: irmsd, lrmsd, fnat, dockQ (default)
  --dropout DROPOUT         Dropout rate in last layer. When 0, dropout is replaced by batchnorm (default = 0.5)
  --root                    Apply square root on metric (for DockQ score balancing)
  --patience PATIENCE       Number of epochs to observe overfitting before early stopping
  --classification          Classification instead of regression
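The options above map naturally onto an argparse parser. The following is a hypothetical reconstruction for illustration, not the actual contents of train.py: the argument types and the `build_parser` name are assumptions, and defaults are set only where the list above states them.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented train.py options (illustrative only)."""
    p = argparse.ArgumentParser(description="Train a PointNet variant on point cloud data")
    p.add_argument("--batch_size", type=int, default=256, help="Input batch size")
    p.add_argument("--num_points", type=int, default=1024, help="Points per point cloud used")
    p.add_argument("--num_epoch", type=int, default=15, help="Number of epochs to train for")
    p.add_argument("--CUDA", action="store_true", help="Train on GPU")
    p.add_argument("--out_folder", type=str, help="Model output folder")
    p.add_argument("--model", type=str, help="Model input path")
    p.add_argument("--data_path", type=str, help="Path to HDF5 file")
    p.add_argument("--lr", type=float, default=0.0001, help="Learning rate")
    # No default is documented for the optimizer, so none is assumed here.
    p.add_argument("--optimizer", choices=["Adam", "SGD", "SGD_cos"], help="Optimizer to use")
    p.add_argument("--avg_pool", action="store_true", help="Average pooling instead of max pooling")
    p.add_argument("--dual", action="store_true", help="Use Siamese PointNet architecture")
    p.add_argument("--metric", choices=["irmsd", "lrmsd", "fnat", "dockQ"], default="dockQ")
    p.add_argument("--dropout", type=float, default=0.5, help="Dropout rate in last layer")
    p.add_argument("--root", action="store_true", help="Apply square root on metric")
    p.add_argument("--patience", type=int, help="Epochs of overfitting before early stopping")
    p.add_argument("--classification", action="store_true", help="Classification instead of regression")
    return p


# Example: Siamese PointNet with the cosine-annealed SGD optimizer.
# "data.hdf5" is a placeholder path, not a file shipped with the repository.
args = build_parser().parse_args(
    ["--data_path", "data.hdf5", "--dual", "--optimizer", "SGD_cos"]
)
```

Flags such as --CUDA, --dual, and --root take no value; passing them switches the corresponding behaviour on.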

The network takes the atoms that take part in an interaction as point cloud input. Data conversion can be performed with the extract_pc.py script.

Data is saved in an HDF5 file containing three groups: train, test, and holdout. The datasets within these groups hold the atom features at float32 precision, with the iRMSD, lRMSD, FNAT, and DockQ scores stored as attributes.
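A file with this layout could be inspected along the following lines. This is a sketch under assumptions: the helper names `collect_scores` and `load_split` are hypothetical, and the attribute key `"dockQ"` is guessed from the spelling of the --metric options rather than confirmed from the conversion script.

```python
def collect_scores(group, metric="dockQ"):
    """Map each dataset name in a group to the score stored in its attributes.

    Works on an h5py group, or on any mapping whose values expose `.attrs`.
    """
    return {name: float(dataset.attrs[metric]) for name, dataset in group.items()}


def load_split(path, split="train", metric="dockQ"):
    """Open the HDF5 file read-only and collect the scores of one split."""
    import h5py  # imported lazily so collect_scores has no hard dependency

    with h5py.File(path, "r") as handle:
        return collect_scores(handle[split], metric)
```

Calling `load_split(path, split="holdout")` would then return one score per decoy in the holdout group, provided the attribute keys match the assumption above.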

Current state

Architecture & training scripts have been fully implemented
