gnina_tensorflow: A TensorFlow implementation of models based on the gnina framework

Gnina is a method for featurising 3D protein-ligand complexes [1] for input into convolutional neural networks. This repo is a collection of machine learning algorithms built on top of gnina. It can be used as is, but the tests only run properly after a local installation:

cd gnina_tensorflow
pip install -e .
python3 -m pytest

Requirements

The following are required for all or part of gnina_tensorflow:

libmolgrid (https://github.com/gnina/libmolgrid)
scikit-learn (https://github.com/scikit-learn/scikit-learn)
joblib (https://github.com/joblib/joblib)
tensorflow 2 (https://github.com/tensorflow/tensorflow)

Please note: 3D convolutions on the data format provided by gnina do not work when running TensorFlow on a CPU. Hopefully this will change in a future release of TensorFlow.

Previous work by this author using gnina can be found in Ref. [3].

Functionality included

There are two virtual screening architectures: the original implementation [1], and DenseFS [2]. There is also a random forest, which is trained for the same task but uses low-dimensional encodings of gnina grids as input (mainly used as a diagnostic for a paper which is in the works). These encodings are generated by an autoencoder, which aims to reduce the dimensionality of the original gnina input.

Virtual Screening

Virtual screening is tasked with discriminating between active compounds and decoy compounds - that is, molecules that bind vs do not bind to a given protein target. An example of how to train a model on the small training set provided is shown below:

cd gnina_tensorflow
python3 classifier/gnina_tensorflow.py --data_root data --train data/small_chembl_test.types --batch_size 16 --iterations 100 --save_path classifier_example --densefs --inference_on_training_set
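How well a model separates actives from decoys is often summarised with the area under the ROC curve (AUC). The README does not say which metric the classifier reports, so the following is only a minimal sketch of that idea in plain Python; the function name `roc_auc` and the scores are hypothetical and are not part of gnina_tensorflow.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    scores: predicted probability of being active, one per compound.
    labels: 1 for an active, 0 for a decoy.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    # Count active/decoy pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Hypothetical predictions for three actives followed by three decoys.
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every active is scored above every decoy, while 0.5 is what random scores would give.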

Autoencoder

An autoencoder aims to learn a meaningful low-dimensional representation of a high-dimensional input. In this instance, gnina inputs which are similar in some sense (perhaps the proteins are the same and the ligands differ only in the location of a functional group on an aromatic ring) should be converted to encodings (vectors) which are a small distance apart; the converse should hold for dissimilar inputs.
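To make the notion of distance between encodings concrete, here is a small illustration. It assumes Euclidean distance, which the README does not specify, and the three vectors are made-up 4-dimensional stand-ins for real encodings.

```python
import math

# Hypothetical encodings; real ones would be much longer vectors.
complex_a = [0.12, 0.80, 0.33, 0.05]  # reference complex
complex_b = [0.10, 0.82, 0.30, 0.07]  # same protein, slightly modified ligand
complex_c = [0.90, 0.10, 0.75, 0.60]  # unrelated complex

# Euclidean distance between encoding vectors (assumed metric).
near = math.dist(complex_a, complex_b)
far = math.dist(complex_a, complex_c)

print(f"similar pair:    {near:.3f}")
print(f"dissimilar pair: {far:.3f}")
```

A well-trained autoencoder should behave like this toy data: the similar pair ends up much closer than the dissimilar pair.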

Dimensionality reduction is an important part of my ongoing work with gnina; an example is shown below:

cd gnina_tensorflow
python3 autoencoder/gnina_autoencoder.py --data_root data --train data/small_chembl_test.types --batch_size 1 --iterations 1000 --save_path autoencoder_example --encoding_size 200 --optimiser adamax --learning_rate 0.001 --final_activation sigmoid --dimension 18.0 --resolution 1.0 --save_encodings
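To get a feel for how much compression these settings imply, the arithmetic can be sketched as follows. This assumes the libmolgrid convention that a grid has round(dimension / resolution) + 1 points per side, and it uses a hypothetical channel count of 28, since the number of atom-type channels is not stated here.

```python
def grid_points_per_side(dimension, resolution):
    # Assumed libmolgrid convention: grid points sit at both ends of each side.
    return round(dimension / resolution) + 1


def input_size(dimension, resolution, channels):
    # Total number of values in one gridded input: channels x side^3.
    side = grid_points_per_side(dimension, resolution)
    return channels * side ** 3


CHANNELS = 28        # hypothetical number of atom-type channels
ENCODING_SIZE = 200  # from --encoding_size in the command above

side = grid_points_per_side(18.0, 1.0)       # --dimension 18.0 --resolution 1.0
n_inputs = input_size(18.0, 1.0, CHANNELS)
ratio = n_inputs / ENCODING_SIZE

print(f"{side} points per side, {n_inputs} input values")
print(f"compression ratio of roughly {ratio:.0f}x")
```

Under these assumptions each input holds 28 x 19^3 = 192052 values, so an encoding of size 200 is a reduction of nearly three orders of magnitude.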

Random Forest

Random forests are used to test the quality of the encodings generated by the autoencoder. The autoencoder is first trained on, for example, the DUD-E dataset [4]; the resulting encodings are used to train a random forest to discriminate between actives and decoys. The same trained autoencoder is then used to generate encodings for a test set - like the ChEMBL validation set from Ref. [3] - which are used to validate the model. If good performance is achieved on the test set, we can deduce that the encodings are good representations of the original inputs:

cd gnina_tensorflow
python3 classifier/random_forest.py rf_example --train_dir data/small_chembl_test_encodings
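The train-then-validate workflow described above can be sketched with scikit-learn. This is not the code in classifier/random_forest.py: the encodings are synthetic Gaussian stand-ins, and the helper `fake_encodings` is hypothetical, used only so the example runs without any data files.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
ENCODING_SIZE = 200  # same size as in the autoencoder example


def fake_encodings(n_per_class):
    """Synthetic stand-ins for autoencoder encodings (assumption: actives
    and decoys come from Gaussians with slightly different means)."""
    actives = rng.normal(loc=0.5, scale=1.0, size=(n_per_class, ENCODING_SIZE))
    decoys = rng.normal(loc=-0.5, scale=1.0, size=(n_per_class, ENCODING_SIZE))
    X = np.vstack([actives, decoys])
    y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
    return X, y


# "Training set" encodings and a separate "test set" from the same encoder.
X_train, y_train = fake_encodings(100)
X_test, y_test = fake_encodings(50)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Accuracy on encodings the forest has never seen.
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, the training encodings would come from one dataset and the test encodings from another, so a high held-out score suggests the encodings keep the information that distinguishes actives from decoys.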

References

[1] Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D. R. (2017). Protein–Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 57, 4, 942–957.

[2] Imrie, F.; Bradley, A. R.; van der Schaar, M.; Deane, C. M. (2018). Protein Family-Specific Models Using Deep Neural Networks and Transfer Learning Improve Virtual Screening and Highlight the Need for More Data. J. Chem. Inf. Model. 58, 2319–2330.

[3] Scantlebury, J.; Brown, N.; von Delft, F.; Deane, C. M. (2020). Data Set Augmentation Allows Deep Learning-Based Virtual Screening to Better Generalize to Unseen Target Classes and Highlight Important Binding Interactions. J. Chem. Inf. Model. 60, 3722–3730.

[4] Mysinger, M. M.; Carchia, M.; Irwin, J. J.; Shoichet, B. K. (2012). Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking. J. Med. Chem. 55, 14, 6582–6594.
