ReLeaSE solution solved with: H2O4GPU, CUDA 10, Anaconda, RDKit, TensorFlow-GPU, OpenChem and more!
Drug Discovery using H2O4GPU, based on the following paper: Deep ReLeaSE (Reinforcement Learning for de-novo Drug Design) by Mariya Popova, Olexandr Isayev, and Alexander Tropsha. Deep Reinforcement Learning for de-novo Drug Design. Science Advances, 2018, Vol. 4, no. 7, eaap7885. DOI: 10.1126/sciadv.aap7885. Please note that this implementation of Deep Reinforcement Learning for de-novo Drug Design (aka the ReLeaSE method) only works on Linux.
H2O4GPU is a collection of GPU solvers by H2Oai with APIs in Python and R. The Python API builds upon the easy-to-use scikit-learn API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn.
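A minimal sketch of the drop-in pattern (the import fallback and toy data are illustrative, not from this repo):

```python
# Sketch of the scikit-learn drop-in pattern. If H2O4GPU is installed,
# its GPU-backed KMeans is used; otherwise this falls back to
# scikit-learn's CPU KMeans, which exposes the same fit/predict API.
try:
    from h2o4gpu import KMeans  # GPU-backed solver
except ImportError:
    from sklearn.cluster import KMeans  # CPU fallback, same API

import numpy as np

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]])
model = KMeans(n_clusters=2, random_state=0).fit(X)
print(len(set(model.labels_.tolist())))  # 2 clusters found
```

Because the APIs match, the same script runs on CPU or GPU depending only on which import succeeds.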
-
OpenChem: a deep learning toolkit for computational chemistry with a PyTorch backend
-
RDKit: open-source cheminformatics toolkit
- 2D and 3D molecular operations
- Descriptor generation for machine learning
- Molecular database cartridge for PostgreSQL
- Cheminformatics nodes for KNIME (distributed from the KNIME community site: https://www.knime.com/rdkit)
- Tutorials: https://github.com/rdkit/rdkit-tutorials
-
- Compute modal decompositions and reduced-order models, easily, efficiently, and in parallel
-
TensorFlow for GPU v1.13.1: Machine Learning
-
TensorBoard: Understand, debug, and optimize
-
PyTorch: Neural Networks from research to production
-
Dask Distributed: Distributed ingestion of data
-
Featuretools: Automated feature engineering
-
NVIDIA TensorRT inference accelerator and CUDA 10: high-performance deep learning inference on NVIDIA GPUs
-
PyCUDA 2019: Python interface for direct access to NVIDIA GPUs via CUDA
-
CuPy (latest): GPU-accelerated drop-in replacement for NumPy
-
cuDNN 7.4.1.5: GPU-accelerated library of primitives for deep neural networks (e.g. CNNs)
- tqdm: Progress bars
- Ubuntu 18.04 so you can 'nix your way through the cmd line!
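The CuPy entry above follows the same drop-in idea as H2O4GPU; a minimal sketch (the import fallback is illustrative):

```python
# CuPy mirrors the NumPy API, so array code can pick the GPU backend at
# import time and run unchanged on either device.
try:
    import cupy as xp  # GPU arrays, if CuPy and a CUDA device are available
except ImportError:
    import numpy as xp  # CPU fallback with the same API

x = xp.arange(6, dtype=xp.float32).reshape(2, 3)
print(float(x.sum()))  # 15.0
```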
-
Hot reloading: code changes in the local /apps folder are automatically reflected inside the container.
-
TensorBoard is on localhost:6006 and GPU enabled Jupyter is on localhost:8888.
-
Python 3.7
-
Only NVIDIA Pascal and Turing GPU architectures are supported
-
Tests with synthetic data that benchmark GPU against CPU, plus a TensorBoard example:
-
[CPU/GPU Benchmark](https://github.com/joehoeller/ /blob/master/apps/gpu_benchmarks/benchmark.py)
-
[Tensorboard to understand & debug neural networks](https://github.com/joehoeller/ /blob/master/apps/gpu_benchmarks/tensorboard.py)
-
- JAK2_min_max_demo.ipynb -- JAK2 pIC50 minimization and maximization
- LogP_optimization_demo.ipynb -- optimization of logP into the drug-like range of 0 to 5 according to Lipinski's rule of five.
- RecurrentQSAR-example-logp.ipynb -- training a recurrent neural network to predict logP from SMILES using the OpenChem toolkit.
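A minimal sketch of the kind of reward that drives the logP optimization demo (the function name and reward values are illustrative, not taken from this repo):

```python
# Illustrative RL reward: high inside Lipinski's drug-like logP window
# [0, 5], low outside it, so the generative agent is pushed to produce
# molecules whose predicted logP falls in that range.
def logp_reward(logp, low=0.0, high=5.0):
    return 11.0 if low <= logp <= high else 1.0

print(logp_reward(2.3))  # 11.0 (drug-like)
print(logp_reward(7.8))  # 1.0  (outside the window)
```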
Disclaimer: the JAK2 demo uses a Random Forest predictor instead of a recurrent neural network, since the JAK2 activity dataset used in the "Deep Reinforcement Learning for de novo Drug Design" paper is restricted under a license agreement. Instead, we use JAK2 activity data downloaded from ChEMBL (CHEMBL2971) and curated. This dataset has ~2000 data points, which is not enough to build a reliable deep neural network. If you want to see a demo with an RNN, please check out the logP optimization demo.
Link to nvidia-docker2 install: Tutorial
You must install nvidia-docker2 and all of its dependencies first; assuming that is done, run:
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
sudo systemctl daemon-reload
sudo systemctl restart docker
How to run this container:
docker build -t <container name> .
(note the trailing "." in the command above)
Run the image, mount the volumes for Jupyter and the app folder for your favorite IDE, and expose port 8888 for the TF1 Jupyter server and port 6006 for TensorBoard.
docker run --rm -it --runtime=nvidia --user $(id -u):$(id -g) --group-add container_user --group-add sudo -v "${PWD}:/apps" -v $(pwd):/tf/notebooks -p 8888:8888 -p 0.0.0.0:6006:6006 <container name>
-
Exec into the container and check if your GPU is registering in the container and CUDA is working:
-
Get the container id:
docker ps
- Exec into container:
docker exec -u root -t -i <container id> /bin/bash
- Check if NVIDIA GPU DRIVERS have container access:
nvidia-smi
- Check if CUDA is working:
nvcc -V
(It helps to use multiple tabs in cmd line, as you have to leave at least 1 tab open for TensorBoard@:6006)
-
Demonstrates the functionality of TensorBoard dashboard
-
Exec into container if you haven't, as shown above:
-
Get the <container id>:
docker ps
docker exec -u root -t -i <container id> /bin/bash
- Then run in cmd line:
tensorboard --logdir=/tmp/tensorflow/mnist/logs
- Type in:
cd /
to get to the root directory. Then cd into the folder that hot-reloads code from your local folder/fav IDE at /apps/apps/gpu_benchmarks and run:
python tensorboard.py
- Go to the browser and navigate to:
localhost:6006
-
Demonstrate GPU vs CPU performance:
-
Exec into the container if you haven't, and cd over to /tf/notebooks/apps/gpu_benchmarks and run:
-
CPU Perf:
python benchmark.py cpu 10000
- CPU perf should return something like this:
Shape: (10000, 10000) Device: /cpu:0 Time taken: 0:00:03.934996
- GPU perf:
python benchmark.py gpu 10000
- GPU perf should return something like this:
Shape: (10000, 10000) Device: /gpu:0 Time taken: 0:00:01.032577
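What benchmark.py measures can be sketched with NumPy as a CPU stand-in (the function name and sizes here are illustrative, not the script's actual code):

```python
# Time an n x n matrix multiply, mirroring the Shape/Device/Time output
# shown above. NumPy runs on the CPU; the GPU build of the script runs
# the same operation on /gpu:0, which is where the speedup comes from.
import datetime
import numpy as np

def benchmark(n, device="/cpu:0"):
    a = np.random.rand(n, n).astype(np.float32)
    start = datetime.datetime.now()
    c = a @ a
    elapsed = datetime.datetime.now() - start
    print("Shape:", c.shape, "Device:", device, "Time taken:", elapsed)
    return c.shape

benchmark(500)  # small n so the sketch runs quickly anywhere
```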
AppArmor on Ubuntu has security conflicts with Docker, so remove Docker's profile from it on your local box (this does not hurt security on your computer):
sudo aa-remove-unknown
If building impactful data science tools for pharma is important to you or your business, please get in touch.
Email: [email protected]