Traversing Chemical Space with Active Deep Learning: A Computational Framework for Low-data Drug Discovery

Description

This repository contains all code used in our study.

Abstract

Deep learning is accelerating drug discovery. However, current approaches are often affected by limitations in the available data, e.g., in terms of size or molecular diversity. Active deep learning has an untapped potential for low-data drug discovery, as it allows a model to be improved iteratively during the screening process by acquiring new data, and its course to be adjusted along the way. However, several known unknowns exist when it comes to active learning: (a) what the best computational strategies are for chemical space exploration, (b) how active learning holds up to traditional, non-iterative, approaches, and (c) how it should be used in the low-data scenarios typical of drug discovery. These open questions currently limit the wider adoption of active learning in drug discovery. To provide answers, this study simulates a real-world low-data drug discovery scenario, and systematically analyses six active learning strategies combined with two deep learning architectures, on three large-scale molecular libraries. Not only do we show that active learning can achieve up to a six-fold improvement in hit discovery compared to traditional methods, but we also identify the most important determinants of its success in low-data regimes. This study lays the first-in-time foundations for the prospective use of active deep learning for low-data drug discovery and is expected to accelerate its adoption.

Figure 1

Modules

  • data_prep.py: Processes data and clusters compounds for sampling diversity.
  • nn.py: Contains neural network models (MLP, GCN, etc.).
  • screening.py: Core script for active learning cycles.
  • utils.py: Utility functions for data handling and evaluation.
  • main.py: Entry point for running experiments with customizable parameters.
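For orientation, the kind of active learning cycle that screening.py is described as running can be sketched in a few lines. This is a minimal, generic pool-based loop and not the repository's implementation: the function name `active_learning_loop`, its parameters, and the toy `oracle`, `train`, and `score` helpers are all hypothetical and exist only for illustration.

```python
import random


def active_learning_loop(pool, oracle, train, score,
                         n_start=4, batch_size=2, n_cycles=3, seed=0):
    """Generic pool-based active learning loop (illustrative sketch only)."""
    rng = random.Random(seed)
    unlabelled = list(pool)
    rng.shuffle(unlabelled)

    # Start from a small, randomly chosen labelled set.
    labelled = [(x, oracle(x)) for x in unlabelled[:n_start]]
    unlabelled = unlabelled[n_start:]

    for _ in range(n_cycles):
        if not unlabelled:
            break
        # Retrain on everything labelled so far.
        model = train(labelled)
        # Rank the remaining candidates by acquisition score, highest first.
        ranked = sorted(unlabelled, key=lambda x: score(model, x), reverse=True)
        picked = ranked[:batch_size]
        # "Screen" the picked candidates and add them to the training data.
        labelled.extend((x, oracle(x)) for x in picked)
        unlabelled = ranked[batch_size:]
    return labelled


# Toy stand-ins (hypothetical): compounds are numbers, hits are values >= 15.
def oracle(x):
    return int(x >= 15)


def train(labelled):
    # The "model" is just the mean of known hits (or of all points if none).
    hits = [x for x, y in labelled if y == 1]
    reference = hits if hits else [x for x, _ in labelled]
    return sum(reference) / len(reference)


def score(model, x):
    # Greedy exploitation: prefer candidates close to the known hits.
    return -abs(x - model)


result = active_learning_loop(range(20), oracle, train, score)
print(len(result), "compounds screened")
```

Swapping the `score` function is where different acquisition strategies would plug in; the loop itself stays the same.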

Requirements

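This codebase uses Python 3.9. Its main dependencies (PyTorch 1.12.1, PyTorch Geometric, scikit-learn 1.3.0, RDKit 2023.3.2, pandas, tqdm, and h5py) are listed under Installation.

Tested on Ubuntu 22.04.3 and macOS 13.3.1.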

Installation

Install dependencies from the provided env.yaml file. This typically takes a couple of minutes (tested on Ubuntu 22.04.3).

conda env create -f env.yaml

Manual installation of requirements (tested on Ubuntu 22.04.3 and macOS 13.3.1):


conda create -n traversing_chem python=3.9
conda activate traversing_chem
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
conda install pyg -c pyg
pip3 install scikit-learn==1.3.0 rdkit==2023.3.2 pandas tqdm argparse h5py
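
After installing, a quick way to confirm that the environment is complete is to check that each package can be found by the interpreter. The snippet below is a small sketch using only the standard library; the helper name `check_packages` is hypothetical, and the list uses the import names of the packages installed above (for example, scikit-learn is imported as `sklearn` and PyTorch Geometric as `torch_geometric`).

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it is installed."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names of the dependencies installed in the commands above.
required = ["torch", "torch_geometric", "sklearn", "rdkit", "pandas", "tqdm", "h5py"]

for name, found in check_packages(required).items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only checks that the packages are discoverable; it does not verify versions or CUDA support.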

Usage

Data

Before running the active learning pipeline on the same data used in the paper, the original data from LIT-PCBA needs to be processed.

  1. Run python active_learning/preprocess_data.py to process and cluster all data. This takes around an hour on a regular computer. Clustering requires more than 128 GB of RAM.

Alternatively, pre-processed data (~40GB) can be found here

Running the pipeline

  1. Run python main.py with desired command-line arguments to start the active learning process. Ensure necessary data files are present.

Demo

  1. Run python experiments/preprocess_demo.py
  2. Run python experiments/demo.py -o demo_results.csv -acq bald -arch mlp -dataset DEMO

After several minutes, this should produce a CSV file with mock screening results.
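
To take a first look at the demo output, the file can be inspected with the standard library. This is a minimal sketch: the helper name `summarize_csv` is hypothetical, and because the column layout of demo_results.csv is not documented here, the code only reports the header and the number of rows rather than assuming any column names.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), len(rows)


# Only runs if the demo has already produced its output file.
if Path("demo_results.csv").exists():
    columns, n_rows = summarize_csv("demo_results.csv")
    print("columns:", columns)
    print("rows:", n_rows)
```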

Replicate results

Change the paths in experiments/replicate_all_results.sh and run it. Running all experiments will take several thousand hours using an NVIDIA A100 GPU (40GB), but this script is easily modified to run in parallel.
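
One way to parallelise such a sweep is to generate one command per configuration and hand the commands to a worker pool. The sketch below is an assumption-laden illustration, not the contents of replicate_all_results.sh: the helper names `build_commands` and `run_all` are hypothetical, the flags are copied from the demo command above, and the output file naming is invented. It defaults to a dry run that only returns the command strings.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(acquisitions, architectures, datasets,
                   script="experiments/demo.py"):
    """Build one command per (acquisition, architecture, dataset) combination."""
    commands = []
    for acq, arch, dataset in itertools.product(acquisitions, architectures, datasets):
        out = f"{dataset}_{arch}_{acq}.csv"  # hypothetical output naming
        commands.append(["python", script, "-o", out,
                         "-acq", acq, "-arch", arch, "-dataset", dataset])
    return commands


def run_all(commands, max_workers=2, dry_run=True):
    """Run commands concurrently; with dry_run=True, only return them as strings."""
    if dry_run:
        return [" ".join(cmd) for cmd in commands]

    def run(cmd):
        # Each command runs in its own process; the thread just waits on it.
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


commands = build_commands(["bald"], ["mlp"], ["DEMO"])
print(run_all(commands))
```

Processes launched this way on a single machine share the same GPU unless each one is assigned a device explicitly, so `max_workers` should be chosen with the available memory in mind.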

How to cite

You can currently cite our preprint:

Traversing Chemical Space with Active Deep Learning: A Computational Framework for Low-data Drug Discovery. Derek van Tilborg and Francesca Grisoni. ChemRxiv, 2023. DOI: https://doi.org/10.26434/chemrxiv-2023-wgl32-v3

License

All code is released under the MIT license.
