SMILES_generator

SMILES-based generative RNN for molecule de novo design

Introduction

Inspired by natural language processing, this repository presents a generative AI method for de novo design in drug discovery. The method uses generative recurrent neural networks (RNNs) with long short-term memory (LSTM) cells. Molecules are represented in SMILES, a sequential text syntax that the RNN model can learn with close to perfect accuracy.
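To illustrate how a SMILES string can be turned into a sequence an RNN can consume, here is a minimal sketch of character-level one-hot encoding. The function names `build_vocab` and `one_hot` are hypothetical and not taken from this repository, whose actual tokenization may differ. Splitting strictly by character is a simplification: it separates two-letter atoms such as Cl or Br into two tokens.

```python
def build_vocab(smiles_list):
    """Map every character seen in the data to an integer index."""
    chars = sorted({ch for smi in smiles_list for ch in smi})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(smiles, vocab):
    """Encode one SMILES string as a list of one-hot vectors, one per character."""
    size = len(vocab)
    encoded = []
    for ch in smiles:
        vec = [0] * size
        vec[vocab[ch]] = 1  # mark the position of this character
        encoded.append(vec)
    return encoded


# Illustrative molecules: ethanol, benzene, acetic acid
smiles_data = ["CCO", "c1ccccc1", "CC(=O)O"]
vocab = build_vocab(smiles_data)
ethanol = one_hot("CCO", vocab)
```

Each row of `ethanol` is one time step of the input sequence; during training the model would be asked to predict the next character from the preceding ones.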

After training, the learned SMILES character probabilities can be used for de novo molecule generation. Compared to classical design techniques, this method eliminates the need to enumerate virtual compound libraries and can be used for transfer learning, fragment growing, or hit-to-lead optimization.
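Generating a molecule from learned character probabilities usually means drawing one character at a time from the model's output distribution. The sketch below shows one common way to do this with a temperature parameter, which is what an option like `--temp` typically controls. It is an assumption about the general technique, not the repository's own sampling code, and `apply_temperature` and `sample_index` are hypothetical names.

```python
import math
import random


def apply_temperature(probs, temp=1.0):
    """Rescale a distribution: temp < 1 sharpens it, temp > 1 flattens it."""
    logits = [math.log(p + 1e-12) / temp for p in probs]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, temp=1.0, rng=random):
    """Draw the index of the next character from the rescaled distribution."""
    scaled = apply_temperature(probs, temp)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]


rng = random.Random(42)  # fixed seed for reproducible draws
char_probs = [0.7, 0.2, 0.1]  # made-up next-character probabilities
idx = sample_index(char_probs, temp=1.0, rng=rng)
```

At a temperature of 1.0 the distribution is used as predicted. Lower values make the most likely character dominate, and higher values give more diverse but less reliable strings.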

Installation

git clone https://github.com/alexarnimueller/SMILES_generator.git
cd SMILES_generator

Requirements

The project has the following requirements:

Usage

Explanations of the different options are given inside the individual files.

Training

Training a new model: python train.py

Example:

python train.py --dataset data/test.csv --name test --train 20 --lr 0.005 --batch 512 --after 2 --sample 100 --augment 5 --preprocess True --stereo 1 --reinforce False --ref None --val 0.1 --seed 42

Sampling

Sampling SMILES strings from a trained model: python sample.py

Example:

python sample.py --model checkpoint/test --out generated/test_sampled.csv --epoch 9 --num 1000 --temp 1.0 --frag ^ --seed 42

Finetuning

Finetuning (aka transfer learning) a trained model towards molecules of interest: python finetune.py

Example:

python finetune.py --model checkpoint/test --dataset data/actives.csv --name test-finetune --lr 0.005 --epoch 19 --train 20 --sample 100 --temp 1.0 --after 1 --augment 10 --batch 16 --preprocess False --stereo 1 --reinforce False --mw_filter 200-400 --reference None --val 0.0 --seed 42 --workers 1

Analysis

Analyzing sampled SMILES strings: python analyze.py

Example:

python analyze.py --generated generated/test_sampled.csv --reference data/test.csv --name test --n 3 --fingerprint ECFP4
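This README does not describe what analyze.py computes. A common way to compare generated molecules against a reference set using fingerprints such as ECFP4 is the Tanimoto similarity, sketched below as an assumption about the kind of comparison involved. The fingerprints are represented as sets of made-up "on" bit indices; real ones would come from a cheminformatics toolkit. The function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def nearest_reference(query, references):
    """Similarity of a generated molecule to its closest reference molecule."""
    return max(tanimoto(query, ref) for ref in references)


# Made-up bit indices, for illustration only
generated_fp = {1, 5, 9, 12}
reference_fps = [{1, 5, 9, 40}, {2, 3}]
best = nearest_reference(generated_fp, reference_fps)
```

A value near 1 means the generated molecule closely resembles a known reference compound, while a value near 0 indicates a structurally novel one.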

References

Publications employing similar techniques:

  • Gupta, A., Müller, A. T., Huisman, B. J. H., Fuchs, J. A., Schneider, P. and Schneider, G. (2018) Generative recurrent networks for de novo drug design. Mol. Inf. 37, 1700111.
  • Merk, D., Grisoni, F., Friedrich, L. and Schneider, G. (2018) Tuning artificial intelligence on the de novo design of natural-product-inspired retinoid X receptor modulators. Communications Chemistry 1, 68.
  • Merk, D., Friedrich, L., Grisoni, F. and Schneider, G. (2018) De novo design of bioactive small molecules by artificial intelligence. Mol. Inf. 37, 1700153.
