GuacaMol Baselines

A series of baseline model implementations for the guacamol benchmark for generative chemistry.
A more in-depth explanation of the benchmarks and the scores for these baselines can be found in our paper.

Dependencies

To install all dependencies:

conda install rdkit -c rdkit
pip install -r requirements.txt

Dataset

Some baselines require the guacamol dataset to run. To get it, run:

bash fetch_guacamol_dataset.sh

Random Sampler

Dummy baseline that always returns random molecules from the guacamol training set.
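
As a rough illustration of the idea, the sketch below draws molecules uniformly at random from a pool of SMILES strings. It is not the code shipped in this repository: the class name `RandomSmilesSampler`, the `seed` parameter, the choice to sample with replacement, and the tiny in-memory pool are all assumptions made for the example.

```python
import random
from typing import List, Optional


class RandomSmilesSampler:
    """Toy random-sampling baseline (hypothetical name, not the repository class)."""

    def __init__(self, molecules: List[str], seed: Optional[int] = None) -> None:
        if not molecules:
            raise ValueError("the molecule pool must not be empty")
        self.molecules = list(molecules)
        self.rng = random.Random(seed)

    def generate(self, number_samples: int) -> List[str]:
        # Assumption: sample with replacement, so any number of
        # molecules can be requested regardless of the pool size.
        return self.rng.choices(self.molecules, k=number_samples)


# A tiny in-memory pool standing in for the guacamol training set.
pool = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
sampler = RandomSmilesSampler(pool, seed=0)
samples = sampler.generate(10)
print(samples)
```

Because the sampler ignores any scoring function, its results only show what the training set achieves by chance.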

To execute the goal-directed generation benchmarks:

python -m random_smiles_sampler.goal_directed_generation

To execute the distribution learning benchmarks:

python -m random_smiles_sampler.distribution_learning

Best from ChEMBL

Dummy baseline that simply returns the molecules from the guacamol training set that best satisfy the score of a goal-directed benchmark.
There is no model and no training; its only purpose is to establish a lower bound on the benchmark scores.
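
The selection step can be pictured as a plain top-k search over the training set. The sketch below is only an illustration under stated assumptions: `best_from_pool` and `toy_score` are hypothetical names, and the toy score (string length) stands in for a real goal-directed benchmark score.

```python
import heapq
from typing import Callable, List


def best_from_pool(smiles_pool: List[str],
                   score: Callable[[str], float],
                   number_molecules: int) -> List[str]:
    # Return the highest-scoring molecules, best first.
    return heapq.nlargest(number_molecules, smiles_pool, key=score)


def toy_score(smiles: str) -> float:
    # Hypothetical objective: longer SMILES strings score higher.
    return float(len(smiles))


# A tiny in-memory pool standing in for the guacamol training set.
pool = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
top = best_from_pool(pool, toy_score, 2)
print(top)
```

Any method that searches or learns should be expected to match or exceed the scores found this way.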

To execute the goal-directed generation benchmarks:

python -m best_from_chembl.goal_directed_generation

No distribution learning benchmark available.

SMILES GA

Genetic algorithm on SMILES as described in: https://www.journal.csj.jp/doi/10.1246/cl.180665

Implementation adapted from: https://github.com/tsudalab/ChemGE

To execute the goal-directed generation benchmarks:

python -m smiles_ga.goal_directed_generation

No distribution learning benchmark available.

Graph GA

Genetic algorithm on molecule graphs as described in: https://doi.org/10.26434/chemrxiv.7240751

Implementation adapted from: https://github.com/jensengroup/GB-GA

To execute the goal-directed generation benchmarks:

python -m graph_ga.goal_directed_generation

No distribution learning benchmark available.

Graph MCTS

Monte Carlo Tree Search on molecule graphs as described in: https://doi.org/10.26434/chemrxiv.7240751

Implementation adapted from: https://github.com/jensengroup/GB-GB

To execute the goal-directed generation benchmarks:

python -m graph_mcts.goal_directed_generation

To execute the distribution learning benchmarks:

python -m graph_mcts.distribution_learning

To re-generate the distribution statistics as pickle files:

python -m graph_mcts.analyze_dataset

SMILES LSTM Hill Climbing

Long short-term memory on SMILES as described in: https://arxiv.org/abs/1701.01329

This implementation optimizes using a hill-climbing algorithm.
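
To give a feel for hill climbing with a generative model, the sketch below assumes the usual loop: sample from the model, keep the best-scoring samples, refit the model on them, and repeat. It does not use an LSTM or real molecules. The one-parameter toy model (`p_carbon`), the function names, and the toy score are all assumptions chosen so the example runs with the standard library alone.

```python
import random


def sample(p_carbon: float, length: int, rng: random.Random) -> str:
    # Toy "model": each position is "C" with probability p_carbon, else "N".
    return "".join("C" if rng.random() < p_carbon else "N" for _ in range(length))


def toy_score(s: str) -> int:
    # Hypothetical objective: number of "C" characters in the string.
    return s.count("C")


def hill_climb(rounds: int = 5, n_samples: int = 200, keep_top: int = 20,
               length: int = 10, seed: int = 0) -> float:
    rng = random.Random(seed)
    p_carbon = 0.5  # initial model parameter
    for _ in range(rounds):
        # 1. Sample a batch from the current model.
        samples = [sample(p_carbon, length, rng) for _ in range(n_samples)]
        # 2. Keep only the best-scoring samples.
        best = sorted(samples, key=toy_score, reverse=True)[:keep_top]
        # 3. "Fine-tune": refit the single parameter to the kept samples.
        p_carbon = sum(toy_score(s) for s in best) / (keep_top * length)
    return p_carbon


final_p = hill_climb()
print(final_p)
```

In this toy setting the parameter drifts toward the region that scores well, which is the same mechanism an LSTM fine-tuned on its own best samples would rely on.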

Implementation by BenevolentAI

A pre-trained model is provided in: smiles_lstm/pretrained_model

To execute the goal-directed generation benchmarks:

python -m smiles_lstm_hc.goal_directed_generation

To execute the distribution learning benchmark:

python -m smiles_lstm_hc.distribution_learning

To train a model from scratch:

python -m smiles_lstm_hc.train_smiles_lstm_model

SMILES LSTM PPO

Long short-term memory on SMILES as described in: https://arxiv.org/abs/1701.01329

This implementation optimizes using the proximal policy optimization (PPO) algorithm.

Implementation by BenevolentAI

A pre-trained model is provided in: smiles_lstm/pretrained_model

To execute the goal-directed generation benchmarks:

python -m smiles_lstm_ppo.goal_directed_generation
