REINVENT benchmarking

Code is adapted from MarcusOlivecrona/REINVENT.

Updated for use with PyTorch v1.10.0. Made for use on CC cluster.

Environment

All required imports and modules are found in make_env.sh. Create and activate the virtual environment using:

./make_env.sh
source ~/env/reinvent/bin/activate

Prepare data

Put the dataset into the data/ directory. Write the SMILES of the dataset to data/data.smi by running

python prepare_data.py

You may need to modify the script depending on the format of the dataset.
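As one illustration of such a modification, the sketch below assumes the dataset is a CSV file with a header row and a column named smiles. The file name data/dataset.csv, the column name, and the helper csv_to_smi are hypothetical and are not taken from prepare_data.py.

```python
import csv
from pathlib import Path


def csv_to_smi(csv_path, smi_path, column="smiles"):
    """Copy one CSV column into a .smi file, one SMILES per line.

    The column name is an assumption about the dataset, not part of the repo.
    Returns the number of SMILES written.
    """
    count = 0
    with open(csv_path, newline="") as src, open(smi_path, "w") as dst:
        for row in csv.DictReader(src):
            smiles = (row.get(column) or "").strip()
            if smiles:  # skip rows with an empty or missing cell
                dst.write(smiles + "\n")
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical input file; adjust to the actual dataset.
    source = Path("data/dataset.csv")
    if source.exists():
        print(csv_to_smi(source, "data/data.smi"))
```

Other formats (SDF, tab-separated files, files without a header) would need a different reader, but the output stays the same: one SMILES string per line.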

Then run

python data_structs.py

to create the mols_filtered.smi and Voc files required in the data/ directory. Note that the original filtering from REINVENT 1.0 has been removed; SMILES are only checked for validity and canonicalised.
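To give a rough idea of what a vocabulary file holds, the sketch below collects the distinct tokens in a list of SMILES. It is a simplified illustration, not the tokenisation used in data_structs.py: the regular expression, the helper names tokenize and build_vocabulary, and the decision to ignore multi-digit ring closures such as %10 are all assumptions.

```python
import re

# Bracket atoms such as [NH3+], then two-letter halogens, then any single character.
TOKEN_PATTERN = re.compile(r"\[[^\[\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocabulary(smiles_list):
    """Return the sorted list of distinct tokens seen in smiles_list."""
    tokens = set()
    for smiles in smiles_list:
        tokens.update(tokenize(smiles))
    return sorted(tokens)


vocab = build_vocabulary(["CCO", "c1ccccc1Cl", "C[NH3+]"])
print(vocab)  # ['1', 'C', 'Cl', 'O', '[NH3+]', 'c']
```

Treating Cl and Br as single tokens keeps the model from having to learn that "C" followed by "l" is one atom.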

Train model

Pretrain the model for specified num-epochs by running:

python train_prior.py --num-epochs 100

Pass the --verbose flag to show a progress bar.

Early stopping is implemented. The default split is 80/20, with the first 80% of the SMILES used for training. The learning rate is decreased every 50 steps. The model is saved to data/Prior.ckpt.
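The split and the early-stopping idea can be sketched in plain Python as follows. This is a minimal illustration and not the code in train_prior.py: the helper names, the patience value, and the use of validation loss as the stopping criterion are assumptions.

```python
def split_first_fraction(smiles, train_fraction=0.8):
    """Use the first train_fraction of the list for training, the rest for validation."""
    cut = int(len(smiles) * train_fraction)
    return smiles[:cut], smiles[cut:]


class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=3):  # patience is a hypothetical setting
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


train, valid = split_first_fraction([f"mol{i}" for i in range(10)])
print(len(train), len(valid))  # 8 2

stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate([1.0, 0.8, 0.7, 0.75, 0.72, 0.71]):
    if stopper.should_stop(loss):
        print("stopping at epoch", epoch)  # stopping at epoch 5
        break
```

Because the split takes the first 80% rather than a random sample, shuffling data/data.smi beforehand may matter if the dataset is ordered.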

Fitness function

The fitness function is a class in custom.py. Modify it by changing the fitness_function function, which takes a SMILES string and returns a float value. Return None for invalid SMILES.
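A toy version of that contract (SMILES in, float out, None for invalid input) is sketched below. The class name ToyFitness, the method signature, the parenthesis-balance check, and the aromatic-character score are all hypothetical stand-ins. In practice the validity check would usually parse the string with a cheminformatics toolkit (for example, RDKit's Chem.MolFromSmiles returns None when parsing fails), and the score would be a real property.

```python
class ToyFitness:
    """Hypothetical stand-in for the fitness class in custom.py."""

    def fitness_function(self, smiles):
        """Return a float fitness for a SMILES string, or None if it looks invalid."""
        if not self._looks_valid(smiles):
            return None
        # Toy score: fraction of characters that are lower-case 'c'
        # (a crude proxy for aromatic carbon content).
        return smiles.count("c") / len(smiles)

    @staticmethod
    def _looks_valid(smiles):
        """Crude check only: non-empty string with balanced parentheses."""
        if not isinstance(smiles, str) or not smiles:
            return False
        depth = 0
        for ch in smiles:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False
        return depth == 0


fitness = ToyFitness()
print(fitness.fitness_function("c1ccccc1"))  # 0.75
print(fitness.fitness_function("CC("))       # None
```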

The values are automatically scaled by a sigmoid. For a sampled molecule m with fitness f, the score is (invalid molecules receive 0 reward and do not affect the agent)

$$s(m) = \frac{1}{1 + e^{-b (f - a)}}$$

where a is the average of all known fitnesses from previous generations, and b is the slope of the sigmoid, calculated from a threshold.

For a set F of fitnesses from previous generations,

$$b = \frac{1}{\max(F) - a} \ln\left(\frac{c}{1 - c}\right)$$

where c is the specified threshold (default 0.8). This means that, for the current fitnesses F, the maximum fitness maps to c (0.8 by default) on the sigmoid. The smaller the threshold c, the stronger the reward for fitnesses larger than those in F.
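The scaling can be sketched numerically as follows. The sketch assumes the sigmoid is the standard logistic function centred on the mean a, with the slope b chosen so that the largest known fitness maps exactly to the threshold c. The function names are hypothetical, and the code in custom.py may differ in detail.

```python
import math


def sigmoid_parameters(known_fitnesses, threshold=0.8):
    """Return (a, b): the mean of the known fitnesses and the slope that
    maps max(known_fitnesses) to `threshold` on the logistic curve."""
    a = sum(known_fitnesses) / len(known_fitnesses)
    spread = max(known_fitnesses) - a
    if spread <= 0:
        raise ValueError("need at least two distinct fitness values")
    b = math.log(threshold / (1.0 - threshold)) / spread
    return a, b


def scaled_score(fitness, a, b):
    """Logistic score in (0, 1); invalid molecules (None) score 0."""
    if fitness is None:
        return 0.0
    return 1.0 / (1.0 + math.exp(-b * (fitness - a)))


known = [0.2, 0.4, 0.9]  # fitnesses from previous generations (made-up values)
a, b = sigmoid_parameters(known, threshold=0.8)
for f in (0.2, 0.9, 1.2, None):
    print(f, round(scaled_score(f, a, b), 3))
```

Under these assumptions, lowering the threshold flattens the curve around max(F), which leaves more of the (c, 1) range for molecules that beat every previously seen fitness.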

All other parameters are left as default from REINVENT 1.0.

Run REINVENT

Run REINVENT for num-steps generations, with batch-size evaluations per generation, by running the script:

python main.py --num-steps 10 --batch-size 500

Results are saved in data/results, in a folder named after the timestamp of the run. This folder contains a plot of the optimisation trace, the results for each generation (results.csv), and the best molecule per generation (best_results.csv).
