Hill Climbing LSTM

Use an LSTM coupled with a hill-climbing algorithm to optimize a target property of generated molecules.

This code is part of a benchmarking project for molecular discovery using generative models.

Training the model

Train the model by running

python train.py

In train.py, set data_path and string_type (selfies or smiles).

data_path can point to a CSV or plain-text file. The data are read with the utils.get_lists() function. Specify the following:

  • data_path: path to the data file
  • sep (default=' '): column delimiter in the file
  • header (default=None): row number used for the column names (None if the file has no header row)
  • smiles_name (default=0): name of the column containing the molecule SMILES (if header=None, use smiles_name=0)
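
For illustration, here is a minimal sketch of how these keywords interact when pulling the molecule column out of a delimited file. It is a hypothetical stand-in written with the standard-library csv module, not the repository's utils.get_lists(); the function name read_molecule_strings is invented, and it assumes header is the index of the header row.

```python
import csv
import io
from typing import List, Optional, Union


def read_molecule_strings(
    text: str,
    sep: str = " ",
    header: Optional[int] = None,
    smiles_name: Union[int, str] = 0,
) -> List[str]:
    """Return the molecule strings from one column of delimited text.

    Hypothetical stand-in for utils.get_lists(): it only mimics the
    documented keywords. header is assumed to be the index of the row
    holding column names, or None when the file has no header row.
    """
    # Parse every non-empty row using the given single-character delimiter.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=sep) if row]
    if header is None:
        # No header row: smiles_name must be a column index.
        col = int(smiles_name)
        data_rows = rows
    else:
        names = rows[header]
        # Look the column up by name, or accept a plain index.
        col = names.index(smiles_name) if isinstance(smiles_name, str) else smiles_name
        data_rows = rows[header + 1:]
    return [row[col] for row in data_rows]
```

To read from disk, pass the file contents, e.g. read_molecule_strings(open(data_path).read(), sep=",", header=0, smiles_name="smiles").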

Climbing

Run hill climbing on the trained LSTM with

python climb.py

In climb.py, modify fitness_function, which takes a SMILES string and returns the target value. Also set data_path (see above), model_path (where the trained LSTM is stored), string_type (selfies or smiles), and out_path (where the climbing results are written).
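
As a sketch of the expected interface only, the toy fitness_function below maps a SMILES string to a float. The scoring rule (count lowercase "c" as aromatic carbons, minus a small length penalty) is an invented placeholder; a real objective would compute an actual molecular property, typically with a cheminformatics toolkit.

```python
def fitness_function(smiles: str) -> float:
    """Toy placeholder objective: reward aromatic carbons, penalize length.

    Invented for illustration; any callable mapping a SMILES string to a
    float matches the interface described above.
    """
    # Lowercase "c" marks an aromatic carbon in SMILES ("Cl" is unaffected).
    aromatic_carbons = smiles.count("c")
    # Small penalty so longer strings are not automatically favored.
    return aromatic_carbons - 0.1 * len(smiles)
```

Higher values are treated as better in this sketch, so benzene ("c1ccccc1") scores above hexane ("CCCCCC").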

The sampling strategy is controlled by the following keywords:

  • num_generation: number of climbing generations
  • num_best: number of top-scoring molecules (k) kept after sorting by fitness
  • num_randomize: number of randomized SELFIES generated from each of the top-k molecules (the SMILES LSTM cannot handle randomized SMILES, so each SMILES is simply duplicated num_randomize times)
  • samps_per_seed: number of times each randomized seed is sampled
  • num_seed_chars: number of characters of the seed used as the starting sequence for sampling. If None, a random number of characters between 25% and 75% of the string length is used.
  • temperature: controls the randomness of sampling (higher is more random)
  • retrain: boolean specifying whether the model is retrained on the newly sampled molecules at each generation
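
The num_seed_chars and temperature keywords can be illustrated with the sketch below. It assumes the seed is a prefix of the string and that sampling draws from softmax(logits / temperature), the standard temperature-sampling scheme; the helper names seed_prefix and sample_with_temperature are hypothetical and not taken from climb.py.

```python
import math
import random
from typing import List, Optional, Sequence


def seed_prefix(tokens: Sequence[str], num_seed_chars: Optional[int],
                rng: random.Random) -> List[str]:
    """Return the leading characters used to prime the model (hypothetical helper).

    If num_seed_chars is None, pick a random length between 25% and 75%
    of the sequence length, as described for the num_seed_chars keyword.
    """
    if num_seed_chars is None:
        low = max(1, int(0.25 * len(tokens)))
        high = max(low, int(0.75 * len(tokens)))
        num_seed_chars = rng.randint(low, high)  # inclusive on both ends
    return list(tokens[:num_seed_chars])


def sample_with_temperature(logits: Sequence[float], temperature: float,
                            rng: random.Random) -> int:
    """Draw an index from softmax(logits / temperature) (hypothetical helper)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

As the temperature approaches zero the draw approaches the argmax of the logits; large temperatures flatten the distribution toward uniform, which is why higher values give more random samples.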

Note that the total number of sampled molecules per generation is num_best * num_randomize * samps_per_seed.
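
One generation of the loop can be sketched as follows, which also makes the sample count above concrete. This is an illustrative outline, not the code in climb.py: it assumes higher fitness is better, and randomize and sample are hypothetical hooks standing in for the SELFIES randomization and the LSTM sampler. Retraining (the retrain keyword) would happen between calls and is omitted.

```python
from typing import Callable, List, Sequence


def climb_one_generation(
    population: Sequence[str],
    fitness: Callable[[str], float],
    randomize: Callable[[str, int], List[str]],
    sample: Callable[[str], str],
    num_best: int,
    num_randomize: int,
    samps_per_seed: int,
) -> List[str]:
    """One hill-climbing step: select, randomize, sample (illustrative sketch)."""
    # 1. Keep the top-k molecules, assuming larger fitness is better.
    best = sorted(population, key=fitness, reverse=True)[:num_best]
    # 2. Expand each survivor into num_randomize seeds.
    seeds = [seed for mol in best for seed in randomize(mol, num_randomize)]
    # 3. Sample samps_per_seed new molecules from every seed.
    return [sample(seed) for seed in seeds for _ in range(samps_per_seed)]
```

With the SMILES model, randomize reduces to duplication (lambda mol, n: [mol] * n), and the returned list always has num_best * num_randomize * samps_per_seed entries.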

Contributors

  • gkwt
