Git Product home page Git Product logo

solvbert's Introduction

SolvBERT

Description

This is the code for "SolvBERT for solvation free energy and solubility prediction: a demonstration of an NLP model for predicting the properties of molecular complexes" paper. The preprint of this paper can be found in ChemRxiv with https://doi.org/10.26434/chemrxiv-2022-0hl5p

Installation

  1. python 3.7.12
  2. transformers 2.11.0
  3. wandb 0.12.15
  4. tokenizers 0.7.0
  5. rxnfp 0.0.7

use pip install solv-bert==0.0.7 install this package

Dataset

CombiSolv-QM. The CombiSolv-QM dataset originally came from a study by Vermeire et al. who computed the dataset using a commercial software called COSMOtherm. The dataset consists of 1 million datapoints randomly selected from all possible combinations of 284 commonly used solvents and 11,029 solutes.

CombiSolv-Exp-8780. We managed to download a portion of the CombiSolv-Exp dataset, which originally contains experimental solvent free energy data for 10,145 different solute and solvents combinations from Vermeire et al. The dataset was curated from multiple sources, including the Minnesota Solvation Database, the FreeSolv database, the CompSol database, and a dataset published by Abraham et al. We named the downloaded subset containing 8,780 combinations as CombiSolv-Exp-8780 to distinguish it from the original CombiSolv-Exp dataset.

Solubility. The Solubility dataset was originally from Boobier et al. It was curated from the Open Notebook Science Challenges water solubility dataset and the Reaxys database. This dataset includes ethanol with 695 solutes, benzene with 464 solutes, acetone with 452 solutes, and water with 900 solutes, for a total of 2,511 different combinations, with solubility expressed as log S.

Train

Model use the SMILESlanguagemodel.py to train and the Finetune.py to fine-tune. Each pre-training (SMILESlanguagemodel.py) needs to save the best model for the next step of fine-tune(Finetune.py).

Test

Model use the Predictandeval.py to shart testing. In this step, we also need to save the best model trained in the previous step (Finetune.py) for prediction and evaluation

solvbert's People

Contributors

yujiahui-icon avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.