This project (drmaruyama/solvbert) is forked from su-group/solvbert.

This is the code for the paper "SolvBERT for solvation free energy and solubility prediction: a demonstration of an NLP model for predicting the properties of molecular complexes". The preprint is available on ChemRxiv at https://doi.org/10.26434/chemrxiv-2022-0hl5p

SolvBERT

Description

Installation

python 3.7.12
transformers 2.11.0
wandb 0.12.15
tokenizers 0.7.0
rxnfp 0.0.7

Use pip install solv-bert==0.0.7 to install this package.
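The versions above are exact pins. One way to confirm that an environment matches them is sketched below; the helper names installed_version, find_mismatches and PINNED are hypothetical and not part of solv-bert. The sketch uses importlib.metadata (Python 3.8+) and falls back to pkg_resources on Python 3.7, which lacks that module.

```python
# Minimal sketch: compare installed package versions with the versions listed
# in the Installation section. Helper names are hypothetical, not solv-bert API.
import sys

# Versions listed in this README (Python itself is reported separately).
PINNED = {
    "transformers": "2.11.0",
    "wandb": "0.12.15",
    "tokenizers": "0.7.0",
    "rxnfp": "0.0.7",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:  # Python 3.7: fall back to setuptools' pkg_resources
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])), "(README lists 3.7.12)")
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The lookup function is a parameter so the comparison logic can be exercised without touching the real environment.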

Dataset

CombiSolv-QM. The CombiSolv-QM dataset originally came from a study by Vermeire et al., who computed it with the commercial software COSMOtherm. It consists of 1 million data points randomly selected from all possible combinations of 284 commonly used solvents and 11,029 solutes.
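For scale, 284 solvents × 11,029 solutes gives 3,132,236 possible pairs, so 1 million data points cover roughly a third of the grid. The sketch below illustrates drawing distinct (solvent, solute) pairs uniformly at random by sampling flat indices and decoding them with divmod. It is an illustration only, not the procedure Vermeire et al. used, and sample_pairs is a hypothetical name.

```python
# Illustrative sketch: draw distinct (solvent, solute) index pairs uniformly at
# random from the full solvent x solute grid. Not the original authors' procedure.
import random

N_SOLVENTS = 284
N_SOLUTES = 11_029
N_SAMPLES = 1_000_000


def sample_pairs(n_solvents, n_solutes, n_samples, seed=0):
    """Return n_samples distinct (solvent_idx, solute_idx) pairs."""
    rng = random.Random(seed)
    total = n_solvents * n_solutes
    # Sampling without replacement from the flat index range guarantees uniqueness.
    flat = rng.sample(range(total), n_samples)
    # Decode each flat index: quotient = solvent index, remainder = solute index.
    return [divmod(i, n_solutes) for i in flat]


if __name__ == "__main__":
    total = N_SOLVENTS * N_SOLUTES
    print(f"{total:,} possible pairs; {N_SAMPLES / total:.1%} sampled")
    pairs = sample_pairs(N_SOLVENTS, N_SOLUTES, N_SAMPLES)
    print(pairs[:3])
```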

CombiSolv-Exp-8780. We were able to download a portion of the CombiSolv-Exp dataset from Vermeire et al., which originally contains experimental solvation free energy data for 10,145 different solute-solvent combinations. The dataset was curated from multiple sources, including the Minnesota Solvation Database, the FreeSolv database, the CompSol database, and a dataset published by Abraham et al. We named the downloaded subset, which contains 8,780 combinations, CombiSolv-Exp-8780 to distinguish it from the original CombiSolv-Exp dataset.

Solubility. The Solubility dataset originally came from Boobier et al., who curated it from the Open Notebook Science Challenges water solubility dataset and the Reaxys database. It covers four solvents: ethanol (695 solutes), benzene (464 solutes), acetone (452 solutes), and water (900 solutes), for a total of 2,511 solute-solvent combinations, with solubility expressed as log S.
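The per-solvent counts above can be tallied to check the total and to see how unevenly the four solvents are represented (water contributes the largest share). The dictionary below only restates the figures given in this section.

```python
# Tally of the Solubility dataset counts stated above: solutes per solvent.
SOLUTES_PER_SOLVENT = {
    "ethanol": 695,
    "benzene": 464,
    "acetone": 452,
    "water": 900,
}

total = sum(SOLUTES_PER_SOLVENT.values())

if __name__ == "__main__":
    for solvent, n in SOLUTES_PER_SOLVENT.items():
        # Share of all solute-solvent combinations contributed by this solvent.
        print(f"{solvent:<8}{n:>5}  ({n / total:.1%})")
    print(f"{'total':<8}{total:>5}")
```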

Train

Pre-train the model with SMILESlanguagemodel.py and fine-tune it with Finetune.py. Each pre-training run (SMILESlanguagemodel.py) must save its best model, which is the starting point for the fine-tuning step (Finetune.py).
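The hand-off between the two stages amounts to "pick the best checkpoint from stage 1, then start stage 2 from it". A minimal sketch of that bookkeeping follows, assuming "best" means lowest validation loss; the Checkpoint type, paths and loss values are hypothetical and made up for illustration, and the real scripts handle model saving themselves.

```python
# Minimal sketch of "keep the best model" bookkeeping between pre-training and
# fine-tuning. Assumption: best = lowest validation loss. The Checkpoint type,
# paths and loss values are hypothetical, not taken from the SolvBERT scripts.
from dataclasses import dataclass
from typing import List


@dataclass
class Checkpoint:
    epoch: int
    val_loss: float
    path: str


def best_checkpoint(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Return the checkpoint with the lowest validation loss (ties go to the earliest epoch)."""
    if not checkpoints:
        raise ValueError("no checkpoints to choose from")
    return min(checkpoints, key=lambda c: (c.val_loss, c.epoch))


if __name__ == "__main__":
    # Stage 1: pre-training on SMILES (SMILESlanguagemodel.py); made-up losses.
    pretrain_runs = [
        Checkpoint(1, 1.92, "pretrain/epoch-1"),
        Checkpoint(2, 1.41, "pretrain/epoch-2"),
        Checkpoint(3, 1.47, "pretrain/epoch-3"),
    ]
    best_pretrained = best_checkpoint(pretrain_runs)
    # Stage 2: fine-tuning (Finetune.py) starts from the best pre-trained weights.
    print("Fine-tune from:", best_pretrained.path)
```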

Test

Run Predictandeval.py to start testing. This step also requires the best model saved in the previous step (Finetune.py), which is used for prediction and evaluation.
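Solvation free energy and log S are continuous targets, so evaluation typically relies on standard regression metrics. The sketch below computes MAE, RMSE and R² in plain Python. These are generic metrics chosen for illustration; check Predictandeval.py for the exact metrics it reports. The function name regression_metrics is hypothetical.

```python
# Sketch of common regression metrics (MAE, RMSE, R^2) for evaluating predictions.
# Assumption: generic metrics, not necessarily those reported by Predictandeval.py.
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return mean absolute error, root-mean-square error and R^2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    # Toy values for illustration only, not SolvBERT results.
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```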
