SYNTHESIS

This directory contains data and results from the article:

Tetko, I. V.; Karpov, P.; Van Deursen, R.; Godin, G. Augmented Transformer for Direct and Single-Step Retrosynthesis Predictions Overperforms All Published Approaches. Nat. Commun. 2020, submitted.

The data used to develop the models and to predict the test sets are provided in the respective directories:

USPTO-50k Schneider, N.; Lowe, D. M.; Sayle, R. A.; Tarselli, M. A.; Landrum, G. A., Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists' Bread and Butter. J. Med. Chem. 2016, 59 (9), 4385-402.

USPTO-MIT Jin, W.; Coley, C.; Barzilay, R.; Jaakkola, T. Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc., 2017; pp 2607–2616.

USPTO-full Dai, H.; Li, C.; Coley, C.; Dai, B.; Song, L. Retrosynthesis Prediction with Conditional Graph Logic Network. In Advances in Neural Information Processing Systems 32; Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc., 2019; pp 8872–8882.

For the latter set, we removed reactions that contained no products or only single ions as reactants (e.g., US08163899B2,>>[OH2:11], US06048982,CC(=O)OCCCCC[I:22]>>[I-:22], US07425593B2,>>[K:12], US08114877B2,CC[I:13]>>[I-:13]). We also eliminated reactions in which the reactants had fewer than five atoms in total, since these were likely to be incorrect.
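The filtering rules above could be sketched roughly as follows. This is not the authors' code. It assumes the reaction part of each record is a `reactants>reagents>products` SMILES string (the text after the patent ID and comma), it counts only explicitly written atoms with a regular expression instead of a chemistry toolkit, and it drops a reaction when either side is empty or consists of a single charged atom. The names `count_atoms`, `is_single_ion`, and `keep_reaction` are hypothetical.

```python
import re

# One atom token: a bracket atom such as [I-:22], a two-letter organic-subset
# element (Cl, Br), or a one-letter atom (aromatic lowercase forms included).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")

# A lone charged bracket atom, optionally with hydrogens and an atom-map number.
SINGLE_ION = re.compile(r"^\[[A-Za-z]{1,2}(H\d*)?[+-]+\d*(:\d+)?\]$")


def count_atoms(smiles: str) -> int:
    """Approximate count of explicitly written atoms (assumption: regex only)."""
    return len(ATOM_TOKEN.findall(smiles))


def is_single_ion(smiles: str) -> bool:
    """True if the whole string is one charged bracket atom, e.g. [I-:22]."""
    return SINGLE_ION.match(smiles) is not None


def keep_reaction(rxn: str, min_reactant_atoms: int = 5) -> bool:
    """Return True if a reaction SMILES passes the sketched filter."""
    parts = rxn.split(">")
    if len(parts) != 3:
        return False  # not a well-formed reaction SMILES
    reactants, _reagents, products = parts
    if not reactants or not products:
        return False  # one side is missing entirely
    if is_single_ion(reactants) or is_single_ion(products):
        return False  # one side is only a single ion
    return count_atoms(reactants) >= min_reactant_atoms
```

All four example reactions quoted above are rejected by `keep_reaction`, while an ordinary reaction such as `CC(=O)O.OCC>>CC(=O)OCC` is kept. A real pipeline would more likely count atoms with a cheminformatics toolkit.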

The modified training and test sets are uspto_trainT.csv and uspto_testT.csv.

The augmented versions of these files were used for training (5x augmentation) and testing (100x augmentation) of the models.

All files are compressed using xz. On a Mac, xz can be installed, e.g., via MacPorts; on some Linux systems, e.g., Ubuntu, it is available by default.
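If installing the xz command-line tool is inconvenient, the compressed files can also be read directly with Python's standard `lzma` module. The sketch below assumes the files are comma-separated text. The helper name `read_xz_csv` is hypothetical.

```python
import csv
import lzma
from typing import List


def read_xz_csv(path: str) -> List[List[str]]:
    """Read all rows of an xz-compressed CSV file without unpacking it to disk."""
    # Text mode ("rt") lets the csv module consume decoded lines directly.
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))
```

For example, `rows = read_xz_csv("uspto_trainT.csv.xz")` would load the training set, assuming the compressed file carries the usual `.xz` suffix. For very large files, iterating over `csv.reader(handle)` row by row instead of building a list keeps memory use low.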

These sets and results are described in our preprint, which is currently under review.

The USPTO-full and USPTO-MIT result files were too large (>100 MB when archived) to upload to GitHub, so we uploaded a subset of the test file with 20x augmentation instead of 100x. The full test files are available from the authors on request.

The accuracy of model predictions with 20x and 100x augmentation was very similar, differing by no more than 0.1% for top-1 predictions.

The models will be made available after acceptance of the article.

The script compare.pl can be used to calculate statistical performance from canonical files, e.g.

perl compare.pl patents_test100.csv.can result_patents_test100.csv.can 1

    will calculate top-1 performance

perl compare.pl patents_test100.csv.can result_patents_test100.csv.can 2

    will calculate top-2 performance, and so on. Other processing options are listed in the script output.
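For readers unfamiliar with the metric, top-k performance can be illustrated with a short Python sketch. This is not the logic of compare.pl, whose input format and options are not described here. It assumes each test entry has one target string and a list of predictions already ranked best-first, and that both are canonical so they can be compared by plain string equality. The function name `top_k_accuracy` is hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    targets: Sequence[str],
    ranked_predictions: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of targets that appear among the first k ranked predictions."""
    if len(targets) != len(ranked_predictions):
        raise ValueError("targets and ranked_predictions must have equal length")
    if not targets:
        return 0.0
    hits = sum(
        1
        for target, predictions in zip(targets, ranked_predictions)
        if target in predictions[:k]
    )
    return hits / len(targets)
```

With `k=1` only the highest-ranked prediction counts as a hit; with `k=2` a correct answer in either of the first two positions counts, which is why top-k accuracy can only stay the same or increase as k grows.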
