Patent Reaction Extractor

A presentation on this software is available here

Reactions extracted using this software in collaboration with NextMove Software, covering US patents from 1976 to September 2016, are freely available here

NextMove Software commercially provides an up-to-date database of automatically extracted reactions as part of their Pistachio product.

Older results from this software are available here

This software is licensed under the GPLv3 for compatibility with Epam's Indigo toolkit


Instructions for use

The system takes as input either an XML patent (recent USPTO and EPO patents tested as working) or a list of "heading" and "p" elements in the order they appear in a document.

For the former use case, where inputStream is an InputStream from an XML patent:

Document doc = Utils.buildXmlFile(inputStream);
ReactionExtractor extractor = new ReactionExtractor(doc);
extractor.extractReactions();
Map<Reaction, IndigoObject> completeReactions = extractor.getAllCompleteReactions();

completeReactions are those for which an atom map accounting for the origins of all atoms in the product(s) could be found. The returned map is keyed by Reaction objects, which can be inspected or trivially serialised to CML via their toCML() method. The IndigoObjects are Indigo reactions (created by the Indigo toolkit) that contain the unique, structure-resolvable components from the Reaction objects. They can be inspected to retrieve the atom mapping that Indigo assigned.

Utils.serializeReactions(outputDir, completeReactions) is a useful convenience method for serialising reactions to CML and graphical depictions


Advanced Usage

In the presentation, precision was enhanced by restricting the reactions to those that had no reactants/spectators/products with a ChemicalEntityType of chemicalClass or fragment. Additionally, all products were required to have been associated with a chemical structure (this can be checked with hasInchi() and hasSmiles()).
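
The filtering rule described above can be sketched as follows. This is only an illustration of the logic, not the library's API: the Chemical class, the Role and EntityType enums, and the passesFilter and keepPrecise methods are hypothetical stand-ins so that the example compiles on its own. It also assumes that having either an InChI or a SMILES is enough for a product to count as associated with a structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReactionFilterSketch {

    // Stand-in for the library's ChemicalEntityType. chemicalClass and fragment
    // are named in the text above; EXACT is an illustrative placeholder.
    enum EntityType { EXACT, CHEMICAL_CLASS, FRAGMENT }

    enum Role { REACTANT, SPECTATOR, PRODUCT }

    // Hypothetical, simplified view of one chemical taking part in a reaction.
    static final class Chemical {
        final Role role;
        final EntityType type;
        final boolean hasInchi;
        final boolean hasSmiles;

        Chemical(Role role, EntityType type, boolean hasInchi, boolean hasSmiles) {
            this.role = role;
            this.type = type;
            this.hasInchi = hasInchi;
            this.hasSmiles = hasSmiles;
        }
    }

    // A reaction (modelled here as a list of chemicals) passes when no chemical
    // is a chemical class or fragment and every product has a structure.
    static boolean passesFilter(List<Chemical> reaction) {
        for (Chemical c : reaction) {
            if (c.type == EntityType.CHEMICAL_CLASS || c.type == EntityType.FRAGMENT) {
                return false;
            }
            // Assumption: either identifier counts as "has a structure".
            if (c.role == Role.PRODUCT && !c.hasInchi && !c.hasSmiles) {
                return false;
            }
        }
        return true;
    }

    // Keep only the reactions that satisfy the rule above.
    static List<List<Chemical>> keepPrecise(List<List<Chemical>> reactions) {
        List<List<Chemical>> kept = new ArrayList<>();
        for (List<Chemical> reaction : reactions) {
            if (passesFilter(reaction)) {
                kept.add(reaction);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Chemical> specific = Arrays.asList(
                new Chemical(Role.REACTANT, EntityType.EXACT, true, true),
                new Chemical(Role.PRODUCT, EntityType.EXACT, true, false));
        List<Chemical> vague = Arrays.asList(
                new Chemical(Role.REACTANT, EntityType.CHEMICAL_CLASS, false, false),
                new Chemical(Role.PRODUCT, EntityType.EXACT, true, true));

        List<List<Chemical>> kept = keepPrecise(Arrays.asList(specific, vague));
        System.out.println("kept " + kept.size() + " reaction(s)");
    }
}
```

With the real library, the same checks would be applied to the Reaction keys of the map returned by getAllCompleteReactions(), using the actual accessors for each chemical's entity type and structure.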

ExtractOrganicChemistryPatents may be used to filter patents downloaded from Google (http://www.google.com/googlebooks/uspto-patents.html) to just organic chemistry patents.

Performance can be adjusted using extractor.setIndigoAtomMappingTimeout, which sets the maximum time that may be spent atom mapping a reaction.
