MaterialParser

A class that extracts chemical data from a string of chemical terms, formulas, or material names.

This parser was created to unify the material entities found in scientific publications and thereby facilitate text mining.

Material Parser functionality includes:

  • converting chemical terms into chemical formulas,
  • parsing chemical formulas into compositions,
  • constructing a dictionary of material abbreviations from text snippets,
  • finding values of stoichiometric and element variables from text snippets,
  • splitting mixtures/composites/alloys/solid solutions into compounds.

Installation:

git clone https://github.com/CederGroupHub/MaterialParser.git
cd MaterialParser
pip install -r requirements.txt -e .

Initialization:

from material_parser import MaterialParser
mp = MaterialParser(verbose=False, pubchem_lookup=False, fails_log=False)

Material parser

Initialization

verbose: <bool> print supplemental information
pubchem_lookup: <bool> look for unknown chemical names in PubChem (not implemented, slows down computations significantly)
fails_log: <bool> output a log of materials for which mp.parse_material failed (useful when a long list of materials is processed)

Primary functionality

  • Main method to compile a string of chemical terms/formulas into a data structure

    mp.parse_material_string(material_string)
    
  • Method to convert a chemical name into a formula

    mp.string2formula(material_string)
    
  • Method to compile a chemical formula into a data structure containing its composition

    mp.formula2composition(chemical_formula)
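
For simple formulas, the idea behind turning a formula into a composition can be sketched with a small tokenizer. This is a minimal illustration only, not MaterialParser's implementation: the function name `simple_composition` is hypothetical, and it handles only element symbols, numeric amounts, and parentheses (no variables, hydrates, or amounts such as 1-x). The data structure returned by `mp.formula2composition` may look different.

```python
import re
from collections import defaultdict

# Hypothetical helper for illustration; NOT part of MaterialParser.
# Tokens: an element with an optional amount, "(", or ")" with an optional multiplier.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?|(\()|(\))(\d+(?:\.\d+)?)?")


def simple_composition(formula):
    """Return {element: amount} for a formula made of elements, numbers, and parentheses."""
    stack = [defaultdict(float)]  # one dictionary per open parenthesis level
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, amount, opening, closing, multiplier = match.groups()
        if element:
            stack[-1][element] += float(amount) if amount else 1.0
        elif opening:
            stack.append(defaultdict(float))
        elif closing:
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()
            factor = float(multiplier) if multiplier else 1.0
            # Fold the closed group into the enclosing level.
            for symbol, value in group.items():
                stack[-1][symbol] += value * factor
        pos = match.end()
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return dict(stack[0])


print(simple_composition("Ca3(PO4)2"))
```

Keeping one dictionary per parenthesis level means a group multiplier is applied exactly once, when the group closes.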
    

Auxiliary functions

  • Extracting snippets of the string recognized as doped elements, stabilizers, coatings, activators, etc.

    mp.separate_additives(material_string)
    
  • Splitting mixtures, alloys, composites, etc. into a list of constituent compounds with their fractions

    mp.split_formula_into_compounds(material_string)
    
  • Extracting species from material string

    mp.get_species(material_string)
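
As a rough illustration of what splitting a mixture involves, the sketch below separates a hyphen-joined string such as 0.5BaTiO3-0.5SrTiO3 into compounds and their fractions. It is not the logic used by `mp.split_formula_into_compounds`: the function name `split_mixture` and the assumed input format (optional numeric prefix, hyphen as separator) are hypothetical, and an expression such as (1-x) would break this naive split.

```python
import re

# Hypothetical helper for illustration; NOT part of MaterialParser.
# Assumed format: "<fraction><formula>-<fraction><formula>-...", fraction optional.
PART = re.compile(r"^(\d+(?:\.\d+)?)?\s*(.+)$")


def split_mixture(material_string):
    """Return a list of (formula, fraction) pairs; a missing fraction defaults to 1.0."""
    compounds = []
    for part in material_string.split("-"):
        part = part.strip()
        if not part:
            continue  # skip empty pieces from doubled or trailing hyphens
        fraction, formula = PART.match(part).groups()
        compounds.append((formula, float(fraction) if fraction else 1.0))
    return compounds


print(split_mixture("0.5BaTiO3-0.5SrTiO3"))
```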
    

Additional functionality

  • Constructing a dictionary of acronyms from a provided list of material strings and a text

    mp.build_acronyms_dict(list_of_materials, text)
    
  • Looking for the values of element variables in the text

    mp.get_elements_values(variable, text)
    
  • Looking for the values of stoichiometric variables in the text

    mp.get_stoichiometric_values(variable, text)
    
  • Splitting a material string given in the format "list of cations + anion" into a list of chemical names

    mp.split_materials_list(material_string)
    
  • Substituting doped elements into the original chemical formula so that the total stoichiometry sums to an integer value

    mp.substitute_additives(list_of_additives, data_structure)
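
To give a feel for the text-mining helpers listed above, here is a minimal regex-based sketch of two of the tasks: collecting numeric values stated for a variable (as in "x = 0, 0.1 and 0.5") and pairing a parenthesized acronym with the words preceding it. Both functions, `find_variable_values` and `find_acronyms`, are hypothetical names written for this example and do not reproduce how `mp.get_stoichiometric_values` or `mp.build_acronyms_dict` work; in particular, the acronym sketch takes only a text and ignores the list of materials.

```python
import re

# Hypothetical helpers for illustration; NOT part of MaterialParser.
NUMBER = r"\d+(?:\.\d+)?"


def find_variable_values(variable, text):
    """Collect numbers listed after '<variable> =' (e.g. 'x = 0, 0.1 and 0.5')."""
    pattern = re.compile(
        rf"\b{re.escape(variable)}\s*=\s*((?:{NUMBER}(?:\s*(?:,|and)\s*)?)+)"
    )
    values = []
    for match in pattern.finditer(text):
        values.extend(float(v) for v in re.findall(NUMBER, match.group(1)))
    return values


def find_acronyms(text):
    """Map '(ABC)' to the preceding words whose initials spell the acronym."""
    acronyms = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbreviation = match.group(1)
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbreviation):]
        initials_match = len(candidate) == len(abbreviation) and all(
            word[0].upper() == letter for word, letter in zip(candidate, abbreviation)
        )
        if initials_match:
            acronyms[abbreviation] = " ".join(candidate)
    return acronyms


print(find_variable_values("x", "Li1-xNaxCoO2 (x = 0, 0.1, 0.25 and 0.5) was prepared."))
print(find_acronyms("Pellets of yttria-stabilized zirconia (YSZ) were sintered."))
```

The initials check is deliberately strict, so acronyms built from element symbols rather than word initials (for example, ones where lead is abbreviated as P) are not recognized by this sketch.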
    

Citing

If you use Material Parser in your work, please cite the following paper:

  • Kononova et al., "Text-mined dataset of inorganic materials synthesis recipes", Scientific Data 6 (1), 1-11 (2019), DOI: 10.1038/s41597-019-0224-1

Contributors

olgagkononova, hhaoyan, zherenwang, vtshitoyan
