Unstructured Text Parser [PHP]

About Unstructured Text Parser

This is a small PHP library for extracting text from documents that are not structured in a processing-friendly format. For example, to parse form-generated emails, you create a template that matches the expected mail format and marks the variable parts; the library then extracts those variables from the body text of each incoming mail.

Useful when you want to parse data out of:

  • Emails generated from web forms
  • Documents with definable templates / expressions

Installation

PHP Unstructured Text Parser is available on Packagist (using semantic versioning), and installation via Composer is recommended. Add the following line to the "require" section of your composer.json file:

"aymanrb/php-unstructured-text-parser": "~2.0"

or run

composer require aymanrb/php-unstructured-text-parser

Usage Example

<?php
include_once __DIR__ . '/../vendor/autoload.php';

$parser = new aymanrb\UnstructuredTextParser\TextParser('/path/to/templatesDirectory');

$textToParse = 'Text to be parsed, fetched from a file, mail, or web service, or even assigned directly to a string variable like this';

// Performs brute-force parsing against all available templates and returns the first successful match
$parseResults = $parser->parseText($textToParse);
print_r($parseResults->getParsedRawData());

// Slower: performs a similarity check on the available templates to select the best-matching one before parsing
print_r(
    $parser
        ->parseText($textToParse, true)
        ->getParsedRawData()
);
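
To illustrate what a similarity check before parsing could look like, here is a minimal sketch built on PHP's built-in similar_text(). It is not taken from the library's source: the pickMostSimilarTemplate() helper, the template names, and the sample text are all assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): returns the name of the
 * template whose contents are most similar to $text, or null when no
 * templates are given.
 *
 * @param array<string, string> $templates template name => template contents
 */
function pickMostSimilarTemplate(array $templates, string $text): ?string
{
    $bestName = null;
    $bestScore = -1.0;

    foreach ($templates as $name => $contents) {
        // similar_text() writes a similarity percentage (0-100) into its third argument.
        similar_text($contents, $text, $percent);
        if ($percent > $bestScore) {
            $bestScore = $percent;
            $bestName = (string) $name;
        }
    }

    return $bestName;
}

// Made-up templates and input, for illustration only.
$templates = [
    'contact.txt' => "Name: {%senderName%}\nE-Mail: {%senderEmail%}\nComment: {%comment%}",
    'invoice.txt' => "Invoice number: {%invoiceNumber%}\nAmount due: {%amount%}",
];

$incoming = "Name: Pet Cat\nE-Mail: cat@example.com\nComment: Some text goes here";

$selected = pickMostSimilarTemplate($templates, $incoming);
echo $selected, PHP_EOL;
```

Scoring every template costs time proportional to the number of templates, which is consistent with the similarity mode being described as the slower option.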

Parsing Procedure

1- Grab a single copy of the text you want to parse.

2- Replace every varying piece of text within it with a named variable in the form {%VariableName%}

3- Add the template file to the templates directory (defined in the parsing code) with a txt extension: fileName.txt

4- Pass the text you wish to parse to the parseText method of the class and let it do the magic for you.
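
The idea behind the steps above can be sketched as follows: each {%VariableName%} placeholder becomes a named capture group, and the rest of the template is matched literally. This is an illustration under assumptions, not the library's actual implementation; templateToRegex() and extractWithTemplate() are hypothetical helpers, and the sketch assumes every variable name is unique within a template.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: converts a template containing {%name%} placeholders
 * into a regular expression with one named capture group per placeholder.
 * Assumes each placeholder name appears only once in the template.
 */
function templateToRegex(string $template): string
{
    // Split on placeholders; the captured variable names land on the odd indexes.
    $parts = preg_split('/\{%([A-Za-z_]\w*)%\}/', $template, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    foreach ($parts as $index => $part) {
        $regex .= ($index % 2 === 1)
            ? '(?<' . $part . '>.*?)'   // placeholder: lazy named capture
            : preg_quote($part, '/');   // literal text: escape regex metacharacters
    }

    // The "s" modifier lets a captured value span multiple lines.
    return '/^' . $regex . '$/s';
}

/**
 * Hypothetical helper: returns the extracted variables as name => value,
 * or null when the text does not match the template.
 */
function extractWithTemplate(string $template, string $text): ?array
{
    if (preg_match(templateToRegex($template), $text, $matches) !== 1) {
        return null;
    }

    // preg_match() also returns numeric indexes; keep only the named groups.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

$template = "Name: {%senderName%}\nComment: {%comment%}";
$text = "Name: Pet Cat\nComment: Some text goes here";

$data = extractWithTemplate($template, $text);
print_r($data);
```

Because the literal parts must match exactly, a template written this way only fits text that keeps the same fixed wording, which is why the procedure starts from a real copy of the text.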

Template Example

If the text document you want to parse looks like this:

Hi GitHub-er,
If you wish to parse message coming from a website that states info like:
Name: Pet Cat
E-Mail: [email protected]
Comment: Some text goes here

Thank You,
Best Regards
Admin

Your template file (example_template.txt) could look something like this:

Hi {%nameOfRecipient%},
If you wish to parse message coming from a website that states info like:
Name: {%senderName%}
E-Mail: {%senderEmail%}
Comment: {%comment%}

Thank You,
Best Regards
Admin

The output of a successful parsing job would be:

Array(
    'nameOfRecipient' => 'GitHub-er',
    'senderName' => 'Pet Cat',
    'senderEmail' => '[email protected]',
    'comment' => 'Some text goes here'
)

Upgrading from v1.x to v2.x

Version 2.0 is essentially a refactored version of 1.x and provides the same functionality. The one difference is the return value: parseText() now returns a parsed data object instead of an array. To get the results as an array, as in v1.x, call getParsedRawData() on the returned object.

<?php
// In 1.x, parseText() returned an array
$extractedArray = $parser->parseText($textToParse);

// In 2.x, call getParsedRawData() on the returned object to get an array
$extractedArray = $parser->parseText($textToParse)->getParsedRawData();
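
If a codebase has to work with both major versions during a migration, one option is a small adapter that accepts either return type. The sketch below is an assumption-laden illustration: parsedResultToArray() is a hypothetical helper, and the anonymous class only stands in for the v2.x result object so the example runs on its own.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: accepts either a v1.x result (array) or a v2.x result
 * (object exposing getParsedRawData()) and always returns an array.
 */
function parsedResultToArray($result): array
{
    if (is_array($result)) {
        return $result;
    }

    if (is_object($result) && method_exists($result, 'getParsedRawData')) {
        return $result->getParsedRawData();
    }

    throw new InvalidArgumentException('Unsupported parse result type: ' . gettype($result));
}

// Stand-in for the v2.x result object, used only to make this sketch runnable.
$fakeV2Result = new class {
    public function getParsedRawData(): array
    {
        return ['senderName' => 'Pet Cat'];
    }
};

$fromV1 = parsedResultToArray(['senderName' => 'Pet Cat']);
$fromV2 = parsedResultToArray($fakeV2Result);
```

In real code the argument would be whatever $parser->parseText($textToParse) returns.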

Contributors

aymanrb, fredericseiler, beriw98, germanllop
