
edifact-parser's Introduction

EDIFACT Parser


EDIFACT stands for Electronic Data Interchange For Administration, Commerce, and Transport.

This repository contains a parser that extracts the values from any segment defined in an EDIFACT-formatted file.

Ok, but... What is EDIFACT?

Format of an EDIFACT file

  • Each line of the file holds the data of one segment of a message.

  • A segment is identified by a tag, which is followed by the rest of the data that belongs to that segment. More about segments here.

  • A message is a list of segments. Usually, all segments between the UNH and UNT segments make up a message.

  • A transaction is the list of messages that belong to a file.
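
The format described above can be illustrated with a minimal sketch in plain PHP. It does not use this library: the function name splitIntoSegments is hypothetical, and the sketch assumes the default separators ("'" ends a segment, "+" separates data elements, "?" is the release character). It also does not treat the UNA service string specially, nor a doubled release character ("??") right before a terminator.

```php
<?php declare(strict_types=1);

/**
 * Splits raw EDIFACT content into segments and reads each segment's tag.
 * Hypothetical helper, not part of edifact-parser.
 *
 * @return list<array{tag: string, raw: string}>
 */
function splitIntoSegments(string $content): array
{
    // Split on "'" unless it is escaped by the release character "?".
    $parts = preg_split("/(?<!\\?)'/", $content);

    $segments = [];
    foreach ($parts as $part) {
        $raw = trim($part);
        if ($raw === '') {
            continue; // skip blank lines between segments
        }
        // The tag is everything before the first "+" separator.
        $segments[] = ['tag' => explode('+', $raw, 2)[0], 'raw' => $raw];
    }

    return $segments;
}

$segments = splitIntoSegments("UNH+1+IFTMIN:S:93A:UN:PN001'\nTDT+20'\nUNT+18+1'");

echo implode(',', array_column($segments, 'tag')), PHP_EOL; // UNH,TDT,UNT
```

Here the three segments between UNH and UNT form one message, matching the description of a message as a list of segments.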

Installation

composer require chemaclass/edifact-parser

Contribute

You are more than welcome to contribute by reporting issues, sharing ideas, or opening pull requests.

Basic examples

You can see a full example of printing segments.

You can see a full example of extracting data.

<?php declare(strict_types=1);

use EdifactParser\EdifactParser;
use EdifactParser\Segments\NADNameAddress;

require dirname(__DIR__) . '/vendor/autoload.php';

$fileContent = <<<EDI
UNA:+.? '
UNB+UNOC:3+9457386:30+73130012:30+19101:118+8+MPM 2.19+1424'

UNH+1+IFTMIN:S:93A:UN:PN001'
TDT+20'
NAD+CZ+0410106314:160:Z12++Company Centre+c/o Carrier AB+City1++12345+DE'
NAD+CN+++Person Name+Street Nr 2+City2++12345+DE'
UNT+18+1'

UNZ+2+8'
EDI;

$parserResult = EdifactParser::createWithDefaultSegments()->parse($fileContent);
$firstMessage = $parserResult->transactionMessages()[0];

$cnNadSegment = $firstMessage->segmentByTagAndSubId('NAD', 'CN');
$personName = $cnNadSegment->rawValues()[4];

var_dump($personName); // 'Person Name'

edifact-parser's Issues

End a message when a UNT segment is found

Current behavior

Currently, the start and end of a message seem to be supported through their service segments.

Service Segments are used to keep track of the transmission.

Every line in an EDI file represents a segment and must start with the segment name, which is the first three characters of the line.

UNH <- Start of Message
UNT <- End of Message

In practice, however, a message only ends when the start of a new message is found.
See TransactionMessage::groupSegmentsByMessage()

Acceptance Criteria

  • End a message when a UNT segment is found.
  • Ignore every line that is not between a UNH (start of message) and a UNT (end of message) segment.

A test would be:

UNA:+.? '
UNH+1+anything' <- it starts the first message
CNT+7:0.1:KGM'
UNT+19+1' <- it finishes the first message
IGN+ORE:ME' <- This line should be ignored and not included in the first message
UNH+2+anything'  <- it starts the second message
UNT+19+2' <- it finishes the second message
UNZ+2+3'
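
The acceptance criteria above could be sketched roughly as follows. This is not the code of TransactionMessage::groupSegmentsByMessage(); the function groupLinesByMessage is a hypothetical stand-in that works on plain strings, and it assumes one segment per array entry with the tag in the first three characters.

```php
<?php declare(strict_types=1);

/**
 * Groups segment lines into messages: UNH opens a message, UNT closes it.
 * Lines outside a UNH..UNT pair are dropped.
 * Hypothetical helper, not part of edifact-parser.
 *
 * @param list<string> $lines one segment per entry
 * @return list<list<string>>
 */
function groupLinesByMessage(array $lines): array
{
    $messages = [];
    $current = null; // null means "not inside a message"

    foreach ($lines as $line) {
        $tag = substr($line, 0, 3);

        if ($tag === 'UNH') {
            $current = [$line]; // an unfinished previous message is discarded
        } elseif ($current !== null) {
            $current[] = $line;
            if ($tag === 'UNT') {
                $messages[] = $current;
                $current = null;
            }
        }
        // Any other line (UNA, IGN, UNZ, ...) is outside a message and ignored.
    }

    return $messages;
}

$messages = groupLinesByMessage([
    "UNA:+.? '",
    "UNH+1+anything'",
    "CNT+7:0.1:KGM'",
    "UNT+19+1'",
    "IGN+ORE:ME'",
    "UNH+2+anything'",
    "UNT+19+2'",
    "UNZ+2+3'",
]);

echo count($messages), PHP_EOL;    // 2
echo count($messages[0]), PHP_EOL; // 3
```

Discarding a message whose UNT never arrives is a design choice of this sketch; reporting it as an error would be an equally valid option.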

This issue blocks #7

Storing line item data outside of groupedSegments

Currently, line item data (LIN segments and the segments related to them, such as QTY and PRI) is stored together with normal data in groupedSegments.

However, the line items data does not follow the structure of normal grouped segments:

// normal segments:

[
    ...
    'RFF' => [
        'ADE' => RFFReference(...),
        'PD' => RFFReference(...),
    ],
    ...
]

// line item data

[
    ...
    'LIN' => [
        '1' => [
            'LIN' => [ '1' => LINLineItem(...)], 
            'QTY' => ['21' => QTYQuantity(...), ... ],
            ...
        ],
    ],
    ...
]

Basically, regular segments are structured as Tag > SubID > Segment, while line items follow Tag = 'LIN' > SubID > Tag > SubID, which does not make much sense.

What we could do instead is keep all the line item data under a separate field in TransactionMessage, say lineItems. The structure might look like this:

TransactionMessage(
    groupedSegments => [...],
    lineItems => [
         '1' => [
              'LIN' => [ '1' => LINLineItem(...)], 
              'QTY' => ['21' => QTYQuantity(...), ... ],
              ...
        ],
    ]
)

What do you think?

Handle unknown segments better

I feel like the way unknown segments are currently being treated is not the ideal option.

Maybe we should call the "unknown" segments something else, like "other", and maybe we should also keep them separate from the grouped segments, i.e.

EdifactParser\TransactionMessage Object
(
    groupedSegments: [...],
    otherSegments: [...],
)  

@Chemaclass what do you think?

Segments and its order

Question: a segment can have a different meaning or owner depending on where it appears in the file.
For example, a COM segment (containing communication info) usually belongs to a NAD segment, namely the last NAD segment before the COM segment. Example:
NAD+CN+++Happy Coder APS+Stasjonsgata 12+LYNGDAL++4580+NO'CTA+IC'[email protected]:EM'COM+47555555:SM'

Using my current homemade :) parser, I know which NAD the COM belongs to.
But edifact-parser seems to put the COM (and other) tags in one array, making it impossible to know which NAD the COM belongs to.
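
To make the ordering question concrete, here is a minimal sketch of position-based grouping in plain PHP. It is not how edifact-parser works; groupByPrecedingNad is a hypothetical function, and the assumption that only CTA and COM segments attach to the preceding NAD is made purely for illustration.

```php
<?php declare(strict_types=1);

/**
 * Attaches CTA and COM segments to the last NAD segment seen before them.
 * Hypothetical helper, not part of edifact-parser.
 *
 * @param list<string> $lines one segment per entry
 * @return list<array{nad: string, children: list<string>}>
 */
function groupByPrecedingNad(array $lines): array
{
    $groups = [];
    $currentIndex = null; // index of the NAD that currently "owns" children

    foreach ($lines as $line) {
        $tag = substr($line, 0, 3);

        if ($tag === 'NAD') {
            $groups[] = ['nad' => $line, 'children' => []];
            $currentIndex = count($groups) - 1;
        } elseif ($currentIndex !== null && in_array($tag, ['CTA', 'COM'], true)) {
            $groups[$currentIndex]['children'][] = $line;
        } else {
            $currentIndex = null; // any other tag ends the NAD context
        }
    }

    return $groups;
}

$groups = groupByPrecedingNad([
    "NAD+CZ+0410106314:160:Z12++Company Centre+c/o Carrier AB+City1++12345+DE'",
    "NAD+CN+++Happy Coder APS+Stasjonsgata 12+LYNGDAL++4580+NO'",
    "CTA+IC'",
    "COM+47555555:SM'",
]);

echo count($groups[0]['children']), PHP_EOL; // 0
echo count($groups[1]['children']), PHP_EOL; // 2
```

With this shape, the COM segment stays next to the NAD+CN party it describes instead of landing in one flat COM array.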

Handling line segments properly

Currently, LIN (Line item) segments are not implemented, but that is not the problem.

The thing is that the segments related to LIN segments only make sense within the context of their line item. For example

LIN+1++9783898307529:EN'
QTY+21:5'
PRI+AAA:27.5'
LIN+2++390787706322:UP'
QTY+23:1'
PRI+AAA:10.87'

Here, the QTY (Quantity) and PRI (Price) segments only have meaning in relation to their LIN (Line item), so if we just group them together with each other as shown below, the meaning will be lost.

"LIN" => [
    1 => LINLineItem(...),
    2 => LINLineItem(...),
],

"QTY" => [
    21 => QTYQuantity(...),
   // 
    23 => QTYQuantity(...),
],
....

What we need is to group line items and their attributes together, for example

"LIN" => [
    1 => [
        LINLineItem(...),
        QTYQuantity(...),
        PRIPrice(...),
    ],
    2 => [
        LINLineItem(...),
        QTYQuantity(...),
        PRIPrice(...),
    ],
],

That will require a bit of refactoring, but I can implement it easily.
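
As a rough illustration of the grouping proposed above, the following sketch works on raw segment strings and not on the library's segment objects. The function groupByLineItem is hypothetical, and the sketch assumes that every segment after a LIN belongs to that line item until the next LIN appears; a real implementation would also need to stop at the first segment that is not line-item related.

```php
<?php declare(strict_types=1);

/**
 * Groups each LIN segment with the segments that follow it.
 * Hypothetical helper, not part of edifact-parser.
 *
 * @param list<string> $lines one segment per entry
 * @return array<int|string, list<string>> keyed by the line item number
 */
function groupByLineItem(array $lines): array
{
    $lineItems = [];
    $currentId = null; // null until the first LIN segment is seen

    foreach ($lines as $line) {
        $elements = explode('+', rtrim($line, "'"));

        if ($elements[0] === 'LIN') {
            $currentId = $elements[1]; // e.g. "1" in LIN+1++...
            $lineItems[$currentId] = [];
        }
        if ($currentId !== null) {
            $lineItems[$currentId][] = $line;
        }
    }

    return $lineItems;
}

$lineItems = groupByLineItem([
    "LIN+1++9783898307529:EN'",
    "QTY+21:5'",
    "PRI+AAA:27.5'",
    "LIN+2++390787706322:UP'",
    "QTY+23:1'",
    "PRI+AAA:10.87'",
]);

echo count($lineItems), PHP_EOL;  // 2
echo $lineItems[2][1], PHP_EOL;   // QTY+23:1'
```

Note that PHP converts numeric string keys such as "1" to integer keys, so the items are read back with $lineItems[1] and $lineItems[2].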

Encoding error ?

Back on the script again, trying to parse an EDIFACT file. However, I get an error if it contains Nordic characters (æøå). The script gives me an exception like this: ["There's a not printable character on line 17: NAD+CN+++Nes videregaende skole+Kjuushagen 3+\u00c5RNES.

The file is UTF-8 encoded.
Any help appreciated :)

Thanks
Tom
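
One possible cause, stated here as an assumption and not confirmed from the library's source, is a byte-oriented printability check: in UTF-8, "Å" is encoded as two bytes outside the printable ASCII range. The sketch below contrasts such a check with a Unicode-aware one. Both function names are hypothetical.

```php
<?php declare(strict_types=1);

/**
 * Byte-oriented check: flags any byte outside printable ASCII (0x20-0x7E).
 * Hypothetical helper, shown only to illustrate the suspected cause.
 */
function hasNonPrintableBytes(string $line): bool
{
    return preg_match('/[^\x20-\x7E]/', $line) === 1;
}

/**
 * Unicode-aware check: flags only characters in the "Other" category
 * (control, format, unassigned, ...), so letters like "Å" pass.
 */
function hasControlCharacters(string $line): bool
{
    return preg_match('/\p{C}/u', $line) === 1;
}

$line = "NAD+CN+++Nes videregaende skole+Kjuushagen 3+\u{00C5}RNES";

var_dump(hasNonPrintableBytes($line)); // bool(true)
var_dump(hasControlCharacters($line)); // bool(false)
```

If the parser's validation behaves like the first function, UTF-8 input with Nordic letters would be rejected even though the file itself is valid.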

Grouping messages by "group" segments

Current behavior

Currently, only "message segment grouping" is supported. This is done by grouping the messages around a TransactionMessage object.

But, as you can see in the Service Segments, we could have an extra group definition within the "Interchange" segments.

UNB - Start of Interchange
UNG - Start of Group <- this is completely missing right now
UNH - Start of Message
UNT - End of Message
UNE - End of Group <- this is completely missing right now
UNZ - End of Interchange

The actual grouping (at the message segment level) is done in TransactionMessage::groupSegmentsByMessage().

Acceptance Criteria

  • #6 needs to be done first in order to be able to group the message segments properly (by the ending message segment definition).
  • Group the list of messages into a list of groups delimited by the UNG and UNE segments.
  • Ignore every line that is not between a UNG (start of a group) and a UNE (end of a group) segment.

A test would be:

UNG+1' <- it starts the first group 
IGN+ORE+ME' <- This line should be ignored and not included in any message
UNH+1+anything'  <- it starts the first message
UNT+19+1'  <- it finishes the first message
UNE+1+2' <- it finishes the first group 

IGN+ORE+ME' <- This line should be ignored and not included in any group

UNG+2'  <- it starts the second group
UNH+2+anything'  <- it starts the second message
UNT+19+2' <- it finishes the second message
UNE+2+2' <- it finishes the second group

UNZ+2+3' <- it ends the interchange

[Note: the blank lines in the previous example are only there to help visualize the separation between groups and what each line represents.]
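
The group-level criteria could be sketched along these lines. Again, this is a plain-PHP illustration with a hypothetical function name (groupMessagesByGroup), not the library's implementation. It assumes one segment per array entry and ends messages at UNT, as required by #6.

```php
<?php declare(strict_types=1);

/**
 * Groups messages (UNH..UNT) into groups (UNG..UNE).
 * Lines outside a group, or inside a group but outside a message, are dropped.
 * Hypothetical helper, not part of edifact-parser.
 *
 * @param list<string> $lines one segment per entry
 * @return list<list<list<string>>> groups > messages > segment lines
 */
function groupMessagesByGroup(array $lines): array
{
    $groups = [];
    $group = null;   // messages of the open group, or null outside a group
    $message = null; // lines of the open message, or null outside a message

    foreach ($lines as $line) {
        $tag = substr($line, 0, 3);

        if ($tag === 'UNG') {
            $group = [];
            $message = null;
        } elseif ($tag === 'UNE') {
            if ($group !== null) {
                $groups[] = $group;
            }
            $group = null;
            $message = null;
        } elseif ($group === null) {
            continue; // outside any group: ignored
        } elseif ($tag === 'UNH') {
            $message = [$line];
        } elseif ($message !== null) {
            $message[] = $line;
            if ($tag === 'UNT') {
                $group[] = $message;
                $message = null;
            }
        }
    }

    return $groups;
}

$groups = groupMessagesByGroup([
    "UNG+1'", "IGN+ORE+ME'", "UNH+1+anything'", "UNT+19+1'", "UNE+1+2'",
    "IGN+ORE+ME'",
    "UNG+2'", "UNH+2+anything'", "UNT+19+2'", "UNE+2+2'",
    "UNZ+2+3'",
]);

echo count($groups), PHP_EOL;    // 2
echo count($groups[0]), PHP_EOL; // 1
```

Both ignored lines disappear from the result: the first because it sits inside a group but outside a message, the second because it sits outside any group.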
