chemaclass / edifact-parser


A parser for a UN/EDIFACT file in PHP

Home Page: https://packagist.org/packages/chemaclass/edifact-parser

License: Other

PHP 99.48% Dockerfile 0.52%

edi edifact php parser

edifact-parser's Introduction

EDIFACT Parser

EDIFACT stands for Electronic Data Interchange For Administration, Commerce, and Transport.

This repository contains a parser that extracts the values from any segment defined in an EDIFACT-formatted file.

Ok, but... What is EDIFACT?

Format of an EDIFACT file

Each line of the file holds the data of one segment of a message.
A segment is identified by a tag, followed by the rest of the data that belongs to that segment. More about segments here.
A message is a list of segments. Usually, all segments between the UNH and UNT segments make up a message.
A transaction is the list of messages that belong to a file.
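
The structure described above can be sketched in plain PHP. This is only an illustration and not how the library parses files: `splitIntoSegments` is a hypothetical helper, it assumes the default separators (`'` ends a segment, `+` separates data elements), and it ignores the release character `?`, so escaped separators are not handled.

```php
<?php declare(strict_types=1);

/**
 * Splits raw EDIFACT text into segments, each returned as [tag, ...dataElements].
 * Hypothetical helper for illustration; assumes default separators and
 * does not handle the release character (?).
 *
 * @return list<list<string>>
 */
function splitIntoSegments(string $content): array
{
    $segments = [];

    foreach (explode("'", $content) as $rawSegment) {
        $rawSegment = trim($rawSegment);

        // Skip blank chunks and the UNA separator declaration.
        if ($rawSegment === '' || substr($rawSegment, 0, 3) === 'UNA') {
            continue;
        }

        // The first entry is the tag, the rest are the data elements.
        $segments[] = explode('+', $rawSegment);
    }

    return $segments;
}

$segments = splitIntoSegments("UNA:+.? '\nUNH+1+IFTMIN:S:93A:UN:PN001'\nTDT+20'\nUNT+3+1'");
// $segments[0][0] holds the tag 'UNH'; $segments[0][1] and onward hold its data elements.
```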

Installation

composer require chemaclass/edifact-parser

Contribute

You are more than welcome to contribute by reporting issues, sharing ideas, or opening pull requests.

Basic examples

You can see a full example of printing segments.

You can see a full example of extracting data.

<?php declare(strict_types=1);

use EdifactParser\EdifactParser;
use EdifactParser\Segments\NADNameAddress;

require dirname(__DIR__) . '/vendor/autoload.php';

$fileContent = <<<EDI
UNA:+.? '
UNB+UNOC:3+9457386:30+73130012:30+19101:118+8+MPM 2.19+1424'

UNH+1+IFTMIN:S:93A:UN:PN001'
TDT+20'
NAD+CZ+0410106314:160:Z12++Company Centre+c/o Carrier AB+City1++12345+DE'
NAD+CN+++Person Name+Street Nr 2+City2++12345+DE'
UNT+18+1'

UNZ+2+8'
EDI;

$parserResult = EdifactParser::createWithDefaultSegments()->parse($fileContent);
$firstMessage = $parserResult->transactionMessages()[0];

$cnNadSegment = $firstMessage->segmentByTagAndSubId('NAD', 'CN');
$personName = $cnNadSegment->rawValues()[4];

var_dump($personName); // 'Person Name'


edifact-parser's Issues

End a message when a UNT segment is found

Current behavior

Currently, it looks as if both the start-of-message and end-of-message segments are supported.

Service Segments are used to keep track of the transmission.

Every line in an EDI file represents a segment and must start with the segment name, which is the first 3 characters of the line.

UNH <- Start of Message
UNT <- End of Message

But actually, a message currently ends only when the start of a new message is found.
See TransactionMessage::groupSegmentsByMessage()

Acceptance Criteria

End a message when a UNT segment is found.
Ignore every line that is not between a UNH (start of message) and a UNT (end of message) segment.

A test would be:

UNA:+.? '
UNH+1+anything' <- it starts the first message
CNT+7:0.1:KGM'
UNT+19+1' <- it finishes the first message
IGN+ORE:ME' <- This line should be ignored and not included in the first message
UNH+2+anything'  <- it starts the second message
UNT+19+2' <- it finishes the second message
UNZ+2+3'
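
A minimal sketch of the behavior the acceptance criteria ask for, written in plain PHP rather than against the library's classes. `groupLinesByMessage` is a hypothetical function, not `TransactionMessage::groupSegmentsByMessage()`, and it assumes one segment per line with the tag in the first three characters.

```php
<?php declare(strict_types=1);

/**
 * Groups segment lines into messages delimited by UNH and UNT.
 * Hypothetical helper; lines outside a UNH..UNT pair are dropped.
 *
 * @param list<string> $lines one segment per line
 * @return list<list<string>> one list of segment lines per message
 */
function groupLinesByMessage(array $lines): array
{
    $messages = [];
    $current = null; // null means "not inside a message"

    foreach ($lines as $line) {
        $tag = substr($line, 0, 3);

        if ($tag === 'UNH') {
            // Start a new message (an unterminated previous one is discarded).
            $current = [$line];
        } elseif ($current !== null) {
            $current[] = $line;
            if ($tag === 'UNT') {
                // UNT closes the message.
                $messages[] = $current;
                $current = null;
            }
        }
        // Any other line seen while $current is null is ignored.
    }

    return $messages;
}

$messages = groupLinesByMessage([
    "UNA:+.? '",
    "UNH+1+anything'",
    "CNT+7:0.1:KGM'",
    "UNT+19+1'",
    "IGN+ORE:ME'",
    "UNH+2+anything'",
    "UNT+19+2'",
    "UNZ+2+3'",
]);
// $messages holds two messages; the IGN line belongs to neither.
```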

This issue blocks #7

Parsing unb segment

I need an example of how to parse the UNB segment.
Thank you

Storing line item data outside of groupedSegments

Currently, line item data (LIN segments, and the ones related to them, like QTY and PRI) is being stored together with normal data in groupedSegments.

However, the line items data does not follow the structure of normal grouped segments:

// normal segments:

[
    ...
    'RFF' => [
        'ADE' => RFFReference(...),
        'PD' => RFFReference(...),
    ],
    ...
]

// line item data

[
    ...
    'LIN' => [
        '1' => [
            'LIN' => [ '1' => LINLineItem(...)], 
            'QTY' => ['21' => QTYQuantity(...), ... ],
            ...
        ],
    ],
    ...
]

Basically, regular segments are structured like Tag > SubID > Segment, while line items follow Tag = 'LIN' > SubID > Tag > SubID, which is inconsistent.

What we could do instead is have all the line item data under a separate field in TransactionMessage, let's say lineItems. And the structure might look like this:

TransactionMessage(
    groupedSegments => [...],
    lineItems => [
         '1' => [
              'LIN' => [ '1' => LINLineItem(...)], 
              'QTY' => ['21' => QTYQuantity(...), ... ],
              ...
        ],
    ]
)

What do you think?

Handle unknown segments better

I feel like the way unknown segments are currently being treated is not the ideal option.

Maybe, we should call the "unknown" something else, like "other", and maybe we should also put them separately from the grouped segments, i.e.

EdifactParser\TransactionMessage Object
(
    groupedSegments: [...],
    otherSegments: [...],
)

@Chemaclass what do you think?

Expose a method on TransactionMessage to access all segments

The title speaks for itself

Question: a segment can have a different meaning or owner depending on where it appears in the file.
For example, a COM segment (containing communication info) usually belongs to a NAD segment: the last NAD segment before the COM segment. Example:
NAD+CN+++Happy Coder APS+Stasjonsgata 12+LYNGDAL++4580+NO'CTA+IC'[email protected]:EM'COM+47555555:SM'

Using my current homemade :) parser, I know which NAD the COM belongs to.
But edifact-parser seems to put the COM (and other) tags into one array, making it impossible to know which NAD the COM belongs to.
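
One way to keep that relationship, sketched in plain PHP: remember the most recent NAD while walking the segments and attach each COM to it. `attachComToNad` is a hypothetical helper and not part of edifact-parser. The second party and its phone number in the usage below are made-up sample data, and two NAD segments with the same qualifier would overwrite each other in this simplified version.

```php
<?php declare(strict_types=1);

/**
 * Attaches every COM segment to the last NAD segment seen before it.
 * Hypothetical helper; expects segments without the trailing terminator.
 *
 * @param list<string> $segments
 * @return array<string, array{NAD: string, COM: list<string>}> keyed by NAD qualifier
 */
function attachComToNad(array $segments): array
{
    $parties = [];
    $currentQualifier = null;

    foreach ($segments as $segment) {
        $elements = explode('+', $segment);

        if ($elements[0] === 'NAD') {
            // A new NAD becomes the owner of the COM segments that follow.
            $currentQualifier = $elements[1];
            $parties[$currentQualifier] = ['NAD' => $segment, 'COM' => []];
        } elseif ($elements[0] === 'COM' && $currentQualifier !== null) {
            $parties[$currentQualifier]['COM'][] = $segment;
        }
    }

    return $parties;
}

$parties = attachComToNad([
    'NAD+CN+++Happy Coder APS+Stasjonsgata 12+LYNGDAL++4580+NO',
    'CTA+IC',
    'COM+47555555:SM',
    'NAD+CZ+++Company Centre', // made-up second party
    'COM+12345678:TE',         // made-up phone number
]);
// $parties['CN']['COM'] contains only the COM that followed the CN party.
```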

Investigate and develop all possible missing segments

Investigate and develop all possible or common segments that we should provide by default as part of this library inside src/Segments

Handling line segments properly

Currently, LIN (Line item) segments are not implemented, but that is not the problem.

The thing is that the segments related to LIN segments only make sense within the context of their line item. For example

LIN+1++9783898307529:EN'
QTY+21:5'
PRI+AAA:27.5'
LIN+2++390787706322:UP'
QTY+23:1'
PRI+AAA:10.87'

Here, the QTY (Quantity) and PRI (Price) segments only have meaning in relation to their LIN (Line item), so if we just group them together with each other as shown below, the meaning will be lost.

"LIN" => [
    1 => LINLineItem(...),
    2 => LINLineItem(...),
],

"QTY" => [
    21 => QTYQuantity(...),
   // 
    23 => QTYQuantity(...),
],
....

What we need is to group line items and their attributes together, for example

"LIN" => [
    1 => [
        LINLineItem(...),
        QTYQuantity(...),
        PRIPrice(...),
    ],
    2 => [
        LINLineItem(...),
        QTYQuantity(...),
        PRIPrice(...),
    ],
],
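
As a rough sketch of that grouping in plain PHP (strings instead of segment objects, and a hypothetical `groupByLineItem` function that is not part of the library): every LIN opens a new bucket keyed by its line number, and the segments that follow go into that bucket. In this simplified version, everything after the last LIN is attributed to it until the input ends.

```php
<?php declare(strict_types=1);

/**
 * Groups LIN segments and the segments that follow them by line-item number.
 * Hypothetical helper; segments before the first LIN are skipped.
 *
 * @param list<string> $lines one segment per line, including the terminator
 * @return array<int|string, list<string>> line-item number => its segment lines
 */
function groupByLineItem(array $lines): array
{
    $lineItems = [];
    $currentNumber = null;

    foreach ($lines as $line) {
        $elements = explode('+', rtrim($line, "'"));

        if ($elements[0] === 'LIN') {
            // The first data element of LIN is the line-item number.
            $currentNumber = $elements[1];
            $lineItems[$currentNumber] = [];
        }

        if ($currentNumber !== null) {
            $lineItems[$currentNumber][] = $line;
        }
    }

    return $lineItems;
}

$lineItems = groupByLineItem([
    "LIN+1++9783898307529:EN'",
    "QTY+21:5'",
    "PRI+AAA:27.5'",
    "LIN+2++390787706322:UP'",
    "QTY+23:1'",
    "PRI+AAA:10.87'",
]);
// $lineItems[1] and $lineItems[2] each hold a LIN with its own QTY and PRI.
```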

That will require a bit of refactoring, but I can implement it easily

Encoding error ?

Back on the script again, trying to parse an EDIFACT file. However, I get an error if it contains Nordic characters (æøå). The script gives me an exception like this: ["There's a not printable character on line 17: NAD+CN+++Nes videregaende skole+Kjuushagen 3+\u00c5RNES.

The file is utf-8 encoded.
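
How the library performs its printable-character check is not shown here, so the following is only a sketch of a UTF-8-aware alternative. `isPrintableUtf8` is a hypothetical function: it accepts any line made of valid UTF-8 code points outside the Unicode "Other" category (control, format, unassigned, and similar), which lets characters such as Å through.

```php
<?php declare(strict_types=1);

/**
 * Returns true when the line is valid UTF-8 and contains no control characters.
 * Hypothetical helper, not the check used by edifact-parser.
 */
function isPrintableUtf8(string $line): bool
{
    // \P{C} matches any code point outside the Unicode "Other" category.
    // With the /u modifier, preg_match returns false on invalid UTF-8,
    // so comparing with 1 rejects malformed input as well.
    return preg_match('/\A\P{C}*\z/u', $line) === 1;
}

$ok = isPrintableUtf8('NAD+CN+++Nes videregaende skole+Kjuushagen 3+ÅRNES');
$bad = isPrintableUtf8("NAD+CN\x07");
// $ok accepts the Nordic character; $bad rejects the BEL control character.
```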
Any help appreciated :)

Thanks
Tom

Releasing a new version of the library with the recently-added features

The plan is to improve the handling of unknown segments first (I'm working on it) and then release a new version of the library that contains all the newly-added features.

I'm pretty sure I can even make the release on my own, though I'm not sure how it's done.

Changing the class names of the segments to not duplicate the segment tag

Since we are already returning the actual value of the tag in SegmentInterface::tag(), I don't think it makes all that much sense to include the 3-letter code in the name of segment classes, i.e. we can use NameAndAddress as the class name instead of NADNameAddress.

Grouping messages by "group" segments

Current behavior

Currently, only "message segment grouping" is supported. This is done by grouping the messages around a TransactionMessage object.

But, as you can see in the Service Segments, we could have an extra group definition within the "Interchange" segments.

UNB - Start of Interchange
UNG - Start of Group <- this is completely missing right now
UNH - Start of Message
UNT - End of Message
UNE - End of Group <- this is completely missing right now
UNZ - End of Interchange

The actual grouping (at message segment level) is done in TransactionMessage::groupSegmentsByMessage().

Acceptance Criteria

#6 needs to be done first in order to be able to group the message segments properly (by the ending message segment definition).
Group the list of messages into a list of groups delimited by the UNG and UNE segments.
Ignore every line that is not between a UNG (start of a group) and a UNE (end of a group) segment.

A test would be:

UNG+1' <- it starts the first group 
IGN+ORE+ME' <- This line should be ignored and not included in any message
UNH+1+anything'  <- it starts the first message
UNT+19+1'  <- it finishes the first message
UNE+1+2' <- it finishes the first group 

IGN+ORE+ME' <- This line should be ignored and not included in any group

UNG+2'  <- it starts the second group
UNH+2+anything'  <- it starts the second message
UNT+19+2' <- it finishes the second message
UNE+2+2' <- it finishes the second group

UNZ+2+3' <- it ends the interchange

[Note: the blank lines in the previous example are only there to help visualize the separation between groups and what each line represents.]
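
The two-level grouping from the acceptance criteria can be sketched in plain PHP as follows. `groupMessagesByGroup` is a hypothetical function, not an existing method of the library, and it assumes one segment per line with the tag in the first three characters.

```php
<?php declare(strict_types=1);

/**
 * Groups messages (UNH..UNT) inside groups (UNG..UNE).
 * Hypothetical helper; lines outside a group or outside a message are dropped.
 *
 * @param list<string> $lines one segment per line
 * @return list<list<list<string>>> groups => messages => segment lines
 */
function groupMessagesByGroup(array $lines): array
{
    $groups = [];
    $group = null;   // null means "not inside a group"
    $message = null; // null means "not inside a message"

    foreach ($lines as $line) {
        $tag = substr($line, 0, 3);

        if ($tag === 'UNG') {
            $group = [];
            $message = null;
        } elseif ($group === null) {
            continue; // outside any group: ignore
        } elseif ($tag === 'UNE') {
            $groups[] = $group;
            $group = null;
            $message = null;
        } elseif ($tag === 'UNH') {
            $message = [$line];
        } elseif ($message !== null) {
            $message[] = $line;
            if ($tag === 'UNT') {
                $group[] = $message;
                $message = null;
            }
        }
    }

    return $groups;
}

$groups = groupMessagesByGroup([
    "UNG+1'", "IGN+ORE+ME'", "UNH+1+anything'", "UNT+19+1'", "UNE+1+2'",
    "IGN+ORE+ME'",
    "UNG+2'", "UNH+2+anything'", "UNT+19+2'", "UNE+2+2'",
    "UNZ+2+3'",
]);
// $groups holds two groups with one message each; no IGN line is kept.
```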

Should use a 3-letter segment tag instead of the class name

SegmentInterface::tag() is supposed to return "a three-character alphanumeric code that identifies the segment", but the implementations all return their self::class.

I can easily fix that, just need approval.
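
A small sketch of the proposed fix, using hypothetical names (`TaggedSegment` and `ExampleNameAddress` stand in for the library's interface and segment classes, whose full definitions are not shown here): the segment keeps its three-letter code in a constant and returns it from tag().

```php
<?php declare(strict_types=1);

// Hypothetical names for illustration only; they are not the library's
// SegmentInterface or NADNameAddress.
interface TaggedSegment
{
    /** A three-character alphanumeric code that identifies the segment. */
    public function tag(): string;
}

final class ExampleNameAddress implements TaggedSegment
{
    public const TAG = 'NAD';

    public function tag(): string
    {
        // Return the segment code instead of self::class.
        return self::TAG;
    }
}

$segment = new ExampleNameAddress();
// $segment->tag() yields the three-letter code, while
// ExampleNameAddress::class would yield the class name.
```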
