

omniparser


Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in a streaming fashion and transforms it into the desired JSON output based on a schema written in JSON.

Golang Version: 1.14

Documentation

Docs:

References:

Examples:

In the example folders above, you will find pairs of input files and their schema files. Their corresponding output files are in the .snapshots subdirectory.

Online Playground

Use https://omniparser.herokuapp.com/ to try out schemas and inputs, your own or the existing samples, and see how ingestion and transform work. You may need to wait a few seconds for the Heroku instance to wake up.

Why

  • No good ETL transform/parser library exists in Golang.
  • Even in Java and other languages, there are few choices, and all have limitations:
    • Smooks is dead, and its EDI parsing/transform is too heavyweight, requiring code generation.
    • BeanIO can't deal with EDI input.
    • Jolt can't deal with anything other than JSON input.
    • JSONata still supports only JSON -> JSON transforms.
  • Many parsers/transforms don't support streaming reads and instead load the entire input into memory, which is not acceptable in some situations.

Requirements

  • Golang 1.14

Recent Major Feature Additions/Changes

  • Added fixed-length file format support in omniv21 handler.
  • Added EDI file format support in omniv21 handler.
  • Major restructuring/refactoring:
    • Upgraded omni schema version to omni.2.1 due to a number of incompatible schema changes:
      • 'result_type' -> 'type'
      • 'ignore_error_and_return_empty_str' -> 'ignore_error'
      • 'keep_leading_trailing_space' -> 'no_trim'
    • Changed how custom functions are handled: previously, strings were always used as both the input param type and the result type. Now all types are supported for custom function input and output params.
    • Changed how custom functions are packaged for extensions: previously, custom functions were collected from all extensions and then all passed to whichever extension was in use, which felt odd. Now only the custom functions included in a particular extension are used in that extension.
    • Deprecated/removed most of the custom functions in favor of using 'javascript'.
    • A number of packages renamed.
  • Added CSV file format support in omniv2 handler.
  • Introduced IDR node cache for allocation recycling.
  • Introduced IDR for in-memory data representation.
  • Added trie-based, high-performance times.SmartParse.
  • Command line interface (one-off transform cmd or long-running http server mode).
  • JavaScript engine integration as a custom_func.
  • JSON stream parser.
  • Extensibility:
    • Ability to provide custom functions.
    • Ability to provide custom schema handler.
    • Ability to customize the built-in omniv2 schema handler's parsing code.
    • Ability to provide a new file format support to built-in omniv2 schema handler.
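The three omni.2.1 key renames listed above are mechanical, so they could in principle be applied to an older schema by walking its JSON. The sketch below does that with the standard library only. `migrateKeys` and `migrateSchema` are hypothetical helpers, not part of omniparser; they rename only those three keys and do not update the declared schema version or handle any other incompatible change. The sample schema fragment is likewise made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// renames holds the omni.2.1 key renames from the changelog.
var renames = map[string]string{
	"result_type":                       "type",
	"ignore_error_and_return_empty_str": "ignore_error",
	"keep_leading_trailing_space":       "no_trim",
}

// migrateKeys (hypothetical) recursively renames object keys in a decoded
// JSON value. It assumes no object contains both the old and the new
// spelling of the same key.
func migrateKeys(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, child := range t {
			if nk, ok := renames[k]; ok {
				k = nk
			}
			out[k] = migrateKeys(child)
		}
		return out
	case []interface{}:
		for i, child := range t {
			t[i] = migrateKeys(child)
		}
		return t
	default:
		return v // strings, numbers, bools, null are left unchanged
	}
}

// migrateSchema (hypothetical) decodes a schema, renames keys, and re-encodes
// it. json.Marshal emits map keys in sorted order.
func migrateSchema(in []byte) ([]byte, error) {
	var doc interface{}
	if err := json.Unmarshal(in, &doc); err != nil {
		return nil, err
	}
	return json.Marshal(migrateKeys(doc))
}

func main() {
	old := `{"fields":[{"name":"amount","result_type":"float","keep_leading_trailing_space":true}]}`
	out, err := migrateSchema([]byte(old))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

A JSON tree walk is used rather than plain string replacement so that a matching word inside a string value (for example, a field literally named "result_type") is not rewritten by accident.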
