
textract-demo

End-to-End Smart OCR

Amazon Textract's advanced extraction features go beyond simple OCR to recover structure from documents, including tables, key-value pairs (such as those on forms), and other tricky cases like multi-column text.
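To make "recovering structure" concrete: with the FORMS feature, AnalyzeDocument returns a flat list of Blocks in which KEY_VALUE_SET blocks point to each other (through VALUE relationships) and to their WORD blocks (through CHILD relationships). The minimal Python sketch below, using only the standard library, rebuilds simple key-value pairs from such a response. The sample response is hand-written for illustration rather than captured from Textract, and the sketch ignores selection elements, tables and confidence scores.

```python
from typing import Any

Block = dict[str, Any]


def _child_text(block: Block, blocks_by_id: dict[str, Block]) -> str:
    """Join the text of a block's CHILD WORD blocks with single spaces."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response: dict[str, Any]) -> dict[str, str]:
    """Map each KEY block's text to the text of its linked VALUE block(s)."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs: dict[str, str] = {}
    for block in response["Blocks"]:
        is_key = "KEY" in block.get("EntityTypes", [])
        if block["BlockType"] != "KEY_VALUE_SET" or not is_key:
            continue
        key_text = _child_text(block, blocks_by_id)
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _child_text(blocks_by_id[vid], blocks_by_id) for vid in rel["Ids"]
                )
        pairs[key_text] = value_text
    return pairs


# Hand-written, heavily trimmed response shape (real blocks carry geometry,
# confidence scores and more).
sample = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Total"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Amount:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "$42.50"},
    ]
}

print(extract_key_values(sample))  # {'Total Amount:': '$42.50'}
```

Note that the raw key keeps Textract's own wording and punctuation ("Total Amount:"), which is exactly why a separate post-processing step is needed to map it onto business-level fields.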

However, many practical applications need to combine this technology with use-case-specific logic, such as:

  • Pre-checking that submitted images are high-quality and of the expected document type
  • Post-processing structured text results into business-process-level fields (e.g. in one domain "Amount", "Total Amount" and "Amount Payable" may be different raw annotations for the same thing, whereas in another the differences might be important!)
  • Human review and re-training flows
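One way to express the "same thing, different labels" post-processing is a per-domain synonym map from raw key labels to business-level fields. The sketch below is a hypothetical illustration for a receipt-style use case: the field names, synonym lists and the rule that a missing field triggers human review are assumptions, not this solution's actual postprocessing logic.

```python
import re
from typing import Optional

# Hypothetical receipt-domain config: each business-level field lists the raw
# labels treated as equivalent. A domain where "Amount" and "Amount Payable"
# differ would simply use a different map.
FIELD_SYNONYMS: dict[str, list[str]] = {
    "vendor": ["vendor", "merchant", "store"],
    "date": ["date", "transaction date", "invoice date"],
    "total": ["amount", "total amount", "amount payable", "total"],
}


def _norm(label: str) -> str:
    """Lower-case a label, turn punctuation into spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


def map_to_fields(raw: dict[str, str],
                  synonyms: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """Return one value per business-level field (None if no raw key matched)."""
    lookup = {_norm(s): field for field, labels in synonyms.items() for s in labels}
    result: dict[str, Optional[str]] = {field: None for field in synonyms}
    for key, value in raw.items():
        field = lookup.get(_norm(key))
        if field is not None and result[field] is None:
            result[field] = value  # keep the first match for each field
    return result


def needs_review(fields: dict[str, Optional[str]]) -> bool:
    """Hypothetical rule: send the document to human review if any field is missing."""
    return any(value is None for value in fields.values())


raw = {"Store": "Example Cafe", "Amount Payable": "$12.00", "Tip:": "$2.00"}
fields = map_to_fields(raw, FIELD_SYNONYMS)
print(fields)                # {'vendor': 'Example Cafe', 'date': None, 'total': '$12.00'}
print(needs_review(fields))  # True
```

Keeping the synonym map as data rather than code means a new domain changes configuration, not logic.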

This solution demonstrates how Textract can be integrated with:

...on a simple example use-case: extracting vendor, date, and total amount from receipt images.

The design is modular, showing how the pre- and post-processing steps can be customized for different applications.
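As an illustration of that modularity, a pre-check stage can be assembled from a per-application list of small check functions. In this hypothetical sketch, the checks (JPEG/PNG detection by magic bytes, and a minimum file size as a crude quality proxy) and the 10,000-byte threshold are assumptions; a real quality check would more likely look at resolution, blur or contrast.

```python
from typing import Callable, Optional

# A pre-check takes raw image bytes and returns an error message, or None if OK.
PreCheck = Callable[[bytes], Optional[str]]

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def check_format(image: bytes) -> Optional[str]:
    """Accept only JPEG or PNG payloads, judged by their leading magic bytes."""
    if image.startswith(JPEG_MAGIC) or image.startswith(PNG_MAGIC):
        return None
    return "unsupported image format"


def make_min_size_check(min_bytes: int) -> PreCheck:
    """Build a check that rejects suspiciously small files (hypothetical proxy)."""
    def check(image: bytes) -> Optional[str]:
        if len(image) < min_bytes:
            return f"image too small ({len(image)} < {min_bytes} bytes)"
        return None
    return check


def run_prechecks(image: bytes, checks: list[PreCheck]) -> list[str]:
    """Run every configured check and collect all failure messages."""
    errors = []
    for check in checks:
        message = check(image)
        if message is not None:
            errors.append(message)
    return errors


# Each application chooses its own list; swapping it changes behaviour
# without touching the rest of the pipeline.
receipt_checks: list[PreCheck] = [check_format, make_min_size_check(10_000)]

fake_png = PNG_MAGIC + b"\x00" * 100
print(run_prechecks(fake_png, receipt_checks))  # ['image too small (108 < 10000 bytes)']
```

Collecting all failures, rather than stopping at the first, lets the UI tell the user everything that is wrong with an upload in one round trip.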

Solution Architecture Overview

This overview diagram is not an exhaustive list of AWS services used in the solution.

Smart OCR Architecture Diagram

The solution orchestrates the core OCR pipeline with AWS Step Functions rather than direct point-to-point integrations, which gives us a customizable flow that can be visualized graphically (defined in /source/ocr/StateMachine.asl.json):
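For orientation, the sketch below builds a simplified Amazon States Language definition as a Python dict and checks that every transition target names a defined state. The state names, the needsReview flag and the placeholder Lambda ARNs (account 123456789012) are hypothetical; the solution's real flow is the one in its StateMachine.asl.json, which may use different states and service integrations.

```python
import json


def lambda_arn(name: str) -> str:
    """Hypothetical placeholder ARN for a Lambda-backed Task state."""
    return f"arn:aws:lambda:us-east-1:123456789012:function:{name}"


definition = {
    "Comment": "Illustrative OCR flow, not the solution's actual definition",
    "StartAt": "Preprocess",
    "States": {
        "Preprocess": {"Type": "Task", "Resource": lambda_arn("preprocess"),
                       "Next": "Textract"},
        "Textract": {"Type": "Task", "Resource": lambda_arn("call-textract"),
                     "Next": "Postprocess"},
        "Postprocess": {"Type": "Task", "Resource": lambda_arn("postprocess"),
                        "Next": "CheckReview"},
        # Branch on a flag assumed to be set by the post-processing step.
        "CheckReview": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.needsReview", "BooleanEquals": True,
                         "Next": "HumanReview"}],
            "Default": "Done",
        },
        "HumanReview": {"Type": "Task", "Resource": lambda_arn("start-human-review"),
                        "Next": "Done"},
        "Done": {"Type": "Succeed"},
    },
}


def transition_targets(defn: dict) -> set[str]:
    """Collect every state name referenced by StartAt, Next, Default or a Choice rule."""
    targets = {defn["StartAt"]}
    for state in defn["States"].values():
        if "Next" in state:
            targets.add(state["Next"])
        if "Default" in state:
            targets.add(state["Default"])
        for rule in state.get("Choices", []):
            targets.add(rule["Next"])
    return targets


missing = transition_targets(definition) - set(definition["States"])
assert not missing, f"undefined states: {missing}"
print(json.dumps(definition, indent=2))
```

Because the flow is data, adding a step (say, an extra validation Task) means editing the definition rather than rewiring services that call each other directly.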

AWS Step Functions Screenshot

The client application and associated services are built and deployed as an AWS Amplify app, which simplifies setup of standard client-cloud integration patterns (e.g. user sign-up/login, authenticated S3 data upload).

Rather than having our web client poll the state machine for progress updates, we push messages to it via Amplify PubSub, powered by AWS IoT Core.
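A minimal sketch of the push side: a backend step publishes a small JSON progress message to an AWS IoT Core MQTT topic, which the web client could subscribe to through Amplify PubSub. The topic scheme and message fields are hypothetical. In a Lambda function, `client` would be boto3.client("iot-data") (depending on your setup you may need to pass your account's IoT data endpoint as endpoint_url); the RecordingClient stand-in lets the sketch run locally without AWS.

```python
import json
from typing import Any


def progress_topic(job_id: str) -> str:
    """Hypothetical topic scheme: one MQTT topic per processing job."""
    return f"ocr/progress/{job_id}"


def publish_progress(client: Any, job_id: str, stage: str, status: str) -> dict[str, str]:
    """Publish one progress update for a job and return the message sent.

    `client` is expected to behave like boto3.client("iot-data"), whose
    publish() accepts topic, qos and payload keyword arguments.
    """
    message = {"jobId": job_id, "stage": stage, "status": status}
    client.publish(
        topic=progress_topic(job_id),
        qos=1,  # at-least-once delivery
        payload=json.dumps(message).encode("utf-8"),
    )
    return message


class RecordingClient:
    """Stand-in for the iot-data client that just records publish() calls."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def publish(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


client = RecordingClient()
message = publish_progress(client, "job-123", "Textract", "SUCCEEDED")
print(client.calls[0]["topic"])  # ocr/progress/job-123
```

Pushing per-job messages means the browser only does work when something actually changes, instead of repeatedly asking the state machine for its status.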

The Amplify build settings (in amplify.yml, with some help from the Makefile) define how both the Amplify-native and custom stack components are built and deployed, which results in the folder structure you see in this repository:

├── amplify                   [Auto-generated, Amplify-native service config]
├── source
│   ├── ocr                       [Custom, non-Amplify backend service stack]
│   │   ├── human-review              [Human review integration with Amazon A2I]
│   │   ├── postprocessing            [Extract business-level fields from Textract output]
│   │   ├── preprocessing             [Image pre-check/cleanup logic]
│   │   ├── textract-integration      [SFn-Textract integrations]
│   │   ├── ui-notifications          [SFn-IoT push notifications components]
│   │   ├── StateMachine.asl.json     [Processing flow definition]
│   │   └── template.sam.yml          [AWS SAM template for non-Amplify components]
│   └── webui                     [Front-end app (VueJS, BootstrapVue, Amplify)]
├── amplify.yml               [Overall solution build steps]
└── Makefile                  [Detailed build commands, to simplify amplify.yml]

NOTE: For details on each component, check the READMEs in their subfolders!

Deploying the Solution

If you have:

...then you can go ahead and click the button below, which will fork the repository and deploy the base solution stack(s):

One-click deployment

From here, there are just a few extra (but not trivial) manual configuration steps required to complete your setup:

Now you should be all set to upload images through the app UI, review low-confidence results through the Amazon A2I UI, and see the results!

The App in Action

"Successful extraction with review screenshot"

Contributors

athewsey, vanithar75, wonwilli, bbonik, yuansingapore, nutchanon-l, seemag