
anuvaad-language_translators's Introduction

Service Build Status
Zuul Build Status
NMT Build Status
Workflow Manager Build Status
Aligner Build Status
User Management Build Status
Tokeniser Build Status
Translator Build Status

Anuvaad Solution Diagram

Components

| Component | Details |
| --- | --- |
| Workflow Manager (WM) | Centralized orchestrator that drives workflows based on user requests. |
| Auditor | Python package/library used for formatting and exception handling. |
| File Uploader | Microservice to upload and maintain user documents. |
| File Converter | Microservice to convert files from one format to another, e.g. .doc to .pdf. |
| Aligner | Microservice that accepts source and target sentences and aligns them to form a parallel corpus. |
| Tokenizer | Microservice that tokenises paragraphs into independently translatable sentences. |
| Layout Detector | Microservice interface for the layout detection model. |
| Block Segmenter | Handles layout detection misclassifications and region unification. |
| Word Detector | Word detection. |
| Block Merger | An OCR system that extracts text, images, tables, blocks, etc. from the input file and makes them available in a format that downstream services can use to perform translation. It can also be used as an independent product to perform OCR on files, images, PPTs, etc. |
| Translator | Pushes sentences to OpenNMT, which translates them and pushes the results back during the document translation flow. |
| Content Handler | Repository microservice that maintains and manages all the translated documents. |
| Translation Memory X (TMX) | System translation memory that lets a user-preferred translation override the NMT translation. TMX provides three levels of caching: Global, User, and Organisation. |
| User Translation Memory (UTM) | Tracks and remembers individual users' translations or corrected translations and applies them automatically when the same sentences are encountered again. |
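
The TMX override described above can be pictured with a small sketch. It is an illustration only: the class and function names (`TranslationMemory`, `apply_tmx`) are hypothetical, the stores are plain in-memory dictionaries, and the lookup order (User first, then Organisation, then Global) is an assumption, since this README names the three levels but does not state their precedence.

```python
from typing import Dict, Optional, Tuple


class TranslationMemory:
    """Hypothetical three-level translation memory (not the actual Anuvaad API)."""

    def __init__(self) -> None:
        # Each level maps a source sentence to a preferred target sentence.
        self.user: Dict[Tuple[str, str], str] = {}  # (user_id, source) -> target
        self.org: Dict[Tuple[str, str], str] = {}   # (org_id, source) -> target
        self.global_: Dict[str, str] = {}           # source -> target

    def add_user(self, user_id: str, source: str, target: str) -> None:
        self.user[(user_id, source)] = target

    def add_org(self, org_id: str, source: str, target: str) -> None:
        self.org[(org_id, source)] = target

    def add_global(self, source: str, target: str) -> None:
        self.global_[source] = target

    def lookup(self, source: str, user_id: str, org_id: str) -> Optional[str]:
        # Assumed precedence: the most specific level wins.
        if (user_id, source) in self.user:
            return self.user[(user_id, source)]
        if (org_id, source) in self.org:
            return self.org[(org_id, source)]
        return self.global_.get(source)


def apply_tmx(tm: TranslationMemory, source: str, nmt_translation: str,
              user_id: str, org_id: str) -> str:
    """Return the preferred translation if one is stored, else the NMT output."""
    preferred = tm.lookup(source, user_id, org_id)
    return preferred if preferred is not None else nmt_translation
```

In this sketch a sentence with no stored entry falls through to the NMT translation unchanged, which matches the "override" wording in the table.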

AI/ML Assets

| Component | Details |
| --- | --- |
| PRIMA | Layout detection model. |
| Google Vision | Used for OCR in Document Digitization v1.0 and v1.5. Replaced with custom-trained Tesseract in the latest versions. |
| CRAFT | Used for line detection. |
| Tesseract | Custom-trained Tesseract used for OCR. |
| OpenNMT | Custom-trained OpenNMT used for translation. |

Technology Stack

| Component | Details |
| --- | --- |
| Apache Kafka | Translator and OpenNMT are integrated through Kafka messaging. |
| MongoDB | Primary data storage. |
| Redis | Secondary in-memory storage. |
| Cloud Storage | Samba storage is used to store user input files. |
| NGINX | Serves as a redirection server and also handles system-level configuration. NGINX acts as the gateway. |
| Zuul | API gateway that applies filters to client requests: authentication, authorization, and throttling. |
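
The Translator and OpenNMT integration over Kafka is essentially a request/reply exchange across two message channels. The sketch below simulates that shape with in-process `queue.Queue` objects standing in for Kafka topics, so it runs without a broker. Everything in it is an assumption made for illustration: the message fields (`id`, `src`, `tgt`), the function names, and the placeholder "translation" (upper-casing) are not taken from the Anuvaad code.

```python
import queue
import threading
from typing import List


def fake_nmt_worker(requests: queue.Queue, replies: queue.Queue) -> None:
    """Stand-in for the NMT consumer: reads requests, publishes replies."""
    while True:
        msg = requests.get()
        if msg is None:  # sentinel used here to stop the worker
            break
        # Placeholder for model inference; a real worker would call OpenNMT.
        replies.put({"id": msg["id"], "tgt": msg["src"].upper()})


def translate_document(sentences: List[str]) -> List[str]:
    """Publish each sentence, then collect replies and restore document order."""
    requests: queue.Queue = queue.Queue()  # stands in for the input topic
    replies: queue.Queue = queue.Queue()   # stands in for the output topic

    worker = threading.Thread(target=fake_nmt_worker, args=(requests, replies))
    worker.start()

    # Tag every sentence with an id so replies can be matched back to it.
    for index, sentence in enumerate(sentences):
        requests.put({"id": index, "src": sentence})

    translated = {}
    while len(translated) < len(sentences):
        reply = replies.get()
        translated[reply["id"]] = reply["tgt"]

    requests.put(None)
    worker.join()
    return [translated[i] for i in range(len(sentences))]
```

Carrying an id in each message is the design point worth noting: with a real broker, replies may arrive in a different order than the requests were sent, so the caller reassembles the document by id rather than by arrival order.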

anuvaad-language_translators's People

Contributors

imsatyamshandilya
