Git Product home page Git Product logo

lucy's Introduction

Lucy Logo

Lucy Overview

Lucy is an Entity Recognition engine which defines a simple yaml syntax for recognizing entities in text.

Core concept

Lucy's core concept is that:

Entities are cascading patterns of tokens and entities.

Lucy focuses on capturing your knowledge by focusing on token patterns.

Token patterns are a natural way of building a model, because it allows you to focus on the fragments of information you understand in way that is instantly testable and naturally builds up to more complex entities.

Key tenants

Tenant 1: Minimize concept count

Lucy has 2 core concepts:

  1. Name your Entities - name each entity in your system.
  2. Define their Patterns - Define entity patterns as simple sequences of tokens and other entities.

Tenant 2: Instant feedback

When authoring you want immediate feedback on the impact of your changes. Lucy acheives this by not requiring any training or even CLI tool to be invoked to see the results of your changes.

Tenant 3: Maximize your input

Traditional ML systems require you to "invent" many permutations of sentence structure and the laboriously label them for the system to understand your patterns.

Lucy focuses on a syntax which needs minimal changes to affect your desired outcome and maximizes the reuse.

Lucy's relationship to ML

The Lucy engine can be run on it's own and work perfectly fine, but it can also be coupled with existing LU engines such as LUIS or Orchestrator to great benefit.

  • External Entity Recognizer - Lucy can be used as a runtime source of external entities to a machine learning system such as LUIS.
  • Generate LU files - Lucy can be used to generate .LU files which can contain labeled examples and entity references.

Language support

There are 2 factors controlling language support in Lucy.

Tokenizers

Lucy uses off the shelf Lucene token analyzers to tokenize text into tokens. There are currently token analyzers for 29 lanuages:

  • Arabic Armenian Basque Catalan Chinese Czech Danish Dutch English Finnish French Galician German Greek Hindi Hungarian Chinese Japanese Korean Indonesian Irish Italian Latvian Norwegian Persian Portuguese Romanian Russian Spanish Swedish Turkish

Microsoft.Text.Recognizers

The Microsoft.Text.Recognizers libraries provide support for core entity types:

  • age boolean currency datetime dimension email guid hashtag ip mention number numberrange ordinal percentage phonenumber temperature url

Microsoft.Text.Recognizers support 15 languages

  • Arabic Bulgarian Chinese German English Spanish French Hindi Italian Japanese Korean Dutch Portuguese Swedish Turkish

See Microsoft.Text.Recognizers for a complete listing of languages and support.

lucy's People

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

lucy's Issues

entity extraction question

hi,

i hope to extract entities from below examples:

  • what did A think of X => extract A and X
  • what did A think of X when discussing Y => extract A, X and Y

i tried several ways of writing the Lucy patterns but it looks i can't handle these two cases at the same time. whenever there is the pattern "what did (speaker:___)+3 think of (x:___)*", the pattern "what did (speaker:___)+3 think of (x:___)* when discussing (y:___)*" will be ignored. is there a way to resolve this? thanks!

details of trials:
image

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.