
Odinson

Odinson can be used to rapidly query a natural language knowledge base and extract structured relations. Query patterns can be designed over surface features (e.g., #1), syntax (e.g., #2), or a combination of both (e.g., #3-5). These examples were executed over a collection of 8,479 scientific papers, corresponding to 1,105,737 sentences. Because queries execute quickly, a user can develop them interactively, receiving immediate feedback on the coverage and precision of the patterns at scale.

Project overview

Odinson supports several features:

  • Patterns over tokens, including boolean patterns over token features like lemma, POS tags, NER tags, chunk tags, etc.
  • Patterns over syntax by matching paths in a dependency graph. Note that this is actually agnostic to the tags on the graph edges, so it could be repurposed for matching over semantic roles or something else.
  • Named captures for extracting the different entities involved in a relation
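
To make the first and third features more concrete, here is a minimal Python sketch of boolean constraints over token features combined with named captures. It is a toy illustration only, not Odinson's query language or API: the `Token` class, the constraint helpers, and `match_sequence` are hypothetical names, and the lemmas and POS tags are hand-written for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    word: str
    lemma: str
    tag: str  # part-of-speech tag


# A constraint is a boolean test on a single token (hypothetical design).
Constraint = Callable[[Token], bool]


def lemma_is(value: str) -> Constraint:
    return lambda t: t.lemma == value


def tag_startswith(prefix: str) -> Constraint:
    return lambda t: t.tag.startswith(prefix)


def both(a: Constraint, b: Constraint) -> Constraint:
    # Boolean AND of two token constraints.
    return lambda t: a(t) and b(t)


# A pattern is a sequence of (capture name or None, constraint) steps.
Step = Tuple[Optional[str], Constraint]


def match_sequence(tokens: List[Token], pattern: List[Step]) -> List[Dict[str, str]]:
    """Return the named captures for every position where the pattern matches."""
    results = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(test(tok) for (_, test), tok in zip(pattern, window)):
            results.append(
                {name: tok.word for (name, _), tok in zip(pattern, window) if name}
            )
    return results


sentence = [
    Token("Smoking", "smoking", "NN"),
    Token("causes", "cause", "VBZ"),
    Token("cancer", "cancer", "NN"),
]

pattern = [
    ("cause", tag_startswith("N")),
    (None, both(lemma_is("cause"), tag_startswith("V"))),
    ("effect", tag_startswith("N")),
]

captures = match_sequence(sentence, pattern)
print(captures)
```

Each step pairs an optional capture name with a predicate, so the same mechanism covers both the token constraints and the extraction of the entities involved in a relation. A real system would evaluate such constraints against an index rather than scanning every sentence as this sketch does.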

And there are many more on the way:

  • Testing framework and extensive test suite
  • Better error messages
  • Better support for greedy and lazy quantifiers
  • Lookaround assertions
  • Support for an internal state to hold intermediate mentions, allowing for the application of patterns in a cascade
  • Support for grammars (similar to odin)
  • Filtering results by document metadata (e.g., authors, publication date)

We would also love to hear any questions, requests, or suggestions you may have.

It consists of several subprojects:

  • core: the core odinson library
  • extra: a few apps that we need but that don't really belong in core, for example because of licensing issues
  • backend: this is a REST API for odinson
  • ui: this is a webapp that we are building to interact with the system and visualize results to enable rapid development

The three apps in extra are:

  • AnnotateText: parses text documents using processors
  • IndexDocuments: reads the parsed documents and builds an odinson index
  • Shell: this is a shell where you can execute queries (we will replace this with the webapp soon)

Examples

We have made a few example queries to show how the system works. For this we used a collection of 8,479 scientific papers (1,105,737 sentences).

Example of a surface pattern for extracting causal relations.

This example shows odinson applying a pattern over surface features (i.e., words) to extract mentions of causal relations. Note that Odinson was able to find 3,774 sentences that match the pattern in 0.18 seconds.
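
The query used in the figure is not reproduced in this text, so the following is only a rough analogy: a Python regular expression with named groups applied to raw sentences. The rule, the group names, and the three sample sentences are assumptions made for illustration and do not come from the corpus or from Odinson's own syntax.

```python
import re

# Hypothetical surface rule: one word, the literal "causes", one word.
SURFACE_RULE = re.compile(r"(?P<cause>\w+) causes (?P<effect>\w+)")

# Made-up sentences, not taken from the corpus described above.
sentences = [
    "Smoking causes cancer.",
    "The causes of cancer are complex.",
    "Chronic stress causes inflammation in mice.",
]

# Collect the named captures of every match in every sentence.
mentions = [
    m.groupdict()
    for s in sentences
    for m in SURFACE_RULE.finditer(s)
]
print(mentions)
```

The second sentence produces a spurious match (cause "The", effect "of") because the rule sees only words and cannot tell the noun "causes" from the verb. This is the kind of imprecision that a purely surface-level rule is prone to.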

example 1

Example of a doubly-anchored Hearst pattern to extract hypernymy (i.e., X isA Y)

This example shows how Odinson can also use patterns over syntax. In this case it tries to find hypernym relations. It finds 10,562 matches in 0.37 seconds.

example 2

Example of how a surface pattern can be extended (using syntax patterns) to extract additional contextual information.

This example shows how surface and syntax can be combined in a single pattern. This pattern finds 12 sentences that match in our corpus of 1,105,737 sentences. It does this in 0.01 seconds.

example 3

Example of a causal pattern written over dependency syntax with a lexical trigger (i.e., cause).

This example shows how we can match over different aspects of tokens, lemmas in this example. Note that the ability to utilize syntax helps with the precision of the extractions (as compared with the overly simple surface rule above). Odinson finds 5,489 matches in 0.18 seconds.
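
As a rough sketch of the idea, the Python below anchors on a token whose lemma is "cause" and then follows labeled dependency edges to find its arguments. Everything here is an assumption for illustration: the hand-written parse, the `nsubj` and `dobj` labels, and the `follow` and `extract_causal` helpers are hypothetical and do not reflect Odinson's implementation or query syntax.

```python
from typing import Dict, List, Tuple

# Hand-written parse of "Smoking causes cancer" (an assumption for illustration).
words = ["Smoking", "causes", "cancer"]
lemmas = ["smoking", "cause", "cancer"]
# Each edge is (head index, label, dependent index).
edges: List[Tuple[int, str, int]] = [(1, "nsubj", 0), (1, "dobj", 2)]


def follow(edges: List[Tuple[int, str, int]], start: int, path: List[str]) -> List[int]:
    """Return token indices reachable from `start` via outgoing edges labeled as in `path`, in order."""
    frontier = {start}
    for label in path:
        frontier = {dep for (head, lab, dep) in edges if head in frontier and lab == label}
    return sorted(frontier)


def extract_causal(
    lemmas: List[str],
    words: List[str],
    edges: List[Tuple[int, str, int]],
    trigger_lemma: str = "cause",
) -> List[Dict[str, str]]:
    """Find trigger tokens by lemma, then read cause and effect off the graph."""
    results = []
    for i, lemma in enumerate(lemmas):
        if lemma != trigger_lemma:
            continue
        for c in follow(edges, i, ["nsubj"]):
            for e in follow(edges, i, ["dobj"]):
                results.append(
                    {"cause": words[c], "trigger": words[i], "effect": words[e]}
                )
    return results


relations = extract_causal(lemmas, words, edges)
print(relations)
```

The edge labels are treated as opaque strings, so the same traversal would work over any labeled graph, such as semantic roles. A phrase like "The causes of cancer" contains the trigger lemma but no subject or object edge from it, so this sketch returns nothing for it, unlike a word-only rule.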

example 4

Example of how more complex patterns can be developed, for example, to extract the polarity of a causal influence and a context in which it applies.

This is an example of a slightly more complex pattern. Odinson is able to apply it over our corpus and finds 228 matches in 0.04 seconds.

example 5

Web UI

We are also working on a web interface that will simplify debugging by displaying more information than the shell. This interface will allow us to display syntactic information when needed. We would also like to be able to interact with it to correct extractions or bootstrap patterns.

example 6

Contributors

marcovzla, myedibleenso
