liangkai / uncc2014watsonsim

This project forked from seantater/uncc2014watsonsim

Open-domain question answering system from UNC Charlotte

Home Page: http://blog.watsonphd.com

License: GNU General Public License v2.0

Shell 0.73% Makefile 0.98% C 23.69% Python 6.15% HTML 0.94% JavaScript 0.52% CSS 0.44% PLSQL 0.46% Java 65.52% Prolog 0.11% Scala 0.45%

Watsonsim Question Answering System

Quick Intro

Watsonsim works by running a pipeline of operations on questions, candidate answers, and their supporting passages. In many ways it is similar to IBM's Watson and Petr's YodaQA, and much less similar to logic-based systems such as OpenCog or Wolfram Alpha. Even so, there are significant differences from both Watson and YodaQA:

  • We don't use a standard UIMA pipeline, which is a product of our student-project history. Sometimes this is a hindrance but typically it has little impact. We suspect it reduces the learning overhead and boilerplate code.
  • Unlike YodaQA, we target Jeopardy! questions, but we do incorporate their method of Lexical Answer Type (LAT) checking, in addition to our own.
  • Our framework is rather heavyweight in terms of computation. Depending on which modules are enabled, answering a question takes anywhere from about 1 second to 2 minutes. Indri improves accuracy; it is now optional, but we highly recommend it. (We are also investigating alternatives.)
  • We include (relatively) large amounts of preprocessed article text from Wikipedia as our inputs. Be prepared to use about 100 GB of space if you want to run it at full power.
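To make the idea of "a pipeline of operations" concrete without a UIMA framework, here is a minimal sketch in plain Java. It is not Watsonsim's actual code: the class names `PipelineSketch`, `Answer`, and `Step`, the bonus value, and the type labels are all hypothetical. It shows candidates flowing through an ordered list of steps, including a simplified LAT check that boosts candidates whose known types match the expected answer type (supporting passages are omitted for brevity).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, minimal question-answering pipeline; not Watsonsim's real classes. */
public class PipelineSketch {

    /** A candidate answer with a score and the types (e.g. from DBPedia) it is known to have. */
    static final class Answer {
        final String text;
        final Set<String> types;
        double score;

        Answer(String text, double score, String... types) {
            this.text = text;
            this.score = score;
            this.types = new HashSet<>(Arrays.asList(types));
        }
    }

    /** One pipeline step: takes the question and current candidates, returns new candidates. */
    interface Step {
        List<Answer> apply(String question, List<Answer> candidates);
    }

    /** Simplified LAT check: add a bonus to candidates whose types include the expected type. */
    static Step latBoost(String lat, double bonus) {
        return (question, candidates) -> {
            List<Answer> out = new ArrayList<>();
            for (Answer a : candidates) {
                Answer copy = new Answer(a.text, a.score, a.types.toArray(new String[0]));
                if (a.types.contains(lat)) {
                    copy.score += bonus;
                }
                out.add(copy);
            }
            return out;
        };
    }

    /** Final step: sort candidates from highest to lowest score. */
    static Step rank() {
        return (question, candidates) -> {
            List<Answer> out = new ArrayList<>(candidates);
            out.sort(Comparator.comparingDouble((Answer a) -> a.score).reversed());
            return out;
        };
    }

    /** Run every step in order, feeding each step's output into the next. */
    static List<Answer> run(String question, List<Answer> candidates, List<Step> steps) {
        List<Answer> current = candidates;
        for (Step step : steps) {
            current = step.apply(question, current);
        }
        return current;
    }

    public static void main(String[] args) {
        String question = "This U.S. state is known as the Land of Lincoln.";
        List<Answer> candidates = Arrays.asList(
            new Answer("Abraham Lincoln", 0.9, "person"),
            new Answer("Illinois", 0.7, "state"));
        List<Answer> ranked = run(question, candidates, Arrays.asList(latBoost("state", 0.5), rank()));
        System.out.println(ranked.get(0).text); // Illinois (0.7 + 0.5 beats 0.9)
    }
}
```

Keeping each step as a plain function over a list is one way to avoid UIMA boilerplate; the trade-off is that you lose UIMA's standard annotation types and tooling.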

Installing the Simulator

  • Use git to clone this repository, as in: git clone https://github.com/SeanTater/uncc2014watsonsim.git
  • Install Java 8
  • Install the libSVM machine learning library (native)
  • Download Gradle (just unzip it; keep in mind it updates very often)
  • Download the latest data and place them in the data/ directory
  • Copy the configuration file config.properties.sample to config.properties and customize to your liking
  • Run gradle eclipse -Ptarget in uncc2014watsonsim/ to download platform-independent dependencies and create an Eclipse project.
  • Possibly enable some Optional Features
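After copying `config.properties.sample` to `config.properties`, the settings are ordinary `key = value` lines. The sketch below assumes they are read with the standard `java.util.Properties` class, which the README does not confirm; the class name `ConfigSketch` and the keys `example_db_url` and `example_use_indri` are hypothetical placeholders, not the project's real keys.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ConfigSketch {

    /** Load key=value settings from any reader, e.g. the contents of config.properties. */
    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    /** Load a config file, failing with a helpful message if it has not been created yet. */
    static Properties loadFromFile(Path path) throws IOException {
        if (!Files.exists(path)) {
            throw new IOException(path + " not found; copy config.properties.sample to config.properties first");
        }
        try (Reader r = Files.newBufferedReader(path)) {
            return load(r);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical keys for illustration only; see config.properties.sample for the real ones.
        String sample = "example_db_url = jdbc:sqlite:data/example.db\nexample_use_indri = false\n";
        Properties props = load(new StringReader(sample));
        boolean useIndri = Boolean.parseBoolean(props.getProperty("example_use_indri", "true"));
        System.out.println(props.getProperty("example_db_url") + " " + useIndri);
    }
}
```

In a real run you would call `loadFromFile(java.nio.file.Paths.get("config.properties"))` from the `uncc2014watsonsim/` directory.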

Running the Simulator

We recommend running the simulator with Gradle:

gradle run -Ptarget=WatsonSim

If you prefer, you can also use Eclipse. First, create a project:

gradle eclipse -Ptarget

Then you can run WatsonSim.java directly.

There are a few other features as well:

# Generate statistics reports for accuracy and other measurements
gradle run -Ptarget=scripts.ParallelStats
# Regenerate the Indri, Lucene, SemanticVectors, Bigram and Edge indices
gradle run -Ptarget=index.Reindex
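The statistics reports measure accuracy among other things. As a rough illustration of what such measurements involve, here is a hypothetical sketch computing top-1 accuracy and mean reciprocal rank (MRR) from ranked answer lists. It is not the code behind `scripts.ParallelStats`, whose exact metrics the README does not list; the class name `StatsSketch` and the case-insensitive exact-match rule are assumptions.

```java
import java.util.Arrays;
import java.util.List;

public class StatsSketch {

    /** 1-based rank of the gold answer in a ranked list, or 0 if it does not appear. */
    static int rankOf(List<String> ranked, String gold) {
        for (int i = 0; i < ranked.size(); i++) {
            if (ranked.get(i).equalsIgnoreCase(gold)) {
                return i + 1;
            }
        }
        return 0;
    }

    /** Fraction of questions whose top-ranked answer matches the gold answer. */
    static double accuracy(List<List<String>> rankings, List<String> golds) {
        int correct = 0;
        for (int q = 0; q < golds.size(); q++) {
            if (rankOf(rankings.get(q), golds.get(q)) == 1) {
                correct++;
            }
        }
        return golds.isEmpty() ? 0.0 : (double) correct / golds.size();
    }

    /** Mean reciprocal rank: average of 1/rank, counting a missing answer as 0. */
    static double meanReciprocalRank(List<List<String>> rankings, List<String> golds) {
        double sum = 0.0;
        for (int q = 0; q < golds.size(); q++) {
            int r = rankOf(rankings.get(q), golds.get(q));
            if (r > 0) {
                sum += 1.0 / r;
            }
        }
        return golds.isEmpty() ? 0.0 : sum / golds.size();
    }

    public static void main(String[] args) {
        List<List<String>> rankings = Arrays.asList(
            Arrays.asList("Illinois", "Abraham Lincoln"),
            Arrays.asList("Paris", "Rome"),
            Arrays.asList("Oslo"));
        List<String> golds = Arrays.asList("Illinois", "Rome", "Helsinki");
        System.out.println(accuracy(rankings, golds));           // one of three correct at rank 1
        System.out.println(meanReciprocalRank(rankings, golds)); // 0.5, i.e. (1 + 1/2 + 0) / 3
    }
}
```

MRR rewards a system that ranks the right answer second rather than missing it entirely, which top-1 accuracy alone does not capture.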

Technologies Involved

This list isn't exhaustive, but it should give a good overview:

  • Search
    • Text search from Lucene and Indri (Terrier upcoming)
    • Web search from Bing (Google is in the works)
    • Relational queries using PostgreSQL and SQLite
    • Linked data queries using Jena
  • Sources
    • Text from all the articles in Wikipedia, Simple Wikipedia, Wiktionary, and Wikiquotes
    • Linked data from DBPedia, used for LAT detection
    • Wikipedia pageviews organized by article
    • Source, target, and label from all links in Wikipedia
  • Machine learning with Weka and libSVM
  • Text parsing and dependency generation from CoreNLP and OpenNLP
  • Parsing logic in Prolog (with TuProlog)
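The machine learning step combines evidence from the sources above into a single confidence per candidate. The sketch below illustrates that idea with a hand-written logistic model; it is a stand-in, not Weka or libSVM and not the project's actual scorer. The class name `ScoreSketch`, the feature set (search score, log pageviews, LAT match), and the weights are all hypothetical; a real model would learn its weights from training questions.

```java
public class ScoreSketch {

    /** Logistic function mapping any real number into (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Confidence from a weighted sum of features plus a bias term, squashed by the sigmoid. */
    static double confidence(double[] weights, double bias, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features must have the same length");
        }
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        // Hypothetical features: [search score, log10(pageviews), LAT match (0 or 1)]
        // with hand-picked weights, for illustration only.
        double[] weights = {1.5, 0.4, 2.0};
        double bias = -3.0;
        double[] illinois = {0.7, Math.log10(50000), 1.0};
        double[] lincoln = {0.9, Math.log10(200000), 0.0};
        System.out.printf("Illinois: %.3f%n", confidence(weights, bias, illinois));
        System.out.printf("Lincoln:  %.3f%n", confidence(weights, bias, lincoln));
    }
}
```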

Notes:

  • Consider using PostgreSQL if you scale this project beyond a few cores or to any distributed environment. The project should support both engines (PostgreSQL and SQLite) nicely.
  • The data is sizable and growing, especially for statistics reports: 154.5 GB at the time of writing.
  • Can't find libindri-jni? Make sure you enabled Java and SWIG and had the right dependencies when compiling Indri.
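Switching between SQLite and PostgreSQL from Java usually comes down to the JDBC connection URL. This sketch only builds the URL string in the formats used by the PostgreSQL JDBC driver (`jdbc:postgresql://host:port/database`) and the common sqlite-jdbc driver (`jdbc:sqlite:path`); how Watsonsim actually selects its engine is not described here, so the class name `DbUrlSketch`, the engine names, and the example paths are hypothetical.

```java
import java.util.Locale;

public class DbUrlSketch {

    /** Build a JDBC URL for the chosen engine; engine names and locations are illustrative. */
    static String jdbcUrl(String engine, String location) {
        switch (engine.toLowerCase(Locale.ROOT)) {
            case "postgresql":
                // PostgreSQL JDBC driver format: jdbc:postgresql://host[:port]/database
                return "jdbc:postgresql://" + location;
            case "sqlite":
                // sqlite-jdbc format: jdbc:sqlite:<path to database file>
                return "jdbc:sqlite:" + location;
            default:
                throw new IllegalArgumentException("unsupported engine: " + engine);
        }
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("sqlite", "data/example.db"));
        System.out.println(jdbcUrl("postgresql", "localhost:5432/example"));
        // With the matching driver on the classpath, the URL would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```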

Contributors

adaava, bhavnasiyeshvant, csteph16, dhaval257, ipate258, jagan120, kenoverholt, pavan27, rahulpedduri, seantater, thestephenstanton, unimpossible, varshadevadas, walid-shalaby, wlodz