Git Product home page Git Product logo

question-answering-mx's Introduction

Crime and Punishment

A machine for federal crime sentences

By Carlos Grandet & Hector Salvador

Objective

Make an automatic question-answering program that finds criminal and administrative sentences (e.g. jail years, fines, political disablement) in Mexican federal laws.

Data sources

  • Active federal laws (~300, http://www.diputados.gob.mx/LeyesBiblio/index.htm). These are available as word files, and can be easily scraped from the website.
  • We will explore using sources such as Yahoo! Answers, online newspapers/news blogs, and other websites with relevant information (e.g. specialized websites on law)

Expected original work

According to Jurafsky and Martin (2016), there are three stages for question answering: question processing (decomposing the question into a query that can be then ?asked? to the documents), passage retrieval (take que query and the corpora, find relevant documents, find relevant passages, and pass these to the next step), and answer processing (returning the retrieved passage into a human-readable result).

Thus our project will do the following:

  1. Data generation
  • Scraper for the Congress website. A program that processes documents into pure text (e.g. remove headers, remove old decrees, remove page footnotes and numbering).
  1. Query processing
  • We will create a function that transforms a question into a query to apply Information Retrieval algorithms. We will limit it to questions related to fines and jail times and if there are no matches, suggest an alternative query.
  • Question classification should be very similar for all our queries. There exist some answer type taxonomies that can be built semi-automatically (Wordnet) but they can also be built manually.
  1. Passage retrieval
  • Use information retrieval techniques (e.g. DF-IDF) to obtain the relevant sections, articles, and/or paragraphs.
  • Thesaurus generation - Clustering words that are close enough in the text.
  1. Answer processing
  • Function that uses regular expressions and a trained model of answers to indicate if an answer is the right type of answer (year, fees, etc)

Evaluation of results

We will use Precision/Recall curves and Mean Average Precision to evaluate if an answer is the type of answer we are expecting and if it is correct. To do so, we will create some questions with answers to test our machine.

We will also use the Mean Reciprocal Rank, which is commonly used to evaluate the relevance of ranked results (according to the predicted relevance).

Timeline

Week 5:

  • Generate scraper and store data (Carlos)
  • Design IR methodology for passage retrieval (Hector)
  • Design algorithms for query formulation and answer processing (Hector)

Week 6 :

  • Generate thesaurus of related concepts to penalties and jail sentences (Carlos)
  • Implement and test function for question processing (Hector)

Mid-quarter check-in

  • Show working functions for question processing, show corpus of concepts related to penalties in the Mexican law. (Carlos & Hector)

Week 7 - 8 :

  • Use IR algorithms to retrieve queries from Mexican laws related to penalties and sentences (Carlos)
  • Implement and test function that processes answers (Hector)
  • Generate testing set of questions. (Carlos & Hector)

Week 9 - 10

  • Test and evaluate model and tune in through extensions (Carlos & Hector)
  • User Interface* (if time permits)

Final presentation

  • Present a program that asks a series of predetermined and non-predetermined questions and provides an answer. (Carlos & Hector)

Expected libraries that will be used

  • Shell scripts
  • BeautifulSoup + Requests (for scraping laws)
  • Pandas (Feature preprocessing)
  • Scikit-learn (for model cross-validation and other out-of-the-box ML tools)
  • Possibly others for nlp (e.g. nltk)

question-answering-mx's People

Contributors

hectorsalvador avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.