predom_sense's Introduction

This package contains scripts and Python tools for learning the predominant sense of a word using HDP (Hierarchical Dirichlet Process), a hierarchical topic model.

Directory Structure and Files

  • ComputeSenseRanking.py: program that computes sense ranking and predominant sense.
  • GenMacmillanSenses.py: program that generates Macmillan word distribution of senses.
  • GenWordnetSenses.py: program that generates WordNet word distribution of senses.
  • hdp_output: contains an example of the output generated by HDP.
  • lemmatiser_tools: contains OpenNLP and Morpha for tokenising and lemmatising the words when generating dictionary word distribution.
  • predom_data: contains example input files for running the program.
  • run_predom_sense.sh: script that drives the execution of the program.
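To give a feel for what a sense-ranking step might do, here is a minimal sketch that scores each dictionary sense against the HDP-induced topics. It is not taken from ComputeSenseRanking.py: the function names (`js_divergence`, `rank_senses`), the dict-based input format, and the choice of a Jensen-Shannon-based similarity weighted by topic frequency are all assumptions made for illustration, and the real script may work differently.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions.

    p and q are dicts mapping word -> probability (hypothetical format).
    The result lies in [0, 1]: 0 for identical, 1 for disjoint distributions.
    """
    div = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            div += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            div += 0.5 * qw * math.log2(qw / mw)
    return div


def rank_senses(senses, topics, topic_counts):
    """Rank senses by similarity to topics, weighted by topic frequency.

    senses:       dict sense_id -> word distribution (assumed input)
    topics:       dict topic_id -> word distribution (assumed input)
    topic_counts: dict topic_id -> number of usages assigned to that topic
    Returns a list of (sense_id, score) pairs, highest score first.
    """
    total = sum(topic_counts.values())
    scores = {}
    for sense_id, sense_dist in senses.items():
        scores[sense_id] = sum(
            (1.0 - js_divergence(sense_dist, topics[t])) * topic_counts[t] / total
            for t in topics
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented purely for illustration.
senses = {
    "bank_money": {"money": 0.5, "loan": 0.5},
    "bank_river": {"river": 0.5, "water": 0.5},
}
topics = {
    "t0": {"money": 0.5, "loan": 0.5},
    "t1": {"river": 0.5, "water": 0.5},
}
topic_counts = {"t0": 3, "t1": 1}

ranking = rank_senses(senses, topics, topic_counts)
predominant_sense = ranking[0][0]
```

In this toy setup the first entry of `ranking` plays the role of the predominant sense; with real data the sense distributions would come from the dictionary-side scripts and the topic distributions from the HDP output.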

Running the System

Prerequisites:

  • WordNet
  • HDP topic model (https://github.com/jhlau/hdp-wsi)
  • Python lxml

  1. Create lemmas.txt: a text file that contains the target lemmas of interest.
  2. Run HDP to induce senses for the target lemmas.
  3. If using the Macmillan dictionary, create a directory that contains XML files of word definitions. If using WordNet, make sure it is installed (i.e., the "wn word -over" command needs to work).
  4. Set up the parameters in run_predom_sense.sh and execute the script!
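Before running the script with WordNet, it can help to confirm that the "wn word -over" command from step 3 actually works. The following is a rough sketch of such a check; the helper name `wordnet_cli_available` and the test lemma "bank" are illustrative choices, not part of this package.

```python
import shutil
import subprocess


def wordnet_cli_available(lemma="bank", command="wn"):
    """Return True if `<command> <lemma> -over` runs and prints something.

    Hypothetical helper: it only checks that the command is on PATH and
    produces output. The exit status is deliberately ignored, on the
    assumption that it may not follow the zero-means-success convention.
    """
    if shutil.which(command) is None:
        return False
    try:
        result = subprocess.run(
            [command, lemma, "-over"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return bool(result.stdout.strip())


if __name__ == "__main__":
    print("WordNet CLI found:", wordnet_cli_available())
```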

Input Format

  • lemmas.txt: one line per lemma, in the format word.n (for nouns) or word.v (for verbs). E.g. "bank.n"
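The format above can be checked with a few lines of Python before running HDP. This is only a sketch: the function name `parse_lemmas` and the decision to skip blank lines and reject anything else are assumptions, not behaviour documented for this package.

```python
import re

# One lemma per line: a word, a dot, then "n" (noun) or "v" (verb).
LEMMA_RE = re.compile(r"^(?P<word>\S+)\.(?P<pos>[nv])$")


def parse_lemmas(lines):
    """Parse lines in the lemmas.txt format into (word, pos) tuples.

    Blank lines are skipped (an assumption); any other line that does not
    match word.n or word.v raises ValueError with its line number.
    """
    lemmas = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        match = LEMMA_RE.match(line)
        if match is None:
            raise ValueError(
                f"line {lineno}: expected word.n or word.v, got {line!r}"
            )
        lemmas.append((match.group("word"), match.group("pos")))
    return lemmas


# Example with in-memory lines; for a real file, pass the open file object:
#     with open("lemmas.txt") as handle:
#         lemmas = parse_lemmas(handle)
example = parse_lemmas(["bank.n\n", "\n", "run.v\n"])
```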

Licensing

Publications

  • Jey Han Lau, Paul Cook, Diana McCarthy, Spandana Gella and Timothy Baldwin (to appear). Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, USA.
  • Jey Han Lau, Paul Cook, Diana McCarthy, David Newman and Timothy Baldwin (2012). Word Sense Induction for Novel Sense Detection. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, pp. 591–601.
