
2-crf-mwe

This is the implementation of the double-chained CRF used to predict multiword expressions (MWEs) and supersenses, as described in the following paper:

UW-CSE at SemEval-2016 Task 10: Detecting multiword expressions and supersenses using double-chained conditional random fields. Mohammad Javad Hosseini, Noah A. Smith, and Su-In Lee. In Proceedings of the NAACL Workshop on Semantic Evaluations (SemEval 2016), San Diego, CA, June 2016.

We participated in SemEval 2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM). Our submitted models ranked first overall in the competition.

We have implemented a Conditional Random Field and a Double-Chained Conditional Random Field model for joint learning of multiword expressions and supersenses.
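As a rough illustration of what "double-chained" can mean, the sketch below assumes one chain of MWE tags and one chain of supersense labels, each with its own per-token and transition scores, plus a hypothetical cross-chain score linking the two labels at the same token. Under that assumption (which may not match the exact factor structure used in the paper), exact decoding reduces to ordinary Viterbi over label pairs. The function names, parameter names, and score tables are illustrative and are not this repository's API.

```python
import itertools


def joint_score(mwe, ss, unary_m, unary_s, trans_m, trans_s, cross):
    """Score one joint labeling: per-chain unary + transition scores + cross-chain scores."""
    total = 0.0
    for t in range(len(mwe)):
        # Per-token score for each chain plus the (hypothetical) cross-chain pair score.
        total += unary_m[t][mwe[t]] + unary_s[t][ss[t]] + cross[(mwe[t], ss[t])]
        if t > 0:
            # Each chain has its own first-order transition score.
            total += trans_m[(mwe[t - 1], mwe[t])] + trans_s[(ss[t - 1], ss[t])]
    return total


def decode(mwe_labels, ss_labels, unary_m, unary_s, trans_m, trans_s, cross):
    """Exact Viterbi over (MWE tag, supersense) pairs; assumes a non-empty sentence."""
    states = list(itertools.product(mwe_labels, ss_labels))

    def local(t, state):
        m, s = state
        return unary_m[t][m] + unary_s[t][s] + cross[(m, s)]

    def edge(prev, state):
        return trans_m[(prev[0], state[0])] + trans_s[(prev[1], state[1])]

    best = {st: local(0, st) for st in states}  # best score of a path ending in st
    backpointers = []
    for t in range(1, len(unary_m)):
        new_best, pointers = {}, {}
        for st in states:
            prev = max(states, key=lambda p: best[p] + edge(p, st))
            new_best[st] = best[prev] + edge(prev, st) + local(t, st)
            pointers[st] = prev
        best = new_best
        backpointers.append(pointers)

    # Follow backpointers from the best final state.
    last = max(states, key=lambda st: best[st])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [m for m, _ in path], [s for _, s in path], best[last]
```

Because each joint state is a pair, decoding cost grows with the square of (number of MWE tags) × (number of supersense labels): with the six MWE tags and the 41 supersenses plus a "no supersense" label, that is 6 × 42 = 252 states per token. Setting a cross-chain score to `float("-inf")` for pairs such as (`I`, any supersense) is one way to enforce that only the first word of an expression is labeled.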

Feature extraction is based on AMALGrAM 2.0 (A Machine Analyzer of Lexical Groupings And Meanings), and the software dependencies are the same as those of AMALGrAM 2.0.

Software

  • Python 2.7
  • Cython (tested on 0.21.1)
  • NLTK 3.0.2+ with the WordNet resource installed

Running:

After downloading the code and installing the software listed above, you can run the scripts in the scripts folder to replicate the paper's results and/or test on new data. The best model is Double_CRF_open.sh.

Tagging Scheme

Multiword Expressions:

The annotation for MWEs extends the conventional BIO scheme to include gappy MWEs with one level of nesting. Segmentations are represented using six tags; the lower-case variants indicate that an expression is within another MWE’s gap.

  • O and o: a single-word expression
  • B and b: the first word of a MWE
  • I and i: a word continuing a MWE
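The well-formedness constraints implied by this scheme can be checked with a first-order transition table. The sketch below assumes that "one level of nesting" means an expression inside a gap is contiguous (b followed by one or more i) and cannot open a gap of its own. `ALLOWED_NEXT` and `is_valid_mwe_tagging` are hypothetical names, and whether 2-crf-mwe enforces the constraints this way is an assumption.

```python
# Allowed successors of each tag, including virtual START and END states.
ALLOWED_NEXT = {
    "START": {"O", "B"},
    "O": {"O", "B", "END"},
    "B": {"I", "o", "b"},                 # an MWE needs at least one I; a gap may open at once
    "I": {"I", "O", "B", "o", "b", "END"},
    "o": {"o", "b", "I"},                 # inside a gap; the gap must be closed by I
    "b": {"i"},                           # a nested MWE inside a gap is contiguous
    "i": {"i", "o", "b", "I"},
}


def is_valid_mwe_tagging(tags):
    """Return True if `tags` is a well-formed sequence in the six-tag scheme."""
    prev = "START"
    for tag in tags:
        if tag not in ALLOWED_NEXT[prev]:  # also rejects unknown tags
            return False
        prev = tag
    # The sentence may not end inside an unfinished MWE or an open gap.
    return "END" in ALLOWED_NEXT[prev]
```

For "look the word up", `is_valid_mwe_tagging(["B", "o", "o", "I"])` returns `True`, while `["B", "O", "I"]` returns `False` because an outside token cannot interrupt an MWE.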

Supersenses:

Each noun or verb expression is also annotated with a supersense; there are 26 supersenses for nouns and 15 for verbs. Only the first word of a MWE receives a supersense tag.
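Since only the first word carries the supersense, recovering labeled expressions means grouping tokens by MWE and reading the label from each expression's first token. A minimal sketch, assuming a well-formed tag sequence and supersense labels given as a parallel list (`None` for unlabeled tokens). The label strings in the example are WordNet lexicographer-file names such as `verb.cognition`; the DiMSUM files may spell supersenses differently (see dimsum-data-1.5/TAGSET.md). `group_expressions` is a hypothetical helper.

```python
def group_expressions(tags, senses):
    """Return (token_indices, supersense) per expression, ordered by first token.

    Assumes `tags` is a well-formed six-tag sequence and `senses` is parallel to it.
    """
    expressions = []
    open_top = -1     # position in `expressions` of the current top-level MWE
    open_nested = -1  # position of the current MWE inside a gap
    for idx, tag in enumerate(tags):
        if tag in ("O", "o", "B", "b"):
            # Expression-initial token: start a new expression with its supersense.
            expressions.append(([idx], senses[idx]))
            if tag == "B":
                open_top = len(expressions) - 1
            elif tag == "b":
                open_nested = len(expressions) - 1
        elif tag == "I":
            expressions[open_top][0].append(idx)
        elif tag == "i":
            expressions[open_nested][0].append(idx)
        else:
            raise ValueError("unknown MWE tag: %r" % tag)
    return expressions
```

For "look the word up", `group_expressions(["B", "o", "o", "I"], ["verb.cognition", None, "noun.communication", None])` returns `[([0, 3], "verb.cognition"), ([1], None), ([2], "noun.communication")]`: the gappy MWE "look ... up" keeps the supersense of its first word.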

The input must be split into sentences, tokenized into words, and part-of-speech tagged with the Penn Treebank POS tagset.
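Before tagging new data, it can help to confirm that every POS tag comes from the Penn Treebank tagset. A minimal sketch, assuming each sentence is a list of (word, tag) pairs; `PTB_TAGS` and `non_ptb_tokens` are hypothetical names. The set covers the 36 standard PTB tags plus punctuation tags; taggers differ on a few conventions (for example `(` vs. `-LRB-`, both accepted here), so adjust it to match your tagger.

```python
# The 36 Penn Treebank POS tags, followed by the PTB punctuation tags
# (both bracket spellings are accepted).
PTB_TAGS = frozenset("""
CC CD DT EX FW IN JJ JJR JJS LS MD NN NNS NNP NNPS PDT POS PRP PRP$ RB RBR RBS
RP SYM TO UH VB VBD VBG VBN VBP VBZ WDT WP WP$ WRB
# $ `` '' , . : ( ) -LRB- -RRB-
""".split())


def non_ptb_tokens(sentence):
    """Return (index, word, tag) for each token whose tag is not in PTB_TAGS.

    `sentence` is assumed to be a list of (word, tag) pairs.
    """
    return [(i, word, tag) for i, (word, tag) in enumerate(sentence) if tag not in PTB_TAGS]
```

For example, `non_ptb_tokens([("He", "PRON"), ("left", "VBD")])` returns `[(0, "He", "PRON")]`, flagging a tag from a different tagset.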

Please refer to dimsum-data-1.5/TAGSET.md for more details.

Data:

The datasets are in the dimsum-data-1.5 folder, which contains a readme file explaining the data format. To predict on new data, format the input as described there. Our original submission is in the submitted_results folder.
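For loading the data in Python, a minimal reader is sketched below. It assumes one token per line with tab-separated fields and a blank line between sentences; the column positions in `ASSUMED_COLUMNS` are assumptions to be checked against the readme in dimsum-data-1.5, not values taken from it, and `read_sentences` is a hypothetical helper rather than part of this repository.

```python
# Assumed 0-based column positions; verify against the readme in dimsum-data-1.5.
ASSUMED_COLUMNS = {"word": 1, "lemma": 2, "pos": 3, "mwe": 4, "supersense": 7}


def read_sentences(lines, columns=ASSUMED_COLUMNS):
    """Group tab-separated token lines into sentences separated by blank lines.

    Each token becomes a dict mapping column names to field values
    (missing trailing fields become empty strings).
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        current.append({name: fields[idx] if idx < len(fields) else ""
                        for name, idx in columns.items()})
    if current:
        sentences.append(current)
    return sentences
```

Any iterable of lines works, so an open file object can be passed directly, e.g. `with open(path) as f: sentences = read_sentences(f)`.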

Please email the first author ([email protected]) with any questions or requests.
