AntonymDetection

An implementation of "Word Embedding-based Antonym Detection using Thesauri and Distributional Information" (NAACL-HLT 2015).

This implementation is based on GloVe and ivLBL, and uses the SpaceSaving algorithm. Please refer to the following papers for details.

  • Masataka Ono, Makoto Miwa, and Yutaka Sasaki. "Word Embedding-based Antonym Detection using Thesauri and Distributional Information", NAACL/HLT-2015, 2015.
  • Metwally, Ahmed, Divyakant Agrawal, and Amr El Abbadi. "Efficient computation of frequent and top-k elements in data streams." Database Theory-ICDT 2005. Springer Berlin Heidelberg, 2005. 398-412.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. "GloVe: Global Vectors for Word Representation.", EMNLP 2014, 2014.
  • Andriy Mnih and Koray Kavukcuoglu. "Learning word embeddings efficiently with noise-contrastive estimation.", NIPS 26, 2013.
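
For readers unfamiliar with SpaceSaving, the following is a minimal Python sketch of the algorithm described by Metwally et al. It is not the C++ code in this repository, and the names space_saving and top_n are hypothetical. It finds the minimum counter with a linear scan for brevity, whereas the paper uses a dedicated data structure to do this efficiently.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def space_saving(
    stream: Iterable[Hashable], capacity: int
) -> Dict[Hashable, Tuple[int, int]]:
    """Approximate item counts using at most `capacity` counters.

    Returns {item: (estimated_count, max_overestimation)}.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")

    counters: Dict[Hashable, List[int]] = {}
    for item in stream:
        if item in counters:
            # Monitored item: increment its counter.
            counters[item][0] += 1
        elif len(counters) < capacity:
            # Free slot: start monitoring with an exact count.
            counters[item] = [1, 0]
        else:
            # No free slot: evict the item with the smallest count.
            # The newcomer inherits that count plus one, and records
            # the inherited part as its possible overestimation.
            victim = min(counters, key=lambda k: counters[k][0])
            min_count = counters.pop(victim)[0]
            counters[item] = [min_count + 1, min_count]
    return {item: (count, error) for item, (count, error) in counters.items()}


def top_n(
    counters: Dict[Hashable, Tuple[int, int]], n: int
) -> List[Tuple[Hashable, Tuple[int, int]]]:
    """Return the n monitored items with the largest estimated counts."""
    return sorted(counters.items(), key=lambda kv: -kv[1][0])[:n]


if __name__ == "__main__":
    words = ["the"] * 5 + ["cat"] * 3 + ["sat", "on", "mat"]
    for word, (count, error) in top_n(space_saving(words, capacity=3), 2):
        print(word, count, error)
```

The estimated count of a monitored item never falls below its true count, and the memory used is bounded by capacity regardless of the vocabulary size, which is what makes the approach attractive for counting words in large corpora.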

Prerequisites

  • Eigen 3.0 or later
  • gzstream
  • Boost 1.55 or later
  • g++ 4.8 or later
  • 2 GB or more of RAM
  • 10 GB or more of disk space (temporary files are stored in tmp/)
  • a machine capable of running 10 or more threads (please change the code if your machine supports fewer)

Compilation:

make

Usage:

Using a pretrained model

wget http://tti-coin.jp/data/antonym_model.tar.gz
tar xzf antonym_model.tar.gz
./TestAntonym model/100000.wc.bin model/we-td.model data/antonym/devset.txt

Training a new model

  1. Prepare Data

    • put gzipped text files (one sentence per line) into a directory (e.g., texts)
    • put your antonym dictionary into the data/dict directory, replacing the existing one
  2. Count the top-N most frequent words (optionally with a minimum count, MIN_COUNT)

    ./WordCount TEXT N FREQFILE [MIN_COUNT]
    (e.g., ./WordCount texts/ 100000 100000.wc.bin 0)
    
  3. Count co-occurrences within each sentence

    ./SequenceCount DATA FREQFILE COOCCFILE WINDOW_SIZE WEIGHT
    (WEIGHT: 2 = ivLBL, 1 = GloVe, 0 = no weighting)
    (e.g., ./SequenceCount data 100000.wc.bin 100000.seq.bin 5 2) 
    
  4. Train the antonym model on the co-occurrences and the dictionary

    ./TrainAntonym FREQFILE COOCCFILE MODEL DIM ITER
    (e.g., ./TrainAntonym 100000.wc.bin 100000.seq.bin we-td.model 100 20)
    
  5. Test the antonym model

    ./TestAntonym FREQFILE MODEL TEST_FILE
    (e.g., ./TestAntonym 100000.wc.bin we-td.model data/antonym/devset.txt)
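
To make step 3 more concrete, here is a minimal Python sketch of windowed co-occurrence counting within a sentence. It is an illustration only and not the SequenceCount implementation: the function name count_cooccurrences and its distance_weighting flag are hypothetical. The 1/d weighting (a pair that is d words apart contributes 1/d) follows the description in the GloVe paper; whether WEIGHT=1 does exactly this, and what WEIGHT=2 computes, should be checked against the source code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def count_cooccurrences(
    sentences: Iterable[List[str]],
    vocab: Set[str],
    window_size: int,
    distance_weighting: bool = False,
) -> Dict[Tuple[str, str], float]:
    """Count symmetric co-occurrences of in-vocabulary words.

    Windows never cross sentence boundaries. Out-of-vocabulary words are
    skipped but still occupy a position (an assumption of this sketch).
    """
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for sentence in sentences:
        for i, target in enumerate(sentence):
            if target not in vocab:
                continue
            # Look back at most window_size positions within the sentence.
            for j in range(max(0, i - window_size), i):
                context = sentence[j]
                if context not in vocab:
                    continue
                distance = i - j
                weight = 1.0 / distance if distance_weighting else 1.0
                # Record the pair in both directions.
                counts[(target, context)] += weight
                counts[(context, target)] += weight
    return dict(counts)


if __name__ == "__main__":
    sentences = [["hot", "and", "cold"]]
    vocab = {"hot", "and", "cold"}
    pairs = count_cooccurrences(sentences, vocab, 2, distance_weighting=True)
    for pair, weight in sorted(pairs.items()):
        print(pair, weight)
```

Here vocab plays the role of the top-N word list produced in step 2, and window_size corresponds to WINDOW_SIZE.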
    

Notes

Please cite our paper when using this tool.

  • Masataka Ono, Makoto Miwa and Yutaka Sasaki. Word Embedding-based Antonym Detection using Thesauri and Distributional Information. In the Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT 2015). pp. 984--989, June 2015.

When using the resources under data, please check and follow the original license.
