emo_lex

Experiment runner for the paper "Cross-Lingual Emotion Lexicon Induction using Representation Alignment in Low-Resource Settings".

Usage

Setup

  1. Clone this repo.
  2. Clone fastText, multilingual-nlm and vecmap.
  3. Install the Python packages numpy, cupy, torch and pot.

Embedding Alignment

Some conventions:

  • Languages are referred to by their 3-letter ISO code.
  • Each Bible file should be named <iso>.txt, e.g. Spanish would be spa.txt.
  • Each line in a Bible file should contain 1 sentence/verse. The text should be pre-processed so that it is lowercased and consists of space-separated words (no punctuation unless it's in the middle of a word, e.g. hyphenation).
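
The pre-processing convention above could be sketched roughly as follows. This is only an illustration and not part of the repo: the function name preprocess_verse is hypothetical, and "punctuation" is assumed to mean any character that is neither a letter, digit, underscore nor whitespace.

```python
import re

# Punctuation at the start or end of a token (assumption: "punctuation" means
# anything that is not a word character in the regex sense).
_EDGE_PUNCT = re.compile(r"^\W+|\W+$")


def preprocess_verse(line: str) -> str:
    """Lowercase a verse and drop punctuation unless it is inside a word."""
    tokens = []
    for raw in line.lower().split():
        # Strip punctuation from both ends; keep mid-word characters like '-'.
        token = _EDGE_PUNCT.sub("", raw)
        if token:  # tokens made only of punctuation disappear entirely
            tokens.append(token)
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess_verse("In the beginning, God created a well-known world."))
```

Applying it to every line of a raw file and writing the result to <iso>.txt would give a file in the expected shape, assuming the raw file already has one verse per line.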

The 3 algorithms used for embedding alignment:

Wasserstein-Procrustes

python align.py \
  --langs <space-separated list of languages> \
  --bible_dir <directory with Bibles> \
  --align_dir <directory to save aligned embeddings> \
  --emb_dir <directory to save initial unaligned fasttext embeddings> \
  --num_gpus <number of GPUs to use> \
  --algorithm fb \
  --fasttext_dir <path to fastText clone>

Neural Language Model

python align.py \
  --langs <space-separated list of languages> \
  --bible_dir <directory with Bibles> \
  --align_dir <directory to save aligned embeddings> \
  --num_gpus <number of GPUs to use> \
  --algorithm nlm \
  --nlm_dir <path to nlm clone> \
  --nlm_preproc_dir <directory to save preprocessed stuff like vocabulary>  \
  --nlm_preprocess <flag to not skip preprocessing, can remove after running once> \
  --nlm_modified <run modified version, not original>

Orthogonal Refinement

python align.py \
  --langs <space-separated list of languages> \
  --bible_dir <directory with Bibles> \
  --sid_bible_dir <directory with Bibles with first column having sentence ID> \
  --align_dir <directory to save aligned embeddings> \
  --emb_dir <directory to save initial unaligned fasttext embeddings> \
  --num_gpus <number of GPUs to use> \
  --algorithm vecmap \
  --vecmap_dir <path to vecmap clone> \
  --fasttext_dir <path to fastText clone>

The sentence ID Bibles are like the normal Bibles, except that each line is prefixed with a sentence ID followed by a tab. The sentence ID for translations of the same sentence across different languages should be the same.

Omit the --sid_bible_dir argument to run the original vecmap algorithm.
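
As an illustration of the sentence ID format described above, the following sketch turns a normal Bible into a sentence ID Bible. It is not taken from the repo: add_sentence_ids is a hypothetical helper, and falling back to the line index as the ID is an assumption that only makes sense when all Bibles are already verse-aligned line by line. Otherwise, pass explicit IDs (for example, verse references) that are shared across languages.

```python
def add_sentence_ids(lines, ids=None):
    """Prefix each verse with '<sentence ID><TAB>'.

    lines: iterable of verses (trailing newlines are stripped).
    ids:   optional list of sentence IDs, one per verse. If omitted, the
           0-based line index is used (assumption: files are line-aligned).
    """
    verses = [line.rstrip("\n") for line in lines]
    if ids is None:
        ids = [str(i) for i in range(len(verses))]
    if len(ids) != len(verses):
        raise ValueError("need exactly one sentence ID per verse")
    return [f"{sid}\t{verse}" for sid, verse in zip(ids, verses)]


if __name__ == "__main__":
    for row in add_sentence_ids(["en el principio\n", "y la tierra\n"]):
        print(row)
```

Using the same IDs for every language is what lets translations of the same sentence be matched up.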

Emotion Lexicon Induction and Evaluation

python eval.py \
  --langs <space-separated list of languages> \
  --align_dir <directory with aligned embeddings from previous step> \
  --exp_id <experiment ID> \
  --trans_dir <directory with ground-truth word translations> \
  --emo_lex_dir <directory with ground-truth emotion lexicons> \
  --nns_dir <directory to save derived word translations to> \
  --reports_dir <directory to save evaluation reports and induced emotion lexicons>

  • The paths provided to nns_dir and reports_dir are suffixed with exp_id, so multiple runs can share the same paths and differ only in their experiment IDs.
  • The ground-truth word translation files should be named <src_iso>_<tgt_iso>.txt, e.g. spa_eng.txt for Spanish-to-English translations. They should be tab-separated files, with the first column being a word in the source language and the second column being the translation of the word into the target language.
  • The ground-truth emotion lexicons should be named <iso>.txt. They can be obtained from the NRC EIL webpage.
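
To make the translation file format above concrete, here is a small sketch that builds the expected file name and parses the tab-separated pairs. Both helpers (translation_filename, parse_translations) are hypothetical and are not functions from eval.py. Collecting several translations per source word into a set, and skipping lines without a tab, are assumptions.

```python
from collections import defaultdict


def translation_filename(src_iso: str, tgt_iso: str) -> str:
    """File name for source-to-target translations, e.g. spa_eng.txt."""
    return f"{src_iso}_{tgt_iso}.txt"


def parse_translations(lines):
    """Map each source word to the set of its target-language translations."""
    table = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # assumption: skip blank or malformed lines
        src, tgt = parts[0], parts[1]
        table[src].add(tgt)
    return dict(table)


if __name__ == "__main__":
    print(translation_filename("spa", "eng"))
    print(parse_translations(["casa\thouse\n", "casa\thome\n", "perro\tdog\n"]))
```

A quick check like this before running eval.py can catch files that use spaces instead of tabs or have the columns swapped.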

To run only the lexicon induction and skip the evaluation:

python eval.py \
  --langs <space-separated list of languages> \
  --align_dir <directory with aligned embeddings from previous step> \
  --exp_id <experiment ID> \
  --nns_dir <directory to save derived word translations to> \
  --reports_dir <directory to save evaluation reports and induced emotion lexicons> \
  --skip_eval
