
lexcomp

Evaluating Text Representations on Lexical Composition

Dependencies

Downloading Data:

Download the pre-trained models using bash download.sh.

The VPC classification and LVC classification tasks require a copy of the British National Corpus (BNC). Please download the XML version from here and update its path in the JSON files.

Once you have done so, extract the sentences themselves:

python preprocessing/get_sentences_from_bnc.py \
    [/path/to/corpora]/bnc/2554/download/Texts/ \
    diagnostic_classifiers/data/vpc_classification/ \
    diagnostic_classifiers/data/vpc_classification
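The README does not describe what get_sentences_from_bnc.py does internally. As a rough illustration only, the sketch below shows one way to turn sentences from a BNC XML file into plain strings. It assumes the BNC XML layout in which each sentence is an <s> element, words are <w> elements and punctuation is <c> elements, with whitespace stored inside the element text. The function name bnc_sentences is hypothetical, and if your files declare an XML namespace the tag name "s" would need to be qualified.

```python
import xml.etree.ElementTree as ET


def bnc_sentences(root):
    """Yield every <s> element under `root` as a whitespace-normalised string.

    Assumption (not taken from this repository): words and punctuation are
    nested <w>/<c> elements whose text already carries trailing spaces.
    """
    for s in root.iter("s"):
        # itertext() walks all nested text nodes in document order.
        text = "".join(s.itertext())
        # Collapse runs of whitespace and strip the ends.
        yield " ".join(text.split())


# Usage on a real file (path is a placeholder):
# for sentence in bnc_sentences(ET.parse("/path/to/file.xml").getroot()):
#     print(sentence)
```

The repository's script also receives a dataset directory, so it presumably does more than plain extraction; this sketch covers only the XML-to-text step.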

Running experiments:

To train all the models for a given task, e.g. NC literality, run:

bash diagnostic_classifiers/experiments/nc_literality/train.sh 

To evaluate:

bash diagnostic_classifiers/experiments/nc_literality/evaluate.sh

To get the predictions for the test set:

bash diagnostic_classifiers/experiments/nc_literality/predict.sh
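The three steps above run in a fixed order: train, then evaluate, then predict. A minimal Python driver, assuming only the script layout shown in the commands above, can run them for one task and stop at the first failure, so evaluation never runs against a model that failed to train. The helper names step_commands and run_task are hypothetical. Only nc_literality is confirmed as a task directory name here.

```python
import subprocess
from pathlib import Path

# Layout taken from the commands above; other task names are up to you.
EXPERIMENTS_DIR = Path("diagnostic_classifiers/experiments")
STEPS = ("train", "evaluate", "predict")


def step_commands(task):
    """Return one `bash <script>` command per step, in execution order."""
    return [["bash", str(EXPERIMENTS_DIR / task / f"{step}.sh")] for step in STEPS]


def run_task(task, dry_run=False):
    """Run train, evaluate and predict for `task`.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = step_commands(task)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed step stops the remaining steps.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_task("nc_literality", dry_run=True)
```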

Adding a new task:

You will need to create a directory under diagnostic_classifiers/experiments containing the JSON files that specify the architecture and hyper-parameters. Each model requires a DatasetReader, a Model, and a Predictor. You can use the ones implemented in this repository or implement new ones according to the specific model's needs.
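For orientation, here is a sketch of the general shape of such a JSON file, built as a Python dict and written with json.dump. It assumes the AllenNLP 0.x configuration format, with top-level keys such as dataset_reader, train_data_path, model, iterator and trainer. The registered type names, paths and hyper-parameter values are hypothetical placeholders, not values from this repository; compare against the existing files under diagnostic_classifiers/experiments before relying on it.

```python
import json


def make_config(reader_type, model_type, train_path, dev_path, test_path):
    """Return an AllenNLP 0.x-style experiment configuration as a dict.

    reader_type and model_type are placeholders: they must match the names
    your DatasetReader and Model subclasses are registered under.
    """
    return {
        "dataset_reader": {"type": reader_type},
        "train_data_path": train_path,
        "validation_data_path": dev_path,
        "test_data_path": test_path,
        "evaluate_on_test": True,
        # Model-specific keys (embedder, encoder, layer sizes) also go here.
        "model": {"type": model_type},
        "iterator": {"type": "basic", "batch_size": 32},
        # Illustrative hyper-parameters only.
        "trainer": {
            "num_epochs": 50,
            "patience": 5,
            "cuda_device": -1,  # -1 means train on CPU
            "optimizer": {"type": "adam"},
        },
    }


def write_config(config, path):
    """Write the configuration as indented JSON."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

The Predictor does not appear in the training configuration. In AllenNLP it is chosen at prediction time, via the --predictor option of allennlp predict.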

See the AllenNLP tutorial for additional instructions on configuring models.

If you'd like to create new data, follow the preprocessing instructions.

Adding a new representation:

You will need to implement a new TokenIndexer and a TokenEmbedder (or TextFieldEmbedder). The former maps a sequence of words to their IDs, and the latter maps those IDs to vectors. Look at the implementations in this repository and in the AllenNLP repository, and read the documentation there.
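To make the split of responsibilities concrete, here is a framework-free toy in plain Python. It does not use the AllenNLP API. The indexer turns words into integer IDs, with ID 0 reserved for unknown words, and the embedder looks each ID up in a randomly initialised table. The class names and the "<unk>" token are hypothetical.

```python
import random


class ToyTokenIndexer:
    """Maps words to integer IDs (the role a TokenIndexer plays)."""

    def __init__(self, vocab):
        # ID 0 is reserved for out-of-vocabulary words.
        self.word_to_id = {"<unk>": 0}
        for word in vocab:
            self.word_to_id.setdefault(word, len(self.word_to_id))

    def tokens_to_ids(self, tokens):
        return [self.word_to_id.get(token, 0) for token in tokens]


class ToyTokenEmbedder:
    """Maps integer IDs to vectors (the role a TokenEmbedder plays)."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        # One randomly initialised row per ID; a real embedder would load
        # pre-trained weights or compute contextual vectors instead.
        self.table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def __call__(self, ids):
        return [self.table[i] for i in ids]


indexer = ToyTokenIndexer(["olive", "oil", "baby"])
ids = indexer.tokens_to_ids(["olive", "oil", "spoon"])  # "spoon" is unknown
embedder = ToyTokenEmbedder(vocab_size=len(indexer.word_to_id), dim=5)
vectors = embedder(ids)
```

This toy is static: the same ID always yields the same vector. Contextualised representations instead compute each token's vector from the whole sequence, which is why their embedders need access to the full input rather than a per-ID lookup.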

You will also need to add a JSON file for the task + representation combination and add the command to the train/evaluate/predict bash files.
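If the bash files invoke the AllenNLP command-line tool directly, which is an assumption since their contents are not shown here, each new task and representation combination adds one line per step. The helper below generates the training line. It uses the real allennlp train options -s (serialization directory) and --include-package. The JSON naming convention ({representation}.json), the output directory and the package name are hypothetical, so adjust them to whatever the existing train.sh uses.

```python
from pathlib import Path


def train_command(task, representation,
                  experiments_dir="diagnostic_classifiers/experiments",
                  output_dir="output",
                  package="diagnostic_classifiers"):
    """Build an `allennlp train` line for one task/representation pair.

    Hypothetical conventions: configs live at <experiments_dir>/<task>/
    <representation>.json and models are saved under <output_dir>/<task>/
    <representation>.
    """
    config = Path(experiments_dir) / task / f"{representation}.json"
    serialization_dir = Path(output_dir) / task / representation
    return (f"allennlp train {config} -s {serialization_dir} "
            f"--include-package {package}")


if __name__ == "__main__":
    # Append the printed line to the task's train.sh.
    print(train_command("nc_literality", "my_representation"))
```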

Citation

Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition

Vered Shwartz and Ido Dagan. arXiv 2019.
