corefud-baseline's Introduction

A simplified multilingual coreference baseline model for CorefUD

Based on https://github.com/ufal/crac2022-corpipe

Setup

  1. Fetch data.
    cd data
    chmod +x get.sh
    ./get.sh
    • To train on fewer languages, either edit get.sh accordingly or delete or move the unwanted language folders after downloading.
  2. Convert all data to jsonlines (if needed).
    cd data_handling
    chmod +x corefud_convert.sh
    ./corefud_convert.sh
  3. Train the model in src/models/simple-corpipe. Example:
    cd src/models/simple-corpipe
    python train.py --langs germanic
    # or with a specific language, e.g. no_bokmaalnarc:
    python train.py --langs no_bokmaalnarc
    The first command trains on the "germanic" languages, as defined in the following groups:
    romance_langs = "ca_ancora es_ancora fr_democrat".split()
    germanic_langs = "de_parcorfull de_potsdamcc en_gum en_parcorfull no_bokmaalnarc no_nynorsknarc".split()
    slavic_baltic_langs = "cs_pcedt cs_pdt pl_pcc lt_lcc ru_rucor".split()
    urgic_turkic_langs = "hu_korkor hu_szegedkoref tr_itcc".split()
    
    langs_dict = {
        "romance": romance_langs,
        "germanic": germanic_langs,
        "slavic": slavic_baltic_langs,
        "urgic": urgic_turkic_langs,
        "all": langs,
    }
    
    Omitting --langs defaults to "all", which requires data for every language to be present in the data folder.
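The group lookup described above can be sketched as follows. This is a minimal illustration, not the code in train.py: the four group lists are copied from the definitions above, while the `langs` union and the `resolve_langs` helper are assumptions introduced here to show how a group name, a single language name, or an empty argument list might be expanded.

```python
# Language groups copied from the definitions above.
romance_langs = "ca_ancora es_ancora fr_democrat".split()
germanic_langs = "de_parcorfull de_potsdamcc en_gum en_parcorfull no_bokmaalnarc no_nynorsknarc".split()
slavic_baltic_langs = "cs_pcedt cs_pdt pl_pcc lt_lcc ru_rucor".split()
urgic_turkic_langs = "hu_korkor hu_szegedkoref tr_itcc".split()

# Assumption: "all" is the union of the four groups. The real `langs`
# may instead be derived from the contents of the data folder.
langs = romance_langs + germanic_langs + slavic_baltic_langs + urgic_turkic_langs

langs_dict = {
    "romance": romance_langs,
    "germanic": germanic_langs,
    "slavic": slavic_baltic_langs,
    "urgic": urgic_turkic_langs,
    "all": langs,
}


def resolve_langs(requested):
    """Hypothetical helper: expand group names, keep single language names.

    An empty request falls back to the "all" group.
    """
    if not requested:
        return list(langs_dict["all"])
    resolved = []
    for name in requested:
        # A name that is not a group key is treated as a single language.
        for lang in langs_dict.get(name, [name]):
            if lang not in resolved:
                resolved.append(lang)
    return resolved


print(resolve_langs(["germanic"]))
print(resolve_langs(["no_bokmaalnarc"]))
print(len(resolve_langs([])))
```

Treating an unknown name as a single language keeps both invocations shown above (`--langs germanic` and `--langs no_bokmaalnarc`) on the same code path.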

Training configuration:

Argument               Default           Type       Description
--langs                []                List[str]  Languages to train on.
--batch_size           16                int        Batch size.
--bert                 xlm-roberta-base  str        Bert model.
--debug                False             bool       Debug mode.
--epochs               10                int        Number of epochs.
--exp                  run               str        Exp name.
--label_smoothing      0.0               float      Label smoothing.
--learning_rate        2e-5              float      Learning rate.
--learning_rate_decay  False             bool       Decay LR.
--max_links            None              int        Max antecedent links to train on.
--right                50                int        Reserved space for right context, if any.
--seed                 42                int        Random seed.
--segment              512               int        Segment size.
--train                []                List[str]  Additional train data.
--warmup               0.1               float      Warmup ratio.
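
For readers who want to see how these options fit together, the table can be mirrored with Python's standard argparse module. This is only a sketch under assumptions: the actual parser in train.py may be written differently, `build_parser` is a hypothetical name, and modelling the boolean options as `store_true` flags is a guess based on their False defaults.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the argument table; not taken from train.py."""
    parser = argparse.ArgumentParser(description="Sketch of the training options")
    parser.add_argument("--langs", nargs="*", default=[], help="Languages to train on.")
    parser.add_argument("--batch_size", type=int, default=16, help="Batch size.")
    parser.add_argument("--bert", type=str, default="xlm-roberta-base", help="Bert model.")
    # Assumption: boolean options are plain on/off flags.
    parser.add_argument("--debug", action="store_true", help="Debug mode.")
    parser.add_argument("--epochs", type=int, default=10, help="Number of epochs.")
    parser.add_argument("--exp", type=str, default="run", help="Exp name.")
    parser.add_argument("--label_smoothing", type=float, default=0.0, help="Label smoothing.")
    parser.add_argument("--learning_rate", type=float, default=2e-5, help="Learning rate.")
    parser.add_argument("--learning_rate_decay", action="store_true", help="Decay LR.")
    parser.add_argument("--max_links", type=int, default=None,
                        help="Max antecedent links to train on.")
    parser.add_argument("--right", type=int, default=50,
                        help="Reserved space for right context, if any.")
    parser.add_argument("--seed", type=int, default=42, help="Random seed.")
    parser.add_argument("--segment", type=int, default=512, help="Segment size.")
    parser.add_argument("--train", nargs="*", default=[], help="Additional train data.")
    parser.add_argument("--warmup", type=float, default=0.1, help="Warmup ratio.")
    return parser


# Parse an example command line; unspecified options keep the table defaults.
args = build_parser().parse_args(["--langs", "germanic", "--epochs", "5"])
print(args.langs, args.epochs, args.batch_size)
```

Passing an explicit list to `parse_args` makes the sketch runnable without touching the real command line.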

Contributors

tollefj
