
grammatical / neural-naacl2018

License: MIT License

Languages: Python 62.97%, Shell 16.53%, Perl 13.16%, Makefile 7.34%

neural-naacl2018's Introduction

Approaching Neural GEC as a Low-Resource MT Task

This repository contains neural models and instructions for reproducing the results of our neural grammatical error correction systems from M. Junczys-Dowmunt, R. Grundkiewicz, S. Guha, K. Heafield: Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task, NAACL 2018.

Citation

@InProceedings{neural-naacl2018,
    author    = {Junczys-Dowmunt, Marcin  and  Grundkiewicz, Roman  and  Guha,
                 Shubha  and  Heafield, Kenneth},
    title     = {Approaching Neural Grammatical Error Correction as a
                 Low-Resource Machine Translation Task},
    booktitle = {Proceedings of the 2018 Conference of the North American
                 Chapter of the Association for Computational Linguistics:
                 Human Language Technologies, Volume 1 (Long Papers)},
    month     = {June},
    year      = {2018},
    address   = {New Orleans, Louisiana},
    publisher = {Association for Computational Linguistics},
    pages     = {595--606},
    url       = {http://www.aclweb.org/anthology/N18-1055}
}

Models

We have prepared the top neural GEC system described in the paper: an ensemble of four transformer models and a neural language model. Each translation model is pre-trained with a language model and trained with an edit-weighted MLE objective on NUCLE and Lang-8 data.

The systems were created with training settings very similar to those described in the paper. Small performance differences occur mainly due to the use of a more recent version of the Marian toolkit, which comes with new features. The most noticeable change is the replacement of averaged model checkpoints with exponential smoothing. Differences in less significant training hyperparameters may also exist. Other settings, including the data and data preprocessing, remain exactly the same as in the original paper.
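
For intuition, the edit-weighted MLE objective mentioned above can be sketched as follows. This is an illustration only, not the Marian implementation used in the paper; the function name and the weight value are hypothetical.

import torch
import torch.nn.functional as F

EDIT_WEIGHT = 3.0  # hypothetical weight for edited target tokens

def edit_weighted_nll(log_probs, target, edit_mask):
    """log_probs: (num_tokens, vocab) log-softmax outputs,
    target: (num_tokens,) gold token ids,
    edit_mask: (num_tokens,) bool, True where the target token is an edit."""
    per_token = F.nll_loss(log_probs, target, reduction="none")
    weights = torch.where(edit_mask,
                          torch.full_like(per_token, EDIT_WEIGHT),
                          torch.ones_like(per_token))
    # Edited tokens contribute EDIT_WEIGHT times more to the loss.
    return (weights * per_token).sum() / weights.sum()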

Content

  • models - pretrained neural models, instructions on how to use them, and scripts to evaluate the system on the CoNLL and JFLEG data sets
  • outputs - corrected outputs and evaluation scores for the CoNLL and JFLEG data sets generated by the prepared GEC system
  • training - complete training pipeline reproducing the prepared neural models

If you have any questions, please open an issue or send me (Roman) an email.

neural-naacl2018's People

Contributors

emjotde, kellymarchisio, snukky

neural-naacl2018's Issues

BPE model

Hi,
I want to re-train the BPE model, so I am wondering which dataset you used to train it, and whether you used only the source side, the target side, or both. Also, what value of bpe_operations did you use? Thanks!
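
(Not an authoritative answer, just a note for other readers: the repository vendors subword-nmt under models/tools/subword-nmt, and a joint BPE model is commonly learned on the concatenation of the source and target sides. A minimal sketch, with hypothetical file names and operation count:)

from subword_nmt.learn_bpe import learn_bpe

# train.both = erroneous (source) side concatenated with corrected (target) side
with open("train.both", encoding="utf-8") as corpus, \
        open("bpe.codes", "w", encoding="utf-8") as codes:
    learn_bpe(corpus, codes, num_symbols=30000)  # num_symbols ~ bpe_operations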

Replication of the training process

Hello, I tried to re-train the model twice, following the README file step by step, but the M2 score was only 54.41 on the CoNLL-14 test set. I found that the recall is similar to the downloaded model's, but my re-trained model has lower precision (for both the ensemble and the single transformer model). Do you know what causes this issue?

Besides, I noticed that you mention pre-trained embeddings and an edit-weighted loss in your paper, but I have not seen any parameters related to these techniques in the transformer.sh script. Where are these techniques used?

No License File

We intend to use this pipeline to re-create the results from the source research paper. However, there is no LICENSE file to clarify usage of the code in the repository.

Could you please add a LICENSE file to this repository, to make fair use of the code clear?

Error evaluating test datasets

I'm trying to evaluate the test datasets with the pre-trained models, as instructed in models/README.md.

However, when running bash evaluate.sh 0, I get an AssertionError saying that candidates, sources, and gold_edits have different lengths. I found that JFLEG works fine, but CoNLL 2013 and 2014 do not.

Does anyone have any advice? I'm currently using CUDA 9.0, Python 3.6, and PyTorch 0.4.0. Here is the error I currently get:

CoNLL Test 2013
Traceback (most recent call last):
  File "./tools/remove_repetitions.py", line 62, in <module>
    main()
  File "./tools/remove_repetitions.py", line 16, in main
    ngrams = get_ngrams(words, args.max_size, args.min_freq)
  File "./tools/remove_repetitions.py", line 47, in get_ngrams
    for ngram in counts.keys():
RuntimeError: dictionary changed size during iteration
Traceback (most recent call last):
  File "/mnt/gwena/Gwena/IncompleteIntentionClassifier/baseline/gec_models/neural-naacl2018/models/tools/subword-nmt/subword_nmt/apply_bpe.py", line 373, in <module>
    args.output.write(bpe.process_line(line))
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "./tools/m2scorer/scripts/m2scorer.py", line 137, in <module>
    p, r, f1 = levenshtein.batch_multi_pre_rec_f1(system_sentences, source_sentences, gold_edits, max_unchanged_words, beta, ignore_whitespace_casing, verbose, very_verbose)
  File "/mnt/gwena/Gwena/IncompleteIntentionClassifier/baseline/gec_models/neural-naacl2018/models/tools/m2scorer/scripts/levenshtein.py", line 107, in batch_multi_pre_rec_f1
    assert len(candidates) == len(sources) == len(gold_edits)
AssertionError
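
(A note on the first traceback: in Python 3, dict.keys() returns a live view of the dictionary, so deleting entries while iterating raises exactly this RuntimeError. A likely fix for the loop in remove_repetitions.py; the pruning condition is guessed from context:)

# Iterate over a snapshot of the keys so the dict can be mutated in the loop.
for ngram in list(counts.keys()):
    if counts[ngram] < min_freq:  # hypothetical pruning condition
        del counts[ngram]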

Question regarding oversampling

  • In the Low-Resource paper, you mention that NUCLE was oversampled 10 times for domain adaptation to the CoNLL-14 dataset.

  • I tried benchmarking the pre-trained models provided in this repo on the WI+LOCNESS test set.

  • The single model gave an F-score of 34.15, whereas the ensemble of 4 models + reranking gave an F-score of 53.27. The ensemble produces fewer false positives than the single model, leading to higher precision.

  • Metrics of the single model on the WI+LOCNESS test set
    (screenshot: lrgec_single)

  • Metrics of the ensemble on the WI+LOCNESS test set
    (screenshot: lrgec_ensemble)

  • Does oversampling the NUCLE data explain the drop in the single model's precision, from 69-70 on the CoNLL-14 test set to 31.3 on the WI+LOCNESS test set?

Thanks!
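
For reference, the 10x oversampling mentioned above is straightforward to reproduce. A minimal sketch with hypothetical file names, repeating the NUCLE portion ten times before appending Lang-8:

# Build an oversampled source-side training file (the target side is analogous).
with open("train.src", "w", encoding="utf-8") as out:
    for path, copies in [("nucle.src", 10), ("lang8.src", 1)]:
        with open(path, encoding="utf-8") as f:
            out.write(f.read() * copies)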

Training

I tried to follow the training documentation. When I run

cd data
make all
cd ..

it gives the following error:

python ../tools/spellcheck.py < ../tools/jfleg/dev/dev.src > jflegdev.err
Traceback (most recent call last):
  File "../tools/spellcheck.py", line 30, in <module>
    main()
  File "../tools/spellcheck.py", line 19, in main
    print(d.suggest(w)[0], end=" ")
IndexError: list index out of range
make: *** [Makefile:54: jflegdev.err] Error 1
make: *** Deleting file 'jflegdev.err'

I have the file dev.src in tools/jfleg/dev/.

I'm using Python 2 for this experiment, since the pipeline does not seem to support Python 3.
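
(A note on the IndexError: d.suggest(w) can return an empty list for some tokens, so indexing [0] blindly fails. A defensive rewrite of the failing line in spellcheck.py, assuming d is a pyenchant-style dictionary, falls back to the original word:)

suggestions = d.suggest(w)
print(suggestions[0] if suggestions else w, end=" ")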
