
JFLEG (JHU FLuency-Extended GUG) corpus

Last updated: December 7th, 2018

(Make sure to download and use the latest version.)

Link to the paper: http://www.aclweb.org/anthology/E17-2037


Data

.
├── EACL_exp      # experiments in the EACL paper
│   ├── m2converter # script to create m2 format from plain texts
│   ├── mturk     # mechanical turk experiments
│   │   ├── sample.csv
│   │   ├── pairwise.csv
│   │   └── template.html
│   └── manual_eval # manual analysis of 100 sentences
│       ├── README.md
│       └── coded_sentences.csv
├── README.md     # This file
├── EACLshort037.pdf
├── dev           # dev set (754 sentences originally from the GUG **test** set)
│   ├── dev.ref0
│   ├── dev.ref1
│   ├── dev.ref2
│   ├── dev.ref3
│   ├── dev.spellchecked.src   # spellchecked by enchant
│   └── dev.src   # source (This should be the input for your system.)
├── eval
│   └── gleu.py   # evaluation script (sentence-level GLEU score)
└── test          # test set (747 sentences originally from the GUG **dev** set)
    ├── test.ref0
    ├── test.ref1
    ├── test.ref2
    ├── test.ref3
    ├── test.spellchecked.src  # spellchecked by enchant
    └── test.src   # source (This should be the input for your system.)
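
The source and reference files are line-aligned: line *i* of dev.src corresponds to line *i* of each of dev.ref0 through dev.ref3 (and likewise for the test split). A minimal sketch of loading a split, using the paths shown in the tree above (the helper name is ours, not part of the release):

```python
# Minimal sketch: load one split of JFLEG. Each source sentence is
# paired with four human corrections, aligned by line number.
def load_split(split="dev"):
    with open("%s/%s.src" % (split, split)) as f:
        sources = [line.rstrip("\n") for line in f]
    references = []
    for i in range(4):
        with open("%s/%s.ref%d" % (split, split, i)) as f:
            references.append([line.rstrip("\n") for line in f])
    return sources, references

sources, references = load_split("dev")
# references[j][k] is annotator j's correction of sources[k]
assert all(len(refs) == len(sources) for refs in references)
```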

Evaluation

e.g. python ./eval/gleu.py -r ./dev/dev.ref[0-3] -s ./dev/dev.src --hyp YOUR_SYSTEM_OUTPUT

This returns the mean, standard deviation, and confidence interval.
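
The values are printed in that order as a nested list of strings, with the interval as a single string, e.g. [['0.668878', '0.010937', '(0.647,0.649)']]. A minimal sketch of labeling those fields, assuming that output format (inferred from reported runs, not a documented interface):

```python
import ast

# Example gleu.py output as reported by users; the exact format is an
# assumption inferred from observed runs, not a documented interface.
raw = "[['0.668878', '0.010937', '(0.647,0.649)']]"

mean, std, interval = ast.literal_eval(raw)[0]
print(f"mean GLEU:               {mean}")       # 0.668878
print(f"standard deviation:      {std}")        # 0.010937
print(f"95% confidence interval: {interval}")   # (0.647,0.649)
```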

Leader Board (published results)

N.B. Systems marked with an asterisk (*) are tuned on different data.

System                                     GLEU (dev)   GLEU (test)
=========================================  ==========   ===========
Coyne et al. (2023)                        60.10*       65.02
Ge et al. (2018)                           N/A          62.42
Liu et al. (2021)                          N/A          61.61
Grundkiewicz and Junczys-Dowmunt (2018)    N/A          61.50
Junczys-Dowmunt et al. (2018)              N/A          59.90
Chollampatt and Ng (2018)                  52.48        57.47
Chollampatt and Ng (2017)                  51.01        56.78
Xie et al. (2018)*                         N/A          56.20
Sakaguchi et al. (2017)                    49.82        53.98
Ji et al. (2017)*                          48.93        53.41
Yuan and Briscoe (2016)*                   47.20        52.05
Junczys-Dowmunt and Grundkiewicz (2016)    49.74        51.46
Chollampatt et al. (2016)*                 46.27        50.13
Felice et al. (2014)*                      42.81        46.04
=========================================  ==========   ===========
SOURCE                                     38.21        40.54
REFERENCE                                  55.26        62.37
  • If you want your score added, please send an e-mail to keisukes[at]allenai.org with a link to your paper and your system outputs.
  • The REFERENCE scores are computed by averaging the scores of the individual references.

Reference

The following papers should be cited in any publications that use this dataset:

Courtney Napoles, Keisuke Sakaguchi and Joel Tetreault. (EACL 2017): JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain. April 03-07, 2017.

Michael Heilman, Aoife Cahill, Nitin Madnani, Melissa Lopez, Matthew Mulholland, and Joel Tetreault. (ACL 2014): Predicting Grammaticality on an Ordinal Scale. In Proceedings of the Association for Computational Linguistics. Baltimore, MD, USA. June 23-25, 2014.

bibtex information:

@InProceedings{napoles-sakaguchi-tetreault:2017:EACLshort,
  author    = {Napoles, Courtney  and  Sakaguchi, Keisuke  and  Tetreault, Joel},
  title     = {JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction},
  booktitle = {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},
  month     = {April},
  year      = {2017},
  address   = {Valencia, Spain},
  publisher = {Association for Computational Linguistics},
  pages     = {229--234},
  url       = {http://www.aclweb.org/anthology/E17-2037}
}

@InProceedings{heilman-EtAl:2014:P14-2,
  author    = {Heilman, Michael  and  Cahill, Aoife  and  Madnani, Nitin  and  Lopez, Melissa  and  Mulholland, Matthew  and  Tetreault, Joel},
  title     = {Predicting Grammaticality on an Ordinal Scale},
  booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  month     = {June},
  year      = {2014},
  address   = {Baltimore, Maryland},
  publisher = {Association for Computational Linguistics},
  pages     = {174--180},
  url       = {http://www.aclweb.org/anthology/P14-2029}
}

Questions

  • Please e-mail Courtney Napoles (napoles[at]cs.jhu.edu) and Keisuke Sakaguchi (keisuke[at]cs.jhu.edu).

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Issues

Add a paper to the leader board

  1. I think you should add this paper to the leaderboard: https://arxiv.org/pdf/1807.01270.pdf
     It only reports JFLEG test GLEU, not dev, but it beats the reference score.

  2. Just so I understand better: this repo includes dev and test data, but no training data. Where can I find training data, or do I need to train on a different source and only evaluate on this data?

Thanks

Add labels to the gleu.py script output

I tested the gleu.py script and got four values. I know that the last two (0.647, 0.649) are the confidence interval, but I'm not sure about the first two (0.668878 and 0.010937). From the script I can tell that one of them is the average, but not which one. I've read the docs and README files and can't find any information on how to decipher the four output values. It would be better if you could add a label beside each value, or update the docs to describe the gleu.py output. Thank you.

RuntimeWarning: invalid value encountered in multiply...

Most times I run this script (though not every time), I get:

    .../python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1910: RuntimeWarning: invalid value encountered in multiply
      lower_bound = self.a * scale + loc
    .../python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1911: RuntimeWarning: invalid value encountered in multiply
      upper_bound = self.b * scale + loc
    [['0.584945', '0.000000', '(nan,nan)']]

This happens because the std passed to scipy.stats.norm.interval(0.95, loc=mean, scale=std) in get_gleu_stats is 0.0 (the scores are either all the same, or a list of size 1).
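
A guard along these lines would avoid the nan interval; this is a sketch of a possible fix, not the script's actual code (the function name is hypothetical):

```python
from scipy.stats import norm

# Hypothetical guard: when all per-sentence scores are identical (or
# there is only one score), std is 0.0 and norm.interval returns
# (nan, nan). Collapse the interval to the mean in that case.
def safe_interval(mean, std, confidence=0.95):
    if std == 0.0:
        return (mean, mean)
    return norm.interval(confidence, loc=mean, scale=std)
```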
