
wmt19-paraphrased-references

Additional reference translations for the English-to-German WMT test sets newstest2018, newstest2019 and newstest2020.

The contents of this repository are not an official Google product.

[Additional References]

The sentences in this repository are alternative reference translations for the WMT newstest20XX English-German test sets, produced through human translation or human paraphrasing. Automatic metrics such as BLEU have been shown to correlate better with human judgement when computed against these references than against the standard references. For details on data collection and on how paraphrased references can improve the automatic evaluation of machine translation, see our paper below. If you use this data in your research, please consider citing the paper. The repository currently contains additional references for newstest2018, newstest2019 and newstest2020:

  1. newstest2018 WMT.p: A paraphrased-as-much-as-possible version of the original WMT reference.

  2. newstest2019 AR: An additional high-quality reference translation.

  3. newstest2019 AR.p: A paraphrased-as-much-as-possible version of AR.

  4. newstest2019 WMT.p: A paraphrased-as-much-as-possible version of the original WMT reference.

  5. newstest2019 HQ(R): A combined reference built from the original WMT reference translation and AR. Per sentence, humans picked one of the two reference translations.

  6. newstest2019 HQ(P): A combined reference built from WMT.p and AR.p. Per sentence, humans picked one of the two reference translations.

  7. newstest2019 HQ(all): A combined reference built from WMT, AR, WMT.p and AR.p. Per sentence, humans picked one of the four reference translations.

  8. newstest2020 WMT.p: A paraphrased-as-much-as-possible version of the original WMT reference.
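To see how a system's BLEU score changes across the reference variants listed above, one option is to score the same output against each variant separately. The following is a minimal sketch, not part of this repository. It assumes every file is plain text with one segment per line, and it uses the third-party sacrebleu package for scoring. The function names are hypothetical helpers introduced for illustration.

```python
from typing import Dict, List, Tuple


def read_segments(path: str) -> List[str]:
    """Read a text file, assuming one segment (sentence) per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_aligned(hyp_path: str, ref_path: str) -> Tuple[List[str], List[str]]:
    """Load a system output and a reference, checking they are line-aligned."""
    hyps = read_segments(hyp_path)
    refs = read_segments(ref_path)
    if len(hyps) != len(refs):
        raise ValueError(
            f"{hyp_path} has {len(hyps)} segments but {ref_path} has {len(refs)}"
        )
    return hyps, refs


def bleu_per_reference(hyp_path: str, ref_paths: Dict[str, str]) -> Dict[str, float]:
    """Score one system output against each reference variant separately.

    `ref_paths` maps a variant label (e.g. "WMT.p") to a file path.
    """
    # Third-party dependency (pip install sacrebleu), imported lazily so the
    # loading helpers above work without it.
    import sacrebleu

    scores = {}
    for name, ref_path in ref_paths.items():
        hyps, refs = load_aligned(hyp_path, ref_path)
        # corpus_bleu takes a list of reference streams; here there is one.
        scores[name] = sacrebleu.corpus_bleu(hyps, [refs]).score
    return scores
```

For example, `bleu_per_reference("system.de", {"WMT": "wmt.de", "WMT.p": "wmt_p.de"})` would return one BLEU score per variant. All three file names in that call are placeholders, not paths from this repository.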

[Research Paper]

BLEU might be Guilty but References are not Innocent. Markus Freitag, David Grangier, Isaac Caswell. EMNLP 2020.

@inproceedings{freitag-bleu-paraphrase-references-2020,
  title = {BLEU might be Guilty but References are not Innocent},
  author = {Markus Freitag and David Grangier and Isaac Caswell},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2020},
  month = {nov}
}

Human-Paraphrased References Improve Neural Machine Translation. Markus Freitag, George Foster, David Grangier, Colin Cherry. WMT 2020.
