Machine-Translation Evaluation: Comparing Traditional and Neural Machine-Translation Evaluation Metrics for English→Russian

Machine translation (MT) has become increasingly popular in recent years, driven by advances in technology and growing globalization. As MT quality improves, more companies are choosing it over human translation to save time and money. This growing reliance on MT has highlighted the need for automatic evaluation algorithms that can accurately measure its quality. Such algorithms are essential for ensuring that MT meets the needs of businesses and individuals in the global marketplace, for comparing MT systems against each other, and for tracking their improvement over time. MT evaluation metrics are an indispensable component of these algorithms.

This repository is part of the thesis project for the Master's Degree in "Linguistics: Text Mining" at the Vrije Universiteit Amsterdam (2022-2023). The project replicates selected research conducted at the WMT21 Metrics Shared Task. The replication evaluates the traditional metrics (SacreBLEU, TER, CHRF2) alongside the best-performing reference-based (BLEURT-20, COMET-MQM_2021) and reference-free (COMET-QE-MQM_2021) neural metrics. The evaluation covers two domains: news articles and TED talks translated from English into Russian. By examining the performance of these metrics, we aim to understand their effectiveness and suitability in different translation contexts. Beyond this initial evaluation, the thesis project explores the applicability of reference-free neural metrics, with a particular focus on COMET-QE-MQM_2021, for professional human translators. This extended evaluation is performed on a distinct domain, scientific articles, which are likewise translated from English into Russian.

Creator: Natalia Khaidanova

Supervisor: Sophie Arnoult

Content

\Data

The Data folder contains:

Files:

  • all_TED_data.tsv stores all source sentences, reference translations, and MTs presented at the WMT21 Metrics Task for the TED talks domain.

  • all_news_data.tsv stores all source sentences, reference translations, and MTs presented at the WMT21 Metrics Task for the news domain.

  • create_data_files.py creates the all_TED_data.tsv and all_news_data.tsv files and converts the WMT21 Metrics Task human judgments, per type (MQM, raw DA, and z-normalized DA) and domain (news and TED talks), into .tsv files. These files are stored in human_judgments_seg (segment-level human judgments) and human_judgments_sys (system-level human judgments).
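The difference between raw and z-normalized DA scores can be illustrated with a minimal sketch. It assumes that z-normalization means standardizing each annotator's raw scores to zero mean and unit variance, and it uses the population standard deviation. The source does not say whether create_data_files.py computes z-scores itself or only converts scores distributed with the task data, so the function name `z_normalize` and the row layout below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_normalize(rows):
    """Standardize raw DA scores per annotator.

    rows: list of (annotator_id, segment_id, raw_score) tuples.
    This layout is an assumption, not the repository's file format.
    """
    # Collect every raw score given by each annotator.
    scores_by_annotator = defaultdict(list)
    for annotator, _segment, score in rows:
        scores_by_annotator[annotator].append(score)

    # Mean and population standard deviation per annotator.
    stats = {
        annotator: (mean(scores), pstdev(scores))
        for annotator, scores in scores_by_annotator.items()
    }

    normalized = []
    for annotator, segment, score in rows:
        mu, sigma = stats[annotator]
        # An annotator who always gives the same score carries no signal.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        normalized.append((annotator, segment, z))
    return normalized


raw = [
    ("annotator_a", "seg1", 50),
    ("annotator_a", "seg2", 70),
    ("annotator_b", "seg1", 90),
    ("annotator_b", "seg2", 100),
]
z_scores = z_normalize(raw)
```

In this toy input, annotator_b scores consistently higher than annotator_a, yet after normalization both annotators rank seg2 above seg1 by the same margin, which is the effect z-normalization is meant to achieve.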

Subfolders:

\eval

The eval folder contains:

Files:

  • get_nr_annotations.py checks the number of annotated segments in the WMT21 Metrics Task data per type of human judgment (MQM, raw DA, or z-normalized DA).

  • seg_eval.py runs a segment-level evaluation of the implemented neural (BLEURT-20, COMET-MQM_2021, and COMET-QE-MQM_2021) and traditional (SacreBLEU, TER, and CHRF2) metrics.

  • sys_eval.py runs a system-level evaluation of the implemented neural (BLEURT-20, COMET-MQM_2021, and COMET-QE-MQM_2021) and traditional (SacreBLEU, TER, and CHRF2) metrics.
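A common way to evaluate a metric at either level is to correlate its scores with human judgments. The sketch below is an illustration under that assumption, not the code in seg_eval.py or sys_eval.py: it shows a Kendall's tau-like statistic that skips tied pairs (often used at segment level) and Pearson's r (often used at system level). The function names and toy scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_like(metric_scores, human_scores):
    """(concordant - discordant) / (concordant + discordant); ties are skipped."""
    concordant = discordant = 0
    pairs = zip(metric_scores, human_scores)
    for (m1, h1), (m2, h2) in combinations(pairs, 2):
        product = (m1 - m2) * (h1 - h2)
        if product > 0:
            concordant += 1  # metric and humans order the pair the same way
        elif product < 0:
            discordant += 1  # metric and humans disagree on the order
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


def pearson_r(xs, ys):
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Toy data: one metric score and one human score per segment (or system).
metric = [0.61, 0.34, 0.78, 0.50]
human = [72.0, 40.0, 90.0, 55.0]
segment_level = kendall_tau_like(metric, human)
system_level = pearson_r(metric, human)
```

Because TER is an error rate where lower is better, its scores would typically be negated before being correlated with quality judgments in this way.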

Subfolders:

  • human_judgments_seg stores segment-level human judgment scores of each type (MQM, raw DA, or z-normalized DA) in separate .tsv files. The scores are presented for both news and TED talks.

  • human_judgments_sys stores system-level human judgment scores of each type (MQM, raw DA, or z-normalized DA) in separate .tsv files. The scores are presented for both news and TED talks.

\metrics

The metrics folder contains:

Files:

\reference-free_eval

The reference-free_eval folder contains:

Files:

  • COMET-QE-MQM_2021.py computes segment- and system-level scores of the reference-free neural metric COMET-QE-MQM_2021 on the additional data comprising two scientific articles (Baby K and A Beautiful Mind). The metric evaluates both human and machine translations. Note that the source sentences and their human translations were added to the files manually.

  • add_opus_mt_translations.py adds MTs produced by the opus-mt-en-ru MT system to the data comprising two scientific articles (Baby K and A Beautiful Mind).

  • get_mean_length.py computes the mean character length of the source sentences and of their human translations in the Baby K and A Beautiful Mind articles.
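The mean-length computation can be sketched as follows. This is a minimal illustration, assuming the article data is a tab-separated file with a header row. The column names `source` and `human_translation`, the function name, and the sample sentences are hypothetical and are not taken from the repository.

```python
import csv
import io


def mean_char_length(tsv_text, column):
    """Return the mean number of characters in one column of a TSV string."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lengths = [len(row[column]) for row in reader]
    # Avoid dividing by zero when the file has a header but no rows.
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical two-sentence sample standing in for an article file.
sample = (
    "source\thuman_translation\n"
    "Hello world.\tПривет, мир.\n"
    "Good morning.\tДоброе утро.\n"
)

mean_source = mean_char_length(sample, "source")
mean_human = mean_char_length(sample, "human_translation")
```

For a file on disk, the same function could be applied to the text returned by `open(path, encoding="utf-8").read()`. Python's `len` counts Unicode code points, so Cyrillic characters are counted one each rather than by byte.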

Subfolders:

  • Data contains two scientific articles (Baby K and A Beautiful Mind), each comprising English source sentences, their corresponding Russian human translations and MTs produced by the opus-mt-en-ru MT system. The files were created with the aim of evaluating the applicability of reference-free neural metrics, specifically COMET-QE-MQM_2021, for professional human translators. The subfolder also stores the segment- and system-level scores produced by COMET-QE-MQM_2021 for both human and machine translations.

requirements.txt

The requirements.txt file contains information about the packages and models required to run and evaluate the implemented traditional (SacreBLEU, TER, and CHRF2) and neural (BLEURT-20, COMET-MQM_2021, and COMET-QE-MQM_2021) metrics. It also lists additional packages needed to run all the .py files in the repository.

Natalia_Khaidanova_Thesis.pdf

The Natalia_Khaidanova_Thesis.pdf file contains the thesis report outlining the results of the research.
