ckd_nmt_distillation

STEPS 17th Project

The following exploratory work was presented at the CS6101 module projects display at the 17th STEPS.

Combining Intermediate Layers for Knowledge Distillation in Neural Machine Translation Models for Japanese -> English

This project investigates a recently introduced technique that combines intermediate layers, rather than skipping them, when performing knowledge distillation of NMT models. The language pair investigated is Japanese -> English, building on the recently published work by Yimeng Wu et al. on Portuguese -> English, Turkish -> English, and English -> German, where they were able to distill similar performance with a 50% reduction in parameters. Their results and paper can be found at the following link: Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers. We use JParaCrawl for our investigation, along with the source code from Yimeng's work.
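
To make the idea concrete, below is a minimal PyTorch-style sketch of combination-based intermediate-layer distillation: each student layer is matched against a fusion of several teacher layers, instead of a single skip-selected teacher layer as in Patient KD. The function name `comb_distill_loss`, the MSE objective, the mean fusion, and the layer mappings are illustrative assumptions, not the repo's exact implementation; see the original CKD_PyTorch source for the real one.

```python
import torch
import torch.nn.functional as F

def comb_distill_loss(student_hiddens, teacher_hiddens, groups):
    """Combination-based intermediate-layer distillation (illustrative sketch).

    student_hiddens: list of [batch, seq, dim] tensors, one per student layer.
    teacher_hiddens: list of [batch, seq, dim] tensors, one per teacher layer.
    groups: groups[i] holds the teacher-layer indices fused to supervise
            student layer i; the COMB variants differ only in this mapping.
    """
    loss = 0.0
    for i, teacher_ids in enumerate(groups):
        # Fuse the chosen teacher layers; a simple average stands in here
        # for whatever combination function a given variant uses.
        fused = torch.stack([teacher_hiddens[j] for j in teacher_ids]).mean(dim=0)
        loss = loss + F.mse_loss(student_hiddens[i], fused)
    return loss / len(groups)

# Hypothetical mappings for a 12-layer teacher and a 6-layer student:
REGULAR_COMB = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11)]  # disjoint pairs
OVERLAP_COMB = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]    # shared layers
```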

Our Results

The following are the results for English --> Japanese, based on a training corpus of 2.6 million sentences from JParaCrawl.

| Model        | BLEU score |
| ------------ | ---------- |
| Teacher      | 23.1       |
| Regular KD   | 20.3       |
| PKD          | 19.3       |
| Regular COMB | 19.7       |
| Overlap COMB | 19.7       |
| Skip COMB    | 19.4       |
| Cross COMB   | 19.6       |

Discussions

Based on our experiments, we do not observe any improvement over regular knowledge distillation (RKD) for any of the combination-based distillation variants, as shown in the table above, although all COMB approaches show a minor improvement over Patient KD, which skips some of the layers. Possible reasons for this observation:

- We did not perform extensive hyperparameter optimization, which could partly explain the obtained performance; more experiments are needed before drawing any conclusions.
- We did not carry out a human evaluation, and BLEU alone cannot be relied upon to evaluate the models.
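
Since the comparison above rests entirely on BLEU, here is a minimal sketch of corpus-level BLEU scoring with the sacrebleu library. Using sacrebleu is our assumption for illustration; the repo's own evaluation scripts (see Requirements below) may compute BLEU differently, and the toy sentences are hypothetical.

```python
# pip install sacrebleu
import sacrebleu

# Hypothetical detokenized system outputs and their references.
hypotheses = ["the cat sat on the mat .", "he went to school ."]
references = [["the cat is sitting on the mat .", "he goes to school ."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # corpus-level score, comparable to the table above
```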

Requirements

Check README_CKD_Original.md

Acknowledgement

This repo is an exploration based on the original source at CKD_PyTorch, which is the original implementation of the paper "Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers" by Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, and Qun Liu, Proceedings of EMNLP, 2020.
