FFR: Fon-French Neural Machine Translation

Towards developing a Robust Translation Model for African languages: Pilot Project FFR v1.0.

"FFR v1.0" is the first stage of a Fon-French translation model project. It was trained on the dataset at https://github.com/bonaventuredossou/ffr-v1/tree/master/FFR-Dataset using neural machine translation with attention. Masakhane https://www.masakhane.io/ (https://twitter.com/MasakhaneMt), an online community of African researchers working on machine translation for African languages, has generated translation models and baselines from and to many African languages. "Project FFR v1.0", however, is the first to make this effort on a large scale, by painstakingly amassing a large training dataset and exploring techniques for handling the Fon diacritics to improve translation accuracy. The goal is a publishable model that people can use with a reasonable degree of reliability.

To source the data, we compiled parallel text by web-scraping and parsing websites that host open-source datasets. The website owners were contacted and gave permission for the data to be collected. These efforts yielded 53,975 parallel Fon-French words and sentences, which we used for the pilot stage. The dataset was then cleaned, pre-processed and tokenized in a way that preserves the diacritics and special characters of the Fon alphabet.
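As a rough illustration of what diacritic-preserving cleaning can look like, here is a minimal Python sketch. It is not the project's actual pre-processing code: the function names `preprocess_sentence` and `tokenize`, the choice of NFC normalization, and the punctuation set are all assumptions made for this example. The key point is that the text is never folded to ASCII, so tone marks and letters such as ɖ, ɛ and ɔ survive.

```python
import re
import unicodedata


def preprocess_sentence(text: str) -> str:
    """Hypothetical cleaner: lowercase, normalize, and isolate punctuation.

    Diacritics are kept on purpose; nothing here strips combining marks.
    """
    # NFC composes a base letter and its combining mark into one code point
    # when a precomposed form exists, so the same word is always encoded
    # the same way. Letters without a precomposed form keep their marks.
    text = unicodedata.normalize("NFC", text.strip().lower())
    # Put spaces around punctuation so each mark becomes its own token.
    text = re.sub(r"([?.!,;:])", r" \1 ", text)
    # Collapse runs of whitespace left over from the previous step.
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list:
    """Split a cleaned sentence on whitespace."""
    return preprocess_sentence(text).split()


if __name__ == "__main__":
    # Illustrative string only, not taken from the FFR dataset.
    sample = "  E\u0301 wa ,  \u0254\u0300!"
    print(tokenize(sample))
```

A cleaner that normalizes to a decomposed form and then drops combining marks would be shorter to write, but it would merge Fon words that differ only in tone or vowel quality.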

FFR v1.0 was trained for 5 days on a Paperspace cloud virtual machine. The model code was inspired by [1] and [2], with our own contributions added to handle the Fon diacritics.

[1] Jason Brownlee, Deep Learning for NLP, Section 9: Machine Translation

[2] TensorFlow tutorial, Neural Machine Translation with Attention Mechanism: https://www.tensorflow.org/tutorials/text/nmt_with_attention
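For readers new to attention, the core idea can be sketched in a few lines of plain Python. This is a simplified illustration that uses dot-product scoring. It is not the FFR model's code and it does not claim to match the scoring function used in the references above. The names `softmax` and `attend` and the toy vectors are assumptions made for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention over a list of encoder state vectors.

    Returns the attention weights and the context vector, which is the
    weighted sum of the encoder states.
    """
    # Score each source position by its similarity to the decoder state.
    scores = [sum(d * h_k for d, h_k in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy 2-dimensional states for a 2-token source sentence.
    weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights, context)
```

At each decoding step the weights tell the decoder which source tokens to focus on, which is what lets the model align a French word with the relevant Fon word rather than relying on a single fixed summary of the sentence.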

The project has so far been led by the edAI (https://twitter.com/edAIOfficial) researchers Chris EMEZUE (https://twitter.com/ChrisEmezue) and Bonaventure DOSSOU (https://twitter.com/bonadossou).

Our model achieved overall BLEU and GLEU scores of 30.55 and 18.18, respectively. See https://github.com/bonaventuredossou/ffr-v1/blob/master/model_train_test/testing_bleu_gleu_scores.txt for more details.
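To give a feel for what a GLEU score measures (GLEU is the metric named in the linked scores file), here is a minimal pure-Python sketch of sentence-level GLEU: the minimum of n-gram precision and n-gram recall over all 1- to 4-grams. It is an illustration only, not the project's evaluation script. The function names and the French example sentences are assumptions made for this example, and the value it returns lies between 0 and 1.

```python
from collections import Counter


def ngram_counts(tokens, max_n=4):
    """Count every n-gram of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(reference, hypothesis, max_n=4):
    """Sentence-level GLEU: min(n-gram precision, n-gram recall)."""
    ref_counts = ngram_counts(reference, max_n)
    hyp_counts = ngram_counts(hypothesis, max_n)
    # Counter intersection keeps the smaller count of each shared n-gram.
    matches = sum((ref_counts & hyp_counts).values())
    total_ref = sum(ref_counts.values())
    total_hyp = sum(hyp_counts.values())
    if total_ref == 0 or total_hyp == 0:
        return 0.0
    return min(matches / total_hyp, matches / total_ref)


if __name__ == "__main__":
    # Illustrative sentences, not taken from the FFR test set.
    reference = "le chat dort".split()
    hypothesis = "le chat mange".split()
    print(sentence_gleu(reference, hypothesis))
```

Because the score takes the minimum of precision and recall, a translation is penalized both for adding words the reference lacks and for omitting words the reference contains.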

Plots of the model training and of the BLEU score distribution over the test dataset are also provided. All results, along with a summary of the model and its architecture, are available in the FFR PDF file in the repository.

We are open to collaboration to improve the current model and gather more data.

We have finally released the website at https://www.ffrtranslate.com/
