attention_sezan's Introduction

Probable Implementation of Attention Mechanism for English to Bangla Translation

MD Muhaimin Rahman contact: sezan92[at]gmail[dot]com

In this project I have implemented, or at least tried to implement, an attention mechanism for an encoder-decoder deep learning network for English-to-Bangla translation in Keras. Neural Machine Translation is a classic use case for encoder-decoder networks. An example is given in Jason Brownlee's blog. But this architecture has a problem with long sentences. Bahdanau et al. used an attention mechanism for Neural Machine Translation, in this paper.

Attention Mechanism

I have used one of the implementations from Luong Thang's PhD thesis. From the paper, the model looks as follows (figure: attention_luong). Don't be scared by the image! What the attention layer does can be summarized in the following points (a minimal sketch follows the list):

  • Takes the input
  • Takes the hidden state of the encoder input
  • Takes the hidden state of the previous output
  • Derives a tanh function from the two hidden states
  • Derives a softmax function from that tanh function
  • Multiplies this softmax output with the hidden state of the input
  • The attention work is done; the rest is like an ordinary decoder architecture
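Here is a minimal NumPy sketch of those steps, assuming a concat-style tanh score, which is my reading of the list above. The names `h_enc`, `h_dec`, and `W` are illustrative, not taken from this repository's code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

T, d = 5, 8                    # 5 encoder timesteps, hidden size 8
h_enc = np.random.randn(T, d)  # hidden states of the encoder input
h_dec = np.random.randn(d)     # hidden state of the previous output
W = np.random.randn(2 * d)     # scoring weights (learned in practice)

# tanh score from the two hidden states, one score per input timestep
pairs = np.concatenate([h_enc, np.tile(h_dec, (T, 1))], axis=1)  # (T, 2d)
scores = np.tanh(pairs @ W)
alpha = softmax(scores)        # softmax of the tanh scores
context = alpha @ h_enc        # multiply with the encoder hidden states
print(alpha.round(3), context.shape)
```

The `context` vector then feeds the decoder, and from there on the network is an ordinary decoder, as the last point says.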

Experiment

I have worked on English to Bengali (my mother tongue). I have used the data from this website; it has around 4500 sentence pairs for English-Bangla translation. The result is not good, but for my first try at attention from scratch, I am happy with it! The [notebook](Attention Mechanism.ipynb) tries to explain the code.

  • main_bengali_fh.py: code for training the model
  • attention_test_fh.py: code for testing the model

Hardware

This architecture wasn't possible to train on my tiny laptop GPU, a GeForce 920MX! Just look at the architecture! I had to use a FloydHub cloud GPU, an NVIDIA Tesla K80.

Other Keras Implementations?

Not many Keras implementations are available on the internet. I have found one from Philippe Rémy, but I am doubtful about his implementation.

Philippe Rémy

Philippe Rémy tried to show Keras attention here, but his code looks problematic. His schematic is as follows (figure: peremy_wrong). The problem with this implementation is that it doesn't take into account the hidden state of the decoder network! It just does its thing from the encoder side; that's a kind of manual LSTM! A contrasting sketch follows.
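For contrast, here is a tiny illustrative snippet of encoder-only scoring; the names are made up and this is not Rémy's actual code. Note that no decoder hidden state appears anywhere:

```python
import numpy as np

T, d = 5, 8
h_enc = np.random.randn(T, d)  # encoder hidden states only
w = np.random.randn(d)         # scoring weights

# Weights depend solely on the encoder side; the decoder's previous
# hidden state never enters the computation.
alpha = np.exp(np.tanh(h_enc @ w))
alpha /= alpha.sum()
print(alpha)
```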

Area of Improvement

  • My code is just in beta stage. I am not at all sure that my code is a perfect implementation of the attention mechanism. So I request the reader: if any bugs or problems can be found, which is quite likely, please feel free to contact me!

  • And obviously, the translation is not up to standard! Please try to add more data and make it more human-like.

Acknowledgement

I first came to know about the attention mechanism from Andrew Ng's videos in course 5 of the Deep Learning Specialization. But I was not sure how to implement it. Then I read the two papers mentioned above. Jason Brownlee's article also helped me check whether what I had understood was correct.

I am grateful to Luong Thang and Bahdanau for their papers! I found Bahdanau's paper more readable than Luong's. Both helped me a lot.


attention_sezan's Issues

Runtime Error

When I run the command `python main_bengali_fh.py`, I get:

Traceback (most recent call last):
  File "main_bengali_fh.py", line 14, in <module>
    import text_preprocess_utils_fh as tpu
ImportError: No module named text_preprocess_utils_fh

Runtime Error

4737  I've forgotten her name.  আমি ওনার নাম ভুলে গেছি।
4738  I've forgotten her name.  আমি ওর নাম ভুলে গেছি।
4739  I've forgotten his name.  আমি তার নাম ভুলে গেছি।
4740  I've forgotten his name.  আমি ওনার নাম ভুলে গেছি।
4741  I've forgotten his name.  আমি ওর নাম ভুলে গেছি।
Adding start and end tag....
Traceback (most recent call last):
  File "main_bengali_fh.py", line 28, in <module>
    source_input,input_starter,output_encoded = tp.preprocess(max_words)
  File "/home/faisal/translation/text_preprocess_utils_fh.py", line 57, in preprocess
    row['Target']='<s> '+row['Target']+' <e>'
TypeError: Can't convert 'float' object to str implicitly
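A possible fix, under the assumption that the 'Target' column contains NaN floats from blank rows in the data file; `df` is a hypothetical name for the loaded DataFrame, not the repo's actual variable:

```python
import pandas as pd

# Reproduce the failure: a NaN (float) in 'Target' makes the string
# concatenation in preprocess() raise the TypeError above.
df = pd.DataFrame({'Target': ['আমি ওর নাম ভুলে গেছি।', None]})

# Assumed fix: drop empty rows (or cast to str) before adding the tags.
df = df.dropna(subset=['Target'])
df['Target'] = '<s> ' + df['Target'] + ' <e>'
print(df)
```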

Result

How can I get the accuracy graph for this project?
Thanks in advance.

Installation Problem

How can I install text_preprocess_utils_fh in my Ubuntu environment? Please help me. Thanks in advance.
