bytenet-tensorflow's Introduction

byteNet-tensorflow

Join the chat at https://gitter.im/byteNet-tensorflow/Lobby

This is a TensorFlow implementation of the ByteNet model from DeepMind's paper Neural Machine Translation in Linear Time.

From the abstract

The ByteNet decoder attains state-of-the-art performance on character-level language modeling and outperforms the previous best results obtained with recurrent neural networks. The ByteNet also achieves a performance on raw character-level machine translation that approaches that of the best neural translation models that run in quadratic time. The implicit structure learnt by the ByteNet mirrors the expected alignments between the sequences.

ByteNet Encoder-Decoder Model:

Model architecture

Image Source - Neural Machine Translation in Linear Time paper

The model applies dilated 1d convolutions to the sequential data, layer by layer, to obtain the source encoding. The decoder then applies masked 1d convolutions to the target sequence (conditioned on the encoder output) to predict the next character in the target sequence. The character generation model is just the ByteNet decoder, while the machine translation model is the combined encoder and decoder.
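As a rough illustration of the two convolution styles described above, here is a minimal single-channel sketch in plain Python. It is not the code in ByteNet/ops.py: the function name toy_dilated_conv1d and its parameters are hypothetical, and the real layers work on batched, multi-channel tensors. The sketch assumes an odd kernel width for the symmetric (encoder-style) case.

```python
def toy_dilated_conv1d(x, kernel, dilation=1, causal=False):
    """Toy single-channel dilated 1-D convolution over a list of numbers.

    causal=True pads only on the left, so output[t] depends on x[0..t] only
    (the "masked" decoder case). causal=False pads symmetrically, so
    output[t] also sees future positions (the encoder case).
    """
    k = len(kernel)
    span = (k - 1) * dilation          # total number of zeros to add
    left = span if causal else span // 2
    right = span - left
    padded = [0] * left + list(x) + [0] * right
    # Each output position combines k inputs spaced `dilation` apart.
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


x = [1, 2, 3, 4, 5, 6]
encoder_style = toy_dilated_conv1d(x, [1, 1, 1], dilation=2)
decoder_style = toy_dilated_conv1d(x, [1, 1, 1], dilation=2, causal=True)
```

With causal=True, changing a later input never changes an earlier output, which is the property the decoder needs in order to predict the next character.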

Implementation Notes

  1. The character generation model is defined in ByteNet/generator.py and the translation model is defined in ByteNet/translator.py. ByteNet/ops.py contains the bytenet residual block, dilated conv1d and layer normalization.
  2. The model can be configured by editing model_config.py.
  3. The number of residual channels is 512 (configurable in model_config.py).

Requirements

  • Python 2.7.6
  • Tensorflow 1.2.0

Datasets

  • The character generation model has been trained on Shakespeare text. The text file is included in the repository at Data/generator_training_data/shakespeare.txt.
  • The machine translation model has been trained for German to English translation. You may download the news commentary dataset from here: http://www.statmt.org/wmt16/translation-task.html

Training

Create the following directories: Data/tb_summaries/translator_model, Data/tb_summaries/generator_model, Data/Models/generation_model, Data/Models/translation_model.
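If you prefer to create these directories from Python rather than by hand, a small helper along the following lines should work. The helper name create_training_dirs and its root parameter are hypothetical and not part of the repository; the paths are the ones listed above.

```python
import os

# Directories named in the Training section, relative to the repository root.
TRAINING_DIRS = [
    "Data/tb_summaries/translator_model",
    "Data/tb_summaries/generator_model",
    "Data/Models/generation_model",
    "Data/Models/translation_model",
]


def create_training_dirs(root="."):
    """Create each training directory under `root` if it does not exist yet."""
    paths = []
    for rel in TRAINING_DIRS:
        path = os.path.join(root, *rel.split("/"))
        if not os.path.isdir(path):
            os.makedirs(path)  # also creates missing parent directories
        paths.append(path)
    return paths
```

Calling create_training_dirs() from the repository root creates all four directories and leaves existing ones untouched.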

  • Text Generation

    • Configure the model by editing model_config.py.
    • Save the text files to train on, in Data/generator_training_data. A sample shakespeare.txt is included in the repo.
    • Train the model with: python train_generator.py --text_dir="Data/generator_training_data"
    • Run python train_generator.py --help for more options.
  • Machine Translation

    • Configure the model by editing model_config.py.
    • Save the source and target sentences in separate files in Data/MachineTranslation. You may download the news commentary training corpus using this link.
    • The model is trained on buckets of sentence pairs whose lengths are multiples of a configurable parameter, bucket_quant. Sentences are padded with a special character beyond their actual length.
    • Train translation model using:
      • python train_translator.py --source_file=<source sentences file> --target_file=<target sentences file> --bucket_quant=50
      • Run python train_translator.py --help for more options.
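The bucketing described above can be sketched roughly as follows. This is an illustration only, not the repository's data loader: the helper names, the pad character, and the choice to pad both sentences of a pair to the same bucket length are all assumptions.

```python
def bucket_length(n, bucket_quant):
    """Smallest multiple of bucket_quant that is >= n."""
    return ((n + bucket_quant - 1) // bucket_quant) * bucket_quant


def pad_to_length(chars, length, pad_char):
    """Pad a list of characters on the right up to `length`."""
    return list(chars) + [pad_char] * (length - len(chars))


def build_buckets(pairs, bucket_quant, pad_char="#"):
    """Group (source, target) pairs by padded length.

    Assumption: both sentences in a pair are padded to the bucket that
    fits the longer of the two.
    """
    buckets = {}
    for source, target in pairs:
        size = bucket_length(max(len(source), len(target)), bucket_quant)
        padded = (pad_to_length(source, size, pad_char),
                  pad_to_length(target, size, pad_char))
        buckets.setdefault(size, []).append(padded)
    return buckets


example = build_buckets([("guten tag", "good day"), ("ja", "yes")], bucket_quant=5)
```

With bucket_quant=5, the first pair (lengths 9 and 8) lands in the bucket of length 10 and the second pair (lengths 2 and 3) in the bucket of length 5.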

Generating Samples

  • Generate new samples using:
    • python generate.py --seed="SOME_TEXT_TO_START_WITH" --sample_size=<SIZE OF GENERATED SEQUENCE>
  • You can test sample translations from the dataset using python translate.py.
    • This will pick random source sentences from the dataset and translate them.

Sample Generations

ANTONIO:
What say you to this part of this to thee?

KING PHILIP:
What say these faith, madam?

First Citizen:
The king of England, the will of the state,
That thou dost speak to me, and the thing that shall
In this the son of this devil to the storm,
That thou dost speak to thee to the world,
That thou dost see the bear that was the foot,

Translation Results to be updated

TODO

  • Evaluating the translation Model
  • Implement beam search (contributors welcome). Currently the model samples from the probability distribution over the top k most probable predictions.
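For readers unfamiliar with top-k sampling, a minimal sketch of the idea in plain Python follows. It is not the sampling code from generate.py: the function name sample_top_k and its arguments are hypothetical, and probs is assumed to be a list of per-character probabilities for a single time step.

```python
import random


def sample_top_k(probs, k, rng=random):
    """Sample an index from the k most probable entries of `probs`.

    The k largest probabilities are renormalised implicitly by drawing a
    random number in [0, sum of the top k).
    """
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    r = rng.random() * total
    cumulative = 0.0
    for i in top:
        cumulative += probs[i]
        if r < cumulative:
            return i
    return top[-1]  # guards against floating-point rounding at the boundary


probs = [0.1, 0.5, 0.05, 0.35]
choice = sample_top_k(probs, k=2, rng=random.Random(0))
```

With k=1 this reduces to greedy decoding. Beam search, by contrast, would keep several partial sequences alive at each step instead of committing to one sampled character.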

bytenet-tensorflow's People

Contributors

gitter-badger, paarthneekhara

bytenet-tensorflow's Issues

Translating

Hey awesome implementation. Thanks.
In translate.py, when I want to translate a source sentence, I still need to provide a target. Is the target the same as the source?
Thanks

could not train using python3.5 and tf 0.11

Encountered a couple of errors, perhaps due to version incompatibility.

$ python train_p3.py --data_dir=/Users/jhave/Desktop/github/byteNet-tensorflow/Data/pf
Traceback (most recent call last):
  File "train_p3.py", line 6, in <module>
    from ByteNet import model
  File "/Users/jhave/Desktop/github/byteNet-tensorflow/ByteNet/model.py", line 2, in <module>
    import ops
ImportError: No module named 'ops'


After the ImportError, I moved ops.py into the same folder as the training script, then encountered:

$ python train_p3.py --data_dir=/Users/jhave/Desktop/github/byteNet-tensorflow/Data/pf
Traceback (most recent call last):
  File "train_p3.py", line 85, in <module>
    main()
  File "train_p3.py", line 42, in main
    bn_tensors = byte_net.build_prediction_model()
  File "/Users/jhave/Desktop/github/byteNet-tensorflow/ByteNet/model.py", line 48, in build_prediction_model
    decoder_output = self.decoder(source_embedding)
  File "/Users/jhave/Desktop/github/byteNet-tensorflow/ByteNet/model.py", line 123, in decoder
    layer_output = self.decode_layer(curr_input, dilation, layer_no)
  File "/Users/jhave/Desktop/github/byteNet-tensorflow/ByteNet/model.py", line 111, in decode_layer
    name = "dec_dilated_conv_laye{}".format(layer_no)
  File "/Users/jhave/Desktop/github/byteNet-tensorflow/ops.py", line 50, in dilated_conv1d
    restored = batch_to_time(conv, dilation)
  File "/Users/jhave/Desktop/github/byteNet-tensorflow/ops.py", line 21, in batch_to_time
    [(shape[0]/dilation), -1, shape[2]])
  File "//anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1977, in reshape
    name=name)
  File "//anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 573, in apply_op
    _Attr(op_def, input_arg.type_attr))
  File "//anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
    ", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: DataType float32 for attr 'Tshape' not in list of allowed values: int32, int64
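A likely cause of the TypeError is that in Python 3 the / operator always returns a float, so shape[0]/dilation puts a float into the target shape passed to reshape. Assuming shape is a list of Python integers at that point, floor division keeps the value an integer. The helper below is a hypothetical stand-in that only computes the target shape; it is not the batch_to_time function from ops.py.

```python
def batch_to_time_target_shape(shape, dilation):
    """Target shape for the reshape, using floor division.

    `shape` is assumed to be a list of Python ints [batch, time, channels].
    `//` returns an int on both Python 2 and Python 3, whereas `/` returns
    a float on Python 3.
    """
    return [shape[0] // dilation, -1, shape[2]]


target_shape = batch_to_time_target_shape([8, 10, 512], 2)
```

The earlier ImportError probably comes from Python 3 dropping implicit relative imports. Assuming ByteNet is a package, writing from . import ops (or from ByteNet import ops) inside ByteNet/model.py is the usual alternative to moving ops.py.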

the padding for dilated convolution seems to be wrong

For the encoder, the dilated convolution pads (filter_width - 1) * dilation / 2 zeros on each side of the input. This means that after reshaping, each row already has (filter_width - 1) zeros of padding. But conv1d uses SAME padding, which pads another (filter_width - 1) zeros and so duplicates the zeros needed.

Assume filter_width=3 and dilation=2, and the input is
1 2 3 4 5 6
After padding in the dilated convolution function, the input becomes
0 0 1 2 3 4 5 6 0 0
After the reshape,

0 1 3 5 0
0 2 4 6 0

becomes the input to conv1d, which again pads with filter_width-1 zeros under the SAME padding scheme:

0 0 1 3 5 0 0
0 0 2 4 6 0 0
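The steps in this example can be reproduced with plain Python lists. The helper names below are hypothetical and only mimic the padding arithmetic; they do not call TensorFlow, and the SAME padding rule assumed here is the stride-1 case, where filter_width - 1 zeros are split across the two sides.

```python
def pad_for_dilation(x, filter_width, dilation):
    """Pad (filter_width - 1) * dilation / 2 zeros on each side."""
    pad = (filter_width - 1) * dilation // 2
    return [0] * pad + list(x) + [0] * pad


def time_to_batch(x, dilation):
    """Split x into `dilation` interleaved rows (row i takes x[i::dilation])."""
    if len(x) % dilation != 0:
        raise ValueError("length must be a multiple of the dilation")
    return [x[offset::dilation] for offset in range(dilation)]


def same_pad(row, filter_width):
    """Zeros that SAME padding adds for a stride-1 convolution."""
    total = filter_width - 1
    left = total // 2
    return [0] * left + list(row) + [0] * (total - left)


padded = pad_for_dilation([1, 2, 3, 4, 5, 6], filter_width=3, dilation=2)
rows = time_to_batch(padded, dilation=2)
double_padded = [same_pad(row, filter_width=3) for row in rows]
```

Each row of double_padded starts and ends with two zeros, although a width-3 filter only needs one zero on each side, which is the duplication the issue describes.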

We can make the code compatible with TF 1.0.0 with the following diff:

git diff

diff --git a/ByteNet/model.py b/ByteNet/model.py
index 4cfe3b3..3a12b5b 100644
--- a/ByteNet/model.py
+++ b/ByteNet/model.py
@@ -138,7 +138,7 @@ class Byte_net_model:
         decoder_output = self.decoder(source_embedding)
         loss = self.loss(decoder_output, target_sentence)
 
-        tf.scalar_summary('LOSS', loss)
+        tf.summary.scalar('LOSS', loss)
 
         flat_logits = tf.reshape( decoder_output, [-1, options['n_target_quant']])
         prediction = tf.argmax(flat_logits, 1)

@@ -220,7 +220,7 @@ class Byte_net_model:

         flat_logits = tf.reshape( decoder_output, [-1, options['n_target_quant']])
         flat_targets = tf.reshape( target_one_hot, [-1, options['n_target_quant']])
-        loss = tf.nn.softmax_cross_entropy_with_logits(flat_logits, flat_targets, name='decoder_cross_entropy_loss')
+        loss = tf.nn.softmax_cross_entropy_with_logits(logits=flat_logits, labels=flat_targets, name='decoder_cross_entropy_loss')
 
         if 'target_mask_chars' in options:
                 # MASK LOSS BEYOND EOL IN TARGET

diff --git a/train_generator.py b/train_generator.py
index 78d502c..72e898b 100644

Hi Paarth

Thanks for the great work. However, I noticed the code may have a serious bug.
I am playing with train_generator.py and it seems to have a problem.

In your original evaluation, there is no separate training and testing set.

So suppose we randomly split the whole dataset into a training set and a testing set, and slightly change the code to add a simple testing-set loss evaluation, computed like the training loss.

You may find that the training loss decreases from the beginning, but the testing-set loss never decreases and actually increases from the beginning.

It is kind of weird. Do you know the reason? Do you think there may be a problem with the algorithm?

Thanks

Fajie

Pre-trained models

Hi, is it possible to share with us some of your pre-trained models? Thanks.
