# ByteNet -- TensorFlow Implementation

This is a TensorFlow implementation of the ByteNet generative neural network architecture for seq2seq text generation. It is a fork of ibab's tensorflow-wavenet.

## Work In Progress

The `bytenet` folder contains the model and its associated ops. I will be working on this throughout the next week.

### Implemented

- Source Network
- Residual Block (Fig 3 left)
- Regular Batch Normalization
- Reduction Summing of Nodes (not in the paper, but useful for attention in the decoder)
- Option to use only dilations in the network
- Option to use WaveNet's dilation architecture (proven useful for language models)
- Option to use different filter widths for the dilated convolutions
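For readers unfamiliar with the dilation options above, here is a minimal, framework-free sketch (plain NumPy, not the actual ops in this repo) of what a single dilated causal 1-D convolution computes:

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """Dilated causal 1-D convolution: y[t] = sum_i w[i] * x[t - i*dilation].

    Positions before the start of the sequence are treated as zero
    (left padding), so no output ever depends on future inputs.
    """
    k = len(w)                  # filter width
    pad = (k - 1) * dilation    # left padding keeps the conv causal
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
        for t in range(len(x))
    ])
```

With dilation rates doubling layer by layer, a short stack of these covers a long context window, which is what the WaveNet-style option above exploits.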

### Need To Implement -- Contributions Welcome

- Target Network
- Framework for Training and Decoding (probably adapted from TensorFlow's seq2seq)
- Sub Batch Normalization

## Structure

The model is structured in two parts:

- One ByteNet model is initialized as a source network
- A second ByteNet model is initialized as a target network

To build the network:

```python
from bytenet import bytenet_model

source_network = bytenet_model.ByteNetModel(args)
source_output = source_network.create_source_network(inputs)

target_network = bytenet_model.ByteNetModel(args)
output = target_network.create_target_network(source_output, conditional_inputs)  # this has not been implemented
```

This design keeps ByteNet modular: you can, for example, pair a convolutional encoder with an RNN decoder. The parts are meant to be interchangeable.
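The interchangeability boils down to a simple interface contract: any encoder that produces a `source_output` can feed any decoder that consumes one. A toy sketch with stub classes (illustrative names only, not the actual API of this repo):

```python
# Hypothetical encoder/decoder contract -- names and bodies are
# illustrative stand-ins, not the classes in this repository.
class ConvEncoder:
    def encode(self, inputs):
        # stand-in for create_source_network(inputs)
        return [x * 2 for x in inputs]

class RNNDecoder:
    def decode(self, source_output, conditional_inputs):
        # stand-in for create_target_network(source_output, conditional_inputs)
        return [s + c for s, c in zip(source_output, conditional_inputs)]

def run(encoder, decoder, inputs, conditional_inputs):
    # Any encoder/decoder pair satisfying this contract can be swapped in.
    return decoder.decode(encoder.encode(inputs), conditional_inputs)
```

Swapping `ConvEncoder` for an RNN encoder (or vice versa for the decoder) requires no change to `run`.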

## Results

The source network performs reasonably well on language tasks, though not as well as a standard three-layer vanilla LSTM stack. It also certainly consumes your GPUs: I am running it on three overclocked Titan X's, and all three sit at 100% GPU usage most of the time.

In terms of actual wall time, the 1024-dilation configuration (25 layers repeating rates 1, 2, 4, 8, 16) is slightly faster than a 1024-unit LSTM. This is not an entirely fair comparison, since the source network has far more parameters and performs far more computation. Still, given that the wall time is almost the same thanks to parallelization, I feel the source network has promise.
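As a sanity check on the schedule above: the receptive field of a stack of dilated causal convolutions is `1 + (filter_width - 1) * sum(dilations)`. Assuming filter width 2 (an assumption for illustration; the repo supports other widths), the 25-layer schedule works out as:

```python
def receptive_field(dilations, filter_width=2):
    # Each layer with dilation d extends the receptive field
    # by (filter_width - 1) * d time steps.
    return 1 + (filter_width - 1) * sum(dilations)

dilations = [1, 2, 4, 8, 16] * 5   # 25 layers repeating rates 1,2,4,8,16
print(receptive_field(dilations))  # -> 156 time steps with filter width 2
```

Under the same assumption, a WaveNet-style schedule of rates 1, 2, 4, ..., 512 yields a receptive field of exactly 1024 time steps.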

## Other Notes

Contributions are welcome!

Contributions that would help me out while I work on other parts of the network:

- Masking Causal Convolutions
- Sub Batch Normalization
- Multiple Integration Advanced Block Network (Fig 3 right)

I apologize for the volume of comments, but they help me make progress, debug, and understand exactly what is going on. I push often as I experiment heavily with different model combinations.
