
Comments (9)

paarthneekhara commented on May 18, 2024

Just an update in case you are still following this. I have rewritten almost the entire model. I noticed there was a bug in the encoder of the translation model (in the dilated conv1d implementation). I have updated the ops and training is in progress. It looks good, but it will take time to train since I am not using a powerful GPU.

Also, the model is now neater and faster. It accepts variable-length sentences, as opposed to the last implementation, in which the sentence length had to be specified while initializing the model.
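For readers unfamiliar with the pattern: variable-length input in TensorFlow is usually handled by leaving placeholder dimensions unspecified. A minimal sketch of the general idea (the names and sizes here are illustrative, not necessarily this repo's exact code):

```python
import tensorflow as tf

vocab_size = 256     # e.g. byte-level characters (illustrative)
embedding_dim = 512  # illustrative

# Both batch size and sentence length are left as None, so sentences
# of any length can be fed at run time.
sentence = tf.placeholder(tf.int32, [None, None], name='sentence')

# Embedding lookup and 1-d convolutions work for any sequence length.
w_embedding = tf.get_variable('w_embedding', [vocab_size, embedding_dim])
embedded = tf.nn.embedding_lookup(w_embedding, sentence)
```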


amirj commented on May 18, 2024

Were you able to reproduce the paper's results for the translation experiments?


paarthneekhara commented on May 18, 2024

No, I could not. I'm not sure why. Check out this alternate implementation: https://github.com/buriburisuri/ByteNet , though judging from its results, I don't think it reproduces the paper's results either.


tindzk commented on May 18, 2024

Thanks for the great work! I compared the current implementation with the paper and noticed a few discrepancies:

  • Section 3.3 only applies to the translation model, but the implemented prediction model has an embedding of size n x 2d (w_source_embedding). The paper does not state the correct embedding size, but intuitively I would set it to n x c with c being the number of characters.

  • The authors write: "Each layer is wrapped in a residual block that contains additional convolutional layers with filters of size 1 × 1 (He et al., 2016)." decode_layer() does not seem to initialise any filters of size 1 x 1.

  • The authors use Multiplicative Units instead of ReLUs for the prediction model (section 3.6). See section 4.1 of this paper for an explanation of Multiplicative Units (a sketch follows this list).

  • Layer Normalisation should be used before the activation function (section 3.6).
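For concreteness, a minimal sketch of a multiplicative unit in the style of the Video Pixel Networks formulation (assuming that is the paper meant above; the kernel size of 1 and all names here are illustrative, not this repo's code):

```python
import tensorflow as tf

def multiplicative_unit(h, filters, name='mu'):
    # MU(h) = g1 * tanh(g2 * h + g3 * u): three sigmoid gates and a
    # tanh-transformed update u, each from its own convolution
    # (kernel size 1 here for simplicity; the paper convolves with the
    # layer's filter size). Assumes h already has `filters` channels.
    with tf.variable_scope(name):
        u  = tf.tanh(tf.layers.conv1d(h, filters, 1, name='conv_u'))
        g1 = tf.sigmoid(tf.layers.conv1d(h, filters, 1, name='conv_g1'))
        g2 = tf.sigmoid(tf.layers.conv1d(h, filters, 1, name='conv_g2'))
        g3 = tf.sigmoid(tf.layers.conv1d(h, filters, 1, name='conv_g3'))
        return g1 * tf.tanh(g2 * h + g3 * u)
```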


speakerjohnash commented on May 18, 2024

Can you say which sections of the code you are confident align with the paper, and which parts you are uncertain about?


paarthneekhara commented on May 18, 2024

@tindzk

  1. Yes, the authors do not mention the embedding size for the prediction model. You can try setting it to c, but (intuitively) I don't think that will significantly improve the performance.

  2. The decode layer does in fact wrap the dilated convolution between 1 x 1 convolutions (see conv1 and conv2 below). The default filter width in ops.conv1d is 1.

     ```python
     def decode_layer(self, input_, dilation, layer_no):
         options = self.options

         # first 1 x 1 convolution (ops.conv1d defaults to filter width 1)
         relu1 = tf.nn.relu(input_, name='dec_relu1_layer{}'.format(layer_no))
         conv1 = ops.conv1d(relu1, options['residual_channels'],
             name='dec_conv1d_1_layer{}'.format(layer_no))

         # dilated causal convolution in the middle of the block
         relu2 = tf.nn.relu(conv1, name='dec_relu2_layer{}'.format(layer_no))
         dilated_conv = ops.dilated_conv1d(relu2, options['residual_channels'],
             dilation, options['decoder_filter_width'],
             causal=True,
             name='dec_dilated_conv_layer{}'.format(layer_no))

         # second 1 x 1 convolution, back up to 2 * residual_channels
         relu3 = tf.nn.relu(dilated_conv, name='dec_relu3_layer{}'.format(layer_no))
         conv2 = ops.conv1d(relu3, 2 * options['residual_channels'],
             name='dec_conv1d_2_layer{}'.format(layer_no))

         # residual connection
         return input_ + conv2
     ```
    
  3. Yes, the multiplicative units have not been implemented for the prediction model. You can try doing that to check if that improves the performance.

  4. Layer normalisation seems to be an update in the paper; in the earlier version it was sub-batch normalisation (see https://arxiv.org/pdf/1610.10099v1.pdf). I'll implement this soon; a sketch of the general pattern follows below.
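To illustrate point 4, a minimal sketch of layer normalisation applied before the activation, per section 3.6. This is not the repo's code; the names and the choice to normalise over the channel dimension are assumptions:

```python
import tensorflow as tf

def layer_norm(x, name='layer_norm', epsilon=1e-5):
    # Normalise x over its channel (last) dimension, then apply a
    # learned per-channel scale and shift.
    with tf.variable_scope(name):
        channels = x.get_shape()[-1]
        scale = tf.get_variable('scale', [channels],
            initializer=tf.ones_initializer())
        shift = tf.get_variable('shift', [channels],
            initializer=tf.zeros_initializer())
        mean, variance = tf.nn.moments(
            x, axes=[len(x.get_shape()) - 1], keep_dims=True)
        return scale * (x - mean) / tf.sqrt(variance + epsilon) + shift

# Used before each activation in the residual block, e.g.:
#   relu1 = tf.nn.relu(layer_norm(input_, 'ln1'))
```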


paarthneekhara commented on May 18, 2024

@msevrens
I think that, except for layer normalisation, everything is in line with the paper. I cannot guarantee a bug-free implementation since the results do not yet match the paper's. I'll be working on this in the coming week.


speakerjohnash commented on May 18, 2024

@paarthneekhara

If you replace the output embedding targets with word embeddings, your code actually translates semantically very well but syntactically poorly. It also overestimates common tokens in translation.

I think you can actually see this pattern when the output targets are character embeddings too. In that case, uncommon collections of characters appear to be easily predicted even though the sequences are long and complex, but interspersed among the correctly learned sequences are random injections of common tokens.

I don't know how that information can help others in recreating the paper, but there it is.


speakerjohnash commented on May 18, 2024

@paarthneekhara

Why is the target embedding going into the decoder? How does that work at evaluation time with no target to feed?

