
Comments (9)

lingdoc commented on August 30, 2024

Two more questions:

  1. The paper also mentions using gradient clipping for language modeling. I assume this was done as part of the Adam optimizer's configuration, but just to clarify: was this norm scaling? (See the sketch after this list.)
  2. The paper mentions gating as well; is this implemented in the current Keras version?
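
For context, norm scaling clips a gradient by rescaling it whenever its L2 norm exceeds a threshold, and Keras exposes this directly on the optimizer via the clipnorm argument. A minimal sketch (the threshold value here is illustrative, not taken from the paper):

```python
import tensorflow as tf

# clipnorm rescales each gradient tensor so that its L2 norm never
# exceeds the given threshold (norm scaling), as opposed to clipvalue,
# which clamps every element independently (value clipping).
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=0.5)
```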


philipperemy commented on August 30, 2024

@lingdoc this is on my TODO list. I'll dig into it when I have a bit of time, in about a week from now!


lingdoc commented on August 30, 2024

btw, great implementation! I'm comparing it with a BiLSTM-CNN variant on a text classification problem and getting similar results with half the runtime.


philipperemy commented on August 30, 2024

That's great! Thank you :)


philipperemy commented on August 30, 2024

Just for info, I'm working on a PR to make the implementation identical to the paper: #42


philipperemy commented on August 30, 2024

Merged. Only weight normalization is still missing: 141ef1c
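
For readers following along: weight normalization (Salimans & Kingma, 2016) reparameterizes each weight tensor as w = g · v / ||v||, decoupling its magnitude from its direction. A minimal sketch of one way to add it to a dilated convolution, assuming the TensorFlow Addons wrapper rather than keras-tcn's own code:

```python
import tensorflow_addons as tfa
from tensorflow.keras import layers

# Wraps the convolution so its kernel is expressed as g * v / ||v||,
# with the magnitude g and direction v learned separately.
conv = tfa.layers.WeightNormalization(
    layers.Conv1D(filters=64, kernel_size=3,
                  dilation_rate=2, padding='causal'))
```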


philipperemy commented on August 30, 2024

@lukeb88:

  1. In the paper, they don't talk about stacking multiple dilated convolutions (the nb_stacks parameter):
    I think I understood what you discussed here, but how does this differ from just increasing the number of dilations?

=> Fixed. I set nb_stacks to 1 for all the examples; the results are the same.

> How does this differ from just increasing the number of dilations?

Stacking repeats the same dilation schedule several times, as in WaveNet; cf. https://arxiv.org/pdf/1609.03499.pdf. A sketch of the difference follows.
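
To illustrate, a rough sketch of the receptive field arithmetic, assuming one dilated causal convolution of kernel size k per block (the constant changes slightly with two convolutions per residual block, but the comparison holds):

```python
def receptive_field(kernel_size, dilations, nb_stacks=1):
    # Each dilated causal conv with dilation d widens the receptive
    # field by (kernel_size - 1) * d; stacks repeat the schedule.
    return 1 + (kernel_size - 1) * nb_stacks * sum(dilations)

# Stacking repeats the same dilation schedule, so the receptive field
# grows only linearly in nb_stacks...
print(receptive_field(3, [1, 2, 4], nb_stacks=2))    # 29

# ...while extending the dilations doubles the reach at every extra
# level, i.e. exponential growth for the same number of layers.
print(receptive_field(3, [1, 2, 4, 8, 16, 32]))      # 127
```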

  2. In the single residual block (in residual_block()), why is there only one convolutional layer instead of two, as in the paper?
    Is it just a simplification?

=> It was a simplification, but I fixed it! (A sketch of the two-conv block follows.)
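
For reference, a minimal sketch of the paper's residual block with two dilated causal convolutions, written in plain tf.keras (not the exact keras-tcn code; weight normalization is omitted, matching the note above):

```python
from tensorflow.keras import layers

def residual_block(x, filters, kernel_size, dilation_rate, dropout=0.0):
    prev = x
    for _ in range(2):  # two dilated causal convs per block, as in the paper
        x = layers.Conv1D(filters, kernel_size, padding='causal',
                          dilation_rate=dilation_rate)(x)
        x = layers.Activation('relu')(x)
        x = layers.SpatialDropout1D(dropout)(x)
    if prev.shape[-1] != filters:
        # 1x1 conv so the skip connection matches the channel count
        prev = layers.Conv1D(filters, kernel_size=1)(prev)
    return layers.add([x, prev])
```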


lukeb88 commented on August 30, 2024

@philipperemy I supposed that the idea of stacking dilated convolutions was taken from the WaveNet paper!
Thanks for the clarification...

great work, btw!!!


philipperemy commented on August 30, 2024

@lukeb88 yeah, the idea was taken from WaveNet!
Thanks, I really appreciate it :)

