
Comments (15)

Luux commented on August 30, 2024

@beasteers You probably forgot the link :P
It's this one I guess? https://github.com/vincentherrmann/pytorch-wavenet/blob/master/wavenet_model.py

from keras-tcn.

philipperemy commented on August 30, 2024

@DefinitlyEvil no problem.
You would use stateful only when you don't know the whole sequence in advance, for example when you train in a streaming fashion and data arrives one point at a time. Stateful was designed for that.
In your case, you can train on sequences of different lengths without a stateful model. Have a look at this example:

https://github.com/philipperemy/keras-tcn/blob/master/tasks/multi_length_sequences.py

And of course, within the same batch all sequences must have the same length; otherwise, just pad them.
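A minimal sketch of that padding step, in plain NumPy (the helper name is illustrative, not part of keras-tcn):

```python
import numpy as np

def pad_batch(sequences, pad_value=0.0):
    """Pad a list of (length_i, features) arrays to a common length
    so they can be stacked into one (batch, max_len, features) tensor."""
    max_len = max(len(s) for s in sequences)
    n_feat = sequences[0].shape[1]
    batch = np.full((len(sequences), max_len, n_feat), pad_value, dtype=np.float32)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
    return batch

# three sequences of different lengths, 2 features each
seqs = [np.ones((3, 2)), np.ones((5, 2)), np.ones((4, 2))]
batch = pad_batch(seqs)
print(batch.shape)  # (3, 5, 2)
```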


philipperemy commented on August 30, 2024

And thanks for your support :)


DefinitlyEvil commented on August 30, 2024

@philipperemy
Hi, thank you so much for your reply. I looked into the example and found it very helpful.
But with dynamic lengths I can't train on big batches, so efficiency is terrible.
I am trying to generate MIDI music in the style of Bach's pieces with a GAN.

I used PrettyMIDI to get a piano roll at 20 samples/sec and fed it into a generator TCN with nb_stacks=4 and otherwise default parameters.
The discriminator is the same, but with return_sequences=False followed by Dense(1, activation="sigmoid").
I borrowed the idea from the copy memory task example, where an extra dimension signals whether it's time to generate.
So the input and output of the generator look like this (129 = 128 notes + 1 control flag):

in (None, None, 129):
000000000000000000|111111111111
a short slice of music|0000000000000
----
out (None, None, 129):
000000000000000000|111111111111
000000000000000000|generated music

After hours of training, the model only produces a single held note.
I switched all final-layer activation functions from softmax to sigmoid, but that doesn't help at all.

Could you please take a look at my code to find the problem?
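For concreteness, the flag layout sketched above can be built with plain NumPy (shapes only, no MIDI library; the helper name is illustrative):

```python
import numpy as np

def make_generator_input(roll, gen_len):
    """Build a (T + gen_len, 129) generator input: 128 note columns plus one
    control-flag column that is 0 over the conditioning slice and 1 over the
    part to be generated."""
    t = roll.shape[0]
    x = np.zeros((t + gen_len, 129), dtype=np.float32)
    x[:t, :128] = roll   # conditioning: a short slice of music
    x[t:, 128] = 1.0     # flag = 1 where the model should generate
    return x

roll = np.random.rand(20, 128).astype(np.float32)  # 1 second at 20 samples/sec
x = make_generator_input(roll, gen_len=12)
print(x.shape)  # (32, 129)
```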


philipperemy commented on August 30, 2024
  • Dense(1, 'sigmoid') with a binary_crossentropy loss is exactly the same as Dense(2, 'softmax') with categorical_crossentropy.
  • Why do you need a time parameter in the first place? I think you should simplify the problem first: truncate all your MIDI files to a length of 128 and make that work. Then you can add a bit of complexity.

When GANs generate images, they see the image as a whole. Can't you do the same with your MIDI files? You can treat a MIDI file as a flattened image.

The idea of the copy memory task is to test that the network can remember a sequence over many steps. You probably don't need that. A TCN is a network that takes a sequence as input and outputs whatever you want. You also probably don't need the temporal flag (unless you want to stream your output one note at a time).

  • The generator would map a random number to an array of 128 elements.
  • The discriminator would take this array of 128 elements and output 0 or 1, depending on whether it thinks the music is real.
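The sigmoid/softmax equivalence mentioned above can be checked numerically: a two-way softmax over logits (0, z) reduces to sigmoid(z), which is why the two output heads are interchangeable. A quick check in plain Python:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(a, b):
    """Probability of the second class under a two-way softmax."""
    ea, eb = math.exp(a), math.exp(b)
    return eb / (ea + eb)

# Dense(1, 'sigmoid') on logit z gives the same probability as
# Dense(2, 'softmax') on logits (0, z)
for z in (-3.0, 0.0, 1.5):
    assert abs(sigmoid(z) - softmax2(0.0, z)) < 1e-12
print("equivalent")
```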


DefinitlyEvil commented on August 30, 2024

Ah, thank you, now I think I understand GANs better. I'm going to run more experiments, thanks! ;)


philipperemy commented on August 30, 2024

Great ;)


Luux commented on August 30, 2024

Just wondering: what would be the best way to replace a stateful LSTM? Using a "cache"/sliding window of size (2 * receptive_field - 1) and predicting every receptive_field timesteps? That way the last new data point still has the entire history covered by the receptive field, at the cost of a delay of receptive_field timesteps. Or am I wrong?
Is there an elegant way to do this for a single timestep without throwing away all the information calculated for the previous ones?
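A framework-free sketch of the sliding-window scheme described above (class and method names are illustrative):

```python
from collections import deque

class SlidingWindow:
    """Cache the last (2 * receptive_field - 1) samples; triggering a
    prediction every `receptive_field` steps gives each new point full
    history within the receptive field, at a delay of up to rf steps."""

    def __init__(self, receptive_field):
        self.rf = receptive_field
        self.buf = deque(maxlen=2 * receptive_field - 1)
        self.since_last = 0

    def push(self, sample):
        """Add one sample; return the window to feed the TCN, or None."""
        self.buf.append(sample)
        self.since_last += 1
        if len(self.buf) >= self.rf and self.since_last >= self.rf:
            self.since_last = 0
            return list(self.buf)
        return None

win = SlidingWindow(receptive_field=3)
outputs = [win.push(t) for t in range(10)]
ready = [w for w in outputs if w is not None]
```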


philipperemy commented on August 30, 2024

@Luux yes, I know what you mean. At the moment the computations are lost between two calls. In a stateful LSTM you keep the states (c, h) in memory, but a TCN has no such states, so stateful vs. stateless doesn't really make sense for a TCN. However, we could cache some computations instead of recomputing all of them. I'm not sure how to do that; it would definitely be difficult in my opinion. I don't have a clear idea of how to replace a stateful LSTM.


Luux commented on August 30, 2024

@philipperemy Yep, that's the problem. I think such an approach could improve on state-of-the-art setups for real-time prediction of temporal data, where stateful LSTMs are still the way to go as far as I know.

I have no idea whether this is even possible with Keras; I think it should be somehow in pure TensorFlow, but an implementation of this is currently beyond my knowledge.


philipperemy commented on August 30, 2024

@Luux yes, I agree with you. A pure TensorFlow implementation is definitely doable. In Keras I'm not even sure it's possible; I guess it would be, but it would require a lot of work.


beasteers commented on August 30, 2024

If I recall correctly, I've seen some WaveNet implementations (and I think the original paper too?) that use a FIFO queue to track states for the generative model. I'll have to see if I can track down the repo I'm thinking of; I don't see it right now.

Update: Just found one: https://github.com/ibab/tensorflow-wavenet/blob/3c973c038c8c2c20fef0039f111cb04139ff594b/wavenet/model.py#L451

I'm not sure how queues play with backprop and Keras, so it may or may not integrate well.


beasteers commented on August 30, 2024

Here's a PyTorch implementation that seems to use a (custom-implemented) DilatedQueue class alongside Conv1d layers (instead of the hand-rolled convolutions in the TF implementation), so it might be a useful example.


Luux commented on August 30, 2024

@philipperemy @beasteers I don't have time this week or next to look into this further (maybe after that :P), but the implementation @beasteers found leads to this paper, which proposes special queues for caching already-computed activations in WaveNet: https://arxiv.org/abs/1611.09482
I've only had a quick look at it so far, but it could be the one we're looking for.
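For reference, the caching trick from that paper can be sketched without any framework: each dilated layer keeps a FIFO of its past inputs, so producing one new sample costs one kernel application instead of recomputing the whole receptive field. A toy scalar version, assuming kernel size 2 (weights and names are illustrative):

```python
from collections import deque

class CachedDilatedConv:
    """Incremental causal dilated conv, kernel size 2:
    y[t] = w0 * x[t - dilation] + w1 * x[t]."""

    def __init__(self, w0, w1, dilation):
        self.w0, self.w1 = w0, w1
        # queue of the last `dilation` inputs, pre-filled with zero padding
        self.q = deque([0.0] * dilation, maxlen=dilation)

    def step(self, x):
        old = self.q[0]   # x[t - dilation]: a peek, unlike tf.FIFOQueue
        self.q.append(x)  # maxlen drops the oldest entry automatically
        return self.w0 * old + self.w1 * x

conv = CachedDilatedConv(w0=0.5, w1=2.0, dilation=2)
ys = [conv.step(x) for x in [1.0, 2.0, 3.0, 4.0]]
print(ys)  # [2.0, 4.0, 6.5, 9.0]
```

Each `step` touches only one cached value, which is exactly the O(1)-per-layer-per-sample behavior the Fast Wavenet paper describes.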


beasteers commented on August 30, 2024

Hey @philipperemy @Luux! I hope you've been well. :)

My work diverged a bit, but now I'm starting to look back on TCNs and this was one of the issues that I wanted to get implemented.

Currently, I'm trying to train some TCN layers on top of a fairly large CNN that processes 1-second chunks. I'm aiming for a larger receptive field (about 10-15 seconds), but my coworker is concerned about the memory cost of duplicating the model that many times. That got me thinking again about stateful TCNs.

Working off of: https://github.com/tomlepaine/fast-wavenet/blob/master/wavenet/models.py

NOTE: they only use queues for generation. Does it make sense to allow them for training as well?

I'm trying to formulate it in a way that doesn't require duplicating the existing architecture definition, so I've started on a Conv1D drop-in replacement.

But the outstanding problems/questions are:

  • tf.queue.FIFOQueue has no way of reading values without dequeueing them.
    This is a problem because the library above assumes the input is a single sample and the kernel size is 2. To support single-sample input with a kernel size of 3, you'd have to dequeue 2 items but only step forward 1.
  • How do we handle return_sequences? Currently this returns sample by sample.
  • When do we reset states? I'm not sure how LSTMs control this, but that's probably the place to look.
  • Is extending Conv1D the right approach? It seems like the least opinionated way of doing it, but I don't know if there are benefits to doing it differently.

please let me know your thoughts! 🙃

import tensorflow as tf

class QueueConv1D(tf.keras.layers.Conv1D):
    """Sketch of a stateful Conv1D: the input from `dilation` steps ago is
    cached in a FIFO queue so generation can proceed one sample at a time."""

    def reset_states(self):
        # refill the queue with zeros (causal zero padding),
        # one entry per dilation step
        self.q.enqueue_many(
            tf.zeros((self.dilation_rate[0],) + self._elem_shape))

    def build(self, input_shape):
        super().build(input_shape)
        # each queue element is one input sample of shape (batch, channels);
        # FIFOQueue needs fully defined shapes, so this assumes a fixed batch size
        self._elem_shape = (input_shape[0], input_shape[-1])
        self.q = tf.queue.FIFOQueue(
            self.dilation_rate[0], dtypes=tf.float32, shapes=self._elem_shape)
        self.reset_states()

    def call(self, inputs):
        # valid for kernel_size in (1, 2)
        outputs = tf.matmul(inputs, self.kernel[0])
        if self.kernel_size[0] > 1:
            # contribution of the cached input from `dilation` steps ago
            outputs += tf.matmul(self.q.dequeue(), self.kernel[1])
            self.add_update(self.q.enqueue([inputs]))

        # TODO: how to support other kernel/input sizes?
        #       We can't read values from the queue without dequeueing,
        #       so something like this doesn't work:
        # outputs = (
        #     tf.matmul(inputs, self.kernel[0])[None] +
        #     tf.matmul(self.q[:len(self.kernel) - 1], self.kernel[1:])
        # )
        # self.add_update(self.q.enqueue([inputs]))

        if self.use_bias:
            dformat = 'NCHW' if self.data_format == 'channels_first' else 'NHWC'
            outputs = tf.nn.bias_add(outputs, self.bias, data_format=dformat)

        if self.activation is not None:
            return self.activation(outputs)
        return outputs

Other notes:

  • I'm not sure whether we should be building Keras layers in eager mode or still using tf.keras.backend.function and friends; eager mode confuses me.
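On the first open question (FIFOQueue has no peek): one workaround is to dequeue kernel_size - 1 items and immediately re-enqueue all but the oldest, so the queue still advances by exactly one step. A framework-free sketch of that bookkeeping with a plain deque, assuming dilation 1 and kernel size 3 (names are illustrative):

```python
from collections import deque

def step_queue(q, x, kernel_size):
    """Advance the past-input queue by one sample and return the full
    window [x[t-k+1], ..., x[t]] for kernel size k. Mimics the
    dequeue-then-re-enqueue trick needed with tf.FIFOQueue, which
    cannot peek: pop k-1 items, push back all but the oldest, plus x."""
    past = [q.popleft() for _ in range(kernel_size - 1)]  # drains the queue
    for v in past[1:] + [x]:  # re-enqueue everything except the oldest
        q.append(v)
    return past + [x]

q = deque([0.0, 0.0])        # zero padding for kernel_size=3
w1 = step_queue(q, 1.0, 3)   # [0.0, 0.0, 1.0]
w2 = step_queue(q, 2.0, 3)   # [0.0, 1.0, 2.0]
w3 = step_queue(q, 3.0, 3)   # [1.0, 2.0, 3.0]
```

With dilation d the same idea applies, but the queue would hold d * (kernel_size - 1) entries and only every d-th one would enter the window.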

