
keras-tcn's Introduction

Keras TCN

Keras Temporal Convolutional Network. [paper]

Tested with Tensorflow 2.9, 2.10, 2.11, 2.12, 2.13, 2.14 and 2.15 (Nov 17, 2023).


pip install keras-tcn
pip install keras-tcn --no-dependencies  # without the dependencies if you already have TF/Numpy.

For MacOS M1 users: pip install --no-binary keras-tcn keras-tcn. The --no-binary option will force pip to download the sources (tar.gz) and re-compile them locally. Also make sure that grpcio and h5py are installed correctly. There are some tutorials on how to do that online.

Why TCN (Temporal Convolutional Network) instead of LSTM/GRU?

  • TCNs exhibit longer memory than recurrent architectures with the same capacity.
  • TCNs perform better than LSTM/GRU on long time series (Seq. MNIST, Adding Problem, Copy Memory, Word-level PTB...).
  • Parallelism (convolutional layers), flexible receptive field size (how far the model can see), stable gradients (compared to backpropagation through time, vanishing gradients)...

Visualization of a stack of dilated causal convolutional layers (Wavenet, 2016)

TCN Layer

TCN Class

    dilations=(1, 2, 4, 8, 16, 32),


  • nb_filters: Integer. The number of filters to use in the convolutional layers. Would be similar to units for LSTM. Can be a list.
  • kernel_size: Integer. The size of the kernel to use in each convolutional layer.
  • dilations: List/Tuple. A dilation list. Example is: [1, 2, 4, 8, 16, 32, 64].
  • nb_stacks: Integer. The number of stacks of residual blocks to use.
  • padding: String. The padding to use in the convolutions. 'causal' for a causal network (as in the original implementation) and 'same' for a non-causal network.
  • use_skip_connections: Boolean. If we want to add skip connections from input to each residual block.
  • return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
  • dropout_rate: Float between 0 and 1. Fraction of the input units to drop.
  • activation: The activation used in the residual blocks o = activation(x + F(x)).
  • kernel_initializer: Initializer for the kernel weights matrix (Conv1D).
  • use_batch_norm: Whether to use batch normalization in the residual layers or not.
  • use_layer_norm: Whether to use layer normalization in the residual layers or not.
  • use_weight_norm: Whether to use weight normalization in the residual layers or not.
  • go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
  • return_state: Boolean. Whether to return the last state in addition to the output. Default: False.
  • kwargs: Any other set of arguments for configuring the parent class Layer. For example "name=str", Name of the model. Use unique names when using multiple TCN.

Input shape

3D tensor with shape (batch_size, timesteps, input_dim).

timesteps can be None. This can be useful if each sequence is of a different length: Multiple Length Sequence Example.

Output shape

  • if return_sequences=True: 3D tensor with shape (batch_size, timesteps, nb_filters).
  • if return_sequences=False: 2D tensor with shape (batch_size, nb_filters).

How do I choose the correct set of parameters to configure my TCN layer?

Here are some of my notes regarding my experience using TCN:

  • nb_filters: Present in any ConvNet architecture. It is linked to the predictive power of the model and affects the size of your network. The more, the better unless you start to overfit. It's similar to the number of units in an LSTM/GRU architecture too.

  • kernel_size: Controls the spatial area/volume considered in the convolutional ops. Good values are usually between 2 and 8. If you think your sequence heavily depends on t-1 and t-2, but less on the rest, then choose a kernel size of 2/3. For NLP tasks, we prefer bigger kernel sizes. A large kernel size will make your network much bigger.

  • dilations: Controls how deep your TCN layer is. Usually, consider a list of powers of two. You can estimate how many dilations you need by matching the receptive field (of the TCN) with the length of the features in your sequence. For example, if your input sequence is periodic, you might want to have multiples of that period as dilations.

  • nb_stacks: Not very useful unless your sequences are very long (like waveforms with hundreds of thousands of time steps).

  • padding: I have only used causal since a TCN stands for Temporal Convolutional Networks. Causal prevents information leakage.

  • use_skip_connections: Skip connections connect layers, similarly to DenseNet. They help the gradients flow. Unless you experience a drop in performance, you should always activate them.

  • return_sequences: Same as the one present in the LSTM layer. Refer to the Keras doc for this parameter.

  • dropout_rate: Similar to recurrent_dropout for the LSTM layer. I usually don't use it much. Or set it to a low value like 0.05.

  • activation: Leave it to default. I have never changed it.

  • kernel_initializer: If the training of the TCN gets stuck, it might be worth changing this parameter. For example: glorot_uniform.

  • use_batch_norm, use_layer_norm, use_weight_norm: Use normalization if your network is big enough and the task contains enough data. I usually prefer use_layer_norm, but you can try them all and see which one works best.
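As a rough aid for the dilations note above, the sketch below keeps doubling the dilation until the receptive field covers a target sequence length. It assumes the receptive field formula given in the Receptive field section (two Conv1D layers per residual block). The helper name `suggest_dilations` is hypothetical and not part of keras-tcn.

```python
def suggest_dilations(seq_len, kernel_size=2, nb_stacks=1):
    """Return powers of two [1, 2, 4, ...] whose receptive field covers seq_len.

    Hypothetical helper (not part of keras-tcn). It assumes the receptive
    field is 1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations).
    """
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2 to grow the receptive field")
    dilations = [1]
    # Append the next power of two until the receptive field is long enough.
    while 1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations) < seq_len:
        dilations.append(dilations[-1] * 2)
    return dilations


# Example: a 784-step sequence (sequential MNIST) with kernel_size=2.
print(suggest_dilations(784, kernel_size=2))
```

The result is only a starting point; the notes above still apply when tuning the list for a given task.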

Receptive field

The receptive field is defined as the maximum number of steps back in time from the current sample at time T that a filter from (block, layer, stack, TCN) can hit (effective history), plus 1. The receptive field of the TCN can be calculated using the formula:

Rfield = 1 + 2⋅(K-1)⋅Nstack⋅Σi di

where Nstack is the number of stacks, Nb is the number of residual blocks per stack, d is a vector containing the dilations of each residual block in each stack, and K is the kernel size. The 2 is there because there are two Conv1d layers in a single ResidualBlock.

Ideally, you want your receptive field to be larger than the longest input sequence. If you pass a sequence longer than your receptive field into the model, any extra values (further back in the sequence) will be replaced with zeros.


NOTE: Unlike the TCN, example figures only include a single Conv1d per layer, so the formula becomes Rfield = 1 + (K-1)⋅Nstack⋅Σi di (without the factor 2).

  • If a dilated conv net has only one stack of residual blocks with a kernel size of 2 and dilations [1, 2, 4, 8], its receptive field is 16. The image below illustrates it:

ks = 2, dilations = [1, 2, 4, 8], 1 block

  • If a dilated conv net has 2 stacks of residual blocks, you would have the situation below, that is, an increase in the receptive field up to 31:

ks = 2, dilations = [1, 2, 4, 8], 2 blocks

  • If we increased the number of stacks to 3, the size of the receptive field would increase again, such as below:

ks = 2, dilations = [1, 2, 4, 8], 3 blocks
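The formula and the figure examples above can be checked with a few lines of Python. This is a minimal sketch: the function name `receptive_field` and its `convs_per_block` parameter are illustrative and not part of the keras-tcn API. Setting `convs_per_block=1` reproduces the figures (one Conv1D per layer), while the default of 2 corresponds to the TCN layer's residual blocks.

```python
def receptive_field(kernel_size, dilations, nb_stacks=1, convs_per_block=2):
    """Receptive field: 1 + convs_per_block * (K - 1) * N_stack * sum(d)."""
    return 1 + convs_per_block * (kernel_size - 1) * nb_stacks * sum(dilations)


# Figure examples: kernel size 2, dilations [1, 2, 4, 8], one Conv1D per layer.
for stacks in (1, 2, 3):
    print(stacks, receptive_field(2, [1, 2, 4, 8], stacks, convs_per_block=1))

# Same configuration with two Conv1D layers per residual block, one stack.
print(receptive_field(2, [1, 2, 4, 8], nb_stacks=1))
```

Under these assumptions the three figure configurations give 16, 31 and 46, and the single-stack case with two convolutions per block gives 31.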

Non-causal TCN

Making the TCN architecture non-causal allows it to take the future into consideration to do its prediction as shown in the figure below.

However, it is then no longer suitable for real-time applications.

Non-Causal TCN - ks = 3, dilations = [1, 2, 4, 8], 1 block

To use a non-causal TCN, specify padding='valid' or padding='same' when initializing the TCN layers.
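To illustrate why causal padding prevents information leakage while 'same' padding does not, here is a minimal pure-Python sketch of a single dilated convolution. It is not the keras-tcn implementation: the function `dilated_conv1d` is hypothetical, and the way 'same' padding is split between the two sides is an assumption made for the illustration.

```python
def dilated_conv1d(x, weights, dilation=1, padding="causal"):
    """Dilated 1-D convolution over a list of floats (illustrative only)."""
    span = (len(weights) - 1) * dilation  # distance between first and last tap
    if padding == "causal":
        left, right = span, 0             # pad only the past
    elif padding == "same":
        left = span // 2                  # pad both past and future
        right = span - left
    else:
        raise ValueError("padding must be 'causal' or 'same'")
    padded = [0.0] * left + list(x) + [0.0] * right
    return [
        sum(w * padded[t + j * dilation] for j, w in enumerate(weights))
        for t in range(len(x))
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 100.0]  # same as x except for the last (future) step
w = [1.0, 1.0, 1.0]

# Causal: outputs before the last step ignore the changed future value.
print(dilated_conv1d(x, w, padding="causal")[:4] == dilated_conv1d(y, w, padding="causal")[:4])
# Same: the output at step 3 already depends on step 4.
print(dilated_conv1d(x, w, padding="same")[3] == dilated_conv1d(y, w, padding="same")[3])
```

With causal padding the output at time t only depends on inputs up to t, which is what makes the layer usable for real-time prediction.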


Once keras-tcn is installed as a package, you can take a glimpse of what is possible to do with TCNs. Some tasks examples are available in the repository for this purpose:

cd adding_problem/
python # run adding problem task

cd copy_memory/
python # run copy memory task

cd mnist_pixel/
python # run sequential mnist pixel task

Reproducible results are possible on (NVIDIA) GPUs using the tensorflow-determinism library. It was tested with keras-tcn by @lingdoc.


Word PTB

Language modeling remains one of the primary applications of recurrent networks. In this example, we show that TCN can beat LSTM on the WordPTB task, without too much tuning.

TCN vs LSTM (comparable number of weights)

Adding Task

The task consists of feeding a large array of decimal numbers to the network, along with a boolean array of the same length. The objective is to sum the two decimals at the positions where the boolean array contains its two 1s.
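A single sample of this task could be generated as sketched below. The helper `make_adding_sample` is hypothetical (it is not the repository's data generator), and drawing the decimals uniformly from [0, 1) is an assumption.

```python
import random


def make_adding_sample(length, rng):
    """Return (values, markers, target) for one adding-problem sample."""
    values = [rng.random() for _ in range(length)]   # decimals in [0, 1), assumed
    markers = [0] * length
    first, second = rng.sample(range(length), 2)     # two distinct positions
    markers[first] = markers[second] = 1
    target = values[first] + values[second]          # sum of the two marked values
    return values, markers, target


rng = random.Random(0)
values, markers, target = make_adding_sample(10, rng)
print(markers, target)
```

The network would receive the values and markers as two input channels per time step and regress the target.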


Adding Problem Task

Implementation results

782/782 [==============================] - 154s 197ms/step - loss: 0.8437 - val_loss: 0.1883
782/782 [==============================] - 154s 196ms/step - loss: 0.0702 - val_loss: 0.0111
782/782 [==============================] - 152s 194ms/step - loss: 6.9630e-04 - val_loss: 3.7180e-04

Copy Memory Task

The copy memory task consists of a very large array:

  • At the beginning, there's the vector x of length N. This is the vector to copy.
  • At the end, N+1 9s are present. The first 9 is seen as a delimiter.
  • In the middle, only 0s are there.

The idea is to copy the content of the vector x to the end of the large array. The task is made sufficiently complex by increasing the number of 0s in the middle.
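The layout described above can be sketched as follows. The helper `make_copy_memory_sample` is hypothetical, and two details are assumptions: the symbols of x are drawn from 1 to 8, and the target is 0 everywhere except the last N positions.

```python
import random


def make_copy_memory_sample(n_copy, n_zeros, rng):
    """Return (inputs, targets) for one copy-memory sample."""
    pattern = [rng.randint(1, 8) for _ in range(n_copy)]    # vector x to copy
    inputs = pattern + [0] * n_zeros + [9] * (n_copy + 1)   # x, zeros, then N+1 nines
    targets = [0] * (n_copy + n_zeros + 1) + pattern        # x must reappear at the end
    return inputs, targets


rng = random.Random(0)
inputs, targets = make_copy_memory_sample(n_copy=3, n_zeros=5, rng=rng)
print(inputs)
print(targets)
```

Increasing `n_zeros` lengthens the gap the network has to remember across, which is what makes the task harder.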


Copy Memory Task

Implementation results (first epochs)

118/118 [==============================] - 17s 143ms/step - loss: 1.1732 - accuracy: 0.6725 - val_loss: 0.1119 - val_accuracy: 0.9796
118/118 [==============================] - 15s 125ms/step - loss: 0.0268 - accuracy: 0.9885 - val_loss: 0.0206 - val_accuracy: 0.9908
118/118 [==============================] - 15s 125ms/step - loss: 0.0228 - accuracy: 0.9900 - val_loss: 0.0169 - val_accuracy: 0.9933

Sequential MNIST


The idea here is to treat MNIST images as 1-D sequences and feed them to the network. This task is particularly hard because each sequence has 28*28 = 784 elements. To classify correctly, the network has to remember the whole sequence. Standard LSTMs are unable to perform well on this task.
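Turning an image into such a sequence amounts to flattening its rows. The sketch below uses plain nested lists; the helper name `image_to_sequence` and the scaling of pixel values from 0..255 to 0..1 are assumptions, not taken from the repository's example.

```python
def image_to_sequence(image):
    """Flatten a 2-D image row by row into a (timesteps, 1) nested list."""
    return [[pixel / 255.0] for row in image for pixel in row]


# A blank 28x28 image with one bright pixel at row 0, column 1.
image = [[0] * 28 for _ in range(28)]
image[0][1] = 255

sequence = image_to_sequence(image)
print(len(sequence), len(sequence[0]))
```

Each image becomes 784 time steps with one feature per step, matching the (timesteps, input_dim) part of the input shape described earlier.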

Sequential MNIST

Implementation results

1875/1875 [==============================] - 46s 25ms/step - loss: 0.0949 - accuracy: 0.9706 - val_loss: 0.0763 - val_accuracy: 0.9756
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0831 - accuracy: 0.9743 - val_loss: 0.0656 - val_accuracy: 0.9807
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0486 - accuracy: 0.9840 - val_loss: 0.0572 - val_accuracy: 0.9832
1875/1875 [==============================] - 46s 25ms/step - loss: 0.0453 - accuracy: 0.9858 - val_loss: 0.0424 - val_accuracy: 0.9862



  author = {Philippe Remy},
  title = {Temporal Convolutional Networks for Keras},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{}},


keras-tcn's People


769176706, blaimelendezcatalan, dependabot-preview[bot], evanharwin, hugoych, kismuz, krzim, li-xin-yi, nbertagnolli, philipperemy, psomers3, qlemaire22, rola93, thoppe



keras-tcn's Issues

I got inexplicable results on "mnist_pixel" and couldn't find the reason.

I did not modify any parameters, but val_accuracy stayed at about 0.11 for many epochs:
55000/55000 [==============================] - 164s 3ms/step - loss: 2.3037 - accuracy: 0.1120 - val_loss: 2.3015 - val_accuracy: 0.1135

Is there any special setting in this example ("mnist_pixel")?

PS: I tried the other examples too, and the results seem good.

Error "FailedPreconditionError: Attempting to use uninitialized value" saving weights

Hi Philip,

I am using keras-tcn and it works really well. Thanks for your contribution.

I can save the weights of my model using:

ModelCheckpoint('weights.h5', save_weights_only=True)

However, if I use:


I get the following error:

~/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/ in _do_call(self, fn, *args)
   1333     try:
-> 1334       return fn(*args)
   1335     except errors.OpError as e:

~/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/ in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1318       return self._call_tf_sessionrun(
-> 1319           options, feed_dict, fetch_list, target_list, run_metadata)

~/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/ in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1406         self._session, options, feed_dict, fetch_list, target_list,
-> 1407         run_metadata)

FailedPreconditionError: Attempting to use uninitialized value conv1d_33/bias
	 [[{{node conv1d_33/bias/_0}} = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_134_conv1d_33/bias", _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv1d_33/bias)]]
	 [[{{node tcn_d_same_conv_4_tanh_s0_2/kernel/_99}} = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_232_tcn_d_same_conv_4_tanh_s0_2/kernel", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

FailedPreconditionError                   Traceback (most recent call last)
<ipython-input-31-f4a33930d8c0> in <module>()
     18          PD_signals=PD_signals, nonPD_same=nonPD_same, nonPD_different=nonPD_different, sig_shape=sig_shape)
---> 20 model.save_weights(models_dir + name + '_weights.h5')  # creates a HDF5 file 'my_model.h5'
     22 # ModelCheckpoint(models_dir + name + '_weights.h5', save_weights_only=True)

~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/engine/ in save_weights(self, filepath, overwrite)
   1119                 return
   1120         with h5py.File(filepath, 'w') as f:
-> 1121             saving.save_weights_to_hdf5_group(f, self.layers)
   1122             f.flush()

~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/engine/ in save_weights_to_hdf5_group(f, layers)
    570         g = f.create_group(
    571         symbolic_weights = layer.weights
--> 572         weight_values = K.batch_get_value(symbolic_weights)
    573         weight_names = []
    574         for i, (w, val) in enumerate(zip(symbolic_weights, weight_values)):

~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/backend/ in batch_get_value(ops)
   2418     """
   2419     if ops:
-> 2420         return get_session().run(ops)
   2421     else:
   2422         return []

~/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/ in run(self, fetches, feed_dict, options, run_metadata)
    927     try:
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:
    931         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/ in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1150     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1151       results = self._do_run(handle, final_targets, final_fetches,
-> 1152                              feed_dict_tensor, options, run_metadata)
   1153     else:
   1154       results = []

~/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/ in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1326     if handle is None:
   1327       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328                            run_metadata)
   1329     else:
   1330       return self._do_call(_prun_fn, handle, feeds, fetches)

~/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/ in _do_call(self, fn, *args)
   1346           pass
   1347       message = error_interpolation.interpolate(message, self._graph)
-> 1348       raise type(e)(node_def, op, message)
   1350   def _extend_graph(self):

FailedPreconditionError: Attempting to use uninitialized value conv1d_33/bias
	 [[{{node conv1d_33/bias/_0}} = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_134_conv1d_33/bias", _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv1d_33/bias)]]
	 [[{{node tcn_d_same_conv_4_tanh_s0_2/kernel/_99}} = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_232_tcn_d_same_conv_4_tanh_s0_2/kernel", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

It seems as if "conv1d_33" (which is part of the TCN in my model) is considered as "uninitialized" by the method "save_model". Do you have any suggestion why this could be the case?

Thanks again for your great library.

Dilation rate vs dilations argument


In your examples the dilations are dilatations=[1, 2, 4, 8]. This is in fact in line with the image of the example from WaveNet.
WaveNet Example
However, this will lead to a dilation rate of 2**dilations, which implies the first dilation is 2 instead of 1. Is this a mistake? Is there any reason why a dilation_depth parameter is not used instead?

code issue (process_dilations)

Hi, first of all, thanks for your code sharing.
Guessing your intentions, the following expression may be appropriate.

def process_dilations(dilations):
new_dilations = [2 ** i for i in dilations]
--> new_dilations = [2 ** i for i in range(len(dilations))]
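The difference between the two expressions above can be seen by evaluating both on the same list. This is only an illustration of the arithmetic and does not reproduce the library's process_dilations function.

```python
dilations = [1, 2, 4, 8]

# Treating each entry as an exponent skips the rate of 1 and grows very fast.
as_exponents = [2 ** i for i in dilations]

# Using the position of each entry gives the usual doubling sequence.
by_position = [2 ** i for i in range(len(dilations))]

print(as_exponents)  # [2, 4, 16, 256]
print(by_position)   # [1, 2, 4, 8]
```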

Your setting of the conv dilation rate

Your conv_1d setting is:
atrous_rate=2 ** i, i is from [1, 2, 4, 8]
That means you want to make the dilatations of conv1d as
atrous_rate=[2, 4, 16, 256]

My sequence has 100 time steps. My guess is that a dilation rate over 100 cannot be fully effective. But I tried:
atrous_rate=2 ** i, i is from [1, 2, 3, 4, 5, 6]
atrous_rate=2 ** i, i is from [1, 2, 4, 6]
Only your configuration, even with dilate_rate=256 (>100), gives me the best result.

Can you explain more about this configuration? Thanks!

Tensorflow Addons


I would like to expand the code with tf.keras. Can I modify the code to release it on pypi?
I know the license is MIT, but I would like to confirm that just in case.

compatibility with plaidml backend

Hello, I use plaidml as the Keras backend because it supports AMD graphics cards.

There are two issues with your implementation:

  1. spatialdropout is not supported by plaidml so
    x = SpatialDropout1D(dropout_rate, name='spatial_dropout1d_%d_s%d_%f' % (i, s, dropout_rate))(x)
    will not work

  2. it seems that keras.backend.max isn't implemented too

If you don't have time to fix it can you suggest a workaround?

Error in compiled_tcn

The shape of my input data is x_train: (856,119,68), y_train: (856,1)
where 68 are the dimension of the feature vector. I define model like this

model = tcn.compiled_tcn(num_feat=68,
dilations=[1, 2, 4], #[2 ** i for i in range(9)],
max_len=119, #train_x[0:1].shape[1],

When I run, train_y, validation_data=(test_x, test_y), epochs=100,
I get this following error
"""Error when checking target: expected activation_59 to have 3 dimensions, but got array with shape (856, 1)"""
Please let me know if I have to change anything in the code?

Weight Initialization

The default weight initialization in Keras is Glorot Uniform. I've read that for Conv2D layers you should use the initialization algorithm from Kaiming He instead. I'm not sure if that applies to 1D Conv that are used in keras-tcn, but it may be nice to have the choice regardless.

Binary classification always predicting .5, .5

Apologies if this is a dumb question, but I am having a problem doing binary classification:

model = compiled_tcn(return_sequences=False,
                     dilations=[2 ** i for i in range(9)],
                     ), y_train, validation_data=(X_val, y_val), epochs=5)

Results in prediction:

array([[0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5]], dtype=float32)

Thanks for any help

Input dimension is 30K rows of 83 features, target is 0 or 1

Failed to get convolution algorithm

tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv1d_1/convolution}}]]

The problem may be because I have got two GPUs (RTX 2080). I ran into the issue with CUDA 10.0 and 10.1

Add the following lines to your
The fix has been tested with the "adding problem"

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True # dynamically grow the memory used on the GPU
sess = tf.Session(config=config)
set_session(sess) # register this session as the default Keras session

Why use more stacks over more dilated layers?

The receptive field is a product of number of stacks, the largest dilation value, and kernel size. Let us say we fix kernel_size = 2 . I would like to understand the difference between two different ways of achieving the same receptive field, say a receptive field of 256:

  1. Use nb_stacks=1 stacks, and use dilations [1, 2, 4, 8, 16, 32, 64, 128]
  2. Use nb_stacks=2 stacks, and use dilations [1, 2, 4, 8, 16, 32, 64]

In general, what kind of tradeoff am I making in choosing to increase stacks versus increase dilations to get the receptive field I want, or are they equivalent?

Your reader

How to input the feature vector of voice data into TCN? Thank you.

ValueError: Error when checking target

I tried instantiating TCN via the compiled_tcn function in this way:

classifier = compiled_tcn(num_feat=train_X.shape[2],
                         dilations=[2 ** i for i in range(9)],

This is the output from running the line above:

x.shape= (?, 80)
model.x = (?, 80, 26)
model.y = (?, 4)
Adam with norm clipping.

These are the shapes of my train and test sets:

[in]: print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

(10540, 80, 26) (10540, 4) (555, 80, 26) (555, 4)

When I attempt to run fit on the classifier with this line:, train_y, validation_data=(test_X, test_y), epochs=epochs,batch_size=batch_size)

I get this error:

ValueError: Error when checking target: expected activation_133 to have shape (1,) but got array with shape (4,)

I don't understand where the shape (1,) is coming from, since I have num_classes=4, regression=False and return_sequences=False, and the shapes of my data and the model seem to match as well.

Multiple identical dilations lead to name error

When using (e.g.)

tcn = TCN(27,dilations=[1, 1, 1, 2, 4, 8, 16, 32],return_sequences=False)(in1)

The code gives an error

"The name "tcn_d_causal_conv_1_tanh_s0" is used 3 times in the model. All layer names should be unique."
This is due to the block naming in 'residual_block' not including the layer number, but only the parameters.
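The collision can be reproduced without Keras. In the sketch below the name pattern is copied from the error message, while the function `block_names` and the `_b%d` index suffix are hypothetical and only illustrate how including the block index would keep names unique.

```python
from collections import Counter


def block_names(dilations, include_index):
    """Build one layer name per residual block (illustrative naming only)."""
    names = []
    for index, dilation in enumerate(dilations):
        name = "tcn_d_causal_conv_%d_tanh_s0" % dilation
        if include_index:
            name = "%s_b%d" % (name, index)  # hypothetical suffix with the block index
        names.append(name)
    return names


dilations = [1, 1, 1, 2, 4, 8, 16, 32]
counts = Counter(block_names(dilations, include_index=False))
duplicates = {name: count for name, count in counts.items() if count > 1}
print(duplicates)

unique_names = block_names(dilations, include_index=True)
print(len(set(unique_names)) == len(dilations))
```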

Saving and Loading Model

Thanks for creating this easy to access library. I am using it for time series forecasting scenario based on the example given. After training the model, I saved it in h5 format using save method in Keras. However, when I try to load the model again using load_model() it gives an error saying

This version performs the same function as Dropout, however it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the

IndexError: tuple index out of range

After some google search, I figured it might be because of the Lambda layer used in the model. In this scenario, what is the best option to save and load the TCN model ?

TCN with spectrograms as input

Hello everyone,

we'd like to try your TCN layers for speech enhancement task.
Our model is an autoencoder which takes ~270ms of spectrogram data (noisy speech) with dimensionality (F, T, C) where:

  • F is the number of frequency bins: 256
  • T is the number of stft() frames: 16
  • C is the number of channels: 1, but could be more
    The output has the same dimensionality of the input and is supposed to represent an enhanced version of the input speech (duh!).

It is in our understanding that we have to:

  • permute the first two dimensions in order to use our T-dimension as timesteps
  • flatten the remaining F and C dimensions to obtain the proper tensor shape
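The two steps listed above can be sketched on nested lists as follows. The helper `spectrogram_to_sequence` is hypothetical and ignores the batch dimension; it only shows how an (F, T, C) array maps to the (timesteps, input_dim) layout.

```python
def spectrogram_to_sequence(spec):
    """Convert nested lists of shape (F, T, C) to shape (T, F * C)."""
    n_freq = len(spec)
    n_frames = len(spec[0])
    n_channels = len(spec[0][0])
    return [
        [spec[f][t][c] for f in range(n_freq) for c in range(n_channels)]
        for t in range(n_frames)
    ]


# Dummy spectrogram with F=256, T=16, C=1; each value encodes its (f, t) position.
spec = [[[f * 100 + t] for t in range(16)] for f in range(256)]
seq = spectrogram_to_sequence(spec)
print(len(seq), len(seq[0]))
```

Each of the 16 frames becomes one time step carrying 256 features.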

However, it is not clear how to recover the original dimensionality after stacking one or more TCNs, or how to use the layer in a way that reduces/expands the dimensionality of the data.

Could you shed some light on it and/or highlight any pitfall in our approach?
Thanks in advance!

Possible wasted computations when not returning sequence

Hi, thanks for offering this great package. I'm trying to build an autoencoder using TCN. In my encoder part, the TCN will not return a sequence. After looking at your code, it seems to me that if return_sequences is set to False, then only one slice in the last layer is kept for output. However, the convolution is still computed for the other slices. Tracing back through the dilated convolutions, many computations that only affect the discarded slices are wasted. I'm not sure if I understand it correctly. If not, please kindly point it out.

I also have a question about building the decoder. I'm not sure if I should use an identical dilation order in the TCN group of layers as in the encoder, or reverse the dilation order. This is a general question not closely related to the implementation of TCN itself, so you can ignore it if you'd like to. Any comments/suggestions would be appreciated.


I have to apologize that I'm not a math expert, but I wish to implement a GAN network using TCN in the discriminator model to process a generated sequence with dynamic length. Is it possible to make TCN stateful so it can process dynamic length data?

And if yes, is it possible to integrate stateful TCN to seq2seq model?

Thanks so much. I really appreciate you and your project.

copy_memory example issue

Hi all,

I think there is an issue in copy_memory example

It is giving below error without any change to the code:

InvalidArgumentError: Incompatible shapes: [158976] vs. [256,621]

channel_normalization layer

It seems that the channel_normalization layer is different from the weight norm layer in the paper “An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling”.

Name parameter not used

The name parameter is currently not being applied to the layer. As a related issue, the current layer implementation dumps all of the model layers into a single flattened list and it's hard to see where one TCN block ends and another begins. I propose that, to solve both of these issues, the TCN layer should inherit from keras.layers.Layer

I patched the layer to implement the desired interface here, but all of this can be factored into the original TCN layer.

from keras.layers import Layer
class TCN2(Layer):
    def __init__(self, *a, name=None, **kw):
        self.model = TCN(*a, **kw)
        Layer.__init__(self, name=name)
    def call(self, x):
        return self.model(x)
    def compute_output_shape(self, input_shape):
        return input_shape[:-1] + (self.model.nb_filters,)

Here's a sketch of the changes translated to the original class:

from keras.layers import Layer
class TCN(Layer):
    # remove name from args here so it is passed in **kw
    def __init__(self, *a, ..., **kw): 
        super().__init__(*a, **kw)
    call = __call__ # just change the method name
    def compute_output_shape(self, input_shape):
        return input_shape[:-1] + (self.nb_filters,)

Now when I do this:

batch_size, timesteps, input_ch = None, fs*dur, 1

i = Input(batch_shape=(batch_size, timesteps, input_ch), name='tcn_in')
o = TCN2(16, return_sequences=True, name='tcn1')(i)
o = TCN2(16, return_sequences=True, name='tcn2')(o)
o = TCN2(16, return_sequences=True, name='tcn3')(o)

m = Model(inputs=[i], outputs=[o])
m.compile(optimizer='adam', loss='mse')
[ for l in m.layers]

I get:

['tcn_in', 'tcn1', 'tcn2', 'tcn3']

Instead of:


Question about number of filters

I'm trying to replicate the result of my previous project that used LSTM with units = 256. And on the home page of this repo it says that number of filters is like units in LSTM. And since I want the tcn to remember the whole sequence I set the receptive field about equals to the sequence length. But with number of filters 256 I ended up with a network that is way larger than the LSTM counterpart. So how do I calculate the appropriate number of filters?

Keras and Tensorflow Versions


Which versions of Tensorflow and Keras was this tested on? I keep getting issues that I think are related to incompatible versions.

Missing num_classes in regression example

Testing the regression example from the README,

model = tcn.dilated_tcn(output_slice_index='last',
                        dilatations=[1, 2, 4, 8],

gives the error

Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
TypeError: dilated_tcn() missing 1 required positional argument: 'num_classes'

Keras installation error

I could install tcn using pip install keras-tcn successfully without any error.
But when I use it as
"from tcn import tcn" the following error gets generated.
File "/Users/sowmyar/anaconda/lib/python2.7/site-packages/tcn/", line 95
print(f'Updated dilations from {dilations} to {new_dilations} because of backwards compatibility.')
SyntaxError: invalid syntax

Can you help me with this issue?

Clarify what's happening if dilations=None

"dilations" is treated as an optional argument that defaults to "None": tcn.TCN(nb_filters=64, kernel_size=2, nb_stacks=1, dilations=None, activation='norm_relu', padding='causal', use_skip_connections=True, dropout_rate=0.0, return_sequences=True, name='tcn')

In the readme, it only states "List. A dilation list. Example is: [1, 2, 4, 8, 16, 32, 64]."
and in the __call__ function it seems that it gets converted to [1, 2, 4, 8, 16, 32, 64] when it's set to None.

In this case, I think it should either default to [1, 2, 4, 8, 16, 32] directly instead of None, or it should at least be stated in the docs that it behaves this way, since it's quite confusing as it is. Or did I miss something?

Regression many to many

I have tried to use your library to do a many-to-many regression, which does not seem to be supported out of the box. Would you mind indicating what has to be changed for this use case?

Size of receptive field

Hi Philippe, thanks for writing this code; it is very helpful and works well.

I am doing time series prediction, so I can choose a sequence as long or as short as I want for the TCN to work with. What I am not clear on is how the kernel size, dilation values and number of stacks interact to determine the receptive field size.

For example, if I have kernel size=8, dilation values of [1,2,4,8] and stacks=3, how long should my sequence be so that it fully covers the receptive field of the TCN? Can you clarify?
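
As a minimal sketch of how these three settings combine: every causal convolution with kernel size k and dilation d extends the receptive field by (k - 1) * d timesteps. The function below assumes each residual block applies two such convolutions with the same dilation, as in the paper. That count is an assumption about the implementation, so it is exposed as the illustrative parameter `convs_per_block`.

```python
def receptive_field(kernel_size, dilations, nb_stacks, convs_per_block=2):
    """Timesteps seen by the last output of a stack of dilated causal convolutions.

    Each convolution adds (kernel_size - 1) * dilation timesteps; the leading 1
    is the current timestep itself.
    """
    return 1 + convs_per_block * (kernel_size - 1) * nb_stacks * sum(dilations)


# The configuration from the question above.
print(receptive_field(kernel_size=8, dilations=[1, 2, 4, 8], nb_stacks=3))
```

Under the two-convolution assumption this configuration covers 631 timesteps; with a single convolution per block (`convs_per_block=1`) it covers 316.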

different from the paper

In tcn/

def residual_block(x, dilation_rate, nb_filters, kernel_size, padding, dropout_rate=0):
    prev_x = x
    for k in range(2):
        x = Conv1D(filters=nb_filters,
        x = Activation('relu')(x)
        x = SpatialDropout1D(rate=dropout_rate)(x)

    x = Convolution1D(nb_filters, 1, padding='same')(x)
    res_x = keras.layers.add([prev_x, x])
    return res_x, x

I think the line x = Convolution1D(nb_filters, 1, padding='same')(x), third from the end, is not correct. It is meant to match the shapes of prev_x and x, so I think its input should be prev_x and not x.

Change the last 3 lines to:

prev_x = Conv1D(nb_filters, 1, padding='same')(prev_x)
res_x = keras.layers.add([prev_x, x])
return res_x, x

Deprecation warnings in keras layers


from tcn import tcn

model = tcn.dilated_tcn(output_slice_index='last',
                        dilatations=[1, 2, 4, 8],

gives the following warnings:

Using TensorFlow backend.
/home/hoppeta/.pyenv/versions/3.6.0/lib/python3.6/site-packages/keras/legacy/ UserWarning: The `AtrousConvolution1D` layer  has been deprecated. Use instead the `Conv1D` layer with the `dilation_rate` argument.
  warnings.warn('The `AtrousConvolution1D` layer '
/home/hoppeta/src/keras-tcn/tcn/ UserWarning: The `Merge` layer is deprecated and will be removed after 08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.
  res_x = Merge(mode='sum')([original_x, x])
/home/hoppeta/src/keras-tcn/tcn/ UserWarning: The `Merge` layer is deprecated and will be removed after 08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.
  x = Merge(mode='sum')(skip_connections)
x.shape= (?, 24)
model.x = (?, 100, 20)
model.y = (?, 1)

Confused about nb_stacks

I'm reading the code and I'm just a little confused about the below lines:
for s in range(nb_stacks):
    for i in dilatations:
        x, skip_out = residual_block(x, s, i, activation, nb_filters, kernel_size)

What is the interpretation of this? In the paper, I had thought that each layer would have its own dilation rate, typically 2 ** i for the ith layer. What is happening here?
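
Reading the nested loops literally, the whole dilation list is applied once per stack, so the dilation resets at the start of every stack. A small sketch that only reproduces the loop order (the helper name `dilation_schedule` is made up for illustration):

```python
def dilation_schedule(nb_stacks, dilations):
    """Return (stack index, dilation) for each residual block, in build order."""
    return [(s, d) for s in range(nb_stacks) for d in dilations]


for stack, dilation in dilation_schedule(nb_stacks=2, dilations=[1, 2, 4, 8]):
    print(f"stack {stack}: residual block with dilation {dilation}")
```

With nb_stacks=1 and dilations of the form 2 ** i this reduces to the one-dilation-per-layer layout described in the question.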

Different sequence length input shape error

My input data is a list of sequences. Each sequence can have a different number of time steps, and the feature vector at each time step has length 100. I'm training with fit_generator in Keras, using a generator function that yields batches of the batch_size I specified. Each batch from the generator is therefore a NumPy array of shape (batch_size, ), whose sub-arrays have shape (variable_length, 100).

Since my input has a varying number of time steps, I specified the input shape using:
Input(batch_shape=(batch_size, timesteps, input_dim))
but this gives me the error below:

ValueError: Error when checking input: expected num_input to have 3 dimensions, but got array with shape (128, 1)

where 128 is my batch_size. I get the same error if I set Input(shape=(None, input_dim)).
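
One common workaround, sketched here with plain Python lists, is to pad every sequence in a batch to the length of the longest one, so the batch becomes a regular (batch_size, max_len, input_dim) block. The helper name `pad_batch` and the choice to pad at the start are assumptions for illustration. Padding at the start keeps the real data next to the final timestep, which matters for a causal network that reads its last output.

```python
def pad_batch(sequences, pad_value=0.0):
    """Left-pad variable-length sequences to the longest length in the batch.

    sequences: non-empty list of sequences, each a non-empty list of feature
    vectors with the same input_dim.
    Returns a nested list of shape (batch_size, max_len, input_dim).
    """
    max_len = max(len(seq) for seq in sequences)
    input_dim = len(sequences[0][0])
    padded = []
    for seq in sequences:
        # Padding rows go first so the real timesteps end at the last position.
        pad_rows = [[pad_value] * input_dim for _ in range(max_len - len(seq))]
        padded.append(pad_rows + [list(step) for step in seq])
    return padded


batch = [[[1, 2], [3, 4]], [[5, 6]]]  # two sequences, input_dim = 2
print(pad_batch(batch))
```

Converting the result with np.asarray inside the generator yields a 3-dimensional array, and with Input(shape=(None, input_dim)) each batch may have its own max_len.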


Hi philipperemy,

After I updated tcn to the most recent version, the training accuracy dropped dramatically (from around 0.98 to 0.17).

I then uncommented the following BatchNormalization line and things seem to be back to normal. So this seems to be an important part of tcn:

x = BatchNormalization()(x) # TODO should be WeightNorm here.

Is a quick fix planned soon?

Thank you

Plans to support TensorFlow 2.0?

Are there plans to support TensorFlow 2.0? When I try to use keras-tcn with TF 2.0, I get the error below, which is likely related to the eager execution model of TF 2.0.

Using TensorFlow backend.

AttributeError Traceback (most recent call last)
19 i = Input(batch_shape=(batch_size, timesteps, input_dim))
---> 21 o = TCN(return_sequences=False)(i) # The TCN layers are here.
22 o = Dense(1)(o)

~/miniconda3/envs/tensorflow-2.0/lib/python3.6/site-packages/tcn/ in call(self, inputs)
119 x = inputs
120 # 1D FCN.
--> 121 x = Convolution1D(self.nb_filters, 1, padding=self.padding)(x)
122 skip_connections = []
123 for s in range(self.nb_stacks):

~/miniconda3/envs/tensorflow-2.0/lib/python3.6/site-packages/keras/legacy/ in wrapper(*args, **kwargs)
89 warnings.warn('Update your ' + object_name + ' call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

~/miniconda3/envs/tensorflow-2.0/lib/python3.6/site-packages/keras/layers/ in init(self, filters, kernel_size, strides, padding, data_format, dilation_rate, activation, use_bias, kernel_initializer, bias_initializer, kernel_regularizer, bias_regularizer, activity_regularizer, kernel_constraint, bias_constraint, **kwargs)
357 kernel_constraint=kernel_constraint,
358 bias_constraint=bias_constraint,
--> 359 **kwargs)
361 def get_config(self):

~/miniconda3/envs/tensorflow-2.0/lib/python3.6/site-packages/keras/layers/ in init(self, rank, filters, kernel_size, strides, padding, data_format, dilation_rate, activation, use_bias, kernel_initializer, bias_initializer, kernel_regularizer, bias_regularizer, activity_regularizer, kernel_constraint, bias_constraint, **kwargs)
103 bias_constraint=None,
104 **kwargs):
--> 105 super(_Conv, self).init(**kwargs)
106 self.rank = rank
107 self.filters = filters

~/miniconda3/envs/tensorflow-2.0/lib/python3.6/site-packages/keras/engine/ in init(self, **kwargs)
130 if not name:
131 prefix =
--> 132 name = to_snake_case(prefix) + '' + str(K.get_uid(prefix))
133 = name

~/miniconda3/envs/tensorflow-2.0/lib/python3.6/site-packages/keras/backend/ in get_uid(prefix)
72 """
73 global _GRAPH_UID_DICTS
---> 74 graph = tf.get_default_graph()
75 if graph not in _GRAPH_UID_DICTS:
76 _GRAPH_UID_DICTS[graph] = defaultdict(int)

AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

Differences with the paper

I noticed some differences in the implementation of the architecture compared to the original paper.

I've already come across another issue, where @philipperemy was answering: "As the author of the paper stated, this TCN architecture really aims to be a simple baseline that you can improve upon. So personally I think replicating the exact same results is not that important.".

I just wanted to know if there are any reasons behind these two implementation choices that I don't understand:

  1. The paper doesn't mention stacking several groups of dilated convolutions (parameter nb_stacks):
    I think I understood what you discussed here, but how does this differ from simply increasing the number of dilations?

  2. In a single residual block (in residual_block()), why is there only one convolutional layer instead of two as in the paper?
    Is it just a simplification?

function comments

I really like this implementation : ). I pulled this down and took the time to understand it. I added a bunch of comments to the functions, do you have any interest in me opening a pull request?

Extracting latent context vectors

I'm trying to extract context vectors from the latent TCN space. Basically I want to extract all of the highlighted nodes from the hidden layers in this diagram (nodes with arrows) into a list of activations, one for each residual block:

    np.array(shape=(8, n_ch)), # finest temporal activations
    np.array(shape=(4, n_ch)), # middle temporal activations
    np.array(shape=(2, n_ch)), # coarsest temporal activations


The way that I see it, I need to get the activation layers at the end of each residual block, get their output tensors, and slice them using a stride based on the dilation rate. I'm just not sure if I need to do anything special to handle dilation rates of stacked TCN blocks. I think it would be helpful to add a helper method/property that makes it easier to extract this context.
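
As a sketch of the slicing step, the timesteps that feed the final output can be found by walking back from the last timestep through the layers. This assumes the layout of the WaveNet-style diagram: one causal convolution per layer with kernel_size taps spaced by that layer's dilation. Residual blocks with two convolutions each would need one entry per convolution. The function name is illustrative.

```python
def contributing_indices(seq_len, dilations, kernel_size=2):
    """Timesteps at each level that influence the output at the last timestep.

    Entry 0 is the raw input; entry i (i >= 1) is the output of layer i - 1.
    """
    needed = {seq_len - 1}
    per_layer = []
    for d in reversed(dilations):
        # A layer with dilation d reads its input at t, t - d, t - 2d, ...
        needed = {t - j * d
                  for t in needed
                  for j in range(kernel_size)
                  if t - j * d >= 0}
        per_layer.append(sorted(needed))
    per_layer.reverse()
    return per_layer


levels = contributing_indices(seq_len=16, dilations=[1, 2, 4, 8])
for level, idx in enumerate(levels):
    print(level, len(idx), idx)
```

For a sequence of 16 steps this gives 8, 4 and 2 timesteps for the three hidden levels, matching the shapes listed above. For stacked blocks, one option is to pass the dilation list repeated once per stack.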

If you're wondering about my use case, I'm trying to build a time series autoencoder-style network for multi-scale anomaly detection and I'm trying to model the learned latent space at multiple temporal scales using some distribution (probably a gaussian mixture model atm).

How to remove randomness?

I tried the following things:
# Set seed value
seed_value = 56
import os

2. Set python built-in pseudo-random generator at a fixed value

import random

3. Set numpy pseudo-random generator at a fixed value

import numpy as np
from comet_ml import Experiment

4. Set tensorflow pseudo-random generator at a fixed value

import tensorflow as tf

5. Configure a new global tensorflow session

from keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)

However, I still cannot get the same result across several runs.

Thank you very much!

