First, thank you for your work!
Following your description, I'm implementing each of the attention layers with TF 2.1.
I have a question: does line 221 need a `squeeze` applied to its input, like this?
```python
attention_score = RepeatVector(source_hidden_states.shape[1])(tf.squeeze(attention_score))
```
I ask because, if I understood the full code correctly, `h_t` has already been expanded with an extra dimension, so its attention score is `(B, 1, H)` before reaching `RepeatVector`. However, when I feed the `(B, 1, H)` tensor to `RepeatVector`, it raises an error: `repeat_vector is incompatible with the layer: expected ndim=2, found ndim=3.`
Thank you
For reference, the original line 221:

```python
attention_score = RepeatVector(source_hidden_states.shape[1])(attention_score)  # (B, S*, H)
```
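To illustrate what I mean, here is a minimal NumPy sketch of the shape issue (NumPy stands in for the Keras `RepeatVector` layer here, and `B`, `S`, `H` are made-up sizes, not values from your code):

```python
import numpy as np

B, S, H = 2, 5, 8  # hypothetical batch size, source length, hidden size

# The attention score comes out of the scoring step with shape (B, 1, H),
# because h_t was expanded with an extra time axis.
attention_score = np.zeros((B, 1, H))

# Keras RepeatVector expects a rank-2 input (B, H); squeezing the
# singleton axis first gives it the expected rank. (Passing axis=1
# explicitly is safer than squeezing all singleton dims, in case B == 1.)
squeezed = np.squeeze(attention_score, axis=1)          # (B, H)

# RepeatVector(S) would then tile it into (B, S, H); np.repeat on a
# new axis mimics that behaviour.
repeated = np.repeat(squeezed[:, None, :], S, axis=1)   # (B, S, H)

print(squeezed.shape, repeated.shape)
```

So feeding the `(B, 1, H)` tensor directly reproduces the `expected ndim=2, found ndim=3` error, while squeezing first makes the shapes line up.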