I spent some time trying to figure out what the GRU really does. My understanding


Is the GRU really needed to predict mu_t? (uis-rnn, closed)

google commented on July 28, 2024

Is the GRU really needed to predict mu_t?

from uis-rnn.

Comments (7)

wq2012 commented on July 28, 2024

@AnzCol Hi Aonan, what are your thoughts here?


AnzCol commented on July 28, 2024

Hi, the outputs of the GRU are the m_s (see Fig. 2). We use a running average of m_{1,2,...,s} to estimate the observation x_s (see the inline equation below Eq. (11)). Thanks, Aonan
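A minimal NumPy sketch of the running average described above. It assumes, as we read the UIS-RNN paper, that the average for step t runs only over steps s ≤ t assigned to the same speaker, and that the GRU outputs `m` are already aligned so that `m[s]` depends only on observations before step s. The function name `running_speaker_mean` is hypothetical and not part of the uis-rnn API.

```python
import numpy as np

def running_speaker_mean(m, y):
    """Return mu with mu[t] = mean of m[s] over s <= t where y[s] == y[t].

    m: array of shape (T, D) holding GRU outputs (assumed aligned so that
       m[s] depends only on observations before step s).
    y: length-T sequence of hashable speaker labels.
    """
    m = np.asarray(m, dtype=float)
    mu = np.empty_like(m)
    sums = {}    # running sum of m per speaker
    counts = {}  # number of steps seen so far per speaker
    for t, label in enumerate(y):
        sums[label] = sums.get(label, 0.0) + m[t]
        counts[label] = counts.get(label, 0) + 1
        mu[t] = sums[label] / counts[label]
    return mu

# Toy 1-D "GRU outputs" for two alternating speakers (hypothetical values).
m = np.array([[1.0], [10.0], [3.0], [20.0]])
y = [0, 1, 0, 1]
print(running_speaker_mean(m, y).ravel())  # mu_t: 1, 10, (1+3)/2 = 2, (10+20)/2 = 15
```

Keeping one running sum and count per speaker mirrors the fact that each speaker's estimate only ever sees that speaker's own segments.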


wq2012 commented on July 28, 2024

@AnzCol I think what @hbredin means is: what if we simply define m_t = x_t? Would it still work? Did we run such experiments (my impression is no)?

Personally I don't think it's going to work well.

My understanding (@AnzCol, please correct me if I'm wrong) is that the training process forces the m_t of each speaker to better fit a normal distribution, which is not guaranteed for the distribution of x_t. The power of the GRU here is that it learns from the training dataset to transform the distribution of speaker embeddings into a more clusterable one.

@hbredin Does this explanation make sense to you?
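As a sketch of what the training process optimizes, assume Eq. 11 is the isotropic Gaussian x_t ~ N(mu_t, sigma^2 I), with mu_t the running average of GRU outputs (our reading of the UIS-RNN paper). The raw x_t enters the likelihood unchanged; only mu_t, and through it m_t, depends on the GRU, so for a fixed sigma^2 the loss is a squared error between mu_t and x_t plus a constant. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def gaussian_nll(x, mu, sigma2):
    """Per-step negative log-likelihood of x[t] under N(mu[t], sigma2 * I).

    x, mu: arrays of shape (T, D); sigma2: scalar variance (assumed shared).
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    d = x.shape[-1]
    sq_err = np.sum((x - mu) ** 2, axis=-1)  # squared distance per step
    return 0.5 * (sq_err / sigma2 + d * np.log(2.0 * np.pi * sigma2))

x = np.array([[0.0, 0.0], [1.0, 1.0]])         # raw embeddings x_t
mu_close = np.array([[0.1, 0.0], [0.9, 1.0]])  # hypothetical well-fitted means
mu_far = np.array([[2.0, 2.0], [-1.0, -1.0]])  # hypothetical poorly fitted means
print(gaussian_nll(x, mu_close, 1.0).sum() < gaussian_nll(x, mu_far, 1.0).sum())  # True
```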


hbredin commented on July 28, 2024

> @AnzCol I think what @hbredin means is - what if we simply define m_t=x_t, will it still work? Did we have such experiments (my impression is no)?

This is what I meant, indeed.

> My understanding is that (@AnzCol please correct me if I'm wrong), the training process forces m_t for each speaker to better fall into a normal distribution. But this is not guaranteed in the distributions of x_t. The power of GRU here is that, to transform the distributions of speaker embeddings into a more clusterable distribution, by learning from the training dataset.

Except you are still using raw x_t in Equation 11, so the distribution of speaker embeddings is not changed. Or did I miss something?

> @hbredin Does this explanation make sense to you?

Not quite sure -- I think I have to think a bit more about this...
I would really like to see an ablation study with m_t = x_t :-)


zan12 commented on July 28, 2024

Thanks for clarifying your question. Let me add something from my own experience. I ran some unpublished experiments with m_t = x_t. Under the best hyper-parameter settings, there was a performance drop of around 3% compared with the current setup, where m_t is only used to calculate the means. Quan's explanation is very intuitive indeed. Another simple explanation is that using the empirical mean seems to be more statistically stable.
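One way to read "more statistically stable": if the values being averaged for one speaker scatter around a common centre with variance sigma^2, the mean of n of them has variance sigma^2 / n, so an averaged mu_t fluctuates far less than any single sample. The simulation below checks that 1/n rate on synthetic i.i.d. Gaussian data; it is our own illustration of the general property, not a reproduction of the unpublished experiment, and the function name is hypothetical.

```python
import numpy as np

def running_mean_variance(n_steps, n_trials=20_000, sigma=1.0, seed=0):
    """Empirical variance of the running mean after 1..n_steps i.i.d. samples."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, sigma, size=(n_trials, n_steps))
    # running[:, k] is the mean of the first k + 1 samples in each trial
    running = np.cumsum(samples, axis=1) / np.arange(1, n_steps + 1)
    return running.var(axis=0)

v = running_mean_variance(50)
print(round(v[0], 2), round(v[-1], 3))  # close to 1.0 and 1/50 = 0.02
```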


wq2012 commented on July 28, 2024

@hbredin Not sure whether this example makes sense: consider two clusters whose distributions of x_t largely overlap, but whose distributions of m_t are better separated. Eq. 11 acts as a regularizer that keeps m_t from drifting too far from x_t.
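A toy illustration of this two-cluster example, using synthetic 1-D data invented for the purpose (not actual UIS-RNN outputs). The hypothetical `m` values keep the same cluster centres as `x`, loosely mirroring the point that Eq. 11 keeps m_t close to x_t, but have a much smaller spread per speaker. The `separation` score (gap between cluster means divided by the pooled standard deviation) is our own choice of metric.

```python
import numpy as np

def separation(a, b):
    """Distance between two 1-D cluster means in units of pooled std."""
    pooled_std = np.sqrt((a.var() + b.var()) / 2.0)
    return abs(a.mean() - b.mean()) / pooled_std

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical raw embeddings x_t: two speakers whose distributions overlap.
x_a = rng.normal(0.0, 1.0, n)
x_b = rng.normal(0.5, 1.0, n)
# Hypothetical transformed values m_t: same centres, much tighter spread.
m_a = rng.normal(0.0, 0.2, n)
m_b = rng.normal(0.5, 0.2, n)

print(f"x separation: {separation(x_a, x_b):.2f}")  # roughly 0.5
print(f"m separation: {separation(m_a, m_b):.2f}")  # roughly 2.5
```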


hbredin commented on July 28, 2024

Closing as I got the answers I was looking for :-)
Thanks @AnzCol and @wq2012 !

