Comments (7)
Hi, thanks for the coming-soon source code. I have two questions about the dynamic adjustment of the sequence length.

I got the point that you use two consecutive [eos] tokens to indicate the end of the sequence. But in the middle of the sequence it is still possible to generate a single [eos], e.g., "I ate an [eos] apple [eos] [eos]", and you need to remove all these intermediate [eos] tokens. Is this correct?
- If this is true, then why do you need two [eos] tokens instead of a single [eos]? You mentioned that "Once the decoded trajectory enters the [eos] state, the state transition term in S(X, Y_0) will be dominated by the transition score term t([eos], [eos])". So the point here is to make [eos] a black hole: once the decoding trajectory transits to [eos], it has no chance to get out? If this is correct, then why not simply set all [eos] -> non-[eos] transitions to very negative weights and not update them during training?
- At the training stage, say the target sequence is "I ate an apple" and the length of the source sequence is 9. Which of the following do you use as the target to train the model?
  - I ate an apple [eos] [eos]
  - I ate an apple [eos] [eos] [eos] [eos] [eos]

Hope I can get your reply, and thanks~
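To make the "black hole" reading concrete, here is a minimal toy sketch (my own illustration, not the authors' code; the vocabulary, scores, and `viterbi` helper are all made up) of how blocking [eos] -> non-[eos] transitions with a very negative weight makes [eos] an absorbing state under Viterbi decoding:

```python
# Toy vocabulary: index 0 is [eos], indices 1 and 2 are ordinary tokens.
EOS = 0
NEG_INF = -1e9  # the "very negative" transition weight from the question

# Transition scores trans[i][j]: score of moving from state i to state j.
# All [eos] -> non-[eos] transitions are blocked, so [eos] absorbs.
trans = [
    [0.0, NEG_INF, NEG_INF],  # from [eos]: only [eos] -> [eos] is allowed
    [0.5, 1.0, 0.8],          # from token 1
    [0.5, 0.7, 1.0],          # from token 2
]

def viterbi(emissions):
    """Standard Viterbi over per-position emission scores."""
    n_states = len(trans)
    score = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        ptr, new = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + trans[i][j])
            new.append(score[best_i] + trans[best_i][j] + emit[j])
            ptr.append(best_i)
        score, back = new, back + [ptr]
    # Backtrack from the best final state.
    path = [max(range(n_states), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Position 0 strongly prefers [eos]; positions 1 and 2 locally prefer
# token 1. The best path still stays in [eos]: once entered, the -1e9
# transition dominates any later emission score.
emissions = [
    [100.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 4.0, 0.0],
]
print(viterbi(emissions))  # [0, 0, 0] -- all [eos]
```

This is exactly the absorbing behavior the question describes; the open issue is only whether the mask must be hand-set and frozen, or whether the learned t([eos], [eos]) term achieves the same effect.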
Friend, I also have this question. Did you ever figure it out? I need your help.
Recently there was an ACL 2021 paper that used this method for a Chinese grammatical error correction task. It was even more outrageous: the target-sentence length was assumed known by default at test time, and I was completely confused.
from nag-bert.
@clearloveclearlove
No, I am still waiting for the reply from the authors.
By the way, which ACL 2021 paper do you mean? I am curious why you called it "outrageous".
I mean the paper <>: the same architecture (BERT + CRF), for Grammatical Error Correction.
In the code supplied by the author, the target-length information is used directly at test time, so ...
Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction
Hello, sorry for my very late reply... During training we use the configuration "I ate an apple [eos] [eos]". We found that if we append many [eos] tokens, as in "I ate an apple [eos] [eos] [eos] [eos] [eos]", the model parameters are overwhelmed by the occurrence of the [eos] token and the model only learns to generate [eos]. In practice, generating sequences like "I ate an [eos] apple [eos] [eos]" is still possible, but putting two [eos] tokens in training reduces this phenomenon. Feel free to ask follow-up questions! Sorry again for my late reply.
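A minimal sketch of the two pieces described in this reply (my own illustration with assumed helper names, not the released code): padding the training target with exactly two [eos] tokens, and stripping intermediate [eos] tokens from a decoded sequence at post-processing time. How positions beyond the second [eos] are handled in the loss is not stated in the thread, so it is left out here.

```python
EOS = "[eos]"

def make_target(tokens):
    """Training target: the sequence plus exactly two consecutive [eos]
    tokens, per the authors' reply (not padded with further [eos])."""
    return tokens + [EOS, EOS]

def postprocess(decoded):
    """Cut the decoded sequence at the first [eos] [eos] pair, then drop
    any stray single [eos] generated mid-sequence."""
    for i in range(len(decoded) - 1):
        if decoded[i] == EOS and decoded[i + 1] == EOS:
            decoded = decoded[:i]
            break
    return [t for t in decoded if t != EOS]

print(make_target(["I", "ate", "an", "apple"]))
# ['I', 'ate', 'an', 'apple', '[eos]', '[eos]']
print(postprocess(["I", "ate", "an", EOS, "apple", EOS, EOS]))
# ['I', 'ate', 'an', 'apple']
```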
I know this paper too. It is from one of the co-authors of NAG-BERT. I have not looked into the details of his paper, but I can ask him about them if you need.
Thanks~