yxuansu / NAG-BERT
[EACL'21] Non-Autoregressive Text Generation with Pre-trained Language Models
Home Page: https://arxiv.org/abs/2102.08220
License: Apache License 2.0
Hi, thanks for the (coming soon) source code.
I have a few questions about the dynamic sequence length adjustment.

1. The model predicts [eos]s to indicate the end of the sequence. But in the middle of the sequence it is still possible to generate a single [eos], e.g., "I ate an [eos] apple [eos] [eos]", and you need to remove all these intermediate [eos]s. Is this correct?
2. Why predict multiple [eos]s instead of a single [eos]? You mentioned "Once the decoded trajectory enters the [eos] state, the state transition term in S(X, Y_0) will be dominated by the transition score term t([eos], [eos])", so the point here is to make [eos] a black hole: once the decoding trajectory transits to [eos], it has no chance to get out? If this is correct, then why not simply give all [eos] -> non-[eos] transitions very negative weights and leave them fixed during training?
3. Supposing the target sequence is "I ate an apple" and the source sequence has length 9, which of the following do you use as the training target?
   - I ate an apple [eos] [eos]
   - I ate an apple [eos] [eos] [eos] [eos] [eos]

Hope I can get your reply, and thanks~
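The "black hole" alternative proposed in question 2 can be sketched directly. This is a toy illustration, not code from the repository: the vocabulary size, the [eos] index, and the transition matrix are all hypothetical. The idea is to pin the [eos] row of the transition matrix to a large negative score for every non-[eos] successor and keep that row out of gradient updates.

```python
import torch

V = 8      # toy vocabulary size (hypothetical)
EOS = 7    # toy index of the [eos] token (hypothetical)

# Hypothetical learned transition scores: trans[i, j] = score of token j following token i.
trans = torch.randn(V, V)

# Make [eos] absorbing: very negative scores for [eos] -> non-[eos] transitions,
# zero cost for staying in [eos].
NEG_INF = -1e4
with torch.no_grad():
    trans[EOS, :] = NEG_INF
    trans[EOS, EOS] = 0.0

# Keep the [eos] row fixed during training by zeroing its gradient with a hook.
trans.requires_grad_(True)

def freeze_eos_row(grad):
    grad = grad.clone()
    grad[EOS, :] = 0.0   # no update ever reaches the [eos] row
    return grad

trans.register_hook(freeze_eos_row)
```

Any Viterbi-style decoder using `trans` would then be unable to leave the [eos] state, which matches the absorbing behavior the question describes.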
In the Gigaword dataset there are some examples where the summary is longer than the source sequence; sometimes the source is a single unk word. As far as I can see in dataclass.py, such examples are dropped from the pipeline completely.
Were the ROUGE scores reported in the paper computed without those examples? If so, it is incorrect to compare the resulting scores with the baselines. For example, as far as I can see, the ROUGE scores for Concept Pointer were taken directly from its paper, where performance was measured on all test examples.
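To make the concern concrete, here is a small sketch of the kind of check being described. The function name and the length-based criterion are illustrative stand-ins, not the actual filter in dataclass.py:

```python
def count_dropped(src_lines, tgt_lines):
    """Count examples whose summary is longer than the source -- a
    hypothetical stand-in for the filter the question refers to."""
    dropped = 0
    for src, tgt in zip(src_lines, tgt_lines):
        if len(tgt.split()) > len(src.split()):
            dropped += 1
    return dropped

# Toy check: the single-<unk> source with a longer summary would be filtered out.
n_dropped = count_dropped(["a b c d", "<unk>"], ["a b", "some longer summary"])
```

If `n_dropped` is nonzero on the test split, ROUGE computed on the surviving examples is not directly comparable to baselines evaluated on the full test set.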
Hello~ Thanks for your code.
Could you release the scripts for reproducing the results of the machine translation task? Thank you very much~
I saw that the paper uses argmax as the equation to obtain the output sequence.
I understand that this would be the Viterbi algorithm, whose complexity is again O(n).
I'm confused about how this is faster than the autoregressive approach.
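For context, here is a minimal Viterbi decode over precomputed per-position emission scores (a generic sketch, not the repository's implementation). The O(n) loop runs cheap V x V tensor operations, whereas an autoregressive decoder would run a full network forward pass at each of the n steps; in a non-autoregressive model the emissions come from a single parallel forward pass.

```python
import torch

def viterbi_decode(emissions, trans):
    """emissions: (n, V) per-position token scores, computed in ONE
    parallel forward pass; trans: (V, V) transition scores.
    The loop below is O(n) over cheap tensor ops, not n model calls."""
    n, V = emissions.shape
    score = emissions[0]                     # best score ending in each token
    backptr = []
    for t in range(1, n):
        total = score.unsqueeze(1) + trans   # (V, V): score[i] + trans[i, j]
        best, idx = total.max(dim=0)         # best previous token for each j
        score = best + emissions[t]
        backptr.append(idx)
    # Backtrack the highest-scoring path.
    path = [int(score.argmax())]
    for idx in reversed(backptr):
        path.append(int(idx[path[-1]]))
    return list(reversed(path))

tokens = viterbi_decode(torch.randn(5, 4), torch.randn(4, 4))
```

So the argmax/Viterbi step is sequential, but it is sequential over a lightweight dynamic program, not over transformer forward passes.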
I couldn't find code in this repository.
Is there any alternative link to this research?
Hi, thanks for a great paper and code repository!
When looking at your code in the main.py file, I see that you used the regular negative log-likelihood loss, and I couldn't find any reference in the code to the context-aware unlikelihood loss term (with a context window of size c) that is mentioned in the paper.
Can you please point out where in the code this loss is configured? Thanks.
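For reference, a context-aware unlikelihood term in the sense of Welleck et al. (2019) can be sketched as below. The function name, window handling, and shapes are illustrative assumptions, not taken from this repository:

```python
import torch
import torch.nn.functional as F

def context_unlikelihood(logits, targets, c=4):
    """logits: (n, V) token logits; targets: (n,) gold token ids.
    For each position t, penalize probability mass placed on tokens that
    appeared among the previous c gold tokens (excluding the gold token
    at t itself) -- discouraging degenerate repetition."""
    probs = F.softmax(logits, dim=-1)
    n = targets.size(0)
    loss = torch.zeros(())
    for t in range(n):
        prev = set(targets[max(0, t - c):t].tolist()) - {int(targets[t])}
        for tok in prev:
            # -log(1 - p) grows when the model favors a recently seen token
            loss = loss - torch.log(1.0 - probs[t, tok] + 1e-12)
    return loss / n
```

In training, such a term would typically be added to the standard NLL loss with a weighting coefficient.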
Hi! Thank you for sharing your code.
Is it possible for you to share the instructions for reproducing the results for the sentence compression task as well? It would be really helpful.