
nag-bert's Introduction

Hi there 👋

I am a final-year Ph.D. student at the Language Technology Lab, University of Cambridge. I am broadly interested in natural language processing (NLP) and machine learning, and the majority of my research lies in text generation. Recently, I have focused on contrastive learning and its potential in language model pre-training, discourse representation learning, knowledge probing, open-ended text generation, and multi-modal text generation. Please refer to [my personal page] for the complete list of my research.

Personally, I really like pandas. The one in my icon is my favourite; her name is Hehua.

Yixuan's GitHub stats


nag-bert's Issues

Sentence compression task

Hi! Thank you for sharing your code.

Is it possible for you to share the instructions for reproducing the results for the sentence compression task as well? It would be really helpful.

Questions about the sequence length dynamic adjustment

Hi, thanks for the soon-to-be-released source code.
I have two questions about the dynamic adjustment of the sequence length.

  1. I understand that you use two consecutive [eos] tokens to indicate the end of the sequence. But in the middle of the sequence it is still possible to generate a single [eos], e.g., I ate an [eos] apple [eos] [eos], and all of these intermediate [eos] tokens need to be removed. Is this correct?
    1. If so, why do you need two [eos] tokens instead of a single one? You mentioned that "Once the decoded trajectory enters the [eos] state, the state transition term in S(X, Y_0) will be dominated by the transition score term t([eos], [eos])", so the point is to make [eos] an absorbing state: once the decoding trajectory transitions into [eos], it never gets out. If this is correct, why not simply assign very negative weights to all [eos] -> non-[eos] transitions and freeze them during training?
  2. At the training stage, suppose the target sequence is I ate an apple and the length of the source sequence is 9. Which of the following do you use as the training target?
    1. I ate an apple [eos] [eos]
    2. I ate an apple [eos] [eos] [eos] [eos] [eos]

I hope to hear back from you. Thanks!
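For reference, my understanding of the decoding step described in the question can be sketched as follows: the parallel prediction is truncated at the first occurrence of two consecutive [eos] tokens. This is only a sketch of my reading of the paper, not the repository's actual code; the function name and the literal "[eos]" string are hypothetical stand-ins.

```python
# Sketch (not the repository's code): truncate a parallel prediction at
# the first pair of consecutive [eos] tokens, as in the double-[eos]
# termination scheme the question describes.

def truncate_at_double_eos(tokens, eos="[eos]"):
    """Keep tokens up to (excluding) the first [eos] [eos] pair."""
    for i in range(len(tokens) - 1):
        if tokens[i] == eos and tokens[i + 1] == eos:
            return tokens[:i]
    return tokens

print(truncate_at_double_eos(
    ["I", "ate", "an", "apple", "[eos]", "[eos]", "[eos]"]))
# ['I', 'ate', 'an', 'apple']
```

Under this scheme a lone intermediate [eos] (as in the example above) would survive truncation, which is exactly what the question is asking about.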

Unlikelihood loss

Hi, thanks for a great paper and code repository!

Looking at the code in main.py, I see that you use the regular negative log-likelihood loss, but I could not find any reference to the context-aware unlikelihood loss term (with a context window of size c) mentioned in the paper.

Could you please point out where in the code this loss is configured? Thanks.

Machine Translation Scripts

Hello! Thanks for your code.

Could you release the scripts for reproducing the results on the machine translation task? Thank you very much!

Question regarding the speed up

I saw that the paper uses argmax as the equation to obtain the output sequence.
I understand that this amounts to the Viterbi algorithm, whose complexity is again O(n).
I'm confused about how this is faster than the auto-regressive approach.
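To make the question concrete, here is a minimal sketch of Viterbi decoding over per-position emission scores plus a token-transition matrix, which is my reading of the S(X, Y) scoring in the paper. The usual resolution of the apparent paradox is that the transformer produces all n emission rows in a single forward pass, so only this lightweight dynamic program is sequential, rather than n full transformer passes. All names and shapes here are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

# Sketch: Viterbi decoding over emission scores (n positions x V tokens)
# and a V x V transition matrix. The expensive network runs once to
# produce `emissions`; only this cheap O(n * V^2) recurrence is serial.

def viterbi(emissions, transitions):
    n, V = emissions.shape
    score = emissions[0].copy()          # best score ending in each token
    back = np.zeros((n, V), dtype=int)   # backpointers
    for t in range(1, n):
        # rows: previous token, columns: current token
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # backtrack from the best final token
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(5, 4)), rng.normal(size=(4, 4))))
```

An autoregressive decoder would instead need n sequential forward passes through the full model, one per generated token, which dominates the cost in practice.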

Are Gigaword examples with summary length bigger than the article length considered when the final metrics are computed?

In the Gigaword dataset there are some examples where the summary is longer than the source sequence; sometimes the source is a single unk word. As far as I can see in dataclass.py, such examples are dropped from the pipeline completely.

Were the ROUGE scores reported in the paper computed without those examples? If so, it is incorrect to compare the resulting scores with the baselines. For example, as far as I can see, the ROUGE scores for Concept Pointer were taken directly from its paper, where performance was measured on all test examples.
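For clarity, the filtering the question describes amounts to something like the sketch below: pairs whose summary is not shorter than the source are dropped before evaluation. The function and the sample data are hypothetical illustrations, not code from dataclass.py.

```python
# Sketch of the filtering behaviour described in the question: keep only
# pairs whose summary is strictly shorter than the source.

def keep_example(source_tokens, summary_tokens):
    return len(summary_tokens) < len(source_tokens)

pairs = [
    (["a", "b", "c"], ["a"]),              # kept
    (["<unk>"], ["long", "summary"]),      # dropped: summary > source
]
kept = [p for p in pairs if keep_example(*p)]
print(len(kept))  # 1
```

If evaluation runs only over `kept`, the scores are computed on a smaller, easier test set than the baselines used, which is the comparability concern raised above.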

Code?

I couldn't find code in this repository.

Is there any alternative link to this research?
