
nag-bert's Introduction

Hi there 👋

I am a final-year Ph.D. student at the Language Technology Lab, University of Cambridge. I am broadly interested in natural language processing (NLP) and machine learning, and the majority of my research lies in text generation. Recently, I have focused on contrastive learning and its potential in language model pre-training, discourse representation learning, knowledge probing, open-ended text generation, and multi-modal text generation. Please refer to [my personal page] for the complete list of my research.

Personally, I really like pandas. The one in my icon is my favourite; her name is Hehua.

Yixuan's GitHub stats


nag-bert's Issues

Sentence compression task

Hi! Thank you for sharing your code.

Is it possible for you to share the instructions for reproducing the results for the sentence compression task as well? It would be really helpful.

Questions about the sequence length dynamic adjustment

Hi, thanks for the soon-to-be-released source code.
I have two questions about the dynamic adjustment of the sequence length.

  1. I understand that you use two consecutive [eos] tokens to indicate the end of the sequence. But in the middle of the sequence it is still possible to generate a single [eos], e.g., I ate an [eos] apple [eos] [eos], and all of these intermediate [eos] tokens need to be removed. Is this correct?
    1. If so, why do you need two [eos] tokens instead of a single one? You mentioned that "Once the decoded trajectory enters the [eos] state, the state transition term in S(X, Y_0) will be dominated by the transition score term t([eos], [eos])", so the point is to make [eos] an absorbing state: once the decoding trajectory transitions into [eos], it never gets out. If this is correct, why not simply assign very negative weights to all [eos] -> non-[eos] transitions and freeze them during training?
  2. At the training stage, suppose the target sequence is I ate an apple and the length of the source sequence is 9. Which of the following do you use as the training target?
    1. I ate an apple [eos] [eos]
    2. I ate an apple [eos] [eos] [eos] [eos] [eos]

I hope to hear back from you. Thanks!
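For reference, my understanding of the decoding step described in the question can be sketched as follows: the parallel prediction is truncated at the first occurrence of two consecutive [eos] tokens. This is only a sketch of my reading of the paper, not the repository's actual code; the function name and the literal "[eos]" string are hypothetical stand-ins.

```python
# Sketch (not the repository's code): truncate a parallel prediction at
# the first pair of consecutive [eos] tokens, as in the double-[eos]
# termination scheme the question describes.

def truncate_at_double_eos(tokens, eos="[eos]"):
    """Keep tokens up to (excluding) the first [eos] [eos] pair."""
    for i in range(len(tokens) - 1):
        if tokens[i] == eos and tokens[i + 1] == eos:
            return tokens[:i]
    return tokens

print(truncate_at_double_eos(
    ["I", "ate", "an", "apple", "[eos]", "[eos]", "[eos]"]))
# ['I', 'ate', 'an', 'apple']
```

Under this scheme a lone intermediate [eos] (as in the example above) would survive truncation, which is exactly what the question is asking about.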

Unlikelihood loss

Hi, thanks for a great paper and code repository!

Looking at the code in main.py, I see that you use the regular negative log-likelihood loss, but I could not find any reference to the context-aware unlikelihood loss term (with a context window of size c) mentioned in the paper.

Could you please point out where in the code this loss is configured? Thanks.

Machine Translation Scripts

Hello! Thanks for your code.

Could you release the scripts for reproducing the results on the machine translation task? Thank you very much!

Question regarding the speed up

I saw that the paper uses argmax as the equation to obtain the output sequence.
I understand that this amounts to the Viterbi algorithm, whose complexity is again O(n).
I'm confused about how this is faster than the auto-regressive approach.
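To make the question concrete, here is a minimal sketch of Viterbi decoding over per-position emission scores plus a token-transition matrix, which is my reading of the S(X, Y) scoring in the paper. The usual resolution of the apparent paradox is that the transformer produces all n emission rows in a single forward pass, so only this lightweight dynamic program is sequential, rather than n full transformer passes. All names and shapes here are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

# Sketch: Viterbi decoding over emission scores (n positions x V tokens)
# and a V x V transition matrix. The expensive network runs once to
# produce `emissions`; only this cheap O(n * V^2) recurrence is serial.

def viterbi(emissions, transitions):
    n, V = emissions.shape
    score = emissions[0].copy()          # best score ending in each token
    back = np.zeros((n, V), dtype=int)   # backpointers
    for t in range(1, n):
        # rows: previous token, columns: current token
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # backtrack from the best final token
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(5, 4)), rng.normal(size=(4, 4))))
```

An autoregressive decoder would instead need n sequential forward passes through the full model, one per generated token, which dominates the cost in practice.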

Are Gigaword examples with summary length bigger than the article length considered when the final metrics are computed?

In the Gigaword dataset there are some examples where the summary is longer than the source sequence; sometimes the source is a single unk word. As far as I can see in dataclass.py, such examples are dropped from the pipeline completely.

Were the ROUGE scores reported in the paper computed without those examples? If so, it is incorrect to compare the resulting scores with the baselines. For example, as far as I can see, the ROUGE scores for Concept Pointer were taken directly from its paper, where performance was measured on all test examples.
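For clarity, the filtering the question describes amounts to something like the sketch below: pairs whose summary is not shorter than the source are dropped before evaluation. The function and the sample data are hypothetical illustrations, not code from dataclass.py.

```python
# Sketch of the filtering behaviour described in the question: keep only
# pairs whose summary is strictly shorter than the source.

def keep_example(source_tokens, summary_tokens):
    return len(summary_tokens) < len(source_tokens)

pairs = [
    (["a", "b", "c"], ["a"]),              # kept
    (["<unk>"], ["long", "summary"]),      # dropped: summary > source
]
kept = [p for p in pairs if keep_example(*p)]
print(len(kept))  # 1
```

If evaluation runs only over `kept`, the scores are computed on a smaller, easier test set than the baselines used, which is the comparability concern raised above.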

Code?

I couldn't find code in this repository.

Is there any alternative link to this research?
