
Comments (7)

HuiBinR commented on May 24, 2024

As I read in the [fairseq documentation](https://fairseq.readthedocs.io/en/latest/command_line_tools.html#fairseq-train), training uses distributed data parallel by default. You used 32 GPUs, and the batch size is 8 with `--required-batch-size-multiple 1`, so each gradient step is computed from 32*8=256 examples. When I use 1 GPU, the effective batch size is only 1*8=8, so this makes a difference.
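The arithmetic above can be written as a small helper. This is only a sketch: the function name and the `update_freq` parameter are my own illustration. fairseq does provide an `--update-freq` option for gradient accumulation, but whether accumulating over 32 batches on 1 GPU would reproduce the 32-GPU results is an assumption I have not tested.

```python
def effective_batch_size(num_gpus: int, per_gpu_batch: int, update_freq: int = 1) -> int:
    """Number of examples that contribute to one optimizer step.

    Under data parallelism each GPU processes its own batch, and the
    gradients are combined before the update. `update_freq` models
    gradient accumulation over several batches (hypothetical parameter
    name, mirroring fairseq's --update-freq).
    """
    return num_gpus * per_gpu_batch * update_freq


# Setup described in this thread: 32 GPUs vs. 1 GPU, batch size 8 per GPU.
author_setup = effective_batch_size(num_gpus=32, per_gpu_batch=8)
single_gpu = effective_batch_size(num_gpus=1, per_gpu_batch=8)
print(author_setup, single_gpu)  # 256 8

# Untested idea: accumulate 32 batches on 1 GPU to match the 32-GPU step size.
accumulated = effective_batch_size(num_gpus=1, per_gpu_batch=8, update_freq=32)
print(accumulated)  # 256
```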

I varied the learning rate: as it gets lower, the performance gets better. From the table below, 1e-06 looks like a suitable lr for 1 GPU.

| Model / learning rate | wiki-test-kilt | msnbc-test-kilt | aida-test-kilt | clueweb-test-kilt | aquaint-test-kilt | ace2004-test-kilt |
|---|---|---|---|---|---|---|
| author_aida_model | 83.02 | 83.54 | 87.92 | 68.75 | 84.32 | 84.82 |
| 3e-05 | 76.53 | 78.66 | 83.86 | 66.59 | 81.84 | 81.71 |
| 2e-05 | 79.70 | 81.25 | 85.11 | 67.86 | 81.84 | 84.05 |
| 1e-05 | 82.01 | 81.40 | 85.77 | 68.74 | 84.32 | 84.44 |
| 7e-06 | 82.23 | 82.62 | 86.27 | 68.74 | 84.46 | 84.44 |
| 5e-06 | 82.44 | 82.62 | 86.24 | 68.92 | 84.32 | 84.44 |
| 3e-06 | 82.85 | 83.08 | 86.85 | 68.97 | 84.32 | 85.21 |
| 1e-06 | 82.82 | 83.69 | 87.54 | 69.06 | 85.14 | 85.99 |
| 9e-07 | 82.94 | 83.99 | 87.38 | 69.11 | 85.01 | 85.60 |
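One way to check the choice of lr is to average each row of the table across the six test sets. The sketch below uses the numbers exactly as reported above. The unweighted mean is my own summary statistic, not a metric from the paper, so treat the ranking as a rough guide only.

```python
from statistics import mean

# Scores copied from the table above, in column order:
# wiki, msnbc, aida, clueweb, aquaint, ace2004 (all *-test-kilt).
scores = {
    "author_aida_model": [83.02, 83.54, 87.92, 68.75, 84.32, 84.82],
    "3e-05": [76.53, 78.66, 83.86, 66.59, 81.84, 81.71],
    "2e-05": [79.70, 81.25, 85.11, 67.86, 81.84, 84.05],
    "1e-05": [82.01, 81.40, 85.77, 68.74, 84.32, 84.44],
    "7e-06": [82.23, 82.62, 86.27, 68.74, 84.46, 84.44],
    "5e-06": [82.44, 82.62, 86.24, 68.92, 84.32, 84.44],
    "3e-06": [82.85, 83.08, 86.85, 68.97, 84.32, 85.21],
    "1e-06": [82.82, 83.69, 87.54, 69.06, 85.14, 85.99],
    "9e-07": [82.94, 83.99, 87.38, 69.11, 85.01, 85.60],
}

# Unweighted mean over the six datasets (an assumption; datasets differ in size).
averages = {name: mean(values) for name, values in scores.items()}

# Rank only my own 1-GPU runs; the author's model is kept as a reference row.
my_runs = {name: avg for name, avg in averages.items() if name != "author_aida_model"}
best_lr = max(my_runs, key=my_runs.get)

for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}  {avg:.2f}")
print("best 1-GPU lr by mean score:", best_lr)
```

By this measure 1e-06 comes out on top among my runs, with 9e-07 very close behind.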

By the way, thanks for your always quick answers on this issue and on issue #56. I get the same trie as kilt_titles_trie_dict.pkl and get similar results with the model you shared.

Thanks again! Your guidance helps a lot.

from genre.

nicola-decao commented on May 24, 2024

It looks like in the first table you are using a "trie tree build specially for aida", whereas in the second you are using "WIKI trie tree 'kilt_titles_trie_dict.pkl'". So these two tables do not look comparable.


HuiBinR commented on May 24, 2024

No, the first table also uses the WIKI trie tree, not a tree built specially for AIDA. I have now added the word "without" before it.


nicola-decao commented on May 24, 2024

Try to lower the learning rate. I used many GPUs so the gradient might have been more precise.


HuiBinR commented on May 24, 2024

Hi, I guess this may be because you used a smaller trie (a tree built for AIDA) for pretraining and fine-tuning, so the search space is smaller for the model. That would bring some performance improvement.
Do you think this assumption is reasonable?


nicola-decao commented on May 24, 2024

I did not train with a trie, neither during pretraining nor during fine-tuning.


HuiBinR commented on May 24, 2024

Yes, you didn't. I re-read Section 4.1: you train and fine-tune just like a normal generation task, and only at test time use a trie to constrain the output.
But in Appendix A.1, the paper mentions that the model is fine-tuned on AIDA for 10k steps without resetting the learning rate or the optimizer statistics.
I use just 1 GPU and you used 32 GPUs. Could this cause the difference in results? I will try lowering the learning rate and see.
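To make the test-time constraint concrete for other readers, here is a minimal sketch of how a trie can restrict which token may follow a given prefix. It is not the trie class from this repository: the `Trie` class, its method names, and the token ids are all made up for illustration.

```python
from typing import Iterable, List, Sequence


class Trie:
    """Toy prefix tree over token-id sequences (illustrative only)."""

    def __init__(self, sequences: Iterable[Sequence[int]] = ()) -> None:
        self.root: dict = {}
        for seq in sequences:
            self.add(seq)

    def add(self, seq: Sequence[int]) -> None:
        # Walk down the tree, creating child nodes as needed.
        node = self.root
        for token in seq:
            node = node.setdefault(token, {})

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        # Tokens that keep the output inside the set of valid sequences.
        node = self.root
        for token in prefix:
            if token not in node:
                return []  # prefix is not part of any valid sequence
            node = node[token]
        return sorted(node)


# Hypothetical tokenized entity titles; 2 stands in for an end-of-sequence id.
titles = [[5, 7, 2], [5, 8, 2], [6, 2]]
trie = Trie(titles)

print(trie.allowed_next([]))      # [5, 6]
print(trie.allowed_next([5]))     # [7, 8]
print(trie.allowed_next([5, 7]))  # [2]
print(trie.allowed_next([9]))     # []
```

During decoding, only the tokens returned for the current prefix would be allowed at the next step. Training never sees the trie, which matches the point above that it is applied only at test time.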

