
Comments (10)

Alymostafa commented on July 23, 2024

Same problem here with a longer sequence.
@vblagoje
@lvwerra

from trl.

Alymostafa commented on July 23, 2024

@adhitya-synth I used the same configuration you mentioned and found that with a small batch size it happens as you said, but with a larger batch size, as in the notebook, the reward increases.


hdvvip commented on July 23, 2024

Based on OpenAI's experiments in the InstructGPT paper, I think it depends on the dataset you use to train your model. In OpenAI's case, even with the best implementation of PPO, they still failed to improve the rewards when they trained GPT-3 using PPO on the FLAN and T0 datasets.

[Image: Selection_1567]


hdvvip commented on July 23, 2024

Well, I think we have a misunderstanding here. I didn't specifically mention you in my post. I just wanted to explain to everyone here that, depending on your task, PPO may or may not work. So it's not your fault if PPO fails on your NLP task. Everyone here has a different task, so my answer had nothing to do with batch size. BTW, OpenAI used a batch size of 128 and still failed.


parshinsh commented on July 23, 2024

I confirm that this issue happens. I'm facing the same problem with my own task. Can anyone help with this?


hdvvip commented on July 23, 2024

Recently, I came across OpenAI's InstructGPT, an upgraded version of GPT-3 that has been trained with reinforcement learning.
The reinforcement learning algorithm they used to train InstructGPT is PPO, which is what this GitHub repository implements.
Regarding the problem of the reward stagnating or going down, I think even OpenAI (the creators of PPO) faces the same issue. Please see Figure 13 below.
"As shown in Figure 13, the reward saturates after the initial 400k examples of training."

[Image: Figure 13 from the InstructGPT paper (Selection_1566)]

Here is the InstructGPT paper:
https://arxiv.org/pdf/2203.02155.pdf


hdvvip commented on July 23, 2024

So if you use PPO on your task and it doesn't work, don't be surprised! Like I said above, PPO will work on some tasks and not on others.


Alymostafa commented on July 23, 2024

Thanks for the clarification. What I meant is that, consistent with his observations, the problem he described happens when the batch size is small, but when I increased the batch size I was able to reproduce the results from the notebook.


lvwerra commented on July 23, 2024

Thanks for the discussion here. Indeed, it can depend a lot on the hyperparameters as well as the task. Great that you found that increasing the batch size works. I think this is still a very underexplored area!
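
One possible reason batch size matters, offered only as a rough illustration: the mean reward estimated from a small batch is noisier, so each PPO update is driven by a less reliable signal. The toy sketch below does not use trl at all. It assumes per-sample rewards are Gaussian with mean 0 and standard deviation 1 (a made-up distribution), and `batch_mean_spread` is a hypothetical helper name. It measures how much the batch-mean reward varies from batch to batch at different batch sizes.

```python
import random
import statistics


def batch_mean_spread(batch_size, n_batches=2000, seed=0):
    """Standard deviation of the mean reward across many simulated batches."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_batches):
        # Assumption: hypothetical per-sample rewards, true mean 0.0, std 1.0.
        rewards = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(statistics.fmean(rewards))
    return statistics.stdev(means)


if __name__ == "__main__":
    for bs in (4, 16, 64, 256):
        spread = batch_mean_spread(bs)
        print(f"batch_size={bs:4d}  spread of batch-mean reward={spread:.3f}")
```

Under these assumptions the spread shrinks roughly as 1/sqrt(batch_size), so going from 16 to 256 samples per batch cuts the noise by about a factor of four. Whether that is what explains the behavior reported in this thread is not established here.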


leoribeiro commented on July 23, 2024

@adhitya-synth I face the same problem when using longer text. Did you figure out a way to overcome this?

