Hi! I've been attempting to run examples/s/ppo/ppo_tld

Hi thanks so much for reporting! I have created <a class="issue-link js-issue-link" da

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

TypeError when not passing total_episodes in PPOv2Trainer about trl HOT 4 CLOSED

meng-wenlong commented on August 25, 2024

TypeError when not passing total_episodes in PPOv2Trainer

from trl.

Comments (4)

vwxyzjn commented on August 25, 2024 1

Hi thanks so much for reporting! I have created #1743 to close this issue.

However, the num_train_epochs and num_ppo_epochs are actually two different things. The num_train_epochs means how many epochs do we go over the dataset, the num_ppo_epochs means the number of epochs we perform PPO updates on a batch of data. So, there is a subtle but meaningful difference here.

from trl.

meng-wenlong commented on August 25, 2024

@vwxyzjn Thanks for your comments. Just to clarify, is mini_batch_size the batch size used when we perform PPO updates on a batch of data? If so, does per_device_train_batch_size actually control the batch size of a mini-batch in PPOv2Trainer?

from trl.

vwxyzjn commented on August 25, 2024

Just to clarify, is mini_batch_size the batch size used when we perform PPO updates on a batch of data?

The terminology goes like this:

import numpy as np
batch_size = 8
nminibatches = 2
gradient_accumulation_steps = 2
mini_batch_size = batch_size // nminibatches
micro_batch_size = mini_batch_size // gradient_accumulation_steps
data = np.arange(batch_size).astype(np.float32)
print("data:", data)
print("batch_size:", batch_size)
print("mini_batch_size:", mini_batch_size)
print("micro_batch_size:", micro_batch_size)
for epoch in range(4):
    batch_inds = np.random.permutation(batch_size)
    print("epoch:", epoch, "batch_inds:", batch_inds)
    for mini_batch_start in range(0, batch_size, mini_batch_size):
        mini_batch_end = mini_batch_start + mini_batch_size
        mini_batch_inds = batch_inds[mini_batch_start:mini_batch_end]
        
        # `optimizer.zero_grad()` set optimizer to zero for gradient accumulation
        for micro_batch_start in range(0, mini_batch_size, micro_batch_size):
            micro_batch_end = micro_batch_start + micro_batch_size 
            micro_batch_inds = mini_batch_inds[micro_batch_start:micro_batch_end]
            print("____⏩ a forward pass on", data[micro_batch_inds])
        # `optimizer.step()`
        print("⏪ a backward pass on", data[mini_batch_inds])

# data: [0. 1. 2. 3. 4. 5. 6. 7.]
# batch_size: 8
# mini_batch_size: 4
# micro_batch_size: 2
# epoch: 0 batch_inds: [6 4 0 7 3 5 1 2]
# ____⏩ a forward pass on [6. 4.]
# ____⏩ a forward pass on [0. 7.]
# ⏪ a backward pass on [6. 4. 0. 7.]
# ____⏩ a forward pass on [3. 5.]
# ____⏩ a forward pass on [1. 2.]
# ⏪ a backward pass on [3. 5. 1. 2.]
# epoch: 1 batch_inds: [6 7 3 2 0 4 5 1]
# ____⏩ a forward pass on [6. 7.]
# ____⏩ a forward pass on [3. 2.]
# ⏪ a backward pass on [6. 7. 3. 2.]
# ____⏩ a forward pass on [0. 4.]
# ____⏩ a forward pass on [5. 1.]
# ⏪ a backward pass on [0. 4. 5. 1.]
# epoch: 2 batch_inds: [1 4 5 6 0 7 3 2]
# ____⏩ a forward pass on [1. 4.]
# ____⏩ a forward pass on [5. 6.]
# ⏪ a backward pass on [1. 4. 5. 6.]
# ____⏩ a forward pass on [0. 7.]
# ____⏩ a forward pass on [3. 2.]
# ⏪ a backward pass on [0. 7. 3. 2.]
# epoch: 3 batch_inds: [7 2 4 1 3 0 6 5]
# ____⏩ a forward pass on [7. 2.]
# ____⏩ a forward pass on [4. 1.]
# ⏪ a backward pass on [7. 2. 4. 1.]
# ____⏩ a forward pass on [3. 0.]
# ____⏩ a forward pass on [6. 5.]
# ⏪ a backward pass on [3. 0. 6. 5.]

from trl.

vwxyzjn commented on August 25, 2024

per_device_train_batch_size should really be named forward_batch_size. It's the size of the data that goes in to model.foward(data), so per_device_train_batch_size effectively helps with OOMs.

from trl.

TypeError when not passing total_episodes in PPOv2Trainer about trl HOT 4 CLOSED

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent