
Comments (7)

bordeauxred commented on August 15, 2024

@MischaPanch, concerning the purpose, the preprocess_fn was introduced in #42

from tianshou.

MischaPanch commented on August 15, 2024

@Trinkle23897 what do you think, can we remove it? It appears entirely unused, and whatever goal #42 had, it can likely be met in other ways (if it is ever needed).


MischaPanch commented on August 15, 2024

I have now taken a closer look at the discussion in #42, and I am more convinced that the current architecture of preprocess_fn is not the best way to reach the goals stated there. Apart from that, those goals don't seem to be in demand by any use case.

I think we can go ahead and remove it


Trinkle23897 commented on August 15, 2024

My current thought is that we should move the roller to the lowest level (i.e., the env has a roll() method that can create a Rollout by itself and send it to the buffer) to simplify the implementation and get better throughput, but that's specific to the RLHF experiment. I'm okay with your proposal.
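A minimal sketch of what such an interface could look like. The RolloutEnv/Rollout names, fields, and signatures here are hypothetical illustrations, not existing tianshou API:

```python
from dataclasses import dataclass, field

@dataclass
class Rollout:
    """Hypothetical container for one environment's trajectory data."""
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)

class RolloutEnv:
    """Hypothetical env wrapper that collects its own rollout."""

    def __init__(self, step_fn, reset_fn):
        # step_fn(action) -> (next_obs, reward); reset_fn() -> obs
        self._step = step_fn
        self._reset = reset_fn

    def roll(self, policy, n_steps: int) -> Rollout:
        """Run the policy for n_steps and return the collected Rollout."""
        rollout = Rollout()
        obs = self._reset()
        for _ in range(n_steps):
            act = policy(obs)
            next_obs, rew = self._step(act)
            rollout.observations.append(obs)
            rollout.actions.append(act)
            rollout.rewards.append(rew)
            obs = next_obs
        # the caller (or the env itself) can push this Rollout to a buffer
        return rollout
```

The point of the sketch is only the shape of the API: the env owns the collection loop, so the collector no longer has to.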


MischaPanch commented on August 15, 2024

> My current thought is we should move roller to the lowest level (i.e., env has a roll() method that can create Rollout by itself, and can send to the buffer) to simplify implementation and have better throughput, but that's specific to RLHF experiment. I'm okay with your proposal.

We could do that as next step, could you write an issue briefly explaining why it would improve throughput? Such a method would probably also help a lot in the n_episode version of Collector.collect (see #1042 )


Trinkle23897 commented on August 15, 2024

The main assumption Tianshou makes is that batch-style data transfer can remove a lot of overhead: by sending batched data we improve GPU utilization and, with it, overall system throughput. That's why the initial version of the collector works in batch style.

This assumption rests on a few constraints:

  1. Sending data to the GPU sequentially cannot easily match the throughput of batch-style transfer
  2. The model is relatively small, and it is not memory-bound
  3. The environment's step function takes little time (including reward calculation), at least less than a policy forward pass
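To make the batch-style assumption concrete, here is a minimal sketch of batched collection across N environments (hypothetical helper names, not tianshou's actual Collector; `policy_batch_forward` stands in for a batched model call):

```python
import numpy as np

def batch_collect(envs, policy_batch_forward, n_steps):
    """Sketch of batch-style collection: one batched policy forward
    serves all environments per step, amortizing the per-call and
    data-transfer overhead across the whole batch."""
    obs = np.stack([env.reset() for env in envs])
    transitions = []
    for _ in range(n_steps):
        acts = policy_batch_forward(obs)  # single batched call per step
        # every env must finish its step before the next batched forward
        results = [env.step(a) for env, a in zip(envs, acts)]
        next_obs = np.stack([r[0] for r in results])
        rews = np.array([r[1] for r in results])
        transitions.append((obs, acts, rews))
        obs = next_obs
    return transitions
```

Note the implicit synchronization point: all envs must step before the next batched forward, which is exactly where the constraints below start to matter.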

These are very strong constraints. If any of them fails to hold, we can switch to a fully async rollout implementation to get better throughput, i.e., a shorter wall-clock Collector.collect time. For example, in the RLHF case:

  • An LLM's completion function can be implemented in a fully async style and achieve the same throughput as batch completion, as long as you provide enough threads/processes to handle each request. That invalidates (1) and (2).
  • The environment needs a reward model to calculate rewards. In batch style, we must first do all policy sampling, synchronize, and then do the reward calculation; the system can become environment-throughput-bound because not enough compute is invested in the reward calculation. But if policy and reward calculation run fully asynchronously, all those bubbles disappear. That invalidates (3).
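A minimal sketch of the fully async alternative, using a thread pool so each environment advances independently (hypothetical names; in a real RLHF setup `policy_fn` and `reward_fn` would be per-request LLM and reward-model calls):

```python
from concurrent.futures import ThreadPoolExecutor

def async_collect(envs, policy_fn, reward_fn, n_steps):
    """Sketch of fully async rollout: each env runs its own loop, so a
    slow per-request policy or reward call in one env does not block
    the others, removing the batch-wise synchronization bubbles."""

    def run_one(env):
        traj = []
        obs = env.reset()
        for _ in range(n_steps):
            act = policy_fn(obs)       # per-request call (e.g. LLM completion)
            next_obs = env.step(act)
            rew = reward_fn(obs, act)  # reward-model call overlaps other envs
            traj.append((obs, act, rew))
            obs = next_obs
        return traj

    # one worker per env; envs progress independently of each other
    with ThreadPoolExecutor(max_workers=len(envs)) as pool:
        return list(pool.map(run_one, envs))
```

Threads are only a stand-in here; an asyncio or process-based variant would follow the same shape, and the key property is the absence of a per-step barrier across environments.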


MischaPanch commented on August 15, 2024

@Trinkle23897 I agree with you, there are probably a bunch of performance-related things that can be optimized in the collection process. I will open a separate issue from your comment.

This issue, however, is just about refactoring the current collector to make it amenable to such improvements in the first place. The collect method is very convoluted; I don't feel comfortable reviewing any functional improvements to it (like #1042, which started this whole story) before the code is more structured and readable. Removing unnecessary state from the collectors goes a long way in that direction, which is what this issue is about.

PS: for pypi, I sent you an email

