
Comments (7)

bordeauxred commented on August 15, 2024

@MischaPanch, concerning the purpose, the preprocess_fn was introduced in #42

from tianshou.

MischaPanch commented on August 15, 2024

@Trinkle23897 what do you think, can we remove it? It appears entirely unused, and whatever goal #42 had, it can likely be met in other ways (if it is ever needed).


MischaPanch commented on August 15, 2024

I have now taken a closer look at the discussion in #42, and I am more convinced that the current architecture of preprocess_fn is not the best way to reach the goals stated there. Apart from that, those goals don't seem to be in demand by any use case.

I think we can go ahead and remove it


Trinkle23897 commented on August 15, 2024

My current thought is that we should move the roller to the lowest level (i.e., the env has a roll() method that can create a Rollout by itself and send it to the buffer) to simplify the implementation and get better throughput, but that's specific to the RLHF experiment. I'm okay with your proposal.
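A minimal sketch of what such an interface could look like. The RolloutEnv/Rollout names, fields, and signatures here are hypothetical illustrations, not existing tianshou API:

```python
from dataclasses import dataclass, field

@dataclass
class Rollout:
    """Hypothetical container for one environment's trajectory data."""
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)

class RolloutEnv:
    """Hypothetical env wrapper that collects its own rollout."""

    def __init__(self, step_fn, reset_fn):
        # step_fn(action) -> (next_obs, reward); reset_fn() -> obs
        self._step = step_fn
        self._reset = reset_fn

    def roll(self, policy, n_steps: int) -> Rollout:
        """Run the policy for n_steps and return the collected Rollout."""
        rollout = Rollout()
        obs = self._reset()
        for _ in range(n_steps):
            act = policy(obs)
            next_obs, rew = self._step(act)
            rollout.observations.append(obs)
            rollout.actions.append(act)
            rollout.rewards.append(rew)
            obs = next_obs
        # the caller (or the env itself) can push this Rollout to a buffer
        return rollout
```

The point of the sketch is only the shape of the API: the env owns the collection loop, so the collector no longer has to.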


MischaPanch commented on August 15, 2024

> My current thought is we should move roller to the lowest level (i.e., env has a roll() method that can create Rollout by itself, and can send to the buffer) to simplify implementation and have better throughput, but that's specific to RLHF experiment. I'm okay with your proposal.

We could do that as next step, could you write an issue briefly explaining why it would improve throughput? Such a method would probably also help a lot in the n_episode version of Collector.collect (see #1042 )


Trinkle23897 commented on August 15, 2024

The main assumption Tianshou makes is that batch-style data transfer can remove a lot of overhead: by sending batched data we improve GPU utilization and, with it, overall system throughput. That's why the initial version of the collector works in batch style.

This assumption rests on a few constraints:

  1. Sending data to the GPU sequentially cannot easily match the throughput of batch-style transfer
  2. The model is relatively small, and it is not memory-bound
  3. The environment's step function takes little time (including reward calculation), at least less than a policy forward pass
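To make the batch-style assumption concrete, here is a minimal sketch of batched collection across N environments (hypothetical helper names, not tianshou's actual Collector; `policy_batch_forward` stands in for a batched model call):

```python
import numpy as np

def batch_collect(envs, policy_batch_forward, n_steps):
    """Sketch of batch-style collection: one batched policy forward
    serves all environments per step, amortizing the per-call and
    data-transfer overhead across the whole batch."""
    obs = np.stack([env.reset() for env in envs])
    transitions = []
    for _ in range(n_steps):
        acts = policy_batch_forward(obs)  # single batched call per step
        # every env must finish its step before the next batched forward
        results = [env.step(a) for env, a in zip(envs, acts)]
        next_obs = np.stack([r[0] for r in results])
        rews = np.array([r[1] for r in results])
        transitions.append((obs, acts, rews))
        obs = next_obs
    return transitions
```

Note the implicit synchronization point: all envs must step before the next batched forward, which is exactly where the constraints below start to matter.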

These are very strong constraints. If any of them fails to hold, we can switch to a fully async rollout implementation to get better throughput, i.e., a shorter wall-clock Collector.collect time. For example, in the RLHF case:

  • An LLM's completion function can be implemented in a fully async style and achieve the same throughput as batch completion, as long as you provide enough threads/processes to handle each request. That invalidates (1) and (2).
  • The environment needs a reward model to calculate rewards. In batch style, we must first do all policy sampling, synchronize, and then do the reward calculation; the system can become environment-throughput-bound because not enough compute is invested in the reward calculation. But if policy and reward calculation run fully asynchronously, all those bubbles disappear. That invalidates (3).
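A minimal sketch of the fully async alternative, using a thread pool so each environment advances independently (hypothetical names; in a real RLHF setup `policy_fn` and `reward_fn` would be per-request LLM and reward-model calls):

```python
from concurrent.futures import ThreadPoolExecutor

def async_collect(envs, policy_fn, reward_fn, n_steps):
    """Sketch of fully async rollout: each env runs its own loop, so a
    slow per-request policy or reward call in one env does not block
    the others, removing the batch-wise synchronization bubbles."""

    def run_one(env):
        traj = []
        obs = env.reset()
        for _ in range(n_steps):
            act = policy_fn(obs)       # per-request call (e.g. LLM completion)
            next_obs = env.step(act)
            rew = reward_fn(obs, act)  # reward-model call overlaps other envs
            traj.append((obs, act, rew))
            obs = next_obs
        return traj

    # one worker per env; envs progress independently of each other
    with ThreadPoolExecutor(max_workers=len(envs)) as pool:
        return list(pool.map(run_one, envs))
```

Threads are only a stand-in here; an asyncio or process-based variant would follow the same shape, and the key property is the absence of a per-step barrier across environments.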


MischaPanch commented on August 15, 2024

@Trinkle23897 I agree with you, there are probably a bunch of performance-related things that can be optimized in the collection process. I will open a separate issue from your comment.

This issue, however, is just about refactoring the current collector to make it amenable to such improvements in the first place. The collect method is very convoluted; I don't feel comfortable reviewing any functional improvements to it (like #1042, which started this whole story) before the code is more structured and readable. Removing unnecessary state from the collectors goes a long way in that direction, which is what this issue is about.

PS: for pypi, I sent you an email

