
Comments (4)

tpoisonooo commented on May 29, 2024

TurboMind was indeed developed on top of FasterTransformer (FT), but if you diff the two with Beyond Compare, you will see they have diverged so far that they are no longer the same repo:

  1. FT does not support LLaMA, so you cannot directly run inference on it
  2. FT has no KV Cache Manager and no reliable quantization; int8_model==2 does not actually work
  3. FT has no fmha (fused multi-head attention) and lacks many small optimizations

Hopefully @lzhangzz can give more details.

from lmdeploy.

lzhangzz commented on May 29, 2024

@happened In addition to @tpoisonooo's response:

  1. FT's context decoder implementation requires k_len == q_len, so the context decoder can only be used in the first round of a conversation. Our implementation supports context decoding of new input tokens throughout the conversation
  2. With our caching mechanism, only new input tokens are decoded (not the entire history), unless the sequence has been evicted from the cache
  3. Our KV Cache Manager implements an LRU policy: the least recently used sequence is evicted down to its token indices (the most compact form of KV cache) and recomputed when requested, so you don't have to worry about OOM
  4. We support persistent batch (you may know it as "continuous batching") for both the Python API and serving with tritonserver
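
To make points 2 and 3 concrete, here is a minimal Python sketch of the idea. It is not TurboMind's code or API: the class `ToySequenceCache`, its `step` method, and the `max_resident` limit are hypothetical names invented for illustration, and the "KV" entries are placeholder tuples rather than real tensors. It assumes eviction is counted per sequence rather than per memory block.

```python
from collections import OrderedDict


class ToySequenceCache:
    """Toy LRU cache of per-sequence KV state (illustrative only)."""

    def __init__(self, max_resident):
        assert max_resident >= 1
        self.max_resident = max_resident
        # seq_id -> {"tokens": [...], "kv": list or None}
        # Dict order tracks recency: least recently used first.
        self.seqs = OrderedDict()

    def _evict_if_needed(self, keep):
        resident = [sid for sid, s in self.seqs.items() if s["kv"] is not None]
        while len(resident) > self.max_resident:
            # Evict the least recently used sequence other than the active one.
            victim = next(sid for sid in resident if sid != keep)
            # Drop the KV entries but keep the token ids (the compact form).
            self.seqs[victim]["kv"] = None
            resident.remove(victim)

    def step(self, seq_id, new_tokens):
        """Append new tokens; return how many tokens had to be decoded."""
        seq = self.seqs.setdefault(seq_id, {"tokens": [], "kv": []})
        self.seqs.move_to_end(seq_id)  # mark as most recently used
        if seq["kv"] is None:
            # Evicted earlier: recompute the whole history from token ids.
            to_decode = seq["tokens"] + list(new_tokens)
            seq["kv"] = []
        else:
            # Still cached: only the new input tokens are decoded.
            to_decode = list(new_tokens)
        seq["tokens"].extend(new_tokens)
        seq["kv"].extend(("kv", t) for t in to_decode)  # placeholder KV entries
        self._evict_if_needed(keep=seq_id)
        return len(to_decode)
```

With `max_resident=2`, a sequence that stays resident pays only for its new tokens on each `step`, while a sequence that was evicted pays once for its full history when it is next requested.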


tpoisonooo commented on May 29, 2024

#95


tpoisonooo commented on May 29, 2024

@happened Please read PR #101 and leave your comments there.

