Comments (4)

oandreeva-nv commented on August 15, 2024

Hi @CoolFish88 ,
Thank you for your questions and your interest in Triton + vLLM.

  1. Connection between model.json and model.py.
    model.json specifies vLLM engine arguments, e.g. model_name, tensor_parallel_size, gpu_memory_utilization, etc. For the full list of parameters, please refer to https://github.com/vllm-project/vllm/blob/b422d4961a3052c5b4bcfc3747a1ad55acfe7eb8/vllm/engine/arg_utils.py#L23
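
    For illustration, a minimal model.json might look like the sketch below. The model path and values are placeholders, and the exact key names should be verified against vLLM's arg_utils.py linked above:

    ```json
    {
      "model": "facebook/opt-125m",
      "tensor_parallel_size": 1,
      "gpu_memory_utilization": 0.85
    }
    ```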

model.py is a Triton-required file. You can learn more about its components here.

Now, the following may sound confusing, so feel free to ask questions. The vLLM backend depends on the python_backend to load and serve models. To learn more about Python-based backends, please refer here.
Alternatively to using the vLLM backend, you can always deploy any vLLM model with the python backend directly. However, in the case of vLLM it is sufficient to implement the TritonPythonModel interface only once and re-use it across multiple models, each specified by its own model.json. For this use case we've introduced the Python-based backend feature, where Triton Inference Server treats a common model.py script as a backend: it loads libtriton_python.so first, which ensures that Triton knows how to send requests to the backend for execution and that the backend knows how to communicate with Triton. Triton then uses the common model.py from the backend's repository instead of looking for it in the model repository; a rough layout is sketched below.
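
To make the moving parts concrete, here is a rough sketch of how the pieces are laid out when using the vLLM Python-based backend, assuming the backend directory is named vllm and the standard container paths are used (my_vllm_model is a placeholder name; exact locations may vary between releases):

```
/opt/tritonserver/backends/vllm/
    model.py                        # common TritonPythonModel implementation (the "backend")
/opt/tritonserver/backends/python/
    libtriton_python.so             # implements the backend API so Triton can drive model.py
    triton_python_backend_stub
    triton_python_backend_utils.py

model_repository/
    my_vllm_model/
        config.pbtxt                # backend: "vllm"
        1/
            model.json              # vLLM engine arguments for this model
```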

  2. Will this overwrite the contents of a user-supplied config.pbtxt in the model registry?

No. If you happen to notice any unexpected behaviors, please file a bug.

  3. If negative, will vLLM-specific settings in auto_complete_config still be executed?

Yes.

  4. What would be the best approach in the vLLM backend context to operate in with user-supplied config files?

Could you please clarify this question?

  5. I already have a model.py which I would like to operate with the vLLM backend.
    Wonderful! If you would like to use only your custom model.py as the new vLLM backend, simply replace/put it in the backends/vllm_backend directory inside the Docker container. Alternatively, you can specify backend: python in the config.pbtxt file of your model and put model.py under model_repository/<your_vllm_model>/<model_version>/ (a rough layout for this option is sketched below). However, if you would like to re-use one model.py across models, then the first option works better.
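
    As a rough sketch of the second option (backend: python with model.py inside the model repository; all names below are placeholders):

    ```
    model_repository/
        my_vllm_model/
            config.pbtxt
            1/
                model.py          # your TritonPythonModel implementation
                model.json        # optional, if your model.py reads engine arguments from it
    ```

    with a config.pbtxt along these lines:

    ```
    name: "my_vllm_model"
    backend: "python"
    ```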

dyastremsky commented on August 15, 2024

The model.py is what's used by Triton as the vLLM backend (technically an implementation of the Python backend). It'll exist in a vLLM directory under the backends directory. If you want to understand that directory structure better, you can read about how Triton backends and custom backends work in the backend repo.

If you're trying to modify model.py to work with the vLLM engine differently, that's the place to do it. You're essentially creating your own custom backend though, so that may require going through documentation a bit, including the links above. Otherwise, you'd want to leave this file as is and that tutorial should set you in the right direction.

As for autocomplete, it should only run if the config.pbtxt is not provided or certain fields are missing.
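
For reference, auto-complete is implemented via a static auto_complete_config method on TritonPythonModel, which Triton calls before loading the model. A minimal sketch, with illustrative tensor names and shapes, might look like this:

```python
class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Takes effect only when config.pbtxt is absent or leaves these fields unspecified.
        # Tensor names, data types, and dims below are illustrative placeholders.
        auto_complete_model_config.set_max_batch_size(0)
        auto_complete_model_config.add_input(
            {"name": "text_input", "data_type": "TYPE_STRING", "dims": [1]})
        auto_complete_model_config.add_output(
            {"name": "text_output", "data_type": "TYPE_STRING", "dims": [-1]})
        return auto_complete_model_config
```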

oandreeva-nv commented on August 15, 2024

> When using the Python backend, Triton Inference Server would look into my model repository and fetch my custom model.py implementation as a backend to use

Not really. When you explicitly use the python backend, it looks for libtriton_python.so (in your model's repo and under backends/python/) as the backend, and your model.py is considered a model: it contains the implementation of the execute function, which defines how to run your model.

libtriton_python.so is the library that implements the backend APIs, so that Triton Server knows how to run model.py. And model.py is basically your custom component.
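
As a very rough sketch of what such a model.py implementing TritonPythonModel looks like (tensor names and the echo logic are placeholders, not the actual vLLM implementation):

```python
import json

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] holds the model configuration as a JSON string.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        # Triton batches requests; return one response per request.
        responses = []
        for request in requests:
            text = pb_utils.get_input_tensor_by_name(request, "text_input").as_numpy()
            # Echo the input back as a stand-in for real inference logic.
            out = pb_utils.Tensor("text_output", text)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        pass
```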

> When using a Python-based backend (e.g. vLLM), Triton Inference Server would ignore the model.py file located in the model repository and use the model.py file residing (together with the three file artefacts) in its own repository under /opt/tritonserver/backends/

It will try to find a .so library first; if nothing is found, it will look for libtriton_python.so and use model.py as the common model definition for all models that use this Python-based backend.

> as my existing model.py implementation of the TritonPythonModel interface is devoid of any vLLM sugar and contains custom code in the execute and initialize methods, it seems that I have to merge the vLLM model.py

If you don't need anything from the vLLM model.py we provide, there is no need to use it. You can put your model.py under backends/vllm_custom and then specify backend: vllm_custom in your models' config.pbtxt files; make sure backends/python is also present and contains the 3 files. If you need something we provide in the vLLM Python-based backend, then yes, feel free to merge your two files.
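
For the first option, the model's config.pbtxt would then name the custom backend instead of python, along these lines (my_model and vllm_custom are placeholder names):

```
name: "my_model"
backend: "vllm_custom"
```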

CoolFish88 commented on August 15, 2024

Hello @oandreeva-nv and @dyastremsky,

Thank you for addressing my questions with plenty of detail.

I had previously consulted the documentation of the python backend when drafting the custom model.py file according to the TritonPythonModel interface. This file would reside under the model repository path, as showcased here, and the config.pbtxt file would reference the python backend. And the world was a happy place to be in.

Then I decided to leverage the advantages of the vLLM framework and things started to get really foggy really fast. At that point in time, I came across the Python-based backends that you referenced in the answers, and this is when things started to get unclear, as my model repository wouldn't contain the artefacts: libtriton_python.so, triton_python_backend_stub, and triton_python_backend_utils.py.

Correct me if my understanding is wrong:

  • When using the Python backend, Triton Inference Server would look into my model repository and fetch my custom model.py implementation as a backend to use (if it doesn't exist, will it fetch the backend implementation by looking into /opt/tritonserver/backends/?), and use the three file artefacts (libtriton_python.so, triton_python_backend_stub, and triton_python_backend_utils.py) from the backend repository (since they are not present in the model repository).

  • When using a Python-based backend (e.g. vLLM), Triton Inference Server would ignore the model.py file located in the model repository and use the model.py file residing (together with the three file artefacts) in its own repository under /opt/tritonserver/backends/

Now, as my existing model.py implementation of the TritonPythonModel interface is devoid of any vLLM sugar and contains custom code in the execute and initialize methods, it seems that I have to merge the vLLM model.py with my own and resort to one of the two strategies that you @oandreeva-nv mentioned under bullet point 5. If I go for the first strategy, I may use model.json to pass parameters to the refactored version of the vLLM backend containing the bits of custom code I need.

Could you please validate my understanding?
