I want to run a version of triton that has both the TensorRT LLM backend and the other

The trt llm container does not have the other backends (issue in server, open)

MatthieuToulemont commented on August 14, 2024

The trt llm container does not have the other backends

Comments (7)

krishung5 commented on August 14, 2024

Hi @MatthieuToulemont, the Triton TRT-LLM container is a special container that contains only the TRT-LLM backend and the Python backend. If you'd like to have other backends, you could try either of the following:

- Copy over the backends from /opt/tritonserver/backends/* in the regular Triton container.
- Build the container following the steps here.
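
The first option can be sketched in Python as follows. This is only an illustration, not an official procedure: it assumes the regular container's /opt/tritonserver/backends directory has already been made available on disk (for example through a mounted volume), and the function name `copy_missing_backends` is hypothetical. It does not handle shared libraries that a copied backend may need from elsewhere in the image.

```python
import shutil
from pathlib import Path


def copy_missing_backends(src_root, dst_root):
    """Copy each backend directory under src_root into dst_root.

    Backends that already exist at the destination are left untouched,
    so the backends shipped with the target container are never overwritten.
    Returns the names of the backends that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for backend_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        target = dst_root / backend_dir.name
        if target.exists():
            continue  # keep the backend already present in this container
        # symlinks=True preserves symlinked libraries instead of duplicating them
        shutil.copytree(backend_dir, target, symlinks=True)
        copied.append(backend_dir.name)
    return copied
```

Inside the TRT-LLM container, the destination would be /opt/tritonserver/backends, and the source would be wherever the regular container's backends were placed.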

MatthieuToulemont commented on August 14, 2024

OK, is the Python backend in the TRT-LLM container different from the one in 24.05-py3?

krishung5 commented on August 14, 2024

No, the Python Backend should be the same.

tricky61 commented on August 14, 2024

> No, the Python Backend should be the same.

Does 24.05-py3 contain the ONNX, TensorRT, and TorchScript backends?

krishung5 commented on August 14, 2024

@tricky61 The nvcr.io/nvidia/tritonserver:24.05-py3 container contains ONNX, TRT and PyTorch backends. The nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 only has TRTLLM and Python backends.
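
To see which backends a given container actually ships, one can list the subdirectories of its backends directory. A minimal sketch, assuming the /opt/tritonserver/backends layout mentioned earlier in this thread; the helper name `list_backends` is hypothetical:

```python
from pathlib import Path


def list_backends(backends_root="/opt/tritonserver/backends"):
    """Return the sorted names of the backend directories under backends_root.

    Returns an empty list when the directory does not exist, for example
    when this is run outside a Triton container.
    """
    root = Path(backends_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    for name in list_backends():
        print(name)
```

Running this in each image and comparing the two lists shows which backends would need to be copied or built.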

tricky61 commented on August 14, 2024

> @tricky61 The nvcr.io/nvidia/tritonserver:24.05-py3 container contains ONNX, TRT and PyTorch backends. The nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 only has TRTLLM and Python backends.

OK. I am using nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3.
I also use tritonserver:23.11 with the vLLM backend added manually.
I will try adding the vLLM backend to nvcr.io/nvidia/tritonserver:24.05-py3 manually.
Does this approach make any difference? I ask because nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3 is 11.2 GB, while nvcr.io/nvidia/tritonserver:24.05-py3 is 7.55 GB.

krishung5 commented on August 14, 2024

@tricky61 It shouldn't make any difference. Note that you'd have to pip install vllm and make sure model.py exists under /opt/tritonserver/backends/vllm_backend.
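
Both conditions can be checked with a short script before starting the server. This is a sketch under assumptions: the default path is the one quoted in this comment, and the function name `check_vllm_backend` is hypothetical. It only verifies that the vllm package can be found and that model.py is present, not that the backend loads correctly.

```python
import importlib.util
from pathlib import Path


def check_vllm_backend(backend_dir="/opt/tritonserver/backends/vllm_backend"):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    # find_spec returns None when the top-level package cannot be located
    if importlib.util.find_spec("vllm") is None:
        problems.append("the vllm package is not importable; try `pip install vllm`")
    model_file = Path(backend_dir) / "model.py"
    if not model_file.is_file():
        problems.append(f"model.py not found at {model_file}")
    return problems


if __name__ == "__main__":
    for problem in check_vllm_backend():
        print("WARNING:", problem)
```

If the backend was installed to a different directory, pass that directory as `backend_dir`.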
