Comments (4)
Hi @wanghuihhh, would you be able to try enabling ragged batching for the input on the config?
I have read the blog post, but I still haven't succeeded: my model's input is 3D, while a ragged input is expected to be 1D. I don't know how to modify my model to support this feature.
Your introduction does not show how to flatten an input with feature dimensions down to one dimension. Could you provide a configuration example for a standard BERT model? I believe many people would find it useful.
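For anyone else hitting this: the ragged batching docs imply the model itself must consume the flattened 1-D input plus an extra batch-input tensor that marks sequence boundaries, so a BERT graph exported with a fixed 2-D/3-D input shape would need to be re-exported to accept that layout. A minimal sketch of what the `config.pbtxt` might look like, assuming tensor names like `input_ids` and `INDEX` (placeholders, not confirmed against any particular BERT export):

```proto
max_batch_size: 16
input [
  {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [ -1 ]                # flattened, variable-length token ids
    allow_ragged_batch: true    # requests may differ in length
  }
]
batch_input [
  {
    # one accumulated element count per request, so the model can
    # recover each sequence's boundary inside the concatenated batch
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "INDEX"
    data_type: TYPE_FP32
    source_input: "input_ids"
  }
]
```

The same `allow_ragged_batch` pattern would repeat for `attention_mask` and `token_type_ids` if the model takes them; this is only a sketch under those assumptions, not a verified BERT config.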
Hi @matthewkotila @nv-hwoo, have either of you worked with a BERT model in the past? If so, would you be able to share the model config?
@kthui: Hi @matthewkotila @nv-hwoo, wondering if you have worked with a BERT model in the past? If so, would you be able to share the model config?
I am not familiar with any BERT models right now :/