
yolov4-triton-tensorrt's People

Contributors

olibartfast, philipp-schmidt, schoenxyx


yolov4-triton-tensorrt's Issues

how to get the “.engine” file from the model trained by myself

When using the ".wts" file downloaded from Google Driver,I can get the yolov4.engine successfully.

However when change the weight file to mine, I got error while running the ./main
The error is :
main: /yolov4-triton-tensorrt/networks/../utils/weights.h:17: std::map<std::__cxx11::basic_string, nvinfer1::Weights> loadWeights(std::__cxx11::string): Assertion `count > 0 && "Invalid weight map file."' failed.

By the way. it is error when uisng the weight downloaded from "https://github.com/AlexeyAB/darknet" which size is 246MB.

What should i do to transform my model to ".engine"

customized yolov4 model build failed

Hi @philipp-schmidt
The ./main program crashes when I build my customized yolov4 model trained with darknet. I have modified the CLASS_NUM parameter defined in network/yolov4.h; where else should I modify? Thanks. My program reported errors like this:
[Info] Creating builder
[Info] Creating model yolov4
[06/07/2021-07:08:04] [E] [TRT] Parameter check failed at: ../builder/Network.cpp::addScale::494, condition: shift.count > 0 ? (shift.values != nullptr) : (shift.values == nullptr)
main: /yolov4-triton-tensorrt/networks/yolov4.h:65: nvinfer1::IScaleLayer* yolov4::addBatchNorm2d(nvinfer1::INetworkDefinition*, std::map<std::__cxx11::basic_string, nvinfer1::Weights>&, nvinfer1::ITensor&, std::string, float): Assertion `scale_1' failed.
Aborted (core dumped)

Postprocessing Issue

Hi,
After training a custom yolov4 model to the 10000th epoch (mAP displayed as 6% at that epoch) in original YoloV4, I recreated all necessary files and ran your client.py on a custom image.
At the same time, I ran the original Yolov4 detector with the same 10K weights on the above image.
The Yolov4 detector made 2 individual detections (62% and 49% with the default threshold, plus an additional one at 9% with a 0.05 threshold).
client.py made 3 individual detections of the same objects, including the last one above.

1- I will let it train up to 40K and retry, but could there be a difference in the NMS?
2- Also, it would be very good if client.py could show confidence scores both on the output image and in the terminal.
3- Is it possible to let client.py show custom names on detections, instead of COCO labels?
4- One small thing I noticed about client.py: after closing the image, it does not return to the prompt and has to be quit manually with Ctrl+Z.

Alp

mAP of deployed yolov5x decreased

Hi, I infer surveillance videos with yolov5x according to https://github.com/wang-xinyu/tensorrtx/tree/master/yolov5. Its mAP@0.5 is 94.8%.

After deploying yolov5x with Triton 20.08, I found that mAP@0.5 decreased.

Do you know the reasons for that? Thanks!

triton client output

Frame 213: 90 raw boxes, 4 objects
CAR:     0.95
CAR:     0.92
CAR:     0.85
CHEMICAL_VEHICLE:     0.74
time:    28.7ms
Frame 214: 91 raw boxes, 4 objects
CAR:     0.94
CAR:     0.93
CAR:     0.86
CHEMICAL_VEHICLE:     0.72
time:    28.5ms
Frame 215: 104 raw boxes, 4 objects
CAR:     0.94
CAR:     0.93
CAR:     0.85
TRUCK:     0.56
time:    28.0ms
Frame 216: 91 raw boxes, 4 objects
CAR:     0.95
CAR:     0.94
CAR:     0.86
TRUCK:     0.74
time:    28.1ms
Frame 217: 90 raw boxes, 4 objects
CAR:     0.95
CAR:     0.94
CAR:     0.86
TRUCK:     0.74
time:    27.7ms
Frame 218: 107 raw boxes, 4 objects
CAR:     0.95
CAR:     0.95
CAR:     0.88
TRUCK:     0.76
time:    27.9ms

postprocess func

def postprocess(buffer, image_width, image_height, conf_threshold=0.8, nms_threshold=0.5):
    detected_objects = []
    img_scale = [image_width / INPUT_WIDTH, image_height / INPUT_HEIGHT, image_width / INPUT_WIDTH, image_height / INPUT_HEIGHT]
    num_bboxes = int(buffer[0, 0, 0, 0])

    if num_bboxes:
        bboxes = buffer[0, 1 : (num_bboxes * 6 + 1), 0, 0].reshape(-1, 6)
        labels = set(bboxes[:, 5].astype(int))

        for label in labels:
            selected_bboxes = bboxes[np.where((bboxes[:, 5] == label) & (bboxes[:, 4] >= conf_threshold))]
            selected_bboxes_keep = selected_bboxes[nms(selected_bboxes[:, :4], selected_bboxes[:, 4], nms_threshold)]
            for idx in range(selected_bboxes_keep.shape[0]):
                box_xy = selected_bboxes_keep[idx, :2]
                box_wh = selected_bboxes_keep[idx, 2:4]
                score = selected_bboxes_keep[idx, 4]

                box_x1y1 = box_xy - (box_wh / 2)
                box_x2y2 = np.minimum(box_xy + (box_wh / 2), [INPUT_WIDTH, INPUT_HEIGHT])
                box = np.concatenate([box_x1y1, box_x2y2])
                box *= img_scale

                if box[0] == box[2]:
                    continue
                if box[1] == box[3]:
                    continue
                detected_objects.append(BoundingBox(label, score, box[0], box[2], box[1], box[3], image_height, image_width))
    return detected_objects
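For reference, a minimal sketch of the nms helper referenced above (an assumption on my side: boxes are in (cx, cy, w, h) format, matching the buffer layout used by this postprocess function):

import numpy as np

def nms(boxes_cxcywh, scores, iou_threshold):
    # Greedy, class-agnostic NMS on (cx, cy, w, h) boxes; returns indices to keep.
    x1 = boxes_cxcywh[:, 0] - boxes_cxcywh[:, 2] / 2
    y1 = boxes_cxcywh[:, 1] - boxes_cxcywh[:, 3] / 2
    x2 = boxes_cxcywh[:, 0] + boxes_cxcywh[:, 2] / 2
    y2 = boxes_cxcywh[:, 1] + boxes_cxcywh[:, 3] / 2
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        order = order[1:][iou <= iou_threshold]  # drop boxes overlapping too much
    return keep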

Triton server through KFserving

Hi,
As a non-developer, but more a user, I have managed to create a fully local kubernetes cluster (currently single node [12CPU, 16GB Ram], more physical GPU nodes waiting aside) installed with KFserving, knative and istio components. Now I am looking to find a way to use yolov4-triton under KFserving, in hope of utilizing autoscaling features. You can see a general view of my cluster in attached screenshot images.

I am looking to find out what else is missing to send inference requests. Naturally, a triton pod and service will be needed, but I am not sure how to configure them so they can receive requests through KFserving components. Unfortunately, the sample provided in github/KFserving for the Triton server is for the BERT model, and looks more complicated than object detection (several python files involved, etc.).

If anyone experimented on this, or wants to cooperate, happy to hear.

And, I wish the best for all of you, in 2021 !

Screenshot01

Screenshot02

kiali-ss

Failed to connect to all addresses

Hello, under kubernetes, I am receiving the following error from client.py, which is trying to connect to a worker node via -u IP:portnumber:

$ python client.py -u 192.168.1.26:30353 image
Traceback (most recent call last):
  File "client.py", line 120, in <module>
    if not triton_client.is_server_live():
  File "/home/user/miniconda3/envs/yolov4-triton/lib/python3.7/site-packages/tritonclient/grpc/__init__.py", line 217, in is_server_live
    raise_error_grpc(rpc_error)
  File "/home/user/miniconda3/envs/yolov4-triton/lib/python3.7/site-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses

Kubernetes

Hello,

I am looking for necessary steps to use this repo under kubernetes.
Do you think just replacing the repo name is enough, or what additional steps may be required? I would appreciate any high-level overview.

Python inference returns 'shape': [1, 1]

Cool repo, thanks for sharing! I got yolov4 working great on triton thanks to you.

I'm interested in writing a python inference client (and sharing it back to the repo once it's done). I started off with the image_client.py example script from the official NVIDIA repo, and after a little tweaking I got stuck.

Everything seems to work just fine, I send an image to the triton engine using the http protocol and get a response back, but things go wrong in postprocessing.

Below are the logs from Triton when I send a request with the python client:

I0918 13:17:27.860435 1 http_server.cc:1185] HTTP request: 0 /v2/models/yolov4
I0918 13:17:27.860469 1 model_repository_manager.cc:608] GetInferenceBackend() 'yolov4' version -1
I0918 13:17:27.860480 1 model_repository_manager.cc:564] VersionStates() 'yolov4'
I0918 13:17:27.860726 1 http_server.cc:1185] HTTP request: 0 /v2/models/yolov4/config
I0918 13:17:27.860736 1 model_repository_manager.cc:608] GetInferenceBackend() 'yolov4' version -1
I0918 13:17:28.721807 1 http_server.cc:1185] HTTP request: 2 /v2/models/yolov4/infer
I0918 13:17:28.721850 1 model_repository_manager.cc:608] GetInferenceBackend() 'yolov4' version -1
I0918 13:17:28.768381 1 infer_request.cc:347] add original input: [0x0x7f987c005ac0] request id: 1, model: yolov4, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7f97a94fdbd8] input: data, type: FP32, original shape: [1,3,608,608], shape: []
override inputs:
inputs:
original requested outputs:
requested outputs:

I0918 13:17:28.908998 1 infer_request.cc:480] prepared: [0x0x7f987c005ac0] request id: 1, model: yolov4, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7f97a94fdbd8] input: data, type: FP32, original shape: [1,3,608,608], shape: [3,608,608]
override inputs:
inputs:
[0x0x7f97a94fdbd8] input: data, type: FP32, original shape: [1,3,608,608], shape: [3,608,608]
original requested outputs:
prob
requested outputs:
prob

I0918 13:17:28.909074 1 plan_backend.cc:1634] Running yolov4_0_gpu0 with 1 requests
I0918 13:17:28.909120 1 plan_backend.cc:2355] Optimization profile default [0] is selected for yolov4_0_gpu0
I0918 13:17:28.909147 1 pinned_memory_manager.cc:130] pinned memory allocation: size 4435968, addr 0x7f994a000090
I0918 13:17:28.914680 1 plan_backend.cc:1894] Context with profile default [0] is being executed for yolov4_0_gpu0
I0918 13:17:28.917198 1 infer_response.cc:74] add response output: output: prob, type: FP32, shape: [1,7001,1,1]
I0918 13:17:28.917220 1 http_server.cc:1136] HTTP: unable to provide 'prob' in GPU, will use CPU
I0918 13:17:28.917232 1 http_server.cc:1156] HTTP using buffer for: 'prob', size: 28004, addr: 0x7f97f3f9bf00
I0918 13:17:28.917240 1 pinned_memory_manager.cc:130] pinned memory allocation: size 28004, addr 0x7f994a43b0a0
I0918 13:17:28.917267 1 pinned_memory_manager.cc:157] pinned memory deallocation: addr 0x7f994a000090
I0918 13:17:28.923462 1 pinned_memory_manager.cc:157] pinned memory deallocation: addr 0x7f994a43b0a0
I0918 13:17:28.923628 1 http_server.cc:1171] HTTP release: size 28004, addr 0x7f97f3f9bf00

We can see that the output layer prob is supposed to have shape [1,7001,1,1]. But in the python client I only receive a [1, 1] shape:

{'id': '1', 'model_name': 'yolov4', 'model_version': '1', 'outputs': [{'name': 'prob', 'datatype': 'BYTES', 'shape': [1, 1], 'data': ['633.535400:3']}]}

It probably has something to do with the output being interpreted as BYTES? If I crack this and find a working python script, I'll PR it back to this repo :)

Thanks!
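For what it's worth, the 'value:index' string and the BYTES datatype look like the output of Triton's classification extension, which is returned when class_count is set on a requested output (as the image_client.py example does). A minimal sketch of requesting the raw tensor instead, assuming tritonclient.http and the 'data'/'prob' names from the logs above:

import numpy as np
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient(url="localhost:8000")

# placeholder input; replace with the real preprocessed NCHW float32 image
batch = np.zeros((1, 3, 608, 608), dtype=np.float32)

inputs = [httpclient.InferInput("data", [1, 3, 608, 608], "FP32")]
inputs[0].set_data_from_numpy(batch, binary_data=True)

# class_count=0 (the default) returns the raw tensor instead of top-k "value:index" strings
outputs = [httpclient.InferRequestedOutput("prob", binary_data=True, class_count=0)]

result = triton_client.infer("yolov4", inputs=inputs, outputs=outputs)
prob = result.as_numpy("prob")  # expected shape (1, 7001, 1, 1) per the server log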

error: creating server: Internal - failed to load all models

I encountered some errors when running.
I created the "triton-deploy" folder in the same directory as "yolov4-triton-tensorrt", like this:
triton-deploy yolov4-triton-tensorrt

triton-deploy/models/yolov4/1
model.plan
triton-deploy/plugins
liblayerplugin.so

I run "docker run --gpus all --rm --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd)/triton-deploy/models:/models -v$(pwd)/triton-deploy/plugins:/plugins --env LD_PRELOAD=/plugins/liblayerplugin.so nvcr.io/nvidia/tritonserver:20.10-py3 tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16 --log-verbose 1
", then I got those error:

== Triton Inference Server ==

NVIDIA Release 20.10 (build )

Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

I0517 06:20:39.062929 1 metrics.cc:184] found 4 GPUs supporting NVML metrics
I0517 06:20:39.068694 1 metrics.cc:193] GPU 0: GeForce RTX 2080 Ti
I0517 06:20:39.074545 1 metrics.cc:193] GPU 1: GeForce RTX 2080 Ti
I0517 06:20:39.080262 1 metrics.cc:193] GPU 2: GeForce RTX 2080 Ti
I0517 06:20:39.086189 1 metrics.cc:193] GPU 3: GeForce RTX 2080 Ti
I0517 06:20:39.343302 1 pinned_memory_manager.cc:195] Pinned memory pool is created at '0x7ff9e6000000' with size 268435456
I0517 06:20:39.345681 1 cuda_memory_manager.cc:98] CUDA memory pool is created on device 0 with size 67108864
I0517 06:20:39.345690 1 cuda_memory_manager.cc:98] CUDA memory pool is created on device 1 with size 67108864
I0517 06:20:39.345694 1 cuda_memory_manager.cc:98] CUDA memory pool is created on device 2 with size 67108864
I0517 06:20:39.345697 1 cuda_memory_manager.cc:98] CUDA memory pool is created on device 3 with size 67108864
W0517 06:20:39.633003 1 server.cc:235] failed to enable peer access for some device pairs
I0517 06:20:39.633077 1 netdef_backend_factory.cc:46] Create NetDefBackendFactory
I0517 06:20:39.633086 1 plan_backend_factory.cc:48] Create PlanBackendFactory
I0517 06:20:39.633091 1 plan_backend_factory.cc:55] Registering TensorRT Plugins
I0517 06:20:39.633160 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0517 06:20:39.633175 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0517 06:20:39.633185 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0517 06:20:39.633194 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0517 06:20:39.633203 1 logging.cc:52] Registered plugin creator - ::Clip_TRT version 1
I0517 06:20:39.633211 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0517 06:20:39.633221 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0517 06:20:39.633232 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0517 06:20:39.633242 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0517 06:20:39.633253 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0517 06:20:39.633262 1 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0517 06:20:39.633271 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0517 06:20:39.633280 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0517 06:20:39.633288 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0517 06:20:39.633297 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0517 06:20:39.633306 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0517 06:20:39.633315 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0517 06:20:39.633329 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0517 06:20:39.633336 1 logging.cc:52] Registered plugin creator - ::Split version 1
I0517 06:20:39.633343 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0517 06:20:39.633352 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0517 06:20:39.633364 1 libtorch_backend_factory.cc:53] Create LibTorchBackendFactory
I0517 06:20:39.633374 1 custom_backend_factory.cc:46] Create CustomBackendFactory
I0517 06:20:39.633379 1 backend_factory.h:44] Create TritonBackendFactory
I0517 06:20:39.633396 1 ensemble_backend_factory.cc:47] Create EnsembleBackendFactory
I0517 06:20:39.633522 1 autofill.cc:142] TensorFlow SavedModel autofill: Internal: unable to autofill for 'yolov4', unable to find savedmodel directory named 'model.savedmodel'
I0517 06:20:39.633546 1 autofill.cc:155] TensorFlow GraphDef autofill: Internal: unable to autofill for 'yolov4', unable to find graphdef file named 'model.graphdef'
I0517 06:20:39.633568 1 autofill.cc:168] PyTorch autofill: Internal: unable to autofill for 'yolov4', unable to find PyTorch file named 'model.pt'
I0517 06:20:39.633592 1 autofill.cc:180] Caffe2 NetDef autofill: Internal: unable to autofill for 'yolov4', unable to find netdef files: 'model.netdef' and 'init_model.netdef'
I0517 06:20:39.633622 1 autofill.cc:212] ONNX autofill: Internal: unable to autofill for 'yolov4', unable to find onnx file or directory named 'model.onnx'
E0517 06:20:55.232289 1 logging.cc:43] coreReadArchive.cpp (41) - Serialization Error in verifyHeader: 0 (Version tag does not match. Note: Current Version: 96, Serialized Engine Version: 89)
E0517 06:20:55.232422 1 logging.cc:43] INVALID_STATE: std::exception
E0517 06:20:55.232434 1 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
I0517 06:20:55.233322 1 autofill.cc:225] TensorRT autofill: Internal: unable to autofill for 'yolov4', unable to find a compatible plan file.
W0517 06:20:55.233338 1 autofill.cc:265] Proceeding with simple config for now
I0517 06:20:55.233346 1 model_config_utils.cc:637] autofilled config: name: "yolov4"

E0517 06:20:55.233899 1 model_repository_manager.cc:1604] unexpected platform type for yolov4
I0517 06:20:55.233944 1 server.cc:141]
+---------+--------+------+
| Backend | Config | Path |
+---------+--------+------+
+---------+--------+------+

I0517 06:20:55.233951 1 model_repository_manager.cc:469] BackendStates()
I0517 06:20:55.233960 1 server.cc:184]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0517 06:20:55.234043 1 tritonserver.cc:1621]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.4.0 |
| server_extensions | classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+

I0517 06:20:55.234053 1 server.cc:280] Waiting for in-flight requests to complete.
I0517 06:20:55.234060 1 model_repository_manager.cc:435] LiveBackendStates()
I0517 06:20:55.234064 1 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

grpc error

error

(py38torch16) cp@ubuntu:~/project/yolov5/client$ python client.py dummy

Traceback (most recent call last): File "client.py", line 120, in <module> if not triton_client.is_server_live(): File "/home/cp/anaconda3/envs/py38torch16/lib/python3.8/site-packages/tritonclient/grpc/__init__.py", line 217, in is_server_live raise_error_grpc(rpc_error) File "/home/cp/anaconda3/envs/py38torch16/lib/python3.8/site-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc raise get_error_grpc(rpc_error) from None tritonclient.utils.InferenceServerException: [StatusCode.UNIMPLEMENTED]

conda env

absl-py 0.10.0 pypi_0 pypi
autopep8 1.5.4 pypi_0 pypi
blas 1.0 mkl anaconda
ca-certificates 2020.7.22 0 anaconda
cachetools 4.1.1 pypi_0 pypi
certifi 2020.6.20 py38_0 anaconda
chardet 3.0.4 pypi_0 pypi
click 7.1.2 pypi_0 pypi
cudatoolkit 10.1.243 h6bb024c_0 anaconda
cycler 0.10.0 pypi_0 pypi
cython 0.29.21 pypi_0 pypi
freetype 2.10.2 h5ab3b9f_0 anaconda
future 0.18.2 pypi_0 pypi
gevent 20.9.0 pypi_0 pypi
geventhttpclient 1.4.4 pypi_0 pypi
google-auth 1.21.3 pypi_0 pypi
google-auth-oauthlib 0.4.1 pypi_0 pypi
greenlet 0.4.17 pypi_0 pypi
grpcio 1.32.0 pypi_0 pypi
idna 2.10 pypi_0 pypi
intel-openmp 2020.2 254 anaconda
jpeg 9b habf39ab_1 anaconda
kiwisolver 1.2.0 pypi_0 pypi
lcms2 2.11 h396b838_0 anaconda
ld_impl_linux-64 2.33.1 h53a641e_7 anaconda
libedit 3.1.20191231 h14c3975_1 anaconda
libffi 3.3 he6710b0_2 anaconda
libgcc-ng 9.1.0 hdf63c60_0 anaconda
libpng 1.6.37 hbc83047_0 anaconda
libstdcxx-ng 9.1.0 hdf63c60_0 anaconda
libtiff 4.1.0 h2733197_1 anaconda
lz4-c 1.9.2 he6710b0_1 anaconda
markdown 3.2.2 pypi_0 pypi
matplotlib 3.3.2 pypi_0 pypi
mkl 2020.2 256 anaconda
mkl-service 2.3.0 py38he904b0f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft 1.2.0 py38h23d657b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random 1.1.1 py38h0573a6f_0 anaconda
ncurses 6.2 he6710b0_1 anaconda
ninja 1.10.1 py38hfd86e86_0 anaconda
numpy 1.19.1 py38hbc911f0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base 1.19.1 py38hfa32c7d_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nvidia-pyindex 1.0.5 pypi_0 pypi
oauthlib 3.1.0 pypi_0 pypi
olefile 0.46 py_0 anaconda
opencv-python 4.4.0.44 pypi_0 pypi
openssl 1.1.1h h7b6447c_0 anaconda
packaging 20.4 pypi_0 pypi
pillow 7.2.0 py38hb39fc2d_0 anaconda
pip 20.2.2 py38_0 anaconda
protobuf 3.13.0 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycodestyle 2.6.0 pypi_0 pypi
pyparsing 2.4.7 pypi_0 pypi
pyqt5 5.15.1 pypi_0 pypi
pyqt5-sip 12.8.1 pypi_0 pypi
pyqt5-tools 5.15.1.1.7.4 pypi_0 pypi
python 3.8.5 h7579374_1 anaconda
python-dateutil 2.8.1 pypi_0 pypi
python-dotenv 0.15.0 pypi_0 pypi
python-rapidjson 0.9.4 pypi_0 pypi
pytorch 1.6.0 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
pyyaml 5.3.1 pypi_0 pypi
readline 8.0 h7b6447c_0 anaconda
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rsa 4.6 pypi_0 pypi
scipy 1.5.2 pypi_0 pypi
setuptools 49.6.0 py38_0 anaconda
sip 5.4.0 pypi_0 pypi
six 1.15.0 py_0 anaconda
sqlite 3.33.0 h62c20be_0 anaconda
tensorboard 2.3.0 pypi_0 pypi
tensorboard-plugin-wit 1.7.0 pypi_0 pypi
tk 8.6.10 hbc83047_0 anaconda
toml 0.10.2 pypi_0 pypi
torchvision 0.7.0 py38_cu101 pytorch
tqdm 4.49.0 pypi_0 pypi
tritonclient 2.5.0 pypi_0 pypi
urllib3 1.25.10 pypi_0 pypi
werkzeug 1.0.1 pypi_0 pypi
wheel 0.35.1 py_0 anaconda
xz 5.2.5 h7b6447c_0 anaconda
zlib 1.2.11 h7b6447c_3 anaconda
zope-event 4.5.0 pypi_0 pypi
zope-interface 5.2.0 pypi_0 pypi
zstd 1.4.4 h0b5b093_3 anaconda

server

I1203 08:37:01.962787 1 model_repository_manager.cc:837] successfully loaded 'yolov5x' version 1
I1203 08:37:01.962801 1 model_repository_manager.cc:622] TriggerNextAction() 'yolov5x' version 1: 0
I1203 08:37:01.962805 1 model_repository_manager.cc:637] no next action, trigger OnComplete()
I1203 08:37:01.962846 1 model_repository_manager.cc:492] GetVersionStates() 'yolov5x'
Starting endpoints, 'inference:0' listening on
I1203 08:37:01.963952 1 grpc_server.cc:494] New request handler for HealthHandler, 1
I1203 08:37:01.963977 1 grpc_server.cc:447] Threads started for HealthHandler
I1203 08:37:01.964128 1 grpc_server.cc:584] New request handler for StatusHandler, 2
I1203 08:37:01.964157 1 grpc_server.cc:447] Threads started for StatusHandler
I1203 08:37:01.964274 1 grpc_server.cc:683] New request handler for RepositoryHandler, 3
I1203 08:37:01.964319 1 grpc_server.cc:447] Threads started for RepositoryHandler
I1203 08:37:01.964416 1 grpc_server.cc:1005] New request handler for InferHandler, 4
I1203 08:37:01.964444 1 grpc_server.cc:447] Threads started for InferHandler
I1203 08:37:01.964547 1 grpc_server.cc:1281] New request handler for StreamInferHandler, 6
I1203 08:37:01.964581 1 grpc_server.cc:447] Threads started for StreamInferHandler
I1203 08:37:01.964672 1 grpc_server.cc:1611] New request handler for ModelControlHandler, 7
I1203 08:37:01.964713 1 grpc_server.cc:447] Threads started for ModelControlHandler
I1203 08:37:01.964798 1 grpc_server.cc:1702] New request handler for SharedMemoryControlHandler, 8
I1203 08:37:01.964849 1 grpc_server.cc:447] Threads started for SharedMemoryControlHandler
I1203 08:37:01.964878 1 grpc_server.cc:1939] Started GRPCService at 0.0.0.0:8001
I1203 08:37:01.964903 1 http_server.cc:1411] Starting HTTPService at 0.0.0.0:8000
I1203 08:37:02.006834 1 http_server.cc:1426] Starting Metrics Service at 0.0.0.0:8002
I1203 08:37:02.012313 1 server.cc:224] Polling model repository
I1203 08:37:17.012641 1 server.cc:224] Polling model repository
I1203 08:37:32.013035 1 server.cc:224] Polling model repository

C++ client produces no detections

Hello @olibartfast

Thank you for your efforts. I am running the C++ client but I faced some issues.
First I needed to change the model input and output names.
I also needed to change the namespaces in the common.hpp file to be compatible with the latest version of Triton.

Now the C++ client works with no errors, but I get no detections. However, when I run the python client with the same model loaded, I get bounding boxes and detections.

Could you please check the code and let me know if I need to change anything to run the C++ client with this repo release?

Thanks.

Compare results with original yolov4 (mAP)

I want to compare the results of our network with the original yolov4 to make sure we get the same accuracy and qualitative results. There could be mistakes or differences in the pre- and postprocessing that I want to rule out.

Optimally we would check its mAP like described in darknet and compare the results.

If anyone feels like implementing this into the existing python client I would gladly accept the PR and mention you in the README.

Also, simply checking whether we get the same resulting BoundingBoxes for a few images would be a nice start to see if there are differences (a rough sketch of such a comparison is below).
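As a starting point, a minimal sketch of such a box-by-box comparison (an assumption here: both pipelines produce per-image lists of [x1, y1, x2, y2, score, class_id]):

import numpy as np

def iou(a, b):
    # intersection-over-union of two [x1, y1, x2, y2, ...] boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def compare(dets_ref, dets_test, iou_thresh=0.5):
    # count reference detections that have a matching class and box in the test set
    matched = sum(
        1 for r in dets_ref
        if any(iou(r, t) >= iou_thresh and int(r[5]) == int(t[5]) for t in dets_test))
    return matched, len(dets_ref), len(dets_test)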

tritonclient InferenceServerException

Hi @philipp-schmidt, I successfully converted a Yolov5 model to an engine using this repo: https://github.com/wang-xinyu/tensorrtx/tree/master/yolov5

And following your guide on how to run triton-inference-client, I got this error:
tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] unexpected inference output 'detections' for model 'yolov5'

Your grpcclient.InferRequestedOutput call uses the output name 'detections', but for yolov5 it needs a different name.

How can I find the correct argument for grpcclient.InferRequestedOutput?

Thank you so much!
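One way to check the output names a deployed model actually exposes is to query its metadata and config from the server; an illustrative sketch, assuming the gRPC client used by client.py:

import tritonclient.grpc as grpcclient

triton_client = grpcclient.InferenceServerClient(url="localhost:8001")

# both calls report the input/output tensor names and shapes of the deployed model
metadata = triton_client.get_model_metadata("yolov5")
config = triton_client.get_model_config("yolov5")
print(metadata.outputs)        # use the reported name in grpcclient.InferRequestedOutput(...)
print(config.config.output)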

Are smaller input sizes possible? [Jetson Nano]

Hi,

Thanks for this repo and the very easy-to-follow readme; I was able to get this working fine on my Jetson Nano with the JetPack Triton packages.

The yolov4 model that gets built is 608/FP32, which is maybe a bit heavy for the Jetson Nano. How do I reduce the input size to something like 320 or 416, and get FP16 and INT8? Do I need a model build for each different combination of input size and mode?

In other non-TensorRT yolov4 implementations you can change input size and mode with just variables; maybe I am not understanding what TensorRT is doing under the hood.

Thanks
~whiskers434

Facing problem to create "engine"

Hi, following the first steps, I did it successfully, without problems:

cd yourworkingdirectoryhere
git clone git@github.com:isarsoft/yolov4-triton-tensorrt.git
docker run --gpus all -it --rm -v $(pwd)/yolov4-triton-tensorrt:/yolov4-triton-tensorrt nvcr.io/nvidia/tensorrt:21.03-py3

then

cd /yolov4-triton-tensorrt
mkdir build
cd build
cmake ..
make

until here no problem.

When I try the following command: ./main, with liblayerplugin.so, main, and the downloaded yolov4.wts all in the same folder, as in the image below:

imagem1

The screen is stuck at [Info] Creating model yolov4. I thought it might just be processing time, but it has been like this for more than 12 hours and nothing has happened.

What am I doing wrong?
I already tried changing the version of the container, but it stays the same.

I would appreciate if someone could help me.

customized yolov4 to tensorrt conversion problem

Hi

I have a customized yolov4 darknet model with 36 classes, and I followed https://github.com/Tianxiaomo/pytorch-YOLOv4 to create the engine file with this conversion: .weights -> .onnx -> .trt engine.
My concern is that the model file you have provided to try out (the drive link) has the following output:
output {
name: "detections"
data_type: TYPE_FP32
dims: 159201
dims: 1
dims: 1
}
but when I create one for my customized model, it has the following output:
output {
name: "boxes"
data_type: TYPE_FP32
dims: 22743
dims: 1
dims: 4
}
output {
name: "confs"
data_type: TYPE_FP32
dims: 22743
dims: 38
}

I am finding it difficult to make use of this. Could you have a look at it and let me know a workaround to get a single output node?

Thanks
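For reference, a rough client-side sketch of consuming this two-output layout (assuming 'results' is a tritonclient InferResult for the model above; box coordinate format and NMS are left out):

import numpy as np

def merge_outputs(results, conf_threshold=0.4):
    # "boxes" is [N, 1, 4] and "confs" is [N, num_classes] per the config above
    boxes = results.as_numpy("boxes").reshape(-1, 4)
    confs = results.as_numpy("confs")
    class_ids = confs.argmax(axis=1)   # best class per candidate box
    scores = confs.max(axis=1)         # its confidence
    keep = scores >= conf_threshold
    # each row: 4 box values, score, class_id (NMS still needs to be applied)
    return np.concatenate(
        [boxes[keep], scores[keep, None], class_ids[keep, None].astype(np.float32)],
        axis=1)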

about deploying several yolov4 models

To deploy different yolov4 models, I have changed one of the layers, "YoloLayer_TRT", to "YoloLayer_original_TRT",
both in "yolov4-triton-tensorrt/networks/yolov4.h" and "yolov4-triton-tensorrt/layers/yololayer.cu".
I changed the class names and the namespace "Yolo" to "Yolo_original".
"yolov4-triton-tensorrt/layers/yololayer.cu"
const char* YoloLayerPlugin_original::getPluginType() const
{
return "YoloLayer_original_TRT";
}

"yolov4-triton-tensorrt/networks/yolov4.h"
auto creator = getPluginRegistry()->getPluginCreator("YoloLayer_original_TRT", "1");
const PluginFieldCollection* pluginData = creator->getFieldNames();
IPluginV2 *pluginObj = creator->createPlugin("yololayer", pluginData);
ITensor* inputTensors_yolo[] = {conv138->getOutput(0), conv149->getOutput(0), conv160->getOutput(0)};
auto yolo = network->addPluginV2(inputTensors_yolo, 3, *pluginObj);

After these changes, I made two plugin .so files:
plugin_1.so -> model_1
plugin_2.so -> model_2

When I use this command to run triton-inference-server:
docker run --gpus all --rm --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd)/triton-deploy/models:/models -v$(pwd)/triton-deploy/plugins:/plugins --env LD_PRELOAD=/plugins/plugin_1.so:/plugins/plugin_2.so nvcr.io/nvidia/tritonserver:20.08-py3 tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16 --log-verbose 1
the server runs.
When using client.py to run the models, I can successfully run model1, while model2 cannot give the right detections.

When I exchange the order of /plugins/plugin_1.so:/plugins/plugin_2.so,
changing it to "LD_PRELOAD=/plugins/plugin_2.so:/plugins/plugin_1.so",
model2 is OK, but model1 is not.

I don't have a config.pbtxt.
What is wrong with my procedure?

Config.pbtxt requirement file

Hi isarsoft,

I really appreciate your contributions on this repository!!
I can successfully generate the tensorrt engine on TensorRT 20.03.1 image from NGC.
I wonder where can I get the config.pbtxt file in order to deploy on model repository of triton inference server?

Thank you so much!

BR,
Chieh

Feature: Import darknet weights instead of pytorch .wts

The current implementation relies on a pytorch implementation of yolov4 to import darknet weights and export them to .wts weights.
This format has the benefit of an additional dictionary to look up values, which makes it more convenient to parse.

It should be possible to read the weights directly from the darknet weight file, considering that darknet is written in C and all necessary functions are implemented there and can be looked up. Darknet has a simple way of serializing weights, it simply writes all weights in order of the layer definitions into the file.
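As an illustration, a minimal sketch of reading a darknet weight file directly, assuming the usual layout (three int32 version fields, a seen-images counter, then raw float32 weights in layer order):

import numpy as np

def load_darknet_weights(path):
    with open(path, 'rb') as f:
        major, minor, revision = np.fromfile(f, dtype=np.int32, count=3)
        # newer format versions store the seen-images counter as 64-bit
        if major * 10 + minor >= 2:
            seen = int(np.fromfile(f, dtype=np.int64, count=1)[0])
        else:
            seen = int(np.fromfile(f, dtype=np.int32, count=1)[0])
        weights = np.fromfile(f, dtype=np.float32)  # everything else: raw floats in layer order
    return (int(major), int(minor), int(revision), seen), weights

The mapping from this flat array to named layers would still have to follow the layer definitions, as darknet itself does when loading.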

How to use FP16

I see #define USE_FP16, but the model plan is TYPE_FP32.

tritonclient.utils.InferenceServerException: [StatusCode.UNIMPLEMENTED]

I had successfully tested the engine with standalone TensorRT, but when I tried to run it on Triton, I got this error.

(yolov4-triton) chen@ubuntu-X785-G30:~/yolov4-triton-tensorrt-1.3.0/clients/python$ python client.py -o data/dog_result.jpg image data/dog.jpg
Traceback (most recent call last):
File "client.py", line 120, in
if not triton_client.is_server_live():
File "/home/chen/anaconda3/envs/yolov4-triton/lib/python3.7/site-packages/tritonclient/grpc/init.py", line 217, in is_server_live
raise_error_grpc(rpc_error)
File "/home/chen/anaconda3/envs/yolov4-triton/lib/python3.7/site-packages/tritonclient/grpc/init.py", line 61, in raise_error_grpc
raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNIMPLEMENTED]

Is the problem related to the version of gRPC?

unexpected inference input 'data'

Just built everything from scratch and set the batch size to 16 in main.cpp, but I am getting the following error:

+------------------+---------+---------------------------------------------------------------------------------------------+
| Model            | Version | Status                                                                                      |
+------------------+---------+---------------------------------------------------------------------------------------------+
| resnet50_pytorch | 1       | READY                                                                                       |
| yolov4           | 1       | UNAVAILABLE: Invalid argument: unexpected inference input 'data', allowed inputs are: input |
+------------------+---------+---------------------------------------------------------------------------------------------+

I1013 07:49:41.680190 1 plan_backend.cc:365] Creating instance yolov4_0_0_gpu0 on GPU 0 (6.1) using model.plan
I1013 07:49:41.686045 1 logging.cc:52] Allocated persistent device memory of size 422353920
I1013 07:49:41.693108 1 logging.cc:52] Allocated activation device memory of size 2401337344
I1013 07:49:41.693283 1 logging.cc:52] Assigning persistent memory blocks for various profiles
I1013 07:49:41.693315 1 plan_backend.cc:608] Detected input as execution binding for yolov4
I1013 07:49:41.693322 1 plan_backend.cc:608] Detected detections as execution binding for yolov4
I1013 07:49:41.693330 1 plan_backend.cc:161] ~PlanBackend::Context 
E1013 07:49:41.747686 1 model_repository_manager.cc:1242] failed to load 'yolov4' version 1: Invalid argument: unexpected inference input 'data', allowed inputs are: input
I1013 07:49:41.747724 1 model_repository_manager.cc:1008] TriggerNextAction() 'yolov4' version 1: 0
I1013 07:49:41.747732 1 model_repository_manager.cc:1023] no next action, trigger OnComplete()
I1013 07:49:41.747806 1 model_repository_manager.cc:612] VersionStates() 'yolov4'
I1013 07:49:41.960221 1 dynamic_batch_scheduler.cc:230] Starting dynamic-batch scheduler thread 0 at nice 5...

Failed to build nvidia-pyindex on win10

Hi, I want to run the client on Windows 10.
However, there are some errors while building the environment. Thanks!

(py38trtc250) G:\client_py>pip install --user nvidia-pyindex
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Collecting nvidia-pyindex
  Using cached https://mirrors.aliyun.com/pypi/packages/64/4c/dd413559179536b9b7247f15bf968f7e52b5f8c1d2183ceb3d5ea9284776/nvidia-pyindex-1.0.5.tar.gz (6.1 kB)
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: 'D:\Anaconda\envs\py38trtc250\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\15151\\AppData\\Local\\Temp\\pip-install-a7e_bahh\\nvidia-pyindex\\setup.py'"'"'; __file__='"'"'C:\\Users\\15151\\AppData\\Local\\Temp\\pip-install-a7e_bahh\\nvidia-pyindex\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\15151\AppData\Local\Temp\pip-wheel-cho7jhkv'
       cwd: C:\Users\15151\AppData\Local\Temp\pip-install-a7e_bahh\nvidia-pyindex\
  Complete output (25 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib
  creating build\lib\nvidia_pyindex
  copying nvidia_pyindex\cmdline.py -> build\lib\nvidia_pyindex
  copying nvidia_pyindex\utils.py -> build\lib\nvidia_pyindex
  copying nvidia_pyindex\__init__.py -> build\lib\nvidia_pyindex
  running egg_info
  writing nvidia_pyindex.egg-info\PKG-INFO
  writing dependency_links to nvidia_pyindex.egg-info\dependency_links.txt
  writing entry points to nvidia_pyindex.egg-info\entry_points.txt
  writing top-level names to nvidia_pyindex.egg-info\top_level.txt
  reading manifest file 'nvidia_pyindex.egg-info\SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'nvidia_pyindex.egg-info\SOURCES.txt'
  installing to build\bdist.win-amd64\wheel
  running install
  '"nvidia_pyindex uninstall"' 不是内部或外部命令,也不是可运行的程序
  或批处理文件。
  error: [WinError 2] 系统找不到指定的文件。
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  COMMAND: InstallCommand
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  ----------------------------------------
  ERROR: Failed building wheel for nvidia-pyindex
  Running setup.py clean for nvidia-pyindex
Failed to build nvidia-pyindex
Installing collected packages: nvidia-pyindex
    Running setup.py install for nvidia-pyindex ... error
    ERROR: Command errored out with exit status 1:
     command: 'D:\Anaconda\envs\py38trtc250\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\15151\\AppData\\Local\\Temp\\pip-install-a7e_bahh\\nvidia-pyindex\\setup.py'"'"'; __file__='"'"'C:\\Users\\15151\\AppData\\Local\\Temp\\pip-install-a7e_bahh\\nvidia-pyindex\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\15151\AppData\Local\Temp\pip-record-ma8cx0c0\install-record.txt' --single-version-externally-managed --user --prefix= --compile --install-headers 'C:\Users\15151\AppData\Roaming\Python\Python38\Include\nvidia-pyindex'
         cwd: C:\Users\15151\AppData\Local\Temp\pip-install-a7e_bahh\nvidia-pyindex\
    Complete output (7 lines):
    running install
    '"nvidia_pyindex uninstall"' 不是内部或外部命令,也不是可运行的程序
    或批处理文件。
    error: [WinError 2] 系统找不到指定的文件。
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    COMMAND: InstallCommand
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'D:\Anaconda\envs\py38trtc250\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\15151\\AppData\\Local\\Temp\\pip-install-a7e_bahh\\nvidia-pyindex\\setup.py'"'"'; __file__='"'"'C:\\Users\\15151\\AppData\\Local\\Temp\\pip-install-a7e_bahh\\nvidia-pyindex\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\15151\AppData\Local\Temp\pip-record-ma8cx0c0\install-record.txt' --single-version-externally-managed --user --prefix= --compile --install-headers 'C:\Users\15151\AppData\Roaming\Python\Python38\Include\nvidia-pyindex' Check the logs for full command output.

C++ program

I have read the official documentation, but I am still not clear about Triton.
I want to know whether it is supported, in my own C++ TensorRT program on a local GPU, to deploy the .engine file for inference. I just want to implement data parallelism, like your program does.
If you can help me, I would appreciate it.

Dynamic CLASS_NUM for YoloLayerPlugin

Hello, I encountered a problem. I am using Triton Inference Server to deploy models with different class numbers. How can YoloLayerPlugin handle this?

docker run error

root@ubuntu:/home/cp/project/yolov5# docker run --gpus all --rm --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd)/triton_deploy/models:/models -v$(pwd)/triton_deploy/plugins:/plugins --env LD_PRELOAD=/plugins/libmyplugins.so nvcr.io/nvidia/tritonserver:20.09-py3 tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16 --log-verbose 1

/bin/bash: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory

Using another model

Hi, I am planning to train an image classification model (ResNet) and hoping to make use of it in this repo, along with YOLO. If this is possible, which client should I use for it?

(I mean the clients presented with Triton)

Triton Serving for Yolov3-SPP

Hi, does your yolov4-triton-tensorrt support the Yolov3-SPP model? If not, what modifications need to be made?

Thanks in advance

How to use batch?

Hi
How do I use batching to improve inference throughput? Thanks.

CUDA initialization failure with error

I get a CUDA error while executing main:

root@14690d4eb9d4:/yolov4-triton-tensorrt/build# ./main
Creating builder
[08/20/2020-17:01:01] [E] [TRT] CUDA initialization failure with error 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Segmentation fault (core dumped)

my ubuntu

(base) ubuntu@ubuntu-01:~$ uname  -a
Linux ubuntu-01 4.15.0-112-generic #113~16.04.1-Ubuntu SMP Fri Jul 10 04:37:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
(base) ubuntu@ubuntu-01:~$ docker -v
Docker version 19.03.12, build 48a66213fe
(base) ubuntu@ubuntu-01:~$ docker -v
Docker version 19.03.12, build 48a66213fe
(base) ubuntu@ubuntu-01:~$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-112-generic
 Operating System: Ubuntu 16.04.7 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31.32GiB
 Name: ubuntu-01
 ID: 2KSY:EBTB:J4A3:WKZ2:2LN6:F7UR:64DZ:QIC3:WKOP:ABD5:GS5U:KOW7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
(base) ubuntu@ubuntu-01:~$ nvidia-smi
Fri Aug 21 01:03:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   35C    P8     8W / 250W |     71MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1037      G   /usr/lib/xorg/Xorg                            69MiB |
+-----------------------------------------------------------------------------+
(base) ubuntu@ubuntu-01:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
(base) ubuntu@ubuntu-01:~$

[Question] Running ./main is so long

Hi author,

Thank you so much for creating this repository. My problem is that when I run the ./main command, it takes a lot of time to create the engine model.

The process will be

./main
Creating builder
Creating model

It takes several minutes to complete the process. I would like to ask: can we accelerate this step by using all CPU cores? Thank you in advance.

How to run multiple Yolov4 models with different anchors?

As the title says:
I have two yolov4 models
One is for full image and the other for subImage
Data flow is:
Input Image -> Model1 -> sub Images -> Model2 -> Results
These two models were trained on different datasets, so they have different anchors.

My problem is: how should I do this?
It seems that the anchors are hard-coded in YoloLayer.
Should I create a different YoloLayer for each model?

Which parameters need to be changed to modify the number of classes?

Thank you for providing this code!
I can successfully convert the official darknet yolov4 model to an engine, but if I want to change the classes, which parameters in which files do I need to modify?
1. I replaced the configuration parameters in the cfg file.
2. I changed the number of classes in the yolov4.h file, but could not find where to change the filters.
3. I recompiled to generate main, and then running ./main reported an error.

Is it because the filters in yolov4.h were not changed? I see that the code defines many of yolov4's layer parameters.

Batching

Not an issue but a question; I am not so familiar with Triton. Is it possible to use dynamic (or any other) batching with the YOLO v4 model in Triton server? Or should it be handled on the client side?

./main: error while loading shared libraries

I am in the process of training on custom data. I successfully generated yolov4.wts from my custom .weights file, ran ./main, generated the .engine, and tested client.py without problems. This was for the 100th epoch (naturally, no objects detected), so after training reached the 1000th, I followed the whole path again to regenerate the .engine for the 1000th-epoch weights.

But now, when I run ./main,
it reports an error:

(yolov4-triton) qgs@qgs-MS-7A74:~/yolov4-triton-tensorrt/build$ ./main
./main: error while loading shared libraries: libnvinfer.so.7: cannot open shared object file: No such file or directory

This has happened before, and I deleted the local repo, re-cloned your repo, and started from scratch (make etc.) to make it work. What am I doing wrong here? (I am not running ./main from inside docker.)
This is what's available in the folder:

drwxr-xr-x 3 root root      4096 Eyl 23 19:48 .
drwxr-xr-x 9 qgs  qgs       4096 Eyl 23 19:50 ..
-rw-r--r-- 1 root root     21592 Eyl 23 19:32 CMakeCache.txt
drwxr-xr-x 7 root root      4096 Eyl 23 19:32 CMakeFiles
-rw-r--r-- 1 root root      1498 Eyl 23 19:32 cmake_install.cmake
-rwxr-xr-x 1 root root    884880 Eyl 23 19:32 liblayerplugin.so
-rwxr-xr-x 1 root root    125240 Eyl 23 19:32 main
-rw-r--r-- 1 root root      5159 Eyl 23 19:32 Makefile
-rw-r--r-- 1 root root 424181144 Eyl 23 19:48 yolov4.engine
-rw-r--r-- 1 root root 579892906 Eyl 23 20:48 yolov4.wts

[TRT] Could not compute dimensions. [TRT] Network validation failed.

I got a yolov4tiny.wts file. The engine creation fails with this output:

# ./main --network yolov4tiny
[Info] Creating builder
[Info] Creating model yolov4tiny
convBnLeaky: addBatchNorm2d: linx: 0
addBatchNorm2d: lname: module_list.0.BatchNorm2d; shift.count:32;
convBnLeaky: addBatchNorm2d: linx: 1
addBatchNorm2d: lname: module_list.1.BatchNorm2d; shift.count:64;
convBnLeaky: addBatchNorm2d: linx: 2
addBatchNorm2d: lname: module_list.2.BatchNorm2d; shift.count:64;
convBnLeaky: addBatchNorm2d: linx: 4
addBatchNorm2d: lname: module_list.4.BatchNorm2d; shift.count:32;
convBnLeaky: addBatchNorm2d: linx: 5
addBatchNorm2d: lname: module_list.5.BatchNorm2d; shift.count:32;
Adding route 6
Adding leaky convolution 7
convBnLeaky: addBatchNorm2d: linx: 7
addBatchNorm2d: lname: module_list.7.BatchNorm2d; shift.count:64;
Adding route 8
Adding maxpool 9
Adding leaky convolution 10
convBnLeaky: addBatchNorm2d: linx: 10
addBatchNorm2d: lname: module_list.10.BatchNorm2d; shift.count:128;
Adding route 11
Adding leaky convolution 12
convBnLeaky: addBatchNorm2d: linx: 12
addBatchNorm2d: lname: module_list.12.BatchNorm2d; shift.count:64;
Adding leaky convolution 13
convBnLeaky: addBatchNorm2d: linx: 13
addBatchNorm2d: lname: module_list.13.BatchNorm2d; shift.count:64;
Adding route 14
Adding leaky convolution 15
convBnLeaky: addBatchNorm2d: linx: 15
addBatchNorm2d: lname: module_list.15.BatchNorm2d; shift.count:128;
Adding route 16
Adding max pool 17
Adding leaky convolution 18
convBnLeaky: addBatchNorm2d: linx: 18
addBatchNorm2d: lname: module_list.18.BatchNorm2d; shift.count:256;
Adding route 19
Adding leaky convolution 20
convBnLeaky: addBatchNorm2d: linx: 20
addBatchNorm2d: lname: module_list.20.BatchNorm2d; shift.count:128;
Adding leaky convolution 21
convBnLeaky: addBatchNorm2d: linx: 21
addBatchNorm2d: lname: module_list.21.BatchNorm2d; shift.count:128;
Adding route 22
Adding leaky convolution 23
convBnLeaky: addBatchNorm2d: linx: 23
addBatchNorm2d: lname: module_list.23.BatchNorm2d; shift.count:256;
Adding route 24
Adding maxpool 25
Adding leaky convolution 26
convBnLeaky: addBatchNorm2d: linx: 26
addBatchNorm2d: lname: module_list.26.BatchNorm2d; shift.count:512;
Adding leaky convolution 27
convBnLeaky: addBatchNorm2d: linx: 27
addBatchNorm2d: lname: module_list.27.BatchNorm2d; shift.count:256;
Adding leaky convolution 28
convBnLeaky: addBatchNorm2d: linx: 28
addBatchNorm2d: lname: module_list.28.BatchNorm2d; shift.count:512;
Adding linear convolution 29
Adding yolo layer 30
Adding route 31
Adding leaky convolution 32
convBnLeaky: addBatchNorm2d: linx: 32
addBatchNorm2d: lname: module_list.32.BatchNorm2d; shift.count:128;
Adding upsample 33
Adding route 34
[07/24/2021-10:50:21] [E] [TRT] (Unnamed Layer* 63) [Convolution]: kernel weights has count 0 but 26112 was expected
[07/24/2021-10:50:21] [E] [TRT] (Unnamed Layer* 63) [Convolution]: count of 0 weights in kernel, but kernel dimensions (1,1) with 512 input channels, 51 output channels and 1 groups were specified. Expected Weights count is 512 * 1*1 * 51 / 1 = 26112
Adding leaky convolution 35
convBnLeaky: addBatchNorm2d: linx: 35
addBatchNorm2d: lname: module_list.35.BatchNorm2d; shift.count:256;
Adding linear convolution 36
Adding yolo layer 37
Adding route 38
[07/24/2021-10:50:21] [E] [TRT] (Unnamed Layer* 63) [Convolution]: kernel weights has count 0 but 26112 was expected
[07/24/2021-10:50:21] [E] [TRT] (Unnamed Layer* 63) [Convolution]: count of 0 weights in kernel, but kernel dimensions (1,1) with 512 input channels, 51 output channels and 1 groups were specified. Expected Weights count is 512 * 1*1 * 51 / 1 = 26112
[07/24/2021-10:50:21] [E] [TRT] (Unnamed Layer* 63) [Convolution]: kernel weights has count 0 but 26112 was expected
[07/24/2021-10:50:21] [E] [TRT] (Unnamed Layer* 63) [Convolution]: count of 0 weights in kernel, but kernel dimensions (1,1) with 512 input channels, 51 output channels and 1 groups were specified. Expected Weights count is 512 * 1*1 * 51 / 1 = 26112
[07/24/2021-10:50:21] [E] [TRT] Could not compute dimensions for (Unnamed Layer* 63) [Convolution]_output, because the network is not valid.
[07/24/2021-10:50:21] [E] [TRT] Network validation failed.
main: /workspace/yolov4-triton-tensorrt-r21.05/main.cpp:90: int main(int, char**): Assertion `engine != nullptr' failed.
Aborted (core dumped)

I've added some debug messages to network/yolov4tiny.h to checkpoint the execution.
The container has /usr/lib/x86_64-linux-gnu/libnvinfer.so.7.2.2.

Graphics driver version

Status: Downloaded newer image for nvcr.io/nvidia/tensorrt:20.08-py3

=====================
== NVIDIA TensorRT ==
=====================

NVIDIA Release 20.08 (build 15268644)

NVIDIA TensorRT 7.1.3 (c) 2016-2020, NVIDIA CORPORATION.  All rights reserved.
Container image (c) 2020, NVIDIA CORPORATION.  All rights reserved.

https://developer.nvidia.com/tensorrt

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install open source parsers, plugins, and samples, run /opt/tensorrt/install_opensource.sh. See https://github.com/NVIDIA/TensorRT for more information.

ERROR: This container was built for NVIDIA Driver Release 450.51 or later, but
       version 410.78 was detected and compatibility mode is UNAVAILABLE.

       [[CUDA Driver UNAVAILABLE (cuInit(0) returned 803)]
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001B06sv000019DAsd00004471bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-430 - third-party free recommended
driver   : nvidia-415 - third-party free
driver   : xserver-xorg-video-nouveau - distro free builtin
driver   : nvidia-410 - third-party free
driver   : nvidia-418 - third-party free
driver   : nvidia-384 - distro non-free

Do I need a newer Ubuntu version (currently 16.04 LTS)? The current driver list is as above, with no 450+ drivers. Or is the 1080 Ti (compute capability 6.1) not eligible for the 450 driver and/or Triton server?

Client side questions

Hi, from the documentation, Triton supports different versions of models, under /1/, /2/, /3/, etc., and the newest one is considered the /1/ folder.

How can we indicate which version of a model is used in a specific inference request, via client.py? I may sometimes prefer an older version, if it works that way.
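For illustration, a minimal sketch of selecting a specific model version per request with tritonclient (assuming the gRPC client, and that 'inputs' and 'outputs' are prepared exactly as client.py already does):

import tritonclient.grpc as grpcclient

triton_client = grpcclient.InferenceServerClient(url="localhost:8001")

results = triton_client.infer(
    model_name="yolov4",
    model_version="2",   # explicitly request version 2; "" (default) follows the server's version policy
    inputs=inputs,
    outputs=outputs)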

C++ client infer script needed [feature]

Most of the code in the repo is for building the network and serializing the plan file, but without a client inference script it looks incomplete. It would be interesting to have an inference client that consumes the Triton server as a shared library, instead of calling over gRPC or HTTP.

client error

I have managed to run a perf test with the following:

docker run -it --ipc=host --net=host nvcr.io/nvidia/tritonserver:20.08-py3-clientsdk /bin/bash
cd install/bin
./perf_client -m yolov4 -u 127.0.0.1:8001 -i grpc --shared-memory system --concurrency-range 4

The above gives 95 inferences, with 2 x 1080 Ti.

But when I switch to the client:

qgs@qgs-MS-7A74:~/yolov4-triton-tensorrt/clients/python$ python client.py -h
  File "client.py", line 142
    print(f"Received result buffer of size {result.shape}")
                                                         ^
SyntaxError: invalid syntax

it does not accept any parameters, etc., and directly gives this error.

./rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

Hi @isarsoft,

I have converted my Pytorch model to ONNX and then to TRT (with custom plugins).
But I am using another TRT (2) model whose output will be the input to the above TRT (1) model. I am serving those two TRT models using Triton Server on a Jetson Nano and sending requests from my laptop to the Jetson Nano, but during the response I am getting an error in the Jetson Nano terminal:

E0311 13:59:57.029723 29688 logging.cc:43] …/rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
E0311 13:59:57.030261 29688 logging.cc:43] FAILED_EXECUTION: std::exception

Not sure what is going wrong; what is causing the issue here?
Can you please assist me in resolving this error?

Command I used in Jetson Nano to start the serving of two models:

LD_PRELOAD="Einsum_op.so RoI_Align.so libyolo_layer.so" ./Downloads/bin/tritonserver --model-repository=./models --min-supported-compute-capability=5.3 --log-verbose=1

Model Info:

Yolov3-spp-ultralytics.
SlowFast.(with two Custom Plugins).
The bounding boxes output by the 1st model go as the input to the 2nd model (SlowFast).

Looking forward to the reply

Thanks,
Darshan

GRPC: unable to provide 'prob' in GPU, will use CPU

Hi, I want to infer videos with Triton 20.08 using the GPU instead of the CPU. Can you help me?

server

I1222 02:48:37.420736 1 plan_backend.cc:1652] Running yolov5x_0_gpu1 with 1 requests
I1222 02:48:37.420799 1 plan_backend.cc:2384] Optimization profile default [0] is selected for yolov5x_0_gpu1
I1222 02:48:37.420847 1 pinned_memory_manager.cc:130] pinned memory allocation: size 4435968, addr 0x7f8f78000090
I1222 02:48:37.421611 1 plan_backend.cc:1911] Context with profile default [0] is being executed for yolov5x_0_gpu1
I1222 02:48:37.423902 1 infer_response.cc:139] add response output: output: prob, type: FP32, shape: [1,6001,1,1]
I1222 02:48:37.423941 1 grpc_server.cc:2151] GRPC: unable to provide 'prob' in GPU, will use CPU
I1222 02:48:37.423958 1 grpc_server.cc:2162] GRPC: using buffer for 'prob', size: 24004, addr: 0x7f8caa0658f0
I1222 02:48:37.423978 1 pinned_memory_manager.cc:130] pinned memory allocation: size 24004, addr 0x7f8f7843b0a0
I1222 02:48:37.424035 1 pinned_memory_manager.cc:157] pinned memory deallocation: addr 0x7f8f78000090
I1222 02:48:37.433026 1 pinned_memory_manager.cc:157] pinned memory deallocation: addr 0x7f8f7843b0a0
I1222 02:48:37.433072 1 grpc_server.cc:3158] ModelInferHandler::InferResponseComplete, 251 step ISSUED
I1222 02:48:37.433100 1 grpc_server.cc:2197] GRPC free: size 24004, addr 0x7f8caa0658f0
I1222 02:48:37.433262 1 grpc_server.cc:2736] ModelInferHandler::InferRequestComplete
I1222 02:48:37.433277 1 grpc_server.cc:3007] Process for ModelInferHandler, rpc_ok=1, 251 step COMPLETE
I1222 02:48:37.433298 1 grpc_server.cc:2071] Done for ModelInferHandler, 251

client

Frame 246: 62 raw boxes, 3 objects
car:     0.94
car:     0.93
car:     0.93
time:    31.0ms
Frame 247: 63 raw boxes, 3 objects
car:     0.94
car:     0.93
car:     0.93
time:    28.4ms
Frame 248: 73 raw boxes, 3 objects
car:     0.93
car:     0.92
car:     0.92
time:    27.8ms
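
As far as I can tell (my interpretation, not an authoritative answer): inference is already running on the GPU — the plan_backend lines show yolov5x_0_gpu1 being executed — and "GRPC: unable to provide 'prob' in GPU, will use CPU" is only informational. It means the gRPC frontend copies the output into host memory before serializing it over the network, which is unavoidable for a remote client. If client and server run on the same machine, that copy can be skipped with CUDA shared memory. A rough sketch with the tritonclient helpers (region name and device id are assumptions; 24004 bytes matches the [1,6001,1,1] FP32 output in the log):

import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = grpcclient.InferenceServerClient(url="localhost:8001")
byte_size = 6001 * 4  # 24004 bytes for the FP32 'prob' tensor

# allocate a CUDA shared memory region on GPU 0 and register it with the server
shm_handle = cudashm.create_shared_memory_region("prob_region", byte_size, 0)
client.register_cuda_shared_memory("prob_region",
                                   cudashm.get_raw_handle(shm_handle), 0, byte_size)

# ask Triton to write 'prob' straight into that region instead of a gRPC buffer
output = grpcclient.InferRequestedOutput("prob")
output.set_shared_memory("prob_region", byte_size)

For a client on a different host this does not apply, and the log message can simply be ignored.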

Using FP16

Hi, I have an engine compiled with FP16. If I haven't misunderstood, I should alter the pbtxt on the Triton server; this is my pbtxt:

platform: "tensorrt_plan"
max_batch_size: 1
input {
  name: "data"
  data_type: TYPE_FP16
  dims: 3
  dims: 608
  dims: 608
}
output {
  name: "prob"
  data_type: TYPE_FP16
  dims: 7001
  dims: 1
  dims: 1
}
default_model_filename: "model.plan"

Because the input is now FP16, the client tells me that the input doesn't fit the model. I changed the mode to FP16 in the client, but the preprocess function converts the data to FP32, so I tried to change from:
image = np.transpose(np.array(image, dtype=np.float32, order='C'), (2, 0, 1))
to
image = np.transpose(np.array(image, dtype=np.float16, order='C'), (2, 0, 1))
but the loss of precision leads to strange values in the bboxes, with inf values appearing... Do you know how to make the repository work with FP16? Thanks.
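
For what it's worth, my understanding of the TensorRT defaults (not verified against this particular engine): enabling FP16 at build time only changes the precision of the internal layers; the input and output bindings stay FP32 unless the network was explicitly built with half-precision tensors. If that is the case here, the config.pbtxt should keep TYPE_FP32 for both "data" and "prob", and the client should keep sending float32:

image = np.transpose(np.array(image, dtype=np.float32, order='C'), (2, 0, 1))

Feeding np.float16 into an engine whose bindings are FP32 reinterprets the buffer, which would explain the strange bbox values and the infs.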

multiple model instances issue

Hi,

After dealing for a long time with Kubernetes-based solutions, I have switched to Celery. Currently I am able to feed the Triton server (20.08) well enough, despite some CPU bottleneck. But GPU load is still staying on the low side, with utilization ranging between 20-60%.

I was already setting the config to 8 instances, but decided to give dynamic_batching a try and set the config to the following:

name: "yolov4"
platform: "tensorrt_plan"
max_batch_size: 64
input [
  {
    name: "data"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 608, 608 ]
  }
]
output [
  {
    name: "prob"
    data_type: TYPE_FP32
    dims: [7001, 1, 1]
  }
]
instance_group [
  {
    count: 8
    kind: KIND_GPU
  }
]
dynamic_batching {
  preferred_batch_size: [1,2,4,8,16,32,64]
  max_queue_delay_microseconds: 100
}

It failed to work, so I took a look at the Triton logs and found this:

tritonserver_1    | I1012 18:15:51.073224 1 logging.cc:52] Deserialize required 2748834 microseconds.
tritonserver_1    | I1012 18:15:51.116093 1 autofill.cc:225] TensorRT autofill: OK: 
tritonserver_1    | I1012 18:15:51.116163 1 model_config_utils.cc:629] autofilled config: name: "yolov4"
tritonserver_1    | platform: "tensorrt_plan"
tritonserver_1    | max_batch_size: 1
tritonserver_1    | input {
tritonserver_1    |   name: "data"
tritonserver_1    |   data_type: TYPE_FP32
tritonserver_1    |   format: FORMAT_NCHW
tritonserver_1    |   dims: 3
tritonserver_1    |   dims: 608
tritonserver_1    |   dims: 608
tritonserver_1    | }
tritonserver_1    | output {
tritonserver_1    |   name: "prob"
tritonserver_1    |   data_type: TYPE_FP32
tritonserver_1    |   dims: 7001
tritonserver_1    |   dims: 1
tritonserver_1    |   dims: 1
tritonserver_1    | }
tritonserver_1    | instance_group {
tritonserver_1    |   count: 2
tritonserver_1    |   kind: KIND_GPU
tritonserver_1    | }
tritonserver_1    | default_model_filename: "model.plan"
tritonserver_1    | 
tritonserver_1    | I1012 18:15:51.116615 1 model_repository_manager.cc:618] AsyncLoad() 'resnet50_pytorch'
tritonserver_1    | I1012 18:15:51.116639 1 model_repository_manager.cc:680] TriggerNextAction() 'resnet50_pytorch' version 1: 1
tritonserver_1    | I1012 18:15:51.116652 1 model_repository_manager.cc:718] Load() 'resnet50_pytorch' version 1
tritonserver_1    | I1012 18:15:51.116659 1 model_repository_manager.cc:737] loading: resnet50_pytorch:1
tritonserver_1    | I1012 18:15:51.116760 1 model_repository_manager.cc:618] AsyncLoad() 'yolov4'
tritonserver_1    | I1012 18:15:51.116778 1 model_repository_manager.cc:680] TriggerNextAction() 'yolov4' version 1: 1
tritonserver_1    | I1012 18:15:51.116787 1 model_repository_manager.cc:718] Load() 'yolov4' version 1
tritonserver_1    | I1012 18:15:51.116794 1 model_repository_manager.cc:737] loading: yolov4:1
tritonserver_1    | I1012 18:15:51.116766 1 model_repository_manager.cc:790] CreateInferenceBackend() 'resnet50_pytorch' version 1
tritonserver_1    | I1012 18:15:51.116867 1 model_repository_manager.cc:790] CreateInferenceBackend() 'yolov4' version 1
tritonserver_1    | I1012 18:15:51.184996 1 libtorch_backend.cc:220] Creating instance resnet50_pytorch_0_0_gpu0 on GPU 0 (6.1) using model.pt
celery_default_1  | [2021-10-12 18:15:51,281: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
celery_cpu_1      | [2021-10-12 18:15:51,282: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
tritonserver_1    | W1012 18:15:52.596620 1 logging.cc:46] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
tritonserver_1    | I1012 18:15:52.610466 1 dynamic_batch_scheduler.cc:216] Starting dynamic-batch scheduler thread 0 at nice 5...
tritonserver_1    | I1012 18:15:52.616167 1 model_repository_manager.cc:925] successfully loaded 'resnet50_pytorch' version 1
tritonserver_1    | I1012 18:15:52.616201 1 model_repository_manager.cc:680] TriggerNextAction() 'resnet50_pytorch' version 1: 0
tritonserver_1    | I1012 18:15:52.616208 1 model_repository_manager.cc:695] no next action, trigger OnComplete()
tritonserver_1    | I1012 18:15:53.736514 1 logging.cc:52] Deserialize required 2121911 microseconds.
tritonserver_1    | I1012 18:15:53.736539 1 plan_backend.cc:331] Creating instance yolov4_0_0_gpu0 on GPU 0 (6.1) using model.plan
tritonserver_1    | I1012 18:15:53.739256 1 plan_backend.cc:503] Detected data as execution binding for yolov4
tritonserver_1    | I1012 18:15:53.739281 1 plan_backend.cc:503] Detected prob as execution binding for yolov4
tritonserver_1    | I1012 18:15:53.739459 1 plan_backend.cc:649] Created instance yolov4_0_0_gpu0 on GPU 0 with stream priority 0
tritonserver_1    | I1012 18:15:53.739595 1 plan_backend.cc:331] Creating instance yolov4_0_1_gpu0 on GPU 0 (6.1) using model.plan
tritonserver_1    | I1012 18:15:53.742664 1 plan_backend.cc:503] Detected data as execution binding for yolov4
tritonserver_1    | I1012 18:15:53.742688 1 plan_backend.cc:503] Detected prob as execution binding for yolov4
tritonserver_1    | I1012 18:15:53.742868 1 plan_backend.cc:649] Created instance yolov4_0_1_gpu0 on GPU 0 with stream priority 0
tritonserver_1    | I1012 18:15:53.742954 1 dynamic_batch_scheduler.cc:216] Starting dynamic-batch scheduler thread 0 at nice 5...
tritonserver_1    | I1012 18:15:53.743034 1 plan_backend.cc:356] plan backend for yolov4
tritonserver_1    | name=yolov4
tritonserver_1    | contexts:
tritonserver_1    |   name=yolov4_0_0_gpu0, gpu=0, max_batch_size=1
tritonserver_1    |   bindings:
tritonserver_1    |     0: max possible byte_size=4435968, buffer=0x7fd194800000 ]
tritonserver_1    |     1: max possible byte_size=28004, buffer=0x7fd26f647600 ]
tritonserver_1    |   name=yolov4_0_1_gpu0, gpu=0, max_batch_size=1
tritonserver_1    |   bindings:
tritonserver_1    |     0: max possible byte_size=4435968, buffer=0x7fd176800000 ]
tritonserver_1    |     1: max possible byte_size=28004, buffer=0x7fd26f694e00 ]
tritonserver_1    | 
tritonserver_1    | I1012 18:15:53.761424 1 model_repository_manager.cc:925] successfully loaded 'yolov4' version 1

Apparently, my 8 instances are not in place either. Triton server allocates only 2.6 GB of GPU memory out of 11.

Is there anything I can do to fix the model instances and dynamic_batching?
I believe that if I can run more than two instances, I will not need the dynamic batching at all.

Edit: the quoted config values above were wrong; fixed.
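
Two things stand out in that log, at least in my reading (an assumption, not a verified diagnosis): the config Triton actually loaded is the autofilled one (max_batch_size: 1, instance_group count: 2), and a TensorRT plan that was serialized with a max batch size of 1 can never be batched beyond 1, regardless of what config.pbtxt requests. Things I would check, with illustrative paths, flags and values (not taken from this repo):

# 1. config.pbtxt must sit next to the numbered version folder Triton mounts:
#      models/yolov4/config.pbtxt
#      models/yolov4/1/model.plan
# 2. start the server with strict config so it errors out instead of silently autofilling:
tritonserver --model-repository=/models --strict-model-config=true
# 3. rebuild and re-serialize the engine with a larger max batch size, e.g. via the
#    TensorRT builder before building the engine (illustrative value):
#      builder->setMaxBatchSize(64);

Once the engine genuinely supports batch sizes > 1, the instance count and dynamic_batching settings should take effect; until then, preferred_batch_size values above 1 cannot be honored.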
