
yolo_deepstream's Introduction

Yolo DeepStream

Description

This repo has four parts:

1) yolov7_qat

In yolov7_qat, we use TensorRT's pytorch-quantization tool to fine-tune (QAT) YOLOv7 from the pre-trained weights. The resulting model matches the performance of PTQ in TensorRT on Jetson AGX Orin, and its accuracy (mAP) drops only slightly.
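For orientation, a minimal sketch of the typical pytorch-quantization flow behind this (not the exact yolov7_qat scripts; load_pretrained_yolov7, calib_loader, train_loader, and compute_yolov7_loss are placeholders):

import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch.nn so Conv2d/Linear/... are created as quantized layers
# (QuantConv2d etc.) with Q/DQ nodes attached; do this before building the model.
quant_modules.initialize()

model = load_pretrained_yolov7("yolov7.pt")                   # placeholder loader
model.cuda()

# 1) Calibration (PTQ-style): collect activation ranges on a handful of batches.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()
with torch.no_grad():
    for i, (imgs, _) in enumerate(calib_loader):              # placeholder loader
        model(imgs.cuda())
        if i >= 32:
            break
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.load_calib_amax()   # histogram calibrators take a method argument
        module.disable_calib()
        module.enable_quant()

# 2) QAT: fine-tune a few epochs with fake quantization active, small learning rate.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
model.train()
for imgs, targets in train_loader:                            # placeholder loader
    loss = compute_yolov7_loss(model(imgs.cuda()), targets)   # placeholder loss fn
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()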

2) tensorrt_yolov7

In tensorrt_yolov7, we provide a standalone C++ YOLOv7 app sample. You can use trtexec to convert FP32 ONNX models, or QAT-INT8 models exported from the yolov7_qat part, into TensorRT engines, and then pass the engine to the app as its input. The app can run detection on images and videos, or evaluate mAP on the COCO dataset.

3) deepstream_yolo

The deepstream_yolo sample shows how to integrate YOLO models, with customized output-layer parsing of detected objects, into the DeepStream SDK.

4) tensorrt_yolov4

The tensorrt_yolov4 sample is a standalone TensorRT sample for YOLOv4.

Performance

For the YOLOv7 sample:

The table below shows the end-to-end performance of processing 1080p video with this sample application.

  • Testing Device :

    1. Jetson AGX Orin 64GB(PowerMode:MAXN + GPU-freq:1.3GHz + CPU:12-core-2.2GHz)

    2. Tesla T4

| Device | Precision | Number of streams | Batch size | trtexec FPS | deepstream-app FPS (CUDA post-process) | deepstream-app FPS (CPU post-process) |
|---|---|---|---|---|---|---|
| Orin-X | FP16 | 1 | 1 | 126 | 124 | 120 |
| Orin-X | FP16 | 16 | 16 | 162 | 145 | 135 |
| Orin-X | INT8 (PTQ/QAT) | 1 | 1 | 180 | 175 | 128 |
| Orin-X | INT8 (PTQ/QAT) | 16 | 16 | 264 | 264 | 135 |
| T4 | FP16 | 1 | 1 | 132 | 125 | 123 |
| T4 | FP16 | 16 | 16 | 169 | 169 | 123 |
| T4 | INT8 (PTQ/QAT) | 1 | 1 | 208 | 170 | 127 |
| T4 | INT8 (PTQ/QAT) | 16 | 16 | 305 | 300 | 132 |
  • Note: cudaGraph is not enabled for trtexec because DeepStream does not support cudaGraph.

Code structure

├── deepstream_yolo
│   ├── config_infer_primary_yoloV4.txt # config file for yolov4 model
│   ├── config_infer_primary_yoloV7.txt # config file for yolov7 model
│   ├── deepstream_app_config_yolo.txt # DeepStream reference app configuration file for using YOLO models as the primary detector
│   ├── labels.txt # labels for coco detection
│   ├── nvdsinfer_custom_impl_Yolo
│   │   ├── Makefile
│   │   └── nvdsparsebbox_Yolo.cpp # output layer parsing function for detected objects for the Yolo model
│   └── README.md 
├── README.md
├── tensorrt_yolov4
│   ├── data 
│   │   ├── demo.jpg # the demo image
│   │   └── demo_out.jpg # image detection output of the demo image
│   ├── Makefile
│   ├── Makefile.config
│   ├── README.md
│   └── source
│       ├── generate_coco_image_list.py # python script to get list of image names from MS COCO annotation or information file
│       ├── main.cpp # program main entry point where parameters are configured
│       ├── Makefile
│       ├── onnx_add_nms_plugin.py # python script to add BatchedNMSPlugin node into ONNX model
│       ├── SampleYolo.cpp # yolov4 inference class functions definition file
│       └── SampleYolo.hpp # yolov4 inference class definition file
├── tensorrt_yolov7
│   ├── CMakeLists.txt
│   ├── imgs # the demo images
│   │   ├── horses.jpg 
│   │   └── zidane.jpg
│   ├── README.md
│   ├── samples 
│   │   ├── detect.cpp # detection app for images detection
│   │   ├── validate_coco.cpp # validate coco dataset app
│   │   └── video_detect.cpp # detection app for video detection
│   ├── src
│   │   ├── argsParser.cpp # argsParser helper class for commandline parsing
│   │   ├── argsParser.h # argsParser helper class for commandline parsing
│   │   ├── tools.h # helper function for yolov7 class
│   │   ├── Yolov7.cpp # Class Yolov7
│   │   └── Yolov7.h # Class Yolov7
│   └── test_coco_map.py # tool for testing COCO mAP with a json file
└── yolov7_qat
    ├── doc
    │   ├── Guidance_of_QAT_performance_optimization.md # guidance for Q&DQ insert and placement for pytorch-quantization tool
    ├── quantization
    │   ├── quantize.py # helper class for quantizing the yolov7 model
    │   └── rules.py # rules and restrictions for Q&DQ node insertion
    ├── README.md 
    └── scripts
        ├── detect-trt.py # detect an image with a tensorrt engine
        ├── draw-engine.py # draw the tensorrt engine as a graph
        ├── eval-trt.py # the script for evaluating tensorrt mAP
        ├── eval-trt.sh # the command-line script for evaluating tensorrt mAP
        ├── qat.py # main function for QAT and PTQ
        └── trt-int8.py # tensorrt built-in calibration

yolo_deepstream's People

Contributors

hopef, mchi-zg, wanghr323


yolo_deepstream's Issues

QAT with multiple GPUs?

I tried to do QAT with multiple GPUs with torch.nn.DataParallel, but I got an error:

Traceback (most recent call last):
  File "scripts/qat.py", line 347, in <module>
    args.eval_origin, args.eval_ptq
  File "scripts/qat.py", line 245, in cmd_quantize
    preprocess=preprocess, supervision_policy=supervision_policy())
  File "/GSOL_lossless_AI/yolov7/quantization/quantize.py", line 347, in finetune
    model(imgs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/GSOL_lossless_AI/yolov7/models/yolo.py", line 599, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/GSOL_lossless_AI/yolov7/models/yolo.py", line 625, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1120, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/GSOL_lossless_AI/yolov7/models/common.py", line 111, in fuseforward
    return self.act(self.conv(x))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/quant_conv.py", line 120, in forward
    quant_input, quant_weight = self._quant(input)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/quant_conv.py", line 85, in _quant
    quant_input = self._input_quantizer(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/tensor_quantizer.py", line 346, in forward
    outputs = self._quant_forward(inputs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/nn/modules/tensor_quantizer.py", line 310, in _quant_forward
    outputs = fake_tensor_quant(inputs, amax, self._num_bits, self._unsigned, self._narrow_range)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/tensor_quant.py", line 306, in forward
    outputs, scale = _tensor_quant(inputs, amax, num_bits, unsigned, narrow_range)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_quantization/tensor_quant.py", line 354, in _tensor_quant
    outputs = torch.clamp((inputs * scale).round_(), min_bound, max_bound)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
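For reference, a minimal sketch of the wrapping described above (assuming a model whose layers were already swapped in by pytorch-quantization; the loader name is a placeholder), which is the call path that raises the cross-device error in replica 1:

import torch
from pytorch_quantization import quant_modules

quant_modules.initialize()
model = load_quantizable_yolov7("yolov7.pt")          # placeholder loader
model = torch.nn.DataParallel(model.cuda(), device_ids=[0, 1])

imgs = torch.randn(8, 3, 640, 640).cuda()
out = model(imgs)   # the fake-quant step inside replica 1 raises the device mismatch above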

Different throughput running deepstream vs trtexec mode

Hi all, I am running YOLOv3 with DeepStream 5.1 and ran the saved optimized engine with the trtexec command to double-check that I get consistent results; instead, I got different throughput. Any ideas about what is going on?

Throughput FPS (avg) | INT8 | BS=1
Running TensorRT engine with DeepStream 5.1: 292
Running TensorRT engine in standalone mode (trtexec): 201

Running with DeepStream
$ deepstream-app -c deepstream_app_config_yoloV3.txt
Performance:

IOU Tracker Init with threshold 0.100000
****PERF:  292.03 (291.97)**
** INFO: <bus_callback:204>: Received EOS. Exiting ...

Running with trtexec command
$ /usr/src/tensorrt/bin/trtexec --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so --loadEngine=model_b1_gpu0_int8.engine --int8
Performance

[03/04/2021-02:07:59] [I] min: 5.7467 ms (end to end 9.50974 ms)
[03/04/2021-02:07:59] [I] max: 9.63377 ms (end to end 18.3497 ms)
[03/04/2021-02:07:59] [I] mean: 5.92314 ms (end to end 9.8676 ms)
[03/04/2021-02:07:59] [I] median: 5.89404 ms (end to end 9.80298 ms)
[03/04/2021-02:07:59] [I] percentile: 6.07446 ms at 99% (end to end 10.1627 ms at 99%)
**[03/04/2021-02:07:59] [I] throughput: 201.199 qps**
[03/04/2021-02:07:59] [I] walltime: 3.01691 s
[03/04/2021-02:07:59] [I] Enqueue Time
[03/04/2021-02:07:59] [I] min: 0.516113 ms
[03/04/2021-02:07:59] [I] max: 0.815536 ms
[03/04/2021-02:07:59] [I] median: 0.520508 ms
[03/04/2021-02:07:59] [I] GPU Compute
[03/04/2021-02:07:59] [I] min: 4.7764 ms
[03/04/2021-02:07:59] [I] max: 8.66505 ms
[03/04/2021-02:07:59] [I] mean: 4.95182 ms
[03/04/2021-02:07:59] [I] median: 4.92303 ms
[03/04/2021-02:07:59] [I] percentile: 5.10498 ms at 99%

deepstream-app -c deepstream_app_config_yoloV4.txt error

NvMMLiteBlockCreate : Block : BlockType = 4
deepstream-app: nvdsparsebbox_Yolo.cpp:141: bool NvDsInferParseCustomYoloV4(const std::vector&, const NvDsInferNetworkInfo&, const NvDsInferParseDetectionParams&, std::vector&): Assertion `scores.inferDims.numDims == 2' failed.
Aborted (core dumped)

Deepstream-app with YOLOv4 ONNX model + BatchedNMSPlugin

Hi,
I successfully created an engine file from yolov4 + BatchedNMSPlugin according to the instructions in this repository. The engine file works fine (I can successfully run the ../bin/yolov4 --demo command).

Now I want to deploy this engine file to deepstream-app. To do that, I need a parse function for the config_infer_primary.txt file. The default parse function for yolov4 is here in this repository, including the NMS. But I do not need NMS at post-processing time, because the BatchedNMSPlugin is part of the engine file, so NMS is already done and the output should contain only the final bboxes.

I tried to rewrite the parse function for yolov4 for my case, but with no success.

Is there any example of a parse function for yolov4 + BatchedNMSPlugin? If not, where should I start? Is there any information on how to write my own parse function for engine files?

What are topk and keeptopk parameters?

Can someone please describe to me what the "topk" and "keeptopk" parameters are? They are found in the conversion to onnx, in the arguments given for running the python script.
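For context: in TensorRT's BatchedNMS_TRT plugin, topK is the number of highest-scoring boxes fed into the NMS step and keepTopK is the maximum number of detections kept per image after NMS (keepTopK <= topK). A hedged sketch of how a script like onnx_add_nms_plugin.py typically sets them when attaching the plugin node (the onnx-graphsurgeon calls and tensor positions are illustrative, not the repo's exact code):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("yolov4.onnx"))
boxes, scores = graph.outputs[0], graph.outputs[1]        # raw box / score tensors (illustrative)

nms_attrs = {
    "shareLocation": True,
    "backgroundLabelId": -1,
    "numClasses": 80,
    "topK": 2000,        # what the script's -t flag appears to control
    "keepTopK": 100,     # what the script's -k flag appears to control
    "scoreThreshold": 0.4,
    "iouThreshold": 0.6,
    "isNormalized": True,
    "clipBoxes": True,
}

outputs = [gs.Variable(name=n) for n in
           ("num_detections", "nmsed_boxes", "nmsed_scores", "nmsed_classes")]
graph.nodes.append(gs.Node(op="BatchedNMS_TRT", attrs=nms_attrs,
                           inputs=[boxes, scores], outputs=outputs))
graph.outputs = outputs
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "yolov4.nms.onnx")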

About converting a YOLOv7 QAT model to a TensorRT engine (fails with a dynamic-batch setting)

Following yolo_deepstream/tree/main/tensorrt_yolov7 and using "yolov7QAT" to perform a batch detection task, the following error occurs:
./build/detect --engine=yolov7QAT.engine --img=./imgs/horses.jpg,./imgs/zidane.jpg

Error Message

input 2 images, paths: ./imgs/horses.jpg, ./imgs/zidane.jpg, 
--------------------------------------------------------
Yolov7 initialized from: /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/yolov7/yolov7QAT.engine
input : images , shape : [ 1,3,640,640,]
output : outputs , shape : [ 1,25200,85,]
--------------------------------------------------------
preprocess start
error cv_img.size() in preProcess
 error: mImgPushed = 1 numImg = 1 mMaxBatchSize= 1, mImgPushed + numImg > mMaxBatchSize 
inference start
postprocessing start
detectec image written to: ./imgs/horses.jpgdetect0.jpg

Note

  • It works fine when running a single detection task with "yolov7QAT.engine".
  • "yolov7QAT.engine" comes from "yolov7qat.onnx" conversion.
  • Whether "yolov7qat.onnx" is downloaded from here or self-trained (viewed with netron, both show the same structure), running ./build/detect gives the same error message.
  • It runs fine with the non-QAT "yolov7db4fp32.engine" or "yolov7db4fp16.engine".

Environment

    CUDA: 11.4.315
    cuDNN: 8.6.0.166
    TensorRT: 5.1
    Python: 3.8.10
    PyTorch: 1.12.0a0+2c916ef.nv22.3
Hardware
    Model: Jetson-AGX
    Module: NVIDIA Jetson AGX Xavier (32 GB ram)
    L4T: 35.2.1
    Jetpack: 5.1

Run sample error using DeepStream 5.1 and YOLOv4

Complete information of setup.

• Hardware Platform (Jetson / GPU) : GPU
• DeepStream Version : 5.1
• CUDA Version : 11.1
• JetPack Version (valid for Jetson only): None
• TensorRT Version : 7.2.3.4
• NVIDIA GPU Driver Version (valid for GPU only) : 460.84
• Issue Type( questions, new requirements, bugs) : questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I have set up DeepStream 5.1 with YOLOv4, and while running the sample I get the error below.
Can anybody please help me out?

ERROR:

(deepstreamer_env) xxxx@xxxx:/opt/nvidia/deepstream/deepstream-5.1/sources/deepstream_yolov4$ deepstream-app -c deepstream_app_config_yoloV4.txt
Unknown or legacy key specified ‘is-classifier’ for group [property]
** ERROR: main:655: Failed to set pipeline to PAUSED
Quitting
ERROR from sink_sub_bin_sink1: Could not open file “yolov4.mp4” for writing.
Debug info: gstfilesink.c(431): gst_file_sink_open_file (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstFileSink:sink_sub_bin_sink1:
system error: Permission denied
ERROR from sink_sub_bin_sink1: GStreamer error: state change failed and some element failed to post a proper error message with the reason for the failure.
Debug info: gstbasesink.c(5265): gst_base_sink_change_state (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/GstFileSink:sink_sub_bin_sink1:
Failed to start
App run failed

Yolov7 instance segmentation with deepstream?

How can I deploy an instance segmentation yolov7 model with deepstream? Once I export the model to .trt, I need to parse the output layers.

Is it planned to add this feature here?

Thanks!

[Solution] TensorRT 8.0.1 engine for YOLOv4 in standalone mode

To compile the yolov4 models with batched NMS, I had to change this:

  1. Comment out builder->allowGPUFallback(true); in SampleYolo.cpp. According to the docs, allowGPUFallback was removed in TensorRT 8.0.1. If I understand correctly, I do not have to care about it in TensorRT 8.0.1. Am I right?
  2. Comment out all MYELIN_LIB and ENABLE_MYELIN entries in Makefile.config to prevent the error "/usr/bin/ld: cannot find -lmyelin"; it seems to me that TensorRT 8.0.1 does not use it, but I found nothing about it. Do you have any idea?

If you do these 2 steps, then you can compile it and it works.
Good luck!

The Python DeepStream API with YOLOv4: multichannel RTSP video stream inference cannot run

My computer:
Operating system: Ubuntu 20.04
CPU: Intel(R) Core(TM) i9-12900KF
GPU: NVIDIA 3090, 24 GB video memory
DeepStream 6.1 was installed successfully.
The Python bindings for DeepStream were installed successfully.

git clone https://github.com/NVIDIA-AI-IOT/yolov4_deepstream.git
Successfully compiled and ready to run.
Imported into the Python interface and used deepstream-test3: running one RTSP stream works, but running two RTSP streams fails.
In the deepstream-test3 folder, the deepstream-test3.py file was modified:
pgie.set_property('config-file-path', "config_infer_primary_yoloV4.txt")

python3 deepstream-test3.py -i rtsp://admin:[email protected]/h264/ch1/main/av_stream -s no problem

but
python3 deepstream-test3.py -i rtsp://admin:[email protected]/h264/ch1/main/av_stream rtsp://admin:[email protected]/h264/ch1/main/av_stream -s
problem:
0:00:09.288897521 97120 0x29c0900 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1832> [UID = 1]: Backend has maxBatchSize 1 whereas 2 has been requested
0:00:09.289061931 97120 0x29c0900 WARN nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2009> [UID = 1]: deserialized backend context :/home/ai-box/deepstream/nvidia/deepstream/deepstream-6.1/sources/deepstream_yolov4/yolov4.engine failed to match config params, trying rebuild
0:00:09.297623267 97120 0x29c0900 INFO nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:860 failed to build network since there is no model file matched.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:799 failed to build network.
0:00:09.801857495 97120 0x29c0900 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1934> [UID = 1]: build engine file failed
0:00:09.851086491 97120 0x29c0900 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2020> [UID = 1]: build backend context failed
0:00:09.851121902 97120 0x29c0900 ERROR nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1257> [UID = 1]: generate backend failed, check config file settings
0:00:09.851353192 97120 0x29c0900 WARN nvinfer gstnvinfer.cpp:846:gst_nvinfer_start: error: Failed to create NvDsInferContext instance
0:00:09.851358835 97120 0x29c0900 WARN nvinfer gstnvinfer.cpp:846:gst_nvinfer_start: error: Config file path: config_infer_primary_yoloV4.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED

**PERF: {'stream0': 0.0, 'stream1': 0.0}

Error: gst-resource-error-quark: Failed to create NvDsInferContext instance (1): gstnvinfer.cpp(846): gst_nvinfer_start (): /GstPipeline:pipeline0/GstNvInfer:primary-inference:
Config file path: config_infer_primary_yoloV4.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
Exiting app

There are some errors after adding "BatchedNMS_TRT" layer

Description:
I got the YOLOv4 ONNX model (yolov4_1_3_608_608_static.onnx) from https://github.com/Tianxiaomo/pytorch-YOLOv4, then used the command:
"python .\onnx_add_nms_plugin.py -f .\yolov4_1_3_608_608_static.onnx -t 2000 -k 100"
to add the "BatchedNMS_TRT" layer, and got a new model (yolov4_1_3_608_608_static.nms.onnx).
But when I used the command:
"trtexec --onnx=yolov4_1_3_608_608_static.nms.onnx --explicitBatch --saveEngine=tensorRT-eng --workspace=4096"
to convert the model, there were some errors; here is the log:
[11/25/2020-12:10:31] [I] === Model Options ===
[11/25/2020-12:10:31] [I] Format: ONNX
[11/25/2020-12:10:31] [I] Model: yolov4_1_3_608_608_static.nms.onnx
[11/25/2020-12:10:31] [I] Output:
[11/25/2020-12:10:31] [I] === Build Options ===
[11/25/2020-12:10:31] [I] Max batch: explicit
[11/25/2020-12:10:31] [I] Workspace: 4096 MB
[11/25/2020-12:10:31] [I] minTiming: 1
[11/25/2020-12:10:31] [I] avgTiming: 8
[11/25/2020-12:10:31] [I] Precision: FP32
[11/25/2020-12:10:31] [I] Calibration:
[11/25/2020-12:10:31] [I] Safe mode: Disabled
[11/25/2020-12:10:31] [I] Save engine: tensorRT-eng
[11/25/2020-12:10:31] [I] Load engine:
[11/25/2020-12:10:31] [I] Builder Cache: Enabled
[11/25/2020-12:10:31] [I] NVTX verbosity: 0
[11/25/2020-12:10:31] [I] Inputs format: fp32:CHW
[11/25/2020-12:10:31] [I] Outputs format: fp32:CHW
[11/25/2020-12:10:31] [I] Input build shapes: model
[11/25/2020-12:10:31] [I] Input calibration shapes: model
[11/25/2020-12:10:31] [I] === System Options ===
[11/25/2020-12:10:31] [I] Device: 0
[11/25/2020-12:10:31] [I] DLACore:
[11/25/2020-12:10:31] [I] Plugins:
[11/25/2020-12:10:31] [I] === Inference Options ===
[11/25/2020-12:10:31] [I] Batch: Explicit
[11/25/2020-12:10:31] [I] Input inference shapes: model
[11/25/2020-12:10:31] [I] Iterations: 10
[11/25/2020-12:10:31] [I] Duration: 3s (+ 200ms warm up)
[11/25/2020-12:10:31] [I] Sleep time: 0ms
[11/25/2020-12:10:31] [I] Streams: 1
[11/25/2020-12:10:31] [I] ExposeDMA: Disabled
[11/25/2020-12:10:31] [I] Spin-wait: Disabled
[11/25/2020-12:10:31] [I] Multithreading: Disabled
[11/25/2020-12:10:31] [I] CUDA Graph: Disabled
[11/25/2020-12:10:31] [I] Skip inference: Disabled
[11/25/2020-12:10:31] [I] Inputs:
[11/25/2020-12:10:31] [I] === Reporting Options ===
[11/25/2020-12:10:31] [I] Verbose: Disabled
[11/25/2020-12:10:31] [I] Averages: 10 inferences
[11/25/2020-12:10:31] [I] Percentile: 99
[11/25/2020-12:10:31] [I] Dump output: Disabled
[11/25/2020-12:10:31] [I] Profile: Disabled
[11/25/2020-12:10:31] [I] Export timing to JSON file:
[11/25/2020-12:10:31] [I] Export output to JSON file:
[11/25/2020-12:10:31] [I] Export profile to JSON file:
[11/25/2020-12:10:31] [I]

Input filename: yolov4_1_3_608_608_static.nms.onnx
ONNX IR version: 0.0.7
Opset version: 11
Producer name:
Producer version:
Domain:
Model version: 0
Doc string:

[11/25/2020-12:10:32] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/25/2020-12:10:32] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/25/2020-12:10:32] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/25/2020-12:10:32] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/25/2020-12:10:32] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/25/2020-12:10:32] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/25/2020-12:10:32] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/25/2020-12:10:32] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[11/25/2020-12:10:32] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace:
[11/25/2020-12:10:32] [I] [TRT] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMS_TRT
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [W] [TRT] Output type must be INT32 for shape outputs
[11/25/2020-12:10:32] [W] [TRT] Output type must be INT32 for shape outputs
[11/25/2020-12:10:32] [W] [TRT] Output type must be INT32 for shape outputs
[11/25/2020-12:10:32] [W] [TRT] Output type must be INT32 for shape outputs
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [E] [TRT] (Unnamed Layer* 1330) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/25/2020-12:10:32] [E] [TRT] Layer (Unnamed Layer* 1330) [PluginV2Ext] failed validation
[11/25/2020-12:10:32] [E] [TRT] Network validation failed.
[11/25/2020-12:10:33] [E] Engine creation failed
[11/25/2020-12:10:33] [E] Engine set up failed

Does anyone know why this happens?

Environment
TensorRT Version:TensorRT-7.1.3.4
GPU Type: 1080ti
CUDA Version: cuda_11.0.3_451.82_win10
Operating System :Windows 10
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.8.0.dev20201118

YOLOv7 EfficientNMS - Num Classes

I am currently working on integrating YOLOv7 with DeepStream and Triton Server. I have been using the NvDsInferParseCustomEfficientNMS function from /opt/nvidia/deepstream/deepstream-6.1/sources/libs/nvdsinfer_customparser/nvdsinfer_custombboxparser.cpp in my setup.

Deepstream / Triton Server - YOLOv7

Now, I'm looking to transition to the implementation provided by NVIDIA in the repository https://github.com/NVIDIA-AI-IOT/yolo_deepstream/tree/main/deepstream_yolo/nvdsinfer_custom_impl_Yolo. However, I noticed that the code in this repository has a hardcoded value NUM_CLASSES_YOLO, which is not present in NvDsInferParseCustomEfficientNMS function.

static const int NUM_CLASSES_YOLO = 80;
#define OBJECTLISTSIZE 25200
#define BLOCKSIZE  1024
thrust::device_vector<NvDsInferParseObjectInfo> objects_v(OBJECTLISTSIZE);

extern "C" bool NvDsInferParseCustomYoloV7_cuda( 

As I have multiple YOLOv7 models with primary and secondary inference, I am concerned about having to compile a separate NvDsInferParseCustomYoloV7_cuda for each model.

Could you kindly advise if there is a way to avoid compiling individual NvDsInferParseCustomYoloV7_cuda for each model and instead make it more dynamic or configurable to support multiple models?

Thank you for your assistance and guidance. Any help you can provide will be greatly appreciated.

_pickle.UnpicklingError: invalid load key, '<'.

Within the docker container nvcr.io/nvidia/deepstream:6.0-triton:

mkdir /src
cd /src
git clone https://github.com/Tianxiaomo/pytorch-YOLOv4.git
cd pytorch-YOLOv4
apt -y install python3-venv
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip3 install \
numpy==1.18.2 \
torch==1.4.0 \
tensorboardX==2.0 \
scikit_image==0.16.2 \
matplotlib==2.2.3 \
tqdm==4.43.0 \
easydict==1.9 \
Pillow==7.1.2 \
opencv_python \
onnx \
onnxruntime

wget --no-check-certificate "https://docs.google.com/uc?export=download&id=1wv_LiFeCRYwtpkqREPeI13-gPELBDwuJ" -r -A 'uc*' -e robots=off -nd -O yolov4.pth
python3 demo_pytorch2onnx.py yolov4.pth data/dog.jpg 8 80 416 416

Error:

(venv) root@ip-172-31-9-127:/src/pytorch-YOLOv4# python3 demo_pytorch2onnx.py ./yolov4.pth data/dog.jpg 8 80 416 416
Converting to onnx and running demo ...
Traceback (most recent call last):
  File "demo_pytorch2onnx.py", line 96, in <module>
    main(weight_file, image_path, batch_size, n_classes, IN_IMAGE_H, IN_IMAGE_W)
  File "demo_pytorch2onnx.py", line 72, in main
    transform_to_onnx(weight_file, batch_size, n_classes, IN_IMAGE_H, IN_IMAGE_W)
  File "demo_pytorch2onnx.py", line 19, in transform_to_onnx
    pretrained_dict = torch.load(weight_file, map_location=torch.device('cuda'))
  File "/src/pytorch-YOLOv4/venv/lib/python3.8/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/src/pytorch-YOLOv4/venv/lib/python3.8/site-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
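The '<' load key typically means the file saved as yolov4.pth is not the checkpoint at all but an HTML page (for example, a Google Drive confirmation page returned to wget). A quick, hedged sanity check before converting (file name as in the commands above):

# Inspect the first bytes of the downloaded file: a real PyTorch checkpoint is a
# zip/pickle file, while a failed Google Drive download is HTML starting with '<'.
with open("yolov4.pth", "rb") as f:
    head = f.read(16)
print(head)
# b'<!DOCTYPE html>' or b'<html>'  -> re-download the weights (e.g. with gdown)
# b'PK\x03\x04' or pickle bytes    -> the file itself is intact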

yolov7 Error Code 3: API Usage Error (parameter check failed at: runtime/rt/runtime.cpp)

I followed the yolov7 steps and was able to generate the engine file.

running: ./build/detect --engine==../../data/models/yolov7fp32.engine --img=./imgs/hoses.jpg,./imgs/zidane.jpg

ERROR: 3: [runtime.cpp::deserializeCudaEngine::36] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::deserializeCudaEngine::36, condition: (blob) != nullptr
)
Segmentation fault (core dumped)

Lower Performance with Yolov7-Tiny Quantization

Hi,
After doing quantization on the yolov7-tiny model with the recommended settings, I am getting a lower throughput with the result (qat-tiny.pt) on the benchmark when compared to doing the same benchmark on the model before quantization (yolov7-tiny.pt). I double checked that there were no tasks running in the background so I am pretty sure it is caused by the weights. I was wondering what went wrong during the quantization.
Thanks!

No module named 'trex'

In draw-engine.py, it imports trex, but this cannot be installed through pip. Is "trex" some custom code?

Performance report for tensorrt_yolov4

As an extension to the preliminary benchmark for tensorrt_yolov4, batch inference performance is provided as follows:

| repo | batch=1 | batch=2 | batch=4 | batch=8 |
|---|---|---|---|---|
| tkDNN | N/A (N/A) 207.81 | N/A | N/A (N/A) 443.32 | N/A |
| isarsoft | 7.96 (N/A) 125.4 | N/A | 21.0 (N/A) 189.6 | 38.3 (N/A) 208.0 |
| this | 7.023 (2.61747) 120.831 | 4.393 (1.76344) 186.44 | 3.688 (1.26853) 223.68 | 3.42267 (0.888971) 239.063 |

where the metrics are formatted as: wall-time in ms (standard deviation of wall-time) frames-per-second. Wall-time only considers pre-processing + inference + post-processing time, while FPS is calculated over the end-to-end process, from image acquisition to image overlays, without display.

AlexeyAB's repository does not publish FP16 numbers, hence its exclusion. While all repositories use a 320x320 input size and FP16 precision, the accompanying repositories are not directly comparable, since each uses its own metrics. Moreover, both of them use an NVIDIA GeForce RTX 2080 Ti, whereas for this repository I am using an NVIDIA GeForce RTX 2070.

Compile nvdsparsebbox_Yolo.cpp Error

While compiling nvdsparsebbox_Yolo.cpp with make, I get the error below:

g++ -o libnvdsinfer_custom_impl_Yolo.so nvdsparsebbox_Yolo.o -shared -Wl,--start-group -lnvinfer_plugin -lnvinfer -lnvparsers -L/usr/local/cuda-11.1/lib64 -lcudart -lcublas -lstdc++fs -Wl,--end-group
/usr/bin/ld: cannot find -lnvinfer_plugin
/usr/bin/ld: cannot find -lnvinfer
/usr/bin/ld: cannot find -lnvparsers
collect2: error: ld returned 1 exit status
Makefile:47: recipe for target 'libnvdsinfer_custom_impl_Yolo.so' failed
make: *** [libnvdsinfer_custom_impl_Yolo.so] Error 1

Can anybody help me

How do I use qat-yolov5.py?

How do I use qat-yolov5.py? Can someone provide a detailed tutorial, e.g. which sub-version of yolov5 to use? Do I need to modify the yolov5 model?

Yolov7-QAT: Different Graph exported in PTQ int8 compare with the guide

I downloaded the yolov7 onnx file according to https://github.com/NVIDIA-AI-IOT/yolo_deepstream and then converted it into a TensorRT INT8 engine file in PTQ mode. The platform is a DRIVE AGX Orin iGPU; however, the resulting graph is different from the guidance shown in https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/main/yolov7_qat/doc/Guidance_of_QAT_performance_optimization.md

  1. platform: drive agx orin
  2. tensorrt: 8.4.11

Send detection results

Hi all,

I want to send the detection results to an external system.
In order to do that I need the relative result center point coordinates in plain yolo format.
Any hint on how to achieve this? The output of decodeYoloV4Tensor does not seem to match or is somehow already scaled?

Regards,
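If the parser returns absolute pixel boxes (left, top, width, height), a minimal sketch of converting them into plain YOLO format (normalized center x/y plus width/height) before sending them out; the function name and the assumed box layout are illustrative:

def to_yolo_format(left, top, width, height, img_w, img_h):
    """Convert an absolute pixel box to normalized YOLO (cx, cy, w, h)."""
    cx = (left + width / 2.0) / img_w
    cy = (top + height / 2.0) / img_h
    return cx, cy, width / img_w, height / img_h

# example on a 1920x1080 frame: a centered box covering half the frame
print(to_yolo_format(480, 270, 960, 540, 1920, 1080))   # (0.5, 0.5, 0.5, 0.5)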

Trtexec multi-source (streams) and multi-batch performance test failed

Description
I want to test the performance of the model with multiple streams and multiple batch sizes (https://github.com/NVIDIA-AI-IOT/yolo_deepstream#performance) with the trtexec command, and I test it with the following command:

/usr/src/tensorrt/bin/trtexec --loadEngine=yolov7_b16_int8_qat_640.engine --shapes=images:4x3x640x640 --streams=4

PS:
The .engine model was converted with the following command (dynamic batch):

/usr/src/tensorrt/bin/trtexec --verbose --onnx=yolov7_qat_640.onnx --workspace=4096 --minShapes=images:1x3x640x640 --optShapes=images:12x3x640x640 --maxShapes=images:16x3x640x640 --saveEngine=yolov7_b16_int8_qat_640.engine --fp16 --int8

but the following error occurs.

[06/02/2023-09:24:37] [I] === Model Options ===
[06/02/2023-09:24:37] [I] Format: *
[06/02/2023-09:24:37] [I] Model: 
[06/02/2023-09:24:37] [I] Output:
[06/02/2023-09:24:37] [I] === Build Options ===
[06/02/2023-09:24:37] [I] Max batch: explicit batch
[06/02/2023-09:24:37] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[06/02/2023-09:24:37] [I] minTiming: 1
[06/02/2023-09:24:37] [I] avgTiming: 8
[06/02/2023-09:24:37] [I] Precision: FP32
[06/02/2023-09:24:37] [I] LayerPrecisions: 
[06/02/2023-09:24:37] [I] Calibration: 
[06/02/2023-09:24:37] [I] Refit: Disabled
[06/02/2023-09:24:37] [I] Sparsity: Disabled
[06/02/2023-09:24:37] [I] Safe mode: Disabled
[06/02/2023-09:24:37] [I] DirectIO mode: Disabled
[06/02/2023-09:24:37] [I] Restricted mode: Disabled
[06/02/2023-09:24:37] [I] Build only: Disabled
[06/02/2023-09:24:37] [I] Save engine: 
[06/02/2023-09:24:37] [I] Load engine: yolov7_b16_int8_qat_640.engine
[06/02/2023-09:24:37] [I] Profiling verbosity: 0
[06/02/2023-09:24:37] [I] Tactic sources: Using default tactic sources
[06/02/2023-09:24:37] [I] timingCacheMode: local
[06/02/2023-09:24:37] [I] timingCacheFile: 
[06/02/2023-09:24:37] [I] Heuristic: Disabled
[06/02/2023-09:24:37] [I] Preview Features: Use default preview flags.
[06/02/2023-09:24:37] [I] Input(s)s format: fp32:CHW
[06/02/2023-09:24:37] [I] Output(s)s format: fp32:CHW
[06/02/2023-09:24:37] [I] Input build shape: images=4x3x640x640+4x3x640x640+4x3x640x640
[06/02/2023-09:24:37] [I] Input calibration shapes: model
[06/02/2023-09:24:37] [I] === System Options ===
[06/02/2023-09:24:37] [I] Device: 0
[06/02/2023-09:24:37] [I] DLACore: 
[06/02/2023-09:24:37] [I] Plugins:
[06/02/2023-09:24:37] [I] === Inference Options ===
[06/02/2023-09:24:37] [I] Batch: Explicit
[06/02/2023-09:24:37] [I] Input inference shape: images=4x3x640x640
[06/02/2023-09:24:37] [I] Iterations: 10
[06/02/2023-09:24:37] [I] Duration: 3s (+ 200ms warm up)
[06/02/2023-09:24:37] [I] Sleep time: 0ms
[06/02/2023-09:24:37] [I] Idle time: 0ms
[06/02/2023-09:24:37] [I] Streams: 4
[06/02/2023-09:24:37] [I] ExposeDMA: Disabled
[06/02/2023-09:24:37] [I] Data transfers: Enabled
[06/02/2023-09:24:37] [I] Spin-wait: Disabled
[06/02/2023-09:24:37] [I] Multithreading: Disabled
[06/02/2023-09:24:37] [I] CUDA Graph: Disabled
[06/02/2023-09:24:37] [I] Separate profiling: Disabled
[06/02/2023-09:24:37] [I] Time Deserialize: Disabled
[06/02/2023-09:24:37] [I] Time Refit: Disabled
[06/02/2023-09:24:37] [I] NVTX verbosity: 0
[06/02/2023-09:24:37] [I] Persistent Cache Ratio: 0
[06/02/2023-09:24:37] [I] Inputs:
[06/02/2023-09:24:37] [I] === Reporting Options ===
[06/02/2023-09:24:37] [I] Verbose: Disabled
[06/02/2023-09:24:37] [I] Averages: 10 inferences
[06/02/2023-09:24:37] [I] Percentiles: 90,95,99
[06/02/2023-09:24:37] [I] Dump refittable layers:Disabled
[06/02/2023-09:24:37] [I] Dump output: Disabled
[06/02/2023-09:24:37] [I] Profile: Disabled
[06/02/2023-09:24:37] [I] Export timing to JSON file: 
[06/02/2023-09:24:37] [I] Export output to JSON file: 
[06/02/2023-09:24:37] [I] Export profile to JSON file: 
[06/02/2023-09:24:37] [I] 
[06/02/2023-09:24:37] [I] === Device Information ===
[06/02/2023-09:24:37] [I] Selected Device: Xavier
[06/02/2023-09:24:37] [I] Compute Capability: 7.2
[06/02/2023-09:24:37] [I] SMs: 8
[06/02/2023-09:24:37] [I] Compute Clock Rate: 1.377 GHz
[06/02/2023-09:24:37] [I] Device Global Memory: 31002 MiB
[06/02/2023-09:24:37] [I] Shared Memory per SM: 96 KiB
[06/02/2023-09:24:37] [I] Memory Bus Width: 256 bits (ECC disabled)
[06/02/2023-09:24:37] [I] Memory Clock Rate: 1.377 GHz
[06/02/2023-09:24:37] [I] 
[06/02/2023-09:24:37] [I] TensorRT version: 8.5.2
[06/02/2023-09:24:38] [I] Engine loaded in 0.0275892 sec.
[06/02/2023-09:24:38] [I] [TRT] Loaded engine size: 39 MiB
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +41, now: CPU 0, GPU 41 (MiB)
[06/02/2023-09:24:39] [I] Engine deserialized in 1.04122 sec.
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +364, now: CPU 0, GPU 405 (MiB)
[06/02/2023-09:24:39] [I] Setting persistentCacheLimit to 0 bytes.
[06/02/2023-09:24:39] [I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +363, now: CPU 0, GPU 768 (MiB)
[06/02/2023-09:24:39] [I] Setting persistentCacheLimit to 0 bytes.
[06/02/2023-09:24:39] [E] Error[1]: Unexpected exception cannot create std::vector larger than max_size()
[06/02/2023-09:24:39] [I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +363, now: CPU 0, GPU 1131 (MiB)
[06/02/2023-09:24:39] [I] Setting persistentCacheLimit to 0 bytes.
[06/02/2023-09:24:39] [E] Error[1]: Unexpected exception cannot create std::vector larger than max_size()
[06/02/2023-09:24:39] [I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[06/02/2023-09:24:39] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +364, now: CPU 1, GPU 1495 (MiB)
[06/02/2023-09:24:39] [I] Setting persistentCacheLimit to 0 bytes.
[06/02/2023-09:24:39] [E] Error[1]: Unexpected exception cannot create std::vector larger than max_size()
[06/02/2023-09:24:39] [I] Using random values for input images
[06/02/2023-09:24:39] [I] Created input binding for images with dimensions 4x3x640x640
[06/02/2023-09:24:39] [I] Using random values for input images
[06/02/2023-09:24:39] [I] Created input binding for images with dimensions 4x3x640x640
[06/02/2023-09:24:39] [I] Using random values for input images
[06/02/2023-09:24:39] [I] Created input binding for images with dimensions 4x3x640x640
[06/02/2023-09:24:39] [I] Using random values for input images
[06/02/2023-09:24:39] [I] Created input binding for images with dimensions 4x3x640x640
[06/02/2023-09:24:39] [I] Using random values for output outputs
[06/02/2023-09:24:39] [I] Created output binding for outputs with dimensions 4x25200x85
[06/02/2023-09:24:39] [I] Using random values for output outputs
[06/02/2023-09:24:39] [I] Created output binding for outputs with dimensions 4x25200x85
[06/02/2023-09:24:39] [I] Using random values for output outputs
[06/02/2023-09:24:39] [I] Created output binding for outputs with dimensions 4x25200x85
[06/02/2023-09:24:39] [I] Using random values for output outputs
[06/02/2023-09:24:39] [I] Created output binding for outputs with dimensions 4x25200x85
[06/02/2023-09:24:39] [I] Starting inference
[06/02/2023-09:24:39] [E] Error[2]: [executionContext.cpp::enqueueV3::2386] Error Code 2: Internal Error (Assertion mOptimizationProfile >= 0 failed. )
[06/02/2023-09:24:39] [E] Error occurred during inference

Environment

TensorRT Version : 8.5.2
GPU Type : Jetson AGX Xavier
Nvidia Driver Version :
CUDA Version : 11.4.315
CUDNN Version : 8.6.0.166
Operating System + Version : 35.2.1 ( Jetpack: 5.1)
Python Version (if applicable) : Python 3.8.10
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) : 1.12.0a0+2c916ef.nv22.3

How to auto-insert Q/DQ in the shortcut branch of a network with a residual structure?

I see you define some rules to align scale settings and to batch-rename modules (from torch.nn.* to quant_nn.*); very good design!

Now, without modifying the forward code, I am trying to use torch.fx to replace [operator.add, torch.add, "add"] (as call_function) with a call_module (a custom nn.Module which includes a residual quantizer:
self.residual_quantizer = quant_nn.TensorQuantizer(quant_nn.QuantConv2d.default_quant_desc_input)
).

How can I auto-insert Q/DQ in the shortcut branch of a network with a residual structure?
Can you give me some advice, or suggest another method? Thanks!
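A minimal, hedged sketch of the call_module replacement described above: a small wrapper that runs both operands of the residual add through a shared TensorQuantizer, so the shortcut branch also gets a Q/DQ node (names are illustrative, not the repo's exact rules.py code):

import torch
from pytorch_quantization import nn as quant_nn

class QuantAdd(torch.nn.Module):
    """Residual add with a Q/DQ node on both inputs.

    Sharing one TensorQuantizer keeps the two operands' scales aligned, which is
    what allows TensorRT to fuse the add into the preceding convolution.
    """
    def __init__(self):
        super().__init__()
        self.residual_quantizer = quant_nn.TensorQuantizer(
            quant_nn.QuantConv2d.default_quant_desc_input)

    def forward(self, x, y):
        return self.residual_quantizer(x) + self.residual_quantizer(y)

With torch.fx, occurrences of operator.add / torch.add in the traced graph can then be rewritten as call_module nodes pointing at instances of such a wrapper, which avoids touching the model's forward code.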

Why is batch-size=1 recommended in the config files?

Hi all, I need to deploy the model with dynamic batching on DS-Triton, but the YOLOv4 example in DeepStream says the following properties are always recommended: # batch-size (default=1)

I ran a test with yolov3 comparing BS=8 vs BS=1, and it showed poor performance (0.24x) running the TensorRT engine with DeepStream 5.1:

**Throughput FPS (avg) | INT8 **

BS =1 → **PERF: 246.29 (245.98)
BS =8 → **PERF: 60.31 (60.63)

What do you recommend to work with BS>1?

How to convert YOLOv4 Pytorch-ONNX --> TensorRT engine INT8 mode with INT8 calibration and Dynamic Shape

Hi all, is there a sample that shows how to optimize a YOLOv4 PyTorch-ONNX model to a TensorRT INT8 engine with full INT8 calibration and dynamic input shapes? I have generated the INT8 engine at runtime with DeepStream, but it does not generate the engine with dynamic input shapes. Using trtexec, it builds the engine with static input shapes (default BS=1) and does not provide calibration capability.
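One hedged route is the TensorRT Python API, which lets a single build carry both an optimization profile (for dynamic shapes) and an INT8 calibrator; a rough sketch assuming an ONNX model exported with a dynamic batch axis (file names, the input tensor name "input", and the toy calibrator data are illustrative):

import numpy as np
import pycuda.autoinit   # noqa: F401  creates a CUDA context for the calibrator
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.INFO)

class ToyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds random batches; replace with real preprocessed calibration images."""
    def __init__(self, batch_size=8, n_batches=16, shape=(3, 416, 416)):
        super().__init__()
        self.batch_size, self.n_batches, self.shape, self.count = batch_size, n_batches, shape, 0
        self.device_mem = cuda.mem_alloc(int(batch_size * np.prod(shape) * 4))  # float32 bytes

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.count >= self.n_batches:
            return None
        batch = np.random.rand(self.batch_size, *self.shape).astype(np.float32)
        cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
        self.count += 1
        return [int(self.device_mem)]

    def read_calibration_cache(self):
        return None

    def write_calibration_cache(self, cache):
        pass

builder = trt.Builder(LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, LOGGER)
with open("yolov4_dynamic.onnx", "rb") as f:          # ONNX exported with a dynamic batch dim
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = ToyCalibrator()

profile = builder.create_optimization_profile()       # dynamic batch 1..16, opt at 8
profile.set_shape("input", (1, 3, 416, 416), (8, 3, 416, 416), (16, 3, 416, 416))
config.add_optimization_profile(profile)
config.set_calibration_profile(profile)               # calibrate at the profile's opt shape

with open("yolov4_int8_dynamic.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))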

Hang issue in Tesla T4 GPU

I am able to build the .so file for deepstream_yolo, but on running deepstream-app I am getting a hang issue.
$ deepstream-app -c deepstream_app_config_yolo.txt

I am getting this log:

`WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: ../nvdsinfer/nvdsinfer_model_builder.cpp:1487 Deserialize engine failed because file path: /mnt/home/nawin.ks/Model_inference/deepstream_yolo_configTry/yolov4_-1_3_416_416_nms_dynamic.onnx_b16_gpu0_fp16.engine open error
0:00:03.335773476  6275 0x55a15ac71300 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/mnt/home/nawin.ks/Model_inference/deepstream_yolo_configTry/yolov4_-1_3_416_416_nms_dynamic.onnx_b16_gpu0_fp16.engine failed
0:00:03.384774338  6275 0x55a15ac71300 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/mnt/home/nawin.ks/Model_inference/deepstream_yolo_configTry/yolov4_-1_3_416_416_nms_dynamic.onnx_b16_gpu0_fp16.engine failed, try rebuild
0:00:03.384804720  6275 0x55a15ac71300 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: builtin_op_importers.cpp:5245: Attribute scoreBits not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
WARNING: [TRT]: builtin_op_importers.cpp:5245: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: TensorRT was linked against cuDNN 8.6.0 but loaded cuDNN 8.2.2`

After the last line of the log, it does nothing and does not terminate.

I do not get the hang issue when replacing the YOLOv4 ONNX model with a lighter ResNet10 ONNX model and disabling the use of libnvdsinfer_custom_impl_Yolo.so and the NvDsInferParseCustomYoloV4 function.

Deepstream version: 6.2
Tensorrt version: 2.5.2
Platform: AWS EC2 g4dn.2xlarge

Can you please help me on this?

Using NHWC format instead of NCHW for deepstream

Hi, I'm using the deepstream-test1 and yolo_deepstream apps with my own INT8 TensorRT engine. The model takes input in NHWC format, but the app expects input in NCHW format.
I had followed this link to get the INT8 TensorRT engine.
If DeepStream doesn't work with NHWC, then please guide me on converting the engine/onnx/saved_model (TensorFlow pb file) to NCHW; see also the sketch after the environment details below.

Currently I'm getting this:
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT image_arrays:0 640x640x3
1 OUTPUT kINT32 num_detections 0
2 OUTPUT kFLOAT detection_boxes 1024x4
3 OUTPUT kFLOAT detection_scores 1024
4 OUTPUT kFLOAT detection_classes 1024

0:00:08.703674449 4915 0x558fe32520 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::preparePreprocess() <nvdsinfer_context_impl.cpp:874> [UID = 1]: RGB/BGR input format specified but network input channels is not 3

• Hardware Platform Jetson
• DeepStream Version 5.0
• JetPack Version 4.4
• TensorRT Version 7.2
• Issue Type questions
• Jetson NX
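If rebuilding the model in NCHW is not possible, one hedged workaround is to give the ONNX graph a new NCHW input and insert a Transpose that feeds the original NHWC consumers, then build the engine from the modified ONNX. A rough onnx-graphsurgeon sketch (file names are illustrative, and a single 4-D NHWC input is assumed):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model_nhwc.onnx"))
old_input = graph.inputs[0]                     # e.g. "image_arrays:0", shape [N, 640, 640, 3]
n, h, w, c = old_input.shape

# New NCHW graph input plus a Transpose back to NHWC for the existing layers.
new_input = gs.Variable(name="input_nchw", dtype=old_input.dtype, shape=[n, c, h, w])
to_nhwc = gs.Variable(name="input_as_nhwc", dtype=old_input.dtype)
transpose = gs.Node(op="Transpose", attrs={"perm": [0, 2, 3, 1]},
                    inputs=[new_input], outputs=[to_nhwc])

# Rewire every consumer of the old NHWC input to read the transposed tensor.
for node in graph.nodes:
    node.inputs = [to_nhwc if t is old_input else t for t in node.inputs]

graph.nodes.append(transpose)
graph.inputs = [new_input]
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_nchw_input.onnx")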

yolov5 QAT: the Add graph still uses a useless data conversion node

Thank you for your repo; it's really good for understanding the QAT tool.

Now I am working on QAT for yolov5.
I added Q/DQ nodes following your advice (see the attached ONNX graph screenshot).

But there is still a useless FP16-to-INT8 conversion; see the attached trex graph screenshot.

How can I remove the useless conversion node?
Thanks,
Maro JEON

Error when building the code

../Makefile.config:26: CUDA_INSTALL_DIR variable is not specified, using /usr/local/cuda by default, use CUDA_INSTALL_DIR=<cuda_directory> to change.
../Makefile.config:31: CUDNN_INSTALL_DIR variable is not specified, using /usr/local/cuda by default, use CUDNN_INSTALL_DIR=<cudnn_directory> to change.
../Makefile.config:44: TRT_LIB_DIR is not specified, searching ../../lib, ../../lib, ../lib by default, use TRT_LIB_DIR=<trt_lib_directory> to change.
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: SampleYolo.cpp
In file included from SampleYolo.cpp:23:0:
SampleYolo.hpp:41:10: fatal error: opencv2/opencv.hpp: No such file or directory
#include <opencv2/opencv.hpp>
^~~~~~~~~~~~~~~~~~~~
compilation terminated.
../Makefile.config:338: recipe for target '../bin/dchobj/SampleYolo.o' failed
make: *** [../bin/dchobj/SampleYolo.o] Error 1

yolov7_qat

Will yolov7_qat support custom data training instead of COCO in the near future?

export onnx error

In tensor_quantizer.py, line 293:

outputs = torch.fake_quantize_per_channel_affine(
inputs, scale.data, torch.zeros_like(scale, dtype=torch.int32).data, quant_dim,
-bound - 1 if not self._unsigned else 0, bound)

RuntimeError: Zero-point must be Long, found Int
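The error message says the zero-point tensor must be int64 (Long) while the quoted line builds it as int32, so a hedged local workaround is to change the dtype in that call (shown as a fragment in the same style as the snippet above; this patches the installed pytorch_quantization file, not this repo):

# fragment of pytorch_quantization/nn/modules/tensor_quantizer.py around line 293,
# with the zero-point tensor created as Long instead of Int
outputs = torch.fake_quantize_per_channel_affine(
    inputs, scale.data,
    torch.zeros_like(scale, dtype=torch.long).data,   # was dtype=torch.int32
    quant_dim,
    -bound - 1 if not self._unsigned else 0, bound)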

Plugin is not working on TX2 NX for Yolo V4

Hi,

I am using JetPack 4.6. I tried both the prebuilt libnvinfer_plugin.so.8.0.1 from this repo (https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/TRT-OSS/Jetson/TRT8.0) and building the plugin myself. Both show me the following error.

[04/25/2022-16:52:55] [E] [TRT] 2: [pluginV2Runner.cpp::execute::267] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)
&&&& FAILED TensorRT.sample_yolo [TensorRT v8001] # ../bin/yolov4 --fp16

Jetpack 4.6 comes with:
TensorRT 8.0.1
CuDNN 8.2.1
CUDA 10.2

Need help.
Thanks,

[E] [TRT] ../rtSafe/cuda/cudaConvolutionRunner.cpp (457) - Cudnn Error in execute: 3 (CUDNN_STATUS_BAD_PARAM)

my env:
Ubuntu 18.04
cuda 10.2
cudnn 7.6.5
tensorrt 7.1.3
tensorrt oss libnvinfer_plugin.so.7.1.3

cudnn 7.6.5 installed from tar archive, CUDNN_INSTALL_DIR=/user/local/cuda
onnx created with https://github.com/Tianxiaomo/pytorch-YOLOv4
In SampleYolo.cpp, in bool SampleYolo::build(), I added this code:

       profile->setDimensions("input", OptProfileSelector::kMIN, Dims4{1, 3, 320, 320});
       profile->setDimensions("input", OptProfileSelector::kOPT, Dims4{1, 3, 320, 320});
       profile->setDimensions("input", OptProfileSelector::kMAX, Dims4{1, 3, 320, 320});
       config->addOptimizationProfile(profile);

In onnx_add_nms_plugin.py, I changed the nms_node to "BatchedNMSDynamic_TRT".

$ make
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: SampleYolo.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: main.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/sampleInference.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/sampleOptions.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/logger.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/getOptions.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/sampleReporting.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/dchobj/../common; fi; :
Compiling: ../common/sampleEngines.cpp
Linking: ../bin/yolov4_debug
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/chobj/../common; fi; :
Compiling: SampleYolo.cpp
if [ ! -d ../bin/chobj/../common ]; then mkdir -p ../bin/chobj/../common; fi; :
Compiling: main.cpp
Linking: ../bin/yolov4
# Copy every EXTRA_FILE of this sample to bin dir
$ ../bin/yolov4 -demo
&&&& RUNNING TensorRT.sample_yolo # ../bin/yolov4 -demo
There are 0 coco images to process
[07/18/2021-13:20:19] [I] Building and running a GPU inference engine for Yolo
[07/18/2021-13:20:19] [I] Parsing ONNX file: ../data/yolov4.onnx
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[07/18/2021-13:20:20] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[07/18/2021-13:20:20] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[07/18/2021-13:20:20] [I] [TRT] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMSDynamic_TRT
[07/18/2021-13:20:20] [W] [TRT] Output type must be INT32 for shape outputs
[07/18/2021-13:20:20] [W] [TRT] Output type must be INT32 for shape outputs
[07/18/2021-13:20:20] [I] Building TensorRT engine../data/yolov4.engine
[07/18/2021-13:21:00] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[07/18/2021-13:22:05] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[07/18/2021-13:22:07] [I] TRT Engine file saved to: ../data/yolov4.engine
4
[07/18/2021-13:22:07] [I] Loading or building yolo model done
[07/18/2021-13:22:07] [E] [TRT] ../rtSafe/cuda/cudaConvolutionRunner.cpp (457) - Cudnn Error in execute: 3 (CUDNN_STATUS_BAD_PARAM)
[07/18/2021-13:22:07] [E] [TRT] FAILED_EXECUTION: std::exception
Time consumed in preProcess: 0
Time consumed in model: 0
Time consumed in postProcess: 0
[07/18/2021-13:22:07] [I] Inference of yolo model done

Shared memory on Jetson devices

In pushImg, we have to copy memory from CPU to GPU or device to device. For Jetson devices such as AGX Orin, can we save this step by leveraging shared memory?
