This repository contains useful scripts and code references I use or encounter when working with TensorRT.
The master branch is currently targeted at TensorRT 7.1+ (NGC 20.06+).
For earlier TensorRT versions, please see the other tags.
⚡ Useful scripts when using TensorRT
License: Apache License 2.0
Hello,
Thanks for this great repo. I used the provided script to convert a searched network (ONNX model) to an INT8 engine with INT8 calibration. The ONNX model works fine, but the output of the TensorRT engine is wrong. When I checked the calibration cache file, I found that many of the quantization scales are "Infinity" and the others are very large. I converted the cache file to a JSON file; it looks like the attached image below. Any hints would be much appreciated.
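A calibration cache is a text file. It starts with a header line beginning with `TRT-` and is followed by `tensor_name: hex` lines. The sketch below assumes that each hex value is the big-endian bit pattern of a float32 scale. Under that assumption, `7f800000` is exactly the value that shows up as "Infinity" after JSON conversion. The function names and the "too large" threshold are illustrative only. They are not part of this repo.

```python
import json
import math
import struct


def decode_scale(hex_str):
    """Interpret an 8-digit hex string as the bits of a big-endian float32 (assumed format)."""
    return struct.unpack("!f", bytes.fromhex(hex_str.strip()))[0]


def parse_calibration_cache(text):
    """Return {tensor_name: scale} from the text of a calibration cache."""
    scales = {}
    for line in text.splitlines():
        # Skip the "TRT-..." header line and blank lines (neither has a ':').
        if not line.strip() or ":" not in line:
            continue
        # rsplit keeps tensor names intact even if they contain ':'.
        name, hex_str = line.rsplit(":", 1)
        scales[name.strip()] = decode_scale(hex_str)
    return scales


def suspicious_scales(scales, max_reasonable=1e3):
    """Tensors whose scale is non-finite or above an arbitrary threshold."""
    return sorted(name for name, s in scales.items()
                  if not math.isfinite(s) or abs(s) > max_reasonable)


if __name__ == "__main__":
    cache = "TRT-7000-EntropyCalibration2\ninput: 3c010204\nconv1: 7f800000\n"
    scales = parse_calibration_cache(cache)
    # json.dumps writes float("inf") as Infinity, as in the JSON described above.
    print(json.dumps(scales, indent=2))
    print(suspicious_scales(scales))
```

Tensors listed by `suspicious_scales` are the first places to look. In practice, a scale like this usually points to calibration inputs that do not resemble the data the network was trained on.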
TensorRT Version: 7.0.0.11
GPU Type: Tesla T4
Nvidia Driver Version: 418.67
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: Debian 9.11
Python Version (if applicable): 3.7.4
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): 1.4.0
Baremetal or Container (if container which image + tag): An internal Docker image
I'm wondering how to run inference with the saved INT8 TensorRT engine file. Is the inference process the same as usual?
Hi @rmccorm4 ,
I am using infer_tensorrt_imagenet.py to run inference on ImageNet images with the INT8 engine created by TensorRT.
Here is how I run it:
python3 infer_tensorrt_imagenet.py --engine resnet18.int8.engine -d /home/hassan/Datasets/ImageNet/ -b 1 -n 5
Here are the results:
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use execute without batch size instead.
Allocating buffers ...
Input image: /home/hassan/Datasets/ImageNet/ILSVRC2012_val_00000025.JPEG
Prediction: wall clock Probability: 0.09
Prediction: matchstick Probability: 0.06
Prediction: switch Probability: 0.03
Prediction: screw Probability: 0.03
Prediction: envelope Probability: 0.02
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use execute without batch size instead.
Allocating buffers ...
Input image: /home/hassan/Datasets/ImageNet/ILSVRC2012_val_00000073.JPEG
Prediction: wall clock Probability: 0.06
Prediction: matchstick Probability: 0.06
Prediction: switch Probability: 0.04
Prediction: screw Probability: 0.03
Prediction: envelope Probability: 0.02
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use execute without batch size instead.
Allocating buffers ...
Input image: /home/hassan/Datasets/ImageNet/ILSVRC2012_val_00000117.JPEG
Prediction: wing Probability: 0.07
Prediction: spotlight Probability: 0.02
Prediction: lampshade Probability: 0.02
Prediction: matchstick Probability: 0.02
Prediction: wall clock Probability: 0.02
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use execute without batch size instead.
Allocating buffers ...
Input image: /home/hassan/Datasets/ImageNet/ILSVRC2012_val_00000113.JPEG
Prediction: wall clock Probability: 0.06
Prediction: envelope Probability: 0.06
Prediction: matchstick Probability: 0.04
Prediction: lampshade Probability: 0.03
Prediction: refrigerator Probability: 0.03
The same few classes (e.g. wall clock, matchstick) appear in the top 5 for almost every image. Could you kindly help?
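When nearly every image gets the same top-5 classes, a common culprit is preprocessing. The preprocessing used at inference (or during calibration) may differ from the one the model was trained with. Below is a minimal sketch of the torchvision-style ImageNet normalization that many ResNet-18 exports expect. The mean/std values and the 224x224 input size are assumptions about this particular model. The script does not guarantee them.

```python
import numpy as np

# Assumed torchvision-style ImageNet statistics (RGB order, pixel values in [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def center_crop(img, size):
    """Crop the central size x size region of an HWC image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def preprocess_imagenet(img_hwc_uint8, size=224):
    """HWC uint8 RGB image (already resized so both sides >= size) -> 1xCxHxW float32."""
    img = center_crop(img_hwc_uint8, size).astype(np.float32) / 255.0
    img = (img - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the channel axis
    return np.ascontiguousarray(img.transpose(2, 0, 1)[np.newaxis])  # HWC -> NCHW
```

A cheap first check is to compare this against the `--preprocess_func` used during calibration. If the two differ, the INT8 scales were computed on differently distributed data.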
Thank you.
Hi @rmccorm4,
I would like some advice on INT8 calibration. I've had no trouble building FP16 explicit batch engines with batch > 1, and I've managed to build INT8 explicit batch engines with batch = 1. However, INT8 calibration doesn't seem to work for batch > 1. It calibrates without errors, and my demo app runs without errors, so it's hard to debug. Do you have any advice? I've tried building the cache with batch = 1 and then using it to build an engine with batch > 1. That seemed to work once, but I haven't been able to reproduce that result.
Hi rmccorm4! I used your code to add optimization profiles for a ResNet101 ONNX model with a dynamic batch dimension, and built an INT8 engine. But when I run inference with infer_tensorrt_imagenet.py, I only get correct results with batch size 1. For other batch sizes, most of the output probabilities are 0.
I know I need to change something in the runtime code. I followed https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#runtime_dimensions, but I couldn't figure it out. Could you show me how to run multi-batch inference? Thank you!
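In TensorRT 7.x, each optimization profile gets its own copy of the bindings. Binding `i` of profile `k` therefore has the engine-wide index `k * (engine.num_bindings // engine.num_optimization_profiles) + i`. The input shape must also be set to the actual batch shape before execution. Reading it back from the context and re-setting it does not change the batch dimension. The helper below is a sketch of that bookkeeping. It assumes `engine` and `context` are TensorRT 7.x `ICudaEngine` and `IExecutionContext` objects. In TensorRT 8.x, `active_optimization_profile` is deprecated in favor of `set_optimization_profile_async`.

```python
def profile_binding_index(engine, profile_index, local_index):
    """Map a per-profile binding index to the engine-wide index (TensorRT 7.x layout)."""
    per_profile = engine.num_bindings // engine.num_optimization_profiles
    return profile_index * per_profile + local_index


def prepare_context(engine, context, profile_index, input_shape, input_local_index=0):
    """Activate a profile and set the real (e.g. batch-32) input shape.

    Returns the engine-wide binding index of the input.
    """
    context.active_optimization_profile = profile_index  # TRT 7.x property
    idx = profile_binding_index(engine, profile_index, input_local_index)
    context.set_binding_shape(idx, tuple(input_shape))
    if not context.all_binding_shapes_specified:
        raise RuntimeError("Some dynamic input shapes are still unspecified")
    return idx
```

Once the shape is set, `context.get_binding_shape(j)` returns concrete shapes for that profile's output bindings, and the host and device buffers should be sized from those shapes. For an explicit batch engine, run it with `context.execute_v2(bindings)` rather than `execute(batch_size, ...)`, as TensorRT's own "use execute without batch size instead" warning suggests.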
TensorRT Version: TensorRT 7.2.1
GPU Type: P4
CUDA Version: 11.0
CUDNN Version: 8.0.5
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.6.9
I added the following code to get the input shape for batch size 32:
# Run inference.
context.active_optimization_profile = 3
context_inputshape = context.get_binding_shape(0)
context.set_binding_shape(0, context_inputshape)
context.execute(batch_size, dbindings)
I still get wrong results like this:
Input image: /user/z00590385/imagenet_test_dataset/val/n01728572/ILSVRC2012_val_00003569.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n09246464/ILSVRC2012_val_00024591.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n02009912/ILSVRC2012_val_00024166.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n04399382/ILSVRC2012_val_00025599.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n02948072/ILSVRC2012_val_00007580.JPEG
Prediction: drake Probability: 1.88
Prediction: sea slug Probability: 1.88
Prediction: sea anemone Probability: 1.00
Input image: /user/z00590385/imagenet_test_dataset/val/n04254777/ILSVRC2012_val_00026534.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n07684084/ILSVRC2012_val_00046840.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n02666196/ILSVRC2012_val_00030552.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n03207743/ILSVRC2012_val_00044279.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n03133878/ILSVRC2012_val_00031187.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n07715103/ILSVRC2012_val_00032107.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
.......
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
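Two symptoms here fit a buffer-size mismatch: probabilities above 1 (such as `1.88`) and the final `illegal memory access`. Both are consistent with host and device buffers sized for one batch shape while the engine runs with another. Below is a sketch of sizing a buffer from the concrete shape the context returns after the input shape has been set. It does not use the engine's binding shape, where dynamic dimensions are `-1`. `nbytes_for_shape` is a hypothetical helper name.

```python
import numpy as np


def nbytes_for_shape(shape, dtype):
    """Bytes needed for a dense tensor; rejects unresolved dynamic (-1) dims."""
    if any(d < 0 for d in shape):
        raise ValueError(
            f"shape {tuple(shape)} still has dynamic dims; set the input shape "
            "on the execution context and query the shape from there")
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize
```

Consider a batch-32 float32 input of shape `(32, 3, 224, 224)`. It needs 32 × 3 × 224 × 224 × 4 = 19,267,584 bytes, which is 32 times the batch-1 allocation.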
TensorRT Version: 8.4.5.1
GPU Type: NVIDIA TITAN X
Nvidia Driver Version:
CUDA Version: 11.7
CUDNN Version:
Operating System + Version: Linux
Python Version (if applicable): 3.8.5
Hi, I ran YOLOv4 in INT8 and hit this problem.
I get this error when building for INT8. The same code works fine for FP32 and FP16:
[05/31/2022-05:46:12] [TRT] [E] 1: Unexpected exception
Traceback (most recent call last):
File "./onnx_to_tensorrt.py", line 218, in <module>
main()
File "./onnx_to_tensorrt.py", line 213, in main
with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
AttributeError: __enter__
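`AttributeError: __enter__` is usually a symptom rather than the bug itself. When the build fails (note the `[E] 1: Unexpected exception` just before it), `builder.build_engine` returns `None`, and `with None as engine` raises exactly this error on Python 3.8. The sketch below surfaces the failure directly instead. It assumes TensorRT 8.x, where `build_serialized_network` returns the serialized engine or `None`.

```python
def build_and_save(builder, network, config, output_path):
    """Build a serialized engine and write it to disk, failing loudly on error.

    `builder`, `network` and `config` are assumed to be TensorRT 8.x objects.
    """
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        # The real cause is in the TensorRT logger output (the "[E] ..." lines).
        raise RuntimeError("Engine build failed; see the TensorRT log for details")
    with open(output_path, "wb") as f:
        f.write(serialized)  # the returned host memory supports the buffer protocol
    return output_path
```

With this in place, the traceback names the build failure instead of `__enter__`. The actual INT8 problem then has to be diagnosed from the `[E]` log lines.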
I have seen examples of running classification models on TensorRT in INT8 mode. Could you be specific about what I should do to calibrate and produce an INT8 engine for detector models (ONNX)?
TensorRT Version:7
GPU Type: T4
Nvidia Driver Version: 440
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: 18
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
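For detectors, the calibration loop is essentially the same as for classifiers. What changes is that calibration images must go through the detector's own preprocessing. Many YOLO-style ONNX exports, for example, expect the following:

- an RGB image letterboxed into a square input,
- pixel values scaled to [0, 1],
- NCHW layout.

The sketch below assumes that convention, with a 416x416 input and a padding value of 114. It uses nearest-neighbour resizing so it needs no extra dependencies. Check every one of these choices against your model's actual preprocessing.

```python
import numpy as np


def letterbox(img_hwc_uint8, size=416, pad_value=114):
    """Resize keeping aspect ratio (nearest neighbour) and pad to size x size."""
    h, w = img_hwc_uint8.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbour source indices for each output row/column
    rows = (np.arange(new_h) * h / new_h).astype(np.int64)
    cols = (np.arange(new_w) * w / new_w).astype(np.int64)
    resized = img_hwc_uint8[rows[:, None], cols[None, :]]
    out = np.full((size, size, img_hwc_uint8.shape[2]), pad_value, dtype=np.uint8)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out


def preprocess_detector(img_hwc_uint8, size=416):
    """RGB uint8 HWC -> letterboxed 1x3xHxW float32 in [0, 1]."""
    img = letterbox(img_hwc_uint8, size).astype(np.float32) / 255.0
    return np.ascontiguousarray(img.transpose(2, 0, 1)[np.newaxis])
```

The same function should be used for calibration and at inference, so that the INT8 scales describe the tensors the engine will actually see.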
Hi, @rmccorm4
Currently, I'm trying to generate an INT8 TensorRT engine with calibration, like this:
from polygraphy.backend.trt import Calibrator, CreateConfig, EngineFromNetwork, NetworkFromOnnxPath
calibrator = Calibrator(data_loader=calib_data(), cache="identity-calib.cache")
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("identity.onnx"),
    config=CreateConfig(int8=True, calibrator=calibrator),
)
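Polygraphy's `Calibrator` pulls batches from `data_loader`. This can be any generator that yields feed dictionaries mapping input names to NumPy arrays. Below is a minimal `calib_data` sketch. It assumes the model has a single input named `x` with shape `(1, 1, 2, 2)`; adjust both to your model. The random data is only a placeholder for real, representative samples.

```python
import numpy as np


def calib_data(num_batches=4, input_name="x", shape=(1, 1, 2, 2), seed=0):
    """Yield feed dicts {input_name: array}, one per calibration batch."""
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        # Placeholder data: replace with real samples preprocessed as at inference
        yield {input_name: rng.random(shape, dtype=np.float32)}
```

If the file passed as `cache=` already exists, it will generally be read back and calibration skipped. Delete it whenever the calibration data or preprocessing changes.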
But I was really confused about the mechanisms:
Please help me with these two problems, Thanks a lot.
I used the TensorRT Python API to parse an ONNX model exported from PyTorch. The model has only one Resize node, but the TensorRT parser produces an IResizeLayer with two inputs. One of the inputs is a constant layer with shape (4). Also, the output shape of the IResizeLayer is (-1, -1, -1, -1).
However, this ONNX model works fine with the command "trtexec --explicitBatch --workspace=128 --onnx=optimized_model.onnx".
model link: https://hkustconnect-my.sharepoint.com/:u:/g/personal/ycchanau_connect_ust_hk/ERJqH7chlY9FquRU4A1XIncBi4_QxCEd8wllrLn5WGqGDw?e=qzvgBf
TensorRT Version: 7.0
GPU Type: T4
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4
Baremetal or Container (if container which image + tag): TensorRT Release 20.02
I parse the model with the following code, run as "python test_tensorrt.py --explicit-batch -v --explicit-precision":
import pycuda.autoinit
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt
#import torch
import os
import time
#from PIL import Image
#import cv2
#import torchvision
import sys
import glob
import math
import logging
import argparse

TRT_LOGGER = trt.Logger()
logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S")
logger = logging.getLogger(__name__)


def add_profiles(config, inputs, opt_profiles):
    logger.debug("=== Optimization Profiles ===")
    for i, profile in enumerate(opt_profiles):
        for inp in inputs:
            _min, _opt, _max = profile.get_shape(inp.name)
            logger.debug("{} - OptProfile {} - Min {} Opt {} Max {}".format(inp.name, i, _min, _opt, _max))
        config.add_optimization_profile(profile)


def mark_outputs(network):
    # Mark last layer's outputs if not already marked
    # NOTE: This may not be correct in all cases
    last_layer = network.get_layer(network.num_layers-1)
    if not last_layer.num_outputs:
        logger.error("Last layer contains no outputs.")
        return

    for i in range(last_layer.num_outputs):
        network.mark_output(last_layer.get_output(i))


def check_network(network):
    if not network.num_outputs:
        logger.warning("No output nodes found, marking last layer's outputs as network outputs. Correct this if wrong.")
        mark_outputs(network)

    inputs = [network.get_input(i) for i in range(network.num_inputs)]
    outputs = [network.get_output(i) for i in range(network.num_outputs)]
    max_len = max([len(inp.name) for inp in inputs] + [len(out.name) for out in outputs])

    logger.debug("=== Network Description ===")
    for i, inp in enumerate(inputs):
        logger.debug("Input {0} | Name: {1:{2}} | Shape: {3}".format(i, inp.name, max_len, inp.shape))
    for i, out in enumerate(outputs):
        logger.debug("Output {0} | Name: {1:{2}} | Shape: {3}".format(i, out.name, max_len, out.shape))


def get_batch_sizes(max_batch_size):
    # Returns powers of 2, up to and including max_batch_size
    max_exponent = math.log2(max_batch_size)
    for i in range(int(max_exponent)+1):
        batch_size = 2**i
        yield batch_size

    if max_batch_size != batch_size:
        yield max_batch_size


# TODO: This only covers dynamic shape for batch size, not dynamic shape for other dimensions
def create_optimization_profiles(builder, inputs, batch_sizes=[1,4,8]):
    # Check if all inputs are fixed explicit batch to create a single profile and avoid duplicates
    if all([inp.shape[0] > -1 for inp in inputs]):
        profile = builder.create_optimization_profile()
        for inp in inputs:
            fbs, shape = inp.shape[0], inp.shape[1:]
            profile.set_shape(inp.name, min=(fbs, *shape), opt=(fbs, *shape), max=(fbs, *shape))
        return [profile]

    # Otherwise for mixed fixed+dynamic explicit batch inputs, create several profiles
    profiles = {}
    for bs in batch_sizes:
        if not profiles.get(bs):
            profiles[bs] = builder.create_optimization_profile()

        for inp in inputs:
            shape = inp.shape[1:]
            # Check if fixed explicit batch
            if inp.shape[0] > -1:
                bs = inp.shape[0]

            profiles[bs].set_shape(inp.name, min=(bs, *shape), opt=(bs, *shape), max=(bs, *shape))

    return list(profiles.values())


def get_engine(args):
    # Network flags
    network_flags = 0
    if args.explicit_batch:
        network_flags |= 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    if args.explicit_precision:
        network_flags |= 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)

    builder_flag_map = {
        'gpu_fallback': trt.BuilderFlag.GPU_FALLBACK,
        'refittable': trt.BuilderFlag.REFIT,
        'debug': trt.BuilderFlag.DEBUG,
        'strict_types': trt.BuilderFlag.STRICT_TYPES,
        'fp16': trt.BuilderFlag.FP16,
        'int8': trt.BuilderFlag.INT8,
    }

    # Building engine
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(network_flags) as network, \
         builder.create_builder_config() as config, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:

        config.max_workspace_size = 2**27  # 128 MiB

        # Set Builder Config Flags
        for flag in builder_flag_map:
            if getattr(args, flag):
                logger.info("Setting {}".format(builder_flag_map[flag]))
                config.set_flag(builder_flag_map[flag])

        if args.fp16 and not builder.platform_has_fast_fp16:
            logger.warning("FP16 not supported on this platform.")

        if args.int8 and not builder.platform_has_fast_int8:
            logger.warning("INT8 not supported on this platform.")

        '''
        if args.int8:
            config.int8_calibrator = get_int8_calibrator(args.calibration_cache,
                                                         args.calibration_data,
                                                         args.max_calibration_size,
                                                         args.preprocess_func,
                                                         args.calibration_batch_size)
        '''

        # Fill network attributes with information by parsing model
        with open(args.onnx, "rb") as f:
            if not parser.parse(f.read()):
                print('ERROR: Failed to parse the ONNX file: {}'.format(args.onnx))
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                sys.exit(1)

        # Display network info and check certain properties
        check_network(network)
        #?????????????????????????????????????????????????????
        if args.explicit_batch:
            # Add optimization profiles
            batch_sizes = [1, 4]
            inputs = [network.get_input(i) for i in range(network.num_inputs)]
            opt_profiles = create_optimization_profiles(builder, inputs, batch_sizes)
            add_profiles(config, inputs, opt_profiles)
        # Implicit Batch Network
        else:
            builder.max_batch_size = args.max_batch_size

        logger.info("Building Engine...")
        with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
            logger.info("Serializing engine to file: {:}".format(args.output))
            f.write(engine.serialize())


def main():
    parser = argparse.ArgumentParser(description="Creates a TensorRT engine from the provided ONNX file.\n")
    parser.add_argument("--onnx", type=str, default="model.onnx", help="The ONNX model file to convert to TensorRT")
    parser.add_argument("-o", "--output", type=str, default="model.engine", help="The path at which to write the engine")
    parser.add_argument("-b", "--max-batch-size", type=int, default=1, help="The max batch size for the TensorRT engine input")
    parser.add_argument("-v", "--verbosity", action="count", help="Verbosity for logging. (None) for ERROR, (-v) for INFO/WARNING/ERROR, (-vv) for VERBOSE.")
    parser.add_argument("--explicit-batch", action='store_true', help="Set trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH.")
    parser.add_argument("--explicit-precision", action='store_true', help="Set trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION.")
    parser.add_argument("--gpu-fallback", action='store_true', help="Set trt.BuilderFlag.GPU_FALLBACK.")
    parser.add_argument("--refittable", action='store_true', help="Set trt.BuilderFlag.REFIT.")
    parser.add_argument("--debug", action='store_true', help="Set trt.BuilderFlag.DEBUG.")
    parser.add_argument("--strict-types", action='store_true', help="Set trt.BuilderFlag.STRICT_TYPES.")
    parser.add_argument("--fp16", action="store_true", help="Attempt to use FP16 kernels when possible.")
    parser.add_argument("--int8", action="store_true", help="Attempt to use INT8 kernels when possible. This should generally be used in addition to the --fp16 flag. \
                        ONLY SUPPORTS RESNET-LIKE MODELS SUCH AS RESNET50/VGG16/INCEPTION/etc.")
    parser.add_argument("--calibration-cache", help="(INT8 ONLY) The path to read/write from calibration cache.", default="calibration.cache")
    parser.add_argument("--calibration-data", help="(INT8 ONLY) The directory containing {*.jpg, *.jpeg, *.png} files to use for calibration. (ex: Imagenet Validation Set)", default=None)
    parser.add_argument("--calibration-batch-size", help="(INT8 ONLY) The batch size to use during calibration.", type=int, default=32)
    parser.add_argument("--max-calibration-size", help="(INT8 ONLY) The max number of data to calibrate on from --calibration-data.", type=int, default=512)
    parser.add_argument("-p", "--preprocess_func", type=str, default=None, help="(INT8 ONLY) Function defined in 'processing.py' to use for pre-processing calibration data.")
    args, _ = parser.parse_known_args()

    if args.verbosity is None:
        TRT_LOGGER.min_severity = trt.Logger.Severity.ERROR
    # -v
    elif args.verbosity == 1:
        TRT_LOGGER.min_severity = trt.Logger.Severity.INFO
    # -vv
    else:
        TRT_LOGGER.min_severity = trt.Logger.Severity.VERBOSE
    logger.info("TRT_LOGGER Verbosity: {:}".format(TRT_LOGGER.min_severity))

    base_size=32
    max_batch_size=1
    shape_of_output = (max_batch_size, 3, 128, 128)
    image=np.random.randn(max_batch_size,3,base_size,base_size)
    image = np.expand_dims(image, axis=0)
    engine = get_engine(args)


if __name__ == "__main__":
    main()
When I used your code to convert an EfficientDet-D0 model from ONNX to TensorRT with INT8 calibration, I got the error RuntimeError: Driver error:
TensorRT Version: 7.1.3.1
GPU Type: GTX 1660
Nvidia Driver Version: 460.39
CUDA Version: 10.2
CUDNN Version: 7.6.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.0.0 but loaded cuDNN 7.6.3
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.2 but loaded cuBLAS 10.2.1
Traceback (most recent call last):
File "onnx2trt.py", line 216, in <module>
build_trt_engine(args)
File "onnx2trt.py", line 171, in build_trt_engine
engine = builder.build_engine(network, config)
RuntimeError: Driver error:
Add a tool that launches several Docker containers, iterating through common TRT versions (6, 7, etc.) with and without OSS components.
In each container, try to parse the given ONNX model, and print a summary of what passed/failed at the end.
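Below is a sketch of such a tool. It assumes NGC TensorRT container images, which follow the `nvcr.io/nvidia/tensorrt:<YY.MM>-py3` naming. The in-container parse command is left to the caller, since the right command depends on the TensorRT version. Image tags and the `/models` mount point are placeholders. Variants with OSS components could simply be added as extra image names.

```python
import subprocess


def docker_command(image, model_dir, inner_cmd):
    """Build a `docker run` argv that mounts model_dir read-only at /models."""
    return ["docker", "run", "--rm", "--gpus", "all",
            "-v", f"{model_dir}:/models:ro", image] + list(inner_cmd)


def run_matrix(images, model_dir, inner_cmd):
    """Run inner_cmd in each image; map image -> True if it exited with status 0."""
    results = {}
    for image in images:
        proc = subprocess.run(docker_command(image, model_dir, inner_cmd),
                              capture_output=True, text=True)
        results[image] = proc.returncode == 0
    return results


def summary(results):
    """One 'PASS'/'FAIL' line per image, sorted by image name."""
    return "\n".join(f"{'PASS' if ok else 'FAIL'}  {image}"
                     for image, ok in sorted(results.items()))
```

The sketch keeps only pass/fail per image. Storing `proc.stderr` as well would make it easy to print the parser error for each failing version.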
Instead of several profiles with min==opt==max, create profiles that cover ranges of batch sizes. This makes it easier to migrate from an implicit batch workflow.
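One way to build such ranged profiles is to sort the batch sizes and let each profile cover a range. Each range runs from just above the previous size up to the next one, optimized for its upper bound. The function below only computes the `(min, opt, max)` shapes. The TensorRT-side step would be passing them to `profile.set_shape(name, min=..., opt=..., max=...)` for each input. This range scheme is an assumption, not what the current script does.

```python
def ranged_profile_shapes(batch_sizes, sample_shape):
    """Return one (min, opt, max) shape triple per profile.

    Profile k covers batch sizes (previous upper bound + 1) .. sorted_sizes[k],
    so the ranges tile [1, max(batch_sizes)] without gaps or overlaps.
    """
    profiles = []
    lower = 1
    for upper in sorted(set(batch_sizes)):
        if upper < lower:  # ignore non-positive sizes
            continue
        profiles.append(((lower, *sample_shape),
                         (upper, *sample_shape),
                         (upper, *sample_shape)))
        lower = upper + 1
    return profiles
```

Putting `opt` at the upper bound favors full batches. A midpoint would be an equally reasonable choice if partial batches are common.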
I am trying to run a transformer model on TensorRT; specifically, my model is based on vitb_256_mae. I have already converted it to ONNX format and am trying to run it with the TensorRT runtime. To do that, I wrote a Python script that builds an engine with FP32-to-FP16 conversion, and it works fine. Then, to convert FP32 to INT8, I used your snippet, but I got stuck on some CUDA driver errors.
My model takes template and search as inputs and produces 5 outputs.
In order to create optimization profiles for multiple inputs i tweaked your onnx_to_tensorrt.py a little bit:
def create_optimization_profiles(builder, inputs, batch_sizes=[1,8,16,32,64]):
    # Check if all inputs are fixed explicit batch to create a single profile and avoid duplicates
    if all([inp.shape[0] > -1 for inp in inputs]):
        profile = builder.create_optimization_profile()
        for inp in inputs:
            fbs, shape = inp.shape[0], inp.shape[1:]
            profile.set_shape(inp.name, min=(fbs, *shape), opt=(fbs, *shape), max=(fbs, *shape))
        #print(profile.get_shape("template"))
        #print(profile.get_shape("search"))
        return [profile]
TensorRT Version: 8.6.1 (installed from pip, not built from source)
GPU Type: RTX 3090 Ti
Nvidia Driver Version: nvidia-driver-530
CUDA Version: 12.1
CUDNN Version: not installed
Operating System + Version: Ubuntu 22.04.2 LTS
Python Version (if applicable): 3.10.12
I tried INT8 conversion with one dynamic input using this link:
NVIDIA/TensorRT#289
The code worked for AlexNet. That's why I think the error might be caused by the number of inputs, or I might have to make some changes to the ImagenetCalibrator file.
Thanks for reading, and have a good day!
2023-08-10 11:27:20 - main - INFO - TRT_LOGGER Verbosity: Severity.ERROR
[08/10/2023-11:27:23] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:177: DeprecationWarning: Use set_memory_pool_limit instead.
config.max_workspace_size = 4**30 # 1GiB
2023-08-10 11:27:23 - main - INFO - Setting BuilderFlag.FP16
2023-08-10 11:27:23 - main - INFO - Setting BuilderFlag.INT8
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Collecting calibration files from: /home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/imagenet/val
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Number of Calibration Files found: 10869
2023-08-10 11:27:23 - ImagenetCalibrator - WARNING - Capping number of calibration images to max_calibration_size: 512
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[08/10/2023-11:27:23] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
2023-08-10 11:27:23 - main - DEBUG - === Network Description ===
2023-08-10 11:27:23 - main - DEBUG - Input 0 | Name: template | Shape: (1, 3, 128, 128)
2023-08-10 11:27:23 - main - DEBUG - Input 1 | Name: search | Shape: (1, 3, 256, 256)
2023-08-10 11:27:23 - main - DEBUG - Output 0 | Name: pred_boxes | Shape: (1, 1, 4)
2023-08-10 11:27:23 - main - DEBUG - Output 1 | Name: score_map | Shape: (1, 1, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 2 | Name: size_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 3 | Name: offset_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 4 | Name: backbone_feat | Shape: (1, 320, 768)
2023-08-10 11:27:23 - main - DEBUG - === Optimization Profiles ===
2023-08-10 11:27:23 - main - DEBUG - template - OptProfile 0 - Min (1, 3, 128, 128) Opt (1, 3, 128, 128) Max (1, 3, 128, 128)
2023-08-10 11:27:23 - main - DEBUG - search - OptProfile 0 - Min (1, 3, 256, 256) Opt (1, 3, 256, 256) Max (1, 3, 256, 256)
2023-08-10 11:27:23 - main - INFO - Building Engine...
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:222: DeprecationWarning: Use build_serialized_network instead.
with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
[08/10/2023-11:27:26] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
2023-08-10 11:27:27 - ImagenetCalibrator - INFO - Calibration images pre-processed: 32/512
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cu::absTensorMax::141] Error Code 2: Internal Error (Assertion memory != nullptr failed. memory must be valid if nbElem != 0)
[08/10/2023-11:27:27] [TRT] [E] 1: [executionContext.cpp::executeInternal::1177] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 3: [engine.cpp::~Engine::298] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/engine.cpp::~Engine::298, condition: mExecutionContextCounter.use_count() == 1. Destroying an engine object before destroying the IExecutionContext objects it created leads to undefined behavior.
)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cpp::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
Traceback (most recent call last):
File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 227, in <module>
main()
File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 222, in main
with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
AttributeError: __enter__
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
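The assertion `memory != nullptr ... if nbElem != 0` in `absTensorMax` is consistent with the calibrator not providing a buffer for one of the inputs. `ImagenetCalibrator` was written for single-input classifiers, while this model has two inputs (`template` and `search`). TensorRT calls `get_batch(names)` with the list of input names and expects one device pointer per name, in that order. The log also shows a profile fixed at batch 1 while calibration batches hold 32 images (`32/512`). The calibration batch size would likely need to match the profile too.

The sketch below covers only the host-side part: it assembles one contiguous array per requested name. The class name and the `samples` layout are hypothetical. Copying each array to its own device buffer (e.g. with PyCUDA) is left out.

```python
import numpy as np


class MultiInputBatcher:
    """Hypothetical host-side batcher for a calibrator with several inputs.

    samples: dict mapping input name -> array of shape (N, ...), same N for all.
    """

    def __init__(self, samples, batch_size):
        lengths = {len(v) for v in samples.values()}
        if len(lengths) != 1:
            raise ValueError("all inputs need the same number of samples")
        self.samples = samples
        self.batch_size = batch_size
        self.num_samples = lengths.pop()
        self.index = 0

    def next_batch(self, names):
        """One C-contiguous float32 array per name in `names`, or None when done."""
        if self.index + self.batch_size > self.num_samples:
            return None  # get_batch() signals the end of calibration the same way
        start, stop = self.index, self.index + self.batch_size
        self.index = stop
        return [np.ascontiguousarray(self.samples[n][start:stop], dtype=np.float32)
                for n in names]
```

Inside `get_batch(self, names)`, each returned array would be copied into its own device allocation. The method would then return the list of those device pointers, in the same order as `names`.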
@rmccorm4 Hi Ryan,
You have put a sample resnet50.cache file you in the below link:
https://github.com/rmccorm4/tensorrt-utils/blob/master/int8/calibration/caches/resnet50.cache
There are some unnamed layers at the end of the sample. Why are these layers unnamed?
gpu_0/pool5_1: 3ec805a6
OC2_DUMMY_0: 3d9942c2
(Unnamed Layer* 174) [Constant]_output: 3996cbd8
(Unnamed Layer* 175) [Constant]_output: 3b64f8bc
(Unnamed Layer* 176) [Matrix Multiply]_output: 3e5b5330
gpu_0/pred_1: 3e526e4b
gpu_0/softmax_1: 3afb76dc
Thank you.
CC @mengdong
Your utilities are very helpful; thank you.
https://github.com/rmccorm4/tensorrt-utils/blob/master/int8/calibration/ImagenetCalibrator.py
ImagenetCalibrator.py worked well for quantizing a ResNet50 trained on ImageNet, but not for quantizing a ResNet32 trained on CIFAR-100: the accuracy loss from INT8 quantization with ImagenetCalibrator.py is larger than from FP16.
So I need a Cifar100Calibrator.py for CIFAR-100. Where can I get one, or which part of ImagenetCalibrator.py should I change?
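The dataset-specific parts of an ImageNet-oriented calibrator are the data source and the preprocessing. ImageNet pipelines typically resize or crop to 224x224 and normalize with ImageNet statistics. A CIFAR-100 ResNet usually consumes 32x32 images normalized with the statistics used during its own training.

Below is a sketch of CIFAR-style preprocessing, assuming the training code normalized per channel. Here the mean/std are computed from the calibration images as a stand-in; replace them with the exact values used in training.

```python
import numpy as np


def channel_stats(images_nhwc_uint8):
    """Per-channel mean/std over a batch of HWC uint8 images, in [0, 1] units."""
    x = images_nhwc_uint8.astype(np.float32) / 255.0
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))


def preprocess_cifar(images_nhwc_uint8, mean, std):
    """N x 32 x 32 x 3 uint8 -> N x 3 x 32 x 32 float32, with no resizing or cropping."""
    x = images_nhwc_uint8.astype(np.float32) / 255.0
    x = (x - np.asarray(mean, np.float32)) / np.asarray(std, np.float32)
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))  # NHWC -> NCHW
```

The calibrator's loop over files can stay the same. Its image loading and resizing step would be replaced by something like `preprocess_cifar`.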
Hello, I am trying to run an ONNX model using the TensorRT backend, but I get the following error:
KeyError: 'output1_before_shuffle'
import numpy as np
import onnx
import onnx_tensorrt.backend as backend

model = onnx.load(args.files)
onnx.checker.check_model(model)
input_shapes = [[d.dim_value for d in _input.type.tensor_type.shape.dim] for _input in model.graph.input]
print(input_shapes)
shape = np.ravel(input_shapes)
engine = backend.prepare(model, device='CUDA:0')
input_data = np.random.random(size=(20, shape[1], shape[2], shape[3])).astype(np.float32)
output_data = engine.run(input_data)
Hello,
I'm trying to convert an ONNX model with a dynamic batch size, created from darknet (https://github.com/WongKinYiu/ScaledYOLOv4), to a TensorRT engine. I need to create a calibrated INT8 engine with a static batch size of 2.
I use:
python onnx_to_tensorrt.py -o int8_2.trt.engine --fp16 --int8 --calibration-data /data/ultra/trt/calib_ds -p preprocess_yolo -v --explicit-batch
But I have inference problems with the calibrated engine at batch size 2. The INT8 engine with batch size 1 and the FP16 engine with batch size 2 both work correctly.
Where does this problem come from, and how can I solve it?
TensorRT Version: 8.2
GPU Type: 2080ti
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Hi Ryan,
I am trying to create a calibration file for the ResNet-18 Caffe model. You have mentioned the below statement in another issue:
I have created a reference for INT8 calibration on Imagenet-like data. Hopefully you can use this as a starting point.
However, I don't know how to continue, since this differs from sample.py and calibrator.py in the TensorRT 7.0 repository (tensorrt/samples/python/int8_caffe_mnist/).
Note: I am working on the NVDLA accelerator, and unfortunately its compiler only accepts Caffe models. They have stated that they will add ONNX support in a future release, so I have no choice but to work with Caffe models until then.
Thank you very much.
Hi,
I found this repo very useful for understanding TensorRT INT8 functionality.
However, I don't quite understand how to use --calibration-data, which the README says should point to the ImageNet dataset. As far as I know, ImageNet has train and val folders.
So when we do the calibration, which folder should be used? val, or both?
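A common approach is to calibrate on a few hundred representative images. The script's own help text uses the ImageNet validation set as an example. Note, however, that accuracy measured later on that same split will be slightly optimistic, so drawing the subset from train is a reasonable alternative. Below is a minimal sketch that draws a reproducible random subset of image files from a directory tree. The function name is hypothetical. The 512 default mirrors `--max-calibration-size`.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_calibration_files(root, max_files=512, seed=0):
    """Reproducibly pick up to max_files image paths found recursively under root."""
    files = sorted(p for p in Path(root).rglob("*")
                   if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    random.Random(seed).shuffle(files)  # sorted first, so the shuffle is deterministic
    return files[:max_files]
```

Because `rglob` recurses, this also works on the per-class subfolders of an ImageNet `train` directory.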
Thx,
Lei
The current example is a fixed-shape ResNet50 from the ONNX model zoo.
It doesn't work with the dynamic shape and optimization profile examples.
Consider using the AlexNet example from here instead, and updating the README and tutorial: https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930
RuntimeError: Tried to call pure virtual function "IInt8EntropyCalibrator2_pyimpl::read_calibration_cache"
Steps to reproduce, using the Docker container provided in the README:
python onnx_to_tensorrt.py --onnx resnet50/model.onnx -o resnet50.int8.engine --fp16 --int8 --calibration-data imagenet_calibration/files --max-calibration-size 32
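This error typically means TensorRT tried to call a calibrator method for which it found no Python override. A missing or misspelled `read_calibration_cache` is one example. Below is a sketch of the two cache methods that a `trt.IInt8EntropyCalibrator2` subclass is expected to define. It is written as a mixin so it can be shown without TensorRT installed. The class name and the `cache_file` attribute are assumptions.

```python
import os


class CalibrationCacheMixin:
    """Cache handling for a TensorRT calibrator, e.g.
    class MyCalibrator(CalibrationCacheMixin, trt.IInt8EntropyCalibrator2): ...
    (MyCalibrator.__init__ must still call trt.IInt8EntropyCalibrator2.__init__(self)).
    """

    cache_file = "calibration.cache"

    def read_calibration_cache(self):
        # Returning None tells TensorRT there is no cache, so it calibrates from scratch.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        # `cache` is the buffer TensorRT hands back once calibration finishes.
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

Placing the mixin first in the base list lets Python's method resolution find these methods before the TensorRT base class.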
It's possible to edit an ONNX graph and change certain shapes, such as batch size, to -1 to create a dynamic shape model from an existing fixed shape model.
However, this isn't guaranteed to work in all cases, especially if the model contains hard-coded shapes in an intermediate layer (for example, a fixed target shape in a Reshape node).
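The graph edit itself is small with the `onnx` Python package. Setting `dim_param` on a dimension marks it as symbolic; because it shares a protobuf `oneof` with `dim_value`, the fixed value is cleared. The sketch below touches only graph inputs and outputs, and skips initializers that older models also list as inputs. Hard-coded shapes inside the graph are exactly what it does not handle, so comparing the two models' outputs is still worthwhile. The function name is hypothetical.

```python
def make_batch_dim_dynamic(model, name="batch"):
    """Mark dim 0 of every graph input and output of an onnx.ModelProto as symbolic."""
    initializer_names = {init.name for init in model.graph.initializer}
    for value_info in list(model.graph.input) + list(model.graph.output):
        if value_info.name in initializer_names:
            continue  # weights listed as inputs (older IR versions) keep their shapes
        dims = value_info.type.tensor_type.shape.dim
        if len(dims) > 0:
            dims[0].dim_param = name  # clears dim_value (same protobuf oneof)
    return model
```

Typical usage would be `onnx.save(make_batch_dim_dynamic(onnx.load(path_in)), path_out)`, with placeholder paths.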
It would be nice to have a script that compares the output of the fixed shape and dynamic shape models for the same input shape, as well as for a larger batch size of identical inputs to make sure that the output for the whole batch is computed correctly.
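Below is a framework-agnostic sketch of that check. `run_fixed` and `run_dynamic` are hypothetical callables that take one NumPy input and return one NumPy output; they could wrap ONNX Runtime sessions or TensorRT engines, for example. The check first compares both models at the fixed batch size. It then feeds the dynamic model `batch` copies of the same input and checks that every row of the batched output matches. The tolerances are illustrative and would need loosening for FP16 or INT8 engines.

```python
import numpy as np


def check_dynamic_batch(run_fixed, run_dynamic, sample, batch=8, rtol=1e-3, atol=1e-4):
    """sample: input with the fixed model's batch dimension (normally 1).

    True only if (1) both models agree on `sample` and (2) a batch of identical
    copies produces the same output for every element of the batch.
    """
    ref = run_fixed(sample)
    if not np.allclose(run_dynamic(sample), ref, rtol=rtol, atol=atol):
        return False
    repeated = np.concatenate([sample] * batch, axis=0)
    out = run_dynamic(repeated)
    if out.shape[0] != batch * sample.shape[0]:
        return False
    expected = np.concatenate([ref] * batch, axis=0)
    return bool(np.allclose(out, expected, rtol=rtol, atol=atol))
```

Identical inputs make the expected batched output trivial to construct. They will not catch bugs that mix data across batch elements, which would need distinct inputs compared one by one.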