rmccorm4 / tensorrt-utils

⚡ Useful scripts when using TensorRT

License: Apache License 2.0

Shell 5.51% Python 71.18% Makefile 2.09% C++ 21.21%

tensorrt imagenet optimization gpu nvidia nvidia-docker docker python

tensorrt-utils's Introduction

TensorRT Utils

This repository contains useful scripts and code references I use or encounter when working with TensorRT.

The master branch is currently targeted at TensorRT 7.1+ (NGC 20.06+).

For earlier TensorRT versions, please see the other tags.

tensorrt-utils's Issues

Is the inference for int8 the same as for fp16?

Hello, thank you for your work!

Do I have a problem with int8-engine inference this way? I loaded the engine file directly, but didn't use the calibration table, I'm a bit y

Infinity scale in calibration cache file

Hello,

Thanks for this great repo. I used the provided script to convert a searched network (ONNX model) to an INT8 engine with INT8 calibration. The ONNX model works fine, but the output of the TRT engine is wrong. When I checked the calibration cache file, I found that many of the quantization scales are "Infinity" and the others are very large. I converted the cache file to a JSON file; it looks like the attached image below. Any hint is much appreciated.

Environment

TensorRT Version: 7.0.0.11
GPU Type: Tesla T4
Nvidia Driver Version: 418.67
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: Debian 9.11
Python Version (if applicable): 3.7.4
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): 1.4.0
Baremetal or Container (if container which image + tag): An internal Docker image
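
For anyone who wants to check a cache for such values without converting it to JSON first, here is a minimal sketch. It assumes that each cache line has the form "tensor_name: hexword" and that the hex word is a big-endian IEEE 754 float32; the helper names, the header line, and the sample tensor names are made up for illustration.

```python
import math
import struct


def decode_scale(hex_word):
    """Decode one cache value, assuming a big-endian IEEE 754 float32 in hex."""
    return struct.unpack("!f", bytes.fromhex(hex_word))[0]


def find_suspect_scales(cache_text, limit=1e4):
    """Return {tensor_name: scale} for scales that are non-finite or above `limit`."""
    suspects = {}
    # The first line is treated as the cache header and skipped.
    for line in cache_text.splitlines()[1:]:
        name, sep, hex_word = line.rpartition(": ")
        if not sep:
            continue  # not a "name: value" line
        scale = decode_scale(hex_word.strip())
        if not math.isfinite(scale) or abs(scale) > limit:
            suspects[name] = scale
    return suspects


# Hypothetical cache contents, for illustration only.
sample_cache = "\n".join([
    "TRT-7000-EntropyCalibration2",
    "input: 3c010a14",
    "conv1_out: 7f800000",  # 0x7f800000 is +Infinity in float32
    "fc_out: 3ec805a6",
])
print(find_suspect_scales(sample_cache))
```

The `limit` threshold is arbitrary; what counts as "very large" depends on the activation ranges of the model.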

How to do inference with the saved int8 trt engine file?

I'm wondering how to do inference with the saved int8 trt engine file. Is the inference process just the same as for a normal engine?

infer_tensorrt_imagenet.py: Not Predicting The Correct Classes

Hi @rmccorm4 ,

I am using the infer_tensorrt_imagenet.py file to infer the images of imagenet with the int8 engine created by TensorRT.

Here is the way that I am using the code:

python3 infer_tensorrt_imagenet.py --engine resnet18.int8.engine \
    -d /home/hassan/Datasets/ImageNet/ -b 1 -n 5

Here are the results:

[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use execute without batch size instead.
Allocating buffers ...
Input image: /home/hassan/Datasets/ImageNet/ILSVRC2012_val_00000025.JPEG
Prediction: wall clock Probability: 0.09
Prediction: matchstick Probability: 0.06
Prediction: switch Probability: 0.03
Prediction: screw Probability: 0.03
Prediction: envelope Probability: 0.02

[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use execute without batch size instead.
Allocating buffers ...
Input image: /home/hassan/Datasets/ImageNet/ILSVRC2012_val_00000073.JPEG
Prediction: wall clock Probability: 0.06
Prediction: matchstick Probability: 0.06
Prediction: switch Probability: 0.04
Prediction: screw Probability: 0.03
Prediction: envelope Probability: 0.02

[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use execute without batch size instead.
Allocating buffers ...
Input image: /home/hassan/Datasets/ImageNet/ILSVRC2012_val_00000117.JPEG
Prediction: wing Probability: 0.07
Prediction: spotlight Probability: 0.02
Prediction: lampshade Probability: 0.02
Prediction: matchstick Probability: 0.02
Prediction: wall clock Probability: 0.02

[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use execute without batch size instead.
Allocating buffers ...
Input image: /home/hassan/Datasets/ImageNet/ILSVRC2012_val_00000113.JPEG
Prediction: wall clock Probability: 0.06
Prediction: envelope Probability: 0.06
Prediction: matchstick Probability: 0.04
Prediction: lampshade Probability: 0.03
Prediction: refrigerator Probability: 0.03

There are certain classes popping up (like wall clock). Can you kindly help?

Thank you.

Add ONNX checker by default, and maybe onnx-simplifier as option?

int8 calibration for batch > 1

Hi @rmccorm4,
I would like to ask some advice on int8 calibration. I've had no trouble building explicit batch engines where the batch > 1 with fp16 and I've managed to get int8 explicit batch engines built where the batch = 1. However, int8 calibration seems to not work for batch > 1. It calibrates without errors or failures, and my demo app runs without errors so it's getting hard to debug. Do you have any advice? I've tried building the cache with batch = 1 and then using that to build an engine of batch > 1, and it seemed to work but I haven't been able to replicate that particular result.

Get wrong result using infer_tensorrt_imagenet.py for multi-batch inference

Description

Hi rmccorm4! I used your code to add optimization profiles for a ResNet101 ONNX model that has a dynamic batch dimension, and built an INT8 engine. But when I run inference with infer_tensorrt_imagenet.py, I only get correct results with batch size 1. For other batch sizes, most of the output probabilities are 0.

I know that I should add something to the runtime code, and I followed https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#runtime_dimensions, but I couldn't figure it out. Could you show me how to do multi-batch inference? Thank you!

Environment

TensorRT Version: TensorRT 7.2.1
GPU Type: P4
CUDA Version: 11.0
CUDNN Version: 8.0.5
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.6.9

Relevant Files

I added code to get the input shape with batch size = 32:

# Run inference.
context.active_optimization_profile = 3
context_inputshape = context.get_binding_shape(0)            
context.set_binding_shape(0, context_inputshape)
context.execute(batch_size, dbindings)

I still get wrong results like this:
Input image: /user/z00590385/imagenet_test_dataset/val/n01728572/ILSVRC2012_val_00003569.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n09246464/ILSVRC2012_val_00024591.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n02009912/ILSVRC2012_val_00024166.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n04399382/ILSVRC2012_val_00025599.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n02948072/ILSVRC2012_val_00007580.JPEG
Prediction: drake Probability: 1.88
Prediction: sea slug Probability: 1.88
Prediction: sea anemone Probability: 1.00
Input image: /user/z00590385/imagenet_test_dataset/val/n04254777/ILSVRC2012_val_00026534.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n07684084/ILSVRC2012_val_00046840.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n02666196/ILSVRC2012_val_00030552.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n03207743/ILSVRC2012_val_00044279.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n03133878/ILSVRC2012_val_00031187.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
Input image: /user/z00590385/imagenet_test_dataset/val/n07715103/ILSVRC2012_val_00032107.JPEG
Prediction: toilet tissue Probability: 0.00
Prediction: sea urchin Probability: 0.00
Prediction: hog Probability: 0.00
.......

Steps To Reproduce

Convert ONNX to TRT, choosing INT8 mode.
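
One thing that may be worth checking in the snippet in this issue: in TensorRT versions of this era, an engine built with several optimization profiles exposes one set of bindings per profile, so the binding indices for profile k are offset by k times the number of bindings per profile. The snippet selects profile 3 but still addresses binding 0, and it reads the binding shape and writes the same value back, which does not set the batch dimension to 32. The sketch below only shows the index arithmetic in plain Python; the helper names, the 1-input/1-output layout, and the (32, 3, 224, 224) shape are assumptions for illustration, not taken from the reporter's model.

```python
def per_profile_bindings(num_bindings, num_profiles):
    """Number of bindings that belong to each optimization profile."""
    if num_profiles < 1 or num_bindings % num_profiles != 0:
        raise ValueError("num_bindings must be a multiple of num_profiles")
    return num_bindings // num_profiles


def binding_index(profile_index, binding, bindings_per_profile):
    """Engine-wide index of `binding` when `profile_index` is the active profile."""
    if not 0 <= binding < bindings_per_profile:
        raise ValueError("binding must be in [0, bindings_per_profile)")
    return profile_index * bindings_per_profile + binding


# Assumed layout: 1 input + 1 output per profile, 4 profiles, so 8 bindings in total.
n = per_profile_bindings(num_bindings=8, num_profiles=4)
input_idx = binding_index(3, 0, n)
output_idx = binding_index(3, 1, n)
print(input_idx, output_idx)

# With a real execution context this would look roughly like (not run here):
#   context.active_optimization_profile = 3
#   context.set_binding_shape(input_idx, (32, 3, 224, 224))  # assumed input shape
#   context.execute_v2(bindings)  # `bindings` has one slot per engine binding
```

The device pointers for the active profile then go into the slots `input_idx` and `output_idx` of the bindings list, not into slots 0 and 1.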

Description

PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered

Environment

TensorRT Version: 8.4.5.1
GPU Type: NVIDIA TITAN X
Nvidia Driver Version:
CUDA Version: 11.7
CUDNN Version:
Operating System + Version: Linux
Python Version (if applicable): 3.8.5

WARNING: Missing dynamic range for tensor 1316, expect fall back to non-int8 implementation for any layer consuming or producing given tensor

Hi, I ran YOLOv4 in INT8 mode and hit this problem.

AttributeError: __enter__ when running int8 compilation

I am getting this error while compiling for int8. The same code works fine for FP32 and FP16

[05/31/2022-05:46:12] [TRT] [E] 1: Unexpected exception 
Traceback (most recent call last):
  File "./onnx_to_tensorrt.py", line 218, in <module>
    main()
  File "./onnx_to_tensorrt.py", line 213, in main
    with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
AttributeError: __enter__
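
One plausible reading of this traceback is that `builder.build_engine(network, config)` returned `None` because the INT8 build failed (the "Unexpected exception" line above), and `with None as engine` then fails because `None` has no `__enter__`. A minimal sketch of a guard that surfaces the real failure earlier; `require_engine` and `failing_build` are hypothetical names, and the stand-in function replaces a real TensorRT builder call:

```python
def require_engine(engine):
    """Return `engine`, or raise a descriptive error if the build returned None."""
    if engine is None:
        raise RuntimeError(
            "Engine build failed (build returned None); "
            "see the TensorRT log above for the underlying error."
        )
    return engine


def failing_build():
    """Stand-in for a builder call that fails and returns None."""
    return None


try:
    require_engine(failing_build())
except RuntimeError as err:
    print(err)
```

In the script this would mean calling the builder first, passing the result through the guard, and only then entering the `with` block or serializing the engine.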

int8 engine for DETECTOR models (onnx)

Description

I have seen examples where classification models can be run on TensorRT in INT8 mode. But can you be specific about what I should do to calibrate and produce an INT8 engine for DETECTOR models (ONNX)?

Environment

TensorRT Version:7
GPU Type: T4
Nvidia Driver Version: 440
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: 18
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

How to save the calibration.bin files?

Description

Hi, @rmccorm4
Currently, I'm trying to generate an INT8 TRT engine with calibration, like this:
calibrator = Calibrator(data_loader=calib_data(), cache="identity-calib.cache")
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("identity.onnx"),
    config=CreateConfig(int8=True, calibrator=calibrator)
)
But I am really confused about the mechanism:

When is calibration performed? Within the 'EngineFromNetwork' process? I tried to set a breakpoint at calibration::get_batch(), but it did not work.
How do I get the calibration.bin file? I have tried to call the function "write_calibration_cache(self, cache)", but I don't know which 'cache' to pass in.

Please help me with these two problems, Thanks a lot.

TensorRt 7.0 python API can't infer shape of IResizeLayer

Description

I used the TensorRT Python API to parse an ONNX model exported from PyTorch. The model has only one Resize node, but the TensorRT parser outputs a two-input IResizeLayer. One of the inputs is a constant layer with shape (4). Also, the output shape of the IResizeLayer is (-1, -1, -1, -1).

The structure of the model:

However, this ONNX model works fine with the "trtexec --explicitBatch --workspace=128 --onnx=optimized_model.onnx" command.
model link: https://hkustconnect-my.sharepoint.com/:u:/g/personal/ycchanau_connect_ust_hk/ERJqH7chlY9FquRU4A1XIncBi4_QxCEd8wllrLn5WGqGDw?e=qzvgBf

Environment

TensorRT Version: 7.0
GPU Type: T4
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4
Baremetal or Container (if container which image + tag): TensorRT Release 20.02

Relevant Files

Steps To Reproduce

I parse the model with the following code, using the command "python test_tensorrt.py --explicit-batch -v --explicit-precision":

import pycuda.autoinit
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt
#import torch
import os
import time
#from PIL import Image
#import cv2
#import torchvision
import sys
import glob
import math
import logging
import argparse


TRT_LOGGER = trt.Logger()
logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S")
logger = logging.getLogger(__name__)

def add_profiles(config, inputs, opt_profiles):
    logger.debug("=== Optimization Profiles ===")
    for i, profile in enumerate(opt_profiles):
        for inp in inputs:
            _min, _opt, _max = profile.get_shape(inp.name)
            logger.debug("{} - OptProfile {} - Min {} Opt {} Max {}".format(inp.name, i, _min, _opt, _max))
        config.add_optimization_profile(profile)

def mark_outputs(network):
    # Mark last layer's outputs if not already marked
    # NOTE: This may not be correct in all cases
    last_layer = network.get_layer(network.num_layers-1)
    if not last_layer.num_outputs:
        logger.error("Last layer contains no outputs.")
        return

    for i in range(last_layer.num_outputs):
        network.mark_output(last_layer.get_output(i))


def check_network(network):
    if not network.num_outputs:
        logger.warning("No output nodes found, marking last layer's outputs as network outputs. Correct this if wrong.")
        mark_outputs(network)
    
    inputs = [network.get_input(i) for i in range(network.num_inputs)]
    outputs = [network.get_output(i) for i in range(network.num_outputs)]
    max_len = max([len(inp.name) for inp in inputs] + [len(out.name) for out in outputs])

    logger.debug("=== Network Description ===")
    for i, inp in enumerate(inputs):
        logger.debug("Input  {0} | Name: {1:{2}} | Shape: {3}".format(i, inp.name, max_len, inp.shape))
    for i, out in enumerate(outputs):
        logger.debug("Output {0} | Name: {1:{2}} | Shape: {3}".format(i, out.name, max_len, out.shape))


def get_batch_sizes(max_batch_size):
    # Returns powers of 2, up to and including max_batch_size
    max_exponent = math.log2(max_batch_size)
    for i in range(int(max_exponent)+1):
        batch_size = 2**i
        yield batch_size
    
    if max_batch_size != batch_size:
        yield max_batch_size


# TODO: This only covers dynamic shape for batch size, not dynamic shape for other dimensions
def create_optimization_profiles(builder, inputs, batch_sizes=[1,4,8]): 
    # Check if all inputs are fixed explicit batch to create a single profile and avoid duplicates
    if all([inp.shape[0] > -1 for inp in inputs]):
        profile = builder.create_optimization_profile()
        for inp in inputs:
            fbs, shape = inp.shape[0], inp.shape[1:]
            profile.set_shape(inp.name, min=(fbs, *shape), opt=(fbs, *shape), max=(fbs, *shape))
        return [profile]
    
    # Otherwise for mixed fixed+dynamic explicit batch inputs, create several profiles
    profiles = {}
    for bs in batch_sizes:
        if not profiles.get(bs):
            profiles[bs] = builder.create_optimization_profile()

        for inp in inputs: 
            shape = inp.shape[1:]
            # Check if fixed explicit batch
            if inp.shape[0] > -1:
                bs = inp.shape[0]

            profiles[bs].set_shape(inp.name, min=(bs, *shape), opt=(bs, *shape), max=(bs, *shape))

    return list(profiles.values())

def get_engine(args):
    # Network flags
    network_flags = 0
    if args.explicit_batch:
        network_flags |= 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    if args.explicit_precision:
        network_flags |= 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)

    builder_flag_map = {
            'gpu_fallback': trt.BuilderFlag.GPU_FALLBACK,
            'refittable': trt.BuilderFlag.REFIT,
            'debug': trt.BuilderFlag.DEBUG,
            'strict_types': trt.BuilderFlag.STRICT_TYPES,
            'fp16': trt.BuilderFlag.FP16,
            'int8': trt.BuilderFlag.INT8,
    }

    # Building engine
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(network_flags) as network, \
         builder.create_builder_config() as config, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
            
        config.max_workspace_size = 2**27 # 128 MiB

        # Set Builder Config Flags
        for flag in builder_flag_map:
            if getattr(args, flag):
                logger.info("Setting {}".format(builder_flag_map[flag]))
                config.set_flag(builder_flag_map[flag])

        if args.fp16 and not builder.platform_has_fast_fp16:
            logger.warning("FP16 not supported on this platform.")

        if args.int8 and not builder.platform_has_fast_int8:
            logger.warning("INT8 not supported on this platform.")
        '''
        if args.int8:
            config.int8_calibrator = get_int8_calibrator(args.calibration_cache,
                                                         args.calibration_data,
                                                         args.max_calibration_size,
                                                         args.preprocess_func,
                                                         args.calibration_batch_size)
        '''

        # Fill network atrributes with information by parsing model
        with open(args.onnx, "rb") as f:
            if not parser.parse(f.read()):
                print('ERROR: Failed to parse the ONNX file: {}'.format(args.onnx))
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                sys.exit(1)

        # Display network info and check certain properties
        check_network(network)
        if args.explicit_batch:
            # Add optimization profiles
            batch_sizes = [1, 4]
            inputs = [network.get_input(i) for i in range(network.num_inputs)]
            opt_profiles = create_optimization_profiles(builder, inputs, batch_sizes)
            add_profiles(config, inputs, opt_profiles)
        # Implicit Batch Network
        else:
            builder.max_batch_size = args.max_batch_size

        logger.info("Building Engine...")
        with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
            logger.info("Serializing engine to file: {:}".format(args.output))
            f.write(engine.serialize())

def main():
    parser = argparse.ArgumentParser(description="Creates a TensorRT engine from the provided ONNX file.\n")
    parser.add_argument("--onnx", type=str, default="model.onnx", help="The ONNX model file to convert to TensorRT")
    parser.add_argument("-o", "--output", type=str, default="model.engine", help="The path at which to write the engine")
    parser.add_argument("-b", "--max-batch-size", type=int, default=1, help="The max batch size for the TensorRT engine input")
    parser.add_argument("-v", "--verbosity", action="count", help="Verbosity for logging. (None) for ERROR, (-v) for INFO/WARNING/ERROR, (-vv) for VERBOSE.")
    parser.add_argument("--explicit-batch", action='store_true', help="Set trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH.")
    parser.add_argument("--explicit-precision", action='store_true', help="Set trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION.")
    parser.add_argument("--gpu-fallback", action='store_true', help="Set trt.BuilderFlag.GPU_FALLBACK.")
    parser.add_argument("--refittable", action='store_true', help="Set trt.BuilderFlag.REFIT.")
    parser.add_argument("--debug", action='store_true', help="Set trt.BuilderFlag.DEBUG.")
    parser.add_argument("--strict-types", action='store_true', help="Set trt.BuilderFlag.STRICT_TYPES.")
    parser.add_argument("--fp16", action="store_true", help="Attempt to use FP16 kernels when possible.")
    parser.add_argument("--int8", action="store_true", help="Attempt to use INT8 kernels when possible. This should generally be used in addition to the --fp16 flag. \
                                                             ONLY SUPPORTS RESNET-LIKE MODELS SUCH AS RESNET50/VGG16/INCEPTION/etc.")
    parser.add_argument("--calibration-cache", help="(INT8 ONLY) The path to read/write from calibration cache.", default="calibration.cache")
    parser.add_argument("--calibration-data", help="(INT8 ONLY) The directory containing {*.jpg, *.jpeg, *.png} files to use for calibration. (ex: Imagenet Validation Set)", default=None)
    parser.add_argument("--calibration-batch-size", help="(INT8 ONLY) The batch size to use during calibration.", type=int, default=32)
    parser.add_argument("--max-calibration-size", help="(INT8 ONLY) The max number of data to calibrate on from --calibration-data.", type=int, default=512)
    parser.add_argument("-p", "--preprocess_func", type=str, default=None, help="(INT8 ONLY) Function defined in 'processing.py' to use for pre-processing calibration data.")
    args, _ = parser.parse_known_args()

    if args.verbosity is None:
        TRT_LOGGER.min_severity = trt.Logger.Severity.ERROR
    # -v
    elif args.verbosity == 1:
        TRT_LOGGER.min_severity = trt.Logger.Severity.INFO
    # -vv
    else:
        TRT_LOGGER.min_severity = trt.Logger.Severity.VERBOSE
    logger.info("TRT_LOGGER Verbosity: {:}".format(TRT_LOGGER.min_severity))


    base_size=32
    max_batch_size=1
    shape_of_output = (max_batch_size, 3, 128, 128)

    image=np.random.randn(max_batch_size,3,base_size,base_size)
    image = np.expand_dims(image, axis=0)

    engine = get_engine(args)

if __name__ == "__main__":
    main()
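
The `get_batch_sizes` helper in the script above is defined but never called. As a quick illustration of what it yields, here is the same generator run on its own (plain Python, no TensorRT needed); the inputs 12, 8, and 1 are arbitrary example values.

```python
import math


def get_batch_sizes(max_batch_size):
    """Yield powers of 2 up to max_batch_size, then max_batch_size itself if it is not one."""
    max_exponent = math.log2(max_batch_size)
    for i in range(int(max_exponent) + 1):
        batch_size = 2**i
        yield batch_size

    # Append the maximum when it is not already a power of 2.
    if max_batch_size != batch_size:
        yield max_batch_size


for max_bs in (12, 8, 1):
    print(max_bs, list(get_batch_sizes(max_bs)))
```

A list produced this way could be passed as the `batch_sizes` argument of `create_optimization_profiles` instead of the hard-coded `[1, 4]`.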

int8 calibration build engine fails with RuntimeError: Driver error:

Description

When I used your code to convert an efficientdet-d0 model from ONNX to TRT with INT8 calibration, I got the error RuntimeError: Driver error:

Environment

TensorRT Version: 7.1.3.1
GPU Type: GTX 1660
Nvidia Driver Version: 460.39
CUDA Version: 10.2
CUDNN Version: 7.6.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6

Relevant Files

Steps To Reproduce

[TensorRT] WARNING: TensorRT was linked against cuDNN 8.0.0 but loaded cuDNN 7.6.3
[TensorRT] WARNING: TensorRT was linked against cuBLAS 10.2.2 but loaded cuBLAS 10.2.1
Traceback (most recent call last):
  File "onnx2trt.py", line 216, in <module>
    build_trt_engine(args)
  File "onnx2trt.py", line 171, in build_trt_engine
    engine = builder.build_engine(network, config)
RuntimeError: Driver error:

Add tool to iterate through TRT versions with/without OSS to parse models

Add a tool to launch several Docker containers and iterate through common TRT versions (6, 7, etc.) with/without OSS components.

Try to parse the given onnx model without each, and print a summary of what passed/failed at the end

Change default opt profile behavior to be more like implicit batch

Instead of several profiles with min == opt == max, create profiles that cover ranges of batch sizes, to make it easier to migrate from an implicit batch workflow.
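
One possible interpretation of this proposal, sketched in plain Python: turn a list of batch sizes into contiguous (min, opt, max) ranges so that every batch size from 1 up to the largest is covered by some profile. The function name `batch_ranges` and the choice of opt == max are assumptions made for illustration, not the repository's behavior.

```python
def batch_ranges(batch_sizes):
    """Turn batch sizes such as [1, 4, 8] into contiguous (min, opt, max) ranges."""
    ranges = []
    lower = 1
    for bs in sorted(set(batch_sizes)):
        if bs < 1:
            raise ValueError("batch sizes must be positive")
        # Each range starts right after the previous one and is tuned for its upper end.
        ranges.append((lower, bs, bs))
        lower = bs + 1
    return ranges


for lo, opt, hi in batch_ranges([1, 4, 8]):
    print("min={} opt={} max={}".format(lo, opt, hi))

# Each tuple would then feed one profile, along the lines of:
#   profile.set_shape(inp.name, min=(lo, *shape), opt=(opt, *shape), max=(hi, *shape))
```

With [1, 4, 8] this covers batch sizes 1 through 8, which is closer to what an implicit batch engine with a max batch size of 8 accepts.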

TensorRT FP32 to INT8 conversion error on a model that has 2 inputs

Description

I am trying to run a transformer model on TensorRT. To be more specific, my model is vitb_256_mae based. I have already converted it to the ONNX file format, and I am trying to run it on the TensorRT runtime. To do that, I wrote a Python script that creates an engine for FP32 to FP16 conversion, and it works fine for this situation. After that, to convert FP32 to INT8, I used your snippet, but I got stuck with some CUDA driver errors.
My model takes template and search as inputs, and produces 5 outputs.

Important changes

In order to create optimization profiles for multiple inputs, I tweaked your onnx_to_tensorrt.py a little bit:

def create_optimization_profiles(builder, inputs, batch_sizes=[1,8,16,32,64]): 
    # Check if all inputs are fixed explicit batch to create a single profile and avoid duplicates
    if all([inp.shape[0] > -1 for inp in inputs]):
        profile = builder.create_optimization_profile()
        for inp in inputs:
            fbs, shape = inp.shape[0], inp.shape[1:]
            profile.set_shape(inp.name, min=(fbs, *shape), opt=(fbs, *shape), max=(fbs, *shape))
        #print(profile.get_shape("template"))
        #print(profile.get_shape("search"))
        return [profile]

Environment

TensorRT Version: 8.6.1 (from pip, not built from source)
GPU Type: RTX 3090 Ti
Nvidia Driver Version: nvidia-driver-530
CUDA Version: 12.1
CUDNN Version: don't have
Operating System + Version: Ubuntu 22.04.2 LTS
Python Version (if applicable): 3.10.12

More info

I tried INT8 conversion with 1 dynamic input using this link:
NVIDIA/TensorRT#289
For AlexNet the code worked. That's why I think the error might be caused by the number of inputs, or I might have to make some changes to the ImagenetCalibrator file.

Thanks for reading, good day!

Output errors

2023-08-10 11:27:20 - __main__ - INFO - TRT_LOGGER Verbosity: Severity.ERROR
[08/10/2023-11:27:23] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:177: DeprecationWarning: Use set_memory_pool_limit instead.
config.max_workspace_size = 4**30 # 1GiB
2023-08-10 11:27:23 - __main__ - INFO - Setting BuilderFlag.FP16
2023-08-10 11:27:23 - __main__ - INFO - Setting BuilderFlag.INT8
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Collecting calibration files from: /home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/imagenet/val
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Number of Calibration Files found: 10869
2023-08-10 11:27:23 - ImagenetCalibrator - WARNING - Capping number of calibration images to max_calibration_size: 512
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[08/10/2023-11:27:23] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
2023-08-10 11:27:23 - __main__ - DEBUG - === Network Description ===
2023-08-10 11:27:23 - __main__ - DEBUG - Input 0 | Name: template | Shape: (1, 3, 128, 128)
2023-08-10 11:27:23 - __main__ - DEBUG - Input 1 | Name: search | Shape: (1, 3, 256, 256)
2023-08-10 11:27:23 - __main__ - DEBUG - Output 0 | Name: pred_boxes | Shape: (1, 1, 4)
2023-08-10 11:27:23 - __main__ - DEBUG - Output 1 | Name: score_map | Shape: (1, 1, 16, 16)
2023-08-10 11:27:23 - __main__ - DEBUG - Output 2 | Name: size_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - __main__ - DEBUG - Output 3 | Name: offset_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - __main__ - DEBUG - Output 4 | Name: backbone_feat | Shape: (1, 320, 768)
2023-08-10 11:27:23 - __main__ - DEBUG - === Optimization Profiles ===
2023-08-10 11:27:23 - __main__ - DEBUG - template - OptProfile 0 - Min (1, 3, 128, 128) Opt (1, 3, 128, 128) Max (1, 3, 128, 128)
2023-08-10 11:27:23 - __main__ - DEBUG - search - OptProfile 0 - Min (1, 3, 256, 256) Opt (1, 3, 256, 256) Max (1, 3, 256, 256)
2023-08-10 11:27:23 - __main__ - INFO - Building Engine...
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:222: DeprecationWarning: Use build_serialized_network instead.
with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
[08/10/2023-11:27:26] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
2023-08-10 11:27:27 - ImagenetCalibrator - INFO - Calibration images pre-processed: 32/512
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cu::absTensorMax::141] Error Code 2: Internal Error (Assertion memory != nullptr failed. memory must be valid if nbElem != 0)
[08/10/2023-11:27:27] [TRT] [E] 1: [executionContext.cpp::executeInternal::1177] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 3: [engine.cpp::~Engine::298] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/engine.cpp::~Engine::298, condition: mExecutionContextCounter.use_count() == 1. Destroying an engine object before destroying the IExecutionContext objects it created leads to undefined behavior.
)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cpp::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
Traceback (most recent call last):
  File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 227, in <module>
    main()
  File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 222, in main
    with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
AttributeError: __enter__
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
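
A side note on the log above: the deprecation warning quotes the line `config.max_workspace_size = 4**30 # 1GiB`, and the arithmetic there is worth double-checking, since 4**30 is not 1 GiB. The short sketch below only compares the byte counts; it does not claim that this value is what causes the calibration failure. `format_bytes` is a helper written for this illustration.

```python
def format_bytes(n):
    """Render a byte count using power-of-two units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n /= 1024
        i += 1
    return "{:g} {}".format(n, units[i])


candidates = {
    "2**27": 2**27,      # value used in the earlier test script
    "1 << 30": 1 << 30,  # 1 GiB
    "4**30": 4**30,      # equals 2**60
}
for expr, value in candidates.items():
    print("{:8} = {}".format(expr, format_bytes(value)))

# With the newer API named in the deprecation warning, 1 GiB would be set roughly as:
#   config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
```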

Unnamed Layers in Sample ResNet-50 Cache File

@rmccorm4 Hi Ryan,

You have put a sample resnet50.cache file at the link below:

https://github.com/rmccorm4/tensorrt-utils/blob/master/int8/calibration/caches/resnet50.cache

There are some unnamed layers at the end of the sample. Why are these layers unnamed?

gpu_0/pool5_1: 3ec805a6
OC2_DUMMY_0: 3d9942c2
(Unnamed Layer* 174) [Constant]_output: 3996cbd8
(Unnamed Layer* 175) [Constant]_output: 3b64f8bc
(Unnamed Layer* 176) [Matrix Multiply]_output: 3e5b5330
gpu_0/pred_1: 3e526e4b
gpu_0/softmax_1: 3afb76dc

Thank you.
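
For reading entries like the ones quoted above: each line of a calibration cache pairs a tensor name with a hexadecimal value, which is commonly understood to be the big-endian IEEE 754 float32 encoding of that tensor's scale. A minimal sketch of decoding under that assumption follows; `decode_cache_line` is a hypothetical helper name.

```python
import struct


def decode_cache_line(line):
    """Split 'tensor name: hex' and decode the hex as a big-endian float32."""
    # rsplit keeps names that themselves contain spaces, brackets, or colons intact.
    name, hex_scale = line.rsplit(":", 1)
    scale = struct.unpack("!f", bytes.fromhex(hex_scale.strip()))[0]
    return name.strip(), scale


sample = """gpu_0/pool5_1: 3ec805a6
(Unnamed Layer* 176) [Matrix Multiply]_output: 3e5b5330"""

for line in sample.splitlines():
    name, scale = decode_cache_line(line)
    print(f"{name}: {scale:.6g}")
```

Names of the form `(Unnamed Layer* N) [Type]_output` appear to be the default names TensorRT gives to tensors produced by layers that were created without an explicit name.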

Add TRTIS dynamic shape engine example/doc

CC @mengdong

The indentation may be excessive

tensorrt-utils/int8/calibration/onnx_to_tensorrt.py

Line 89 in 2c49b84

return [profile]

Your utils are very helpful; please accept my appreciation.

Cifar100 calibrator

https://github.com/rmccorm4/tensorrt-utils/blob/master/int8/calibration/ImagenetCalibrator.py

ImagenetCalibrator.py worked well when quantizing a ResNet50 trained on ImageNet, but it did not work well when quantizing a ResNet32 trained on Cifar100 (with ImagenetCalibrator.py, the int8 quantization comes out bigger than the fp16 quantization).

So I need a Cifar100Calibrator.py for Cifar100. Where can I get one, or which part of ImagenetCalibrator.py should I change?
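
A calibrator's preprocessing generally needs to match what the model saw during training, so the likely parts to adapt are the input size and the normalization. A minimal sketch follows, assuming 32x32 RGB inputs and commonly quoted CIFAR-100 channel statistics; the constants and the `preprocess_cifar100` name are assumptions and should be replaced with whatever the ResNet32 was actually trained with.

```python
# Assumed per-channel statistics (RGB, for pixel values scaled to [0, 1]).
CIFAR100_MEAN = (0.5071, 0.4865, 0.4409)
CIFAR100_STD = (0.2673, 0.2564, 0.2762)


def preprocess_cifar100(image_hwc):
    """Convert an H x W x 3 image of 0-255 values to normalized C x H x W floats."""
    return [
        [
            [(pixel[c] / 255.0 - CIFAR100_MEAN[c]) / CIFAR100_STD[c] for pixel in row]
            for row in image_hwc
        ]
        for c in range(3)
    ]


# A uniform grey 32x32 image as a stand-in for a real CIFAR-100 sample.
image = [[(128, 128, 128)] * 32 for _ in range(32)]
chw = preprocess_cifar100(image)
print(len(chw), len(chw[0]), len(chw[0][0]))
```

In practice the same transform would be written with NumPy and applied to a whole batch before the data is copied to the device.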

Vague error.

Hello, I am trying to run an ONNX model using the TensorRT backend, but I get the following error:
KeyError: 'output1_before_shuffle'

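import numpy as np
import onnx
import onnx_tensorrt.backend as backend

# args.files is the path to the ONNX model (parsed elsewhere in the script)
model = onnx.load(args.files)
onnx.checker.check_model(model)
# Static dimensions of every graph input (dynamic dimensions read as 0)
input_shapes = [[d.dim_value for d in _input.type.tensor_type.shape.dim] for _input in model.graph.input]
print(input_shapes)
shape = np.ravel(input_shapes)
engine = backend.prepare(model, device='CUDA:0')
# Random batch of 20 samples using the model's remaining input dimensions
input_data = np.random.random(size=(20, shape[1], shape[2], shape[3])).astype(np.float32)
output_data = engine.run(input_data)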

converting scaled yolov4-scp from onnx to tensorrt

Hello

I'm trying to convert an ONNX model with dynamic batch size, created from darknet (https://github.com/WongKinYiu/ScaledYOLOv4),
to a TensorRT engine. I need to create a calibrated int8 engine with static batch size 2.

I use
python onnx_to_tensorrt.py -o int8_2.trt.engine --fp16 --int8 --calibration-data /data/ultra/trt/calib_ds -p preprocess_yolo -v --explicit-batch

But I have inference problems with the calibrated engine at batch size 2. The int8 engine with batch size 1 and the float16 engine with batch size 2 work correctly.

Where does this problem come from, and how can I solve it?

Output of inference with an engine converted using "onnx_to_tensorrt.py"

How to build an engine with dynamic shape to execute INT8 calibration

Description

Environment

TensorRT Version: 8.2
GPU Type: 2080ti
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Generating int8 Calibration File for ResNet-18 Caffe Model

Hi Ryan,

I am trying to create a calibration file for the ResNet-18 Caffe model. You have mentioned the below statement in another issue:

I have created a reference for INT8 calibration on Imagenet-like data. Hopefully you can use this as a starting point.

However, I do not know how to continue, since this is different from sample.py and calibrator.py in the TensorRT 7.0 repository (tensorrt/samples/python/int8_caffe_mnist/).

Note: I am working on the NVDLA accelerator, and unfortunately its compiler only accepts Caffe models. They have stated that they are going to add ONNX support in a future release, so I have no choice but to work with Caffe models until then.

Thank you very much.

End-to-end simple plugin example

Some questions about int8 quant

Hi,

I found this repo very useful for understanding TensorRT's int8 functions.
However, I don't quite understand the usage of calibration-data, which the README says should point to the ImageNet dataset. As far as I know, ImageNet has train and val folders.

So when we do the calibration, which folder should be used: val, or both?

Thx,
Lei
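
On the question of which images to use: TensorRT's documentation suggests that calibration data should be representative of the inputs seen at inference time, and reports that roughly 500 images are enough for ImageNet classification networks, so a random subset is usually sufficient. A minimal sketch of drawing such a subset from a directory tree follows; the function name, file extensions, and default count are illustrative assumptions, not part of this repo.

```python
import os
import random


def sample_calibration_files(root, count=500, extensions=(".jpeg", ".jpg", ".png"), seed=0):
    """Return up to `count` image paths found under `root`, chosen reproducibly."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(extensions):
                paths.append(os.path.join(dirpath, filename))
    # Sort first so the sample does not depend on filesystem listing order.
    paths.sort()
    rng = random.Random(seed)
    return rng.sample(paths, min(count, len(paths)))
```

Fixing the seed keeps the calibration set stable between runs, which makes it easier to compare the resulting calibration caches.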

Add an ONNX model with dynamic shape for the example

Current example is a fixed-shape Resnet50 from ONNX model zoo.

This doesn't work with dynamic shape and optimization profile examples.

Maybe use Alexnet example from here instead and update README and tutorial: https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930

calibration with files get error

Description

RuntimeError: Tried to call pure virtual function "IInt8EntropyCalibrator2_pyimpl::read_calibration_cache"

Environment

I use the Docker image provided in the README.

Relevant Files

Steps To Reproduce

python onnx_to_tensorrt.py --onnx resnet50/model.onnx -o resnet50.int8.engine --fp16 --int8 --calibration-data imagenet_calibration/files --max-calibration-size 32
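
This error message typically indicates that the calibrator subclass does not define `read_calibration_cache`, which TensorRT calls before calibration begins; whether that is the cause here is an assumption, since the calibrator code is not shown. A minimal sketch of the two cache methods follows, written as a plain mixin so that it runs without TensorRT; `CalibrationCacheMixin` and `cache_file` are hypothetical names.

```python
import os


class CalibrationCacheMixin:
    """Cache handling that a calibrator class could reuse (hypothetical mixin)."""

    def __init__(self, cache_file="calibration.cache"):
        self.cache_file = cache_file

    def read_calibration_cache(self):
        # Returning None signals that no cache exists and calibration must run.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        # `cache` is a bytes-like object holding the computed scales.
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

In a real calibrator these methods would sit on the class deriving from `trt.IInt8EntropyCalibrator2`, alongside `get_batch_size` and `get_batch`.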

Script to validate converting Fixed -> Dynamic shape ONNX model

It's possible to edit an ONNX graph and change certain shapes, such as batch size, to -1 to create a dynamic shape model from an existing fixed shape model.

However, this isn't guaranteed to work in all cases, especially if a model contains any hard-coded shapes in an intermediate layer or something.

It would be nice to have a script that compares the output of the fixed shape and dynamic shape models for the same input shape, as well as for a larger batch size of identical inputs to make sure that the output for the whole batch is computed correctly.
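
The comparison described above could be sketched as follows. `run_fixed` and `run_dynamic` are hypothetical callables that take a batch (a list of inputs) and return one flat list of floats per input; in practice they would wrap inference on the fixed-shape and dynamic-shape models. The tolerances are arbitrary assumptions.

```python
import math


def outputs_match(a, b, rel_tol=1e-4, abs_tol=1e-5):
    """True if two flat output vectors agree element-wise within tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


def validate_dynamic_model(run_fixed, run_dynamic, sample, fixed_batch=1, larger_batch=4):
    """Check the dynamic model against the fixed one, then on a larger batch."""
    # 1) Same input shape: both models should produce the same outputs.
    batch = [sample] * fixed_batch
    reference = run_fixed(batch)
    same_shape = run_dynamic(batch)
    if len(same_shape) != len(reference):
        return False
    if not all(outputs_match(r, d) for r, d in zip(reference, same_shape)):
        return False
    # 2) Larger batch of identical inputs: every row should equal the reference row.
    big = run_dynamic([sample] * larger_batch)
    return len(big) == larger_batch and all(
        outputs_match(reference[0], row) for row in big
    )
```

The second check is the one that catches a hard-coded batch dimension, since such a model often returns correct values only for the first item in the batch.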
