Comments (7)
Can you share the full reproducer that quantizes the model so I can reproduce it?
from onnxruntime.
Sure @yihonglyu, here's a minimal example that reproduces it for me:
Data reader:
import numpy as np
import onnxruntime
import os
import random
from PIL import Image
from tqdm import tqdm
from onnxruntime.quantization import CalibrationDataReader

class RandomCalibrationDataReader(CalibrationDataReader):
    def __init__(self, model_path, none1, limit=10):
        self.model_path = model_path
        self.limit = limit
        self.index = 0
        # Initialize ONNX runtime session to get input shape.
        self.session = onnxruntime.InferenceSession(model_path, providers=['CPUExecutionProvider'])
        self.input_shape = self.session.get_inputs()[0].shape
        self.target_size = (640, 640)  # Assuming the target size
        self.datasize = limit

    def get_next(self):
        if self.index < self.datasize:
            self.index += 1
            return {self.session.get_inputs()[0].name: np.random.random(self.input_shape).astype(np.float32)}

    def rewind(self):
        self.index = 0
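One caveat about the reader above: it passes `self.session.get_inputs()[0].shape` straight to `np.random.random`, which only works when every dimension is a concrete integer. If a model were exported with dynamic axes, ONNX Runtime would report those dimensions as strings or `None`. The helper below is a minimal sketch of a guard for that case; `resolve_input_shape` and its `fallback` argument are hypothetical names, not part of ONNX Runtime, and the 640x640 fallback simply mirrors the `target_size` assumed in the reader.

```python
from typing import List, Sequence, Union

# A dimension as reported by ONNX Runtime: an int, a symbolic name, or None.
Dim = Union[int, str, None]


def resolve_input_shape(shape: Sequence[Dim], fallback: Sequence[int]) -> List[int]:
    """Replace symbolic or unknown dimensions with values from `fallback`.

    Hypothetical helper: concrete positive ints are kept, anything else
    (a string such as "batch", or None) is taken from `fallback`.
    """
    if len(shape) != len(fallback):
        raise ValueError(
            f"rank mismatch: model input has {len(shape)} dims, fallback has {len(fallback)}"
        )
    return [
        dim if isinstance(dim, int) and dim > 0 else default
        for dim, default in zip(shape, fallback)
    ]


# A fully static shape is returned unchanged.
static = resolve_input_shape([1, 3, 640, 640], (1, 3, 640, 640))
# Symbolic dimensions fall back to the assumed 1x3x640x640 input.
dynamic = resolve_input_shape(["batch", 3, "height", "width"], (1, 3, 640, 640))
print(static, dynamic)
```

In the reader, the result could replace `self.input_shape` before it is handed to `np.random.random`.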
Quantization:
from ultralytics import YOLO
import sys
import os
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize
from onnxruntime.quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model
from onnxruntime.quantization.shape_inference import quant_pre_process
from utils.data_reader import CalibrationDataReader, RandomCalibrationDataReader
model_name = 'yolov8x.pt'
model = YOLO(model_name)
model.export(format='onnx')
input_model_path = model_name.replace('.pt', '.onnx')
# Quantization
data_reader = RandomCalibrationDataReader(input_model_path, '.', limit=200)
preproc_model_path = 'model.preproc.onnx'
quant_pre_process(input_model_path, preproc_model_path, skip_optimization=False)
model_changed = qnn_preprocess_model(preproc_model_path, preproc_model_path)
print(f'Model changed? {model_changed}')
model_to_quantize = preproc_model_path if model_changed else input_model_path
print(f'Model to quantize: {model_to_quantize}')
qnn_config = get_qnn_qdq_config(model_to_quantize,
                                data_reader,
                                activation_type=QuantType.QUInt8,
                                weight_type=QuantType.QUInt8,
                                per_channel=False,
                                activation_symmetric=True,
                                weight_symmetric=True)
output_model_path = 'model.qdq.onnx'
quantize(model_to_quantize, output_model_path, qnn_config)
def test_model(model_path, input_data):
    print(input_data['images'].shape)
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
    outputs = session.run(None, input_data)
    return outputs[0]
# Initialize the data reader for the validation dataset
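# (RandomCalibrationDataReader is used here; the CalibrationDataReader base class is abstract)
validation_data_reader = RandomCalibrationDataReader(input_model_path, '.', limit=10)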
# Accumulate errors
errors = []
# Loop through all data provided by the data reader
while True:
    input_data = validation_data_reader.get_next()
    if input_data is None:
        break  # End of data
    orig_outputs = test_model(input_model_path, input_data)
    quant_outputs = test_model(output_model_path, input_data)
    # Compute absolute error for the current batch and store it
    batch_error = np.abs(orig_outputs - quant_outputs)
    errors.append(batch_error)
# Compute the mean of all errors
if errors:
    avg_abs_error = np.mean(np.concatenate(errors))  # Concatenate to handle multiple batches
    print(f'Average absolute error per output: {avg_abs_error}')
else:
    print("No data available to compute error.")
The error happens during inference after quantization:
2024-06-26 15:03:03.367426450 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running QLinearConcat node. Name:'/model.12/Concat' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 1 has mismatched dimensions of 40 and 39
---------------------------------------------------------------------------
Fail Traceback (most recent call last)
Cell In[43], line 14
11 break # End of data
13 orig_outputs = test_model(input_model_path, input_data)
---> 14 quant_outputs = test_model(output_model_path, input_data)
16 # Compute absolute error for the current batch and store it
17 batch_error = np.abs(orig_outputs - quant_outputs)
Cell In[42], line 4, in test_model(model_path, input_data)
2 print(input_data['images'].shape)
3 session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])
----> 4 outputs = session.run(None, input_data)
5 return outputs[0]
File ~/anaconda3/envs/yolo/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220, in Session.run(self, output_names, input_feed, run_options)
218 output_names = [output.name for output in self._outputs_meta]
219 try:
--> 220 return self._sess.run(output_names, input_feed, run_options)
221 except C.EPFail as err:
222 if self._enable_fallback:
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running QLinearConcat node. Name:'/model.12/Concat' Status Message: concat.cc:154 PrepareForCompute Non concat axis dimensions must match: Axis 1 has mismatched dimensions of 40 and 39
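For readers unfamiliar with the message: Concat requires all inputs to agree on every dimension except the concat axis. The sketch below illustrates that rule in plain Python. It is not ONNX Runtime's implementation, `check_concat_shapes` is a hypothetical name, and the shapes are made up only so that the resulting message mirrors the one in the log; the real tensor shapes at `/model.12/Concat` are not given in this thread.

```python
from typing import Sequence


def check_concat_shapes(shapes: Sequence[Sequence[int]], axis: int) -> None:
    """Raise ValueError if inputs disagree on any axis other than `axis`."""
    reference = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(reference):
            raise ValueError(f"Rank mismatch: {len(reference)} and {len(shape)}")
        for dim, (expected, actual) in enumerate(zip(reference, shape)):
            # Only the concat axis itself is allowed to differ between inputs.
            if dim != axis and expected != actual:
                raise ValueError(
                    f"Axis {dim} has mismatched dimensions of {expected} and {actual}"
                )


# Hypothetical shapes: the two inputs differ on axis 1, which is not the concat axis.
try:
    check_concat_shapes([(1, 40, 40, 640), (1, 39, 39, 320)], axis=3)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

A check like this, run on the shapes feeding the failing node, would show which input carries the unexpected dimension.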
Let me know if you are able to reproduce or have issues running this!
Could you share the model for the reproducer, too? Thanks
When you said, "regression in onnxruntime functionality", do you mean it used to work before?
Yes, I have a model that I previously quantized with ORT successfully but I don't remember which versions of ultralytics/ort/onnx I used. I'm trying to reproduce it now.
@HectorSVC @yihonglyu Ok, I've been able to reproduce. This is the issue I get with the latest versions of ORT:
Traceback (most recent call last):
  File "onnx_session.py", line 13, in <module>
    session = onnxruntime.InferenceSession(sys.argv[1], sess_options=options, providers=["QNNExecutionProvider"], provider_options=[{"backend_path": "/root/qnn/lib/libQnnHtp.so"}])
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node 'Conv_token_423' OpType:Conv with domain:com.ms.internal.nhwc was inserted using the NHWC format as requested by QNNExecutionProvider, but was not selected by that EP. This means the graph is now invalid as there will not be an EP able to run the node. This could be a bug in layout transformer, or in the GetCapability implementation of the EP.
With ORT 1.17, it runs fine. When running with more recent versions, I get the error. Potentially related (but different op?): #16462
Note that this is when I exclude the last conv layer from quantization.
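The thread does not show how the last conv layer was excluded. One possible approach is to pick the node name from the graph and pass it through the `nodes_to_exclude` option that `quantize_static` and `StaticQuantConfig` accept. The sketch below covers only the selection step in plain Python; `last_node_of_type` is a hypothetical helper, and apart from `/model.12/Concat` (taken from the log above) the node names are invented for illustration.

```python
from typing import List, Optional, Tuple


def last_node_of_type(nodes: List[Tuple[str, str]], op_type: str = "Conv") -> Optional[str]:
    """Return the name of the last node with the given op type, or None.

    `nodes` is a list of (name, op_type) pairs in graph order.
    """
    for name, node_op in reversed(nodes):
        if node_op == op_type:
            return name
    return None


# Hypothetical node list. With the onnx package it could be built as
# [(n.name, n.op_type) for n in onnx.load(path).graph.node].
nodes = [
    ("/model.0/conv/Conv", "Conv"),
    ("/model.12/Concat", "Concat"),
    ("/model.22/dfl/conv/Conv", "Conv"),
    ("/model.22/Concat_5", "Concat"),
]

last_conv = last_node_of_type(nodes)
# Candidate value for a nodes_to_exclude list (empty if no Conv node exists).
to_exclude = [last_conv] if last_conv is not None else []
print(to_exclude)
```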