Comments (24)
Hi @nevoGoldamn, could you give some hints about which part of the code you think should be updated?
Hi @OpenVINO-dev-contest, I know the tensor shape is what it is, but does it make sense for inference with yolov7-tiny (trained on 640x640) to take 400 ms? When I run detect.py (from the yolov7 repository), inference takes 8 ms.
Did you run detect.py on a dGPU? This repository only works with Intel CPUs and iGPUs.
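For reference, OpenVINO selects the execution device with a plain device string at compile time, so switching between the CPU and the integrated GPU is a one-line change. A minimal sketch (the model path is a placeholder; "CPU" and "GPU" are the standard OpenVINO device names):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // "CPU" targets the Intel CPU plugin; "GPU" targets the Intel iGPU.
    // NVIDIA dGPUs are not a standard OpenVINO device, which is why this
    // repository only supports Intel CPUs and iGPUs.
    ov::CompiledModel compiled = core.compile_model("yolov7-tiny.onnx", "GPU");
    ov::InferRequest request = compiled.create_infer_request();
    (void)request;  // inference itself omitted in this sketch
}
```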
Both on iGPU.
I don't see that detect.py can support iGPU:
https://github.com/WongKinYiu/yolov7/blob/main/detect.py#L173
May I know the full command you used to run detect.py?
OK, I retested it on CPU.
This is the output from detect.py,
and this is the output from the C++ code (in seconds).
Same model, same CPU...
May I know the full command you used to run detect.py?
python detect.py --weights best_costum.pt --conf 0.2 --img-size 640 --source test7.avi --no-trace
I also changed the code to device = select_device('cpu') # opt.device
How do you measure the time cost in the C++ example? Does it include the total detection process or just the inference time?
```cpp
double start, end, res;
start = cv::getTickCount();
//cv::cvtColor(boxed, img, cv::COLOR_BGR2RGB);
// -------- Step 5. Prepare input --------
//ov::Tensor input_tensor1 = infer_request.get_input_tensor(0);
// NHWC => NCHW
boxed.convertTo(boxed, CV_32FC3);
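// Note: convertTo only changes the element type (u8 -> f32); it does not
// reorder the HWC memory layout, so the NHWC->NCHW permute implied by the
// comment above is presumably handled elsewhere in the preprocessing.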
//ov::Tensor input_tensor(input_port.get_element_type(), input_port.get_shape(), data1);
ov::Tensor input_tensor(input_port.get_element_type(), input_port.get_shape(), (float*)boxed.data);
infer_request.set_input_tensor(input_tensor);
// -------- Step 6. Start inference --------
infer_request.infer();
// -------- Step 7. Process output --------
//auto output_tensor_p8 = infer_request.get_output_tensor(0);
//const float* result_p8 = output_tensor_p8.data<const float>();
//auto output_tensor_p16 = infer_request.get_output_tensor(1);
//const float* result_p16 = output_tensor_p16.data<const float>();
auto output_tensor_p32 = infer_request.get_output_tensor(2);
const float* result_p32 = output_tensor_p32.data<const float>();
std::vector<Object> proposals;
std::vector<Object> objects8;
std::vector<Object> objects16;
std::vector<Object> objects32;
std::vector<Object> objects;
//generate_proposals(8, result_p8, prob_threshold, objects8);
//proposals.insert(proposals.end(), objects8.begin(), objects8.end());
//generate_proposals(16, result_p16, prob_threshold, objects16);
//proposals.insert(proposals.end(), objects16.begin(), objects16.end());
generate_proposals(32, result_p32, prob_threshold, objects32);
proposals.insert(proposals.end(), objects32.begin(), objects32.end());
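// Only the stride-32 head is decoded here; the stride-8 and stride-16 heads
// above are commented out, so detections from those scales never reach NMS.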
std::vector<int> classIds;
std::vector<float> confidences;
std::vector<cv::Rect> boxes;
for (size_t i = 0; i < proposals.size(); i++)
{
classIds.push_back(proposals[i].label);
confidences.push_back(proposals[i].prob);
boxes.push_back(proposals[i].rect);
}
std::vector<int> picked;
// apply non-maximum suppression to the bounding boxes
cv::dnn::NMSBoxes(boxes, confidences, prob_threshold, nms_threshold, picked);
float raw_h = src_img.rows;
float raw_w = src_img.cols;
float ratio_x = (float)raw_w / img_w;
float ratio_y = (float)raw_h / img_h;
end = cv::getTickCount();
for (size_t i = 0; i < picked.size(); i++)
{
int idx = picked[i];
cv::Rect box = boxes[idx];
cv::Rect scaled_box = scale_box(box, padd);
drawPred(classIds[idx], confidences[idx], scaled_box, padd[2], raw_h, raw_w, src_img, class_names);
}
res = (end - start) / cv::getTickFrequency();
cout << "time of output --> " << res;
```
All the preprocessing is not included in the time calculation.
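Note that the measured span above also covers tensor preparation, output decoding, and NMS, since end is only taken after cv::dnn::NMSBoxes. To get a pure inference number for comparison, one option is to time only infer_request.infer() after a warm-up call. A minimal sketch under those assumptions (the model path is a placeholder, and a single f32 input is assumed):

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    ov::CompiledModel compiled = core.compile_model("yolov7-tiny.onnx", "CPU");
    ov::InferRequest infer_request = compiled.create_infer_request();

    // Fill the input once, outside the timed region.
    ov::Tensor input = infer_request.get_input_tensor();
    std::fill_n(input.data<float>(), input.get_size(), 0.5f);

    // Warm-up: the first infer() call carries one-time initialization cost.
    infer_request.infer();

    auto t0 = std::chrono::steady_clock::now();
    infer_request.infer();
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "pure inference: "
              << std::chrono::duration<double, std::milli>(t1 - t0).count()
              << " ms\n";
}
```

If the number from this isolated call is close to what benchmark_app reports, any remaining gap in the full pipeline points at pre/post-processing rather than the model itself.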
Thanks, I see.
Did you measure the time cost in detect.py for the same part of the code?
It seems that detect.py does the same thing I do in the C++ code... I don't get why there's such a huge time difference.
Hi, I implemented your method to measure the time cost:
time of output --> 0.0252228
Can you also try the command benchmark_app -m ./yolov7-tiny.onnx to test the model's pure performance?
On what CPU?
"Can you also try command benchmark_app -m ./yolov7-tiny.onnx to test the model's pure performance ?"
Where do I run that command?
Can you show how you got the result "time of output --> 0.0252228"?
I'm on an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz, and I used the same method as you did to measure the time.
Regarding where to run the command: you can install the tool with pip install openvino-dev, then run benchmark_app in a terminal.
```
Microsoft Windows [Version 10.0.19044.2486]
(c) Microsoft Corporation. All rights reserved.
C:\Users\Nevo>cd Desktop
C:\Users\Nevo\Desktop>benchmark_app -m best_costum.onnx
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 968.16 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,3,80,80,11]
[ INFO ] 1387 (node: 1387) : f32 / [...] / [1,3,40,40,11]
[ INFO ] 1401 (node: 1401) : f32 / [...] / [1,3,20,20,11]
[ INFO ] 1415 (node: 1415) : f32 / [...] / [1,3,10,10,11]
[ INFO ] 1350 (node: 1350) : f32 / [...] / [1,320,80,80]
[ INFO ] 1353 (node: 1353) : f32 / [...] / [1,640,40,40]
[ INFO ] 1356 (node: 1356) : f32 / [...] / [1,960,20,20]
[ INFO ] 1359 (node: 1359) : f32 / [...] / [1,1280,10,10]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,3,80,80,11]
[ INFO ] 1387 (node: 1387) : f32 / [...] / [1,3,40,40,11]
[ INFO ] 1401 (node: 1401) : f32 / [...] / [1,3,20,20,11]
[ INFO ] 1415 (node: 1415) : f32 / [...] / [1,3,10,10,11]
[ INFO ] 1350 (node: 1350) : f32 / [...] / [1,320,80,80]
[ INFO ] 1353 (node: 1353) : f32 / [...] / [1,640,40,40]
[ INFO ] 1356 (node: 1356) : f32 / [...] / [1,960,20,20]
[ INFO ] 1359 (node: 1359) : f32 / [...] / [1,1280,10,10]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 1407.88 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch-jit-export
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ] NUM_STREAMS: 4
[ INFO ] AFFINITY: Affinity.NONE
[ INFO ] INFERENCE_NUM_THREADS: 8
[ INFO ] PERF_COUNT: False
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 1067.40 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 96 iterations
[ INFO ] Duration: 63219.74 ms
[ INFO ] Latency:
[ INFO ] Median: 2594.64 ms
[ INFO ] Average: 2625.73 ms
[ INFO ] Min: 2238.78 ms
[ INFO ] Max: 3206.64 ms
[ INFO ] Throughput: 1.52 FPS
```
What can you infer from that?
P.S. Thank you for the help.
Maybe it's something with the conversion to ONNX?
You can get the baseline performance of your model from benchmark_app on your device, which here is about 1 s for each inference with OpenVINO. Is your model yolov7 or yolov7-tiny in this case?
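A quick sanity check on the log above: 96 iterations in 63.2 s is about 1.52 FPS; with 4 infer requests in flight, the expected per-request latency is roughly 4 / 1.52 ≈ 2.6 s, which matches the reported median and works out to about 0.66 s of compute per frame.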
I think it's yolov7-tiny, but I'll re-check. Also, I have the same CPU as yours; what could be the reason for this gap in inference time?
Maybe your CPU is busy with other applications? It seems the model you fed into benchmark_app is full YOLOv7; 1 s for the tiny model would be too slow.
I run an SSD model at 40 ms; can you explain that? Same CPU, same situation...
Sorry, I have no idea. Maybe you could test the SSD model with benchmark_app as well and see whether the performance gap makes sense to you.