
Comments (24)

OpenVINO-dev-contest commented on July 21, 2024

Hi @nevoGoldamn, could you give some hints on which part of the code you think should be updated?

nevoGoldamn commented on July 21, 2024

Hi @OpenVINO-dev-contest, I know the tensor shape is what it is, but is it reasonable for inference on yolov7-tiny (trained at 640x640) to take 400 ms? When I run detect.py (from the yolov7 repository), it runs inference in 8 ms.

OpenVINO-dev-contest commented on July 21, 2024

Did you run detect.py on a dGPU? This repository only works on Intel CPUs and iGPUs.

nevoGoldamn commented on July 21, 2024

both on iGPU

OpenVINO-dev-contest commented on July 21, 2024

I don't see that detect.py supports iGPU:
https://github.com/WongKinYiu/yolov7/blob/main/detect.py#L173

May I know the full command you used to run detect.py?

nevoGoldamn commented on July 21, 2024

OK, I retested it on CPU.

[screenshot]

This is the output after detect.py.

[screenshot]

And this is the output after the C++ code (in seconds).

Same model, same CPU...

OpenVINO-dev-contest commented on July 21, 2024

May I know the full command you used to run detect.py?

nevoGoldamn commented on July 21, 2024

`python detect.py --weights best_costum.pt --conf 0.2 --img-size 640 --source test7.avi --no-trace`

I also changed the code to `device = select_device('cpu')  # opt.device` so that detect.py runs on the CPU.

OpenVINO-dev-contest commented on July 21, 2024

How do you measure the time cost in the C++ example? Does it include the whole detection process or just inference time?

nevoGoldamn commented on July 21, 2024
```cpp
double start, end, res;
start = cv::getTickCount();

//cv::cvtColor(boxed, img, cv::COLOR_BGR2RGB);

// -------- Step 5. Prepare input --------
//ov::Tensor input_tensor1 = infer_request.get_input_tensor(0);
// NHWC => NCHW

boxed.convertTo(boxed, CV_32FC3);
//ov::Tensor input_tensor(input_port.get_element_type(), input_port.get_shape(), data1);
ov::Tensor input_tensor(input_port.get_element_type(), input_port.get_shape(), (float*)boxed.data);
infer_request.set_input_tensor(input_tensor);
// -------- Step 6. Start inference --------
infer_request.infer();

// -------- Step 7. Process output --------
// Note: only the stride-32 head is decoded here; the stride-8 and
// stride-16 outputs are commented out.
//auto output_tensor_p8 = infer_request.get_output_tensor(0);
//const float* result_p8 = output_tensor_p8.data<const float>();
//auto output_tensor_p16 = infer_request.get_output_tensor(1);
//const float* result_p16 = output_tensor_p16.data<const float>();
auto output_tensor_p32 = infer_request.get_output_tensor(2);
const float* result_p32 = output_tensor_p32.data<const float>();

std::vector<Object> proposals;
std::vector<Object> objects8;
std::vector<Object> objects16;
std::vector<Object> objects32;
std::vector<Object> objects;

//generate_proposals(8, result_p8, prob_threshold, objects8);
//proposals.insert(proposals.end(), objects8.begin(), objects8.end());
//generate_proposals(16, result_p16, prob_threshold, objects16);
//proposals.insert(proposals.end(), objects16.begin(), objects16.end());
generate_proposals(32, result_p32, prob_threshold, objects32);
proposals.insert(proposals.end(), objects32.begin(), objects32.end());

std::vector<int> classIds;
std::vector<float> confidences;
std::vector<cv::Rect> boxes;

for (size_t i = 0; i < proposals.size(); i++)
{
    classIds.push_back(proposals[i].label);
    confidences.push_back(proposals[i].prob);
    boxes.push_back(proposals[i].rect);
}

std::vector<int> picked;

// run non-maximum suppression on the candidate boxes
cv::dnn::NMSBoxes(boxes, confidences, prob_threshold, nms_threshold, picked);

float raw_h = src_img.rows;
float raw_w = src_img.cols;
float ratio_x = (float)raw_w / img_w;  // unused in this excerpt
float ratio_y = (float)raw_h / img_h;  // unused in this excerpt
end = cv::getTickCount();

for (size_t i = 0; i < picked.size(); i++)
{
    int idx = picked[i];
    cv::Rect box = boxes[idx];
    cv::Rect scaled_box = scale_box(box, padd);
    drawPred(classIds[idx], confidences[idx], scaled_box, padd[2], raw_h, raw_w, src_img, class_names);
}

res = (end - start) / cv::getTickFrequency();
cout << "time of output --> " << res;
```

All the preprocessing is not included in the time calculation.
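For comparison, one way to time only the infer() call itself is a minimal sketch with std::chrono, reusing the variable names from the snippet above (so `infer_request` is assumed to be an already-created request on a compiled model):

```cpp
#include <chrono>
#include <iostream>

// Time only the synchronous inference call, excluding the
// convertTo() input preparation and the proposal decoding/NMS.
auto t0 = std::chrono::steady_clock::now();
infer_request.infer();
auto t1 = std::chrono::steady_clock::now();
double infer_ms =
    std::chrono::duration<double, std::milli>(t1 - t0).count();
std::cout << "pure inference time --> " << infer_ms << " ms\n";
```

Note that the timed region in the snippet above starts before boxed.convertTo() and ends after NMS, so it measures slightly more than pure inference time.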

OpenVINO-dev-contest commented on July 21, 2024

Thanks, I see.
Did you measure the time cost of the same part of the code in detect.py?

nevoGoldamn commented on July 21, 2024

It seems detect.py does the same thing as my C++ code... I don't get why there's such a huge time difference.

OpenVINO-dev-contest commented on July 21, 2024

Hi, I implemented your method to measure the time cost:
time of output --> 0.0252228

Can you also try the command `benchmark_app -m ./yolov7-tiny.onnx` to test the model's pure performance?

nevoGoldamn commented on July 21, 2024

On what CPU?

> Can you also try the command `benchmark_app -m ./yolov7-tiny.onnx` to test the model's pure performance?

Where do I run this command?

nevoGoldamn commented on July 21, 2024

Can you show how you got the result "time of output --> 0.0252228"?

OpenVINO-dev-contest commented on July 21, 2024

> [quotes the same timing code and note from the comment above]

I'm on an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz and used the same method as you did to measure the time.

OpenVINO-dev-contest commented on July 21, 2024

> On what CPU? Where do I run this command?

You can install the tool with `pip install openvino-dev`, then run the `benchmark_app` command above in a terminal.

nevoGoldamn commented on July 21, 2024

```
Microsoft Windows [Version 10.0.19044.2486]
(c) Microsoft Corporation. All rights reserved.

C:\Users\Nevo>cd Desktop

C:\Users\Nevo\Desktop>benchmark_app -m best_costum.onnx
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 968.16 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ] images (node: images) : f32 / [...] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,3,80,80,11]
[ INFO ] 1387 (node: 1387) : f32 / [...] / [1,3,40,40,11]
[ INFO ] 1401 (node: 1401) : f32 / [...] / [1,3,20,20,11]
[ INFO ] 1415 (node: 1415) : f32 / [...] / [1,3,10,10,11]
[ INFO ] 1350 (node: 1350) : f32 / [...] / [1,320,80,80]
[ INFO ] 1353 (node: 1353) : f32 / [...] / [1,640,40,40]
[ INFO ] 1356 (node: 1356) : f32 / [...] / [1,960,20,20]
[ INFO ] 1359 (node: 1359) : f32 / [...] / [1,1280,10,10]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ] images (node: images) : u8 / [N,C,H,W] / [1,3,640,640]
[ INFO ] Model outputs:
[ INFO ] output (node: output) : f32 / [...] / [1,3,80,80,11]
[ INFO ] 1387 (node: 1387) : f32 / [...] / [1,3,40,40,11]
[ INFO ] 1401 (node: 1401) : f32 / [...] / [1,3,20,20,11]
[ INFO ] 1415 (node: 1415) : f32 / [...] / [1,3,10,10,11]
[ INFO ] 1350 (node: 1350) : f32 / [...] / [1,320,80,80]
[ INFO ] 1353 (node: 1353) : f32 / [...] / [1,640,40,40]
[ INFO ] 1356 (node: 1356) : f32 / [...] / [1,960,20,20]
[ INFO ] 1359 (node: 1359) : f32 / [...] / [1,1280,10,10]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 1407.88 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch-jit-export
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ] NUM_STREAMS: 4
[ INFO ] AFFINITY: Affinity.NONE
[ INFO ] INFERENCE_NUM_THREADS: 8
[ INFO ] PERF_COUNT: False
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'images'!. This input will be filled with random values!
[ INFO ] Fill input 'images' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 1067.40 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 96 iterations
[ INFO ] Duration: 63219.74 ms
[ INFO ] Latency:
[ INFO ] Median: 2594.64 ms
[ INFO ] Average: 2625.73 ms
[ INFO ] Min: 2238.78 ms
[ INFO ] Max: 3206.64 ms
[ INFO ] Throughput: 1.52 FPS
```

What can you infer from that?
P.S. Thank you for the help.
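A quick consistency check on those numbers: the run used the THROUGHPUT hint with 4 parallel infer requests, so a median per-request latency of ~2.6 s implies a throughput of roughly 4 / 2.6 s ≈ 1.5 FPS, which matches the reported 1.52 FPS. The report is internally consistent; it simply says each inference of this model takes on the order of seconds on this machine.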

nevoGoldamn commented on July 21, 2024

Maybe it's something with the conversion to ONNX?

OpenVINO-dev-contest commented on July 21, 2024

You can get a baseline performance figure for your model from benchmark_app on your device, which here is about 1 s per inference with OpenVINO. Is your model yolov7 or yolov7-tiny in this case?

nevoGoldamn commented on July 21, 2024

I think it's yolov7-tiny, but I'll re-check. Also, I have the same CPU as yours; what could be the reason for this gap in inference time?

OpenVINO-dev-contest commented on July 21, 2024

Maybe your CPU is busy with other applications? It seems the model you fed into benchmark_app is full YOLOv7; 1 s for Tiny is too slow.
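One more thing worth ruling out: the benchmark_app run above defaulted to the THROUGHPUT hint (4 streams, 4 infer requests), which inflates per-request latency. If single-image latency is the goal, compiling with the LATENCY hint is worth trying. A minimal sketch against the OpenVINO 2022.x C++ API (the model path is illustrative, not from the repository):

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    auto model = core.read_model("yolov7-tiny.onnx");  // illustrative path

    // LATENCY hint: optimize for single-request response time
    // instead of aggregate throughput.
    auto compiled = core.compile_model(
        model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));

    ov::InferRequest request = compiled.create_infer_request();
    // ... set the input tensor here, then:
    request.infer();
    std::cout << "single inference done\n";
    return 0;
}
```

benchmark_app can be asked for the same behavior with `-hint latency`, which should report a per-image latency closer to what a single-frame detector loop sees.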

nevoGoldamn commented on July 21, 2024

I run an SSD model at 40 ms; how do you explain that? Same CPU situation...

OpenVINO-dev-contest commented on July 21, 2024

Sorry, I have no idea. Maybe you could test the SSD model with benchmark_app as well and see whether the performance gap makes sense to you.
