First of all, thank you so much for open-sourcing this repository. I've been looking f


Postprocess takes up too much time about yolov8 HOT 10 CLOSED

dme-compunet commented on May 23, 2024

Postprocess takes up too much time

from yolov8.

Comments (10)

dme-compunet commented on May 23, 2024

ML.NET is a machine learning framework, which means it is meant for training tasks. This repo, however, is about YOLOv8, which is trained in a PyTorch environment and, after being exported to ONNX format, can be run with ONNX Runtime. As for the speed of postprocessing in the detect task, I currently have no way to make it faster. As for the segment task, its postprocessing needs an image processing library to interpolate the masks when resizing them. This is done with ImageSharp, which slows performance down a bit. Good day!


FunJoo commented on May 23, 2024

I'm planning to try using SkiaSharp (which is said to have better image processing performance) or a dll exported from C++/Rust for image processing in the future. If there's a significant performance improvement, can I submit a pull request?


dme-compunet commented on May 23, 2024

I didn't use SkiaSharp because it is a large graphics library, and I was looking for something more compact that only does image processing. I could have chosen Magick.NET, which has better performance, but I ruled it out because of its size: Magick.NET weighs 23 MB compared to 3 MB for ImageSharp. Skia could still be a good idea if it improves performance significantly, so check the performance and share your conclusions with me.


FunJoo commented on May 23, 2024

I tried SkiaSharp, but there was no significant improvement in image processing performance. So, based on my previous experience with System.Drawing.Bitmap, I attempted to change the method ImageSharp uses to traverse images.
The current optimizations include:

Using a single layer of Parallel.For.
In DetectionOutputParser.Parse() from line 34 to line 64:

        Parallel.For(0, output.Dimensions[2], i =>
        {
            for (int j = 0; j < metadata.Classes.Count; j++)
            {
                var confidence = output[0, j + 4, i];

                if (confidence <= parameters.Confidence)
                    continue;

                // same as the original code ...
            }
        });

Using image.DangerousTryGetSinglePixelMemory(out Memory<TPixel> memory) instead of image[x,y].
In ImageSharpExtensions.ForEachPixel() :

        var width = image.Width;
        var height = image.Height;
        var totalPixels = width * height;

        var flag = image.DangerousTryGetSinglePixelMemory(out Memory<TPixel> memory);

        Parallel.For(0, totalPixels, index =>
        {
            int x = index % width;
            int y = index / width;

            var point = new Point(x, y);
            // This line triggers an index-out-of-range error when processing certain images.
            // It seems to be related to image size conversion, but it works normally for most images.
            var pixel = memory.Span[index];

            action(point, pixel);
        });

- Detect - GPU
Original image size: 2448x2048
Image count: 19
imgsz: 640x640

Preprocess: 0.0879 s ===> 0.03151 s
Inference: 0.06722 s ===> 0.05268 s (I don't know why this got faster)
Postprocess: 0.34554 s ===> 0.00428 s
All processing times are averages.

These optimizations work well for the detect task, but I'm still not satisfied with the processing speed for segment, mainly because SegmentationOutputParser.ProcessMask is still too slow. Preprocess and postprocess times have indeed improved, though (similar changes have already been made to DetectionOutputParser.Parse()).
Therefore, I'm planning to move all the image processing code to C++ in my project.
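
As a rough illustration of the first optimization above (not the code from the repository or the PR), a single-level Parallel.For over the anchors can be sketched as follows. The class name DetectionFilterSketch, the flattened float[] tensor layout, and the tuple used to hold a candidate are all assumptions made for this example. One point the sketch makes explicit: anything written from inside the parallel body has to go into a thread-safe collection.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not a type from the repository.
public static class DetectionFilterSketch
{
    // "output" is assumed to be a flattened [4 + classCount, anchorCount] tensor
    // (batch dimension dropped), so element (row, anchor) is at row * anchorCount + anchor.
    public static (int Anchor, int ClassId, float Confidence)[] Filter(
        float[] output, int classCount, int anchorCount, float threshold)
    {
        // ConcurrentBag is safe to add to from several threads at once.
        var hits = new ConcurrentBag<(int Anchor, int ClassId, float Confidence)>();

        // One parallel level over anchors; the class loop stays sequential.
        Parallel.For(0, anchorCount, i =>
        {
            for (int j = 0; j < classCount; j++)
            {
                // The first four rows hold the box coordinates, class scores follow.
                float confidence = output[(j + 4) * anchorCount + i];

                if (confidence <= threshold)
                    continue;

                hits.Add((i, j, confidence));
            }
        });

        // ConcurrentBag has no ordering guarantee, so sort for a deterministic result.
        return hits.OrderBy(h => h.Anchor).ThenBy(h => h.ClassId).ToArray();
    }
}
```

Keeping the inner class loop sequential avoids scheduling one tiny work item per (anchor, class) pair, which is one plausible reason a single parallel level performs better than nested ones.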


dme-compunet commented on May 23, 2024

The documentation for DangerousTryGetSinglePixelMemory says that memory corruption can occur while accessing the Span, which may be related to the error you are getting. Maybe ProcessPixelRows will give a better result.

The bottleneck in decoding the segmentation results is upscaling the masks from 120x120 to the original image size. That requires an interpolation algorithm to fill in the added pixels. I currently do it with ImageSharp, and that is what slows the process down. I could skip the upscaling, but that produces pixelated masks, so I chose good masks in exchange for a somewhat longer postprocess time.

Would you be willing to submit a PR with the fixes you've made so far?

Thanks for your work!
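
To make the interpolation step mentioned above concrete, here is a minimal sketch of bilinear upscaling on a plain float[,] mask. It is not the repository's implementation, which relies on ImageSharp for resizing; the class name MaskResizeSketch and the pixel-center coordinate mapping are assumptions chosen for this example.

```csharp
using System;

// Hypothetical helper for illustration only; not a type from the repository.
public static class MaskResizeSketch
{
    // Resizes a 2D mask (indexed [y, x]) to newHeight x newWidth with bilinear interpolation.
    public static float[,] ResizeBilinear(float[,] mask, int newHeight, int newWidth)
    {
        int height = mask.GetLength(0);
        int width = mask.GetLength(1);
        var result = new float[newHeight, newWidth];

        float scaleY = (float)height / newHeight;
        float scaleX = (float)width / newWidth;

        for (int y = 0; y < newHeight; y++)
        {
            // Map the center of the output pixel back into source coordinates,
            // clamped so that border pixels reuse the nearest source row.
            float sy = Math.Clamp((y + 0.5f) * scaleY - 0.5f, 0f, height - 1);
            int y0 = (int)sy;
            int y1 = Math.Min(y0 + 1, height - 1);
            float fy = sy - y0;

            for (int x = 0; x < newWidth; x++)
            {
                float sx = Math.Clamp((x + 0.5f) * scaleX - 0.5f, 0f, width - 1);
                int x0 = (int)sx;
                int x1 = Math.Min(x0 + 1, width - 1);
                float fx = sx - x0;

                // Blend horizontally on the two neighboring rows, then vertically.
                float top = mask[y0, x0] + (mask[y0, x1] - mask[y0, x0]) * fx;
                float bottom = mask[y1, x0] + (mask[y1, x1] - mask[y1, x0]) * fx;
                result[y, x] = top + (bottom - top) * fy;
            }
        }

        return result;
    }
}
```

Each output pixel reads four source values, so the cost grows with the size of the target image rather than the size of the mask, which is consistent with the upscaling step dominating postprocess time on large originals.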


FunJoo commented on May 23, 2024

I've submitted the PR.
In Image.ForEachPixel(), I finally used the following approach to avoid exceptions. It works for most images. If the flag is false, the mask will not be drawn on the result image.

        var flag = image.DangerousTryGetSinglePixelMemory(out Memory<TPixel> memory);

        if (flag)
        {
            Parallel.For(0, totalPixels, index =>
            {
                int x = index % width;
                int y = index / width;

                var point = new Point(x, y);
                var pixel = memory.Span[index];

                action(point, pixel);
            });
        }

I'm still looking for the reason why flag is set to false.


dme-compunet commented on May 23, 2024

@FunJoo Your PR is merged, can you confirm that it works without errors?


FunJoo commented on May 23, 2024

I've verified it. The tests pass with both the demo weights and dataset and my custom weights and dataset.


dme-compunet commented on May 23, 2024

@FunJoo I fixed some things in segmentation postprocess, can you check the performance of it now?


FunJoo commented on May 23, 2024

Amazing! The postprocess is much faster now than before. Thank you, I think it's ready to be implemented in my project.

- Detect
Original image size: 2448x2048
imgsz: 640x640
Average postprocess time: 0.00428 s ==> 0.00102 s

- Segment
Original image size: 3840x2748
imgsz: 800x800
Average postprocess time: 1.18653 s ==> 0.25893 s
