DecodeSingleFrame operation make high CPU usage ? (videoprocessingframework, CLOSED)

nvidia commented on August 18, 2024

DecodeSingleFrame operation make high CPU usage ?

from videoprocessingframework.

Comments (3)

rarzumanyan commented on August 18, 2024

Hi @mchl0203

VPF supports on-GPU video frames export to PyTorch tensors:

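    import torch
    import PyNvCodec as nvc
    import PytorchNvCodec as pnvc

    # encFile (input path or URL) and gpuID (CUDA device index) are supplied by the caller.
    nvDec = nvc.PyNvDecoder(encFile, gpuID)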
    to_rgb = nvc.PySurfaceConverter(nvDec.Width(), nvDec.Height(), nvc.PixelFormat.NV12, nvc.PixelFormat.RGB, gpuID)
    to_planar = nvc.PySurfaceConverter(nvDec.Width(), nvDec.Height(), nvc.PixelFormat.RGB, nvc.PixelFormat.RGB_PLANAR, gpuID)

    while True:
        # Obtain NV12 decoded surface from decoder;
        rawSurface = nvDec.DecodeSingleSurface()
        if (rawSurface.Empty()):
            break

        # Convert to RGB interleaved;
        rgb_byte = to_rgb.Execute(rawSurface)

        # Convert to RGB planar because that's what to_tensor + normalize are doing;
        rgb_planar = to_planar.Execute(rgb_byte)

        # Create torch tensor from it and reshape because
        # pnvc.makefromDevicePtrUint8 creates just a chunk of CUDA memory
        # and then copies data from plane pointer to allocated chunk;
        surfPlane = rgb_planar.PlanePtr()
        surface_tensor = pnvc.makefromDevicePtrUint8(surfPlane.GpuMem(), surfPlane.Width(), surfPlane.Height(), surfPlane.Pitch(), surfPlane.ElemSize())
        # No rescaling happens in this loop, so the target size is the decoded frame size;
        surface_tensor.resize_(3, nvDec.Height(), nvDec.Width())

        # This is optional and probably you don’t need to do this. I did it because that’s what my NN expects as input.
        # Normalize to the range desired by the NN. Originally it's
        # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        # but we scale that to [0;255] input (each value multiplied by 255);
        surface_tensor = surface_tensor.type(dtype=torch.cuda.FloatTensor)
        mean = torch.tensor([123.675, 116.28, 103.53], dtype=torch.float32, device='cuda')
        std = torch.tensor([58.395, 57.12, 57.375], dtype=torch.float32, device='cuda')
        # Presumably applied per channel, like transforms.Normalize; mean/std broadcast over (3, H, W);
        surface_tensor = (surface_tensor - mean[:, None, None]) / std[:, None, None]
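
The "scale that to [0;255]" step in the comments can be checked with plain arithmetic. Below is a minimal sketch, not part of VPF: it assumes the usual ImageNet mean/std quoted above, and the helper names `normalize_01` and `normalize_255` are made up for illustration. It shows that multiplying mean and std by 255 gives the same result as first dividing the pixel by 255.

```python
import math

# ImageNet statistics for inputs scaled to [0, 1], as quoted in the comment above.
MEAN_01 = [0.485, 0.456, 0.406]
STD_01 = [0.229, 0.224, 0.225]

# The same statistics rescaled for raw inputs in [0, 255].
MEAN_255 = [m * 255.0 for m in MEAN_01]
STD_255 = [s * 255.0 for s in STD_01]


def normalize_01(pixel, channel):
    """Hypothetical helper: scale a [0, 255] value to [0, 1], then apply mean/std."""
    return (pixel / 255.0 - MEAN_01[channel]) / STD_01[channel]


def normalize_255(pixel, channel):
    """Hypothetical helper: apply the rescaled mean/std directly to a [0, 255] value."""
    return (pixel - MEAN_255[channel]) / STD_255[channel]


if __name__ == "__main__":
    print([round(m, 3) for m in MEAN_255])
    print([round(s, 3) for s in STD_255])
    # Both paths agree for any pixel value and channel.
    print(math.isclose(normalize_01(128, 0), normalize_255(128, 0)))
```

Doing the scaling once on the constants avoids an extra division over every pixel of every frame.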


xinyi61 commented on August 18, 2024

I ran into the same problem when decoding multi-threaded RTSP streams with the cpp_samples branch.

OS: Ubuntu 18.04
NVIDIA driver version: 440.33.01
CUDA version: 10.2
Video Codec SDK version: Video_Codec_SDK_10.0.26
Language: C++
CPU(s): 40

When running 64 decoding threads, the CPU usage is:
1 [||||||||||||||||||||||||||||||98.2%] 11 [||||||||||||||||||||||||||||||92.5%] 21 [||||||||||||||||||||||||||||||96.9%] 31 [||||||||||||||||||||||||||||||96.3%]
2 [||||||||||||||||||||||||||||||97.5%] 12 [||||||||||||||||||||||||||||||91.4%] 22 [||||||||||||||||||||||||||||||99.4%] 32 [||||||||||||||||||||||||||||||95.0%]
3 [||||||||||||||||||||||||||||||98.2%] 13 [||||||||||||||||||||||||||||||94.4%] 23 [||||||||||||||||||||||||||||||98.8%] 33 [||||||||||||||||||||||||||||||97.5%]
4 [||||||||||||||||||||||||||||||98.2%] 14 [||||||||||||||||||||||||||||||96.3%] 24 [||||||||||||||||||||||||||||||99.4%] 34 [||||||||||||||||||||||||||||||95.1%]
5 [||||||||||||||||||||||||||||||98.8%] 15 [||||||||||||||||||||||||||||||93.2%] 25 [||||||||||||||||||||||||||||||98.8%] 35 [||||||||||||||||||||||||||||||95.8%]
6 [||||||||||||||||||||||||||||||98.8%] 16 [||||||||||||||||||||||||||||||97.0%] 26 [||||||||||||||||||||||||||||||98.1%] 36 [||||||||||||||||||||||||||||||93.1%]
7 [||||||||||||||||||||||||||||||98.2%] 17 [||||||||||||||||||||||||||||||96.3%] 27 [|||||||||||||||||||||||||||||100.0%] 37 [||||||||||||||||||||||||||||||92.0%]
8 [||||||||||||||||||||||||||||||98.8%] 18 [||||||||||||||||||||||||||||||91.9%] 28 [||||||||||||||||||||||||||||||98.2%] 38 [||||||||||||||||||||||||||||||92.7%]
9 [||||||||||||||||||||||||||||||98.2%] 19 [||||||||||||||||||||||||||||||95.1%] 29 [|||||||||||||||||||||||||||||100.0%] 39 [||||||||||||||||||||||||||||||95.7%]
10 [||||||||||||||||||||||||||||||97.5%] 20 [||||||||||||||||||||||||||||||95.7%] 30 [||||||||||||||||||||||||||||||99.4%] 40 [|||||||||||||||||||||||||||||100.0%]
Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||11.4G/62.6G] Tasks: 220, 4149 thr; 40 running
Swp[|||||||| 210M/2.00G] Load average: 67.38 66.58 67.50
Uptime: 14 days, 05:20:00
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
36651 deploy 20 0 51.5G 2754M 1223M S 2197 4.3 2h36:58 ./cpp_demo


rarzumanyan commented on August 18, 2024

@xinyi61

Decoding itself doesn't consume many CPU cycles.
Build and launch your application with profiling support (e.g. gprof) to see the top CPU-heavy functions.
Usually the culprits are I/O operations (disk load/save) and rescaling / color conversion done on the CPU.
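
For a Python pipeline, the same idea can be sketched with the standard-library `cProfile` module in place of gprof. This is only an illustration under stated assumptions: `fake_decode` and `fake_cpu_color_convert` are hypothetical stand-ins for a cheap decode call and a CPU-side conversion, not VPF functions, and `profile_top_functions` is a made-up helper name.

```python
import cProfile
import io
import pstats


def fake_decode(n):
    """Hypothetical stand-in for a cheap hardware-decode call."""
    return n


def fake_cpu_color_convert(n):
    """Hypothetical stand-in for a per-pixel conversion done on the CPU."""
    total = 0
    for i in range(20000):
        total += (i * 3) % 255
    return total


def process_stream(frames):
    """Decode and convert a number of frames, one after another."""
    for i in range(frames):
        fake_cpu_color_convert(fake_decode(i))


def profile_top_functions(func, *args, limit=5):
    """Run func under cProfile; return (function name, own time in seconds), heaviest first."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler, stream=io.StringIO())
    # stats.stats maps (file, line, name) -> (primitive calls, calls, own time, cumulative time, callers).
    rows = [(key[2], entry[2]) for key, entry in stats.stats.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for name, seconds in profile_top_functions(process_stream, 50):
        print(f"{name}: {seconds:.4f}s")
```

Sorting by own time rather than cumulative time keeps the outer loop from hiding the function that actually burns the cycles.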

