
rarzumanyan commented on August 18, 2024

Hi @mchl0203

VPF supports exporting decoded video frames to PyTorch tensors without leaving the GPU:

    import torch
    import PyNvCodec as nvc
    import PytorchNvCodec as pnvc

    encFile = "input.mp4"  # placeholder: path to your input video
    gpuID = 0              # placeholder: index of the GPU to decode on

    nvDec = nvc.PyNvDecoder(encFile, gpuID)
    width, height = nvDec.Width(), nvDec.Height()
    to_rgb = nvc.PySurfaceConverter(width, height, nvc.PixelFormat.NV12, nvc.PixelFormat.RGB, gpuID)
    to_planar = nvc.PySurfaceConverter(width, height, nvc.PixelFormat.RGB, nvc.PixelFormat.RGB_PLANAR, gpuID)

    # Optional: normalization constants expected by my NN. Originally it's
    # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    # but we scale both by 255 because our input is in [0;255].
    # Shaped (3, 1, 1) so they broadcast over a (3, H, W) tensor.
    mean = torch.tensor([123.675, 116.28, 103.53], dtype=torch.float32, device='cuda').view(3, 1, 1)
    std = torch.tensor([58.395, 57.12, 57.375], dtype=torch.float32, device='cuda').view(3, 1, 1)

    while True:
        # Obtain NV12 decoded surface from decoder; an empty surface means end of stream;
        rawSurface = nvDec.DecodeSingleSurface()
        if rawSurface.Empty():
            break

        # Convert to RGB interleaved (newer VPF releases also expect a
        # nvc.ColorspaceConversionContext as the second Execute() argument);
        rgb_byte = to_rgb.Execute(rawSurface)

        # Convert to RGB planar (CHW), the layout that to_tensor + normalize work with;
        rgb_planar = to_planar.Execute(rgb_byte)

        # Create torch tensor from it and reshape, because
        # pnvc.makefromDevicePtrUint8 just allocates a flat chunk of CUDA memory
        # and copies data from the plane pointer into that chunk;
        surfPlane = rgb_planar.PlanePtr()
        surface_tensor = pnvc.makefromDevicePtrUint8(surfPlane.GpuMem(), surfPlane.Width(), surfPlane.Height(), surfPlane.Pitch(), surfPlane.ElemSize())
        surface_tensor.resize_(3, height, width)

        # Optional: convert to float and normalize to the range the NN expects;
        surface_tensor = surface_tensor.float()
        surface_tensor = (surface_tensor - mean) / std
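A quick way to double-check the scaled normalization constants: `transforms.Normalize` computes `(x - mean) / std` on input that `ToTensor` has already divided by 255, so the equivalent constants for raw [0;255] input are simply `mean * 255` and `std * 255` (for example, 0.225 × 255 = 57.375). The sketch below is plain standard-library Python; `scale_stats`, `normalize_unit` and `normalize_byte` are hypothetical helper names used only for illustration. It derives the constants and checks that both formulations agree for every channel.

```python
# ImageNet statistics as used by torchvision.transforms.Normalize (inputs in [0;1])
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def scale_stats(values, factor=255.0):
    """Hypothetical helper: rescale per-channel statistics to the [0;255] range."""
    return [round(v * factor, 6) for v in values]


MEAN_255 = scale_stats(MEAN)
STD_255 = scale_stats(STD)


def normalize_unit(x, c):
    """Normalize a [0;255] pixel the torchvision way: divide by 255 first."""
    return (x / 255.0 - MEAN[c]) / STD[c]


def normalize_byte(x, c):
    """Normalize a [0;255] pixel directly with the 255-scaled statistics."""
    return (x - MEAN_255[c]) / STD_255[c]


print("mean:", MEAN_255)
print("std: ", STD_255)

# Both formulations agree for every channel and a range of pixel values
for c in range(3):
    for x in (0, 64, 128, 255):
        assert abs(normalize_unit(x, c) - normalize_byte(x, c)) < 1e-9
```

Scaling the constants once, rather than dividing every frame by 255, saves one elementwise pass over each frame on the GPU.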

from videoprocessingframework.
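The reshape in the snippet above relies on the layout of an RGB_PLANAR surface. Assume it is a single plane `3 * H` rows tall (all R rows, then G, then B), with rows `Pitch()` bytes apart that may include padding beyond `Width()`. A 2D copy that skips that padding leaves a packed buffer of `3 * H * W` bytes, which is why `resize_(3, H, W)` yields a CHW tensor. The standard-library sketch below models this with toy sizes. `copy_pitched` and `to_chw` are hypothetical stand-ins for the CUDA-side copy, not VPF functions.

```python
def copy_pitched(src, width_bytes, height, pitch):
    """Copy `height` rows of `width_bytes` each out of a buffer whose rows are
    `pitch` bytes apart, dropping the per-row padding (like a 2D memcpy)."""
    dst = bytearray(width_bytes * height)
    for row in range(height):
        start = row * pitch
        dst[row * width_bytes:(row + 1) * width_bytes] = src[start:start + width_bytes]
    return dst


def to_chw(packed, channels, height, width):
    """View a packed planar buffer as nested [channel][row][col] lists."""
    plane = height * width
    return [
        [list(packed[c * plane + r * width:c * plane + (r + 1) * width]) for r in range(height)]
        for c in range(channels)
    ]


# Toy RGB_PLANAR frame: W=4, H=2, so the plane is 3*H = 6 rows;
# a pitch of 8 adds 4 padding bytes to every row.
W, H, PITCH = 4, 2, 8
rows = []
for c in range(3):
    for r in range(H):
        pixels = bytes(10 * c + r * W + x for x in range(W))  # distinct value per pixel
        rows.append(pixels + b"\xff" * (PITCH - W))           # 0xFF marks padding
src = b"".join(rows)

packed = copy_pitched(src, W, 3 * H, PITCH)
chw = to_chw(packed, 3, H, W)
print(len(src), len(packed))  # 48 24
print(chw[1][0])              # [10, 11, 12, 13]
```

Skipping the pitch and copying `3 * H * Pitch()` bytes straight into a flat tensor would interleave padding bytes with pixel data. That is why the plane's pitch is passed alongside its width and height.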

xinyi61 commented on August 18, 2024

I have run into the same problem when using the cpp_samples branch to decode RTSP streams in multiple threads.

OS: Ubuntu 18.04
NVIDIA driver version: 440.33.01
CUDA version: 10.2
Video Codec SDK version: Video_Codec_SDK_10.0.26
Language: C++
CPU(s): 40

When running 64 decoding threads, CPU usage (htop) is:
    1 [||||||||||||||||||||||||||||||98.2%] 11 [||||||||||||||||||||||||||||||92.5%] 21 [||||||||||||||||||||||||||||||96.9%] 31 [||||||||||||||||||||||||||||||96.3%]
    2 [||||||||||||||||||||||||||||||97.5%] 12 [||||||||||||||||||||||||||||||91.4%] 22 [||||||||||||||||||||||||||||||99.4%] 32 [||||||||||||||||||||||||||||||95.0%]
    3 [||||||||||||||||||||||||||||||98.2%] 13 [||||||||||||||||||||||||||||||94.4%] 23 [||||||||||||||||||||||||||||||98.8%] 33 [||||||||||||||||||||||||||||||97.5%]
    4 [||||||||||||||||||||||||||||||98.2%] 14 [||||||||||||||||||||||||||||||96.3%] 24 [||||||||||||||||||||||||||||||99.4%] 34 [||||||||||||||||||||||||||||||95.1%]
    5 [||||||||||||||||||||||||||||||98.8%] 15 [||||||||||||||||||||||||||||||93.2%] 25 [||||||||||||||||||||||||||||||98.8%] 35 [||||||||||||||||||||||||||||||95.8%]
    6 [||||||||||||||||||||||||||||||98.8%] 16 [||||||||||||||||||||||||||||||97.0%] 26 [||||||||||||||||||||||||||||||98.1%] 36 [||||||||||||||||||||||||||||||93.1%]
    7 [||||||||||||||||||||||||||||||98.2%] 17 [||||||||||||||||||||||||||||||96.3%] 27 [|||||||||||||||||||||||||||||100.0%] 37 [||||||||||||||||||||||||||||||92.0%]
    8 [||||||||||||||||||||||||||||||98.8%] 18 [||||||||||||||||||||||||||||||91.9%] 28 [||||||||||||||||||||||||||||||98.2%] 38 [||||||||||||||||||||||||||||||92.7%]
    9 [||||||||||||||||||||||||||||||98.2%] 19 [||||||||||||||||||||||||||||||95.1%] 29 [|||||||||||||||||||||||||||||100.0%] 39 [||||||||||||||||||||||||||||||95.7%]
    10 [||||||||||||||||||||||||||||||97.5%] 20 [||||||||||||||||||||||||||||||95.7%] 30 [||||||||||||||||||||||||||||||99.4%] 40 [|||||||||||||||||||||||||||||100.0%]
    Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||11.4G/62.6G] Tasks: 220, 4149 thr; 40 running
    Swp[|||||||| 210M/2.00G] Load average: 67.38 66.58 67.50
    Uptime: 14 days, 05:20:00
    PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
    36651 deploy 20 0 51.5G 2754M 1223M S 2197 4.3 2h36:58 ./cpp_demo


rarzumanyan commented on August 18, 2024

@xinyi61

The actual decoding (done by the NVDEC hardware on the GPU) doesn't consume many CPU cycles.
Build and launch your application with profiling support (e.g. compile and link with -pg for gprof) to find the most CPU-heavy functions.
Usually these are I/O operations (loading from and saving to disk) and rescaling or color conversion done on the CPU.
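For pipelines driven from Python, the same profile-first approach works with the standard-library cProfile and pstats modules. For a C++ application like cpp_demo, gprof or another native profiler fills that role. In the sketch below, `fake_decode` and `cpu_rescale` are hypothetical stand-ins. The first is a cheap call that models GPU-side decoding, and the second is a naive pixel-by-pixel nearest-neighbour downscale done on the CPU. Sorting by self time (`tottime`) should put the CPU-side rescale at the top, which is the kind of hotspot described above.

```python
import cProfile
import io
import pstats


def fake_decode(i):
    """Hypothetical stand-in for a GPU decode call: returns a 64x64 frame quickly."""
    return [i % 256] * (64 * 64)


def cpu_rescale(frame, src_w=64, src_h=64, dst_w=32, dst_h=32):
    """Hypothetical CPU-side nearest-neighbour downscale, done pixel by pixel."""
    out = [0] * (dst_w * dst_h)
    for y in range(dst_h):
        sy = y * src_h // dst_h
        for x in range(dst_w):
            sx = x * src_w // dst_w
            out[y * dst_w + x] = frame[sy * src_w + sx]
    return out


def pipeline(n_frames=200):
    for i in range(n_frames):
        cpu_rescale(fake_decode(i))


profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Print the five functions with the highest self time
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("tottime")
stats.print_stats(5)
print(report.getvalue())
```

In a real pipeline the fix for such a hotspot is usually to move the rescale or color conversion onto the GPU rather than to optimize the CPU loop.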

