
Comments (8)

ganyc717 commented on September 14, 2024

Hi @anguoyang,
As far as I can tell, performance depends heavily on the hardware. I tested this project on a GTX 970: running yolo.cfg on dog.jpg takes about 0.18 seconds, roughly twice as long as the original CUDA project on the same hardware (GTX 970).
Thank you for opening this ticket.

from darknet-on-opencl.

AndrewSivrit commented on September 14, 2024

Hi @anguoyang,
Which build configuration did you use, Debug or Release?
Make sure Release mode is selected in Visual Studio.

anguoyang commented on September 14, 2024

Hi @AndrewSivrit, I used Release mode in VS, thank you.

anguoyang commented on September 14, 2024

Hi @ganyc717, yes, maybe, but I want to use an Intel GPU instead of NVIDIA, since it is more cost-efficient for production. Thank you for your quick reply.

anguoyang commented on September 14, 2024

Taking twice as long as CUDA would be acceptable and reasonable. However, my test result is really slow, almost hundreds of times slower than CUDA (on similar hardware), so I suspect something may be wrong with my program?

anguoyang commented on September 14, 2024

D:\Darknet-On-OpenCL\x64\Release>darknet_cl detect cfg/yolo.cfg yolo.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32
1 blas_kernels_1.cl build log:
1:82:37: warning: double precision constant requires cl_khr_fp64, casting to single precision
1:82:58: warning: double precision constant requires cl_khr_fp64, casting to single precision
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

max 2 x 2 / 2 608 x 608 x 32 -> 304 x 304 x 32
2 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64
3 max 2 x 2 / 2 304 x 304 x 64 -> 152 x 152 x 64
4 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
5 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64
6 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
7 max 2 x 2 / 2 152 x 152 x 128 -> 76 x 76 x 128
8 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
9 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128
10 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
11 max 2 x 2 / 2 76 x 76 x 256 -> 38 x 38 x 256
12 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
13 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
14 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
15 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
16 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
17 max 2 x 2 / 2 38 x 38 x 512 -> 19 x 19 x 512
18 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
19 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
20 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
21 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
22 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
23 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
24 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
25 route 16
26 conv 64 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 64
27 reorg / 2 38 x 38 x 64 -> 19 x 19 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 19 x 19 x1280 -> 19 x 19 x1024
30 conv 425 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 425
31 detection
mask_scale: Using default '1.000000'
Loading weights from yolo.weights...Done!
im2col_kernels.cl build log:
2:36:18: warning: '/*' within block comment
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

activation_kernels.cl build log:
4:21:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

maxpool_layer_kernels.cl build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

blas_kernels_2.cl build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

data/dog.jpg: Predicted in 7.806060 seconds.
dog: 82%
car: 28%
truck: 64%
bicycle: 85%
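
To put the timings in this thread side by side, here is a small sketch of the arithmetic. It assumes that "about double" means exactly 2x when backing out a CUDA time, which is only an approximation, and the names are illustrative. The measurements come from different hardware, so this is not a like-for-like benchmark.

```python
# Timings quoted in this thread (seconds per image, yolo.cfg on data/dog.jpg).
REPORTED_S = 7.806060      # "Predicted in 7.806060 seconds" from the log above
GTX970_OPENCL_S = 0.18     # ganyc717's OpenCL measurement on a GTX 970
# Assumption: "about double" is taken as exactly 2x to estimate the CUDA time.
GTX970_CUDA_S = GTX970_OPENCL_S / 2


def slowdown(measured_s, baseline_s):
    """Return how many times slower measured_s is than baseline_s."""
    return measured_s / baseline_s


if __name__ == "__main__":
    print(f"vs. OpenCL on GTX 970: {slowdown(REPORTED_S, GTX970_OPENCL_S):.1f}x")
    print(f"vs. estimated CUDA on GTX 970: {slowdown(REPORTED_S, GTX970_CUDA_S):.1f}x")
```

Under these assumptions the reported run is tens of times slower than either GTX 970 figure, far more than the 2x gap between OpenCL and CUDA on the same card.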

ganyc717 commented on September 14, 2024

Hi @anguoyang,
I tested on my laptop with an Intel HD 4600. Most of the kernel time seems to be spent in the sgemm function. This is a BLAS function, and I suggest not modifying it. However, I noticed that clBLAS has special optimizations for AMD GPUs, and I didn't include them in this repo, so you could switch to another GPU and try again. Alternatively, choose a smaller network such as tiny-yolo.
Best regards!
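
As a rough illustration of why sgemm can dominate, the sketch below estimates the matrix-multiply work per convolutional layer from rows of the log above. It assumes the usual Darknet lowering of a convolution to im2col followed by one GEMM with M = filters, K = size x size x input channels, and N = output width x output height, and it counts 2·M·N·K floating-point operations. Bias, batch normalization, and activations are ignored, and the function names are illustrative, not part of this repo.

```python
import re

# A few conv rows copied from the log above.
LOG_ROWS = """\
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32
2 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64
18 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
30 conv 425 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 425
"""

# layer, filters, kernel h x w, stride, input w x h x c, output w x h x c
ROW = re.compile(
    r"(\d+)\s+conv\s+(\d+)\s+(\d+)\s*x\s*(\d+)\s*/\s*(\d+)\s+"
    r"(\d+)\s*x\s*(\d+)\s*x\s*(\d+)\s*->\s*(\d+)\s*x\s*(\d+)\s*x\s*(\d+)"
)


def gemm_shape(row):
    """Return (layer, M, N, K) for a conv row, or None if the row does not match."""
    match = ROW.match(row.strip())
    if match is None:
        return None
    layer, filters, kh, kw, _stride, _iw, _ih, in_c, out_w, out_h, _oc = map(
        int, match.groups()
    )
    return layer, filters, out_w * out_h, kh * kw * in_c


def gemm_flops(m, n, k):
    """Multiply-add count for an M x K by K x N product, as 2*M*N*K operations."""
    return 2 * m * n * k


if __name__ == "__main__":
    for row in LOG_ROWS.splitlines():
        layer, m, n, k = gemm_shape(row)
        print(f"layer {layer:2d}: M={m} N={n} K={k} ~{gemm_flops(m, n, k) / 1e9:.2f} GFLOP")
```

Summing this estimate over all conv rows gives a feel for how much work a single image pushes through the GEMM path, which is why a smaller network such as tiny-yolo reduces the time.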

victorv commented on September 14, 2024

OpenCL performance is not portable across platforms, so you would need to tune any CL code for the target platform to avoid register spilling, local memory overflow, etc.
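
One simple form of such tuning is choosing a tile size that fits the device limits. The sketch below is a hypothetical helper, not code from this repo. It assumes a tiled GEMM kernel that uses one work-item per tile element and keeps two T x T float tiles in local memory. The device limits in the example are made-up values; on real hardware they would come from clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE and CL_DEVICE_MAX_WORK_GROUP_SIZE.

```python
FLOAT_BYTES = 4  # size of an OpenCL float


def pick_tile(local_mem_bytes, max_work_group_size, candidates=(32, 16, 8, 4)):
    """Return the largest tile edge T from candidates that fits the device, or None.

    Assumed kernel shape: T*T work-items per work-group and two T x T float
    tiles in local memory (one from matrix A, one from matrix B).
    """
    for tile in sorted(candidates, reverse=True):
        work_items = tile * tile
        local_bytes = 2 * work_items * FLOAT_BYTES
        if work_items <= max_work_group_size and local_bytes <= local_mem_bytes:
            return tile
    return None


if __name__ == "__main__":
    # Hypothetical device limits, not measured on any hardware in this thread.
    for name, local_mem, max_wg in [("device A", 65536, 256), ("device B", 32768, 1024)]:
        print(name, "->", pick_tile(local_mem, max_wg))
```

This only guards against exceeding local memory and work-group limits. Register spilling depends on the compiler and kernel body, so it would still need profiling on the target device.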
