Comments (8)
Hi @anguoyang
As far as I can tell, performance depends heavily on the hardware. For this project, I tested on a GTX 970: detecting dog.jpg with yolo.cfg takes about 0.18 seconds. That is roughly double the time of the original CUDA project on the same hardware (GTX 970).
Thank you for opening this ticket.
from darknet-on-opencl.
Hi @anguoyang
Which build mode did you use, Debug or Release?
Make sure Release mode is selected in Visual Studio.
Hi @AndrewSivrit, I used Release mode in VS, thank you.
Hi @ganyc717, yes, maybe, but I want to use an Intel GPU instead of NVIDIA, since it is cost-efficient for production. Thank you for your quick reply.
Double the time compared to CUDA is acceptable and reasonable. However, my test result is really slow, almost hundreds of times slower than CUDA (on similar hardware), so I suppose there may be something wrong with my program?
D:\Darknet-On-OpenCL\x64\Release>darknet_cl detect cfg/yolo.cfg yolo.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32
1 blas_kernels_1.cl build log:
1:82:37: warning: double precision constant requires cl_khr_fp64, casting to single precision
1:82:58: warning: double precision constant requires cl_khr_fp64, casting to single precision
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.
max 2 x 2 / 2 608 x 608 x 32 -> 304 x 304 x 32
2 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64
3 max 2 x 2 / 2 304 x 304 x 64 -> 152 x 152 x 64
4 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
5 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64
6 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
7 max 2 x 2 / 2 152 x 152 x 128 -> 76 x 76 x 128
8 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
9 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128
10 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
11 max 2 x 2 / 2 76 x 76 x 256 -> 38 x 38 x 256
12 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
13 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
14 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
15 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
16 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
17 max 2 x 2 / 2 38 x 38 x 512 -> 19 x 19 x 512
18 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
19 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
20 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
21 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
22 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
23 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
24 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
25 route 16
26 conv 64 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 64
27 reorg / 2 38 x 38 x 64 -> 19 x 19 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 19 x 19 x1280 -> 19 x 19 x1024
30 conv 425 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 425
31 detection
mask_scale: Using default '1.000000'
Loading weights from yolo.weights...Done!
im2col_kernels.cl build log:
2:36:18: warning: '/*' within block comment
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.
activation_kernels.cl build log:
4:21:12: warning: double precision constant requires cl_khr_fp64, casting to single precision
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.
maxpool_layer_kernels.cl build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.
blas_kernels_2.cl build log:
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.
data/dog.jpg: Predicted in 7.806060 seconds.
dog: 82%
car: 28%
truck: 64%
bicycle: 85%
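For a rough sense of scale, the timings quoted in this thread can be compared directly. The sketch below is only arithmetic on those numbers. The CUDA figure is an assumption derived from the "about double" remark rather than a measurement, and the log does not state which device produced the 7.8-second run.

```python
def slowdown(measured_s: float, baseline_s: float) -> float:
    """Return how many times slower `measured_s` is than `baseline_s`."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return measured_s / baseline_s


# Timings taken from this thread (seconds per image, yolo.cfg on dog.jpg).
REPORTED_RUN_S = 7.806060   # "Predicted in 7.806060 seconds" from the log
GTX970_OPENCL_S = 0.18      # OpenCL port on a GTX 970, as reported above

# Assumption: the OpenCL port is "about double" the CUDA time on the same GPU,
# so the CUDA time is taken as half of the OpenCL time. This is not measured.
GTX970_CUDA_S = GTX970_OPENCL_S / 2

if __name__ == "__main__":
    print(f"vs OpenCL on GTX 970: {slowdown(REPORTED_RUN_S, GTX970_OPENCL_S):.1f}x")
    print(f"vs assumed CUDA time: {slowdown(REPORTED_RUN_S, GTX970_CUDA_S):.1f}x")
```

On these numbers the gap is tens of times (roughly 43x and 87x) rather than hundreds, although the comparison spans different devices.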
Hi @anguoyang
I tested on my laptop with an Intel HD 4600. Most of the kernel time seems to be spent in the sgemm function. This is a BLAS function, and I suggest not modifying it. However, I noticed that clBLAS has special optimizations for AMD GPUs, and that optimization is not included in this repo. You could switch to another GPU and try again, or just choose a smaller network such as tiny-yolo.
Best regards!
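To see why sgemm would dominate, the convolution layers in the log above can be converted into multiply-accumulate (MAC) counts. This is a minimal sketch, assuming each convolution is lowered to one matrix multiply with M = filters, K = size * size * input channels, and N = output width * output height (the usual im2col plus GEMM formulation). The function names are illustrative and are not taken from the repo.

```python
# (filters, kernel_size, out_w, out_h, in_channels) for each conv layer in the log.
# Every conv layer there has stride 1, so output width/height equal the input's.
CONV_LAYERS = [
    (32, 3, 608, 608, 3),
    (64, 3, 304, 304, 32),
    (128, 3, 152, 152, 64), (64, 1, 152, 152, 128), (128, 3, 152, 152, 64),
    (256, 3, 76, 76, 128), (128, 1, 76, 76, 256), (256, 3, 76, 76, 128),
    (512, 3, 38, 38, 256), (256, 1, 38, 38, 512), (512, 3, 38, 38, 256),
    (256, 1, 38, 38, 512), (512, 3, 38, 38, 256),
    (1024, 3, 19, 19, 512), (512, 1, 19, 19, 1024), (1024, 3, 19, 19, 512),
    (512, 1, 19, 19, 1024), (1024, 3, 19, 19, 512),
    (1024, 3, 19, 19, 1024), (1024, 3, 19, 19, 1024),
    (64, 1, 38, 38, 512),
    (1024, 3, 19, 19, 1280),
    (425, 1, 19, 19, 1024),
]


def gemm_shape(filters, size, out_w, out_h, in_c):
    """Return (M, K, N) of the matrix multiply assumed for one conv layer."""
    return filters, size * size * in_c, out_w * out_h


def conv_macs(filters, size, out_w, out_h, in_c):
    """Multiply-accumulates for one conv layer: M * K * N."""
    m, k, n = gemm_shape(filters, size, out_w, out_h, in_c)
    return m * k * n


if __name__ == "__main__":
    total = sum(conv_macs(*layer) for layer in CONV_LAYERS)
    print(f"total conv MACs per image: {total:,}")
    # Show the three most expensive layers and their GEMM shapes.
    for layer in sorted(CONV_LAYERS, key=lambda l: conv_macs(*l), reverse=True)[:3]:
        print(gemm_shape(*layer), f"{conv_macs(*layer):,} MACs")
```

Pooling, route, and reorg layers contribute no multiply-accumulates, which is consistent with GEMM taking most of the kernel time.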
OpenCL performance is not portable across platforms, so you would need to tune any CL code for the target platform to avoid register spilling, local memory overflow, etc.
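As one illustration of that kind of tuning, the sketch below filters candidate tile sizes for a tiled GEMM kernel against two device limits. It assumes a kernel that stages one square tile of each input matrix in local memory and uses one work-item per tile element. The limit values passed in are hypothetical; on a real device they would come from `clGetDeviceInfo` with `CL_DEVICE_LOCAL_MEM_SIZE` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`.

```python
def tile_local_bytes(tile: int, bytes_per_elem: int = 4) -> int:
    """Local memory needed for two tile x tile buffers (float32 by default)."""
    return 2 * tile * tile * bytes_per_elem


def feasible_tiles(candidates, local_mem_bytes, max_work_group_size):
    """Keep tile sizes that fit in local memory and in one work-group.

    Assumes one work-item per tile element, so a tile needs tile * tile items.
    """
    return [
        t for t in candidates
        if tile_local_bytes(t) <= local_mem_bytes
        and t * t <= max_work_group_size
    ]


if __name__ == "__main__":
    candidates = [8, 16, 32, 64]
    # Hypothetical limits for two different devices, not measured values.
    print(feasible_tiles(candidates, 32 * 1024, 256))
    print(feasible_tiles(candidates, 64 * 1024, 1024))
```

A tile size that passes this check still has to be timed on the device, since register pressure and memory access patterns are not modeled here.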