
doonny commented on August 25, 2024

Could you run the program using the provided "image.dat" file instead of the "cat.jpg" file?
That will give more detailed information, since the golden reference output is provided.
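For context, the verification step compares the network outputs (1024 values for this model, per the "1024 total output reference read" line in the logs) against the golden reference and counts mismatches. A minimal sketch of such a check is shown here. `verify_results` is a hypothetical name, not PipeCNN's actual host function; the absolute `tolerance` parameter is an assumption (0.0 means an exact match), and the printed messages only imitate the format seen in the logs in this thread.

```python
from typing import Sequence


def verify_results(result: Sequence[float], golden_ref: Sequence[float],
                   tolerance: float = 0.0, report_first: int = 10) -> int:
    """Compare network outputs with a golden reference; return the mismatch count.

    Mismatches among the first `report_first` items are printed individually;
    `tolerance` is an assumed absolute error bound (0.0 = exact match).
    """
    if len(result) != len(golden_ref):
        raise ValueError("result and golden_ref must have the same length")

    wrong = 0
    for i, (r, g) in enumerate(zip(result, golden_ref)):
        if abs(r - g) > tolerance:
            wrong += 1
            if i < report_first:
                print(f"Item={i} is wrong (result={r:f}, golden_ref={g:f})")

    if wrong == 0:
        print("Check Pass !!!")
    else:
        print(f"Totally {wrong} Wrong Results")
    return wrong
```

For example, with `golden = [-14.0, 13.0, -5.0]`, `verify_results([-14.0, 13.0, -5.0], golden)` prints "Check Pass !!!" and returns 0, while `verify_results([-1.0, 13.0, -22.0], golden)` reports items 0 and 2 and returns 2.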

from pipecnn.

laski007 commented on August 25, 2024

OK, I disabled OpenCV. The result is shown below. Actually, every time I run the program I get a different result.

[root@dhcp70 project]# ./run.exe conv.aocx


PipeCNN: An OpenCL-Based FPGA Accelerator for CNNs


61063552 total weights read
154587 bytes image read
1024 total output reference read

Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
Device 0: de5a_net_e1 : Arria 10 Reference Platform (aclde5a_net_e10)
Device OpenCL Version: OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 16.1
Device Max Compute Units: 1
Device Max WorkGroup Size: 2147483647
Device Max WorkItem Size: 2147483647
Device Global Memory Size: 8192 MBytes
Device Local Memory Size: 16 KBytes
Device Max Clock Freq: 1000 Mhz

Loading kernel/binary from file conv.aocx
Reprogramming device [0] with handle 1

Executing Layer 1:

Launching single work-item kernel winbuffer

Launching single work-item kernel Conv

Launching single work-item kernel Pooling

Launching kernel MemWr with local size: 1, 1, 16 (global size: 27, 27, 96)

Launching kernel lrn with local size: 1, 1, 24 (global size: 27, 27, 24)

Executing Layer 2:

Launching single work-item kernel winbuffer

Launching single work-item kernel Conv

Launching single work-item kernel Pooling

Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 256)

Launching kernel lrn with local size: 1, 1, 64 (global size: 13, 13, 64)

Executing Layer 3:

Launching single work-item kernel winbuffer

Launching single work-item kernel Conv

Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 384)

Executing Layer 4:

Launching single work-item kernel winbuffer

Launching single work-item kernel Conv

Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 384)

Executing Layer 5:

Launching single work-item kernel winbuffer

Launching single work-item kernel Conv

Launching single work-item kernel Pooling

Launching kernel MemWr with local size: 1, 1, 16 (global size: 6, 6, 256)

Executing Layer 6:

Launching single work-item kernel winbuffer

Launching single work-item kernel Conv

Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 4096)

Executing Layer 7:

Launching single work-item kernel winbuffer

Launching single work-item kernel Conv

Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 4096)

Executing Layer 8:

Launching single work-item kernel winbuffer

Launching single work-item kernel Conv

Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 1024)

Copyed all batched results from fc_2 buffers.

Done !!!


Performance Summary

Total runtime: 0.058542s

Kernel runtime summary:
Layer-1:
MemRd: 8.899 ms
Conv : 8.858 ms
Pool : 8.844 ms
MemWr: 8.759 ms
Lrn : 0.645 ms
Layer-2:
MemRd: 14.093 ms
Conv : 14.060 ms
Pool : 14.052 ms
MemWr: 14.029 ms
Lrn : 0.273 ms
Layer-3:
MemRd: 9.393 ms
Conv : 9.364 ms
Pool : 0.000 ms
MemWr: 9.348 ms
Lrn : 0.000 ms
Layer-4:
MemRd: 7.053 ms
Conv : 7.025 ms
Pool : 0.000 ms
MemWr: 7.008 ms
Lrn : 0.000 ms
Layer-5:
MemRd: 4.749 ms
Conv : 4.713 ms
Pool : 4.702 ms
MemWr: 4.671 ms
Lrn : 0.000 ms
Layer-6:
MemRd: 2.592 ms
Conv : 2.561 ms
Pool : 0.000 ms
MemWr: 2.529 ms
Lrn : 0.000 ms
Layer-7:
MemRd: 1.183 ms
Conv : 1.154 ms
Pool : 0.000 ms
MemWr: 1.134 ms
Lrn : 0.000 ms
Layer-8:
MemRd: 0.327 ms
Conv : 0.298 ms
Pool : 0.000 ms
MemWr: 0.277 ms
Lrn : 0.000 ms

Total kernel runtime 48.033 ms
Batch size = 1, average process time per batch: 48.033 ms

Start verifying results ...
Selected item = 0 from the combined batch results in fc buffers
Item=0 is wrong (result=-1.000000, golden_ref=-14.000000)
Item=1 is wrong (result=2.000000, golden_ref=13.000000)
Item=2 is wrong (result=-22.000000, golden_ref=-5.000000)
Item=3 is wrong (result=-9.000000, golden_ref=-7.000000)
Item=4 is wrong (result=-17.000000, golden_ref=-9.000000)
Item=6 is wrong (result=-21.000000, golden_ref=-6.000000)
Item=7 is wrong (result=-18.000000, golden_ref=0.000000)
Item=8 is wrong (result=-2.000000, golden_ref=11.000000)
Item=9 is wrong (result=2.000000, golden_ref=-3.000000)
Totally 974 Wrong Results

When I ran the program again, the new result was:
Total kernel runtime 48.005 ms
Batch size = 1, average process time per batch: 48.005 ms

Start verifying results ...
Selected item = 0 from the combined batch results in fc buffers
Item=0 is wrong (result=-8.000000, golden_ref=-14.000000)
Item=1 is wrong (result=22.000000, golden_ref=13.000000)
Item=2 is wrong (result=7.000000, golden_ref=-5.000000)
Item=3 is wrong (result=-14.000000, golden_ref=-7.000000)
Item=4 is wrong (result=-10.000000, golden_ref=-9.000000)
Item=5 is wrong (result=-4.000000, golden_ref=-3.000000)
Item=6 is wrong (result=3.000000, golden_ref=-6.000000)
Item=7 is wrong (result=13.000000, golden_ref=0.000000)
Item=8 is wrong (result=29.000000, golden_ref=11.000000)
Totally 949 Wrong Results


doonny commented on August 25, 2024

Could you run the program in software emulation mode again to check that you are using the correct test files?
If software emulation works, please run "aocl diagnose" to check the temperature of your board.
We once encountered the same situation when our board was running at a temperature above 80 °C.
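If you want to check the board temperature repeatedly (for example, before every hardware run), the temperature line printed by `aocl diagnose` can be parsed. This is a minimal sketch that assumes the line looks exactly like the one posted later in this thread for the de5a_net_e1 board ("FPGA temperature = 45.7383 degrees C."); other boards or BSP versions may print it differently. The function names are hypothetical, and the 80 °C default is simply the threshold mentioned in this comment.

```python
import re
import subprocess
from typing import List

# Matches lines like "FPGA temperature = 45.7383 degrees C." as printed by
# `aocl diagnose` for the de5a_net_e1 board in this thread. Other boards or
# BSP versions may word this differently (assumption).
_TEMP_RE = re.compile(r"FPGA temperature\s*=\s*([-+]?\d+(?:\.\d+)?)\s*degrees C")


def parse_fpga_temperatures(diagnose_output: str) -> List[float]:
    """Return every FPGA temperature (deg C) found in `aocl diagnose` output."""
    return [float(value) for value in _TEMP_RE.findall(diagnose_output)]


def is_overheating(diagnose_output: str, limit_c: float = 80.0) -> bool:
    """True if any reported temperature is above `limit_c` (default 80 deg C)."""
    return any(t > limit_c for t in parse_fpga_temperatures(diagnose_output))


def read_diagnose_output() -> str:
    """Run `aocl diagnose` and return its stdout.

    Requires the Intel FPGA SDK for OpenCL environment so `aocl` is on PATH.
    """
    completed = subprocess.run(["aocl", "diagnose"],
                               capture_output=True, text=True, check=False)
    return completed.stdout
```

Typical use is `is_overheating(read_diagnose_output())`; for a reading of 45.7383 degrees C it returns False.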


laski007 commented on August 25, 2024

Dear Prof. Wang,

Thank you so much for your help. In emulation mode, we get the correct result, as shown below:
..........
Layer-8:
MemRd: 79.789 ms
Conv : 88.976 ms
Pool : 0.000 ms
MemWr: 88.985 ms
Lrn : 0.000 ms

Total kernel runtime 16805.615 ms
Batch size = 1, average process time per batch: 16805.615 ms

Start verifying results ...
Selected item = 0 from the combined batch results in fc buffers

Check Pass !!!

The inference result is n02123045 tabby, tabby ca (the prob is 56.00)

But on the hardware, we still see this strange problem. I checked with "aocl diagnose", and everything seems OK.

[root@dhcp70 project]# aocl diagnose
aocl diagnose: Running diagnose from /home/dalab/intelFPGA_pro/16.1/hld/board/de5a_net_e1/linux64/libexec

------------------------- acl0 -------------------------
Vendor: Terasic

Phys Dev Name     Status   Information
aclde5a_net_e10   Passed   Arria 10 Reference Platform (aclde5a_net_e10)
PCIe dev_id = 2494, bus:slot.func = 02:00.00, Gen3 x8
FPGA temperature = 45.7383 degrees C.

DIAGNOSTIC_PASSED


doonny commented on August 25, 2024

Well, that is strange. Normally, if the design passes software emulation, the hardware implementation should pass as well.
Please send me your hardware configuration. I will run the same configuration on our own platform to verify it.


laski007 commented on August 25, 2024

Dear Prof. Wang,

I'm not quite sure which hardware configuration details you need. I only modified the Makefile to add the board information, and did not change any OpenCL files:

ifeq ($(FLOW),sw_emu)
OCCFLAGS = -v --report -march=emulator --board de5a_net_e1 -I device/RTL -L device/RTL -l rtl_lib.aoclib
else ifeq ($(FLOW),hw)
OCCFLAGS = -v --report --profile --board de5a_net_e1 -I device/RTL -L device/RTL -l rtl_lib.aoclib
endif

Please check the attachment. If you need more information, please let me know. Thanks.

conv_pipe.cl.zip


doonny commented on August 25, 2024

Well, I have not encountered any problem on my DE5a board.
Normally, if you always get the same wrong results, the program is at fault. But when the results are random, it is more likely that the FPGA design does not meet timing.

As a last try, please comment out the two pragmas that start with #pragma ivdep ...., and then recompile the kernel.
If that does not change anything, you may need to try another board.
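The rule of thumb in this reply (identical wrong outputs point to the program, varying outputs point to timing) can be expressed as a small check over several hardware runs on the same image.dat input. This is a minimal sketch with a hypothetical function name; it assumes each run's output has been saved as a list of floats and compares values exactly, which seems reasonable here because the outputs reported in this thread are integer-valued. It is a heuristic for narrowing down the cause, not a diagnosis.

```python
from typing import Sequence


def classify_failures(runs: Sequence[Sequence[float]],
                      golden_ref: Sequence[float]) -> str:
    """Classify repeated hardware runs on the same input (a heuristic, not a proof).

    "pass"          -> every run matches the golden reference
    "deterministic" -> every run gives the same wrong output (suspect the program)
    "random"        -> outputs differ between runs (suspect timing or hardware)
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to tell deterministic from random")

    first = list(runs[0])
    identical = all(list(run) == first for run in runs[1:])
    if identical and first == list(golden_ref):
        return "pass"
    if identical:
        return "deterministic"
    return "random"
```

Applied to the first three outputs of the two hardware runs posted in this thread ([-1, 2, -22] and [-8, 22, 7] against the golden [-14, 13, -5]), the check returns "random".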


laski007 commented on August 25, 2024

Dear Prof. Wang,

I have tried commenting out those two pragmas, but the results are still the same. As you mentioned timing, do you know how to adjust the timing of the FPGA design?
Another question is about the RTL folder, which includes mult_add_fix8x4, c_model.cl, rtl_lib.h and rtl_lib.xml. How did you generate those files? Are they the same for different FPGA boards, or should we change them for each platform?


doonny commented on August 25, 2024

The RTL files are generic IPs that work on all platforms. You do not need to change them.
I suggest you try the program on another board, or contact your board vendor.


thinkoco commented on August 25, 2024

@laski007 For using RTL modules inside OpenCL kernels, this may help you: aocl_programming_guide

