
sdaccel_examples's Introduction

SDAccel Example Repository

Note: Please do not use this repository with the latest Xilinx Vitis tool chain. An entirely new repository has been created for Vitis: https://github.com/Xilinx/Vitis_Accel_Examples. Use the SDAccel Examples repository only with the previously released SDAccel or SDx products.

Welcome to the SDAccel example repository. This repository contains the latest examples to get you started with application optimization targeting Xilinx PCIe FPGA acceleration boards. All examples are ready to be compiled and executed on SDAccel supported boards and accelerated cloud service partners. The repository is organized as follows:

  1. PREREQUISITE
  2. SUPPORTED PLATFORMS
  3. COMPILATION AND EXECUTION
  4. DIRECTORY STRUCTURE
  5. EXECUTION IN CLOUD ENVIRONMENTS
  6. SUPPORT

1. PREREQUISITE

The SDAccel Git examples assume that the user is familiar with the basic SDAccel environment, setup, programming, and debugging flow. If not, it is recommended to cover these topics in the SDAccel user guides:

  • UG1238 - SDAccel Development Environment
  • UG1277 - SDAccel Programming Guide
  • UG1281 - SDAccel Debugging Guide

2. SUPPORTED PLATFORMS

Board               Software Version
Xilinx Alveo U200   SDx 2019.1
Xilinx Alveo U250   SDx 2019.1
Xilinx Alveo U280   SDx 2019.1

3. COMPILATION AND EXECUTION

It is recommended to start with the Hello World example, which introduces new users to the basic structure of an SDAccel-based application.

Compiling for Application Emulation

As part of the capabilities available to an application developer, SDAccel includes environments to test the correctness of an application at both a software functional level and a hardware emulated level.

These modes, which are named sw_emu and hw_emu, allow the developer to profile and evaluate the performance of a design before compiling for board execution. It is recommended that all applications are executed in at least the sw_emu mode before being compiled and executed on an FPGA board.

    cd <PATH TO SAMPLE APPLICATION>
    make all TARGET=<sw_emu|hw_emu> DEVICE=<FPGA Platform>

where

	sw_emu = software emulation
	hw_emu = hardware emulation

NOTE: The software emulation flow is a functional correctness check only. It does not estimate the performance of the application in hardware.

The hardware emulation flow is a cycle accurate simulation of the hardware generated for the application. As such, it is expected for this simulation to take a long time. It is recommended that for this example the user skips running hardware emulation or modifies the example to work on a reduced data set.

Executing Emulated Application

Recommended Execution Flow for Example Applications in Emulation

The makefile for the application can directly execute the application with the following command:

    cd <PATH TO SAMPLE APPLICATION>
    make check TARGET=<sw_emu|hw_emu> DEVICE=<FPGA Platform>

where

	sw_emu = software emulation
	hw_emu = hardware emulation

If the application has not been previously compiled, the check makefile rule will compile and execute the application in the emulation mode selected by the user.

Compiling for Application Execution in the FPGA Accelerator Card

The command to compile the application for execution on the FPGA acceleration board is

    cd <PATH TO SAMPLE APPLICATION>
    make all DEVICE=<FPGA Platform>

The default target for the makefile is to compile for hardware. Therefore, setting the TARGET option is not required.

NOTE: Compilation for application execution in hardware generates custom logic to implement the functionality of the kernels in an application. It is typical for hardware compile times to range from 30 minutes to a couple of hours.

4. DIRECTORY STRUCTURE

  • GETTING STARTED

Collection of examples geared at teaching the user best practices in how to use different features of SDAccel and start on their own application.

  • ACCELERATION

Collection of examples in processor offloading to FPGA based compute units.

  • VISION

Collection of examples in image and video processing.

  • LIBS

Collection of common libraries used across all examples to assist in the quick development of application host code.

  • UTILITY

Collection of utility functions used as part of the Makefiles in all of the examples. This set includes Makefile rules and scripts to launch SDAccel-compiled applications onto boards hosted by Nimbix directly from the developer's terminal shell.

5. EXECUTION IN CLOUD ENVIRONMENTS

FPGA acceleration boards have been deployed to the cloud. For information on how to execute the example within a specific cloud, take a look at the following guides.

6. SUPPORT

For questions and to get help on any project in this repository or your own projects, visit the SDAccel Forums.

To execute these examples using the SDAccel GUI, follow the setup instructions in the SDAccel GUI README.

sdaccel_examples's People

Contributors

dutchalthoff, ericlxlnx, fmartinezv, hatchuta-xilinx, heeran-xilinx, heeranand, kaliudayxilinx, kamranjk, nacl, songchuanhua, spenser309

sdaccel_examples's Issues

Not able to build rtl examples

The example design SDAccel_Examples/getting_started/rtl_kernel/rtl_streaming_free_running gives errors when I try to build it with:
make all TARGET=hw_emu DEVICE=xilinx_u200_qdma_201910_1 check

Logs:
****** Vivado v2019.1 (64-bit)
**** SW Build 2552052 on Fri May 24 14:47:09 MDT 2019
**** IP Build 2548770 on Fri May 24 18:01:18 MDT 2019
** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

ERROR: /home/stbodank/megh/dcp/workspace/xilinx_shell/SDAccel_Examples/getting_started/rtl_kernel/rtl_streaming_free_running/pinfo.json does not exist!
couldn't open "/home/stbodank/megh/dcp/workspace/xilinx_shell/SDAccel_Examples/getting_started/rtl_kernel/rtl_streaming_free_running/pinfo.json": no such file or directory
while executing
"open $pinfo r"
invoked from within
"set fid [open $pinfo r]"
(file "./src/gen_xo.tcl" line 59)
INFO: [Common 17-206] Exiting Vivado at Mon Aug 5 11:13:43 2019...
config.mk:3: recipe for target 'xclbin/myadder1.hw_emu.xilinx_u200_qdma_201910_1.xo' failed
make: *** [xclbin/myadder1.hw_emu.xilinx_u200_qdma_201910_1.xo] Error 1

Suggestion to improve EoU of the current makefiles in order to provide the customers more flexibility and guidance

The goal of the current makefile changes is to improve the EoU of the GitHub designs. Using the median_filter example, I modified several makefiles to demonstrate these changes.

By modifying the example I was trying to achieve several goals:

  • To limit the number of changes to the current makefile structure

  • The script can be easily adopted for any design we have in GitHub

  • Work Directory:
    Currently the user should run the design from the directory where the design sources are located, for instance vision/median_filter. This is not convenient, because if you wish to remove all generated files and start from scratch, you risk deleting the original files. Therefore, you can now run the design from the work directory, located under vision/median_filter.
    o Note: if you look into the work directory you will see the xclbin directory (I suppose that it is probably also used for testing purposes). If you delete it, the make flow will automatically recreate it from scratch.

  • Using the provided makefile you can:
    o) Only compile
    o) Only run (after the application is compiled)
    o) Both compile and run
    the application.

  • In addition, using the existing makefile variable TARGETS, you can specify if you wish to run the application for
    o) CPU emulation: TARGETS=sw_emu
    o) Hardware Emulation: TARGETS=hw_emu
    o) FPGA Deployment: TARGETS=hw

  • I also added help information about the script:
    Run the following command from the work directory
    make -f ../Makefile info

  • If you only run the makefile for compilation, the makefile will suggest how to execute the application.
    Here are some examples:
    o) You compile the design for CPU Emulation:
    make -f ../Makefile TARGETS=sw_emu compile
    The compilation step will create all necessary files to run the sw_emu and provide the following info

==================================================================================================
Tip: to run the application in a 'sw_emu' mode you may use one of the following methods

  1. execute the makefile using 'run' as a target
    make -f ../Makefile run

    Note: if you wish to generate a TRACE view, then in addition, you need set TRACE=yes
    make -f ../Makefile TRACE=yes run

  2. run the following 2 commands
    export XCL_EMULATION_MODE=true
    ./median_X86.exe ../data/<bmp_file> xclbin/krnl_median.sw_emu.xilinx_adm-pcie-ku3_2ddr_3_1.xclbin

    Note: if you wish to generate a TRACE view, then in addition, you need set the following
    variables before running the application
    export SDACCEL_TIMELINE_REPORT=true
    export SDACCEL_DEVICE_PROFILE=true
    ==================================================================================================

Please see the Median_Filter.docx file for more info about the changes.
Median_Filter.docx
NEW_Median_Filter.tar.gz
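
The tip above sets XCL_EMULATION_MODE before launching the host binary by hand. A host program can inspect the same variable to tell an emulation run from a board run. The following is a minimal sketch only: the helper names are hypothetical, and treating any non-empty value as "emulation" is an assumption (the tip uses the value true).

```cpp
#include <cassert>
#include <cstdlib>

// True when the given XCL_EMULATION_MODE value indicates an emulation run.
// Assumption for this sketch: any non-empty value counts (the tip uses "true").
bool mode_means_emulation(const char* value) {
    return value != nullptr && value[0] != '\0';
}

// Hypothetical helper: checks the environment of the running host program.
bool is_emulation() {
    return mode_means_emulation(std::getenv("XCL_EMULATION_MODE"));
}

// Small self-check of the classification rule.
void emulation_mode_example() {
    assert(mode_means_emulation("true"));
    assert(!mode_means_emulation(nullptr));
    assert(!mode_means_emulation(""));
}
```

A host program could call is_emulation() once at start-up, for instance to choose a reduced data set when running under hw_emu.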

Cannot upgrade to invalid target ''

When I build my RTL logic, I get the error:
ERROR: [XOCC 19-98] Generation of the IP CORE failed.
Cannot upgrade to invalid target ''
How should I solve it? Thanks.

Verification of gemm fails

Verification of acceleration/gemm fails on xilinx:xil-accel-rd-ku115:4ddr-xpr:4.0 with SDx 2017.1.

Example execution below:

% LD_LIBRARY_PATH=/opt/Xilinx/SDx/2017.1/runtime/lib/x86_64 ./gemm 4096 4096 64
Creating context...
INFO: Importing xclbin/gemm0.hw.xilinx_xil-accel-rd-ku115_4ddr-xpr.xclbin
INFO: Loaded file
INFO: Created Binary
INFO: Built Program
INFO: input matrix size: M= 4096, N= 4096, K= 64
INFO: hw matrix size: row= 128, col= 64, depth= 64
Creating Buffers...
Copying Buffers to device....
Copying results to host....
INFO: Execution done
ERROR in - 0 - actual=31, expected=0
INFO: kernel time 0.004075 seconds numOfOps -2147483648.000000 Efficiency: -526.929595 GOPs
INFO: Test Failed

Steps to reproduce:

  1. make all TARGETS=hw
  2. LD_LIBRARY_PATH=/opt/Xilinx/SDx/2017.1/runtime/lib/x86_64 ./gemm 4096 4096 54

Installed DSA: xilinx:xil-accel-rd-ku115:4ddr-xpr:4.0

Other kernels execute correctly.

What are the acceptable parameters for rows, cols and depth, and what is the effect of depth? Perhaps put this in the readme?
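
A side observation on the log above: with M = N = 4096 and K = 64, the product 2·M·N·K equals 2^31, which is one more than the largest signed 32-bit integer and matches the printed numOfOps of -2147483648. The sketch below assumes, without confirmation from the gemm host source, that the operation count is held in a 32-bit int; gemm_op_count and gemm_gops are hypothetical helpers that do the same arithmetic in 64 bits. This would account only for the negative GOPs figure, not for the verification mismatch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the gemm example): number of multiply and add
// operations for an (M x K) by (K x N) matrix product, computed in 64 bits.
std::int64_t gemm_op_count(std::int64_t m, std::int64_t n, std::int64_t k) {
    assert(m >= 0 && n >= 0 && k >= 0);
    return 2 * m * n * k;
}

// Throughput in giga-operations per second for a measured kernel time.
double gemm_gops(std::int64_t ops, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(ops) / seconds / 1e9;
}
```

With the sizes and the kernel time from the log (0.004075 seconds), this gives roughly 527 GOPs, the same magnitude as the reported value but with a positive sign.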

Communication between SLRs

In this example, if I want to make the two kernels executing on two SLRs communicate with each other, which mechanisms should I use to make it happen?

Thanks.

2017.1.RTE not available after following the tutorial (Create, configure and test an AWS F1 instance)

In section 4: Running the SDAccel 'Hello World' example on AWS F1, in the part where you need to execute the host application to run on the FPGA:

The path '/opt/Xilinx/SDx/2017.1.rte/setup.sh' is not valid, since 2017.1.rte is not present after following all previous steps with the latest FPGA Developer AMI from Amazon.

I tried to use the available RTE, which is 2017.4.rte.dyn, and the result is that the app can't find a device. Here is what it shows:

[0]user:0x1042:0x7:[???:??:0]
xclProbe found 1 FPGA slots with xocl driver running
WARNING: AwsXcl - Cannot open userPF: /dev/dri/renderD0
WARNING: AwsXcl isGood: invalid user handle.
WARNING: xclOpen Handle check failed
[0]user:0x1042:0x7:[???:??:65535]
device[0].user_instance : 65535
WARNING: AwsXcl - Cannot open userPF: /dev/dri/renderD65535
WARNING: AwsXcl isGood: invalid user handle.
ERROR: xclOpen Handle check failed
ERROR: Device setup failed
[0]user:0x1042:0x7:[???:??:0]
xclProbe found 1 FPGA slots with xocl driver running
WARNING: AwsXcl - Cannot open userPF: /dev/dri/renderD0
WARNING: AwsXcl isGood: invalid user handle.
WARNING: xclOpen Handle check failed
[0]user:0x1042:0x7:[???:??:65535]
device[0].user_instance : 65535
WARNING: AwsXcl - Cannot open userPF: /dev/dri/renderD65535
WARNING: AwsXcl isGood: invalid user handle.
ERROR: xclOpen Handle check failed
ERROR: Device setup failed
[0]user:0x1042:0x7:[???:??:0]
xclProbe found 1 FPGA slots with xocl driver running
WARNING: AwsXcl - Cannot open userPF: /dev/dri/renderD0
WARNING: AwsXcl isGood: invalid user handle.
WARNING: xclOpen Handle check failed
[0]user:0x1042:0x7:[???:??:65535]
device[0].user_instance : 65535
WARNING: AwsXcl - Cannot open userPF: /dev/dri/renderD65535
WARNING: AwsXcl isGood: invalid user handle.
ERROR: xclOpen Handle check failed
ERROR: Device setup failed
Error: Failed to find Xilinx platform

Failed to Run example: helloworld_ocl

https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/host/helloworld_ocl.
I ran this sample on kcu1500. SDx: 2017.4 target: hw. But the test failed.
Here are the logs:

Found Device=xilinx_kcu1500_dynamic_5_0
XCLBIN File Name: vector_addition
INFO: Importing xclbin/vector_addition.hw.xilinx_kcu1500_dynamic.xclbin
Loading: 'xclbin/vector_addition.hw.xilinx_kcu1500_dynamic.xclbin'
Result =
Error: Result mismatch:
i = 0 CPU result = 42 Device result = 16711935
TEST FAILED

Using 4 DDR banks in RTL kernel in SDAccel 2019.1 on Alveo U200

Hi
I use 6 MAXI interfaces in the RTL design. I can use at most 4 DDR banks on the U200, so I reuse the banks: I have 8 buffers to place in the 4 DDR banks. The code is below. When execution reaches the cl::Buffer line, the following error is reported:

ERROR: bad host_ptr of mem use flags
src/host.cpp:1051 Error calling cl::Buffer buffer_cnn_fmap(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX, cnn_fmap_len, &GlobMem_BUF_output_Ext, &err), error code is: -37
What is the problem? What should I do to reuse the DDR banks correctly? Thanks!
The platform is xilinx_u200_xdma_201830_1/xilinx_u200_xdma_201830_1.xpfm and the SDx version is 2019.1.
host.cpp code:

cl_mem_ext_ptr_t ddr0;
ddr0.param = krnl_deepspeech2.get();
ddr0.flags = 0;
ddr0.obj = 0;

cl_mem_ext_ptr_t ddr1;
ddr1.param = krnl_deepspeech2.get();
ddr1.flags = 1;
ddr1.obj = 0;

cl_mem_ext_ptr_t ddr2;
ddr2.param = krnl_deepspeech2.get();
ddr2.flags = 2;
ddr2.obj = 0;

cl_mem_ext_ptr_t ddr3;
ddr3.param = krnl_deepspeech2.get();
ddr3.flags = 3;
ddr3.obj = 0;

OCL_CHECK(err, cl::Buffer buffer_cnn_krnl(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX, cnn_krnl_len, &ddr1, &err));
OCL_CHECK(err, cl::Buffer buffer_cnn_fmap(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX, cnn_fmap_len, &ddr2, &err));
OCL_CHECK(err, cl::Buffer buffer_cnn_yout (context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY | CL_MEM_EXT_PTR_XILINX, cnn_output_len, &ddr2, &err));
OCL_CHECK(err, cl::Buffer buffer_blstm_ddr0(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX, blstm_ddr0_len, &ddr0, &err));
OCL_CHECK(err, cl::Buffer buffer_blstm_ddr1(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX, blstm_ddr1_len, &ddr1, &err));
OCL_CHECK(err, cl::Buffer buffer_blstm_ddr2(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX, blstm_ddr2_len, &ddr2, &err));
OCL_CHECK(err, cl::Buffer buffer_blstm_ddr3(context, CL_MEM_USE_HOST_PTR |CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX, blstm_ddr3_len, &ddr3, &err));
OCL_CHECK(err, cl::Buffer buffer_blstm_yout(context, CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY | CL_MEM_EXT_PTR_XILINX, blstm_output_len, &ddr1, &err));

q.enqueueWriteBuffer(buffer_cnn_krnl,CL_TRUE,0,cnn_krnl_len,cnn_krnl_buff);
q.enqueueWriteBuffer(buffer_cnn_fmap,CL_TRUE,0,cnn_fmap_len,cnn_fmap_buff);
q.enqueueWriteBuffer(buffer_blstm_ddr0,CL_TRUE,0,blstm_ddr0_len,blstm_ddr0_buff);
q.enqueueWriteBuffer(buffer_blstm_ddr1,CL_TRUE,0,blstm_ddr1_len,blstm_ddr1_buff);
q.enqueueWriteBuffer(buffer_blstm_ddr2,CL_TRUE,0,blstm_ddr2_len,blstm_ddr2_buff);
q.enqueueWriteBuffer(buffer_blstm_ddr3,CL_TRUE,0,blstm_ddr3_len,blstm_ddr3_buff);
Makefile code:

$(XCLBIN)/deepspeech2.$(TARGET).$(DSA).xclbin: $(BINARY_CONTAINER_deepspeech2_OBJS)
	mkdir -p $(XCLBIN)
	$(XOCC) $(CLFLAGS) $(LDCLFLAGS) -lo $(XCLBIN)/deepspeech2.$(TARGET).$(DSA).xclbin $(XCLBIN)/deepspeech2.$(TARGET).$(DSA).xo --sp DeepSpeech2_TOP_1.M00_AXI:DDR[0] --sp DeepSpeech2_TOP_1.M01_AXI:DDR[1] --sp DeepSpeech2_TOP_1.M02_AXI:DDR[2] --sp DeepSpeech2_TOP_1.M03_AXI:DDR[3] --sp DeepSpeech2_TOP_1.M04_AXI:DDR[1] --sp DeepSpeech2_TOP_1.M05_AXI:DDR[2]
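
In OpenCL, error code -37 is CL_INVALID_HOST_PTR. One likely cause, stated here as an assumption rather than a confirmed diagnosis, is that CL_MEM_USE_HOST_PTR needs a host pointer and every extension struct in the post has obj set to 0. The sketch below uses a stand-in struct with the three fields assigned in the post instead of the real cl_mem_ext_ptr_t, so that it compiles without the Xilinx headers; ext_ptr_sketch and make_ext_ptr are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the three fields of cl_mem_ext_ptr_t assigned in the post.
// Assumption: only the field names are mirrored; the real type comes from
// the Xilinx OpenCL extension header and is not reproduced here.
struct ext_ptr_sketch {
    unsigned flags;  // selector value, as assigned in the post (0..3)
    void*    obj;    // host memory backing the buffer
    void*    param;  // kernel handle in the post; may be null in this sketch
};

// Hypothetical helper: builds one descriptor per buffer, with obj pointing
// at that buffer's host memory instead of 0.
ext_ptr_sketch make_ext_ptr(unsigned selector, void* host_mem, void* kernel) {
    assert(host_mem != nullptr);  // a null obj is what this sketch guards against
    return ext_ptr_sketch{selector, host_mem, kernel};
}

// Two buffers may share a selector, but each needs its own descriptor.
void ext_ptr_example() {
    std::vector<int> fmap(16), yout(16);
    ext_ptr_sketch in_ext  = make_ext_ptr(2, fmap.data(), nullptr);
    ext_ptr_sketch out_ext = make_ext_ptr(2, yout.data(), nullptr);
    assert(in_ext.flags == out_ext.flags);
    assert(in_ext.obj != out_ext.obj);
}
```

The point of the design is that a descriptor describes one buffer, so sharing &ddr1 or &ddr2 between several cl::Buffer calls, as in the post, leaves no room for a per-buffer host pointer.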

Unable to run VADD RTL kernel Example

I am trying to run the basic RTL vadd example (https://github.com/Xilinx/SDAccel_Examples/tree/master/getting_started/rtl_kernel/rtl_vadd) using SDx 2019. I created a kernel using the RTL Kernel Wizard exactly as shown in the example and used the host code provided, but I am unable to see the results for either hw_emu or system. The only thing that I am missing is the top-level RTL wrapper, since I created the kernel using the SDx RTL Kernel Wizard.

In hw_emu, I get the following error:
Found Platform
Platform Name: Xilinx
INFO: Reading binary_container_1.xclbin
Loading: 'binary_container_1.xclbin'
FATAL ERROR : Simulation process did not launch
*** Error in `./test_rtl_kernl.exe': double free or corruption (fasttop): 0x00000000012e0810 ***
Segmentation fault

In System, I get the following error:-
Found Platform
Platform Name: Xilinx
INFO: Reading binary_container_1.xclbin
Loading: 'binary_container_1.xclbin'
XRT build version: 2.2.2250
Build hash: dd210161e204e882027d22132725d8ffdf285149
Build date: 2019-09-04 23:27:48
Git branch: 2019.1
PID: 8681
UID: 55026
[XRT] ERROR: Invalid scalar argument size, expected 4 got 8
ERROR: clSetKernelArg() for kernel "krnl_vadd_rtl_int", argument index 0.
../src/host.cpp:97 Error calling err = krnl_vadd.setArg(0, buffer_r1), error code is: -51

I did not clone the repo to run the examples. I copied and pasted the code directly from the repo.

Thank you in advance for your help.

Support for separate compile and link with XO file

The current Makefile flow directly generates the ".xclbin" file. It does not produce the intermediate ".xo" file. Please change it to generate intermediate files; this will improve compile and link time in the case of multiple kernels.

Failed to Run Examples

Hi all,

I failed to run any examples here and I guess it's due to the OpenCL runtime environment.
My hardware platform is xilinx:adm-pcie-7v3:1ddr:3.0 and my software platform is SDx-2017.1.
Taking /getting_started/host/helloworld_ocl for instance, if I link the host program with -lOpenCL, executing it immediately complains: Error: Failed to find Xilinx platform.
If I link the host program with -lxilinxopencl, the situation seems a little better. The output of the host program is shown below.

platform Name: Xilinx
Vendor Name : Xilinx
Found Platform
Found Device=xilinx:adm-pcie-7v3:1ddr:3.0
XCLBIN File Name: vector_addition
INFO: Importing xclbin/vector_addition.hw.xilinx_adm-pcie-7v3_1ddr.xclbin
Loading: 'xclbin/vector_addition.hw.xilinx_adm-pcie-7v3_1ddr.xclbin'
ERROR: Failed to load xclbin
ERROR: program is nullptr
[1]    3626 segmentation fault (core dumped)  ./helloworld

Does anyone have any advice? Your help is really appreciated.

Multiple compute units support

As per my understanding, the current Makefile flow supports only single compute unit kernels. Can you enhance the Makefile flow to allow the user to create multiple compute units for each kernel?

Issues with Using printf() to Debug Kernels

Hi,
I am trying to use printf() to debug kernels, according to:
https://china.xilinx.com/support/documentation/sw_manuals/xilinx2018_2/ug1281-sdaccel-debugging-guide.pdf

The kernel code is from helloworld_ocl example:
#define BUFFER_SIZE 256
kernel __attribute__((reqd_work_group_size(1, 1, 1)))
void vector_add(global int* c,
                global const int* a,
                global const int* b,
                const int n_elements)
{
    int arrayA[BUFFER_SIZE];
    int arrayB[BUFFER_SIZE];
    printf("BUFFER_SIZE is %d\n", BUFFER_SIZE);
    printf("n_elements is %d\n", n_elements);
    printf("Some number is is %d\n", 55);

    for (int i = 0 ; i < n_elements ; i += BUFFER_SIZE)
    {
        int size = BUFFER_SIZE;
        printf("Current i is %d\n", i);
        if (i + size > n_elements) size = n_elements - i;
        readA: for (int j = 0 ; j < size ; j++) arrayA[j] = a[i+j];
        readB: for (int j = 0 ; j < size ; j++) arrayB[j] = b[i+j];
        vadd_writeC: for (int j = 0 ; j < size ; j++) c[i+j] = arrayA[j] + arrayB[j];
    }
}

The command I run on AWS F1 instance is:
make check TARGETS=sw_emu DEVICES=$AWS_PLATFORM all

The result is:

BUFFER_SIZE is 8
n_elements is 8
Some number is is 8
Current i is 8

As you can see all are 8, not sure why.
Can you please help me to figure out how to use printf()?

Error hw_emu.

Error running the following command: make check TARGETS=hw_emu DEVICES=$AWS_PLATFORM all.

How can I fix it? Any ideas?

Thanks.

****** vpl v2017.4 (64-bit)
**** SW Build 2193837 on Tue Apr 10 18:06:59 MDT 2018
** Copyright 1986-2017 Xilinx, Inc. All Rights Reserved.

INFO: [VPL 60-839] Read in kernel information from file 'kernel_info.dat'.
INFO: [VPL 60-895] Target platform: /home/centos/src/project_data/aws-fpga/SDAccel/aws_platform/xilinx_aws-vu9p-f1_dynamic_5_0/xilinx_aws-vu9p-f1_dynamic_5_0.xpfm
INFO: [VPL 60-423] Target device: xilinx_aws-vu9p-f1_dynamic_5_0
INFO: [VPL 60-251] Hardware accelerator integration...
ERROR: [VPL 60-777] Sorry, but it appears that a Xilinx program has terminated unexpectedly. Please contact Xilinx technical support for further assistance and give them the contents of /home/centos/src/project_data/aws-fpga/SDAccel/examples/xilinx_2017.4/getting_started/host/helloworld_ocl/_xocc_link_vector_addition.hw_emu.xilinx_aws-vu9p-f1_dynamic_5_0_vector_addition.hw_emu.xilinx_aws-vu9p-f1_dynamic_5_0.dir/_vpl/ipi/CrashLog
ERROR: [VPL 60-399] vivado failed, please see log file for detail: '/home/centos/src/project_data/aws-fpga/SDAccel/examples/xilinx_2017.4/getting_started/host/helloworld_ocl/_xocc_link_vector_addition.hw_emu.xilinx_aws-vu9p-f1_dynamic_5_0_vector_addition.hw_emu.xilinx_aws-vu9p-f1_dynamic_5_0.dir/_vpl/ipi/vivado.log'
ERROR: [VPL 60-806] Failed to finish platform linker
ERROR: [XOCC 60-398] vpl failed
ERROR: [XOCC 60-626] Kernel link failed to complete
ERROR: [XOCC 60-703] Failed to finish linking

XOCC crash

I've followed the instructions to install xocc on a t2.micro instance and came across this error:

$ xocc
terminate called after throwing an instance of 'std::runtime_error'
  what():  locale::facet::_S_create_c_locale name not valid
/opt/Xilinx/SDx/2018.2.op2258646/bin/loader: line 194:  1869 Aborted                 "$RDI_PROG" "$@"

I am following the instructions here. These instructions used to work for me, but not anymore. Thoughts?
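
The message "locale::facet::_S_create_c_locale name not valid" is the text libstdc++ uses when a program constructs a std::locale from a name the system does not provide. Assuming that is what happens inside xocc (not verified here), the check can be reproduced in a few lines; locale_name_is_valid is a hypothetical helper. If the environment's locale turns out to be the problem, a common workaround is to set the locale variables (for example LC_ALL) to an installed locale such as C before launching the tool.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>

// Hypothetical helper: reports whether the C++ runtime can construct the
// named locale. An empty name means "whatever the environment requests"
// (LANG, LC_ALL, and so on).
bool locale_name_is_valid(const char* name) {
    assert(name != nullptr);
    try {
        std::locale probe(name);
        (void)probe;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Calling locale_name_is_valid("") on the affected instance would show whether the environment's locale settings can be honoured by the C++ runtime at all.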

"make help" support

Can we add an additional target (help) to the Makefile that prints the makefile usage?

Child process did not launch

Hi,

I ran into this problem and found that the kcu1500 isn't in the supported device list.
Is that the reason for this error message?
If yes, could you help me fix it, or give me some suggestions for fixing it?
Thank you.

-- SDAccel_Examples/getting_started/clk_freq/large_loop_ocl --
INFO: Importing xclbin/cnn_GOOD.sw_emu.xilinx_kcu1500_4ddr-xpr_4_0.xclbin
INFO: Loaded file
FATAL ERROR : child process did not launch
WARNING: Profiling may contain incomplete information. Please call clReleaseProgram() from your host code.
WARNING: Profiling may contain incomplete information. Please call clReleaseProgram() from your host code.
Segmentation fault (core dumped)

-- SDAccel_Examples/getting_started/host/helloworld_ocl --
platform Name: Intel(R) FPGA SDK for OpenCL(TM)
Vendor Name : Xilinx
platform Name: Xilinx
Vendor Name : Xilinx
Found Platform
Found Device=xilinx_kcu1500_4ddr-xpr_4_0
XCLBIN File Name: vector_addition
INFO: Importing xclbin/vector_addition.sw_emu.xilinx_kcu1500_4ddr-xpr_4_0.xclbin
Loading: 'xclbin/vector_addition.sw_emu.xilinx_kcu1500_4ddr-xpr_4_0.xclbin'
FATAL ERROR : child process did not launch
WARNING: Profiling may contain incomplete information. Please call clReleaseProgram() from your host code.
WARNING: Profiling may contain incomplete information. Please call clReleaseProgram() from your host code.
*** Error in `./helloworld': Segmentation fault (core dumped)

clEnqueueWriteBuffer() causes DMA deadlock when transferring large dataset

In the example "dataflow_pipes_ocl", the host code says:

// Using clEnqueueMigrateMemObjects() instead of clEnqueueWriteBuffer() to avoid
// deadlock in real hardware which can be noticed only for large dataset.
// Rootcause: design leads to a deadlock when host->DDR and
// output_stage->DDR causes a contention and deadlock. In small dataset, the
// data gets transferred from host-> DDR in 1 burst and hence no deadlock.
// Solution: Start output_stage when host->DDR data transfer is completed.
// clEnqueueMigrateMemObject() event is used for all three kernels to avoid deadlock.

Why does clEnqueueWriteBuffer() not work for a large dataset? One could also use a blocking write buffer call and wait until the transfer of the large dataset is complete, since the comments say that once the host->DDR data transfer is completed, no contention can exist.

Can I apply it on SDSoC based on ZCU102

Hi! I have two problems.

  1. Can I apply it on SDSoC based on the ZCU102?

  2. Following UG1021, I chose the zcu102 (I don't have a KCU1500 or an SDAccel license), with the following software platform: System configuration: A53 OpenCL Linux; Runtime: OpenCL (to make it more similar to the kcu1500). There was a problem when adding the SDx Examples: when I press the "Download" button for the SDAccel Examples (or the SDSoC Examples), an error window pops up with "Download Error": Couldn't download https://github.com/Xilinx/SDAccel_Examples.git.
Do you know the solution?

Regards

hbm_bandwidth on U280 error design does not meet timing

Building the hbm_bandwidth example, either as is, with 8 CU, or with only 2 CU, seems to fail to meet timing on the Alveo U280 accelerator card using SDAccel 2019.1. The log file is provided below. Perhaps this happens because I am using an ES1 device? The hbm_simple example worked with no issues though.
runme.log

Please advise. Thank you!

Issues when installing SHA on AWS FPGA

Hi, I'm new to AWS FPGA and want to try SHA on the board.

I'm following the procedures shown here: https://github.com/Xilinx/SDAccel_Examples/tree/2018.2/security/sha1

I've used FPGA Developer AMI v1.5 to launch an f1.2xlarge instance on AWS. However, when I first tried Compiling for Application Emulation, I got the following messages:

[centos@ip-172-31-9-50 sha1]$ make TARGETS=sw_emu all
mkdir -p ./xclbin
/opt/Xilinx/SDx/2018.2.op2258646/bin/xocc -t sw_emu --platform xilinx_vcu1525_dynamic --save-temps  --xp "param:compiler.preserveHlsOutput=1" --xp "param:compiler.generateExtraRunData=true" -c -k dev_sha1_update -I'src' -o'xclbin/krnl_sha1.sw_emu.xilinx_vcu1525_dynamic.xo' 'src/krnl_clSha1.cl'

****** xocc v2018.2_AR71715_op (64-bit)
  **** SW Build 2258646 on Thu Jun 14 20:02:38 MDT 2018
    ** Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.

Attempting to get a license: ap_opencl
Feature available: ap_opencl
INFO: [XOCC 60-585] Compiling for software emulation target
Running SDx Rule Check Server on port:40026
ERROR: [XOCC 60-705] No platform was found that matches 'xilinx_vcu1525_dynamic'. Make sure that the platform is specified correctly and that valid license is installed. The valid platforms supported by the license found are:

ERROR: [XOCC 60-587] Failed to add a platform: specified platform xilinx_vcu1525_dynamic is not found
ERROR: [XOCC 60-600] Kernel compile setup failed to complete
ERROR: [XOCC 60-592] Failed to finish compilation
make: *** [xclbin/krnl_sha1.sw_emu.xilinx_vcu1525_dynamic.xo] Error 1

Can anyone help me fix this please?

Using Multiple DDRs

Hi
Setting up multiple DDR banks with the options below in the xocc compiler and linker settings
-max_memory_ports all --sp cnn_1.m_axi_gmem0:bank0 --sp cnn_1.m_axi_gmem1:bank1
gives an error:

"make incremental
/opt/Xilinx/SDx/2017.1/bin/xocc -t hw_emu --platform xilinx:kcu1500:4ddr-xpr:4.0 --save-temps --report estimate -c -k cnn -g --messageDb cnn_GOOD/cnn.mdb -I"../src" --xp misc:solution_name=_xocc_compile_cnn_GOOD_cnn -max_memory_ports all --sp cnn_1.m_axi_gmem0:bank0 --sp cnn_1.m_axi_gmem1:bank1 -o"cnn_GOOD/cnn.xo" "../src/cnn_convolution.cpp"

/opt/Xilinx/SDx/2017.1/bin/unwrapped/lnx64.o/xocc: invalid option -- 'm'
"
The sample design fails and gives an error.

make all and make all-POWER problems

Hi,

  1. make all-POWER launches compilation, but the produced executable hello_POWER.exe cannot be executed.
    Error is: cannot execute binary file
  2. make all produces hello_X86.exe
    and error is: libOpenCl.so not found

I have tried many methods, and getting emulation (hw or sw) working is not a big deal when Tcl scripts are used, but executing OpenCL code on the FPGA is still very problematic.
I get stuck at the same problem even when I use the run_system command in SDAccel.

AES implementation should use a column-major state matrix

Hello!

When working with the aes_decrypt example, I compared the results of directly invoking the aes_ecb code (not the encrypted bitmap output, which has an unencrypted BMP file header) against other available implementations, including OpenSSL. After accounting for the standard issues (mostly padding), the results still differed.

After tracing some lightweight implementations and the one provided in the aes_decrypt tree, I discovered that the implementation there assumes that the state matrix is in row-major order, and ShiftRows and MixColumns are written accordingly. However, the state matrix for AES is defined to be in column-major order, so ShiftRows and MixColumns were in fact "ShiftColumns" and "MixRows".

The simplest way of correcting this would be to modify the code for ShiftRows and MixColumns so that they operate on a transposed matrix. That is, operations that act on rows of the state matrix should instead act on the columns of the row-major matrix that the C/OpenCL code is operating on.

I will be submitting a pull request that contains a fix for this issue.
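
The column-major convention described above can be sketched as follows. In the AES specification the state byte at row r, column c is input byte r + 4c, and ShiftRows rotates row r left by r positions. This is an illustrative sketch only, not the code from the aes_decrypt example or from the pull request; Block and shift_rows are names chosen for the illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Block = std::array<std::uint8_t, 16>;

// ShiftRows on a 16-byte block stored in column-major order, i.e. the state
// byte at row r, column c lives at index r + 4 * c. Row r is rotated left by
// r positions.
Block shift_rows(const Block& in) {
    Block out{};
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            out[r + 4 * c] = in[r + 4 * ((c + r) % 4)];
        }
    }
    return out;
}

// Usage check: bytes 0..15 map to the familiar ShiftRows byte permutation.
void shift_rows_example() {
    Block in{};
    for (int i = 0; i < 16; ++i) in[i] = static_cast<std::uint8_t>(i);
    const Block expected = {{0, 5, 10, 15, 4, 9, 14, 3,
                             8, 13, 2, 7, 12, 1, 6, 11}};
    assert(shift_rows(in) == expected);
}
```

A row-major implementation of the same loop would instead rotate bytes within each group of four consecutive input bytes, which is the "ShiftColumns" behaviour the report describes.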

not support multiple devices in the platform

In the libs/xcl/xcl.c file, the function "xcl_world_single_vendor" scans the platform IDs to match the vendor name "Xilinx", but does not scan the device IDs under that same "Xilinx" platform.

The code calls "clGetDeviceIDs" only once to get the first device, then uses the device name to match the xclbin file.

  1. So it always uses the first device, even though there may be multiple FPGA cards with different Xilinx chips.
  2. The code is too ugly because it relies on the xclbin file name to match the device name. There is embedded text information inside the xclbin file, so why not use that information instead?

I know xcl.c is part of neither the SDAccel standard nor the OpenCL specs. But since it is in the official GitHub repository, many people will use these wrapper functions instead of the OpenCL functions when they are using SDAccel.
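
To make the request concrete, here is a small sketch of the selection step only, written so that it runs without an OpenCL runtime. It assumes the caller has already listed every device on the platform (the usual pattern is one clGetDeviceIDs call to get the count, a second to fetch all IDs, then clGetDeviceInfo with CL_DEVICE_NAME for each). The helper name `find_device_by_name` is hypothetical and is not part of xcl.c.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first device whose name equals `wanted`,
 * or -1 if none matches. `names` is assumed to hold the name reported
 * for each device on the platform, not only the first device.
 * Hypothetical helper for illustration; not part of xcl.c. */
int find_device_by_name(const char *const names[], size_t count,
                        const char *wanted)
{
    for (size_t i = 0; i < count; ++i) {
        /* Skip entries whose name could not be queried. */
        if (names[i] != NULL && strcmp(names[i], wanted) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With a lookup like this, a host with several cards could pick the device that matches the target, rather than whichever device happens to be listed first.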

possible bug in "Systolic Arrays (CL)" example

I was checking out the mentioned example and I noticed this:

int localC[MAX_SIZE][MAX_SIZE] __attribute__((xcl_array_partition(complete, 0)));;

Am I missing something? Isn't dimension supposed to be a nonzero index of the array's dimensions (1 to N)? Here it's zero, which doesn't make sense (according to the SDx Pragma Reference Guide).

There is a comment in the code about local matrix C being partitioned completely (maybe by setting dimension to zero), but nothing in the documentation explains such a capability.

dataflow_subfunc_ocl example and dataflow pragma

I was checking the mentioned example, and after the HW_EMU build I noticed that the initiation interval (II) for the whole kernel is 137 clocks. That is far more than the interval of the same kernel WITHOUT the dataflow pragma.

Link to the kernel file

If the purpose of this example is to show how the dataflow pragma can decrease the II, it might be useful to add the dataflow pragma to the top function of the kernel itself.

__attribute__ ((reqd_work_group_size(1, 1, 1)))
__attribute__ ((xcl_dataflow))
void adder(__global int *in, __global int *out, int inc, int size)
{
    run_subfunc(in, out, inc, size);
}

With the dataflow pragma on the top function of the kernel I get: Latency=137, Interval=2
Without the dataflow pragma on the top function of the kernel I get: Latency=138, Interval=138 (master branch without any edits to the kernel)
Without the dataflow pragma on any of the top or sub functions I get: Latency=6, Interval=6
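
To show what these latency and interval figures mean for throughput, the sketch below applies a common first-order estimate: the first result takes `latency` cycles and each later one follows `interval` cycles after the previous. This model and the function name `total_cycles` are assumptions made for illustration. They are not taken from the example or from the tool reports.

```c
#include <stdint.h>

/* Approximate cycle count for `n` back-to-back invocations of a block
 * with the given latency and initiation interval. This is a
 * first-order model only; real hardware adds memory and control
 * overheads that it ignores. */
uint64_t total_cycles(uint64_t latency, uint64_t interval, uint64_t n)
{
    if (n == 0) {
        return 0;
    }
    return latency + (n - 1) * interval;
}
```

Under this model, 100 invocations come to 137 + 99 * 2 = 335 cycles for the Latency=137, Interval=2 case, and 138 + 99 * 138 = 13800 cycles for the Latency=138, Interval=138 case.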

EDGE_COMMON_SW variable is not set

Hi there,

Seeing this error when attempting to build the hello world example for zcu102 board for software emulation. (Same thing for aarch32.) Not sure what I'm missing.

  • J.
(base) adclab@solo:~/Vitis_Accel_Examples/hello_world$ make all TARGET=sw_emu DEVICE=xilinx_zcu102_base_202010_1 HOST_ARCH=aarch64
utils.mk:61: *** EDGE_COMMON_SW variable is not set, please set correctly and rerun.  Stop.

Edit:

(base) adclab@solo:~/Vitis_Accel_Examples/hello_world$ vitis --version

****** Xilinx Vitis Development Environment
****** Vitis v2020.1 (64-bit)
  **** SW Build 2902540 on Wed May 27 19:55:13 MDT 2020
    ** Copyright 1986-2020 Xilinx, Inc. All Rights Reserved.

"make em_config" Support

For sw and hw emulation, the user has to create a JSON file using the emconfig command. Can we add this step to the Makefile flow?

xbinst support in Makefile

Hi,
Can we add xbinst support to the Makefile? I guess that when a user creates a design for hardware, the user needs the runtime drivers to run it on a board (or on the Nimbix cloud). It would be better to add a make target that runs xbinst to generate the target DSA package.

Runtime include errors

This commit seems to have broken all examples: 3b7e906

Compiling any example gives the following error because XILINX_XRT is not set (also there is no possible value of XILINX_XRT that is correct, since it seems to assume a different directory structure from what's in SDAccel 2018.2):

% XILINX_SDX=/opt/Xilinx/SDx/2018.2 make exe
mkdir -p ./xclbin
/opt/Xilinx/SDx/2018.2/bin/xcpp -I /include/ -I//opt/Xilinx/SDx/2018.2/Vivado_HLS/include/ -O0 -g -Wall -fmessage-length=0 -std=c++14 -I../../..//libs/xcl2 src/host.cpp ../../..//libs/xcl2/xcl2.cpp -o 'vadd' -lOpenCL -lpthread -lrt -lstdc++ -L/lib/ 
In file included from src/host.cpp:29:0:
../../..//libs/xcl2/xcl2.hpp:47:10: fatal error: CL/cl2.hpp: No such file or directory
 #include <CL/cl2.hpp>
          ^~~~~~~~~~~~
compilation terminated.
In file included from ../../..//libs/xcl2/xcl2.cpp:33:0:
../../..//libs/xcl2/xcl2.hpp:47:10: fatal error: CL/cl2.hpp: No such file or directory
 #include <CL/cl2.hpp>
          ^~~~~~~~~~~~
compilation terminated.

What happened?

HLS C-Kernel Support

I guess the current Makefile flow does not support HLS C kernels. Can we enhance the Makefile flow to support them?

How to connect DDR4 to RTL Kernel

I believe the DDR4 memory is used for communication with the OpenCL kernel. How can I connect an AXI-Master MM interface to an RTL kernel without disturbing the OpenCL space?

"no HwEm HAL handle" error when executing hardware emulation

Hi,

When I compile and run gemm in hw_emu mode, I encounter a crash error in clCreateProgramWithBinary:

ERROR: device::load_binary binary target=HwEm, no HwEm HAL handle
Error: Failed to create compute program from binary -44!

What causes this error? Thanks.

troore

the application hangs on s_axi_control write transaction

Hi,
I was using the waveform for kernel debugging in Emulation-HW mode, and found that the host app keeps issuing write transactions to the s_axi_control interface until it hits write address 0x354.
(1) The AW valid signal has been pulled high, but the wvalid signal remains low. The s_axi_control interface shows no activity after that, and I think the runtime of my waveform is long enough, ~450us. As a result, the configuration process hangs, and I never see the ap_start pulse. I have no idea why this happens.
(2) In our kernel.xml, we have defined 136 arguments for our kernel. Some are bound to the s_axi_control port, with 4-byte-aligned addresses, while others are bound to the m00_axi port, with 8-byte-aligned addresses.
(3) In our Emulation console, it prints the following information.

platform Name: Xilinx
Vendor Name : Xilinx
Found Platform
XCLBIN File Name: v2mc
INFO: Importing ../v2mc.xclbin
Loading: '../v2mc.xclbin'
INFO: [SDx-EM 01] Hardware emulation runs simulation underneath. Using a large data set will result in long simulation times. It is recommended that a small dataset is used for faster execution. This flow does not use cycle accurate models and hence the performance data generated is approximate.
INFO: [SDx-EM 22] [Wall clock time: 00:13, Emulation time: 0.00388295 ms] Data transfer between kernel(s) and global memory(s)
v2mc_1:m00_axi          RD = 0.000 KB               WR = 0.000 KB    

INFO: [SDx-EM 22] [Wall clock time: 00:18, Emulation time: 0.00810919 ms] Data transfer between kernel(s) and global memory(s)
v2mc_1:m00_axi          RD = 0.000 KB               WR = 0.000 KB        

INFO: [SDx-EM 22] [Wall clock time: 00:24, Emulation time: 0.0123588 ms] Data transfer between kernel(s) and global memory(s)
v2mc_1:m00_axi          RD = 0.000 KB               WR = 0.000 KB        

INFO: [SDx-EM 22] [Wall clock time: 00:29, Emulation time: 0.0165217 ms] Data transfer between kernel(s) and global memory(s)
v2mc_1:m00_axi          RD = 0.000 KB               WR = 0.000 KB        

INFO: [SDx-EM 22] [Wall clock time: 00:34, Emulation time: 0.0209012 ms] Data transfer between kernel(s) and global memory(s)
v2mc_1:m00_axi          RD = 0.000 KB               WR = 0.000 KB        

INFO: [SDx-EM 22] [Wall clock time: 00:39, Emulation time: 0.0253441 ms] Data transfer between kernel(s) and global memory(s)
v2mc_1:m00_axi          RD = 0.000 KB               WR = 0.000 KB        

INFO: [SDx-EM 22] [Wall clock time: 00:44, Emulation time: 0.0297537 ms] Data transfer between kernel(s) and global memory(s)
v2mc_1:m00_axi          RD = 0.000 KB               WR = 0.000 KB        

ERROR: buffer (44) is not resident in device (0)

(4) In the above design, there are 4 processing units, with 34x4 configuration registers in kernel.xml and settings for 34x4 kernel arguments in host.cpp. But when I modify kernel.xml and host.cpp for just 1 processing unit, the simulation passes. I think there might be an error in the software, but I am not sure about it.

Could anyone give me some guidance on addressing this issue?

xcl_import_binary() API is not useful incase of multiple different kernels

The xcl_import_binary() API creates a program and one kernel. In the case of multiple different kernels, this API cannot be called multiple times, because each call would create another program along with the kernel.
So I would suggest having separate APIs such as xcl_create_program() and xcl_create_kernel(). xcl_create_program() would be called once, and xcl_create_kernel() would be called for each kernel.

"make check" Support

Can you add an additional make target, "check", to run the design? Currently the user has to run the application manually and refer to the README.md file to find the exact command.

GUI on F1

I looked for instructions related to the GUI but could not find any. Should I use vnc?

Host streaming example for OpenCL

I can't help but notice that all examples for host streaming are for Cpp kernels. Is it not possible to achieve the same functionality with OpenCL kernels?

I'm thinking for instance of host/streaming_host_bandwidth or even host/streaming_simple.
