Run your first SDAccel program on AWS F1
This tutorial explains the procedure to package an RTL design as an SDAccel kernel and then use this RTL kernel to accelerate a host application. The tutorial uses the vadd_kernel example from the SDAccel Github examples repository and covers the following:
- Writing an RTL design adhering to the SDAccel kernel interface requirements
- Packaging the RTL design as an SDAccel kernel (XO file)
- Compiling the host application and the FPGA binary containing the RTL kernel
- Creating the Amazon FPGA Image
- Executing the host application with the Amazon FPGA image
Note: This tutorial doesn't presently use the SDAccel RTL Kernel Wizard. The SDAccel RTL Kernel Wizard is a new feature which assists users through the process of packaging RTL designs as SDAccel kernels. The RTL Kernel Wizard generates the required XML file, an example project design and a set of scripts to build that example design into an XO file. For more details on how to use the RTL Kernel Wizard, you can watch this online video
This example is a simple vector-add design. The host application writes two vectors A and B of arbitrary length to the FPGA kernel which in turn sums the two vectors together to produce an output vector C. The host application then reads back the result.
The kernel has an AXI memory mapped master interface and an AXI lite slave interface:
- The AXI master interface is used to read the values of A and B from global memory and write back the values of C
- The AXI lite slave interface is used to pass parameters and control the kernel as follows:
- Offset 0x00: Control and status register
- Offset 0x10: Base address of vector A in global memory
- Offset 0x1C: Base address of vector B in global memory
- Offset 0x28: Base address of vector C in global memory
- Offset 0x34: Length of the vectors
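The register map above can be captured as constants for use in a testbench or driver. The following is a minimal sketch, not code from the example: the names `VaddReg`, `RegWrite` and `split_address` are hypothetical, and the split of each 64-bit base address into a low word at the listed offset and a high word at offset + 4 is an assumption inferred from the 12-byte spacing of the address registers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Offsets copied from the register list above (AXI lite control interface).
enum VaddReg : std::uint32_t {
    kCtrl   = 0x00,  // control and status register
    kAddrA  = 0x10,  // base address of vector A in global memory
    kAddrB  = 0x1C,  // base address of vector B in global memory
    kAddrC  = 0x28,  // base address of vector C in global memory
    kLength = 0x34   // length of the vectors
};

// One 32-bit register write: byte offset plus data word.
struct RegWrite {
    std::uint32_t offset;
    std::uint32_t data;
};

// ASSUMPTION: a 64-bit address is written as two consecutive 32-bit words,
// low word first. The tutorial does not spell out this layout.
std::vector<RegWrite> split_address(std::uint32_t offset, std::uint64_t address) {
    assert(offset % 4 == 0);  // registers are assumed to be word aligned
    return {
        {offset,     static_cast<std::uint32_t>(address & 0xFFFFFFFFu)},
        {offset + 4, static_cast<std::uint32_t>(address >> 32)}
    };
}
```

In the SDAccel flow these writes are normally issued by the runtime when the host calls `setArg`, so a helper like this is mainly useful for simulation or debugging.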
The kernel starts executing when bit 0 of the control register is set to 1. The AXI master issues burst requests to read values of A and B from global memory and streams them into two FIFOs, one for the values of A, one for the values of B. The adder module reads from both FIFOs, sums the values to compute C[i] = A[i] + B[i] and writes the result into an output FIFO. This FIFO is read by the AXI master to burst the results of the vector-add back into global memory. When both vectors have been fully processed, the kernel asserts bit 1 of the control register to indicate it is done.
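The dataflow described above can be illustrated with a small software model. This is a sketch only and not the RTL: the function name `vadd_model`, the burst length `kBurst` and the 32-bit unsigned element type are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Software model of the read-burst -> FIFO -> adder -> FIFO -> write-burst flow.
std::vector<std::uint32_t> vadd_model(const std::vector<std::uint32_t>& a,
                                      const std::vector<std::uint32_t>& b) {
    assert(a.size() == b.size());
    constexpr std::size_t kBurst = 16;  // hypothetical burst length
    std::queue<std::uint32_t> fifo_a, fifo_b, fifo_c;
    std::vector<std::uint32_t> c;
    c.reserve(a.size());

    for (std::size_t base = 0; base < a.size(); base += kBurst) {
        const std::size_t end = std::min(base + kBurst, a.size());
        // AXI master: burst-read A and B into their input FIFOs.
        for (std::size_t i = base; i < end; ++i) {
            fifo_a.push(a[i]);
            fifo_b.push(b[i]);
        }
        // Adder: pop one value from each input FIFO, push the sum.
        while (!fifo_a.empty() && !fifo_b.empty()) {
            fifo_c.push(fifo_a.front() + fifo_b.front());
            fifo_a.pop();
            fifo_b.pop();
        }
        // AXI master: drain the output FIFO back to global memory.
        while (!fifo_c.empty()) {
            c.push_back(fifo_c.front());
            fifo_c.pop();
        }
    }
    return c;
}
```

In hardware the three stages run concurrently and the FIFOs absorb rate differences between them. The model runs them one after another per burst, which yields the same result.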
The host.cpp file provides a very simple application to exercise the vector-add kernel. All FPGA-side operations are triggered using standard OpenCL API calls:
- Buffers are created in the FPGA using
cl::Buffer
- Data is copied to and from the FPGA using
<command_queue>.enqueueMigrateMemObjects
- Kernel arguments (length of the vectors, base addresses of A, B, C) are passed using
<kernel>.setArg
- Kernel is executed using
<command_queue>.enqueueTask
Of note, the FPGA device is initialized using the xcl::find_binary_file
and xcl::import_binary_file
utility functions. The xcl::find_binary_file
function makes it very easy to find the desired FPGA binary file. The function looks in 4 predefined directories for a binary file matching one of the following names:
- <name>.<target>.<device>.(aws)xclbin
- <name>.<target>.<device_versionless>.(aws)xclbin
- binary_container_1.(aws)xclbin
- <name>.(aws)xclbin
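To make the naming patterns above concrete, the sketch below enumerates the candidate file names for a given kernel. It is not the implementation of `xcl::find_binary_file`: the function `candidate_binary_names` is hypothetical, the versionless device name is taken as an input rather than derived, and the order in which the `.awsxclbin` and `.xclbin` extensions are tried is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the list of file names matching the four patterns listed above.
// "(aws)xclbin" is expanded into both ".awsxclbin" and ".xclbin".
std::vector<std::string> candidate_binary_names(const std::string& name,
                                                const std::string& target,
                                                const std::string& device,
                                                const std::string& device_versionless) {
    assert(!name.empty());
    const std::vector<std::string> stems = {
        name + "." + target + "." + device,
        name + "." + target + "." + device_versionless,
        "binary_container_1",
        name
    };
    std::vector<std::string> names;
    for (const auto& stem : stems) {
        names.push_back(stem + ".awsxclbin");  // ASSUMPTION: AFI wrapper tried first
        names.push_back(stem + ".xclbin");
    }
    return names;
}
```

A caller would then test each name in each search directory and stop at the first file that exists.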
- Execute the following commands to clone the Github repository and configure the SDAccel environment:
$ git clone https://github.com/aws/aws-fpga.git
$ cd aws-fpga
$ source sdaccel_setup.sh
- Go to the testcase directory
$ cd SDAccel/examples/xilinx/getting_started/rtl_kernel/rtl_vadd
The SDAccel Github examples use common header files, and these need to be copied into the local project source folder to make them easier to use.
- Type the command make local-files to copy all necessary files in the local directory.
$ make local-files
To be used as an SDAccel kernel, an RTL design must comply with the following signals and interface requirements:
- Clock.
- Active Low reset.
- 1 or more AXI4 memory mapped (MM) master interfaces for global memory. All AXI MM master interfaces must have 64-bit addresses.
- You are responsible for partitioning global memory spaces. Each partition in the global memory becomes a kernel argument. The memory offset for each partition must be set by a control register programmable via the AXI4 MM Slave Lite interface.
- One and only one AXI4 MM slave lite I/F for control interface. The AXI Lite interface name must be S_AXI_CONTROL.
- Offset 0 of the AXI4 MM slave lite must have the following signals:
- Bit 0: start signal - The kernel starts processing data when this bit is set.
- Bit 1: done signal - The kernel asserts this signal when the processing is done.
- Bit 2: idle signal - The kernel asserts this signal when it is not processing any data.
- One or more AXI4-Stream interfaces for streaming data between kernels.
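The control register layout at offset 0 can be expressed as bit masks. The following is a minimal sketch: the constant names, the `KernelStatus` struct and `decode_control` are hypothetical helpers for illustration, and only bits 0 to 2 described above are decoded.

```cpp
#include <cstdint>

// Bit positions from the interface requirements above.
constexpr std::uint32_t kStartBit = 1u << 0;  // set to start the kernel
constexpr std::uint32_t kDoneBit  = 1u << 1;  // asserted when processing is done
constexpr std::uint32_t kIdleBit  = 1u << 2;  // asserted when no data is being processed

// Decoded view of the control and status register.
struct KernelStatus {
    bool start;
    bool done;
    bool idle;
};

// Splits a raw register value into its three documented flags.
constexpr KernelStatus decode_control(std::uint32_t reg) {
    return KernelStatus{(reg & kStartBit) != 0,
                        (reg & kDoneBit) != 0,
                        (reg & kIdleBit) != 0};
}
```

For example, a register value of 0x4 decodes to idle only, while 0x2 indicates that the kernel has finished.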
In this example, the RTL is already compliant and doesn't need to be modified.
The RTL code for this example is located in the ./src/hdl
directory.
A fully packaged RTL Kernel is delivered as an XO file which has a file extension of .xo. This file is a container encapsulating a Vivado IP object (including RTL source files) and a kernel description XML file. The XO file can be compiled into the platform and run in the SDAccel hardware or hardware emulation flows.
To package the kernel and create the XO file the following steps are required:
- Writing a kernel description XML file
- Packaging the RTL as a Vivado IP suitable for use in IP Integrator
- Running the
package_xo
command to generate the XO file
A special XML file is needed to describe the interface properties of the RTL kernel. The format for the kernel XML file is described in the Create Kernel Description XML File section of the documentation.
This XML file can be created manually or with the RTL Kernel Wizard. In this example, the XML file is already provided (./src/kernel.xml
).
- Look at the content of the file to familiarize yourself with the information captured in the XML description.
The example comes with the ./scripts/package_kernel.tcl
script which takes the existing RTL design and packages it as a Vivado IP. The script places it in an IP directory called ./packaged_kernel_${suffix}
where “suffix” is specified as a user argument.
- In the
SDAccel/examples/xilinx/getting_started/rtl_kernel/rtl_vadd directory
, run the following commands to package the RTL and create the XO file:
$ vivado -mode tcl
# Set suffix for the directory for RTL-IP import
Vivado% set suffix rtl_ip
# Import the RTL to the "packaged_kernel_${suffix}" IP directory
Vivado% source scripts/package_kernel.tcl
# Create the XO file
Vivado% package_xo -xo_path ./src/rtl_vadd.xo \
-kernel_name krnl_vadd_rtl \
-ip_directory ./packaged_kernel_rtl_ip \
-kernel_xml ./src/kernel.xml
# Exit Vivado
Vivado% exit
The ./src/rtl_vadd.xo
file gets generated. It contains all the necessary information SDAccel needs to use the kernel.
This section covers the following steps:
- Creating and configuring a new project
- Starting the SDAccel GUI
- Creating a workspace
- Setting the platform
- Creating a new empty project
- Importing the application host code and kernel XO file
- Specifying the binary container for the kernel executable
- Verifying the application using the hardware emulation flow
- Compiling the host application and the FPGA binary for hardware execution
The host application code for this example is in the ./src/host.cpp
file.
In the SDAccel flow, the host code uses OpenCL APIs to interact with the FPGA.
- Open the SDx GUI by running the following command:
$ sdx
- In the Workspace Launcher window, add a workspace inside the current directory named
Test_dir
as shown below. A new directory Test_dir
will be created and used to store all the logfiles from our runs.
- In the Welcome window, click Add custom platform to set the path to AWS F1 platform.
- Click on the plus sign as shown below.
- Then browse to the /SDAccel/aws_platform/xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0/ directory, select platform.
- Click Apply and OK. This completes the platform setup process.
- In the Welcome window, click Create SDX Project and set the project name to TEST_RTL_KERNEL.
- Move through the next three screens (keeping the default selections) by clicking Next -> Next -> Next.
- Finally select an Empty Application in the Available Templates section, and then click Finish.
On the left side of the SDAccel GUI you will see the Project Explorer pane.
- Right-click project.sdx and then select Import.
- Select General -> Filesystem and then click on Next.
- Browse to the source file directory of the current example, rtl_vadd/src
- Select the files host.cpp, xcl2.cpp, xcl2.h and rtl_vadd.xo as shown below.
Now that the files have been imported, we must instruct SDAccel to add a binary container, in other words an output file into which the FPGA design will be compiled.
In the center of the SDAccel GUI you will see the SDx Project Settings.
- Click the Add Binary Container icon as shown below.
The default name for the binary container is binary_container_1
. Since the host application uses the xcl::find_binary_file utility function, it will automatically find the container by searching for a file with the default name.
The project creation and setup is now complete.
SDAccel provides three different build configurations: Emulation-CPU, Emulation-HW and System.
In the Emulation-CPU mode, the host application executes with a C/C++ or OpenCL model of the kernel(s). The main goal of this mode is to ensure functional correctness of your application. Note: this mode is not presently supported for RTL kernels.
In the Emulation-HW mode, the host application executes with an RTL model of the kernel(s). This mode enables the programmer to check the correctness of the logic generated for the custom compute units and gives performance estimates.
In the System mode, the host application executes with the actual FPGA.
- To run hardware emulation, go to SDx Project Settings and make sure that Active build configuration is set to Emulation-HW.
- Click the Build icon to start the emulation build process.
- After the emulation build process completes, run Hardware Emulation by clicking the Run Icon.
After the Hardware Emulation run completes, you can find and inspect various reports in the Reports tab, such as the System Estimate, Profile Summary, and Application Timeline.
- To run hardware execution, go to SDx Project Settings and set Active build configuration to System.
- Click the Build icon to initiate the hardware build process.
The hardware build generally takes a few hours to complete.
At the end of this process, the host executable (TEST_RTL_KERNEL.exe
) and FPGA binary (binary_container_1.xclbin
) are generated in the Test_dir/TEST_RTL_KERNEL/System
directory.
- Exit the SDAccel GUI.
In order to execute the application on F1, an Amazon FPGA Image (AFI) must first be created from the FPGA binary (*.xclbin). This step cannot presently be performed through the SDAccel GUI. The AFI is created using the AWS create_sdaccel_afi.sh command line script.
- Using your S3 bucket, S3 dcp folder and S3 log folder information, execute the following command:
$ cd ./Test_dir/TEST_RTL_KERNEL/System
$ $SDACCEL_DIR/tools/create_sdaccel_afi.sh \
-xclbin=binary_container_1.xclbin \
-o=binary_container_1 \
-s3_bucket=<bucket-name> \
-s3_dcp_key=<dcp-folder-name> \
-s3_logs_key=<logs-folder-name>
The above step generates an *.awsxclbin file and an *_afi_id.txt file containing the ID of your AFI. The AFI ID can be used to check the status of the AFI generation process.
- Note your AFI ID
$ cat <timestamp>_afi_id.txt
- Check the status of the AFI generation process
$ aws ec2 describe-fpga-images --fpga-image-ids <AFI ID>
The command returns a state code of available once the AFI is created, registered and ready to be used. Until then, it returns pending.
"State": {
"Code": "available"
}
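If the status check is scripted, the state code can be pulled out of the command output with a simple string search. The sketch below is a naive illustration and not a JSON parser: `afi_state_code` and `afi_is_available` are hypothetical helpers, and they assume the first `"Code"` key in the text is the one nested under `"State"`. The comparison ignores case.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the value of the first "Code" key found in the text,
// or an empty string if no such key or quoted value is present.
std::string afi_state_code(const std::string& json) {
    const std::string key = "\"Code\"";
    const std::size_t key_pos = json.find(key);
    if (key_pos == std::string::npos) return "";
    const std::size_t colon = json.find(':', key_pos + key.size());
    if (colon == std::string::npos) return "";
    const std::size_t open = json.find('"', colon + 1);
    if (open == std::string::npos) return "";
    const std::size_t close = json.find('"', open + 1);
    if (close == std::string::npos) return "";
    return json.substr(open + 1, close - open - 1);
}

// True when the reported state code is "available" (case-insensitive).
bool afi_is_available(const std::string& json) {
    std::string code = afi_state_code(json);
    for (char& ch : code) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return code == "available";
}
```

A real script would more likely use a JSON library or a command-line query option. This version only shows which field to look at.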
Once the AFI is Available, you can execute the application on the F1 instance.
$ sudo sh
# source /opt/Xilinx/SDx/2017.1.rte/setup.sh
# ./TEST_RTL_KERNEL.exe
Device/Slot[0] (/dev/xdma0, 0:0:1d.0)
xclProbe found 1 FPGA slots with XDMA driver running
platform Name: Xilinx
Vendor Name : Xilinx
Found Platform
XCLBIN File Name: vadd
INFO: Importing ./binary_container_1.awsxclbin
Loading: './binary_container_1.awsxclbin'
TEST PASSED
Behind these deceptively simple log messages, a lot just happened. The application:
- Detected the FPGA platform
- Loaded the binary_container_1.awsxclbin container
- Retrieved the AFI ID from the container and requested that the corresponding AFI be downloaded to the FPGA
- Created buffers in the FPGA and transferred two vectors A and B
- Triggered the FPGA kernel to sum the two vectors A and B
- Read the results back and checked them for correctness
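The final check in the list above amounts to comparing the FPGA output against a reference computed on the host. The following is a minimal sketch of such a check, not the code in host.cpp: the function name `results_match` and the 32-bit unsigned element type are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when every element read back from the FPGA equals a[i] + b[i].
// A size mismatch between the three vectors is treated as a failure.
bool results_match(const std::vector<std::uint32_t>& a,
                   const std::vector<std::uint32_t>& b,
                   const std::vector<std::uint32_t>& c_from_fpga) {
    if (a.size() != b.size() || a.size() != c_from_fpga.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (c_from_fpga[i] != a[i] + b[i]) {
            return false;  // first mismatch is enough to fail the test
        }
    }
    return true;
}
```

A host application would print a message such as TEST PASSED when a check of this kind succeeds.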
This concludes this tutorial on how to run your first SDAccel program on F1 using RTL kernels.
Do not forget to stop or terminate your instance.
SDAccel Examples Wiki