Version: Vitis 2024.1
Versal™ adaptive SoCs combine programmable logic (PL), processing system (PS), and AI Engines with leading-edge memory and interfacing technologies to deliver powerful heterogeneous acceleration for any application. The hardware and software are targeted for programming and optimization by data scientists and software and hardware developers. A host of tools, software, libraries, IP, middleware, and frameworks enable Versal adaptive SoCs to support all industry-standard design flows.
This tutorial demonstrates the creation of a beamforming system running on the AI Engine, PL, and PS, and the validation of the design running on this heterogeneous domain.
The tutorial has been divided into modules. It explains creating a custom embedded platform, a bare-metal host application, and a custom PetaLinux-based Linux host application, as well as hardware emulation and hardware build flows in the context of a complete Versal adaptive SoC system integration. Each module uses a Makefile to build the relevant aspect of the design.
This beamforming tutorial is a system-level design that uses the AI Engine, PL, and PS resources. This design showcases the following features:
- High utilization of PL and AI Engine resources, which requires advanced timing closure techniques
- Custom platform creation
- An AI Engine graph that implements matrix multiplication functions for uplink and downlink beamforming
- RTL kernels that interface with the AI Engine and operate at 400 MHz
- A scalable architecture that only needs a small number of kernels to be developed and can be copied multiple times to extend compute power
- Bare-metal and PetaLinux PS host application development processes
- Timing closure methods for a high-utilization design
- Hardware emulation and VCK190 board flows
- A hierarchical Makefile structure to highlight dependencies between build steps and showcase a way for multiple developers to work on the same repository at the same time (AI Engine developers, RTL designers, and software developers)
To fully grasp the design, it is assumed that you have the following knowledge and resources:
- Ability to read Tcl scripts
- Ability to read C++ based source code to understand the AI Engine kernels and host application source code (bare metal and Linux)
- Ability to read Verilog RTL to understand the AMD Vivado™ projects created for the RTL PL kernels
- A base bootable design (for example, you have brought up your board and verified that the hardware works with a simple Vivado design)
This tutorial targets the VCK190 ES board. This board is currently available through early access. If you have already purchased this board, download the necessary files from the lounge and ensure that you have the correct licenses installed. If you do not have a board and ES license, get in touch with your AMD sales contact.
- Obtain a license to enable beta devices in AMD tools (to use the VCK190 platform).
- Obtain licenses for AI Engine tools.
- Follow the instructions in Installing Xilinx Runtime and Platforms (XRT).
- Download and set up the VCK190 Vitis Platform for 2024.1.
- Follow the instructions to install PetaLinux tools in the PetaLinux Tools Documentation (UG1144).
- Download the VCK190 PetaLinux 2024.1 BSP from the Versal AI Core Series VCK190 HeadStart Early Access Site.
To build and run the Beamforming tutorial, download and install the following tools:
After the elements of the Vitis software platform are installed, update the shell environment script, setting the necessary environment variables to your system-specific paths.
- Edit the sample_env_setup.sh script with your file paths:
```bash
export PATH_TO_BSP=<path-to-bsps>   # folder that contains xilinx-vck190-v2024.1-final.bsp
source <XILINX-INSTALL-LOCATION>/Vitis/2024.1/settings64.sh
source <path-to-installed-PetaLinux>/settings.sh
```
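- Source the environment script in a bash shell. If your current shell is not bash, start one first; setting SHELL=/bin/bash by itself only changes the environment variable, not the shell you are running:
```bash
bash                     # start a bash shell
export SHELL=/bin/bash   # keep the SHELL variable consistent with the running shell
echo $SHELL
```
Then source the environment script:
```bash
source sample_env_setup.sh
```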
Make sure you are using the 2024.1 version of the Xilinx tools:
```bash
which vitis
which aiecompiler
```
If you are a novice user, review the following tutorials to understand the basic AMD Vitis™ compiler concepts and how to build simple AI Engine designs:
This tutorial showcases a beamforming system with 32 layers and 64 antennas implemented on an XCVC1902 Versal ACAP device on the VCK190 board. The beamforming system consists of a downlink subsystem, which contains the DL64A32L AI Engine subgraph and the dlbf_data, dlbf_coeff, and dlbf_slave PL RTL kernels, and an uplink subsystem, which contains the UL64A32L AI Engine subgraph and the ulbf_data, ulbf_coeff, and ulbf_slave PL RTL kernels. Together, the two subsystems apply the downlink and uplink matrix multiplication equations for M = 32 layers and N = 64 antennas to sample data, and the results are compared against reference downlink and uplink result data for verification. The entire beamforming system is replicated three times to make full use of the available AI Engine and PL resources.
This module shows when to create a custom platform instead of using a base platform, and how to create one, using the beamforming platform as an example.
This module teaches AI Engine developers how to:
- Map beamforming functions to AI Engine kernels.
- Design AI Engine graphs, using the beamforming source code as an example.
- Use the AI Engine compilers, and understand why this design uses specific compiler options.
- Use the AI Engine simulator to test against reference output data.
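Mapping a large matrix multiplication onto AI Engine kernels generally involves splitting it into tiles and passing partial sums from one kernel to the next. The sketch below shows that idea as plain host-side C++. The 8 × 8 tile size, the function names, and the chaining order are hypothetical and are not taken from the DL64A32L graph.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Sample = std::complex<float>;
using Matrix = std::vector<std::vector<Sample>>;  // indexed as [layer][antenna]

// Hypothetical tile size: each "kernel" handles kTileM layers x kTileN antennas.
constexpr std::size_t kTileM = 8;
constexpr std::size_t kTileN = 8;

// One tile's work: add x[m0..m0+kTileM) * h[m0..][n0..n0+kTileN) onto the
// partial sums received from the previous tile in the chain.
void tile_kernel(const std::vector<Sample>& x, const Matrix& h,
                 std::size_t m0, std::size_t n0, std::vector<Sample>& partial) {
    for (std::size_t n = 0; n < kTileN; ++n)
        for (std::size_t m = 0; m < kTileM; ++m)
            partial[n] += x[m0 + m] * h[m0 + m][n0 + n];
}

// For each block of antennas, pass partial sums along the layer dimension
// (analogous to a cascade chain), then write the finished block to y.
std::vector<Sample> tiled_matmul(const std::vector<Sample>& x, const Matrix& h) {
    const std::size_t M = x.size();
    const std::size_t N = h[0].size();
    assert(M % kTileM == 0 && N % kTileN == 0);
    std::vector<Sample> y(N);
    for (std::size_t n0 = 0; n0 < N; n0 += kTileN) {
        std::vector<Sample> partial(kTileN, Sample(0.0f, 0.0f));
        for (std::size_t m0 = 0; m0 < M; m0 += kTileM)
            tile_kernel(x, h, m0, n0, partial);
        for (std::size_t n = 0; n < kTileN; ++n) y[n0 + n] = partial[n];
    }
    return y;
}
```

With M = 32 and N = 64, this tiling performs (32 / 8) × (64 / 8) = 32 tile computations; a different tile size trades the number of kernels against the work done per kernel.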
This module shows RTL designers how to:
- Map data storage and data capture functions to custom RTL PL kernels that connect to the AI Engine and the custom platform.
- Design PL kernels, using the beamforming PL RTL source as an example.
- Package RTL PL kernels into XO files.
This module shows developers how to:
- Combine an AI Engine graph (libadf.a) and *.xo PL kernels into an XCLBIN.
- Guide the Vivado tool to close timing on a high-utilization design.
This module shows software developers how to create a bare-metal application for beamforming.
This module shows developers how to:
- Package their design using the Vitis compiler for hardware or hardware emulation.
- Run hardware emulation.
- Run their bare-metal design on hardware (VCK190 board).
This module shows developers how to:
- Build a custom PetaLinux software platform.
- Package the linked XSA and the custom PetaLinux software platform into a new Versal custom platform (.xpfm).
This module shows developers how to create a Linux PS host application for functional and performance tests.
This module shows developers how to:
- Package their design using the Vitis compiler for a hardware run with a Linux PS host application.
- Run their design on hardware (VCK190 board).
This tutorial shows an efficient implementation of beamforming functionality on the AI Engine array in AMD Versal AI Engine devices. The design methodology applies to many use cases that need high-throughput matrix multiplication, such as 5G wireless communication. The following figure illustrates how matrix multiplication is used in the beamforming of an orthogonal frequency-division multiplexing (OFDM) system with four layers and six antennas.
A single symbol of an OFDM system contains a frequency component and a time component allocated to a single user (X0,0). Multiple symbols in different layers of an OFDM system (X0,0, X0,1, X0,2, X0,3) are multiplied by a specific set of complex weights (H0,0, H1,0, H2,0, H3,0) so that the data in different layers becomes “orthogonal” to each other. This orthogonality allows the layers to be summed into a single signal (Y0,0), which is sent to an antenna. A second antenna signal (Y0,1) is created by multiplying each layer (X0,0, X0,1, X0,2, X0,3) by another set of weights (H0,1, H1,1, H2,1, H3,1). The same is done to create the remaining antenna signals (Y0,2, Y0,3, Y0,4, and Y0,5).
The example and generalized downlink matrix multiplication formulas are given below.
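A reconstruction of these formulas from the description above, using the same indices: X_{0,i} is the symbol on layer i, H_{i,j} is the weight applied from layer i to antenna j, and Y_{0,j} is the signal for antenna j. The matrix layout is an assumption consistent with that description, not a copy of the original figure.

```latex
% Example: M = 4 layers, N = 6 antennas
Y_{0,j} = \sum_{i=0}^{3} X_{0,i}\, H_{i,j}, \qquad j = 0, 1, \dots, 5

\begin{bmatrix} Y_{0,0} & Y_{0,1} & \cdots & Y_{0,5} \end{bmatrix}
=
\begin{bmatrix} X_{0,0} & X_{0,1} & X_{0,2} & X_{0,3} \end{bmatrix}
\begin{bmatrix}
H_{0,0} & H_{0,1} & \cdots & H_{0,5} \\
H_{1,0} & H_{1,1} & \cdots & H_{1,5} \\
H_{2,0} & H_{2,1} & \cdots & H_{2,5} \\
H_{3,0} & H_{3,1} & \cdots & H_{3,5}
\end{bmatrix}

% Generalized: M layers, N antennas (M = 32, N = 64 in this tutorial)
Y_{0,n} = \sum_{m=0}^{M-1} X_{0,m}\, H_{m,n}, \qquad n = 0, 1, \dots, N-1
```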
At the receiving end, the antenna data (Y0,0-Y0,5) can be demultiplexed back into the original layers (X0,0-X0,3) because of this orthogonality.
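The orthogonality argument can be checked numerically. This C++ sketch uses the four-layer, six-antenna example and, purely as an illustrative assumption, takes the weights H from rows of a 6-point DFT matrix, whose rows are mutually orthogonal. Projecting the antenna samples back onto each layer's weights then recovers the original layer data. Real uplink receivers compute their own weights, and the function names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cdouble = std::complex<double>;
using CMatrix = std::vector<std::vector<cdouble>>;  // indexed as [layer][antenna]

// Illustrative weights (an assumption, not the tutorial's coefficients):
// row m of an N-point DFT matrix. For distinct m < N the rows are orthogonal,
// so sum_n H[m][n] * conj(H[k][n]) equals N when m == k and 0 otherwise.
CMatrix dft_weights(std::size_t layers, std::size_t antennas) {
    assert(antennas > 0 && layers <= antennas);
    const double pi = std::acos(-1.0);
    CMatrix h(layers, std::vector<cdouble>(antennas));
    for (std::size_t m = 0; m < layers; ++m)
        for (std::size_t n = 0; n < antennas; ++n)
            h[m][n] = std::polar(1.0, -2.0 * pi * double(m * n) / double(antennas));
    return h;
}

// Transmit side: y[n] = sum_m x[m] * h[m][n] (same form as the downlink formula).
std::vector<cdouble> combine(const std::vector<cdouble>& x, const CMatrix& h) {
    std::vector<cdouble> y(h[0].size());
    for (std::size_t m = 0; m < x.size(); ++m)
        for (std::size_t n = 0; n < y.size(); ++n)
            y[n] += x[m] * h[m][n];
    return y;
}

// Receive side: project the antenna samples onto each layer's weight row.
// Orthogonality cancels the cross-layer terms, leaving x_hat[m] = x[m].
std::vector<cdouble> separate(const std::vector<cdouble>& y, const CMatrix& h) {
    std::vector<cdouble> x_hat(h.size());
    for (std::size_t m = 0; m < h.size(); ++m) {
        for (std::size_t n = 0; n < y.size(); ++n)
            x_hat[m] += y[n] * std::conj(h[m][n]);
        x_hat[m] /= static_cast<double>(y.size());
    }
    return x_hat;
}
```

Calling separate(combine(x, h), h) with h = dft_weights(4, 6) returns the four layer samples up to floating-point rounding.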
GitHub issues are used for tracking requests and bugs. For questions, go to forums.xilinx.com.
Copyright © 2020–2024 Advanced Micro Devices, Inc