7 FPGA Based on Xilinx ZCU104

Junning Wu edited this page Jul 6, 2018 · 10 revisions

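This page builds a smart surveillance camera demo on the Xilinx ZCU104 development board. The first step implements it with ARM+DLA; the final version will use RISCV+DLA. We tentatively call it RvdlaCAM.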

Useful Examples

  1. Tincy YOLO: a real-time, low-latency, low-power object detection system running on a Zynq UltraScale+ MPSoC

7.1 Booting Linux Kernel

7.1.1 Booting with the pre-built Linux images

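First, we test the board with the boot files released by Xilinx and get familiar with the overall flow. The equipment and software to prepare are as follows:

  1. Create the bootable SD card. Following the "How to format SD card for SD boot" instructions, partition the SD card into a boot partition plus a device partition. After partitioning, copy the downloaded and extracted BOOT.BIN and image.ub files into the boot partition, and make sure the copy has completed.

  2. Connect the board. Connect a laptop or desktop PC to the ZCU104 board with a Micro-USB cable, and plug in the power cable.

Board USB

  3. Configure the serial port and start a serial terminal program. PuTTY is used here, but other tools work as well.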

COM Setup
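
The file copy in step 1 can be scripted so that an incomplete write is caught before the card is removed. The following is a minimal sketch and not part of the Xilinx instructions: the function name `copy_boot_files` and both directory arguments are hypothetical, and it assumes the boot partition is already mounted on the host.

```shell
# Copy BOOT.BIN and image.ub to a mounted boot partition and verify them.
# $1: directory holding the extracted boot files (hypothetical path)
# $2: mount point of the SD card's boot partition (hypothetical path)
copy_boot_files() (
    src_dir=$1
    boot_mnt=$2
    for f in BOOT.BIN image.ub; do
        cp "$src_dir/$f" "$boot_mnt/$f" || exit 1
    done
    sync    # flush cached writes before the card is removed
    for f in BOOT.BIN image.ub; do
        # Byte-for-byte comparison of source and copy
        cmp -s "$src_dir/$f" "$boot_mnt/$f" || { echo "mismatch: $f" >&2; exit 1; }
    done
    echo "boot files copied and verified"
)

# Example call (both paths are placeholders):
#   copy_boot_files ./zcu104-prebuilt /media/sdcard/boot
```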

After powering on, the terminal should show boot messages like the following (abridged):

84510356 bytes read in 5614 ms (14.4 MiB/s)
## Loading kernel from FIT Image at 10000000 ...
   Using 'conf@system-top.dtb' configuration
   Trying 'kernel@1' kernel subimage
     Description:  Linux kernel
     Type:         Kernel Image
     Compression:  gzip compressed
     Data Start:   0x10000108
     Data Size:    6960378 Bytes = 6.6 MiB
     Architecture: AArch64
     OS:           Linux
     Load Address: 0x00080000
     Entry Point:  0x00080000
     Hash algo:    sha1
     Hash value:   d1fff2b0bd2a7e08ca118b5d8471e5a4cce9d488
   Verifying Hash Integrity ... sha1+ OK
## Loading ramdisk from FIT Image at 10000000 ...
   Using 'conf@system-top.dtb' configuration
   Trying 'ramdisk@1' ramdisk subimage
     Description:  petalinux-user-image
     Type:         RAMDisk Image
     Compression:  gzip compressed
     Data Start:   0x106ac8c4
     Data Size:    77510718 Bytes = 73.9 MiB
     Architecture: AArch64
     OS:           Linux
     Load Address: unavailable
     Entry Point:  unavailable
     Hash algo:    sha1
     Hash value:   e61bf4564b78dc6aa6d6edef8fb5bf693542dee7
   Verifying Hash Integrity ... sha1+ OK
## Loading fdt from FIT Image at 10000000 ...
   Using 'conf@system-top.dtb' configuration
   Trying 'fdt@system-top.dtb' fdt subimage
     Description:  Flattened Device Tree blob
     Type:         Flat Device Tree
     Compression:  uncompressed
     Data Start:   0x106a3708
     Data Size:    37110 Bytes = 36.2 KiB
     Architecture: AArch64
     Hash algo:    sha1
     Hash value:   987e376db6912a65b0115cf38fecafdf02bb1643
   Verifying Hash Integrity ... sha1+ OK
   Booting using the fdt blob at 0x106a3708
   Uncompressing Kernel Image ... OK
   Loading Ramdisk to 03614000, end 07fff83e ... OK
   Loading Device Tree to 0000000003607000, end 00000000036130f5 ... OK

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.14.0-xilinx-v2018.2 (oe-user@oe-host) (gcc version 7.2.0 (GCC)) #1 SMP Fri Jun 15 04:52:59 MDT 2018
[    0.000000] Boot CPU: AArch64 Processor [410fd034]
[    0.000000] Machine model: ZynqMP ZCU104 RevC
[    0.000000] earlycon: cdns0 at MMIO 0x00000000ff000000 (options '115200n8')

......


Fingerprint: md5 61:68:b4:a0:34:f0:2f:59:b9:b5:ef:95:1d:6e:ea:d0
dropbear.
Starting syslogd/klogd: done
Starting tcf-agent: OK

PetaLinux 2018.2 xilinx-zcu104-2018_2 /dev/ttyPS0

xilinx-zcu104-2018_2 login:
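
  1. Log in with the default user name root and password root, then run uname -a to view the system information.

Logged In

  1. As usual, Hello World. Use the cross compiler to build a simple hello world program that prints "Hello World for HAI-1.0". After compiling, copy the executable a.out to the SD card, in either the boot partition or the root partition. In the terminal, mount the SD card at /mnt and run the executable directly to see the output.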

Hello World
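
The Hello World step can be sketched as a small build helper. This is an illustration under assumptions: the page does not name the cross compiler, so `aarch64-linux-gnu-gcc` is an assumed toolchain name, and the function name `build_hello` is hypothetical. The source file it writes prints the string quoted above.

```shell
# Write a minimal hello-world source file and compile it to a.out.
# $1: compiler to use (assumption: aarch64-linux-gnu-gcc for the A53 cores)
# $2: output directory (defaults to the current directory)
build_hello() (
    cc_bin=${1:-aarch64-linux-gnu-gcc}
    out_dir=${2:-.}
    cat > "$out_dir/hello.c" <<'EOF'
#include <stdio.h>

int main(void)
{
    printf("Hello World for HAI-1.0\n");
    return 0;
}
EOF
    "$cc_bin" -o "$out_dir/a.out" "$out_dir/hello.c"
)

# Example call on the host:
#   build_hello aarch64-linux-gnu-gcc .
```

The resulting a.out is then copied to the SD card and run from /mnt on the board, as described above.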

7.1.2 Running Dense Optical Flow Example

  1. Board Setup

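Required materials: HDMI cable, ZCU104 board, USB cable, See3CAM_CU30 camera, Dell U2414H monitor.

The connected setup is shown in the figure below.

Board Setup

  1. Prepare the SD card

Copy the files for the optical flow example in zcu104-rv-ss-2018-2 to the SD card, and set the board's boot mode.

  1. Boot and complete the preparation

After the system has booted, copy the libraries required by the application into the corresponding directories, as shown below: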

# cp lib/libopticalflow.so /usr/lib
# cp gstreamer-1.0/libgstsdxopticalflow.so /usr/lib/gstreamer-1.0
# cp lib/libgstsdxbase.so /usr/lib/gstreamer-1.0
# cp lib/libgstsdxallocator.so /usr/lib/gstreamer-1.0
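
The four copies above can be wrapped in one helper that stops at the first failure. This is a sketch only: the function name `install_of_libs` and the optional root prefix are hypothetical additions. The prefix allows a dry run into a scratch directory, and an empty prefix targets the live /usr/lib as in the commands above.

```shell
# Install the optical-flow libraries listed above.
# $1: directory holding lib/ and gstreamer-1.0/ (e.g. the SD card mount point)
# $2: optional root prefix (hypothetical); empty installs into the live system
install_of_libs() (
    src=$1
    root=${2:-}
    mkdir -p "$root/usr/lib/gstreamer-1.0" || exit 1
    cp "$src/lib/libopticalflow.so" "$root/usr/lib/" || exit 1
    cp "$src/gstreamer-1.0/libgstsdxopticalflow.so" \
       "$root/usr/lib/gstreamer-1.0/" || exit 1
    cp "$src/lib/libgstsdxbase.so" "$src/lib/libgstsdxallocator.so" \
       "$root/usr/lib/gstreamer-1.0/" || exit 1
)

# Example call on the board, from the SD card mount point:
#   install_of_libs .
```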

Run video_cmd -S to list the available video sources. ID=1 is the USB camera we connected.

root@xilinx:/mnt# video_cmd -S
VIDEO SOURCE ID VIDEO DEVNODE
HDMI Input 0 /dev/video2
USB Webcam 1 /dev/video4
Virtual Video De 2 /dev/video0
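
Device nodes can change between boots, so it may help to look up the camera's node by name instead of hard-coding it. The sketch below assumes the listing has the layout shown above (one header line, then a source name, an ID, and a device node separated by whitespace); the function name `video_devnode` is hypothetical.

```shell
# Read a `video_cmd -S` listing on stdin and print the device node
# of the source whose name matches $1 exactly.
video_devnode() {
    awk -v want="$1" '
        NR > 1 && NF >= 3 {
            # All fields except the last two (ID, devnode) form the name
            name = $1
            for (i = 2; i <= NF - 2; i++) name = name " " $i
            if (name == want) { print $NF; exit }
        }'
}

# Example call on the board:
#   dev=$(video_cmd -S | video_devnode "USB Webcam")
```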

Set the resolution and port for the source and the display sink with the following commands:

video_cmd -s 1 -i 1920x1080@UYVY -X    # select the USB source, input resolution 1920x1080
video_cmd -d 1 -o 1920x1080@UYVY &     # select the HDMI output, resolution 1920x1080

  1. Launch

Once the setup is complete, run the application with the gst-launch-1.0 command.

gst-launch-1.0 v4l2src device=/dev/video4 io-mode=dmabuf ! "video/x-raw, width=1920, height=1080, format=UYVY" ! sdxopticalflow filter-mode=1 ! queue ! kmssink bus-id=b00c0000.v_mix plane-id=31 sync=false

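The kmssink plugin needs some configuration here: the bus-id given above indicates HDMI, and plane-id indicates the UYVY format. After that, you should see output like the following: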

result for of 01 result for of 02
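
To reuse the pipeline with a different device node or frame size, the command can be generated from parameters. This is a sketch: the function name `of_pipeline` is hypothetical, and every element and property is copied unchanged from the command above. Only the device, width, and height vary.

```shell
# Print the gst-launch-1.0 command line for the optical flow pipeline.
# $1: capture device node, $2: frame width, $3: frame height
of_pipeline() (
    dev=$1
    width=$2
    height=$3
    printf '%s\n' "gst-launch-1.0 v4l2src device=$dev io-mode=dmabuf ! \"video/x-raw, width=$width, height=$height, format=UYVY\" ! sdxopticalflow filter-mode=1 ! queue ! kmssink bus-id=b00c0000.v_mix plane-id=31 sync=false"
)

# Example: print the command, then run it with eval once it looks right.
#   of_pipeline /dev/video4 1920 1080
#   eval "$(of_pipeline /dev/video4 1920 1080)"
```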

7.2 A53+NVDLA (Based on ZCU104)

7.2.1 NVDLA nv_small memory list

RAM<Arch>_<Depth>X<Width>[_Options]<_Mux-Option>_<Rev>
Arch        required, physical implementation of the cell:
              -PDP  pseudo-dual port SRAM.  Created by double clocking
                    a single port RAM.
              -DP   true dual port SRAM.  Always has independent read
                    and write ports.
Depth       required, number of words in the RAM
Width       required, number of bits in the RAM
Options     GL for all RAMs
Mux-Option  Required, fixed width field describing column mux options
              - Mn  Column mux specification.
Rev         Revision: E2 for DP RAMS, D2 for PDP RAMs
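
As an illustration of the naming convention, the sketch below splits a macro name into its fields. The function name `decode_ram_name` is hypothetical, and the pattern assumes the option field is always GL, as stated above. Names that do not follow the convention produce no output.

```shell
# Split a RAM macro name such as RAMDP_256x7_GL_M2_E2 into its fields.
decode_ram_name() {
    printf '%s\n' "$1" | sed -n -E \
        's/^RAM(PDP|DP)_([0-9]+)[xX]([0-9]+)_(GL)_(M[0-9]+)_([A-Z][0-9]+)$/arch=\1 depth=\2 width=\3 options=\4 mux=\5 rev=\6/p'
}

# Example call:
#   decode_ram_name RAMDP_256x7_GL_M2_E2
```
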

  • RAMDP: Dual-Port SRAM

The macro is designed to perform read and write operations independently. RAMDP is a true dual port high density SRAM, which allows read and write to operate at the same time. All write operations are synchronized to the rising edge of write memory clock, CLK_W. The SRAM core is written when WE = ‘1’. Read operation is synchronized to the rising edge of the read memory clock, CLK_R. The SRAM core is read when RE = ‘1’. A latch holds the read data whenever RE = ‘0’. There is no write through capability. If the read address matches the write address, read out data may be corrupted.

  • RAMPDP: Pseudo-Dual Port SRAM

The RAMPDP macro behaves like a dual port RAM, but is created by double clocking a single port RAM. It can perform a ‘single read’ (1R), a ‘single write’ (1W), or a ‘read followed by write’ (1R+1W) operation in any given clock cycle. A read operation is performed when the signal RE is active high (RE = ‘1’). The output data is driven to the output port RD in the same cycle in which the read command is issued. A latch holds the read data when RE = ‘0’. A write operation is performed when WE is high (WE = ‘1’). The input data must be placed on the input data bus WD in the same cycle as the write command. Note that if the read and write addresses match during a 1R+1W operation, i.e. RE = WE = ‘1’, the read data will contain the previous contents of the RAM (the read occurs before the write).
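
The read-before-write ordering can be illustrated with a toy model. This is only a behavioral sketch in shell, not the macro's Verilog: the function name `ram_cycle`, the `ram_<addr>` variables, and the placeholder value x for unwritten words are all hypothetical, and addresses are assumed to be plain decimal numbers.

```shell
# Toy model of one RAMPDP clock cycle: the read happens before the write.
# Arguments: RE, read address, WE, write address, write data.
# The read output is kept in $rd and printed after every cycle.
ram_cycle() {
    re=$1 raddr=$2 we=$3 waddr=$4 wdata=$5
    : "${rd:=x}"                       # output latch starts undefined
    if [ "$re" = 1 ]; then
        eval "rd=\${ram_$raddr:-x}"    # read first: sees the old contents
    fi                                 # RE=0: the latch holds the last read data
    if [ "$we" = 1 ]; then
        eval "ram_$waddr=\$wdata"      # then write
    fi
    printf '%s\n' "$rd"
}

# Example sequence:
#   ram_cycle 0 0 1 3 aa   # write aa to address 3; prints x
#   ram_cycle 1 3 1 3 bb   # 1R+1W on address 3; prints aa (old contents)
#   ram_cycle 1 3 0 0 0    # read only; prints bb
```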

| mem | module | RAM_INST | File |
|---|---|---|---|
| nv_ram_rws_256x7.v | nv_ram_rws_256x7_logic.v | RAMDP_256x7_GL_M2_E2 | NV_NVDLA_NOCIF_DRAM_READ_cq.v |
| nv_ram_rws_256x3.v | nv_ram_rws_256x3_logic.v | RAMDP_256x4_GL_M2_E2 | NV_NVDLA_NOCIF_DRAM_WRITE_cq.v |
| nv_ram_rwst_256x8.v | nv_ram_rwst_256x8_logic.v | RAMDP_256x8_GL_M2_E2 | NV_NVDLA_NOCIF_DRAM_READ_cq.v |
| nv_ram_rwst_256x8.v | nv_ram_rwst_256x8_logic.v | RAMDP_256x8_GL_M2_E2 | NV_NVDLA_NOCIF_DRAM_WRITE_cq.v |
| nv_ram_rwsp_61x65.v | nv_ram_rwsp_61x65_logic.v | RAMPDP_64x66_GL_M1_D2 | NV_NVDLA_CDP_RDMA_eg.v |
| nv_ram_rwsthp_80x9.v | nv_ram_rwsthp_80x9_logic.v | RAMDP_80x9_GL_M2_E2 | NV_NVDLA_CDP_DP_syncfifo.v |
| nv_ram_rwsthp_60x21.v | nv_ram_rwsthp_60x21_logic.v | RAMDP_60x22_GL_M1_E2 | NV_NVDLA_CDP_DP_syncfifo.v |
| nv_ram_rwsthp_80x15.v | nv_ram_rwsthp_80x15_logic.v | RAMDP_80x15_GL_M2_E2 | NV_NVDLA_CDP_DP_syncfifo.v |
| nv_ram_rwsthp_19x4.v | ------ | ------ | NV_NVDLA_CDP_DP_intp.v |
| nv_ram_rws_128x18.v(8) | nv_ram_rws_128x18_logic.v | RAMPDP_128x18_GL_M2_D2 | NV_NVDLA_PDP_CORE_cal2d.v |
| nv_ram_rwsp_8x65.v | nv_ram_rwsp_8x65_logic.v | RAMDP_8x66_GL_M1_E2 | NV_NVDLA_CDMA_wt.v |
| nv_ram_rwsp_128x6.v | nv_ram_rwsp_128x6_logic.v | RAMDP_128x6_GL_M2_E2 | NV_NVDLA_CDMA_WT_fifo.v |
| nv_ram_rwsp_128x6.v | nv_ram_rwsp_128x6_logic.v | RAMDP_128x6_GL_M2_E2 | NV_NVDLA_CDMA_DC_fifo.v |
| nv_ram_rwsp_128x11.v | nv_ram_rwsp_128x11_logic.v | RAMDP_128x11_GL_M2_E2 | NV_NVDLA_CDMA_IMG_fifo.v |
| nv_ram_rws_16x64.v(16) | nv_ram_rws_16x64_logic.v | RAMDP_16x64_GL_M1_E2 | NV_NVDLA_CDMA_shared_buffer.v |
| nv_ram_rws_256x64.v(64) | nv_ram_rws_256x64_logic.v | RAMPDP_256x64_GL_M2_E2 | NV_NVDLA_cbuf.v |
| nv_ram_rws_16x272.v | nv_ram_rws_16x272_logic.v | RAMDP_16x272_GL_M1_E2 | NV_NVDLA_CACC_assembly_buffer.v |
| nv_ram_rws_16x256.v | nv_ram_rws_16x256_logic.v | RAMDP_16x256_GL_M1_E2 | NV_NVDLA_CACC_delivery_buffer.v |
| nv_ram_rwsp_80x14.v | nv_ram_rwsp_80x14_logic.v | RAMDP_80x14_GL_M2_E2 | NV_NVDLA_SDP_MRDMA_cq.v |
| nv_ram_rwsp_80x65.v | nv_ram_rwsp_80x65_logic.v | RAMPDP_80x66_GL_M1_E2 | NV_NVDLA_SDP_MRDMA_EG_din.v |
| nv_ram_rwsp_160x16.v | nv_ram_rwsp_160x16_logic.v | RAMPDP_160x16_GL_M2_D2 | NV_NVDLA_SDP_BRDMA_cq.v |
| nv_ram_rwsp_160x65.v | nv_ram_rwsp_160x65_logic.v | RAMPDP_160x65_GL_M2_D2 | NV_NVDLA_SDP_BRDMA_lat_fifo.v |
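
When mapping these macros onto FPGA memory resources, a first estimate of storage is depth times width, taken from the RAM_INST name. The helper below is a sketch: the function name `ram_bits` is hypothetical, and reading the parenthesized numbers in the mem column as instance counts is an assumption that the table does not confirm.

```shell
# Compute the storage of a RAM macro in bits from its name.
# $1: macro name, e.g. RAMDP_256x7_GL_M2_E2
# $2: optional instance count (assumed meaning of "(8)", "(16)", "(64)")
ram_bits() (
    name=$1
    count=${2:-1}
    dims=${name#*_}         # strip the "RAMDP_" or "RAMPDP_" prefix
    dims=${dims%%_*}        # keep "<Depth>x<Width>"
    depth=${dims%%[xX]*}
    width=${dims##*[xX]}
    echo $(( depth * width * count ))
)

# Example calls:
#   ram_bits RAMDP_256x7_GL_M2_E2        # 256 * 7 bits
#   ram_bits RAMPDP_256x64_GL_M2_E2 64   # 64 instances of 256 * 64 bits
```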