Add performance scripts #238

Merged 2 commits on Sep 3, 2024
@maxzhen maxzhen commented Sep 1, 2024

This PR adds two shell scripts for performance analysis based on the built-in trace events in XRT. After the amdxdna plug-in package is installed, the scripts can be found under /opt/xilinx/xrt/amdxdna. They rely on the Linux 'perf' command, so it must be available in your PATH.
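
Since both scripts depend on perf being reachable through PATH, a quick pre-flight check can save a failed run. The following is a minimal sketch, not code from this PR; the function names have_cmd and check_perf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of the scripts
# added by this PR.

# Return 0 if the named command can be resolved through PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether 'perf' is available; return non-zero if it is missing.
check_perf() {
    if have_cmd perf; then
        echo "perf found: $(command -v perf)"
    else
        echo "perf not found in PATH" >&2
        return 1
    fi
}
```

Calling check_perf || exit 1 before starting a trace would stop early with a clear message when perf is not installed.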

  • npu_perf_trace.sh: runs an XRT application. It enables all necessary trace events, collects perf data, and converts it to a log file for further analysis.
  • npu_perf_analyze.sh: analyzes the output of npu_perf_trace.sh. It takes two user-specified events, parses the output log, and calculates the average time difference between them. The user can also specify a range of the log to better understand the performance trend.
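
To make the "average time difference between two events" idea concrete, here is a rough sketch of one way such a calculation could be done with awk. It is not the implementation in npu_perf_analyze.sh. It assumes a log layout similar to 'perf script' output, where each line carries a timestamp field of the form seconds.microseconds followed by a colon and then the event name, and it assumes the n-th start event pairs with the n-th end event. The function name avg_delta and its optional third argument (number of leading pairs to skip) are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch; the real npu_perf_analyze.sh may work differently.
# Usage: avg_delta START_EVENT END_EVENT [SKIP] < logfile
# Assumed line layout (perf-script-like):
#   comm pid [cpu] 1234.567890: group:event_name: ...
avg_delta() {
    awk -v s="$1" -v e="$2" -v skip="${3:-0}" '
    {
        # Locate the timestamp field, e.g. "1234.567890:".
        ts = -1
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+:$/) {
                ts = substr($i, 1, length($i) - 1) + 0
                break
            }
        if (ts < 0) next

        # The event name is assumed to be the field after the timestamp.
        ev = $(i + 1)
        if (ev == s)      starts[++ns] = ts
        else if (ev == e) ends[++ne] = ts
    }
    END {
        # Pair the k-th start with the k-th end, skipping the first "skip" pairs.
        n = (ns < ne) ? ns : ne
        for (k = skip + 1; k <= n; k++) {
            sum += (ends[k] - starts[k]) * 1000000   # seconds to microseconds
            cnt++
        }
        if (cnt == 0) { print "no event pairs"; exit 1 }
        printf "Average over %d events: %dus\n", cnt, sum / cnt + 0.5
    }'
}
```

With this sketch, something like avg_delta "sdt_xrt:xrt_run_start_enter:" "sdt_xrt:xrt_run_wait2_exit:" 100 < perf.converted.out would average all pairs after the first 100, provided the log actually matches the assumed layout.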

Example:

Let's first collect performance data from the xrt-smi validate -r latency test:

# /opt/xilinx/xrt/amdxdna/npu_perf_trace.sh /opt/xilinx/xrt/bin/xrt-smi validate -d -r latency
[INFO]: Found NPU device 0000:c5:00.1 at /sys/kernel/debug/accel
[INFO]: XRT SDT is removed
[INFO]: XRT SDT is added
[INFO]: perf record -e amdxdna_trace:* -e sdt_xrt:*  -a /opt/xilinx/xrt/bin/xrt-smi validate -d -r latency
Validate Device           : [0000:c5:00.1]
    Platform              : RyzenAI-npu4
    Power Mode            : Default
-------------------------------------------------------------------------------
Verbose: Enabling Verbosity
Test 1 [0000:c5:00.1]     : latency                                             
    Description           : Run end-to-end latency test
    Xclbin                : /opt/xilinx/xrt/amdxdna/bins/17f0_10/validate.xclbin
    Details               : Kernel name is 'DPU_PDI_0'
                            Instruction size: '20' bytes
                            No. of iterations: '10000'
                            Average latency: '46.4' us
    Test Status           : [PASSED]
-------------------------------------------------------------------------------
Validation completed
[ perf record: Woken up 65 times to write data ]
[ perf record: Captured and wrote 17.190 MB perf.data (170133 samples) ]
[INFO]: XRT SDT is removed

Now, let's look at the average time between xrt::run.start() and xrt::run.wait2(), skipping the first 100 events since they may be slower while the CPU frequency ramps up:

# /opt/xilinx/xrt/amdxdna/npu_perf_analyze.sh 100: "sdt_xrt:xrt_run_start_enter:" "sdt_xrt:xrt_run_wait2_exit:"
Parsing perf.converted.out...
10000 events for: 'sdt_xrt:xrt_run_start_enter:'
10000 events for: 'sdt_xrt:xrt_run_wait2_exit:'
Average over 9900 events: 44us
Largest: 121us@5901
Smallest: 28us@551
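
The "Largest" and "Smallest" lines above report a latency together with the position of the event pair that produced it. As an illustration only, the sketch below shows how such a summary could be derived from a list of per-pair latencies, one value in microseconds per line. The function name minmax_delta is hypothetical, and the 1-based position it prints is an assumption; the indexing used by npu_perf_analyze.sh is not stated in this PR.

```shell
#!/bin/sh
# Hypothetical sketch; not taken from npu_perf_analyze.sh.
# Reads one latency (in microseconds) per line on stdin and prints the
# largest and smallest value with its 1-based line position.
minmax_delta() {
    awk '
    {
        v = $1 + 0                      # force numeric comparison
        if (NR == 1 || v > max) { max = v; maxi = NR }
        if (NR == 1 || v < min) { min = v; mini = NR }
    }
    END {
        if (NR == 0) exit 1             # no input, nothing to report
        printf "Largest: %dus@%d\nSmallest: %dus@%d\n", max, maxi, min, mini
    }'
}
```

Knowing where the outliers sit in the log makes it easier to pick a range for a follow-up run of the analysis.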

Signed-off-by: Max Zhen <max.zhen@amd.com>
@maxzhen maxzhen merged commit b209ee9 into amd:main Sep 3, 2024
@maxzhen maxzhen deleted the trace branch September 3, 2024 19:57