Releases: ROCm/Tensile

Tensile-4.26.0 for ROCm 4.1.0

23 Mar 01:18
47dd2c4

Added

  • Make messagepack python dependency optional
  • TensileCreateLibraryFiles: auto create target for build time lib generation
  • Tensile cluster tuning tool
  • Framework for filtering solutions
  • Workflow for manually editing Kernels
  • Tuning client design doc
  • MatrixInstruction for general int8
  • Tensile integration test for TensileCreateLibrary
  • Trig float and random narrow init patterns for new client
  • Summation dimension mirroring (contributed by timlathy & Slimakanzer)
  • ROCm 4.1 TargetID support in Tensile; source kernels force xnack=OFF
  • Tensile/Utilities/merge.py revamp for merging logic yaml files
    • merge.py now requires python3
    • add -v verbosity levels (up to 2)
    • add --notrim to retain leading dimensions in sizes
  • New BoundsCheck design: Access guard page will trigger memory fault
  • Solution fitness metric
  • Auto-tuning documentation and build script improvements
  • Support for High Precision Accumulate FP16/BF16 In FP32 Out
  • CHANGELOG.md
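
The "High Precision Accumulate FP16/BF16 In FP32 Out" entry above refers to keeping the running sum in a wider type than the inputs. The following is a minimal numerical sketch of why that matters, not Tensile's kernel code: the helper names (`to_fp16`, `to_fp32`, `dot`) are hypothetical, and rounding is emulated with Python's `struct` module.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot(a, b, round_acc):
    """Dot product of fp16 inputs; round_acc sets the accumulator precision."""
    acc = 0.0
    for x, y in zip(a, b):
        # Inputs are rounded to fp16; the running sum is rounded by round_acc.
        acc = round_acc(acc + to_fp16(x) * to_fp16(y))
    return acc

n = 4096
a = [0.1] * n
b = [1.0] * n

low = dot(a, b, to_fp16)   # fp16 inputs, fp16 accumulator
high = dot(a, b, to_fp32)  # fp16 inputs, fp32 accumulator (HPA-style)
print(low, high)
```

With these inputs the fp32 accumulator ends close to 409.5, while the fp16 accumulator stalls once the running sum is large enough that one more addend is less than half the spacing between adjacent fp16 values.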

Optimizations

  • Refine PersistentKernel: support PKn1, EPS, optimize LW-vmcnt and sMagicDiv2

Fixed

  • targets to clang-offload-bundler updated to use hipv4 prefix when appropriate
  • Fix bugs in the tail-loop branch label and LR addr restore
  • locateExe in Tensile/Common.py looks in defaultPath first
  • Honor $ENV{ROCM_PATH} to support relocatable ROCm location
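
The last two fixes describe lookup order: a default path is tried before anything else, and the ROCm root can be relocated through `ROCM_PATH`. A rough sketch of that pattern follows. It is not the code in Tensile/Common.py; the function names and the `/opt/rocm` fallback are assumptions made for illustration.

```python
import os
import shutil

def rocm_root(default="/opt/rocm"):
    """Return the ROCm root, preferring the ROCM_PATH environment variable.

    The "/opt/rocm" fallback is an assumption for this sketch.
    """
    return os.environ.get("ROCM_PATH", default)

def locate_exe(default_dir, name):
    """Look for an executable in default_dir first, then fall back to PATH.

    Hypothetical helper; returns None when nothing is found.
    """
    candidate = os.path.join(default_dir, name)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return shutil.which(name)

# Example: resolve a tool relative to the (possibly relocated) ROCm root.
bundler = locate_exe(os.path.join(rocm_root(), "llvm", "bin"),
                     "clang-offload-bundler")
print(bundler)
```

The `llvm/bin` subdirectory in the usage line is likewise only an illustrative guess at where the tool might live.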

Tensile 4.24.0 for rocm 3.10.0

18 Dec 15:28
ab44bf4

New Features

  • No new features

Known Issues

  • None

Tensile 4.24.0 for rocm 3.10.0

30 Nov 17:05
ab44bf4

Known Issues

  • None

Tensile 4.23.0 for rocm 3.9.0

27 Oct 20:19
b68edc6
Merge pull request #1160 from zaliu/master

ROCm 3.9 merge develop into master

Tensile-4.22.0 for ROCm 3.8.0

18 Sep 21:32
9123205

New Features

Known Issues

  • None

V4.9.0 Performance improvements

28 Feb 21:39

Features

  • Improve persistent kernel implementation
  • Add sequential indexing to tensile merge script

V4.8.0 Performance improvements, bug fixes, add assembly hpa_igemm

31 Jan 20:35

NOTE: ABI/API breaking changes introduced in this release, related to the addition of a separate pointer to the output Matrix D.

Features

  • new solution selection logic
  • add persistent kernel
  • enable StaggerU
  • Add 6x6 and 6x4 micro-tiles
  • Add FUSS kernels
  • gfx906 DGEMM NN/NT tuning
  • add dot4 instructions for i8_r/i32_r gemm_ex on gfx906
  • Matrix D Support.
    • GEMM calls now take a separate pointer to the output matrix D, which replaces matrix C as the output (D = alpha*A*B + beta*C).
    • Matrix C is now used only as an input to GEMM calls.
    • For this release, Matrix D uses the same stride as Matrix C.
    • Previous functionality can be obtained by passing in the same pointer to both Matrix C and D.
  • fixes for merge script
  • add scripts for tuning automation
  • add replacement kernel logic
  • Improved Tensile run times for large numbers of solutions
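
The Matrix D change in the list above can be illustrated with a small reference GEMM. This is a sketch of the semantics only, using nested Python lists rather than Tensile's API: the `gemm` function and its argument order are hypothetical, and strides are not modeled.

```python
def gemm(alpha, A, B, beta, C, D):
    """Compute D = alpha*A*B + beta*C, writing into D.

    C is only read; D is only written. Passing the same object for C and D
    reproduces the older in-place behaviour, because each C[i][j] is read
    before D[i][j] is written.
    """
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            D[i][j] = alpha * acc + beta * C[i][j]
    return D

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 1], [1, 1]]

# Separate output: C is left untouched.
D = [[0, 0], [0, 0]]
gemm(2, A, B, 3, C, D)
print(D)  # [[41, 47], [89, 103]]
print(C)  # [[1, 1], [1, 1]]

# Previous behaviour: pass the same matrix as both C and D.
gemm(2, A, B, 3, C, C)
print(C)  # [[41, 47], [89, 103]]
```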

-

18 Jan 20:32
bcbb800
-
Merge pull request #448 from amcamd/fix_replace_master

PrintLevel 2 for write_assemblyFilename

V4.7.0 Performance improvements, bug fixes, add assembly hpa_hgemm, initial source hpa_igemm

19 Dec 19:35

Features

  • add dot2 instructions for fp16/fp32 hpa_hgemm on gfx906
  • initial i8/i32 hpa_igemm
  • enable fractional loads
  • enable precise bounds check

V4.6.0 Performance improvements, Bug fixes, add source hpa_hgemm

11 Oct 23:34

Features

  • Merge gfx906 code into gfx900/gfx803 code
  • Tune hgemm and sgemm for Resnet50 on gfx906
  • Add source hpa_hgemm
  • Use precise bounds check when possible
  • Tested on ROCm 1.9