V4.4.0 Performance Improvements and Bug Fixes

@amcamd released this 10 Aug 03:43
· 3789 commits to master since this release

Features

  • Support Global Split U for half and double
  • Support Local Split U for half and hpa
  • Fix beta for hpa
  • Add AssertFree0ElementMultiple requirement and runtime launch check
  • Intercept the solution selection logic and call the hgemm HIP kernel when the summation index or the first free index is odd
  • Correct reordered_schedules fallback for hgemm
  • Disable PreciseBoundsCheck
  • Update rocblas_hgemm_asm_full.yaml to call source with VW=2 for m,n,k <= 32
  • Update rocblas_hgemm_asm_full.yaml to call source with VW=1 for m,n,k == 1
  • Use alternating sign in random init for half
  • Use hipGetDevice in place of hipCtxGetDevice
  • Use _Float16 in place of __fp16
  • Add device to llvm_fma_v2f16
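
The odd-index interception listed above can be pictured as a small predicate that runs before normal solution selection. The sketch below is illustrative only and is not taken from the Tensile or rocBLAS sources: the function name is hypothetical, and it assumes that "first free index" refers to the m dimension and "summation index" to the k dimension of a GEMM with an m x k matrix A and a k x n matrix B.

```python
def should_use_hip_hgemm_kernel(m: int, k: int) -> bool:
    """Return True when the hgemm call should bypass normal selection.

    Hypothetical helper: assumes m is the size of the first free index
    and k is the size of the summation index.
    """
    # An odd size in either index triggers the HIP kernel path.
    return (m % 2 == 1) or (k % 2 == 1)


# Example: even sizes go through normal selection, odd sizes do not.
for m, k in [(64, 64), (63, 64), (64, 33), (1, 1)]:
    path = "HIP kernel" if should_use_hip_hgemm_kernel(m, k) else "normal selection"
    print(f"m={m}, k={k}: {path}")
```

Under these assumptions the second free index (n) does not take part in the check, which matches the wording of the release note but has not been verified against the code.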