v4.0.2 Performance improvements and initial mixed precision support

@amcamd amcamd released this 10 Apr 22:31
· 4020 commits to master since this release

Features

  • Initial mixed precision support
  • Performance Improvements
    • Use Buffer Load for global reads (save registers, reduce instruction count)
    • Support DirectToLds (save registers, reduce latency)
    • Reduce global read offset vgprs (save registers)
    • Use Buffer Store for global stores (reduce instruction count)
    • Optimize global store address calculation (reduce instruction count)
    • Support LdsPad to reduce LDS write bank conflicts
  • Improve debugging for the assembly path (asserts, state dump, LDS initialization)
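
As a rough illustration of why padding can reduce LDS write bank conflicts, here is a minimal Python sketch. It is a simplified model, not the library's implementation: it assumes 32 banks of 4-byte words and a row-major tile. The names `bank_of`, `worst_conflict`, `NUM_BANKS`, and `TILE_WIDTH` are hypothetical, and the exact semantics of the LdsPad parameter may differ.

```python
from collections import Counter

NUM_BANKS = 32   # assumed bank count; real hardware may differ
TILE_WIDTH = 32  # assumed tile width in 4-byte words (hypothetical)


def bank_of(row, col, row_stride, num_banks=NUM_BANKS):
    """Bank index of element (row, col) in a row-major tile of 4-byte words."""
    return (row * row_stride + col) % num_banks


def worst_conflict(rows, col, row_stride, num_banks=NUM_BANKS):
    """Largest number of accesses landing in a single bank when `rows`
    threads each write the same column of consecutive rows."""
    hits = Counter(bank_of(r, col, row_stride, num_banks) for r in range(rows))
    return max(hits.values())


# Without padding, the row stride is a multiple of the bank count,
# so every row of a given column maps to the same bank.
unpadded = worst_conflict(rows=32, col=0, row_stride=TILE_WIDTH)

# One word of padding per row shifts each row to a different bank.
padded = worst_conflict(rows=32, col=0, row_stride=TILE_WIDTH + 1)

print(unpadded, padded)  # prints "32 1"
```

In this model the padded layout spends one extra word of LDS per row to spread a column write across all banks.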