Releases: ROCm/rocBLAS
rocBLAS-14.1.2 for ROCm 1.8.2
Changelist:
- Add initial rocblas_gemm_ex for mixed precision support and foundation for future capabilities
- use Tensile 4.5.0 for bug fixes and performance improvements
- separate tests into quick, pre_checkin, and nightly
- add sweep tests for gemm
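
The mixed-precision idea behind the `rocblas_gemm_ex` entry above can be illustrated with a small host-side reference. This is only a sketch in plain C++ and not the rocBLAS API or its implementation: the function name `gemm_mixed_reference` and its parameter list are hypothetical, and it assumes column-major storage with no transposes. It shows inputs stored in one type, products accumulated in a wider compute type, and the result converted to the output type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of rocBLAS):
// C = alpha * A * B + beta * C, column-major, no transposes.
// Tin      : storage type of A and B
// Tout     : storage type of C
// Tcompute : type used for alpha, beta and the accumulation
template <typename Tin, typename Tout, typename Tcompute>
void gemm_mixed_reference(std::size_t m, std::size_t n, std::size_t k,
                          Tcompute alpha,
                          const Tin* A, std::size_t lda,
                          const Tin* B, std::size_t ldb,
                          Tcompute beta,
                          Tout* C, std::size_t ldc)
{
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            // Accumulate the dot product in the wider compute type.
            Tcompute acc = Tcompute(0);
            for (std::size_t l = 0; l < k; ++l) {
                acc += static_cast<Tcompute>(A[i + l * lda]) *
                       static_cast<Tcompute>(B[l + j * ldb]);
            }
            // Convert back to the output type only once, at the end.
            const Tcompute old_c = static_cast<Tcompute>(C[i + j * ldc]);
            C[i + j * ldc] = static_cast<Tout>(alpha * acc + beta * old_c);
        }
    }
}
```

A reference of this shape is useful mainly for checking results in tests: calling it with `float` storage and `double` accumulation gives a baseline against which a lower-precision result can be compared.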
rocBLAS 14.1.1 for ROCm 1.8.2
Changelist:
- update hgemm asm_full YAML file for performance; re-train hgemm hip_lite YAML file
- new YAML files with PreciseBoundsCheck disabled
- update hgemm asm_full YAML file, source and VW=2 for m,n,k <= 32
- update hgemm asm_full YAML file, source and VW=1 for m,n,k == 1
- add strided_batched tests for hgemm
- correct gemm test matrix initialization
- change cmake and source files to support hip-clang
- change from __fp16 to _Float16
rocBLAS 14.1.0 for ROCm 1.8.2
Changelist:
- partition the gemm m and n dimensions to avoid offsets exceeding 32 bits
- fix set_get_matrix memory leak
- improve TRSM performance and make it asynchronous
- use hip_device target for ROCm 1.8.2
- Improve gemm-strided-batched testing
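
The gemm partitioning entry above can be sketched as follows. This is an illustration under stated assumptions, not the rocBLAS implementation: the struct `ColumnChunk` and the function `partition_columns` are hypothetical names, the matrix is assumed to be column-major, and the limit defaults to the largest unsigned 32-bit value. The idea is to split one dimension into chunks so that the largest element offset inside each chunk fits in 32 bits, while the start of each chunk is computed with 64-bit arithmetic.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one chunk of columns.
struct ColumnChunk {
    std::int64_t first_col;   // index of the first column in the chunk
    std::int64_t num_cols;    // number of columns in the chunk
    std::int64_t base_offset; // 64-bit element offset of the chunk's first column
};

// Split n columns of a column-major matrix (rows x n, leading dimension ld)
// so that the largest in-chunk offset, (rows - 1) + (cols - 1) * ld,
// never exceeds max_offset. Returns an empty vector for invalid input.
std::vector<ColumnChunk> partition_columns(std::int64_t rows,
                                           std::int64_t n,
                                           std::int64_t ld,
                                           std::int64_t max_offset = UINT32_MAX)
{
    std::vector<ColumnChunk> chunks;
    if (rows <= 0 || n <= 0 || ld < rows || rows - 1 > max_offset) {
        return chunks;
    }
    // Largest column count whose last element is still addressable.
    const std::int64_t max_cols = (max_offset - (rows - 1)) / ld + 1;
    for (std::int64_t j = 0; j < n; j += max_cols) {
        const std::int64_t cols = std::min(max_cols, n - j);
        chunks.push_back(ColumnChunk{j, cols, j * ld});
    }
    return chunks;
}
```

Each chunk could then be handed to a separate gemm call with the base pointer advanced by `base_offset`, so that indexing inside the call only ever needs offsets within the limit.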
rocBLAS-14.0.0 for ROCm 1.7.1
Changelist:
- fix Xtrsm for large size ldb
- fix set_get_matrix for large size
- fix Xgemm test for large size
- additional training for ResNet sizes
- fix dot, asum, nrm2
rocBLAS-12.3.1 for ROCm 1.7.1
Changelist:
- add gemm_kernel_name and gemm_strided_batched_kernel_name
- Tensile training for ResNet1x1
- add mi25 Device 6860 to vega10
- set AMDGPU_TARGETS to gfx803;gfx900
- fix bug in kernel2 in sum, dot, nrm2
rocBLAS-12.2.1 for ROCm 1.7.1
Changelist:
- add function syr
- use Tensile v4.0.1
- add Exact sizes to Tensile yaml files
rocBLAS-0.12.1.0 for ROCm 1.7.1
Changelist:
- fix dependency installation
rocBLAS-0.12.0.0 release for ROCm 1.7.1
Same source as the rocBLAS-0.12.0.0 release for ROCm 1.7.0, but for ROCm 1.7.1.
rocBLAS-0.12.0.0 release for ROCm 1.7.0
Changelist:
- add hgemm
- additional fix for multi-process and multi-threading
- new solution selection logic
rocBLAS-0.10.4.0 release for ROCm 1.7.0
Changelist:
- fix race condition for multi-process and multi-thread
- hipLaunchKernelGGL replaces hipLaunchKernel
- add logging