Releases: ROCm/rocBLAS

rocBLAS-0.10.3.0 release for ROCM 1.6.4

05 Dec 15:35

Changelist:

  • add dgemm assembly kernels from Tensile v3.4.0
  • fix packaging install path
  • integrate clang-format

rocBLAS-0.10.2.0 release for ROCM 1.6.4

30 Nov 23:39

Changelist:

  • ported to CentOS
  • updated to use Tensile v3.3.7 with v_add_i32->u32 fix and fix for M<4
  • refactored code and tests for rocblas_pointer_mode

rocBLAS-0.10.1.0 release for ROCM 1.6.4

15 Nov 00:18
Pre-release

Changelist:

  • add MI25 tuning for Tensile 3.3.4
  • fix sgemm assembly kernels for thread safety
  • correct iXamax to use 1-based indexing
  • refactor tests

Release for ROCM 1.6.4

17 Oct 15:04
Pre-release

NOTE: API breaking changes introduced in this release related to: rocblas_iXamax, rocblas_iXamin, complex functions, and half functions.

Changelist:

  • correct API: rocblas_samax -> rocblas_isamax, rocblas_damax -> rocblas_idamax
  • remove unimplemented complex and half-precision functions from the API
  • update to Tensile v3.2.0. This uses sgemm assembly kernels for gfx803 and gfx900
  • add rocblas_sgeam and rocblas_dgeam functions
  • improve repeatability of rocblas_Xgemm performance tests
  • update perf script

release for ROCM 1.6.3

16 Oct 22:09
Pre-release

NOTE: API breaking changes introduced in this release, primarily related to library NAME and SONAME.

Changelist:

  • Removed the platform suffix from the library name (the library is now librocblas.so)
  • SONAME link renamed to reflect the MAJOR version number (currently 0, changed from 1)
  • Build system entirely rewritten to simplify the build/install process. A convenience bash script (install.sh, in the repository root) automates builds on Ubuntu
  • Tensile updated to v3.0.4, which includes fixes for NaN propagating on GEMM calls with beta == 0
  • 2 new samples added in the samples directory (gemm and strided gemm)
  • haxpy implementation added
  • extra unit tests and benchmarking capabilities added for axpy, dot, and scal
  • Improved stability of TRSM unit tests

rocBLAS-0.4.3.0 release for ROCM 1.6

25 Jul 21:34
Pre-release

Library release associated with ROCM v1.6 release.

Library tuned for Fiji family hardware.

rocBLAS-0.4.2.3 release for ROCM 1.5

23 Jun 15:14
Pre-release

Library release associated with the ROCm v1.5 platform release.

Library tuned for Fiji family hardware.

API change: the order parameter has been removed from the gemm functions, which now support only column-major ordering. If your matrices are row-major, swap the following parameter pairs: transa and transb, m and n, A and B, lda and ldb.

Below is the rocblas_sgemm function prototype.

```c
rocblas_status rocblas_sgemm(
    rocblas_handle handle,
    rocblas_operation transa, rocblas_operation transb,
    rocblas_int m, rocblas_int n, rocblas_int k,
    const float *alpha,
    const float *A, rocblas_int lda,
    const float *B, rocblas_int ldb,
    const float *beta,
    float *C, rocblas_int ldc);
```

rocBLAS-0.4.2.0 release for ROCM 1.6

05 Jul 20:25
Pre-release

Library release associated with ROCM v1.6 platform release.

Library tuned for Fiji family hardware.

rocBLAS-0.4.0.2 release for ROCm 1.5

02 May 21:39
Pre-release

Library release associated with the ROCm v1.5 platform release.

Library tuned for Fiji family hardware. At the time of release, there are known unit test failures in:

  • rocblas_trsm_matrix_size/trsm_gtest.trsm_gtest_float/12
  • other tests in the TRSM family

This has been identified as an issue in the software stack below the library, and a fix should be forthcoming. We will update release notes when the fix is available.

Preview Release

14 Mar 20:24
Pre-release

Support for: gemm, trmm, trsm, trtri, gemv, ger, amax, amin, asum, axpy, copy, dot, nrm2, scal, swap