Releases · OpenMathLib/OpenBLAS

08 Aug 21:01

martin-frbg

v0.3.28

5ef8b19

OpenBLAS 0.3.28 version

general:

Reworked the unfinished implementation of HUGETLB from GotoBLAS
for allocating huge memory pages as buffers on suitable systems
Changed the unfinished implementation of GEMM3M for the generic
target on all architectures to at least forward to regular GEMM
Improved multithreaded GEMM performance for large non-skinny matrices
Improved BLAS3 performance on larger multicore systems through improved
parallelism
Improved performance of the initial memory allocation by reducing
locking overhead
Improved performance of GBMV at small problem sizes by introducing
a size barrier for the switch to multithreading
Added an implementation of the CBLAS_GEMM_BATCH extension
Fixed miscompilation of CAXPYC and ZAXPYC on all architectures in
CMAKE builds (error introduced in 0.3.27)
Fixed corner cases involving the handling of NAN and INFINITY
arguments in ?SCAL on all architectures
Added support for cross-compiling to WASM with CMAKE (in addition
to the already present makefile support)
Fixed NAN handling and potential accuracy issues in compilations with
Intel ICX by supplying a suitable fp-model option by default
The contents of the github project wiki have been converted into
a new set of documentation included with the source code.
It is now possible to register a callback function that replaces
the built-in support for multithreading with an external backend
like TBB (openblas_set_threads_callback_function)
Fixed potential duplication of suffixes in shared library naming
Improved C compiler detection by the build system to tolerate more
naming variants for gcc builds
Fixed an unnecessary dependency of the utest on CBLAS
Fixed spurious error reports from the BLAS extensions utest
Fixed unwanted invocation of the GEMM3M tests in cross-compilation
Fixed a flaw in the makefile build that could lead to the pkgconfig
file containing an entry of UNKNOWN for the target cpu after installing
Integrated fixes from the Reference-LAPACK project:
- Fixed uninitialized variables in the LAPACK tests for ?QP3RK (PR 961)
- Fixed potential bounds error in ?UNHR_COL/?ORHR_COL (PR 1018)
- Fixed potential infinite loop in the LAPACK testsuite (PR 1024)
- Make the variable type used for hidden length arguments configurable (PR 1025)
- Fixed SYTRD workspace computation and various typos (PR 1030)
- Prevent compiler use of FMA that could increase numerical error in ?GEEVX (PR 1033)
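
The notes above do not enumerate the ?SCAL corner cases, but a typical one is scaling by zero. The sketch below is plain Python rather than OpenBLAS code, and both function names are hypothetical. It assumes IEEE 754 arithmetic, where 0 * NaN and 0 * Inf are both NaN, so a shortcut that zero-fills the vector when alpha is zero silently discards them.

```python
import math


def scal_shortcut(alpha, x):
    """Scale x in place, zero-filling when alpha == 0 (the tempting shortcut)."""
    for i in range(len(x)):
        x[i] = 0.0 if alpha == 0.0 else alpha * x[i]
    return x


def scal_ieee(alpha, x):
    """Scale x in place with a plain multiply, so NaN and Inf propagate."""
    for i in range(len(x)):
        x[i] = alpha * x[i]
    return x


data = [1.0, math.nan, math.inf]
print(scal_shortcut(0.0, list(data)))  # the NaN and Inf entries are lost
print(scal_ieee(0.0, list(data)))      # the NaN and Inf entries become NaN
```

Which of the two results a given BLAS routine should return is a question of specification. The sketch only shows why the two code paths disagree.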

x86_64:

reverted thread management under Windows to its state before 0.3.26
due to signs of race conditions in some circumstances now under study
fixed accidental selection of the unoptimized generic SBGEMM kernel
in CMAKE builds for CooperLake and SapphireRapids targets
fixed a potential thread buffer overrun in SBSTOBF16 on small systems
fixed an accuracy issue in ZSCAL introduced in 0.3.26
fixed compilation with CMAKE and recent releases of LLVM
added support for Intel Emerald Rapids and Meteor Lake cpus
added autodetection support for the Zhaoxin KX-7000 cpu
fixed autodetection of Intel Prescott (probably broken since 0.3.19)
fixed compilation for older targets with the Yocto SDK
fixed compilation of the converter-generated C versions
of the LAPACK sources with gcc-14
improved compiler options when building with CMAKE and LLVM for
AVX512-capable targets
added support for supplying the L2 cache size via an environment
variable (OPENBLAS_L2_SIZE) in case it is not correctly reported
(as in some VM configurations)
improved the error message shown when thread creation fails on startup
fixed setting the rpath entry of the dylib in CMAKE builds on MacOS

arm:

fixed building for baremetal targets with make

arm64:

Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
matrix to the corresponding GEMV kernel
added optimized SGEMV and DGEMV kernels for A64FX
added optimized SVE kernels for small-matrix GEMM
added A64FX to the cpu list for DYNAMIC_ARCH
fixed building with support for cpu affinity
worked around accuracy problems with C/ZNRM2 on NeoverseN1 and
Apple M targets
improved GEMM performance on Neoverse V1
fixed compilation for NEOVERSEN2 with older compilers
fixed potential miscompilation of the SVE SDOT and DDOT kernels
fixed potential miscompilation of the non-SVE CDOT and ZDOT kernels
fixed a potential overflow when using very large user-defined BUFFERSIZE
fixed setting the rpath entry of the dylib in CMAKE builds on MacOS

power:

Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
matrix to the corresponding GEMV kernel
significantly improved performance of SBGEMM on POWER10
fixed compilation with OpenMP and the XLF compiler
fixed building of the BLAS extension utests under AIX
fixed building of parts of the LAPACK testsuite with XLF
fixed CSWAP/ZSWAP on big-endian POWER10 targets
fixed a performance regression in SAXPY on POWER10 with OpenXL
fixed accuracy issues in CSCAL/ZSCAL when compiled with LLVM
fixed building for POWER9 under FreeBSD
fixed a potential overflow when using very large user-defined BUFFERSIZE
fixed an accuracy issue in the POWER6 kernels for GEMM and GEMV

riscv64:

Added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
matrix to the corresponding GEMV kernel
fixed building for RISCV64_GENERIC with OpenMP enabled
added DYNAMIC_ARCH support (comprising RISCV64_GENERIC and the two
RVV 1.0 targets with vector length of 128 and 256)
worked around the ZVL128B kernels for AXPBY mishandling the special
case of zero Y increment

loongarch64:

improved GEMM performance on servers of the 3C5000 generation
improved performance and stability of DGEMM
improved GEMV and TRSM kernels for LSX and LASX vector ABIs
fixed CMAKE compilation with the INTERFACE64 option set
fixed compilation with CMAKE
worked around spurious errors flagged by the BLAS3 tests
worked around a miscompilation of the POTRS utest by gcc 14.1

mips64:

fixed ASUM and SUM kernels to accept negative step sizes in X
fixed complex GEMV kernels for MSA

md5sums:
0f54185b6ef804173c01b9a40520a0e8 OpenBLAS-0.3.28.tar.gz
2b3bb81f49453b12c4a563579bfc1e9f OpenBLAS-0.3.28.zip
80001511e2af8265ca88acaf8d37f308 OpenBLAS-0.3.28-x64-64.zip
a526ff1012d4a5dd1ec1130704195a73 OpenBLAS-0.3.28-x64.zip
660158a21ffe9c7e65877b6c358a4aca OpenBLAS-0.3.28-x86.zip
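
A downloaded archive can be checked against the md5sums listed above with a few lines of Python. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_hex):
    """Compare a file's MD5 digest with a published hex string."""
    return md5_of_file(path) == published_hex.strip().lower()


# Usage, assuming the archive has been downloaded to the current directory:
# matches_published_md5("OpenBLAS-0.3.28.tar.gz",
#                       "0f54185b6ef804173c01b9a40520a0e8")
```

An MD5 match guards against a corrupted download. It is not a defence against deliberate tampering.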


04 Apr 20:33

martin-frbg

v0.3.27

ce3f668

OpenBLAS 0.3.27 version

general:

added initial (generic) support for the CSKY architecture
capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating
underutilized or idle threads
sped up multithreaded POTRF on all platforms
added extension openblas_set_num_threads_local() that returns the previous thread count
re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading
for too small workloads
improved the fallback code used when the precompiled number of threads is exceeded,
and made it callable multiple times during the lifetime of an instance
added CBLAS interfaces for the BLAS extensions ?AMIN, ?AMAX, CAXPYC and ZAXPYC
fixed a potential buffer overflow in the interface to the GEMMT kernels
fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14
fixed unwanted case sensitivity of the character parameters in ?TRTRS
sped up the OpenMP thread management code
fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK
fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library
added a testsuite for the BLAS extensions
modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress
spurious errors
added support for building the benchmark collection with CMAKE
added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds
with OpenMP enabled that use clang with gfortran
fixed building on systems with uClibc
added support for calling ?NRM2 with a negative increment value on all architectures
added support for the LLVM18 version of the flang-new compiler
fixed handling of the OPENBLAS_LOOPS variable in several benchmarks
Integrated fixes from the Reference-LAPACK project:
- Increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981)

x86:

fixed handling of NaN and Inf arguments in ZSCAL
fixed GEMM3M functions failing in CMAKE builds

x86-64:

removed all instances of sched_yield() on Linux and BSD
fixed a potential deadlock in the thread server on MSWindows (introduced in 0.3.26)
fixed GEMM3M functions failing in CMAKE builds
fixed handling of NaN and Inf arguments in ZSCAL
added compiler checks for AVX512BF16 compatibility
fixed LLVM compiler options for Sapphire Rapids
fixed cpu handling fallbacks for Sapphire Rapids with
disabled AVX2 in DYNAMIC_ARCH mode
fixed extensions SCSUM and DZSUM
improved GEMM performance for ZEN targets

arm:

fixed handling of NaN and Inf arguments in ZSCAL

arm64:

added initial support for the Cortex-A76 cpu
fixed handling of NaN and Inf arguments in ZSCAL
fixed default compiler options for gcc (-march and -mtune)
added support for ArmCompilerForLinux
added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds
fixed mishandling of the INTERFACE64 option in CMAKE builds
corrected SCSUM kernels (erroneously duplicating SCASUM behaviour)
added SVE-enabled kernels for CSUM/ZSUM
worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M

power:

improved performance of SGEMM on POWER8/9/10
improved performance of DGEMM on POWER10
added support for OpenMP builds with xlc/xlf on AIX
improved cpu autodetection for DYNAMIC_ARCH builds on older AIX
fixed cpu core counting on AIX
added support for building a shared library on AIX

riscv64:

added support for the X280 cpu
added support for semi-generic RISCV models with vector length 128 or 256
added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers
fixed handling of NaN and Inf arguments in ZSCAL
improved cpu model autodetection
fixed corner cases in ?AXPBY for C910V
fixed handling of zero increments in ?AXPY kernels for C910V

loongarch64:

added optimized kernels for ?AMIN and ?AMAX
fixed handling of NaN and Inf arguments in ZSCAL
fixed handling of corner cases in ?AXPBY
fixed computation of SAMIN and DAMIN in LSX mode
fixed computation of ?ROT
added optimized SSYMV and DSYMV kernels for LSX and LASX mode
added optimized CGEMM and ZGEMM kernels for LSX and LASX mode
added optimized CGEMV and ZGEMV kernels

mips:

fixed utilizing MSA on P5600 and related cpus (broken in 0.3.22)
fixed handling of NaN and Inf arguments in ZSCAL
fixed mishandling of the INTERFACE64 option in CMAKE builds

zarch:

fixed handling of NaN and Inf arguments in ZSCAL
fixed calculation of ?SUM on Z13

md5sums:
ef71c66ffeb1ab0f306a37de07d2667f OpenBLAS-0.3.27.tar.gz
4b85246b10d61f362fe8b9b45cd145f0 OpenBLAS-0.3.27.zip
317c6c4f93f233d8be8ea0ad6fd7979e OpenBLAS-0.3.27-x64-64.zip
2b8d25e6a01ad4830ecca4e521172b02 OpenBLAS-0.3.27-x64.zip
c59038e5ea36ee431f5cb7f5de8bf9d9 OpenBLAS-0.3.27-x86.zip


02 Jan 21:27

martin-frbg

v0.3.26

6c77e5e

OpenBLAS 0.3.26 version

general:

improved the version of openblas.pc that is created by the CMAKE build
fixed CMAKE-specific build problems on older versions of MacOS
worked around linking problems on old versions of MacOS
corrected installation location of the lapacke_mangling header in CMAKE builds
added type declarations for complex variables to the MSVC-specific parts of the LAPACK header
significantly sped up ?GESV for small problem sizes by introducing a lower bound for multithreading
imported additions and corrections from the Reference-LAPACK project:
- added new LAPACK functions for truncated QR with pivoting (Reference-LAPACK PRs 891&941)
- handle miscalculation of minimum work array size in corner cases (Reference-LAPACK PR 942)
- fixed use of uninitialized variables in ?GEDMD and improved inline documentation (PR 959)
- fixed use of uninitialized variables (and consequential failures) in ?BBCSD (PR 967)
- added tests for the recently introduced Dynamic Mode Decomposition functions (PR 736)
- fixed several memory leaks in the LAPACK testsuite (PR 953)
- fixed counting of testsuite results by the Python script (PR 954)
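
The lower bound for multithreading mentioned above for ?GESV follows a pattern that recurs throughout these notes: below a certain problem size the cost of waking threads exceeds the gain, so the call stays single-threaded. The sketch below illustrates that decision in plain Python. The function name and both numeric limits are hypothetical and are not taken from the OpenBLAS sources.

```python
def choose_thread_count(work_items, max_threads,
                        barrier=10_000, min_items_per_thread=5_000):
    """Pick a thread count for a job of the given size.

    barrier and min_items_per_thread are made-up tuning values.
    """
    if work_items < barrier:
        # Small problem: threading overhead would dominate, stay serial.
        return 1
    # Larger problem: never use more threads than there is work to share.
    wanted = work_items // min_items_per_thread
    return max(1, min(max_threads, wanted))


for size in (100, 12_000, 1_000_000):
    print(size, choose_thread_count(size, max_threads=8))
```

The second branch also reflects the cap on thread counts described for 0.3.27, where the aim is to avoid creating threads that would sit idle.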

x86-64:

fixed computation of CASUM on SkylakeX and newer targets in the special
case that AVX512 is not supported by the compiler or operating environment
fixed potential undefined behaviour in the CASUM/ZASUM kernels for AVX512 targets
worked around a problem in the pre-AVX kernel for GEMV
sped up the thread management code on MS Windows

arm64:

fixed building of the LAPACK testsuite with Xcode 15 on Apple M1 and newer
sped up the thread management code on MS Windows
sped up SGEMM and DGEMM on Neoverse V1
sped up ?DOT on SVE-capable targets
reduced the number of targets in DYNAMIC_ARCH builds by eliminating functionally equivalent ones
included support for Apple M1 and newer targets in DYNAMIC_ARCH builds

power:

improved the SGEMM kernel for POWER10
fixed compilation with (very) old versions of gcc
fixed detection of old 32bit PPC targets in CMAKE-based builds
added autodetection of the POWERPC 7400 subtype
fixed CMAKE-based compilation for PPCG4 and PPC970 targets

loongarch64:

added and improved optimized kernels for almost all BLAS functions

md5sums:
bd496a1c81769ed19a161c1f8f904ccd OpenBLAS-0.3.26.tar.gz
f2524d2eaa55e9c2bad4d203401d4c7f OpenBLAS-0.3.26.zip
739d5666e46b046425b932fb83ce5571 OpenBLAS-0.3.26-x86.zip
3b573471bbc7639b896d1aab356b7e57 OpenBLAS-0.3.26-x64.zip
7522e53dfb4c8c3207c191e66de59430 OpenBLAS-0.3.26-x64-64.zip
(note that you need to edit the paths in the openblas.pc and OpenBLASConfig.cmake files of the Windows binary packages to reflect
your installation location, if you plan to have OpenBLAS findable via pkgconfig or cmake on your Windows system)


12 Nov 21:58

martin-frbg

v0.3.25

5e1a429

OpenBLAS 0.3.25 version

general:

improved the error message shown on exceeding the maximum thread count
improved the code to add supplementary thread buffers in case of overflow
fixed a potential division by zero in ?ROTG
improved the ?MATCOPY functions to accept zero-sized rows or columns
corrected empty prototypes in function declarations
cleaned up unused declarations in the f2c-converted versions of the LAPACK sources
fixed compilation with the Cray CCE Compiler suite
improved link line rewriting to avoid mixed libgomp/libomp builds with clang&gfortran
worked around OPENMP builds with LLVM14's libomp hanging on FreeBSD
improved the Makefiles to require less option duplication on "make install"
imported the following changes from the upcoming release 3.12 of Reference-LAPACK
- deprecate utility functions ?GELQS and ?GEQRS (LAPACK PR 900)
- apply rounding up to workspace calculations done in floating point (LAPACK PR 904)
- avoid overflow in STGEX2/DTGEX2 (LAPACK PR 907)
- fix accumulation in ?LASSQ (LAPACK PR 909)
- fix handling of NaN values in ?GECON (LAPACK PR 926)
- avoid overflow in CBDSQR/ZBDSQR (LAPACK PR 927)
- fix poor vector orthogonalizations in ?ORBDB5/?UNBDB5 (LAPACK PR 928 & 930)

x86-64:

fixed compile-time autodetection of AMD Ryzen3 and Ryzen4 cpus
fixed capability-based fallback selection for unknown cpus in DYNAMIC_ARCH
added AVX512 optimizations for ?ASUM on Sapphire Rapids and Cooper Lake

ARM64:

fixed building on Apple with homebrew gcc
fixed building with XCODE 15
fixed building on A64FX and Cortex A710/X1/X2
increased the default buffer size for recent ARM server cpus

POWER:

fixed building with the IBM xlf 16.1.1 compiler
fixed building with IBM XL C
added support for DYNAMIC_ARCH builds with clang
fixed union declaration in the BFLOAT16 test case
enabled optimizations for the AIX assembler on POWER10

LOONGARCH64:

added an optimized SGEMV kernel
added an optimized DTRSM kernel

md5sums:
db39b32181b10ec2d1572e81e3dc869c OpenBLAS-0.3.25.zip
48384e324cd1cdcfbdb0d2e16ca55327 OpenBLAS-0.3.25.tar.gz
cc93916bd780a13429b65eb9c05527f2 OpenBLAS-0.3.25-x64.zip
58bb5dfc626d3af86aab7fab409c192d OpenBLAS-0.3.25-x64-64.zip
07a19abeac6c67595ec447315244ccd3 OpenBLAS-0.3.25-x86.zip


03 Sep 21:21

martin-frbg

v0.3.24

9f815cf

OpenBLAS 0.3.24 version

general:

declared the arguments of cblas_xerbla as const (in accordance with the reference implementation
and others; the previous discrepancy appears to have dated back to GotoBLAS)
fixed the implementation of ?GEMMT that was added in 0.3.23
made cpu-specific SWITCH_RATIO parameters for GEMM available to DYNAMIC_ARCH builds
fixed application of SYMBOLSUFFIX in CMAKE builds
fixed missing SSYCONVF function in the shared library
fixed parallel build logic used with gmake
added support for compilation with LLVM17, in particular its new Fortran compiler
added support for CMAKE builds using the NVIDIA HPC compiler
fixed INTERFACE64 builds with CMAKE and the f95 Fortran compiler
fixed cross-build detection and management in c_check
disabled building of the tests with CMAKE when ONLY_CBLAS is defined
fixed several issues with the handling of runtime limits on the number of OPENMP threads
corrected the error code returned by SGEADD/DGEADD when LDA is too small
corrected the error code returned by IMATCOPY when LDB is too small
updated ?NRM2 to support negative increment values (as introduced in release 3.10.0
of the Reference BLAS)
updated ?ROTG to use the safe scaling algorithm introduced in release 3.10.0 of the Reference BLAS
fixed OpenMP builds with CLANG for the case where libomp is not in a standard location
fixed a potential overwrite of unrelated memory during thread initialisation on startup
fixed a potential integer overflow in the multithreading threshold for ?SYMM/?SYRK
fixed build of the LAPACKE interfaces for the LAPACK 3.11.0 ?TRSYL functions added in 0.3.22
fixed installation of .cmake files in concurrent 32 and 64bit builds with CMAKE
applied additions and corrections from the development branch of Reference-LAPACK:
- fixed actual arguments passed to a number of LAPACK functions (from Reference-LAPACK PR 885)
- fixed workspace query results in LAPACK ?SYTRF/?TREVC3 (from Reference-LAPACK PR 883)
- fixed derivation of the UPLO parameter in LAPACKE_?larfb (from Reference-LAPACK PR 878)
- fixed a crash in LAPACK ?GELSD on NRHS=0 (from Reference-LAPACK PR 876)
- added new LAPACK utility functions CRSCL and ZRSCL (from Reference-LAPACK PR 839)
- corrected the order of eigenvalues for 2x2 matrices in ?STEMR (Reference-LAPACK PR 867)
- removed spurious reference to OpenMP variables outside OpenMP contexts (Reference-LAPACK PR 860)
- updated file comments on use of LAMBDA variable in LAPACK (Reference-LAPACK PR 852)
- fixed documentation of LAPACK SLASD0/DLASD0 (Reference-LAPACK PR 855)
- fixed confusing use of "minor" in LAPACK documentation (Reference-LAPACK PR 849)
- added new LAPACK functions ?GEDMD for dynamic mode decomposition (Reference-LAPACK PR 736)
- fixed potential stack overflows in the EIG part of the LAPACK testsuite (Reference-LAPACK PR 854)
- applied small improvements to the variants of Cholesky and QR functions (Reference-LAPACK PR 847)
- removed unused variables from LAPACK ?BDSQR (Reference-LAPACK PR 832)
- fixed a potential crash on allocation failure in LAPACKE SGEESX/DGEESX (Reference-LAPACK PR 836)
- added a quick return from SLARUV/DLARUV for N < 1 (Reference-LAPACK PR 837)
- updated function descriptions in LAPACK ?GEGS/?GEGV (Reference-LAPACK PR 831)
- improved algorithm description in ?GELSY (Reference-LAPACK PR 833)
- fixed scaling in LAPACK STGSNA/DTGSNA (Reference-LAPACK PR 830)
- fixed crash in LAPACKE_?geqrt with row-major data (Reference-LAPACK PR 768)
- added LAPACKE interfaces for C/ZUNHR_COL and S/DORHR_COL (Reference-LAPACK PR 827)
- added error exit tests for SYSV/SYTD2/GEHD2 to the testsuite (Reference-LAPACK PR 795)
- fixed typos in LAPACK source and comments (Reference-LAPACK PRs 809,811,812,814,820)
- adopted the refactored ?GEBAL implementation (Reference-LAPACK PR 808)

x86_64:

added cpu model autodetection for Intel Alder Lake N
added activation of the AMX tile to the Sapphire Rapids SBGEMM kernel
worked around miscompilations of GEMV/SYMV kernels by gcc's tree-vectorizer
fixed compilation of Cooperlake and Sapphire Rapids kernels with CLANG
fixed runtime detection of Cooperlake and Sapphire Rapids in DYNAMIC_ARCH
fixed feature-based cputype fallback in DYNAMIC_ARCH
added support for building the AVX512 kernels with the NVIDIA HPC compiler
corrected ZAXPY result on old pre-AVX hardware for the INCX=0 case
fixed a potential use of uninitialized variables in ZTRSM

ARMV8:

added cpu model autodetection for Apple M2
fixed wrong results of CGEMM/CTRMM/DNRM2 under OSX (use of reserved register)
added support for building the SVE kernels with the NVIDIA HPC compiler
added support for building the SVE kernels with the Apple Clang compiler
fixed compiler option handling for building the SVE kernels with LLVM
implemented SWITCH_RATIO parameter for improved GEMM performance on Neoverse
activated SVE SGEMM and DGEMM kernels for Neoverse V1
improved performance of the SVE CGEMM and ZGEMM kernels on Neoverse V1
improved kernel selection for the ARMV8SVE target and added it to DYNAMIC_ARCH
fixed runtime check for SVE availability in DYNAMIC_ARCH builds to take OS or
container restrictions into account
fixed a potential use of uninitialized variables in ZTRSM
fixed a potential misdetection of ARMV8 hardware as 32bit in CMAKE builds

LOONGARCH64:

added ABI detection
added support for cpu affinity handling
fixed compilation with early versions of the Loongson toolchain
added an optimized SGEMM kernel for 3A5000
added optimized DGEMV kernels for 3A5000
improved the performance of the DGEMM kernel for 3A5000

MIPS64:

fixed miscompilation of TRMM kernels for the MIPS64_GENERIC target

POWER:

fixed compiler warnings in the POWER10 SBGEMM kernel

RISCV:

fixed application of the INTERFACE64 option when building with CMAKE
fixed a potential misdetection of RISCV hardware as 32bit in CMAKE builds
fixed IDAMAX and DOT kernels for C910V
fixed corner cases in the ROT and SWAP kernels for C910V
fixed compilation of the C910V target with recent vendor compilers

md5sums:
9fb0d53bf3559d4dea074fa5d7691d39 OpenBLAS-0.3.24.zip
23599a30e4ce887590957d94896789c8 OpenBLAS-0.3.24.tar.gz
3aba5a264dfb0a545723c648b311ae5a OpenBLAS-0.3.24-x86.zip
fc08fe8c0dc7364da115d0e09b5a134f OpenBLAS-0.3.24-x64.zip

note that the Windows binary packages have been regenerated on September 14 because a problem has been found with the included .lib file (referencing a nonexistent "libopenblas.exp.dll" instead of "libopenblas.dll").
If you downloaded the original zip files, their md5sums were
431ef4c46ccd133935fa40be6e02eb14 OpenBLAS-0.3.24-x86.zip
e53de38d326547d6220296a6cec0d9aa OpenBLAS-0.3.24-x64.zip


01 Apr 20:20

martin-frbg

v0.3.23

394a9fb

OpenBLAS 0.3.23 version

general:

fixed a serious regression in GETRF/GETF2 and ZGETRF/ZGETF2 where
subnormal but nonzero data elements triggered the singularity flag
fixed a long-standing bug in CSPR/ZSPR in single-threaded operation
for cases where elements of the X vector are real numbers (or
complex with only the real part zero)
fixed gmake builds with the option NO_LAPACK
fixed a few instances in the gmake Makefiles where expressly
setting NO_LAPACK=0 or NO_LAPACKE=0 would have the opposite effect
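
The GETRF/GETF2 regression listed above concerns pivots that are subnormal but not zero. The sketch below is a hypothetical reconstruction of that class of bug in plain Python, not the OpenBLAS code. It assumes the singularity flag should be raised only for a pivot that is exactly zero, and shows how a test against the smallest normal double misclassifies a subnormal pivot.

```python
import sys

# Smallest positive normal double, about 2.2e-308.
SMALLEST_NORMAL = sys.float_info.min


def is_singular_exact(pivot):
    """Flag the pivot as singular only when it is exactly zero."""
    return pivot == 0.0


def is_singular_too_eager(pivot):
    """Flag every pivot below the normal range, subnormals included."""
    return abs(pivot) < SMALLEST_NORMAL


subnormal_pivot = 5e-324  # smallest positive subnormal double
print(is_singular_exact(subnormal_pivot))      # nonzero, so not singular
print(is_singular_too_eager(subnormal_pivot))  # wrongly reported as singular
```

A magnitude threshold can still be useful for deciding whether to divide by the pivot or multiply by its reciprocal. The point is that such a threshold should not decide the singularity flag.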

x86_64:

added further CPUID values for Intel Raptor Lake

md5sums:
115634b39007de71eb7e75cf7591dfb2 OpenBLAS-0.3.23.tar.gz
6c35babfc01534eb04acba653d378839 OpenBLAS-0.3.23.zip
c28473bb8bba85a92f77e350182abddb OpenBLAS-0.3.23-x86.zip
49f156f42622d251aa440ddcd425787d OpenBLAS-0.3.23-x64.zip

note that the Windows binary packages have been regenerated on September 14 because a problem has been found with the included .lib file (referencing a non-existent "libopenblas.exp.dll" instead of "libopenblas.dll").
If you downloaded the original zip files, their md5sums were
d77c18780b2d8a65c9340a415c125918 OpenBLAS-0.3.23-x86.zip
c428119f8d54de25e341ec1becc32251 OpenBLAS-0.3.23-x64.zip


26 Mar 21:45

martin-frbg

v0.3.22

e46971b

OpenBLAS 0.3.22 version

This release has now been found to have an inadvertent regression in LU factorization (GETRF/GETF2).
A new release will be made as soon as the fixes currently under testing are confirmed to be sufficient.

general:

Updated the included LAPACK to Reference-LAPACK release 3.11.0
plus post-release corrections and improvements
Added initial support for processing with the EMSCRIPTEN javascript
converter (yielding a single-threaded build only)
Added a threshold for multithreading in SYMM, SYMV and SYR2K
Increased the threshold for multithreading in SYRK
OpenBLAS no longer decreases the global OMP_NUM_THREADS when it
exceeds the maximum thread count the library was compiled for.
fixed ?GETF2 potentially returning NaN with tiny matrix elements
fixed openblas_set_num_threads to work in USE_OPENMP builds
fixed cpu core counting in USE_OPENMP builds returning the number
of OMP "places" rather than cores
fixed interpretation of USE_PERL=0 in build scripts
fixed linking of the library with libm in CMAKE builds
fixed startup delays resulting from a wrong default setting of
NO_WARMUP in CMAKE builds
fixed inconsistent defaults for overriding of LAPACK SPMV, SPR,
SYMV, SYR functions in gmake and CMAKE builds
fixed stride calculation in the optimized small-matrix path of
complex SYR
fixed compilation of ReLAPACK with CMAKE
fixed pkgconfig file contents for INTERFACE64 builds
fixed building of Reference-LAPACK with recent gfortran
fixed building with only a subset of precision types on Windows
added new environment variable OPENBLAS_DEFAULT_NUM_THREADS
added a GEMV-based implementation of GEMMT
added support for building under QNX
updated support for (cross-)building for ALPHA targets

x86_64:

added autodetection of Intel Raptor Lake cpu models
added SSCAL microkernels for Haswell and newer targets
improved the performance of the Haswell DSCAL microkernel
added CSCAL and ZSCAL microkernels for SkylakeX targets
fixed detection of gfortran and Cray CCE compilers
fixed detection of recent versions of the Intel Fortran compiler
fixed compilation with LLVM to no longer run out of AVX512 registers
fixed cpu type option setting with recent NVIDIA HPC compiler versions
fixed compilation for/on AMD Ryzen 4 cpus
fixed compilation of AVX2-capable targets with Apple Clang
fixed runtime selection of COOPERLAKE in DYNAMIC_ARCH builds
worked around gcc/llvm using risky FMA operations in CSCAL/ZSCAL
worked around miscompilations of GEMV, SYMV and ZDOT kernels
by gcc12's tree-vectorizer on OSX and Windows

ARM:

fixed cross-compilation to ARMV5 and ARMV6 targets with CMAKE

ARMV8:

fixed cross-compilation to CortexA53 with CMAKE
fixed compilation with CMAKE and "Arm Compiler for Linux 22.1"
added cpu autodetection for Cortex X3 and A715
fixed conditional compilation of SVE-capable targets in DYNAMIC_ARCH
sped up SVE kernels by removing unnecessary prefetches
improved the GEMM performance of Neoverse V1
added SVE kernels for SDOT and DDOT
added an SBGEMM kernel for Neoverse N2
improved cpu-specific compiler option selection for Neoverse cpus
added support for setting CONSISTENT_FPCSR

MIPS64:

improved MSA capability detection and handling
added a MIPS64_GENERIC build target
fixed corner cases in DNRM2

LOONGARCH64:

fixed handling of the INTERFACE64 option

RISCV:

fixed handling of the INTERFACE64 option

md5sums:
354e552c15d1ce93fc95cf1e3b181ddc OpenBLAS-0.3.22.tar.gz
c4de94c48a6ddb8ac3036763269aaf27 OpenBLAS-0.3.22.zip
4a5ee2693546ffd03d3a60829f3c6054 OpenBLAS-0.3.22-x64.zip
e1008c13d26caea6f0398ea7d8ce2f8f OpenBLAS-0.3.22-x86.zip


07 Aug 20:50

martin-frbg

v0.3.21

b89fb70

OpenBLAS 0.3.21 version

general:

updated the included LAPACK to Reference-LAPACK 3.10.1
when no Fortran compiler is available, OpenBLAS builds will now automatically
build LAPACK from an f2c-converted copy of LAPACK 3.9.0 unless the NO_LAPACK option
is specified (more recent releases make too heavy use of Fortran90+ features to be easily convertible to C)
similarly added C versions of the BLAS and CBLAS tests
enabled building of the ReLAPACK GEMMT kernels when ReLAPACK is built
function LAPACKE_lsame is now annotated with the GCC attribute "const" to aid static analyzers
added USE_TLS to the list of options reported by the openblas_get_config() function
added openblas_getaffinity() as a Linux-only convenience function wrapping pthread_getaffinity_np()
CMAKE builds now support the BUILD_TESTING keyword (to disable the LAPACK testsuite) of Reference-LAPACK
fixed CMAKE builds of the laswp_ncopy and neg_tcopy kernels
removed the build system requirements for PERL (while keeping the original perl scripts as backup)
handle building and running OpenBLAS on systems that report zero available cpu cores
added SYMBOLPREFIX/SYMBOLSUFFIX handling for LAPACK 3.10.0 functions added in 0.3.20
fixed linking of the utests on QNX
Added support for compilation with the Intel ifx compiler
Added support for compilation with the Fujitsu FCC compiler for Fugaku
Added support for compilation with the Cray C and Fortran compilers
reverted OpenMP threadpool behaviour in the exec_blas call to its state before 0.3.11: the
threadpool no longer grows or shrinks on demand, as the overhead for this is too big, at least with
GNU OpenMP. The adaptive behaviour introduced in 0.3.11 can still be requested at runtime by setting
the environment variable OMP_ADAPTIVE
worked around spurious STFSM/CTFSM errors reported by the LAPACK testsuite

x86_64:

fixed determination of compiler support for AVX512 and removed the 0.3.19
workaround for building SKYLAKEX kernels on Sandybridge hardware
fixed compilation for the SKYLAKEX target with gcc 6
fixed compilation of the CooperLake SBGEMM kernel with LLVM
fixed compilation of the SkyLakeX small matrix GEMM kernels with LLVM or ICC
fixed compilation of some BFLOAT16 kernels with CMAKE
added support for the Zhaoxin/Centaur KH40000 cpu
fixed a potential crash in the ZSYMV kernel used for all targets except generic
fixed gmake compilation for DYNAMIC_ARCH with a DYNAMIC_LIST including ATOM
fixed compilation of LAPACKE with the INTEGER64 option on Windows
added support for cross-compiling to individual Intel or AMD targets using CMAKE
(previously only CORE2 supported, added targets are ATOM, PRESCOTT, NEHALEM, SANDYBRIDGE,
HASWELL, SKYLAKEX, COOPERLAKE, SAPPHIRERAPIDS, OPTERON, BARCELONA, BULLDOZER, PILEDRIVER,
STEAMROLLER, EXCAVATOR, ZEN)

SPARC:

worked around an overflow error in the DNRM2 kernel
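
The DNRM2 overflow named above (and again for several other architectures in this release) is the classic hazard of squaring large elements before summing. The sketch below shows the hazard and one textbook remedy, scaling by the largest magnitude, in plain Python. It is an illustration under that assumption, not the workaround actually applied in the OpenBLAS kernels, and the function names are hypothetical. NaN inputs are not handled.

```python
import math


def nrm2_naive(x):
    """Euclidean norm by direct sum of squares; squares may overflow to inf."""
    return math.sqrt(sum(v * v for v in x))


def nrm2_scaled(x):
    """Euclidean norm with every element divided by the largest magnitude."""
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0 or math.isinf(scale):
        return scale
    # Each ratio is at most 1 in magnitude, so the squares cannot overflow.
    total = sum((v / scale) * (v / scale) for v in x)
    return scale * math.sqrt(total)


big = [1e200, 1e200]
print(nrm2_naive(big))   # overflows although the true norm is representable
print(nrm2_scaled(big))
```

The scaled form costs an extra pass and a division per element, which is why optimized kernels tend to reserve it for the cases where the direct sum is at risk.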

POWER:

worked around an overflow error in the POWER6 DNRM2 kernel
fixed compilation on PPC440
fixed a performance regression in the level1 BLAS on POWER10
fixed the POWER10 ZGEMM kernel
fixed singlethreaded builds for POWER10
fixed compilation of the POWER10 DGEMV kernel with older gcc versions
enabled compilation of the BFLOAT16 kernels by default
enabled the small matrix kernels by default for DYNAMIC_ARCH builds
added a workaround for a miscompilation of the CDOT and ZDOT kernels by GCC 12

RISCV:

fixed cpu autodetection logic

ARMV8:

added an SBGEMM kernel for Neoverse N2
worked around an overflow error in the DNRM2 kernel used on M1, NeoverseN1, ThunderX2T99
added support for ARM64 systems running MS Windows
added support for cross-compiling to the GENERIC ARMV8 target under CMAKE (Windows/MSVC)
fixed a performance regression in the generic ARMV8 DGEMM kernel introduced in 0.3.19
added initial support for the Apple M1 cpu under Linux
added initial support for the Phytium FT2000 cpu
added initial support for the Cortex A510, A710, X1 and X2 cpus
fixed an accidental mixup of cpu identifiers in the autodetection code introduced in 0.3.20
fixed linking of Apple M1 builds on macOS 12 and later with recent XCode
made NeoverseN2 available in DYNAMIC_ARCH builds

MIPS,MIPS64:

worked around an overflow error in the DNRM2 kernel

LOONGARCH64:

worked around an overflow error in the DNRM2 kernel
added preliminary support for the LOONGSON2K1000 cpu
added DYNAMIC_ARCH support

md5sum
ffb6120e2309a2280471716301824805 OpenBLAS-0.3.21.tar.gz
4f013627138be6ecbd2c8d1435f2ec40 OpenBLAS-0.3.21.zip
c605e9e4ef227605ebcafa6466f14e25 OpenBLAS-0.3.21-x64.zip
16e2cc782e893df47fef97be09896ae1 OpenBLAS-0.3.21-x86.zip
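
The checksums above can be compared against a downloaded archive. A minimal sketch using Python's standard `hashlib` module follows; the helper names and the local file path in the usage note are assumptions for illustration, not part of the release.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    # Hash the file in chunks so large archives need not fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_md5(path, published_hex):
    # Published sums are hex strings; normalise case and whitespace first.
    return md5_of_file(path) == published_hex.strip().lower()
```

For example, `matches_published_md5("OpenBLAS-0.3.21.tar.gz", "ffb6120e2309a2280471716301824805")` would be expected to return `True` for an intact download stored under that (assumed) local name. An MD5 match guards against corrupted transfers, not against deliberate tampering.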


20 Feb 21:38

martin-frbg

v0.3.20

0b678b1

OpenBLAS 0.3.20 version

general:

some code cleanup, with added casts etc.
fixed obtaining the cpu count with OpenMP and OMP_PROC_BIND unset
fixed pivot index calculation by ?LASWP for negative increments other than one
fixed input argument check in LAPACK ?GEQRT2
improved the check for a Fortran compiler in CMAKE builds
disabled building OpenBLAS' optimized versions of LAPACK complex SPMV, SPR, SYMV, SYR with NO_LAPACK=1
fixed building of LAPACK on certain distributed filesystems with parallel gmake
fixed building the shared library on macOS with classic flang

x86_64:

fixed cross-compilation with CMAKE for CORE2 target
fixed miscompilation of AVX512 code in DYNAMIC_ARCH builds
added support for the "incidental" AVX512 hardware in Alder Lake when enabled in BIOS

E2K:

added new architecture (Russian Elbrus E2000 family)

SPARC:

fixed IMIN/IMAX

ARMV8:

added SVE-enabled CGEMM and ZGEMM kernels for ARMV8SVE and A64FX
added support for Neoverse N2 and V1 cpus

MIPS64:

fixed autodetection of MSA capability

LOONGARCH64:

added an optimized DGEMM kernel

abfaa43d995046ca4c56ccf14165c93c OpenBLAS-0.3.20.tar.gz
33526b15e15971edb657edc15de0c67f OpenBLAS-0.3.20.zip
3d9daef71592665261c032888bd810d6 OpenBLAS-0.3.20-x64.zip
5bfe847082510e44cdc59755cd49b941 OpenBLAS-0.3.20-x86.zip


19 Dec 19:58

martin-frbg

v0.3.19

2480e50

OpenBLAS 0.3.19 version

general:

reverted unsafe TRSV/ZRSV optimizations introduced in 0.3.16
fixed a potential thread race in the thread buffer reallocation routines
that were introduced in 0.3.18
fixed miscounting of thread pool size on Linux with OMP_PROC_BIND=TRUE
fixed CBLAS interfaces for CSROT/ZDROT and CROTG/ZROTG
made automatic library suffix for CMAKE builds with INTERFACE64 available
to CBLAS-only builds

x86_64:

DYNAMIC_ARCH builds now fall back to the cpu with most similar capabilities
when an unknown CPUID is encountered, instead of defaulting to Prescott
added cpu detection for Intel Alder Lake
added cpu detection for Intel Sapphire Rapids
added an optimized SBGEMM kernel for Sapphire Rapids
fixed DYNAMIC_ARCH builds on OSX with CMAKE
worked around DYNAMIC_ARCH builds made on Sandybridge failing on SkylakeX
fixed missing thread initialization for static builds on Windows/MSVC
fixed an excessive read in ZSYMV

POWER:

added support for POWER10 in big-endian mode
added support for building with CMAKE
added optimized SGEMM and DGEMM kernels for small matrix sizes

ARMV8:

added basic support and cputype detection for Fujitsu A64FX
added a generic ARMV8SVE target
added SVE-enabled SGEMM and DGEMM kernels for ARMV8SVE and A64FX
added optimized CGEMM and ZGEMM kernels for Cortex A53 and A55 cpus
fixed cpuid detection for Apple M1 and improved performance
improved compiler flag setting in CMAKE builds

RISCV64:

fixed improper initialization in CSCAL/ZSCAL for strided access patterns

MIPS:

added a GENERIC target for MIPS32
added support for cross-compiling to MIPS32 on x86_64 using CMAKE

MIPS64:

fixed misdetection of MSA capability

9721d04d72a7d601c81eafb54520ba2c OpenBLAS-0.3.19.tar.gz
bd74be5bafbc748266b4e9578bba955b OpenBLAS-0.3.19.zip
507a02d501944bd7586caeee4944d409 OpenBLAS-0.3.19-x86.zip
0cff635aeda36435813caeac391ca39e OpenBLAS-0.3.19-x64.zip
