OpenBLAS 0.3.0 version

martin-frbg released this 23 May 13:57
939452e

common:

* fixed some more thread race and locking bugs
* added preliminary support for calling an OpenMP build of the library from multiple threads
* removed performance impact of thread locks added in 0.2.20 on OpenMP code
* general code cleanup 
* optimized DSDOT implementation
* improved thread distribution for GEMM
* corrected IMATCOPY/OMATCOPY implementation
* fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
* cmake build improvements
* pkgconfig file now contains build options
* openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
* corrections and improvements for systems with more than 64 cpus
* LAPACK code updated to 3.8.0 including later fixes
* added ReLAPACK, a recursive implementation of several LAPACK functions
* rewrote ROTMG to handle cases that the netlib code failed to address
* disabled (broken) multithreading code for xTRMV
* corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
* shared memory access failures on startup are now handled more gracefully
* restored utests from earlier releases (and made them pass on all affected systems)
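
The `openblas_get_config()` entry above can be checked from a script. The following is a minimal sketch, assuming a shared OpenBLAS build can be found under the library name `openblas` and exports the symbol without a prefix or suffix. The helper name `read_openblas_config` is hypothetical, and the exact layout of the returned string is not described in these notes, so the sketch only prints it.

```python
import ctypes
import ctypes.util


def read_openblas_config(lib_name="openblas"):
    """Return the OpenBLAS build configuration string, or None if unavailable.

    `lib_name` is an assumption: some distributions install the library
    under a different name or with renamed symbols.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        func = lib.openblas_get_config
    except (OSError, AttributeError):
        # Library failed to load, or the symbol is missing or renamed.
        return None
    func.restype = ctypes.c_char_p  # C prototype: char *openblas_get_config(void)
    func.argtypes = []
    raw = func()
    return raw.decode("ascii", errors="replace") if raw else None


if __name__ == "__main__":
    print(read_openblas_config() or "OpenBLAS shared library not found")
```

If the library is found, the printed string should include the USE_OPENMP and NUM_THREADS information mentioned above.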

SPARC:

* several fixes for cpu autodetection

POWER:

* corrected vector register overwriting in several Power8 kernels
* optimized additional BLAS functions

ARM:

* added support for Cortex-A53 and Cortex-A72
* added autodetection for ThunderX2T99
* made most optimized kernels the default for generic ARMv8 targets 

x86_64:

* parallelized DDOT kernel for Haswell
* changed alignment directives in assembly kernels to boost performance on OSX
* fixed register handling in the GEMV microkernels (bug exposed by gcc7)
* added support for building on OpenBSD and DragonFly BSD
* updated compiler options to work with Intel compiler release 2018
* support fully optimized build with clang/flang on Microsoft Windows
* fixed building on AIX

IBM Z:

* added optimized BLAS 1/2 functions

MIPS:

* fixed cpu autodetection helper code
* added mips32 1004K cpu (Mediatek MT7621 and similar SoC)
* added mips64 I6500 cpu

md5sum
4330647cb9755d4d993b27e46b6e7b53 OpenBLAS-0.3.0.zip
42cde2c1059a8a12227f1e6551c8dbd2 OpenBLAS-0.3.0.tar.gz