
OpenBLAS 0.3.6

martin-frbg released this on 29 Apr 17:34
15cb124

common:

- the build tools now check that a given CPU TARGET is actually valid
- the build-time check of system features (c_check) now depends less on
  particular Perl features (this should mainly benefit building on Windows)
- several problems with ReLAPACK and its integration were fixed,
  including INTERFACE64 support and building a shared library
- building with CMAKE on BSD systems was improved
- a non-absolute SUM function (adding the elements themselves rather than
  their absolute values) was added, based on the existing optimized ASUM code
- CBLAS interfaces to the IxMIN and IxMAX functions were added
- a name clash between LAPACKE and BOOST headers was resolved
- CMAKE builds with OpenMP now include the appropriate getrf_parallel kernels,
  which were previously left out
- a crash on thread (key) deletion with the USE_TLS=1 memory management
  option was fixed
- restored several earlier fixes, in particular for OpenMP performance,
  building on BSD, and calling fork on CYGWIN, which had inadvertently
  been dropped in the 0.3.3 rewrite of the memory management code.
  

POWER:

- single precision BLAS1/2 functions have received optimized POWER8 kernels
- POWER9 is now a separate target, with an optimized DGEMM/DTRMM kernel
- building on PPC970 systems under OSX Leopard or Tiger is now supported
- out-of-bounds memory accesses in the gemm_beta microkernels were fixed
- building a shared library on AIX is now supported for POWER6
- DYNAMIC_ARCH support has been added for POWER6 and newer

ARMV7:

- corrected xDOT behaviour when INC_X or INC_Y is zero
- fixed a bug in the IMIN implementation that made it return the result of IMAX

ARMV8:

- added support for HiSilicon TSV110 cpus
- the CMAKE build system now recognizes 32-bit userspace on 64-bit hardware
- cross-compilation with CMAKE now works again
- fixed a bug in the IMIN implementation that made it return the result of IMAX
- ARMV8 builds with the BINARY=32 option are now automatically handled as ARMV7

x86_64:

- the AVX512 DGEMM kernel has been disabled again due to unsolved problems
- building with old versions of MSVC was fixed
- it is now possible to build a static library on Windows with CMAKE
- accessing environment variables on CYGWIN at run time was fixed
- the CMAKE build system now recognizes 32-bit userspace on 64-bit hardware
- Intel "Denverton" Atom and Hygon "Dhyana" (Zen-based) CPUs are now autodetected
- building for DYNAMIC_ARCH with a DYNAMIC_LIST of targets is now supported
  with CMAKE as well
- building for DYNAMIC_ARCH with GENERIC as the default target is now supported
- a buffer overflow in the SSE GEMM kernel for Intel Nano targets was fixed
- assembly bugs involving undeclared modification of input operands were fixed
  in the AXPY, DOT, GEMV, GER, SCAL, SYMV and TRSM microkernels for Nehalem,
  Sandybridge, Haswell, Bulldozer and Piledriver. These typically caused
  test failures or segfaults when compiled with gcc 8 or later.
- a similar bug was fixed in the blas_quickdivide code used to split workloads
  in most functions
- fixed a bug in the IxMIN implementation for the GENERIC target that made it return the result of IxMAX
- fixed building on SkylakeX systems when either the compiler or the (emulated) operating 
  environment does not support AVX512
- improved GEMM performance on ZEN targets

x86:

- build failures caused by the recently added checks for AVX512 were fixed
- an inline assembly bug involving undeclared modification of an input argument was
  fixed in the blas_quickdivide code used to split workloads in most functions
- fixed a bug in the IMIN implementation for the GENERIC target that made it return the result of IMAX

MIPS32:

- fixed a bug in the IMIN implementation that made it return the result of IMAX

IBM Z:

- optimized microkernels for single precision BLAS1/2 functions have been added for Z13 and Z14

md5sum
67b3b45ec47d81a158ed219dddf6b69e OpenBLAS-0.3.6.zip
8a110a25b819a4b94e8a9580702b6495 OpenBLAS-0.3.6.tar.gz