OpenBLAS 0.3.4 version

@martin-frbg martin-frbg released this 02 Dec 22:52
· 5304 commits to release-0.3.0 since this release
c0827a7

common:

  • The new, experimental thread-local memory allocation had
    inadvertently been left enabled for gmake builds in 0.3.3
    despite the announcement. It is now disabled by default, and
    single-threaded builds will keep using the old allocator even
    if the USE_TLS option is turned on.
  • OpenBLAS will now provide enough buffer space for at least 50
    threads by default.
  • The output of openblas_get_config() now contains the version
    number.
  • A serious thread safety bug in GEMV operation with small M and
    large N size has been fixed.
  • The code will now automatically call blas_thread_init after a
    fork if needed before handling a call to openblas_set_num_threads.
  • Accesses to parallelized level3 functions from multiple callers
    are now serialized to avoid thread races (unless using OpenMP).
    This should provide better performance than the known-threadsafe
    (but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
  • When building LAPACK with gfortran, -frecursive is now (again)
    enabled by default to ensure correct behaviour.
  • The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and
    CBLAS_LAYOUT as the name of the matrix row/column order option.
  • Externally set LDFLAGS are now passed through to the final compile/link
    steps to facilitate setting platform-specific linker flags.
  • A potential race condition during the build of LAPACK (that would
    usually manifest itself as a failure to build TESTING/MATGEN) has been
    fixed.
  • xHEMV has been changed to stay single-threaded for small input sizes
    where the overhead of multithreading exceeds any possible gains.
  • CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or
    ThunderX hardware with sizable input.
  • Linker flags for the PGI compiler have been updated.
  • AXPY with zero increments is now handled in the C interface,
    correcting the result on at least Intel Atom.
  • The result matrix from calling SGELSS with an all-zero input matrix is
    now zeroed completely.
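
The CBLAS_ORDER / CBLAS_LAYOUT item above can be pictured as a plain type alias. The following is a minimal sketch, not a copy of the OpenBLAS header: the `DEMO_`-prefixed names and the `demo_index` helper are hypothetical, and only the values 101 and 102 are the conventional CBLAS constants for row-major and column-major storage.

```c
#include <stddef.h>

/* Hypothetical stand-in for the enum a cblas.h-style header declares.
   101 and 102 are the conventional CBLAS values. */
typedef enum DEMO_CBLAS_ORDER {
    DemoRowMajor = 101,
    DemoColMajor = 102
} DEMO_CBLAS_ORDER;

/* Second name for the same type, so code written against either
   spelling compiles unchanged. */
typedef DEMO_CBLAS_ORDER DEMO_CBLAS_LAYOUT;

/* Offset of element (i, j) in a matrix stored with leading dimension ld. */
size_t demo_index(DEMO_CBLAS_LAYOUT layout, size_t i, size_t j, size_t ld)
{
    return layout == DemoRowMajor ? i * ld + j : j * ld + i;
}
```

Because the second name is only a typedef of the first, a value declared with one spelling can be passed where the other is expected without a cast.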

x86_64:

  • Autodetection of AMD Ryzen2 has been fixed (again).
  • CMAKE builds now support labeling of an INTERFACE64=1 build of
    the library with the _64 suffix.
  • An AVX512 version of DGEMM has been added, and the AVX512 SGEMM
    kernel has been sped up by rewriting it with C intrinsics.
  • Fixed compilation on RHEL5/CENTOS5 (issue with typename __WAIT_STATUS).

POWER:

  • Added support for building on AIX (with gcc and GNU tools from AIX Toolbox).
  • CPU type detection has been implemented for AIX.
  • CPU type detection has been fixed for NETBSD.

MIPS64:

  • AXPY on LOONGSON3A has been corrected to pass "zero increment" utest.
  • DSDOT on LOONGSON3A has been fixed.
  • The SGEMM microkernel has been hardened against potential data loss.
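
For context on the "zero increment" AXPY case mentioned in these notes: with incx = 0 and incy = 0, a reference-style loop adds alpha * x[0] to y[0] n times, which an optimized kernel can easily get wrong. The sketch below shows one plausible way to catch this case before any kernel is reached. The `demo_` function names are hypothetical, increments are assumed non-negative, and this is not presented as the actual OpenBLAS code.

```c
#include <stddef.h>

/* Reference-style loop: y[iy] += alpha * x[ix], stepping by incx/incy.
   Assumes non-negative increments to keep the sketch short. */
void demo_daxpy_ref(int n, double alpha, const double *x, int incx,
                    double *y, int incy)
{
    size_t ix = 0, iy = 0;
    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += (size_t)incx;
        iy += (size_t)incy;
    }
}

/* Interface-level wrapper: resolves the all-zero-increment case directly
   and forwards everything else to the loop above, which stands in for
   an optimized kernel. */
void demo_daxpy(int n, double alpha, const double *x, int incx,
                double *y, int incy)
{
    if (n <= 0 || alpha == 0.0)
        return;
    if (incx == 0 && incy == 0) {
        /* n repeated updates of the same element collapse to one. */
        y[0] += (double)n * alpha * x[0];
        return;
    }
    demo_daxpy_ref(n, alpha, x, incx, y, incy);
}
```

The closed form and the repeated loop agree exactly only when the intermediate sums are representable; for general floating-point inputs they can differ in the last bits.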

ARMV8:

  • DYNAMIC_ARCH support is now available for 64-bit ARM.
  • Cross-compiling for ARMV8 under iOS now works.
  • CPU-specific code has been rearranged to make better use of both
    hardware commonalities and model-specific compiler optimizations.
  • XGENE1 has been removed as a TARGET, superseded by the improved generic
    ARMV8 support.

ARMV7:

  • Older assembly mnemonics have been converted to UAL form to allow
    building with clang 7.0.
  • Cross-compiling LAPACKE for Android has been fixed again (broken by
    update to LAPACK 3.7.0 some while ago).

md5sum
59495ec36d31cae9cf82937515e8c0ad OpenBLAS-0.3.4.zip
e4d940c2983c547da212bee6a491589e OpenBLAS-0.3.4.tar.gz