rebase? (#1)

* With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes OpenMathLib#1956
* Rename operands to put lda on the input/output constraint list
* Fix wrong constraints in inline assembly for OpenMathLib#2009
* Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For OpenMathLib#2009
* Fix inline assembly constraints rework indices to allow marking argument lda as input and output.
* Fix inline assembly constraints
* Fix inline assembly constraints
* Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For OpenMathLib#2009
* Correct range_n limiting same bug as seen in OpenMathLib#1388, somehow missed in corresponding PR OpenMathLib#1389
* Allow multithreading TRMV again revert workaround introduced for issue OpenMathLib#1332 as the actual cause appears to be my incorrect fix from OpenMathLib#1262 (see OpenMathLib#1388)
* Fix error introduced during cleanup
* Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again
* init
* move fix to right place
* Fix missing -c option in AVX512 test
* Fix AVX512 test always returning false due to missing compiler option
* Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes OpenMathLib#2033
* Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386
* Make sure that AVX512 is disabled in 32bit builds for OpenMathLib#2033
* Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes OpenMathLib#2035 by addressing some concerns from OpenMathLib#1422
* init
* address warning introed with OpenMathLib#1814 et al
* Restore locking optimizations for OpenMP case restore another accidentally dropped part of OpenMathLib#1468 that was missed in OpenMathLib#2004 to address performance regression reported in OpenMathLib#1461
* HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch
* add TARGET support for HiSilicon tsv110 CPUs
* add TARGET support for HiSilicon tsv110 CPUs
* add TARGET support for HiSilicon tsv110 CPUs
* Fix module definition conflicts between LAPACK and ReLAPACK for OpenMathLib#2043
* Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway
* ctest.c : add __POWERPC__ for PowerMac
* Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug OpenMathLib#2047. Signed-off-by: Celelibi <celelibi@gmail.com>
* param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^
* common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass
* Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue OpenMathLib#2048
* make DYNAMIC_ARCH=1 package work on TSV110.
* make DYNAMIC_ARCH=1 package work on TSV110
* Add Intel Denverton for OpenMathLib#2048
* Add Intel Denverton
* Change 64-bit detection as explained in OpenMathLib#2056
* Trivial typo fix as suggested in OpenMathLib#2022
* Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in OpenMathLib#1955 and OpenMathLib#2029
* Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork().
* Fix for OpenMathLib#2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1.
* Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles.
* AIX asm syntax changes needed for shared object creation
* power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself
* Expose CBLAS interfaces for I?MIN and I?MAX
* Build CBLAS interfaces for I?MIN and I?MAX
* Add declarations for ?sum and cblas_?sum
* Add interface for ?sum (derived from ?asum)
* Add ?sum
* Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure
* Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed)
* Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed
* Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed
* Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed
* Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure
* Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure
* Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure
* Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed
* Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed
* Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed
* Detect 32bit environment on 64bit ARM hardware for OpenMathLib#2056, using same approach as OpenMathLib#2058
* Add cmake defaults for ?sum kernels
* Add ?sum
* Add ?sum definitions for generic kernel
* Add declarations for ?sum
* Add -lm and disable EXPRECISION support on *BSD fixes OpenMathLib#2075
* Add in runtime CPU detection for POWER.
* snprintf define consolidated to common.h
* Support INTERFACE64=1
* Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA
* Correct length of name string in xerbla call
* Avoid out-of-bounds accesses in LAPACK EIG tests see Reference-LAPACK/lapack#333
* Correct INFO=4 condition
* Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests)
* Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests
* sgemm/strmm
* Update Changelog with changes from 0.3.6
* Increment version to 0.3.7.dev
* Increment version to 0.3.7.dev
* Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib`
* Correct argument of CPU_ISSET for glibc <2.5 fixes OpenMathLib#2104
* conflict resolve
* Revert reference/ fixes
* Revert Changelog.txt typos
* Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for numpy/numpy#13401 as mentioned in OpenMathLib#1955
* Disable DGEMMINCOPY as well for now OpenMathLib#1955
* init
* Fix errors in cpu enumeration with glibc 2.6 for OpenMathLib#2114
* Change two http links to https Closes OpenMathLib#2109
* remove redundant code OpenMathLib#2113
* Set up CI with Azure Pipelines [skip ci]
* TST: add native POWER8 to CI
* add native POWER8 testing to Travis CI matrix with ppc64le os entry
* Update link to IBM MASS library, update cpu support status
* first try migrating one of the arm builds from travis
* fix tabbing in azure commands
* Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie)
* Update azure-pipelines.yml
* Update azure-pipelines.yml
* Update azure-pipelines.yml
* Update azure-pipelines.yml
* DOC: Add Azure CI status badge
* Add ARMV6 build to azure CI setup (OpenMathLib#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci]
* TST: Azure manylinux1 & clean-up
* remove some of the steps & comments from the original Azure yml template
* modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy
* remove Travis emulated ARMv6 gcc build because this now happens in Azure
* use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor]
* Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of OpenMathLib#2114
* Replace ISMIN and ISAMIN kernels on all x86_64 platforms (OpenMathLib#2125)
* Mark iamax_sse.S as unsuitable for MIN due to issue OpenMathLib#2116
* Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for OpenMathLib#2116
* Move ARMv8 gcc build from Travis to Azure
* Move ARMv8 gcc build from Travis to Azure
* Update .travis.yml
* Test drone CI
* install make
* remove sudo
* Install gcc
* Install perl
* Install gfortran and add a clang job
* gfortran->gcc-gfortran
* Switch to ubuntu and parallel jobs
* apt update
* Fix typo
* update yes
* no need of gcc in clang build
* Add a cmake build as well
* Add cmake builds and print options
* build without lapack on cmake
* parallel build
* See if ubuntu 19.04 fixes the ICE
* Remove qemu armv8 builds
* arm32 build
* Fix typo
* TST: add SkylakeX AVX512 CI test
* adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite
* add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions
* Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads
* Add option USE_LOCKING for single-threaded build with locking support
* Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds
* Add option USE_LOCKING but keep default settings intact
* Remove unrelated change
* Do not try ancient PGI hacks with recent versions of that compiler should fix OpenMathLib#2139
* Build and run utests in any case, they do their own checks for fortran availability
* Add softfp support in min/max kernels fix for OpenMathLib#1912
* Revert "Add softfp support in min/max kernels"
* Separate implementations of AMAX and IAMAX on arm As noted in OpenMathLib#1912 and comment on OpenMathLib#1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register
* Ensure correct output for DAMAX with softfp
* Use generic kernels for complex (I)AMAX to support softfp
* improved zgemm power9 based on power8
* upload thread safety test folder
* hook up c++ thread safety test (main Makefile)
* add c++ thread test option to Makefile.rule
* Document NO_AVX512 for OpenMathLib#2151
* sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52
* Fix detection of AVX512 capable compilers in getarch 21eda8b introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway.
* c_check: Unlink correct file
* power9 zgemm ztrmm optimized
* conflict resolve
* Add gfortran workaround for ABI violations in LAPACKE for OpenMathLib#2154 (see gcc bug 90329)
* Add gfortran workaround for ABI violations for OpenMathLib#2154 (see gcc bug 90329)
* Add gfortran workaround for potential ABI violation for OpenMathLib#2154
* Update fc.cmake
* Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from OpenMathLib#2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway.
* Avoid unintentional activation of TLS code via USE_TLS=0 fixes OpenMathLib#2149
* Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users
* Update Makefile.x86_64
* Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints.
* Fix mov syntax
* new sgemm 8x16
* Update dtrmm_kernel_16x4_power8.S
* PGI compiler does not like -march=native
* Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl>
* Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well.
* Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too.
* cgemm/ctrmm power9
* Utest needs CBLAS but not necessarily FORTRAN
* Add mingw builds to Appveyor config
* Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour)
* Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect
* Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)
TiborGY · Jul 7, 2019 · 2a669e7
1 parent 5608999
commit 2a669e7
Showing 453 changed files with 54,248 additions and 9,486 deletions.
diff --git a/.drone.yml b/.drone.yml
@@ -0,0 +1,143 @@
+---
+kind: pipeline
+name: arm64_gcc_make
+
+platform:
+  os: linux
+  arch: arm64
+
+steps:
+- name: Build and Test
+  image: ubuntu:19.04
+  environment:
+    CC: gcc
+    COMMON_FLAGS: 'DYNAMIC_ARCH=1 TARGET=ARMV8 NUM_THREADS=32'
+  commands:
+    - echo "MAKE_FLAGS:= $COMMON_FLAGS"
+    - apt-get update -y
+    - apt-get install -y make $CC gfortran perl
+    - $CC --version
+    - make QUIET_MAKE=1 $COMMON_FLAGS
+    - make -C test $COMMON_FLAGS
+    - make -C ctest $COMMON_FLAGS
+    - make -C utest $COMMON_FLAGS
+
+---
+kind: pipeline
+name: arm32_gcc_make
+
+platform:
+  os: linux
+  arch: arm
+
+steps:
+- name: Build and Test
+  image: ubuntu:19.04
+  environment:
+    CC: gcc
+    COMMON_FLAGS: 'DYNAMIC_ARCH=1 TARGET=ARMV6 NUM_THREADS=32'
+  commands:
+    - echo "MAKE_FLAGS:= $COMMON_FLAGS"
+    - apt-get update -y
+    - apt-get install -y make $CC gfortran perl
+    - $CC --version
+    - make QUIET_MAKE=1 $COMMON_FLAGS
+    - make -C test $COMMON_FLAGS
+    - make -C ctest $COMMON_FLAGS
+    - make -C utest $COMMON_FLAGS
+
+---
+kind: pipeline
+name: arm64_clang_make
+
+platform:
+  os: linux
+  arch: arm64
+
+steps:
+- name: Build and Test
+  image: ubuntu:18.04
+  environment:
+    CC: clang
+    COMMON_FLAGS: 'DYNAMIC_ARCH=1 TARGET=ARMV8 NUM_THREADS=32'
+  commands:
+    - echo "MAKE_FLAGS:= $COMMON_FLAGS"
+    - apt-get update -y
+    - apt-get install -y make $CC gfortran perl
+    - $CC --version
+    - make QUIET_MAKE=1 $COMMON_FLAGS
+    - make -C test $COMMON_FLAGS
+    - make -C ctest $COMMON_FLAGS
+    - make -C utest $COMMON_FLAGS
+
+---
+kind: pipeline
+name: arm32_clang_cmake
+
+platform:
+  os: linux
+  arch: arm
+
+steps:
+- name: Build and Test
+  image: ubuntu:18.04
+  environment:
+    CC: clang
+    CMAKE_FLAGS: '-DDYNAMIC_ARCH=1 -DTARGET=ARMV6 -DNUM_THREADS=32 -DNOFORTRAN=ON -DBUILD_WITHOUT_LAPACK=ON'
+  commands:
+    - echo "CMAKE_FLAGS:= $CMAKE_FLAGS"
+    - apt-get update -y
+    - apt-get install -y make $CC g++ perl cmake
+    - $CC --version
+    - mkdir build && cd build
+    - cmake $CMAKE_FLAGS ..
+    - make -j
+    - ctest
+
+---
+kind: pipeline
+name: arm64_gcc_cmake
+
+platform:
+  os: linux
+  arch: arm64
+
+steps:
+- name: Build and Test
+  image: ubuntu:18.04
+  environment:
+    CC: gcc
+    CMAKE_FLAGS: '-DDYNAMIC_ARCH=1 -DTARGET=ARMV8 -DNUM_THREADS=32 -DNOFORTRAN=ON -DBUILD_WITHOUT_LAPACK=ON'
+  commands:
+    - echo "CMAKE_FLAGS:= $CMAKE_FLAGS"
+    - apt-get update -y
+    - apt-get install -y make $CC g++ perl cmake
+    - $CC --version
+    - mkdir build && cd build
+    - cmake $CMAKE_FLAGS ..
+    - make -j
+    - ctest
+
+---
+kind: pipeline
+name: arm64_clang_cmake
+
+platform:
+  os: linux
+  arch: arm64
+
+steps:
+- name: Build and Test
+  image: ubuntu:18.04
+  environment:
+    CC: clang
+    CMAKE_FLAGS: '-DDYNAMIC_ARCH=1 -DTARGET=ARMV8 -DNUM_THREADS=32 -DNOFORTRAN=ON -DBUILD_WITHOUT_LAPACK=ON'
+  commands:
+    - echo "CMAKE_FLAGS:= $CMAKE_FLAGS"
+    - apt-get update -y
+    - apt-get install -y make $CC g++ perl cmake
+    - $CC --version
+    - mkdir build && cd build
+    - cmake $CMAKE_FLAGS ..
+    - make -j
+    - ctest
diff --git a/.travis.yml b/.travis.yml
@@ -25,6 +25,15 @@ matrix:
         - TARGET_BOX=LINUX64
         - BTYPE="BINARY=64"
 
+    - <<: *test-ubuntu
+      os: linux-ppc64le
+      before_script:
+        - COMMON_FLAGS="DYNAMIC_ARCH=1 TARGET=POWER8 NUM_THREADS=32"
+      env:
+        # for matrix annotation only
+        - TARGET_BOX=PPC64LE_LINUX
+        - BTYPE="BINARY=64 USE_OPENMP=1"
+
     - <<: *test-ubuntu
       env:
         - TARGET_BOX=LINUX64
@@ -160,45 +169,10 @@ matrix:
         - BTYPE="BINARY=64 INTERFACE64=1"
 
     - <<: *test-macos
+      osx_image: xcode8.3
       env:
         - BTYPE="BINARY=32"
 
-    - &emulated-arm
-      dist: trusty
-      sudo: required
-      services: docker
-      env: IMAGE_ARCH=arm32 TARGET_ARCH=ARMV6 COMPILER=gcc
-      name: "Emulated Build for ARMV6 with gcc"
-      before_install: sudo docker run --rm --privileged multiarch/qemu-user-static:register --reset
-      script: |
-        echo "FROM openblas/alpine:${IMAGE_ARCH}
-        COPY . /tmp/openblas
-        RUN mkdir /tmp/openblas/build                             &&  \
-            cd /tmp/openblas/build                                &&  \
-            CC=${COMPILER} cmake -D DYNAMIC_ARCH=OFF                  \
-                                 -D TARGET=${TARGET_ARCH}             \
-                                 -D BUILD_SHARED_LIBS=ON              \
-                                 -D BUILD_WITHOUT_LAPACK=ON           \
-                                 -D BUILD_WITHOUT_CBLAS=ON            \
-                                 -D CMAKE_BUILD_TYPE=Release ../  &&  \
-            cmake --build ." > Dockerfile
-        docker build .
-    - <<: *emulated-arm
-      env: IMAGE_ARCH=arm32 TARGET_ARCH=ARMV6 COMPILER=clang
-      name: "Emulated Build for ARMV6 with clang"
-    - <<: *emulated-arm
-      env: IMAGE_ARCH=arm64 TARGET_ARCH=ARMV8 COMPILER=gcc
-      name: "Emulated Build for ARMV8 with gcc"
-    - <<: *emulated-arm
-      env: IMAGE_ARCH=arm64 TARGET_ARCH=ARMV8 COMPILER=clang
-      name: "Emulated Build for ARMV8 with clang"
-
-  allow_failures:
-    - env: IMAGE_ARCH=arm32 TARGET_ARCH=ARMV6 COMPILER=gcc
-    - env: IMAGE_ARCH=arm32 TARGET_ARCH=ARMV6 COMPILER=clang
-    - env: IMAGE_ARCH=arm64 TARGET_ARCH=ARMV8 COMPILER=gcc
-    - env: IMAGE_ARCH=arm64 TARGET_ARCH=ARMV8 COMPILER=clang
-
 # whitelist
 branches:
   only:

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -6,7 +6,7 @@ cmake_minimum_required(VERSION 2.8.5)
 project(OpenBLAS C ASM)
 set(OpenBLAS_MAJOR_VERSION 0)
 set(OpenBLAS_MINOR_VERSION 3)
-set(OpenBLAS_PATCH_VERSION 6.dev)
+set(OpenBLAS_PATCH_VERSION 7.dev)
 set(OpenBLAS_VERSION "${OpenBLAS_MAJOR_VERSION}.${OpenBLAS_MINOR_VERSION}.${OpenBLAS_PATCH_VERSION}")
 
 # Adhere to GNU filesystem layout conventions
@@ -20,9 +20,14 @@ if(MSVC)
 option(BUILD_WITHOUT_LAPACK "Do not build LAPACK and LAPACKE (Only BLAS or CBLAS)" ON)
 endif()
 option(BUILD_WITHOUT_CBLAS "Do not build the C interface (CBLAS) to the BLAS functions" OFF)
-option(DYNAMIC_ARCH "Include support for multiple CPU targets, with automatic selection at runtime (x86/x86_64 only)" OFF)
-option(DYNAMIC_OLDER "Include specific support for older cpu models (Penryn,Dunnington,Atom,Nano,Opteron) with DYNAMIC_ARCH" OFF)
+option(DYNAMIC_ARCH "Include support for multiple CPU targets, with automatic selection at runtime (x86/x86_64, aarch64 or ppc only)" OFF)
+option(DYNAMIC_OLDER "Include specific support for older x86 cpu models (Penryn,Dunnington,Atom,Nano,Opteron) with DYNAMIC_ARCH" OFF)
 option(BUILD_RELAPACK "Build with ReLAPACK (recursive implementation of several LAPACK functions on top of standard LAPACK)" OFF)
+if(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+option(NO_AFFINITY "Disable support for CPU affinity masks to avoid binding processes from e.g. R or numpy/scipy to a single core" ON)
+else()
+set(NO_AFFINITY 1)
+endif()
 
 # Add a prefix or suffix to all exported symbol names in the shared library.
 # Avoids conflicts with other BLAS libraries, especially when using
@@ -42,6 +47,19 @@ endif()
 
 #######
 
+if(MSVC AND MSVC_STATIC_CRT)
+    set(CompilerFlags
+            CMAKE_CXX_FLAGS
+            CMAKE_CXX_FLAGS_DEBUG
+            CMAKE_CXX_FLAGS_RELEASE
+            CMAKE_C_FLAGS
+            CMAKE_C_FLAGS_DEBUG
+            CMAKE_C_FLAGS_RELEASE
+            )
+    foreach(CompilerFlag ${CompilerFlags})
+      string(REPLACE "/MD" "/MT" ${CompilerFlag} "${${CompilerFlag}}")
+    endforeach()
+endif()
 
 message(WARNING "CMake support is experimental. It does not yet support all build options and may not produce the same Makefiles that OpenBLAS ships with.")
 
@@ -62,10 +80,10 @@ endif ()
 
 set(SUBDIRS	${BLASDIRS})
 if (NOT NO_LAPACK)
-  list(APPEND SUBDIRS lapack)
   if(BUILD_RELAPACK)
     list(APPEND SUBDIRS relapack/src)
   endif()
+  list(APPEND SUBDIRS lapack)
 endif ()
 
 # set which float types we want to build for
@@ -134,7 +152,7 @@ endif ()
 
 # Only generate .def for dll on MSVC and always produce pdb files for debug and release
 if(MSVC)
-  if (${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION} LESS 3.4)
+  if (${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION} VERSION_LESS 3.4)
     set(OpenBLAS_DEF_FILE "${PROJECT_BINARY_DIR}/openblas.def")
   endif()
   set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /Zi")
@@ -149,15 +167,9 @@ if (${DYNAMIC_ARCH})
   endforeach()
 endif ()
 
-# Only build shared libs for MSVC
-if (MSVC)
-  set(BUILD_SHARED_LIBS ON)
-endif()
-
-
 # add objects to the openblas lib
 add_library(${OpenBLAS_LIBNAME} ${LA_SOURCES} ${LAPACKE_SOURCES} ${RELA_SOURCES} ${TARGET_OBJS} ${OpenBLAS_DEF_FILE})
-target_include_directories(${OpenBLAS_LIBNAME} INTERFACE $<INSTALL_INTERFACE:include>)
+target_include_directories(${OpenBLAS_LIBNAME} INTERFACE $<INSTALL_INTERFACE:include/openblas${SUFFIX64}>)
 
 # Android needs to explicitly link against libm
 if(ANDROID)
@@ -166,7 +178,7 @@ endif()
 
 # Handle MSVC exports
 if(MSVC AND BUILD_SHARED_LIBS)
-  if (${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION} LESS 3.4)
+  if (${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION} VERSION_LESS 3.4)
     include("${PROJECT_SOURCE_DIR}/cmake/export.cmake")
   else()
     # Creates verbose .def file (51KB vs 18KB)
@@ -199,7 +211,8 @@ if (USE_THREAD)
   target_link_libraries(${OpenBLAS_LIBNAME} ${CMAKE_THREAD_LIBS_INIT})
 endif()
 
-if (MSVC OR NOT NOFORTRAN)
+#if (MSVC OR NOT NOFORTRAN)
+if (NOT NO_CBLAS)
   # Broken without fortran on unix
   add_subdirectory(utest)
 endif()
@@ -217,6 +230,14 @@ set_target_properties(${OpenBLAS_LIBNAME} PROPERTIES
   SOVERSION ${OpenBLAS_MAJOR_VERSION}
 )
 
+if (BUILD_SHARED_LIBS AND BUILD_RELAPACK)
+  if (NOT MSVC)
+    target_link_libraries(${OpenBLAS_LIBNAME} "-Wl,-allow-multiple-definition")
+  else()
+    target_link_libraries(${OpenBLAS_LIBNAME} "/FORCE:MULTIPLE")
+  endif()
+endif()
+
 if (BUILD_SHARED_LIBS AND NOT ${SYMBOLPREFIX}${SYMBOLSUFIX} STREQUAL "")
 if (NOT DEFINED ARCH)
   set(ARCH_IN "x86_64")
@@ -314,7 +335,7 @@ install (FILES ${OPENBLAS_CONFIG_H} DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
 if(NOT NOFORTRAN)
   message(STATUS "Generating f77blas.h in ${CMAKE_INSTALL_INCLUDEDIR}")
 
-  set(F77BLAS_H ${CMAKE_BINARY_DIR}/f77blas.h)
+  set(F77BLAS_H ${CMAKE_BINARY_DIR}/generated/f77blas.h)
   file(WRITE  ${F77BLAS_H} "#ifndef OPENBLAS_F77BLAS_H\n")
   file(APPEND ${F77BLAS_H} "#define OPENBLAS_F77BLAS_H\n")
   file(APPEND ${F77BLAS_H} "#include \"openblas_config.h\"\n")
@@ -327,10 +348,11 @@ endif()
 if(NOT NO_CBLAS)
 	message (STATUS "Generating cblas.h in ${CMAKE_INSTALL_INCLUDEDIR}")
 
+	set(CBLAS_H ${CMAKE_BINARY_DIR}/generated/cblas.h)
 	file(READ ${CMAKE_CURRENT_SOURCE_DIR}/cblas.h CBLAS_H_CONTENTS)
 	string(REPLACE "common" "openblas_config" CBLAS_H_CONTENTS_NEW "${CBLAS_H_CONTENTS}")
-	file(WRITE ${CMAKE_BINARY_DIR}/cblas.tmp "${CBLAS_H_CONTENTS_NEW}")
-	install (FILES ${CMAKE_BINARY_DIR}/cblas.tmp DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} RENAME cblas.h)
+	file(WRITE ${CBLAS_H} "${CBLAS_H_CONTENTS_NEW}")
+	install (FILES ${CBLAS_H} DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
 endif()
 
 if(NOT NO_LAPACKE)

diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
@@ -167,4 +167,7 @@ In chronological order:
   * [2017-02-26] ztrmm kernel for IBM z13
   * [2017-03-13] strmm and ctrmm kernel for IBM z13
   * [2017-09-01] initial Blas Level-1,2 (double precision) for IBM z13
-
+  * [2018-03-07] added missing Blas Level 1-2  (double precision) simd codes
+  * [2019-02-01] added missing Blas Level-1,2 (single precision)  simd codes
+  * [2019-03-14] power9 dgemm/dtrmm kernel
+  * [2019-04-29] power9 sgemm/strmm kernel