Merge develop in preparation of 0.3.6 release #2100

Merged on Apr 29, 2019
259 commits
ed01f49
Merge pull request #1946 from martin-frbg/issue1908
martin-frbg Jan 4, 2019
94cd946
[ZARCH] fix cgemv_n_4.c
maamountki Jan 4, 2019
ae1d1f7
Query AVX2 and AVX512 capability for runtime cpu selection
martin-frbg Jan 5, 2019
0afaae4
Query AVX2 and AVX512VL capability in x86 cpu detection
martin-frbg Jan 5, 2019
68eb314
Add xcr0 (os support) check
martin-frbg Jan 5, 2019
e1574fa
Add xcr0 (os support) check
martin-frbg Jan 5, 2019
31ed19e
Add message for SkylakeX and KNL fallbacks to Haswell
martin-frbg Jan 5, 2019
191677b
Add travis_wait to the OSX brew install phase
martin-frbg Jan 8, 2019
cf5d48e
Update OSX environment to Sierra
martin-frbg Jan 8, 2019
1650311
Bump xcode to 8.3
martin-frbg Jan 8, 2019
8d99dba
Merge pull request #1949 from martin-frbg/issue1947
martin-frbg Jan 8, 2019
3eafcfa
[ZARCH] fix cgemv_n_4.c
maamountki Jan 9, 2019
e7455f5
[ZARCH] fix dsdot.c
maamountki Jan 9, 2019
c2ffef8
[ZARCH] fix data prefetch type in ddot
maamountki Jan 9, 2019
be66f5d
[ZARCH] fix data prefetch type in sdot
maamountki Jan 9, 2019
ad2c386
Move TLS key deletion to openblas_quit
martin-frbg Jan 9, 2019
21c0f2a
Merge pull request #1957 from martin-frbg/issue1954
martin-frbg Jan 10, 2019
67432b2
[ZARCH] fix cgemv_n_4.c
maamountki Jan 11, 2019
5d89d6b
[ZARCH] fix sgemv_n_4.c
maamountki Jan 11, 2019
ecc31b7
Update dgemv_t_4.c
maamountki Jan 11, 2019
b731e82
Update sgemv_t_4.c
maamountki Jan 11, 2019
621dedb
[ZARCH] Update cgemv_t_4.c
maamountki Jan 11, 2019
406f835
[ZARCH] update cgemv_n_4.c
maamountki Jan 11, 2019
1a7925b
[ZARCH] Update dgemv_n_4.c
maamountki Jan 11, 2019
0040148
Fix missing braces in support_avx()
martin-frbg Jan 14, 2019
dbc9a06
Fix missing braces in support_av() call
martin-frbg Jan 14, 2019
b815a04
[ZARCH] fix a bug in max/min functions
maamountki Jan 15, 2019
29dc728
Add support for Hygon Dhyana
Jan 16, 2019
def0385
init
brada4 Jan 16, 2019
b70fd23
disable NaN checks before BLAS calls dsolve.R
brada4 Jan 16, 2019
2777a7f
disable NaN checks before BLAS calls dsolve.R (shorter config part)
brada4 Jan 16, 2019
1e3ada6
Merge pull request #1960 from cnjsdfcy/Hygon
martin-frbg Jan 16, 2019
7af8b21
disable NaN checks before BLAS calls dsolve.R (shorter formula)
brada4 Jan 16, 2019
3afceb6
disable NaN checks before BLAS calls deig.R
brada4 Jan 16, 2019
478d3c4
disable NaN checks before BLAS calls deig.R (shorten matrix def)
brada4 Jan 16, 2019
3e601bd
disable NaN checks before BLAS calls dgemm.R
brada4 Jan 16, 2019
8c3386b
Added missing Blas1 single fp {saxpy, caxpy, cdot, crot(refactored ve…
quickwritereader Jan 16, 2019
a034e65
Merge branch 'develop' into develop
quickwritereader Jan 16, 2019
256eb58
Merge pull request #1963 from quickwritereader/develop
martin-frbg Jan 16, 2019
43a4572
crot fix
quickwritereader Jan 17, 2019
3e9fd63
Bump xcode version to 10.1 to make sure it handles AVX512
martin-frbg Jan 17, 2019
24e697e
Merge pull request #1970 from quickwritereader/develop
martin-frbg Jan 17, 2019
d5e6940
Fix declaration of input arguments in the x86_64 microkernels for DOT…
martin-frbg Jan 17, 2019
b495e54
Fix declaration of input arguments in the x86_64 SCAL microkernels (#…
martin-frbg Jan 18, 2019
32b0f11
Fix declaration of input arguments in the Sandybridge GER microkernel…
martin-frbg Jan 18, 2019
cda81cf
Shift transition to multithreading towards larger matrix sizes
martin-frbg Jan 18, 2019
bbfdd6c
Increase Zen SWITCH_RATIO to 16
martin-frbg Jan 19, 2019
83b5c6b
Fix compilation with NO_AVX=1 set
martin-frbg Jan 20, 2019
010d59b
Merge pull request #1973 from martin-frbg/issue1464
martin-frbg Jan 20, 2019
b111829
[ZARCH] Update max/min functions
maamountki Jan 21, 2019
63bbd7b
Better support for MSVC/Windows in CMake
Jan 21, 2019
f0d834b
Use VERSION_LESS for comparisons involving software version numbers
martin-frbg Jan 22, 2019
2428880
Adjust test script for correct deployment
Jan 22, 2019
21eda8b
Report SkylakeX as Haswell if compiler does not support AVX512
martin-frbg Jan 22, 2019
b56b34a
Syntax fix
martin-frbg Jan 22, 2019
16494cb
Merge pull request #1980 from martin-frbg/issue1979
martin-frbg Jan 22, 2019
8533aca
Avoid penalizing tall skinny matrices
martin-frbg Jan 23, 2019
e908ac2
Fix include directory of exported targets
Jan 23, 2019
3f7bb87
Merge pull request #1971 from martin-frbg/trsm-threshold
martin-frbg Jan 24, 2019
e882b23
Correct naming of getrf_parallel object
martin-frbg Jan 25, 2019
36b844a
Change ARMV8 target to ARMV7 when BINARY32 is set
martin-frbg Jan 26, 2019
58dd7e4
Change ARMV8 target to ARMV7 for BINARY=32
martin-frbg Jan 26, 2019
89b60da
Merge pull request #1987 from martin-frbg/issue1961
martin-frbg Jan 26, 2019
0f24b39
Reword/expand comments in Makefile.rule
TiborGY Jan 27, 2019
ea1716c
Update Makefile.rule
TiborGY Jan 27, 2019
a529c71
Merge pull request #1962 from brada4/r
martin-frbg Jan 28, 2019
7d47f0a
Merge pull request #1978 from danielgindi/feature/msvc_cmake
martin-frbg Jan 28, 2019
3d155cf
Merge pull request #1981 from edisongustavo/develop
martin-frbg Jan 28, 2019
5be61f4
Merge pull request #1985 from martin-frbg/issue1984
martin-frbg Jan 28, 2019
c8ef9fb
[ZARCH] Fix bug in iamax/iamin/imax/imin
maamountki Jan 28, 2019
04873bb
[ZARCH] Undo the last commit
maamountki Jan 28, 2019
c7143c1
[ZARCH] Fix iamax/imax single precision
maamountki Jan 28, 2019
dc4d3bc
[ZARCH] Fix icamax/icamin
maamountki Jan 29, 2019
fcd814a
[ZARCH] Fix bug in max/min functions
maamountki Jan 29, 2019
eaf20f0
Remove ztest
maamountki Jan 31, 2019
808410c
Fix wrong comparison that made IMIN identical to IMAX
martin-frbg Jan 31, 2019
86a824c
Fix wrong comparison that made IMIN identical to IMAX
martin-frbg Jan 31, 2019
48b9b94
[ZARCH] Improve loading performance for camax/icamax
maamountki Jan 31, 2019
29416cb
[ZARCH] Add Z13 version for max/min functions
maamountki Jan 31, 2019
8212472
Merge branch 'develop' into z14
maamountki Jan 31, 2019
42df9ef
Merge pull request #1991 from maamountki/z14
martin-frbg Jan 31, 2019
1249ee1
Add Z14 target
martin-frbg Jan 31, 2019
bdc73a4
Add parameters for Z14
martin-frbg Jan 31, 2019
72d3e7c
Add FORCE Z14
martin-frbg Jan 31, 2019
4b512f8
Add cache sizes for Z14
martin-frbg Jan 31, 2019
885a3c4
USE_TRMM on Z14
martin-frbg Jan 31, 2019
265142e
Fix typo in the zarch min/max kernels
martin-frbg Jan 31, 2019
877023e
Fix precision of zarch DSDOT
martin-frbg Jan 31, 2019
cce574c
Improve the z14 SGEMVT kernel
martin-frbg Jan 31, 2019
282230c
Merge pull request #1993 from martin-frbg/aarnes-zarch
martin-frbg Jan 31, 2019
1f4b61f
Delete misplaced file sgemv_t_4.c
martin-frbg Feb 1, 2019
874df65
Fix incorrect sgemv results for IBM z14
martin-frbg Feb 1, 2019
4abc375
sgemv cgemv pairs
quickwritereader Feb 1, 2019
f9c5023
Merge pull request #1994 from quickwritereader/develop
martin-frbg Feb 1, 2019
cd9ea45
NBMAX=4096 for gemvn, added sgemvn 8x8 for future
quickwritereader Feb 4, 2019
498ac98
Note for unused kernels
quickwritereader Feb 4, 2019
729e925
Merge pull request #1996 from quickwritereader/develop
martin-frbg Feb 4, 2019
a38aa56
Merge pull request #1 from xianyi/develop
maamountki Feb 5, 2019
81daf6b
[ZARCH] Format source code, Fix constraints
maamountki Feb 5, 2019
6152648
[ZARCH] Fix copy constraint
maamountki Feb 5, 2019
f4b82d7
Include complex rather than complex.h in C++ contexts
martin-frbg Feb 5, 2019
817fe98
Merge pull request #1998 from martin-frbg/issue1992
martin-frbg Feb 5, 2019
11a43e8
[ZARCH] Set alignment hint for vl/vst
maamountki Feb 5, 2019
1391fc4
fix second instance of complex.h for c++ as well
martin-frbg Feb 5, 2019
d70ae3a
Make c_check robust against old or incomplete perl installations
martin-frbg Feb 5, 2019
f10408a
Merge pull request #1999 from martin-frbg/issue1996-2
martin-frbg Feb 5, 2019
5952e58
Support DYNAMIC_LIST option in cmake
martin-frbg Feb 5, 2019
af6e225
Merge pull request #2000 from martin-frbg/issue1989
martin-frbg Feb 5, 2019
641767f
Merge pull request #2001 from martin-frbg/cmake-dynlist
martin-frbg Feb 6, 2019
7039770
[ZARCH] Undo the last commit
maamountki Feb 6, 2019
69edc5b
Restore dropped patches in the non-TLS branch of memory.c (#2004)
martin-frbg Feb 7, 2019
03a2bf2
Fix potential memory leak in cpu enumeration on Linux (#2008)
martin-frbg Feb 10, 2019
77fe700
[ZARCH] Fix constraints and source code formatting
maamountki Feb 11, 2019
f583674
[ZARCH] Fix cgemv_t_4
maamountki Feb 12, 2019
dc6ac9e
Fix declaration of input arguments in the x86_64 s/dGEMV_T and s/dGE…
martin-frbg Feb 12, 2019
91481a3
Fix declaration of input arguments in inline assembly
martin-frbg Feb 12, 2019
b824fa7
Fix declaration of assembly arguments in SSYMV and DSYMV microkernels
martin-frbg Feb 12, 2019
ab1630f
Fix declaration of arguments in inline assembly
martin-frbg Feb 12, 2019
63d7bad
Merge pull request #2010 from martin-frbg/issue2009
martin-frbg Feb 12, 2019
bec54ae
[ZARCH] Fix caxpy
maamountki Feb 13, 2019
0a54c98
[ZARCH] Modify constraints
maamountki Feb 13, 2019
76bb74f
Merge pull request #2012 from maamountki/z14
martin-frbg Feb 13, 2019
f9d67bb
Fix out-of-bounds memory access in gemm_beta
martin-frbg Feb 13, 2019
718efce
Fix out-of-bounds memory access in gemm_beta
martin-frbg Feb 13, 2019
056917d
Merge pull request #2013 from martin-frbg/issue2011
martin-frbg Feb 14, 2019
b55c586
Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly fu…
martin-frbg Feb 14, 2019
69a97ca
dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3
bartoldeman Feb 14, 2019
cd5a59b
Merge pull request #2018 from bartoldeman/fix-dgemv-znver1-tree-vecto…
martin-frbg Feb 14, 2019
46e415b
Save and restore input argument 8 (lda4)
martin-frbg Feb 14, 2019
adb419e
With the Intel compiler on Linux, prefer ifort for the final link step
martin-frbg Feb 14, 2019
d3e4725
Merge pull request #2020 from martin-frbg/issue1956
martin-frbg Feb 15, 2019
4255a58
Rename operands to put lda on the input/output constraint list
martin-frbg Feb 15, 2019
1c6da2d
Merge pull request #2019 from martin-frbg/gcc9fixes
martin-frbg Feb 15, 2019
c26c0b7
Fix wrong constraints in inline assembly
martin-frbg Feb 15, 2019
f209fc7
Update Makefile.rule
TiborGY Feb 16, 2019
d752799
Merge pull request #2021 from martin-frbg/gcc9fixes2
martin-frbg Feb 16, 2019
9d8be15
Fix inline assembly constraints
martin-frbg Feb 16, 2019
e976557
Fix inline assembly constraints
martin-frbg Feb 16, 2019
efb9038
Fix inline assembly constraints
martin-frbg Feb 16, 2019
8242b1f
Fix inline assembly constraints
martin-frbg Feb 16, 2019
f9bb76d
Fix inline assembly constraints in Bulldozer TRSM kernels
martin-frbg Feb 16, 2019
5608999
fix the the
TiborGY Feb 16, 2019
aec9054
Merge pull request #1988 from TiborGY/patch-1
martin-frbg Feb 17, 2019
1860c94
Merge pull request #2023 from martin-frbg/gcc9fixes3
martin-frbg Feb 17, 2019
e12cdf5
Merge pull request #2024 from martin-frbg/gcc9fixes4
martin-frbg Feb 17, 2019
78d9910
Correct range_n limiting
martin-frbg Feb 19, 2019
e29b0cf
Allow multithreading TRMV again
martin-frbg Feb 19, 2019
45333d5
Fix error introduced during cleanup
martin-frbg Feb 19, 2019
343b301
Reduce list of kernels in the dynamic arch build
martin-frbg Feb 20, 2019
e5df595
init
brada4 Feb 24, 2019
6eee1be
move fix to right place
brada4 Feb 24, 2019
0db9c03
Merge pull request #2028 from brada4/mv
martin-frbg Feb 24, 2019
918a0cc
Fix missing -c option in AVX512 test
martin-frbg Feb 25, 2019
fd34820
Fix AVX512 test always returning false due to missing compiler option
martin-frbg Feb 25, 2019
d66214c
Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX
martin-frbg Feb 28, 2019
2ffb727
Keep xcode8.3 for osx BINARY=32 build
martin-frbg Feb 28, 2019
4c321ae
Merge pull request #2034 from martin-frbg/issue2033
martin-frbg Feb 28, 2019
c4868d1
Make sure that AVX512 is disabled in 32bit builds
martin-frbg Mar 1, 2019
edb8143
Merge pull request #2037 from martin-frbg/issue2033-2
martin-frbg Mar 1, 2019
2542792
Improve handling of NO_STATIC and NO_SHARED
martin-frbg Mar 2, 2019
e5c316c
init
brada4 Mar 3, 2019
e4a79be
address warning introed with #1814 et al
brada4 Mar 3, 2019
af480b0
Restore locking optimizations for OpenMP case
martin-frbg Mar 3, 2019
783ba80
HiSilicon tsv110 CPUs optimization branch
maomao194313 Mar 4, 2019
53f482e
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 Mar 4, 2019
760842d
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 Mar 4, 2019
fb4dae7
add TARGET support for HiSilicon tsv110 CPUs
maomao194313 Mar 4, 2019
6c83b87
Merge pull request #2040 from martin-frbg/locks2002
martin-frbg Mar 4, 2019
12f2b76
Merge pull request #2038 from martin-frbg/issue2035
martin-frbg Mar 4, 2019
10d841d
Merge pull request #2026 from martin-frbg/trmv_threads
martin-frbg Mar 4, 2019
e4864a8
Fix module definition conflicts between LAPACK and ReLAPACK
martin-frbg Mar 4, 2019
d7b2c53
Merge pull request #2039 from brada4/meminit
martin-frbg Mar 5, 2019
651ab01
Merge pull request #2044 from martin-frbg/issue2043
martin-frbg Mar 5, 2019
11cfd0b
Do not compile in AVX512 check if AVX support is disabled
martin-frbg Mar 5, 2019
4741ce8
Merge pull request #2045 from martin-frbg/2033-3
martin-frbg Mar 6, 2019
4290afd
ctest.c : add __POWERPC__ for PowerMac
kencu Mar 7, 2019
db3dc9e
Merge pull request #2046 from kencu/powermac
martin-frbg Mar 7, 2019
b7f59da
Fix crash in sgemm SSE/nano kernel on x86_64
Celelibi Mar 7, 2019
8d3d29e
Merge pull request #2049 from Celelibi/fix_crash_sgemm_sse_x64
martin-frbg Mar 7, 2019
b0c714e
param.h : enable defines for PPC970 on DarwinOS
kencu Mar 7, 2019
f7a0646
common_power.h: force DCBT_ARG 0 on PPC970 Darwin
kencu Mar 7, 2019
5b95534
Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1
martin-frbg Mar 9, 2019
946ec6c
Merge pull request #2050 from kencu/PowerMacFix
martin-frbg Mar 9, 2019
f18ab6c
Merge pull request #2051 from martin-frbg/issue2048
martin-frbg Mar 9, 2019
f074d7d
make DYNAMIC_ARCH=1 package work on TSV110.
maomao194313 Mar 12, 2019
7e3eb9b
make DYNAMIC_ARCH=1 package work on TSV110
maomao194313 Mar 12, 2019
b1393c7
Add Intel Denverton
martin-frbg Mar 12, 2019
04f2226
Add Intel Denverton
martin-frbg Mar 12, 2019
3ce28fb
Merge pull request #2055 from martin-frbg/atomid
martin-frbg Mar 12, 2019
03d7110
Merge pull request #2042 from maomao194313/develop
martin-frbg Mar 12, 2019
c3e30b2
Change 64-bit detection as explained in #2056
xsacha Mar 13, 2019
4fc17d0
Trivial typo fix
martin-frbg Mar 13, 2019
e608d4f
Disable the AVX512 DGEMM kernel (again)
martin-frbg Mar 13, 2019
1006ff8
Use POSIX getenv on Cygwin
embray Mar 15, 2019
a542557
Merge pull request #2060 from embray/cygwin/readenv
martin-frbg Mar 16, 2019
dff4a19
Merge pull request #2058 from xsacha/patch-3
martin-frbg Mar 16, 2019
4ad694e
Fix for #2063: The DllMain used in Cygwin did not run the thread memory
embray Mar 18, 2019
8ba9e2a
Also call CloseHandle on each thread, as well as on the event so as t…
embray Mar 19, 2019
8502030
Merge pull request #2064 from embray/cygwin/use-tls-thread-memory-cle…
martin-frbg Mar 19, 2019
b043a59
AIX asm syntax changes needed for shared object creation
ayappanec Mar 25, 2019
3ae122e
Merge pull request #2069 from aixoss/aix-asm-change
martin-frbg Mar 25, 2019
853a18b
power9 makefile. dgemm based on power8 kernel with following changes …
quickwritereader Mar 14, 2019
7c51cc8
Merge branch 'develop' into develop
martin-frbg Mar 29, 2019
4dec151
Merge pull request #2070 from quickwritereader/develop
martin-frbg Mar 29, 2019
4f9d3e4
Expose CBLAS interfaces for I?MIN and I?MAX
martin-frbg Mar 30, 2019
3d1e36d
Build CBLAS interfaces for I?MIN and I?MAX
martin-frbg Mar 30, 2019
c19a449
Merge pull request #2071 from martin-frbg/issue2068
martin-frbg Mar 30, 2019
32c7063
Merge pull request #2061 from martin-frbg/martin-frbg-patch-1
martin-frbg Mar 30, 2019
5c42287
Add declarations for ?sum and cblas_?sum
martin-frbg Mar 30, 2019
79cfc24
Add interface for ?sum (derived from ?asum)
martin-frbg Mar 30, 2019
b9f4943
Add ?sum
martin-frbg Mar 30, 2019
c3cfc69
Add implementations of ssum/dsum and csum/zsum
martin-frbg Mar 30, 2019
94ab4e6
Add ARM implementations of ?sum
martin-frbg Mar 30, 2019
3e3ccb9
Add ARM64 implementations of ?sum
martin-frbg Mar 30, 2019
f8b82bc
Add ia64 implementation of ?sum
martin-frbg Mar 30, 2019
cdbe0f0
Add MIPS implementation of ?sum
martin-frbg Mar 30, 2019
688fa92
Add MIPS64 implementation of ?sum
martin-frbg Mar 30, 2019
706dfe2
Add POWER implementation of ?sum
martin-frbg Mar 30, 2019
70f2a4e
Add SPARC implementation of ?sum
martin-frbg Mar 30, 2019
e3bc83f
Add x86 implementation of ?sum
martin-frbg Mar 30, 2019
9d717cb
Add x86_64 implementation of ?sum
martin-frbg Mar 30, 2019
246ca29
Add ZARCH implementation of ?sum
martin-frbg Mar 30, 2019
1679de5
Detect 32bit environment on 64bit ARM hardware
martin-frbg Mar 31, 2019
d17da6c
Add cmake defaults for ?sum kernels
martin-frbg Mar 31, 2019
100d94f
Add ?sum
martin-frbg Mar 31, 2019
c04a729
Add ?sum definitions for generic kernel
martin-frbg Mar 31, 2019
7f4e36d
Merge pull request #2073 from martin-frbg/issue2056-2
martin-frbg Mar 31, 2019
21d146a
Add declarations for ?sum
martin-frbg Mar 31, 2019
9229d68
Add -lm and disable EXPRECISION support on *BSD
martin-frbg Apr 2, 2019
e06b843
Merge pull request #2080 from martin-frbg/issue2075
martin-frbg Apr 2, 2019
bcdf1d4
Add in runtime CPU detection for POWER.
RashmicaG Apr 9, 2019
744779d
Merge pull request #2084 from RashmicaG/develop
martin-frbg Apr 14, 2019
40e53e5
snprintf define consolidated to common.h
Apr 23, 2019
ccfb7ea
Merge pull request #2072 from martin-frbg/sum
martin-frbg Apr 23, 2019
6b41eb9
Merge pull request #2092 from jeffbaylor/snprintf_with_MSC_VER
martin-frbg Apr 23, 2019
9a19616
Support INTERFACE64=1
martin-frbg Apr 27, 2019
798c448
Add support for INTERFACE64 and fix XERBLA calls
martin-frbg Apr 27, 2019
bbd9d98
Merge pull request #2094 from martin-frbg/issue2066
martin-frbg Apr 27, 2019
0bd956f
Correct length of name string in xerbla call
martin-frbg Apr 27, 2019
2aad88d
Avoid out-of-bounds accesses in LAPACK EIG tests
martin-frbg Apr 27, 2019
268c28d
Merge pull request #2095 from martin-frbg/trsm
martin-frbg Apr 28, 2019
91943b7
Merge pull request #2096 from martin-frbg/eig-testing
martin-frbg Apr 28, 2019
11530b7
Correct INFO=4 condition
martin-frbg Apr 28, 2019
2cd463e
Disable reallocation of work array in xSYTRF
martin-frbg Apr 28, 2019
452859f
Merge pull request #2097 from martin-frbg/rela-getrf
martin-frbg Apr 28, 2019
5b03981
Merge pull request #2098 from martin-frbg/rela-malloc
martin-frbg Apr 28, 2019
1036299
Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF
martin-frbg Apr 28, 2019
9c4edd3
Merge pull request #2099 from martin-frbg/rela-gbtrf
martin-frbg Apr 29, 2019
9763f87
Update Changelog with changes from 0.3.6
martin-frbg Apr 29, 2019
97d5034
Merge branch 'release-0.3.0' into develop
martin-frbg Apr 29, 2019
3 changes: 2 additions & 1 deletion .travis.yml
@@ -149,7 +149,7 @@ matrix:

- &test-macos
os: osx
-osx_image: xcode8
+osx_image: xcode10.1
before_script:
- COMMON_FLAGS="DYNAMIC_ARCH=1 TARGET=NEHALEM NUM_THREADS=32"
- brew update
@@ -160,6 +160,7 @@ matrix:
- BTYPE="BINARY=64 INTERFACE64=1"

- <<: *test-macos
+osx_image: xcode8.3
env:
- BTYPE="BINARY=32"

44 changes: 30 additions & 14 deletions CMakeLists.txt
@@ -6,7 +6,7 @@ cmake_minimum_required(VERSION 2.8.5)
project(OpenBLAS C ASM)
set(OpenBLAS_MAJOR_VERSION 0)
set(OpenBLAS_MINOR_VERSION 3)
-set(OpenBLAS_PATCH_VERSION 5)
+set(OpenBLAS_PATCH_VERSION 6)
set(OpenBLAS_VERSION "${OpenBLAS_MAJOR_VERSION}.${OpenBLAS_MINOR_VERSION}.${OpenBLAS_PATCH_VERSION}")

# Adhere to GNU filesystem layout conventions
@@ -42,6 +42,19 @@ endif()

#######

+if(MSVC AND MSVC_STATIC_CRT)
+set(CompilerFlags
+CMAKE_CXX_FLAGS
+CMAKE_CXX_FLAGS_DEBUG
+CMAKE_CXX_FLAGS_RELEASE
+CMAKE_C_FLAGS
+CMAKE_C_FLAGS_DEBUG
+CMAKE_C_FLAGS_RELEASE
+)
+foreach(CompilerFlag ${CompilerFlags})
+string(REPLACE "/MD" "/MT" ${CompilerFlag} "${${CompilerFlag}}")
+endforeach()
+endif()

message(WARNING "CMake support is experimental. It does not yet support all build options and may not produce the same Makefiles that OpenBLAS ships with.")

@@ -62,10 +75,10 @@ endif ()

set(SUBDIRS ${BLASDIRS})
if (NOT NO_LAPACK)
-list(APPEND SUBDIRS lapack)
if(BUILD_RELAPACK)
list(APPEND SUBDIRS relapack/src)
endif()
+list(APPEND SUBDIRS lapack)
endif ()

# set which float types we want to build for
@@ -134,7 +147,7 @@ endif ()

# Only generate .def for dll on MSVC and always produce pdb files for debug and release
if(MSVC)
-if (${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION} LESS 3.4)
+if (${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION} VERSION_LESS 3.4)
set(OpenBLAS_DEF_FILE "${PROJECT_BINARY_DIR}/openblas.def")
endif()
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /Zi")
@@ -149,15 +162,9 @@ if (${DYNAMIC_ARCH})
endforeach()
endif ()

-# Only build shared libs for MSVC
-if (MSVC)
-set(BUILD_SHARED_LIBS ON)
-endif()
-
-
# add objects to the openblas lib
add_library(${OpenBLAS_LIBNAME} ${LA_SOURCES} ${LAPACKE_SOURCES} ${RELA_SOURCES} ${TARGET_OBJS} ${OpenBLAS_DEF_FILE})
-target_include_directories(${OpenBLAS_LIBNAME} INTERFACE $<INSTALL_INTERFACE:include>)
+target_include_directories(${OpenBLAS_LIBNAME} INTERFACE $<INSTALL_INTERFACE:include/openblas${SUFFIX64}>)

# Android needs to explicitly link against libm
if(ANDROID)
@@ -166,7 +173,7 @@ endif()

# Handle MSVC exports
if(MSVC AND BUILD_SHARED_LIBS)
-if (${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION} LESS 3.4)
+if (${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION} VERSION_LESS 3.4)
include("${PROJECT_SOURCE_DIR}/cmake/export.cmake")
else()
# Creates verbose .def file (51KB vs 18KB)
@@ -217,6 +224,14 @@ set_target_properties(${OpenBLAS_LIBNAME} PROPERTIES
SOVERSION ${OpenBLAS_MAJOR_VERSION}
)

+if (BUILD_SHARED_LIBS AND BUILD_RELAPACK)
+if (NOT MSVC)
+target_link_libraries(${OpenBLAS_LIBNAME} "-Wl,-allow-multiple-definition")
+else()
+target_link_libraries(${OpenBLAS_LIBNAME} "/FORCE:MULTIPLE")
+endif()
+endif()
+
if (BUILD_SHARED_LIBS AND NOT ${SYMBOLPREFIX}${SYMBOLSUFIX} STREQUAL "")
if (NOT DEFINED ARCH)
set(ARCH_IN "x86_64")
@@ -314,7 +329,7 @@ install (FILES ${OPENBLAS_CONFIG_H} DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
if(NOT NOFORTRAN)
message(STATUS "Generating f77blas.h in ${CMAKE_INSTALL_INCLUDEDIR}")

-set(F77BLAS_H ${CMAKE_BINARY_DIR}/f77blas.h)
+set(F77BLAS_H ${CMAKE_BINARY_DIR}/generated/f77blas.h)
file(WRITE ${F77BLAS_H} "#ifndef OPENBLAS_F77BLAS_H\n")
file(APPEND ${F77BLAS_H} "#define OPENBLAS_F77BLAS_H\n")
file(APPEND ${F77BLAS_H} "#include \"openblas_config.h\"\n")
@@ -327,10 +342,11 @@ endif()
if(NOT NO_CBLAS)
message (STATUS "Generating cblas.h in ${CMAKE_INSTALL_INCLUDEDIR}")

+set(CBLAS_H ${CMAKE_BINARY_DIR}/generated/cblas.h)
file(READ ${CMAKE_CURRENT_SOURCE_DIR}/cblas.h CBLAS_H_CONTENTS)
string(REPLACE "common" "openblas_config" CBLAS_H_CONTENTS_NEW "${CBLAS_H_CONTENTS}")
-file(WRITE ${CMAKE_BINARY_DIR}/cblas.tmp "${CBLAS_H_CONTENTS_NEW}")
-install (FILES ${CMAKE_BINARY_DIR}/cblas.tmp DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} RENAME cblas.h)
+file(WRITE ${CBLAS_H} "${CBLAS_H_CONTENTS_NEW}")
+install (FILES ${CBLAS_H} DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
endif()

if(NOT NO_LAPACKE)
78 changes: 78 additions & 0 deletions Changelog.txt
@@ -1,4 +1,82 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.6
29-Apr-2019

common:
* the build tools now check that a given cpu TARGET is actually valid
* the build-time check of system features (c_check) has been made
less dependent on particular perl features (this should mainly
benefit building on Windows)
* several problems with the ReLAPACK integration were fixed,
including INTERFACE64 support and building a shared library
* building with CMAKE on BSD systems was improved
* a non-absolute SUM function was added based on the
existing optimized code for ASUM
* CBLAS interfaces to the IxMIN and IxMAX functions were added
* a name clash between LAPACKE and BOOST headers was resolved
* a failure of CMAKE builds with OpenMP to include the appropriate
getrf_parallel kernels was fixed
* a crash on thread (key) deletion with the USE_TLS=1 memory management
option was fixed
* restored several earlier fixes, in particular for OpenMP performance,
building on BSD, and calling fork on CYGWIN, which had inadvertently
been dropped in the 0.3.3 rewrite of the memory management code.
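
To illustrate the SUM entry in the list above, here is a minimal sketch in C of how a plain sum differs from ASUM. It is not the optimized kernel code from this merge: the names `ref_dasum` and `ref_dsum` are hypothetical, only the real double-precision case is shown, and a positive stride is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference version of ASUM: adds up |x[i]| over n elements,
   stepping through x with a positive stride incx. */
double ref_dasum(size_t n, const double *x, size_t incx) {
    assert(n == 0 || x != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = x[i * incx];
        acc += (v < 0.0) ? -v : v;   /* absolute value without libm */
    }
    return acc;
}

/* Hypothetical reference version of SUM: the same loop, but the sign of
   each element is kept. */
double ref_dsum(size_t n, const double *x, size_t incx) {
    assert(n == 0 || x != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i * incx];
    }
    return acc;
}
```

For x = {1, -2, 3, -4}, `ref_dasum(4, x, 1)` gives 10 while `ref_dsum(4, x, 1)` gives -2.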

x86_64:
* the AVX512 DGEMM kernel has been disabled again due to unsolved problems
* building with old versions of MSVC was fixed
* it is now possible to build a static library on Windows with CMAKE
* accessing environment variables on CYGWIN at run time was fixed
* the CMAKE build system now recognizes 32bit userspace on 64bit hardware
* Intel "Denverton" atom and Hygon "Dhyana" zen CPUs are now autodetected
* building for DYNAMIC_ARCH with a DYNAMIC_LIST of targets is now supported
with CMAKE as well
* building for DYNAMIC_ARCH with GENERIC as the default target is now supported
* a buffer overflow in the SSE GEMM kernel for Intel Nano targets was fixed
* assembly bugs involving undeclared modification of input operands were fixed
in the AXPY, DOT, GEMV, GER, SCAL, SYMV and TRSM microkernels for Nehalem,
Sandybridge, Haswell, Bulldozer and Piledriver. These would typically cause
test failures or segfaults when compiled with recent versions of gcc from 8 onward.
* a similar bug was fixed in the blas_quickdivide code used to split workloads
in most functions
* a bug in the IxMIN implementation for the GENERIC target made it return the result of IxMAX
* fixed building on SkylakeX systems when either the compiler or the (emulated) operating
environment does not support AVX512
* improved GEMM performance on ZEN targets
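
The IxMIN entry in the list above comes down to the direction of a single comparison. The sketch below is a hedged illustration in C, not the GENERIC kernel itself: `ref_idmax` and `ref_idmin` are hypothetical names, and the unit stride and the 1-based return value (0 for an empty vector) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference IMAX: 1-based index of the first largest element,
   or 0 when the vector is empty. */
size_t ref_idmax(size_t n, const double *x) {
    assert(n == 0 || x != NULL);
    if (n == 0) return 0;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (x[i] > x[best]) best = i;   /* strict test keeps the first hit */
    }
    return best + 1;
}

/* Hypothetical reference IMIN: identical except for the comparison.
   Writing '>' here by mistake would make it return the IMAX result. */
size_t ref_idmin(size_t n, const double *x) {
    assert(n == 0 || x != NULL);
    if (n == 0) return 0;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (x[i] < x[best]) best = i;
    }
    return best + 1;
}
```

For x = {3, -1, 7, -1, 7}, `ref_idmax` returns 3 and `ref_idmin` returns 2, so a test vector whose minimum and maximum sit at different positions is enough to catch this kind of mix-up.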

x86:
* build failures caused by the recently added checks for AVX512 were fixed
* an inline assembly bug involving undeclared modification of an input argument was
fixed in the blas_quickdivide code used to split workloads in most functions
* a bug in the IMIN implementation for the GENERIC target made it return the result of IMAX

MIPS32:
* a bug in the IMIN implementation made it return the result of IMAX

POWER:
* single precision BLAS1/2 functions have received optimized POWER8 kernels
* POWER9 is now a separate target, with an optimized DGEMM/DTRMM kernel
* building on PPC970 systems under OSX Leopard or Tiger is now supported
* out-of-bounds memory accesses in the gemm_beta microkernels were fixed
* building a shared library on AIX is now supported for POWER6
* DYNAMIC_ARCH support has been added for POWER6 and newer

ARMv7:
* corrected xDOT behaviour with zero INC_X or INC_Y
* a bug in the IMIN implementation made it return the result of IMAX
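
The xDOT entry above does not say what the ARMv7 kernels did wrong with a zero increment. As a point of comparison only, the sketch below follows the reference BLAS indexing convention, in which a zero increment reuses the same element on every iteration and a negative increment walks the vector from its far end. `ref_ddot` is a hypothetical name, not a function from this merge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference DOT with arbitrary increments.
   incx == 0 means x[0] is used for every term; a negative increment
   starts at the last logical element and walks backwards. */
double ref_ddot(ptrdiff_t n, const double *x, ptrdiff_t incx,
                const double *y, ptrdiff_t incy) {
    if (n <= 0) return 0.0;
    assert(x != NULL && y != NULL);
    ptrdiff_t ix = (incx < 0) ? (1 - n) * incx : 0;   /* start offset in x */
    ptrdiff_t iy = (incy < 0) ? (1 - n) * incy : 0;   /* start offset in y */
    double acc = 0.0;
    for (ptrdiff_t i = 0; i < n; i++) {
        acc += x[ix] * y[iy];
        ix += incx;
        iy += incy;
    }
    return acc;
}
```

With x = {2, 3, 4} and y = {1, 10, 100}, unit increments give 432, while `incx = 0` gives 2 * (1 + 10 + 100) = 222.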

ARMv8:
* added support for HiSilicon TSV110 cpus
* the CMAKE build system now recognizes 32bit userspace on 64bit hardware
* cross-compilation with CMAKE now works again
* a bug in the IMIN implementation made it return the result of IMAX
* ARMV8 builds with the BINARY=32 option are now automatically handled as ARMV7

IBM Z:
* optimized microkernels for single precision BLAS1/2 functions have been added
for both Z13 and Z14

====================================================================
Version 0.3.5
31-Dec-2018
2 changes: 1 addition & 1 deletion Makefile
@@ -96,7 +96,7 @@ endif
@echo

shared :
-ifndef NO_SHARED
+ifneq ($(NO_SHARED), 1)
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku))
@$(MAKE) -C exports so
@ln -fs $(LIBSONAME) $(LIBPREFIX).so
5 changes: 5 additions & 0 deletions Makefile.arm64
@@ -38,3 +38,8 @@ ifeq ($(CORE), THUNDERX2T99)
CCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
FCOMMON_OPT += -march=armv8.1-a -mtune=thunderx2t99
endif
+
+ifeq ($(CORE), TSV110)
+CCOMMON_OPT += -march=armv8.2-a -mtune=tsv110
+FCOMMON_OPT += -march=armv8.2-a -mtune=tsv110
+endif
10 changes: 5 additions & 5 deletions Makefile.install
@@ -58,14 +58,14 @@ ifndef NO_LAPACKE
endif

#for install static library
-ifndef NO_STATIC
+ifneq ($(NO_STATIC),1)
@echo Copying the static library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
@install -pm644 $(LIBNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)"
@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \
ln -fs $(LIBNAME) $(LIBPREFIX).$(LIBSUFFIX)
endif
#for install shared library
-ifndef NO_SHARED
+ifneq ($(NO_SHARED),1)
@echo Copying the shared library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
ifeq ($(OSNAME), $(filter $(OSNAME),Linux SunOS Android Haiku))
@install -pm755 $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)"
@@ -106,14 +106,14 @@ ifndef NO_LAPACKE
endif

#for install static library
-ifndef NO_STATIC
+ifneq ($(NO_STATIC),1)
@echo Copying the static library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
@installbsd -c -m 644 $(LIBNAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)"
@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \
ln -fs $(LIBNAME) $(LIBPREFIX).$(LIBSUFFIX)
endif
#for install shared library
-ifndef NO_SHARED
+ifneq ($(NO_SHARED),1)
@echo Copying the shared library to $(DESTDIR)$(OPENBLAS_LIBRARY_DIR)
@installbsd -c -m 755 $(LIBSONAME) "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)"
@cd "$(DESTDIR)$(OPENBLAS_LIBRARY_DIR)" ; \
@@ -138,7 +138,7 @@ endif
@echo "SET(OpenBLAS_VERSION \"${VERSION}\")" > "$(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)"
@echo "SET(OpenBLAS_INCLUDE_DIRS ${OPENBLAS_INCLUDE_DIR})" >> "$(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)"

-ifndef NO_SHARED
+ifneq ($(NO_SHARED),1)
#ifeq logical or
ifeq ($(OSNAME), $(filter $(OSNAME),Linux FreeBSD NetBSD OpenBSD DragonFly))
@echo "SET(OpenBLAS_LIBRARIES ${OPENBLAS_LIBRARY_DIR}/$(LIBPREFIX).so)" >> "$(DESTDIR)$(OPENBLAS_CMAKE_DIR)/$(OPENBLAS_CMAKE_CONFIG)"
10 changes: 9 additions & 1 deletion Makefile.power
@@ -9,7 +9,15 @@ else
USE_OPENMP = 1
endif

-
+ifeq ($(CORE), POWER9)
+ifeq ($(USE_OPENMP), 1)
+COMMON_OPT += -Ofast -mcpu=power9 -mtune=power9 -mvsx -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp
+FCOMMON_OPT += -O2 -frecursive -mcpu=power9 -mtune=power9 -malign-power -DUSE_OPENMP -fno-fast-math -fopenmp
+else
+COMMON_OPT += -Ofast -mcpu=power9 -mtune=power9 -mvsx -malign-power -fno-fast-math
+FCOMMON_OPT += -O2 -frecursive -mcpu=power9 -mtune=power9 -malign-power -fno-fast-math
+endif
+endif

ifeq ($(CORE), POWER8)
ifeq ($(USE_OPENMP), 1)