cblas_isamin on a dataset containing 0.0 and a non-1 incX returns 7 #2116

RossBrunton · 2019-05-07T11:12:27Z

The following example shows the problem:

int a = 30;
std::vector<float> x(a * 5);
for (int i = 0; i < a * 5; i ++) {
  x[i] = i + 1;
}
x[10 * 5] = 0;
std::cout << cblas_isamin(a, x.data(), 5) << "\n";

This prints out 7, rather than 10 as I would expect. Changing the 10 to any value larger than 7 also prints 7. Changing incX to 1, or removing the x[10 * 5] = 0 line causes this to output 0 as expected.

I'm using commit 6a8b42 .

The text was updated successfully, but these errors were encountered:

martin-frbg · 2019-05-07T12:52:05Z

Oh joy, this is probably a historical bug from GotoBLAS... I assume you are running this on x86_64 hardware ?

RossBrunton · 2019-05-07T13:18:54Z

This is x86_64, yes.

martin-frbg · 2019-05-07T13:32:51Z

Quick solution seems to be to change the ISAMINKERNEL entry in kernel/x86_64/KERNEL from iamax_sse.S to iamax.S - have not checked yet if related functions are similarly affected.

martin-frbg · 2019-05-07T20:42:52Z

Seems both isamin and ismin give wrong results for non-1 incX, although they use the same code as their (working) -max counterparts with only the comparison inverted.

* Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116

* With the Intel compiler on Linux, prefer ifort for the final link step icc has known problems with mixed-language builds that ifort can handle just fine. Fixes OpenMathLib#1956 * Rename operands to put lda on the input/output constraint list * Fix wrong constraints in inline assembly for OpenMathLib#2009 * Fix inline assembly constraints rework indices to allow marking argument lda4 as input and output. For OpenMathLib#2009 * Fix inline assembly constraints rework indices to allow marking argument lda as input and output. * Fix inline assembly constraints * Fix inline assembly constraints * Fix inline assembly constraints in Bulldozer TRSM kernels rework indices to allow marking i,as and bs as both input and output (marked operand n1 as well for simplicity). For OpenMathLib#2009 * Correct range_n limiting same bug as seen in OpenMathLib#1388, somehow missed in corresponding PR OpenMathLib#1389 * Allow multithreading TRMV again revert workaround introduced for issue OpenMathLib#1332 as the actual cause appears to be my incorrect fix from OpenMathLib#1262 (see OpenMathLib#1388) * Fix error introduced during cleanup * Reduce list of kernels in the dynamic arch build to make compilation complete reliably within the 1h limit again * init * move fix to right place * Fix missing -c option in AVX512 test * Fix AVX512 test always returning false due to missing compiler option * Make x86_32 imply NO_AVX2, NO_AVX512 in addition to NO_AVX fixes OpenMathLib#2033 * Keep xcode8.3 for osx BINARY=32 build as xcode10 deprecated i386 * Make sure that AVX512 is disabled in 32bit builds for OpenMathLib#2033 * Improve handling of NO_STATIC and NO_SHARED to avoid surprises from defining either as zero. Fixes OpenMathLib#2035 by addressing some concerns from OpenMathLib#1422 * init * address warning introed with OpenMathLib#1814 et al * Restore locking optimizations for OpenMP case restore another accidentally dropped part of OpenMathLib#1468 that was missed in OpenMathLib#2004 to address performance regression reported in OpenMathLib#1461 * HiSilicon tsv110 CPUs optimization branch add HiSilicon tsv110 CPUs optimization branch * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * add TARGET support for HiSilicon tsv110 CPUs * Fix module definition conflicts between LAPACK and ReLAPACK for OpenMathLib#2043 * Do not compile in AVX512 check if AVX support is disabled xgetbv is function depends on NO_AVX being undefined - we could change that too, but that combo is unlikely to work anyway * ctest.c : add __POWERPC__ for PowerMac * Fix crash in sgemm SSE/nano kernel on x86_64 Fix bug OpenMathLib#2047. Signed-off-by: Celelibi <celelibi@gmail.com> * param.h : enable defines for PPC970 on DarwinOS fixes: gemm.c: In function 'sgemm_': ../common_param.h:981:18: error: 'SGEMM_DEFAULT_P' undeclared (first use in this function) #define SGEMM_P SGEMM_DEFAULT_P ^ * common_power.h: force DCBT_ARG 0 on PPC970 Darwin without this, we see ../kernel/power/gemv_n.S:427:Parameter syntax error and many more similar entries that relates to this assembly command dcbt 8, r24, r18 this change makes the DCBT_ARG = 0 and openblas builds through to completion on PowerMac 970 Tests pass * Make TARGET=GENERIC compatible with DYNAMIC_ARCH=1 for issue OpenMathLib#2048 * make DYNAMIC_ARCH=1 package work on TSV110. * make DYNAMIC_ARCH=1 package work on TSV110 * Add Intel Denverton for OpenMathLib#2048 * Add Intel Denverton * Change 64-bit detection as explained in OpenMathLib#2056 * Trivial typo fix as suggested in OpenMathLib#2022 * Disable the AVX512 DGEMM kernel (again) Due to as yet unresolved errors seen in OpenMathLib#1955 and OpenMathLib#2029 * Use POSIX getenv on Cygwin The Windows-native GetEnvironmentVariable cannot be relied on, as Cygwin does not always copy environment variables set through Cygwin to the Windows environment block, particularly after fork(). * Fix for OpenMathLib#2063: The DllMain used in Cygwin did not run the thread memory pool cleanup upon THREAD_DETACH which is needed when compiled with USE_TLS=1. * Also call CloseHandle on each thread, as well as on the event so as to not leak thread handles. * AIX asm syntax changes needed for shared object creation * power9 makefile. dgemm based on power8 kernel with following changes : 32x unrolled 16x4 kernel and 8x4 kernel using (lxv stxv butterfly rank1 update). improvement from 17 to 22-23gflops. dtrmm cases were added into dgemm itself * Expose CBLAS interfaces for I?MIN and I?MAX * Build CBLAS interfaces for I?MIN and I?MAX * Add declarations for ?sum and cblas_?sum * Add interface for ?sum (derived from ?asum) * Add ?sum * Add implementations of ssum/dsum and csum/zsum as trivial copies of asum/zsasum with the fabs calls replaced by fmov to preserve code structure * Add ARM implementations of ?sum (trivial copies of the respective ?asum with the fabs calls removed) * Add ARM64 implementations of ?sum as trivial copies of the respective ?asum kernels with the fabs calls removed * Add ia64 implementation of ?sum as trivial copy of asum with the fabs calls removed * Add MIPS implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add MIPS64 implementation of ?sum as trivial copy of ?asum with the fabs replaced by mov to preserve code structure * Add POWER implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmr to preserve code structure * Add SPARC implementation of ?sum as trivial copy of ?asum with the fabs replaced by fmov to preserve code structure * Add x86 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add x86_64 implementation of ?sum as trivial copy of ?asum with the fabs calls removed * Add ZARCH implementation of ?sum as trivial copies of the respective ?asum kernels with the ABS and vflpsb calls removed * Detect 32bit environment on 64bit ARM hardware for OpenMathLib#2056, using same approach as OpenMathLib#2058 * Add cmake defaults for ?sum kernels * Add ?sum * Add ?sum definitions for generic kernel * Add declarations for ?sum * Add -lm and disable EXPRECISION support on *BSD fixes OpenMathLib#2075 * Add in runtime CPU detection for POWER. * snprintf define consolidated to common.h * Support INTERFACE64=1 * Add support for INTERFACE64 and fix XERBLA calls 1. Replaced all instances of "int" with "blasint" 2. Added string length as "hidden" third parameter in calls to fortran XERBLA * Correct length of name string in xerbla call * Avoid out-of-bounds accesses in LAPACK EIG tests see Reference-LAPACK/lapack#333 * Correct INFO=4 condition * Disable reallocation of work array in xSYTRF as it appears to cause memory management problems (seen in the LAPACK tests) * Disable repeated recursion on Ab_BR in ReLAPACK xGBTRF due to crashes in LAPACK tests * sgemm/strmm * Update Changelog with changes from 0.3.6 * Increment version to 0.3.7.dev * Increment version to 0.3.7.dev * Misc. typo fixes Found via `codespell -q 3 -w -L ith,als,dum,nd,amin,nto,wis,ba -S ./relapack,./kernel,./lapack-netlib` * Correct argument of CPU_ISSET for glibc <2.5 fixes OpenMathLib#2104 * conflict resolve * Revert reference/ fixes * Revert Changelog.txt typos * Disable the SkyLakeX DGEMMITCOPY kernel as well as a stopgap measure for numpy/numpy#13401 as mentioned in OpenMathLib#1955 * Disable DGEMMINCOPY as well for now OpenMathLib#1955 * init * Fix errors in cpu enumeration with glibc 2.6 for OpenMathLib#2114 * Change two http links to https Closes OpenMathLib#2109 * remove redundant code OpenMathLib#2113 * Set up CI with Azure Pipelines [skip ci] * TST: add native POWER8 to CI * add native POWER8 testing to Travis CI matrix with ppc64le os entry * Update link to IBM MASS library, update cpu support status * first try migrating one of the arm builds from travis * fix tabbing in azure commands * Update azure-pipelines.yml take out offending lines (although stolen from https://github.com/conda-forge/opencv-feedstock azure-pipelines fiie) * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * Update azure-pipelines.yml * DOC: Add Azure CI status badge * Add ARMV6 build to azure CI setup (OpenMathLib#2122) using aytekinar's Alpine image and docker script from the Travis setup [skip ci] * TST: Azure manylinux1 & clean-up * remove some of the steps & comments from the original Azure yml template * modify the trigger section to use develop since OpenBLAS primarily uses this branch; use the same batching behavior as downstream projects NumPy/ SciPy * remove Travis emulated ARMv6 gcc build because this now happens in Azure * use documented Ubuntu vmImage name for Azure and add in a manylinux1 test run to the matrix [skip appveyor] * Add NO_AFFINITY to available options on Linux, and set it to ON to match the gmake default. Fixes second part of OpenMathLib#2114 * Replace ISMIN and ISAMIN kernels on all x86_64 platforms (OpenMathLib#2125) * Mark iamax_sse.S as unsuitable for MIN due to issue OpenMathLib#2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for OpenMathLib#2116 * Move ARMv8 gcc build from Travis to Azure * Move ARMv8 gcc build from Travis to Azure * Update .travis.yml * Test drone CI * install make * remove sudo * Install gcc * Install perl * Install gfortran and add a clang job * gfortran->gcc-gfortran * Switch to ubuntu and parallel jobs * apt update * Fix typo * update yes * no need of gcc in clang build * Add a cmake build as well * Add cmake builds and print options * build without lapack on cmake * parallel build * See if ubuntu 19.04 fixes the ICE * Remove qemu armv8 builds * arm32 build * Fix typo * TST: add SkylakeX AVX512 CI test * adapt the C-level reproducer code for some recent SkylakeX AVX512 kernel issues, provided by Isuru Fernando and modified by Martin Kroeker, for usage in the utest suite * add an Intel SDE SkylakeX emulation utest run to the Azure CI matrix; a custom Docker build was required because Ubuntu image provided by Azure does not support AVX512VL instructions * Add option USE_LOCKING for single-threaded build with locking support for calling from concurrent threads * Add option USE_LOCKING for single-threaded build with locking support * Add option USE_LOCKING for SMP-like locking in USE_THREAD=0 builds * Add option USE_LOCKING but keep default settings intact * Remove unrelated change * Do not try ancient PGI hacks with recent versions of that compiler should fix OpenMathLib#2139 * Build and run utests in any case, they do their own checks for fortran availability * Add softfp support in min/max kernels fix for OpenMathLib#1912 * Revert "Add softfp support in min/max kernels" * Separate implementations of AMAX and IAMAX on arm As noted in OpenMathLib#1912 and comment on OpenMathLib#1942, the combined implementation happens to "do the right thing" on hardfp, but cannot return both value and index on softfp where they would have to share the return register * Ensure correct output for DAMAX with softfp * Use generic kernels for complex (I)AMAX to support softfp * improved zgemm power9 based on power8 * upload thread safety test folder * hook up c++ thread safety test (main Makefile) * add c++ thread test option to Makefile.rule * Document NO_AVX512 for OpenMathLib#2151 * sgemm pipeline improved, zgemm rewritten without inner packs, ABI lxvx v20 fixed with vs52 * Fix detection of AVX512 capable compilers in getarch 21eda8b introduced a check in getarch.c to test if the compiler is capable of AVX512. This check currently fails, since the used __AVX2__ macro is only defined if getarch itself was compiled with AVX2/AVX512 support. Make sure this is the case by building getarch with -march=native on x86_64. It is only supposed to run on the build host anyway. * c_check: Unlink correct file * power9 zgemm ztrmm optimized * conflict resolve * Add gfortran workaround for ABI violations in LAPACKE for OpenMathLib#2154 (see gcc bug 90329) * Add gfortran workaround for ABI violations for OpenMathLib#2154 (see gcc bug 90329) * Add gfortran workaround for potential ABI violation for OpenMathLib#2154 * Update fc.cmake * Remove any inadvertent use of -march=native from DYNAMIC_ARCH builds from OpenMathLib#2143, -march=native precludes use of more specific options like -march=skylake-avx512 in individual kernels, and defeats the purpose of dynamic arch anyway. * Avoid unintentional activation of TLS code via USE_TLS=0 fixes OpenMathLib#2149 * Do not force gcc options on non-gcc compilers fixes compile failure with pgi 18.10 as reported on OpenBLAS-users * Update Makefile.x86_64 * Zero ecx with a mov instruction PGI assembler does not like the initialization in the constraints. * Fix mov syntax * new sgemm 8x16 * Update dtrmm_kernel_16x4_power8.S * PGI compiler does not like -march=native * Fix build on FreeBSD/powerpc64. Signed-off-by: Piotr Kubaj <pkubaj@anongoth.pl> * Fix build for PPC970 on FreeBSD pt. 1 FreeBSD needs DCBT_ARG=0 as well. * Fix build for PPC970 on FreeBSD pt.2 FreeBSD needs those macros too. * cgemm/ctrmm power9 * Utest needs CBLAS but not necessarily FORTRAN * Add mingw builds to Appveyor config * Add getarch flags to disable AVX on x86 (and other small fixes to match Makefile behaviour) * Make disabling DYNAMIC_ARCH on unsupported systems work needs to be unset in the cache for the change to have any effect * Mingw32 needs leading underscore on object names (also copy BUNDERSCORE settings for FORTRAN from the corresponding Makefile)

The was a typo in iamax_sse.S where one of the comparison was cmpeqps instead of cmpeqss. That misdetected index for sequences where the minimum value was 0.

martin-frbg mentioned this issue May 9, 2019

Replace ISMIN and ISAMIN kernels on all x86_64 platforms #2125

Merged

martin-frbg closed this as completed in #2125 May 9, 2019

martin-frbg added a commit that referenced this issue May 9, 2019

Replace ISMIN and ISAMIN kernels on all x86_64 platforms (#2125)

9ea30f3

* Mark iamax_sse.S as unsuitable for MIN due to issue #2116 * Use iamax.S rather than iamax_sse.S for ISMIN/ISAMIN on all x86_64 as workaround for #2116

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cblas_isamin on a dataset containing 0.0 and a non-1 incX returns 7 #2116

cblas_isamin on a dataset containing 0.0 and a non-1 incX returns 7 #2116

RossBrunton commented May 7, 2019

martin-frbg commented May 7, 2019

RossBrunton commented May 7, 2019

martin-frbg commented May 7, 2019

martin-frbg commented May 7, 2019

cblas_isamin on a dataset containing 0.0 and a non-1 incX returns 7 #2116

cblas_isamin on a dataset containing 0.0 and a non-1 incX returns 7 #2116

Comments

RossBrunton commented May 7, 2019

martin-frbg commented May 7, 2019

RossBrunton commented May 7, 2019

martin-frbg commented May 7, 2019

martin-frbg commented May 7, 2019