CLBlast 1.6.0
CLBlast version 1.6.0. Changes since previous release (version 1.5.3):
- Improved performance on Qualcomm Adreno GPUs:
- Unique database entries for specific Adreno devices
- Toggle OpenCL kernel compilation options for Adreno
- New preprocessor directive RELAX_WORKGROUP_SIZE
- Fixed a bug in handling of #undef in CLBlast loop unrolling and array-to-register mapping functions
- Fixed a bug in XAMAX/XAMIN routines related to inadvertently including the increment and offset in the result
- Fixed a bug in XAMAX/XAMIN routines that would cause only the real part of a complex number to be taken into account
- Fixed a bug that caused tests to not properly do integer-output testing (for XAMAX/XAMIN)
- Fixes a minor issue with the expected input buffer size in the TRMV/TBMV/TPMV/TRSV routines
- Fixes an issue with crashes on Android related to calling clReleaseProgram
- Fixes two small issues in the plotting script
- Fixed a documentation bug in the 'ld' requirements
- Enabled Github Actions CI builds for testing and releasing
- Various minor fixes and enhancements
- Added tuned parameters for various devices (see doc/tuning.md)